annoy
Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk
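The one-line description says little about how Annoy finds neighbors. According to its README, Annoy builds a forest of trees in which every inner node splits the points by the hyperplane equidistant from two points sampled from that node's subset. Below is a minimal pure-Python sketch of that idea. It makes several simplifying assumptions: Euclidean distance, exactly one leaf visited per tree (Annoy itself explores more branches, bounded by its `search_k` parameter), and no on-disk format. The names `RandomProjectionForest`, `query`, `n_trees` and `leaf_size` are hypothetical and are not Annoy's API.

```python
import random
from typing import List, Optional, Sequence

Vector = Sequence[float]


def _dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def _sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


class _Node:
    """Leaf nodes hold item ids; inner nodes hold a splitting hyperplane."""

    def __init__(self, items: Optional[List[int]] = None, normal=None,
                 offset: float = 0.0, left=None, right=None):
        self.items = items      # list of ids for a leaf, None for an inner node
        self.normal = normal    # hyperplane normal (inner nodes only)
        self.offset = offset    # points with dot(normal, v) >= offset go left
        self.left = left
        self.right = right


def _build(ids: List[int], data, leaf_size: int, rng: random.Random) -> _Node:
    if len(ids) <= leaf_size:
        return _Node(items=list(ids))
    # Sample two points and split on the hyperplane equidistant from them.
    a, b = rng.sample(ids, 2)
    p, q = data[a], data[b]
    normal = [x - y for x, y in zip(p, q)]
    offset = _dot(normal, [(x + y) / 2 for x, y in zip(p, q)])
    left = [i for i in ids if _dot(normal, data[i]) >= offset]
    right = [i for i in ids if _dot(normal, data[i]) < offset]
    if not left or not right:  # degenerate split, e.g. duplicate points
        return _Node(items=list(ids))
    return _Node(normal=normal, offset=offset,
                 left=_build(left, data, leaf_size, rng),
                 right=_build(right, data, leaf_size, rng))


def _leaf(node: _Node, v: Vector) -> List[int]:
    # Descend using the same side test that was used while building.
    while node.items is None:
        node = node.left if _dot(node.normal, v) >= node.offset else node.right
    return node.items


class RandomProjectionForest:
    def __init__(self, data, n_trees: int = 10, leaf_size: int = 8, seed: int = 0):
        if leaf_size < 1:
            raise ValueError("leaf_size must be at least 1")
        self.data = [list(v) for v in data]
        rng = random.Random(seed)
        ids = list(range(len(self.data)))
        self.trees = [_build(ids, self.data, leaf_size, rng) for _ in range(n_trees)]

    def query(self, v: Vector, k: int) -> List[int]:
        """Approximate k nearest ids: union one leaf per tree, then rank exactly."""
        candidates = set()
        for tree in self.trees:
            candidates.update(_leaf(tree, v))
        return sorted(candidates, key=lambda i: _sq_dist(self.data[i], v))[:k]
```

More trees raise the chance that a true neighbor shares a leaf with the query, at the cost of memory and build time. Annoy's own Python binding exposes the corresponding steps as `AnnoyIndex(f, 'angular')`, `add_item(i, v)`, `build(n_trees)`, `save(path)`/`load(path)` and `get_nns_by_vector(v, n)`. Its `load` memory-maps the saved file, so several processes can share one index.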
The actual data Spotify keeps in fast storage is likely in a compressed feature-vector format (see https://github.com/spotify/annoy) that makes no sense to humans. The process of getting the “raw” data likely isn’t optimized, and the business has no appetite for optimizing it, because no one has literally died from not getting their raw data in 10 seconds.
NOTE:
The number of mentions on this list counts mentions in common posts plus user-suggested alternatives.
Hence, a higher number means a more popular project.
Related posts
- Vector Databases 101
- I'm an undergraduate data science intern and trying to run kmodes clustering. Did this elbow method to figure out how many clusters to use, but I don't really see an "elbow". Tips on number of clusters?
- Calculating document similarity in a special domain
- Can Parquet file format index string columns?
- Billion-Scale Approximate Nearest Neighbor Search [pdf]