featurebase
Discontinued. A crazy fast analytical database built on bitmaps. Perfect for ML applications. Learn more at http://docs.featurebase.com/. Start a Docker instance: https://hub.docker.com/r/featurebasedb/featurebase
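"Built on bitmaps" refers to the general bitmap-index idea: each (field, value) pair gets a bitmap with one bit per row, and a filter with several conditions becomes a bitwise AND. The following is a minimal sketch of that idea only. It uses Python ints as bitsets, and the names `build_bitmap_index` and `rows_matching` are hypothetical; this is not FeatureBase's API or storage format.

```python
from collections import defaultdict

# Hypothetical bitmap-index sketch (not FeatureBase's API): each (field, value)
# pair maps to a Python int whose bit i is set when row i has that value.

def build_bitmap_index(rows):
    """Map (field, value) -> int bitmap over row positions."""
    index = defaultdict(int)
    for row_id, row in enumerate(rows):
        for field, value in row.items():
            index[(field, value)] |= 1 << row_id
    return index

def rows_matching(index, **criteria):
    """Return ids of rows matching every field=value criterion (bitwise AND)."""
    if not criteria:
        raise ValueError("need at least one criterion")
    result = None
    for field, value in criteria.items():
        bitmap = index.get((field, value), 0)  # .get does not insert into the defaultdict
        result = bitmap if result is None else result & bitmap
    # Decode the set bits back into row ids.
    ids, row_id = [], 0
    while result:
        if result & 1:
            ids.append(row_id)
        result >>= 1
        row_id += 1
    return ids

rows = [
    {"color": "red", "size": "S"},
    {"color": "blue", "size": "M"},
    {"color": "red", "size": "M"},
    {"color": "red", "size": "M"},
]
index = build_bitmap_index(rows)
print(rows_matching(index, color="red", size="M"))  # [2, 3]
```

The AND of two bitmaps touches every row at once, which is why bitmap indexes suit multi-condition filters; real systems add compression (e.g. roaring bitmaps) that this sketch leaves out.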
-
annoy
Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk
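Annoy builds a forest of trees that split space with random hyperplanes, and at query time it only compares the query against points that land near it in those trees. The following is a heavily simplified pure-Python sketch of that idea, not Annoy's implementation or API. Each split uses the hyperplane equidistant from two randomly sampled points. Search here only collects the query's leaf in every tree, whereas Annoy also explores nearby branches with a priority queue. All function names and the `n_trees`/`leaf_size` defaults are hypothetical.

```python
import random

# Simplified random-hyperplane forest in the spirit of Annoy (not its real code).

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_tree(points, ids, leaf_size, rng):
    if len(ids) <= leaf_size:
        return ("leaf", ids)
    # Hyperplane equidistant from two randomly chosen points.
    a, b = (points[i] for i in rng.sample(ids, 2))
    normal = [x - y for x, y in zip(a, b)]
    offset = _dot(normal, [(x + y) / 2 for x, y in zip(a, b)])
    left = [i for i in ids if _dot(normal, points[i]) < offset]
    right = [i for i in ids if _dot(normal, points[i]) >= offset]
    if not left or not right:  # degenerate split (e.g. duplicate points): stop
        return ("leaf", ids)
    return ("split", normal, offset,
            build_tree(points, left, leaf_size, rng),
            build_tree(points, right, leaf_size, rng))

def _leaf_for(tree, q):
    # Descend using the same comparison as build_tree, so a stored point
    # always reaches the leaf that contains it.
    while tree[0] == "split":
        _, normal, offset, left, right = tree
        tree = left if _dot(normal, q) < offset else right
    return tree[1]

def build_forest(points, n_trees=10, leaf_size=8, seed=0):
    rng = random.Random(seed)
    ids = list(range(len(points)))
    return [build_tree(points, ids, leaf_size, rng) for _ in range(n_trees)]

def approx_nearest(points, forest, q, k=1):
    """Exact distances, but only over the union of the query's leaves."""
    candidates = set()
    for tree in forest:
        candidates.update(_leaf_for(tree, q))
    return sorted(candidates, key=lambda i: _dist2(points[i], q))[:k]

data_rng = random.Random(42)
pts = [[data_rng.random(), data_rng.random()] for _ in range(200)]
forest = build_forest(pts)
print(approx_nearest(pts, forest, pts[17], k=1))  # [17]
```

More trees enlarge the candidate set and improve recall at the cost of memory and query time, which is the same trade-off Annoy exposes when you choose how many trees to build.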
Ducks, the story:
I was using a Python in-memory vector search engine called Annoy [1] to do semantic search on various kinds of data. It worked great for finding "similar" objects: story A has similar text to story B, image A looks like image B, etc.
But doing basic metadata lookups was surprisingly hard. How do I get all images matching some criteria (say, a size range, or tags)? I'd have to serialize them all into a DB and use a DB index. Databases are great, but they add code bloat and overhead; I'm usually working in Jupyter notebooks, and I like keeping as few external dependencies as possible.
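The kind of lookup described here, a size range plus a tag, can be served in memory without a database. The following is a minimal sketch, assuming each item has a numeric `size` and a list of `tags`. The class `MetadataIndex` and its methods are hypothetical illustrations, not the ducks API. A sorted list plus binary search answers range queries, a dict of sets answers tag queries, and set intersection combines them.

```python
import bisect
from collections import defaultdict

# Hypothetical in-memory metadata index (not the ducks API).

class MetadataIndex:
    def __init__(self, items):
        # items: dict of id -> {"size": number, "tags": iterable of str}
        self._by_size = sorted(
            ((meta["size"], item_id) for item_id, meta in items.items()),
            key=lambda pair: pair[0],  # sort on size only; ids need not be comparable
        )
        self._sizes = [size for size, _ in self._by_size]
        self._by_tag = defaultdict(set)
        for item_id, meta in items.items():
            for tag in meta["tags"]:
                self._by_tag[tag].add(item_id)

    def size_between(self, lo, hi):
        """Ids whose size lies in the inclusive range [lo, hi], via binary search."""
        start = bisect.bisect_left(self._sizes, lo)
        end = bisect.bisect_right(self._sizes, hi)
        return {item_id for _, item_id in self._by_size[start:end]}

    def with_tag(self, tag):
        """Ids carrying the given tag (a copy, so callers can't mutate the index)."""
        return set(self._by_tag.get(tag, set()))

images = {
    "cat.jpg": {"size": 120, "tags": ["cat", "pet"]},
    "dog.jpg": {"size": 800, "tags": ["dog", "pet"]},
    "car.jpg": {"size": 450, "tags": ["car"]},
    "kitten.jpg": {"size": 300, "tags": ["cat"]},
}
idx = MetadataIndex(images)
print(sorted(idx.size_between(100, 500) & idx.with_tag("cat")))  # ['cat.jpg', 'kitten.jpg']
```

Each lookup avoids a full scan, and combining criteria is plain set algebra, which keeps everything inside the notebook process.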
So I wrote ducks as a quick, convenient way to index anything.
There are lots of other usage patterns, of course; it's very generic. It also makes a great Wordle / crossword solver: "Find me words where the first letter is A and the fifth letter is L" is very fast in ducks.
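One way such a positional word query can be fast is to index each word under every (position, letter) pair it contains, then intersect the matching sets. The following is a minimal sketch of that technique. The names `build_position_index` and `find_words` and the sample word list are hypothetical, and this is not how ducks is necessarily implemented.

```python
from collections import defaultdict

# Hypothetical positional index (not the ducks API): (position, letter) -> words.

def build_position_index(words):
    index = defaultdict(set)
    for word in words:
        for pos, letter in enumerate(word.lower()):
            index[(pos, letter)].add(word)
    return index

def find_words(index, constraints):
    """constraints: dict of 0-based position -> required letter."""
    sets = [index.get((pos, letter.lower()), set()) for pos, letter in constraints.items()]
    if not sets:
        raise ValueError("need at least one constraint")
    return set.intersection(*sets)  # returns a new set; the index is untouched

words = ["ample", "angel", "april", "apple", "axial", "basil"]
index = build_position_index(words)
# First letter A (position 0) and fifth letter L (position 4).
print(sorted(find_words(index, {0: "a", 4: "l"})))  # ['angel', 'april', 'axial']
```

Each constraint is a single dict lookup, so the cost depends on the sizes of the candidate sets rather than on scanning the whole word list.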
Indexing is just one of those things you always need. Python didn't have a good way to do it, and now it does!
Related posts
- Vector Databases 101
- I'm an undergraduate data science intern and trying to run kmodes clustering. Did this elbow method to figure out how many clusters to use, but I don't really see an "elbow". Tips on number of clusters?
- Calculating document similarity in a special domain
- Can Parquet file format index string columns?
- Billion-Scale Approximate Nearest Neighbor Search [pdf]