You then build an index from words to documents: for each word, you keep the set of documents that contain the word. One way to do this is to number the documents, so that each word's entry in the index is really a boolean array with one bit per document (fewer than 40 million bits in Spotify's case). You may think that is too large, but compressed bitmaps are a thing, and there are multiple approaches to them.
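To make the idea concrete, here is a minimal sketch in Rust. It assumes documents are numbered by their position in a slice and that words are split on whitespace and lowercased. The names `Bitmap`, `build_index`, and `search` are made up for illustration, and the bitmap is a plain uncompressed `Vec<u64>`; a compressed format (Roaring bitmaps are one such approach) could stand in for it.

```rust
use std::collections::HashMap;

/// One bit per document: bit `d` is set when document `d` contains the word.
/// Hypothetical, uncompressed stand-in for a compressed bitmap.
#[derive(Clone, Debug, Default, PartialEq)]
struct Bitmap {
    words: Vec<u64>,
}

impl Bitmap {
    /// Mark document `doc_id` as containing the word, growing as needed.
    fn set(&mut self, doc_id: usize) {
        let slot = doc_id / 64;
        if slot >= self.words.len() {
            self.words.resize(slot + 1, 0);
        }
        self.words[slot] |= 1u64 << (doc_id % 64);
    }

    /// Bitwise AND: the documents present in both bitmaps.
    fn and(&self, other: &Bitmap) -> Bitmap {
        let words = self.words.iter().zip(&other.words).map(|(a, b)| a & b).collect();
        Bitmap { words }
    }

    /// List the document numbers whose bits are set, in ascending order.
    fn doc_ids(&self) -> Vec<usize> {
        let mut ids = Vec::new();
        for (slot, &w) in self.words.iter().enumerate() {
            let mut bits = w;
            while bits != 0 {
                ids.push(slot * 64 + bits.trailing_zeros() as usize);
                bits &= bits - 1; // clear the lowest set bit
            }
        }
        ids
    }
}

/// Build the word-to-documents index (assumed tokenization: whitespace, lowercase).
fn build_index(docs: &[&str]) -> HashMap<String, Bitmap> {
    let mut index: HashMap<String, Bitmap> = HashMap::new();
    for (doc_id, text) in docs.iter().enumerate() {
        for word in text.split_whitespace() {
            index.entry(word.to_lowercase()).or_default().set(doc_id);
        }
    }
    index
}

/// Return the documents that contain every word of the query.
fn search(index: &HashMap<String, Bitmap>, query: &str) -> Vec<usize> {
    let mut result: Option<Bitmap> = None;
    for word in query.split_whitespace() {
        // A word that appears in no document means no document can match.
        let Some(bitmap) = index.get(&word.to_lowercase()) else {
            return Vec::new();
        };
        result = Some(match result {
            Some(acc) => acc.and(bitmap),
            None => bitmap.clone(),
        });
    }
    result.map(|b| b.doc_ids()).unwrap_or_default()
}
```

With this sketch, calling `build_index` on a slice of strings and then `search(&index, "hello world")` intersects the bitmaps for "hello" and "world", so a multi-word query costs one bitwise AND per extra word rather than a scan over the documents.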
Great answer! It spurred me on to write an implementation in Rust, if anyone wants to look around a working example. (I'm also very open to critique, as I know GitHub repos are popular to include in CVs / resumes.)