Thanks for pointing out those models. A quick Hugging Face search shows that the bge model is available in GGML format. You can add new GGML-format models to the code simply by appending the direct download link to this line:
https://github.com/Dicklesworthstone/llama_embeddings_fastap...
So to add the base bge model, you would just add this URL to the list:
https://huggingface.co/maikaarda/bge-base-en-ggml/resolve/ma...
I will add that as an additional default.
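A minimal sketch of the pattern described above: the server keeps a list of direct-download URLs, and adding a model means appending one more link. The list name and the example URLs below are placeholders for illustration, not the actual contents of llama_embeddings_fastapi.

```python
# Placeholder list of model download links; adding a model = appending a URL.
MODEL_DOWNLOAD_URLS = [
    # existing default model (placeholder URL, not a real link)
    "https://example.com/models/llamacpp-compatible-model.ggmlv3.q4_0.bin",
    # newly added GGML model: just append its direct download link
    "https://example.com/models/bge-base-en.ggml.bin",
]

def model_filenames(urls: list[str]) -> list[str]:
    """Derive the on-disk filename for each model from its download URL."""
    return [url.rsplit("/", 1)[-1] for url in urls]
```

Presumably the final path segment of each URL becomes the cached filename the server fetches on startup, which is why a plain direct-download link is all that is needed.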
What's wrong with just using TorchServe[1]? We've been using it to serve embedding models in production.
[1] https://pytorch.org/serve/
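For context on the TorchServe approach: once a model is registered, TorchServe exposes it over an HTTP inference API at `/predictions/<model_name>` (port 8080 by default). The sketch below builds such a request with only the standard library; the model name `my_embedder`, the host, and the JSON payload shape are assumptions for illustration, since the payload a handler accepts depends on how the model was packaged.

```python
import json
import urllib.request

# TorchServe's inference API serves registered models at
# /predictions/<model_name>; host and model name here are assumed.
TORCHSERVE_URL = "http://localhost:8080/predictions/my_embedder"

def build_embedding_request(text: str, url: str = TORCHSERVE_URL) -> urllib.request.Request:
    """Build (but do not send) an HTTP request asking the server to embed `text`."""
    data = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )

# To actually call it (requires a running TorchServe instance):
# with urllib.request.urlopen(build_embedding_request("hello world")) as resp:
#     embedding = json.loads(resp.read())
```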
It's much better to use something like bge-large-en - https://github.com/arguflow/openembeddings