llama-hub
(Discontinued) A library of data loaders for LLMs made by the community, to be used with LlamaIndex and/or LangChain.
RAG is a very useful flow, but I agree the complexity is often overwhelming, especially as you move from a toy example to a real production deployment. It's not just choosing a vector DB (last time I checked there were about 50), managing it, and deciding how to chunk data. You also need to keep your retrieval pipeline accurate and fast, keep data secure and private, and manage the whole thing as it scales. That's one of the main benefits of using Vectara (https://vectara.com; FD: I work there): it's a GenAI platform that abstracts all this complexity away so you can focus on building your application.
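To make the chunking decision concrete, the simplest baseline is a fixed-size sliding window over the text. A minimal sketch (the size and overlap values here are arbitrary, and real pipelines usually split on sentence or token boundaries instead):

```python
def chunk_text(text: str, size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into overlapping fixed-size character windows."""
    if size <= overlap:
        raise ValueError("size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap  # step forward, keeping `overlap` chars of context
    return chunks
```

The overlap keeps sentences that straddle a boundary retrievable from at least one chunk, at the cost of some index bloat.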
For local stuff with a handful of documents, you can even just throw them into a JSON file and call it a day. The similarity search is as simple as an np.dot: https://github.com/gsuuon/llm.nvim/blob/main/python3/store.p...
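For anyone curious what that looks like, here's a rough sketch of the idea (not the actual llm.nvim code; the records and vectors are made up, and it assumes the embeddings are already unit-normalized so a dot product is a cosine similarity):

```python
import numpy as np

# Toy "store": a list of {"text": ..., "vector": ...} records, as might be
# dumped to a JSON file. The vectors here are invented for illustration.
store = [
    {"text": "doc about cats", "vector": [1.0, 0.0, 0.0]},
    {"text": "doc about dogs", "vector": [0.0, 1.0, 0.0]},
    {"text": "doc about birds", "vector": [0.0, 0.0, 1.0]},
]

def top_k(query_vec, store, k=2):
    """Return the texts of the k most similar records to the query vector."""
    mat = np.array([rec["vector"] for rec in store])
    scores = mat @ np.array(query_vec)  # one dot product scores every record
    order = np.argsort(scores)[::-1][:k]  # highest scores first
    return [store[i]["text"] for i in order]
```

At a few thousand documents this brute-force scan is plenty fast; a dedicated vector DB only starts to pay off well beyond that.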
Kudos to the team for a very detailed notebook covering things like pipeline evaluation with respect to performance, costs, etc. Even if we ignore the framework-specific bits, it is a great guide to follow when building RAG systems in production.
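As a taste of what such an evaluation involves, retrieval quality is often measured with recall@k over a small labeled query set. A minimal sketch (the metric choice is mine, not necessarily the notebook's):

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    return hits / len(relevant)
```

Averaging this over a set of labeled queries gives a cheap regression test for chunking or embedding changes before any cost measurements come into play.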
We have been building RAG systems in production for a few months and have been tinkering with different strategies to get the most performance out of these pipelines. As others have pointed out, a vector database may not be the right strategy for every problem. Similarly, there are issues like the "lost in the middle" problem (https://arxiv.org/abs/2307.03172) that one may have to deal with. We put together our learnings from building and optimizing these pipelines in a post at https://llmstack.ai/blog/retrieval-augmented-generation.
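One common mitigation for lost-in-the-middle is reordering retrieved chunks so the highest-ranked ones land at the edges of the prompt, where models attend best. A sketch of that idea (my illustration, not LLMStack's implementation):

```python
def reorder_for_context(docs_ranked: list[str]) -> list[str]:
    """Given docs sorted best-first, place the strongest docs at the start
    and end of the context, pushing the weakest toward the middle."""
    front, back = [], []
    for i, doc in enumerate(docs_ranked):
        (front if i % 2 == 0 else back).append(doc)
    return front + back[::-1]
```

The top two results end up first and last in the assembled context, so the model's weaker attention to the middle falls on the least relevant chunks.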
https://github.com/trypromptly/LLMStack is a low-code platform we open-sourced recently that ships these RAG pipelines out of the box with some app templates if anyone wants to try them out.
My favorite example is the asana loader[0] for llama-index. It's literally just the most basic wrapper around the Asana SDK to concatenate some strings.
[0] - https://github.com/emptycrown/llama-hub/blob/main/llama_hub/...
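For a sense of scale, the whole pattern boils down to something like this (a hypothetical sketch, not the real loader; `tasks` stands in for whatever the Asana SDK returns):

```python
def load_tasks_as_documents(tasks):
    """Hypothetical loader: concatenate a few fields from each API
    record into one text blob per 'document'."""
    docs = []
    for task in tasks:  # each `task` is assumed to be a dict-like API record
        docs.append(task.get("name", "") + "\n" + task.get("notes", ""))
    return docs
```

That's the whole "integration": fetch records, join a couple of string fields, hand the result to the indexer.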
This is already a feature in many commercial products, as well as in open-source libraries like PyOD: https://github.com/yzhao062/pyod
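The underlying idea is straightforward; here is a toy stand-in for PyOD's far more capable detectors, flagging points whose z-score exceeds a threshold (3 is a common but arbitrary cutoff):

```python
import numpy as np

def zscore_outliers(data, threshold: float = 3.0):
    """Return indices of points more than `threshold` std devs from the mean."""
    x = np.asarray(data, dtype=float)
    z = (x - x.mean()) / x.std()
    return np.flatnonzero(np.abs(z) > threshold)
```

Libraries like PyOD earn their keep on the hard cases this ignores: multivariate data, skewed distributions, and masking effects when several outliers pull the mean toward themselves.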