Repo to create embeddings of your website's content for a Q&A bot / chatbot

content-chatbot

5 509 6.9 Python

Build a chatbot or Q&A bot of your website's content

Thanks for sharing the code. What happens when existing content is updated and new content is created? Would embeddings need to be created for all content again? That seems wasteful, since creating embeddings costs money. Please see https://github.com/mpaepper/content-chatbot/blob/main/create.... Would it be possible to update the vector store incrementally?
Please advise. Thank you.
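
One way to avoid re-embedding unchanged pages is to keep a content hash next to each stored vector and only call the embedding model when the hash changes. The sketch below is a minimal illustration of that idea, not the repo's actual code: `sync_embeddings`, the in-memory `store` dict, and the `embed` callable are all hypothetical names, and a real setup would swap in the paid embedding API and a persistent vector store.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

Vector = List[float]


def content_hash(text: str) -> str:
    """Stable fingerprint of a page's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def sync_embeddings(
    pages: Dict[str, str],
    store: Dict[str, Tuple[str, Vector]],
    embed: Callable[[str], Vector],
) -> List[str]:
    """Embed only new or changed pages and drop pages that disappeared.

    `pages` maps URL -> current text, `store` maps URL -> (hash, vector).
    Returns the URLs that were (re-)embedded on this run.
    """
    updated = []
    for url, text in pages.items():
        digest = content_hash(text)
        cached = store.get(url)
        if cached is not None and cached[0] == digest:
            continue  # unchanged page: no embedding call, no cost
        store[url] = (digest, embed(text))
        updated.append(url)
    # Remove vectors for pages that no longer exist on the site.
    for url in list(store):
        if url not in pages:
            del store[url]
    return updated


def fake_embed(text: str) -> Vector:
    """Placeholder for a real embedding call (assumption for the demo)."""
    return [float(len(text))]


if __name__ == "__main__":
    store: Dict[str, Tuple[str, Vector]] = {}
    print(sync_embeddings({"/a": "hello", "/b": "world"}, store, fake_embed))
    print(sync_embeddings({"/a": "hello", "/b": "world!"}, store, fake_embed))
```

On the second run only the changed page is embedded again, so the cost scales with the amount of edited content rather than the size of the site.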

qdrant

140 17,943 9.9 Rust

Qdrant - High-performance, massive-scale Vector Database for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/

Looks interesting! Have you considered a proper vector database like Qdrant (https://qdrant.tech)? FAISS runs on a single machine, but if you want to scale things up, then a real database makes it a lot easier. And with a free 1GB cluster on Qdrant Cloud (https://cloud.qdrant.io), you can store quite a lot of vectors. Qdrant is also already integrated with LangChain.

faiss

71 28,202 9.4 C++

A library for efficient similarity search and clustering of dense vectors.

Woah, that's a huge site!
Should be fine, though: as it iterates over the site, it creates embeddings and stores them in the FAISS store (https://github.com/facebookresearch/faiss), which was built to handle large numbers of embeddings.
For the actual queries, it narrows the results down to the most relevant documents, meaning those closest to the query in the embedding space, so this should work.
Let me know how it goes!
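
The "closest in the embedding space" step can be sketched as a brute-force similarity search. This is only an illustration in plain Python, not the FAISS API: `top_k`, the document IDs, and the tiny 2-dimensional vectors are made up for the example, and cosine similarity is assumed as the distance measure.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    docs: Sequence[Tuple[str, Sequence[float]]],
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Return the k documents most similar to the query, best first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in docs]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    docs = [
        ("pricing", [1.0, 0.0]),
        ("blog", [0.0, 1.0]),
        ("faq", [0.9, 0.1]),
    ]
    # Only the best matches would be passed on to the language model.
    for doc_id, score in top_k([1.0, 0.0], docs, k=2):
        print(doc_id, round(score, 3))
```

A library like FAISS does the same job with indexes designed for very large collections, which is why the size of the site matters less at query time.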

slothbot

2 3 4.2 Python

SlothBot | A generally useful analytical Discord bot that does support and writes SQL.

Using something like Weaviate, which can be started in Docker with a one-liner, gives you the ability to move a query toward or away from concepts in the dense vector space. While computing dot products with manual code is fairly easy, letting Weaviate do the heavy lifting (for embeddings as well) makes things super simple.
https://github.com/FeatureBaseDB/slothbot/blob/slothbot-work...
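
For comparison, the "manual code" route might look roughly like the sketch below: a hand-written dot product plus a simple way to nudge a query vector toward one concept and away from another. This is plain vector arithmetic for illustration only, not Weaviate's implementation or API; `shift_query`, the `force` parameter, and the example vectors are hypothetical.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def shift_query(
    query: Sequence[float],
    toward: Sequence[float],
    away: Sequence[float],
    force: float = 0.5,
) -> List[float]:
    """Nudge the query toward one concept vector and away from another."""
    return [q + force * t - force * a for q, t, a in zip(query, toward, away)]


if __name__ == "__main__":
    query = [1.0, 0.0]
    concept_sql = [0.0, 1.0]    # made-up vector for the concept to favor
    concept_sales = [1.0, 0.0]  # made-up vector for the concept to avoid
    shifted = shift_query(query, toward=concept_sql, away=concept_sales)
    # The shifted query scores higher against the favored concept.
    print(dot(query, concept_sql), dot(shifted, concept_sql))
```

Doing this by hand also means managing the embeddings and the search yourself, which is the part a vector database takes over.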

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.
