Repo to create embeddings of your website's content for a Q&A bot / chatbot

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

Our great sponsors
  • InfluxDB - Power Real-Time Data Analytics at Scale
  • WorkOS - The modern identity platform for B2B SaaS
  • SaaSHub - Software Alternatives and Reviews
  • content-chatbot

    Build a chatbot or Q&A bot of your website's content

  • Thanks for sharing the code. What happen when the existing content get updated and new contents created, would it need to create embeddings for all contents again? The current approach is not good as create embeddings cost money? Please see https://github.com/mpaepper/content-chatbot/blob/main/create.... Would it be possible progressively update the vector store?

    Please advise. Thank you.

  • qdrant

    Qdrant - High-performance, massive-scale Vector Database for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/

  • Looks interesting! Have you considered a proper vector database like Qdrant (https://qdrant.tech)? FAISS runs on a single machine, but if you want to scale things up, then a real database makes it a lot easier. And with a free 1GB cluster on Qdrant Cloud (https://cloud.qdrant.io), you can store quite a lot of vectors. Qdrant is also already integrated with Langchain.

  • InfluxDB

    Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.

    InfluxDB logo
  • faiss

    A library for efficient similarity search and clustering of dense vectors.

  • Woah, that's a huge site!

    Should be fine, though, as it iterates over it, it creates embeddings and then stores them in the FAISS store (https://github.com/facebookresearch/faiss) which was created to handle a large amount of embeddings.

    For the actual queries, it filters it down by the most relevant documents which are closest in the embedding space, so this should work.

    Let me know how it goes!

  • slothbot

    SlothBot | A generally useful analytical Discord bot that does support and writes SQL.

  • Using something like Weaviate, which can be started in Docker with a one-liner, will give the ability to move away or toward dense vectors by concept. While doing dot product with manual code is fairly easy, using Weaviate to do the lifting (for embeddings as well) makes things super simple.

    https://github.com/FeatureBaseDB/slothbot/blob/slothbot-work...

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts