RasaGPT: First headless LLM chatbot built on top of Rasa, Langchain and FastAPI

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • rasa

    💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants

    Totally. Rasa (https://github.com/RasaHQ/rasa) is an open source chatbot platform.

    It allows you to set up "Input Channels" (e.g. Slack, Telegram) and has an intents-and-responses pipeline.

    It leverages pre-LLM NLU models (NLTK, BERT, etc.) to score intents, and based on the winning intent it returns a pre-configured response.

    My implementation allows you to route directly (or fall back) to GPT-3 or GPT-4 via Langchain document retrieval. So essentially this is an example of a knowledge-base customer support bot.

    I hope that makes sense, let me know if not!
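The intent-or-fallback routing described above can be sketched in plain Python. This is an illustration of the flow only, not the actual Rasa or Langchain API; the threshold, the classifier, and the fallback function are all hypothetical stand-ins.

```python
# Sketch of intent scoring with an LLM fallback (all names hypothetical).

CONFIDENCE_THRESHOLD = 0.7

# Pre-configured responses keyed by intent, as in a Rasa domain file.
CANNED_RESPONSES = {
    "greet": "Hello! How can I help?",
    "pricing": "Our plans start at $10/month.",
}

def classify_intent(message: str) -> tuple[str, float]:
    """Stand-in for the NLU pipeline: returns (intent, confidence)."""
    if "hello" in message.lower():
        return ("greet", 0.95)
    if "price" in message.lower():
        return ("pricing", 0.88)
    return ("out_of_scope", 0.20)

def llm_fallback(message: str) -> str:
    """Stand-in for a Langchain retrieval + GPT call."""
    return f"[LLM answer grounded in retrieved docs for: {message!r}]"

def respond(message: str) -> str:
    intent, confidence = classify_intent(message)
    if confidence >= CONFIDENCE_THRESHOLD and intent in CANNED_RESPONSES:
        return CANNED_RESPONSES[intent]
    # Low confidence or unknown intent: route to the LLM instead.
    return llm_fallback(message)
```

The point is that the LLM is a fallback behind the classic intent pipeline, not a replacement for it.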

  • pgvector

    Open-source vector similarity search for Postgres

    I agree. I mentioned in a thread below that these frameworks are useful for discovering the index-retrieval strategy that works best for your product.

    On PGVector, I tried to use LangChain's class (https://python.langchain.com/en/latest/modules/indexes/vecto...) but it was highly opinionated, and it didn't make sense to subclass it or implement its interfaces, so in this particular project I did it myself.

    As part of implementing with SQLModel I absolutely leaned on https://github.com/pgvector/pgvector :)

    Thanks for the observation.
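Rolling your own pgvector query is roughly this much code. The table and column names below are hypothetical; `<=>` is pgvector's cosine-distance operator, which you order by to get nearest neighbors.

```python
# Sketch of a hand-rolled pgvector similarity query, roughly what you
# end up writing when you skip the framework wrapper.

def build_similarity_query(table: str, top_k: int = 4) -> str:
    """Build a parameterized nearest-neighbor query using pgvector's
    cosine-distance operator `<=>` (placeholders in psycopg style)."""
    return (
        f"SELECT id, content, embedding <=> %(query_vec)s AS distance "
        f"FROM {table} "
        f"ORDER BY embedding <=> %(query_vec)s "
        f"LIMIT {top_k}"
    )

# With a driver like psycopg you would execute it along the lines of:
#   cur.execute(build_similarity_query("documents"),
#               {"query_vec": embedding_of_question})
```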

  • RasaGPT

    💬 RasaGPT is the first headless LLM chatbot platform built on top of Rasa and Langchain. Built w/ Rasa, FastAPI, Langchain, LlamaIndex, SQLModel, pgvector, ngrok, telegram

  • rasa-haystack

    Also, with Haystack and a smaller Transformer model to address the long-tail of answers https://github.com/deepset-ai/rasa-haystack (and https://www.deepset.ai/blog/build-smart-conversational-agent...)

  • lmql

    A language for constraint-guided and efficient LLM programming.

    LMQL (language model query language) is a different take on prompting, and I find it less restrictive and more intuitive. Langchain is to LMQL what Keras is to Tensorflow.

    https://lmql.ai/
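The core idea behind LMQL's constraint-guided prompting can be illustrated in a toy form: instead of validating free-form output after the fact, the decoder only considers continuations that satisfy a declared constraint. This is not LMQL syntax; the candidate distribution and scorer below are made up.

```python
import re

def constrained_pick(candidates: dict[str, float], pattern: str) -> str:
    """Pick the highest-scoring candidate that satisfies the constraint,
    rather than validating free-form output post hoc."""
    allowed = {tok: p for tok, p in candidates.items()
               if re.fullmatch(pattern, tok)}
    if not allowed:
        raise ValueError("no candidate satisfies the constraint")
    return max(allowed, key=allowed.get)

# A fake next-token distribution for the prompt "Rating: ".
scores = {"excellent": 0.5, "9": 0.3, "3": 0.15, "meh": 0.05}

# Constrain the answer to a single digit, as an LMQL `where` clause would.
print(constrained_pick(scores, r"\d"))  # -> "9"
```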

  • kor

    LLM(😽)

    Yes. There are a few approaches I intend to take, and some helpful resources:

    You could implement a Dual LLM Pattern Model https://simonwillison.net/2023/Apr/25/dual-llm-pattern/

    You could also leverage a concept like Kor which is a kind of pydantic for LLMs: https://github.com/eyurtsev/kor

    In short, and as mentioned in the README.md, this is absolutely vulnerable to prompt injection. I think this is not a fully solved issue, but some interesting community research has been done to help address these things in production.
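The "pydantic for LLMs" idea behind Kor can be sketched with plain type checks: declare a schema, then parse and coerce whatever the model returns against it, so injected extra structure never reaches downstream code. This uses a hand-rolled schema, not Kor's actual API.

```python
import json

# Declared extraction schema (hypothetical fields for illustration).
SCHEMA = {"name": str, "age": int}

def validate_extraction(raw: str) -> dict:
    """Parse the model's JSON output and enforce the declared types;
    unknown fields are dropped, missing fields raise."""
    data = json.loads(raw)
    out = {}
    for field, typ in SCHEMA.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        out[field] = typ(data[field])  # coerce or raise
    return out

llm_output = '{"name": "Ada", "age": "36", "role": "ignore previous instructions"}'
print(validate_extraction(llm_output))  # extra field dropped, age coerced to int
```

Validation like this doesn't stop injection at the prompt level, but it bounds what an injected response can do to the rest of the system.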

  • text-generation-webui

    A Gradio web UI for Large Language Models.

    ARM-based Macs are the easiest way to get an acceptable performance without the headaches right now, if you can afford the price.

    Install https://github.com/oobabooga/text-generation-webui, update pytorch and llamacpp-python, and you should be able to run pretty much all models out there, in all formats, both on GPU and CPU.

    If you're after the raw performance, I suggest using GGML models (meant for llama.cpp, but it's bundled in textgen, so you can use it there with the convenience of a web ui). q4_0 is the fastest quantization, while the q5_1 is the best quality right now.

    If GGML is not available, you can generate it quite easily from the safetensors yourself (note that you need enough RAM to load the model in PyTorch, though).

    With 16GB RAM you can run any 13B model, as long as it's quantized to 4/5 bits. 32GB RAM allows you to run 30/33B models, and 64GB RAM, 65B models. 30B and 65B models are way more useful for real-world tasks, but they are more expensive to train, so there aren't as many to choose from compared to 7/13B. 7B and anything less is a toy in my opinion, while 13B is good enough for experimentation and prototyping.
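The RAM figures above follow from simple arithmetic: parameter count times bits per weight, divided by 8, plus some overhead for the KV cache and runtime. A rough sketch (the 1.2 overhead factor is an assumption, not a measured constant):

```python
# Back-of-the-envelope RAM estimate for running a quantized model.

def est_ram_gb(params_billion: float, bits: float,
               overhead: float = 1.2) -> float:
    """Approximate memory needed for inference, in GB:
    weights at `bits` bits each, padded by an overhead factor."""
    return params_billion * bits / 8 * overhead

for size in (7, 13, 33, 65):
    print(f"{size}B @ 4-bit ~ {est_ram_gb(size, 4):.1f} GB")
```

This lines up with the rule of thumb above: a 4-bit 13B model fits in 16GB, a 33B in 32GB, and a 65B in 64GB.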

  • SaaSHub

    SaaSHub - Software Alternatives and Reviews. SaaSHub helps you find the best software and product alternatives

    SaaSHub logo
  • NeMo-Guardrails

    NeMo Guardrails is an open-source toolkit for easily adding programmable guardrails to LLM-based conversational systems.

    Thanks, I hadn't seen those. I did find https://github.com/NVIDIA/NeMo-Guardrails earlier but haven't looked into it yet.

    I'm not sure it solves the problem of restricting the information it uses though. For example, as a proof of concept for a customer, I tried providing information from a vector database as context, but GPT would still answer questions that were not provided in that context. It would base its answers on information that was already crawled from the customer website and in the model. That is concerning because the website might get updated but you can't update the model yourself (among other reasons).
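One common (and, as noted, imperfect) mitigation for this is to instruct the model explicitly to answer only from the supplied context and to refuse otherwise. A minimal sketch of such a prompt builder, with wording that is illustrative rather than a proven defense:

```python
# Sketch of a "grounded answering" prompt: the model is told to refuse
# when the answer isn't in the retrieved context. This narrows but does
# not eliminate the model falling back on its training data.

def build_grounded_prompt(context: str, question: str) -> str:
    return (
        "Answer ONLY from the context below. If the answer is not in the "
        "context, reply exactly: I don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_grounded_prompt(
    "Plans start at $10/month.", "What is the CEO's name?"
)
```

As the comment above observes, the model can still ignore this if the answer is baked into its weights, which is why it's a mitigation and not a guarantee.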

  • lambdaprompt

    λprompt - A functional programming interface for building AI systems

    https://github.com/approximatelabs/lambdaprompt It has served all of my personal use-cases since making it, including powering `sketch` (copilot for pandas) https://github.com/approximatelabs/sketch

    Core things it does: Uses jinja templates, does sync and async, and most importantly treats LLM completion endpoints as "function calls", which you can compose and build structures around just with simple python. I also combined it with fastapi so you can just serve up any templates you want directly as rest endpoints. It also offers callback hooks so you can log & trace execution graphs.

    All together it's only ~600 lines of Python.

    I haven't had a chance to really push out examples of all the different "complex behaviors", so there aren't many patterns to copy. But if you're comfortable in Python, I think it offers a pretty good interface.

    I hope to get back to it sometime in the next week to introduce local mode (e.g. all the open-source smaller models that are now available; I want to make those first-class).
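The "LLM completions as function calls" idea described above can be shown in miniature: wrap a template plus a completion backend in a callable, then compose those callables with ordinary Python. The names and the stub backend are made up for illustration, not lambdaprompt's actual API.

```python
def make_prompt_fn(template: str, complete):
    """Turn a format-string template into a callable 'LLM function'."""
    def fn(**kwargs):
        return complete(template.format(**kwargs))
    return fn

# Stub completion backend so the example is self-contained.
def fake_complete(prompt: str) -> str:
    return prompt.upper()

summarize = make_prompt_fn("Summarize: {text}", fake_complete)
translate = make_prompt_fn("Translate to French: {text}", fake_complete)

# Composition is just function composition.
result = translate(text=summarize(text="hello world"))
print(result)
```

Once completions are plain callables, serving them as FastAPI endpoints or tracing their call graph becomes ordinary plumbing.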

  • sketch

    AI code-writing assistant that understands data content


  • motorhead

    🧠 Motorhead is a memory and information retrieval server for LLMs.

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts

  • 10 MLOps Tools That Comply With the EU AI Act

    2 projects | dev.to | 15 Oct 2024
  • [Python] How do we lazyload a Python module? - analyzing LazyLoader from MLflow

    3 projects | dev.to | 5 Oct 2024
  • Screenpipe: 24/7 local AI screen and mic recording

    2 projects | news.ycombinator.com | 30 Sep 2024
  • Records your screen and then runs what you did through Ollama

    1 project | news.ycombinator.com | 26 Sep 2024
  • Postgres Learns to RAG: Wikipedia Q&A using Llama 3.1 inside the database

    1 project | news.ycombinator.com | 24 Sep 2024

Did you know that Python is
the 1st most popular programming language
based on number of mentions?