RasaGPT: First headless LLM chatbot built on top of Rasa, Langchain and FastAPI

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • rasa

    💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants

  • Totally. Rasa (https://github.com/RasaHQ/rasa) is an open source chatbot platform.

    It allows you to set up "Input Channels" (e.g. Slack, Telegram) and has an intents-and-responses pipeline.

    It leverages pre-LLM NLU models (NLTK, BERT, etc.) to score intents and, based on the scored intent, automates a pre-configured response.

    My implementation allows you to route directly (or fall back) to GPT-3 or GPT-4 via Langchain document retrieval. So essentially this is an example of a knowledgebase customer-support bot.

    I hope that makes sense, let me know if not!
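A minimal sketch of the routing described above, in plain Python. `score_intents` and `retrieval_answer` are hypothetical stand-ins for Rasa's NLU pipeline and a Langchain-style retrieval-QA chain; only the confidence-based fallback logic is the point here.

```python
# Route high-confidence intents to canned Rasa responses; fall back to
# an LLM retrieval-QA answer otherwise. All names below are illustrative.
CONFIDENCE_THRESHOLD = 0.7

CANNED_RESPONSES = {"greet": "Hello!", "goodbye": "See you!"}

def score_intents(message: str) -> tuple[str, float]:
    """Stand-in for Rasa NLU: return (intent, confidence)."""
    canned = {"hi": ("greet", 0.95), "bye": ("goodbye", 0.92)}
    return canned.get(message.lower(), ("nlu_fallback", 0.2))

def retrieval_answer(message: str) -> str:
    """Stand-in for a Langchain retrieval-QA call against a knowledgebase."""
    return f"[LLM answer grounded in retrieved docs for: {message!r}]"

def route(message: str) -> str:
    intent, confidence = score_intents(message)
    if confidence >= CONFIDENCE_THRESHOLD and intent in CANNED_RESPONSES:
        return CANNED_RESPONSES[intent]   # pre-configured Rasa response
    return retrieval_answer(message)      # fall back to GPT-3/4 via retrieval
```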

  • pgvector

    Open-source vector similarity search for Postgres

  • I agree. I mentioned in a thread below that these frameworks are useful for discovering the index-retrieval strategy that works best for your product.

    On pgvector: I tried to use LangChain's class (https://python.langchain.com/en/latest/modules/indexes/vecto...) but it was highly opinionated, and it didn't make sense to subclass it or implement its interfaces, so in this particular project I did it myself.

    As part of implementing it with SQLModel I absolutely leaned on https://github.com/pgvector/pgvector :)

    Thanks for the observation.
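For intuition, here is what a pgvector cosine-distance query (e.g. `ORDER BY embedding <=> query LIMIT k`) computes, sketched in plain Python; the table of embeddings is made up for illustration.

```python
# Rank stored embeddings by cosine distance to a query vector --
# the same ordering a pgvector `<=>` query produces inside Postgres.
import math

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

# Hypothetical rows: document id -> embedding
rows = {
    "doc-pricing":  [1.0, 0.0, 0.0],
    "doc-refunds":  [0.0, 1.0, 0.0],
    "doc-shipping": [0.9, 0.1, 0.0],
}

def nearest(query, k=2):
    return sorted(rows, key=lambda name: cosine_distance(query, rows[name]))[:k]
```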

  • RasaGPT

    💬 RasaGPT is the first headless LLM chatbot platform built on top of Rasa and Langchain. Built w/ Rasa, FastAPI, Langchain, LlamaIndex, SQLModel, pgvector, ngrok, telegram

  • rasa-haystack

  • Also, with Haystack and a smaller Transformer model to address the long-tail of answers https://github.com/deepset-ai/rasa-haystack (and https://www.deepset.ai/blog/build-smart-conversational-agent...)

  • lmql

    A language for constraint-guided and efficient LLM programming.

  • LMQL (Language Model Query Language) is a different take on prompting, and I find it less restrictive and more intuitive. Langchain is to LMQL what Keras is to TensorFlow.

    https://lmql.ai/

  • kor

    LLM(😽)

  • Yes. There are a few approaches I intend to take, and some helpful resources:

    You could implement a Dual LLM Pattern Model https://simonwillison.net/2023/Apr/25/dual-llm-pattern/

    You could also leverage a concept like Kor which is a kind of pydantic for LLMs: https://github.com/eyurtsev/kor

    In short, and as mentioned in the README.md, this is absolutely vulnerable to prompt injection. I think this is not a fully solved issue, but some interesting community research has been done to help address these things in production.
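The Kor idea ("pydantic for LLMs") can be sketched with only the standard library: define a schema, then parse and validate the model's raw output against it, so free-form (possibly injected) text never flows downstream untyped. `SupportTicket` and its fields are hypothetical; Kor additionally generates the extraction prompt for you.

```python
# Validate raw LLM output against a typed schema; reject anything that
# doesn't parse cleanly instead of trusting it.
import json
from dataclasses import dataclass

@dataclass
class SupportTicket:
    product: str
    severity: int  # 1 (low) .. 5 (critical)

def parse_llm_output(raw: str):
    try:
        data = json.loads(raw)
        ticket = SupportTicket(product=data["product"], severity=data["severity"])
    except (json.JSONDecodeError, KeyError, TypeError):
        return None  # malformed or off-schema output is dropped
    if not isinstance(ticket.product, str):
        return None
    if not isinstance(ticket.severity, int) or not 1 <= ticket.severity <= 5:
        return None
    return ticket
```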

  • text-generation-webui

    A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.

  • ARM-based Macs are the easiest way to get an acceptable performance without the headaches right now, if you can afford the price.

    Install https://github.com/oobabooga/text-generation-webui, update PyTorch and llama-cpp-python, and you should be able to run pretty much all models out there, in all formats, on both GPU and CPU.

    If you're after raw performance, I suggest using GGML models (meant for llama.cpp, but it's bundled into textgen, so you can use it there with the convenience of a web UI). q4_0 is the fastest quantization, while q5_1 is the best quality right now.

    If the GGML is not available, you can generate it quite easily from the safetensors yourself (note that you need enough RAM to load the model in PyTorch, though).

    With 16GB RAM you can run any 13B model, as long as it's quantized to 4/5 bits. 32GB RAM allows you to run 30/33B models, and 64GB RAM, 65B models. The 30B and 65B models are way more useful for real-world tasks, but they are more expensive to train, so there aren't as many to choose from compared to 7B/13B. 7B and anything smaller is a toy in my opinion, while 13B is good enough for experimentation and prototyping.
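The RAM figures above follow from simple arithmetic: an n-bit quantized model needs roughly params × n / 8 bytes, plus runtime overhead (the 1.2× factor below is an assumption for context and runtime, not a measured value).

```python
# Back-of-envelope RAM estimate for an n-bit quantized model.
def quantized_ram_gb(params_billion: float, bits: int, overhead: float = 1.2) -> float:
    return params_billion * 1e9 * bits / 8 / 1e9 * overhead

# 13B at 5 bits -> ~9.8 GB, fits in 16 GB RAM
# 33B at 5 bits -> ~24.8 GB, needs 32 GB
# 65B at 5 bits -> ~48.8 GB, needs 64 GB
```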

  • NeMo-Guardrails

    NeMo Guardrails is an open-source toolkit for easily adding programmable guardrails to LLM-based conversational systems.

  • Thanks, I hadn't seen those. I did find https://github.com/NVIDIA/NeMo-Guardrails earlier but haven't looked into it yet.

    I'm not sure it solves the problem of restricting the information it uses though. For example, as a proof of concept for a customer, I tried providing information from a vector database as context, but GPT would still answer questions that were not provided in that context. It would base its answers on information that was already crawled from the customer website and in the model. That is concerning because the website might get updated but you can't update the model yourself (among other reasons).
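A common partial mitigation is to instruct the model, in the prompt itself, to answer only from the supplied context; as the comment notes, this reduces but does not eliminate answers drawn from the model's training data. A minimal sketch of such a prompt builder (the wording is illustrative, not a proven template):

```python
# Wrap retrieved context in a prompt that forbids answering from model
# memory and gives an explicit "don't know" escape hatch.
def grounded_prompt(context: str, question: str) -> str:
    return (
        "Answer ONLY using the context below. If the answer is not in the "
        "context, reply exactly: I don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
```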

  • lambdaprompt

    λprompt - A functional programming interface for building AI systems

  • https://github.com/approximatelabs/lambdaprompt It has served all of my personal use-cases since making it, including powering `sketch` (copilot for pandas) https://github.com/approximatelabs/sketch

    Core things it does: uses Jinja templates, does sync and async, and, most importantly, treats LLM completion endpoints as "function calls" which you can compose and build structures around with simple Python. I also combined it with FastAPI, so you can serve up any templates you want directly as REST endpoints. It also offers callback hooks so you can log and trace execution graphs.

    All together it's only ~600 lines of Python.

    I haven't had a chance to really push all the different examples out there, so there aren't many patterns to copy for "complex behaviors" yet. But if you're comfortable in Python, I think it offers a pretty good interface.

    I hope to get back to it sometime in the next week to introduce local mode (e.g. all the open-source smaller models are now available; I want to make those first-class).
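The "completion endpoints as function calls" pattern described above is easy to sketch. This uses the stdlib's `string.Template` in place of Jinja to stay dependency-free, and `fake_complete` stands in for a real completion endpoint; lambdaprompt's actual API will differ.

```python
# Turn a prompt template plus a completion backend into a plain callable
# that can be composed like any other Python function.
from string import Template

def prompt_fn(template_str, complete):
    template = Template(template_str)
    def call(**kwargs):
        return complete(template.substitute(**kwargs))
    return call

def fake_complete(prompt: str) -> str:
    """Stand-in for an LLM completion endpoint."""
    return f"<completion for: {prompt}>"

summarize = prompt_fn("Summarize in one line: $text", fake_complete)
```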

  • sketch

    AI code-writing assistant that understands data content

  • motorhead

    🧠 Motorhead is a memory and information retrieval server for LLMs.

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.


Related posts

  • Document your Software project with AI

    1 project | dev.to | 6 May 2024
  • Observations on MLOps–A Fragmented Mosaic of Mismatched Expectations

    1 project | dev.to | 26 Apr 2024
  • Using Bitcoin and Blockchain ideas to Secure our AI Chatbot

    1 project | dev.to | 19 Apr 2024
  • Rolling your own CAPTCHA solution

    1 project | dev.to | 18 Apr 2024
  • Succeeding where NASDAQ fails

    1 project | dev.to | 17 Apr 2024