-
rasa
💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants
Totally. Rasa (https://github.com/RasaHQ/rasa) is an open source chatbot platform.
It allows you to set up "Input Channels", e.g. Slack or Telegram, and has an intent and response pipeline.
It leverages pre-LLM NLU models (NLTK, BERT, etc.) to score intents and, based on the matched intent, returns a pre-configured response.
My implementation lets you route directly (or fall back) to GPT-3 or GPT-4 via Langchain document retrieval, roughly the pattern sketched below. So essentially this is an example of a knowledge-base customer support bot.
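Not the actual RasaGPT code, just a minimal sketch of that fallback idea, assuming LangChain's RetrievalQA, a FAISS store, and an OpenAI key; the documents and function names are made up for illustration:

```python
# Sketch: when Rasa's NLU can't match an intent with enough confidence,
# fall back to an LLM answer grounded in documents retrieved from a vector store.
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI

# Tiny stand-in knowledge base (in practice: your indexed docs).
docs = [
    "Refunds are processed within 5 business days.",
    "Support is available Monday to Friday, 9am-5pm CET.",
]
store = FAISS.from_texts(docs, OpenAIEmbeddings())

qa = RetrievalQA.from_chain_type(
    llm=OpenAI(temperature=0),           # GPT-3 completion model
    chain_type="stuff",                   # stuff retrieved docs straight into the prompt
    retriever=store.as_retriever(),
)

def fallback_answer(user_message: str) -> str:
    """Hypothetical handler called when no Rasa intent scores above the threshold."""
    return qa.run(user_message)
```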
I hope that makes sense, let me know if not!
-
I agree. I mentioned in a thread below that these frameworks are useful for discovering the index-retrieval strategy that works best for your product.
On PGVector, I tried to use LangChain's class (https://python.langchain.com/en/latest/modules/indexes/vecto...) but it was highly opinionated, and it didn't make sense to subclass it or implement its interfaces, so in this particular project I did it myself.
As part of implementing with SQLModel I absolutely leaned on https://github.com/pgvector/pgvector :)
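A minimal sketch of that combination (not the project's actual schema; the table, column sizes, and DSN are hypothetical), assuming the pgvector extension is enabled in Postgres and the pgvector-python SQLAlchemy integration:

```python
# Sketch: store an embedding column with pgvector and query by cosine distance via SQLModel.
from typing import List, Optional

from pgvector.sqlalchemy import Vector
from sqlmodel import Column, Field, Session, SQLModel, create_engine, select

class Document(SQLModel, table=True):
    id: Optional[int] = Field(default=None, primary_key=True)
    content: str
    # 1536 matches OpenAI ada-002 embeddings; adjust for your embedding model.
    embedding: List[float] = Field(sa_column=Column(Vector(1536)))

engine = create_engine("postgresql+psycopg2://user:pass@localhost/db")  # hypothetical DSN

def top_k_similar(query_embedding: List[float], k: int = 5) -> List[Document]:
    """Return the k documents closest to the query embedding (cosine distance)."""
    with Session(engine) as session:
        stmt = (
            select(Document)
            .order_by(Document.embedding.cosine_distance(query_embedding))
            .limit(k)
        )
        return session.exec(stmt).all()
```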
Thanks for the observation.
-
RasaGPT
💬 RasaGPT is the first headless LLM chatbot platform built on top of Rasa and Langchain. Built w/ Rasa, FastAPI, Langchain, LlamaIndex, SQLModel, pgvector, ngrok, telegram
-
Also worth a look: Rasa with Haystack and a smaller Transformer model to address the long tail of answers: https://github.com/deepset-ai/rasa-haystack (and https://www.deepset.ai/blog/build-smart-conversational-agent...)
-
LMQL (Language Model Query Language) is a different take on prompting, and I find it less restrictive and more intuitive. Langchain is to LMQL what Keras is to TensorFlow.
https://lmql.ai/
-
Yes. There are a few approaches I intend to take, and some helpful resources:
You could implement the Dual LLM pattern (sketched below): https://simonwillison.net/2023/Apr/25/dual-llm-pattern/
You could also leverage a concept like Kor, which is a kind of Pydantic for LLMs: https://github.com/eyurtsev/kor
In short, and as mentioned in the README.md, this is absolutely vulnerable to prompt injection. I don't think this is a fully solved issue, but some interesting community research has been done to help address it in production.
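A rough sketch of the Dual LLM pattern from the linked post: a privileged LLM plans actions but never sees untrusted text, a quarantined LLM reads untrusted text but has no tools, and a controller shuttles opaque variable references between them. `call_llm` and the function names here are placeholders, not a real API:

```python
# Controller-held values; the privileged LLM only ever sees the variable names.
variables: dict[str, str] = {}

def call_llm(prompt: str) -> str:
    raise NotImplementedError("stand-in for an actual completion call")

def quarantined_summarize(untrusted_text: str) -> str:
    """Quarantined LLM processes untrusted content; result is stored as an opaque variable."""
    summary = call_llm(f"Summarize the following text:\n{untrusted_text}")
    name = f"$VAR{len(variables) + 1}"
    variables[name] = summary
    return name

def privileged_plan(user_request: str, var_name: str) -> str:
    """Privileged LLM decides what to do, referring to content only by variable name."""
    return call_llm(
        f"User request: {user_request}\n"
        f"A summary of the untrusted document is available as {var_name}. "
        f"Reply with the action to take, using {var_name} as a placeholder."
    )

def execute(plan: str) -> str:
    """The controller substitutes real values only at execution time, outside the LLM."""
    for name, value in variables.items():
        plan = plan.replace(name, value)
    return plan
```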
-
ARM-based Macs are the easiest way to get an acceptable performance without the headaches right now, if you can afford the price.
Install https://github.com/oobabooga/text-generation-webui, update pytorch and llama-cpp-python, and you should be able to run pretty much all models out there, in all formats, both on GPU and CPU.
If you're after raw performance, I suggest using GGML models (meant for llama.cpp, but it's bundled in textgen, so you can use it there with the convenience of a web UI). q4_0 is the fastest quantization, while q5_1 is the best quality right now.
If the GGML is not available, you can generate it quite easily from the safetensors yourself (note that you need enough RAM to load the model in pytorch, though).
With 16GB RAM you can run any 13B model, as long as it's quantized to 4/5 bits. 32GB RAM lets you run 30/33B models, and 64GB RAM 65B models. 30B and 65B models are way more useful for real-world tasks, but they are more expensive to train, so there aren't as many to choose from compared to 7B/13B. 7B and anything smaller is a toy in my opinion, while 13B is good enough for experimentation and prototyping.
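A back-of-the-envelope check of those RAM figures; the effective bits-per-weight values are approximations (quantized weights only, ignoring the KV cache and runtime overhead):

```python
# Rough size of a GGML-quantized model on disk / in RAM.
def approx_quantized_size_gb(n_params_billion: float, bits_per_weight: float) -> float:
    bytes_total = n_params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1024**3

for params, bits, label in [
    (13, 4.5, "13B @ q4_0"),
    (13, 6.0, "13B @ q5_1"),
    (33, 4.5, "33B @ q4_0"),
    (65, 4.5, "65B @ q4_0"),
]:
    print(f"{label}: ~{approx_quantized_size_gb(params, bits):.1f} GB")

# 13B at 4-5 bits lands around 7-9 GB, which is why it fits in 16 GB of RAM;
# 33B comes out near 17-20 GB (32 GB machine) and 65B near 34-40 GB (64 GB machine).
```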
-
NeMo-Guardrails
NeMo Guardrails is an open-source toolkit for easily adding programmable guardrails to LLM-based conversational systems.
Thanks, I hadn't seen those. I did find https://github.com/NVIDIA/NeMo-Guardrails earlier but haven't looked into it yet.
I'm not sure it solves the problem of restricting the information the model uses, though. For example, as a proof of concept for a customer, I tried providing information from a vector database as context, but GPT would still answer questions whose answers were not in that context. It would base its answers on information that had already been crawled from the customer's website and baked into the model. That is concerning because the website might get updated, but you can't update the model yourself (among other reasons).
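For context, a sketch of the kind of constrained prompt that proof of concept typically uses (hypothetical names, pre-1.0 openai client); as noted above, this is not watertight, since the model can still fall back on what it memorized during training:

```python
import openai

SYSTEM = (
    "Answer ONLY using the context below. "
    "If the answer is not contained in the context, reply: 'I don't know.'"
)

def answer_from_context(question: str, passages: list[str]) -> str:
    """Inject passages retrieved from the vector database and ask for a grounded answer."""
    context = "\n---\n".join(passages)
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        temperature=0,
        messages=[
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response["choices"][0]["message"]["content"]
```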
-
https://github.com/approximatelabs/lambdaprompt It has served all of my personal use-cases since making it, including powering `sketch` (copilot for pandas) https://github.com/approximatelabs/sketch
Core things it does: uses Jinja templates, does sync and async, and most importantly treats LLM completion endpoints as "function calls", which you can compose and build structures around with simple Python. I also combined it with FastAPI so you can serve up any templates you want directly as REST endpoints. It also offers callback hooks so you can log and trace execution graphs.
Altogether it's only ~600 lines of Python.
I haven't had a chance to really push all the different examples out there for most "complex behaviors", so there aren't many patterns to copy. But if you're comfortable in Python, then I think it offers a pretty good interface.
I hope to get back to it sometime in the next week to introduce local mode (e.g. all the open-source smaller models that are now available; I want to make those first-class).
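Not lambdaprompt's actual API, just a sketch of the core idea described above: a Jinja template plus a completion endpoint becomes an ordinary Python function, so prompts compose like functions. `complete` and the example templates are placeholders:

```python
from jinja2 import Template

def complete(prompt: str) -> str:
    raise NotImplementedError("stand-in for an LLM completion call")

def prompt_fn(template: str):
    """Turn a prompt template into a callable: render the template, complete, return text."""
    compiled = Template(template)

    def call(**kwargs) -> str:
        return complete(compiled.render(**kwargs))

    return call

summarize = prompt_fn("Summarize in one sentence:\n{{ text }}")
translate = prompt_fn("Translate to French:\n{{ text }}")

def summarize_in_french(text: str) -> str:
    # Composition: the output of one "LLM function" feeds the next.
    return translate(text=summarize(text=text))
```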