rasa
💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants
Totally. Rasa (https://github.com/RasaHQ/rasa) is an open source chatbot platform.
It allows you to set up "Input Channels" (e.g. Slack, Telegram) and has an intents-and-responses pipeline.
It leverages pre-LLM NLU models (NLTK, BERT, etc.) to score intents, and based on the winning intent it serves a pre-configured response.
My implementation allows you to route directly (or fall back) to GPT-3 or GPT-4 via LangChain document retrieval, so essentially this is an example of a knowledge-base customer support bot.
I hope that makes sense, let me know if not!
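If it helps, here's a rough sketch of what such a fallback can look like as a Rasa custom action (assuming the classic rasa_sdk and LangChain APIs; the action name and index setup are illustrative, not RasaGPT's actual code):

```python
# Hypothetical sketch: a Rasa custom action that routes the user's message
# to a retrieval-augmented LLM when no intent matches confidently.
from rasa_sdk import Action, Tracker
from rasa_sdk.executor import CollectingDispatcher
from langchain.llms import OpenAI
from langchain.chains import RetrievalQA
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS

# Build a retriever over the knowledge base once at startup
# ("kb_index" is a placeholder for a pre-built FAISS index).
vectorstore = FAISS.load_local("kb_index", OpenAIEmbeddings())
qa = RetrievalQA.from_chain_type(llm=OpenAI(), retriever=vectorstore.as_retriever())

class ActionLLMFallback(Action):
    def name(self) -> str:
        return "action_llm_fallback"

    def run(self, dispatcher: CollectingDispatcher, tracker: Tracker, domain: dict):
        # Route the user's last message to the retrieval-augmented LLM.
        answer = qa.run(tracker.latest_message.get("text", ""))
        dispatcher.utter_message(text=answer)
        return []
```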
-
I agree. I mentioned in a thread below that these frameworks are useful for discovering the index-retrieval strategy that works best for your product.
On PGVector, I tried to use LangChain's class (https://python.langchain.com/en/latest/modules/indexes/vecto...) but it was highly opinionated, and it didn't make sense to subclass it or implement its interfaces, so in this particular project I did it myself.
As part of implementing it with SQLModel I absolutely leaned on https://github.com/pgvector/pgvector :)
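For anyone curious, the pattern looks roughly like this with SQLModel plus pgvector's SQLAlchemy type (the table and column names are hypothetical, not the project's actual schema):

```python
# Hypothetical sketch: a SQLModel table with a pgvector embedding column.
from typing import Any, Optional
from sqlmodel import SQLModel, Field, Column
from pgvector.sqlalchemy import Vector

class Document(SQLModel, table=True):
    id: Optional[int] = Field(default=None, primary_key=True)
    text: str
    # 1536 dims assumes OpenAI ada-002 embeddings.
    embedding: Any = Field(sa_column=Column(Vector(1536)))

# Nearest-neighbour search then uses pgvector's distance operators, e.g.:
# select(Document).order_by(Document.embedding.l2_distance(query_vec)).limit(5)
```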
Thanks for the observation.
-
RasaGPT
💬 RasaGPT is the first headless LLM chatbot platform built on top of Rasa and Langchain. Built w/ Rasa, FastAPI, LangChain, LlamaIndex, SQLModel, pgvector, ngrok, Telegram
-
Also worth a look: Haystack with a smaller Transformer model, to address the long tail of answers: https://github.com/deepset-ai/rasa-haystack (and https://www.deepset.ai/blog/build-smart-conversational-agent...)
-
LMQL (Language Model Query Language) is a different take on prompting, and I find it less restrictive and more intuitive. LangChain is to LMQL what Keras is to TensorFlow.
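For a taste of what that looks like (hedged: LMQL's syntax has changed across versions, so treat this as the general shape rather than exact code), a query declares a decoding strategy, a prompt with a hole, a model, and constraints:

```python
import lmql

# Rough shape of an LMQL query (version-dependent): the decoded variable
# [ANSWER] is filled by the model, subject to the declarative constraints
# in the `where` clause.
@lmql.query
async def capital(country):
    '''
    argmax
        "Q: What is the capital of {country}?\n"
        "A: [ANSWER]"
    from
        "openai/text-davinci-003"
    where
        STOPS_AT(ANSWER, "\n")
    '''
```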
-
Yes. There are a few approaches I intend to take, and some helpful resources:
You could implement the Dual LLM pattern: https://simonwillison.net/2023/Apr/25/dual-llm-pattern/
You could also leverage a concept like Kor, which is a kind of Pydantic for LLMs: https://github.com/eyurtsev/kor
In short, and as mentioned in the README.md, this is absolutely vulnerable to prompt injection. I don't think this is a fully solved problem yet, but some interesting community research has been done to help address these things in production; a rough sketch of the Kor idea follows below.
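With Kor, you declare a schema and the library prompts the LLM and parses its output against it. The schema, examples, and model below are assumptions, and the exact call (e.g. `chain.run` vs. `predict_and_parse`) has varied across Kor versions:

```python
# Hedged sketch of constraining LLM output to a declared schema with Kor.
from langchain.chat_models import ChatOpenAI
from kor.extraction import create_extraction_chain
from kor.nodes import Object, Text

schema = Object(
    id="support_request",
    description="A customer support request",
    attributes=[
        Text(id="topic", description="What the user is asking about"),
        Text(id="product", description="The product mentioned, if any"),
    ],
    examples=[
        ("My Pro plan invoice is wrong", [{"topic": "billing", "product": "Pro plan"}]),
    ],
)

llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
chain = create_extraction_chain(llm, schema)
print(chain.run("The mobile app crashes when I log in")["data"])
```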
-
text-generation-webui
A Gradio web UI for Large Language Models. Supports transformers, GPTQ, llama.cpp (GGUF), Llama models.
ARM-based Macs are the easiest way to get acceptable performance without the headaches right now, if you can afford the price.
Install https://github.com/oobabooga/text-generation-webui, update PyTorch and llama-cpp-python, and you should be able to run pretty much all models out there, in all formats, both on GPU and CPU.
If you're after raw performance, I suggest using GGML models (meant for llama.cpp, but it's bundled in textgen, so you can use it there with the convenience of a web UI). q4_0 is the fastest quantization, while q5_1 is the best quality right now.
If the GGML is not available, you can generate it quite easily from the safetensors yourself (note that you need enough RAM to load the model in PyTorch, though).
With 16GB RAM you can run any 13B model, as long as it's quantized to 4/5 bits. 32GB RAM allows you to run 30/33B models, and 64GB RAM, 65B models. 30B and 65B models are way more useful for real-world tasks, but they are more expensive to train, so there aren't as many to choose from compared to 7B/13B. 7B and anything smaller is a toy in my opinion, while 13B is good enough for experimentation and prototyping.
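If you'd rather drive it from a script than the web UI, here's a minimal sketch using llama-cpp-python directly (the model filename is a placeholder for whatever GGML file you downloaded or converted):

```python
# Minimal sketch: running a quantized GGML model with llama-cpp-python.
# q5_1 trades a little speed for quality; q4_0 is faster, per the comment above.
from llama_cpp import Llama

llm = Llama(model_path="./models/llama-13b.ggmlv3.q5_1.bin", n_ctx=2048)
out = llm("Q: Name the planets in the solar system. A:", max_tokens=128, stop=["Q:"])
print(out["choices"][0]["text"])
```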
-
NeMo-Guardrails
NeMo Guardrails is an open-source toolkit for easily adding programmable guardrails to LLM-based conversational systems.
Thanks, I hadn't seen those. I did find https://github.com/NVIDIA/NeMo-Guardrails earlier but haven't looked into it yet.
I'm not sure it solves the problem of restricting the information the model uses, though. For example, as a proof of concept for a customer, I tried providing information from a vector database as context, but GPT would still answer questions that were not covered by that context: it based those answers on the customer's website content that had already been crawled into the model. That is concerning, because the website might get updated but you can't update the model yourself (among other reasons).
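For reference, the standard mitigation (which, as described above, is imperfect: models can still leak pretraining knowledge) is to explicitly instruct the model to refuse when the answer isn't in the supplied context. A minimal sketch with a LangChain prompt template, assuming the classic langchain API:

```python
# A common but imperfect mitigation: instruct the model to answer only
# from the retrieved context and to refuse otherwise.
from langchain.prompts import PromptTemplate

template = """Answer the question using ONLY the context below.
If the answer is not contained in the context, say "I don't know."

Context:
{context}

Question: {question}
Answer:"""

prompt = PromptTemplate(template=template, input_variables=["context", "question"])
```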
-
https://github.com/approximatelabs/lambdaprompt has served all of my personal use-cases since I made it, including powering `sketch` (a copilot for pandas): https://github.com/approximatelabs/sketch
Core things it does: uses Jinja templates, does sync and async, and most importantly treats LLM completion endpoints as "function calls" that you can compose and build structures around with simple Python. I also combined it with FastAPI, so you can serve up any templates you want directly as REST endpoints. It also offers callback hooks so you can log and trace execution graphs.
All together it's only ~600 lines of Python.
I haven't had a chance to really push out all the different examples, particularly the more "complex behaviors", so there aren't many patterns to copy. But if you're comfortable in Python, then I think it offers a pretty good interface.
I hope to get back to it sometime in the next week to introduce a local mode (e.g. all the open-source smaller models that are now available; I want to make those first-class).
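To make the "completion endpoints as function calls" idea concrete, here's a hedged sketch of the pattern in plain Python (illustrative only, not lambdaprompt's actual API; it uses the legacy OpenAI completions endpoint):

```python
# Sketch of the pattern described above: a Jinja template wrapped as a
# plain Python callable over an LLM completion endpoint, so prompts
# compose like functions. Hypothetical, not lambdaprompt's real API.
import openai
from jinja2 import Template

def prompt(template_str: str):
    template = Template(template_str)
    def call(**kwargs) -> str:
        rendered = template.render(**kwargs)
        resp = openai.Completion.create(
            model="text-davinci-003", prompt=rendered, max_tokens=256
        )
        return resp["choices"][0]["text"].strip()
    return call

summarize = prompt("Summarize in one sentence: {{ text }}")
critique = prompt("Point out a flaw in this claim: {{ claim }}")
# Compose: feed one prompt's output into another.
print(critique(claim=summarize(text="LLMs always tell the truth.")))
```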
-
Related posts
- Workers AI: serverless GPU-powered inference on Cloudflare’s global network
- intelligent-trading-bot: NEW Other Models (star count: 567)