It works like this:
- The AI Horde hosts a web app (Kobold Lite) geared towards LLM chat and RP. It's mature, predating Llama and GPT-3.5, and was largely developed when the RP community was running GPT-J finetunes. There are mature desktop apps that can access this API as well.
- The user sets the chat syntax/format and picks an LLM host (or multiple hosts).
- These hosts run simple API endpoints from any PC for Horde users to access. The backends du jour are koboldcpp (a frontend for llama.cpp that is excellent, portable, and literally one click) and KoboldAI, with its very fast and VRAM-efficient exllamav2 backend:
https://github.com/LostRuins/koboldcpp
https://github.com/henk717/KoboldAI
- Hosts pick a quantized community LLM to run, which is (IMO) the real magic of this system. Cloud services tend to run generic Llama chat/instruct models, OpenAI API models, or maybe a single proprietary finetune, but the Llama/Mistral finetuning community is red hot. New finetunes and crazy merges/hybrids that outperform llama-chat at specific tasks (mostly chat/story/RP) come out every day, and each one has a different "flavor" and format:
https://huggingface.co/models?sort=modified&search=mistral+g...
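To make the flow concrete, here is a minimal Python sketch of the user side: building a job for the Horde's async text endpoint and submitting it. Endpoint paths, the `apikey` header, and the anonymous key are my understanding of the public AI Horde REST API; treat field names and the example model name as assumptions, not gospel.

```python
import json
import urllib.request

HORDE_URL = "https://aihorde.net/api/v2"
ANON_KEY = "0000000000"  # anonymous key convention; registered keys get queue priority

def build_text_payload(prompt, models=None, max_length=120):
    """Build the JSON body for an async text-generation request."""
    payload = {
        "prompt": prompt,
        "params": {"max_length": max_length, "max_context_length": 2048},
    }
    if models:
        # Request a specific community finetune a host is running
        payload["models"] = models
    return payload

def submit(payload, api_key=ANON_KEY):
    """POST the job; returns a job id to poll at /generate/text/status/{id}."""
    req = urllib.request.Request(
        f"{HORDE_URL}/generate/text/async",
        data=json.dumps(payload).encode(),
        headers={"apikey": api_key, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["id"]
```

A client would then poll the status endpoint until a host (running koboldcpp or KoboldAI, as above) picks the job up and returns the generation.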