TruthfulQA
FastChat
TruthfulQA | FastChat | |
---|---|---|
4 | 83 | |
508 | 34,514 | |
- | 4.3% | |
2.8 | 9.6 | |
6 months ago | 5 days ago | |
Jupyter Notebook | Python | |
Apache License 2.0 | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
TruthfulQA
-
airoboros gpt-4 instructed + context-obedient question answering
Dataset: https://github.com/sylinrl/TruthfulQA
-
Scaling Transformer to 1M tokens and beyond with RMT
this is a great point.
do you know of any benchmarks doing this today?
given the acute need to evaluate models on contextual factuality, we're exploring how to create a benchmark for this purpose but prefer existing benchmarks if possible.
openai's truthfulqa[0] is close but does not focus on contextual factuality and targets a much harder problem of absolute truth.
if none exist, and people are interested in contributing, please reach out.
[0] https://github.com/sylinrl/TruthfulQA
-
[D] Is all the talk about what GPT can do on Twitter and Reddit exaggerated or fairly accurate?
I agree they show that you can brute-force mimick uncertainty estimates to some degree, and that the model is generally well calibrated (though on what is basically a set of trivia questions, so YMMV)... yet:
-
[R] TruthfulQA: Measuring How Models Mimic Human Falsehoods
Code for https://arxiv.org/abs/2109.07958 found: https://github.com/sylinrl/TruthfulQA
FastChat
-
GPT4.5 or GPT5 being tested on LMSYS?
gpt2-chatbot isn't the only "mystery model" on LMSYS. Another is "deluxe-chat".
When asked about it in October last year, LMSYS replied [0] "It is an experiment we are running currently. More details will be revealed later"
One distinguishing feature of "deluxe-chat": although it gives high quality answers, it is very slow, so slow that the arena displays a warning whenever it is invoked
[0] https://github.com/lm-sys/FastChat/issues/2527
-
LLMs on your local Computer (Part 1)
FastChat
- FLaNK AI for 11 March 2024
- FLaNK 04 March 2024
- ChatGPT for Teams
- FastChat: An open platform for training and serving large language models
-
LM Studio – Discover, download, and run local LLMs
How does it compare with something like FastChat? https://github.com/lm-sys/FastChat
Feature set seems like a decent amount of overlap. One limitation of FastChat, as far as I can tell, is that one is limited to the models that FastChat supports (though I think it would be minor to modify it to support arbitrary models?)
-
Video-LLaVA
Looks like the Vicuna repo is Apache 2.0 also[1].
What's the interpretation of copyright law that would prevent the code being Apache 2.0 based on the source of the fine-tuning dataset?
[1] https://github.com/lm-sys/FastChat
-
🔥🚀 Top 10 Open-Source Must-Have Tools for Crafting Your Own Chatbot 🤖💬
Check how to start with FastChat. Support FastChat on GitHub ⭐
-
Show HN: ChatAPI – PWA to Use ChatGPT by API Build with Alpine.js
For something a little heavier but much more robust in terms of features/functionality I've been enjoying FastChat: https://github.com/lm-sys/FastChat
It allows you to plug in different backends so that you can use OpenAI compatible clients with various LLM's, selfhosted or otherwise.
What are some alternatives?
safari - Convolutions for Sequence Modeling
text-generation-webui - A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.
recurrent-memory-transformer - [NeurIPS 22] [AAAI 24] Recurrent Transformer-based long-context architecture.
llama.cpp - LLM inference in C/C++
auto-evaluator
gpt4all - gpt4all: run open-source LLMs anywhere
heinsen_routing - Reference implementation of "An Algorithm for Routing Vectors in Sequences" (Heinsen, 2022) and "An Algorithm for Routing Capsules in All Domains" (Heinsen, 2019), for composing deep neural networks.
bitsandbytes - Accessible large language models via k-bit quantization for PyTorch.
JARVIS - JARVIS, a system to connect LLMs with ML community. Paper: https://arxiv.org/pdf/2303.17580.pdf
LocalAI - :robot: The free, Open Source OpenAI alternative. Self-hosted, community-driven and local-first. Drop-in replacement for OpenAI running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many more models architectures. It allows to generate Text, Audio, Video, Images. Also with voice cloning capabilities.
llama-cpp-python - Python bindings for llama.cpp
mlc-llm - Enable everyone to develop, optimize and deploy AI models natively on everyone's devices.