Top 20 Language-Model Open-Source Projects
- petals: 🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading.
- ecco: Explain, analyze, and visualize NLP language models. Ecco creates interactive visualizations directly in Jupyter notebooks explaining the behavior of Transformer-based language models (like GPT-2, BERT, RoBERTa, T5, and T0).
- FARM: 🏡 Fast & easy transfer learning for NLP. Harvesting language models for the industry. Focus on Question Answering.
- Awesome-LLM-Reasoning: Papers and resources on reasoning in Large Language Models, including Chain-of-Thought, Instruction-Tuning, and Multimodality.
- Get-Things-Done-with-Prompt-Engineering-and-LangChain: LangChain and prompt-engineering tutorials on Large Language Models (LLMs) such as ChatGPT, with custom data. Jupyter notebooks cover loading and indexing data, creating prompt templates, CSV agents, and using retrieval QA chains to query custom data, plus projects that use a private LLM (Llama 2) for chatting with PDF files and tweet sentiment analysis.
- happy-transformer: Happy Transformer makes it easy to fine-tune and perform inference with NLP Transformer models.
- adaptnlp: An easy-to-use Natural Language Processing library and framework for predicting, training, fine-tuning, and serving state-of-the-art NLP models.
- agency: 🕵️♂️ A library for developers eager to explore the potential of Large Language Models (LLMs) and other generative AI through a clean, effective, and Go-idiomatic approach. (by neurocult)
- chat.petals.dev: 💬 Chatbot web app plus HTTP and WebSocket endpoints for LLM inference with the Petals client.
- extreme-bert: ExtremeBERT is a toolkit that accelerates the pretraining of customized language models on customized datasets, described in the paper "ExtremeBERT: A Toolkit for Accelerating Pretraining of Customized BERT".
- voice-assistant-whisper-chatgpt: This repository guides you through creating your own smart virtual assistant, like Google Assistant, using OpenAI's ChatGPT and Whisper. The entire solution is built with Python and Gradio.
- AREkit: Document-level Attitude and Relation Extraction toolkit for sampling and processing large text collections, both with ML and for ML.
- code-representations-ml-brain: [NeurIPS 2022] "Convergent Representations of Computer Programs in Human and Artificial Neural Networks" by Shashank Srikant*, Benjamin Lipkin*, Anna A. Ivanova, Evelina Fedorenko, and Una-May O'Reilly.
Most of this tutorial is based on the Hugging Face course about Transformers and on Niels Rogge's Transformers tutorials: make sure to check out their work and give them a star on GitHub, if you please ❤️
So how long until we can do an open-source Mistral Large? We could make a start on Petals or some other open-source distributed training network, possibly?
Project mention: Fast and secure translation on your local machine with a GUI | news.ycombinator.com | 2024-04-13
Interestingly, I think this is actually related to the offline translation features built into Firefox. Both are products of "Project Bergamot", but the Mozilla-maintained version was later merged into the Firefox application:
https://blog.mozilla.org/en/mozilla/local-translation-add-on...
https://hacks.mozilla.org/2022/06/training-efficient-neural-...
https://github.com/mozilla/firefox-translations
https://firefox-source-docs.mozilla.org/toolkit/components/t...
Extra webpage with screenshot and links, impossible to search for normally:
https://translatelocally.com/downloads/
Does one thing and does it well.
For downloading models, it's much easier to pipe `translateLocally --available-models` through `xargs` into `translateLocally -d` than to go through the GUI.
---
Other self-hostable translation tools:
https://www.apertium.org/index.eng.html
- Traditional rule-based translation. Seems to work pretty well, but no good desktop frontend.
https://www.argosopentech.com/
- Works, but crashy desktop app.
- API wrapping Argos Translate.
https://lingva.thedaviddelta.com/
- Google Translate scraper/privacy frontend.
- Proprietary, subscription trialware.
Project mention: Techbro says that GPT models will soon have over 9000 IQ in ~5 years | /r/SneerClub | 2023-05-04
Project mention: Get-Things-Done-with-Prompt-Engineering-and-LangChain: NEW Data - star count:617.0 | /r/algoprojects | 2023-12-10
Project mention: [Research] [Project] Text-to-Audio Generation using Instruction-Tuned LLM and Latent Diffusion Model | /r/MachineLearning | 2023-05-04
Found relevant code at https://github.com/declare-lab/tango + all code implementations here
Project mention: GPT-based ontological extraction tools, including SPIRES | news.ycombinator.com | 2023-07-24
I would, at the very least, wrap the errors being returned inside the `process` function: https://github.com/neurocult/agency/blob/14b14e50a7570189388...
Or, I suppose, the user must handle exception behavior in their custom `OperationHandler`.
ETA: https://chat.petals.dev
Project mention: Gemini is only 1x Chinchilla, so it undertrained for production | /r/singularity | 2023-12-07
1x Chinchilla means it's not really undertrained, but that more could be squeezed out without excessive difficulty: https://arxiv.org/abs/2305.16264
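For context, "1x Chinchilla" refers to the compute-optimal ratio from Hoffmann et al. (2022), roughly 20 training tokens per parameter. As a rough illustration (the numbers below are an example, not Gemini's actual figures):

```latex
D_{\text{opt}} \approx 20\,N
\qquad\text{e.g. } N = 70\times 10^{9}\ \text{params}
\;\Rightarrow\; D_{\text{opt}} \approx 1.4\times 10^{12}\ \text{tokens}
```

Training at exactly this ratio is optimal for a fixed training budget, but for a model that will serve heavy inference traffic, training well past it (as the linked paper on data-constrained scaling explores) trades extra training compute for a smaller, cheaper-to-serve model.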
language-models related posts
- Mistral Large
- Gemini is only 1x Chinchilla, so it undertrained for production
- Can LLMs learn from a single example?
- Chinchilla’s Death
- GPT-based ontological extraction tools, including SPIRES
- RWKV Pile+ seems to be training on far more tokens than any LLM ever has
- [Research] [Project] Text-to-Audio Generation using Instruction-Tuned LLM and Latent Diffusion Model
Index
What are some of the best open-source language-model projects? This list will help you find them:
| # | Project | Stars |
|---|---|---|
| 1 | transformers | 124,557 |
| 2 | petals | 8,631 |
| 3 | argos-translate | 3,208 |
| 4 | ecco | 1,899 |
| 5 | FARM | 1,723 |
| 6 | Awesome-LLM-Reasoning | 1,062 |
| 7 | Get-Things-Done-with-Prompt-Engineering-and-LangChain | 922 |
| 8 | tango | 901 |
| 9 | happy-transformer | 497 |
| 10 | xmtf | 493 |
| 11 | ontogpt | 493 |
| 12 | adaptnlp | 414 |
| 13 | agency | 374 |
| 14 | chat.petals.dev | 296 |
| 15 | extreme-bert | 283 |
| 16 | datablations | 282 |
| 17 | voice-assistant-whisper-chatgpt | 219 |
| 18 | dsir | 185 |
| 19 | AREkit | 52 |
| 20 | code-representations-ml-brain | 6 |