Top 8 Python language-modeling Projects

RL4LMs

5 2,084 0.0 Python

A modular RL library to fine-tune language models to human preferences

Project mention: How To Setup a Model With Guardrails? | /r/LocalLLaMA | 2023-05-12

I think of guardrails as another dimension of human preferences: whether you are training a model to answer questions more gooder or avoid saying horrifying stuff, you are teaching the model a preference. So I think it's a straightforward RLHF problem but from a different perspective.

tape

1 620 0.0 Python

Tasks Assessing Protein Embeddings (TAPE), a set of five biologically relevant semi-supervised learning tasks spread across different domains of protein biology. (by songlab-cal)
long-form-factuality

2 435 5.9 Python

Benchmarking long-form factuality in large language models. Original code for our paper "Long-form factuality in large language models".

Project mention: An Open Source Tool for Multimodal Fact Verification | news.ycombinator.com | 2024-04-06

Isn't this similar to the Deepmind paper on long form factuality posted a few days ago?
https://arxiv.org/abs/2403.18802
https://github.com/google-deepmind/long-form-factuality/tree...

memprompt

4 320 1.7 Python

A method to fix GPT-3 after deployment with user feedback, without re-training.
FActScore

1 210 6.4 Python

A package to evaluate factuality of long-form generation. Original implementation of our EMNLP 2023 paper "FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation"

Project mention: Long-form factuality in large language models | news.ycombinator.com | 2024-04-06

Looks like a slight modification of FActScore [1], but instead of using Wikipedia as a verification source, they use Google Search. They also claim to include a wider range of topics. That said, FActScore allows you to use whatever knowledge source and topics you want [2].
[1]: https://arxiv.org/abs/2305.14251
[2]: https://github.com/shmsw25/FActScore?tab=readme-ov-file#to-u...

deepblast

2 96 4.8 Python

Neural Networks for Protein Sequence Alignment

Project mention: [D] To all the machine learning engineers: most difficult model task/type you’ve ever had to work with? | /r/MachineLearning | 2023-07-03

recurrent-fwp

1 46 0.7 Python

Official repository for the paper "Going Beyond Linear Transformers with Recurrent Fast Weight Programmers" (NeurIPS 2021)
verified-smart-contracts

1 18 3.1 Python

Verified Ethereum Smart Contract dataset (by andstor)

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).