detoxify
cedille-ai
Metric | detoxify | cedille-ai
---|---|---
Mentions | 4 | 9
Stars | 839 | 201
Growth | 1.9% | 0.0%
Activity | 6.2 | 0.0
Last commit | 24 days ago | about 2 years ago
Language | Python | -
License | Apache License 2.0 | MIT License
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
detoxify
- ML Discord Moderation Bot
I created a small Discord moderation bot using https://github.com/unitaryai/detoxify. The source can be found at https://gist.github.com/KrautByte/975f404969f4de8f4147e1bb4f7b64cb
- Cedille, the largest French language model, released in open source
- Show HN: Cedille, the largest French language model, released in open source
Yeah, this kind of toxic output sadly still can happen :-/
We have fully analyzed the training dataset (1128 GB) using Detoxify (https://github.com/unitaryai/detoxify) to filter out problematic content. But of course detecting toxicity is a tough challenge in itself, so this process is imperfect at best.
We are using the RealToxicityPrompts framework (https://realtoxicityprompts.apps.allenai.org/) to analyse how toxic our models are and to steer our efforts in this direction. This means we are generating thousands of completions and analysing them to see how "nasty" the model is. We plan to write more on this topic soon.
But yeah, this is definitely far from being a solved problem, and our model (as well as all large language models) should be handled with care.
- Implementing a toxicity detector in your chatbots
Detoxify grew out of three Kaggle competitions aimed at improving toxicity classifiers. Each competition targeted a different aspect of the toxicity classification problem.
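The dataset filtering described in the Show HN excerpt above can be pictured roughly as follows. This is a minimal sketch, not Cedille's actual pipeline: the `filter_documents` helper, the `detoxify_score_fn` wrapper, and the 0.5 threshold are all hypothetical, since the source does not say which cutoff or which Detoxify model was used. The scorer is passed in as a function so the filtering logic can be tried without downloading a model.

```python
from typing import Callable, Iterable, List

# Hypothetical cutoff; the source does not state the threshold Cedille used.
TOXICITY_THRESHOLD = 0.5


def filter_documents(
    documents: Iterable[str],
    score_fn: Callable[[str], float],
    threshold: float = TOXICITY_THRESHOLD,
) -> List[str]:
    """Keep only documents whose toxicity score is strictly below the threshold."""
    return [doc for doc in documents if score_fn(doc) < threshold]


def detoxify_score_fn(model_name: str = "multilingual") -> Callable[[str], float]:
    """Build a scorer backed by Detoxify (needs `pip install detoxify`).

    The model name is an assumption; Detoxify also ships "original"
    and "unbiased" checkpoints.
    """
    # Imported lazily so the filtering logic above runs without the package.
    from detoxify import Detoxify

    model = Detoxify(model_name)

    def score(text: str) -> float:
        # predict() returns a dict of label -> score for a single string.
        return float(model.predict(text)["toxicity"])

    return score
```

With Detoxify installed, `filter_documents(corpus, detoxify_score_fn())` would apply the real classifier. Any other callable that maps a string to a score between 0 and 1 can be swapped in for experiments.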
cedille-ai
- Happy 2nd birthday to GPT-3!
GPT-3’s release has inspired a gold rush, with over 30 new large language models trained since May 2020, especially in North America and China, but also in places like Israel, Germany, Switzerland, and Abu Dhabi.
- Publish your Christmas story with Cedille!
- Cedille: The largest French language model (r/MachineLearning)
- [P] Cedille: The largest French language model
- Cedille, the largest French language model, open source with a freely accessible playground
- Cedille, the largest French language model, released in open source
The repo on GitHub: https://github.com/coteries/cedille-ai
- Show HN: Cedille, the largest French language model, released in open source
We are excited to announce Cedille, the largest language model for French (6b parameters).
Demo: https://cedille.ai
Language models are general-purpose AI systems that can solve a range of tasks simply by being prompted. They can be used, for example, to summarize text, translate, generate ideas, or overcome writer's block.
You may know GPT-3, the humongous model from OpenAI. Cedille is a similar model targeting the French demographic - but smaller, as we don’t yet have $1b in the bank like they do. Although GPT-3 supports multiple languages including French, our model is competitive with GPT-3 on a range of French tasks! Plus, of course we’re open source while they keep their model closed and heavily restrict access to it.
You can try it out right away from our playground: https://app.cedille.ai
We are proponents of “open AI” and as such have released a checkpoint for the world to use (MIT license): https://github.com/coteries/cedille-ai
One of the problems with large language models is the potentially toxic, sexist or in other ways unpleasant output. We tried our best to avoid this issue by doing extensive dataset filtering. As a result, our benchmark indicates that Cedille is indeed less toxic than GPT-3.
- [P] Cedille, the largest French language model (6b), released in open source
We are proponents of “open AI” and as such have released a checkpoint for the world to use (MIT license): https://github.com/coteries/cedille-ai
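The toxicity benchmark mentioned in the Show HN post (generating many completions per prompt and scoring them) is usually summarized with two statistics in RealToxicityPrompts-style evaluations: the expected maximum toxicity across a prompt's completions, and the fraction of prompts with at least one toxic completion. The sketch below computes both from already-scored completions. The function names and the 0.5 cutoff are illustrative assumptions, and the source does not say which metrics Cedille actually reported.

```python
from statistics import mean
from typing import List


def expected_max_toxicity(scores_per_prompt: List[List[float]]) -> float:
    """Average, over prompts, of the highest toxicity score among that prompt's completions."""
    return mean(max(scores) for scores in scores_per_prompt)


def toxicity_probability(
    scores_per_prompt: List[List[float]],
    threshold: float = 0.5,  # assumed cutoff for calling a completion toxic
) -> float:
    """Fraction of prompts with at least one completion scoring at or above the threshold."""
    return mean(
        1.0 if max(scores) >= threshold else 0.0
        for scores in scores_per_prompt
    )


# Toy scores: two prompts, two completions each (made-up numbers, not real results).
toy_scores = [[0.125, 0.75], [0.25, 0.25]]
print(expected_max_toxicity(toy_scores))
print(toxicity_probability(toy_scores))
```

Comparing two models then amounts to running the same prompts through each, scoring the completions with the same classifier, and comparing these two numbers.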
What are some alternatives?
quickai - QuickAI is a Python library that makes it extremely easy to experiment with state-of-the-art Machine Learning models.
allennlp - An open-source NLP research library, built on PyTorch.
kogpt - KakaoBrain KoGPT (Korean Generative Pre-trained Transformer)
mesh-transformer-jax - Model parallel transformers in JAX and Haiku
multi-label-sentiment-classifier - How to build a multi-label sentiment classifier with Tez and PyTorch
awesome-huggingface - 🤗 A list of wonderful open-source projects & applications integrated with Hugging Face libraries.
lm-evaluation-harness - A framework for few-shot evaluation of language models.
finetune-gpt2xl - Guide: Finetune GPT2-XL (1.5 Billion Parameters) and finetune GPT-NEO (2.7 B) on a single GPU with Huggingface Transformers using DeepSpeed
Awesome-pytorch-list - A comprehensive list of PyTorch-related content on GitHub, such as different models, implementations, helper libraries, tutorials, etc.
google-local-results-ai-server - Server code for serving BERT-based models for text classification. It is designed by SerpApi for heavy-load prototyping and production tasks, specifically for the implementation of the google-local-results-ai-parser gem.
labs-detoxify-server - Detoxify server for Xatkit