gpt-3
cdx-index-client
| | gpt-3 | cdx-index-client |
|---|---|---|
| Mentions | 39 | 1 |
| Stars | 9,406 | 171 |
| Growth | - | - |
| Activity | 3.5 | 10.0 |
| Latest commit | over 3 years ago | over 5 years ago |
| Language | Python | - |
| License | - | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
gpt-3
-
Can ChatGPT improve my L2 grammar?
Are generative AI models useful for learning a language, and if so, which languages? Over 90% of ChatGPT's training data was in English. The remaining 10% was split unevenly between 100+ languages. This suggests that the quality of the outputs will vary from language to language.
-
GPT4 Can’t Ace MIT
I doubt it was extensively trained on German data. Who knows about GPT-4, but GPT-3's training data is ~92% English and ~1.5% German, which means it saw "die, motherfucker, die" more often than "die Mutter".
(https://github.com/openai/gpt-3/blob/master/dataset_statisti...)
- I need help.
-
[R] PaLM 2 Technical Report
Catalan was 0.018 % of GPT-3's training corpus. https://github.com/openai/gpt-3/blob/master/dataset_statistics/languages_by_word_count.csv.
- I'm seriously concerned that if I lost ChatGPT-4 I would be handicapped
- The responses I got from bard after asking why 100 times… he was pissed 😂
-
BharatGPT: India's Own ChatGPT
>Certainly it is pleasing that they are not just doing Hindi, but some of these languages must be represented online by a very small corpus of text indeed. I wonder how effectively an LLM can be trained on such a small training set for any given language?
As long as it's not the main language, it doesn't really matter. Besides English (92.6%), the language with the biggest representation (by word count) is French, at 1.8%. Most of the languages GPT-3 knows sit at <0.2% representation.
https://github.com/openai/gpt-3/blob/master/dataset_statisti...
Competence in the main language will bleed into the rest.
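The percentages quoted above come from OpenAI's `languages_by_word_count.csv`. As a rough illustration only (not OpenAI's tooling), a short Python sketch can parse a CSV of that shape and report each language's share; the column names and the sample numbers below are assumptions, not the real file's contents.

```python
import csv
import io

def language_shares(csv_text, top_n=3):
    """Parse a languages-by-word-count CSV and return the top languages
    as (name, percent-of-total-words) pairs. Column names are assumed."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    total = sum(int(r["word_count"]) for r in rows)
    shares = [(r["language"], 100 * int(r["word_count"]) / total) for r in rows]
    shares.sort(key=lambda pair: pair[1], reverse=True)
    return shares[:top_n]

# Illustrative numbers only, shaped like the percentages quoted above.
sample = """language,word_count
English,92600
French,1800
German,1500
Catalan,18
"""
for lang, pct in language_shares(sample):
    print(f"{lang}: {pct:.1f}%")
```

With a heavily skewed distribution like this, even the second-place language is a rounding error next to English, which is the commenter's point.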
- GPT-4 gets a B on Scott Aaronson's quantum computing final exam
-
[D] Dumb question: is GPT3 model open-sourced?
And from skimming their GH page, it seems it'd be costly to host as well
- ChatGPT and the Daily Question Thread, re-evaluated with GPT-4.
cdx-index-client
-
DeepMind's New Language Model, Chinchilla (70B Parameters), Which Outperforms GPT-3
Common Crawl actually does not contain Twitter; you can go check the indexes with https://github.com/ikreymer/cdx-index-client . Twitter is extremely aggressive about blocking scraping/caching, and I guess that keeps CC out. Models like GPT-3 still know a decent amount of Twitter material, and I figure that this is due to tweets being excerpted or mirrored manually on non-Twitter.com URLs (e.g. all the Twitter-mirroring bots on Reddit).
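The "go check the indexes" claim can be spot-checked without the client itself, since cdx-index-client wraps Common Crawl's public CDX index API. A minimal sketch, assuming the standard CDX query parameters; the collection name below is an example (real names follow the CC-MAIN-YYYY-WW pattern) and the actual lookup needs network access:

```python
from urllib.parse import urlencode

CDX_API = "https://index.commoncrawl.org/{collection}-index"

def cdx_query_url(collection, url_pattern, limit=5):
    """Build a Common Crawl CDX index query URL.

    The index server speaks the CDX API: pass a URL pattern and ask
    for JSON lines back. An empty response means the pattern (e.g.
    twitter.com/*) has no captures in that crawl.
    """
    params = urlencode({"url": url_pattern, "output": "json", "limit": limit})
    return CDX_API.format(collection=collection) + "?" + params

# Example collection name; substitute a current CC-MAIN crawl.
query = cdx_query_url("CC-MAIN-2022-05", "twitter.com/*")
print(query)

# To actually run the check (requires network):
# import urllib.request
# with urllib.request.urlopen(query) as resp:
#     body = resp.read().decode()
# print("captures found" if body.strip() else "no captures for twitter.com")
```

cdx-index-client does essentially this across every index shard in parallel, which is why it is the convenient way to run the check at scale.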
What are some alternatives?
dalle-mini - DALL·E Mini - Generate images from a text prompt
mup - maximal update parametrization (µP)
DALL-E - PyTorch package for the discrete VAE used for DALL·E.
DALLE-mtf - Open-AI's DALL-E for large scale training in mesh-tensorflow.
stylegan2-pytorch - Simplest working implementation of Stylegan2, state of the art generative adversarial network, in Pytorch. Enabling everyone to experience disentanglement
v-diffusion-pytorch - v objective diffusion inference code for PyTorch.
dalle-2-preview
tensorrtx - Implementation of popular deep learning networks with TensorRT network definition API
gpt-2 - Code for the paper "Language Models are Unsupervised Multitask Learners"
jukebox - Code for the paper "Jukebox: A Generative Model for Music"
automl - Google Brain AutoML
bevy_retro - Plugin pack for making 2D games with Bevy