SaaSHub helps you find the best software and product alternatives
The-pile Alternatives
Similar projects and alternatives to the-pile
-
opendyslexic
OpenDyslexic, a typeface that uses typeface shapes & features to help offset some visual symptoms of Dyslexia. Now in SIL-OFL.
-
jax
Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
-
datasets
🤗 The largest hub of ready-to-use NLP datasets for ML models with fast, easy-to-use and efficient data manipulation tools (by EleutherAI)
-
DebateSum
Corresponding code repo for the paper at COLING 2020 - ARGMIN 2020: "DebateSum: A large-scale argument mining and summarization dataset"
the-pile reviews and mentions
-
The Pile
[2] https://github.com/EleutherAI/the-pile/issues/56
-
The Pile: a dataset for language modeling [pdf]
I came so close to getting my dataset DebateSum (https://huggingface.co/datasets/Hellisotherpeople/DebateSum) into the pile, but they decided at the last minute not to add it: https://github.com/EleutherAI/the-pile/issues/56
I'm still a tiny bit salty about that.
-
Sarah Silverman is suing OpenAI and Meta for copyright infringement
Anyone want to check if the book in question is in The Pile dataset?
https://github.com/EleutherAI/the-pile/blob/master/the_pile/...
-
What Types Of Websites Are Typically Scraped To Train LLMs?
All of it, it's quite diverse. Especially the Common Crawl bit: https://github.com/EleutherAI/the-pile
-
Can anyone answer some questions on how GPT-NeoX-20B was developed, and future models?
For example, before this I didn't realize one of the sources of data that the pile uses is a massive number of emails gathered during the Enron lawsuits. Weird, but cool I guess.
-
How do I add AI modules?
NovelAI's Krake and Euterpe, and the rest, are finetuned versions of existing models. The original models were trained on a mass of text. Krake is a finetune of GPT-NeoX-20B, which was trained on The Pile. NovelAI's finetunes involve further training, but on various works of fiction rather than more text trawled from the internet. The statistical rules in the existing models are thus shifted in a (slightly) new direction. Modules refine those statistical rules, or weights, just a little bit more.
- GitHub - EleutherAI/the-pile
-
Sounds about right /s
Literally The Pile.
-
What is the difference between OpenAI and the gpt3 algorithm?
The parameters are learned from large datasets like The Pile.
-
Official Beta AMA @ June 14th, 12pm EST
We use GPT-Neo as our base model, which was trained on The Pile, and you can see its contents in their GitHub repo: https://github.com/EleutherAI/the-pile
Stats
EleutherAI/the-pile is an open source project licensed under the MIT License, which is an OSI-approved license.
The primary programming language of the-pile is Python.