Datablations Alternatives
Similar projects and alternatives to datablations
-
guidance
Discontinued: A guidance language for controlling large language models. [Moved to: https://github.com/guidance-ai/guidance] (by microsoft)
-
SuperAGI
<⚡️> SuperAGI - A dev-first open source autonomous AI agent framework. Enabling developers to build, manage & run useful autonomous agents quickly and reliably.
-
tree-of-thoughts
Plug-and-play implementation of Tree of Thoughts: Deliberate Problem Solving with Large Language Models that elevates model reasoning by at least 70%
-
TinyLlama
The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
-
chain-of-thought-hub
Benchmarking large language models' complex reasoning ability with chain-of-thought prompting
-
DB-GPT
AI Native Data App Development framework with AWEL(Agentic Workflow Expression Language) and Agents
-
TheVault
[EMNLP 2023] The Vault: A Comprehensive Multilingual Dataset for Advancing Code Understanding and Generation
datablations reviews and mentions
-
Gemini is only 1x Chinchilla, so it's undertrained for production
1x Chinchilla means it's not really undertrained, but that more could be squeezed out of it without excessive difficulty https://arxiv.org/abs/2305.16264
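The "1x Chinchilla" shorthand refers to the compute-optimal rule of thumb from the Chinchilla paper (Hoffmann et al., 2022) of roughly 20 training tokens per parameter; a model's Chinchilla multiple is how far past that budget it was trained. A minimal sketch of the arithmetic, assuming the commonly cited 20:1 heuristic (the fitted ratio in the paper varies with compute scale):

```python
# Rough "Chinchilla multiple" calculator. The ~20 tokens-per-parameter
# figure is the widely quoted compute-optimal heuristic, not an exact
# constant from the paper.

TOKENS_PER_PARAM = 20  # heuristic, assumed for illustration

def chinchilla_multiple(params: float, tokens_trained: float) -> float:
    """How many times the Chinchilla-optimal token budget a run used."""
    optimal_tokens = TOKENS_PER_PARAM * params
    return tokens_trained / optimal_tokens

# Example: a 70B-parameter model trained on 1.4T tokens is ~1x Chinchilla.
print(chinchilla_multiple(70e9, 1.4e12))  # 1.0
```

A multiple well above 1 (as with many production models) means the model was deliberately trained past the compute-optimal point to improve quality per parameter at inference time.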
- Can LLMs learn from a single example?
-
Chinchilla’s Death
You might want to give a read to "Scaling Data-Constrained Language Models" [1]. They basically generalized the Chinchilla scaling law by investigating behavior on multi-epoch runs.
[1] https://arxiv.org/abs/2305.16264
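The generalization in [1] models repeated epochs as contributing diminishing "effective" unique tokens, decaying exponentially with the number of repetitions. A hedged sketch of that idea, assuming the exponential-decay functional form described in the paper; the decay constant used below is illustrative, not the paper's fitted value:

```python
import math

# Sketch of the effective-data idea from "Scaling Data-Constrained
# Language Models" (arXiv:2305.16264): each additional epoch over the
# same unique tokens adds less value than fresh data.
# R_STAR (the repetition decay constant) is an assumed illustrative
# value, not the paper's fitted parameter.

R_STAR = 15.0  # illustrative decay constant

def effective_data(unique_tokens: float, epochs: int, r_star: float = R_STAR) -> float:
    """Effective token count D' when unique data U is repeated.

    D' = U + U * R* * (1 - exp(-R / R*)), with R = epochs - 1 repetitions.
    """
    repetitions = epochs - 1
    return unique_tokens + unique_tokens * r_star * (1 - math.exp(-repetitions / r_star))

# One epoch over 100B unique tokens is worth exactly 100B effective tokens;
# four epochs (400B raw tokens seen) is worth somewhat less than 400B.
print(effective_data(100e9, 1))
print(effective_data(100e9, 4))
```

Under this form, a few epochs of repetition are nearly as good as fresh data (consistent with the paper's finding that value decays slowly at first), while returns from many epochs flatten out toward an asymptote.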
-
RWKV Pile+ seems to be training on far more tokens than any LLM ever has
I would imagine that there is a lot of overlap, yeah. That said, training on repeated data does seem to be effective at this level.
-
(2/2) May 2023
Scaling Data-Constrained Language Models (https://arxiv.org/abs/2305.16264)
- How to Keep Scaling Large Language Models when Data Runs Out? A New AI Research Trains 400 Models with up to 9B Parameters and 900B Tokens to Create an Extension of Chinchilla Scaling Laws for Repeated Data
Stats
huggingface/datablations is an open source project licensed under Apache License 2.0 which is an OSI approved license.
The primary programming language of datablations is Jupyter Notebook.