Stability AI Launches the First of Its StableLM Suite of Language Models

StableLM

43 15,853 5.0 Jupyter Notebook

StableLM: Stability AI Language Models

It's unclear which models will be trained to 1.5T tokens. The number of tokens each model saw in training is listed on GitHub (https://github.com/stability-AI/stableLM/), but only for the models that have been released.

lm-evaluation-harness

34 4,957 9.9 Python

A framework for few-shot evaluation of language models.

Looks like my edit window closed, but my results ended up being very low so there must be something wrong (I've reached out to StabilityAI just in case). It does however seem to roughly match another user's 3B testing: https://twitter.com/abacaj/status/1648881680835387392
The current scores I have place it between gpt2_774M_q8 and pythia_deduped_410M (yikes!). Based on training and specs you'd expect it to outperform Pythia 6.9B at least... For those looking to replicate/debug, this is running on a HEAD checkout of https://github.com/EleutherAI/lm-evaluation-harness (releases don't support hf-causal).
Note, another LLM currently being trained, GeoV 9B, already far outperforms this model at just 80B tokens trained: https://github.com/geov-ai/geov/blob/master/results.080B.md

alpaca_lora_4bit

41 528 8.6 Python

That's not going to happen. But it's likely that StableLM 175B will rival GPT-4.
Also, you can fine-tune Base StableLM yourself on any consumer GPU with 8GB of VRAM in a couple of hours (using https://github.com/johnsmith0031/alpaca_lora_4bit), and it will be commercially licensed.
You can even use the exact same dataset StabilityAI used.
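A rough back-of-the-envelope check on the 8GB claim, as a sketch only. It counts the memory for the model weights alone and assumes the 3B and 7B parameter counts are exact; activations, LoRA adapter weights, and optimizer state are ignored, so real usage will be higher. The helper name is hypothetical.

```python
def weight_memory_gib(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory for the model weights alone, in GiB."""
    return num_params * bits_per_weight / 8 / 2**30


# Assumed parameter counts (rounded); weights only, no activations or adapters.
for label, params in [("3B", 3e9), ("7B", 7e9)]:
    fp16 = weight_memory_gib(params, 16)
    int4 = weight_memory_gib(params, 4)
    print(f"{label}: fp16 ~{fp16:.1f} GiB, 4-bit ~{int4:.1f} GiB")
```

Under these assumptions, 4-bit weights leave headroom on an 8GB card where 16-bit weights for the larger model would not fit at all.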

transformers

175 124,557 10.0 Python

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

txtinstruct

13 215 5.0 Python

📚 Datasets and models for instruction-tuning

Great to see the continued release of open models. The only disappointing thing is that models keep building on CC-BY-NC licensed datasets, which severely limits their use.
Hopefully, people consider txtinstruct (https://github.com/neuml/txtinstruct) and other approaches to generate instruction-tuning datasets without the baggage.

sparsegpt

16 620 3.2 Python

Code for the ICML 2023 paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot".

I think "sweet spot" is going to depend on your task, but here's a good recent paper that may give you some more context on thinking about training and model sizes: https://www.harmdevries.com/post/model-size-vs-compute-overh...
There have also been quite a few developments on sparsity lately. Here's a technique SparseGPT which suggests that you can prune 50% of parameters with almost no loss in performance for example: https://arxiv.org/abs/2301.00774
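To make "prune 50% of parameters" concrete, here is a minimal sketch of plain magnitude pruning on a toy weight list. This is not the SparseGPT algorithm (which, per the paper, chooses and adjusts weights more carefully); it only illustrates what 50% sparsity means. The function name and the weights are made up for illustration.

```python
def magnitude_prune(weights, sparsity=0.5):
    """Zero out the smallest-magnitude fraction of weights.

    Simple magnitude pruning for illustration only; NOT SparseGPT's method.
    """
    num_to_prune = int(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:num_to_prune]:
        pruned[i] = 0.0
    return pruned


weights = [0.8, -0.05, 0.3, -1.2, 0.01, 0.6, -0.02, 0.1]  # toy values
pruned = magnitude_prune(weights, sparsity=0.5)
print(pruned)
```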

safetensors

31 2,426 8.4 Python

Simple, safe way to store and distribute tensors

I've been diving in lately, and while it's not efficient, the only way to manage is to create a new conda/mamba environment or a custom Docker image for all the conflicting packages.
For safety and speed, you should prefer the safetensors format: https://huggingface.co/docs/safetensors/speed
If you know what you are doing you can do your own conversions: https://github.com/huggingface/safetensors or for safety, https://huggingface.co/spaces/diffusers/convert
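A small sketch of why the format is considered safe to load, based on my understanding of the published safetensors layout: an 8-byte little-endian header length, a JSON header describing each tensor, then raw bytes. Reading it needs only JSON parsing and byte slicing, with no unpickling step. The tensor name and helper function below are hypothetical, and this is not a replacement for the official library.

```python
import json
import struct


def read_safetensors_header(blob: bytes) -> dict:
    """Parse the JSON header of a safetensors-style byte string."""
    (header_len,) = struct.unpack("<Q", blob[:8])  # u64, little-endian
    return json.loads(blob[8:8 + header_len].decode("utf-8"))


# Build a tiny in-memory example: one float32 tensor of shape [2].
data = struct.pack("<2f", 1.0, 2.0)
header = {
    "example.weight": {  # hypothetical tensor name
        "dtype": "F32",
        "shape": [2],
        "data_offsets": [0, len(data)],
    }
}
header_bytes = json.dumps(header).encode("utf-8")
blob = struct.pack("<Q", len(header_bytes)) + header_bytes + data

# Read it back using only the header metadata and byte offsets.
meta = read_safetensors_header(blob)
begin, end = meta["example.weight"]["data_offsets"]
payload = blob[8 + len(header_bytes):]
values = struct.unpack("<2f", payload[begin:end])
print(meta, values)
```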

instruct-eval

6 468 8.0 Python

This repository contains code to quantitatively evaluate instruction-tuned models such as Alpaca and Flan-T5 on held-out tasks.

I really dislike the approach some companies have taken to announcing new models: they don't mention evaluation results or the performance of the model, but instead talk about how "transparent", "accessible" and "supportive" these models are.
Anyway, I have benchmarked stablelm-base-alpha-3b (the open-source version, not the fine-tuned one which is under a NC license) using the MMLU benchmark and the results are rather underwhelming compared to other open source models:
* stablelm-base-alpha-3b (3B params): 25.6% average accuracy
* flan-t5-xl (3B params): 49.3% average accuracy
* flan-t5-small (80M params): 29.4% average accuracy
MMLU is just one benchmark, but based on the blog post, I don't think it will yield much better results in others. I'll leave links to the MMLU results of other proprietary[0] and open-access[1] models (results may vary by ±2% depending on the parameters used during inference).
[0]: https://paperswithcode.com/sota/multi-task-language-understa...
[1]: https://github.com/declare-lab/flan-eval/blob/main/mmlu.py#L...
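For context on the numbers above: MMLU questions are four-way multiple choice, so random guessing scores about 25%. A quick sketch comparing the reported accuracies against that baseline (the accuracies are copied from the comment; nothing here re-runs the benchmark):

```python
NUM_CHOICES = 4  # MMLU questions are four-way multiple choice
chance = 1 / NUM_CHOICES

# Average accuracies as reported in the comment above.
reported = {
    "stablelm-base-alpha-3b": 0.256,
    "flan-t5-xl": 0.493,
    "flan-t5-small": 0.294,
}

for name, acc in reported.items():
    margin = acc - chance
    print(f"{name}: {acc:.1%} accuracy, {margin:+.1%} vs. chance")
```

With the stated ±2% variation, the stablelm-base-alpha-3b result is not distinguishable from guessing on this benchmark.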

lm-evaluation-harness

1 91 3.7 Python

A framework for few-shot evaluation of autoregressive language models. (by bigscience-workshop)

Yeah, although looks like it currently has some issues with coqa: https://github.com/EleutherAI/lm-evaluation-harness/issues/2...
There's also the bigscience fork, but I ran into even more problems (although I didn't try too hard) https://github.com/bigscience-workshop/lm-evaluation-harness
And there's https://github.com/EleutherAI/lm-eval2/ (not sure if it's just starting over w/ a new repo or what?) but it has limited tests available

lm-eval2

1 13 10.0 Python

cformers

1 6 6.7 C

SoTA Transformers with C-backend for fast inference on your CPU. (by antimatter15)

stanford_alpaca

108 28,761 2.0 Python

Code and documentation to train Stanford's Alpaca models, and generate the data.

flash-attention

26 10,773 9.4 Python

Fast and memory-efficient exact attention

https://github.com/HazyResearch/flash-attention#memory
"standard attention has memory quadratic in sequence length, whereas FlashAttention has memory linear in sequence length."
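A minimal sketch of the idea behind the linear-memory claim, assuming a single query vector and plain Python lists. The softmax is accumulated block by block with a running maximum and running sum, so the full row of attention scores never has to be held at once. This illustrates the online-softmax trick only; it is not FlashAttention's actual kernel, and the function names and toy data are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def naive_attention(q, keys, values):
    """Reference: materializes every score before applying softmax."""
    scores = [dot(q, k) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(dim)]


def blockwise_attention(q, keys, values, block_size=2):
    """Same result, but only one block of scores exists at a time."""
    dim = len(values[0])
    running_max = float("-inf")
    running_sum = 0.0
    acc = [0.0] * dim  # unnormalized weighted sum of values
    for start in range(0, len(keys), block_size):
        scores = [dot(q, k) for k in keys[start:start + block_size]]
        new_max = max(running_max, max(scores))
        # Rescale earlier partial results to the new maximum.
        scale = math.exp(running_max - new_max)
        running_sum *= scale
        acc = [a * scale for a in acc]
        for s, v in zip(scores, values[start:start + block_size]):
            w = math.exp(s - new_max)
            running_sum += w
            acc = [a + w * x for a, x in zip(acc, v)]
        running_max = new_max
    return [a / running_sum for a in acc]


q = [0.5, -1.0]
keys = [[1.0, 0.0], [0.0, 1.0], [2.0, 1.0], [-1.0, 3.0], [0.5, 0.5]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]

naive = naive_attention(q, keys, values)
blockwise = blockwise_attention(q, keys, values, block_size=2)
print(naive, blockwise)
```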

llama.cpp

769 55,846 10.0 C++

LLM inference in C/C++

llama.cpp has preliminary support already. https://github.com/ggerganov/llama.cpp/issues/1063#issuecomm...

AlpacaDataCleaned

14 1,394 7.6 Python

Alpaca dataset from Stanford, cleaned and curated

That dataset is licensed under CC BY NC 4.0, which is not open. It also has a bunch of garbage in it; see https://github.com/gururise/AlpacaDataCleaned

geov

2 122 5.0 Jupyter Notebook

The GeoV model is a large language model designed by Georges Harik and uses Rotary Positional Embeddings with Relative distances (RoPER). We have shared a pre-trained 9B parameter model.

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com
language-model NLP instruct-tuning LM Natural Language Processing
Post date: 19 Apr 2023
