llm.f90 vs curated-transformers

| | llm.f90 | curated-transformers |
|---|---|---|
| Mentions | 13 | 7 |
| Stars | 48 | 838 |
| Growth | - | 1.2% |
| Activity | 8.4 | 9.0 |
| Latest commit | about 2 months ago | 21 days ago |
| Language | Fortran | Python |
| License | MIT License | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
llm.f90
- llm.f90: LLM Inference in Fortran
- karpathy/llm.c
I'd like to think he took the name from my llm.f90 project https://github.com/rbitr/llm.f90
It was originally based off of Karpathy's llama2.c but I renamed it when I added support for other architectures.
Probably a coincidence :)
- Winteracter – The Fortran GUI Toolset
I'm a Fortran hobbyist. I'm working (unfortunately less frequently now) on an LLM framework in Fortran: https://github.com/rbitr/llm.f90
- Fortran implementation of phi-2 LLM
- Fortran implementation of phi-2 language model
- TinyLlama: An Open-Source Small Language Model
Also, I should promote the code I wrote for running this. It runs models in ggml format; the one I made available is an older checkpoint, though it's easy to convert the newer one. And it's in Fortran, but it should be easy to get gfortran if you don't have it installed.
https://github.com/rbitr/llm.f90/tree/optimize16/purefortran
- Mamba LLM Inference on CPU
- Minimal implementation of Mamba, the new LLM architecture, in 1 file of PyTorch
The original Mamba code has a lot of speed optimizations and other machinery that make it difficult to grasp immediately, so this will help with learning.
I can't help but also plug my own Mamba inference implementation. https://github.com/rbitr/llm.f90/tree/master/ssm
- Mamba state-space LLM inference
- Guide to the Mamba architecture that claims to be a replacement for Transformers
You may also be interested in https://github.com/rbitr/llm.f90/tree/master/ssm, my inference-only implementation of Mamba, which ends up being much simpler than the training code in the original repo.
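To see why the recurrent form is so much simpler, here is a minimal sketch of the per-token update for a plain diagonal state-space model. This shows only the basic SSM recurrence, not Mamba's selective (input-dependent) parameters or its hardware-aware training kernels; all names and values below are illustrative.

```python
import numpy as np

def ssm_step(h, x_t, A_bar, B_bar, C):
    """One recurrent step of a diagonal state-space model (one channel).

    h     : (d_state,) hidden state carried between tokens
    x_t   : float      current input value
    A_bar : (d_state,) discretized diagonal state transition
    B_bar : (d_state,) discretized input projection
    C     : (d_state,) output projection
    """
    h = A_bar * h + B_bar * x_t  # h_t = A_bar * h_{t-1} + B_bar * x_t
    y_t = C @ h                  # y_t = C . h_t
    return h, y_t

# Generating token by token only needs this O(d_state) update per step,
# none of the parallel-scan machinery used to make training fast.
d_state = 16
h = np.zeros(d_state)
A_bar = np.exp(-np.linspace(0.1, 1.0, d_state))  # stable decay factors
B_bar = np.full(d_state, 0.1)
C = np.ones(d_state)
for x_t in [0.5, -1.0, 2.0]:
    h, y_t = ssm_step(h, x_t, A_bar, B_bar, C)
    print(y_t)
```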
curated-transformers
- Minimal implementation of Mamba, the new LLM architecture, in 1 file of PyTorch
https://github.com/explosion/curated-transformers/blob/main/...
Llama 1/2:
https://github.com/explosion/curated-transformers/blob/main/...
MPT:
https://github.com/explosion/curated-transformers/blob/main/...
With various features enabled, including support for TorchScript JIT, PyTorch flash attention, etc.
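For context, a minimal sketch of what "PyTorch flash attention" refers to here: since PyTorch 2.0, `torch.nn.functional.scaled_dot_product_attention` dispatches to a fused, flash-attention-style kernel when the backend supports it. The shapes below are illustrative, and this is not curated-transformers' own code.

```python
import torch
import torch.nn.functional as F

# Illustrative shapes: (batch, n_heads, seq_len, head_dim)
q = torch.randn(1, 8, 128, 64)
k = torch.randn(1, 8, 128, 64)
v = torch.randn(1, 8, 128, 64)

# PyTorch >= 2.0 dispatches this to a fused attention kernel when the
# backend supports it; is_causal=True applies the standard
# autoregressive mask.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```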
- Curated Transformers: MosaicMPT LLM decoder in 90 lines
- Non-determinism in GPT-4 is caused by Sparse MoE
Yeah. In curated transformers [1] we are seeing completely deterministic output across multiple popular transformer architectures on a single GPU (there can be variance between GPUs due to different kernels).
One non-determinism we see with a temperature of 0 is that once you have quantized weights, many predicted pieces will have the same probability, including multiple pieces with the highest probability. And then the sampler (if you are not using a greedy decoder) will sample from those pieces.
In other words, a temperature of 0 is a poor man’s greedy decoding. (It is totally possible that OpenAI’s implementation switches to a greedy decoder with a temperature of 0).
[1] https://github.com/explosion/curated-transformers
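A small sketch of the distinction being drawn (not OpenAI's or curated-transformers' code): with a near-zero temperature, sampling still chooses randomly among exactly tied top logits, while true greedy decoding is a deterministic argmax. The logits below are made up to force a tie.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(logits, temperature):
    # Temperature scaling followed by sampling. As temperature -> 0 the
    # distribution concentrates on the maximum, but exact ties in the
    # logits (common after weight quantization) leave several pieces at
    # the same top probability, and sampling picks among them randomly.
    z = logits / max(temperature, 1e-8)
    p = np.exp(z - z.max())
    p /= p.sum()
    return int(rng.choice(len(logits), p=p))

def greedy(logits):
    # True greedy decoding: deterministic argmax (ties broken by index).
    return int(np.argmax(logits))

logits = np.array([2.0, 2.0, 1.0])  # two pieces tied at the top
print([sample(logits, 1e-6) for _ in range(8)])  # mixes 0s and 1s
print(greedy(logits))                            # always 0
```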
- Curated Transformers: LLMs from reusable building blocks
- Show HN: Curated Transformers – PyTorch LLMs with less code duplication
- Show HN: Curated Transformers – Lightweight, composable PyTorch transformers
- Falcon LLM – A 40B Model
Architecturally, there are no big differences compared to other LLMs. The largest differences compared to NeoX are: no biases in linear layers, and shared heads for the key and value representations (but not query).
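As a concrete illustration of shared key/value heads (often called multi-query attention), here is a minimal PyTorch sketch; it shows the general idea, not Falcon's actual implementation.

```python
import torch
import torch.nn.functional as F

# Multi-query attention: n_heads query heads share a single key head
# and a single value head. Shapes: (batch, heads, seq, head_dim).
batch, seq, n_heads, head_dim = 1, 16, 8, 64
q = torch.randn(batch, n_heads, seq, head_dim)  # per-head queries
k = torch.randn(batch, 1, seq, head_dim)        # one shared key head
v = torch.randn(batch, 1, seq, head_dim)        # one shared value head

# Broadcast the shared K/V across all query heads; the K/V cache is
# n_heads times smaller than with fully separate heads.
out = F.scaled_dot_product_attention(
    q,
    k.expand(-1, n_heads, -1, -1),
    v.expand(-1, n_heads, -1, -1),
    is_causal=True,
)
```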
Of course, it has 40B parameters, but there is also a 7B parameter version. The primary issue is that the current upstream version (on Hugging Face) hasn't implemented key-value caching correctly. KV caching is needed to bring the complexity down from O(n^3) to O(n^2). The issues are: (1) their implementation uses Torch's scaled dot-product attention, which uses incorrect causal masks when the query/key sizes are not the same (which is the case when generating with a cache); (2) they don't index the rotary embeddings correctly when using the key-value cache, so the rotary embedding of the first token is used for all generated tokens. Together, this causes the model to output garbage, and it only works when used without KV caching, making it very slow.
However, this is not a property of the model, and they will probably fix it soon. For example, the transformer library that we are currently developing supports Falcon with key-value caching, and the speed is on par with other models of the same size:
https://github.com/explosion/curated-transformers/blob/main/...
(This is a correct implementation of the decoder layer.)
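To make the two caching bugs concrete, here is a simplified sketch of a correct single-token decode step with a KV cache, written for this comparison rather than taken from curated-transformers: the rotary embedding is applied at the absolute position (the current cache length), and no causal mask is applied because the single new query may attend to every cached position.

```python
import torch

def rotary(x, positions, base=10000.0):
    # Apply rotary embeddings at the given *absolute* positions. With a
    # KV cache, a newly generated token must use position = cache_len,
    # not 0 (the second bug described above).
    d = x.shape[-1]
    inv_freq = base ** (-torch.arange(0, d, 2, dtype=torch.float32) / d)
    theta = positions[:, None].float() * inv_freq[None, :]  # (seq, d/2)
    sin, cos = theta.sin(), theta.cos()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

def decode_step(q, k, v, cache_k, cache_v):
    # One decode step; q, k, v are (heads, 1, head_dim) for the new token.
    pos = torch.tensor([cache_k.shape[1]])    # absolute position offset
    q, k = rotary(q, pos), rotary(k, pos)
    cache_k = torch.cat([cache_k, k], dim=1)  # append to the cache
    cache_v = torch.cat([cache_v, v], dim=1)
    # The single new query may attend to *all* cached positions, so no
    # causal mask is needed; a naive causal mask is wrong here because
    # the query length (1) differs from the key length (the first bug
    # described above).
    attn = (q @ cache_k.transpose(-2, -1)) / q.shape[-1] ** 0.5
    return attn.softmax(dim=-1) @ cache_v, cache_k, cache_v

heads, head_dim = 4, 8
cache_k = torch.zeros(heads, 0, head_dim)
cache_v = torch.zeros(heads, 0, head_dim)
for _ in range(3):  # generate three tokens
    q, k, v = (torch.randn(heads, 1, head_dim) for _ in range(3))
    out, cache_k, cache_v = decode_step(q, k, v, cache_k, cache_v)
```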
What are some alternatives?
rwkv.f90 - Port of the RWKV-LM model in Fortran (Back to the Future!)
exllama - A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
neural-fortran - A parallel framework for deep learning
rust-llm-guide - A guide to building, training and running large language models using Rust.
inference-engine - A deep learning library for use in high-performance computing applications in modern Fortran
ggllm.cpp - Falcon LLM ggml framework with CPU and GPU support
Fortran-code-on-GitHub - Directory of Fortran codes on GitHub, arranged by topic
PyTorch - Tensors and Dynamic neural networks in Python with strong GPU acceleration
fastGPT - Fast GPT-2 inference written in Fortran
tensorflow - An Open Source Machine Learning Framework for Everyone
mamba-minimal - Simple, minimal implementation of the Mamba SSM in one file of PyTorch.
petals - 🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading