Curated-transformers Alternatives
Similar projects and alternatives to curated-transformers
- petals: 🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading.
- exllama: A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
- ai-notes: Notes for software engineers getting up to speed on new AI developments. Serves as a datastore for https://latent.space writing and product brainstorming, but has cleaned-up canonical references under the /Resources folder.
- rust-llm-guide: (Discontinued) A guide to building, training and running large language models using Rust.
- MindSearch: 🔍 An LLM-based multi-agent framework for web search (like Perplexity.ai Pro and SearchGPT).
- heinsen_sequence: Code implementing "Efficient Parallelization of a Ubiquitous Sequential Computation" (Heinsen, 2023).
curated-transformers discussion
curated-transformers reviews and mentions
- Minimal implementation of Mamba, the new LLM architecture, in 1 file of PyTorch
https://github.com/explosion/curated-transformers/blob/main/...
Llama 1/2: https://github.com/explosion/curated-transformers/blob/main/...
MPT: https://github.com/explosion/curated-transformers/blob/main/...
All with various features enabled, including support for TorchScript JIT, PyTorch flash attention, etc.
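As a rough illustration of what TorchScript JIT support plus PyTorch's built-in scaled-dot-product (flash) attention look like in practice, here is a minimal, hypothetical self-attention block. This is plain PyTorch, not code taken from curated-transformers:

```python
# Hypothetical sketch, not curated-transformers code: a TorchScript-scriptable
# causal self-attention block built on PyTorch's scaled_dot_product_attention,
# which dispatches to a flash-attention kernel when the backend supports it.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CausalSelfAttention(nn.Module):
    def __init__(self, hidden: int, heads: int):
        super().__init__()
        self.heads = heads
        self.q_proj = nn.Linear(hidden, hidden)
        self.k_proj = nn.Linear(hidden, hidden)
        self.v_proj = nn.Linear(hidden, hidden)
        self.out_proj = nn.Linear(hidden, hidden)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        batch, seq, hidden = x.size(0), x.size(1), x.size(2)
        # Reshape to (batch, heads, seq, head_dim), as SDPA expects.
        q = self.q_proj(x).view(batch, seq, self.heads, -1).transpose(1, 2)
        k = self.k_proj(x).view(batch, seq, self.heads, -1).transpose(1, 2)
        v = self.v_proj(x).view(batch, seq, self.heads, -1).transpose(1, 2)
        attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.out_proj(attn.transpose(1, 2).reshape(batch, seq, hidden))


layer = CausalSelfAttention(hidden=64, heads=4)
scripted = torch.jit.script(layer)       # TorchScript JIT compilation
out = scripted(torch.randn(2, 8, 64))    # shape (2, 8, 64)
```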
- Curated Transformers: MosaicMPT LLM decoder in 90 lines
- Non-determinism in GPT-4 is caused by Sparse MoE
Yeah. In curated transformers [1] we are seeing completely deterministic output across multiple popular transformer architectures on a single GPU (there can be variance between GPUs due to different kernels).
One source of non-determinism we do see with a temperature of 0 is that, once you have quantized weights, many predicted pieces end up with the same probability, including multiple pieces tied for the highest probability. The sampler (if you are not using a greedy decoder) will then sample from among those tied pieces.
In other words, a temperature of 0 is a poor man’s greedy decoding. (It is totally possible that OpenAI’s implementation switches to a greedy decoder with a temperature of 0).
[1] https://github.com/explosion/curated-transformers
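To make the tie-breaking point concrete, here is a tiny illustrative snippet (plain PyTorch, not curated-transformers code): when several pieces share the maximum logit, sampling at a near-zero temperature still picks among the tied pieces at random, whereas greedy argmax decoding is deterministic.

```python
# Illustrative only: quantization can leave several pieces with exactly the
# same logit, so "temperature 0" sampling is not the same as greedy decoding.
import torch

torch.manual_seed(0)

# Pretend logits where two pieces are tied for the highest value.
logits = torch.tensor([2.5, 2.5, 1.0, -3.0])

# Near-zero temperature: the distribution collapses onto the tied maxima
# (~0.5 / 0.5 here), and the sampler still chooses between them at random.
probs = torch.softmax(logits / 1e-6, dim=-1)
samples = [torch.multinomial(probs, 1).item() for _ in range(5)]
print(samples)                      # e.g. [0, 1, 1, 0, 1] -- non-deterministic

# Greedy decoding: argmax always resolves the tie the same way.
print(torch.argmax(logits).item())  # deterministic
```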
- Curated Transformers: LLMs from reusable building blocks
- Show HN: Curated Transformers – PyTorch LLMs with less code duplication
- Show HN: Curated Transformers – Lightweight, composable PyTorch transformers
- Falcon LLM – A 40B Model
Architecturally, there are no big differences from other LLMs. The largest differences compared to NeoX are: no biases in the linear layers, and shared heads for the key and value representations (but not the query).
Of course, it has 40B parameters, but there is also a 7B-parameter version. The primary issue is that the current upstream version (on Hugging Face) hasn't implemented key-value caching correctly. KV caching is needed to bring the complexity down from O(n^3) to O(n^2). The issues are: (1) their implementation uses Torch's scaled dot-product attention, which builds an incorrect causal mask when the query and key lengths differ (which is the case when generating with a cache); (2) they don't index the rotary embeddings correctly when using the key-value cache, so the rotary embedding of the first token is used for all generated tokens. Together, this causes the model to output garbage; it only works without KV caching, which makes it very slow.
However, this is not a property of the model, and they will probably fix it soon. E.g., the transformer library that we are currently developing supports Falcon with key-value caching, and its speed is on par with other models of the same size:
https://github.com/explosion/curated-transformers/blob/main/...
(This is a correct implementation of the decoder layer.)
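For what it's worth, the causal-mask issue in (1) is easy to reproduce with plain PyTorch. The sketch below is my own illustration (not the Falcon or curated-transformers implementation) of why is_causal=True goes wrong once the query holds only a single cached-generation token:

```python
# Illustration of issue (1) above: with a KV cache, the query holds only the
# newest token while the keys cover the whole prefix. is_causal=True builds a
# causal mask anchored at query position 0, so the new token may only attend
# to the first cached key -- not what cached generation needs.
import torch
import torch.nn.functional as F

heads, head_dim, prefix_len = 4, 16, 8

q = torch.randn(1, heads, 1, head_dim)                # just the new token
k = torch.randn(1, heads, prefix_len + 1, head_dim)   # cached keys + new key
v = torch.randn(1, heads, prefix_len + 1, head_dim)

# Incorrect for cached generation: the mask only exposes key position 0.
wrong = F.scaled_dot_product_attention(q, k, v, is_causal=True)

# Correct: the single new query sits at the last position and may attend to
# every key, so an explicit all-True mask (or simply no mask) is what we want.
mask = torch.ones(1, 1, 1, prefix_len + 1, dtype=torch.bool)
right = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
```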
Stats
explosion/curated-transformers is an open source project licensed under the MIT License, which is an OSI-approved license.
The primary programming language of curated-transformers is Python.
Popular Comparisons
- curated-transformers VS llm.f90
- curated-transformers VS ggllm.cpp
- curated-transformers VS mamba-minimal
- curated-transformers VS rust-llm-guide
- curated-transformers VS tensorflow
- curated-transformers VS petals
- curated-transformers VS MindSearch
- curated-transformers VS adaptive-classifier
- curated-transformers VS llama.cpp
- curated-transformers VS Pytorch