Curated-transformers Alternatives
Similar projects and alternatives to curated-transformers
- petals: 🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading.
- exllama: A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
- ai-notes: Notes for software engineers getting up to speed on new AI developments. Serves as a datastore for https://latent.space writing and product brainstorming, but has cleaned-up canonical references under the /Resources folder.
- rust-llm-guide: (Discontinued) A guide to building, training and running large language models using Rust.
- MindSearch: 🔍 An LLM-based multi-agent framework for web search (like Perplexity.ai Pro and SearchGPT).
- heinsen_sequence: Code implementing "Efficient Parallelization of a Ubiquitous Sequential Computation" (Heinsen, 2023).
curated-transformers discussion
curated-transformers reviews and mentions
- Minimal implementation of Mamba, the new LLM architecture, in 1 file of PyTorch
https://github.com/explosion/curated-transformers/blob/main/...
Llama 1/2: https://github.com/explosion/curated-transformers/blob/main/...
MPT: https://github.com/explosion/curated-transformers/blob/main/...
All with various features enabled, including support for TorchScript JIT, PyTorch flash attention, etc.
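As a rough illustration of what TorchScript JIT support plus PyTorch's built-in scaled-dot-product (flash) attention look like in practice, here is a minimal, hypothetical self-attention block. This is plain PyTorch, not code taken from curated-transformers:

```python
# Hypothetical sketch, not curated-transformers code: a TorchScript-scriptable
# causal self-attention block built on PyTorch's scaled_dot_product_attention,
# which dispatches to a flash-attention kernel when the backend supports it.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CausalSelfAttention(nn.Module):
    def __init__(self, hidden: int, heads: int):
        super().__init__()
        self.heads = heads
        self.q_proj = nn.Linear(hidden, hidden)
        self.k_proj = nn.Linear(hidden, hidden)
        self.v_proj = nn.Linear(hidden, hidden)
        self.out_proj = nn.Linear(hidden, hidden)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        batch, seq, hidden = x.size(0), x.size(1), x.size(2)
        # Reshape to (batch, heads, seq, head_dim), as SDPA expects.
        q = self.q_proj(x).view(batch, seq, self.heads, -1).transpose(1, 2)
        k = self.k_proj(x).view(batch, seq, self.heads, -1).transpose(1, 2)
        v = self.v_proj(x).view(batch, seq, self.heads, -1).transpose(1, 2)
        attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.out_proj(attn.transpose(1, 2).reshape(batch, seq, hidden))


layer = CausalSelfAttention(hidden=64, heads=4)
scripted = torch.jit.script(layer)       # TorchScript JIT compilation
out = scripted(torch.randn(2, 8, 64))    # shape (2, 8, 64)
```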
- Curated Transformers: MosaicMPT LLM decoder in 90 lines
- Non-determinism in GPT-4 is caused by Sparse MoE
Yeah. In curated transformers [1] we are seeing completely deterministic output across multiple popular transformer architectures on a single GPU (there can be variance between GPUs due to different kernels).
One source of non-determinism we do see with a temperature of 0 is that, once you have quantized weights, many predicted pieces end up with the same probability, including multiple pieces tied for the highest probability. The sampler (if you are not using a greedy decoder) will then sample from among those tied pieces.
In other words, a temperature of 0 is a poor man’s greedy decoding. (It is totally possible that OpenAI’s implementation switches to a greedy decoder with a temperature of 0).
[1] https://github.com/explosion/curated-transformers
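To make the tie-breaking point concrete, here is a tiny illustrative snippet (plain PyTorch, not curated-transformers code): when several pieces share the maximum logit, sampling at a near-zero temperature still picks among the tied pieces at random, whereas greedy argmax decoding is deterministic.

```python
# Illustrative only: quantization can leave several pieces with exactly the
# same logit, so "temperature 0" sampling is not the same as greedy decoding.
import torch

torch.manual_seed(0)

# Pretend logits where two pieces are tied for the highest value.
logits = torch.tensor([2.5, 2.5, 1.0, -3.0])

# Near-zero temperature: the distribution collapses onto the tied maxima
# (~0.5 / 0.5 here), and the sampler still chooses between them at random.
probs = torch.softmax(logits / 1e-6, dim=-1)
samples = [torch.multinomial(probs, 1).item() for _ in range(5)]
print(samples)                      # e.g. [0, 1, 1, 0, 1] -- non-deterministic

# Greedy decoding: argmax always resolves the tie the same way.
print(torch.argmax(logits).item())  # deterministic
```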
- Curated Transformers: LLMs from reusable building blocks
- Show HN: Curated Transformers – PyTorch LLMs with less code duplication
- Show HN: Curated Transformers – Lightweight, composable PyTorch transformers
- Falcon LLM – A 40B Model
Architecturally, there are no big differences from other LLMs. The largest differences compared to NeoX are: no biases in the linear layers, and shared heads for the key and value representations (but not the query).
Of course, it has 40B parameters, but there is also a 7B-parameter version. The primary issue is that the current upstream version (on Hugging Face) hasn't implemented key-value caching correctly. KV caching is needed to bring the complexity down from O(n^3) to O(n^2). The issues are: (1) their implementation uses Torch's scaled dot-product attention, which builds an incorrect causal mask when the query and key lengths differ (which is the case when generating with a cache); (2) they don't index the rotary embeddings correctly when using the key-value cache, so the rotary embedding of the first token is used for all generated tokens. Together, this causes the model to output garbage; it only works without KV caching, which makes it very slow.
However, this is not a property of the model, and they will probably fix it soon. E.g., the transformer library that we are currently developing supports Falcon with key-value caching, and its speed is on par with other models of the same size:
https://github.com/explosion/curated-transformers/blob/main/...
(This is a correct implementation of the decoder layer.)
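For what it's worth, the causal-mask issue in (1) is easy to reproduce with plain PyTorch. The sketch below is my own illustration (not the Falcon or curated-transformers implementation) of why is_causal=True goes wrong once the query holds only a single cached-generation token:

```python
# Illustration of issue (1) above: with a KV cache, the query holds only the
# newest token while the keys cover the whole prefix. is_causal=True builds a
# causal mask anchored at query position 0, so the new token may only attend
# to the first cached key -- not what cached generation needs.
import torch
import torch.nn.functional as F

heads, head_dim, prefix_len = 4, 16, 8

q = torch.randn(1, heads, 1, head_dim)                # just the new token
k = torch.randn(1, heads, prefix_len + 1, head_dim)   # cached keys + new key
v = torch.randn(1, heads, prefix_len + 1, head_dim)

# Incorrect for cached generation: the mask only exposes key position 0.
wrong = F.scaled_dot_product_attention(q, k, v, is_causal=True)

# Correct: the single new query sits at the last position and may attend to
# every key, so an explicit all-True mask (or simply no mask) is what we want.
mask = torch.ones(1, 1, 1, prefix_len + 1, dtype=torch.bool)
right = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
```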
Stats
explosion/curated-transformers is an open source project licensed under the MIT License, which is an OSI-approved license.
The primary programming language of curated-transformers is Python.
Popular Comparisons
- curated-transformers VS llm.f90
- curated-transformers VS ggllm.cpp
- curated-transformers VS mamba-minimal
- curated-transformers VS rust-llm-guide
- curated-transformers VS tensorflow
- curated-transformers VS petals
- curated-transformers VS MindSearch
- curated-transformers VS adaptive-classifier
- curated-transformers VS llama.cpp
- curated-transformers VS Pytorch