tensil
transformers
tensil | transformers | |
---|---|---|
12 | 176 | |
319 | 125,369 | |
0.0% | 1.7% | |
0.0 | 10.0 | |
over 1 year ago | 3 days ago | |
Scala | Python | |
GNU General Public License v3.0 or later | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
tensil
- Tensil
- Introduction to FPGAs
-
ML projects for FPGA
This is an example project on the higher side of complexity: https://github.com/tensil-ai/tensil.
-
Implementing Deep Convolution Neural Network on FPGA
You might be interested to checkout www.tensil.ai, an open source ML accelerator for FPGA. We don't officially support Stratix yet but you should be able to adapt it quite easily. Reach out on our Discord if you want to talk about it!
-
What do think of Chisel HDL? is it worth learning over Verilog/SystemVerilog?
www.tensil.ai
-
NN Inference on PYNQ-Z2
You should check out Tensil. That's what i had the most success with. You can just follow the tutorial for pynq-z1, only diffrence is that you need to define pynq-z2 board files instead of the ones listed in the tutorial when making your vivado project. The developers are also very active and helpful on discord and github. You can find them at www.tensil.ai
-
Launch HN: Tensil (YC S19) – Open-Source ML Accelerators
Hello HN! I'm Tom, co-founder at Tensil (https://www.tensil.ai/). We design free and open source machine learning accelerators that anyone can use.
A machine learning inference accelerator is a specialized chip that can run the operations used in ML models very quickly and efficiently. It can be either an ASIC or an FPGA, with ASIC giving better performance but FPGA being more flexible.
Custom accelerators offer dramatically better performance per watt than existing GPU and CPU options. Massive companies like Google and Facebook use them to make training and inference cheaper. However, everyone else has been left out: small and mid-sized companies, students and academics, hobbyists and tinkerers currently have no chance of getting ML hardware that perfectly suits their needs. We aim to change that, starting with ML inference on embedded and edge FPGA platforms. Our dream is that our accelerators help people make new applications possible that simply weren't feasible before.
We believe that advances in AI go hand in hand with advances in computing hardware. As a couple of software and ML engineers hoping to live in a world alongside intelligent machines, we wanted to know why those hardware advances were taking so long! We taught ourselves digital design and gradually realized that the next generation of hardware will need to be finely customized to enable state of the art ML models at the edge, that is, running on your devices and not in the cloud. In the CPU world, the RISC-V RocketChip implementation has proven the value of customizable compute hardware. The problem was that no-one was building that kind of capability for ML acceleration. We started Tensil to build customizable ML accelerators and see what kind of applications people can create with them.
Tensil is a set of tools for running ML models on custom accelerator architectures. It includes an RTL generator, a model compiler, and a set of drivers. It enables you to create a custom accelerator, compile an ML model targeted at it, and then deploy and run that compiled model. To see how to do this and get it running on an FPGA platform, check out our tutorial at https://www.tensil.ai/docs/tutorials/resnet20-ultra96v2/.
We developed an accelerator generator in Chisel and then wrote a parameterizable graph compiler in Scala. (Fun fact: unlike in software, formal verification is actually a totally viable way to test digital circuits and we have made great use of this technique.) The accelerator generator takes in the desired architecture parameters and produces an instance of the accelerator which can be synthesized using standard EDA tools. The compiler implements ML models using the accelerator’s instruction set and can target any possible instance of the accelerator.
Currently, the accelerator architecture is based around a systolic array, similar to well-known ML ASICs. You can view the architecture spec in our documentation. The compiler performs a wide variety of tasks but is optimized for convolutional neural networks. There are also drivers for each supported platform, currently limited to FPGAs running bare-metal or with a host OS.
When you tell the driver to run your ML model, it sets up the input data and then streams the compiled model into the accelerator. The accelerator independently accesses host memory during execution. When the accelerator is done, the driver is notified and looks for the output in the pre-assigned area of host memory.
How are we different from other accelerator options? There are many ML ASICs out there but they are all locked into a single architecture, whereas we have customization at the core of our technology. This offers the potential for a better trade-off between performance/price/watts/accuracy. Compared with other FPGA options, Xilinx DPU is great but it’s closed source and can be difficult to work with if your model is in any way customized. By going open source, we aim to support the widest possible range of models. FINN is a very cool project but requires big changes to your model in order to work, and also typically requires large FPGAs which are unsuitable for edge deployments. We work out of the box with any model (no need to quantize), and on small edge FPGAs. For embedded systems, tflite/tfmicro are great for deploying very small ML models on extremely constrained edge devices, but they are limited in terms of the performance and accuracy that can be achieved. Our tools allow you to work with full size state of the art models at high accuracy and speed.
Currently we're focused on the edge and embedded ML inference use case. If you
- Tensil - Open source machine learning inference accelerators on FPGA
transformers
-
AI enthusiasm #9 - A multilingual chatbot📣🈸
transformers is a package by Hugging Face, that helps you interact with models on HF Hub (GitHub)
-
Maxtext: A simple, performant and scalable Jax LLM
Is t5x an encoder/decoder architecture?
Some more general options.
The Flax ecosystem
https://github.com/google/flax?tab=readme-ov-file
or dm-haiku
https://github.com/google-deepmind/dm-haiku
were some of the best developed communities in the Jax AI field
Perhaps the “trax” repo? https://github.com/google/trax
Some HF examples https://github.com/huggingface/transformers/tree/main/exampl...
Sadly it seems much of the work is proprietary these days, but one example could be Grok-1, if you customize the details. https://github.com/xai-org/grok-1/blob/main/run.py
-
Lossless Acceleration of LLM via Adaptive N-Gram Parallel Decoding
The HuggingFace transformers library already has support for a similar method called prompt lookup decoding that uses the existing context to generate an ngram model: https://github.com/huggingface/transformers/issues/27722
I don't think it would be that hard to switch it out for a pretrained ngram model.
-
AI enthusiasm #6 - Finetune any LLM you want💡
Most of this tutorial is based on Hugging Face course about Transformers and on Niels Rogge's Transformers tutorials: make sure to check their work and give them a star on GitHub, if you please ❤️
-
Schedule-Free Learning – A New Way to Train
* Superconvergence + LR range finder + Fast AI's Ranger21 optimizer was the goto optimizer for CNNs, and worked fabulously well, but on transformers, the learning rate range finder sadi 1e-3 was the best, whilst 1e-5 was better. However, the 1 cycle learning rate stuck. https://github.com/huggingface/transformers/issues/16013
-
Gemma doesn't suck anymore – 8 bug fixes
Thanks! :) I'm pushing them into transformers, pytorch-gemma and collabing with the Gemma team to resolve all the issues :)
The RoPE fix should already be in transformers 4.38.2: https://github.com/huggingface/transformers/pull/29285
My main PR for transformers which fixes most of the issues (some still left): https://github.com/huggingface/transformers/pull/29402
- HuggingFace Transformers: Qwen2
- HuggingFace Transformers Release v4.36: Mixtral, Llava/BakLlava, SeamlessM4T v2
- HuggingFace: Support for the Mixtral Moe
-
Paris-Based Startup and OpenAI Competitor Mistral AI Valued at $2B
If you want to tinker with the architecture Hugging Face has a FOSS implementation in transformers: https://github.com/huggingface/transformers/blob/main/src/tr...
If you want to reproduce the training pipeline, you couldn't do that even if you wanted to because you don't have access to thousands of A100s.
What are some alternatives?
VexRiscv - A FPGA friendly 32 bit RISC-V CPU implementation
fairseq - Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
SpinalHDL - Scala based HDL
sentence-transformers - Multilingual Sentence & Image Embeddings with BERT
Rosebud - Framework for FPGA-accelerated Middlebox Development
llama - Inference code for Llama models
chisel-book - Digital Design with Chisel
transformer-pytorch - Transformer: PyTorch Implementation of "Attention Is All You Need"
Whisper - High-performance GPGPU inference of OpenAI's Whisper automatic speech recognition (ASR) model
text-generation-webui - A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.
DFHDL - DFiant HDL (DFHDL): A Dataflow Hardware Descripition Language
huggingface_hub - The official Python client for the Huggingface Hub.