huggingface/transformers is an open source project licensed under Apache License 2.0 which is an OSI approved license.
Similar projects and alternatives to transformers
Test prompts for OpenAI's GPT-3 API and the resulting AI-generated texts.
Tensors and Dynamic neural networks in Python with strong GPU acceleration
A robust Python tool for text-based AI training and generation using GPT-2.
A library for efficient similarity search and clustering of dense vectors.
Sentence Embeddings with BERT & XLNet
An implementation of model parallel GPT-2 and GPT-3-style models using the mesh-tensorflow library.
🎈 Simple reactive notebooks for Julia
Jupyter kernel for the C++ programming language
A fast, efficient universal vector embedding utility package.
AI-powered search engine
AI-powered literature discovery and review engine for medical/scientific papers
:star:Github Ranking:star: Github stars and forks ranking list. Github Top100 stars list of different languages. Automatically updated daily.
This is a Next.js, Tailwind CSS blogging starter template. Comes out of the box configured with the latest technologies to make technical writing a breeze. Easily configurable and customizable. Perfect as a replacement to existing Jekyll and Hugo individual blogs.
Ask coding questions directly from the terminal
AI-powered understanding of headlines and story text
Weird A.I. Yankovic neural-net based lyrics parody generator
Pre-built mlpack models (by mlpack)
tiny spectral synthesizer with livecoding support
A Perl 5 REPL written in Bash
HuggingFace Bert Pytorch Implementation Question
reddit.com/r/learnmachinelearning | 2021-04-02
I'm walking through the BertModel code from HuggingFace (https://github.com/huggingface/transformers/blob/master/src/transformers/models/bert/modeling_bert.py) and it's mostly straightforward except for the parts related to the "decoder" mode. I'm confused about why there's a decoder mode for BERT. From my understanding (which may be wrong), BERT is just the encoder part of the Transformer with MLM/NSP heads on top. So when would we need to use cross-attention here?
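For context on what decoder mode adds: when a BERT config sets is_decoder=True and add_cross_attention=True (e.g. when BERT is used as the decoder half of an EncoderDecoderModel), each layer gains a cross-attention block whose queries come from the decoder states and whose keys/values come from the encoder output. A minimal single-head NumPy sketch (hypothetical helper names, not the library's code):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(decoder_states, encoder_states, Wq, Wk, Wv):
    # Queries come from the decoder; keys and values come from the
    # encoder output -- this is the part a plain encoder never needs.
    q = decoder_states @ Wq
    k = encoder_states @ Wk
    v = encoder_states @ Wv
    scores = q @ k.T / np.sqrt(q.shape[-1])  # scaled dot-product
    return softmax(scores) @ v

rng = np.random.default_rng(0)
d = 8
dec = rng.normal(size=(3, d))   # 3 decoder positions
enc = rng.normal(size=(5, d))   # 5 encoder positions
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = cross_attention(dec, enc, Wq, Wk, Wv)
print(out.shape)  # (3, 8): one attended vector per decoder position
```

In plain encoder mode this block simply doesn't exist; the config flags are what switch it on.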
AI Can Generate Convincing Text–and Anyone Can Use It
news.ycombinator.com | 2021-03-29
As someone who works on a Python library solely devoted to making AI text generation more accessible to the normal person (https://github.com/minimaxir/aitextgen ) I think the headline is misleading.
Although the article focuses on the release of GPT-Neo, even GPT-2, released in 2019, was good at generating text; it just spat out a lot of garbage requiring curation, which GPT-3/GPT-Neo still require, albeit with a better signal-to-noise ratio.
GPT-Neo, meanwhile, is such a big model that it requires a bit of data engineering work to get operating and generating text (see the README: https://github.com/EleutherAI/gpt-neo ), and it's unclear currently if it's as good as GPT-3, even when comparing models apples-to-apples.
That said, Hugging Face is adding support for GPT-Neo to Transformers (https://github.com/huggingface/transformers/pull/10848 ) which will help make playing with the model easier, and I'll add support to aitextgen if it pans out.
reddit.com/r/LanguageTechnology | 2021-03-24
Regarding the MLM pretraining task, I believe a separate language modeling head is attached as a classifier. For example, take a look at this object in HuggingFace's Transformers: the BertForPreTraining object is initialized with a self.cls = BertPreTrainingHeads(config). If you follow the code, that should lead you to the BertLMPredictionHead object, where you can see a self.decoder = nn.Linear(config.hidden_size, config.vocab_size, bias=False) initialization.
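To make the shape of that head concrete, here is a minimal NumPy sketch (illustrative only, not the library's code) of a bias-free linear projection from hidden states to vocabulary logits, mirroring the self.decoder line above:

```python
import numpy as np

hidden_size, vocab_size = 16, 50
rng = np.random.default_rng(0)

# Mirrors nn.Linear(hidden_size, vocab_size, bias=False):
# a single weight matrix mapping hidden states to vocabulary logits.
W = rng.normal(size=(hidden_size, vocab_size))

hidden_states = rng.normal(size=(4, hidden_size))  # 4 token positions
logits = hidden_states @ W                         # shape (4, vocab_size)
predicted_ids = logits.argmax(axis=-1)             # MLM prediction per position
print(logits.shape, predicted_ids.shape)  # (4, 50) (4,)
```

Note that in the real model this weight matrix is typically tied to the input embedding matrix; "decoder" here just means "decodes hidden states back into vocabulary space", which is unrelated to the cross-attention decoder mode.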
Releasing dl-translate: a python library for text translation between 50 languages using Neural Networks
reddit.com/r/Python | 2021-03-17
If you have a certain knowledge of Machine Learning, then I recommend checking out the huggingface transformers library and the multilingual BART model (trained on 50 languages), as well as the paper behind it.
Growth Hacking Github - How to Get Github Stars
dev.to | 2021-03-10
2. Huggingface Transformers
[D] What method of fine-tuning BERT-like models should be optimal?
reddit.com/r/MachineLearning | 2021-03-09
I don't think warm-up is really necessary. It was used on the original BERT model, but even in Hugging Face's example code (which I think many people now base their experiments on) the warm-up defaults to 0: https://github.com/huggingface/transformers/blob/72d9e039f9c78b9e6559456310ed2221192c0815/src/transformers/training_args.py#L363-L366
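To show what that default means in practice, here is a sketch (a hypothetical helper, not the Transformers implementation) of a linear warm-up schedule; with warmup_steps = 0 the learning rate is simply constant from the first step:

```python
def linear_warmup_lr(step, base_lr, warmup_steps):
    """Linearly ramp the learning rate from 0 to base_lr over warmup_steps.

    With warmup_steps == 0 (the default mentioned above), this degenerates
    to a constant learning rate: no warm-up at all.
    """
    if warmup_steps > 0 and step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr

# warmup_steps=0: full learning rate from the very first step
print(linear_warmup_lr(0, 5e-5, 0))
# warmup_steps=100: half the base rate at step 50, full rate after step 100
print(linear_warmup_lr(50, 5e-5, 100))
print(linear_warmup_lr(200, 5e-5, 100))
```

(The library's actual schedulers also decay the rate after warm-up; this sketch only shows the ramp-up part under discussion.)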
PyTorch 1.8 supports AMD ROCm
reddit.com/r/hardware | 2021-03-04
Then, install this library https://huggingface.co/transformers/ (pip3 install transformers) and follow their documentation. Just in case: if you do go for the Docker approach, you'll have to install transformers from within the container.
Retrieval Augmented Generation with Huggingface Transformers and Ray
reddit.com/r/deeplearning | 2021-02-10
Improving the scalability of RAG distributed fine-tuning
Transformers: Natural Language Processing for Pytorch and TensorFlow 2.0
news.ycombinator.com | 2021-02-05
Fine-Tune 11B T5 for Generation Task
reddit.com/r/LanguageTechnology | 2021-02-03
Unless you have massive computational resources available, I would wait until HuggingFace irons out the bugs in DeepSpeed: https://github.com/huggingface/transformers/pull/9610
How to Build an AI Text Generator: Text Generation with a GPT-2 Model
dev.to | 2021-02-02
To fine-tune a pre-trained model, we could use run_language_modeling.py. All we need are two text files: one containing the training text pieces, and another containing the text pieces for evaluation.
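The two files are just plain text, so preparing them is trivial; a minimal sketch (hypothetical file names):

```python
from pathlib import Path

# One text piece per line; the training script reads these as raw text.
Path("train.txt").write_text(
    "First training sample.\nSecond training sample.\n", encoding="utf-8"
)
Path("eval.txt").write_text(
    "Held-out sample for evaluation.\n", encoding="utf-8"
)
print(Path("train.txt").exists() and Path("eval.txt").exists())  # True
```

These two paths are then passed to the script as its training and evaluation data files.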
Which model should I use to pick the best answer for the TOEIC reading test?
reddit.com/r/LanguageTechnology | 2021-01-30
More details on how these models work: https://github.com/huggingface/transformers/issues/7701#issuecomment-707149546
Tutorial series on txtai
dev.to | 2021-01-28
Replicating GPT-2 at Home
news.ycombinator.com | 2021-01-23
For step 3 you need to send the gradients from each GPU somewhere, and then send back either the averaged gradient or the updated model weights. So when the model is large (say, ~3 GB of gradients for GPT-2 774M!) that's a lot of GPU-GPU communication!
You're right that for the vast majority of ML cases, the models are small enough that the synchronization cost is negligible, though.
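A back-of-envelope sketch of that communication cost, assuming a ring all-reduce (where each GPU moves roughly 2·(n−1)/n times the gradient size per step); the helper name is hypothetical:

```python
def allreduce_bytes_per_step(param_count, bytes_per_param=4, num_gpus=2):
    """Approximate bytes each GPU sends+receives per optimizer step
    under a ring all-reduce: ~2 * (n-1)/n * gradient size."""
    grad_bytes = param_count * bytes_per_param
    return 2 * (num_gpus - 1) / num_gpus * grad_bytes

# GPT-2 774M parameters, fp32 gradients, 2 GPUs:
gb = allreduce_bytes_per_step(774e6) / 1e9
print(round(gb, 2))  # 3.1 -- roughly 3 GB of gradient traffic per step
```

At PCIe bandwidths that traffic can easily dominate a step, which is why the interconnect (NVLink vs. PCIe) matters so much below.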
I wrote up some benchmarks here:
news.ycombinator.com | 2021-01-23
Linux (Ubuntu 20.04) + Cuda 11.2. For the backend I use PyTorch; Tensorflow has some nice optimizations (like XLA, which uses LLVM to JIT optimized code for the GPU), but I found it very painful to get working reliably, and most of the language modeling stuff I've seen uses PyTorch.
For the language model training itself I've been experimenting with a few different things. I started off with Huggingface because it's very easy to get up and running, and I still use its tokenizers library to do BPE training on the C source dataset (though there are still some hitches there – other libraries expect slightly different formats for the tokenizer model, like using different ways to represent the marker).
After prototyping the C language model training at home, I tried moving the training up to NYU's HPC cluster, which has a bunch of 4xV100 and 4xRTX8000 nodes (mainly because the sound of two powerful GPU fans running at 100% gets a bit old after a while). Unfortunately I discovered that with larger models the GPU-GPU communication overhead can be prohibitive (most of the cluster nodes only support P2P GPU communication over PCIe, which is a lot slower than NVLink), and Huggingface's implementation actually performed worse on the cluster's multiple GPUs than on my two 3090s with NVLink (I opened an issue to track it here: https://github.com/huggingface/transformers/issues/9371 ).
Currently I'm working on getting DeepSpeed running so that I can hopefully get better scaling even in the absence of a fast GPU-GPU interconnect. This is again a little bit annoying, because it seems like every framework wants a slightly different way of representing the tokenizer and training data – I've had to preprocess the dataset in about 4 different ways (plain text, loose JSON, npy (for DeepSpeed), and a custom indexed binary format for Megatron-LM). I'm also hoping to try out Huggingface's recently-released DeepSpeed integration, which (if it works) would be a really nice combination of usability and performance: https://huggingface.co/blog/zero-deepspeed-fairscale
As for other software stack hitches: so, so many. The main one is just managing the different versions of CUDA. The 3090 is only supported starting with CUDA 11.1, but many packages and frameworks only support 11.0 at best. And some of the newer things like DeepSpeed use PyTorch extensions, which require you to have the exact version of CUDA around that was used to build PyTorch. So I've had to do a fair bit of compiling packages from source rather than relying on prebuilt packages.
The path of least resistance here is probably to use the NVIDIA NGC containers, but it took NVIDIA more than a month to get them updated after the 3090 was released, and I find working inside containers for everything inconvenient anyway (I hate losing my bash history, and I always accidentally end up losing data or local changes when I exit a container).
Anyway, this ended up being a bit more rambling than I intended, but it was helpful to write it all down and maybe it'll help someone else avoid some stumbling blocks :)