Basic transformers repo stats

huggingface/transformers is an open source project licensed under Apache License 2.0 which is an OSI approved license.

Transformers Alternatives

Similar projects and alternatives to transformers

  • GitHub repo gpt-3-experiments

    Test prompts for OpenAI's GPT-3 API and the resulting AI-generated texts.

  • GitHub repo Pytorch

    Tensors and Dynamic neural networks in Python with strong GPU acceleration

  • GitHub repo aitextgen

    A robust Python tool for text-based AI training and generation using GPT-2.

  • GitHub repo faiss

    A library for efficient similarity search and clustering of dense vectors.

  • GitHub repo sentence-transformers

    Sentence Embeddings with BERT & XLNet

  • GitHub repo gpt-neo

    An implementation of model parallel GPT-2 and GPT-3-style models using the mesh-tensorflow library.

  • GitHub repo Pluto.jl

    🎈 Simple reactive notebooks for Julia

  • GitHub repo xeus-cling

    Jupyter kernel for the C++ programming language

  • GitHub repo magnitude

    A fast, efficient universal vector embedding utility package.

  • GitHub repo txtai

    AI-powered search engine

  • GitHub repo paperai

    AI-powered literature discovery and review engine for medical/scientific papers

  • GitHub repo Github-Ranking

    :star:Github Ranking:star: Github stars and forks ranking list. Github Top100 stars list of different languages. Automatically updated daily. | GitHub repository ranking, updated automatically every day.

  • GitHub repo tailwind-nextjs-starter-blog

    This is a Next.js, Tailwind CSS blogging starter template. Comes out of the box configured with the latest technologies to make technical writing a breeze. Easily configurable and customizable. Perfect as a replacement to existing Jekyll and Hugo individual blogs.

  • GitHub repo codequestion

    Ask coding questions directly from the terminal

  • GitHub repo tldrstory

    AI-powered understanding of headlines and story text

  • GitHub repo weirdai

    Weird A.I. Yankovic neural-net based lyrics parody generator

  • GitHub repo examples

    Pre-built mlpack models (by mlpack)

  • GitHub repo tinyspec-cling

    tiny spectral synthesizer with livecoding support

  • GitHub repo preplish

    A Perl 5 REPL written in Bash

NOTE: The mention counts reflect how often each project appears in the same posts as transformers, so a higher count suggests a closer alternative or greater similarity.


Posts where transformers has been mentioned. We have used some of these posts to build our list of alternatives and similar projects - the last one was on 2021-03-29.
  • HuggingFace Bert Pytorch Implementation Question
    I'm walking through the BertModel code from HuggingFace ( and it's mostly straightforward except for the parts related to the "decoder" mode. I am confused about why there's a decoder mode for BERT. From my understanding (which may be wrong), BERT is just the encoder part of the Transformer with MLM/NSP on top. So when would we need to use cross-attention here?
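For context on the question: "decoder mode" wires in cross-attention, where the queries come from the decoder's states but the keys and values come from the encoder's output. A toy single-head sketch in plain Python (no projections or masking; all names and sizes are illustrative, not HuggingFace's API):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def cross_attention(decoder_states, encoder_states):
    """Queries come from the decoder, keys/values from the encoder."""
    d = len(decoder_states[0])
    out = []
    for q in decoder_states:
        # Score each encoder position against this decoder query.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in encoder_states]
        weights = softmax(scores)
        # Weighted average of encoder states (values == keys here).
        out.append([sum(w * v[j] for w, v in zip(weights, encoder_states))
                    for j in range(d)])
    return out

dec = [[0.1] * 8 for _ in range(3)]            # 3 decoder positions
enc = [[float(i)] * 8 for i in range(5)]       # 5 encoder positions
out = cross_attention(dec, enc)
print(len(out), len(out[0]))  # 3 8
```

In plain encoder-only BERT this layer never fires, which is why the mode looks out of place at first read.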
  • AI Can Generate Convincing Text–and Anyone Can Use It | 2021-03-29
    As someone who works on a Python library solely devoted to making AI text generation more accessible to the normal person ( ) I think the headline is misleading.

    Although the article focuses on the release of GPT-Neo, even GPT-2, released in 2019, was good at generating text; it just spat out a lot of garbage requiring curation, which GPT-3/GPT-Neo still require, albeit with a better signal-to-noise ratio.

    GPT-Neo, meanwhile, is such a big model that it requires a bit of data-engineering work to get it operating and generating text (see the README: ), and it's currently unclear whether it's as good as GPT-3, even when comparing models apples-to-apples.

    That said, Hugging Face is adding support for GPT-Neo to Transformers ( ) which will help make playing with the model easier, and I'll add support to aitextgen if it pans out.

    Regarding the MLM pretraining task, I believe a separate language-modeling head is attached as a classifier for that. For example, take a look at this object in HuggingFace's Transformers. You can see that the BertForPreTraining object is initialized with self.cls = BertPreTrainingHeads(config). If you follow the code, that should lead you to the BertLMPredictionHead object, where you can see a self.decoder = nn.Linear(config.hidden_size, config.vocab_size, bias=False) initialization.
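That final decoder is essentially a bias-free linear map from the hidden size to the vocabulary size (the real BertLMPredictionHead also applies a dense transform, activation, LayerNorm, and a separate bias parameter before/around it). A stdlib-only sketch of just that projection, with illustrative names and sizes:

```python
import random

class MLMProjection:
    """Toy stand-in for the decoder quoted above: a linear map from
    hidden_size to vocab_size with no bias (as in bias=False)."""
    def __init__(self, hidden_size, vocab_size, seed=0):
        rng = random.Random(seed)
        # One weight row per vocabulary entry.
        self.weight = [[rng.gauss(0, 0.02) for _ in range(hidden_size)]
                       for _ in range(vocab_size)]

    def __call__(self, hidden_states):
        # (seq_len x hidden_size) -> (seq_len x vocab_size) logits.
        return [[sum(h * w for h, w in zip(hs, row)) for row in self.weight]
                for hs in hidden_states]

head = MLMProjection(hidden_size=16, vocab_size=100)
logits = head([[0.0] * 16 for _ in range(4)])
print(len(logits), len(logits[0]))  # 4 100
```

Each masked position's hidden state thus gets scored against every vocabulary token, which is exactly what the MLM loss needs.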
  • Releasing dl-translate: a python library for text translation between 50 languages using Neural Networks | 2021-03-17
    If you have a certain knowledge of Machine Learning, then I recommend checking out the huggingface transformers library and the multilingual BART model (trained on 50 languages), as well as the paper behind it.
  • Growth Hacking Github - How to Get Github Stars | 2021-03-10
    2. Huggingface Transformers
  • [D] What method of fine tuning BERT-like models should be optimal ?
    I don't think warm-up is really necessary. It was used on the original BERT model, but even in Hugging Face's example code (which I think many people now base their experiments on) the warm-up defaults to 0:
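For reference, linear warmup just scales the learning rate up from zero over the first N steps; with warmup set to 0 the schedule starts at the full base rate immediately. A plain-Python sketch (the function name and the linear decay after warmup are illustrative, mirroring the shape of transformers' linear schedule):

```python
def linear_schedule_lr(step, base_lr, warmup_steps, total_steps):
    """Learning rate at `step`: linear warmup from 0, then linear decay.
    With warmup_steps == 0 the rate starts at base_lr immediately."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

# With no warmup, step 0 already uses the full base rate:
print(linear_schedule_lr(0, 5e-5, 0, 1000))  # 5e-05
```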
  • PyTorch 1.8 supports AMD ROCm | 2021-03-04
    Then, install this library (pip3 install transformers) and follow their documentation. Adding just in case that if you do go for the docker approach, you'll have to install transformers from within the container.
  • Retrieval Augmented Generation with Huggingface Transformers and Ray
    Improving the scalability of RAG with distributed fine-tuning
  • Transformers: Natural Language Processing for Pytorch and TensorFlow 2.0 | 2021-02-05
  • Fine-Tune 11B T5 for Generation Task
    Unless you have massive computational resources available, I would wait until HuggingFace irons out the bugs in DeepSpeed:
  • How to Build an AI Text Generator: Text Generation with a GPT-2 Model | 2021-02-02
    To fine-tune a pre-trained model, we could use the . All we need are two text files: one containing the training text pieces, and another containing the text pieces for evaluation.
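A minimal sketch of preparing those two files (the file names and the 90/10 split ratio are my own choices, not from the post):

```python
def split_corpus(lines, eval_fraction=0.1):
    """Hold out the last eval_fraction of lines for evaluation."""
    n_eval = max(1, int(len(lines) * eval_fraction))
    return lines[:-n_eval], lines[-n_eval:]

corpus = [f"text piece {i}" for i in range(100)]
train, evaluation = split_corpus(corpus)

# One text piece per line, as the fine-tuning setup expects.
with open("train.txt", "w") as f:
    f.write("\n".join(train))
with open("eval.txt", "w") as f:
    f.write("\n".join(evaluation))
```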
  • Which model should I use to pick the best answer for the TOEIC reading test?
    More details on how these models work:
  • Tutorial series on txtai | 2021-01-28
  • Replicating GPT-2 at Home | 2021-01-23
    4. Repeat

    For step 3 you need to send the gradients from each GPU somewhere, and then send back either the averaged gradient or the updated model weights. So when the model is large (say, 3GB for GPT 774M!) that's a lot of GPU-GPU communication!
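The back-of-the-envelope math checks out: with fp32 gradients, each parameter costs 4 bytes, so a 774M-parameter model moves roughly 3 GB of gradients per synchronization step. A quick sketch:

```python
def grad_sync_bytes(n_params, bytes_per_grad=4):
    """Bytes a worker must send per step to share its gradients
    (fp32 gradients; ignores compression and comm/compute overlap)."""
    return n_params * bytes_per_grad

gb = grad_sync_bytes(774_000_000) / 1e9
print(round(gb, 1))  # 3.1
```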

    You're right that for the vast majority of ML cases, the models are small enough that the synchronization cost is negligible, though.

    I wrote up some benchmarks here: | 2021-01-23
    Linux (Ubuntu 20.04) + Cuda 11.2. For the backend I use PyTorch; Tensorflow has some nice optimizations (like XLA, which uses LLVM to JIT optimized code for the GPU), but I found it very painful to get working reliably, and most of the language modeling stuff I've seen uses PyTorch.

    For the language model training itself I've been experimenting with a few different things. I started off with Huggingface because it's very easy to get up and running, and I still use its tokenizers library to do BPE training on the C source dataset (though there are still some hitches there – other libraries expect slightly different formats for the tokenizer model, like using different ways to represent the marker).

    After prototyping the C language model training at home, I tried moving the training up to NYU's HPC cluster, which has a bunch of 4xV100 and 4xRTX8000 nodes (mainly because the sound of two powerful GPU fans running at 100% gets a bit old after a while). Unfortunately I discovered that with larger models the GPU-GPU communication overhead can be prohibitive (most of the cluster nodes only support P2P GPU communication over PCIe, which is a lot slower than NVLink), and Huggingface's implementation actually performed worse on multiple GPUs there than on two 3090s with NVLink (I opened an issue to track it here ).

    Currently I'm working on getting DeepSpeed running so that I can hopefully get better scaling even in the absence of a fast GPU-GPU interconnect. This is again a little bit annoying, because it seems like every framework wants a slightly different way of representing the tokenizer and training data – I've had to preprocess the dataset in about 4 different ways (plain text, loose JSON, npy (for DeepSpeed), and a custom indexed binary format for Megatron-LM). I'm also hoping to try out Huggingface's recently-released DeepSpeed integration, which (if it works) would be a really nice combination of usability and performance:
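To illustrate the formatting churn: the same samples end up as plain text for one framework and loose JSON (one object per line) for another. A small sketch of the first two of those formats (the "text" field name is an assumption about the expected schema):

```python
import json

samples = ["int main(void) { return 0; }", "static int x = 42;"]

# Plain text: one sample per line.
with open("train.txt", "w") as f:
    f.write("\n".join(samples))

# Loose JSON (JSON Lines): one object per line.
with open("train.jsonl", "w") as f:
    for s in samples:
        f.write(json.dumps({"text": s}) + "\n")
```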

    As for other software stack hitches: so, so many. The main one is just managing the different versions of CUDA. The 3090 is only supported starting with CUDA 11.1, but many packages and frameworks only support 11.0 at best. And some of the newer things like DeepSpeed use PyTorch extensions, which require you to have the exact version of CUDA around that was used to build PyTorch. So I've had to do a fair bit of compiling packages from source rather than relying on prebuilt packages.

    The path of least resistance here is probably to use the NVIDIA NGC containers, but it took NVIDIA more than a month to get them updated after the 3090 was released, and I find working inside containers for everything inconvenient anyway (I hate losing my bash history, and I always accidentally end up losing data or local changes when I exit a container).

    Anyway, this ended up being a bit more rambling than I intended, but it was helpful to write it all down and maybe it'll help someone else avoid some stumbling blocks :)