Open-source projects categorized as BERT

Top 23 BERT Open-Source Projects

  • GitHub repo transformers

    🤗 Transformers: State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0.

    Project mention: HuggingFace Bert Pytorch Implementation Question | reddit.com/r/learnmachinelearning | 2021-04-02

    I'm walking through the BertModel code from HuggingFace (https://github.com/huggingface/transformers/blob/master/src/transformers/models/bert/modeling_bert.py) and it’s mostly straightforward except for the parts related to the “decoder” mode. I am confused about why there's a decoder mode for BERT. From my understanding (may be wrong?), BERT is just the encoder part of the Transformer with MLM/NSP on top. So when would we need to use cross-attention here?

  • GitHub repo bert-as-service

    Mapping a variable-length sentence to a fixed-length vector using BERT model

    Project mention: Needed 100% to pass a safety quiz, need to wait a week to retake | reddit.com/r/mildlyinfuriating | 2021-01-12

    You joke but

  • GitHub repo tokenizers

    💥 Fast State-of-the-Art Tokenizers optimized for Research and Production

    Project mention: [D] What's going to be the dominant language for machine learning in 5 years? | reddit.com/r/MachineLearning | 2021-02-09

    A full machine learning pipeline usually comprises far more than just the model, and this is the area where Rust may shine (the recent work by HuggingFace and their https://github.com/huggingface/tokenizers library is a good example)

  • GitHub repo bertviz

    Tool for visualizing attention in the Transformer model (BERT, GPT-2, Albert, XLNet, RoBERTa, CTRL, etc.)

    Project mention: At which linguistic patterns and features attention heads of BERT look to ? | reddit.com/r/LanguageTechnology | 2021-04-13

    As indirectly mentioned before, you can visualize the attention in your model with the bertviz package: https://github.com/jessevig/bertviz

  • GitHub repo spark-nlp

    State of the Art Natural Language Processing

    Project mention: John Snow Labs Spark-NLP 3.0.0: Supporting Spark 3.x, Scala 2.12, more Databricks runtimes, more EMR versions, performance improvements & lots more | reddit.com/r/AZURE | 2021-03-22

    Discussions: Engage with other community members, share ideas, and show off how you use Spark NLP!

  • GitHub repo haystack

    🔍 End-to-end Python framework for building natural language search interfaces to data. Leverages Transformers and the State-of-the-Art of NLP. Supports DPR, Elasticsearch, HuggingFace’s Modelhub, and much more!

    Project mention: Ask HN: Who is hiring? (April 2021) | news.ycombinator.com | 2021-04-01

    deepset | Python Engineers, DevOps, Frontend | Berlin, Remote (CET +/- 2) | https://deepset.ai/

    We build Haystack, an Open Source-framework that empowers developers to build NLP-powered search pipelines for various use cases: https://github.com/deepset-ai/haystack

    On our mission to bring the State-of-the-Art of NLP into every application, we look for different roles to join our team and our journey! If you want to work in one of the most exciting areas of Machine Learning and actively work with an engaged and fast-growing community, reach out to us!

    You find our open roles here: http://careers.deepset.ai/ In case you identify with our mission but do not find a suitable role, do not hesitate to still reach out to us at [email protected]

  • GitHub repo CodeSearchNet

    Datasets, tools, and benchmarks for representation learning of code.

    Project mention: Speedtyper.dev: Type racing for programmers | reddit.com/r/webdev | 2021-01-16


  • GitHub repo n2o

    ⭕ N2O: Distributed Application Server

  • GitHub repo jiant

    jiant is an NLP toolkit

    Project mention: Looking for a code base to implement multi-task learning in NLP | reddit.com/r/LanguageTechnology | 2021-02-22

    Jiant should fulfill 1, 2, 4 and 5.

  • GitHub repo FARM

    🏡 Fast & easy transfer learning for NLP. Harvesting language models for the industry. Focus on Question Answering.

    Project mention: Has anyone deployed a BERT like model across multiple tasks (Multi-class, NER, outlier detection)? Seeking advice. | reddit.com/r/LanguageTechnology | 2021-01-07

    You can use https://github.com/deepset-ai/FARM or https://github.com/nyu-mll/jiant for multitask learning. The second is more general.

  • GitHub repo Top2Vec

    Top2Vec learns jointly embedded topic, document and word vectors.

    Project mention: [D] Good algorithm for clustering big data (sentences represented as embeddings)? | reddit.com/r/MachineLearning | 2021-03-31

  • GitHub repo scibert

    A BERT model for scientific text.

    Project mention: Looking for an automatic text summarization method for academic papers | reddit.com/r/LanguageTechnology | 2021-01-21

    Hey, if you are building Seq2Seq models to summarize papers and already have the dataset, you can look into using SciBERT by AllenAI, and you might have a look at S2ORC as your dataset. It's quite vast and expansive.

  • GitHub repo spaGO

    Self-contained Machine Learning and Natural Language Processing library in Go

    Project mention: Show HN: Experiments on Machine Translation in Pure Go | news.ycombinator.com | 2021-02-17

  • GitHub repo BERTopic

    Leveraging BERT and c-TF-IDF to create easily interpretable topics.

    Project mention: What are some methods to build Google Trend-like indexes? | reddit.com/r/datascience | 2021-03-28

    check out BERTopic - https://github.com/MaartenGr/BERTopic

  • GitHub repo bertsearch

    Elasticsearch with BERT for advanced document search.

  • GitHub repo DeBERTa

    The implementation of DeBERTa

    Project mention: [D] Paper Explained - DeBERTa: Decoding-enhanced BERT with Disentangled Attention (Full Video Analysis) | reddit.com/r/MachineLearning | 2021-02-25

    Code: https://github.com/microsoft/DeBERTa

  • GitHub repo KeyBERT

    Minimal keyword extraction with BERT

    Project mention: Alternate approaches to TF-IDF? | reddit.com/r/LanguageTechnology | 2021-03-14

  • GitHub repo adapter-transformers

    Huggingface Transformers + Adapters = ❤️

    Project mention: Our new state-of-the-art multilingual NLP Toolkit - Trankit has been released | reddit.com/r/LanguageTechnology | 2021-01-13

    Thanks for the question. The main libraries that Trankit's using are pytorch and adapter-transformers. For the GPU requirement, we have tested our toolkit on different scenarios and found that a single GPU with 4GB of memory would be enough for a comfortable use.

  • GitHub repo beto

    BETO - Spanish version of the BERT model

    Project mention: [D] Are there attempts at a large German-language LM? | reddit.com/r/MachineLearning | 2021-04-05

    Not even BETO?

  • GitHub repo PolyFuzz

    Fuzzy string matching, grouping, and evaluation.

    Project mention: Finding the distance between two sentences that share mostly the same words. | reddit.com/r/LanguageTechnology | 2021-03-16


  • GitHub repo commit-autosuggestions

    An AI tool that automatically recommends commit messages.

  • GitHub repo detoxify

    Trained models & code to predict toxic comments on all 3 Jigsaw Toxic Comment Challenges. Built using ⚡ PyTorch Lightning and 🤗 Transformers.

    Project mention: Implementing a toxicity detector in your chatbots | dev.to | 2021-03-25

    Detoxify is the result of three Kaggle competitions proposed to improve toxicity classifiers. Each had a different purpose within the toxicity classifiers context.

  • GitHub repo kiri

    Backprop makes it simple to use, finetune, and deploy state-of-the-art ML models.

    Project mention: Show HN: Backprop – a simple library to use and finetune state-of-the-art models | news.ycombinator.com | 2021-03-24

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2021-04-13.
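
The bert-as-service entry above describes mapping a variable-length sentence to a fixed-length vector. As a rough illustration of that idea only, the sketch below averages per-token vectors (mean pooling), which is one common way to get a sentence vector whose size does not depend on sentence length. The function name `mean_pool` and the toy 4-dimensional vectors are made up for this example; they are stand-ins for real encoder output, not bert-as-service's API, and the listing does not say which pooling strategy the project actually uses.

```python
from typing import List, Sequence


def mean_pool(token_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average per-token vectors into one fixed-length sentence vector."""
    if not token_vectors:
        raise ValueError("need at least one token vector")
    dim = len(token_vectors[0])
    if any(len(vec) != dim for vec in token_vectors):
        raise ValueError("all token vectors must have the same dimension")
    count = len(token_vectors)
    # Average each dimension across all tokens in the sentence.
    return [sum(vec[i] for vec in token_vectors) / count for i in range(dim)]


# Hypothetical toy data: one 4-dimensional vector per token.
# Real BERT token vectors are much larger (768 dimensions for BERT-base).
short_sentence = [[1.0, 0.0, 2.0, 4.0], [3.0, 2.0, 0.0, 0.0]]
long_sentence = [
    [0.0, 1.0, 1.0, 1.0],
    [2.0, 1.0, 3.0, 1.0],
    [4.0, 1.0, 2.0, 1.0],
]

short_vec = mean_pool(short_sentence)
long_vec = mean_pool(long_sentence)

# Two tokens or three tokens in, the same vector length out.
print(len(short_vec), len(long_vec))
```

Because both results have the same length, they can be compared directly (for example with cosine similarity) even though the input sentences differ in token count.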


What are some of the best open-source BERT projects? This list will help you:

 #  Project                 Stars
 1  transformers            43,997
 2  bert-as-service          9,125
 3  tokenizers               4,448
 4  bertviz                  2,735
 5  spark-nlp                2,067
 6  haystack                 1,634
 7  CodeSearchNet            1,390
 8  n2o                      1,265
 9  jiant                    1,178
10  FARM                     1,170
11  Top2Vec                  1,038
12  scibert                    904
13  spaGO                      859
14  BERTopic                   807
15  bertsearch                 694
16  DeBERTa                    597
17  KeyBERT                    464
18  adapter-transformers       366
19  beto                       320
20  PolyFuzz                   296
21  commit-autosuggestions     209
22  detoxify                   193
23  kiri                       155