Top 23 Bert Open-Source Projects
-
transformers
🤗Transformers: State-of-the-art Natural Language Processing for Pytorch and TensorFlow 2.0.
Project mention: HuggingFace Bert Pytorch Implementation Question | reddit.com/r/learnmachinelearning | 2021-04-02
I'm walking through the BertModel code from HuggingFace (https://github.com/huggingface/transformers/blob/master/src/transformers/models/bert/modeling_bert.py) and it’s mostly straightforward except for the parts related to the “decoder” mode. I am confused about why there's a decoder mode for Bert. From my understanding (may be wrong?) BERT is just an encoder part of the Transformer with MLM/NSP on top. So when would we need to use cross attention here?
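The question above turns on the difference between self-attention and cross-attention. In transformers, `BertConfig` accepts `is_decoder` and `add_cross_attention` flags, which let BERT weights serve as the decoder half of an encoder-decoder setup; in that mode each decoder layer also attends over the encoder's hidden states. The sketch below is a simplified illustration of that difference in plain Python, not the library's implementation: it omits the learned query/key/value projections, multiple heads, and masking, and the toy vectors are made up for the example.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention without learned projections.

    Each output row is a weighted average of `values`, with weights
    given by the softmax of the query-key dot products.
    """
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        )
    return out


# Toy hidden states (hypothetical values chosen for illustration).
decoder_states = [[1.0, 0.0], [0.0, 1.0]]                 # 2 target positions
encoder_states = [[1.0, 1.0], [0.5, -0.5], [0.0, 2.0]]    # 3 source positions

# Self-attention: queries, keys and values all come from the same sequence.
self_attn = attention(decoder_states, decoder_states, decoder_states)

# Cross-attention: queries come from the decoder, keys and values from the encoder.
cross_attn = attention(decoder_states, encoder_states, encoder_states)

print(self_attn)
print(cross_attn)
```

In both cases there is one output row per query position; only the source of the keys and values changes. An encoder-only BERT never needs the second call, which is why cross-attention appears only when the model is configured as a decoder.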
-
Project mention: Needed 100% to pass a safety quiz, need to wait a week to retake | reddit.com/r/mildlyinfuriating | 2021-01-12
You joke but
-
Project mention: [D] What's going to be the dominant language for machine learning in 5 years? | reddit.com/r/MachineLearning | 2021-02-09
A full machine learning pipeline usually comprises far more than just the model, and this is the area where Rust may shine (the recent work by HuggingFace and their https://github.com/huggingface/tokenizers library is a good example)
-
bertviz
Tool for visualizing attention in the Transformer model (BERT, GPT-2, Albert, XLNet, RoBERTa, CTRL, etc.)
Project mention: At which linguistic patterns and features attention heads of BERT look to ? | reddit.com/r/LanguageTechnology | 2021-04-13
As indirectly mentioned before, you can visualize the attention in your model with the bertviz package: https://github.com/jessevig/bertviz
-
Project mention: John Snow Labs Spark-NLP 3.0.0: Supporting Spark 3.x, Scala 2.12, more Databricks runtimes, more EMR versions, performance improvements & lots more | reddit.com/r/AZURE | 2021-03-22
Discussions Engage with other community members, share ideas, and show off how you use Spark NLP!
-
haystack
:mag: End-to-end Python framework for building natural language search interfaces to data. Leverages Transformers and the State-of-the-Art of NLP. Supports DPR, Elasticsearch, HuggingFace’s Modelhub, and much more!
deepset | Python Engineers, DevOps, Frontend | Berlin, Remote (CET +/- 2) | https://deepset.ai/
We build Haystack, an Open Source-framework that empowers developers to build NLP-powered search pipelines for various use cases: https://github.com/deepset-ai/haystack
On our mission to bring the State-of-the-Art of NLP into every application, we look for different roles to join our team and our journey! If you want to work in one of the most exciting areas of Machine Learning and actively work with an engaged and fast-growing community, reach out to us!
You can find our open roles here: http://careers.deepset.ai/ If you identify with our mission but do not find a suitable role, do not hesitate to reach out to us at [email protected]
-
https://github.com/github/CodeSearchNet#downloading-data-from-s3
-
Project mention: Looking for a code base to implement multi-task learning in NLP | reddit.com/r/LanguageTechnology | 2021-02-22
Jiant should fulfill 1, 2, 4 and 5.
-
FARM
:house_with_garden: Fast & easy transfer learning for NLP. Harvesting language models for the industry. Focus on Question Answering.
Project mention: Has anyone deployed a BERT like model across multiple tasks (Multi-class, NER, outlier detection)? Seeking advice. | reddit.com/r/LanguageTechnology | 2021-01-07
You can use https://github.com/deepset-ai/FARM or https://github.com/nyu-mll/jiant for multitask learning. The second is more general.
-
Project mention: [D] Good algorithm for clustering big data (sentences represented as embeddings)? | reddit.com/r/MachineLearning | 2021-03-31
-
Project mention: Looking for an automatic text summarization method for academic papers | reddit.com/r/LanguageTechnology | 2021-01-21
Hey, if you are building Seq2Seq models to summarize papers and already have the dataset, you can look into using SciBERT by AllenAI, and you might have a look at S2ORC as your dataset. It's quite vast and expansive.
-
Project mention: Show HN: Experiments on Machine Translation in Pure Go | news.ycombinator.com | 2021-02-17
-
Project mention: What are some methods to build Google Trend-like indexes? | reddit.com/r/datascience | 2021-03-28
check out BERTopic - https://github.com/MaartenGr/BERTopic
-
Project mention: [D] Paper Explained - DeBERTa: Decoding-enhanced BERT with Disentangled Attention (Full Video Analysis) | reddit.com/r/MachineLearning | 2021-02-25
Code: https://github.com/microsoft/DeBERTa
-
Project mention: Our new state-of-the-art multilingual NLP Toolkit - Trankit has been released | reddit.com/r/LanguageTechnology | 2021-01-13
Thanks for the question. The main libraries that Trankit's using are pytorch and adapter-transformers. For the GPU requirement, we have tested our toolkit on different scenarios and found that a single GPU with 4GB of memory would be enough for a comfortable use.
-
Project mention: [D] Are there attempts at a large German-language LM? | reddit.com/r/MachineLearning | 2021-04-05
Not even BETO?
-
Project mention: Finding the distance between two sentences that share mostly the same words. | reddit.com/r/LanguageTechnology | 2021-03-16
PolyFuzz
-
detoxify
Trained models & code to predict toxic comments on all 3 Jigsaw Toxic Comment Challenges. Built using ⚡ Pytorch Lightning and 🤗 Transformers.
Detoxify is the result of three Kaggle competitions proposed to improve toxicity classifiers. Each had a different purpose within the toxicity classifiers context.
-
Project mention: Show HN: Backprop – a simple library to use and finetune state-of-the-art models | news.ycombinator.com | 2021-03-24
Index
What are some of the best open-source Bert projects? This list will help you:
# | Project | Stars |
---|---|---|
1 | transformers | 43,997 |
2 | bert-as-service | 9,125 |
3 | tokenizers | 4,448 |
4 | bertviz | 2,735 |
5 | spark-nlp | 2,067 |
6 | haystack | 1,634 |
7 | CodeSearchNet | 1,390 |
8 | n2o | 1,265 |
9 | jiant | 1,178 |
10 | FARM | 1,170 |
11 | Top2Vec | 1,038 |
12 | scibert | 904 |
13 | spaGO | 859 |
14 | BERTopic | 807 |
15 | bertsearch | 694 |
16 | DeBERTa | 597 |
17 | KeyBERT | 464 |
18 | adapter-transformers | 366 |
19 | beto | 320 |
20 | PolyFuzz | 296 |
21 | commit-autosuggestions | 209 |
22 | detoxify | 193 |
23 | kiri | 155 |