Top 23 BERT Open-Source Projects
🤗 Transformers: State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0.
Project mention: HuggingFace Bert Pytorch Implementation Question | reddit.com/r/learnmachinelearning | 2021-04-02
I'm walking through the BertModel code from HuggingFace (https://github.com/huggingface/transformers/blob/master/src/transformers/models/bert/modeling_bert.py) and it's mostly straightforward except for the parts related to the "decoder" mode. I am confused about why there's a decoder mode for BERT. From my understanding (which may be wrong), BERT is just the encoder part of the Transformer with MLM/NSP on top. So when would we need to use cross-attention here?
Mapping a variable-length sentence to a fixed-length vector using a BERT model
Project mention: Needed 100% to pass a safety quiz, need to wait a week to retake | reddit.com/r/mildlyinfuriating | 2021-01-12
You joke but
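To illustrate what "mapping a variable-length sentence to a fixed-length vector" can mean, here is a minimal sketch that assumes masked mean pooling over per-token vectors. This is one common pooling choice, not necessarily what the project does, and the function name `mean_pool` and the toy 2-dimensional vectors are made up for the example.

```python
from typing import List, Sequence


def mean_pool(token_vectors: Sequence[Sequence[float]], mask: Sequence[int]) -> List[float]:
    """Average the vectors of real tokens (mask == 1) into one fixed-length vector."""
    if len(token_vectors) != len(mask):
        raise ValueError("token_vectors and mask must have the same length")
    # Drop padding positions so they do not dilute the average.
    kept = [vec for vec, keep in zip(token_vectors, mask) if keep]
    if not kept:
        raise ValueError("mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy per-token vectors (assumption: a real encoder would emit far more dimensions).
# The first input has two real tokens plus one padding slot; the second has four tokens.
short_vec = mean_pool([[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]], mask=[1, 1, 0])
long_vec = mean_pool([[1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [6.0, 4.0]], mask=[1, 1, 1, 1])
print(short_vec, long_vec)
```

Both results have the same length regardless of how many tokens went in, which is the property downstream code usually relies on.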
💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
Project mention: [D] What's going to be the dominant language for machine learning in 5 years? | reddit.com/r/MachineLearning | 2021-02-09
A full machine learning pipeline usually comprises far more than just the model, and this is the area where Rust may shine (the recent work by HuggingFace and their https://github.com/huggingface/tokenizers library is a good example)
Tool for visualizing attention in the Transformer model (BERT, GPT-2, Albert, XLNet, RoBERTa, CTRL, etc.)
Project mention: At which linguistic patterns and features attention heads of BERT look to ? | reddit.com/r/LanguageTechnology | 2021-04-13
As indirectly mentioned before, you can visualize the attention in your model with the bertviz package: https://github.com/jessevig/bertviz
State of the Art Natural Language Processing
Project mention: John Snow Labs Spark-NLP 3.0.0: Supporting Spark 3.x, Scala 2.12, more Databricks runtimes, more EMR versions, performance improvements & lots more | reddit.com/r/AZURE | 2021-03-22
Discussions: Engage with other community members, share ideas, and show off how you use Spark NLP!
🔍 End-to-end Python framework for building natural language search interfaces to data. Leverages Transformers and the State-of-the-Art of NLP. Supports DPR, Elasticsearch, HuggingFace’s Modelhub, and much more!
Project mention: Ask HN: Who is hiring? (April 2021) | news.ycombinator.com | 2021-04-01
deepset | Python Engineers, DevOps, Frontend | Berlin, Remote (CET +/- 2) | https://deepset.ai/
We build Haystack, an Open Source-framework that empowers developers to build NLP-powered search pipelines for various use cases: https://github.com/deepset-ai/haystack
On our mission to bring the State-of-the-Art of NLP into every application, we look for different roles to join our team and our journey! If you want to work in one of the most exciting areas of Machine Learning and actively work with an engaged and fast-growing community, reach out to us!
Datasets, tools, and benchmarks for representation learning of code.
Project mention: Speedtyper.dev: Type racing for programmers | reddit.com/r/webdev | 2021-01-16
⭕ N2O: Distributed Application Server
jiant is an NLP toolkit
Project mention: Looking for a code base to implement multi-task learning in NLP | reddit.com/r/LanguageTechnology | 2021-02-22
Jiant should fulfill 1, 2, 4 and 5.
🏡 Fast & easy transfer learning for NLP. Harvesting language models for the industry. Focus on Question Answering.
Project mention: Has anyone deployed a BERT like model across multiple tasks (Multi-class, NER, outlier detection)? Seeking advice. | reddit.com/r/LanguageTechnology | 2021-01-07
You can use https://github.com/deepset-ai/FARM or https://github.com/nyu-mll/jiant for multitask learning. The second is more general.
Top2Vec learns jointly embedded topic, document and word vectors.
Project mention: [D] Good algorithm for clustering big data (sentences represented as embeddings)? | reddit.com/r/MachineLearning | 2021-03-31
A BERT model for scientific text.
Project mention: Looking for an automatic text summarization method for academic papers | reddit.com/r/LanguageTechnology | 2021-01-21
Hey, if you are building Seq2Seq models to summarize papers and already have the dataset, you can look into using SciBERT by allenai, and you might have a look at S2ORC as your dataset. It's quite vast and expansive.
Self-contained Machine Learning and Natural Language Processing library in Go
Project mention: Show HN: Experiments on Machine Translation in Pure Go | news.ycombinator.com | 2021-02-17
Leveraging BERT and c-TF-IDF to create easily interpretable topics.
Project mention: What are some methods to build Google Trend-like indexes? | reddit.com/r/datascience | 2021-03-28
check out BERTopic - https://github.com/MaartenGr/BERTopic
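For readers unfamiliar with c-TF-IDF, the idea is to treat all documents of one topic (class) as a single bag of words and weight each term by how specific it is to that class. The sketch below assumes the formulation tf(t, c) * log(1 + A / f(t)), where A is the average number of words per class and f(t) is the term's count across all classes. The function name `c_tf_idf` and the toy data are made up for the example, and the library's own implementation may differ in details such as tokenization and normalization.

```python
import math
from collections import Counter
from typing import Dict, List


def c_tf_idf(classes: Dict[str, List[str]]) -> Dict[str, Dict[str, float]]:
    """Score each term per class as tf(t, c) * log(1 + A / f(t))."""
    if not classes:
        raise ValueError("at least one class is required")
    # Join every document of a class into one bag of words (naive whitespace tokens).
    counts = {name: Counter(" ".join(docs).lower().split()) for name, docs in classes.items()}
    # f(t): how often each term occurs across all classes.
    total: Counter = Counter()
    for class_counts in counts.values():
        total.update(class_counts)
    # A: average number of words per class.
    avg_words = sum(total.values()) / len(counts)
    scores: Dict[str, Dict[str, float]] = {}
    for name, class_counts in counts.items():
        n_words = sum(class_counts.values())
        scores[name] = {
            term: (count / n_words) * math.log(1 + avg_words / total[term])
            for term, count in class_counts.items()
        }
    return scores


# Toy topics (assumption: in practice the classes come from clustering document embeddings).
topics = {
    "pets": ["the cat sat", "the dog barked"],
    "finance": ["the stock fell", "the bank closed"],
}
scores = c_tf_idf(topics)
for name, term_scores in scores.items():
    best = sorted(term_scores, key=term_scores.get, reverse=True)[:3]
    print(name, best)
```

A word such as "the" is the most frequent term in both toy classes, yet it scores below class-specific words because it is shared across classes.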
Elasticsearch with BERT for advanced document search.
The implementation of DeBERTa
Project mention: [D] Paper Explained - DeBERTa: Decoding-enhanced BERT with Disentangled Attention (Full Video Analysis) | reddit.com/r/MachineLearning | 2021-02-25
Minimal keyword extraction with BERT
Project mention: Alternate approaches to TF-IDF? | reddit.com/r/LanguageTechnology | 2021-03-14
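One common way to do embedding-based keyword extraction is to embed the document and each candidate phrase, then rank candidates by cosine similarity to the document. The sketch below assumes that approach and uses hand-written toy vectors in place of real BERT embeddings. The names `cosine` and `rank_keywords` are illustrative and not taken from the project.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_keywords(doc_vec: Sequence[float],
                  candidates: Dict[str, Sequence[float]],
                  top_n: int = 2) -> List[str]:
    """Return the top_n candidate phrases most similar to the document vector."""
    scored = sorted(
        ((cosine(doc_vec, vec), phrase) for phrase, vec in candidates.items()),
        reverse=True,
    )
    return [phrase for _, phrase in scored[:top_n]]


# Toy 3-dimensional vectors (assumption: a real pipeline would get these from a BERT encoder).
doc_vec = [1.0, 0.0, 1.0]
candidates = {
    "neural network": [0.9, 0.1, 0.8],
    "weather": [0.0, 1.0, 0.0],
    "training": [0.5, 0.5, 0.5],
}
keywords = rank_keywords(doc_vec, candidates)
print(keywords)
```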
Huggingface Transformers + Adapters = ❤️
Project mention: Our new state-of-the-art multilingual NLP Toolkit - Trankit has been released | reddit.com/r/LanguageTechnology | 2021-01-13
Thanks for the question. The main libraries that Trankit uses are pytorch and adapter-transformers. For the GPU requirement, we have tested our toolkit in different scenarios and found that a single GPU with 4GB of memory would be enough for comfortable use.
BETO - Spanish version of the BERT model
Project mention: [D] Are there attempts at a large German-language LM? | reddit.com/r/MachineLearning | 2021-04-05
Not even BETO?
Fuzzy string matching, grouping, and evaluation.
Project mention: Finding the distance between two sentences that that share mostly the same words. | reddit.com/r/LanguageTechnology | 2021-03-16
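As a rough illustration of scoring two sentences that share most of their words, the sketch below compares word sequences with Python's standard-library `difflib.SequenceMatcher`. This is a stand-in chosen for the example, not the project's own method, and the function name `token_similarity` is made up here.

```python
from difflib import SequenceMatcher


def token_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] computed over word sequences rather than characters."""
    # Comparing lists of words means one swapped word costs one token,
    # no matter how many characters it contains.
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()


same_words = token_similarity("the quick brown fox jumps", "the quick brown dog jumps")
unrelated = token_similarity("apple pie", "car engine")
print(same_words, unrelated)
```

A distance, if one is needed, can be taken as one minus this similarity.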
A tool that uses AI to automatically recommend commit messages.
Trained models & code to predict toxic comments on all 3 Jigsaw Toxic Comment Challenges. Built using ⚡ PyTorch Lightning and 🤗 Transformers.
Project mention: Implementing a toxicity detector in your chatbots | dev.to | 2021-03-25
Detoxify is the result of three Kaggle competitions proposed to improve toxicity classifiers. Each had a different purpose within the toxicity classifiers context.
Backprop makes it simple to use, finetune, and deploy state-of-the-art ML models.
Project mention: Show HN: Backprop – a simple library to use and finetune state-of-the-art models | news.ycombinator.com | 2021-03-24
What are some of the best open-source BERT projects? This list will help you: