What are some classification tasks where BERT-based models don't work well? In a similar vein, what are some generative tasks where fine-tuning GPT-2/LM does not work well?

This page summarizes the projects mentioned and recommended in the original post on /r/LanguageTechnology.

  • checklist

    Beyond Accuracy: Behavioral Testing of NLP models with CheckList

    Interesting. Does the model fail only on specific nuanced examples, or on sentences in general? For example, the CheckList work (https://github.com/marcotcr/checklist) includes examples of specific sentences where the model fails, but overall the model works well in a lot of cases. Do you have a code repo or notebook somewhere for experimenting with emotion classification?

  • NLP-progress

    Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.

    One place to start is NLP-progress, if leaderboards are your thing. If the model at the top of a leaderboard is not transformer-based and one further down is, you have your answer.
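
The question raised in the checklist comment, whether a model fails on specific nuanced examples or on sentences in general, can be probed by grouping templated test sentences by capability and comparing failure rates. The following is a minimal sketch in plain Python, not the CheckList library's API: `toy_sentiment` is a hypothetical keyword-based stand-in for a fine-tuned classifier, and the templates, fill-in words, and capability names are made up for illustration.

```python
from collections import defaultdict
from itertools import product


def toy_sentiment(text: str) -> str:
    """Hypothetical stand-in for a fine-tuned classifier: counts cue words."""
    positive = {"good", "great"}
    negative = {"bad", "awful"}
    tokens = text.lower().replace(".", "").split()
    score = sum(t in positive for t in tokens) - sum(t in negative for t in tokens)
    return "positive" if score >= 0 else "negative"


# Each case: (capability name, sentence template, expected label).
# All of these are illustrative assumptions, not taken from CheckList.
CASES = [
    ("plain", "The {noun} was {pos}.", "positive"),
    ("plain", "The {noun} was {neg}.", "negative"),
    ("negation", "The {noun} was not {pos}.", "negative"),
    ("negation", "The {noun} was not {neg}.", "positive"),
]
FILLERS = {
    "noun": ["food", "service"],
    "pos": ["good", "great"],
    "neg": ["bad", "awful"],
}


def expand(template: str) -> list[str]:
    """Fill a template with every filler combination, dropping duplicates."""
    keys = list(FILLERS)
    filled = {
        template.format(**dict(zip(keys, combo)))  # unused keys are ignored
        for combo in product(*FILLERS.values())
    }
    return sorted(filled)


def failure_rates(predict) -> dict[str, float]:
    """Return the fraction of wrong predictions per capability."""
    totals, failures = defaultdict(int), defaultdict(int)
    for capability, template, expected in CASES:
        for sentence in expand(template):
            totals[capability] += 1
            failures[capability] += predict(sentence) != expected
    return {c: failures[c] / totals[c] for c in totals}


rates = failure_rates(toy_sentiment)
print(rates)
```

With this toy predictor, every plain sentence is classified correctly while every negated one is wrong, which is the "works well overall, fails on a specific slice" pattern the comment asks about. Swapping in a real model only requires passing a different `predict` callable that maps a sentence to a label.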


