- ToLD-Br: Toxic Language Detection in Social Media for Brazilian Portuguese: New Dataset and Multilingual Analysis
The dataset is based on ToLD-Br, a large dataset of tweets (or is it Xeets now?) that also labels each text by category: homophobia, obscenity, insults, racism, misogyny and xenophobia. The competition dataset, however, drops those categories and keeps only a single toxicity column.
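To make the difference between the two layouts concrete, here is a minimal sketch of how per-category labels could be collapsed into one toxicity flag. This is only an illustration: the column names, the "any non-zero category means toxic" rule, and the two sample rows are all assumptions of mine, not how the competition organizers actually built their column.

```python
import csv
import io

# Hypothetical column names; the real ToLD-Br file may name them differently.
CATEGORIES = ["homophobia", "obscene", "insult", "racism", "misogyny", "xenophobia"]

# Tiny made-up sample standing in for the real file.
SAMPLE = """text,homophobia,obscene,insult,racism,misogyny,xenophobia
bom dia a todos,0,0,0,0,0,0
exemplo ofensivo,0,1,2,0,0,0
"""


def to_binary_toxicity(row, categories=CATEGORIES):
    """Return 1 if any category column is non-zero, else 0 (assumed rule)."""
    return int(any(int(row[name]) > 0 for name in categories))


def simplify(csv_text):
    """Reduce the multi-label rows to the text plus one toxicity flag."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{"text": r["text"], "toxic": to_binary_toxicity(r)} for r in reader]


rows = simplify(SAMPLE)
for row in rows:
    print(row)
```

With a single flag like this the task becomes plain binary classification, at the cost of losing which kind of toxicity was present.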
And that's it! If you want to train and test this model yourself, feel free to check out the code in my GitHub repository!