clean-discord

Cleaning discord data for NLP (by JEF1056)

clean-discord reviews and mentions

Posts with mentions or reviews of clean-discord. We have used some of these posts to build our list of alternatives and similar projects.
  • Massive multi-turn conversational dataset based on cleaned discord data
    1 project | /r/datasets | 2 Feb 2021
    Splitting up the data currently isn't in the plans yet; if someone (or you) could create a classifier (NOTE: please, please optimize it, the amount of data to process here is not trivial) to split the data into the relevant groups, go ahead and create a branch and pull request to https://github.com/JEF1056/clean-discord and hopefully I can do that for the next release, which is slated toward the end of the year. I can provide small snippets of some of the raw JSON data so you can understand how it's formatted.
  • [R] Massive multi-turn conversational dataset based on cleaned discord data
    1 project | /r/MachineLearning | 2 Feb 2021
    Included in the raw data are a lot of unwanted, non-language behaviors; these include massive blocks of code, bot commands, bot messages, ASCII art, messages with only images attached, messages containing only unicode (non-standard) spaces, etc. Other messages are considered "toxic", e.g. falling under the categories: toxic, obscene, threatening, insulting, or identity hate. The regex and cleaning code can be found here.
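
The filtering described in the post above can be sketched roughly as follows. This is a minimal illustration only, not the project's actual regex or cleaning code: the patterns, the `clean_message` function, the `is_bot` flag, and the 0.5 threshold are all assumptions made for the example, and toxicity filtering is left out because it would need a separate classifier.

```python
import re

# Hypothetical patterns for illustration; the project's real regexes may differ.
CODE_BLOCK = re.compile(r"```.*?```", re.DOTALL)   # fenced blocks of code
BOT_COMMAND = re.compile(r"^[!$][A-Za-z]+")         # e.g. "!play", "$help"
# Non-standard Unicode spaces, including zero-width ones that \s does not match.
UNICODE_SPACES = re.compile(
    r"[\u00a0\u1680\u2000-\u200b\u202f\u205f\u3000\ufeff]"
)


def clean_message(content, is_bot=False):
    """Return the cleaned text, or None if the message should be dropped."""
    if is_bot:
        return None  # bot messages are discarded outright

    text = CODE_BLOCK.sub("", content)          # strip code blocks
    text = UNICODE_SPACES.sub(" ", text)        # normalize odd spaces
    text = re.sub(r"\s+", " ", text).strip()    # collapse whitespace

    if not text:
        return None  # image-only, space-only, or code-only message
    if BOT_COMMAND.match(text):
        return None  # looks like a bot command

    # Crude ASCII-art heuristic (an assumption): drop messages where fewer
    # than half of the non-space characters are letters or digits.
    visible = [ch for ch in text if not ch.isspace()]
    alnum = sum(ch.isalnum() for ch in visible)
    if alnum / len(visible) < 0.5:
        return None

    return text
```

For example, `clean_message("look at this ```code``` ok")` keeps the surrounding sentence and drops the code, while a message consisting only of a code block, a bot command, or Unicode spaces returns `None`. A real pipeline over data of this size would likely need the patterns tuned against samples of the raw JSON.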

Stats

Basic clean-discord repo stats
2
22
0.0
over 2 years ago

The primary programming language of clean-discord is Python.

