Clean-discord Alternatives
Similar projects and alternatives to clean-discord based on common topics and language
-
DoppelGANger
[IMC 2020 (Best Paper Finalist)] Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions
-
stopes
A library for preparing data for machine translation research (monolingual preprocessing, bitext mining, etc.) built by the FAIR NLLB team.
-
pyreports
pyreports is a Python library that allows you to create complex reports from various sources
-
crypto-trading-strategy-backtester
Easy-to-use cryptocurrency trading strategy simulator and backtester
clean-discord reviews and mentions
-
Massive multi-turn conversational dataset based on cleaned discord data
Splitting up the data isn't currently planned; if someone (or you) could create a classifier (NOTE: please, please optimize it, the amount of data to process here is not trivial) to split the data into the relevant groups, go ahead and create a branch and pull request to https://github.com/JEF1056/clean-discord, and hopefully I can include it in the next release, which is slated toward the end of the year. I can provide small snippets of some of the raw JSON data so you can understand how it's formatted.
-
[R] Massive multi-turn conversational dataset based on cleaned discord data
Included in the raw data are a lot of unwanted, non-language behaviors; these include massive blocks of code, bot commands, bot messages, ASCII art, messages with only images attached, messages containing only unicode (non-standard) spaces, etc. Other messages are considered "toxic", e.g. falling under the categories: toxic, obscene, threatening, insulting, or identity hate. The regex and cleaning code can be found here.
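The kind of filtering described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual cleaning code: the function name `clean_message`, the patterns, and the assumed bot-command prefixes (`!` and `$`) are all hypothetical, and toxicity filtering is left out entirely.

```python
import re

# Hypothetical patterns for illustration only; not taken from clean-discord.
FENCE = "`" * 3  # Discord-style code fence delimiter
CODE_BLOCK = re.compile(FENCE + r".*?" + FENCE, re.DOTALL)
BOT_COMMAND = re.compile(r"^[!$]\w+")  # assumed command prefixes


def clean_message(text):
    """Return the cleaned text, or None if the message should be dropped."""
    # Strip fenced code blocks from the message.
    text = CODE_BLOCK.sub("", text)
    # Collapse whitespace runs; str.split() also splits on Unicode
    # whitespace such as no-break or em spaces.
    text = " ".join(text.split())
    if not text:
        return None  # empty, whitespace-only, or attachment-only message
    if BOT_COMMAND.match(text):
        return None  # looks like a bot command
    return text
```

A message such as `"!play something"` or one containing only Unicode spaces would be dropped, while ordinary text passes through with its whitespace normalized.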
Stats
The primary programming language of clean-discord is Python.