hate-speech-and-offensive-language VS toxicity

Compare hate-speech-and-offensive-language vs toxicity and see how they differ.

SaaSHub - Software Alternatives and Reviews
SaaSHub helps you find the best software and product alternatives
www.saashub.com
featured
             hate-speech-and-offensive-language   toxicity
Mentions     2                                    12
Stars        779                                  173
Growth       -                                    0.0%
Activity     1.9                                  0.0
Last commit  over 1 year ago                      over 2 years ago
Language     Jupyter Notebook
License      MIT License                          MIT License
The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
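As a rough illustration of the metrics described above, here is a minimal Python sketch. It assumes that growth is the percentage change in stars between two monthly snapshots, that raw activity is a sum of commits weighted by an exponential decay on their age, and that the displayed score is ten times a project's percentile rank among tracked projects. None of these formulas is confirmed by the site; the function names, the 30-day half-life, and the sample numbers are hypothetical.

```python
from bisect import bisect_left
from typing import Sequence


def month_over_month_growth(stars_last_month: int, stars_now: int) -> float:
    """Percentage change in stars between two monthly snapshots (assumed formula)."""
    if stars_last_month == 0:
        return 0.0  # avoid dividing by zero for brand-new projects
    return (stars_now - stars_last_month) * 100.0 / stars_last_month


def raw_activity(commit_ages_days: Sequence[float], half_life_days: float = 30.0) -> float:
    """Sum of commits where each commit's weight halves every half_life_days.

    Recent commits therefore count more than older ones (assumed weighting).
    """
    return sum(0.5 ** (age / half_life_days) for age in commit_ages_days)


def activity_score(raw: float, all_raw: Sequence[float]) -> float:
    """Map a raw activity value to a 0-10 score via percentile rank (assumed formula).

    A score of 9.0 means 90% of tracked projects have a strictly lower raw value,
    i.e. the project is in the top 10%.
    """
    ranked = sorted(all_raw)
    below = bisect_left(ranked, raw)  # number of strictly lower values
    return round(10.0 * below / len(ranked), 1)


if __name__ == "__main__":
    print(month_over_month_growth(173, 173))        # no change in stars
    print(raw_activity([0, 30, 60]))                # three commits of different ages
    print(activity_score(9, list(range(10))))       # highest of ten tracked projects
```

Under these assumptions, a project whose star count is unchanged from last month shows 0.0% growth, which matches how the toxicity column reads in the table.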

hate-speech-and-offensive-language

Posts with mentions or reviews of hate-speech-and-offensive-language. We have used some of these posts to build our list of alternatives and similar projects.
  • How to make a class column for a classifier from sentiment analysis results?
    1 project | /r/learnpython | 24 Jan 2022
    I've used NRCLex to perform sentiment analysis on some Twitter data. I have hate speech classifier code (https://github.com/t-davidson/hate-speech-and-offensive-language/blob/master/classifier/final_classifier.ipynb) I want to pass the dataset through, but before I can I need to have a "class" column for the model. For those not familiar, NRCLex returns scores for 10 emotions: anticipation, joy, anger, fear, surprise, disgust, positive, negative, sadness and trust. The table looks like this (letters denoting emotions):
  • Where do we go from here and who is going to step up to help us?
    1 project | news.ycombinator.com | 28 Jan 2021
    Some of this exists, and both Quora and Facebook (among others) use it extensively. Both hate speech and porn are good targets for machine learning. It needs supervision, but it can take a lot of load off human moderators.

    Open source implementations exist, e.g.:

    https://github.com/t-davidson/hate-speech-and-offensive-lang...

    I suspect more message boards will want to start applying these sooner rather than later. Most have already figured out that they need anti-spam tools, rather than it coming as a surprise when they roll things out and it fills up with bots. The technology is similar.

    You mention being able to share that information across boards, and I don't know of any widespread implementation of that. You can, at least, let somebody else handle your authentication, which slightly slows their ability to create new accounts when you blacklist one. I'd like to see those sites distinguish "aged" accounts, so that it at least takes some effort or cost to use a new account.

toxicity

Posts with mentions or reviews of toxicity. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2022-03-24.
  • LLMs Aren't "Trained on the Internet" Anymore
    1 project | news.ycombinator.com | 1 Jun 2024
    Hmm, I don't want to talk specifics about my experience, but maybe check out some of the case studies on Surge's website - https://www.surgehq.ai/ (about halfway down the page).

    > Do they ask PhD to explain root of negative 1 and why is it complex?

    This isn't necessarily impossible, but I would consider it to be infeasible with existing labelling workforces. Of course if you really needed a dataset like this and you were sufficiently resourced and willing to spend, you could maybe make it work (I would question whether you really needed PhDs though, that might be hard to swing at any price point).

    But the core idea behind your question is correct - this is what a dataset might look like and hiring/contracting appropriately-skilled labelers/generators is how you would go about getting it. Depending on the need, it can be quite a bit more complex too - if you needed self-driving car driving behavior data maybe you build a simulator and hire people to drive in the simulator and use that as training data (made up and probably crap example, but it illustrates the possibilities).

    Some people think that labelling workforces are all low skill, and there are a lot of things low-skill workforces can do well (visual stuff, basic language and emotion tasks), but you might be surprised at the ability to get skilled labelers. There are lots of smart/educated people around the world and there are ridiculous amounts of money flowing into this space.

  • Perhaps It Is a Bad Thing That the Leading AI Companies Cannot Control Their AIs
    1 project | news.ycombinator.com | 12 Dec 2022
    I'm a PM at a human data company (https://www.surgehq.ai) that helps the large language model companies ensure their models are safe (we're the “clever prompt engineers” who helped Redwood assess their model performance).

    We actually just published a blog today that includes our perspective on building “AI red teams” and best practices for AI alignment/safety: https://www.surgehq.ai/blog/ai-red-teams-for-adversarial-tra...

  • 30% of Google's Emotions Dataset Is Mislabeled
    1 project | news.ycombinator.com | 13 Jul 2022
    I'd love to chat. Want to reach out to the email in my profile? I'm the founder of a much higher-quality data startup (https://www.surgehq.ai), and previously built the human computation platforms at a couple FAANGs.

    We work with a lot of the top AI/NLP companies and research labs, and do both the "typical" data labeling work (sentiment analysis, text categorization, etc), but also a lot more advanced stuff (e.g., training coding assistants, evaluating the new wave of large language models, adversarial labeling, etc -- so not just distinguishing cats and dogs, but rather making full use of the power of the human mind!).

  • Building a No-Code Toxicity Classifier – By Talking to GitHub Copilot
    3 projects | news.ycombinator.com | 24 Mar 2022
    > Rather than operating under a strict definition of toxicity, we asked our team to identify comments that they personally found toxic.

    [0]: https://github.com/surge-ai/toxicity

  • Ask HN: Who is hiring? (January 2022)
    28 projects | news.ycombinator.com | 3 Jan 2022
    Love language? So do we, and our mission is to infuse AI with that same love. At Surge, we're building the human infrastructure to power NLP — from detecting hate speech, to parsing complex documents, to injecting human values into the next wave of language models. Our first product is a platform that helps ML teams create amazing, human-powered datasets to train AI in the richness of language. We're a team of former Google, Facebook, and Airbnb engineering leads, and we work with top companies at the forefront of machine learning. Our tech stack is Ruby on Rails, React, and Python. We’re rapidly growing, and we're looking for full-stack engineers to join the team and develop our product. To apply, please email [email protected] with a resume and 2-3 sentences describing your interest in Surge. We love personal projects and writings too!

    More information: https://www.surgehq.ai/about#careers

    A blog post explaining the problems we are working to solve: https://www.surgehq.ai/blog/the-ai-bottleneck-high-quality-h...

  • The Toxicity Dataset – building the largest free dataset of online toxicity
    1 project | news.ycombinator.com | 9 Dec 2021
  • [Free] The Toxicity Dataset — building the world's largest free dataset of online toxicity [Github]
    1 project | /r/ArtificialInteligence | 9 Dec 2021
  • The Toxicity Dataset — building the world's largest free dataset of online toxicity
    1 project | /r/LanguageTechnology | 9 Dec 2021
  • The Toxicity Dataset (1000 social media comments) — any ideas for interesting visualizations? [github]
    1 project | /r/datavisualization | 8 Dec 2021
  • The Toxicity Dataset - free dataset of online toxicity (Github) - could be used for interesting portfolio projects
    1 project | /r/datascience | 8 Dec 2021

What are some alternatives?

When comparing hate-speech-and-offensive-language and toxicity you can also consider the following projects:

hashformers - Hashformers is a framework for hashtag segmentation with Transformers and Large Language Models (LLMs).

seldon-core - An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models

Tegridy-MIDI-Dataset - Tegridy MIDI Dataset for precise and effective Music AI models creation.

zotero - Zotero is a free, easy-to-use tool to help you collect, organize, annotate, cite, and share your research sources.

cia - 🐱‍💻 CIA Factbook data analysis and dataset reconstruction, modification, and tuning go here.

Fleet - Open-source platform for IT, security, and infrastructure teams. (Linux, macOS, Chrome, Windows, cloud, data center)

PLOD-AbbreviationDetection - This repository contains the PLOD Dataset for Abbreviation Detection released with our LREC 2022 publication

datapane - Build and share data reports in 100% Python

ThoughtSource - A central, open resource for data and tools related to chain-of-thought reasoning in large language models. Developed @ Samwald research group: https://samwald.info/

deno - A modern runtime for JavaScript and TypeScript.

airline-sentiment-streaming - Streaming with Airline Sentiment. Utilizing Cloudera Machine Learning, Apache NiFi, Apache Hue, Apache Impala, Apache Kudu

trivy - Find vulnerabilities, misconfigurations, secrets, SBOM in containers, Kubernetes, code repositories, clouds and more


Did you know that Jupyter Notebook is
the 13th most popular programming language
based on number of mentions?