Hashids.java vs List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words

Hashids.java

Hashids algorithm v1.0.0 implementation in Java (by yomorun)

Utility

Source Code

hashids.org

List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words

List of Dirty, Naughty, Obscene, and Otherwise Bad Words (by LDNOOBW)

Project	Hashids.java	List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words
Mentions	31	25
Stars	1,012	2,765
Growth	0.0%	1.2%
Activity	0.0	0.0
Latest Commit	6 months ago	2 months ago
Language	Java	(not listed)
License	MIT License	Creative Commons Attribution 4.0

The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

Hashids.java

Posts with mentions or reviews of Hashids.java. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-01-28.

Hashids: Generate short unique ids from integers
1 project | news.ycombinator.com | 22 Jun 2023
Auto Generate Sequential UIID
1 project | /r/csharp | 31 Jan 2023

You basically want Hashids but sequential? Why not simply convert a base-10 (0-9) number to hex (0-F)?
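The hex suggestion in this comment can be sketched with the Java standard library alone. This is a minimal illustration, not part of Hashids.java; the class name `HexIds` and its method names are hypothetical. Note that hex output stays sequential and trivially enumerable, which is what the commenter is asking for.

```java
// Hypothetical helper: converts numeric ids to hexadecimal strings and back.
final class HexIds {
    private HexIds() {}

    // Formats the id as lowercase hex, e.g. 255 -> "ff".
    // Negative values are rendered as their unsigned two's-complement form.
    static String toHex(long id) {
        return Long.toHexString(id);
    }

    // Parses the hex string back to the original long.
    // parseUnsignedLong is used so values produced from negative ids round-trip.
    static long fromHex(String hex) {
        return Long.parseUnsignedLong(hex, 16);
    }
}
```

For example, `HexIds.toHex(1012L)` yields `"3f4"`, and `HexIds.fromHex("3f4")` returns `1012`.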
Features I'd Like in PostgreSQL
14 projects | news.ycombinator.com | 28 Jan 2023

I found hashids [1] to be a great compromise between integer ids in the database and copyable non-enumerable strings on the client.
[1] https://hashids.org/
Short, friendly base32 slugs from timestamps
3 projects | news.ycombinator.com | 18 Jan 2023
We Chose NanoIDs for PlanetScale’s API
6 projects | news.ycombinator.com | 29 Dec 2022

I wonder how this might compare to just storing regular autoincrementing ints in the database, and converting to/from hashids (https://hashids.org/) at the edge. It eliminates the collision concern and stores more compactly at the cost of a tiny amount of encode/decode when processing requests. You’d want to push it down as close to the database layer as possible to avoid inadvertent int ID leaks; I added native hashids support to clickhouse but I’m not sure what other database support might entail.
How can I generate truly unique slugs?
2 projects | /r/django | 18 Dec 2022

Since hashids are not really hashes and are not secure at all, this is not even achieved. Hashids can be easily decoded without the salt by a simple brute-force attack, as described by the authors of Hashids themselves right on their website: https://hashids.org/
How to handle id-based routes with UUID
1 project | /r/learnprogramming | 14 Dec 2022

You don't necessarily need to use UUIDs. You could use something like Hashids to generate random strings from your sequential IDs in a reversible way, so that users can't predict what their values will be, but you can decode them as needed.
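The reversible id-to-string idea described here can be sketched without any library. The code below is a stand-in, not the Hashids algorithm and not the Hashids.java API: it encodes a non-negative id in base 62 using an alphabet shuffled from a seed. The class name `ShuffledBase62` and the seed parameter are hypothetical. Consecutive ids still differ mainly in their last character, and the scheme offers no security; it only shows the encode/decode round trip.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Illustrative stand-in: reversible base-62 encoding with a seed-shuffled alphabet.
final class ShuffledBase62 {
    private static final String BASE =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

    private final String alphabet;

    // The seed plays a role loosely similar to a salt: it fixes the alphabet order.
    ShuffledBase62(long seed) {
        List<Character> chars = new ArrayList<>();
        for (char c : BASE.toCharArray()) {
            chars.add(c);
        }
        Collections.shuffle(chars, new Random(seed));
        StringBuilder sb = new StringBuilder();
        for (char c : chars) {
            sb.append(c);
        }
        this.alphabet = sb.toString();
    }

    // Encodes a non-negative id as a short string.
    String encode(long id) {
        if (id < 0) {
            throw new IllegalArgumentException("id must be non-negative");
        }
        int radix = alphabet.length();
        StringBuilder sb = new StringBuilder();
        do {
            sb.append(alphabet.charAt((int) (id % radix)));
            id /= radix;
        } while (id > 0);
        return sb.reverse().toString();
    }

    // Decodes a string produced by encode() with the same seed.
    long decode(String text) {
        int radix = alphabet.length();
        long id = 0;
        for (char c : text.toCharArray()) {
            int digit = alphabet.indexOf(c);
            if (digit < 0) {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
            id = id * radix + digit;
        }
        return id;
    }
}
```

A caller would keep the integer id in the database, call `encode` when building a URL, and call `decode` on the way back in. Input that was not produced by `encode` is not checked for overflow in this sketch.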
All of my database models have id replaced with UUID4s. Is there any risk to using these in URLs?
1 project | /r/djangolearning | 25 Aug 2022

You should not use UUIDv4 as a primary key. You can use normal int values and then use hashids to make them safe for URLs. UUIDv7 might also be a good option once it is more widely supported.
What’s Django’s argument for using 64-bit int as default pk over uuid. Can anyone point me to something I can read?
1 project | /r/django | 18 Aug 2022
Library for generating string IDs from number IDs
1 project | /r/node | 13 Aug 2022

List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words

Posts with mentions or reviews of List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-03-04.

Ask HN: List of Subdomains to Reserve
4 projects | news.ycombinator.com | 4 Mar 2024

Good point. I am already checking against the naughty-words list from here:
https://github.com/LDNOOBW/List-of-Dirty-Naughty-Obscene-and...
Where is the banned word list so I can integrate it?
1 project | /r/ecommerce | 27 Jun 2023

https://github.com/LDNOOBW/List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words is one
We’re Washington Post reporters who analyzed Google’s C4 data set to see which websites AI uses to make itself sound smarter. Ask us Anything!
4 projects | /r/IAmA | 16 May 2023

We know that C4 was used to train Google’s influential T5 model, Facebook’s LLaMA, as well as the open source model Red Pajama. C4 is a very cleaned-up version of a scrape of the internet from the non-profit CommonCrawl taken in 2019. OpenAI’s model GPT-3 used a training dataset that began with 41 scrapes of the web from CommonCrawl from 2016 to 2019 so I think it’s safe to say that something akin to C4 was part of GPT-3. (The researchers who originally looked into C4 argue that these issues are common to all web-scraped datasets.) When we reached out to OpenAI and Google for comment, both companies emphasized that they undergo extensive efforts to weed out potentially problematic data from their training sets. But within the industry, C4 is known as being a heavily filtered dataset and has been criticized, in fact, for eliminating content related to LGBTQ+ identities because of its reliance on a heavy-handed blocklist. (https://github.com/LDNOOBW/List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words ) We are working on some reporting to try to address your last and very crucial question, but it’s an open area of research and one that even AI developers are struggling to answer.
TIL there's an official list of profanities ChatGPT is trained to avoid
1 project | /r/todayilearned | 20 Apr 2023
Microsoft's paper on OpenAI's GPT-4 had hidden information
3 projects | news.ycombinator.com | 23 Mar 2023

"The Colossal Clean Crawled Corpus, used to train a trillion parameter LM in , is cleaned, inter alia, by discarding any page containing one of a list of about 400 “Dirty, Naughty, Obscene or Otherwise Bad Words”. This list is overwhelmingly words related to sex, with a handful of racial slurs and words related to white supremacy (e.g. swastika, white power) included. While possibly effective at removing documents containing pornography (and the associated problematic stereotypes encoded in the language of such sites) and certain kinds of hate speech, this approach will also undoubtedly attenuate, by suppressing such words as twink, the influence of online spaces built by and for LGBTQ people. If we filter out the discourse of marginalized populations, we fail to provide training data that reclaims slurs and otherwise describes marginalized identities in a positive light"
from "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?" https://dl.acm.org/doi/10.1145/3442188.3445922
That list of words is https://github.com/LDNOOBW/List-of-Dirty-Naughty-Obscene-and...
Rule
1 project | /r/196 | 17 Mar 2023

Yeah, this is Shutterstock's one, which they shared
If I made a game with a chatroom, what curses and slurs would I ban?
1 project | /r/gamedev | 3 Mar 2023

I always turn off the chatfilter, so defo let them choose if they want to have it censored or not. For the actual words themselves, there are plenty of lists out there that you can use (like this one). Although these are just regular words, none of the circumvention methods are included
Emad announces a new Stability lab with a new soon model. It looks like a Dall-e 2 style AI to me. Maybe it is our open source Dall-e 2, like KARLO. The images are very interesting. According to Emad "Soon".
1 project | /r/StableDiffusion | 5 Jan 2023

That it's very crudely filtered for naughty words. According to the paper, "We removed any page that contained any word on the “List of Dirty, Naughty, Obscene or Otherwise Bad Words”." That list is here. While it contains a lot of unquestionably ugly words, it also contains words like "tit".
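The page-level filtering quoted above ("removed any page that contained any word on the list") might look roughly like the sketch below. It is an assumption-laden illustration, not the code used to build any real dataset: the class name `BlocklistFilter` is hypothetical, the blocklist is passed in by the caller, and matching is done on single lowercase tokens, so multi-word entries are not handled.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch of a whole-page blocklist filter.
final class BlocklistFilter {
    private final Set<String> blocked;

    // Stores the blocklist in lowercase so matching is case-insensitive.
    BlocklistFilter(Set<String> words) {
        this.blocked = words.stream()
            .map(w -> w.toLowerCase(Locale.ROOT))
            .collect(Collectors.toSet());
    }

    // True if any token of the page (split on non-letter, non-digit runs) is on the list.
    boolean containsBlockedWord(String page) {
        return Arrays.stream(page.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"))
            .anyMatch(blocked::contains);
    }

    // Drops every page that contains at least one blocked word; keeps the rest in order.
    List<String> keepCleanPages(List<String> pages) {
        return pages.stream()
            .filter(page -> !containsBlockedWord(page))
            .collect(Collectors.toList());
    }
}
```

Because a single match discards the entire page regardless of context, a filter of this shape also removes pages that use a listed word innocently, which is the over-filtering the comment points out.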
I made a Stable Diffusion for Anime app in your Pocket! Running 100% offline on your Apple Devices (iPhone, iPad, Mac)
4 projects | /r/StableDiffusion | 26 Nov 2022

No problem! I wrote a short json file and Swift script to remove the nsfw words from the prompt during the image generation process, so it's not based on the negative prompt. The json file is a text file full of nsfw words that the app checks against to remove unwanted words from prompts, e.g.: https://github.com/LDNOOBW/List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words
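The commenter's script is in Swift and is not shown, so the following is only a guess at the general approach, written in Java: split the prompt on whitespace, drop any token whose normalized form is on the blocklist, and join the rest. The class name `PromptSanitizer` and the normalization rule are assumptions.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch: removes blocklisted words from a prompt and keeps everything else.
final class PromptSanitizer {
    private PromptSanitizer() {}

    // The blocklist is expected to contain lowercase words.
    static String removeBlocked(String prompt, Set<String> blocked) {
        return Arrays.stream(prompt.trim().split("\\s+"))
            .filter(token -> !blocked.contains(normalize(token)))
            .collect(Collectors.joining(" "));
    }

    // Lowercases the token and strips punctuation so "Foo," matches "foo".
    private static String normalize(String token) {
        return token.toLowerCase(Locale.ROOT).replaceAll("[^\\p{L}\\p{N}]", "");
    }
}
```

Unlike a page-level filter, this keeps the rest of the prompt intact, though it shares the limitation noted elsewhere in these posts that simple word matching does not catch circumvention such as deliberate misspellings.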
Lewdle - A daily lewd word game
1 project | /r/wordle | 27 Jan 2022

This is the closest I’ve come to finding one. It’s not that great.

What are some alternatives?

When comparing Hashids.java and List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words you can also consider the following projects:

BLAKE3 - the official Rust and C implementations of the BLAKE3 cryptographic hash function

google-profanity-words - Full list of bad words and top swear words banned by Google.

uuid7 - UUID version 7, which are time-sortable (following the Peabody RFC4122 draft)

List-of-Dirty-Naughty-Obscene-and

Guava - Google core libraries for Java

git-crypt - Transparent file encryption in git

JGit - JGit project repository (jgit)

following-instructions-human-feedback

Embulk - Embulk: Pluggable Bulk Data Loader.

rmarkdown - Dynamic Documents for R

JADE - a pug implementation written in Java (formerly known as jade)

RedPajama-Data - The RedPajama-Data repository contains code for preparing large datasets for training large language models.

