Programming: How would I turn UTF-32 characters/symbols into something I can compare a string to?

This page summarizes the projects mentioned and recommended in the original post on /r/AskProgramming

  • unidecode

    Transliteration from Unicode to US-ASCII and ISO 8859-2.

  • Sounds like you're looking for what is known as "unidecode": it takes Unicode text and converts it to US-ASCII while preserving as much as possible. Since you've mentioned Java, here's a library that I tried out for one tiny project some time ago. It should work well for accented letters, and the examples should give you an overview of what it does.

  • homoglyph

    A big list of homoglyphs and some code to detect them

  • However, I found this library, which you could probably use somehow. Honestly, though, right now I would consider it more pragmatic to write a very short Python program that takes any UTF-8 encoded text file as input and produces the normalized variant as output, and then use that file for further processing.
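The "very short Python program" suggested above could look roughly like the sketch below. It assumes that "normalized" means Unicode compatibility normalization (NFKC), which the original comment does not specify. The function names (`normalize_text`, `fold_to_ascii`, `normalize_file`) are made up for illustration, and `fold_to_ascii` is only a crude standard-library stand-in for what a unidecode-style library does, not that library's API.

```python
import sys
import unicodedata


def normalize_text(text: str, form: str = "NFKC") -> str:
    """Return text in the given Unicode normalization form.

    NFKC (an assumption, see above) folds compatibility variants such as
    fullwidth or mathematical letters into their plain counterparts.
    """
    return unicodedata.normalize(form, text)


def fold_to_ascii(text: str) -> str:
    """Crude ASCII folding: decompose, drop combining marks, drop the rest.

    Characters with no ASCII decomposition are silently discarded, which a
    real transliteration library would handle far better.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.encode("ascii", "ignore").decode("ascii")


def normalize_file(src_path: str, dst_path: str, form: str = "NFKC") -> None:
    """Read a UTF-8 text file and write its normalized variant, line by line."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(normalize_text(line, form))


# Runs only when invoked as a script with exactly two arguments:
# an input path and an output path.
if __name__ == "__main__" and len(sys.argv) == 3:
    normalize_file(sys.argv[1], sys.argv[2])
```

Saved under a hypothetical name such as `normalize.py`, it would be run as `python normalize.py input.txt output.txt`. Note that NFKC only folds compatibility variants: a true homoglyph such as Cyrillic "а" stays distinct from Latin "a", so spotting those still needs a confusables list like the homoglyph project above.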

