Sounds like you're looking for what is known as "unidecode" -- basically, it takes Unicode text and converts it to US-ASCII while preserving as much of the original as possible. Since you mentioned Java: here's a library I tried out for a tiny project some time ago; it works well for accented letters, and its examples should give you a good overview of what it does.
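To illustrate what such a library does, here is a quick sketch using the third-party Python `unidecode` package (my assumption as a stand-in -- the Java library behaves along the same lines): characters that have no ASCII form get transliterated to a close equivalent rather than dropped.

```python
# pip install Unidecode  (third-party package, not in the standard library)
from unidecode import unidecode

# Transliterate to the closest ASCII spelling:
print(unidecode("Straße"))  # German sharp s is expanded to "ss" -> "Strasse"
print(unidecode("naïve"))   # accent is removed -> "naive"
```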
I did also find this library, which you could probably use somehow. Honestly, though, the more pragmatic solution right now might be to write a very short Python program that takes any UTF-8 encoded text file as input and produces the normalized variant as output, then use that file for further processing.
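Such a program can be done with the standard library alone -- a minimal sketch, assuming NFKD decomposition is good enough for your data: it splits accented characters into base letter plus combining mark, then drops everything that has no ASCII representation. Note the caveat that, unlike a real unidecode library, this simply discards characters such as "ß" or "ł" that don't decompose, instead of transliterating them.

```python
import unicodedata

def to_ascii(text: str) -> str:
    """Strip accents by decomposing characters (NFKD) and dropping
    anything that has no ASCII representation."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")

# To normalize a whole UTF-8 file (file names here are hypothetical):
# with open("input.txt", encoding="utf-8") as src, \
#      open("output.txt", "w", encoding="ascii") as dst:
#     dst.write(to_ascii(src.read()))

print(to_ascii("Mêlée déjà vu"))  # -> "Melee deja vu"
```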