-
InfluxDB
Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
Yes, sorry about that - I omitted a bit of that information for brevity.
If you want to play with allkeys.txt (which is by far much more sequential, simpler data than UnicodeData.txt) then you only need to remove the non-NFD strings (since the Unicode Collation Algorithm's first step requires you to decompose the string's code points to canonical NFD form), that removes ~2,000 entries.
The full file parser code, which strips those out and other useless information like comments and version information can be found here: https://github.com/jecolon/ziglyph/blob/main/src/collator/Al...
If you want to play around with UnicodeData.txt (which is less sequential, more complex data) then only two fields are used (the code point and decomposition field), and only records where the second field is not empty (the full decomposition type name in angle brackets is not needed, only whether it is or is not there is important.)
The full parser code for that file can be found here: https://github.com/jecolon/ziglyph/blob/main/src/normalizer/...
Hope that helps!