[ANN] unicode-collation 0.1

This page summarizes the projects mentioned and recommended in the original post on reddit.com/r/haskell

Our great sponsors
  • InfluxDB - Build time-series-based applications quickly and at scale.
  • SonarLint - Clean code begins in your IDE with SonarLint
  • SaaSHub - Software Alternatives and Reviews
  • text-icu

    This package provides the Haskell Data.Text.ICU library, for performing complex manipulation of Unicode text.

    Until now, the only way to do proper Unicode sorting in Haskell was to depend on text-icu, which wraps the C library icu4c. However, there are disadvantages to depending on an external C library. In addition, the last release of text-icu was in 2015, and since then there have been changes to icu4c that cause build-failures, as noted in this issue.

  • unicode-transforms

    Fast Unicode normalization in Haskell

    Thanks! Here's a puzzle. Profiling shows that about a third of the time in my code is spent in normalize from unicode-transforms. (Normalization is a required step in the algorithm but can be omitted if you know that the input is already in NFD form.) And when I add a benchmark that omits normalization, I see run time cut by a third. But text-icu's run time in my benchmark doesn't seem to be affected much by whether I set the normalization option. I am not sure how to square that with the benchmarks here that seem to show unicode-transforms outperforming text-icu in normalization. text-icu's documentation says that "an incremental check is performed to see whether the input data is in FCD form. If the data is not in FCD form, incremental NFD normalization is performed." I'm not sure exactly what this means, but it may mean that text-icu avoids normalizing the whole string, but just normalizes enough to do the comparison, and sometimes avoids normalization altogether if it can quickly determine that the string is already normalized. I don't see a way to do this currently with unicode-transforms.

  • InfluxDB

    Build time-series-based applications quickly and at scale.. InfluxDB is the Time Series Platform where developers build real-time applications for analytics, IoT and cloud-native services. Easy to start, it is available in the cloud or on-premises.

  • unicode-collation

    Haskell implementation of the Unicode Collation Algorithm

    Why don't you open an issue at https://github.com/jgm/unicode-collation -- it would be a better place to hash out the details than here.

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts