So Long Surrogates: How We Moved to UTF-8 in Haskell

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • rust-dominator

    Zero-cost ultra-high-performance declarative DOM library using FRP signals for Rust!

  • Missing support for characters beyond U+FFFF is the main problem caused by surrogates (by their existence, even if indirectly); it normally comes from some kind of UCS-2/UTF-16 confusion. It’s not fair to disqualify those bugs. The only case (or class of cases) I’ve been aware of for a long time where it isn’t linked to that is MySQL’s idiotic utf8 → utf8mb3 type.

    You may not have encountered such bugs, but I’m very familiar with surrogate-related bugs, because I use a Compose key extensively. I haven’t used Windows for the last year, but when I did, I would from time to time encounter bugs that were certainly due to surrogates. On the web, I found bugs a few times, all but once in Rust WebAssembly things, such as https://github.com/Pauan/rust-dominator/issues/10. And even now that I’m back on Linux, I know of one almost certainly surrogate-related bug: I can’t type astral plane characters in Zoom at all; I’m pretty sure I had this problem back on Windows, too. Copy and paste works, but typed characters become REPLACEMENT CHARACTER.

    The history is unfortunate, but I strongly dispute that they had little choice. UCS-2 should have been abandoned as a failed experiment. Certainly there had been significant investment in it over the preceding few years, but with the benefit of hindsight, switching to UTF-8 (which was invented before they decided on surrogates) would have made everyone’s life much easier, especially given its ASCII compatibility.

    Ah, BOM characters. Haven’t seen one in years. Good riddance.
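
To make the surrogate mechanics in the comment above concrete, here is a minimal Rust sketch using only the standard library. The helper names (`utf16_units`, `surrogate_pair`, `decode_lossy`) are hypothetical and chosen for illustration; they are not taken from rust-dominator or from the linked issue. The sketch shows how a code point above U+FFFF is split into two UTF-16 code units, and how a lone surrogate turns into U+FFFD REPLACEMENT CHARACTER when decoded lossily, which is one plausible way the symptom described above can arise.

```rust
// Illustrative helpers (hypothetical names), standard library only.

/// Encode one Unicode scalar value as UTF-16 code units.
/// Characters up to U+FFFF yield one unit; astral characters yield two.
fn utf16_units(c: char) -> Vec<u16> {
    let mut buf = [0u16; 2];
    c.encode_utf16(&mut buf).to_vec()
}

/// Compute the surrogate pair by hand for a code point above U+FFFF.
/// Returns None for characters that fit in a single UTF-16 code unit.
fn surrogate_pair(c: char) -> Option<(u16, u16)> {
    let cp = c as u32;
    if cp < 0x1_0000 {
        return None;
    }
    let v = cp - 0x1_0000; // 20 bits remain after subtracting the offset
    let high = 0xD800 + (v >> 10) as u16; // top 10 bits: high surrogate
    let low = 0xDC00 + (v & 0x3FF) as u16; // bottom 10 bits: low surrogate
    Some((high, low))
}

/// Decode UTF-16 code units, replacing any unpaired surrogate with U+FFFD.
fn decode_lossy(units: &[u16]) -> String {
    String::from_utf16_lossy(units)
}
```

For example, `utf16_units('\u{1F600}')` gives the two units `0xD83D` and `0xDE00`, matching `surrogate_pair`. Decoding both units together recovers the original character, while decoding `0xD83D` on its own gives U+FFFD. Code that handles UTF-16 one unit at a time, as UCS-2-era code tends to, can hit that second case.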

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.

Related posts

  • Migrating a JavaScript frontend to Leptos, a Rust framework

    4 projects | dev.to | 26 Mar 2024
  • Yew | What’s been your experience?

    10 projects | /r/rust | 4 Apr 2023
  • How Discord Stores Trillions of Messages

    7 projects | /r/programming | 6 Mar 2023
  • Reactive web framework in Rust?

    5 projects | /r/rust | 24 Feb 2023
  • Never heard of rust. Rust who?

    3 projects | /r/ProgrammerHumor | 21 Feb 2023