Libgrapheme: A simple freestanding C99 library for Unicode

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com.

  • quickjs

    Public repository of the QuickJS Javascript Engine.

  • You can also refer to the Unicode routines of other small JS engines[1,2]. Those don’t use ICU either, although the implementations are mercilessly size-optimized (to put it politely) and restricted to what the target JS version requires (e.g. case mapping but no normalization).

    [1] https://github.com/bellard/quickjs/blob/master/libunicode.c

    [2] https://github.com/svaarala/duktape/blob/master/src-input/du...
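To make "size-optimized" concrete, here is a minimal sketch of one common way to keep Unicode data small: a sorted table of code point ranges, each with a fixed offset, searched with binary search. This is an illustration only, not the scheme QuickJS or Duktape actually use. The names `case_range`, `upper_ranges`, and `sketch_toupper` are hypothetical, and the table is deliberately incomplete.

```c
#include <stddef.h>
#include <stdint.h>

/* One run of code points that all map to uppercase by the same offset. */
struct case_range {
    uint32_t lo;    /* first code point in the run */
    uint32_t hi;    /* last code point in the run (inclusive) */
    int32_t  delta; /* value added to get the uppercase code point */
};

/* Hypothetical, incomplete table; must stay sorted by lo and non-overlapping. */
static const struct case_range upper_ranges[] = {
    { 0x0061, 0x007A, -32 }, /* a..z */
    { 0x00E0, 0x00F6, -32 }, /* Latin-1 letters before the division sign */
    { 0x00F8, 0x00FE, -32 }, /* Latin-1 letters after the division sign */
    { 0x03B1, 0x03C1, -32 }, /* Greek alpha..rho */
    { 0x03C3, 0x03C9, -32 }, /* Greek sigma..omega (final sigma omitted) */
    { 0x0430, 0x044F, -32 }, /* Cyrillic a..ya */
    { 0x0450, 0x045F, -80 }, /* Cyrillic extensions */
};

/* Binary search for the range containing cp; unmapped code points pass through. */
uint32_t sketch_toupper(uint32_t cp)
{
    size_t lo = 0;
    size_t hi = sizeof upper_ranges / sizeof upper_ranges[0];

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (cp < upper_ranges[mid].lo) {
            hi = mid;
        } else if (cp > upper_ranges[mid].hi) {
            lo = mid + 1;
        } else {
            return (uint32_t)((int32_t)cp + upper_ranges[mid].delta);
        }
    }
    return cp;
}
```

Each table entry costs a few bytes regardless of how many code points it covers, which is why range tables suit scripts whose cases sit at a constant distance. Irregular mappings (for example ones that expand to several code points) need a separate mechanism that this sketch does not attempt.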

  • Duktape

    Duktape - embeddable Javascript engine with a focus on portability and compact footprint

  • opentype-shaping-documents

    Documentation of OpenType shaping behavior

  • > Off the top of my head, I don't know of a terminal that actually implements the entire (very complex) set of Unicode text rendering behaviors

    There are likely two problems with this:

    First, nobody actually seems to know how bidirectional text should interact with terminal control sequences, or indeed how it should be typeset on a terminal in the first place (where are the paragraph boundaries?). There is the pre-Unicode bi-directional support mode (BDSM, I kid you not) in ECMA-48[1] and TR/53[2], which AFAIK nobody implements nor cares about, and which doesn’t seem to actually; there are terminal emulators made by bidi-language users[3], which AFAIK nobody has written down the behaviour of; there is the Freedesktop bidi terminal spec[4], which is a draft and AFAIK nobody implements yet either but at least some people care about; finally, there are bidi-language users who say that spec is a mistake[5].

    Second, aside from bidi and a smattering of other things such as emoji, there is no detailed “Unicode rendering behaviour”; there are only standards specific to font formats, the most recent being OpenType, which is dubiously compatible across implementations, decently documented only through painstaking reverse engineering (sometimes in words[6], sometimes only in FreeType library code), and generally full of snakes[7]. And it has no notion of a monospace font, only of a (proportional) font where all Latin/Cyrillic/Greek characters just happen to have the same advance.

    AFAICT that is not an oversight or negligence, but rather a concession to the fact that there are scripts which don’t really have a notion of monospace in the typographic tradition and in fact are written such that it’s extremely unclear what monospace would even mean—certainly not one or two cells per codepoint (e.g. Burmese or Tibetan; apparently there are Arabic monospace fonts[8] but I’ve no idea how the hell they work). Not coincidentally, those are the scripts where you need that shaper, otherwise nothing works.

    [1] https://www.ecma-international.org/publications-and-standard...

    [2] https://www.ecma-international.org/publications-and-standard...

    [3] https://news.ycombinator.com/item?id=8086417

    [4] https://terminal-wg.pages.freedesktop.org/bidi/

    [5] http://litcave.rudi.ir/

    [6] https://github.com/n8willis/opentype-shaping-documents

    [7] https://litherum.blogspot.com/2019/03/addition-font.html

    [8] https://news.ycombinator.com/item?id=10395464
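The "one or two cells per codepoint" model mentioned above can be sketched as follows. This is a minimal illustration in the spirit of wcwidth-style lookups, not the code of any particular terminal. The names `naive_cell_width` and `naive_string_width` are hypothetical, and the ranges are a small, deliberately incomplete sample.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical width model: one code point in, a fixed cell count out. */
int naive_cell_width(uint32_t cp)
{
    if (cp >= 0x0300 && cp <= 0x036F) return 0; /* combining diacritical marks */
    if (cp >= 0x4E00 && cp <= 0x9FFF) return 2; /* CJK Unified Ideographs */
    if (cp >= 0xAC00 && cp <= 0xD7A3) return 2; /* Hangul syllables */
    if (cp >= 0xFF01 && cp <= 0xFF60) return 2; /* fullwidth forms */
    return 1;                                   /* everything else: one cell */
}

/* Width of a string is just the sum of the per-code-point widths. */
int naive_string_width(const uint32_t *cps, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++) {
        total += naive_cell_width(cps[i]);
    }
    return total;
}
```

The model works tolerably for "e" followed by U+0301 (one cell) or for CJK ideographs (two cells each). For the six code points of a Burmese word such as U+1019 U+103C U+1014 U+103A U+1019 U+102C it simply returns six, because this table knows nothing about Myanmar marks. Extending the table changes the number but not the underlying problem: the layout of such text comes out of the shaper and the font, not out of a per-code-point sum.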

  • fbpdf

    A small framebuffer pdf, djvu, epub, xps, and cbz viewer

  • utf8proc

    a clean C library for processing UTF-8 Unicode data

  • kitty

    Cross-platform, fast, feature-rich, GPU based terminal

  • > If anyone knows of a good one for Linux, please let me know!

    kitty has excellent support for this stuff, and is much more performant than anything based on vte (like gnome-terminal).

    https://github.com/kovidgoyal/kitty
