Terminal support for Emoji – or why terminals don't like families

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • DomTerm

    DOM/JavaScript-based terminal-emulator/console

  • DomTerm (https://domterm.org) does a pretty good job IMNSHO. See screenshot here: https://domterm.org/Features.html . DomTerm mostly delegates the composition of Extended Grapheme Clusters to the browser and the font.

    Unfortunately, as far as I know there is no "complete" monospace font set that handles emoji. Ideally, you want a font with two character widths, with double-width for emoji, hanzi (CJK characters), and similar. Instead the browser will substitute these characters from some variable-width font, and then the spacing will be off.

    DomTerm handles this by putting double-width characters as well as Extended Grapheme Clusters in a separate span that is forced to have the correct width. I created a library https://github.com/PerBothner/unicode-properties based on other people's code but optimized for DomTerm's needs: It provides both East Asian Width (for recognizing double-width characters) and character classes (for grapheme clusters) in a single efficient trie structure.

    The DomTerm equivalent of tmux's "select mode" is grapheme-cluster-aware, so left/right-arrow will correctly move over an entire grapheme cluster.

  • widecharwidth

    public domain wcwidth implementation

  • >For example, iTerm2 considers the "rosette" emoji to have width 1

    The reason for this is quite possibly that Unicode 9 changed the width for some codepoints (mostly emoji) from 1 to 2, and iTerm until very recently (don't know if it's released yet) defaulted to the Unicode 8 widths, with an opt-in escape sequence to change to Unicode 9.

    >This approach comes from the wcwidth utility, and the comment at the top of the C source file provides further insight into the difficulties faced here.

    That link goes to Markus Kuhn's implementation from 2007. It supports Unicode 5, and is by now woefully out of date. You don't want to use it anymore.

    Most terminals have their own definition, and the annoying part is that the client application and the terminal need to have theirs in sync or they get weird glitches when moving the cursor.
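
A toy sketch of that mismatch, assuming two hypothetical width tables that disagree about a single code point (they are stand-ins, not real Unicode 8 or Unicode 9 data):

```python
ROSETTE = "\U0001F3F5"


def width_old(ch: str) -> int:
    """Hypothetical older table: every character is one cell wide."""
    return 1


def width_new(ch: str) -> int:
    """Hypothetical newer table: the rosette is two cells wide."""
    return 2 if ch == ROSETTE else 1


def cursor_column(text: str, width_of) -> int:
    """Column the cursor ends up in after writing text from column 0."""
    return sum(width_of(ch) for ch in text)


line = "ok " + ROSETTE + " done"
app_col = cursor_column(line, width_new)   # what the application assumes
term_col = cursor_column(line, width_old)  # what the terminal actually did
print(app_col, term_col, app_col - term_col)
```

Every later cursor movement on that line is off by the difference, which is the kind of glitch the comment describes.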

    Shameless plug: Fish's solution is widecharwidth[0], which is a Python script that parses the Unicode data files and generates a wcwidth for C++, JavaScript and Rust. It's still a wcwidth, meaning that it has issues with joining code points, but it's at least a start. It's up-to-date with Unicode 14 and, unless they change the data format (again), should be easy to update to future Unicode releases.

    It's public domain and used by at least fish and WezTerm.

    [0]: https://github.com/ridiculousfish/widecharwidth
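
To make the "issues with joining code points" concrete, here is a rough sketch of a per-code-point width function. It is not widecharwidth's generated code; it is a simplified stand-in built on Python's `unicodedata` module, and the function names are made up for this example.

```python
import unicodedata


def naive_wcwidth(ch: str) -> int:
    """Simplified per-code-point width: 0, 1 or 2 cells."""
    # Combining marks and format characters (such as ZERO WIDTH JOINER)
    # take no cell of their own.
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


def naive_wcswidth(text: str) -> int:
    """Sum of per-code-point widths, with no knowledge of joined sequences."""
    return sum(naive_wcwidth(ch) for ch in text)


# MAN + ZWJ + WOMAN + ZWJ + GIRL: one "family" emoji when the font joins it.
FAMILY = "\U0001F468\u200D\U0001F469\u200D\U0001F467"

print(naive_wcswidth(FAMILY))
```

With the Unicode data bundled in recent Python 3 releases this sums to 6 (three wide emoji plus two zero-width joiners), while a terminal that renders the sequence as a single glyph would typically want 2 cells. Any approach that looks at one code point at a time has this gap.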

  • Windows Terminal

    The new Windows Terminal and the original Windows console host, all in the same place!

  • Emoji support is currently the top requested feature for Windows Terminal.

    https://github.com/microsoft/terminal/issues?q=is%3Aissue+is...

  • unicode-properties

    Provides fast access to unicode character properties

