The complete guide to working with strings in modern JavaScript

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • grapheme-splitter

    A JavaScript library that breaks strings into their individual user-perceived characters.

  • Exactly, and emoji are outside the BMP, so this is less an edge case than the norm: two UTF-16 code units (a surrogate pair, 16 bits each) are used to make one code point (Unicode character).

    And it gets even worse when you consider that for many purposes you're not even interested in code points but in graphemes. For example, a single visible emoji might actually be a combination of 5 code points, represented by 8 UTF-16 code units, taking up 16 bytes.

    If you want to split a string by graphemes, you can either use the main dedicated library for it [3], or the relatively new Intl.Segmenter API [4], which at the time of the comment was available in Chrome and Safari but had not yet made it to Firefox [5].

    [1] https://blog.jonnew.com/posts/poo-dot-length-equals-two

    [2] https://www.contentful.com/blog/2016/12/06/unicode-javascrip...

    [3] https://github.com/orling/grapheme-splitter

    [4] https://github.com/tc39/proposal-intl-segmenter

    [5] https://bugzilla.mozilla.org/show_bug.cgi?id=1423593
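
The distinction between code units, code points, and graphemes can be checked directly. The following is a minimal sketch, assuming a runtime that provides TextEncoder and, for the grapheme count, Intl.Segmenter; the helper name countGraphemes and the choice of sample emoji are illustrative only and do not come from the original comment.

```javascript
// A family emoji: three person emoji joined by two zero-width joiners (U+200D).
// Written with escapes so the exact code points are visible.
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}";

// UTF-16 code units: this is what String.prototype.length reports.
const codeUnits = family.length;

// Code points: string iteration walks code points, not code units.
const codePoints = [...family].length;

// Bytes in UTF-16 (two per code unit) versus bytes in UTF-8.
const utf16Bytes = codeUnits * 2;
const utf8Bytes = new TextEncoder().encode(family).length;

// Hypothetical helper: counts user-perceived characters with Intl.Segmenter.
// Returns undefined when the API is missing, so a caller could fall back
// to a library such as grapheme-splitter.
function countGraphemes(text) {
  if (typeof Intl === "undefined" || typeof Intl.Segmenter !== "function") {
    return undefined;
  }
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return [...segmenter.segment(text)].length;
}

console.log({ codeUnits, codePoints, utf16Bytes, utf8Bytes });
console.log("graphemes:", countGraphemes(family));
```

For this sample the string has 8 code units and 5 code points, which is 16 bytes in UTF-16 and 18 bytes in UTF-8, while a grapheme-aware segmenter should report a single user-perceived character.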

  • proposal-intl-segmenter

    Unicode text segmentation for ECMAScript

  • .NET Runtime

    .NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.

  • It's latin1. The same is true of DOM strings in Chromium, like attributes, blocks of text, and inline scripts.

    WebKit and the JDK implement the same string optimization, while .NET unfortunately doesn't: https://github.com/dotnet/runtime/issues/6612
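
As a rough illustration of what "latin1" means here: a string can be stored with one byte per character only if every UTF-16 code unit is at most U+00FF. The sketch below assumes that condition; fitsLatin1 is a hypothetical helper, not an engine API, and JavaScript itself does not expose which internal representation an engine actually chose.

```javascript
// Hypothetical helper: true when every UTF-16 code unit of the string
// fits in a single byte (U+0000 through U+00FF), i.e. the Latin-1 range.
function fitsLatin1(text) {
  for (let i = 0; i < text.length; i++) {
    if (text.charCodeAt(i) > 0xff) {
      return false;
    }
  }
  return true;
}

console.log(fitsLatin1("caf\u00E9"));      // é is U+00E9, inside Latin-1
console.log(fitsLatin1("price: \u20AC5")); // € is U+20AC, outside Latin-1
console.log(fitsLatin1("\u{1F4A9}"));      // surrogate pair, outside Latin-1
```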

  • zapatos

    Zero-abstraction Postgres for TypeScript: a non-ORM database library

  • I’m surprised to see no mention of tagged templates, a much more powerful variant of template literals. To users they may look like a function call without parentheses, but they do quite a bit more.

    Short version: the tag function receives an array of the literal's string segments, followed by a variadic list of arguments holding the runtime values of the template expressions. Each value belongs right after the string segment at the same index.

    That array is more than it seems: it also carries a raw property holding the same segments with escape sequences left unprocessed. String.raw (also not mentioned) uses it to assemble the string the way an untagged template literal would, except that escape sequences are left untouched.

    It’s an odd design/interface but it can be used to do some pretty cool stuff. For example, Zapatos [1], a type-safe SQL library for TypeScript.

    My only complaints:

    1. I can’t think of a real reason for it to be variadic, and this makes authoring them a little more error prone. You should be able to expect one array of strings with a length N, and one array of (type checkable/inferrable) values with a length N-1.

    2. Likewise I can’t think of a real reason for the raw values to be bolted onto the strings array as an extra property. It could just as easily have been an iterable third argument.

    [1] https://github.com/jawj/zapatos
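
The calling convention described above can be sketched as follows. Both tag functions, inspect and sql, are hypothetical names invented for illustration; in particular, sql is not the Zapatos API, only a toy showing how a tag might turn interpolated values into numbered placeholders.

```javascript
// Hypothetical tag that records exactly what a tag function receives:
// the processed string segments, their raw counterparts, and the values.
function inspect(strings, ...values) {
  return { cooked: [...strings], raw: [...strings.raw], values };
}

const name = "world";
const seen = inspect`hello\n${name}!`;
// seen.cooked[0] contains a real newline; seen.raw[0] keeps the backslash and "n".
console.log(seen);

// String.raw is a built-in tag that joins the raw segments with the values.
const rawText = String.raw`a\nb${1 + 1}`;
console.log(rawText); // the backslash survives, the expression is still evaluated

// Hypothetical tag that builds a parameterized query: N string segments
// are interleaved with N-1 numbered placeholders, and the values are
// returned separately instead of being spliced into the text.
function sql(strings, ...values) {
  let text = strings[0];
  for (let i = 0; i < values.length; i++) {
    text += "$" + (i + 1) + strings[i + 1];
  }
  return { text, values };
}

const id = 42;
const query = sql`SELECT * FROM users WHERE id = ${id} AND name = ${name}`;
console.log(query.text);
console.log(query.values);
```

Keeping the values out of the query text is the main design point of the second tag: the strings array always has exactly one more entry than the values, so the loop can pair each value with the segment that follows it.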
