The complete guide to working with strings in modern JavaScript

grapheme-splitter

4 894 0.0 JavaScript

A JavaScript library that breaks strings into their individual user-perceived characters.

Exactly, and emoji are outside the BMP, so it's not an edge case but the norm that two code units (16 bits each in UTF-16) are used to make one code point (Unicode character).
And it gets even worse when you consider that for many purposes you're not even interested in code points but in graphemes, e.g. a single visible emoji might actually be a combination of 5 code points, represented by 8 UTF-16 code units, taking up 16 bytes.
If you want to split a string by graphemes, you can either use the main dedicated library for it [3], or the relatively new API Intl.Segmenter [4], which is in Chrome and Safari but still hasn't made it to Firefox [5].
[1] https://blog.jonnew.com/posts/poo-dot-length-equals-two
[2] https://www.contentful.com/blog/2016/12/06/unicode-javascrip...
[3] https://github.com/orling/grapheme-splitter
[4] https://github.com/tc39/proposal-intl-segmenter
[5] https://bugzilla.mozilla.org/show_bug.cgi?id=1423593
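
To make the three levels concrete, here is a minimal sketch that counts code units, code points, and graphemes for one emoji. The sample string (a man, woman, girl family joined with zero-width joiners) and the helper name countGraphemes are illustrative choices, not taken from the comment above. The guard assumes a runtime may lack Intl.Segmenter, in which case the helper returns undefined.

```javascript
// Family emoji: man + ZWJ + woman + ZWJ + girl (5 code points).
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}";

// .length counts UTF-16 code units: 2 + 1 + 2 + 1 + 2.
const codeUnits = family.length;

// The string iterator walks code points, so spreading yields one item each.
const codePoints = [...family].length;

// Hypothetical helper: counts user-perceived characters with Intl.Segmenter.
// Returns undefined when the runtime does not provide the API.
function countGraphemes(text) {
  if (typeof Intl === "undefined" || typeof Intl.Segmenter !== "function") {
    return undefined; // a caller could fall back to a library here
  }
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return [...segmenter.segment(text)].length;
}

console.log(codeUnits);  // 8
console.log(codePoints); // 5
console.log(countGraphemes(family));
```

At 2 bytes per UTF-16 code unit, the 8 code units account for the 16 bytes mentioned above.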

proposal-intl-segmenter

5 145 0.0 HTML

Unicode text segmentation for ECMAScript


.NET Runtime

607 14,091 10.0 C#

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.

It's Latin-1. The same is true of DOM strings in Chromium, such as attributes, blocks of text, and inline scripts.
WebKit and the JDK implement the same string optimization, while .NET unfortunately doesn't: https://github.com/dotnet/runtime/issues/6612
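
As a rough illustration of when a one-byte representation is even possible, the sketch below checks whether every UTF-16 code unit of a string fits in the Latin-1 range. The function name fitsLatin1 is hypothetical, and the check only mirrors the precondition; how an engine actually chooses its internal storage is not observable from JavaScript.

```javascript
// Hypothetical helper: true when every UTF-16 code unit is <= 0xFF,
// i.e. the text could be stored with one byte per character.
function fitsLatin1(text) {
  for (let i = 0; i < text.length; i++) {
    if (text.charCodeAt(i) > 0xff) return false;
  }
  return true;
}

console.log(fitsLatin1("caf\u00E9"));  // true: U+00E9 is within Latin-1
console.log(fitsLatin1("\u20AC5"));    // false: U+20AC (euro sign) is not
console.log(fitsLatin1("\u{1F600}"));  // false: surrogate pairs are above 0xFF
```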

zapatos

4 1,217 7.3 TypeScript

Zero-abstraction Postgres for TypeScript: a non-ORM database library

I’m surprised to see no mention of tagged templates, a much more powerful form of template literals. To users they may look like a function call without parentheses, but they do quite a bit more.
Short version: the tag function receives an array of the literal string chunks, followed by a variadic set of arguments for the runtime values supplied in the template positions, each value following the string chunk that precedes it.
That array is more than it seems: it also carries a raw property holding the same string chunks with their escape sequences left unprocessed. This is what String.raw (also not mentioned) uses to assemble its result essentially the same way an untagged template literal would, only without interpreting escapes.
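
The shape of those arguments is easier to see in a small sketch. The tag function inspect below is hypothetical and only reports what it receives; String.raw is the built-in tag mentioned above.

```javascript
// Hypothetical tag function that just reports its arguments.
function inspect(strings, ...values) {
  return {
    cooked: [...strings],    // string chunks with escape sequences processed
    raw: [...strings.raw],   // the same chunks exactly as typed in the source
    values,                  // interpolated values, in order
  };
}

const name = "world";
const result = inspect`hello\n${name}!`;

console.log(result.cooked); // [ 'hello\n', '!' ]  (a real newline)
console.log(result.raw);    // [ 'hello\\n', '!' ] (backslash followed by n)
console.log(result.values); // [ 'world' ]

// String.raw joins the raw chunks with the values, leaving escapes untouched.
const path = String.raw`C:\temp\new-${name}`;
console.log(path); // C:\temp\new-world
```

There is always one more string chunk than there are values, which is why the last chunk here is "!" and would be an empty string if the template ended with an interpolation.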
It’s an odd design/interface but it can be used to do some pretty cool stuff. For example, Zapatos[1], a type-safe SQL library for TypeScript.
My only complaints:
1. I can’t think of a real reason for it to be variadic, and this makes authoring them a little more error-prone. You should be able to expect one array of strings with a length N, and one array of (type-checkable/inferrable) values with a length N-1.
2. Likewise, I can’t think of a real reason for the raw values to be bolted onto a weird array subclass. It could just as easily have been an iterable third argument.
1: https://github.com/jawj/zapatos
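
For a sense of how a tag can be put to work on SQL, here is a minimal sketch of a tag that builds a parameterized query. The sql function and its { text, values } return shape are assumptions made for illustration; this is not the Zapatos API. Rest parameters collect the variadic values into a single array, which gives the N strings and N-1 values layout described above.

```javascript
// Hypothetical `sql` tag (not the Zapatos API): replaces each interpolation
// with a numbered placeholder and returns the values separately.
function sql(strings, ...values) {
  // reduce without an initial value starts from strings[0], so the
  // callback first runs with i = 1 and emits $1, then $2, and so on.
  const text = strings.reduce((query, chunk, i) => query + "$" + i + chunk);
  return { text, values };
}

const userId = 42;
const status = "active";
const query = sql`SELECT * FROM users WHERE id = ${userId} AND status = ${status}`;

console.log(query.text);   // SELECT * FROM users WHERE id = $1 AND status = $2
console.log(query.values); // [ 42, 'active' ]
```

Because the values never get concatenated into the query text, a driver that accepts placeholders can send them separately from the SQL.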

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.
