proposal-intl-segmenter vs compressed-emoji-shortcodes
| | proposal-intl-segmenter | compressed-emoji-shortcodes |
|---|---|---|
| Mentions | 5 | 3 |
| Stars | 145 | 16 |
| Growth | 0.7% | - |
| Activity | 0.0 | 4.3 |
| Latest commit | over 2 years ago | over 3 years ago |
| Language | HTML | Rust |
| License | - | - |
Stars: the number of stars a project has on GitHub. Growth: month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
proposal-intl-segmenter
-
String encodings
Splitting by grapheme clusters (the characters the user actually sees): JS doesn't support this natively, so you'll need a library like grapheme-splitter. There's a Stage-4 proposal in the works, though: Intl.Segmenter.
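As a minimal sketch of the Intl.Segmenter route, the snippet below splits a string into grapheme clusters. It assumes a runtime that ships `Intl.Segmenter` (recent Node.js or browsers); the helper name `splitGraphemes` and the sample string are illustrative choices, not part of the proposal.

```javascript
// Split a string into user-perceived characters (extended grapheme clusters).
// splitGraphemes is a hypothetical helper name used only for this example.
function splitGraphemes(text, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity: "grapheme" });
  // segment() returns an iterable of { segment, index, input } objects.
  return Array.from(segmenter.segment(text), (part) => part.segment);
}

// "é" written as e + U+0301 COMBINING ACUTE ACCENT, then the Japan flag
// (two regional indicator symbols).
const sample = "e\u0301\u{1F1EF}\u{1F1F5}";
console.log(sample.length); // 6 UTF-16 code units
console.log([...sample].length); // 4 code points
console.log(splitGraphemes(sample).length); // 2 graphemes
```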
-
Updates from the 86th meeting of TC39
Intl.Segmenter: Unicode segmentation in JavaScript slides.
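Besides graphemes, Intl.Segmenter also segments by word and sentence. A small sketch of word extraction follows; the helper name `words` is hypothetical, and it relies on the `isWordLike` flag that word segments carry to drop spaces and punctuation.

```javascript
// Extract words using the "word" granularity. Each segment has an isWordLike
// flag, which is false for whitespace and punctuation segments.
function words(text, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  return Array.from(segmenter.segment(text))
    .filter((part) => part.isWordLike)
    .map((part) => part.segment);
}

console.log(words("Hello, world! How are you?", "en"));
// [ 'Hello', 'world', 'How', 'are', 'you' ]
```

For languages written without spaces, such as Japanese or Thai, the same call relies on the engine's dictionary-based word breaking, which is where a built-in segmenter pays off over a simple split on spaces.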
-
Is there no .reverse() method for a string like there is for an array?
But even that's not bulletproof. The best method is to divide the string into grapheme clusters before reversing, which is where Intl.Segmenter comes in.
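To illustrate why reversing needs grapheme clusters, here is a sketch comparing three approaches; the function names are made up for this example. Reversing code units breaks surrogate pairs, reversing code points breaks combining sequences, and reversing segments from Intl.Segmenter keeps each visible character intact.

```javascript
// Naive: reverses UTF-16 code units, so surrogate pairs get split apart.
function reverseCodeUnits(str) {
  return str.split("").reverse().join("");
}

// Better: spreading a string iterates by code point, keeping surrogate pairs
// together, but combining marks and ZWJ sequences are still pulled apart.
function reverseCodePoints(str) {
  return [...str].reverse().join("");
}

// Best: reverse whole grapheme clusters found by Intl.Segmenter.
function reverseGraphemes(str) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return Array.from(segmenter.segment(str), (part) => part.segment)
    .reverse()
    .join("");
}

const word = "ae\u0301"; // "aé", with é written as e + U+0301
console.log(reverseCodePoints(word) === "\u0301ea"); // true: the accent is now stranded at the start
console.log(reverseGraphemes(word) === "e\u0301a"); // true: "éa"
```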
-
The complete guide to working with strings in modern JavaScript
Exactly, and emoji are outside the BMP, so this isn't an edge case but the norm: two code units (UTF-16 double-bytes) are used to make one code point (Unicode character).
And it gets even worse when you consider that for many purposes you're not interested in code points at all, but in graphemes: e.g. a single visible emoji might actually be a combination of 5 code points, represented by 8 UTF-16 code units, taking up 16 bytes.
If you want to split a string by graphemes, you can either use the main dedicated library for it [3], or the relatively new API Intl.Segmenter [4], which is in Chrome and Safari but still hasn't made it to Firefox [5].
[1] https://blog.jonnew.com/posts/poo-dot-length-equals-two
[2] https://www.contentful.com/blog/2016/12/06/unicode-javascrip...
[3] https://github.com/orling/grapheme-splitter
[4] https://github.com/tc39/proposal-intl-segmenter
[5] https://bugzilla.mozilla.org/show_bug.cgi?id=1423593
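The counts in the comment above can be checked with a concrete emoji. The sketch below assumes the "family: man, woman, girl" ZWJ sequence as the example (our choice, not the commenter's). It is 5 code points and 8 UTF-16 code units (16 bytes), while the same string takes 18 bytes in UTF-8 and is still one grapheme.

```javascript
// "Family: man, woman, girl" as a ZWJ sequence:
// U+1F468 U+200D U+1F469 U+200D U+1F467
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}";

const utf16Units = family.length; // String length counts UTF-16 code units
const codePoints = [...family].length; // the string iterator yields code points
const utf8Bytes = new TextEncoder().encode(family).length; // TextEncoder always encodes UTF-8
const graphemes = [
  ...new Intl.Segmenter(undefined, { granularity: "grapheme" }).segment(family),
].length;

console.log({ utf16Units, utf16Bytes: utf16Units * 2, codePoints, utf8Bytes, graphemes });
// { utf16Units: 8, utf16Bytes: 16, codePoints: 5, utf8Bytes: 18, graphemes: 1 }
```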
-
Emoji under the hood
Also potentially (but not in practice so far) locale-specific. See the FAQ on JavaScript's implementation: https://github.com/tc39/proposal-intl-segmenter#why-should-we-pass-a-locale-and-options-bag-for-grapheme-boundaries-isnt-there-just-one-way-to-do-it
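A rough sketch of what the locale and options bag look like in practice: the constructor takes a locale and `{ granularity }`, and `resolvedOptions()` reports what was chosen (granularity defaults to `"grapheme"`). The locales and sample string below are arbitrary picks; for this input the grapheme count is the same whichever locale is passed, consistent with "not in practice so far".

```javascript
// Count grapheme clusters for a given locale.
function graphemeCount(text, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity: "grapheme" });
  return [...segmenter.segment(text)].length;
}

const mixed = "e\u0301\u{1F1EF}\u{1F1F5}"; // e + combining acute, then the Japan flag
for (const locale of ["en", "tr", "ja", "th"]) {
  console.log(locale, graphemeCount(mixed, locale)); // 2 for each locale
}

// When no options bag is passed, granularity defaults to "grapheme".
const resolved = new Intl.Segmenter("en").resolvedOptions();
console.log(resolved.granularity); // "grapheme"
```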
compressed-emoji-shortcodes
- Emoji under the hood
-
Fast case conversion or how to really compress sparse arrays
I recently did something similar with mapping emoji shortcodes to emoji, where I used a similar bag-of-tricks approach to get a really tight compression ratio.
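One common trick for compressing sparse arrays, shown here only as an illustration and not as the method of the linked article or of compressed-emoji-shortcodes, is a two-level table. The array is cut into fixed-size blocks, identical blocks are stored once, and a small index maps each block number to its stored copy. The function names and the 64-entry block size are assumptions for this sketch.

```javascript
// Hypothetical sketch of a two-level table with block deduplication.
function compressTwoLevel(values, blockBits = 6) {
  const blockSize = 1 << blockBits;
  const index = []; // one entry per block: offset of that block's contents in `data`
  const data = []; // concatenated contents of the distinct blocks
  const seen = new Map(); // block contents (joined as a string) -> offset in `data`
  for (let start = 0; start < values.length; start += blockSize) {
    const block = [];
    for (let i = 0; i < blockSize; i++) block.push(values[start + i] ?? 0);
    const key = block.join(",");
    let offset = seen.get(key);
    if (offset === undefined) {
      // First time this block appears: append it and remember where it lives.
      offset = data.length;
      data.push(...block);
      seen.set(key, offset);
    }
    index.push(offset);
  }
  return { index, data, blockBits };
}

// Look up values[i]: the high bits pick a block, the low bits pick a slot in it.
function lookupTwoLevel(table, i) {
  const offset = table.index[i >> table.blockBits];
  if (offset === undefined) return 0; // past the end of the original array
  return table.data[offset + (i & ((1 << table.blockBits) - 1))];
}

// Example: a 4096-entry "lowercase delta" array that is non-zero only for A-Z.
const caseDelta = new Array(4096).fill(0);
for (let c = 0x41; c <= 0x5a; c++) caseDelta[c] = 0x20;
const caseTable = compressTwoLevel(caseDelta);
console.log(caseTable.index.length, caseTable.data.length); // 64 128
console.log(lookupTwoLevel(caseTable, 0x42)); // 32
```

Here 4096 entries shrink to a 64-entry index plus 128 stored values, because all 63 all-zero blocks share a single copy.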
-
A Quest to Find a Highly Compressed Emoji :shortcode: Lookup Function
And that one of the crates is called kowalski-analysis
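For orientation, a plain (uncompressed) shortcode lookup might look like the sketch below. It uses a sorted table with binary search and expands `:shortcode:` markers in text. The four-entry table, the function names, and the shortcode character set in the regex are hypothetical; the compressed-emoji-shortcodes crate uses its own encoding, and perfect hashing (as in rust-phf, listed below) is another option.

```javascript
// Hypothetical, tiny shortcode table; entries must be sorted by shortcode
// so that binary search works.
const SHORTCODES = [
  ["grinning", "\u{1F600}"],
  ["heart", "\u2764\uFE0F"],
  ["rocket", "\u{1F680}"],
  ["thumbsup", "\u{1F44D}"],
];

// Binary search over the sorted table; returns undefined for unknown names.
function lookupShortcode(name) {
  let lo = 0;
  let hi = SHORTCODES.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const [key, emoji] = SHORTCODES[mid];
    if (key === name) return emoji;
    if (key < name) lo = mid + 1;
    else hi = mid - 1;
  }
  return undefined;
}

// Replace every :shortcode: with its emoji, leaving unknown ones untouched.
function expandShortcodes(text) {
  return text.replace(/:([a-z0-9_+-]+):/g, (match, name) => lookupShortcode(name) ?? match);
}

console.log(expandShortcodes("ship it :rocket: :not_a_code:"));
```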
What are some alternatives?
grapheme-splitter - A JavaScript library that breaks strings into their individual user-perceived characters.
rust-phf - Compile time static maps for Rust
.NET Runtime - .NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
proposal-source-phase-imports - Proposal to enable importing modules at the source phase
zapatos - Zero-abstraction Postgres for TypeScript: a non-ORM database library
proposal-call-this - A proposal for a simple call-this operator in JavaScript.
proposal-error-cause - TC39 proposal for accumulating errors
proposal-destructuring-private - A proposal integrate private fields and destructuring
proposal-regexp-r-escape - Regular Expression `\R` Escape for ECMAScript
proposal-string-cooked - ECMAScript proposal for String.cooked built-in template tag
proposal-array-grouping - A proposal to make grouping of array items easier