grapheme-splitter-lite VS janet-utf8

Compare grapheme-splitter-lite and janet-utf8 to see how they differ.

grapheme-splitter-lite

A light-weight Java library that breaks strings into user-perceived characters a.k.a. Grapheme Clusters for common cases. (by hiking93)

janet-utf8

Janet routines for utf8 handling (by andrewchambers)
               grapheme-splitter-lite    janet-utf8
Mentions       1                         1
Stars          6                         16
Growth         -                         -
Activity       10.0                      10.0
Last commit    almost 3 years ago        over 2 years ago
Language       Kotlin                    C
License        Apache License 2.0        MIT License
The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

grapheme-splitter-lite

Posts with mentions or reviews of grapheme-splitter-lite. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-10-02.

janet-utf8

Posts with mentions or reviews of janet-utf8. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-10-02.
  • The Absolute Minimum Every Software Developer Must Know About Unicode in 2023
    7 projects | news.ycombinator.com | 2 Oct 2023
    Regarding UTF-8 encoding:

    “And a couple of important consequences:

    - You CAN’T determine the length of the string by counting bytes.

    - You CAN’T randomly jump into the middle of the string and start reading.

    - You CAN’T get a substring by cutting at arbitrary byte offsets. You might cut off part of the character.”

    One of the things I had to get used to when learning the programming language Janet is that strings are just plain byte sequences, unaware of any encoding. So when I call `length` on a string of one character that is represented by 2 bytes in UTF-8 (e.g. `ä`), the function returns 2 instead of 1. Similar issues occur when trying to take a substring, as mentioned by the author.
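The byte-versus-character mismatch described above can be reproduced outside Janet. The following is a minimal sketch in Python, used here only for illustration; the helper names `byte_length` and `cut_is_valid` are hypothetical and do not come from janet-utf8 or any other library mentioned on this page.

```python
def byte_length(text: str) -> int:
    """Return the number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def cut_is_valid(data: bytes, end: int) -> bool:
    """Return True if data[:end] is still well-formed UTF-8 after the cut."""
    try:
        data[:end].decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    a_umlaut = "\u00e4"  # the single code point U+00E4 (a with diaeresis)
    print(len(a_umlaut), byte_length(a_umlaut))

    # 'h', then the two bytes of U+00E4, then 'h': four bytes in total.
    data = ("h" + a_umlaut + "h").encode("utf-8")
    print(cut_is_valid(data, 2))  # cut lands between the two bytes of U+00E4
    print(cut_is_valid(data, 3))  # cut lands on a character boundary
```

A byte-oriented `length`, like the one the comment describes for Janet, corresponds to `byte_length` here, while Python's own `len` on a `str` counts code points.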

    As much as I love the approach Janet took here (it feels clean and simple and works well with their built-in PEGs), it is a bit annoying to work with outside of the ASCII range. Fortunately, there are libraries that can deal with this issue (e.g. https://github.com/andrewchambers/janet-utf8), but I wish they would support conversion to/from UTF-8 out of the box, since I generally like Janet very much.

    One interesting thing I learned from the article is that the first byte of a character can always be identified by its bit prefix. I always wondered how you would recognize and separate a Unicode character in a Janet string, since it may be 1 to 4 bytes long, but I guess this is the answer.
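The prefix rule mentioned in the comment can be sketched as follows. This is an illustrative Python example, not the janet-utf8 implementation; the function names `sequence_length` and `split_code_points` are hypothetical. It assumes well-formed UTF-8 input and does not validate continuation bytes, overlong encodings, or truncated sequences.

```python
def sequence_length(lead: int) -> int:
    """Return the UTF-8 sequence length implied by the lead byte's bit prefix."""
    if lead & 0x80 == 0x00:  # 0xxxxxxx: single-byte (ASCII) character
        return 1
    if lead & 0xE0 == 0xC0:  # 110xxxxx: start of a 2-byte sequence
        return 2
    if lead & 0xF0 == 0xE0:  # 1110xxxx: start of a 3-byte sequence
        return 3
    if lead & 0xF8 == 0xF0:  # 11110xxx: start of a 4-byte sequence
        return 4
    # 10xxxxxx is a continuation byte and cannot start a character.
    raise ValueError(f"not a UTF-8 lead byte: {lead:#04x}")


def split_code_points(data: bytes) -> list:
    """Split well-formed UTF-8 bytes into one bytes chunk per code point."""
    chunks = []
    i = 0
    while i < len(data):
        n = sequence_length(data[i])
        chunks.append(data[i:i + n])
        i += n
    return chunks


if __name__ == "__main__":
    sample = "a\u00e4\u20ac\U0001F600".encode("utf-8")
    for chunk in split_code_points(sample):
        print(len(chunk), chunk)
```

Note that this separates code points, not grapheme clusters: a user-perceived character made of several code points would still come out as several chunks.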

What are some alternatives?

When comparing grapheme-splitter-lite and janet-utf8 you can also consider the following projects:

tonsky.me

text - A spicy text library for C++ that has the explicit goal of enabling the entire ecosystem to share in proper forward progress towards a bright Unicode future.

xi-editor - A modern editor with a backend written in Rust.

hn-search - Hacker News Search