icu4x vs UNIC
Metric | icu4x | UNIC
---|---|---
Mentions | 25 | 4
Stars | 1,252 | 234
Growth | 1.3% | 1.3%
Activity | 9.8 | 0.0
Latest commit | 3 days ago | 8 months ago
Language | Rust | Rust
License | GNU General Public License v3.0 or later | GNU General Public License v3.0 or later
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
icu4x
- Any new Opensource projects in (rust) looking for contributors. I want to start my journey as an OSS contributor.
  ICU4X has a large priority backlog of "issues that the team wants to definitely see fixed, but which currently lack resourcing."
- icu4x: pure rust implementation of the unicode ICU library
- Self-referential types for fun and profit
  this also (probably) means it's safe from LLVM-noalias unsoundness, though it still runs into the same Rust-level unsoundness
- ICU4X: Solving Internationalization for Clients and Limited Environments
- uni-algo v0.5.0: Modern Unicode Library
  Actually, the Rust version also offers multiple modes, see comparison.rs:
- Announcing ICU4X 1.0 – New Internationalization Library from Unicode
  It's generated from https://github.com/unicode-org/icu4x/blob/main/provider/datagen/data/segmenter/dictionary_cj.toml, which in turn comes from ICU4C.
- The Unicode Consortium announces ICU4X 1.0, its new high-performance internationalization library. It's written in Rust, with official C++ and JavaScript wrappers available.
  The code can be seen at https://github.com/unicode-org/icu4x. I count 193 uses of unsafe, though not all are the keyword, and some are in tests.
- icu4x: Can we have `rustc_layout_scalar_valid_range_end` on stable. Lang team: You have `rustc_layout_scalar_valid_range_end` on stable. `rustc_layout_scalar_valid_range_end` on stable:
- Not a Yoking Matter (Zero-Copy #1)
  We've got an issue filed about noalias UB in Yoke.
- Chinese numerals are not recognized by char::is_numeric
  As a reference for his expertise: he's part of the team that develops https://github.com/unicode-org/icu4x
UNIC
- I'm 15 ETH Away from Making the Unicode Character Database (UCD) Available on Rinkeby Testnet
  For reference, here is an equivalent library in Rust: https://github.com/open-i18n/rust-unic/
- icu vs rust_icu
  There is also rust-unic which provides both normalization and access to the character database. I have also used this because of their text segmentation support, and I would probably recommend rust-unic in general. I hope to see more progress on that front.
- Ć Programming Language
  I try to be mindful of making my software as accessible as possible, but the following
  > creating a lookup table for all the unicode material out there might've been considered impractical or performance-hitting for the developers.
  just doesn't ring true to me in any way for current software. I understand that people can be using older software, which is why I strive to restrict myself to ASCII as much as possible for the widest possible support for my users, but my software also supports unicode identifiers, up to and including a whole unicode table to talk about confusables[1]. And not all TTS software "ignores" characters, which is why people advise against using 𝑓𝑎𝑛𝑐𝑦 unicode because it doesn't get read as text but instead each character is described individually. (This is also something that TTS software should support for their users' sake, but I digress.)
  [1]: this is thanks to the crate unic-ucd containing this information: https://github.com/open-i18n/rust-unic
- Unicode sorting is hard & why browsers added special emoji matching to regexp
  Regarding https://github.com/open-i18n/rust-unic, could it be that the project was superseded by https://github.com/unicode-org/icu4x?
What are some alternatives?
Fluent - Rust implementation of Project Fluent
I18N - I18N Library for .NET, and Delphi
regex - An implementation of regular expressions for Rust. This implementation uses finite automata and guarantees linear time matching on all inputs.
encoding_rs - A Gecko-oriented implementation of the Encoding Standard in Rust
textwrap - An efficient and powerful Rust library for word wrapping text.
cldr - The home of the Unicode Common Locale Data Repository
whatlang-rs - Natural language detection library for Rust. Try demo online: https://whatlang.org/
rust_icu - rust_icu: rust bindings for ICU (International Components for Unicode) library
cpc - Text calculator with support for units and conversion
verona - Research programming language for concurrent ownership
datamatrix-fu - Data Matrix barcodes in the Fusion programming language