ziglyph
utfcpp
Metric | ziglyph | utfcpp
---|---|---
Mentions | 5 | 3
Stars | 207 | 1,431
Growth | - | -
Activity | 6.7 | 7.3
Last commit | 7 months ago | 4 months ago
Language | Zig | C++
License | MIT License | Boost Software License 1.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
ziglyph
- What are your favorite utility libraries?
- Failing to Learn Zig via Advent of Code
> My big problem with Zig is that Andrew Kelley is promising a lot of features, but doesn't really deliver much.
Have you, like, seen the release notes for 0.9.0?
https://ziglang.org/download/0.9.0/release-notes.html
> Zig still can't properly handle UTF-8 strings [1] in 2022
There's plenty of discussion on the subject in basically every HN thread about Zig: the stdlib has some utf8 and wtf validation code, and ziglyph implements the full Unicode spec.
https://github.com/jecolon/ziglyph
You might not like how it's done, but it's factually incorrect to state that Zig can't handle Unicode.
> In a recent interview[2], he claims that Zig is faster than C and Rust, but he refers to extremely short benchmarking that has almost no value in the real world.
From my reddit reply to this same topic:
This podcast interview might not be the best showcase of the practical implications of Zig's take on safety and performance. If you want something with more meat, I highly recommend Andrew's recent talk from Handmade Seattle, where he shows the work being done on the Zig self-hosted compiler.
https://media.handmade-seattle.com/practical-data-oriented-d...
Lots of bit fiddling that can't be fully proven safe statically, but then you get a compiler capable of compiling Zig code stupidly fast, and that's even without factoring in incremental compilation with in-place binary patching, with which we're aiming for sub-millisecond rebuilds of arbitrarily large projects.
> The ecosystem for zig is insignificant now and a stable release would help the language.
I hope you don't mind if we don't take this advice, given the overall tone of your post.
- Resizable string in Zig?
For Unicode text processing you can take a look at Ziglyph https://github.com/jecolon/ziglyph and for a sample UTF-8 string structure, Zigstr https://github.com/jecolon/zigstr . (bias alert: I'm the author of both. :^D )
- Maintain It with Zig
Agreed, and Zig also has a lib for that as well:
https://github.com/jecolon/ziglyph/
- Unicode data file compression: achieving 40-70% reduction over gzip alone
Yes, sorry about that - I omitted a bit of that information for brevity.
If you want to play with allkeys.txt (which is far more sequential, simpler data than UnicodeData.txt), then you only need to remove the non-NFD strings, since the Unicode Collation Algorithm's first step requires you to decompose the string's code points to canonical NFD form. That removes ~2,000 entries.
The full file parser code, which strips those out and other useless information like comments and version information can be found here: https://github.com/jecolon/ziglyph/blob/main/src/collator/Al...
If you want to play around with UnicodeData.txt (which is less sequential, more complex data), then only two fields are used (the code point and the decomposition field), and only for records where the decomposition field is not empty. The full decomposition type name in angle brackets is not needed; all that matters is whether it is present.
The full parser code for that file can be found here: https://github.com/jecolon/ziglyph/blob/main/src/normalizer/...
Hope that helps!
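The UnicodeData.txt filtering described above could be sketched roughly as follows. This is an illustrative C++ sketch, not the ziglyph parser: the `Decomposition` struct and `parse_line` function are hypothetical names, and it assumes the standard semicolon-separated layout in which the decomposition is field index 5.

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record type for this sketch.
struct Decomposition {
    std::uint32_t code_point = 0;
    bool has_type_tag = false;           // true if a <tag> preceded the mapping
    std::vector<std::uint32_t> mapping;  // code points it decomposes to
};

// Parses one UnicodeData.txt line. Returns false (record skipped) when the
// decomposition field is empty; otherwise fills `out` and returns true.
bool parse_line(const std::string& line, Decomposition& out) {
    std::vector<std::string> fields;
    std::stringstream ss(line);
    std::string field;
    while (std::getline(ss, field, ';')) {
        fields.push_back(field);
    }
    if (fields.size() < 6 || fields[5].empty()) {
        return false;
    }

    out = Decomposition{};
    out.code_point =
        static_cast<std::uint32_t>(std::stoul(fields[0], nullptr, 16));

    std::stringstream parts(fields[5]);
    std::string token;
    while (parts >> token) {
        if (token[0] == '<') {
            // Only the presence of the tag is recorded, not its name.
            out.has_type_tag = true;
            continue;
        }
        out.mapping.push_back(
            static_cast<std::uint32_t>(std::stoul(token, nullptr, 16)));
    }
    return true;
}
```

Records without a decomposition are dropped early, so only the code point, the tag flag, and the mapping survive to whatever compression step follows.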
utfcpp
- Current utf8 support options.
std::string is simply a string of bytes, so it can already contain UTF-8 encoded text. The only problems arise when you want to interact with OS (Windows) APIs and other library APIs that don't expect UTF-8, or when you need to count the number of characters, etc. For that you can look into existing libraries, e.g. the official Unicode ICU, or whatever you can find that others have made, e.g.: https://github.com/nemtrif/utfcpp
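To illustrate why byte length and character count differ, here is a minimal sketch that counts code points in a `std::string`. The function name `count_code_points` is made up for this example, it assumes the input is already valid UTF-8, and it counts code points rather than user-perceived characters (grapheme clusters), which need a full Unicode library.

```cpp
#include <cstddef>
#include <string>

// Counts code points by skipping UTF-8 continuation bytes (10xxxxxx).
// Assumption: the input is valid UTF-8; no validation is performed.
std::size_t count_code_points(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char byte : s) {
        if ((byte & 0xC0) != 0x80) {
            ++count;  // lead byte or ASCII byte starts a new code point
        }
    }
    return count;
}
```

For the text "naïve €", `size()` reports 10 bytes while this function reports 7 code points. Libraries such as utfcpp or ICU add validation and more on top of this idea.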
- How to cout a non-ASCII character within a non-ASCII string
Suffice it to say, this is a mess. However, there are libraries that make this easier.
- Maintain It with Zig
> I've always tried as much as possible to treat strings as just opaque data and never look into them, which tends to work well, but in some domains you really need to look at and massage the characters/codepoints/grapheme clusters, and the lack of a first-citizen UTF-8-aware string type is, I think, a bit unfortunate in this day and age.
You don't need a UTF-8 type for that, you just need routines that handle UTF-8 strings, like utfcpp (https://github.com/nemtrif/utfcpp).
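As a rough illustration of the "routines over plain byte strings" approach, the sketch below decodes code points from an ordinary `std::string`. It is not utfcpp's implementation: `decode_utf8` is a hypothetical helper, and it assumes well-formed input, so it does not reject truncated, overlong, or otherwise invalid sequences the way a real library should.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Decodes all code points from a byte string assumed to hold valid UTF-8.
// Sketch only: invalid input is not detected.
std::vector<std::uint32_t> decode_utf8(const std::string& s) {
    std::vector<std::uint32_t> out;
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::uint32_t cp = 0;
        std::size_t len = 1;
        if (lead < 0x80)      { cp = lead;        len = 1; }  // 0xxxxxxx
        else if (lead < 0xE0) { cp = lead & 0x1F; len = 2; }  // 110xxxxx
        else if (lead < 0xF0) { cp = lead & 0x0F; len = 3; }  // 1110xxxx
        else                  { cp = lead & 0x07; len = 4; }  // 11110xxx
        // Fold in the low six bits of each continuation byte.
        for (std::size_t k = 1; k < len && i + k < s.size(); ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
        }
        out.push_back(cp);
        i += len;
    }
    return out;
}
```

The string stays an opaque byte container; only the routine knows about the encoding, which is the design the comment above argues for.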
What are some alternatives?
zig-string - A String Library made for Zig
icu - The home of the ICU project source code.
zigstr - Zigstr is a UTF-8 string type for Zig programs.
dstep - A tool for converting C and Objective-C headers to D modules
RIIR - why not Rewrite It In Rust
arocc - A C compiler written in Zig.
zig - General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
cc-rs - Rust library for build scripts to compile C/C++ code into a Rust library
mach - zig game engine & graphics toolkit