Top 7 C++ Utf8 Projects
simdutf
Unicode routines (UTF8, UTF16, UTF32) and Base64: billions of characters per second using SSE2, AVX2, NEON, AVX-512, RISC-V Vector Extension. Part of Node.js and Bun.
Project mention: Vectorizing Unicode conversions on real RISC-V hardware | news.ycombinator.com | 2024-01-27
The project was mostly inspired by simdutf [0], which has been around for a couple of years already, and I don't think iconv has any of its vectorized implementations for other architectures.
-
Project mention: tiny-utf8 VS codepoint-iterator - a user suggested alternative | libhunt.com/r/tiny-utf8 | 2023-06-04
-
Project mention: Super Colliding Nix Stores: Nix Flakes for Millions of Developers | news.ycombinator.com | 2023-05-25
-
Project mention: uni-algo: Unicode Algorithms Implementation for C/C++ | news.ycombinator.com | 2024-03-25
-
Project mention: Ugrep – a more powerful, ultra fast, user-friendly, compatible grep | news.ycombinator.com | 2023-12-30
Another issue with Hyperscan is that if you enable HS_FLAG_UTF8[1], which hypergrep does[2,3], and then search invalid UTF-8, then the result is UB.
> This flag instructs Hyperscan to treat the pattern as a sequence of UTF-8 characters. The results of scanning invalid UTF-8 sequences with a Hyperscan library that has been compiled with one or more patterns using this flag are undefined.
That's another issue you'll need to grapple with if you use Hyperscan. PCRE2 used to have this issue[4], but they've since defined the semantics of searching invalid UTF-8 with Unicode mode enabled. ripgrep 14 uses that new mode, but I haven't updated that FAQ answer yet.
[1]: https://intel.github.io/hyperscan/dev-reference/api_files.ht...
[2]: https://github.com/p-ranav/hypergrep/blob/ee85b713aa84e0050a...
[3]: https://github.com/p-ranav/hypergrep/blob/ee85b713aa84e0050a...
[4]: https://github.com/BurntSushi/ripgrep/blob/master/FAQ.md#why...
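One way to sidestep the undefined behavior described in the comment above is to validate input before handing it to an engine that assumes well-formed UTF-8. The following is a minimal scalar sketch, not code from hypergrep, Hyperscan, or ripgrep; the function name `is_valid_utf8` is hypothetical and chosen for illustration. It rejects truncated sequences, stray continuation bytes, overlong encodings, surrogates, and code points above U+10FFFF.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical helper (not part of any library listed here).
// Returns true if `input` is well-formed UTF-8.
bool is_valid_utf8(std::string_view input) {
    const auto* p = reinterpret_cast<const unsigned char*>(input.data());
    const std::size_t n = input.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = p[i];
        if (lead < 0x80) {  // ASCII fast path
            ++i;
            continue;
        }
        std::size_t len = 0;      // total bytes in this sequence
        std::uint32_t cp = 0;     // decoded code point
        std::uint32_t min_cp = 0; // smallest value allowed for this length
        if ((lead & 0xE0) == 0xC0) {
            len = 2; cp = lead & 0x1F; min_cp = 0x80;
        } else if ((lead & 0xF0) == 0xE0) {
            len = 3; cp = lead & 0x0F; min_cp = 0x800;
        } else if ((lead & 0xF8) == 0xF0) {
            len = 4; cp = lead & 0x07; min_cp = 0x10000;
        } else {
            return false;  // stray continuation byte or invalid lead byte
        }
        if (n - i < len) return false;  // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = p[i + k];
            if ((b & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (b & 0x3Fu);
        }
        if (cp < min_cp) return false;                   // overlong encoding
        if (cp > 0x10FFFF) return false;                 // beyond Unicode range
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // UTF-16 surrogate
        i += len;
    }
    return true;
}
```

A caller would check each buffer with `is_valid_utf8` and either skip it or fall back to a byte-oriented search when the check fails. For large inputs, a vectorized validator such as the one simdutf provides is likely a better fit than this byte-at-a-time loop.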
C++ Utf8 related posts
- uni-algo: Unicode Algorithms Implementation for C/C++
- Vectorizing Unicode conversions on real RISC-V hardware
- Cray-1 performance vs. modern CPUs
- [Preprint] Transcoding Unicode Characters with AVX-512 Instructions
- Why would a language not natively support SIMD?
- High speed Unicode routines using SIMD
- Is just UTF-8 support good enough?
Index
What are some of the best open-source Utf8 projects in C++? This list will help you:
# | Project | Stars |
---|---|---|
1 | ImGuiColorTextEdit | 1,322 |
2 | simdutf | 944 |
3 | Rapidcsv | 798 |
4 | tiny-utf8 | 534 |
5 | text | 395 |
6 | uni-algo | 241 |
7 | hypergrep | 163 |