Top 23 Utf8 Open-Source Projects
- string: Provides an object-oriented API to strings and deals with bytes, UTF-8 code points and grapheme clusters in a unified way (by symfony)
- simdutf: Unicode routines (UTF-8, UTF-16, UTF-32) and Base64: billions of characters per second using SSE2, AVX2, NEON, AVX-512, RISC-V Vector Extension. Part of Node.js and Bun.
- anyascii: Unicode to ASCII transliteration, with implementations in C, Elixir, Go, Java, JS, Julia, PHP, Python, Ruby, Rust, Shell and .NET
- RSV-Specification: Rows of String Values (RSV Data Format) Specification, a simple binary alternative to CSV
- boyermoore: Boyer-Moore in pure Python; searches for Unicode strings in large files quickly (by eriknyquist)
- callback_printf: Allows the implementation of portable sprintf-, snprintf-, vsprintf- and vsnprintf-like output functions, and includes wrappers for those functions. It supports all formats of the C11 standard. wchar_t arguments and strings are printed as UTF-8. It is fast, thread-safe and has no dependencies on other libraries.
Project mention: STB: Single-file public domain libraries for C/C++ | news.ycombinator.com | 2024-01-06
Project mention: Vectorizing Unicode conversions on real RISC-V hardware | news.ycombinator.com | 2024-01-27
The project was mostly inspired by simdutf [0] which has been around for a couple of years already, and I don't think iconv has any of its vectorized implementations for other architectures.
Project mention: tiny-utf8 VS codepoint-iterator - a user suggested alternative | libhunt.com/r/tiny-utf8 | 2023-06-04
Project mention: Super Colliding Nix Stores: Nix Flakes for Millions of Developers | news.ycombinator.com | 2023-05-25
Project mention: uni-algo: Unicode Algorithms Implementation for C/C++ | news.ycombinator.com | 2024-03-25
Project mention: Ugrep – a more powerful, ultra fast, user-friendly, compatible grep | news.ycombinator.com | 2023-12-30
Another issue with Hyperscan is that if you enable HS_FLAG_UTF8[1], which hypergrep does[2,3], and then search invalid UTF-8, then the result is UB.
> This flag instructs Hyperscan to treat the pattern as a sequence of UTF-8 characters. The results of scanning invalid UTF-8 sequences with a Hyperscan library that has been compiled with one or more patterns using this flag are undefined.
That's another issue you'll need to grapple with if you use Hyperscan. PCRE2 used to have this issue[4], but they've since defined the semantics of searching invalid UTF-8 with Unicode mode enabled. ripgrep 14 uses that new mode, but I haven't updated that FAQ answer yet.
[1]: https://intel.github.io/hyperscan/dev-reference/api_files.ht...
[2]: https://github.com/p-ranav/hypergrep/blob/ee85b713aa84e0050a...
[3]: https://github.com/p-ranav/hypergrep/blob/ee85b713aa84e0050a...
[4]: https://github.com/BurntSushi/ripgrep/blob/master/FAQ.md#why...
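A caller that cannot rely on the matcher defining behavior for invalid UTF-8 can validate the input first. The following is a minimal Python sketch of that idea using only the standard library. It is not Hyperscan's, hypergrep's, or ripgrep's API; `is_valid_utf8` and `safe_search` are hypothetical names introduced here for illustration, and a plain byte search stands in for a real regex engine.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes as strict UTF-8."""
    try:
        # bytes.decode uses errors="strict" by default, so any invalid
        # sequence raises UnicodeDecodeError.
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def safe_search(haystack: bytes, needle: str) -> int:
    """Hypothetical guard around a UTF-8-mode search.

    Returns the byte offset of the first match, or -1 if there is none.
    Raises ValueError instead of searching input that is not valid UTF-8.
    """
    if not is_valid_utf8(haystack):
        raise ValueError("input is not valid UTF-8")
    # Stand-in for a real matcher: a literal byte-level search.
    return haystack.find(needle.encode("utf-8"))
```

Validating up front costs an extra pass over the data, which is the trade-off a tool that defines semantics for invalid input avoids.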
Project mention: Show HN: Comma Separated Values (CSV) to Unicode Separated Values (USV) | news.ycombinator.com | 2024-03-12
A similar concept that is (IMHO) much nicer: RSV
It doesn't need any escaping or quoting: a field just has to be valid UTF-8.
The trick is that the delimiters are bytes that are invalid UTF-8.
The spec fits on a napkin, parsing is trivial, you can jump to the middle of a doc and find the nearest row, etc.
Main downside is you need an editor/viewer that can handle it.
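The delimiter trick described above can be sketched in a few lines of Python. This is an illustrative encoder and decoder, not the reference implementation. The byte values used here (0xFF ends a value, 0xFE marks a null, 0xFD ends a row) are stated as an assumption about the RSV specification and should be checked against it; the function names are hypothetical.

```python
from typing import List, Optional

# Assumed RSV marker bytes. None of them can occur in valid UTF-8,
# which is why field contents never need escaping or quoting.
EOV = b"\xff"   # end of value
NULL = b"\xfe"  # null value marker
EOR = b"\xfd"   # end of row

Row = List[Optional[str]]


def encode_rsv(rows: List[Row]) -> bytes:
    """Serialize rows of strings (or None) into RSV bytes."""
    out = bytearray()
    for row in rows:
        for value in row:
            out += NULL if value is None else value.encode("utf-8")
            out += EOV
        out += EOR
    return bytes(out)


def decode_rsv(data: bytes) -> List[Row]:
    """Parse RSV bytes back into rows.

    This sketch assumes a well-formed document: every value ends with
    EOV and every row ends with EOR. It does not detect truncation.
    """
    rows: List[Row] = []
    # Each row ends with EOR, so the last split piece is always empty.
    for raw_row in data.split(EOR)[:-1]:
        row: Row = []
        # Likewise, each value ends with EOV.
        for raw_value in raw_row.split(EOV)[:-1]:
            row.append(None if raw_value == NULL else raw_value.decode("utf-8"))
        rows.append(row)
    return rows
```

Because the markers are single bytes that cannot appear inside a field, a reader dropped at an arbitrary offset can scan forward to the next row marker to resynchronize, which is what makes jumping into the middle of a document practical.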
Project mention: nanoprintf VS callback_printf - a user suggested alternative | libhunt.com/r/nanoprintf | 2023-08-16
callback_printf is a fully loaded and fast sprintf wrapper that supports a lot of numeral systems and prints Unicode as UTF-8. It comes with a little benchmark for checking vsprintf implementations.
Utf8 related posts
- uni-algo: Unicode Algorithms Implementation for C/C++
- Vectorizing Unicode conversions on real RISC-V hardware
- Cray-1 performance vs. modern CPUs
- [Preprint] Transcoding Unicode Characters with AVX-512 Instructions
- What's everyone working on this week (10/2023)?
- Why would a language not natively support SIMD?
- High speed Unicode routines using SIMD
Index
What are some of the best open-source Utf8 projects? This list will help you:
# | Project | Stars |
---|---|---|
1 | utf8.h | 1,623 |
2 | string | 1,605 |
3 | ImGuiColorTextEdit | 1,322 |
4 | simdutf | 948 |
5 | Rapidcsv | 798 |
6 | Arduino_GFX | 700 |
7 | tiny-utf8 | 534 |
8 | Portable UTF-8 | 501 |
9 | text | 395 |
10 | uni-algo | 243 |
11 | anyascii | 234 |
12 | hypergrep | 163 |
13 | LuaRT | 152 |
14 | vastringify | 66 |
15 | RSV-Specification | 56 |
16 | with-utf8 | 52 |
17 | utf8 | 49 |
18 | utf8-string | 45 |
19 | sbbs | 43 |
20 | boyermoore | 19 |
21 | utf8-validator | 1 |
22 | callback_printf | 1 |
23 | utf8-conversions | 0 |