Top 23 Unicode Open-Source Projects
Twitter Text Obj
Twitter Text Libraries. This code is used at Twitter to tokenize and parse text to meet the expectations for what can be used on the platform.
ugrep
NEW ugrep 6.1: a more powerful, ultra fast, user-friendly, compatible grep. Includes a TUI, Google-like Boolean search with AND/OR/NOT, fuzzy search, hexdumps, searches (nested) archives (zip, 7z, tar, pax, cpio), compressed files (gz, Z, bz2, lzma, xz, lz4, zstd, brotli), pdfs, docs, and more
string
Provides an object-oriented API to strings and deals with bytes, UTF-8 code points and grapheme clusters in a unified way (by symfony)
simdutf
Unicode routines (UTF8, UTF16, UTF32) and Base64: billions of characters per second using SSE2, AVX2, NEON, AVX-512, RISC-V Vector Extension. Part of Node.js and Bun.
Continuation passing monads form the basis of a perfectly valid and usable software architecture and programming pattern.
In the case of ostream and operator<<, this pattern reduces the number of intermediate objects that would otherwise be constructed.
If you object to iostream on religious or stylistic grounds, there's always fmt which is more like Go or Python string interpolation.[0]
0. https://fmt.dev
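As a rough illustration of the chaining idea in the comment above, here is a minimal Python sketch. The `Stream` class is a hypothetical stand-in for `std::ostream`, not part of iostream or fmt: each `<<` writes its operand straight into one buffer and returns the stream, so no combined string is assembled before writing. The last line shows the format-string style the comment compares fmt to.

```python
import io


class Stream:
    """Hypothetical stand-in for std::ostream: `<<` writes, then returns the stream."""

    def __init__(self):
        self._buf = io.StringIO()

    def __lshift__(self, value):
        # Write this operand directly into the shared buffer and hand the
        # stream back, so the next `<<` in the chain continues on it.
        self._buf.write(str(value))
        return self

    def getvalue(self):
        return self._buf.getvalue()


out = Stream()
out << "x = " << 42 << ", y = " << 3.5 << "\n"
text = out.getvalue()

# Format-string style: one template, arguments substituted in a single call.
formatted = "x = {}, y = {}\n".format(42, 3.5)
```

Both approaches produce the same text here; the difference is only in how the pieces are handed to the output.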
Unicode table: https://symbl.cc/en/unicode/blocks/cjk-unified-ideographs/
Character sets and collations in MySQL explained: https://www.flokoe.de/posts/database-character-sets-and-collations-explained/
PyICU, the Python ICU repository: https://gitlab.pyicu.org/main/pyicu
ICU extension for PostgreSQL: https://github.com/dverite/icu_ext/tree/master
Official Unicode collation algorithm document: https://unicode.org/reports/tr10/
CLDR collation algorithm, supplementary documentation for the collation algorithm: https://www.unicode.org/reports/tr35/tr35-collation.html
Description of how collation works: https://peter.eisentraut.org/blog/2023/03/14/how-collation-works
Same as above: https://peter.eisentraut.org/blog/2023/04/12/how-collation-of-punctuation-and-whitespace-works
Description of ICU collation settings parameters: https://peter.eisentraut.org/blog/2023/05/16/overview-of-icu-collation-settings#colalternate
Same as above: https://peter.eisentraut.org/blog/2023/06/13/overview-of-icu-collation-settings-part-2
Official PostgreSQL collation documentation: https://www.postgresql.org/docs/current/collation.html
ICU collation documentation: https://unicode-org.github.io/icu/userguide/collation/
PyICU cheatsheet: https://gist.github.com/dpk/8325992
Documentation for Collator support in Intl, the front-end internationalization object: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Intl/Collator
ICU collation customization documentation: https://github.com/unicode-org/icu/blob/main/docs/userguide/collation/customization/index.md
Locale standard for collation in Unicode; the HTML lang attribute uses it as well: https://www.rfc-editor.org/rfc/rfc4647.html
glibc strxfrm example: https://aticleworld.com/strxfrm-in-c/
Impact of glibc upgrades on PostgreSQL: https://postgresql.verite.pro/blog/2018/08/27/glibc-upgrade.html
Library AWS created to deal with the effect of different glibc versions on collation: https://github.com/awslabs/compat-collation-for-glibc/tree/2.17-326.el7
Impact of collation versions on PostgreSQL indexes: https://techcommunity.microsoft.com/t5/azure-database-for-postgresql/don-t-let-collation-versions-corrupt-your-postgresql-indexes/ba-p/1978394
Unicode normalization examples in Chinese: https://xobo.org/unicode-normalization-nfd-nfc-nfkd-nfkc/
Description of language tags: https://www.w3.org/International/articles/language-tags/
Language subtag registry: https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry
How to choose language tags: https://www.w3.org/International/questions/qa-choosing-language-tags
Official Unicode online collation demo: https://icu4c-demos.unicode.org/icu-bin/collation.html
Repository of the official Unicode collation demos: https://github.com/unicode-org/icu-demos/tree/main/webdemo/collation
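Several of the links above concern normalization and collation. The following is a minimal sketch, using only Python's standard `unicodedata` module, of the two problems they address: the same visible text can have different code point sequences, and sorting by raw code point does not give a dictionary-style order. The sample words are made up for illustration; real collation would need ICU or a locale-aware library, which this sketch does not attempt.

```python
import unicodedata

composed = "\u00e9"      # "é" as a single code point
decomposed = "e\u0301"   # "e" followed by a combining acute accent

# The two strings render the same but compare unequal until normalized.
equal_raw = composed == decomposed
equal_nfc = unicodedata.normalize("NFC", decomposed) == composed
equal_nfd = unicodedata.normalize("NFD", composed) == decomposed

# Sorting by code point puts uppercase before lowercase and accented
# letters after "z", which is not what a dictionary order would do.
words = ["zebra", "\u00e9clair", "Apple", "apple"]
by_code_point = sorted(words)
```

Here `equal_raw` is `False` while both normalized comparisons are `True`, and `by_code_point` places "éclair" last, after "zebra".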
Project mention: Ugrep – a more powerful, ultra fast, user-friendly, compatible grep | news.ycombinator.com | 2023-12-30
Project mention: Alacritty – A fast, cross-platform, OpenGL terminal emulator | news.ycombinator.com | 2024-05-21
Project mention: STB: Single-file public domain libraries for C/C++ | news.ycombinator.com | 2024-01-06
One alternative you could consider is narrow fonts like Pragmata[1] (commercial) or Iosevka[2] (gratis, FOSS). Being able to fit more stuff onto your screen side-by-side is what enabled me to get as much into tmux as I am now.
[1] https://www.fsd.it/shop/fonts/pragmatapro/
[2] https://typeof.net/Iosevka/
Project mention: Non-code contributions are the secret to open source success | news.ycombinator.com | 2024-02-13
Unit tests are built into the language, as is comment-based documentation. Put those two together and you get unit tests as documentation examples built into the language; all it takes is to put a documentation comment (which can be blank) right before a `unittest` block after a declaration.
E.g. the examples for the D standard library's `curry` function are just unit tests:
the docs: https://dlang.org/phobos/std_functional.html#quickindex.curr...
the code: https://github.com/dlang/phobos/blob/42b8c65ccfd35c863f7cedf...
IIRC all of the simdutf implementations use a lot of lookup tables except for the AVX512 and RVV backends.
Here is e.g. the NEON code: https://github.com/simdutf/simdutf/blob/1b8ca3d1072a8e2e1026...
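To give a feel for what a lookup table means in this context, here is a scalar Python sketch. It is not simdutf's algorithm or API; the table and function names are hypothetical. A 256-entry table maps each possible lead byte to the length of the UTF-8 sequence it starts, so the loop replaces a chain of range checks with one indexed read. It does not validate continuation bytes, so it is a counting sketch rather than a validator.

```python
def _build_length_table():
    """Map each byte value to the UTF-8 sequence length it starts (0 = not a lead byte)."""
    table = [0] * 256
    for b in range(256):
        if b < 0x80:
            table[b] = 1          # ASCII
        elif 0xC2 <= b <= 0xDF:
            table[b] = 2          # two-byte sequence
        elif 0xE0 <= b <= 0xEF:
            table[b] = 3          # three-byte sequence
        elif 0xF0 <= b <= 0xF4:
            table[b] = 4          # four-byte sequence
        # Continuation bytes (0x80-0xBF) and invalid leads stay 0.
    return table


UTF8_LEN = _build_length_table()


def count_code_points(data: bytes) -> int:
    """Count code points by hopping from lead byte to lead byte via the table."""
    i = 0
    count = 0
    while i < len(data):
        length = UTF8_LEN[data[i]]
        if length == 0 or i + length > len(data):
            raise ValueError(f"invalid or truncated sequence at byte {i}")
        i += length
        count += 1
    return count
```

For example, `count_code_points("héllo€😀".encode("utf-8"))` walks sequences of 1, 2, 1, 1, 1, 3 and 4 bytes. SIMD implementations apply table lookups to many bytes at once, which this scalar loop does not model.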
Unicode discussion
Unicode related posts
Exploring collation (sorting)
Inzerosight, an Encoder for Zero-Width
Decoding UTF8 with Parallel Extract
Glibc Buffer Overflow in Iconv
Inzerosight
uni-algo: Unicode Algorithms Implementation for C/C++
Interval Parsing Grammars for File Format Parsing (2023) [pdf]
Index
What are some of the best open-source Unicode projects? This list will help you:
# | Project | Stars |
---|---|---|
1 | {fmt} | 19,666 |
2 | gemoji | 4,367 |
3 | harfbuzz | 3,681 |
4 | Twitter Text Obj | 3,059 |
5 | icu | 2,605 |
6 | ugrep | 2,489 |
7 | contour | 2,283 |
8 | ansiweather | 1,858 |
9 | emoji-regex | 1,707 |
10 | utf8.h | 1,649 |
11 | zws | 1,638 |
12 | string | 1,626 |
13 | Diagon | 1,461 |
14 | tomlplusplus | 1,437 |
15 | UnicodePlots.jl | 1,395 |
16 | pragmatapro | 1,373 |
17 | awesome-typography | 1,333 |
18 | icu4x | 1,274 |
19 | phobos | 1,175 |
20 | simdutf | 996 |
21 | cldr | 863 |
22 | streamly | 851 |
23 | ecoji | 827 |