uni-algo VS simdutf

Compare uni-algo and simdutf and see how they differ.

simdutf

Unicode routines (UTF8, UTF16, UTF32) and Base64: billions of characters per second using SSE2, AVX2, NEON, AVX-512, RISC-V Vector Extension. Part of Node.js and Bun. (by simdutf)
                uni-algo                                   simdutf
Mentions        4                                          12
Stars           255                                        1,035
Growth          2.4%                                       4.9%
Activity        8.9                                        8.9
Last commit     7 months ago                               7 days ago
Language        C++                                        C++
License         GNU General Public License v3.0 or later   Apache License 2.0
The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

uni-algo

Posts with mentions or reviews of uni-algo. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2022-07-07.
  • uni-algo: Unicode Algorithms Implementation for C/C++
    1 project | news.ycombinator.com | 25 Mar 2024
  • uni-algo v0.7.0: constexpr Unicode library and some talk about C++ safety
    1 project | /r/cpp | 7 Feb 2023
    The safe layer is just bounds checks that work in all the cases I need. Before that I was coping with -D_GLIBCXX_DEBUG (which doesn't have safe iterators for std::string and std::string_view, the ones I need most) and MSVC debug iterators (better, but slow as hell in debug). You can read more about the implementation here: https://github.com/uni-algo/uni-algo/blob/main/doc/SAFE_LAYER.md Nothing interesting; it's possible to implement all of this even in C++98, but no one cared back then. It's a shame that it's not in the C++ standard, so we cannot choose between a safe and an unsafe std::string, for example, and must either rely on compiler implementations that are simply incomplete in many cases or implement it from scratch.
  • New Unicode library
    4 projects | /r/cpp | 7 Jul 2022
    "Why call your files modules?" "Modular programming" or "modular architecture" is a pretty standard term in programming, and I don't think there is a good synonym for the word "module", so I plan to use "Modules" and "C++20 Modules" to avoid ambiguity.
    "You have one cpp file in the project. Any reason for that?" Including Unicode data files, which may be pretty big, in header files hurts compilation speed. C++20 Modules will help with that and I plan to support them.
    "I'd also recommend supporting CMake" I will support CMake; I just didn't need it at my development stage.
    "Overall, looks very nice." Thank you. I made a post on GitHub with some of my plans: https://github.com/uni-algo/uni-algo/issues/3
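
The "safe layer" described in the uni-algo v0.7.0 post above is characterized there as plain bounds checks that stay active where the standard library's debug modes fall short. A minimal sketch of that general idea follows. The class name `checked_view` and its interface are hypothetical and are not taken from uni-algo; see the linked SAFE_LAYER.md for the library's actual design.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string_view>

// Hypothetical wrapper, NOT uni-algo's actual safe layer: every element
// access is bounds-checked in all build modes, not only in debug builds.
class checked_view {
public:
    explicit checked_view(std::string_view sv) : sv_(sv) {}

    std::size_t size() const { return sv_.size(); }

    // Unlike std::string_view::operator[], an out-of-range index is
    // reported with an exception instead of being undefined behavior.
    char operator[](std::size_t i) const {
        if (i >= sv_.size()) {
            throw std::out_of_range("checked_view: index out of range");
        }
        return sv_[i];
    }

private:
    std::string_view sv_;
};
```

With this sketch, `checked_view v("abc"); v[3];` throws `std::out_of_range` rather than reading past the end. Throwing is only one possible policy; a real safe layer might abort instead, which is an assumption left open here.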

simdutf

Posts with mentions or reviews of simdutf. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-12-25.
  • Decoding UTF8 with Parallel Extract
    1 project | news.ycombinator.com | 5 May 2024
    IIRC all of the simdutf implementations use a lot of lookup tables except for the AVX512 and RVV backends.

    Here is e.g. the NEON code: https://github.com/simdutf/simdutf/blob/1b8ca3d1072a8e2e1026...

  • Glibc Buffer Overflow in Iconv
    1 project | news.ycombinator.com | 21 Apr 2024
  • Vectorizing Unicode conversions on real RISC-V hardware
    1 project | news.ycombinator.com | 27 Jan 2024
    The project was mostly inspired by simdutf [0] which has been around for a couple of years already, and I don't think iconv has any of its vectorized implementations for other architectures.

    [0] https://github.com/simdutf/simdutf

  • Cray-1 performance vs. modern CPUs
    4 projects | news.ycombinator.com | 25 Dec 2023
    I'm actually doing something quite similar in my in-progress Unicode conversion routines.

    For utf8 validation there is a clever algorithm that uses three 4-bit look-ups to detect utf8 errors: https://github.com/simdutf/simdutf/blob/master/src/icelake/i...

    Aside on LMUL, if you haven't encountered it yet: RVV allows you to group vector registers when setting the vector configuration with vsetvl, such that vector instructions operate on multiple vector registers at once. That is, with LMUL=1 you have v0,v1,...,v31. With LMUL=2 you effectively have v0,v2,...,v30, where each vector register is twice as large; with LMUL=4, v0,v4,...,v28; with LMUL=8, v0,v8,...,v24.

    In my code, I happen to read the data with LMUL=2. The trivial implementation would just call vrgather.vv with LMUL=2, but since we only need a lookup table with 128 bits, LMUL=1 would be enough to store the lookup table (V requires a minimum VLEN of 128 bits).

    So I do six LMUL=1 vrgather.vv's instead of three LMUL=2 vrgather.vv's, because no lane crossing is required and this will run faster in hardware (see [0] for a relevant micro benchmark):

            # codegen for equivalent of that function
  • What C++ library do you wish existed but hasn’t been created yet?
    18 projects | /r/cpp | 8 Jul 2023
    UTF-8 normalization, stemming, case-insensitive comparison. https://github.com/unicode-rs is an example for Rust. What are the options for C++?
    1. Transcode to UTF-16 ( https://github.com/simdutf/simdutf ) and use ICU: slow.
    2. Boost.Text ( https://github.com/tzlaine/text ): also slow (because the author doesn't care or couldn't care). We made a lot of patches to make our library faster than Lucene, but this part is still slower than ICU for UTF-16 (ICU for UTF-16 is also very slow...).
  • [Preprint] Transcoding Unicode Characters with AVX-512 Instructions
    1 project | /r/asm | 29 Mar 2023
    You can find the corresponding assembly code in this repository. The main branch only contains implementations based on C++ with intrinsics.
  • What's everyone working on this week (10/2023)?
    11 projects | /r/rust | 6 Mar 2023
    The next big thing is making it LSP-compatible. All language servers must implement UTF-16 based character offsets, which is kinda unfortunate considering that files are much more likely to be stored in UTF-8 (I think?). I don't want to do the UTF-8 -> UTF-16 transcoding, so instead I'll use the excellent simdutf library to count how many UTF-16 code units a UTF-8 string would take if it were transcoded into UTF-16, which is much faster than actual transcoding. So this is what I'm going to do this week: rewriting parsers to produce UTF-16 offsets, plus some final benchmarking. After that is done, I'll consider the "research" part of this project completed and will start writing an actual Markdown parser.
  • Why would a language not natively support SIMD?
    1 project | /r/C_Programming | 17 Feb 2023
    You can find the assembly code here: https://github.com/simdutf/simdutf/tree/clausecker The corresponding C++ code is in the main branch.
  • High speed Unicode routines using SIMD
    1 project | news.ycombinator.com | 3 Sep 2022
  • text-2.0-rc1 with UTF8 underlying representation is available for testing!
    1 project | /r/haskell | 20 Nov 2021
    Or via an ultrafast simdutf.
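
One of the posts above (the /r/rust thread on LSP offsets) relies on counting how many UTF-16 code units a UTF-8 string would occupy without actually transcoding it. The scalar sketch below illustrates why that is cheap. It is not simdutf's code; the function name `utf16_units_from_utf8` is made up for this example, and the input is assumed to be valid UTF-8 already. To my knowledge simdutf offers a vectorized routine for the same job (`simdutf::utf16_length_from_utf8`), but check its documentation before relying on that name.

```cpp
#include <cstddef>
#include <string_view>

// Scalar sketch (hypothetical helper, not simdutf's implementation):
// number of UTF-16 code units needed to represent a UTF-8 string.
// Assumes the input has already been validated as UTF-8.
std::size_t utf16_units_from_utf8(std::string_view utf8) {
    std::size_t units = 0;
    for (unsigned char byte : utf8) {
        // Every byte that is not a continuation byte (10xxxxxx)
        // starts a new code point, which needs at least one unit.
        if ((byte & 0xC0) != 0x80) {
            ++units;
        }
        // A lead byte of the form 11110xxx starts a 4-byte sequence,
        // i.e. a code point above U+FFFF, which needs a surrogate
        // pair in UTF-16: one extra unit.
        if (byte >= 0xF0) {
            ++units;
        }
    }
    return units;
}
```

Each byte is inspected independently of its neighbors, so no decoding state has to be carried along. That independence is what makes this kind of count a natural fit for SIMD.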

What are some alternatives?

When comparing uni-algo and simdutf you can also consider the following projects:

hikogui - Modern accelerated GUI

simdutf8 - SIMD-accelerated UTF-8 validation for Rust.

quick-lint-js - quick-lint-js finds bugs in JavaScript programs

DirectXMath - DirectXMath is an all inline SIMD C++ linear algebra library for use in games and graphics apps

utf8 - UTF-8 support for Nix

simde - Implementations of SIMD instruction sets for systems which don't natively support them.

tiny-utf8 - Unicode (UTF-8) capable std::string

eve - Expressive Vector Engine - SIMD in C++ Goes Brrrr

colrcv - C Library for converting Colours between different Colour Models

Vc - SIMD Vector Classes for C++

simdjson - Parsing gigabytes of JSON per second : used by Facebook/Meta Velox, the Node.js runtime, ClickHouse, WatermelonDB, Apache Doris, Milvus, StarRocks

parsing-sandbox


Did you know that C++ is
the 6th most popular programming language
based on number of mentions?