simdutf vs riscv-v-spec

simdutf

Unicode routines (UTF8, UTF16, UTF32) and Base64: billions of characters per second using SSE2, AVX2, NEON, AVX-512, RISC-V Vector Extension. Part of Node.js and Bun. (by simdutf)

simdutf.github.io

riscv-v-spec

Working draft of the proposed RISC-V V vector extension (by riscv)

DISCONTINUED

Project	simdutf	riscv-v-spec
Mentions	11	43
Stars	960	858
Growth	4.4%	-
Activity	9.1	6.0
Latest Commit	3 days ago	about 2 months ago
Language	C++	Assembly
License	Apache License 2.0	Creative Commons Attribution 4.0

The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

simdutf

Posts with mentions or reviews of simdutf. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-12-25.

Glibc Buffer Overflow in Iconv
1 project | news.ycombinator.com | 21 Apr 2024
Vectorizing Unicode conversions on real RISC-V hardware
1 project | news.ycombinator.com | 27 Jan 2024

The project was mostly inspired by simdutf [0] which has been around for a couple of years already, and I don't think iconv has any of its vectorized implementations for other architectures.
[0] https://github.com/simdutf/simdutf
Cray-1 performance vs. modern CPUs
4 projects | news.ycombinator.com | 25 Dec 2023
I'm actually doing something quite similar in my in-progress Unicode conversion routines.
For utf8 validation there is a clever algorithm that uses three 4-bit look-ups to detect utf8 errors: https://github.com/simdutf/simdutf/blob/master/src/icelake/i...
Aside on LMUL, if you haven't encountered it yet: rvv allows you to group vector registers when setting the vector configuration with vsetvl, such that vector instructions operate on multiple vector registers at once. That is, with LMUL=1 you have v0,v1...v31. With LMUL=2 you effectively have v0,v2,...v30, where each vector register is twice as large. With LMUL=4 you have v0,v4,...v28, and with LMUL=8 v0,v8,...v24.
In my code, I happen to read the data with LMUL=2. The trivial implementation would just call vrgather.vv with LMUL=2, but since we only need a lookup table with 128 bits, LMUL=1 would be enough to store the lookup table (V requires a minimum VLEN of 128 bits).
So I do six LMUL=1 vrgather.vv's instead of three LMUL=2 vrgather.vv's, because no lane crossing is required and this will run faster in hardware (see [0] for a relevant micro benchmark):
```
        # codegen for equivalent of that function
```
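The register grouping described in the LMUL aside can be illustrated with a small model. This is only a sketch for integer LMUL values (fractional LMUL is ignored), and the helper names `register_groups` and `split_gathers` are hypothetical, introduced here for illustration; they are not taken from the post or from any RVV toolchain.

```python
NUM_VREGS = 32  # RVV defines 32 architectural vector registers, v0..v31


def register_groups(lmul):
    """Return the register names usable as operands at an integer LMUL.

    With LMUL=m, registers are grouped m at a time, so only every
    m-th register number names a group.
    """
    if lmul not in (1, 2, 4, 8):
        raise ValueError("this sketch only models integer LMUL: 1, 2, 4 or 8")
    return ["v%d" % i for i in range(0, NUM_VREGS, lmul)]


def split_gathers(num_lookups, data_lmul):
    """Count the LMUL=1 gathers that replace num_lookups gathers at data_lmul.

    Each gather over a group of data_lmul registers is replaced by one
    LMUL=1 gather per register in the group.
    """
    return num_lookups * data_lmul


print(register_groups(2))   # the 16 groups v0, v2, ..., v30
print(split_gathers(3, 2))  # three LMUL=2 lookups become six LMUL=1 gathers
```

The second helper only restates the arithmetic from the post: three table lookups on data held at LMUL=2 turn into six gathers when each register of the group is processed separately at LMUL=1.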
What C++ library do you wish existed but hasn’t been created yet?
18 projects | /r/cpp | 8 Jul 2023

UTF-8 normalization, stemming, case-insensitive comparison. https://github.com/unicode-rs is an example for Rust. What are the options for C++?
1. Translate to UTF-16 ( https://github.com/simdutf/simdutf ) and use ICU: slow.
2. boost text, https://github.com/tzlaine/text : also slow (because the author doesn't care or couldn't care). We made a lot of patches to make our library faster than Lucene, but this part is still slower than ICU for UTF-16 (and ICU for UTF-16 is also very slow...).
[Preprint] Transcoding Unicode Characters with AVX-512 Instructions
1 project | /r/asm | 29 Mar 2023

You can find the corresponding assembly code in this repository. The main branch only contains implementations based on C++ with intrinsics.
What's everyone working on this week (10/2023)?
11 projects | /r/rust | 6 Mar 2023

The next big thing is making it LSP-compatible. All language servers must implement UTF-16 based character offsets, which is kinda unfortunate considering that files are much more likely to be stored in UTF-8 (I think?). I don't want to do the UTF-8 -> UTF-16 transcoding, so instead I'll use the excellent simdutf library to count how many code units a UTF-8 string would take if it were transcoded into UTF-16, which is much faster than actual transcoding. So this is what I'm going to do this week: rewriting parsers to produce UTF-16 offsets, plus some final benchmarking. After that is done, I'll consider the "research" part of this project completed and will start writing an actual Markdown parser.
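The idea of measuring a UTF-16 length without transcoding can be sketched in scalar form. This is not simdutf's implementation, which is vectorized C++; the function name `utf16_units_from_utf8` is hypothetical, and the sketch assumes the input is already valid UTF-8.

```python
def utf16_units_from_utf8(data):
    """Count the UTF-16 code units needed for valid UTF-8 input.

    Every byte that is not a continuation byte (10xxxxxx) starts a code
    point and costs one UTF-16 unit. A 4-byte sequence (lead byte >= 0xF0)
    encodes a code point above U+FFFF, which needs a surrogate pair, so it
    costs one extra unit.
    """
    units = 0
    for b in data:
        if (b & 0xC0) != 0x80:  # lead byte or ASCII: one code point starts here
            units += 1
        if b >= 0xF0:           # 4-byte lead: surrogate pair, one extra unit
            units += 1
    return units


text = "h\u00e9llo\U0001F600"  # ASCII, a 2-byte character, and a 4-byte character
print(utf16_units_from_utf8(text.encode("utf-8")))
```

Because the count depends only on a per-byte classification, no decoded code points ever need to be materialized, which is why this kind of length query lends itself to SIMD and is cheaper than a full transcode.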
Why would a language not natively support SIMD?
1 project | /r/C_Programming | 17 Feb 2023

You can find the assembly code here: https://github.com/simdutf/simdutf/tree/clausecker The corresponding C++ code is in the main branch.
High speed Unicode routines using SIMD
1 project | news.ycombinator.com | 3 Sep 2022
text-2.0-rc1 with UTF8 underlying representation is available for testing!
1 project | /r/haskell | 20 Nov 2021

Or via an ultrafast simdutf.
Simdutf: Unicode validation and transcoding at billions of characters per second
1 project | news.ycombinator.com | 5 Aug 2021

riscv-v-spec

Posts with mentions or reviews of riscv-v-spec. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-12-25.

Scaleway launches RISC-V servers
1 project | news.ycombinator.com | 3 Mar 2024

Here are some resources I can recommend:
RVV spec (also look at the examples in the repo): https://github.com/riscv/riscv-v-spec/blob/master/v-spec.ado...
RVV intrinsics viewer: https://dzaima.github.io/intrinsics-viewer
Tutorial: RISC-V Vector Extension Demystified (3 hour video going over every instruction): https://youtu.be/oTaOd8qr53U
RISC-V Vector extension in a nutshell: https://fprox.substack.com/p/risc-v-vector-extension-in-a-nu...
If you want to see a more complex example/real world application, then you might also be interested in my article about vectorizing unicode conversions: https://camel-cdr.github.io/rvv-bench-results/articles/vecto...
In terms of development I'd recommend using qemu and a cross compiler, or if you want hardware try to get the kendryte k230 (currently the only sbc with rvv 1.0 support) or wait a bit for better hardware (BPI-F3 and sg2380 should release this year).
Cray-1 performance vs. modern CPUs
4 projects | news.ycombinator.com | 25 Dec 2023
x86 vs ARM; Vector and Matrix Extensions; How do they compare?
2 projects | /r/hardware | 9 Dec 2023

And this isn't just theoretical or unlikely to happen - the official spec already contains such a bug. If the writers of the spec can't get things right, even with the small amount of code in the spec, I don't have high hopes that less informed programmers will. RVV being absurdly complicated (IMO, compared to SVE2 and AVX10) doesn't help its cause here.
riscv64 is now an official Debian architecture (rebootstrap in progress)
3 projects | news.ycombinator.com | 23 Jul 2023
Vector vs SIMD
3 projects | /r/RISCV | 29 May 2023
LLVM's libc Gets Much Faster memcpy For RISC-V
1 project | /r/RISCV | 21 May 2023

Will the reference one actually be the most optimal one on future hardware?
Is there any good place to find a copy-paste-able quick reference on RISC-V extensions? Particularly for the vector extension
2 projects | /r/RISCV | 11 Apr 2023
Building a toolchain suitable for compiling V extension code
6 projects | /r/RISCV | 10 Apr 2023

I'll do a deep dive into the https://gms.tf/riscv-vector.html#getting-started tutorial, and probably pop the proverbial stack and just study RVV 0.7.1 on its own (using https://github.com/riscv/riscv-v-spec/releases/tag/0.7.1).
A weird idea for using RV32E on a RV32I core - multithreaded microcontrollers?
1 project | /r/RISCV | 21 Mar 2023

I see your point. You can file a request for it at https://github.com/riscv/riscv-v-spec/issues if you want to pitch it to the relevant ISA bodies. The bar for implementing it is pretty high.
Examining the Top Five Fallacies About RISC-V
1 project | news.ycombinator.com | 17 Dec 2022

It's not "unusual"; using data registers for mask is a valid tradeoff especially for low-end implementations, whereas higher-end architectures can easily use shadow registers. Discussed in depth at https://github.com/riscv/riscv-v-spec/issues/811

What are some alternatives?

When comparing simdutf and riscv-v-spec you can also consider the following projects:

simdutf8 - SIMD-accelerated UTF-8 validation for Rust.

riscv-p-spec - RISC-V Packed SIMD Extension

DirectXMath - DirectXMath is an all inline SIMD C++ linear algebra library for use in games and graphics apps

highway - Performance-portable, length-agnostic SIMD with runtime dispatch

simde - Implementations of SIMD instruction sets for systems which don't natively support them.

highway - Highway - A Modern Javascript Transitions Manager

eve - Expressive Vector Engine - SIMD in C++ Goes Brrrr

riscv-bitmanip - Working draft of the proposed RISC-V Bitmanipulation extension

Vc - SIMD Vector Classes for C++

vroom - VRoom! RISC-V CPU

simdjson - Parsing gigabytes of JSON per second : used by Facebook/Meta Velox, the Node.js runtime, ClickHouse, WatermelonDB, Apache Doris, Milvus, StarRocks

learn-fpga - Learning FPGA, yosys, nextpnr, and RISC-V
