nativejson-benchmark vs simdutf

nativejson-benchmark

C/C++ JSON parser/generator benchmark (by miloyip)

simdutf

Unicode routines (UTF8, UTF16, UTF32) and Base64: billions of characters per second using SSE2, AVX2, NEON, AVX-512, RISC-V Vector Extension. Part of Node.js and Bun. (by simdutf)

Utf8 utf16 Unicode Simd Neon Avx2 sse2 Transcoding avx-512 CPP

simdutf.github.io

Project	nativejson-benchmark	simdutf
Mentions	10	12
Stars	1,926	960
Growth	-	4.8%
Activity	0.0	9.1
Latest Commit	over 1 year ago	4 days ago
Language	JavaScript	C++
License	MIT License	Apache License 2.0

The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

nativejson-benchmark

Posts with mentions or reviews of nativejson-benchmark. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-03-06.

Training great LLMs from ground zero in the wilderness as a startup
3 projects | news.ycombinator.com | 6 Mar 2024

Well it would depend on the specifics of the JSON file but eyeballing the stats at https://github.com/miloyip/nativejson-benchmark/tree/master seems to indicate that even on a 2015 MacBook the parsing proceeds using e.g. Configuru parser at several megabytes per second.
What C++ library do you wish existed but hasn’t been created yet?
18 projects | /r/cpp | 8 Jul 2023
How can I quickly parse a huge 45MB JSON file using JsonDecoder
2 projects | /r/swift | 19 Jun 2023

Maybe you need to try some other third party json library and see if it helps. This is a good list https://github.com/miloyip/nativejson-benchmark
Why is Mastodon so slow?
1 project | /r/Mastodon | 10 Nov 2022

Glancing at some benchmarks, RapidJSON stringifies at around 250MB/s on a single core (content-dependent, of course). Does not look like a bottleneck.
Show HN: DAW JSON Link
4 projects | news.ycombinator.com | 12 Aug 2022

How does it compare to the immensely popular JSON for Modern C++ library by nlohmann? https://github.com/nlohmann/json
Also, you should add your library to the JSON benchmarks here: https://github.com/miloyip/nativejson-benchmark#parsing-time
Debunking Cloudflare’s recent performance tests
5 projects | news.ycombinator.com | 6 Dec 2021

I like your ideas, but they seem difficult to enforce. It assumes good faith on all sides. One of the biggest complaints about AI/ML research results: It is frequently hard/impossible to replicate the results.
One idea: The edge competitors can create a public (SourceHut?) project that runs various daily tests against themselves. This would be similar to JSON library benchmarks. [1] Then allow each competitor to continuously tweak their settings to accomplish the task in the shortest amount of time.
Also: It would be nice to see a cost analysis. For years, IBM's DB2 was insanely fast if you could afford to pay outrageous hardware, software license, and consulting costs. I'm not in the edge business, but I guess there are some operators where you can just pay a lot more and get better performance -- if you really need it.
[1] https://github.com/miloyip/nativejson-benchmark
How can I parse JSON with C?
3 projects | /r/C_Programming | 21 Oct 2021

There are some useful benchmarks here. I found it while looking for stats on json-c vs parson, which I've used a fair amount.
UniValue JSON Library for C++17 (and above)
3 projects | /r/cpp | 29 Jun 2021

If you are looking for benchmarks to show in which cases your library is better than the other 30 or so competitors, then see this repo: https://github.com/miloyip/nativejson-benchmark
Rocket is a parsing framework for parsing using efficient parsing algorithms
2 projects | /r/dartlang | 29 May 2021

JSON data files from this project: https://github.com/miloyip/nativejson-benchmark
How I cut GTA Online loading times by 70%
3 projects | /r/pcgaming | 28 Feb 2021

Such a shame, really. There are a ton of fast JSON parsers out there, like https://github.com/miloyip/nativejson-benchmark#parsing-time. And the second issue is just hilarious: let's scan an array millions of times, who needs hashmaps anyway?

simdutf

Posts with mentions or reviews of simdutf. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-12-25.

Decoding UTF8 with Parallel Extract
1 project | news.ycombinator.com | 5 May 2024

IIRC all of the simdutf implementations use a lot of lookup tables except for the AVX512 and RVV backends.
Here is e.g. the NEON code: https://github.com/simdutf/simdutf/blob/1b8ca3d1072a8e2e1026...
Glibc Buffer Overflow in Iconv
1 project | news.ycombinator.com | 21 Apr 2024
Vectorizing Unicode conversions on real RISC-V hardware
1 project | news.ycombinator.com | 27 Jan 2024

The project was mostly inspired by simdutf [0] which has been around for a couple of years already, and I don't think iconv has any of its vectorized implementations for other architectures.
[0] https://github.com/simdutf/simdutf
Cray-1 performance vs. modern CPUs
4 projects | news.ycombinator.com | 25 Dec 2023
I'm actually doing something quite similar in my in-progress Unicode conversion routines.
For utf8 validation there is a clever algorithm that uses three 4-bit look-ups to detect utf8 errors: https://github.com/simdutf/simdutf/blob/master/src/icelake/i...
An aside on LMUL, if you haven't encountered it yet: RVV allows you to group vector registers when setting the vector configuration with vsetvl, such that vector instructions operate on multiple vector registers at once. That is, with LMUL=1 you have v0,v1,...,v31. With LMUL=2 you effectively have v0,v2,...,v30, where each vector register is twice as large; with LMUL=4, v0,v4,...,v28; with LMUL=8, v0,v8,...,v24.
In my code, I happen to read the data with LMUL=2. The trivial implementation would just call vrgather.vv with LMUL=2, but since we only need a lookup table with 128 bits, LMUL=1 would be enough to store the lookup table (V requires a minimum VLEN of 128 bits).
So I do six LMUL=1 vrgather.vv's rather than three LMUL=2 vrgather.vv's, because no lane crossing is required and this will run faster in hardware (see [0] for a relevant microbenchmark):
```
        # codegen for equivalent of that function
```
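The LMUL register grouping and the split-gather idea from the post above can be modelled in a few lines. This is only a rough sketch under stated assumptions: `register_groups` and `vrgather` are hypothetical helper names, the gather is a simplified software model of `vrgather.vv` (out-of-range indices yield 0), and VLEN is taken to be the 128-bit minimum mentioned in the post. It is not the poster's code and says nothing about real hardware timing.

```python
def register_groups(lmul, num_regs=32):
    """Names of the usable vector register groups for an integer LMUL."""
    if lmul not in (1, 2, 4, 8):
        raise ValueError("integer LMUL must be 1, 2, 4 or 8")
    # A group must start at a register number that is a multiple of LMUL.
    return ["v%d" % i for i in range(0, num_regs, lmul)]


def vrgather(table, indices):
    """Simplified model of vrgather.vv: out[i] = table[indices[i]], or 0 if out of range."""
    return [table[i] if i < len(table) else 0 for i in indices]


VLEN_BYTES = 16  # assumption: VLEN = 128 bits, 8-bit elements

# A 16-entry (128-bit) lookup table and two registers' worth of 4-bit indices.
# The values are arbitrary illustration data.
table = [(7 * i + 3) % 256 for i in range(VLEN_BYTES)]
nibbles = [(5 * i) % 16 for i in range(2 * VLEN_BYTES)]

# One LMUL=2 gather: the table occupies a two-register group, padded with zeros.
grouped = vrgather(table + [0] * VLEN_BYTES, nibbles)

# Two LMUL=1 gathers: each half of the data indexes the same one-register table.
split = vrgather(table, nibbles[:VLEN_BYTES]) + vrgather(table, nibbles[VLEN_BYTES:])

print(register_groups(2))
print(grouped == split)
```

Because every 4-bit index stays below 16, each output element depends only on the one-register table, so splitting one grouped gather into per-register gathers gives the same result in this model.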
What C++ library do you wish existed but hasn’t been created yet?
18 projects | /r/cpp | 8 Jul 2023

UTF-8 normalization, stemming, case-insensitive comparison. See https://github.com/unicode-rs for an example in Rust. What are the options for C++?
1. Translate to UTF-16 ( https://github.com/simdutf/simdutf ) and use ICU: slow.
2. Boost.Text, https://github.com/tzlaine/text , is also slow (because the author doesn't care or couldn't care). We made a lot of patches to make our library faster than Lucene, but this part is still slower than ICU for UTF-16 (ICU for UTF-16 is also very slow...).
[Preprint] Transcoding Unicode Characters with AVX-512 Instructions
1 project | /r/asm | 29 Mar 2023

You can find the corresponding assembly code in this repository. The main branch only contains implementations based on C++ with intrinsics.
What's everyone working on this week (10/2023)?
11 projects | /r/rust | 6 Mar 2023

The next big thing is making it LSP-compatible. All language servers must implement UTF-16 based character offsets, which is kinda unfortunate considering that files are much more likely to be stored in UTF-8 (I think?). I don't want to do the UTF-8 -> UTF-16 transcoding, so instead I'll use the excellent simdutf library to count how many code units a UTF-8 string would take if it was transcoded into UTF-16, which is much faster than actual transcoding. So this is what I'm going to do this week: rewriting parsers to produce UTF-16 offsets + some final benchmarking. After that is done, I'll consider the "research" part of this project completed and will start writing an actual Markdown parser.
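The counting idea in the post above can be illustrated with a scalar sketch. The function name `count_utf16_units` is hypothetical and the input is assumed to be valid UTF-8; this is not simdutf's implementation, which does the equivalent work with SIMD instructions.

```python
def count_utf16_units(data: bytes) -> int:
    """Count the UTF-16 code units that valid UTF-8 input would need, without transcoding."""
    units = 0
    for b in data:
        if (b & 0xC0) != 0x80:
            # Not a continuation byte, so this byte starts a new code point.
            units += 1
        if b >= 0xF0:
            # Lead byte of a 4-byte sequence: the code point is above U+FFFF
            # and needs a surrogate pair, i.e. one extra code unit.
            units += 1
    return units


text = "héllo, 世界 😀"
utf8 = text.encode("utf-8")

# Cross-check against an actual transcode (2 bytes per UTF-16 code unit).
reference = len(text.encode("utf-16-le")) // 2
print(count_utf16_units(utf8), reference)
```

Only the lead byte of each sequence is inspected, so no code point is ever decoded, which is why counting can be cheaper than transcoding.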
Why would a language not natively support SIMD?
1 project | /r/C_Programming | 17 Feb 2023

You can find the assembly code here: https://github.com/simdutf/simdutf/tree/clausecker The corresponding C++ code is in the main branch.
High speed Unicode routines using SIMD
1 project | news.ycombinator.com | 3 Sep 2022
text-2.0-rc1 with UTF8 underlying representation is available for testing!
1 project | /r/haskell | 20 Nov 2021

Or via an ultrafast simdutf.

What are some alternatives?

When comparing nativejson-benchmark and simdutf you can also consider the following projects:

json-c - https://github.com/json-c/json-c is the official code repository for json-c. See the wiki for release tarballs for download. API docs at http://json-c.github.io/json-c/

simdutf8 - SIMD-accelerated UTF-8 validation for Rust.

Jansson - C library for encoding, decoding and manipulating JSON data

DirectXMath - DirectXMath is an all inline SIMD C++ linear algebra library for use in games and graphics apps

EA Standard Template Library - EASTL stands for Electronic Arts Standard Template Library. It is an extensive and robust implementation that has an emphasis on high performance.

simde - Implementations of SIMD instruction sets for systems which don't natively support them.

univalue - An easy-to-use and competitively fast JSON parsing library for C++17, forked from Bitcoin Cash Node's own UniValue library.

eve - Expressive Vector Engine - SIMD in C++ Goes Brrrr

text - What a c++ standard Unicode library might look like.

Vc - SIMD Vector Classes for C++

simdjson - Parsing gigabytes of JSON per second : used by Facebook/Meta Velox, the Node.js runtime, ClickHouse, WatermelonDB, Apache Doris, Milvus, StarRocks