nativejson-benchmark
simdutf
 | nativejson-benchmark | simdutf
---|---|---
Mentions | 10 | 12
Stars | 1,926 | 960
Growth | - | 4.8%
Activity | 0.0 | 9.1
Last commit | over 1 year ago | 4 days ago
Language | JavaScript | C++
License | MIT License | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
nativejson-benchmark
-
Training great LLMs from ground zero in the wilderness as a startup
Well, it would depend on the specifics of the JSON file, but eyeballing the stats at https://github.com/miloyip/nativejson-benchmark/tree/master seems to indicate that even on a 2015 MacBook, parsing with e.g. the Configuru parser proceeds at several megabytes per second.
- What C++ library do you wish existed but hasn’t been created yet?
-
How can I quickly parse a huge 45MB JSON file using JsonDecoder
Maybe you need to try some other third-party JSON library and see if it helps. This is a good list: https://github.com/miloyip/nativejson-benchmark
-
Why is Mastodon so slow?
Glancing at some benchmarks, RapidJSON stringifies at around 250MB/s on a single core (content-dependent, of course). Does not look like a bottleneck.
-
Show HN: DAW JSON Link
How does it compare to the immensely popular JSON for Modern C++ library by nlohmann? https://github.com/nlohmann/json
Also, you should add your library to the JSON benchmarks here: https://github.com/miloyip/nativejson-benchmark#parsing-time
-
Debunking Cloudflare’s recent performance tests
I like your ideas, but they seem difficult to enforce. It assumes good faith on all sides. One of the biggest complaints about AI/ML research results: It is frequently hard/impossible to replicate the results.
One idea: The edge competitors can create a public (SourceHut?) project that runs various daily tests against themselves. This would be similar to JSON library benchmarks. [1] Then allow each competitor to continuously tweak their settings to accomplish the task in the shortest amount of time.
Also: It would be nice to see a cost analysis. For years, IBM's DB2 was insanely fast if you could afford to pay outrageous hardware, software license, and consulting costs. I'm not in the edge business, but I guess there are some operators where you can just pay a lot more and get better performance -- if you really need it.
[1] https://github.com/miloyip/nativejson-benchmark
-
How can I parse JSON with C?
There are some useful benchmarks here. I found it while looking for stats on json-c vs parson, which I've used a fair amount.
-
UniValue JSON Library for C++17 (and above)
If you are looking for benchmarks to show in which cases your library is better than the other 30 or so competitors, then see this repo: https://github.com/miloyip/nativejson-benchmark
-
Rocket is a parsing framework for parsing using efficient parsing algorithms
JSON data files from this project: https://github.com/miloyip/nativejson-benchmark
-
How I cut GTA Online loading times by 70%
Such a shame, really. There are a ton of fast JSON parsers out there, like https://github.com/miloyip/nativejson-benchmark#parsing-time. And the second issue is just hilarious: let's scan an array millions of times, who needs hashmaps anyway?
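The hashmap remark above can be illustrated with a minimal C++ sketch. It does not reproduce the game's actual code; the function names `dedupe_linear` and `dedupe_hashed` and the use of strings as items are assumptions made for illustration. The first function rescans the list of kept items for every new item (quadratic work overall), while the second keeps a hash set of already-seen items so that each membership check takes expected constant time.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Quadratic approach: every new item is compared against all items kept so far.
inline std::vector<std::string> dedupe_linear(const std::vector<std::string>& items) {
    std::vector<std::string> unique;
    for (const std::string& item : items) {
        if (std::find(unique.begin(), unique.end(), item) == unique.end()) {
            unique.push_back(item);
        }
    }
    return unique;
}

// Hash-based approach: membership checks take expected constant time.
inline std::vector<std::string> dedupe_hashed(const std::vector<std::string>& items) {
    std::vector<std::string> unique;
    std::unordered_set<std::string> seen;
    for (const std::string& item : items) {
        if (seen.insert(item).second) {  // insert() reports whether the key was new
            unique.push_back(item);
        }
    }
    return unique;
}
```

Both functions keep the first occurrence of each item in input order, so they can be swapped for one another; only the cost per lookup differs.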
simdutf
-
Decoding UTF8 with Parallel Extract
IIRC all of the simdutf implementations use a lot of lookup tables except for the AVX512 and RVV backends.
Here is e.g. the NEON code: https://github.com/simdutf/simdutf/blob/1b8ca3d1072a8e2e1026...
- Glibc Buffer Overflow in Iconv
-
Vectorizing Unicode conversions on real RISC-V hardware
The project was mostly inspired by simdutf [0] which has been around for a couple of years already, and I don't think iconv has any of its vectorized implementations for other architectures.
[0] https://github.com/simdutf/simdutf
-
Cray-1 performance vs. modern CPUs
I'm actually doing something quite similar in my in-progress Unicode conversion routines.
For utf8 validation there is a clever algorithm that uses three 4-bit look-ups to detect utf8 errors: https://github.com/simdutf/simdutf/blob/master/src/icelake/i...
Aside on LMUL, if you haven't encountered it yet: RVV allows you to group vector registers when setting the vector configuration with vsetvl, such that vector instructions operate on multiple vector registers at once. That is, with LMUL=1 you have v0,v1,...v31. With LMUL=2 you effectively have v0,v2,...v30, where each vector register is twice as large. With LMUL=4 you have v0,v4,...v28, and with LMUL=8 v0,v8,...v24.
In my code, I happen to read the data with LMUL=2. The trivial implementation would just call vrgather.vv with LMUL=2, but since we only need a lookup table with 128 bits, LMUL=1 would be enough to store the lookup table (V requires a minimum VLEN of 128 bits).
So I do six LMUL=1 vrgather.vv's instead of three LMUL=2 vrgather.vv's, because no lane crossing is required and this will run faster in hardware (see [0] for a relevant micro benchmark):
# codegen for equivalent of that function
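The register numbering in the LMUL aside above is simple arithmetic, sketched here in C++ as an illustration only. The helper name `rvv_group_registers` is hypothetical, and the sketch covers just the integer LMUL values 1, 2, 4 and 8 mentioned in the comment: with 32 architectural vector registers, a group can only start at a register index that is a multiple of LMUL.

```cpp
#include <cassert>
#include <vector>

// Returns the register indices that can name a register group for a given
// integer LMUL (1, 2, 4 or 8), out of the 32 architectural vector registers.
inline std::vector<int> rvv_group_registers(int lmul) {
    assert(lmul == 1 || lmul == 2 || lmul == 4 || lmul == 8);
    std::vector<int> regs;
    for (int r = 0; r < 32; r += lmul) {
        regs.push_back(r);  // group starts at v<r> and spans lmul registers
    }
    return regs;
}
```

For LMUL=8 this yields the four groups v0, v8, v16 and v24, matching the list in the comment.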
-
What C++ library do you wish existed but hasn’t been created yet?
UTF-8 normalization, stemming, case-insensitive comparison. For Rust, https://github.com/unicode-rs is an example. What are the options for C++? 1. Translate to UTF-16 ( https://github.com/simdutf/simdutf ) and use ICU: slow. 2. Boost Text, https://github.com/tzlaine/text : also slow (because the author doesn't care or couldn't care). We made a lot of patches to make our library faster than Lucene, but this part is still slower than ICU for UTF-16 (and ICU for UTF-16 is also very slow...).
-
[Preprint] Transcoding Unicode Characters with AVX-512 Instructions
You can find the corresponding assembly code in this repository. The main branch only contains implementations based on C++ with intrinsics.
-
What's everyone working on this week (10/2023)?
The next big thing is making it LSP-compatible. All language servers must implement UTF-16 based character offsets, which is kinda unfortunate considering that files are much more likely to be stored in UTF-8 (I think?). I don't want to do the UTF-8 -> UTF-16 transcoding, so instead I'll use the excellent simdutf library to count how many UTF-16 code units a UTF-8 string would take if it was transcoded into UTF-16, which is much faster than actual transcoding. So this is what I'm going to do this week: rewriting parsers to produce UTF-16 offsets, plus some final benchmarking. After that is done, I'll consider the "research" part of this project completed and will start writing an actual Markdown parser.
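The counting trick described above can be sketched in plain scalar C++. This is not simdutf's implementation (the library provides a vectorized routine for this purpose, which I believe is named `utf16_length_from_utf8`); the function name `utf16_units_from_utf8` below is hypothetical, and the sketch assumes the input is already valid UTF-8. The idea is that every UTF-8 byte that is not a continuation byte starts one code point, and only 4-byte sequences need a second UTF-16 code unit (a surrogate pair).

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Counts the UTF-16 code units needed to represent `utf8`, without transcoding.
// Assumes the input is already valid UTF-8 (no validation is performed here).
inline std::size_t utf16_units_from_utf8(std::string_view utf8) {
    std::size_t units = 0;
    for (unsigned char byte : utf8) {
        // Continuation bytes (10xxxxxx) never start a code point.
        if ((byte & 0xC0) != 0x80) {
            ++units;
        }
        // A 4-byte lead byte (11110xxx) encodes a code point above U+FFFF,
        // which needs a surrogate pair, i.e. one extra code unit.
        if (byte >= 0xF0) {
            ++units;
        }
    }
    return units;
}
```

Because the loop only classifies bytes and never builds output, it maps naturally onto SIMD comparisons and population counts, which is why a length-only pass can be much cheaper than a full conversion.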
-
Why would a language not natively support SIMD?
You can find the assembly code here: https://github.com/simdutf/simdutf/tree/clausecker The corresponding C++ code is in the main branch.
- High speed Unicode routines using SIMD
-
text-2.0-rc1 with UTF8 underlying representation is available for testing!
Or via an ultrafast simdutf.
What are some alternatives?
json-c - https://github.com/json-c/json-c is the official code repository for json-c. See the wiki for release tarballs for download. API docs at http://json-c.github.io/json-c/
simdutf8 - SIMD-accelerated UTF-8 validation for Rust.
Jansson - C library for encoding, decoding and manipulating JSON data
DirectXMath - DirectXMath is an all inline SIMD C++ linear algebra library for use in games and graphics apps
EA Standard Template Library - EASTL stands for Electronic Arts Standard Template Library. It is an extensive and robust implementation that has an emphasis on high performance.
simde - Implementations of SIMD instruction sets for systems which don't natively support them.
univalue - An easy-to-use and competitively fast JSON parsing library for C++17, forked from Bitcoin Cash Node's own UniValue library.
eve - Expressive Vector Engine - SIMD in C++ Goes Brrrr
text - What a c++ standard Unicode library might look like.
Vc - SIMD Vector Classes for C++
simdjson - Parsing gigabytes of JSON per second : used by Facebook/Meta Velox, the Node.js runtime, ClickHouse, WatermelonDB, Apache Doris, Milvus, StarRocks