-
ucall
Web Serving and Remote Procedure Calls at 50x lower latency and 70x higher bandwidth than FastAPI, implementing JSON-RPC & REST over io_uring ☎️
You are right! For the convenience of Python users, we have to introspect the messages and parse JSON into Python objects, with every member of every dictionary allocated on the heap.
To make it as fast as possible we don't use pybind11, nanobind, SWIG, or any other high-level tooling. Our Python bindings are a pure CPython integration. There is just no way to beat that combo, not that I know of.
https://github.com/unum-cloud/ujrpc/blob/main/src/python.c
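Since UJRPC speaks standard JSON-RPC 2.0, any HTTP client can talk to it without special bindings. A minimal sketch with `requests`, assuming a server that accepts JSON-RPC over plain HTTP POST and exposes a `sum` method on port 8545 (the method name and port are placeholders):

```python
import requests

# Standard JSON-RPC 2.0 envelope; "sum" and port 8545 are hypothetical.
payload = {
    "jsonrpc": "2.0",
    "method": "sum",
    "params": {"a": 2, "b": 3},
    "id": 1,
}

response = requests.post("http://127.0.0.1:8545/", json=payload, timeout=5)
print(response.json())  # expected shape: {"jsonrpc": "2.0", "id": 1, "result": 5}
```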
-
ustore
Multi-Modal Database replacing MongoDB, Neo4J, and Elastic with 1 faster ACID solution, with NetworkX and Pandas interfaces, and bindings for C 99, C++ 17, Python 3, Java, GoLang 🗄️
Yes, we also constantly think about that! In the document collections of UKV, for example, we have interoperability between JSON, BSON, and MessagePack objects [1]. CSV is another potential option, but text-based formats aren't ideal for large-scale transfers.
One thing people do is use two protocols. That is the case with Apache Arrow Flight RPC: gRPC for tasks, Arrow for data. It is a viable path, but compiling gRPC is a nightmare, and we don't want to integrate it into our other libraries, as we generally compile everything from source. It seems UJRPC could replace gRPC, and for the payload we can keep using Arrow. We will see :)
[1]: https://github.com/unum-cloud/ukv/blob/main/src/modality_doc...
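For context, that two-protocol split looks like this from Python with pyarrow's Flight client: gRPC carries the call, Arrow carries the payload. The endpoint and ticket name below are placeholders:

```python
import pyarrow.flight as flight

# Connect to a Flight server; "grpc://localhost:8815" and the "trades"
# ticket are placeholder names for this sketch.
client = flight.connect("grpc://localhost:8815")

reader = client.do_get(flight.Ticket(b"trades"))
table = reader.read_all()  # an Arrow table streamed as record batches
print(table.schema)
```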
-
simdjson
Parsing gigabytes of JSON per second: used by Facebook/Meta Velox, the Node.js runtime, ClickHouse, WatermelonDB, Apache Doris, Milvus, StarRocks
I'm asking about simdjson specifically. Lemire suggested that reading line-by-line is not a good idea [0], so I'm asking about the ideal approach with simdjson, not JSON parsers in general.
[0] https://github.com/simdjson/simdjson/issues/188#issuecomment...
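For reference, the advice in that thread amounts to not paying parser setup costs per line: in C++ the canonical route for NDJSON is `parse_many`, and with the pysimdjson bindings the closest equivalent I know of is reusing one `Parser` and extracting what you need before parsing the next document. A rough sketch, where `records.ndjson` and the `amount` field are made up:

```python
import simdjson  # the pysimdjson bindings

parser = simdjson.Parser()  # reused across lines so internal buffers are recycled

total = 0
with open("records.ndjson", "rb") as f:  # hypothetical NDJSON input
    for line in f:
        doc = parser.parse(line)   # valid only until the next parse() call
        total += doc["amount"]     # pull out what you need immediately
print(total)
```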
-
Apache Arrow
Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
If anything you'd probably want to send it in Arrow [1] format. CSVs don't even preserve data types.
[1]: https://arrow.apache.org/
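To make the type-preservation point concrete, here's a small pyarrow sketch: the same table round-tripped through the Arrow IPC stream format keeps its schema, whereas a CSV round trip would have to re-infer it. Column names and values are invented:

```python
from datetime import datetime
import pyarrow as pa

# A table with a float and a timestamp column (made-up data).
table = pa.table({
    "price": [9.99, 12.50],
    "ts": [datetime(2023, 6, 29, 12, 0), datetime(2023, 6, 29, 12, 1)],
})

# Round-trip through the Arrow IPC stream format.
sink = pa.BufferOutputStream()
with pa.ipc.new_stream(sink, table.schema) as writer:
    writer.write_table(table)
restored = pa.ipc.open_stream(sink.getvalue()).read_all()

print(restored.schema)  # float64 and timestamp types survive; CSV would flatten both to text
```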
-
How does yyjson [0] compare to simdjson? Its benchmarks suggest it could come out ahead.
[0] https://github.com/ibireme/yyjson
-
japronto
Screaming-fast Python 3.5+ HTTP toolkit integrated with pipelining HTTP server based on uvloop and picohttpparser.
100x faster than FastAPI seems easy. I wonder how it compares to other fast Python libraries like Japronto[1] and non-Python ones too.
1 - https://github.com/squeaky-pl/japronto
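For anyone curious what the Japronto side looks like, its hello-world is roughly the following (adapted from my reading of its README, so treat the exact API as approximate):

```python
from japronto import Application

# A view takes the request and returns a Response object.
def hello(request):
    return request.Response(text="Hello world!")

app = Application()
app.router.add_route("/", hello)
app.run(debug=True)
```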
-
Absolutely interested, on my end at least. I wrote this to manage the transparency-in-coverage files: https://github.com/dolthub/data-analysis/tree/main/transpare... but I'm always looking for better techniques.
Oh wow, I see you used it on those exact files. How about that.
-
How large? Also I'm not sure the gRPC C++ server implementations you've tested are the fastest. If you're comparing to FastAPI (which is more of an HTTP server framework) then you should also compare to what is at the top of https://www.techempower.com/benchmarks/#section=data-r21.
-
I used rapidjson's streaming interface with my little embedded REST HTTP(S) server library: https://github.com/Edgio/is2/
-
Are there any synergies with capnproto [1] or is the focus here purely on huge payloads?
I'm just an interested hobbyist when it comes to performant RPC frameworks but had some fun benchmarking capnproto for a small gamedev-related project and it was pretty awesome.
[1] https://capnproto.org/
-
Speaking of Go, there's a simdjson port for it too:
> Performance wise, simdjson-go runs on average at about 40% to 60% of the speed of simdjson. Compared to Golang's standard package encoding/json, simdjson-go is about 10x faster.
I haven't tried it yet but I don't really need that speed.
https://github.com/minio/simdjson-go
-
Regarding the hard way, this little utility does a great job of splitting larger-than-memory JSON documents into collections of NDJSON files:
https://github.com/dolthub/jsplit
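jsplit is a Go tool, but the same idea is easy to sketch in Python with ijson's streaming parser. This assumes the oversized document is one top-level JSON array; the file names and chunk size are arbitrary, and `use_float` needs ijson >= 3.1:

```python
import json
import ijson  # streaming parser, reads the file incrementally

CHUNK = 100_000  # records per output file, arbitrary

with open("huge.json", "rb") as src:  # hypothetical larger-than-memory input
    out, count, part = None, 0, 0
    for record in ijson.items(src, "item", use_float=True):  # top-level array elements
        if count % CHUNK == 0:
            if out:
                out.close()
            part += 1
            out = open(f"part-{part:04d}.ndjson", "w")
        out.write(json.dumps(record) + "\n")  # one JSON document per line
        count += 1
    if out:
        out.close()
```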
-
Ha! Thanks to you, today I found out how big those uncompressed JSON files really are (the data wasn't accessible to me, so I shared the tool with my colleague and he was the one who ran the queries on his laptop): https://www.dolthub.com/blog/2022-09-02-a-trillion-prices/
And yep, it was more or less the way you did it with ijson. I found ijson just a day after I finished the prototype. RapidJSON would probably have been faster, especially with SIMD enabled, but the indexing was a one-time thing.
We have open-sourced the codebase. Here's the link: https://github.com/multiversal-ventures/json-buffet. Since this was a quick-and-dirty prototype, comments were sparse. I have updated the README and added a sample json-fetcher. Hope this is more useful for you.
Another unwritten TODO was to nudge the data providers towards more streaming-friendly compression formats, and then just create an index to fetch the data directly from their compressed archives. That would have saved everyone a LOT of $$$.
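For what it's worth, the "index into their compressed archives" idea boils down to plain HTTP range requests once the compression format allows random access; a toy sketch with requests, where the URL and byte offsets stand in for whatever a prebuilt index would return:

```python
import requests

# Offsets would come from a prebuilt index mapping keys -> byte ranges;
# the URL and numbers here are purely illustrative.
url = "https://example.com/prices-archive.zst"
start, end = 1_048_576, 2_097_151

resp = requests.get(url, headers={"Range": f"bytes={start}-{end}"}, timeout=30)
resp.raise_for_status()
chunk = resp.content  # only this slice travels over the wire, not the whole archive
print(len(chunk))
```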
-
msgspec
A fast serialization and validation library, with builtin support for JSON, MessagePack, YAML, and TOML
If you're primarily targeting Python as an application layer, you may also want to check out my msgspec library[1]. All the perf benefits of e.g. yyjson, but with schema validation like pydantic. It regularly benchmarks[2] as the fastest JSON library for Python. Much of the overhead of decoding JSON -> Python comes from the Python layer, and msgspec employs every trick I know to minimize that overhead.
[1]: https://github.com/jcrist/msgspec
[2]: https://github.com/TkTech/json_benchmark
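A minimal sketch of the decode-with-schema path described above; the struct fields are made up:

```python
import msgspec

class Trade(msgspec.Struct):
    symbol: str
    price: float
    size: int

data = b'{"symbol": "AAPL", "price": 189.25, "size": 100}'

# Parse and validate in one step; a type mismatch raises msgspec.ValidationError.
trade = msgspec.json.decode(data, type=Trade)
print(trade.symbol, trade.price)
```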
-
typedload
Discontinued Python library to load dynamically typed data into statically typed data structures
Author of typedload here!
FastAPI relies on (not-so-fast) pydantic, which is one of the slowest libraries in that category.
Don't expect to find such benchmarks in the pydantic documentation itself, but the competing libraries will have them [0].
[0] https://ltworf.github.io/typedload/
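For comparison, typedload's core call is just load(value, type), targeting plain dataclasses and NamedTuples rather than a special base class; a small sketch with made-up fields:

```python
from dataclasses import dataclass
import typedload

@dataclass
class User:
    name: str
    age: int

# Load already-parsed data (e.g. from json.loads) into a typed structure;
# mismatched types raise a typedload exception.
user = typedload.load({"name": "Ada", "age": 36}, User)
print(user)
```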
-
Parsing CSV doesn't have to be slow if you use something like xsv or zsv (https://github.com/liquidaty/zsv) (disclaimer: I'm an author). CSV parsers are fast enough that unless you are doing something ultra-trivial such as "count rows", your bottleneck will be elsewhere.
The benefits of CSV are:
- human readable
- does not need to be typed (sometimes raw data, such as date-formatted values, is not amenable to typing without introducing a pre-processing layer that gets you further from the original data)
- accessible to anyone: you don't need to be a data person to double-click and open it in Excel or similar
The main drawback is that if your data is already typed, CSV does not communicate what the type is. You can alleviate this through various approaches such as the one described at https://github.com/liquidaty/zsv/blob/main/docs/csv_json_sql..., though I wouldn't disagree that if you can be assured that your starting data conforms to non-text data types, there are probably better formats than CSV.
The main benefit of Arrow, IMHO, is less as a format for transmitting or communicating and more as a format for data at rest, which benefits from higher-performance column-based reads and compression.
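To make the "CSV doesn't carry types" point concrete, here's a small pyarrow sketch that reads a CSV with explicitly declared column types and then writes it to Parquet for at-rest storage; the file names and columns are invented:

```python
import pyarrow as pa
from pyarrow import csv
import pyarrow.parquet as pq

# CSV carries no type information, so we declare it at read time
# ("prices.csv" and its columns are made up for illustration).
convert = csv.ConvertOptions(column_types={
    "price": pa.float64(),
    "effective_date": pa.date32(),
})
table = csv.read_csv("prices.csv", convert_options=convert)

# For data at rest, a columnar format keeps both the types and the compression.
pq.write_table(table, "prices.parquet", compression="zstd")
```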