py-tsbs-benchmark
Benchmark ingestion of the TSBS "dev ops" dataset into QuestDB via ILP using the `questdb` Python library and Pandas.
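ILP (InfluxDB Line Protocol) is a text format in which each row becomes one line. A line contains the table name and comma-separated symbol (tag) columns, then a space and the comma-separated value columns, then a space and an optional timestamp. The sketch below formats one such line by hand to show roughly what gets sent for each dataframe row. It is a simplified illustration that assumes names and symbol values need no escaping and that the timestamp is in nanoseconds. The helpers `format_field` and `ilp_line` are hypothetical, and the row values are made up rather than taken from the TSBS dataset.

```python
def format_field(value) -> str:
    """Format one ILP value column (simplified subset of the protocol)."""
    if isinstance(value, bool):          # check bool first: bool is a subclass of int
        return "t" if value else "f"
    if isinstance(value, int):
        return f"{value}i"               # integers carry an 'i' suffix
    if isinstance(value, float):
        return repr(value)               # floats are written as plain decimals
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'            # strings are double-quoted
    raise TypeError(f"unsupported field type: {type(value).__name__}")


def ilp_line(table: str, symbols: dict, columns: dict, ts_ns: int) -> str:
    """Build one line: table,sym=val,... col=val,... timestamp (assumes no escaping needed)."""
    head = ",".join([table] + [f"{k}={v}" for k, v in symbols.items()])
    body = ",".join(f"{k}={format_field(v)}" for k, v in columns.items())
    return f"{head} {body} {ts_ns}\n"


line = ilp_line("cpu", {"hostname": "host_0"},
                {"usage_user": 58.0, "usage_nice": 7}, 1451606400000000000)
print(line, end="")
# cpu,hostname=host_0 usage_user=58.0,usage_nice=7i 1451606400000000000
```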
Hi, I'm the original author of the QuestDB Python client library and benchmark.
It all started when one of our users needed to insert quite a bit of data into our database quickly from Pandas. They had a dataframe that took 25 minutes to serialize row by row while iterating through it; the culprit was `.iterrows()`. Now it takes a handful of seconds.
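One reason row-by-row iteration is slow is that it builds a fresh object for every row and then has to inspect the type of every single cell. A column-oriented serializer decides how to format each column once and then walks that column's values in a tight loop. The stdlib-only sketch below contrasts the two strategies. It models a dataframe as a dict of equal-length, non-empty lists holding only ints or floats. This is an assumption made so it runs without pandas, and timestamps are omitted for brevity. It illustrates the idea only and is not how the `questdb` library is implemented (that work happens in Cython and Rust). All function names are hypothetical.

```python
from typing import Dict, List

# Hypothetical stand-in for a DataFrame: column name -> equal-length list of values.
Frame = Dict[str, list]


def fmt_int(v: int) -> str:
    return f"{v}i"      # ILP integer fields carry an 'i' suffix


def fmt_float(v: float) -> str:
    return repr(v)      # ILP float fields are plain decimals


def row_by_row(table: str, frame: Frame) -> List[str]:
    """iterrows()-style: build a fresh object per row and type-check every cell."""
    names = list(frame)
    n_rows = len(frame[names[0]])
    lines = []
    for i in range(n_rows):
        row = {name: frame[name][i] for name in names}   # new dict for every row
        cells = []
        for name, v in row.items():
            fmt = fmt_int if isinstance(v, int) else fmt_float   # dispatch per cell
            cells.append(f"{name}={fmt(v)}")
        lines.append(f"{table} {','.join(cells)}")
    return lines


def column_wise(table: str, frame: Frame) -> List[str]:
    """Column-oriented: pick each column's formatter once, then format the whole column."""
    formatted_columns = []
    for name, values in frame.items():
        fmt = fmt_int if isinstance(values[0], int) else fmt_float   # dispatch per column
        formatted_columns.append([f"{name}={fmt(v)}" for v in values])
    # zip(*...) regroups the pre-formatted cells into rows.
    return [f"{table} {','.join(cells)}" for cells in zip(*formatted_columns)]


frame = {"usage_user": [58, 3], "usage_idle": [0.5, 97.25]}  # illustrative values
print(column_wise("cpu", frame))
# ['cpu usage_user=58i,usage_idle=0.5', 'cpu usage_user=3i,usage_idle=97.25']
```

Both functions produce identical lines. The column-wise version resolves the type decision once per column rather than once per cell, and it avoids building a per-row container.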
This took a few iterations. At first I thought this could all be handled by the Python buffer protocol, but that turned out to create a whole bunch of copies, so for a number of dtypes the code now uses Arrow where that is zero-copy.
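The difference between zero-copy access and copying can be seen with Python's own buffer protocol. A `memoryview` exposes an object's existing memory without duplicating it. Building a new object from that memory, here with `bytes(...)`, produces an independent copy. The sketch below uses the stdlib `array` module as a stand-in for a numeric column. It illustrates the general concept only and does not show which pandas dtypes caused copies or how the library uses Arrow.

```python
from array import array

column = array("d", [1.5, 2.5, 3.5])  # contiguous float64 buffer, like a numeric column

view = memoryview(column)             # zero-copy: shares column's memory
copy = bytes(column)                  # copy: a new, independent byte string

column[0] = 99.0                      # mutate the original buffer

print(view[0])                        # 99.0: the view sees the change
print(array("d", copy)[0])            # 1.5: the copy does not
print(view.format, view.itemsize, view.nbytes)  # d 8 24
```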
The main code is in Cython (being able to inspect the generated C is pretty neat), with supporting code in Rust. The core serialization logic is in Rust, in a separate repo: https://github.com/questdb/c-questdb-client/tree/main/questd....
Related posts
- Inserting 1.1M rows/s from Pandas into QuestDB with Arrow, Rust & Cython