Interesting. I was wondering if you considered building on top of https://github.com/apache/arrow-datafusion-python
I really do think a distributed db with compute/storage separation and optimized for feature engineering/dataloading (for training NNs) is underserved.
I'd be very interested in the time series aspects of what you're building.
Can you explain how this might differ from something like https://github.com/apache/arrow-ballista
I've seen several variants of "next-gen" spark, but nowhere have I really seen the different tradeoffs/advantages/disadvantages between them.
When people say "pure X", to me, it normally means they didn't involve an FFI or external compiler. This is often beneficial, since it simplifies your build process.
For example, here [0] is a "pure Python postgres driver", and the implication is that it doesn't use libpq.
Or see also this discussion [1].
[0] https://github.com/tlocke/pg8000
[1] https://www.reddit.com/r/learnpython/comments/nktut1/eli5_th...
Yes, we have basic support.
Here are some examples of how to use it in python:
https://github.com/pola-rs/polars/blob/91a419acaf024e64410e7...
However, full sql support is on the roadmap. It's just a matter of hours in a day...
Thanks for sharing.
I have a SQL engine in Python too (https://github.com/mabel-dev/opteryx). I focused my initial effort on supporting SQL statements and making the usage feel like a database. That probably reflects the problem I had in front of me when I set out: handling handfuls of gigabytes in a batch environment for ETLs, with a group of new-to-data-engineering engineers. I've recently started looking more at real-time performance, such as distributing work. I'm interested in how you've approached it.
It uses https://github.com/sqlparser-rs/sqlparser-rs as the parser and lexer. The binder, planner, optimizer and executor are in Python. The optimizer stage only works on the logical plan and the rules are heuristic only.
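To illustrate what a heuristic rule over a logical plan can look like, here is a hypothetical sketch (not Opteryx's actual code) of one classic rewrite, pushing a selection below a projection so filtering happens earlier. The node classes and rule are invented for this example, and it assumes the predicate only references projected columns:

```python
from dataclasses import dataclass


@dataclass
class Scan:
    table: str


@dataclass
class Projection:
    columns: list
    child: object


@dataclass
class Selection:
    predicate: str
    child: object


def optimize(node):
    """Apply one heuristic rule bottom-up over the logical plan:
    Selection(Projection(x)) -> Projection(Selection(x))."""
    if isinstance(node, (Projection, Selection)):
        node.child = optimize(node.child)
    if isinstance(node, Selection) and isinstance(node.child, Projection):
        proj = node.child
        return Projection(proj.columns, Selection(node.predicate, proj.child))
    return node


# Original plan: filter applied after projecting columns.
plan = Selection("price > 10", Projection(["name", "price"], Scan("products")))
optimized = optimize(plan)
# After the rewrite, the Selection sits below the Projection,
# so rows are filtered before columns are projected.
```

A real optimizer would also check that the predicate's columns survive the projection before firing the rule; that guard is omitted here for brevity.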