| opteryx | datafusion-python
---|---|---
Mentions | 1 | 2
Stars | 43 | 296
Growth | - | 5.7%
Activity | 9.8 | 8.4
Last commit | 6 days ago | 3 days ago
Language | Python | Rust
License | Apache License 2.0 | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
opteryx
Pure Python Distributed SQL Engine
Thanks for sharing.
I have a SQL Engine in Python too (https://github.com/mabel-dev/opteryx). I focused my initial effort on supporting SQL statements and making the usage feel like a database - that probably reflects the problem I had in front of me when I set out - only handling handfuls of gigabytes in a batch environment for ETLs with a group of new-to-data-engineering engineers. I have recently started looking more at real-time performance, such as distributing work. I'm interested in how you've approached it.
datafusion-python
Pure Python Distributed SQL Engine
hmm I wasn't aware of https://github.com/apache/arrow-datafusion-python... thanks for the pointer.
Time series target release by April this year. The main challenge is supporting them in the SQL API; execution engine support is already done.
What are some alternatives?
quokka - Making data lake work for time series
sqlglot - Python SQL Parser and Transpiler
nomad - Deprecated and re-branded as Alto
pg8000 - A Pure-Python PostgreSQL Driver
influxdb3-python - Python module that provides a simple and convenient way to interact with InfluxDB 3.0.
datafusion-ballista - Apache Arrow Ballista Distributed Query Engine
sqlparser-rs - Extensible SQL Lexer and Parser for Rust
emr-serverless-samples - Example code for running Spark and Hive jobs on EMR Serverless.