Top 11 Python data-infrastructure Projects
- mbtiles-s3-server: Python server that extracts and serves vector tiles on the fly from an mbtiles file on S3
- streampq: Python PostgreSQL adapter that streams the results of multi-statement queries without a server-side cursor
- iterable-subprocess: Python context manager to communicate with a subprocess using iterables, for when data is too big to fit in memory and has to be streamed
- stream-write-ods: Python function to construct an ODS spreadsheet on the fly, without having to store the entire file in memory or on disk
- stream-read-ods: Python function to extract data from an ODS spreadsheet on the fly, without having to store the entire file in memory or on disk
Project mention: Why PostgreSQL High Availability Matters and How to Achieve It | news.ycombinator.com | 2023-06-14
One of the solutions that made it pretty simple for us to run PostgreSQL in an HA environment (mostly in k8s, but it works standalone as well) is Zalando's Patroni: https://github.com/zalando/patroni. It's really solid and has worked for us for a few years already.
Or, for k8s, their operator: https://github.com/zalando/postgres-operator (Docker image: https://github.com/zalando/spilo). We've also tried other operators that were easier to get started with, but they failed miserably (Crunchy Data's operator is basically based on the Zalando one).
Project mention: Show HN: stream-unzip – now with an async interface | news.ycombinator.com | 2024-03-23
Project mention: Show HN: Data monitoring and profiling with 1 function call | news.ycombinator.com | 2023-12-13
My own attempt at bridging the Python and, well, maybe not quite shell but more subprocess, boundary: https://github.com/uktrade/iterable-subprocess.
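To make the idea of talking to a subprocess through iterables concrete, here is a minimal sketch that uses only the Python standard library. It is not the iterable-subprocess API: the helper name `stream_through` and its parameters are hypothetical, and the child command is just a small Python one-liner chosen for illustration. Input is written on a separate thread so that reading output and writing input can proceed at the same time without holding the whole payload in memory.

```python
import subprocess
import sys
import threading
from contextlib import contextmanager


@contextmanager
def stream_through(args, input_chunks, chunk_size=65536):
    """Hypothetical helper (not the iterable-subprocess API): feed an iterable
    of bytes to a subprocess's stdin and expose its stdout as an iterable."""
    proc = subprocess.Popen(args, stdin=subprocess.PIPE, stdout=subprocess.PIPE)

    def feed():
        # Write on a separate thread so a full stdout pipe cannot deadlock us.
        try:
            for chunk in input_chunks:
                proc.stdin.write(chunk)
        except BrokenPipeError:
            pass  # The child exited before consuming all of the input.
        finally:
            try:
                proc.stdin.close()  # Signals end-of-input to the child.
            except BrokenPipeError:
                pass

    def output():
        # Yield whatever is available until the child closes its stdout.
        while True:
            chunk = proc.stdout.read1(chunk_size)
            if not chunk:
                break
            yield chunk

    writer = threading.Thread(target=feed, daemon=True)
    writer.start()
    try:
        yield output()
    finally:
        proc.stdout.close()
        writer.join()
        proc.wait()


# Illustrative child process: upper-cases each line it receives.
child = [
    sys.executable,
    "-c",
    "import sys\nfor line in sys.stdin.buffer: sys.stdout.buffer.write(line.upper())",
]
chunks = (f"row {i}\n".encode() for i in range(3))

with stream_through(child, chunks) as out:
    result = b"".join(out)
```

The input here is a generator, so each chunk is produced only when the writer thread asks for it. A real implementation would also need to surface the child's exit code and stderr, which this sketch leaves out.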
Shameless plug of a couple of Python libraries I’ve been involved with that work around memory issues of ODS files (for very specific use cases):
https://github.com/uktrade/stream-read-ods
Python data-infrastructure related posts
- Show HN: stream-unzip – now with an async interface
- Show HN: stream-zip – now with async support
- Show HN: Data monitoring and profiling with 1 function call
- What If OpenDocument Used SQLite?
- Why PostgreSQL High Availability Matters and How to Achieve It
- Show HN: Open-source infra for building embedded data pipelines
- Python – Writing large ZIP archives without memory inflation
Index
What are some of the best open-source data-infrastructure projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | spilo | 1,307 |
2 | stream-unzip | 250 |
3 | mbtiles-s3-server | 135 |
4 | stream-zip | 85 |
5 | stream-sqlite | 23 |
6 | panda_patrol | 21 |
7 | streampq | 8 |
8 | iterable-subprocess | 7 |
9 | stream-write-ods | 3 |
10 | mirror-git-to-s3 | 2 |
11 | stream-read-ods | 1 |