Top 17 data-infrastructure Open-Source Projects
-
postgres-operator
Production PostgreSQL for Kubernetes, from high availability Postgres clusters to full-scale database-as-a-service. (by CrunchyData)
-
kubeblocks
KubeBlocks is an open-source control plane that runs and manages databases, message queues and other data infrastructure on K8s.
-
Nakadi
A distributed event bus that implements a RESTful API abstraction on top of Kafka-like queues
-
mbtiles-s3-server
Python server that extracts and serves vector tiles on the fly from an mbtiles file on S3
-
streampq
Python PostgreSQL adapter to stream results of multi-statement queries without a server-side cursor
-
iterable-subprocess
Python context manager to communicate with a subprocess using iterables: for when data is too big to fit in memory and has to be streamed
-
stream-write-ods
Python function to construct an ODS spreadsheet on the fly, without having to store the entire file in memory or on disk
-
stream-read-ods
Python function to extract data from an ODS spreadsheet on the fly, without having to store the entire file in memory or on disk
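The mbtiles-s3-server entry above describes serving tiles out of an mbtiles file. An MBTiles file is an SQLite database whose `tiles` table is keyed by `zoom_level`, `tile_column` and `tile_row`, with rows numbered in the TMS scheme (origin at the bottom), whereas web maps usually request tiles in the XYZ scheme (origin at the top). A minimal sketch of the lookup a tile server has to perform follows. It is not mbtiles-s3-server's code: it reads from a local SQLite connection rather than S3, and `xyz_to_tms_row` and `get_tile` are hypothetical helper names.

```python
import sqlite3
from typing import Optional


def xyz_to_tms_row(z: int, y: int) -> int:
    """Hypothetical helper: flip a top-origin XYZ row into the bottom-origin TMS row MBTiles uses."""
    return (2 ** z) - 1 - y


def get_tile(conn: sqlite3.Connection, z: int, x: int, y: int) -> Optional[bytes]:
    """Hypothetical helper: return the raw tile blob for an XYZ request, or None if it is absent."""
    row = conn.execute(
        "SELECT tile_data FROM tiles "
        "WHERE zoom_level = ? AND tile_column = ? AND tile_row = ?",
        (z, x, xyz_to_tms_row(z, y)),
    ).fetchone()
    return row[0] if row is not None else None


if __name__ == "__main__":
    # A tiny in-memory database with the MBTiles `tiles` table layout
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tiles (zoom_level INTEGER, tile_column INTEGER, "
        "tile_row INTEGER, tile_data BLOB)"
    )
    # XYZ tile z=1, x=0, y=0 (top-left) is stored at TMS row 1
    conn.execute("INSERT INTO tiles VALUES (1, 0, 1, ?)", (b"top-left",))
    print(get_tile(conn, 1, 0, 0))  # b'top-left'
    print(get_tile(conn, 1, 0, 1))  # None
```

The flip `2**z - 1 - y` is its own inverse, so the same function also converts a TMS row back to XYZ.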
Yes, precisely. It's the UI part that's broken; it cannot list snapshots. The issue is here, with no fix since 2020, sadly: https://github.com/zalando/postgres-operator/issues/937
Project mention: No disk space crashloop but pod healthy · Issue #3788 · CrunchyData/postgres-operator | /r/Health2020 | 2023-12-09
Project mention: Open source db platform for vector db engines, qdrant, milvus, weaviate all in 1 | news.ycombinator.com | 2024-01-02
Project mention: Why PostgreSQL High Availability Matters and How to Achieve It | news.ycombinator.com | 2023-06-14
One of the solutions that made it pretty simple for us to run PostgreSQL in an HA environment (mostly in k8s, but it works standalone as well) is Zalando's Patroni (https://github.com/zalando/patroni). It's really solid and has worked for us for a few years already.
Or, for k8s, their operator: https://github.com/zalando/postgres-operator (Docker image: https://github.com/zalando/spilo). We've also tried other operators that were easier to get started with, but they failed miserably (Crunchy Data's operator is basically based on the Zalando one).
Project mention: Show HN: stream-unzip – now with an async interface | news.ycombinator.com | 2024-03-23
Project mention: Show HN: Data monitoring and profiling with 1 function call | news.ycombinator.com | 2023-12-13
My own attempt at bridging the Python and, well, maybe not quite shell but more subprocess, boundary: https://github.com/uktrade/iterable-subprocess.
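The idea behind iterable-subprocess, feeding a subprocess from one iterable while reading its output as another, can be sketched with the standard library alone. This is not iterable-subprocess's API; `stream_through` is a hypothetical name. Input is written from a background thread so the main thread can read stdout at the same time, since writing all the input first and reading afterwards can deadlock once both pipe buffers fill.

```python
import subprocess
import sys
import threading
from typing import Iterable, Iterator, List


def stream_through(program: List[str], chunks: Iterable[bytes], chunk_size: int = 65536) -> Iterator[bytes]:
    """Hypothetical helper: pipe `chunks` into `program`, yielding its stdout chunk by chunk."""
    proc = subprocess.Popen(program, stdin=subprocess.PIPE, stdout=subprocess.PIPE)

    def feed() -> None:
        # Runs on its own thread so blocking stdin writes never stall the stdout reads below
        try:
            for chunk in chunks:
                proc.stdin.write(chunk)
        except OSError:
            pass  # e.g. BrokenPipeError: the process exited before reading all input
        finally:
            try:
                proc.stdin.close()
            except OSError:
                pass

    writer = threading.Thread(target=feed, daemon=True)
    writer.start()
    try:
        while True:
            out = proc.stdout.read(chunk_size)  # blocks until chunk_size bytes or EOF
            if not out:
                break
            yield out
    finally:
        proc.stdout.close()
        writer.join()
        returncode = proc.wait()
    # Only reached when the output was fully consumed
    if returncode != 0:
        raise subprocess.CalledProcessError(returncode, program)


if __name__ == "__main__":
    # A child process that upper-cases its input line by line
    upper = [sys.executable, "-c", "import sys; sys.stdout.writelines(l.upper() for l in sys.stdin)"]
    chunks = (b"line %d\n" % i for i in range(3))
    print(b"".join(stream_through(upper, chunks)))
```

Because both sides are iterators, the caller can stop early; closing the generator closes the child's stdout, which typically makes the child exit on its next write.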
Shameless plug of a couple of Python libraries I’ve been involved with that work around memory issues of ODS files (for very specific use cases):
https://github.com/uktrade/stream-read-ods
data-infrastructure related posts
- Show HN: stream-unzip – now with an async interface
- Show HN: stream-zip – now with async support
- Show HN: Data monitoring and profiling with 1 function call
- What If OpenDocument Used SQLite?
- Why PostgreSQL High Availability Matters and How to Achieve It
- Every SaaS company will offer native data pipelines like Stripe
- Open source infrastructure for securely sharing data with customers
A note from our sponsor - WorkOS
workos.com | 25 Apr 2024
Index
What are some of the best open-source data-infrastructure projects? This list will help you:
Rank | Project | Stars
---|---|---
1 | postgres-operator (Zalando) | 3,961
2 | postgres-operator (CrunchyData) | 3,719
3 | kubeblocks | 1,633
4 | tensorbase | 1,423
5 | spilo | 1,307
6 | Nakadi | 948
7 | stream-unzip | 250
8 | pipebird | 167
9 | mbtiles-s3-server | 135
10 | stream-zip | 85
11 | stream-sqlite | 23
12 | panda_patrol | 21
13 | streampq | 8
14 | iterable-subprocess | 7
15 | stream-write-ods | 3
16 | mirror-git-to-s3 | 2
17 | stream-read-ods | 1