Top 7 Python data-warehouse Projects
-
PostHog
🦔 PostHog provides open-source web & product analytics, session recording, feature flagging and A/B testing that you can self-host. Get started - free.
PostHog is pretty good but very pushy about using their SaaS (understandably). Self-hosting is not really advertised on their main site; it is buried in their GitHub repo as a footnote [1], with vague indications of issues past 100K events/month. I haven’t delved into how to scale it past that, though they do provide some docs that I have yet to review.
Also, the primary repo is not FOSS, and the "100% FOSS" repo is buried in yet another footnote [2].
Plausible follows in PostHog’s footsteps but is not fully faithful to open source. If you want to self-host, you won’t have the same set of features as their SaaS and need to rely on long-term releases of their "community edition" [3].
On "Ahrefs", is there even an open source version of their product? I couldn’t easily find it (on mobile). [4]
Maybe I’ll take a look at the others you mentioned later, but if rybbit can remain faithful to their FOSS roots then I think there’s a real chance of it becoming huge.
For those that don’t want to self-host (mostly corporate shitholes), rybbit can milk them with their managed SaaS product.
[1] https://github.com/PostHog/posthog?tab=readme-ov-file#self-h...
[2] https://github.com/PostHog/posthog?tab=readme-ov-file#open-s...
[3] https://github.com/plausible/analytics?tab=readme-ov-file#ca...
[4] https://ahrefs.com/
-
Udacity-Data-Engineering-Projects
A few projects related to data engineering, including data modeling, cloud infrastructure setup, data warehousing, and data lake development.
-
Project mention: Show HN: Datarepo – a data catalog that doesn't need a service or database | news.ycombinator.com | 2025-07-08
Python data-warehouse discussion
Python data-warehouse related posts
-
Show HN: Datarepo – a data catalog that doesn't need a service or database
-
Neuralink Open Sources Data Catalog for Multimodal Data
-
DXY-COVID-19-Data: NEW Data - star count:2218.0
-
How Query Engines Work
-
DXY-COVID-19-Data: NEW Data - star count:2242.0
-
A note from our sponsor - Stream
getstream.io | 15 Jul 2025
Index
What are some of the best open-source data-warehouse projects in Python? This list will help you:
| # | Project | Stars |
|---|---|---|
| 1 | PostHog | 27,709 |
| 2 | dlt | 3,864 |
| 3 | Udacity-Data-Engineering-Projects | 1,618 |
| 4 | Cubes | 1,484 |
| 5 | versatile-data-kit | 451 |
| 6 | pgwarehouse | 84 |
| 7 | datarepo | 79 |