Python data-warehouse

Open-source Python projects categorized as data-warehouse

Top 7 Python data-warehouse Projects

  1. PostHog

    🦔 PostHog provides open-source web & product analytics, session recording, feature flagging and A/B testing that you can self-host. Get started - free.

    Project mention: Open source Google Analytics replacement | news.ycombinator.com | 2025-05-07

    PostHog is pretty good but very pushy towards using their SaaS (understandably). Self-hosting is not really advertised on their main site; it is buried in their GitHub repo as a footnote [1], with indications of vague issues past 100K events/month. I haven’t delved into how to scale it past that, though they do provide some docs that I have yet to review.

    Also the primary repo is not FOSS, and that "100% FOSS" repo is buried in yet another footnote [2].

    Plausible follows in PostHog's footsteps but is not fully faithful to open source. If you want to self-host, you won’t have the same set of features as their SaaS and need to rely on long-term releases for their "community edition" [3].

    On "Ahrefs", is there even an open source version of their product? I couldn’t easily find it (on mobile). [4]

    Maybe I’ll take a look at others you mentioned later but if rybbit can remain faithful to their FOSS roots then I think there’s a real chance of it becoming huge.

    For those that don’t want to self host (mostly corporate shitholes), rybbit can milk them with their managed SaaS product.

    [1] https://github.com/PostHog/posthog?tab=readme-ov-file#self-h...

    [2] https://github.com/PostHog/posthog?tab=readme-ov-file#open-s...

    [3] https://github.com/plausible/analytics?tab=readme-ov-file#ca...

    [4] https://ahrefs.com/

  2. dlt

    data load tool (dlt) is an open source Python library that makes data loading easy 🛠️

    Project mention: Data Loading Tool | news.ycombinator.com | 2024-12-14

  3. Udacity-Data-Engineering-Projects

    Few projects related to Data Engineering including Data Modeling, Infrastructure setup on cloud, Data Warehousing and Data Lake development.

  4. Cubes

    [NOT MAINTAINED] Light-weight Python OLAP framework for multi-dimensional data analysis

  5. versatile-data-kit

    One framework to develop, deploy and operate data workflows with Python and SQL.

  6. pgwarehouse

    Easily sync your Postgres database to a Snowflake, ClickHouse, or DuckDB warehouse.

  7. datarepo

    Project mention: Show HN: Datarepo – a data catalog that doesn't need a service or database | news.ycombinator.com | 2025-07-08

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python data-warehouse related posts

  • Show HN: Datarepo – a data catalog that doesn't need a service or database

    1 project | news.ycombinator.com | 8 Jul 2025
  • Neuralink Open Sources Data Catalog for Multimodal Data

    1 project | news.ycombinator.com | 24 Jun 2025
  • DXY-COVID-19-Data: NEW Data - star count:2218.0

    1 project | /r/algoprojects | 17 Oct 2023
  • How Query Engines Work

    2 projects | news.ycombinator.com | 8 Sep 2023
  • DXY-COVID-19-Data: NEW Data - star count:2242.0

    1 project | /r/algoprojects | 20 May 2023
  • DXY-COVID-19-Data: NEW Data - star count:2242.0

    1 project | /r/algoprojects | 19 May 2023
  • DXY-COVID-19-Data: NEW Data - star count:2242.0

    1 project | /r/algoprojects | 18 May 2023

Index

What are some of the best open-source data-warehouse projects in Python? This list will help you:

#  Project                            Stars
1  PostHog                            27,709
2  dlt                                 3,864
3  Udacity-Data-Engineering-Projects   1,618
4  Cubes                               1,484
5  versatile-data-kit                    451
6  pgwarehouse                            84
7  datarepo                               79
