data-warehouse

Open-source projects categorized as data-warehouse

Top 23 data-warehouse Open-Source Projects

  • awesome-bigdata

    A curated list of awesome big data frameworks, resources and other awesomeness.

  • Project mention: Good coding groups for black women? | news.ycombinator.com | 2024-01-13
  • Greenplum

    Greenplum Database - Massively Parallel PostgreSQL for Analytics. An open-source massively parallel data platform for analytics, machine learning and AI.

  • Project mention: Ask HN: It's 2023, how do you choose between MySQL and Postgres? | news.ycombinator.com | 2023-05-11

    Friends don't let their friends choose MySQL :)

    A super long time ago (decades), when I was using Oracle regularly, I had to decide which way to go. Although MySQL then had the mindshare, I thought that Postgres was more similar to Oracle, more standards compliant, and more of a real enterprise type of DB. The rumor was also that Postgres was heavier than MySQL. Too many horror stories of lost data (MyISAM), bad transactions (MyISAM lacks transaction integrity), and the really long list of MySQL gotchas influenced me.

    In time I found that I had underestimated one of the most important strengths of Postgres over MySQL: the power of community. Postgres has a really superb community, found on Libera Chat and elsewhere, whose members are very willing to help out, and I think that gives Postgres a huge advantage over MySQL. RhodiumToad [Andrew Gierth] https://github.com/RhodiumToad & davidfetter [David Fetter] https://www.linkedin.com/in/davidfetter are incredibly helpful folks.

    I don't know whether Postgres' licensing made a huge difference, but my perception is that there are a ton of 3rd party products based on Postgres and customized to specific DB needs, thanks to the more liberal PG license, which is MIT/BSD derived https://www.postgresql.org/about/licence/

    Some of the PG based 3rd party DBs:

    Enterprise DB https://www.enterprisedb.com/ - general purpose PG with some variants

    Greenplum https://greenplum.org/ - Data warehousing

    Crunchydata https://www.crunchydata.com/products/hardened-postgres - high security Postgres for regulated environments

    Citus https://www.citusdata.com - Distributed DB & Columnar

    Timescale https://www.timescale.com/

    Why Choose PG today?

    If you want better ACID: Postgres

    If you want more compliant SQL: Postgres

    If you want more customizability to a variety of use-cases: Postgres using a variant

    If you want the flexibility of using NoSQL at times: Postgres

    If you want more product knowledge reusability for other backend products: Postgres

  • materialize

    The data warehouse for operational workloads. (by MaterializeInc)

  • Project mention: Ask HN: How Can I Make My Front End React to Database Changes in Real-Time? | news.ycombinator.com | 2024-04-17

    [2] https://materialize.com/

  • Rudderstack

    Privacy and Security focused Segment-alternative, in Golang and React

  • Project mention: Rudderstack Switches to Elastic License | news.ycombinator.com | 2023-09-08
  • hydra

    Hydra: Column-oriented Postgres. Add scalable analytics to your project in minutes. (by hydradatabase)

  • Project mention: Using ClickHouse to scale an events engine | news.ycombinator.com | 2024-04-11

    Don't feel bad, lots of people get bitten by not reading all the way down to the bottom of their readme: https://github.com/hydradatabase/hydra/blob/v1.1.2/README.md... While Hydra may very well license their own code Apache 2, they ship the AGPLv3 columnar which to my very best IANAL understanding taints the whole stack and AGPLv3's everything all the way through https://github.com/hydradatabase/hydra/blob/v1.1.2/columnar/...

  • DXY-COVID-19-Data

    COVID-19/2019-nCoV Infection Time Series Data Warehouse

  • Project mention: DXY-COVID-19-Data: NEW Data - star count:2218.0 | /r/algoprojects | 2023-10-17
  • elementary

    The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.

  • dlt

    data load tool (dlt) is an open source Python library that makes data loading easy 🛠️

  • Project mention: Ask HN: Freelancer? Seeking freelancer? (December 2023) | news.ycombinator.com | 2023-12-03

    SEEKING FREELANCER | REMOTE | GERMANY

    dltHub is looking for freelance help in the following repos:

    - https://github.com/dlt-hub/dlt

  • Cubes

    [NOT MAINTAINED] Light-weight Python OLAP framework for multi-dimensional data analysis

  • tensorbase

    TensorBase is a new big data warehouse built with modern efforts.

  • Udacity-Data-Engineering-Projects

    A few projects related to data engineering, including data modeling, infrastructure setup on cloud, data warehousing, and data lake development.

  • Project mention: Question about data engineering? | /r/programiranje | 2023-06-30
  • bigquery-utils

    Useful scripts, udfs, views, and other utilities for migration and data warehouse operations in BigQuery.

  • Project mention: Swirl: An open-source search engine with LLMs and ChatGPT to provide all the answers you need 🌌 | dev.to | 2023-09-06

    Using the Galaxy UI, knowledge workers can systematically review the best results from all configured services including Apache Solr, ChatGPT, Elastic, OpenSearch, PostgreSQL, Google BigQuery, plus generic HTTP/GET/POST with configurations for premium services like Google's Programmable Search Engine, Miro and Northern Light Research.

  • scratchdata

    Scratch is a swiss army knife for big data.

  • Project mention: Debugging a Golang Bug with Non-Blocking Reads | news.ycombinator.com | 2024-03-12

    Go team does acknowledge [1] it as a bug, so there is some point here

    However, that said, I wonder if OP (duckdb) could have written their solution [2] differently. Shouldn't they be able to select from a Pipe as well as an Error channel simultaneously (similar to how they are doing it inside here [3])? If not, I would have created a goroutine that does a blocking read on the Pipe and then passes the data on to another channel to select on.

    [1] https://github.com/golang/go/issues/66239

    [2] https://github.com/scratchdata/scratchdata/blob/7c1a0fcd0e20...

    [3] https://github.com/scratchdata/scratchdata/blob/7c1a0fcd0e20...

  • optimus

    Optimus is an easy-to-use, reliable, and performant workflow orchestrator for data transformation, data modeling, pipelines, and data quality management. (by raystack)

  • Project mention: Data Engineering Tools in Go | /r/dataengineering | 2023-05-18

    You can check the odpf GitHub; they created some dataops tools using Go. One example is optimus (https://github.com/odpf/optimus), which is a data pipeline orchestrator

  • Data-Engineering-Projects

    Personal Data Engineering Projects

  • Project mention: Question about data engineering? | /r/programiranje | 2023-06-30
  • multiwoven

    🔥 Open Source Reverse ETL and Customer Data Platform (CDP). An open-source alternative to Hightouch, Census, and RudderStack.

  • Project mention: Multiwoven Reverse ETL (0.2.0) – Open-Source Alternative to Hightouch and Census | news.ycombinator.com | 2024-04-19

    Multiwoven is now a leading Open Source Alternative to Hightouch, Census, and Rudderstack.

    It's been a great journey so far, and we are excited to announce a major update to Multiwoven - our new release, Multiwoven 0.2.0, is now available!

    Repo: https://github.com/Multiwoven/multiwoven

    This release brings a host of new features, enhancements, and bug fixes to streamline data syncs and user experience.

    From new connectors to advanced reporting dashboards, as a team, we have been working hard on these updates based on the feedback and requests from our customers and the community.

    - 10+ new connectors added to Multiwoven, including

  • vulcan-sql

    Data API Framework for AI Agents and Data Apps

  • Project mention: Shout out to Appsmith developers to check out this new tool! | /r/lowcode | 2023-07-09

    I am one of the members of an open-source project VulcanSQL, a Data API Framework for data applications that helps data folks create and share data APIs faster.

  • DomainMOD

    DomainMOD is an open source application written in PHP & MySQL used to manage your domains and other internet assets in a central location. DomainMOD also includes a Data Warehouse framework that allows you to import your web server data so that you can view, export, and report on your live data.

  • Project mention: Self-hosted nameserver for Domain management | /r/selfhosted | 2023-05-29

    DomainMOD - Application to manage your domains and other internet assets in a central location. DomainMOD includes a Data Warehouse framework that allows you to import your WHM/cPanel web server data so that you can view, export, and report on your data.

  • versatile-data-kit

    One framework to develop, deploy and operate data workflows with Python and SQL.

  • Project mention: Looking for a data blogger | /r/opensource | 2023-05-19

    Here's the project: https://github.com/vmware/versatile-data-kit

  • space

    Unified storage framework for the entire machine learning lifecycle (by google)

  • Project mention: Unified storage framework for the entire machine learning lifecycle | news.ycombinator.com | 2024-02-28
  • data-engineering-project-template

    This is a template you can use for your next data engineering portfolio project.

  • beneath

    Beneath is a serverless real-time data platform ⚡️

  • pgwarehouse

    Easily sync your Postgres database to a Snowflake, ClickHouse, or DuckDB warehouse.

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Index

What are some of the best open-source data-warehouse projects? This list will help you:

 #  Project                             Stars
 1  awesome-bigdata                    12,792
 2  Greenplum                           6,198
 3  materialize                         5,567
 4  Rudderstack                         3,926
 5  hydra                               2,607
 6  DXY-COVID-19-Data                   2,181
 7  elementary                          1,736
 8  dlt                                 1,694
 9  Cubes                               1,490
10  tensorbase                          1,423
11  Udacity-Data-Engineering-Projects   1,295
12  bigquery-utils                      1,028
13  scratchdata                         1,027
14  optimus                               737
15  Data-Engineering-Projects             637
16  multiwoven                            617
17  vulcan-sql                            592
18  DomainMOD                             443
19  versatile-data-kit                    410
20  space                                 135
21  data-engineering-project-template     112
22  beneath                                78
23  pgwarehouse                            58
