The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning. Learn more →
Top 23 data-warehouse Open-Source Projects
-
Greenplum
Greenplum Database - Massively Parallel PostgreSQL for Analytics. An open-source massively parallel data platform for analytics, machine learning and AI.
-
InfluxDB
Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
-
hydra
Hydra: Column-oriented Postgres. Add scalable analytics to your project in minutes. (by hydradatabase)
-
elementary
The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
-
WorkOS
The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.
-
Udacity-Data-Engineering-Projects
Few projects related to Data Engineering including Data Modeling, Infrastructure setup on cloud, Data Warehousing and Data Lake development.
-
bigquery-utils
Useful scripts, udfs, views, and other utilities for migration and data warehouse operations in BigQuery.
-
optimus
Optimus is an easy-to-use, reliable, and performant workflow orchestrator for data transformation, data modeling, pipelines, and data quality management. (by raystack)
-
multiwoven
🔥 Open Source Reverse ETL and Customer Data Platform (CDP). An open-source alternative to Hightouch, Census, and RudderStack.
-
DomainMOD
DomainMOD is an open source application written in PHP & MySQL used to manage your domains and other internet assets in a central location. DomainMOD also includes a Data Warehouse framework that allows you to import your web server data so that you can view, export, and report on your live data.
-
data-engineering-project-template
This is a template you can use for your next data engineering portfolio project.
-
SaaSHub
SaaSHub - Software Alternatives and Reviews. SaaSHub helps you find the best software and product alternatives
Project mention: Ask HN: It's 2023, how do you choose between MySQL and Postgres? | news.ycombinator.com | 2023-05-11Friends don't let their friends choose Mysql :)
A super long time ago (decades) when I was using Oracle regularly I had to make a decision on which way to go. Although Mysql then had the mindshare I thought that Postgres was more similar to Oracle, more standards compliant, and more of a real enterprise type of DB. The rumor was also that Postgres was heavier than MySQL. Too many horror stories of lost data (MyIsam), bad transactions (MyIsam lacks transaction integrity), and the number of Mysql gotchas being a really long list influenced me.
In time I actually found out that I had underestimated one of the most important attributes of Postgres that was a huge strength over Mysql: the power of community. Because Postgres has a really superb community that can be found on Libera Chat and elsewhere, and they are very willing to help out, I think Postgres has a huge advantage over Mysql. RhodiumToad [Andrew Gierth] https://github.com/RhodiumToad & davidfetter [David Fetter] https://www.linkedin.com/in/davidfetter are incredibly helpful folks.
I don't know that Postgres' licensing made a huge difference or not but my perception is that there are a ton of 3rd party products based on Postgres but customized to specific DB needs because of the more liberalness of the PG license which is MIT/BSD derived https://www.postgresql.org/about/licence/
Some of the PG based 3rd party DBs:
Enterprise DB https://www.enterprisedb.com/ - general purpose PG with some variants
Greenplum https://greenplum.org/ - Data warehousing
Crunchydata https://www.crunchydata.com/products/hardened-postgres - high security Postgres for regulated environments
Citus https://www.citusdata.com - Distributed DB & Columnar
Timescale https://www.timescale.com/
Why Choose PG today?
If you want better ACID: Postgres
If you want more compliant SQL: Postgres
If you want more customizability to a variety of use-cases: Postgres using a variant
If you want the flexibility of using NOSQL at times: Postgres
If you want more product knowledge reusability for other backend products: Postgres
Project mention: Ask HN: How Can I Make My Front End React to Database Changes in Real-Time? | news.ycombinator.com | 2024-04-17[2] https://materialize.com/
Don't feel bad, lots of people get bitten by not reading all the way down to the bottom of their readme: https://github.com/hydradatabase/hydra/blob/v1.1.2/README.md... While Hydra may very well license their own code Apache 2, they ship the AGPLv3 columnar which to my very best IANAL understanding taints the whole stack and AGPLv3's everything all the way through https://github.com/hydradatabase/hydra/blob/v1.1.2/columnar/...
Project mention: Ask HN: Freelancer? Seeking freelancer? (December 2023) | news.ycombinator.com | 2023-12-03SEEKING FREELANCER | REMOTE | GERMANY
dltHub is looking for a freelance help in the following repos:
- https://github.com/dlt-hub/dlt
Project mention: Swirl: An open-source search engine with LLMs and ChatGPT to provide all the answers you need 🌌 | dev.to | 2023-09-06Using the Galaxy UI, knowledge workers can systematically review the best results from all configured services including Apache Solr, ChatGPT, Elastic, OpenSearch, PostgreSQL, Google BigQuery, plus generic HTTP/GET/POST with configurations for premium services like Google's Programmable Search Engine, Miro and Northern Light Research.
Go team does acknowledge [1] it as a bug, so there is some point here
However, that said, I wonder if OP (duckdb) could have written their solution [2] differently. Shouldn't they be able to select from a Pipe as well as Error channel simultaneously? (similar to how they are doing it inside here [3]). If not, I would have create a go-routine that does blocking read on the Pipe and then pass it on to another channel to select on.
[1] https://github.com/golang/go/issues/66239
[2] https://github.com/scratchdata/scratchdata/blob/7c1a0fcd0e20...
[3] https://github.com/scratchdata/scratchdata/blob/7c1a0fcd0e20...
You can check odpf github, they created some dataops tools using go, one of the example is optimus (https://github.com/odpf/optimus) which is a data pipeline orchestrator
Project mention: Multiwoven Reverse ETL (0.2.0) – Open-Source Alternative to Hightouch and Census | news.ycombinator.com | 2024-04-19Multiwoven is now a leading Open Source Alternative to Hightouch, Census, and Rudderstack.
It's been a great journey so far, and we are excited to announce a major update to Multiwoven - our new release, Multiwoven 0.2.0, is now available!
Repo: https://github.com/Multiwoven/multiwoven
This release brings a host of new features, enhancements, and bug fixes to streamline data syncs and user experience.
From new connectors to advanced reporting dashboards, as a team, we have been working hard on these updates based on the feedback and requests from our customers and the community.
- 10+ new connectors added to Multiwoven, including
Project mention: Shout out to Appsmith developers to check out this new tool! | /r/lowcode | 2023-07-09I am one of the members of an open-source project VulcanSQL, a Data API Framework for data applications that helps data folks create and share data APIs faster.
DomainMOD - Application to manage your domains and other internet assets in a central location. DomainMOD includes a Data Warehouse framework that allows you to import your WHM/cPanel web server data so that you can view, export, and report on your data.
Here's the project: https://github.com/vmware/versatile-data-kit
Project mention: Unified storage framework for the entire machine learning lifecycle | news.ycombinator.com | 2024-02-28
data-warehouse related posts
- Using ClickHouse to scale an events engine
- Debugging a Golang Bug with Non-Blocking Reads
- Moving a Billion Postgres Rows on a $100 Budget
- Hydra (YC W22) adds upsert to columnar Postgres
- Show HN: ScratchDB – Open-Source Snowflake on ClickHouse
- Hydra
- Show HN: ScratchDB – Open-Source Snowflake on ClickHouse
-
A note from our sponsor - WorkOS
workos.com | 24 Apr 2024
Index
What are some of the best open-source data-warehouse projects? This list will help you:
Project | Stars | |
---|---|---|
1 | awesome-bigdata | 12,792 |
2 | Greenplum | 6,198 |
3 | materialize | 5,567 |
4 | Rudderstack | 3,926 |
5 | hydra | 2,607 |
6 | DXY-COVID-19-Data | 2,181 |
7 | elementary | 1,736 |
8 | dlt | 1,694 |
9 | Cubes | 1,490 |
10 | tensorbase | 1,423 |
11 | Udacity-Data-Engineering-Projects | 1,295 |
12 | bigquery-utils | 1,028 |
13 | scratchdata | 1,027 |
14 | optimus | 737 |
15 | Data-Engineering-Projects | 637 |
16 | multiwoven | 617 |
17 | vulcan-sql | 592 |
18 | DomainMOD | 443 |
19 | versatile-data-kit | 410 |
20 | space | 135 |
21 | data-engineering-project-template | 112 |
22 | beneath | 78 |
23 | pgwarehouse | 58 |
Sponsored