Udacity-Data-Engineering-Projects
hydra
Metric | Udacity-Data-Engineering-Projects | hydra |
---|---|---|
Mentions | 5 | 26 |
Stars | 1,295 | 2,620 |
Growth | - | 5.7% |
Activity | 0.0 | 8.5 |
Last commit | over 1 year ago | 8 days ago |
Language | Python | C |
License | GNU General Public License v3.0 or later | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
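The legend above defines Growth as month-over-month growth in stars. As a minimal sketch of that arithmetic, assuming the usual percentage-change formula (the site's exact calculation is not stated here, and the function name and star counts below are hypothetical):

```python
def month_over_month_growth(previous_stars: int, current_stars: int) -> float:
    """Return the percentage change from previous_stars to current_stars.

    Assumption: growth = (current - previous) / previous * 100.
    """
    if previous_stars <= 0:
        raise ValueError("previous_stars must be positive")
    return (current_stars - previous_stars) / previous_stars * 100.0


# Hypothetical star counts, not taken from the comparison on this page.
print(f"{month_over_month_growth(2000, 2100):.1f}%")  # prints 5.0%
```

A project with no star change, or with no tracked history, would have no meaningful growth figure under this formula.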
Udacity-Data-Engineering-Projects
- Question about data engineering?
-
✨ 5 Free Resources to Learn Data Engineering 🚀
🔗 https://github.com/san089/Udacity-Data-Engineering-Projects
-
How can I become a big data engineer?
You can start by googling "data engineering learning path" to get a sense of what you need to know. If you are looking for simple projects to start with, you can look at this as well (https://github.com/san089/Udacity-Data-Engineering-Projects).
-
Beginner DE projects.
For practice, see Data Modeling with Postgres and Udacity Data Engineering Projects as examples, and Data Engineering Project for Beginners - Batch edition for a guided tutorial.
- Data Pipeline Examples in Action
hydra
-
Using ClickHouse to scale an events engine
Don't feel bad, lots of people get bitten by not reading all the way down to the bottom of their readme: https://github.com/hydradatabase/hydra/blob/v1.1.2/README.md... While Hydra may very well license their own code Apache 2, they ship the AGPLv3 columnar which to my very best IANAL understanding taints the whole stack and AGPLv3's everything all the way through https://github.com/hydradatabase/hydra/blob/v1.1.2/columnar/...
-
Moving a Billion Postgres Rows on a $100 Budget
Columnar store PostgreSQL extensions exist; here are two, but I think I’m missing at least one more:
https://github.com/citusdata/cstore_fdw
https://github.com/hydradatabase/hydra
You can also connect other stores using foreign data wrappers, such as Parquet files stored on an object store, DuckDB, or ClickHouse, though the joins aren’t optimised: PostgreSQL would do a full scan on the external table when joining.
- Hydra (YC W22) adds upsert to columnar Postgres
- Hydra
-
Is ClickHouse Moving Away from Open Source?
New column store alternative: https://github.com/hydradatabase/hydra
HN: https://news.ycombinator.com/item?id=37571974
-
Show HN: Hydra - Open-Source Columnar Postgres
some previous discussions:
https://news.ycombinator.com/item?id=37247945
https://news.ycombinator.com/item?id=36987920
A relevant observation is that there are actually multiple license files in the repo, so the consumer should read the explicit licensing section of the readme <https://github.com/hydradatabase/hydra#license>, since the GitHub sidebar is misleading.
-
CDC from postgres to postgres.
Hydra DB (GitHub): worked well for aggregate query use cases but not for queries that build reports. Also, data insertion and updates are abysmal on columnar DBs.
-
How Query Engines Work
A lot of experience with database operation and how to approach MVCC is encoded in PostgreSQL, and that shouldn't be underestimated.
[0]: https://github.com/hydradatabase/hydra
-
Hydra: Column-Oriented Postgres
And just like last time, watch out for the misleading GitHub license detector: the repo is not entirely Apache as the GitHub summary claims. Rather, *some* is Apache, and buried in the interior is some AGPL stuff: https://github.com/hydradatabase/hydra#license
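One comment above reports that a columnar store worked well for aggregate queries but poorly for report-style queries and for inserts and updates. The toy Python sketch below illustrates that general trade-off with plain lists and dicts. It is a simplified model under stated assumptions, not how Hydra or any PostgreSQL extension is implemented, and all names and values in it are hypothetical.

```python
# Row layout: one dict per record (hypothetical event data).
rows = [
    {"id": 1, "user": "a", "amount": 10.0},
    {"id": 2, "user": "b", "amount": 5.5},
    {"id": 3, "user": "a", "amount": 4.5},
]

# Column layout: one list per column, values aligned by position.
columns = {
    "id": [1, 2, 3],
    "user": ["a", "b", "a"],
    "amount": [10.0, 5.5, 4.5],
}


def total_amount_row_store(rows):
    # Every whole record is visited even though only one field is needed.
    return sum(row["amount"] for row in rows)


def total_amount_column_store(columns):
    # Only the "amount" column is read, which is why aggregates are cheap.
    return sum(columns["amount"])


def insert_column_store(columns, record):
    # A single-record write touches every column list.
    for name, values in columns.items():
        values.append(record[name])


def fetch_record_column_store(columns, position):
    # Rebuilding a full record (as a report query might) reads every column.
    return {name: values[position] for name, values in columns.items()}


print(total_amount_row_store(rows))       # prints 20.0
print(total_amount_column_store(columns)) # prints 20.0
insert_column_store(columns, {"id": 4, "user": "c", "amount": 1.0})
print(fetch_record_column_store(columns, 3))
```

Both layouts return the same total, but the column layout reads one list for the aggregate while needing one write per column for an insert and one read per column to reassemble a record.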
What are some alternatives?
data-engineering-zoomcamp - Free Data Engineering course!
duckdb - DuckDB is an in-process SQL OLAP Database Management System
data-engineering-book - Accumulated knowledge and experience in the field of Data Engineering
citus - Distributed PostgreSQL as an extension
ask-astro - An end-to-end LLM reference implementation providing a Q&A interface for Airflow and Astronomer
ClickHouse - ClickHouse® is a free analytics DBMS for big data
pg-counter-metrics - PG Counter Metrics (PGCM) is a tool for publishing PostgreSQL performance data to CloudWatch. By publishing to CloudWatch, dashboards and alarming can be used on the collected data.
postgres - PostgreSQL in Neon
canarypy - CanaryPy - A light and powerful canary release for Data Pipelines
vasco - vasco: MIC & MINE statistics for Postgres
Data-Engineering-Projects - Personal Data Engineering Projects
ClickBench - ClickBench: a Benchmark For Analytical Databases