hudi
missing-semester
| | hudi | missing-semester |
|---|---|---|
| Mentions | 20 | 374 |
| Stars | 5,038 | 4,679 |
| Growth | 1.7% | 1.2% |
| Activity | 9.9 | 6.8 |
| Latest commit | 6 days ago | about 2 months ago |
| Language | Java | CSS |
| License | Apache License 2.0 | GNU General Public License v3.0 or later |
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
hudi
- Getting Started with Flink SQL, Apache Iceberg and DynamoDB Catalog
  Apache Iceberg is one of the three types of lakehouse; the other two are Apache Hudi and Delta Lake.
- The "Big Three's" Data Storage Offerings
  Structured, semi-structured and unstructured data can be stored in one single format, a lakehouse storage format like Delta, Iceberg or Hudi (assuming those don't require low-latency SLAs like subsecond).
- Data-eng related highlights from the latest Thoughtworks Tech Radar
  Apache Hudi
- For those of you with Lakehouse Architectures, how do you handle duplicate records?
- AWS ACID data lakehouse
  Try Apache Hudi; it is fully integrated with AWS and offers almost everything that you requested.
- Data n00b looking for guidance on how to setup data lake/warehouse
  The corresponding Kafka topics have 30d retention and I intend on having an S3 sink connector for long-term storage (open to other ideas here too; I noticed there's a Hudi connector also).
- apache/hudi: Upserts, Deletes And Incremental Processing on Big Data.
- Big Data file formats
- How-to-Guide: Contributing to Open Source
  Apache Hudi
- What do you use for Data versioning?
  You could have a look at Apache Hudi - especially if you're running your Data Pipelines using Spark or Flink.
missing-semester
- Please advise, still struggling intensely
  You mentioned having issues with accessory concepts so perhaps this might help: https://missing.csail.mit.edu/. There's also a chapter on Git.
- IPN course
- CS2030S and CS2040S advice
  https://missing.csail.mit.edu/ is a good way to pass the Dec-Jan break if you want to prep for CS2030S + some more general stuff.
- I cancelled my Replit subscription
  Reflecting a little bit more, I don't think it was Replit's fault, per se. But that change should have been made together with a larger adjustment to the program. Like adding a class/unit in the style of [the missing semester](https://missing.csail.mit.edu/) to make sure people came away with a good range of intuitions.
- Advice to a Novice Programmer
  From MJD's post: I think CS curricula should have a class that focuses specifically on these issues, on the matter of how do you actually write software?
  But they never do.
  FWIW, MIT's "The Missing Semester of Your CS Education"[0] attempts to deal with this lack, though, even there, it's an unofficial course taught between terms, during MIT's IAP -- Independent Activities Period[1] -- and not an actual CS course.
  [0] https://missing.csail.mit.edu/
  [1] https://en.wikipedia.org/wiki/Traditions_and_student_activit...
- School of SRE: Curriculum for onboarding non-traditional hires and new grads
- Advice / Resources from a "Seasoned Beginner"
  Link to the "missing semester of your CS degree" course by MIT.
- MIT's Missing Semester Class: Beyond the CS Curriculum
  Rightly called The Missing Semester (of Your CS Education), this class from MIT will teach you how to use some of the tools that are fundamental to the software engineering ecosystem. From shell scripting to the fundamentals of information security, across around 12 lectures, you can add a bunch of practical skills to your toolbox.
- Recommendations on what to learn?
- How to do Btech without a college
  Also highly suggest going through the missing semester.
What are some alternatives?
iceberg - Apache Iceberg
cs-topics - My personal curriculum covering basic CS topics. This might be useful for self-taught developers... A work in development! This might take a very long time to get finished!
kudu - Mirror of Apache Kudu
computer-science - 🎓 Path to a free self-taught education in Computer Science!
Trino - Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
CS50x-2021 - 🎓 HarvardX: CS50 Introduction to Computer Science (CS50x)
debezium - Change data capture for a variety of databases. Please log issues at https://issues.redhat.com/browse/DBZ.
vimrc - The ultimate Vim configuration (vimrc)
pinot - Apache Pinot - A realtime distributed OLAP datastore
javascript - JavaScript Style Guide
delta - An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
materials - Bonus materials, exercises, and example projects for our Python tutorials