hudi
nifi
Metric | hudi | nifi
---|---|---
Mentions | 20 | 35
Stars | 5,053 | 4,381
Growth | 2.0% | 3.1%
Activity | 9.9 | 9.9
Latest commit | 4 days ago | 2 days ago
Language | Java | Java
License | Apache License 2.0 | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
hudi
- Getting Started with Flink SQL, Apache Iceberg and DynamoDB Catalog
  Apache Iceberg is one of the three types of lakehouse; the other two are Apache Hudi and Delta Lake.
- The "Big Three's" Data Storage Offerings
  Structured, semi-structured and unstructured data can be stored in one single format, a lakehouse storage format like Delta, Iceberg or Hudi (assuming those don't require low-latency SLAs like subsecond).
- Data-eng related highlights from the latest Thoughtworks Tech Radar
  Apache Hudi
- For those of you with Lakehouse Architectures, how do you handle duplicate records?
- AWS ACID data lakehouse
  Try Apache Hudi, it is fully integrated with AWS and offers almost everything that you requested.
- Data n00b looking for guidance on how to setup data lake/warehouse
  The corresponding Kafka topics have 30d retention and I intend to have an S3 sink connector for long-term storage (open to other ideas here too; I noticed there's a Hudi connector also).
- apache/hudi: Upserts, Deletes And Incremental Processing on Big Data.
- Big Data file formats
- How-to-Guide: Contributing to Open Source
  Apache Hudi
- What do you use for Data versioning?
  You could have a look at Apache Hudi - especially if you're running your Data Pipelines using Spark or Flink.
nifi
- FLaNK Stack Weekly 19 Feb 2024
- Ask HN: What are some unpopular technologies you wish people knew more about?
- FLaNK Stack Weekly for 13 November 2023
- Ask HN: What low code platforms are worth using?
  Apache NiFi (https://nifi.apache.org/).
  It uses the concept of flow-based programming. It's underacknowledged, but this tool is very flexible. I have used it as an event bus for all the 3rd-party integrations.
- Apache Nifi: easy to use, powerful, reliable system to process, distribute data
- Tool decision - What architecture would you choose and why?
- Help with choosing techstack for a new DE team
  Presently setting up Apache NiFi + Apache MiNiFi for the ETL portion of my work. NiFi was easy enough to figure out, but the docs for MiNiFi have been a pain due to differences between the Java and C++ versions. I then configured it entirely with the Java version so that it was easier to search for answers about the MiNiFi YAML syntax.
- MS SQL Change Data Capture
  Found it
- Is there something like airflow but written in Scala/Java?
  Apache Camel, Apache NiFi, Spring Cloud
- Json splitting and Rerouting (new to nifi)
  NiFi, like most Apache projects, does most of its discussion on its mailing lists, but also has a Slack.
What are some alternatives?
iceberg - Apache Iceberg
Logstash - Logstash - transport and process your logs, events, or other data
kudu - Mirror of Apache Kudu
superset - Apache Superset is a Data Visualization and Data Exploration Platform
Trino - Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
debezium - Change data capture for a variety of databases. Please log issues at https://issues.redhat.com/browse/DBZ.
meltano - Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.
pinot - Apache Pinot - A realtime distributed OLAP datastore
Pandas - Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
delta - An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
Metabase - The simplest, fastest way to get business intelligence and analytics to everyone in your company :yum: