Skytrax-Data-Warehouse
spark-rapids
Metric | Skytrax-Data-Warehouse | spark-rapids
---|---|---
Mentions | 1 | 3
Stars | 126 | 707
Growth | - | 4.8%
Activity | 0.0 | 9.8
Last commit | almost 4 years ago | 2 days ago
Language | Python | Scala
License | MIT License | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Skytrax-Data-Warehouse
- Open source contributions for a Data Engineer?
I'm always open to contributions to my project (Skytrax Data Warehouse). If you are into data topics, you can also support my work on YouTube (One Developer Pirate), where I mostly make data-oriented videos. These days I'm making a SQL course from a data analysis perspective, which is expected to be released next week.
spark-rapids
- Open source contributions for a Data Engineer?
His newer project, Ballista, was also donated to Apache Arrow. I hope to get the Rust skills to collaborate with him on open source work someday too. He's also doing really cool work on spark-rapids FYI.
- Ballista: New approach for 2021
So, in my day job at NVIDIA, I work on the RAPIDS Accelerator for Apache Spark, which is an open-source plugin that provides GPU-acceleration for ETL workloads, leveraging the RAPIDS cuDF GPU DataFrame library.
What are some alternatives?
airbyte - The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
streamlit - Streamlit — A faster way to build and share data apps.
ballista - Distributed compute platform implemented in Rust, and powered by Apache Arrow.
Apache Arrow - Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
dagster - An orchestration platform for the development, production, and observation of data assets.
meltano - Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.
quinn - pyspark methods to enhance developer productivity 📣 👯 🎉
dbd - dbd is a database prototyping tool that enables data analysts and engineers to quickly load and transform data in SQL databases.
Apache Spark - A unified analytics engine for large-scale data processing
Prefect - The easiest way to build, run, and monitor data pipelines at scale.
sqlfluff - A modular SQL linter and auto-formatter with support for multiple dialects and templated code.