DataGristle
Skytrax-Data-Warehouse
Metric | DataGristle | Skytrax-Data-Warehouse
---|---|---
Mentions | 5 | 1
Stars | 137 | 131
Growth | - | -
Activity | 0.0 | 0.0
Last commit | 3 months ago | about 4 years ago
Language | Python | Python
License | GNU General Public License v3.0 or later | MIT License
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
DataGristle
- What are your weekend side projects?
- Instant data model from 1000s of unique files?
- Using Hashing to detect data changes in ELT
- How do you sort a CSV file with several million rows?
  DataGristle: this one contains some more unusual csv utilities, and what's in master includes the ability to sort by field names rather than offsets: https://github.com/kenfar/DataGristle
- Open source contributions for a Data Engineer?
  DataGristle by u/kenfar who influenced many of us in this sub.
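
The CSV-sorting mention above contrasts sorting by field names with sorting by column offsets. As a rough illustration of that idea only, here is a minimal standard-library sketch. It is not DataGristle's implementation or command-line interface, and the function name `sort_csv_by_field` and the sample data are hypothetical. It loads every row into memory, so a file with several million rows may need an external (chunked) sort instead.

```python
import csv
import io


def sort_csv_by_field(text, field, numeric=False):
    """Sort CSV text by a named header column and return the sorted CSV text.

    Hypothetical helper for illustration; not part of DataGristle.
    """
    reader = csv.DictReader(io.StringIO(text))
    # Look the column up by its header name instead of a positional offset.
    if reader.fieldnames is None or field not in reader.fieldnames:
        raise KeyError(f"no such field: {field}")
    if numeric:
        key = lambda row: float(row[field])
    else:
        key = lambda row: row[field]
    rows = sorted(reader, key=key)  # in-memory sort of all rows

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Made-up sample data, used only to exercise the helper.
sample = "id,city,population\n1,Oslo,700000\n2,Bergen,285000\n3,Tromso,77000\n"
print(sort_csv_by_field(sample, "population", numeric=True))
```

Because the key is looked up by header name, the call keeps working if the columns are later reordered, which is the practical advantage over offset-based sorting.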
Skytrax-Data-Warehouse
- Open source contributions for a Data Engineer?
  Always open to accepting contributions to my project (Skytrax Data Warehouse). If you are into data stuff, support my work on YouTube as well (One Developer Pirate); I mostly make data-oriented videos. These days I'm making a SQL course from a data analysis perspective that is expected to be released next week.
What are some alternatives?
soda-sql - Data profiling, testing, and monitoring for SQL accessible data.
dbd - dbd is a database prototyping tool that enables data analysts and engineers to quickly load and transform data in SQL databases.
Prefect - The easiest way to build, run, and monitor data pipelines at scale.
sqlfluff - A modular SQL linter and auto-formatter with support for multiple dialects and templated code.
jaydebeapi - JayDeBeApi module allows you to connect from Python code to databases using Java JDBC. It provides a Python DB-API v2.0 to that database.
didact-engine - The REST API and execution engine for the Didact Platform.
dbt-spotify-analytics - Containerized end-to-end analytics of Spotify data using Python, dbt, Postgres, and Metabase
spark-rapids - Spark RAPIDS plugin - accelerate Apache Spark with GPUs
airflow-api-tests - A collection of pytest tests for the Apache Airflow 2.0 stable REST API. I have another repo where you can set up Airflow locally and play around with these. I am used to RestAssured, but am trying out pytest here.
Metabase - The simplest, fastest way to get business intelligence and analytics to everyone in your company
dagster - An orchestration platform for the development, production, and observation of data assets.