DataGristle
premier-league
| | DataGristle | premier-league |
|---|---|---|
| Mentions | 5 | 8 |
| Stars | 137 | 147 |
| Growth | - | - |
| Activity | 0.0 | 9.4 |
| Last commit | 3 months ago | 4 days ago |
| Language | Python | Python |
| License | GNU General Public License v3.0 or later | - |
Stars: the number of stars that a project has on GitHub. Growth: month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
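The page does not publish the formula behind these numbers, so the following is only a rough sketch of how such metrics could be computed. The exponential decay, the 30-day half-life, and all function names are assumptions made for illustration, not the site's actual method. It shows recent commits outweighing older ones, a raw score being mapped to a 0-10 rating by percentile rank (so that 9.0 corresponds to the top 10% of tracked projects), and month-over-month star growth.

```python
from bisect import bisect_left


def weighted_commit_score(commit_ages_days, half_life_days=30.0):
    """Sum per-commit weights that halve every `half_life_days`.

    The exponential decay and the 30-day default are assumptions.
    """
    return sum(0.5 ** (age / half_life_days) for age in commit_ages_days)


def activity_rating(score, all_scores):
    """Map a raw score to a 0-10 rating by percentile rank (hypothetical)."""
    ranked = sorted(all_scores)
    # Count the tracked projects whose score is strictly lower.
    below = bisect_left(ranked, score)
    return round(10.0 * below / len(ranked), 1)


def monthly_growth(stars_now, stars_last_month):
    """Month-over-month star growth as a fraction, or None if undefined."""
    if stars_last_month == 0:
        return None
    return (stars_now - stars_last_month) / stars_last_month


# Commits from the last few days outweigh commits from months ago.
recent = weighted_commit_score([1, 3, 4])
stale = weighted_commit_score([90, 120])
print(recent > stale)  # True

# A project scoring above 9 of 10 tracked projects is rated 9.0.
print(activity_rating(9, range(10)))  # 9.0
```

A percentile-based rating like this is relative by construction: a project's rating can fall even while its commit rate stays the same, if the other tracked projects become more active.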
DataGristle
- What are your weekend side projects?
- Instant data model from 1000s of unique files?
- Using Hashing to detect data changes in ELT
- How do you sort a CSV file with several million rows?
  DataGristle: this one contains some more unusual csv utilities, and what's in master includes the ability to sort by field names rather than offsets: https://github.com/kenfar/DataGristle
- Open source contributions for a Data Engineer?
  DataGristle by u/kenfar who influenced many of us in this sub.
premier-league
- Google Cloud Portfolio Projects?
  I have a data engineering project that uses BigQuery, Cloud Run, Compute Engine, Cloud SQL, Artifact Registry, Firestore, and Datastream.
- Am I using (and understanding) dbt correctly?
  I'm currently working on a personal project where I am starting to implement dbt as a transformation step. One of my data pipelines runs in the following steps:
- What are your weekend side projects?
  I’ve been building a Premier League Dashboard with Streamlit as a DE project.
- Introducing Firestore into my Premier League Project
  Streamlit Dashboard: https://premierleague.streamlit.app
- Premier League Project Infrastructure Update
  Here is my updated GitHub Actions Workflow file: ci.yml
- Another data project, this time with Python, Go, (some SQL), Docker, Google Cloud Services, Streamlit, and GitHub Actions
  Here is the GitHub repo.
What are some alternatives?
Skytrax-Data-Warehouse - A full data warehouse infrastructure with ETL pipelines running inside docker on Apache Airflow for data orchestration, AWS Redshift for cloud data warehouse and Metabase to serve the needs of data visualizations such as analytical dashboards.
didact-engine - The REST API and execution engine for the Didact Platform.
soda-sql - Data profiling, testing, and monitoring for SQL accessible data.
black - The uncompromising Python code formatter
Prefect - The easiest way to build, run, and monitor data pipelines at scale.
sqlfluff - A modular SQL linter and auto-formatter with support for multiple dialects and templated code.
spark-rapids - Spark RAPIDS plugin - accelerate Apache Spark with GPUs
Metabase - The simplest, fastest way to get business intelligence and analytics to everyone in your company :yum:
chispa - PySpark test helper methods with beautiful error messages
dagster - An orchestration platform for the development, production, and observation of data assets.