Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
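The exact formula behind the activity number is not given here. As a rough illustration only, the sketch below assumes an exponential recency weighting with a hypothetical 30-day half-life, and a percentile rank scaled to 0-10. The function names `weighted_activity` and `activity_scores` are made up for this example.

```python
from bisect import bisect_left


def weighted_activity(commit_ages_days, half_life_days=30.0):
    """Sum commit weights; a commit's weight halves every half_life_days.

    The half-life is an assumed parameter, not the site's real setting.
    """
    return sum(0.5 ** (age / half_life_days) for age in commit_ages_days)


def activity_scores(raw_by_project):
    """Turn raw weighted activity into a relative 0-10 score.

    A project's score is 10 times the fraction of tracked projects
    with strictly lower raw activity.
    """
    ordered = sorted(raw_by_project.values())
    total = len(ordered)
    return {
        name: 10.0 * bisect_left(ordered, raw) / total
        for name, raw in raw_by_project.items()
    }


if __name__ == "__main__":
    # Hypothetical commit ages (in days) for three tracked projects.
    raw = {
        "busy": weighted_activity([0, 1, 2, 3]),
        "steady": weighted_activity([10, 40, 70]),
        "quiet": weighted_activity([300, 400]),
    }
    print(activity_scores(raw))
```

Under these assumptions, a score of 9.0 means that 90% of the tracked projects have lower weighted activity, which matches the "top 10%" reading above.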
analytics
- I'm not getting it...what's the point of DBT?
Take a look at GitLab's dbt project: https://gitlab.com/gitlab-data/analytics/-/blob/master/transform/snowflake-dbt/models/common/schema.yml
- How would you structure a repo with 10+ ETL pipelines and shared code?
A good reference is the GitLab data team repo: https://gitlab.com/gitlab-data/analytics
- What are your favourite GitHub repos that show how data engineering should be done?
- Are there any open corporate Data Team repositories / projects besides GitLab?
For example, their Data Team has a public repository with a bunch of information on how they organize DAGs, machine learning projects, system configuration, etc.
- Kimball Dim Modelling Code Examples
- Can someone help me, an absolute newbie, understand the usage and benefit of dbt with a practical example?
- Is jinja templating right for DBT?
So I've run through the dbt tutorial and looked over some fairly complex uses of it (e.g. GitLab Data), and I was wondering if anyone has any opinions or insights on the use of Jinja templating in the SQL?
- Where can I find free data engineering (big data) projects online?
GitLab has open-sourced their dbt repo, and it is very useful for seeing how to structure a project at scale. https://gitlab.com/gitlab-data/analytics/-/tree/master/transform/snowflake-dbt
- Gitlab's Data Team Platform (in depth look at their stack)
Currently the team is working hard on this: https://gitlab.com/gitlab-data/analytics/-/issues/9508
- Can someone explain the big deal with dbt?
GitLab's dbt project is an excellent example of a mature project at scale. They also have a comprehensive guide to their methodology.
monosi
- Open source data observability tools with UI?
I also found https://github.com/monosidev/monosi, but there seems to have been no activity in the repository since last year.
- Databricks monitoring/observability
I'm building an open source data observability platform - https://github.com/monosidev/monosi that visualizes metadata collected from data warehouses. Databricks is currently not supported (contributions welcome!), but it may help to take a look at how we approach the anomaly detection & visualization aspects.
- Monitor PostgreSQL for anomalies in ingested data
Building an open source tool that lets you monitor PostgreSQL instances for anomalies in data coming in - https://github.com/monosidev/monosi
- Open Source Data Observability for BigQuery
- Metadata extraction and management
It’s open source, check out the repository here - https://github.com/monosidev/monosi
- How to Monitor Supabase with Monosi
🎉 Congratulations, you've just set up and scheduled a data monitor on your Supabase instance. You can now add more monitors to other tables in your database. Find more information on how to use Monosi here.
- Setting up data monitoring for PostgreSQL
Now that you’ve worked through an example using a public PostgreSQL instance, you can further extend this to your own data store. For more information, get started here.
- Monosi v0.0.3 Released! Open source Data Observability now with a Web UI, Postgres Support, & more.
- Sunday Daily Thread: What's everyone working on this week?
Continuing to build out & stabilize Monosi (open source data observability) - https://github.com/monosidev/monosi
- Data pipeline suggestions
Observability: Monosi
What are some alternatives?
dbt-synapse - dbt adapter for Azure Synapse Dedicated SQL Pools
datahub - The Metadata Platform for your Data Stack
dagster - An orchestration platform for the development, production, and observation of data assets.
jitsu - Jitsu is an open-source Segment alternative. Fully-scriptable data ingestion engine for modern data teams. Set-up a real-time data pipeline in minutes, not days
castled - Castled is an open source reverse ETL solution that helps you to periodically sync the data in your db/warehouse into sales, marketing, support or custom apps without any help from engineering teams
soda-spark - Soda Spark is a PySpark library that helps you with testing your data in Spark Dataframes
AdvancedSQLPuzzles - Welcome to my GitHub repository. I hope you enjoy solving these puzzles as much as I have enjoyed creating them.
soda-sql - Data profiling, testing, and monitoring for SQL accessible data.
lightdash - Self-serve BI to 10x your data team ⚡️
great_expectations - Always know what to expect from your data.