monosi
soda-sql
 | monosi | soda-sql
---|---|---
Mentions | 20 | 25
Stars | 320 | 50
Growth | 1.3% | -
Activity | 0.0 | 8.2
Last commit | over 1 year ago | over 1 year ago
Language | Python | Python
License | Apache License 2.0 | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
monosi
- Open source data observability tools with UI?
  I also found https://github.com/monosidev/monosi, but there seems to have been no activity in the repository since last year.
- Databricks monitoring/observability
  I'm building an open source data observability platform - https://github.com/monosidev/monosi - that visualizes metadata collected from data warehouses. Databricks is currently not supported (contributions welcome!), but it may help to take a look at how we approach the anomaly detection & visualization aspects.
- Monitor PostgreSQL for anomalies in ingested data
  Building an open source tool that lets you monitor PostgreSQL instances for anomalies in incoming data - https://github.com/monosidev/monosi
- Open Source Data Observability for BigQuery
- Metadata extraction and management
  It’s open source, check out the repository here - https://github.com/monosidev/monosi
- How to Monitor Supabase with Monosi
  🎉 Congratulations, you've just set up and scheduled a data monitor on your Supabase instance. You can now add more monitors to other tables in your database. Find more information on how to use Monosi here.
- Setting up data monitoring for PostgreSQL
  Now that you’ve worked through an example using a public PostgreSQL instance, you can further extend this to your own data store. For more information, get started here.
- Monosi v0.0.3 Released! Open source Data Observability now with a Web UI, Postgres Support, & more.
- Sunday Daily Thread: What's everyone working on this week?
  Continuing to build out & stabilize Monosi (open source data observability) - https://github.com/monosidev/monosi
- Data pipeline suggestions
  Observability: Monosi
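Several of the posts above mention monitoring ingested data for anomalies. As a rough illustration of the general idea only (not Monosi's actual implementation, which is not described here), the sketch below flags outliers in a series of daily row counts with a simple z-score test. The function name `find_anomalies`, the sample data, and the threshold of 2.0 are all assumptions made for this example.

```python
from statistics import mean, stdev


def find_anomalies(values, threshold=2.0):
    """Return the indices of values whose z-score exceeds the threshold.

    Hypothetical helper for illustration; real observability tools
    typically use more robust, seasonality-aware methods.
    """
    if len(values) < 2:
        return []  # not enough data to estimate spread
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Made-up daily row counts for a table; the last day drops sharply.
daily_row_counts = [1000, 1020, 990, 1010, 1005, 995, 100]
print(find_anomalies(daily_row_counts))  # prints [6]
```

With only a handful of data points, a single extreme value also inflates the standard deviation, so a low threshold is used here; a production monitor would likely compare against a longer history.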
soda-sql
- Data Quality - Great Expectations for Data Engineers
  I might be a bit biased, but that was my opinion even before I started contributing to Soda SQL.
- dbt vs R/Python for transformation
- SodaCL - preview of a new "data reliability as code" language
  I'm one of the developers of the open source soda-sql data quality monitoring library. Over the past year we got some incredible feedback from our users, and based on that we started working on a new DSL for data reliability as code that we are calling Soda CL.
- How do you test your pipelines?
  You can also use soda-sql to do checks on your warehouses separately. Both Soda SQL and Soda Spark are OSS/Apache licensed.
- Being constantly shut down by more senior team members when I mention adding some QA in our work
  As many have said, there might be a business side of things to deliver. Somebody above promised delivery with tight deadlines. Trust me, I am not a fan, but this is how the world works, and it sucks. I would say, in your free time, explore tools like Great Expectations (https://greatexpectations.io/) or https://github.com/sodadata/soda-sql, which are modern ways of testing, as part of your learning curve.
- Soda
- How heavily do you use Great Expectations?
- What are some exciting new tools/libraries in 2021?
  soda-sql: a really cool library to automate data quality checks on SQL tables
- How do I incorporate testing after the fact?
  Look at SodaSQL. It's more enterprise focused than Great Expectations and you can pipe results to a database for downstream actions and analysis.
- Data Testing Tools, Pytest vs Great Expectations vs Soda vs Deequ
  Certainly! It’s not requested that much 😊 but please add an issue on GitHub. I would love to add at least experimental support.
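The posts above describe automated data quality checks on SQL tables. The sketch below shows the general pattern with plain SQL against an in-memory SQLite database. It does not use the soda-sql API; the `run_checks` helper, the `orders` table, and the two checks are hypothetical and chosen only for illustration.

```python
import sqlite3


def run_checks(conn, table, column):
    """Compute simple metrics for one column and evaluate pass/fail checks.

    Hypothetical helper: table and column names are interpolated into
    the SQL directly, so only trusted identifiers should be passed in.
    """
    row_count, missing = conn.execute(
        f"SELECT COUNT(*), SUM(CASE WHEN {column} IS NULL THEN 1 ELSE 0 END) "
        f"FROM {table}"
    ).fetchone()
    missing = missing or 0  # SUM returns NULL when the table is empty
    return {
        "row_count > 0": row_count > 0,
        "missing_count == 0": missing == 0,
    }


# Made-up sample data: one of the two rows has a missing email.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, email TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "a@example.com"), (2, None)],
)

results = run_checks(conn, "orders", "email")
for name, passed in results.items():
    print(name, "PASS" if passed else "FAIL")
```

In this example the row-count check passes and the missing-value check fails. The results dictionary could be written to another table for downstream analysis, in the spirit of the comment above about piping results to a database.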
What are some alternatives?
datahub - The Metadata Platform for your Data Stack
deequ - Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
jitsu - Jitsu is an open-source Segment alternative. Fully-scriptable data ingestion engine for modern data teams. Set-up a real-time data pipeline in minutes, not days
pandera - A light-weight, flexible, and expressive statistical data testing library
castled - Castled is an open source reverse ETL solution that helps you to periodically sync the data in your db/warehouse into sales, marketing, support or custom apps without any help from engineering teams
sqlfluff - A modular SQL linter and auto-formatter with support for multiple dialects and templated code.
soda-spark - Soda Spark is a PySpark library that helps you with testing your data in Spark Dataframes
dbt-sessionization - Using DBT for Creating Session Abstractions on RudderStack - an open-source, warehouse-first customer data pipeline and Segment alternative.
great_expectations - Always know what to expect from your data.
re_data - re_data - fix data issues before your users & CEO would discover them 😊
dagster - An orchestration platform for the development, production, and observation of data assets.
trino_data_mesh - Proof of concept on how to gain insights with Trino across different databases from a distributed data mesh