| Latest commit | 11 months ago | 6 days ago |
| License | MIT License | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
How To Event Stream Data From Your Hugo Site To Google Analytics Using RudderStack
4 projects | dev.to | 19 Jan 2022
Send Form Data From Marketo to Multiple Destinations Using RudderStack
1 project | dev.to | 13 Jan 2022
By using RudderStack to understand how users are finding and interacting with your site and then combining that with the data collected by your Marketo forms, you'll get deeper insights about your potential customers and provide higher quality leads to your sales team.
Data Warehouse Integration: Refining Your Customer Data Stack
1 project | dev.to | 4 Jan 2022
RudderStack lets you send the rich analysis from your warehouse to your entire customer data stack. Read more about how RudderStack's Warehouse Actions feature unlocks the data in your warehouse.
How To Event Stream Data From Your Nuxt.js App Using RudderStack
4 projects | dev.to | 22 Dec 2021
RudderStack is an open-source Customer Data Pipeline that enables you to track events from your web, mobile, and server-side sources and send them to your whole customer data stack in real time. We have also open-sourced our primary GitHub repository, rudder-server.
Clickstream Data Mining Techniques: An Introduction
3 projects | dev.to | 16 Sep 2021
We highly recommend checking out our Sessionization repository on GitHub to see how to use the sessions in a practical scenario.
How do you test your pipelines?
3 projects | reddit.com/r/dataengineering | 23 Jan 2022
You can also use soda-sql to do checks on your warehouses separately. Both Soda SQL and Soda Spark are OSS/Apache licensed.
Being constantly shut down by more senior team members when I mention adding some QA in our work
1 project | reddit.com/r/dataengineering | 10 Jan 2022
As many have said, there might be a business side of things to deliver. Somebody above promised delivery on tight deadlines. Trust me, I am not a fan, but this is how the world works and it sucks. I would say that in your free time you should explore tools like Great Expectations (https://greatexpectations.io/) or Soda SQL (https://github.com/sodadata/soda-sql), which are modern approaches to testing, as part of your learning curve.
1 project | reddit.com/r/devopspro | 10 Dec 2021
How heavily do you use Great Expectations?
2 projects | reddit.com/r/dataengineering | 23 Sep 2021
What are some exciting new tools/libraries in 2021?
2 projects | reddit.com/r/datascience | 20 Jun 2021
soda-sql: a really cool library for automating data quality checks on SQL tables
How do I incorporate testing after the fact?
1 project | reddit.com/r/dataengineering | 18 May 2021
Look at Soda SQL. It's more enterprise-focused than Great Expectations, and you can pipe results to a database for downstream actions and analysis.
Data Testing Tools, Pytest vs Great Expectations vs Soda vs Deequ
2 projects | reddit.com/r/dataengineering | 17 May 2021
Certainly! It’s not requested that much 😊, but please open an issue on GitHub. I would love to add at least experimental support.
Open source contributions for a Data Engineer?
17 projects | reddit.com/r/dataengineering | 16 Apr 2021
If you are interested in using/learning Python, SQL and data warehouse skills, take a look at https://github.com/sodadata/soda-sql
Anyone aware of any Data Validation Framework with custom SQL capability
4 projects | reddit.com/r/dataengineering | 18 Mar 2021
Soda SQL looks promising. It has some out-of-the-box tests, and you can also provide custom SQL: https://github.com/sodadata/soda-sql
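To illustrate what a "custom SQL" data check means in general, here is a minimal, framework-agnostic sketch using only Python's standard-library `sqlite3` module. It is not Soda SQL's API or configuration format; the `orders` table, the check names, and the `run_checks` helper are hypothetical. Each check is a SQL query that counts rows violating a rule, so a check passes when its count is zero.

```python
import sqlite3

# Hypothetical custom checks: each query counts rows that violate a rule,
# so a check passes when its query returns 0.
CHECKS = {
    "no_null_customer_id": "SELECT COUNT(*) FROM orders WHERE customer_id IS NULL",
    "non_negative_amount": "SELECT COUNT(*) FROM orders WHERE amount < 0",
    "unique_order_id": (
        "SELECT COUNT(*) FROM ("
        " SELECT order_id FROM orders GROUP BY order_id HAVING COUNT(*) > 1)"
    ),
}


def run_checks(conn: sqlite3.Connection, checks: dict[str, str]) -> dict[str, int]:
    """Run each SQL check and return the number of failing rows per check."""
    return {name: conn.execute(sql).fetchone()[0] for name, sql in checks.items()}


if __name__ == "__main__":
    # Small in-memory table with deliberately bad rows for demonstration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, 10, 25.0), (2, None, 12.5), (2, 11, -3.0)],
    )
    for name, failures in run_checks(conn, CHECKS).items():
        status = "PASS" if failures == 0 else f"FAIL ({failures})"
        print(f"{name}: {status}")
```

Returning a count of offending rows rather than a plain boolean makes it easy to report how many records failed each rule.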
How would you QA data before/after a migration?
2 projects | reddit.com/r/dataengineering | 16 Mar 2021
We just released a new open source tool for testing SQL accessible data and have support for BigQuery: https://docs.soda.io/soda-sql/
What are some alternatives?
deequ - Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
sqlfluff - A SQL linter and auto-formatter for Humans
trino_data_mesh - Proof of concept on how to gain insights with Trino across different databases from a distributed data mesh
dagster - An orchestration platform for the development, production, and observation of data assets.
airflow-notebook - Airflow-Notebook is an Apache Airflow operator that enables running notebooks or Python scripts as tasks in a DAG.
spark-fast-tests - Apache Spark testing helpers (dependency free & works with Scalatest, uTest, and MUnit)
Prefect - The easiest way to automate your data
data_check - data_check is a simple data validation tool
Metabase - The simplest, fastest way to get business intelligence and analytics to everyone in your company 😋
airflow-docker - This is my Apache Airflow Local development setup on Windows 10 WSL2/Mac using docker-compose. It will also include some sample DAGs and workflows.
redata - re_data - fix data issues before your users & CEO discover them 😊
pandera - A light-weight, flexible, and expressive data validation library for dataframes