dbt-expectations
dbt-fal
Metric | dbt-expectations | dbt-fal
---|---|---
Mentions | 10 | 12
Stars | 947 | 851
Growth | 2.4% | -
Activity | 6.6 | 7.7
Last commit | 9 days ago | 27 days ago
Language | Shell | Python
License | Apache License 2.0 | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
dbt-expectations
- Dbt tests vs Soda SQL
I have not used Soda, but dbt is indeed pretty good, especially when you add dbt-expectations.
- Data-eng related highlights from the latest Thoughtworks Tech Radar
dbt-expectations
- Data Quality Dimensions: Assuring Your Data Quality with Great Expectations
I highly, highly recommend the dbt-expectations extension from Calogica for dbt. It's a port of Great Expectations, except you can quickly drop it into your schema.yml files and have it run as part of your dbt test process. Super powerful, and it has prevented us from shipping bad data many times.
- Managing SQL Tests
I'm used to using dbt and defining my tests there (along with dbt-utils or https://github.com/calogica/dbt-expectations): I simply add a list item to a column definition and can already define a great number of tests without having to copy code. I can even extend the predefined tests using generic tests. Writing custom tests also integrates nicely. Additionally, it's very convenient to tag tests or define a severity. The learning curve for a business engineer is almost flat as long as they know some SQL.
- What are some Data Quality check related frameworks for datasets ranging from 100GB to 1TB in size?
Use dbt's testing functionality during your transformations with calogica/dbt-expectations (the Great Expectations framework ported to dbt).
- Great Expectations is annoyingly cumbersome
Check out dbt-expectations https://github.com/calogica/dbt-expectations
- CI/CD in data engineering - help a noob
There are certain things I would like to add, such as data quality checks. I can use something like dbt great expectations, but I am not sure how much more I should force it before getting an Airflow setup.
- How do you query and quality check data produced in intermediate steps in analytics pipeline?
- ETL Pipelines with Airflow: The Good, the Bad and the Ugly
[dbt Labs employee here]
Check out the dbt-expectations package [1]. It's a port of the Great Expectations checks to dbt as tests. The advantage is that you don't need another tool for these pretty standard tests, and they can be incorporated early into dbt workflows.
[1] https://github.com/calogica/dbt-expectations
- Unit testing SQL in DBT
Also check out dbt-expectations, a port of Great Expectations that greatly expands the configurable (non-assert) tests.
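Several of the comments above describe adding dbt-expectations tests as list items under a column in schema.yml, optionally with a tag or severity. A minimal sketch of what that can look like is shown here. The model name `orders`, the column names, and the `finance` tag are hypothetical, and the sketch assumes the dbt-expectations package is already installed in the project; check the package README for the tests and arguments available in your version.

```yaml
version: 2

models:
  - name: orders                # hypothetical model name
    columns:
      - name: order_id          # hypothetical column
        tests:
          # Built-in dbt generic tests
          - not_null
          - unique
      - name: amount            # hypothetical column
        tests:
          # dbt-expectations test: values must not be negative
          - dbt_expectations.expect_column_values_to_be_between:
              min_value: 0
              config:
                severity: warn      # report failures as warnings, not errors
                tags: ["finance"]   # hypothetical tag for selecting this test
```

With a file like this in place, the checks run together with the rest of the project's tests when `dbt test` is executed.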
dbt-fal
- machine learning in snowflake, unhappy data scientists
Happy data scientists use fal and dbt
- dbt for ML Engineering
fal (https://github.com/fal-ai/fal) helps with this! In fact, we recently wrote a blog post about feature engineering with fal and dbt.
- Dbt-fal: a dbt Python adapter with local code execution
We built a dbt adapter that helps you run local Python code with your dbt project, alongside any other data warehouse. You can see it here: https://github.com/fal-ai/fal/tree/main/adapter
This new adapter helps you run your dbt Python models with isolated Python environments using our open source library: https://github.com/fal-ai/isolate
- Data Stack for Python Scripts (and other transformations)
Have you considered fal? https://github.com/fal-ai/fal
- Comparing dbt with Delta Live Tables for doing transformations
One thing you might add to the post is that dbt will soon introduce Python transformations that run on the data warehouse's own offering (e.g. Snowpark), and that there are tools like fal that let these Python transformations run in a different environment that you control.
- What are the hottest dbt Repositories you should star on Github 2022? - Here are mine.
Fal-AI (https://github.com/fal-ai/fal): Fal helps to run Python scripts directly from the dbt project. For example, you can load dbt models directly into the Python context, which helps to apply data science libraries like scikit-learn and Prophet in the dbt models. This especially improves the data science capabilities within a data pipeline. What I really like about fal is that it extends dbt from an interesting angle.
- What are your hottest dbt repositories in 2022 so far? Here are mine!
- fal ai: Fal helps to run Python scripts directly from the dbt project. For example, you can load dbt models directly into the Python context, which helps to apply data science libraries like scikit-learn and Prophet in the dbt models.
- Wanting to move away from SQL
I haven't tried it yet, but I know https://fal.ai/ helps you run Python alongside dbt.
- Do I need orchestration for a Fivetran-dbt stack?
Yes, I agree with you that having Fivetran/Airbyte and dbt covers a lot of the Airflow use cases. That being said, you might still want to run some scripts after the dbt transformation is over. We ran into this exact problem and built a useful CLI tool for running Python scripts alongside the dbt run.
- Why is Data Build Tool (DBT) is so popular? What are some other alternatives?
Great write-up! For your logging integration, you might have a look at fal. There's an example of sending events to Datadog
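The comments above describe fal as a way to run Python scripts alongside a dbt run. As a rough sketch of how a script gets attached to a model, the fragment below follows the `meta` layout shown in fal's README. The model name and script path are hypothetical, and the exact keys may differ between fal versions, so treat this as an assumption to verify against the current fal documentation.

```yaml
version: 2

models:
  - name: daily_revenue         # hypothetical model name
    meta:
      fal:
        scripts:
          # Hypothetical Python script to run for this model
          - fal_scripts/send_event.py
```

Under this setup, the usual flow is to run `dbt run` first and then `fal run`, which executes the scripts attached to the models.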
What are some alternatives?
dbt-utils - Utility functions for dbt projects.
dbt-metabase - dbt + Metabase integration
dbt-oracle - A dbt adapter for oracle db backend
kuwala - Kuwala is the no-code data platform for BI analysts and engineers enabling you to build powerful analytics workflows. We are set out to bring state-of-the-art data engineering tools you love, such as Airbyte, dbt, or Great Expectations together in one intuitive interface built with React Flow. In addition we provide third-party data into data science models and products with a focus on geospatial data. Currently, the following data connectors are available worldwide: a) High-resolution demographics data b) Point of Interests from Open Street Map c) Google Popular Times
materialize - The data warehouse for operational workloads.
evidence - Business intelligence as code: build fast, interactive data visualizations in pure SQL and markdown
Scio - A Scala API for Apache Beam and Google Cloud Dataflow.
Pandas - Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
NVTabular - NVTabular is a feature engineering and preprocessing library for tabular data designed to quickly and easily manipulate terabyte scale datasets used to train deep learning based recommender systems.
airflow-dbt - Apache Airflow integration for dbt
cuetils - CLI and library for diff, patch, and ETL operations on CUE, JSON, and Yaml
re_data - fix data issues before your users & CEO would discover them