|  | dbt-fal | dbt-utils |
|---|---|---|
| Mentions | 12 | 7 |
| Stars | 851 | 1,213 |
| Growth | - | 2.9% |
| Activity | 7.7 | 6.2 |
| Latest commit | about 2 months ago | 7 days ago |
| Language | Python | Python |
| License | Apache License 2.0 | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
dbt-fal
-
Machine learning in Snowflake, unhappy data scientists
Happy data scientists use fal and dbt
-
dbt for ML Engineering
fal (https://github.com/fal-ai/fal) helps with this! In fact, we wrote a blog post about feature engineering with fal and dbt recently.
-
Dbt-fal: a dbt Python adapter with local code execution
We built a dbt adapter that lets you run local Python code as part of your dbt project, alongside any data warehouse. You can see it here: https://github.com/fal-ai/fal/tree/main/adapter
This new adapter helps you run your dbt Python models with isolated Python environments using our open source library: https://github.com/fal-ai/isolate
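As I understand the dbt-fal docs, the adapter is wired up in profiles.yml as a `type: fal` output that delegates SQL execution to an existing warehouse profile via `db_profile`; the profile and credential values below are illustrative:

```yaml
# profiles.yml -- names and credentials are made up for illustration
my_project:
  target: dev_with_fal
  outputs:
    dev_with_fal:
      type: fal
      db_profile: dev   # SQL models run in this warehouse output; Python runs locally
    dev:
      type: postgres
      host: localhost
      user: analyst
      password: secret
      port: 5432
      dbname: analytics
      schema: public
      threads: 4
```

With this setup, `dbt run` against the `dev_with_fal` target executes SQL models in the warehouse while Python models run in locally managed environments.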
-
Data Stack for Python Scripts (and other transformations)
Have you considered fal? https://github.com/fal-ai/fal
-
Comparing dbt with Delta Live Tables for doing transformations
Worth noting on the post: dbt is soon introducing Python transformations that run on the data warehouse's own offering (e.g. Snowpark), and there are tools like fal that let these Python transformations run in a separate environment that you control.
-
What are the hottest dbt Repositories you should star on Github 2022? - Here are mine.
Fal-AI ( https://github.com/fal-ai/fal ) Fal helps you run Python scripts directly from the dbt project. For example, you can load dbt models directly into the Python context, which lets you apply data science libraries like scikit-learn and Prophet to dbt models. This especially improves the data science capabilities within a data pipeline. What I really like about fal is that it extends dbt from an interesting angle.
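In a real fal script, fal injects helpers such as `ref` at runtime so a dbt model arrives as a pandas DataFrame; the stub below stands in for that (model and column names are made up) so the sketch runs standalone:

```python
import pandas as pd

# Stand-in for the `ref` helper that fal injects into scripts at runtime;
# it returns a dbt model's output as a pandas DataFrame.
def ref(model_name: str) -> pd.DataFrame:
    return pd.DataFrame({"daily_orders": [10.0, 12.0, 9.0, 15.0]})

df = ref("orders_daily")
# Plain-Python feature engineering of the kind the comment describes:
df["lag_1"] = df["daily_orders"].shift(1)
df["rolling_mean_2"] = df["daily_orders"].rolling(2).mean()
print(df.shape)
```

From here the DataFrame can be fed straight into scikit-learn or Prophet without leaving the dbt project.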
-
What are your hottest dbt repositories in 2022 so far? Here are mine!
- 🐍 fal ai: Fal helps you run Python scripts directly from the dbt project. For example, you can load dbt models directly into the Python context, which lets you apply data science libraries like scikit-learn and Prophet in dbt models.
-
Wanting to move away from SQL
I haven’t tried it yet but I know https://fal.ai/ helps you run python alongside dbt.
-
Do I need orchestration for a Fivetran-dbt stack?
Yes, I agree that having Fivetran/Airbyte and dbt covers a lot of the Airflow use cases. That being said, you might still want to run some scripts after the dbt transformation is over; we ran into this exact problem and built a useful CLI tool for running Python scripts alongside the dbt run.
-
Why is Data Build Tool (dbt) so popular? What are some other alternatives?
Great write-up! For your logging integration, you might have a look at fal. There's an example of sending events to Datadog.
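For context, fal attaches such scripts to models in schema.yml; based on my reading of the fal README, the shape is roughly the following (model name and script path are illustrative):

```yaml
# models/schema.yml -- model name and script path are made up
models:
  - name: daily_revenue
    meta:
      fal:
        scripts:
          - fal_scripts/send_to_datadog.py
```

After `dbt run`, invoking `fal run` executes the attached scripts with the model's output available to them.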
dbt-utils
-
Show HN: Nasty, a cross warehouse, type checked, unit testable analytics library
// To get around this, we can use the approach outlined by how dbt does ansi sql generate_series
// https://github.com/dbt-labs/dbt-utils/blob/main/macros/sql/generate_series.sql
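The linked macro builds a 1..N series in portable ANSI SQL by cross-joining unioned 0/1 "digit" relations and summing powers of two. A Python sketch of that idea (my reading of the macro's approach, not a translation of its exact SQL):

```python
import itertools
import math

def generate_series(upper_bound: int) -> list[int]:
    # Mirror of the ANSI-SQL trick: cross-join ceil(log2(N)) binary digits,
    # treat each combination as a binary number, and keep values <= N.
    bits = max(1, math.ceil(math.log2(upper_bound)))
    series = sorted(
        1 + sum(digit << i for i, digit in enumerate(digits))
        for digits in itertools.product((0, 1), repeat=bits)
    )
    return [n for n in series if n <= upper_bound]

print(generate_series(10))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

Each cross-joined 0/1 relation plays the role of one binary digit, which is what makes the SQL version work without any vendor-specific series function.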
-
Anything one should know before going for self-hosted dbt?
I got bit by dbt-utils/deduplicate naively removing any row that contained a null in it recently, but fortunately there was a workaround for Databricks and a few other flavors of SQL.
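The failure mode described is easy to reproduce with plain SQL NULL semantics. A minimal sqlite sketch (not dbt-utils' actual macro, just an equality-keyed dedup of the kind that exhibits the bug):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, note TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, "a"), (1, "a"), (2, None)],
)
# A dedup that joins back on every column silently drops the (2, NULL) row,
# because NULL = NULL evaluates to unknown, so the join condition never holds.
rows = conn.execute("""
    SELECT e.id, e.note
    FROM events e
    JOIN (SELECT id, note, MIN(rowid) AS keep
          FROM events GROUP BY id, note) d
      ON e.id = d.id AND e.note = d.note AND e.rowid = d.keep
""").fetchall()
print(rows)  # [(1, 'a')] -- the NULL-containing row is gone
```

The workarounds mentioned typically amount to NULL-safe comparisons (e.g. `IS NOT DISTINCT FROM` where the dialect supports it).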
-
Managing SQL Tests
I'm used to utilising dbt and defining my tests there (along with dbt-utils or https://github.com/calogica/dbt-expectations): I simply add a list item to a column definition and can already define a great number of tests without having to copy code. I can even extend the pre-defined ones with generic tests. Writing custom tests also integrates nicely. Additionally, it's very convenient to tag tests or define a severity. The learning curve for a business engineer is almost flat as long as they know some SQL.
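A sketch of that workflow: column-level tests declared as list items in schema.yml, mixing dbt built-ins with a dbt-utils test and a severity override (model and column names are made up):

```yaml
# models/schema.yml -- model and column names are illustrative
models:
  - name: orders
    columns:
      - name: order_id
        tests:
          - not_null
          - unique
          - dbt_utils.accepted_range:
              min_value: 1
      - name: status
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'returned']
              config:
                severity: warn   # report as a warning instead of failing the run
```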
-
Dbt to acquire Transform to build out its semantic layer
My top three:
- Dev/staging/prod env: check numbers before pushing to production.
- Unions between two sources that are not the same shape can be done without the headache. https://github.com/dbt-labs/dbt-utils#union_relations-source
- Macros for common case when statements.
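The union_relations macro from the second point pads columns that are missing from one relation with NULLs, so differently shaped relations can be unioned without hand-aligning columns. A usage sketch with illustrative model names:

```sql
-- models/all_events.sql -- relation names are made up
{{ dbt_utils.union_relations(
    relations=[ref('web_events'), ref('mobile_events')]
) }}
```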
-
Analytics Stacks for Startups
Add tests: unit tests in SQL are still not really practical, but testing the data before allowing users to see it is possible. dbt has some basic tests like non-NULL and so on. dbt_utils supports comparing data across tables. If you need more, there are Great Expectations and similar tools. dbt also supports writing SQL queries which output “bad” rows. Use this to, e.g., check a specific order against manually checked correct data. Tests give you confidence that your pipelines produce correct results: nothing is worse than waking up to a Slack message from your boss that the graphs look wrong… They are especially useful in case you have to refactor a data pipeline. Basically every query you would run during the QA phase of a change request has a high potential to become an automatic test.
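The "SQL queries which output bad rows" are dbt's singular tests: a SELECT saved under tests/ that fails if it returns any rows. An illustrative example (file, model, and column names are made up):

```sql
-- tests/assert_no_negative_totals.sql -- names are illustrative
select *
from {{ ref('orders') }}
where order_total < 0
```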
-
Unit testing SQL in DBT
The equality test macro is also in the dbt-utils package from fishtown at https://github.com/fishtown-analytics/dbt-utils/blob/master/macros/schema_tests/equality.sql
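Usage looks roughly like this: declare the test on the model under comparison and point `compare_model` at the reference model (model names below are illustrative):

```yaml
# models/schema.yml -- model names are made up
models:
  - name: orders_rebuilt
    tests:
      - dbt_utils.equality:
          compare_model: ref('orders_expected')
```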
What are some alternatives?
dbt-metabase - dbt + Metabase integration
dbt-expectations - Port(ish) of Great Expectations to dbt test macros
sqlfluff - A modular SQL linter and auto-formatter with support for multiple dialects and templated code.
kuwala - Kuwala is the no-code data platform for BI analysts and engineers enabling you to build powerful analytics workflows. We are set out to bring state-of-the-art data engineering tools you love, such as Airbyte, dbt, or Great Expectations together in one intuitive interface built with React Flow. In addition we provide third-party data into data science models and products with a focus on geospatial data. Currently, the following data connectors are available worldwide: a) High-resolution demographics data b) Point of Interests from Open Street Map c) Google Popular Times
dbt-oracle - A dbt adapter for oracle db backend
evidence - Business intelligence as code: build fast, interactive data visualizations in pure SQL and markdown
nodejs-bigquery - Node.js client for Google Cloud BigQuery: A fast, economical and fully-managed enterprise data warehouse for large-scale data analytics.
Pandas - Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
streamlit - Streamlit — A faster way to build and share data apps.
airflow-dbt - Apache Airflow integration for dbt
re_data - re_data - fix data issues before your users & CEO would discover them 😊