Scio vs dbt-fal
| | Scio | dbt-fal |
|---|---|---|
| Mentions | 7 | 12 |
| Stars | 2,523 | 851 |
| Growth | 0.5% | - |
| Activity | 9.6 | 7.7 |
| Latest commit | 3 days ago | 24 days ago |
| Language | Scala | Python |
| License | Apache License 2.0 | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Scio
- Are there any openly available data engineering projects using Scala and Spark that follow industry conventions like proper folder/package structures and object-oriented division of classes/concerns? Most examples I've seen have everything in one file without proper separation of concerns.
-
For the DEs that choose Java over Python in new projects, why?
I doubt it is possible, because I suspect the GIL would like a word. So I could spend nights trying to make it work in Python (and possibly, if not likely, fail), or I could just use this ready-made solution.
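To make the GIL point concrete, here is a minimal, self-contained sketch (not from the thread): in CPython, CPU-bound work barely speeds up with threads, but does with processes.

```python
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def busy(n: int) -> int:
    # CPU-bound work: the GIL lets only one thread run Python bytecode at a time.
    total = 0
    for i in range(n):
        total += i * i
    return total

def timed(executor_cls) -> float:
    start = time.perf_counter()
    with executor_cls(max_workers=4) as pool:
        list(pool.map(busy, [5_000_000] * 4))
    return time.perf_counter() - start

if __name__ == "__main__":
    print(f"threads:   {timed(ThreadPoolExecutor):.2f}s")   # roughly serial due to the GIL
    print(f"processes: {timed(ProcessPoolExecutor):.2f}s")  # scales across cores
```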
-
What popular companies use Scala?
Spotify built a Scala API for Apache Beam called Scio. They open sourced it: https://spotify.github.io/scio/
-
Scala or Python
Generally, Python is the lingua franca; I have never met a data engineer who doesn't know Python. Scala isn't used everywhere. Also, you should know that in Apache Beam (a data processing framework that's gaining popularity because it handles both streaming and batch processing and runs on Spark, among other runners) the language choices are Java, Python, Go, and Scala. So even if you "only" know Java, you can get started with data engineering through Apache Beam.
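As a hedged illustration of the Beam programming model the comment describes (Scio wraps this same model for Scala), here is a minimal word-count pipeline in Beam's Python SDK; the file paths are placeholders:

```python
# A minimal Apache Beam pipeline in Python: count words in a text file.
# Runs locally with the DirectRunner; swap the runner to target Dataflow, Spark, or Flink.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

def run():
    options = PipelineOptions(runner="DirectRunner")
    with beam.Pipeline(options=options) as p:
        (
            p
            | "Read" >> beam.io.ReadFromText("input.txt")          # placeholder input path
            | "Split" >> beam.FlatMap(lambda line: line.split())
            | "PairWithOne" >> beam.Map(lambda word: (word, 1))
            | "Count" >> beam.CombinePerKey(sum)
            | "Format" >> beam.MapTuple(lambda word, n: f"{word}: {n}")
            | "Write" >> beam.io.WriteToText("counts")             # placeholder output prefix
        )

if __name__ == "__main__":
    run()
```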
-
Wanting to move away from SQL
I agree 100%. I haven't used SQL much in previous data engineering roles, and I refuse to consider jobs that mostly deal with SQL. One of my roles involved using a nice Scala API for Apache Beam called Scio, and it was great: code was easy to write, maintain, and test, and it worked well with other services like Pub/Sub and Bigtable.
-
ETL Pipelines with Airflow: The Good, the Bad and the Ugly
If you prefer Scala, then you can try Scio: https://github.com/spotify/scio.
-
ELT, Data Pipeline
To counter the problem mentioned above, we decided to move our data to a Pub/Sub-based stream model, where we would continue to push data as it arrives. Since fluentd is the primary tool used on all our servers to gather data, rather than replacing it we leveraged its plugin architecture and used a plugin to stream data into a sink of our choosing.

Initially our inclination was towards Google Pub/Sub and Google Dataflow, as our data scientists/engineers use BigQuery extensively and keeping the data in the same cloud made sense. The inspiration for using these tools came from Spotify's Event Delivery – The Road to the Cloud. We did the setup on one of our staging servers with Google Pub/Sub and Dataflow. Neither really worked out for us: the Pub/Sub model requires a subscriber to be attached to the topic a publisher streams messages to, otherwise the messages are not stored, and on top of that there was no way to see which messages were arriving. The weirdest thing we encountered was that, when working with Dataflow, the topic would be orphaned, losing its subscribers.

Pub/Sub we might have managed to live with; the wall in our path was Dataflow. We started off using Scio from Spotify to work with Dataflow, but there is a considerable lack of documentation for it, and we found the community to be very reserved on GitHub, something quite evident in the world of Scala, which came up with a Code of Conduct for its user base to follow. One thing we required from Dataflow was support for batch writes to GCS. After trying our hand at Dataflow with no success, Google's staff on Stack Overflow were quite responsive, and their response confirmed that this was not available in Dataflow; streaming the data into BigQuery, Datastore, or Bigtable as a datastore was the alternative. We didn't go that route because the majority of our data team's jobs are based on batched hourly data, and we wanted to avoid the high streaming costs of those services just to store data. The initial proposal for the updated pipeline is shown below.
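The subscriber requirement called out above is easy to hit. As a minimal sketch (not from the original post; the project and topic names are hypothetical), using the google-cloud-pubsub client, the subscription has to exist before publishing or the messages are simply dropped:

```python
# Sketch: with Google Pub/Sub, a message published to a topic that has no
# subscription is not retained. Create the subscription before publishing.
from google.cloud import pubsub_v1

project_id = "my-project"  # hypothetical project
publisher = pubsub_v1.PublisherClient()
subscriber = pubsub_v1.SubscriberClient()

topic_path = publisher.topic_path(project_id, "events")
sub_path = subscriber.subscription_path(project_id, "events-sub")

publisher.create_topic(request={"name": topic_path})
# Without this subscription, anything published below would be dropped.
subscriber.create_subscription(request={"name": sub_path, "topic": topic_path})

future = publisher.publish(topic_path, b"hello")  # returns a future with the message ID
print(future.result())
```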
dbt-fal
-
Machine learning in Snowflake, unhappy data scientists
Happy data scientists use fal and dbt
-
dbt for ML Engineering
fal (https://github.com/fal-ai/fal) helps with this! In fact, we wrote a blog post about feature engineering with fal and dbt recently.
-
Dbt-fal: a dbt Python adapter with local code execution
We built a dbt adapter that helps you run local Python code in your dbt project with any data warehouse. You can see it here: https://github.com/fal-ai/fal/tree/main/adapter
This new adapter runs your dbt Python models in isolated Python environments using our open source library: https://github.com/fal-ai/isolate
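For context, dbt Python models follow dbt's standard `model(dbt, session)` contract regardless of adapter; a minimal sketch of the kind of model such an adapter executes (the model and column names are made up, and working with a plain pandas DataFrame assumes local execution as dbt-fal provides):

```python
# models/orders_enriched.py - a dbt Python model; the adapter decides where
# this function actually executes.
def model(dbt, session):
    # dbt.ref() loads an upstream model; with local execution it can be
    # treated as a pandas DataFrame (the model name is hypothetical).
    orders = dbt.ref("stg_orders")

    # Assumed column: amounts stored in cents, converted to dollars.
    orders["amount_usd"] = orders["amount"] / 100.0

    # dbt materializes the returned DataFrame as a table in the warehouse.
    return orders
```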
-
Data Stack for Python Scripts (and other transformations)
Have you considered fal? https://github.com/fal-ai/fal
-
Comparing dbt with Delta Live Tables for doing transformations
Something to maybe comment on in the post: dbt is introducing Python transformations on the data warehouse offering (e.g. Snowflake's Snowpark) soon, and there are tools like fal that enable these Python transformations to run in a different environment which you have control over.
-
What are the hottest dbt Repositories you should star on Github 2022? - Here are mine.
Fal-AI (https://github.com/fal-ai/fal): fal helps to run Python scripts directly from the dbt project. For example, you can load dbt models directly into the Python context, which makes it possible to apply data science libraries like scikit-learn and Prophet to dbt models. This especially improves the data science capabilities within a data pipeline. What I really like about fal is that it extends dbt from an interesting angle.
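To illustrate what loading a dbt model into a Python context can look like, here is a hedged sketch of a fal script; `ref` and `context` are helpers fal injects into scripts, and the model and column names are hypothetical:

```python
# A fal script, run after a dbt model. fal injects `ref` and `context`
# into the script's namespace; nothing here needs to be imported for those.
from prophet import Prophet

# Load the dbt model that just ran as a pandas DataFrame.
df = ref(context.current_model.name)

# Fit a simple forecast on an assumed timestamp/value column pair.
train = df.rename(columns={"order_date": "ds", "amount_usd": "y"})
m = Prophet()
m.fit(train[["ds", "y"]])

future = m.make_future_dataframe(periods=30)  # forecast 30 days ahead
forecast = m.predict(future)
print(forecast[["ds", "yhat"]].tail())
```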
-
What are your hottest dbt repositories in 2022 so far? Here are mine!
- 🐍 fal ai: fal helps to run Python scripts directly from the dbt project. For example, you can load dbt models directly into the Python context, which helps to apply data science libraries like scikit-learn and Prophet in the dbt models.
-
Wanting to move away from SQL
I haven't tried it yet, but I know https://fal.ai/ helps you run Python alongside dbt.
-
Do I need orchestration for a Fivetran-dbt stack?
Yes, I agree with you that having Fivetran/Airbyte and dbt covers a lot of the Airflow use cases. That being said, you might still want to run some scripts after the dbt transformation is over; we ran into this exact problem and built a useful CLI tool for running Python scripts alongside the dbt run.
-
Why is Data Build Tool (dbt) so popular? What are some other alternatives?
Great write-up! For your logging integration, you might have a look at fal. There's an example of sending events to Datadog.
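As a rough sketch of what such a Datadog hook could look like (this is not fal's actual example; it just posts to Datadog's public v1 Events API with the requests library, and the environment variable and names are assumptions):

```python
# Sketch: post a dbt run event to Datadog's v1 Events API.
# The endpoint and header follow Datadog's public docs; everything
# model-specific here (names, tags) is made up for illustration.
import os
import requests

def report_model_run(model_name: str, status: str) -> None:
    resp = requests.post(
        "https://api.datadoghq.com/api/v1/events",
        headers={"DD-API-KEY": os.environ["DD_API_KEY"]},  # assumed env var
        json={
            "title": f"dbt model {model_name} finished",
            "text": f"status: {status}",
            "tags": [f"model:{model_name}", "source:dbt"],
        },
        timeout=10,
    )
    resp.raise_for_status()

report_model_run("orders_enriched", "success")
```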
What are some alternatives?
Apache Spark - Apache Spark - A unified analytics engine for large-scale data processing
dbt-metabase - dbt + Metabase integration
Apache Flink - Apache Flink
dbt-expectations - Port(ish) of Great Expectations to dbt test macros
Apache Kafka - Mirror of Apache Kafka
kuwala - Kuwala is the no-code data platform for BI analysts and engineers enabling you to build powerful analytics workflows. We are set out to bring state-of-the-art data engineering tools you love, such as Airbyte, dbt, or Great Expectations together in one intuitive interface built with React Flow. In addition we provide third-party data into data science models and products with a focus on geospatial data. Currently, the following data connectors are available worldwide: a) High-resolution demographics data b) Point of Interests from Open Street Map c) Google Popular Times
beam - Apache Beam is a unified programming model for Batch and Streaming data processing.
evidence - Business intelligence as code: build fast, interactive data visualizations in pure SQL and markdown
Reactive-kafka - Alpakka Kafka connector - Alpakka is a Reactive Enterprise Integration library for Java and Scala, based on Reactive Streams and Akka.
Pandas - Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
metorikku - A simplified, lightweight ETL Framework based on Apache Spark
airflow-dbt - Apache Airflow integration for dbt