spark-fast-tests VS soda-sql

Compare spark-fast-tests and soda-sql to see how they differ.

spark-fast-tests

Apache Spark testing helpers (dependency free & works with Scalatest, uTest, and MUnit) (by MrPowers)
Metric          spark-fast-tests    soda-sql
Mentions        5                   25
Stars           377                 50
Growth          -                   -
Activity        4.1                 8.2
Last commit     9 months ago        3 months ago
Language        Scala               Python
License         MIT License         Apache License 2.0
Mentions - the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is among the top 10% most actively developed projects that we are tracking.

spark-fast-tests

Posts with mentions or reviews of spark-fast-tests. We have used some of these posts to build our list of alternatives and similar projects. The most recent post was on 2022-10-15.

soda-sql

Posts with mentions or reviews of soda-sql. We have used some of these posts to build our list of alternatives and similar projects. The most recent post was on 2022-03-18.

What are some alternatives?

When comparing spark-fast-tests and soda-sql, you can also consider the following projects:

deequ - Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.

Prefect - The easiest way to build, run, and monitor data pipelines at scale.

pandera - A light-weight, flexible, and expressive statistical data testing library

sqlfluff - A modular SQL linter and auto-formatter with support for multiple dialects and templated code.

chispa - PySpark test helper methods with beautiful error messages

trino_data_mesh - Proof of concept on how to gain insights with Trino across different databases from a distributed data mesh

dbt-sessionization - Using DBT for Creating Session Abstractions on RudderStack - an open-source, warehouse-first customer data pipeline and Segment alternative.

re_data - Fix data issues before your users & CEO would discover them 😊

dagster - An orchestration platform for the development, production, and observation of data assets.

airflow-notebook - This repository is no longer maintained.

spark-daria - Essential Spark extensions and helper methods ✨😲