sqlfluff
soda-sql
DISCONTINUED
| | sqlfluff | soda-sql |
|---|---|---|
| Mentions | 31 | 25 |
| Stars | 5,771 | 50 |
| Growth | 3.8% | - |
| Activity | 9.4 | 8.2 |
| Last commit | 4 days ago | 5 months ago |
| Language | Python | Python |
| License | MIT License | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
sqlfluff
- Ask HN: How do you test SQL?
This linter can really enforce some best practices https://github.com/sqlfluff/sqlfluff
A list of best practices:
- What is something you would learn at college but not a bootcamp (hard skills)
BigQuery SQL and SQLFluff
- Is the knowledge on how Compilers work applicable to the role of a Data Engineer?
There's a SQL parser/linter called SQLFluff that my team uses for our CI/CD. I've made a few pull requests to fix the parser for the particular SQL dialect we used, and my college compiler classes definitely helped.
- sqlfluff VS ANTLR - a user suggested alternative
2 projects | 12 Dec 2022
- How to create projects for myself to enrich my resume?
Include bells and whistles to impress the reader: Most projects will have the common things like ETL scripts (e.g. SQL, Python, Airflow, dbt, etc) covered. To go the extra mile and stand out, you should also include things like data quality tests (e.g. dbt tests, great expectations, soda), linting scripts (e.g. sqlfluff, black), CI pipelines that check for linting and unit tests for ETL code before code can be merged to main (e.g. github actions). Include instructions on how to run those tests or linting or CI pipelines in your README file and include screenshots of the success or failure output to give the reader an example.
- I failed a coding interview. Can anyone help me solve this?
Capitals I pretty much auto write, although I'll use the code formatter I wrote if someone sends me something messy. Bad and reused aliases, however, require manual fixing before I can get to the code review stage, so a PR using those will be rejected as needs work. sqlfluff is a decent formatter & linter if you need to get into details like that regularly.
- Terraform - Pre commit hooks
- How-to-Guide: Contributing to Open Source
SQLFluff
- Ask HN: Preferred SQL Auto-Formatter?
Not serving all of our needs but it did its job: https://github.com/sqlfluff/sqlfluff
- This Week In Python
sqlfluff - A SQL linter and auto-formatter for Humans
soda-sql
- Data Quality - Great Expectations for Data Engineers
I might be a bit biased, but that was my opinion even before I started contributing to Soda SQL.
You can always give Soda a try, more info on soda.io and https://github.com/sodadata/soda-sql. We've put a lot of focus on making it lightweight and easy to use. Disclaimer: I'm one of the founders :).
- dbt vs R/Python for transformation
Testing and production monitoring of data is still underrated in many teams. In building and operating software systems this has become the norm. In data, there is still a lot of room for improvement. The mentioned tools are insufficient for a thorough testing and monitoring setup. That is why we created Soda with Soda SQL as our open source tool for testing data in and out of pipeline: https://github.com/sodadata/soda-sql
- How do you test your pipelines?
You can also use soda-sql to do checks on your warehouses separately. Both Soda SQL and Soda Spark are OSS/Apache licensed.
- How heavily do you use Great Expectations?
- What are some exciting new tools/libraries in 2021?
soda-sql: a really cool library to automate data quality checks on SQL tables
- Data Testing Tools, Pytest vs Great Expectations vs Soda vs Deequ
Certainly! It's not requested that much, but please add an issue on GitHub. I would love to add at least experimental support.
- Open source contributions for a Data Engineer?
If you are interested in using/learning Python, SQL and data warehouse skills, take a look at https://github.com/sodadata/soda-sql
- Anyone aware of any Data Validation Framework with custom SQL capability
Soda-sql looks promising. It has some out of the box tests and you can also provide custom SQL: https://github.com/sodadata/soda-sql
What are some alternatives?
vscode-sqlfluff - An extension to use the sqlfluff linter in vscode.
deequ - Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
dbt-utils - Utility functions for dbt projects.
pandera - A light-weight, flexible, and expressive statistical data testing library
ale - Check syntax in Vim asynchronously and fix files, with Language Server Protocol (LSP) support
trino_data_mesh - Proof of concept on how to gain insights with Trino across different databases from a distributed data mesh
dbt-sessionization - Using DBT for Creating Session Abstractions on RudderStack - an open-source, warehouse-first customer data pipeline and Segment alternative.
re_data - fix data issues before your users & CEO would discover them
spark-fast-tests - Apache Spark testing helpers (dependency free & works with Scalatest, uTest, and MUnit)
Prefect - The easiest way to build, run, and monitor data pipelines at scale.
dagster - An orchestration platform for the development, production, and observation of data assets.
Metabase - The simplest, fastest way to get business intelligence and analytics to everyone in your company