sqlfluff
soda-sql
| | sqlfluff | soda-sql |
|---|---|---|
| Mentions | 35 | 25 |
| Stars | 7,189 | 50 |
| Growth | 1.9% | - |
| Activity | 9.6 | 8.2 |
| Last commit | 4 days ago | over 1 year ago |
| Language | Python | Python |
| License | MIT License | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
sqlfluff
- 23 issues to grow yourself as an exceptional open-source Python expert
Repo : https://github.com/sqlfluff/sqlfluff
- SQL Reserved Words - The Empirical List
I'm surprised sqlfluff hasn't been mentioned yet. Perhaps not a comprehensive list, but it's worked for everything I've thrown at it. There's an ANSI keyword list [0], and then dialect-specific lists for everything from DB2 [1] to Snowflake [2].
[0]: https://github.com/sqlfluff/sqlfluff/blob/main/src/sqlfluff/...
- Show HN: Postgres Language Server
It has tons of annoying quirks, but I couldn't imagine running a DBT project without it: https://github.com/sqlfluff/sqlfluff
- Front page news headline scraping data engineering project
Move SQL queries into .sql files and read them from those files (use sqlfluff to lint the code: https://github.com/sqlfluff/sqlfluff)
- Anything like SQLFluff written in Rust?
- Code autoformatter for SQL in VSCode that plays nicely with dbt
SQLFluff is a good CLI tool for this and includes support for jinja and dbt. I don't think there's a VSCode plugin for it yet.
- Ask HN: How do you test SQL?
This linter can really enforce some best practices https://github.com/sqlfluff/sqlfluff
A list of best practices:
- What is something you would learn at college but not a bootcamp (hard skills)
BigQuery SQL and SQLFluff
- Is the knowledge on how Compilers work applicable to the role of a Data Engineer?
There's a SQL parser/linter called SQLFluff that my team uses for our CI/CD. I've made a few pull requests to fix the parser for the particular SQL dialect we used, and my college compiler classes definitely helped.
- sqlfluff VS ANTLR - a user suggested alternative
2 projects | 12 Dec 2022
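
One of the posts above suggests moving SQL queries into .sql files and reading them from there. A minimal sketch of that idea follows, using only the Python standard library. The file name `top_headlines.sql`, the `headlines` table, and the helpers `load_query` and `run_query` are hypothetical names chosen for illustration; none of them come from the original post.

```python
import sqlite3
import tempfile
from pathlib import Path


def load_query(path):
    """Read one SQL statement from a .sql file."""
    return Path(path).read_text(encoding="utf-8")


def run_query(conn, path, params=()):
    """Execute the statement stored in `path` with bound parameters."""
    return conn.execute(load_query(path), params).fetchall()


def demo():
    """Write a query to a temporary .sql file, then load and run it."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical query file; in a real project this would live in the repo.
        query_file = Path(tmp) / "top_headlines.sql"
        query_file.write_text(
            "SELECT title\n"
            "FROM headlines\n"
            "WHERE score >= ?\n"
            "ORDER BY score DESC;\n",
            encoding="utf-8",
        )

        conn = sqlite3.connect(":memory:")
        try:
            # Hypothetical table and rows, used only to make the sketch runnable.
            conn.execute("CREATE TABLE headlines (title TEXT, score INTEGER)")
            conn.executemany(
                "INSERT INTO headlines VALUES (?, ?)",
                [("a", 10), ("b", 3), ("c", 7)],
            )
            return run_query(conn, query_file, (5,))
        finally:
            conn.close()


if __name__ == "__main__":
    print(demo())
```

Keeping each query in its own file means a linter can check it directly, for example with `sqlfluff lint top_headlines.sql --dialect sqlite`, assuming sqlfluff is installed.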
soda-sql
- Data Quality - Great Expectations for Data Engineers
I might be a bit biased, but that was my opinion even before I started contributing to Soda SQL.
- dbt vs R/Python for transformation
- SodaCL - preview of a new "data reliability as code" language
I'm one of the developers of the Open Source soda-sql data quality monitoring library, and over the past year we got some incredible feedback from our users, and based on that we started working on a new DSL for data reliability as code we are calling Soda CL.
- How do you test your pipelines?
You can also use soda-sql to do checks on your warehouses separately. Both Soda SQL and Soda Spark are OSS/Apache licensed.
- Being constantly shut down by more senior team members when I mention adding some QA in our work
As many have said, there might be a business side of things to deliver. Somebody above promised delivery with tight deadlines. Trust me, I am not a fan, but this is how the world works, and it sucks. I would suggest that in your free time you explore tools like Great Expectations (https://greatexpectations.io/) or https://github.com/sodadata/soda-sql, which are modern ways of testing, as part of your learning curve.
- Soda
- How heavily do you use Great Expectations?
- What are some exciting new tools/libraries in 2021?
soda-sql: a really cool library to automate data quality checks on SQL tables
- How do I incorporate testing after the fact?
Look at SodaSQL. It's more enterprise focused than Great Expectations and you can pipe results to a database for downstream actions and analysis.
- Data Testing Tools, Pytest vs Great Expectations vs Soda vs Deequ
Certainly! It's not requested that much, but please add an issue on GitHub. I would love to add at least experimental support.
What are some alternatives?
vscode-sqlfluff - An extension to use the sqlfluff linter in vscode.
deequ - Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
sqlparse - A non-validating SQL parser module for Python
pandera - A light-weight, flexible, and expressive statistical data testing library
dbt-utils - Utility functions for dbt projects.
dbt-sessionization - Using DBT for Creating Session Abstractions on RudderStack - an open-source, warehouse-first customer data pipeline and Segment alternative.
ale - Check syntax in Vim/Neovim asynchronously and fix files, with Language Server Protocol (LSP) support
re_data - fix data issues before your users & CEO would discover them
Metabase - The simplest, fastest way to get business intelligence and analytics to everyone in your company
trino_data_mesh - Proof of concept on how to gain insights with Trino across different databases from a distributed data mesh
airbyte - The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
spark-fast-tests - Apache Spark testing helpers (dependency free & works with Scalatest, uTest, and MUnit)