| | sqlfluff | dbt-utils |
|---|---|---|
| Latest commit | 4 days ago | 8 days ago |
| License | MIT License | Apache License 2.0 |
Ask HN: How do you test SQL?
18 projects | news.ycombinator.com | 31 Jan 2023
This linter can really enforce some best practices https://github.com/sqlfluff/sqlfluff
A list of best practices:
What is something you would learn at college but not a bootcamp (hard skills)
2 projects | reddit.com/r/cscareerquestions | 12 Jan 2023
BigQuery SQL and SQLFluff
Is the knowledge on how Compilers work applicable to the role of a Data Engineer?
2 projects | reddit.com/r/dataengineering | 11 Jan 2023
There's a SQL parser/linter called SQLFluff that my team uses for our CI/CD. I've made a few pull requests to fix the parser for the particular SQL dialect we used, and my college compiler classes definitely helped.
sqlfluff VS ANTLR - a user-suggested alternative
2 projects | 12 Dec 2022
How to create projects for myself to enrich my resume?
5 projects | reddit.com/r/dataengineering | 29 Oct 2022
Include bells and whistles to impress the reader: most projects already cover the common pieces, such as ETL scripts (e.g. SQL, Python, Airflow, dbt). To go the extra mile and stand out, also include data quality tests (e.g. dbt tests, Great Expectations, Soda), linting scripts (e.g. sqlfluff, black), and CI pipelines that run the linters and the unit tests for your ETL code before anything can be merged to main (e.g. GitHub Actions). Add instructions to your README on how to run those tests, linters, and CI pipelines, and include screenshots of both successful and failing output so the reader has an example.
I failed a coding interview. Can anyone help me solve this?
2 projects | reddit.com/r/SQL | 13 Oct 2022
Capitals I pretty much write automatically, although I'll use the code formatter I wrote if someone sends me something messy. Bad and reused aliases, however, need manual fixing before I can get to the code review stage, so a PR that uses them will be rejected as "needs work". sqlfluff is a decent formatter and linter if you regularly need to get into details like that.
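To make the "capitals" point concrete, here is a hypothetical, regex-based toy that flags SQL keywords whose case differs from the first keyword in a query. It is not how sqlfluff works: sqlfluff's capitalisation rules operate on a full, dialect-aware parse tree. The keyword list and the function name `keyword_case_violations` are assumptions made for illustration, and string literals and comments are not handled.

```python
import re

# Hypothetical, deliberately tiny keyword list; real linters such as sqlfluff
# rely on a dialect-aware parser rather than a word list.
KEYWORDS = {"select", "from", "where", "join", "on", "group", "order", "by", "as", "and", "or"}


def _case(word: str) -> str:
    """Classify a word as 'upper', 'lower', or 'mixed' case."""
    if word.isupper():
        return "upper"
    if word.islower():
        return "lower"
    return "mixed"


def keyword_case_violations(sql: str) -> list[str]:
    """Return keywords whose case differs from the first keyword in the query."""
    # Naive tokenizer: words inside string literals or comments are scanned too.
    words = re.findall(r"[A-Za-z_]+", sql)
    keywords = [w for w in words if w.lower() in KEYWORDS]
    if not keywords:
        return []
    expected = _case(keywords[0])
    return [w for w in keywords if _case(w) != expected]


violations = keyword_case_violations("SELECT id FROM orders where id > 1")
print(violations)  # ['where']
```

A formatter would go one step further and rewrite the offending keywords instead of just reporting them.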
Terraform - Pre commit hooks
2 projects | reddit.com/r/Terraform | 4 Oct 2022
How-to-Guide: Contributing to Open Source
19 projects | reddit.com/r/dataengineering | 11 Jun 2022
Ask HN: Preferred SQL Auto-Formatter?
2 projects | news.ycombinator.com | 21 May 2022
It doesn't serve all of our needs, but it does the job: https://github.com/sqlfluff/sqlfluff
This Week In Python
5 projects | dev.to | 8 Apr 2022
sqlfluff – A SQL linter and auto-formatter for Humans
Dbt to acquire Transform to build out its semantic layer
2 projects | news.ycombinator.com | 9 Feb 2023
My top three:
- Checking numbers across dev/staging/prod environments before pushing to production.
- Unions between two sources that are not the same shape can be done without the headache. https://github.com/dbt-labs/dbt-utils#union_relations-source
- Macros for common CASE WHEN statements.
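The idea behind unioning "sources that are not the same shape" can be sketched in plain Python: take the union of all column names, fill the columns a source lacks with NULL (`None` here), and record which source each row came from. This is a rough illustration of what dbt_utils' `union_relations` does in SQL, not its implementation; the `_dbt_source_relation` default column name mirrors the macro's documented default as far as we know, and type casting between mismatched columns is not modeled.

```python
from typing import Any

Row = dict[str, Any]


def union_relations(relations: dict[str, list[Row]],
                    source_column_name: str = "_dbt_source_relation") -> list[Row]:
    """Union row sets with differing columns, filling missing columns with None."""
    # Collect every column seen in any relation, preserving first-seen order.
    columns: list[str] = []
    for rows in relations.values():
        for row in rows:
            for col in row:
                if col not in columns:
                    columns.append(col)

    unioned: list[Row] = []
    for name, rows in relations.items():
        for row in rows:
            # Tag each row with its source, then add every column (None if absent).
            out: Row = {source_column_name: name}
            out.update({col: row.get(col) for col in columns})
            unioned.append(out)
    return unioned


# Hypothetical example: two order sources with different columns.
rows = union_relations({
    "web_orders": [{"id": 1, "amount": 10}],
    "store_orders": [{"id": 2, "store_id": 7}],
})
```

Here the web order ends up with `store_id` set to `None` and the store order with `amount` set to `None`, which is the "without the headache" part: downstream code sees one consistent shape.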
Analytics Stacks for Startups
8 projects | dev.to | 21 Feb 2022
Add tests: unit tests in SQL are still not really practical, but testing the data before users get to see it is possible. dbt has some basic tests, such as not-null checks, and dbt_utils supports comparing data across tables. If you need more, there are Great Expectations and similar tools. dbt also supports writing SQL queries that output "bad" rows; use this, for example, to check a specific order against manually verified data. Tests give you confidence that your pipelines produce correct results: nothing is worse than waking up to a Slack message from your boss saying the graphs look wrong… They are especially useful when you have to refactor a data pipeline. Basically, every query you would run during the QA phase of a change request is a strong candidate for an automated test.
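The "queries that output bad rows" convention is easy to try outside dbt: a data test is a query, and it passes when it returns zero rows. The sketch below uses Python's built-in `sqlite3` with an in-memory database; the `orders` table, its data, and the helper names `failing_rows` and `not_null_test` are hypothetical, and the generated SQL only approximates the spirit of dbt's `not_null` test rather than reproducing it.

```python
import sqlite3


def failing_rows(conn: sqlite3.Connection, query: str) -> list[tuple]:
    """Run a data test query; the test passes when this returns no rows."""
    return conn.execute(query).fetchall()


def not_null_test(table: str, column: str) -> str:
    # Select the offending rows. Identifiers are assumed to be trusted input.
    return f"SELECT * FROM {table} WHERE {column} IS NULL"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, 10, 25.0), (2, None, 40.0), (3, 11, 12.5)])

# Generic-style test: customer_id must never be NULL (order 2 violates it).
null_customers = failing_rows(conn, not_null_test("orders", "customer_id"))

# Singular-style test: compare one order against a manually checked value.
wrong_amount = failing_rows(conn, "SELECT * FROM orders WHERE id = 1 AND amount <> 25.0")
```

Returning the failing rows, rather than a bare pass/fail flag, is what makes this style useful during QA: the output tells you exactly which records to look at.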
Why is Data Build Tool (DBT) so popular? What are some other alternatives?
4 projects | reddit.com/r/dataengineering | 4 Dec 2021
Unit testing SQL in DBT
3 projects | reddit.com/r/dataengineering | 6 Feb 2021
The equality test macro is also in Fishtown Analytics' dbt-utils package: https://github.com/fishtown-analytics/dbt-utils/blob/master/macros/schema_tests/equality.sql
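An equality test boils down to checking that two relations have no rows that appear in one but not the other. The sketch below expresses that with `EXCEPT` in both directions, run through Python's built-in `sqlite3`; it illustrates the general technique and is not a copy of the dbt-utils macro. The table names and the helper `relations_equal` are hypothetical. Because `EXCEPT` is a set operation, duplicate rows are collapsed, so two tables that differ only in duplicate counts would still compare equal.

```python
import sqlite3


def relations_equal(conn: sqlite3.Connection, a: str, b: str) -> bool:
    """True when tables a and b contain the same set of rows (order ignored)."""
    # Count rows in a but not b, plus rows in b but not a.
    # Table names are assumed to be trusted input in this sketch.
    query = f"""
        SELECT COUNT(*) FROM (
            SELECT * FROM (SELECT * FROM {a} EXCEPT SELECT * FROM {b})
            UNION ALL
            SELECT * FROM (SELECT * FROM {b} EXCEPT SELECT * FROM {a})
        )
    """
    (diff_count,) = conn.execute(query).fetchone()
    return diff_count == 0


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE expected (id INTEGER, amount REAL)")
conn.execute("CREATE TABLE actual (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO expected VALUES (?, ?)", [(1, 25.0), (2, 40.0)])
conn.executemany("INSERT INTO actual VALUES (?, ?)", [(2, 40.0), (1, 25.0)])

same_rows = relations_equal(conn, "expected", "actual")  # same rows, different order

conn.execute("INSERT INTO actual VALUES (3, 12.5)")
after_extra_row = relations_equal(conn, "expected", "actual")  # now differs
```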
What are some alternatives?
vscode-sqlfluff - An extension to use the sqlfluff linter in vscode.
dbt-expectations - Port(ish) of Great Expectations to dbt test macros
ale - Check syntax in Vim asynchronously and fix files, with Language Server Protocol (LSP) support
soda-sql - Data profiling, testing, and monitoring for SQL accessible data.
Metabase - The simplest, fastest way to get business intelligence and analytics to everyone in your company
sqlparse - A non-validating SQL parser module for Python
airbyte - Data integration platform for ELT pipelines from APIs, databases & files to warehouses & lakes.
streamlit - Streamlit — The fastest way to build data apps in Python
spark-fast-tests - Apache Spark testing helpers (dependency free & works with Scalatest, uTest, and MUnit)
dbt-oracle - A dbt adapter for oracle db backend
superset - Apache Superset is a Data Visualization and Data Exploration Platform
Prefect - The easiest way to build, run, and monitor data pipelines at scale.