spark-fast-tests
sqlfluff
| | spark-fast-tests | sqlfluff |
|---|---|---|
| Mentions | 5 | 29 |
| Stars | 377 | 5,553 |
| Growth | - | 2.5% |
| Activity | 4.1 | 9.7 |
| Last commit | 9 months ago | 7 days ago |
| Language | Scala | Python |
| License | MIT License | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
spark-fast-tests
-
Well designed scala/spark project
https://github.com/MrPowers/spark-fast-tests https://github.com/97arushisharma/Scala_Practice/tree/master/BigData_Analysis_with_Scala_and_Spark/wikipedia
-
Unit & integration testing in Databricks
If the majority of your stuff is not UDF-based, there is an open-source solution for running assertion tests against full DataFrames called spark-fast-tests. The idea here is similar in that you have an integration-test notebook that calls your actual notebook against a staged input, reads the output, and compares it to a prefabricated expected output. This does take a bit of setup and trial and error, but it's the closest I've been able to get to proper automated regression testing in Databricks.
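The workflow described in this post (run the job on a staged input, read the output, compare it with a prepared expected output) can be sketched in plain Python. This is a minimal stand-in under stated assumptions, not the spark-fast-tests API: `run_job`, `assert_rows_equal`, and the sample rows are hypothetical names made up for illustration, and rows are modelled as dictionaries rather than Spark DataFrames.

```python
from collections import Counter


def run_job(rows):
    """Hypothetical job under test: keep adults and add a name_upper column."""
    return [
        {**row, "name_upper": row["name"].upper()}
        for row in rows
        if row["age"] >= 18
    ]


def assert_rows_equal(actual, expected):
    """Compare two row collections, ignoring row order but not duplicates."""

    def freeze(rows):
        # Turn each row into a hashable tuple of (column, value) pairs
        # sorted by column name, then count how often each row occurs.
        return Counter(tuple(sorted(row.items())) for row in rows)

    actual_counts, expected_counts = freeze(actual), freeze(expected)
    if actual_counts != expected_counts:
        missing = list((expected_counts - actual_counts).elements())
        unexpected = list((actual_counts - expected_counts).elements())
        raise AssertionError(
            f"rows differ; missing={missing}, unexpected={unexpected}"
        )


# Staged input and prepared expected output (both invented for this sketch).
staged_input = [
    {"name": "ana", "age": 31},
    {"name": "bo", "age": 12},
]
expected_output = [{"name": "ana", "age": 31, "name_upper": "ANA"}]

# Raises AssertionError with a row-level diff if the job output drifts.
assert_rows_equal(run_job(staged_input), expected_output)
```

With real Spark code, a library such as spark-fast-tests (Scala) or chispa (PySpark) would take the place of the hand-written `assert_rows_equal` helper.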
-
Show dataengineering: beavis, a library for unit testing Pandas/Dask code
I am the author of spark-fast-tests and chispa, libraries for unit testing Scala Spark / PySpark code.
-
Ask HN: What are some tools / libraries you built yourself?
I built daria (https://github.com/MrPowers/spark-daria) to make it easier to write Spark and spark-fast-tests (https://github.com/MrPowers/spark-fast-tests) to provide a good testing workflow.
quinn (https://github.com/MrPowers/quinn) and chispa (https://github.com/MrPowers/chispa) are the PySpark equivalents.
Built bebe (https://github.com/MrPowers/bebe) to expose the Spark Catalyst expressions that aren't exposed to the Scala / Python APIs.
Also built spark-sbt.g8 to create a Spark project with a single command: https://github.com/MrPowers/spark-sbt.g8
-
Open source contributions for a Data Engineer?
I've built popular PySpark (quinn, chispa) and Scala Spark (spark-daria, spark-fast-tests) libraries.
sqlfluff
-
Ask HN: How do you test SQL?
This linter can really enforce some best practices https://github.com/sqlfluff/sqlfluff
A list of best practices:
-
What is something you would learn at college but not a bootcamp (hard skills)
BigQuery SQL and SQLFluff
-
Is the knowledge on how Compilers work applicable to the role of a Data Engineer?
There's a SQL parser/linter called SQLFluff that my team uses for our CI/CD. I've made a few pull requests to fix the parser for the particular SQL dialect we used, and my college compiler classes definitely helped.
-
sqlfluff VS ANTLR - a user suggested alternative
2 projects | 12 Dec 2022
-
How to create projects for myself to enrich my resume?
Include bells and whistles to impress the reader: Most projects will have the common things like ETL scripts (e.g. SQL, Python, Airflow, dbt, etc.) covered. To go the extra mile and stand out, you should also include things like data quality tests (e.g. dbt tests, Great Expectations, Soda), linting scripts (e.g. sqlfluff, black), and CI pipelines that check for linting and unit tests for ETL code before code can be merged to main (e.g. GitHub Actions). Include instructions on how to run those tests, linting, or CI pipelines in your README file, and include screenshots of the success or failure output to give the reader an example.
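As a rough illustration of the "check linting and unit tests before merge" idea, the sketch below runs a list of commands and reports which ones failed. It is only one possible shape for such a gate: the `run_checks` and `main` helpers, the `CHECKS` list, and the `models/` and `tests/` paths are assumptions made for this example, and the tools named in the commands have to be installed separately.

```python
import subprocess
import sys

# Hypothetical project layout: SQL files in models/, Python tests in tests/.
CHECKS = [
    ("sql lint", ["sqlfluff", "lint", "models/", "--dialect", "ansi"]),
    ("python format", ["black", "--check", "."]),
    ("unit tests", [sys.executable, "-m", "pytest", "tests/"]),
]


def run_checks(checks):
    """Run each (name, command) pair and return the names that failed."""
    failed = []
    for name, command in checks:
        try:
            result = subprocess.run(command)
        except FileNotFoundError:
            # The tool is not installed; treat that as a failed check.
            failed.append(name)
            continue
        # A non-zero exit code marks the check as failed.
        if result.returncode != 0:
            failed.append(name)
    return failed


def main():
    """Return 0 if every check passed, 1 otherwise."""
    failed = run_checks(CHECKS)
    if failed:
        print("Failed checks: " + ", ".join(failed))
        return 1
    print("All checks passed")
    return 0
```

A CI step could call `sys.exit(main())` so that any failing check blocks the merge; the README can then show the same command for running the checks locally.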
-
I failed a coding interview. Can anyone help me solve this?
Capitals I pretty much auto write, although I'll use the code formatter I wrote if someone sends me something messy. Bad and reused aliases, however, require manual fixing before I can get to the code review stage, so a PR using those will be rejected as needs work. sqlfluff is a decent formatter & linter if you need to get into details like that regularly.
- Terraform - Pre commit hooks
-
How-to-Guide: Contributing to Open Source
SQLFluff
-
Ask HN: Preferred SQL Auto-Formatter?
Not serving all of our needs but it did its job: https://github.com/sqlfluff/sqlfluff
-
This Week In Python
sqlfluff – A SQL linter and auto-formatter for Humans
What are some alternatives?
vscode-sqlfluff - An extension to use the sqlfluff linter in vscode.
Prefect - The easiest way to build, run, and monitor data pipelines at scale.
dbt-utils - Utility functions for dbt projects.
soda-sql - Data profiling, testing, and monitoring for SQL accessible data.
ale - Check syntax in Vim asynchronously and fix files, with Language Server Protocol (LSP) support
chispa - PySpark test helper methods with beautiful error messages
Metabase - The simplest, fastest way to get business intelligence and analytics to everyone in your company :yum:
airbyte - Data integration platform for ELT pipelines from APIs, databases & files to warehouses & lakes.
sqlparse - A non-validating SQL parser module for Python
spark-daria - Essential Spark extensions and helper methods ✨😲
streamlit - Streamlit — The fastest way to build data apps in Python