spark-fast-tests
fselect
| | spark-fast-tests | fselect |
|---|---|---|
| Mentions | 6 | 14 |
| Stars | 418 | 3,812 |
| Growth | - | - |
| Activity | 0.0 | 8.4 |
| Last commit | 8 days ago | 10 days ago |
| Language | Scala | Rust |
| License | MIT License | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
spark-fast-tests
-
Lakehouse architecture in Azure Synapse without Databricks?
I was a Databricks user for 5 years and spent 95% of my time developing Spark code in IDEs. See the spark-daria and spark-fast-tests projects as Scala examples. I developed internal libraries with all the business logic. The Databricks notebooks would consist of a few lines of code that would invoke a function in the proprietary Spark codebase. The proprietary Spark codebase would depend on the OSS libraries I developed in parallel.
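The layering described in this post, where the notebook is a thin caller and the logic lives in a separately tested library, can be sketched in plain Python. This is only an illustration under assumptions: `with_full_name`, `run_notebook`, and the row layout are hypothetical names, and rows are modeled as dictionaries rather than Spark DataFrames.

```python
# Library side: business logic kept as a plain, importable function.
# Hypothetical example; dicts stand in for DataFrame rows.
def with_full_name(rows):
    """Return new rows with a 'full_name' field built from first and last name."""
    return [
        {**row, "full_name": f"{row['first_name']} {row['last_name']}"}
        for row in rows
    ]


# "Notebook" side: a few lines that only pass input to the library function.
def run_notebook(input_rows):
    return with_full_name(input_rows)


people = [{"first_name": "Ada", "last_name": "Lovelace"}]
result = run_notebook(people)
```

Because `with_full_name` has no notebook dependencies, it can be unit tested in an IDE, which is the workflow the post describes.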
-
Well designed scala/spark project
https://github.com/MrPowers/spark-fast-tests https://github.com/97arushisharma/Scala_Practice/tree/master/BigData_Analysis_with_Scala_and_Spark/wikipedia
-
Unit & integration testing in Databricks
If the majority of your stuff is not UDF-based, there is an open-source solution for running assertion tests against full DataFrames called spark-fast-tests. The idea here is similar: you have an integration-test notebook that calls your actual notebook against a staged input, reads the output, and compares it to a prefabricated expected output. This does take a bit of setup and trial and error, but it's the closest I've been able to get to proper automated regression testing in Databricks.
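The regression pattern in this post (run the job on staged input, then compare the result to a stored expected output) can be sketched without Spark. The code below is not the spark-fast-tests API; `assert_rows_equal`, `transform`, and the sample rows are hypothetical, and lists of dictionaries stand in for DataFrames.

```python
def assert_rows_equal(actual, expected, ignore_order=False):
    """Raise AssertionError describing the differing rows if the inputs differ."""
    if ignore_order:
        # Sort both sides by a stable textual key so row order does not matter.
        key = lambda row: repr(sorted(row.items()))
        actual, expected = sorted(actual, key=key), sorted(expected, key=key)
    if len(actual) != len(expected):
        raise AssertionError(
            f"row count differs: actual={len(actual)}, expected={len(expected)}"
        )
    mismatches = [
        (i, a, e) for i, (a, e) in enumerate(zip(actual, expected)) if a != e
    ]
    if mismatches:
        lines = [f"row {i}: actual={a!r} expected={e!r}" for i, a, e in mismatches]
        raise AssertionError("rows differ:\n" + "\n".join(lines))


def transform(rows):
    # Hypothetical pipeline step under test: keep only positive amounts.
    return [row for row in rows if row["amount"] > 0]


staged_input = [
    {"id": 1, "amount": 10},
    {"id": 2, "amount": -5},
    {"id": 3, "amount": 7},
]
expected_output = [{"id": 3, "amount": 7}, {"id": 1, "amount": 10}]

# Passes: same rows, different order.
assert_rows_equal(transform(staged_input), expected_output, ignore_order=True)
```

Reporting every mismatched row, rather than stopping at the first, tends to make a failing regression run easier to diagnose.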
-
Show dataengineering: beavis, a library for unit testing Pandas/Dask code
I am the author of spark-fast-tests and chispa, libraries for unit testing Scala Spark / PySpark code.
-
Ask HN: What are some tools / libraries you built yourself?
I built daria (https://github.com/MrPowers/spark-daria) to make it easier to write Spark code, and spark-fast-tests (https://github.com/MrPowers/spark-fast-tests) to provide a good testing workflow.
quinn (https://github.com/MrPowers/quinn) and chispa (https://github.com/MrPowers/chispa) are the PySpark equivalents.
Built bebe (https://github.com/MrPowers/bebe) to expose the Spark Catalyst expressions that aren't exposed to the Scala / Python APIs.
Also built spark-sbt.g8 to create a Spark project with a single command: https://github.com/MrPowers/spark-sbt.g8
-
Open source contributions for a Data Engineer?
I've built popular PySpark (quinn, chispa) and Scala Spark (spark-daria, spark-fast-tests) libraries.
fselect
-
A list of new(ish) command line tools – Julia Evans
Shameless plug: a tool I wrote to manage downloads directory :)
https://github.com/jhspetersson/fselect
- Fselect – a CLI tool to find files with “not quite SQL” query language
-
What's your favorite ls and/or cd replacements, alternatives or helpers?
My alternatives/helpers that bring extra functionality are the following:
- https://github.com/facebook/pathpicker/ - Facebook PathPicker is a simple command line tool that solves the perpetual problem of selecting files out of bash output.
- https://github.com/jhspetersson/fselect - Find files with SQL-like queries
- https://github.com/junegunn/fzf - fzf is a general-purpose command-line fuzzy finder.
-
Awesome Rewrite It In Rust - A curated list of replacements for existing software written in Rust
I really like fselect, which I use more than fd
-
Ask HN: What are some tools / libraries you built yourself?
https://github.com/jhspetersson/fselect
A tiny tool I wrote to search within file piles (mostly unsorted downloads, torrents, and such). I could never remember `find` options, and more advanced queries are a pain. Now one can use some kind of SQL flavor to get the job done.
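The idea of querying a file tree like a table can be illustrated with a small Python generator. This is a sketch of the concept only, not fselect's implementation or query syntax; `select_files` and its parameters are hypothetical names.

```python
import fnmatch
import os


def select_files(root, name_glob="*", min_size=0):
    """Yield (path, size) for files under root that match the filters.

    Roughly the shape of: select path, size from root
    where name matches name_glob and size >= min_size.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if not fnmatch.fnmatch(filename, name_glob):
                continue
            path = os.path.join(dirpath, filename)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # skip files that vanish or cannot be read
            if size >= min_size:
                yield path, size
```

For example, `sorted(select_files("Downloads", "*.iso", 1_000_000))` would list ISO files of at least one million bytes under a `Downloads` directory, assuming such a directory exists.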
-
AWESOME WINDOWS TOOLS
fselect - Command-line tool to search files with SQL-like queries.
- fselect – find files with SQL-like queries
- Fselect: Find files with SQL-like queries
- fselect - Find files with SQL-like queries
What are some alternatives?
Prefect - The easiest way to build, run, and monitor data pipelines at scale.
cakephp-swagger-bake - Automatically generate OpenAPI, Swagger, and Redoc documentation from your existing CakePHP code.
chispa - PySpark test helper methods with beautiful error messages
ion - Mirror of https://gitlab.redox-os.org/redox-os/ion
soda-sql - Data profiling, testing, and monitoring for SQL accessible data.
logram - Utility that takes logs from anywhere and sends them to Telegram.
airbyte - The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
fd - A simple, fast and user-friendly alternative to 'find'
sqlfluff - A modular SQL linter and auto-formatter with support for multiple dialects and templated code.
dbmate - 🚀 A lightweight, framework-agnostic database migration tool.
spark-daria - Essential Spark extensions and helper methods ✨😲
awesome-rewrite-it-in-rust - A curated list of replacements for existing software written in Rust [Moved to: https://github.com/TaKO8Ki/awesome-alternatives-in-rust]