spark-fast-tests vs yadm

|  | spark-fast-tests | yadm |
|---|---|---|
| Mentions | 6 | 81 |
| Stars | 418 | 4,792 |
| Growth | - | - |
| Activity | 0.0 | 2.4 |
| Last commit | 8 days ago | 3 months ago |
| Language | Scala | Python |
| License | MIT License | GNU General Public License v3.0 only |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
spark-fast-tests
- Lakehouse architecture in Azure Synapse without Databricks?
I was a Databricks user for 5 years and spent 95% of my time developing Spark code in IDEs. See the spark-daria and spark-fast-tests projects as Scala examples. I developed internal libraries with all the business logic. The Databricks notebooks would consist of a few lines of code that would invoke a function in the proprietary Spark codebase. The proprietary Spark codebase would depend on the OSS libraries I developed in parallel.
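To make that layering concrete, here is a minimal Scala sketch (the package, table, and function names are hypothetical, not from the actual codebase): the business logic lives in a unit-testable library that ships as a JAR, and the notebook shrinks to a couple of lines.

```scala
// Internal library, built as a JAR and attached to the cluster.
package com.example.etl

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

object OrderTransforms {
  // All business logic lives here, where it can be unit tested.
  def standardizeOrders(df: DataFrame): DataFrame =
    df.withColumn("order_total", col("quantity") * col("unit_price"))
}
```

```scala
// Databricks notebook: a few lines that just invoke the library
// (`spark` is the session Databricks provides).
import com.example.etl.OrderTransforms

val curated = OrderTransforms.standardizeOrders(spark.read.table("raw.orders"))
curated.write.mode("overwrite").saveAsTable("curated.orders")
```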
- Well designed Scala/Spark project
https://github.com/MrPowers/spark-fast-tests
https://github.com/97arushisharma/Scala_Practice/tree/master/BigData_Analysis_with_Scala_and_Spark/wikipedia
- Unit & integration testing in Databricks
If the majority of your stuff is not UDF-based, there is an open-source solution for running assertion tests against full data frames called spark-fast-tests. The idea is that you have a test notebook that calls your actual notebook against a staged input, reads the output, and compares it to a prefabricated expected output. This takes a bit of setup and trial and error, but it’s the closest I’ve been able to get to proper automated regression testing in Databricks.
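For reference, a plain spark-fast-tests assertion (outside of notebooks) looks roughly like this; a sketch assuming ScalaTest and spark-fast-tests on the test classpath, with illustrative data:

```scala
import org.apache.spark.sql.SparkSession
import org.scalatest.funsuite.AnyFunSuite
import com.github.mrpowers.spark.fast.tests.DatasetComparer

class TransformSpec extends AnyFunSuite with DatasetComparer {

  lazy val spark: SparkSession =
    SparkSession.builder().master("local").appName("tests").getOrCreate()

  import spark.implicits._

  test("output matches the prefabricated expected output") {
    val actualDF   = Seq(("a", 1), ("b", 2)).toDF("letter", "number")
    val expectedDF = Seq(("a", 1), ("b", 2)).toDF("letter", "number")

    // Collects both sides to the driver, so keep test data small.
    assertSmallDatasetEquality(actualDF, expectedDF)
  }
}
```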
- Show dataengineering: beavis, a library for unit testing Pandas/Dask code
I am the author of spark-fast-tests and chispa, libraries for unit testing Scala Spark / PySpark code.
- Ask HN: What are some tools / libraries you built yourself?
I built daria (https://github.com/MrPowers/spark-daria) to make it easier to write Spark code, and spark-fast-tests (https://github.com/MrPowers/spark-fast-tests) to provide a good testing workflow.
quinn (https://github.com/MrPowers/quinn) and chispa (https://github.com/MrPowers/chispa) are the PySpark equivalents.
Built bebe (https://github.com/MrPowers/bebe) to expose the Spark Catalyst expressions that aren't exposed to the Scala / Python APIs.
I also built spark-sbt.g8 to create a Spark project with a single command: https://github.com/MrPowers/spark-sbt.g8
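Since it is a Giter8 template, the single command is `sbt new` (assuming sbt is installed):

```sh
sbt new MrPowers/spark-sbt.g8
```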
- Open source contributions for a Data Engineer?
I've built popular PySpark (quinn, chispa) and Scala Spark (spark-daria, spark-fast-tests) libraries.
yadm
- Yadm: Yet Another Dotfiles Manager
- YADM: Yet Another Dotfiles Manager
- Ask HN: What Underrated Open Source Project Deserves More Recognition?
Everyone hand-rolls their own dotfile management system, but YADM already does everything you need:
https://yadm.io/
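For the unconvinced, the core workflow is just git with $HOME as the work tree; a sketch, with a placeholder remote URL and branch name:

```sh
yadm init                          # create the repo; $HOME is the work tree
yadm add ~/.zshrc ~/.gitconfig     # track only the files you choose
yadm commit -m "Initial dotfiles"
yadm remote add origin git@github.com:you/dotfiles.git
yadm push -u origin master         # your default branch may be main
```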
- Yet Another Dotfiles Manager
- Tell HN: My Favorite Tools
- Dotfiles Matter
I've been working around this using tools built on top of git like [yadm](https://github.com/TheLocehiliosan/yadm) and relying on `ls-files` to list all my tracked dotfiles and their paths.
Still, having everything in one place would make things much simpler. Great idea!
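Because yadm passes git subcommands straight through, that trick is a one-liner (sketch):

```sh
yadm ls-files    # list every tracked dotfile (run from $HOME for home-relative paths)
```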
- System settings that aren’t in System Settings
I wonder if the program I use to manage my dotfiles could help manage your scripts and extend your setup to all your desktops. It's called yadm (https://yadm.io/), and it makes it so easy to have a laptop and a desktop or two.
- The right way to keep config files synced across devices?
I really like that one, but I still prefer yadm because you can just edit your files as usual and then `yadm add` them wherever you are.
- Just got a new M2 Pro after my 2016 became outdated. What are your first steps to setting up a new computer?
If you haven’t already, this is the time to install a tool like yadm and get your computer configuration into version control. Your command-line tools can be managed by yadm directly, your system settings can mostly be managed with a yadm bootstrap script that runs things like `defaults write`, and the software you install can be managed with a Brewfile that the yadm bootstrap script uses to install software with Homebrew. Don’t manually download Xcode; use xcodes to do it.
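A sketch of what such a bootstrap script might look like: yadm runs `~/.config/yadm/bootstrap` when you invoke `yadm bootstrap`, and the specific `defaults` key, Brewfile path, and xcodes flag below are illustrative assumptions, not prescriptions.

```sh
#!/bin/sh
# ~/.config/yadm/bootstrap -- executed by `yadm bootstrap` after cloning.

# System settings via `defaults write` (example key; adjust to taste).
defaults write NSGlobalDomain ApplePressAndHoldEnabled -bool false

# Install everything declared in a tracked Brewfile via Homebrew.
brew bundle --file="$HOME/.Brewfile"

# Install Xcode with the xcodes tool instead of downloading it manually.
xcodes install --latest
```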
- System 76 Linux script to set up a new PC including the personal profile and preferred software installs
I personally use YADM. It's basically a git repo in my home folder that only tracks what I explicitly add. And you can set up bootstrap scripts to do what you said: install a bunch of stuff or make custom changes. In essence, it's a set of bash/sh files that are executed sequentially when you run the `yadm bootstrap` command.
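With that in place, reproducing your profile on a fresh machine is two commands (the URL is a placeholder):

```sh
yadm clone git@github.com:you/dotfiles.git   # checks your dotfiles out into $HOME
yadm bootstrap                               # runs the tracked bootstrap script
```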
What are some alternatives?
Prefect - The easiest way to build, run, and monitor data pipelines at scale.
GNU Stow - Mirror of the Savannah Git repository, occasionally with more bleeding-edge branches
chispa - PySpark test helper methods with beautiful error messages
chezmoi - Manage your dotfiles across multiple diverse machines, securely.
soda-sql - Data profiling, testing, and monitoring for SQL accessible data.
Home Manager using Nix - Manage a user environment using Nix [maintainer=@rycee]
airbyte - The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
dotbot - A tool that bootstraps your dotfiles ⚡️
sqlfluff - A modular SQL linter and auto-formatter with support for multiple dialects and templated code.
homesick - Your home directory is your castle. Don't leave your dotfiles behind.
spark-daria - Essential Spark extensions and helper methods ✨😲
Ansible - Ansible is a radically simple IT automation platform that makes your applications and systems easier to deploy and maintain. Automate everything from code deployment to network configuration to cloud management, in a language that approaches plain English, using SSH, with no agents to install on remote systems. https://docs.ansible.com.