spark-fast-tests
gutenberg
Metric | spark-fast-tests | gutenberg
---|---|---
Mentions | 6 | 106
Stars | 418 | 12,645
Growth | - | 1.7%
Activity | 0.0 | 8.4
Last commit | 3 months ago | 6 days ago
Language | Scala | Rust
License | MIT License | MIT License
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
spark-fast-tests
-
Lakehouse architecture in Azure Synapse without Databricks?
I was a Databricks user for 5 years and spent 95% of my time developing Spark code in IDEs. See the spark-daria and spark-fast-tests projects as Scala examples. I developed internal libraries with all the business logic. The Databricks notebooks would consist of a few lines of code that would invoke a function in the proprietary Spark codebase. The proprietary Spark codebase would depend on the OSS libraries I developed in parallel.
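The split described above (business logic in a library, a notebook that only invokes it) can be sketched roughly as follows. This is a minimal plain-Python illustration, not code from spark-daria or spark-fast-tests; `with_order_total`, `run_job`, and the record fields are hypothetical names chosen for the example.

```python
# Hypothetical library code: lives in a versioned package, developed and tested in an IDE.
def with_order_total(orders):
    """Business logic: return new order records with a computed 'total' field."""
    return [{**o, "total": o["quantity"] * o["unit_price"]} for o in orders]


def run_job(read, write):
    """Entry point for a notebook: I/O is injected so the logic stays testable."""
    write(with_order_total(read()))


# Hypothetical notebook cell: a few lines that only wire up I/O and call the library.
sink = []
run_job(
    read=lambda: [{"quantity": 2, "unit_price": 5}],  # stand-in for reading a table
    write=sink.extend,                                # stand-in for writing a table
)
```

Because the notebook contains no logic of its own, everything worth testing sits in the library and can be covered by ordinary unit tests.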
-
Well designed scala/spark project
https://github.com/MrPowers/spark-fast-tests https://github.com/97arushisharma/Scala_Practice/tree/master/BigData_Analysis_with_Scala_and_Spark/wikipedia
-
Unit & integration testing in Databricks
If the majority of your stuff is not UDF-based, there is an open-source solution for running assertion tests against full DataFrames called spark-fast-tests. The idea here is similar: you have an integration test (IT) notebook that calls your actual notebook against a staged input, reads the output, and compares it to a prefabricated expected output. This takes a bit of setup and trial and error, but it’s the closest I’ve been able to get to proper automated regression testing in Databricks.
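The regression-testing idea in this comment (run the code against staged input, then compare the result to a prepared expected output) can be sketched in plain Python. This is only an illustration of the comparison pattern and does not use the spark-fast-tests API; `assert_rows_equal`, `add_full_name`, and the sample rows are hypothetical, and rows are assumed to be dicts with hashable values.

```python
from collections import Counter


def assert_rows_equal(actual, expected):
    """Compare two lists of row dicts, ignoring row order but not duplicates."""
    def key(row):
        # Turn each row into a hashable, column-order-independent form.
        return tuple(sorted(row.items()))

    actual_counts = Counter(key(r) for r in actual)
    expected_counts = Counter(key(r) for r in expected)
    if actual_counts != expected_counts:
        missing = list((expected_counts - actual_counts).elements())
        unexpected = list((actual_counts - expected_counts).elements())
        raise AssertionError(
            f"missing rows: {missing}; unexpected rows: {unexpected}"
        )


def add_full_name(rows):
    """Hypothetical transformation under test."""
    return [{**r, "full_name": f"{r['first']} {r['last']}"} for r in rows]


# Staged input and prepared expected output (made-up sample data).
staged_input = [
    {"first": "Ada", "last": "Lovelace"},
    {"first": "Alan", "last": "Turing"},
]
expected_output = [
    {"first": "Alan", "last": "Turing", "full_name": "Alan Turing"},
    {"first": "Ada", "last": "Lovelace", "full_name": "Ada Lovelace"},
]

assert_rows_equal(add_full_name(staged_input), expected_output)
```

Counting rows instead of comparing sets means a duplicated or dropped row still fails the check, which matters for regression tests on joins and aggregations.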
-
Show dataengineering: beavis, a library for unit testing Pandas/Dask code
I am the author of spark-fast-tests and chispa, libraries for unit testing Scala Spark / PySpark code.
-
Ask HN: What are some tools / libraries you built yourself?
I built daria (https://github.com/MrPowers/spark-daria) to make it easier to write Spark and spark-fast-tests (https://github.com/MrPowers/spark-fast-tests) to provide a good testing workflow.
quinn (https://github.com/MrPowers/quinn) and chispa (https://github.com/MrPowers/chispa) are the PySpark equivalents.
Built bebe (https://github.com/MrPowers/bebe) to expose the Spark Catalyst expressions that aren't exposed to the Scala / Python APIs.
Also built spark-sbt.g8 to create a Spark project with a single command: https://github.com/MrPowers/spark-sbt.g8
-
Open source contributions for a Data Engineer?
I've built popular PySpark (quinn, chispa) and Scala Spark (spark-daria, spark-fast-tests) libraries.
gutenberg
-
Replatforming from Gatsby to Zola!
So after shopping around a bit I found a simple, dependency-less static site generator called Zola. The lack of dependencies sounded very attractive after all the headaches trying to update my Gatsby modules. I wanted to give Zola a try and see what tradeoffs I would need to make coming from a React-based framework to this Rust-based generator.
-
Ask HN: What's the simplest static website generator?
I think you're thinking about Zola: https://github.com/getzola/zola
But yes, if I were to recommend something, it'd be Zola given that there's just one executable that you need to run and there's absolutely no setup required.
-
Ask HN: Looking for lightweight personal blogging platform
If I were to start again from scratch, I'd likely use Zola as SSG (https://www.getzola.org/)
- Zola – Single binary static site generator
- Zola
-
Ask HN: So, static website generators and hosting in 2023/24. What's out there?
I used Zola (https://github.com/getzola/zola) a few years ago for a static project homepage that showcased examples with a simple description and a wasm app embedded in the page. It worked perfectly for me, and the docs were clear on how to use it. It was very easy to set up along with a GitHub action to automatically update the wasm binaries when needed. It is definitely a tool I keep in my mental toolbox as a good default.
- Zola: Your one-stop static site engine
-
Gojekyll – 20x faster Go port of jekyll
I'm currently learning https://www.getzola.org/.
It's more manual than I'd like, but it's going to be for a small personal and work website, so I don't mind much.
It's super fast.
It doesn't seem to fit your use case, but still.
-
The right way to build a dynamic personal website for a physics student?
(Note: that list is overwhelming; you don't need to go through it. Order by popularity and look at the top 3-5 at most. Hugo, Jekyll, Gatsby... Personally I'm using Zola [ https://www.getzola.org/ ] for a couple of sites, but that's just me.)
What are some alternatives?
Prefect - The easiest way to build, run, and monitor data pipelines at scale.
Hugo - The world’s fastest framework for building websites.
chispa - PySpark test helper methods with beautiful error messages
eleventy 🕚⚡️ - A simpler site generator. Transforms a directory of templates (of varying types) into HTML.
soda-sql - Data profiling, testing, and monitoring for SQL accessible data.
Nikola - A static website and blog generator
airbyte - The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
Rocket - A web framework for Rust.
sqlfluff - A modular SQL linter and auto-formatter with support for multiple dialects and templated code.
Sapper - A lightweight web framework built on hyper, implemented in Rust language.
spark-daria - Essential Spark extensions and helper methods ✨😲
hakyll - A static website compiler library in Haskell