chispa
Task
Metric | chispa | Task
---|---|---
Mentions | 12 | 113
Stars | 508 | 10,017
Growth | - | 4.9%
Activity | 6.7 | 9.6
Last commit | 6 days ago | 1 day ago
Language | Python | MDX
License | MIT License | MIT License
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
chispa
-
Testing spark applications
Unit and e2e tests using a combination of pytest and chispa (https://github.com/MrPowers/chispa). Custom library to create random test data that fits schema with optional hardcoded overrides for relevant fields to test business logic.
-
Spark open source community is awesome
here's a little README fix a user pushed to chispa
-
Invitation to collaborate on open source PySpark projects
chispa is a library of PySpark testing functions.
-
installing pyspark on my m1 mac, getting an env error
The other approach I've used is Poetry, see the chispa project as an example. Poetry is especially nice for projects that you'd like to publish to PyPI because those commands are built-in.
-
Spark: local dev environment
- All Spark transformations are tested with pytest + chispa (https://github.com/MrPowers/chispa)
-
Pyspark now provides a native Pandas API
Pandas syntax is far inferior to regular PySpark in my opinion. Goes to show how much data analysts value a syntax that they're already familiar with. Pandas syntax makes it harder to reason about queries, abstract DataFrame transformations, etc. I've authored some popular PySpark libraries like quinn and chispa and am not excited to add Pandas syntax support, haha.
-
Show dataengineering: beavis, a library for unit testing Pandas/Dask code
I am the author of spark-fast-tests and chispa, libraries for unit testing Scala Spark / PySpark code.
-
Tips for building popular open source data engineering projects
Blogging has been the main way I've been able to attract users. Someone searches "testing PySpark", they see this blog, and then they're motivated to try chispa.
-
Ask HN: What are some tools / libraries you built yourself?
I built daria (https://github.com/MrPowers/spark-daria) to make it easier to write Spark and spark-fast-tests (https://github.com/MrPowers/spark-fast-tests) to provide a good testing workflow.
quinn (https://github.com/MrPowers/quinn) and chispa (https://github.com/MrPowers/chispa) are the PySpark equivalents.
Built bebe (https://github.com/MrPowers/bebe) to expose the Spark Catalyst expressions that aren't exposed to the Scala / Python APIs.
Also built spark-sbt.g8 to create a Spark project with a single command: https://github.com/MrPowers/spark-sbt.g8
-
Open source contributions for a Data Engineer?
I've built popular PySpark (quinn, chispa) and Scala Spark (spark-daria, spark-fast-tests) libraries.
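The first post in this list mentions a custom library that creates random test data fitting a schema, with optional hardcoded overrides for the fields that matter to the business logic. That library is not shown in the post, so the following is only a minimal pure-Python sketch of the idea. The schema format (field name mapped to a Python type), the `GENERATORS` table, and the `random_rows` function are all hypothetical names chosen for illustration, not part of chispa or of the poster's library.

```python
import random
import string
from typing import Any, Callable, Dict, List, Optional

# Hypothetical schema format: field name -> Python type.
Schema = Dict[str, type]

# One random-value generator per supported type (illustrative choices only).
GENERATORS: Dict[type, Callable[[random.Random], Any]] = {
    int: lambda rng: rng.randint(0, 1_000),
    float: lambda rng: rng.uniform(0.0, 1.0),
    str: lambda rng: "".join(rng.choices(string.ascii_lowercase, k=8)),
    bool: lambda rng: rng.random() < 0.5,
}


def random_rows(
    schema: Schema,
    n: int,
    overrides: Optional[Dict[str, Any]] = None,
    seed: int = 0,
) -> List[Dict[str, Any]]:
    """Return n rows of random values matching the schema.

    Fields named in overrides get the hardcoded value in every row,
    so a test can pin only the columns its logic depends on.
    """
    rng = random.Random(seed)  # seeded so test data is reproducible
    overrides = overrides or {}
    unknown = set(overrides) - set(schema)
    if unknown:
        raise KeyError(f"overrides not in schema: {sorted(unknown)}")

    rows = []
    for _ in range(n):
        row = {name: GENERATORS[tp](rng) for name, tp in schema.items()}
        row.update(overrides)  # hardcoded values win over random ones
        rows.append(row)
    return rows


# Example: only "country" matters to the logic under test.
schema = {"user_id": int, "country": str, "amount": float}
rows = random_rows(schema, n=3, overrides={"country": "BR"})
```

In a PySpark test, rows like these could be turned into a DataFrame, passed through the transformation under test, and the result compared against an expected DataFrame with chispa's `assert_df_equality`.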
Task
-
Show HN: Workflow Orchestrator in Golang
So many tools in this space! This one looks a little bit like go-task, but it seems maybe better for production workflows because of timeout support, while go-task seems aimed more at command-line work or Makefile replacement.
-
https://github.com/go-task/task
-
Essential Command Line Tools for Developers
- Task: A task runner / alternative to GNU Make
-
Using Make - writing less Makefile
A similar tool is `task` https://taskfile.dev/ . It is quite capable and also a single executable. I've grown to quite like it.
-
What's with DevOps engineers using `make` of all things?
check out tasks - a bit of a learning curve but arguably more powerful imo
-
Go Development with Hot Reload Using Taskfile
That's when I came across taskfile.dev. Task is an automation tool designed to be more accessible than other options, such as GNU Make.
-
Poetry (Packaging) in motion
Full disclosure, I did not review Conda or Hatch fully. Not that there is anything explicitly wrong with either of them. Conda is too specific to the scientific community for my general taste. Hatch seems to go well with Conda and also uses the PyProject manifest. It's nice that it gives you several built-in tools, similar to commit hooks, but I tend to like to roll my own via a Taskfile and run them with Poetry.
-
Building RESTful API with Hexagonal Architecture in Go
Taskfile is a tool for streamlining repetitive development tasks. It helps automate activities like building, testing, and deploying applications. Unlike Makefile, Taskfile uses YAML for configuration, making it more readable and user-friendly.
-
We built the fastest CI in the world. It failed
9. We test everything with another promotion which runs make targets which build docker containers to run python scripts (pytest)
This is also built by a complicated web of wildcarded makefile targets, which need to be interoperable and support a few if/else cases for specific components.
My plan is to migrate all of this to something simpler and more straightforward, or at least more maintainable, which is honestly probably going to turn into taskfile[0] instead of makefiles, and then simple python scripts for the glue that ties everything together or does more complex logic.
My hope is that it can be more straightforward and easier to maintain, with more component-ized logic, but realistically every step in that labyrinthine build process (and that's just the open-source version!) came from a decision made by a very talented team of engineers who know far more about the process and the product than I do. At this point I'm wondering if it would make 'more sense' to replace it with a giant python script of some kind and get access to all the logic we need all at once (it would not).
[0] https://taskfile.dev/
-
Exploring GCP With Terraform: Setting Up The Environment And Project
task - a task runner and a replacement for make
What are some alternatives?
spark-fast-tests - Apache Spark testing helpers (dependency free & works with Scalatest, uTest, and MUnit)
just - Just a command runner
spark-daria - Essential Spark extensions and helper methods
doit - task management & automation tool
quinn - pyspark methods to enhance developer productivity
goreleaser - Deliver Go binaries as fast and easily as possible
lowdefy - The config web stack for business apps - build internal tools, client portals, web apps, admin panels, dashboards, web sites, and CRUD apps with YAML or JSON.
boilr - :zap: boilerplate template manager that generates files or directories from template repositories
null - Nullable Go types that can be marshalled/unmarshalled to/from JSON.
JobRunner - Framework for performing work asynchronously, outside of the request flow
dagster - An orchestration platform for the development, production, and observation of data assets.
taskctl - Concurrent task runner, developer's routine tasks automation toolkit. Simple modern alternative to GNU Make