luigi vs pipeline
| | luigi | pipeline |
|---|---|---|
| Mentions | 14 | 51 |
| Stars | 17,327 | 8,289 |
| Growth | 0.5% | 0.3% |
| Activity | 6.3 | 9.7 |
| Latest commit | 9 days ago | 2 days ago |
| Language | Python | Go |
| License | Apache License 2.0 | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
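The page does not spell out the growth formula. Assuming growth is the relative change in star count from one month to the next, the arithmetic can be sketched as follows; `month_over_month_growth` and the star counts are hypothetical.

```python
def month_over_month_growth(previous_stars: int, current_stars: int) -> float:
    """Percentage change in stars relative to last month's count (assumed formula)."""
    if previous_stars <= 0:
        raise ValueError("previous_stars must be positive")
    return (current_stars - previous_stars) / previous_stars * 100


# Hypothetical counts: 1,000 stars last month, 1,005 this month.
print(f"{month_over_month_growth(1_000, 1_005):.1f}%")  # prints 0.5%
```

Under this assumed formula, a growth of 0.5% means roughly 5 new stars per 1,000 existing ones over a month.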
luigi
-
Ask HN: What is the correct way to deal with pipelines?
I agree there are many options in this space. Two others to consider:
- https://airflow.apache.org/
- https://github.com/spotify/luigi
There are also many Kubernetes-based options out there. For the use case you described, you might even consider a plain old Makefile and incrond if you expect all of these to run on a single host and be triggered by a new file showing up in a directory…
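incrond reacts to inotify filesystem events on Linux. As a portable stand-in (a swapped-in polling technique, not incrond itself), a small Python loop can poll a directory and invoke `make` once per new file. The `process` target and the `INPUT` variable are hypothetical; note that files already present on the first poll are treated as new.

```python
import os
import subprocess
import time
from pathlib import Path


def find_new_files(directory: Path, seen: set[str]) -> list[Path]:
    """Return regular files in `directory` not seen before, and remember them."""
    with os.scandir(directory) as entries:
        ordered = sorted(entries, key=lambda entry: entry.name)
    new_files = []
    for entry in ordered:
        if entry.is_file() and entry.name not in seen:
            seen.add(entry.name)
            new_files.append(Path(entry.path))
    return new_files


def watch(directory: Path, interval_seconds: float = 5.0) -> None:
    """Poll `directory` forever and run `make` once for each new file.

    Assumes a Makefile in the working directory with a hypothetical
    `process` target that reads the INPUT variable.
    """
    seen: set[str] = set()
    while True:
        for path in find_new_files(directory, seen):
            subprocess.run(["make", "process", f"INPUT={path}"], check=True)
        time.sleep(interval_seconds)
```

Polling trades a few seconds of latency for not depending on Linux-only inotify; incrond avoids that delay by reacting to kernel events.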
-
In the context of Python what is a Bob Job?
Maybe if your use case is “smallish” and doesn’t require the whole studio suite, you could check out apscheduler for running Python “tasks” on a schedule and luigi for building pipelines.
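APScheduler has its own job API; to keep this sketch dependency-free it uses the standard library's `sched` module instead (a swapped-in technique). The hypothetical `run_every` helper shows the core of interval scheduling: after each run, the job re-arms itself to fire again one interval later.

```python
import sched
import time
from typing import Callable


def run_every(interval_seconds: float, task: Callable[[], None], runs: int) -> None:
    """Run `task` every `interval_seconds`, `runs` times in total, then return."""
    scheduler = sched.scheduler(time.monotonic, time.sleep)

    def tick(remaining: int) -> None:
        task()
        if remaining > 1:
            # Re-arm the job so it fires again one interval from now.
            scheduler.enter(interval_seconds, 1, tick, argument=(remaining - 1,))

    scheduler.enter(0, 1, tick, argument=(runs,))
    scheduler.run()  # blocks until no scheduled events remain


run_every(1.0, lambda: print("refreshing data"), runs=3)
```

Because the next run is scheduled only after the current one finishes, a slow task delays later runs instead of overlapping them.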
-
Lessons Learned from Running Apache Airflow at Scale
What are you trying to do? Distributed scheduler with a single instance? No database? Are you sure you don't just mean "a scheduler" à la Luigi? https://github.com/spotify/luigi
-
Apache Airflow. How to make the complex workflow as an easy job
It's good to know that Airflow is not the only one on the market. There are Dagster, Spotify's Luigi, and others. But they have different pros and cons, so make sure you research the market thoroughly to choose the tool best suited to your tasks.
-
DevOps Fundamentals for Deep Learning Engineers
MLOps is a HUGE area to explore, and not surprisingly, many startups are showing up in this space. If you want to get in on the latest trends, I would look at workflow orchestration frameworks such as Metaflow (started at Netflix and now spinning off into its own enterprise business, https://metaflow.org/), Kubeflow (used at Google, https://www.kubeflow.org/), Airflow (used at Airbnb, https://airflow.apache.org/), and Luigi (used at Spotify, https://github.com/spotify/luigi). Then there is model serving itself: Seldon (https://www.seldon.io/), TorchServe (https://pytorch.org/serve/), and TensorFlow Serving (https://www.tensorflow.org/tfx/guide/serving). You also have the actual export and transfer of DL models, and ONNX is the most popular option here (https://onnx.ai/). Spark (https://spark.apache.org/) still holds up nicely after all these years, especially if you are doing batch predictions on massive amounts of data. There is also the GitFlow way of doing things, where Data Version Control (DVC, https://dvc.org/) has taken pole position.
-
Data pipelines with Luigi
At Wonderflow we're doing a lot of ML / NLP using Python, and recently we have been enjoying writing data pipelines with Spotify's Luigi.
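Luigi models a pipeline as tasks that declare `requires()`, `output()` and `run()`, and it skips any task whose output already exists. The toy below re-implements only that idea with the standard library so it runs without Luigi installed; it is not Luigi's API (the real one builds on `luigi.Task`, targets and a scheduler), and `Extract`, `Transform` and the file names are hypothetical.

```python
from pathlib import Path


class Task:
    """Toy task: subclasses override requires(), output() and run()."""

    def requires(self) -> list["Task"]:
        return []

    def output(self) -> Path:
        raise NotImplementedError

    def run(self) -> None:
        raise NotImplementedError

    def complete(self) -> bool:
        # As in Luigi, a task counts as done once its output exists.
        return self.output().exists()


def build(task: Task) -> None:
    """Build dependencies first (depth-first) and skip completed tasks."""
    if task.complete():
        return
    for dependency in task.requires():
        build(dependency)
    task.run()


class Extract(Task):
    def __init__(self, workdir: Path):
        self.workdir = workdir

    def output(self) -> Path:
        return self.workdir / "raw.txt"

    def run(self) -> None:
        self.output().write_text("hello luigi\n")


class Transform(Task):
    def __init__(self, workdir: Path):
        self.workdir = workdir

    def requires(self) -> list[Task]:
        return [Extract(self.workdir)]

    def output(self) -> Path:
        return self.workdir / "upper.txt"

    def run(self) -> None:
        raw = self.requires()[0].output().read_text()
        self.output().write_text(raw.upper())
```

Calling `build(Transform(Path(".")))` writes `raw.txt` and then `upper.txt`; a second call does nothing because both outputs exist. Using output existence as the completion check is what makes reruns after a partial failure cheap.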
- Noobie who is trying to use K8s needs confirmation to know if this is the way or he is overestimating Kubernetes.
-
Open Source ETL Project For Startups
💡【About Luigi】 https://github.com/spotify/luigi Luigi has been built at Spotify since 2012. It's open source and mainly used for getting data insights: showing recommendations, toplists, A/B test analysis, external reports, internal dashboards, etc.
- Resources/tutorials to help me learn about ETL?
-
Using Terraform to make my many side-projects 'pick up and play'
So to sum that up, I went from having nothing for my side-project set up in AWS to having a Kubernetes cluster with basic metrics and a dashboard, proper IAM-linked ServiceAccount support for a smooth IAM experience in K8s, and Luigi deployed so that I could run a Luigi workflow via an ad-hoc run of a CronJob. That's quite remarkable to me. All that took hours to figure out and define when I first did it, over six months ago.
pipeline
-
14 DevOps and SRE Tools for 2024: Your Ultimate Guide to Stay Ahead
Tekton
- GitHub Actions could be so much better
-
Distributed Traces for Testing with Tekton Pipelines and Tracetest
Tekton is an open-source framework for creating efficient CI/CD systems. This empowers developers to seamlessly construct, test, and deploy applications across various cloud environments and on-premise setups.
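In a Tekton `Pipeline`, each entry under `tasks` can name the tasks it must wait for in `runAfter`; tasks without such ordering constraints may run in parallel. The sketch below, a simplification that ignores result-based dependencies and `finally` tasks, uses Python's `graphlib` to group a hypothetical pipeline into stages whose tasks could run concurrently.

```python
from graphlib import TopologicalSorter


def execution_stages(run_after: dict[str, list[str]]) -> list[list[str]]:
    """Group pipeline tasks into stages; tasks in one stage may run in parallel.

    `run_after` maps each task name to the tasks it must wait for,
    mirroring the `runAfter` field of a Tekton pipeline task.
    """
    sorter = TopologicalSorter(run_after)
    sorter.prepare()  # raises graphlib.CycleError on circular runAfter chains
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages


# Hypothetical pipeline: clone, then build and lint in parallel, then deploy.
pipeline = {
    "clone": [],
    "build": ["clone"],
    "lint": ["clone"],
    "deploy": ["build", "lint"],
}
print(execution_stages(pipeline))  # [['clone'], ['build', 'lint'], ['deploy']]
```

Grouping into lockstep stages is a simplification: the Tekton controller can start a task as soon as its own dependencies finish, without waiting for the rest of a stage.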
-
Practical Tips for Refactoring Release CI using GitHub Actions
Despite alternatives such as CircleCI, Travis CI, GitLab CI, or even self-hosted options built on open-source projects like Tekton or Argo Workflows, the reason for choosing GitHub Actions was straightforward: GitHub Actions, together with the GitHub ecosystem, offers a user-friendly experience and access to a rich software marketplace.
-
Wolfi: A community Linux OS designed for the container and cloud-native era
[2]: https://github.com/tektoncd/pipeline/issues/5507#issuecommen...
- I don't know what to do, any advice is welcome
-
What are some good self-hosted CI/CD tools where pipeline steps run in docker containers?
Drone, or Tekton or Argo Workflows if you’re on k8s
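Drone, Tekton and Argo Workflows share the idea that each pipeline step runs in its own container. A minimal local sketch of that model, assuming only the Docker CLI and not reflecting how any of these tools is implemented, runs each step with `docker run` and bind-mounts a shared workspace so later steps see earlier output. `step_command`, `run_steps` and the image tags are hypothetical.

```python
import shlex
import subprocess
from pathlib import Path


def step_command(image: str, script: str, workspace: Path) -> list[str]:
    """Build a `docker run` argv that runs one step's script in its own container.

    The host workspace is bind-mounted at /workspace and used as the working dir.
    """
    return [
        "docker", "run", "--rm",
        "-v", f"{workspace.resolve()}:/workspace",
        "-w", "/workspace",
        image,
        "sh", "-c", script,
    ]


def run_steps(steps: list[tuple[str, str]], workspace: Path) -> None:
    """Run (image, script) steps in order, stopping at the first failure."""
    for image, script in steps:
        argv = step_command(image, script, workspace)
        print("+", shlex.join(argv))
        subprocess.run(argv, check=True)  # requires a local Docker daemon
```

For example, `run_steps([("alpine:3.19", "echo built > artifact.txt"), ("alpine:3.19", "cat artifact.txt")], Path("ws"))` would write a file in the first container and read it back in the second. Stopping at the first non-zero exit mirrors how CI steps usually fail fast.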
-
Is Jenkins still the king?
If you want a step up, I would recommend trying out Tekton Pipelines. It’s a very popular CI tool, and it runs on Kubernetes. Yes, this would involve setting up a Kubernetes cluster, but please don’t run for the hills! You can set up a Kubernetes cluster and install Tekton on top of it with minimal effort using minikube (see here). This would be a great joint exercise, as it will give you some understanding of Kubernetes along the way, and the mechanisms of Tekton are a little trickier than GitHub Actions imo. It’s all much the same, though.
- Is there a way to run a one-off pod that would work as a command line tool?
-
K8s powered Git push deployments
I've recently found this quote by Kelsey Hightower:
"I'm convinced the majority of people managing infrastructure just want a PaaS. The only requirement: it has to be built by them."
Source: https://twitter.com/kelseyhightower/status/85193508753294540...
In the last few weeks, I've experimented a bit with Flux (https://fluxcd.io/), Tekton (https://tekton.dev/) and Cloud Native Buildpacks (https://buildpacks.io/) to explore how to provide K8s-powered git push deployments without a dedicated CI/CD server.
My project is still in early alpha stage and just a proof of concept :-) My vision is to expand it into an Open Source PaaS in the future.
Do you think the above quote is true? What does an open source PaaS need to be like in order to be accepted by software developers?
Some other projects have been discontinued in the past (like Flynn or Deis) or were created before the Kubernetes era.
Is it the right direction to provide a Heroku-like solution based on K8s, or is it better to provide an open source Infrastructure as Code library with building blocks so you don't have to build everything from scratch?
What are some alternatives?
Airflow - Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
dagger - Application Delivery as Code that Runs Anywhere
Kedro - Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
argo-cd - Declarative Continuous Deployment for Kubernetes
Apache Spark - Apache Spark - A unified analytics engine for large-scale data processing
kubevela - The Modern Application Platform.
mrjob - Run MapReduce jobs on Hadoop or Amazon Web Services
tekton-argocd-poc - This a PoC using Tekton (for CI) and ArgoCD (CD). It uses a local k8s cluster (K3D)
Dask - Parallel computing with task scheduling
NUKE - 🏗 The AKEless Build System for C#/.NET
Pinball
skaffold - Easy and Repeatable Kubernetes Development