orange
Airflow
| | orange | Airflow |
|---|---|---|
| Mentions | 26 | 169 |
| Stars | 4,594 | 34,317 |
| Growth | 1.5% | 1.8% |
| Activity | 9.6 | 10.0 |
| Last commit | 5 days ago | 6 days ago |
| Language | Python | Python |
| License | | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
orange
- Ask HN: What Underrated Open Source Project Deserves More Recognition?
- What exactly is AutoGPT?
  Both tools are ripoffs of a data mining framework named Orange 3.
- Has anybody used Orange?
- Book or web book recommendation request: a data visualization cookbook using Python for scientists.
  Have you tried Orange? https://orangedatamining.com/ This is not a direct answer to your question, but Orange has Python-based tools for data mining and visualization. It is very intuitive because it is a graphical interface.
- What are strictly data analysis jobs?
  Or you go into counseling or accreditation: there are already processes that somewhat work, and your expertise in (statistical) design of experiments (example entry on CRAN, a blog post) recommends a set of experiments. Your clients then perform the experiments in the lab, and you analyze the data collected. Eventually, the yield of product X is increased, with lower energy consumption and in a shorter time. You can complement R or Python for this (there is a 101 on learnxinyminutes, too) with GUI programs you know and like (e.g., JMP, Minitab, Orange). There are some more closely related to chemistry (e.g., DataWarrior).
- Show HN: Open-Source No-Code Platform for Machine Learning and Data Science
  Honestly, I think ML should always involve at least a little bit of coding, which would be more practical. That said, this looks reasonable and is a good playground for experiments.
  A good similar product is Orange: https://orangedatamining.com/
- Resources for data visualization (free & paid) for scientific publications
  Actually... I thought of an interesting free option. Check out orange3. https://orangedatamining.com/
- Excel Alternatives?
  I love to play with my dataset using OrangeDatamining. It is very easy to use, and docs and examples are available. It's like Data Modeler from IBM but better because it is an open project :-)
- [D] Why Hasn't FOSS Drag-and-Drop ML tools taken off yet?
  Currently, I am looking around for modules for Knime and Orange. I looked at some of the modules and realized that they do not have enough tools in their toolkits (e.g. text data analysis, network analysis, image classification).
Airflow
- Airflow VS quix-streams - a user-suggested alternative
  2 projects | 7 Dec 2023
- Simplifying Data Transformation in Redshift: An Approach with DBT and Airflow
  Airflow is the most widely used and well-known tool for orchestrating data workflows. It allows for efficient pipeline construction, scheduling, and monitoring.
- Ask HN: What is the correct way to deal with pipelines?
  I agree there are many options in this space. Two others to consider:
  - https://github.com/spotify/luigi

  There are also many Kubernetes-based options out there. For the specific use case you specified, you might even consider a plain old Makefile and incrond if you expect these all to run on a single host and be triggered by a new file showing up in a directory…
- Cómo construir tu propia data platform. From zero to hero.
- Is it impossible to contribute to open source as a data engineer?
  You can try and contribute some new connectors/operators for workflow managers like Airflow or Airbyte.
- Exploring MLOps Tools and Frameworks: Enhancing Machine Learning Operations
  Apache Airflow:
- Python task scheduler with a web UI
  Looks interesting as a light-weight alternative to https://www.prefect.io/ (which itself is a lighter-weight / more modern alternative to https://airflow.apache.org/ ).
- Working with Managed Workflows for Apache Airflow (MWAA) and Amazon Redshift
  You can actually set up and delete new Redshift clusters using Apache Airflow. The example_dags include a DAG that does a complete setup and deletion of a Redshift cluster. There are a few things to think about, however.
- .NET Modern Task Scheduler
  A few years ago, I opened a GitHub issue with Microsoft telling them that I think the .NET ecosystem needs its own equivalent of Apache Airflow or Prefect. Fast forward 'til now, and I still don't think we have anything close to these frameworks.
- How do you decide when to keep a project in a single python file vs break it up into multiple files?
  Check out taskinstance.py in the Airflow project. It is a well-targeted file: it has only one main class, TaskInstance, and a few small supporting classes and functions. It is ~3000 lines long: https://github.com/apache/airflow/blob/main/airflow/models/taskinstance.py
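
The Ask HN reply above suggests that a single-host pipeline "triggered by a new file showing up in a directory" may not need a full orchestrator. As a rough illustration of that idea only, here is a minimal Python sketch that polls a directory for new files. It swaps polling in for incrond (which reacts to inotify events) and is not how incrond or Airflow works; the function name `new_files` and the file name `batch_001.csv` are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import List, Set


def new_files(directory: Path, seen: Set[str]) -> List[str]:
    """Return names of regular files in `directory` that are not yet in `seen`.

    Newly found names are added to `seen`, so each file is reported only once.
    (Hypothetical helper for illustration; polling stands in for incrond.)
    """
    current = {p.name for p in directory.iterdir() if p.is_file()}
    fresh = sorted(current - seen)
    seen.update(fresh)
    return fresh


if __name__ == "__main__":
    # Demo in a throwaway directory so the sketch touches no real data.
    with tempfile.TemporaryDirectory() as tmp:
        inbox = Path(tmp)
        seen: Set[str] = set()
        print(new_files(inbox, seen))  # nothing has arrived yet
        (inbox / "batch_001.csv").write_text("id,value\n1,42\n")
        print(new_files(inbox, seen))  # the new file is reported once
        print(new_files(inbox, seen))  # and not reported again
```

In a real setup, a loop would call `new_files` on an interval and hand each new file to the pipeline step (for example, by invoking `make`).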
What are some alternatives?
Kedro - Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
dagster - An orchestration platform for the development, production, and observation of data assets.
glue - Linked Data Visualizations Across Multiple Files
n8n - Free and source-available fair-code licensed workflow automation tool. Easily automate tasks across different services.
Pandas - Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
luigi - Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.
Apache Spark - A unified analytics engine for large-scale data processing
Dask - Parallel computing with task scheduling
Apache Camel - Apache Camel is an open source integration framework that empowers you to quickly and easily integrate various systems consuming or producing data.
airbyte - The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
Apache Arrow - Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
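
Several of the tools listed above (Airflow, luigi, dagster) are described in terms of pipelines and dependency resolution. As a hedged illustration of that general idea, and not of any of these projects' actual APIs, the sketch below uses Python's standard-library `graphlib` (Python 3.9+) to order tasks so that each one runs after the tasks it depends on. The task names and the `run_order` helper are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set

# Hypothetical pipeline: each key is a task, each value is the set of
# tasks that must finish before it can start.
PIPELINE: Dict[str, Set[str]] = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}


def run_order(dependencies: Dict[str, Set[str]]) -> List[str]:
    """Return one valid execution order; raises CycleError on circular dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    print(run_order(PIPELINE))
    try:
        run_order({"a": {"b"}, "b": {"a"}})
    except CycleError:
        print("circular dependency detected")
```

Real orchestrators add scheduling, retries, and monitoring on top of this ordering step; the sketch covers only the dependency resolution.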