Top 13 Python Data processing Projects
-
lithops
A multi-cloud framework for big data analytics and embarrassingly parallel jobs that provides a universal API for building parallel applications in the cloud ☁️🚀
-
forte
Forte is a flexible and powerful ML workflow builder. This is part of the CASL project: http://casl-project.ai/
-
convtools-ita
convtools is a Python library to declaratively define conversions for processing collections, doing complex aggregations and joins.
-
prosto
Prosto is a data processing toolkit radically changing how data is processed by heavily relying on functions and operations with functions - an alternative to map-reduce and join-groupby
-
SmartPipeline
A framework for rapid development of robust data pipelines following a simple design pattern
-
mongorefine
Experimental headless data wrangling / refining tool over MongoDB, inspired by OpenRefine
Project mention: Building a streaming SQL engine with Arrow and DataFusion | news.ycombinator.com | 2024-03-18
Project mention: Microsoft: Large-scale pretrained models for goal-directed dialog | news.ycombinator.com | 2023-06-05
Project mention: 25 million Creative Commons image dataset released! | /r/StableDiffusion | 2023-10-01 | GitHub: https://github.com/ml6team/fondant
Project mention: Show HN: VQASynth – pipelines to synthesize VQA datasets | news.ycombinator.com | 2024-02-23
Project mention: Show HN: SmartPipeline, robust and light data pipelines in Python | news.ycombinator.com | 2023-05-03
Python Data processing related posts
- 25 million Creative Commons image dataset released!
- [P] AI image generation without copyright infringement
- Functional Python Programming
- Microsoft: Large-scale pretrained models for goal-directed dialog
- Fondant: Easily build and share datasets for foundation model fine-tuning
- Fine-tuning on Sales data?
- Lithops: A multi-cloud framework for embarrassingly parallel jobs
Index
What are some of the best open-source Data processing projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | pandera | 2,994 |
2 | DialoGPT | 2,315 |
3 | bytewax | 1,144 |
4 | GODEL | 834 |
5 | fondant | 319 |
6 | lithops | 305 |
7 | forte | 236 |
8 | convtools-ita | 183 |
9 | prosto | 89 |
10 | VQASynth | 71 |
11 | SmartPipeline | 22 |
12 | pipe21 | 13 |
13 | mongorefine | 2 |