Python Data processing

Open-source Python projects categorized as Data processing

Top 13 Python Data processing Projects

  • pandera

    A light-weight, flexible, and expressive statistical data testing library

  • DialoGPT

    Large-scale pretraining for dialogue

  • bytewax

    Python Stream Processing

    Project mention: Building a streaming SQL engine with Arrow and DataFusion | news.ycombinator.com | 2024-03-18

  • GODEL

    Large-scale pretrained models for goal-directed dialog

    Project mention: Microsoft: Large-scale pretrained models for goal-directed dialog | news.ycombinator.com | 2023-06-05

  • fondant

    Production-ready data processing made easy and shareable

    Project mention: 25 million Creative Commons image dataset released! | /r/StableDiffusion | 2023-10-01

    GitHub: https://github.com/ml6team/fondant

  • lithops

    A multi-cloud framework for big data analytics and embarrassingly parallel jobs that provides a universal API for building parallel applications in the cloud ☁️🚀

  • forte

    Forte is a flexible and powerful ML workflow builder. This is part of the CASL project: http://casl-project.ai/

  • convtools-ita

    convtools is a Python library for declaratively defining conversions that process collections and perform complex aggregations and joins.

  • prosto

    Prosto is a data processing toolkit that radically changes how data is processed by relying heavily on functions and operations with functions, as an alternative to map-reduce and join-groupby

  • VQASynth

    Compose multimodal datasets 🎹

    Project mention: Show HN: VQASynth – pipelines to synthesize VQA datasets | news.ycombinator.com | 2024-02-23

  • SmartPipeline

    A framework for rapid development of robust data pipelines following a simple design pattern

    Project mention: Show HN: SmartPipeline, robust and light data pipelines in Python | news.ycombinator.com | 2023-05-03

  • pipe21

    Simple functional pipes

    Project mention: Pipeline-Oriented Programming [video] | news.ycombinator.com | 2024-01-20

  • mongorefine

    Experimental headless data wrangling / refining tool over MongoDB, inspired by OpenRefine

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Index

What are some of the best open-source Data processing projects in Python? This list will help you:

#   Project         Stars
1   pandera         2,994
2   DialoGPT        2,315
3   bytewax         1,144
4   GODEL             834
5   fondant           319
6   lithops           305
7   forte             236
8   convtools-ita     183
9   prosto             89
10  VQASynth           71
11  SmartPipeline      22
12  pipe21             13
13  mongorefine         2
