Python Parquet

Open-source Python projects categorized as Parquet

Top 4 Python Parquet Projects

  • petastorm

    The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark, and can be used from pure Python code.

    Project mention: parquet files | reddit.com/r/dataengineering | 2021-09-22
  • pystore

    Fast data store for pandas time-series data.

    Project mention: Roapi: An API Server for Static Datasets | news.ycombinator.com | 2021-10-08
  • amazon-s3-find-and-forget

    Amazon S3 Find and Forget is a solution for handling data erasure requests against data lakes stored on Amazon S3, for example, requests made under the European General Data Protection Regulation (GDPR).

    Project mention: How to handle GDPR requests for data stored in S3 ? | reddit.com/r/dataengineering | 2021-11-22

    S3 Find and Forget is probably worth looking into, even if just to get ideas on how to implement a similar solution for yourself

  • dbd

    dbd is a data loading and transformation tool that enables data analysts and engineers to load and transform data in SQL databases.

    Project mention: dbd: create your database from data files on your directory | reddit.com/r/SQL | 2022-01-15

    I work on the new open-sourced tool called dbd that enables you to load data from your local data files to your database and transform it using insert-from-select statements. The tool supports templating (Jinja2). It works with Postgres, MySQL, SQLite, Snowflake, Redshift, and BigQuery.
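
dbd's own project layout, templating, and CLI are not shown here. As a rough illustration of the load-then-transform pattern described above, the sketch below uses Python's built-in sqlite3 and csv modules (not dbd) to load rows from a small CSV into a staging table and then derive a summary table with an insert-from-select statement. The table names, columns, and sample data are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical contents of a local data file; a real project would read it from disk.
ORDERS_CSV = """order_date,customer,amount
2021-12-01,alice,30.0
2021-12-01,bob,12.5
2021-12-02,alice,20.0
"""


def build_database() -> sqlite3.Connection:
    """Load the CSV into SQLite, then transform it with INSERT ... SELECT."""
    conn = sqlite3.connect(":memory:")

    # Step 1: load the raw file into a staging table.
    conn.execute("CREATE TABLE raw_orders (order_date TEXT, customer TEXT, amount REAL)")
    reader = csv.DictReader(io.StringIO(ORDERS_CSV))
    conn.executemany(
        "INSERT INTO raw_orders VALUES (?, ?, ?)",
        ((r["order_date"], r["customer"], float(r["amount"])) for r in reader),
    )

    # Step 2: derive a new table with an insert-from-select statement.
    conn.execute("CREATE TABLE customer_totals (customer TEXT PRIMARY KEY, total REAL)")
    conn.execute(
        "INSERT INTO customer_totals (customer, total) "
        "SELECT customer, SUM(amount) FROM raw_orders GROUP BY customer"
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    db = build_database()
    for row in db.execute("SELECT customer, total FROM customer_totals ORDER BY customer"):
        print(row)
```

Running it prints one row per customer with the summed amount. Keeping the raw staging table separate from the derived table mirrors the load-then-transform split; a tool such as dbd automates this across many files and database engines.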

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2022-01-15.

Index

What are some of the best open-source Parquet projects in Python? This list will help you:

#  Project                     Stars
1  petastorm                   1,338
2  pystore                       395
3  amazon-s3-find-and-forget     152
4  dbd                            15