Top 4 Python Parquet Projects
The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark and can be used from pure Python code.
Project mention: parquet files | reddit.com/r/dataengineering | 2021-09-22
Fast data store for Pandas time-series data.
Project mention: Roapi: An API Server for Static Datasets | news.ycombinator.com | 2021-10-08
Amazon S3 Find and Forget is a solution to handle data erasure requests from data lakes stored on Amazon S3, for example, pursuant to the European General Data Protection Regulation (GDPR).
Project mention: How to handle GDPR requests for data stored in S3 ? | reddit.com/r/dataengineering | 2021-11-22
S3 Find and Forget is probably worth looking into, even if just to get ideas on how to implement a similar solution yourself.
dbd is a data loading and transformation tool that enables data analysts and engineers to load and transform data in SQL databases.
Project mention: dbd: create your database from data files on your directory | reddit.com/r/SQL | 2022-01-15
I work on a new open-source tool called dbd that enables you to load data from your local data files into your database and transform it using insert-from-select statements. The tool supports templating (Jinja2). It works with Postgres, MySQL, SQLite, Snowflake, Redshift, and BigQuery.
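The load-then-transform pattern described above can be illustrated with a minimal sketch that uses only Python's standard library and an in-memory SQLite database. This is not dbd's own API or configuration format. The table names (`staging_orders`, `customer_totals`), the sample CSV text, and the `load_and_transform` helper are all hypothetical and exist only to show what an insert-from-select transformation looks like.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for a local CSV data file.
CSV_TEXT = "id,name,amount\n1,alice,10.5\n2,bob,20.0\n3,alice,4.5\n"


def load_and_transform(csv_text):
    """Load CSV rows into a staging table, then derive a summary table
    with an INSERT ... SELECT statement. Illustration only, not dbd's API."""
    conn = sqlite3.connect(":memory:")

    # Step 1: load the raw file contents into a staging table.
    conn.execute(
        "CREATE TABLE staging_orders (id INTEGER, name TEXT, amount REAL)"
    )
    rows = [
        (int(r["id"]), r["name"], float(r["amount"]))
        for r in csv.DictReader(io.StringIO(csv_text))
    ]
    conn.executemany("INSERT INTO staging_orders VALUES (?, ?, ?)", rows)

    # Step 2: transform inside the database with insert-from-select.
    conn.execute(
        "CREATE TABLE customer_totals (name TEXT PRIMARY KEY, total REAL)"
    )
    conn.execute(
        "INSERT INTO customer_totals (name, total) "
        "SELECT name, SUM(amount) FROM staging_orders GROUP BY name"
    )
    conn.commit()

    result = conn.execute(
        "SELECT name, total FROM customer_totals ORDER BY name"
    ).fetchall()
    conn.close()
    return result


totals = load_and_transform(CSV_TEXT)
print(totals)
```

Keeping the transformation in SQL means the same statement could, in principle, run against any of the databases listed above with little change; only the loading step and connection setup would differ.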
What are some of the best open-source Parquet projects in Python? This list will help you: