Top 5 parquet-file Open-Source Projects
-
petastorm
The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark and can be used from pure Python code.
-
Cinchoo ETL
ETL framework for .NET (parser/writer for CSV, flat, XML, JSON, key-value, Parquet, YAML, and Avro formatted files)
-
parquet-floor
A lightweight Java library that facilitates reading and writing Apache Parquet files without Hadoop dependencies
Project mention: Welcome to mwmbl, the free, open-source and non-profit search engine | news.ycombinator.com | 2023-09-18
ChatGPT has other failure modes. When a question doesn't have an answer written down somewhere, it really struggles. A case is something like "how do I write a parquet file in Java without using Hadoop".
This is not at all trivial but quite possible[1], yet ChatGPT will 100% of the time either hallucinate APIs, disregard the instruction not to use Hadoop, or give otherwise plausible-looking but incorrect answers.
The trick is that it isn't doable by simply finding the correct dependencies and API calls; you need to extract and override filesystem classes from the Hadoop project to cut those ties.
[1] https://github.com/strategicblue/parquet-floor
Index
What are some of the best open-source parquet-file projects? This list will help you:
# | Project | Stars |
---|---|---|
1 | petastorm | 1,752 |
2 | Cinchoo ETL | 736 |
3 | parquet4s | 271 |
4 | parquet-floor | 36 |
5 | Threat-Detection-and-Visualization | 35 |