parquet-format VS quack-reduce

Compare parquet-format and quack-reduce, and see how they differ.

quack-reduce

A playground for running duckdb as a stateless query engine over a data lake (by BauplanLabs)
              parquet-format      quack-reduce
Mentions      4                   2
Stars         1,655               129
Growth        1.8%                14.0%
Activity      7.2                 4.8
Last commit   4 days ago          4 months ago
Language      Thrift              Python
License       Apache License 2.0  MIT License
The number of mentions indicates the total number of mentions that we've tracked, plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

parquet-format

Posts with mentions or reviews of parquet-format. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-11-16.
  • Summing columns in remote Parquet files using DuckDB
    4 projects | news.ycombinator.com | 16 Nov 2023
Right, there's all sorts of metadata, and often stats, included in any Parquet file: https://github.com/apache/parquet-format#file-format

    The offsets of said metadata are well-defined (i.e. in the footer), so for S3 / blob storage, as long as you can efficiently request a range of bytes, you can pull the metadata without having to read all the data.
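
A minimal sketch of that ranged-read pattern, assuming pyarrow and s3fs are available (the bucket and key below are placeholders): opening the file through a seekable filesystem means only the footer bytes are fetched, never the row data.

```python
import pyarrow.parquet as pq
import s3fs

# Hypothetical public bucket/key; any object store with ranged GETs works.
fs = s3fs.S3FileSystem(anon=True)
with fs.open("some-bucket/some-file.parquet", "rb") as f:
    meta = pq.ParquetFile(f).metadata  # reads only the footer via seeks
    print(meta.num_rows, meta.num_row_groups)
    # Row-group metadata carries per-column stats (min/max, null counts):
    print(meta.row_group(0).column(0).statistics)
```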

  • FLaNK Stack for 4th of July
    15 projects | dev.to | 3 Jul 2023
  • I have question related to Parquet files and AWS Glue
    1 project | /r/dataengineering | 18 Jun 2023
    As I read here, https://github.com/apache/parquet-format/blob/master/LogicalTypes.md, they are stored as integers: the integer represents the number of days (for Date) or the number of milliseconds, microseconds, or nanoseconds (for DateTime) since 1970-01-01. This works as expected with the Parquet files written by our ETL tool from an internal database to S3: all Date/DateTime columns come through as integers, meaning that in the Glue job I have to convert those integers back to Date/DateTime values before doing any transformations on them. But when the Parquet files are written by Spark, the columns come back as Date/DateTime (or, more precisely, Timestamp) rather than integers (I checked by reading those files again in another Glue job), and that confused me.
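
A small pyarrow sketch of what the post describes (not the poster's Glue job; the file path and column names are made up): DATE and TIMESTAMP columns are physically stored as integers, and the logical-type annotation is what lets a reader like Spark surface them as real date/timestamp values.

```python
import datetime
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table({
    "d": pa.array([datetime.date(2023, 6, 18)], type=pa.date32()),
    "ts": pa.array([datetime.datetime(2023, 6, 18, 12, 0)],
                   type=pa.timestamp("us")),
})
pq.write_table(table, "/tmp/types.parquet")

schema = pq.ParquetFile("/tmp/types.parquet").schema
# Physical types: INT32 (days since 1970-01-01) and INT64 (microseconds):
print(schema.column(0).physical_type, schema.column(1).physical_type)
# Logical annotations a reader uses to reconstruct Date/Timestamp values:
print(schema.column(0).logical_type, schema.column(1).logical_type)
```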
  • Parquet: More than just “Turbo CSV”
    7 projects | news.ycombinator.com | 3 Apr 2023
Date is confusing with a timezone (UTC or otherwise), and the doco makes no such suggestion.

The Parquet data types documentation is pretty clear that there is a flag, isAdjustedToUTC, that defines whether the timestamp should be interpreted as having Instant semantics or Local semantics.

    https://github.com/apache/parquet-format/blob/master/Logical...

    Still no option to include a TZ offset in the data (so the same datum can be interpreted with both Local and Instant semantics) but not bad really.
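
A short pyarrow sketch of that flag, using made-up file and column names: writing a tz-aware Arrow timestamp produces isAdjustedToUTC=true (Instant semantics), while a naive one produces isAdjustedToUTC=false (Local semantics).

```python
import datetime
import pyarrow as pa
import pyarrow.parquet as pq

dt = datetime.datetime(2023, 4, 3, 9, 30)
table = pa.table({
    "instant": pa.array([dt], type=pa.timestamp("us", tz="UTC")),  # Instant
    "local": pa.array([dt], type=pa.timestamp("us")),              # Local
})
pq.write_table(table, "/tmp/ts.parquet")

schema = pq.ParquetFile("/tmp/ts.parquet").schema
print(schema.column(0).logical_type)  # Timestamp(isAdjustedToUTC=true, ...)
print(schema.column(1).logical_type)  # Timestamp(isAdjustedToUTC=false, ...)
```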

quack-reduce

Posts with mentions or reviews of quack-reduce. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-11-16.
  • quack-reduce: duckdb as a stateless query engine over a data lake
    1 project | news.ycombinator.com | 27 Jan 2024
  • Summing columns in remote Parquet files using DuckDB
    4 projects | news.ycombinator.com | 16 Nov 2023
    We can run a DuckDB instance (EC2/S3) closer to the data, so that sorta helps too.

    What I'm really excited about is using DuckDB in a way similar to map-reduce. What if there were a way to take a SQL query's logical plan and turn it into a physical plan that uses compute resources from a pool of serverless DuckDB instances: starting at the leaves of the graph (the physical plan), pulling data from the source (Parquet), and returning completed work up the branches until it is finished and ready to be used as the result.

    I've seen a few examples of this already, but nothing that I would consider production-ready. I have a hunch that someone is going to drop such a project on us shortly, and it's going to change a lot of what we have become used to in the data world.

    https://github.com/BauplanLabs/quack-reduce
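
A minimal DuckDB sketch of the single-node version of that idea (the URL and column name are placeholders): with httpfs loaded, DuckDB issues byte-range requests, so an aggregate over a remote Parquet file fetches the footer plus only the column chunks it needs. A pool of such workers, one per file, plus a final merge step is the map-reduce shape the comment imagines.

```python
import duckdb

con = duckdb.connect()
con.execute("INSTALL httpfs; LOAD httpfs;")  # enables ranged HTTP/S3 reads

# Hypothetical URL and column; each worker could aggregate one file
# and a final step could sum the partial results.
partial_sum = con.execute(
    "SELECT sum(my_col) FROM read_parquet('https://example.com/part-0.parquet')"
).fetchone()[0]
print(partial_sum)
```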

What are some alternatives?

When comparing parquet-format and quack-reduce, you can also consider the following projects:

rapidgzip - Gzip Decompression and Random Access for Modern Multi-Core Machines

sqlglot - Python SQL Parser and Transpiler

xgen - Salesforce open-source LLMs with 8k sequence length.

ibis - the portable Python dataframe library

wizmap - Explore and interpret large embeddings in your browser with interactive visualization! 📍

vdsql - VisiData interface for databases

FastSAM - Fast Segment Anything

icedb - An in-process Parquet merge engine for better data warehousing in S3

background-removal-js - Remove backgrounds from images directly in the browser environment with ease and no additional costs or privacy concerns. Explore an interactive demo.

graphic-walker - An open source alternative to Tableau. Embeddable visual analytic

mdBook - Create book from markdown files. Like Gitbook but implemented in Rust

papyrus - A simple paper backup tool for GnuPG or SSH keys