parquet-format

Apache Parquet (by apache)

Parquet-format Alternatives

Similar projects and alternatives to parquet-format

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a better parquet-format alternative or higher similarity.

parquet-format reviews and mentions

Posts with mentions or reviews of parquet-format. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-11-16.
  • Summing columns in remote Parquet files using DuckDB
    4 projects | news.ycombinator.com | 16 Nov 2023
    Right, there's all sorts of metadata and often stats included in any parquet file: https://github.com/apache/parquet-format#file-format

    The offsets of that metadata are well-defined (it lives in the footer), so on S3 or other blob storage, as long as you can efficiently request a range of bytes, you can pull the metadata without having to read all the data.

  • FLaNK Stack for 4th of July
    15 projects | dev.to | 3 Jul 2023
  • I have question related to Parquet files and AWS Glue
    1 project | /r/dataengineering | 18 Jun 2023
    As I read in https://github.com/apache/parquet-format/blob/master/LogicalTypes.md, dates and timestamps are stored as integers: the number of days since 1970-01-01 (for Date) or the number of milliseconds, microseconds or nanoseconds since 1970-01-01 (for DateTime). This matches what I see in the Parquet files our ETL tool writes from an internal database to S3: all Date/DateTime columns are integers, which means that in the Glue job I have to convert these integers back to Date/DateTime values before I can transform them. But when the Parquet files are written by Spark, the columns come back as Date/DateTime (or, more precisely, Timestamp) rather than integers (I checked by reading these files again in another Glue job), and that confuses me.
  • Parquet: More than just “Turbo CSV”
    7 projects | news.ycombinator.com | 3 Apr 2023
    Date is confusing with a timezone (UTC or otherwise) and the doco makes no such suggestion.

    The Parquet datatypes documentation is pretty clear that there is a flag isAdjustedToUTC to define if the timestamp should be interpreted as having Instant semantics or Local semantics.

    https://github.com/apache/parquet-format/blob/master/Logical...

    There is still no option to include a TZ offset in the data (so that the same datum could be interpreted with both Local and Instant semantics), but it's not bad really.
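The footer lookup described in the 16 Nov 2023 DuckDB comment can be sketched in Python. Per the parquet-format spec, a file ends with the Thrift-encoded FileMetaData, followed by a 4-byte little-endian length of that metadata and the magic bytes "PAR1", so two small range requests are enough to fetch the metadata without touching the data pages. This is a minimal sketch under stated assumptions: read_range(offset, length) is a hypothetical callable standing in for an HTTP Range request or an S3 ranged GET, and decoding the Thrift bytes is left to a real Parquet library.

```python
import struct
from typing import Callable

MAGIC = b"PAR1"  # Parquet files start and end with these 4 bytes


def read_parquet_footer(read_range: Callable[[int, int], bytes], file_size: int) -> bytes:
    """Return the raw Thrift-encoded FileMetaData using two range reads.

    read_range(offset, length) is a hypothetical stand-in for a ranged
    GET against blob storage; it must return exactly `length` bytes.
    """
    # Smallest possible layout: magic + (empty) metadata + 4-byte length + magic.
    if file_size < 12:
        raise ValueError("too small to be a Parquet file")
    # Range read 1: the last 8 bytes hold the metadata length and the magic.
    tail = read_range(file_size - 8, 8)
    if tail[4:] != MAGIC:
        raise ValueError("missing PAR1 magic at end of file")
    (meta_len,) = struct.unpack("<I", tail[:4])
    meta_start = file_size - 8 - meta_len
    if meta_start < len(MAGIC):
        raise ValueError("footer length is inconsistent with the file size")
    # Range read 2: the metadata block itself.
    return read_range(meta_start, meta_len)


if __name__ == "__main__":
    # Fake in-memory "file": magic, some data bytes, metadata, length, magic.
    data = MAGIC + b"rowgroups" + b"META" + struct.pack("<I", 4) + MAGIC
    print(read_parquet_footer(lambda off, n: data[off:off + n], len(data)))  # b'META'
```

Engines such as DuckDB typically perform this kind of lookup internally; the sketch only shows why a ranged read of the tail is sufficient.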
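The integer encodings mentioned in the 18 Jun 2023 Glue question can be decoded by hand. Per LogicalTypes.md, DATE annotates an INT32 count of days since 1970-01-01, and TIMESTAMP annotates an INT64 count of milliseconds, microseconds or nanoseconds since the Unix epoch. A minimal sketch, assuming the column's unit is already known and that isAdjustedToUTC is true (so results come back as UTC); parquet_date and parquet_timestamp are hypothetical helper names. Python datetimes resolve only to microseconds, so NANOS values are floored.

```python
from datetime import date, datetime, timedelta, timezone

EPOCH_DATE = date(1970, 1, 1)
EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parquet_date(days: int) -> date:
    """DATE: signed count of days since 1970-01-01."""
    return EPOCH_DATE + timedelta(days=days)


def parquet_timestamp(value: int, unit: str) -> datetime:
    """TIMESTAMP: count of `unit`s since the epoch, assumed UTC-adjusted."""
    if unit == "MILLIS":
        micros = value * 1_000
    elif unit == "MICROS":
        micros = value
    elif unit == "NANOS":
        micros = value // 1_000  # floor: Python cannot hold sub-microsecond precision
    else:
        raise ValueError(f"unknown TIMESTAMP unit: {unit}")
    return EPOCH_UTC + timedelta(microseconds=micros)


if __name__ == "__main__":
    print(parquet_date(19_000))  # 2022-01-08
    print(parquet_timestamp(1_700_000_000_000, "MILLIS"))  # 2023-11-14 22:13:20+00:00
```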
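The Instant versus Local distinction from the 3 Apr 2023 comment can be illustrated with the same raw value read both ways. A minimal sketch, assuming a TIMESTAMP(MICROS) column; decode_timestamp_micros is a hypothetical helper. With isAdjustedToUTC = true the value is a point on the UTC timeline and can be shown in any zone; with false it is a wall-clock reading, and because no offset is stored in the file, the reader has to choose a zone itself.

```python
from datetime import datetime, timedelta, timezone

EPOCH_NAIVE = datetime(1970, 1, 1)  # epoch as a wall-clock value, no zone attached


def decode_timestamp_micros(value: int, is_adjusted_to_utc: bool) -> datetime:
    """Interpret a TIMESTAMP(MICROS) value with Instant or Local semantics."""
    wall_clock = EPOCH_NAIVE + timedelta(microseconds=value)
    if is_adjusted_to_utc:
        # Instant semantics: the count is measured from 1970-01-01T00:00:00Z.
        return wall_clock.replace(tzinfo=timezone.utc)
    # Local semantics: a naive wall-clock reading; any zone must come from
    # outside the file, since Parquet stores no offset alongside the value.
    return wall_clock


if __name__ == "__main__":
    raw = 1_700_000_000_000_000
    instant = decode_timestamp_micros(raw, is_adjusted_to_utc=True)
    local = decode_timestamp_micros(raw, is_adjusted_to_utc=False)
    utc_minus_5 = timezone(timedelta(hours=-5))  # fixed offset, for illustration
    print(instant.astimezone(utc_minus_5))  # 2023-11-14 17:13:20-05:00 (same instant)
    print(local.replace(tzinfo=utc_minus_5))  # 2023-11-14 22:13:20-05:00 (different instant)
```

The two printed values differ by five hours, which is exactly the ambiguity the comment points out: a Local value only becomes an instant once the reader supplies a zone.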


Stats

Basic parquet-format repo stats
Mentions: 4
Stars: 1,637
Activity: 7.4
Last commit: 6 days ago
