SaaSHub helps you find the best software and product alternatives Learn more →
Parquet-format Alternatives
Similar projects and alternatives to parquet-format
-
WorkOS
The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.
-
FLiPStackWeekly
FLaNK AI Weekly covering Apache NiFi, Apache Flink, Apache Kafka, Apache Spark, Apache Iceberg, Apache Ozone, Apache Pulsar, and more...
-
InfluxDB
Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
-
configu
a simple, modern, and secure standard for managing and collaborating software configurations ⚙️✨.
-
fast_float
Fast and exact implementation of the C++ from_chars functions for number types: 4x to 10x faster than strtod, part of GCC 12 and WebKit/Safari
-
background-removal-js
Remove backgrounds from images directly in the browser environment with ease and no additional costs or privacy concerns. Explore an interactive demo.
-
SaaSHub
SaaSHub - Software Alternatives and Reviews. SaaSHub helps you find the best software and product alternatives
parquet-format reviews and mentions
-
Summing columns in remote Parquet files using DuckDB
Right, there's all sorts of metadata and often stats included in any parquet file: https://github.com/apache/parquet-format#file-format
The offsets of said metadata are well-defined (i.e. in the footer) so for S3 / blob storage so long as you can efficiently request a range of bytes you can pull the metadata without having to read all the data.
- FLaNK Stack for 4th of July
-
I have question related to Parquet files and AWS Glue
As i read here https://github.com/apache/parquet-format/blob/master/LogicalTypes.md , they are store in Integer formats and these integers represent the number of days (for Date) or number of milliseconds, microseconds or nanoseconds (for DateTime) since 1970-01-01. This works as expected with the parquet file that written by our ETL tool from internal database --> S3, all Data/DateTime columns are Integers, means that in Glue Job, i have to convert these Integers back to Date/Datetime value to do some transformation on them. But when parquet files are written by Spark, they are Date/DateTime (or TimeStamp to be more concise) format not Integers (i checked by read these files again in other Glue Job) and that make me confused.
-
Parquet: More than just “Turbo CSV”
Date is confusing with a timezone (UTC or otherwise) and the doco makes no such suggestion.
The Parquet datatypes documentation is pretty clear that there is a flag isAdjustedToUTC to define if the timestamp should be interpreted as having Instant semantics or Local semantics.
https://github.com/apache/parquet-format/blob/master/Logical...
Still no option to include a TZ offset in the data (so the same datum can be interpreted with both Local and Instant semantics) but not bad really.
-
A note from our sponsor - SaaSHub
www.saashub.com | 28 Apr 2024
Stats
apache/parquet-format is an open source project licensed under Apache License 2.0 which is an OSI approved license.
The primary programming language of parquet-format is Java.
Popular Comparisons
Sponsored