I have a question related to Parquet files and AWS Glue

parquet-format

4 1,643 7.2 Thrift

Apache Parquet

As I read at https://github.com/apache/parquet-format/blob/master/LogicalTypes.md , Date and DateTime values are stored as integers: the number of days since 1970-01-01 for Date, or the number of milliseconds, microseconds, or nanoseconds since 1970-01-01 for DateTime. This matches the Parquet files that our ETL tool writes from the internal database to S3: all Date/DateTime columns are integers, which means that in the Glue job I have to convert these integers back to Date/DateTime values before I can do any transformation on them. But when the Parquet files are written by Spark, the columns come back as Date/DateTime (or Timestamp, to be more precise) rather than integers (I checked by reading these files again in another Glue job), and that confuses me.
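The integer encoding described in the post can be sketched in plain Python. This is only an illustration of the arithmetic, assuming the ETL tool's columns arrive as raw integers counted from 1970-01-01 in UTC; the helper names `days_to_date` and `epoch_to_datetime` are hypothetical and not part of Glue, Spark, or Parquet. A likely reason Spark shows Date/Timestamp columns is that it reads the logical type annotation stored alongside the integer column and converts on read, whereas a column written without that annotation is surfaced as a bare integer.

```python
from datetime import date, datetime, timedelta, timezone

# Epoch used by the Parquet DATE and TIMESTAMP logical types.
EPOCH_DATE = date(1970, 1, 1)
EPOCH_TS = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Ticks per second for each timestamp unit named in LogicalTypes.md.
UNITS_PER_SECOND = {
    "MILLIS": 1_000,
    "MICROS": 1_000_000,
    "NANOS": 1_000_000_000,
}


def days_to_date(days: int) -> date:
    """Convert a day count since 1970-01-01 into a calendar date."""
    return EPOCH_DATE + timedelta(days=days)


def epoch_to_datetime(value: int, unit: str = "MICROS") -> datetime:
    """Convert an integer tick count since the epoch into a UTC datetime.

    Assumption: the value is UTC-based. Python datetimes only carry
    microsecond precision, so NANOS values are truncated to microseconds.
    """
    per_second = UNITS_PER_SECOND[unit]
    # divmod floors, so values before 1970 (negative) are handled too.
    seconds, remainder = divmod(value, per_second)
    micros = remainder * 1_000_000 // per_second
    return EPOCH_TS + timedelta(seconds=seconds, microseconds=micros)
```

In a Glue job the same conversion would normally be done with Spark's own column functions rather than row-by-row Python; the sketch only shows what the integers mean.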

parquet-tools

3 projects | /r/golang | 23 Jan 2022
Top 10 Common Data Engineers and Scientists Pain Points in 2024

1 project | dev.to | 11 Apr 2024
Choosing Between a Streaming Database and a Stream Processing Framework in Python

10 projects | dev.to | 10 Feb 2024
Ask HN: Does (or why does) anyone use MapReduce anymore?

2 projects | news.ycombinator.com | 24 Jan 2024
Go concurrency simplified. Part 4: Post office as a data pipeline

5 projects | dev.to | 21 Dec 2023

This page summarizes the projects mentioned and recommended in the original post on /r/dataengineering
Parquet Java Big Data
Post date: 18 Jun 2023
