Apache Spark - A unified analytics engine for large-scale data processing
I would guess that Apache Spark would be an okay choice, with data stored locally in Avro or Parquet files. Just processing the data in Python would also work, IMO.
NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.
Introduce Cache Hints to Spark SQL
1 project | reddit.com/r/apachespark | 8 Aug 2022
Late Night Random Discussion Thread - 08 August, 2022
1 project | reddit.com/r/indiasocial | 8 Aug 2022
is anyone want to join maintaining spark java framework?
2 projects | reddit.com/r/java | 21 Jun 2022
What do I need to know about distributed algorithms and systems?
1 project | reddit.com/r/AskProgramming | 22 May 2022
AWS Glue: what is it and how does it work?
1 project | dev.to | 5 May 2022