Test On 4 Concurrent Jobs Using Python-Polars 0.17.11 to GroupBy Billion Rows

This page summarizes the projects mentioned and recommended in the original post on /r/Python.

  • peaks-framework

    Discontinued. The Peaks Consolidation is equipped with state-of-the-art algorithms and data structures that support high-performance databending exercises. It specializes in management accounting and consolidation, with some special topics in machine learning and bioinformatics. [Moved to: https://github.com/hkpeaks/peaks-consolidation]

  • This project has only a three-month history. The first trial version is to be released in June and will provide the most fundamental commands. For further information, visit github.com/hkpeaks/peaks-framework

  • polars

    Dataframes powered by a multithreaded, vectorized query engine, written in Rust

  • Yesterday I successfully ran four concurrent jobs, each on a billion rows, as part of a step-by-step progressive test of Polars and Peaks that works up to trillions of rows across more than a million files. Previously, Polars failed on a single job, but after several bug fixes it can now handle the workload. See https://github.com/pola-rs/polars/issues/7774

  • cloudquery

    The open source high performance ELT framework powered by Apache Arrow

  • CloudQuery supports a lot of APIs: https://github.com/cloudquery/cloudquery
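
The Polars comment above describes running four GroupBy jobs concurrently. The shape of such a test can be sketched with the Python standard library alone. This is only an illustration under stated assumptions: it does not use Polars or Peaks, the data is tiny and synthetic, and the names `make_rows`, `group_by_sum`, and `run_concurrent_jobs` are hypothetical helpers, not part of either project.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def make_rows(n_rows, n_keys):
    """Generate synthetic (key, value) rows; a stand-in for a real input file."""
    return [(f"k{i % n_keys}", i) for i in range(n_rows)]


def group_by_sum(rows):
    """Group rows by key and sum the values, like a GroupBy with a sum aggregate."""
    totals = defaultdict(int)
    for key, value in rows:
        totals[key] += value
    return dict(totals)


def run_concurrent_jobs(n_jobs=4, n_rows=1_000, n_keys=3):
    """Submit n_jobs independent group-by jobs at once and collect their results."""
    inputs = [make_rows(n_rows, n_keys) for _ in range(n_jobs)]
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        # pool.map returns results in the same order as the inputs.
        return list(pool.map(group_by_sum, inputs))


if __name__ == "__main__":
    results = run_concurrent_jobs()
    print(len(results), results[0])
```

The threads here only show how the jobs are structured. In standard CPython, pure-Python work like this is serialized by the global interpreter lock, so a test at the billion-row scale would rely on a multithreaded engine such as Polars or on separate processes.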

NOTE: The number of mentions on this list counts mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.

Related posts

  • Launch HN: PeerDB (YC S23) – Fast, Native ETL/ELT for Postgres

    2 projects | news.ycombinator.com | 27 Jul 2023
  • PeerDB Streams – Simple, Native Postgres Change Data Capture

    4 projects | news.ycombinator.com | 6 May 2024
  • RDS Database Migration Series - A horror story of using AWS DMS with a happy ending

    1 project | dev.to | 18 Mar 2024
  • Why Python's Integer Division Floors (2010)

    1 project | news.ycombinator.com | 28 Feb 2024
  • Osquery: An sqlite3 virtual table exposing operating system data to SQL

    14 projects | news.ycombinator.com | 25 Feb 2024