Testing 4 Concurrent Jobs Using Python-Polars 0.17.11 to GroupBy a Billion Rows

This page summarizes the projects mentioned and recommended in the original post on /r/Python.

  1. peaks-framework

    Discontinued. The Peaks Consolidation is equipped with state-of-the-art algorithms and data structures that support high-performance databending exercises. It specializes in management accounting and consolidation, with some special topics in machine learning and bioinformatics. [Moved to: https://github.com/hkpeaks/peaks-consolidation]

    This project has only a three-month history. The first trial version, providing the most fundamental commands, is to be released in June. For more information, visit github.com/hkpeaks/peaks-framework.

  3. polars

    Dataframes powered by a multithreaded, vectorized query engine, written in Rust

    Yesterday I successfully ran four concurrent jobs with a billion rows, as part of step-by-step testing of Polars and Peaks that is building up to trillions of rows across more than a million files. Polars previously failed on a single such job, but after several bug fixes it can now handle the workload. See https://github.com/pola-rs/polars/issues/7774
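    To make the "four concurrent group-by jobs" idea concrete, here is a minimal, standard-library-only sketch of a partitioned group-by sum: the rows are split into chunks, each chunk is aggregated by a worker, and the partial results are merged. This is not the Polars API or Polars' internal algorithm; the function names `partial_groupby_sum` and `parallel_groupby_sum` and the `(key, value)` row format are hypothetical. Because of CPython's GIL, the threads here illustrate the split/aggregate/merge structure rather than delivering real multi-core speedup; Polars does this kind of work in its multithreaded Rust engine.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Sequence

# Hypothetical row format for this sketch: (group key, numeric value).
Row = tuple[str, int]


def partial_groupby_sum(rows: Sequence[Row]) -> Counter:
    """Aggregate one chunk: sum the values per key (a local hash aggregation)."""
    acc: Counter = Counter()
    for key, value in rows:
        acc[key] += value
    return acc


def parallel_groupby_sum(rows: Sequence[Row], n_jobs: int = 4) -> dict[str, int]:
    """Split rows into up to n_jobs chunks, aggregate the chunks concurrently,
    then merge the partial sums into a single key -> total mapping."""
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    chunk_size = max(1, -(-len(rows) // n_jobs))  # ceiling division
    chunks = [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]

    total: Counter = Counter()
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        # pool.map yields results in chunk order; Counter.update adds the
        # partial sums (it keeps zero and negative totals, unlike `+`).
        for partial in pool.map(partial_groupby_sum, chunks):
            total.update(partial)
    return dict(total)


if __name__ == "__main__":
    rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
    print(parallel_groupby_sum(rows))  # {'a': 4, 'b': 7, 'c': 4}
```

    Merging partial sums gives the same totals as a single pass because addition is associative and commutative; an aggregation such as a mean would need each worker to return (sum, count) pairs instead.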

  4. cloudquery

    The open source high performance ELT framework powered by Apache Arrow

    CloudQuery supports many APIs: https://github.com/cloudquery/cloudquery

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.
