A Polars exploration into Kedro

Kedro

Mentions: 29 | Stars: 9,353 | Activity: 9.7 | Language: Python

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.

# pyproject.toml
[project]
dependencies = [
    "kedro @ git+https://github.com/kedro-org/kedro@3ea7231",
    "kedro-datasets[pandas.CSVDataSet,polars.CSVDataSet] @ git+https://github.com/kedro-org/kedro-plugins@3b42fae#subdirectory=kedro-datasets",
]

cudf

Mentions: 23 | Stars: 7,274 | Activity: 9.9 | Language: C++

cuDF - GPU DataFrame Library

The interesting thing about Polars is that it does not try to be a drop-in replacement for pandas, as Dask, cuDF, and Modin do, and instead has its own expressive API. Despite being a young project, it quickly became popular thanks to its easy installation process and its “lightning fast” performance.

kedro-plugins

Mentions: 1 | Stars: 81 | Activity: 9.2 | Language: Python

First-party plugins maintained by the Kedro team.

Pandas

Mentions: 393 | Stars: 41,923 | Activity: 10.0 | Language: Python

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more

Traditionally, Kedro has favoured pandas as its dataframe library because of its ubiquity and popularity. This means that, for example, to read a CSV file, you would add a corresponding entry to the catalog.
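As a rough illustration of such an entry, the sketch below writes it as the plain Python dict that a Kedro catalog configuration boils down to, then swaps the dataset type from pandas.CSVDataSet to polars.CSVDataSet (the two extras named in the dependency list above). The dataset name "companies", the file path, and the helper swap_dataset_type are hypothetical and not taken from the original post.

```python
import copy

# Hypothetical catalog entry: the dataset name and file path are made up.
# The "type" value matches the pandas.CSVDataSet extra from the dependency list.
catalog_config = {
    "companies": {
        "type": "pandas.CSVDataSet",
        "filepath": "data/01_raw/companies.csv",
    }
}


def swap_dataset_type(config, name, new_type):
    """Return a copy of the catalog config with one entry's dataset type replaced."""
    updated = copy.deepcopy(config)  # leave the original config untouched
    updated[name]["type"] = new_type
    return updated


# Same file, but loaded through the Polars-backed dataset instead of pandas.
polars_config = swap_dataset_type(catalog_config, "companies", "polars.CSVDataSet")
print(polars_config["companies"]["type"])
```

In a real project the entry would normally live in conf/base/catalog.yml. A dict of this shape can also be passed to DataCatalog.from_config, assuming Kedro and the matching kedro-datasets extras are installed.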

modin

Mentions: 11 | Stars: 9,476 | Activity: 9.6 | Language: Python

Modin: Scale your Pandas workflows by changing a single line of code

Apache Arrow

Mentions: 75 | Stars: 13,480 | Activity: 10.0 | Language: C++

Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing

Polars is an open-source library for Python, Rust, and Node.js that provides in-memory dataframes, out-of-core processing capabilities, and more. It is based on the Rust implementation of the Apache Arrow columnar data format (you can read more about Arrow in my earlier blog post “Demystifying Apache Arrow”), and it is optimised to be blazing fast.
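To make "columnar" a little more concrete, here is a minimal standard-library sketch contrasting a row-oriented layout with a column-oriented one. It only illustrates the general idea of keeping each column in its own contiguous, typed buffer. It is not Arrow's actual memory format or the Polars API, and the sample records are made up.

```python
from array import array

# Row-oriented layout: one Python object per record (sample data is made up).
rows = [
    {"id": 1, "price": 9.5},
    {"id": 2, "price": 12.0},
    {"id": 3, "price": 7.25},
]

# Column-oriented layout: one contiguous, typed buffer per column.
columns = {
    "id": array("q", (row["id"] for row in rows)),        # signed 64-bit integers
    "price": array("d", (row["price"] for row in rows)),  # 64-bit floats
}

# An aggregate over one column scans a single buffer and never touches "id".
total_price = sum(columns["price"])
print(total_price)
```

Analytical queries usually read a few columns across many rows, which is why a layout like this tends to suit dataframe libraries.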

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.

This page summarizes the projects mentioned and recommended in the original post on dev.to.
Tags: Python, Pandas, Data Science, Arrow, Dataframe
Post date: 17 May 2023
