A Polars exploration into Kedro

This page summarizes the projects mentioned and recommended in the original post on dev.to

  • Kedro

    Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.

    # pyproject.toml
    [project]
    dependencies = [
        "kedro @ git+https://github.com/kedro-org/kedro@3ea7231",
        "kedro-datasets[pandas.CSVDataSet,polars.CSVDataSet] @ git+https://github.com/kedro-org/kedro-plugins@3b42fae#subdirectory=kedro-datasets",
    ]

  • cudf

    cuDF - GPU DataFrame Library

    The interesting thing about Polars is that it does not try to be a drop-in replacement for pandas, as Dask, cuDF, and Modin do, and instead has its own expressive API. Despite being a young project, it quickly became popular thanks to its easy installation process and its “lightning fast” performance.

  • kedro-plugins

    First-party plugins maintained by the Kedro team.

    # pyproject.toml
    [project]
    dependencies = [
        "kedro @ git+https://github.com/kedro-org/kedro@3ea7231",
        "kedro-datasets[pandas.CSVDataSet,polars.CSVDataSet] @ git+https://github.com/kedro-org/kedro-plugins@3b42fae#subdirectory=kedro-datasets",
    ]

  • Pandas

    Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more

    Traditionally Kedro has favoured pandas as a dataframe library because of its ubiquity and popularity. This means that, for example, to read a CSV file, you would add a corresponding entry to the catalog.

  • modin

    Modin: Scale your Pandas workflows by changing a single line of code

    The interesting thing about Polars is that it does not try to be a drop-in replacement for pandas, as Dask, cuDF, and Modin do, and instead has its own expressive API. Despite being a young project, it quickly became popular thanks to its easy installation process and its “lightning fast” performance.

  • Apache Arrow

    Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics

    Polars is an open-source library for Python, Rust, and NodeJS that provides in-memory dataframes, out-of-core processing capabilities, and more. It is based on the Rust implementation of the Apache Arrow columnar data format (you can read more about Arrow on my earlier blog post “Demystifying Apache Arrow”), and it is optimised to be blazing fast.
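
The pandas excerpt in the list above mentions adding a catalog entry to read a CSV file but does not show one. As a rough sketch only, such an entry could look like the following, written here as a Python dictionary that mirrors the YAML layout Kedro catalogs normally use. The dataset name `companies`, the file path, and the helper `use_polars` are hypothetical, and the type strings are assumed to match the `pandas.CSVDataSet` and `polars.CSVDataSet` extras named in the dependency list.

```python
# Hypothetical catalog entries, written as Python dicts that mirror the
# YAML layout of a Kedro catalog file. Names and paths are illustrative.
pandas_catalog = {
    "companies": {
        # Assumed type string, taken from the extras name in the dependency list.
        "type": "pandas.CSVDataSet",
        # Hypothetical path, not from the original post.
        "filepath": "data/01_raw/companies.csv",
    },
}


def use_polars(catalog):
    """Return a copy of the catalog with pandas dataset types swapped for polars ones."""
    converted = {}
    for name, entry in catalog.items():
        new_entry = dict(entry)  # shallow copy so the input catalog is left untouched
        if new_entry["type"].startswith("pandas."):
            new_entry["type"] = "polars." + new_entry["type"][len("pandas."):]
        converted[name] = new_entry
    return converted


polars_catalog = use_polars(pandas_catalog)
print(polars_catalog["companies"]["type"])  # polars.CSVDataSet
```

The blanket prefix swap is a simplification: it assumes every pandas dataset type has a polars counterpart with the same name, which holds for the CSV case named in the dependency list but is not guaranteed in general.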

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.
