Our great sponsors
-
Kedro
Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
# pyproject.toml [project] dependencies = [ "kedro @ git+https://github.com/kedro-org/kedro@3ea7231", "kedro-datasets[pandas.CSVDataSet,polars.CSVDataSet] @ git+https://github.com/kedro-org/kedro-plugins@3b42fae#subdirectory=kedro-datasets", ]
-
The interesting thing about Polars is that it does not try to be a drop-in replacement to pandas, like Dask, cuDF, or Modin, and instead has its own expressive API. Despite being a young project, it quickly got popular thanks to its easy installation process and its “lightning fast” performance.
-
InfluxDB
Collect and Analyze Billions of Data Points in Real Time. Manage all types of time series data in a single, purpose-built database. Run at any scale in any environment in the cloud, on-premises, or at the edge.
-
# pyproject.toml [project] dependencies = [ "kedro @ git+https://github.com/kedro-org/kedro@3ea7231", "kedro-datasets[pandas.CSVDataSet,polars.CSVDataSet] @ git+https://github.com/kedro-org/kedro-plugins@3b42fae#subdirectory=kedro-datasets", ]
-
Pandas
Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
Traditionally Kedro has favoured pandas as a dataframe library because of its ubiquity and popularity. This means that, for example, to read a CSV file, you would add a corresponding entry to the catalog:
-
The interesting thing about Polars is that it does not try to be a drop-in replacement to pandas, like Dask, cuDF, or Modin, and instead has its own expressive API. Despite being a young project, it quickly got popular thanks to its easy installation process and its “lightning fast” performance.
-
Apache Arrow
Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
Polars is an open-source library for Python, Rust, and NodeJS that provides in-memory dataframes, out-of-core processing capabilities, and more. It is based on the Rust implementation of the Apache Arrow columnar data format (you can read more about Arrow on my earlier blog post “Demystifying Apache Arrow”), and it is optimised to be blazing fast.
Related posts
- Data Science for Beginners - A Curriculum
- Ballista (Rust) vs Apache Spark. A Tale of Woe.
- [OC] Nationwide Public Transit Ridership is down 30% from pre-lockdown levels; San Francisco's BART ridership is down almost 70%
- Evolution and Trends of Data Engineering 2022/23
- Pandas AI – The Future of Data Analysis