Lessons Learned from Scaling to Multi-Terabyte Datasets

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com.

  • iceberg-python

    Apache PyIceberg

    Iceberg is working hard to support pure Python[0] / Rust[1] workflows without Spark. Following Tabular's acquisition[2], I hope it still moves in this direction at the same clip.

    We're using Iceberg + DuckDB to power analytics in our app[3] and I'm really happy with the combo.

    0 - https://github.com/apache/iceberg-python

    1 - https://github.com/apache/iceberg-rust

    2 - https://x.com/thisritchie/status/1800522255426072647

    3 - https://www.definite.app/

  • iceberg-rust

    Apache Iceberg

    https://github.com/apache/iceberg-rust

  • delta-rs

    A native Rust library for Delta Lake, with bindings into Python

    You can already have it in Delta with Delta Rust and Python bindings: https://github.com/delta-io/delta-rs

  • Daft

    Distributed DataFrame for Python designed for the cloud, powered by Rust

    https://github.com/Eventual-Inc/Daft is also great at these types of workloads since it’s both distributed and vectorized!
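
The Iceberg + DuckDB combination mentioned in the iceberg-python entry can be sketched roughly as follows. This is a minimal illustration, not the commenter's actual setup: the catalog name, table name, and column names are hypothetical, and it assumes the `pyiceberg`, `pyarrow`, and `duckdb` packages are installed and a PyIceberg catalog is already configured.

```python
from typing import Any, List, Sequence, Tuple


def load_iceberg_as_arrow(catalog_name: str, table_name: str, columns: Sequence[str]) -> Any:
    """Read selected columns of an Iceberg table into an Arrow table, no Spark involved."""
    # Imported lazily so the rest of the sketch works without PyIceberg installed.
    from pyiceberg.catalog import load_catalog

    # Assumption: the catalog is configured elsewhere (e.g. in ~/.pyiceberg.yaml).
    catalog = load_catalog(catalog_name)
    table = catalog.load_table(table_name)
    # Project only the columns we need before materializing to Arrow.
    return table.scan(selected_fields=tuple(columns)).to_arrow()


def count_by(arrow_table: Any, column: str) -> List[Tuple[Any, int]]:
    """Run a simple GROUP BY over an Arrow table with an in-memory DuckDB connection."""
    import duckdb

    con = duckdb.connect()  # in-memory database
    # Expose the Arrow table to SQL under the name "events".
    con.register("events", arrow_table)
    # Note: the column name is interpolated directly; fine for a sketch,
    # but validate it before doing this with untrusted input.
    sql = f'SELECT "{column}", count(*) AS n FROM events GROUP BY 1 ORDER BY n DESC, 1'
    return con.execute(sql).fetchall()


if __name__ == "__main__":
    # Hypothetical names: replace with a real catalog, table, and columns.
    events = load_iceberg_as_arrow("default", "analytics.events", ["kind", "user_id"])
    print(count_by(events, "kind"))
```

Splitting the load step from the query step keeps the DuckDB part usable with any Arrow table, so it can be tried locally before pointing it at a real catalog.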

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.


Related posts

  • Quick tip: Using SingleStore with PyIceberg

    1 project | dev.to | 14 Jul 2024
  • Uv: Python Packaging in Rust

    9 projects | news.ycombinator.com | 15 Feb 2024
  • Ask HN: Show me your half baked project

    163 projects | news.ycombinator.com | 12 Oct 2023
  • Iceberg won the table format war: But not in the way you thought it might

    2 projects | /r/dataengineering | 6 Jul 2023
  • Lakehouse using AWS Athena on Iceberg Concerns

    1 project | /r/dataengineering | 28 May 2023
