Daft: A High-Performance Distributed Dataframe Library for Multimodal Data

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • quokka

    Making data lake work for time series (by marsupialtail)

    SQL support is very challenging.

    I work on Quokka (https://github.com/marsupialtail/quokka), which supports Iceberg reads. We have recently been adding SQL support by parsing the DuckDB logical plan, though that is very challenging as well.

    The Python world lacks a standard plug-and-play SQL query optimizer. Apache Calcite is good for the JVM world, but not great if you are trying to cut out the JVM.

  • Daft

    Distributed DataFrame for Python designed for the cloud, powered by Rust

    Hi (one of the maintainers here), that is a good suggestion! I wasn't aware of that project. I went ahead and made an issue to add `export DO_NOT_TRACK=1` as one of the environment variables we check! https://github.com/Eventual-Inc/Daft/issues/1015

  • fugue

    A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rewrites.

    Please integrate it with Fugue.

    https://github.com/fugue-project/fugue
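
The `DO_NOT_TRACK=1` opt-out mentioned in the Daft entry can be illustrated with a minimal sketch. This is not Daft's actual implementation: the function name `telemetry_enabled` and the rule that only the value `1` opts out are assumptions made for illustration.

```python
import os


def telemetry_enabled(environ=None):
    """Return False when the user has opted out via DO_NOT_TRACK=1.

    Hypothetical helper: `environ` defaults to the process environment
    and can be replaced with a plain dict for testing.
    """
    env = os.environ if environ is None else environ
    # Assumption: only the exact value "1" (ignoring surrounding
    # whitespace) counts as an opt-out; any other value leaves
    # telemetry enabled.
    return env.get("DO_NOT_TRACK", "").strip() != "1"


if __name__ == "__main__":
    if telemetry_enabled():
        print("telemetry enabled")
    else:
        print("telemetry disabled by DO_NOT_TRACK")
```

A library following this pattern would call such a check once before sending any usage data, so that running `export DO_NOT_TRACK=1` in the shell is enough to opt out.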

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.
