Daft: A High-Performance Distributed Dataframe Library for Multimodal Data

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

Our great sponsors
  • WorkOS - The modern identity platform for B2B SaaS
  • InfluxDB - Power Real-Time Data Analytics at Scale
  • SaaSHub - Software Alternatives and Reviews
  • quokka

    Making data lake work for time series (by marsupialtail)

  • SQL support is very challenging.

    I work on Quokka (https://github.com/marsupialtail/quokka). I support Iceberg reads. Recently we are adding SQL support from just parsing the DuckDB logical plan, though that is very challenging as well.

    The Python world lacks a standard for a plug and play SQL query optimizer. Apache Calcite is good for the JVM world, but not great if you are trying to cut out the JVM.

  • Daft

    Distributed DataFrame for Python designed for the cloud, powered by Rust

  • Hi (one of the maintainers here), that is a good suggestion! I wasn't aware of that project. I went ahead and made an issue to add `export DO_NOT_TRACK=1` as one of the variables we track! https://github.com/Eventual-Inc/Daft/issues/1015

  • WorkOS

    The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.

    WorkOS logo
  • fugue

    A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rewrites.

  • Please integrate it with Fugue.

    https://github.com/fugue-project/fugue

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts