Polars Cloud: The Distributed Cloud Architecture to Run Polars Anywhere

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com.

  1. Bodo

    High-Performance Python Compute Engine for Data and AI

    This is very interesting. There's clearly a major pain point to be addressed here, especially the gap between local pandas work and distributed [pyspark] work!

    Would love to test this out and run benchmarks against us, Dask, Spark, Ray, etc., which have been our primary testing ground. Full disclosure: I work at Bodo, which has similar-ish aspirations (https://github.com/bodo-ai/Bodo), but FOSS all the way.

  3. datafusion-comet

    Apache DataFusion Comet Spark Accelerator

    This is really cool, not sure how I missed it. I assume catalog support will be added fairly quickly. Ironically, though, I think the biggest barrier to adoption will be the lack of an off-ramp to a FOSS solution that companies can self-host. Obviously Polars itself is FOSS, but it understandably seems like there's no way to self-host a backend to point a `pc.ComputeContext` to. That will be an especially tough sell for companies that are already on Spark. I wonder how much they'll focus on startups vs. trying to get bigger companies to switch, and whether they'll try a Spark compatibility layer like DataFusion Comet (https://github.com/apache/datafusion-comet).

  4. ibis

    The portable Python dataframe library

    Ibis also solves this problem by providing a portable dataframe API that works across multiple backends (DuckDB by default): https://ibis-project.org/

  5. empirical-lang

    A language for time-series analysis

    Statically typed dataframes are exactly why I created the Empirical programming language:

    https://www.empirical-soft.com

    It can infer the column names and types from a CSV file at compile time.

    Here's an example that misspells the "ask" column as if it were plural:

      let quotes = load("quotes.csv")
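
    The original example is cut off after its first line. As a rough illustration of the idea only, here is a minimal Python sketch that infers column names and types from CSV text and rejects a misspelled column before any query runs. It is a runtime analogue, not Empirical's compile-time mechanism; the sample data, the `size` column, and the helper names `infer_type`, `infer_schema`, and `check_columns` are all hypothetical.

```python
import csv
import io


def infer_type(values):
    """Return the narrowest of int, float, or str that parses every value."""
    for candidate in (int, float):
        try:
            for value in values:
                candidate(value)
            return candidate
        except ValueError:
            continue
    return str


def infer_schema(csv_text):
    """Map each column name in the CSV header to its inferred type."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        raise ValueError("need at least one data row to infer types")
    return {name: infer_type([row[name] for row in rows]) for name in rows[0]}


def check_columns(schema, columns):
    """Raise KeyError if any requested column is missing from the schema."""
    unknown = [c for c in columns if c not in schema]
    if unknown:
        raise KeyError(f"unknown column(s) {unknown}; available: {list(schema)}")


# Hypothetical stand-in for the contents of quotes.csv.
SAMPLE = (
    "symbol,bid,ask,size\n"
    "AAPL,189.10,189.12,100\n"
    "MSFT,410.55,410.60,250\n"
)

schema = infer_schema(SAMPLE)
check_columns(schema, ["bid", "ask"])  # passes: both columns exist

try:
    check_columns(schema, ["asks"])  # the plural misspelling is rejected
except KeyError as err:
    print("rejected:", err)
```

    The check runs against the inferred schema rather than the data, which is the part a statically typed language can move to compile time.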

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.
