-
This is very interesting, clearly there's a major pain point here to be addressed, especially the delta between local pandas work and distributed [pyspark] work!
Would love to test this out and do benchmarks against us/ Dask/ Spark/ Ray etc which have been our primary testing ground. Full disclosure, work at Bodo which has similar-ish aspirations (https://github.com/bodo-ai/Bodo), but FOSS all the way.
-
Stream
Stream - Scalable APIs for Chat, Feeds, Moderation, & Video. Stream helps developers build engaging apps that scale to millions with performant and flexible Chat, Feeds, Moderation, and Video APIs and SDKs powered by a global edge network and enterprise-grade infrastructure.
-
This is really cool, not sure how I missed it. I assume catalog support will be added fairly quickly. But ironically I think the biggest barrier to adoption will be the lack of an off-ramp to a FOSS solution that companies can self-host. Obviously Polars itself is FOSS, but it understandably seems like there's no way to self-host a backend to point a `pc.ComputeContext` to. That will be an especially tough selling point for companies that are already on Spark. I wonder how much they'll focus on startups vs. trying to get bigger companies to switch, and whether they'll try a Spark compatibility layer like DataFusion (https://github.com/apache/datafusion-comet).
-
Ibis also solves this problem by providing a portable dataframe API that works across multiple backends (DuckDB by default): https://ibis-project.org/
-
Statically typed dataframes are exactly why I created the Empirical programming language:
https://www.empirical-soft.com
It can infer the column names and types from a CSV file at compile time.
Here's an example that misspells the "ask" column as if it were plural:
let quotes = load("quotes.csv")