Still has more functionality than PySpark (e.g. CalendarIntervalType). I am trying to help get CalendarIntervalType merged into PySpark.
Some Python libraries are easily wrapped via UDFs, see ceja, but don't overestimate this benefit. Most Python libraries cannot be wrapped and run via Spark in a performant manner.
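As a rough illustration of the UDF wrapping mentioned above, here is a minimal sketch. It is not ceja's actual code: it wraps `difflib` from the standard library, and the names `similarity` and `make_similarity_udf` are hypothetical. The PySpark import is deferred so the plain function can be tried without a Spark installation.

```python
import difflib


def similarity(a, b):
    """Plain Python function: similarity ratio between two strings."""
    # Spark passes NULL column values as None, so handle them explicitly.
    if a is None or b is None:
        return None
    return difflib.SequenceMatcher(None, a, b).ratio()


def make_similarity_udf():
    """Wrap `similarity` as a Spark UDF (assumes pyspark is installed)."""
    # Imported here so the module still loads when pyspark is absent.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import DoubleType

    return udf(similarity, DoubleType())
```

With a DataFrame `df` that has string columns `a` and `b`, usage would look something like `df.withColumn("score", make_similarity_udf()("a", "b"))`. A UDF like this is evaluated row by row in Python worker processes, with data serialized between the JVM and Python, which is one reason such wrappers tend to be slower than built-in Spark functions.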