Our great sponsors
-
fugue
A unified interface for distributed computing. Fugue executes SQL, Python, and Pandas code on Spark and Dask without any rewrites.
There's dask-sql, but I think it is being abandoned for fugue-project. I'm actually excited for this project as it is trying to provide a backend agnostic solution, which would seem like a difficult, lofty goal. I wish them luck.
-
Pandas syntax is far inferior to regular PySpark in my opinion. Goes to show how much data analysts value a syntax that they're already familiar with. Pandas syntax makes it harder to reason about queries, abstract DataFrame transformations, etc. I've authored some popular PySpark libraries like quinn and chispa and am not excited to add Pandas syntax support, haha.
-
Scout APM
Less time debugging, more time building. Scout APM allows you to find and fix performance issues with no hassle. Now with error monitoring and external services monitoring, Scout is a developer's best friend when it comes to application development.
-
Pandas syntax is far inferior to regular PySpark in my opinion. Goes to show how much data analysts value a syntax that they're already familiar with. Pandas syntax makes it harder to reason about queries, abstract DataFrame transformations, etc. I've authored some popular PySpark libraries like quinn and chispa and am not excited to add Pandas syntax support, haha.
Related posts
- Why Databricks Is Winning
- Show dataengineering: beavis, a library for unit testing Pandas/Dask code
- Is Spark - The Defenitive Guide outdated?
- [P] Microsoft releases SynapseML v0.9.5 with support for speech synthesis, anomaly detection, and geospatial analytics on large-scale data
- Microsoft releases SynapseML v0.9.5 for distributed geospatial analytics, speech synthesis, and anomaly detection in PySpark.