Apache Spark
FrameworkBenchmarks
Apache Spark | FrameworkBenchmarks | |
---|---|---|
101 | 366 | |
38,414 | 7,391 | |
0.7% | 0.5% | |
10.0 | 9.8 | |
4 days ago | 4 days ago | |
Scala | Java | |
Apache License 2.0 | GNU General Public License v3.0 or later |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Apache Spark
- "xAI will open source Grok"
-
Groovy 🎷 Cheat Sheet - 01 Say "Hello" from Groovy
Recently I had to revisit the "JVM languages universe" again. Yes, language(s), plural! Java isn't the only language that uses the JVM. I previously used Scala, which is a JVM language, to use Apache Spark for Data Engineering workloads, but this is for another post 😉.
-
🦿🛴Smarcity garbage reporting automation w/ ollama
Consume data into third party software (then let Open Search or Apache Spark or Apache Pinot) for analysis/datascience, GIS systems (so you can put reports on a map) or any ticket management system
-
Go concurrency simplified. Part 4: Post office as a data pipeline
also, this knowledge applies to learning more about data engineering, as this field of software engineering relies heavily on the event-driven approach via tools like Spark, Flink, Kafka, etc.
-
Five Apache projects you probably didn't know about
Apache SeaTunnel is a data integration platform that offers the three pillars of data pipelines: sources, transforms, and sinks. It offers an abstract API over three possible engines: the Zeta engine from SeaTunnel or a wrapper around Apache Spark or Apache Flink. Be careful, as each engine comes with its own set of features.
-
Apache Spark VS quix-streams - a user suggested alternative
2 projects | 7 Dec 2023
-
Integrate Pyspark Structured Streaming with confluent-kafka
Apache Spark - https://spark.apache.org/
-
Spark – A micro framework for creating web applications in Kotlin and Java
A JVM based framework named "Spark", when https://spark.apache.org exists?
- Rest in Peas: The Unrecognized Death of Speech Recognition (2010)
-
PySpark SparkSession Builder with Kubernetes Master
I recently saw a pull request that was merged to the Apache/Spark repository that apparently adds initial Python bindings for PySpark on K8s. I posted a comment to the PR asking a question about how to use spark-on-k8s in a Python Jupyter notebook, and was told to ask my question here.
FrameworkBenchmarks
-
Why choose async/await over threads?
Neat. Thanks for sharing!
Interestingly, may-minihttp is faring very well in the TechEmpower benchmark [1], for whatever those benchmarks are worth. The code is also surprisingly straightforward [2].
[1] https://www.techempower.com/benchmarks/
[2] https://github.com/TechEmpower/FrameworkBenchmarks/blob/mast...
-
Ntex: Powerful, pragmatic, fast framework for composable networking services
ntex was formed after a schism in actix-web and Rust safety/unsafety, with ntex allowing more unsafe code for better performance.
ntex is at the top of the TechEmpower benchmarks, although those benchmarks are not apples-to-apples since each uses its own tricks: https://www.techempower.com/benchmarks/#hw=ph&test=fortune&s...
-
A decent VS Code and Ruby on Rails setup
Ruby is slow. Very slow. How much you may ask? https://www.techempower.com/benchmarks/#hw=ph&test=fortune&s... fastest Ruby entry is at 272th place. Sure, top entries tend to have questionable benchmark-golfing implementations, but it gives you a good primer on the overhead imposed by Ruby.
It is also not early 00s anymore, when you pick an interpreted language, you are not getting "better productivity and tooling". In fact, most interpreted languages lag behind other major languages significantly in the form of JS/TS, Python and Ruby suffering from different woes when it comes to package management and publishing. I would say only TS/JS manages to stand apart with being tolerable, and Python sometimes too by a virtue of its popularity and the amount of information out there whenever you need to troubleshoot.
If you liked Go but felt it being a too verbose to your liking, give .NET a try. I am advocating for it here on HN mostly for fun but it is, in fact, highly underappreciated, considered unsexy and boring while it's anything but after a complete change of trajectory in the last 3-5 years. It is actually the* stack people secretly want but simply don't know about because it is bundled together with Java in the public perception.
*productive CLI tooling, high performance, works well in a really wide range of workloads from low to high level, by far the best ORM across all languages and back-end framework that is easier to work with than Node.JS while consuming 0.1x resources
-
The Erlang Ecosystem [video]
Although that seems to have improved in recent years.
https://www.techempower.com/benchmarks/#hw=ph&test=json§...
-
Ruby 3.3
RoR and whatever C++ based web backend there is count as a valid comparison in my book. But comparing the languages itself is maybe a bit off.
On a side note, you can actually compare their performance here if you’re really curious. But take it with a grain of salt since these are synthetic benchmarks.
https://www.techempower.com/benchmarks
-
API: Go, .NET, Rust
Most benchmarks you'll find essentially have someone's thumb on the scale (intentionally or unintentionally). Most people won't know the different languages well enough to create comparable implementations and if you let different people create the implementations, cheating happens. The TechEmpower benchmarks aren't bad, but many implementations put their thumb on the scale (https://www.techempower.com/benchmarks). For example, a lot of the Go implementations avoid the GC by pre-allocating/reusing structs or allocate arrays knowing how big they need to be in advance (despite that being against the rules). At some point, it becomes "how many features have you turned off." Some Go http routers (like fasthttp and those built off it like Atreugo and Fiber) aren't actually correct and a lot of people in the Go community discourage their use, but they certainly top the benchmarks. Gin and Echo are usually the ones that are well-respected in the Go community.
-
Rage: Fast web framework compatible with Rails
There is certainly a lot of speculation in Techempower benchmarks and top entries can utilize questionable techniques like simply writing a byte array literal to output stream instead of constructing a response, or (in the past) DB query coalescing to work around inherent limitations of the DB in case of Fortunes or DB quries.
And yet, the fastest Ruby entry is at 274th place while Rails is at 427th.
https://www.techempower.com/benchmarks/#hw=ph&test=fortune&s...
-
Node.js – v20.8.1
oh what machine? with how many workers? doing what?
search for "node" on this page: https://www.techempower.com/benchmarks/#section=data-r21
-
Strong typing, a hill I'm willing to die on
JustJS would like a word https://www.techempower.com/benchmarks/#section=data-r20&tes...
-
Rust vs Go: A Hands-On Comparison
In terms of RPS, this web service is more-or-less the fortunes benchmark in the techempower benchmarks, once the data hits the cache: https://www.techempower.com/benchmarks/#section=data-r21
Or, at least, they would be after applying optimizations to them.
In short, both of these would serve more rps than you will likely ever need on even the lowest end virtual machines. The underlying API provider will probably cut you off from querying them before you run out of RPS.
What are some alternatives?
Trino - Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
zio-http - A next-generation Scala framework for building scalable, correct, and efficient HTTP clients and servers
Pytorch - Tensors and Dynamic neural networks in Python with strong GPU acceleration
drogon - Drogon: A C++14/17 based HTTP web application framework running on Linux/macOS/Unix/Windows [Moved to: https://github.com/drogonframework/drogon]
Airflow - Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
django-ninja - 💨 Fast, Async-ready, Openapi, type hints based framework for building APIs
Scalding - A Scala API for Cascading
LiteNetLib - Lite reliable UDP library for Mono and .NET
mrjob - Run MapReduce jobs on Hadoop or Amazon Web Services
C++ REST SDK - The C++ REST SDK is a Microsoft project for cloud-based client-server communication in native code using a modern asynchronous C++ API design. This project aims to help C++ developers connect to and interact with services.
luigi - Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.
SQLBoiler - Generate a Go ORM tailored to your database schema.