Our great sponsors
-
This actually can be at the level of a missed optimization. A run queue with a lock-shared queue amongs all the threads scales even worse than the tokio version. Sharding the run queues and changing the notification algorithm, even while keeping locks on the sharded queues improves throughput drastically.
Tokio is an async runtime, but I don't see why being an async runtime should make it worse from a throughput perspective for a thread pool. I actually started on a Rust version [0] to test out this theory of whether async-rust was the culprit, but realized that I was being nerd-sniped [1] at this point and I should continue my Zig work instead. If you're still interested, I'm open to receiving PRs and questions on that if you want to see that in action.
It's still correct to benchmark and compare tokio here given the scheduler I was designing was mean to be used with async tasks: a bunch of concurrent and small-executing work units. I mention this in the second paragraph of "Why Build Your Own?".
The thread pool in the post is meant to be used to distribute I/O bound work. A friend of mine hooked up cross-platform I/O abstractions to the thread pool [2], benchmarked it against tokio to be have greater throughput and slightly worse tail latency under a local load [3]. The thread pool serves it's purpose and the quicksort benchmark is to show how schedulers behave under relatively concurrent work-loads. I could've used a benchmark with smaller tasks than the cpu-bound partition()/insertion_sort() but this worked as a common example.
I've already mentioned why rayon isn't a good comparison: 1. It doesn't support async root concurrency. 2. scoped() waits for tasks to complete by either blocking the OS thread or using similar inline-scheduler-loop optimizations. This risks stack overflow and isn't available as a use case in other async runtimes due to primarily being a fork-join optimization.
[0]: https://github.com/kprotty/zap/blob/blog-rust/src/thread_poo...
[2]: https://github.com/lithdew/hyperia
[3]: https://gist.github.com/kprotty/5a41e9612657de00788478a7dde4...
-
-
InfluxDB
Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
-
tokio
A runtime for writing reliable asynchronous applications with Rust. Provides I/O, networking, scheduling, timers, ...
-
Elastic Scheduling for Streaming Runtimes", https://www.scott-a-s.com/files/pldi2017_lf_elastic_scheduli.... The source code for the product implementation is now open source. Most is in https://github.com/IBMStreams/OSStreams/blob/main/src/cpp/SP... and https://github.com/IBMStreams/OSStreams/blob/main/src/cpp/SP....
-
This actually can be at the level of a missed optimization. A run queue with a lock-shared queue amongs all the threads scales even worse than the tokio version. Sharding the run queues and changing the notification algorithm, even while keeping locks on the sharded queues improves throughput drastically.
Tokio is an async runtime, but I don't see why being an async runtime should make it worse from a throughput perspective for a thread pool. I actually started on a Rust version [0] to test out this theory of whether async-rust was the culprit, but realized that I was being nerd-sniped [1] at this point and I should continue my Zig work instead. If you're still interested, I'm open to receiving PRs and questions on that if you want to see that in action.
It's still correct to benchmark and compare tokio here given the scheduler I was designing was mean to be used with async tasks: a bunch of concurrent and small-executing work units. I mention this in the second paragraph of "Why Build Your Own?".
The thread pool in the post is meant to be used to distribute I/O bound work. A friend of mine hooked up cross-platform I/O abstractions to the thread pool [2], benchmarked it against tokio to be have greater throughput and slightly worse tail latency under a local load [3]. The thread pool serves it's purpose and the quicksort benchmark is to show how schedulers behave under relatively concurrent work-loads. I could've used a benchmark with smaller tasks than the cpu-bound partition()/insertion_sort() but this worked as a common example.
I've already mentioned why rayon isn't a good comparison: 1. It doesn't support async root concurrency. 2. scoped() waits for tasks to complete by either blocking the OS thread or using similar inline-scheduler-loop optimizations. This risks stack overflow and isn't available as a use case in other async runtimes due to primarily being a fork-join optimization.
[0]: https://github.com/kprotty/zap/blob/blog-rust/src/thread_poo...
[2]: https://github.com/lithdew/hyperia
[3]: https://gist.github.com/kprotty/5a41e9612657de00788478a7dde4...
-
WorkOS
The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.