Random123 Alternatives

Similar projects and alternatives to random123

llama.cpp

776 57,463 10.0 C++ random123 VS llama.cpp

LLM inference in C/C++
bevy

574 32,489 9.9 Rust random123 VS bevy

A refreshingly simple data-driven game engine built in Rust
kompute

37 1,489 8.1 C++ random123 VS kompute

General purpose GPU compute framework built on Vulkan to support 1000s of cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). Blazing fast, mobile-enabled, asynchronous and optimized for advanced GPU data processing usecases. Backed by the Linux Foundation.
ZLUDA

35 7,725 7.0 Rust random123 VS ZLUDA

CUDA on AMD GPUs
HIP

30 3,462 8.9 C++ random123 VS HIP

HIP: C++ Heterogeneous-Compute Interface for Portability
Cgml

22 39 8.6 C++ random123 VS Cgml

GPU-targeted vendor-agnostic AI library for Windows, and Mistral model implementation.
wonnx

18 1,501 6.3 Rust random123 VS wonnx

A WebGPU-accelerated ONNX inference run-time written 100% in Rust, ready for native and the web
intel-extension-for-pytorch

16 1,351 9.7 Python random123 VS intel-extension-for-pytorch

A Python package for extending the official PyTorch that can easily obtain performance on Intel platform
HIPIFY

5 402 9.7 C++ random123 VS HIPIFY

HIPIFY: Convert CUDA to Portable C++ Code
HIPCC

2 38 5.8 C++ random123 VS HIPCC

HIPCC: HIP compiler driver
OpenRAND

1 24 7.3 C++ random123 VS OpenRAND

Reproducible random number generation for parallel computations

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a better random123 alternative or higher similarity.

random123 reviews and mentions

Posts with mentions or reviews of random123. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-12-14.

Intel CEO: 'The entire industry is motivated to eliminate the CUDA market'
13 projects | news.ycombinator.com | 14 Dec 2023

For GPGPU, the better approach is a counter-based RNG (CBRNG) like random123.
https://github.com/DEShawResearch/random123
If you accept the principles of encryption, then the bits of the output of crypt(key, message) should be totally uncorrelated with the output of crypt(key, message+1). And this requires no state other than knowing the key and the position in the sequence.
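To make the crypt(key, message) idea concrete, here is a minimal sketch. It is not random123's API, and it does not use Philox or Threefry: as an assumption for illustration, the keyed BLAKE2b hash from Python's standard library stands in for the cipher, and the names `cbrng` and `uniform` are hypothetical.

```python
import hashlib
import struct


def cbrng(key: int, counter: int) -> int:
    """Return a 64-bit value that depends only on (key, counter).

    Stand-in for a real counter-based generator: keyed BLAKE2b plays
    the role of crypt(key, message). Both inputs must fit in 64 bits.
    """
    digest = hashlib.blake2b(
        struct.pack("<Q", counter),   # the "message" is the position in the sequence
        key=struct.pack("<Q", key),   # the key selects the keystream
        digest_size=8,
    ).digest()
    return struct.unpack("<Q", digest)[0]


def uniform(key: int, counter: int) -> float:
    """Map the top 53 bits of the output to a float in [0, 1)."""
    return (cbrng(key, counter) >> 11) / float(1 << 53)


# No generator object is created or mutated, so positions can be read
# in any order and re-read at any time.
later = uniform(42, 1_000_000)
first = uniform(42, 0)
again = uniform(42, 1_000_000)   # same (key, counter) gives the same value
```

The only "state" is the pair of integers passed in, which is the property the comment relies on.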
moreover, you can then define the key in relation to your actual data. the mental shift from what you're talking about is that in this model, a PRNG isn't something that belongs to the executing thread. every element can get its own PRNG and keystream. And if you use a contextually-meaningful value for the element key, then you already "know" the key from your existing data. And this significantly improves determinism of the simulation etc because PRNG output is tied to the simulation state, not which thread it happens to be scheduled on.
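A small sketch of keying the generator by element rather than by thread follows. The particle ids and the `cbrng` helper are hypothetical, keyed BLAKE2b is again only a stand-in for a real CBRNG such as Philox or Threefry, and shuffling the processing order is a crude stand-in for arbitrary thread scheduling.

```python
import hashlib
import random
import struct


def cbrng(key: int, counter: int) -> int:
    """Stand-in counter-based generator: 64-bit output from (key, counter)."""
    digest = hashlib.blake2b(
        struct.pack("<Q", counter),
        key=struct.pack("<Q", key),
        digest_size=8,
    ).digest()
    return struct.unpack("<Q", digest)[0]


# Hypothetical element identifiers that already exist in the simulation data.
particle_ids = [1001, 1002, 1003, 1004]


def first_draw(particle_id: int) -> int:
    """The element's own id is the key, so no per-thread generator is needed."""
    return cbrng(particle_id, 0)


# Process the elements in their natural order...
in_order = {pid: first_draw(pid) for pid in particle_ids}

# ...and again in an arbitrary order, as a scheduler might.
shuffled_ids = particle_ids[:]
random.shuffle(shuffled_ids)
shuffled = {pid: first_draw(pid) for pid in shuffled_ids}

# Each element sees the same random value regardless of processing order.
same_result = in_order == shuffled
```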
(Note that cryptographic non-correlation is NOT guaranteed across keystreams: (key, counter) is NOT guaranteed to be uncorrelated with (key+1, counter), because that's not how encryption is usually used. With a decent cipher it should still be very good, but it's not guaranteed to be attack-resistant etc. So notionally, if you use a different key index for every element, element N isn't guaranteed to be uncorrelated with element N+1 at the same place in the keystream. If this is really important, then maybe you want to pass your array indexes through a key-spreading function etc.)
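One possible key-spreading function is sketched below. The choice is an assumption for illustration: it uses the SplitMix64 mixing step, which is not cryptographic and gives no attack-resistance guarantee, and the name `spread_key` is hypothetical. It only ensures that consecutive array indexes do not become consecutive keys.

```python
MASK64 = (1 << 64) - 1


def spread_key(index: int) -> int:
    """Map an array index to a scattered 64-bit key (SplitMix64 mixing step).

    Every step is invertible modulo 2**64, so distinct indexes in
    [0, 2**64) always produce distinct keys.
    """
    z = (index + 0x9E3779B97F4A7C15) & MASK64
    z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
    return z ^ (z >> 31)


# Neighbouring elements N and N+1 now get keys that are far apart.
keys = [spread_key(i) for i in range(4)]
```

The resulting key would then be passed to the generator in place of the raw index.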
There are several benefits to doing it like this. First off, obviously, you get a keystream for each element of interest. But there is also no real per-thread state: the key can be determined by looking at the element, and generating a new value doesn't change the key/keystream. So there is nothing to store and update, and you can have an arbitrary number of generators in use at any given time. Also, since this computation is a "pure function", it doesn't really consume any memory bandwidth to speak of, and since computation time is usually not the limiting factor in GPGPU simulations, this effectively makes RNG usage "free". My experience is that this increases performance vs cuRAND, even while using less VRAM, even when just directly porting the "1 thread = 1 generator" idiom.
Also, by storing "epoch numbers" (each iteration of the sim, etc), or calculating this based on predictions of PRNG consumption ("each iteration uses at most 16 random numbers"), you can fast-forward or rewind the PRNG to arbitrary times, and you can use this to lookahead or lookback on previous events from the keystream, meaning it serves as a massively potent form of compression as well. Why store data in memory and use up your precious VRAM, when you could simply recompute it on-demand from the original part of the original keystream used to generate it in the first place? (assuming proper "object ownership" ofc!) And this actually is pretty much free in performance terms, since it's a "pure function" based on the function parameters, and the GPGPU almost certainly has an excess of computation available.
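The epoch idea can be sketched as plain counter arithmetic. This assumes the comment's example budget of at most 16 draws per iteration; `DRAWS_PER_EPOCH`, `draw`, and `replay_epoch` are hypothetical names, and keyed BLAKE2b is only a stand-in for a real CBRNG.

```python
import hashlib
import struct

DRAWS_PER_EPOCH = 16  # assumed budget: each iteration uses at most 16 random numbers


def cbrng(key: int, counter: int) -> int:
    """Stand-in counter-based generator: 64-bit output from (key, counter)."""
    digest = hashlib.blake2b(
        struct.pack("<Q", counter),
        key=struct.pack("<Q", key),
        digest_size=8,
    ).digest()
    return struct.unpack("<Q", digest)[0]


def draw(element_key: int, epoch: int, slot: int) -> int:
    """Random value for a given element, simulation epoch, and draw slot."""
    if not 0 <= slot < DRAWS_PER_EPOCH:
        raise ValueError("slot exceeds the per-epoch draw budget")
    return cbrng(element_key, epoch * DRAWS_PER_EPOCH + slot)


def replay_epoch(element_key: int, epoch: int) -> list:
    """Recompute every draw of one epoch on demand instead of storing it."""
    return [draw(element_key, epoch, slot) for slot in range(DRAWS_PER_EPOCH)]


# Run ten epochs forward, then "rewind" to epoch 3 by recomputing it.
history = [replay_epoch(7, epoch) for epoch in range(10)]
rewound = replay_epoch(7, 3)
```

Here `rewound` equals `history[3]` without `history` ever needing to be kept, which is the compression argument made above.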
In the extreme case, you should be able to theoretically "walk" huge parts of the keystream and find specific events you need, even if there is no other reference to what happened at that particular time in the past. Like why not just walk through parts of the keystream until you find the event that matches your target criteria? Remember since this is basically pure math, it's generated on-demand by mathing it out, it's pretty much free, and computation is cheap compared to cache/memory or notarizing.
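Walking the keystream to find an event could look like the sketch below. The event criterion (a value divisible by 100) is made up for illustration, `find_first` is a hypothetical helper, and keyed BLAKE2b again stands in for a real CBRNG.

```python
import hashlib
import struct


def cbrng(key: int, counter: int) -> int:
    """Stand-in counter-based generator: 64-bit output from (key, counter)."""
    digest = hashlib.blake2b(
        struct.pack("<Q", counter),
        key=struct.pack("<Q", key),
        digest_size=8,
    ).digest()
    return struct.unpack("<Q", digest)[0]


def find_first(element_key, predicate, start=0, limit=100_000):
    """Scan counters [start, start + limit) and return the first one whose
    value satisfies predicate, or None if nothing in the range matches."""
    for counter in range(start, start + limit):
        if predicate(cbrng(element_key, counter)):
            return counter
    return None


def is_event(value: int) -> bool:
    """Made-up target criterion: roughly one value in a hundred qualifies."""
    return value % 100 == 0


# Nothing about past events was stored; the position is rediscovered
# purely by recomputing the keystream.
hit = find_first(5, is_event)
```

On a GPU the scan would be split across threads by counter range, since every position can be computed independently.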
(ie this is a weird form of "inverted-index searching", analogous to Elastic/Solr's transformers and how this allows a large number of individual transformers (which do their own searching/indexing for each query, which will be generally unindexable operations like fulltext etc) to listen to a single IO stream as blocks are broadcast from the disk in big sequential streaming batches. Instead of SSD batch reads you'd be aiming for computation batch reads from a long range within a keystream.)
Anyway, I don't know how much that maps to your particular use-case, but that's the best advice I can give. Procedural generation using a rewindable, element-specific keystream is a very potent form of compression, and very cheap. But even if all you are doing is avoiding having to store a bunch of cuRAND instances in VRAM, that's still an enormous win, even if you directly port your existing application to simply use the globalThreadId as the key, as if it were a stateful cuRAND instance being loaded from and saved back to VRAM. Like I said, my experience is that because you're replacing mutation with computation, this runs faster and also uses less VRAM, and it probably also gives statistically better randomness (especially if you choose the "hard" algorithms instead of the "optimized" versions, like threefish instead of threefry etc).
That is the reason why you shouldn't "just download random numbers", as a sibling comment suggests (probably as a joke): that consumes VRAM, or at least system memory (and PCIe bandwidth). And you know what's usually way more available as a resource in most GPGPU applications than VRAM or PCIe bandwidth? Pure ALU/FPU computation time.
buddy, everyone has random numbers, they come with the fucking xbox. ;)