gemm-benchmark vs DOOM

gemm-benchmark

Simple [sd]gemm benchmark, similar to ACES dgemm (by danieldk)

DOOM

DOOM Open Source Release (by id-Software)

Project	gemm-benchmark	DOOM
Mentions	6	91
Stars	8	12,871
Growth	-	2.5%
Activity	3.5	2.2
Latest Commit	6 months ago	7 days ago
Language	Rust	C++
License	Apache License 2.0	GNU General Public License v3.0 only

The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

gemm-benchmark

Posts with mentions or reviews of gemm-benchmark. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-07-20.

Running Stable Diffusion in 260MB of RAM
3 projects | news.ycombinator.com | 20 Jul 2023

And PyTorch on the M1 (without Metal) uses the fast AMX matrix multiplication units (through the Accelerate Framework). The matrix multiplication on the M1 is on par with ~10 threads/cores of Ryzen 5900X.
[1] https://github.com/danieldk/gemm-benchmark#example-results
Ask HN: What is a AI chip and how does it work?
4 projects | news.ycombinator.com | 27 May 2023

Apple Silicon Macs have special matrix multiplication units (AMX) that can do matrix multiplication fast and with low energy requirements [1]. These AMX units can often beat matrix multiplication on AMD/Intel CPUs (especially those without a very large number of cores). Since a lot of linear algebra code uses matrix multiplication and using the AMX units is only a matter of linking against Accelerate (for its BLAS interface), a lot of software that uses BLAS is faster on Apple Silicon Macs.
That said, the GPUs in your M1 Mac are faster than the AMX units and any reasonably modern NVIDIA GPU will wipe the floor with the AMX units or Apple Silicon GPUs in raw compute. However, a lot of software does not use CUDA by default and for small problem sets AMX units or CPUs with just AVX can be faster because they don't incur the cost of data transfers from main memory to GPU memory and vice versa.
[1] Benchmarks:
https://github.com/danieldk/gemm-benchmark#example-results
https://explosion.ai/blog/metal-performance-shaders (scroll down a bit for AMX and MPS numbers)
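For readers unfamiliar with how GEMM throughput figures like the ones linked above are usually derived, here is a minimal sketch in C. It is not taken from gemm-benchmark (which is written in Rust) and does not call Accelerate or any BLAS; `sgemm_naive`, `gemm_flops`, and `gemm_gflops` are hypothetical names used only for illustration. It assumes the common convention of counting 2·m·n·k floating-point operations per matrix product.

```c
#include <assert.h>
#include <stddef.h>

/* Reference single-precision GEMM: C = A * B, all matrices row-major.
   A is m x k, B is k x n, C is m x n. Hypothetical helper, far slower
   than an optimized BLAS such as the one behind Accelerate. */
void sgemm_naive(size_t m, size_t n, size_t k,
                 const float *a, const float *b, float *c)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < m; i++) {
        for (size_t j = 0; j < n; j++) {
            float acc = 0.0f;
            for (size_t p = 0; p < k; p++)
                acc += a[i * k + p] * b[p * n + j];
            c[i * n + j] = acc;
        }
    }
}

/* Usual convention: one multiply and one add per (i, j, p) triple. */
double gemm_flops(size_t m, size_t n, size_t k)
{
    return 2.0 * (double)m * (double)n * (double)k;
}

/* Convert a measured wall-clock time for one product into GFLOP/s. */
double gemm_gflops(size_t m, size_t n, size_t k, double seconds)
{
    assert(seconds > 0.0);
    return gemm_flops(m, n, k) / seconds / 1e9;
}
```

Under this convention, a 1000 x 1000 x 1000 product that takes one second corresponds to 2 GFLOP/s; the timing itself would come from whatever clock the benchmark harness uses.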
Apple previews Live Speech, Personal Voice, and more new accessibility features
3 projects | news.ycombinator.com | 16 May 2023
How to Get 1.5 TFlops of FP32 Performance on a Single M1 CPU Core
1 project | news.ycombinator.com | 5 Jan 2023

Yes, there is one per core cluster. The title is a bit misleading, because it suggests that going to two or three cores will scale linearly, though it won't be much faster. See here for sgemm benchmarks for everything from the M1 to M1 Ultra and 1 to 16 threads:
https://github.com/danieldk/gemm-benchmark#1-to-16-threads
WebAssembly Techniques to Speed Up Matrix Multiplication by 120x
4 projects | news.ycombinator.com | 25 Jan 2022

There's always been a tradeoff in writing code between developer experience and taking full advantage of what the hardware is capable of. That "waste" in execution efficiency is often worth it for the sake of representing helpful abstractions and generally helping developer productivity.
The GFLOP/s is 1/28th of what you'd get when using the native Accelerate framework on M1 Macs [1]. I am all in for powerful abstractions, but not using native APIs for this (even if it's just the browser calling Accelerate in some way) is just a huge waste of everyone's CPU cycles and electricity.
[1] https://github.com/danieldk/gemm-benchmark#1-to-16-threads

DOOM

Posts with mentions or reviews of DOOM. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-01-17.

Doom Released Under GPLv2
4 projects | news.ycombinator.com | 17 Jan 2024

commercially exploit or use for any commercial purpose."
[1] https://github.com/id-Software/DOOM/commit/4eb368a960647c8cc...
GTA 5 source code leaks online
3 projects | news.ycombinator.com | 25 Dec 2023
The original Doom had third-party audio playback routines, so the source came with a rewritten sound server: https://github.com/id-Software/DOOM/tree/master/sndserv
```
    The bad news:  this code only compiles and runs on linux.  We couldn't
```
What you can do with C ?
1 project | /r/C_Programming | 29 Nov 2023
Software Disenchantment
5 projects | news.ycombinator.com | 23 Oct 2023

Here's a repo for you with no test coverage and no auto-generated DI. They're using unsafe pointers all over the place, too!
https://github.com/id-Software/DOOM
Shall I prepare the postage for the letter in which you'll call John Carmack an MBA? Should we send another to Chris Sawyer? I heard he didn't even write a formal design doc for Roller Coaster Tycoon!
Ask HN: Good practices for my first C project
3 projects | news.ycombinator.com | 18 Oct 2023

cURL is one of the most used C libs and is an example of good quality C code. If you follow the style used there, see e.g. https://github.com/curl/curl/blob/master/lib/dynhds.h (and associated dynhds.c) you will be good.
Looking at the source of some of the old game-engines from the era that have since been released as open-source can also be helpful, like https://github.com/id-Software/DOOM.
In both cases notice how simple and elegant a lot of the code is. There is already enough complexity inherent in the problem they are solving, and that is where the focus should be.
Any IDE with a working language server to make it easy to jump around and refactor should work fine. Limitations might be due to the C language itself?
Error handling on such a fixed platform does not need to be super-advanced. You should always be within the confines of the system so there shouldn't be much that can go wrong. If stuff goes wrong anyway, just being able to call a function Fatal("FooBar failed with code 34") when unexpected stuff happens and have it log somewhere to be able to dig around should be enough. You never need to be able to recover and retry.
Make sure to use https://clang.llvm.org/docs/AddressSanitizer.html or a similar tool when developing outside of the PSOne.
That said, consider statically allocating global buffers for most stuff and avoid using the heap for most stuff.
Good luck working within the confines of the PSOne! Many hackers have pulled the hair from their head on that platform ;)
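The error-handling and static-allocation advice in this post could look roughly like the sketch below. Only the name `Fatal` comes from the post; `FormatFatal`, `Entity`, `MAX_ENTITIES`, `g_entities`, and `SpawnEntity` are hypothetical, and logging to `stderr` is an assumption (a console target would log wherever is convenient there).

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Statically allocated, so reporting a failure never touches the heap. */
static char g_fatal_message[256];

/* Format the message into the static buffer and return it. */
const char *FormatFatal(const char *fmt, va_list args)
{
    assert(fmt != NULL);
    vsnprintf(g_fatal_message, sizeof g_fatal_message, fmt, args);
    return g_fatal_message;
}

/* Log the message and stop; there is no attempt to recover or retry. */
void Fatal(const char *fmt, ...)
{
    va_list args;
    va_start(args, fmt);
    fprintf(stderr, "FATAL: %s\n", FormatFatal(fmt, args));
    va_end(args);
    exit(EXIT_FAILURE);
}

/* Hypothetical game data kept in a global, fixed-size table. */
#define MAX_ENTITIES 64

typedef struct {
    int x;
    int y;
} Entity;

static Entity g_entities[MAX_ENTITIES];
static int g_entity_count;

/* Hand out the next slot of the static table; a full table is treated
   as a bug in the level data, so it ends in Fatal. */
Entity *SpawnEntity(int x, int y)
{
    if (g_entity_count >= MAX_ENTITIES)
        Fatal("SpawnEntity failed: table full (%d entries)", MAX_ENTITIES);
    Entity *e = &g_entities[g_entity_count++];
    e->x = x;
    e->y = y;
    return e;
}
```

Because the table size is fixed at compile time, the worst-case memory use is known up front, which is the main appeal of this style on a machine with a small, fixed amount of RAM.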
Ask HN: Where do I find good code to read?
22 projects | news.ycombinator.com | 24 Aug 2023
Running Stable Diffusion in 260MB of RAM
3 projects | news.ycombinator.com | 20 Jul 2023

Probably more easily than you'd think. DOOM is open source[1], and as GP alludes, is probably the most frequently ported game in existence, so its source code almost certainly appears multiple times in GPT-4's training set, likely alongside multiple annotated explanations.
[1] https://github.com/id-Software/DOOM
Where can I get game files to study?
1 project | /r/GameDevelopment | 11 Jul 2023
Some were meant for C [pdf]
2 projects | news.ycombinator.com | 21 Jun 2023

I'd define an arena as the pattern where the arena itself owns N objects. So you free the arena to free all objects.
My first job was at EA working on console games (PS2, GameCube, XBox, no OS or virtual memory on any of them), and while at the time I was too junior to touch the memory allocators themselves, we were definitely not malloc-ing and freeing all the time.
It was more like you load data for the level in one stage, which creates a ton of data structures, and then you enter a loop to draw every frame quickly. There were many global variables.
---
Wikipedia calls it a region, zone, arena, area, or memory context, and that seems about right:
https://en.wikipedia.org/wiki/Region-based_memory_management
It describes history from 1967 (before C was invented!) and has some good examples from Apache ("pools") and Postgres ("memory contexts").
I also just looked at these codebases:
https://github.com/mit-pdos/xv6-public (based on code from the 70's)
https://github.com/id-Software/DOOM (1997)
I looked at allocproc() in xv6, and it gives you an object from a fixed global array. A lot of C code in the 80's and 90's was essentially "kernel code" in that it didn't have an OS underneath it. Embedded systems didn't run on full-fledged OSes.
DOOM tends to use a lot of what I would call "pools" -- arrays of objects of a fixed size, and that's basically what I remember from EA.
Though in g_game.c, there is definitely an arena of size 0x20000 called "demobuffer". It's used with a bump allocator.
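To make the "arena with a bump allocator" pattern concrete, here is a minimal sketch in C. It is not DOOM's code; `Arena`, `arena_init`, `arena_alloc`, `arena_reset`, and `g_arena_storage` are hypothetical names, and only the 0x20000 size is borrowed from the description of "demobuffer" above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ARENA_SIZE 0x20000 /* size quoted above for "demobuffer" */

typedef struct {
    unsigned char *base; /* start of the backing storage */
    size_t capacity;     /* total bytes available */
    size_t used;         /* bytes handed out so far */
} Arena;

/* Backing storage lives in a global, so no malloc is involved. */
static unsigned char g_arena_storage[ARENA_SIZE];

void arena_init(Arena *a, unsigned char *storage, size_t capacity)
{
    a->base = storage;
    a->capacity = capacity;
    a->used = 0;
}

/* Bump allocation: pad up to the requested power-of-two alignment,
   then advance the offset. Returns NULL when the arena is full. */
void *arena_alloc(Arena *a, size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    uintptr_t addr = (uintptr_t)(a->base + a->used);
    size_t pad = (align - (size_t)(addr & (align - 1))) & (align - 1);
    size_t remaining = a->capacity - a->used;
    if (pad > remaining || size > remaining - pad)
        return NULL;
    void *p = a->base + a->used + pad;
    a->used += pad + size;
    return p;
}

/* "Free the arena to free all objects": one store releases everything.
   Pointers handed out earlier still exist but must not be used. */
void arena_reset(Arena *a)
{
    a->used = 0;
}
```

Individual objects are never freed; the only release operation is `arena_reset`, and nothing stops old pointers from being dereferenced afterwards.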
---
So I'd say
- malloc / free of individual objects was NEVER what C code looked like (aside from toy code in college)
- arena allocators were used, but global vars and pools are also very common.
- arenas are more or less a wash for memory safety. they help you in some ways, but hurt you in others.
The reason C programmers don't malloc/free all the time is for speed, not memory safety. Arenas are still unsafe.
When you free an arena, you have no guarantee there's nothing that points to it anymore.
Also, something that shouldn't be underestimated is that arena allocators break tools like ASAN, which use the malloc() free() interface. This was underscored to me by writing a garbage collector -- the custom allocator "broke" ASAN, and that was actually a problem:
https://www.oilshell.org/blog/2023/01/garbage-collector.html
If you want memory safety in your C code, you should be using ASAN (dynamically instrumented allocators) and good test coverage. Arenas don't help -- they can actually hurt. An arena is a trivial idea -- the problem is more if that usage pattern actually matches your application, and apps evolve over time.
What is your gender?
1 project | /r/teenagers | 18 Jun 2023

Doom

What are some alternatives?

When comparing gemm-benchmark and DOOM you can also consider the following projects:

XNNPACK - High-efficiency floating-point neural network inference operators for mobile, server, and Web

open-watcom-v2 - Open Watcom V2.0 - Source code repository, Wiki, Latest Binary build, Archived builds including all installers for download.

rknn-toolkit

project-based-tutorials-in-c - A curated list of project-based tutorials in C

OnnxStream - Lightweight inference library for ONNX files, written in C++. It can run SDXL on a RPI Zero 2 but also Mistral 7B on desktops and servers.

Apollo-11 - Original Apollo 11 Guidance Computer (AGC) source code for the command and lunar modules.

armnn - Arm NN ML Software. The code here is a read-only mirror of https://review.mlplatform.org/admin/repos/ml/armnn

doomgeneric - Easily portable doom

Pytorch - Tensors and Dynamic neural networks in Python with strong GPU acceleration

luxtorpeda - Steam Play compatibility tool to run games using native Linux engines

piper - A fast, local neural text to speech system

angband - A free, single-player roguelike dungeon exploration game
