gemm-benchmark VS DOOM

Compare gemm-benchmark and DOOM and see how they differ.

gemm-benchmark

Simple [sd]gemm benchmark, similar to ACES dgemm (by danieldk)

DOOM

DOOM Open Source Release (by id-Software)
               gemm-benchmark        DOOM
Mentions       6                     91
Stars          8                     12,871
Growth         -                     2.5%
Activity       3.5                   2.2
Last commit    6 months ago          7 days ago
Language       Rust                  C++
License        Apache License 2.0    GNU General Public License v3.0 only
The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

gemm-benchmark

Posts with mentions or reviews of gemm-benchmark. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-07-20.
  • Running Stable Diffusion in 260MB of RAM
    3 projects | news.ycombinator.com | 20 Jul 2023
    And PyTorch on the M1 (without Metal) uses the fast AMX matrix multiplication units (through the Accelerate Framework). The matrix multiplication on the M1 is on par with ~10 threads/cores of Ryzen 5900X.

    [1] https://github.com/danieldk/gemm-benchmark#example-results

  • Ask HN: What is a AI chip and how does it work?
    4 projects | news.ycombinator.com | 27 May 2023
    Apple Silicon Macs have special matrix multiplication units (AMX) that can do matrix multiplication fast and with low energy requirements [1]. These AMX units can often beat matrix multiplication on AMD/Intel CPUs (especially those without a very large number of cores). Since a lot of linear algebra code uses matrix multiplication and using the AMX units is only a matter of linking against Accelerate (for its BLAS interface), a lot of software that uses BLAS is faster on Apple Silicon Macs.

    That said, the GPUs in your M1 Mac are faster than the AMX units and any reasonably modern NVIDIA GPU will wipe the floor with the AMX units or Apple Silicon GPUs in raw compute. However, a lot of software does not use CUDA by default and for small problem sets AMX units or CPUs with just AVX can be faster because they don't incur the cost of data transfers from main memory to GPU memory and vice versa.

    [1] Benchmarks:

    https://github.com/danieldk/gemm-benchmark#example-results

    https://explosion.ai/blog/metal-performance-shaders (scroll down a bit for AMX and MPS numbers)

  • Apple previews Live Speech, Personal Voice, and more new accessibility features
    3 projects | news.ycombinator.com | 16 May 2023
  • How to Get 1.5 TFlops of FP32 Performance on a Single M1 CPU Core
    1 project | news.ycombinator.com | 5 Jan 2023
    Yes, there is one per core cluster. The title is a bit misleading, because it suggests that going to two or three cores will scale linearly, though it won't be much faster. See here for sgemm benchmarks for everything from the M1 to M1 Ultra and 1 to 16 threads:

    https://github.com/danieldk/gemm-benchmark#1-to-16-threads

  • WebAssembly Techniques to Speed Up Matrix Multiplication by 120x
    4 projects | news.ycombinator.com | 25 Jan 2022
    There's always been a tradeoff in writing code between developer experience and taking full advantage of what the hardware is capable of. That "waste" in execution efficiency is often worth it for the sake of representing helpful abstractions and generally helping developer productivity.

    The GFLOP/s is 1/28th of what you'd get when using the native Accelerate framework on M1 Macs [1]. I am all in for powerful abstractions, but not using native APIs for this (even if it's just the browser calling Accelerate in some way) is just a huge waste of everyone's CPU cycles and electricity.

    [1] https://github.com/danieldk/gemm-benchmark#1-to-16-threads
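
The posts above compare matrix multiplication throughput in GFLOP/s. As a rough sketch of how such a figure is usually derived (not taken from gemm-benchmark itself), the helpers below assume the common convention of counting 2*m*n*k floating-point operations for an m x k by k x n product. The function names `gemm_flops`, `gemm_gflops` and `naive_sgemm` are hypothetical; the triple loop is only a reference implementation and is far slower than a tuned BLAS such as the one in Accelerate.

```c
#include <stddef.h>

/* Floating-point operations for C = A * B with A (m x k) and B (k x n):
   each of the m*n outputs takes k multiplies and k adds. */
double gemm_flops(size_t m, size_t n, size_t k) {
    return 2.0 * (double)m * (double)n * (double)k;
}

/* Throughput in GFLOP/s for one multiplication that took `seconds`. */
double gemm_gflops(size_t m, size_t n, size_t k, double seconds) {
    return gemm_flops(m, n, k) / seconds / 1e9;
}

/* Reference single-precision GEMM, row-major, C = A * B.
   Illustrative only: a real benchmark would call an optimized BLAS. */
void naive_sgemm(size_t m, size_t n, size_t k,
                 const float *a, const float *b, float *c) {
    for (size_t i = 0; i < m; i++) {
        for (size_t j = 0; j < n; j++) {
            float sum = 0.0f;
            for (size_t p = 0; p < k; p++) {
                sum += a[i * k + p] * b[p * n + j];
            }
            c[i * n + j] = sum;
        }
    }
}
```

Timing a call and passing the elapsed seconds to `gemm_gflops` gives a number that can be compared across implementations, which is the kind of comparison behind the "1/28th" remark in the last post.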

DOOM

Posts with mentions or reviews of DOOM. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-01-17.
  • Doom Released Under GPLv2
    4 projects | news.ycombinator.com | 17 Jan 2024
    commercially exploit or use for any commercial purpose."

    [1] https://github.com/id-Software/DOOM/commit/4eb368a960647c8cc...

  • GTA 5 source code leaks online
    3 projects | news.ycombinator.com | 25 Dec 2023
    The original Doom had third-party audio playback routines, so the source came with a rewritten sound server: https://github.com/id-Software/DOOM/tree/master/sndserv

        The bad news:  this code only compiles and runs on linux.  We couldn't
  • What you can do with C ?
    1 project | /r/C_Programming | 29 Nov 2023
  • Software Disenchantment
    5 projects | news.ycombinator.com | 23 Oct 2023
    Here's a repo for you with no test coverage and no auto-generated DI. They're using unsafe pointers all over the place, too!

    https://github.com/id-Software/DOOM

    Shall I prepare the postage for the letter in which you'll call John Carmack an MBA? Should we send another to Chris Sawyer? I heard he didn't even write a formal design doc for Roller Coaster Tycoon!

  • Ask HN: Good practices for my first C project
    3 projects | news.ycombinator.com | 18 Oct 2023
    cURL is one of the most used C libs and is an example of good quality C code. If you follow the style used there, see e.g. https://github.com/curl/curl/blob/master/lib/dynhds.h (and associated dynhds.c) you will be good.

    Looking at the source of some of the old game-engines from the era that have since been released as open-source can also be helpful, like https://github.com/id-Software/DOOM.

    In both cases notice how simple and elegant a lot of the code is. There is already enough complexity inherent in the problem they are solving, and that is where the focus should be.

    Any IDE with a working language server to make it easy to jump around and refactor should work fine. Limitations might be due to the C language itself?

    Error handling on such a fixed platform does not need to be super-advanced. You should always be within the confines of the system so there shouldn't be much that can go wrong. If stuff goes wrong anyway, just being able to call a function Fatal("FooBar failed with code 34") when unexpected stuff happens and have it log somewhere to be able to dig around should be enough. You never need to be able to recover and retry.

    Make sure to use https://clang.llvm.org/docs/AddressSanitizer.html or a similar tool when developing outside of the PSOne.

    That said, consider statically allocating global buffers for most stuff and avoid using the heap for most stuff.

    Good luck working within the confines of the PSOne! Many hackers have pulled the hair from their head on that platform ;)

  • Ask HN: Where do I find good code to read?
    22 projects | news.ycombinator.com | 24 Aug 2023
  • Running Stable Diffusion in 260MB of RAM
    3 projects | news.ycombinator.com | 20 Jul 2023
    Probably more easily than you'd think. DOOM is open source[1], and as GP alludes, is probably the most frequently ported game in existence, so its source code almost certainly appears multiple times in GPT-4's training set, likely alongside multiple annotated explanations.

    [1] https://github.com/id-Software/DOOM

  • Where can I get game files to study?
    1 project | /r/GameDevelopment | 11 Jul 2023
  • Some were meant for C [pdf]
    2 projects | news.ycombinator.com | 21 Jun 2023
    I'd define an arena as the pattern where the arena itself owns N objects. So you free the arena to free all objects.

    My first job was at EA working on console games (PS2, GameCube, XBox, no OS or virtual memory on any of them), and while at the time I was too junior to touch the memory allocators themselves, we were definitely not malloc-ing and freeing all the time.

    It was more like you load data for the level in one stage, which creates a ton of data structures, and then you enter a loop to draw every frame quickly. There were many global variables.

    ---

    Wikipedia calls it a region, zone, arena, area, or memory context, and that seems about right:

    https://en.wikipedia.org/wiki/Region-based_memory_management

    It describes history from 1967 (before C was invented!) and has some good examples from Apache ("pools") and Postgres ("memory contexts").

    I also just looked at these codebases:

    https://github.com/mit-pdos/xv6-public (based on code from the 70's)

    https://github.com/id-Software/DOOM (1997)

    I looked at allocproc() in xv6, and it gives you an object from a fixed global array. A lot of C code in the 80's and 90's was essentially "kernel code" in that it didn't have an OS underneath it. Embedded systems didn't run on full-fledged OSes.

    DOOM tends to use a lot of what I would call "pools" -- arrays of objects of a fixed size, and that's basically what I remember from EA.

    Though in g_game.c, there is definitely an arena of size 0x20000 called "demobuffer". It's used with a bump allocator.

    ---

    So I'd say

    - malloc / free of individual objects was NEVER what C code looked like (aside from toy code in college)

    - arena allocators were used, but global vars and pools are also very common.

    - arenas are more or less a wash for memory safety. they help you in some ways, but hurt you in others.

    The reason C programmers don't malloc/free all the time is for speed, not memory safety. Arenas are still unsafe.

    When you free an arena, you have no guarantee there's nothing that points to it anymore.

    Also, something that shouldn't be underestimated is that arena allocators break tools like ASAN, which use the malloc() free() interface. This was underscored to me by writing a garbage collector -- the custom allocator "broke" ASAN, and that was actually a problem:

    https://www.oilshell.org/blog/2023/01/garbage-collector.html

    If you want memory safety in your C code, you should be using ASAN (dynamically instrumented allocators) and good test coverage. Arenas don't help -- they can actually hurt. An arena is a trivial idea -- the problem is more if that usage pattern actually matches your application, and apps evolve over time.

  • What is your gender?
    1 project | /r/teenagers | 18 Jun 2023
    Doom
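
The "Some were meant for C" post above describes an arena used with a bump allocator: one fixed buffer, a cursor that only moves forward, and a single operation that frees everything at once. A minimal sketch of that pattern follows. It is not DOOM's code; the `Arena` type and the `arena_init`, `arena_alloc` and `arena_reset` functions are hypothetical names, and only the 0x20000 size is borrowed from the post.

```c
#include <stddef.h>
#include <stdint.h>

#define ARENA_SIZE 0x20000 /* size quoted in the post for "demobuffer" */

typedef struct {
    unsigned char *base; /* start of the backing buffer */
    size_t size;         /* total bytes available */
    size_t used;         /* bump cursor: bytes handed out so far */
} Arena;

/* Attach the arena to a caller-provided buffer. */
void arena_init(Arena *a, void *buf, size_t size) {
    a->base = (unsigned char *)buf;
    a->size = size;
    a->used = 0;
}

/* Hand out n bytes aligned to `align` (must be a power of two).
   Returns NULL when the arena is exhausted. Nothing is freed individually. */
void *arena_alloc(Arena *a, size_t n, size_t align) {
    uintptr_t cur = (uintptr_t)a->base + a->used;
    uintptr_t aligned = (cur + (align - 1)) & ~(uintptr_t)(align - 1);
    size_t start = (size_t)(aligned - (uintptr_t)a->base);
    if (start > a->size || n > a->size - start) {
        return NULL;
    }
    a->used = start + n;
    return a->base + start;
}

/* "Free" every object at once by rewinding the cursor.
   Any pointer obtained earlier now dangles; nothing detects its reuse. */
void arena_reset(Arena *a) {
    a->used = 0;
}
```

A caller would typically back the arena with a static array, for example `static unsigned char storage[ARENA_SIZE];`, and call `arena_reset` at a level or frame boundary. The comment on `arena_reset` reflects the post's caveat: resetting gives no guarantee that nothing still points into the buffer, and tools that hook malloc() and free() do not see these allocations.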

What are some alternatives?

When comparing gemm-benchmark and DOOM you can also consider the following projects:

XNNPACK - High-efficiency floating-point neural network inference operators for mobile, server, and Web

open-watcom-v2 - Open Watcom V2.0 - Source code repository, Wiki, Latest Binary build, Archived builds including all installers for download.

rknn-toolkit

project-based-tutorials-in-c - A curated list of project-based tutorials in C

OnnxStream - Lightweight inference library for ONNX files, written in C++. It can run SDXL on a RPI Zero 2 but also Mistral 7B on desktops and servers.

Apollo-11 - Original Apollo 11 Guidance Computer (AGC) source code for the command and lunar modules.

armnn - Arm NN ML Software. The code here is a read-only mirror of https://review.mlplatform.org/admin/repos/ml/armnn

doomgeneric - Easily portable doom

Pytorch - Tensors and Dynamic neural networks in Python with strong GPU acceleration

luxtorpeda - Steam Play compatibility tool to run games using native Linux engines

piper - A fast, local neural text to speech system

angband - A free, single-player roguelike dungeon exploration game