Stable Diffusion in pure C/C++

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

Our great sponsors
  • InfluxDB - Power Real-Time Data Analytics at Scale
  • WorkOS - The modern identity platform for B2B SaaS
  • SaaSHub - Software Alternatives and Reviews
  • stable-diffusion.cpp

    Stable Diffusion in pure C/C++

  • Interesting. It still doesn't seem to be very quick: https://github.com/leejet/stable-diffusion.cpp/issues/6

    But don't get me wrong, I look forward to playing with ggml SD and its development.

  • tinygrad

    You like pytorch? You like micrograd? You love tinygrad! ❤️

  • InfluxDB

    Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.

    InfluxDB logo
  • HIP

    HIP: C++ Heterogeneous-Compute Interface for Portability

  • root

    The official repository for ROOT: analyzing, storing and visualizing big data, scientifically

  • That Python ML code is calling C++ code running in the GPU, one more reason to use C++ across the whole stack.

    CERN already used prototyping in C++, with ROOT and CINT, 20 years ago.

    https://root.cern/

    Nowadays it is even usable from Netbooks via Xeus.

    It is more a matter of lack of exposure to C++ interpreters than anything else.

  • ggml

    Tensor library for machine learning

  • I did a quick run under profiler and on my AVX2-laptop the slowest part (>50%) was matrix multiplication (sgemm).

    In current version of GGML if OpenBLAS is enabled, they convert matrices to FP32 before running sgemm.

    If OpenBLAS is disabled, on AVX2 plaftorm they convert FP16 to FP32 on every FMA operation, which even worse (due to repetition). After that, both ggml_vec_dot_f16 and ggml_vec_dot_f32 took first place in profiler.

    Source: https://github.com/ggerganov/ggml/blob/master/src/ggml.c#L10...

  • Vrmac

    Vrmac Graphics, a cross-platform graphics library for .NET. Supports 3D, 2D, and accelerated video playback. Works on Windows 10 and Raspberry Pi4.

  • I have minimal experience with Rust. OTOH, programming C++ for living since 2000, with a few gaps when I used other languages like Obj-C and C#.

    I agree C++ is very hard to learn if you only have experience with higher-level languages like Python and Scala. I think there’re two reasons for that.

    C++ is unsafe. There’s no way around this one, it was designed that way, like C or assembly. Still, with modern toolset it’s not terribly bad. Compilers print warnings, BTW I typically ask them to treat warnings as errors to deliberately fail the build. On Windows, a combination of debug build, debug C runtime, and visual studio debugger helps tremendously. Linux compilers have these sanitizers (address, memory, thread, undefined behavior) which are comparable, they too sacrifice runtime speed for diagnostics and debuggability.

    Another reason, the language itself is very complicated, especially the templates. However, just because something is in the language doesn’t mean it’s a good idea to use it. You don’t need to be familiar with that stuff unless doing something very advanced, like customizing the Eigen C++ library. Don’t follow the patterns found in the standard library: unlike your code, that library has good reasons to use that template BS. If instead of templates you do something else, C++ becomes much easier to use, and most importantly other people will still be able to read and understand your code. Another reason to avoid excessive template metaprogramming, it slows down the compiler, because template-heavy code often needs to be in headers as opposed to cpp files.

    P.S. If you don’t need extreme levels of performance (defined as “approach the numbers listed in CPU specs”, the numbers are FLOPS or memory bandwidth), and you don’t need the ecosystem too much, consider C# instead of C++. Much faster than Python, often faster than Scala or Java, easy integration with C should you need that (same as Rust, much easier than Python or Java), the only downside is these ~100MB of the runtime. The reputation is weird, but technically the language and runtime are pretty good. For example, here’s a C# library which re-implements a subset of ffmpeg and libavcodec C libraries: https://github.com/Const-me/Vrmac/tree/master/VrmacVideo

  • sd-extension-system-info

    System and platform info and standardized benchmarking extension for SD.Next and WebUI

  • That seems a lot worse than a 2060 SUPER with PyTorch in A1111.

    https://vladmandic.github.io/sd-extension-system-info/pages/... (search for 2060 SUPER)

  • WorkOS

    The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.

    WorkOS logo
NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts