Metric | Halide | stb |
---|---|---|
Mentions | 43 | 164 |
Stars | 5,714 | 25,128 |
Growth | 0.5% | - |
Activity | 9.5 | 6.4 |
Latest commit | 3 days ago | 3 days ago |
Language | C++ | C |
License | GNU General Public License v3.0 or later | GNU General Public License v3.0 or later |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Halide
- Show HN: Flash Attention in ~100 lines of CUDA
If CPU/GPU execution speed is the goal while simultaneously code golfing the source size, https://halide-lang.org/ might have come in handy.
- Halide v17.0.0
- From slow to SIMD: A Go optimization story
This is a task where Halide https://halide-lang.org/ could really shine! It disconnects logic from scheduling (unrolling, vectorizing, tiling, caching intermediates, etc.), so every step the author describes in the article is a tunable in Halide. Halide doesn't appear to have bindings for Go, so calling C++ from Go might be the only viable option.
- Implementing Mario's Stack Blur 15 times in C++ (with tests and benchmarks)
Probably would have been much easier to do 15 times in https://halide-lang.org/
The idea behind Halide is that scheduling memory access patterns is critical to performance. But, access patterns being interwoven into arithmetic algorithms makes them difficult to modify separately.
So, in Halide you specify the arithmetic and the schedule separately so you can rapidly iterate on either.
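To make the algorithm/schedule split concrete without depending on Halide itself, here is a minimal plain C++ sketch. It is not the Halide API: `Image`, `blur_at`, `run_row_major`, and `run_tiled` are hypothetical names chosen for illustration. The arithmetic lives in one function, and two interchangeable loop nests stand in for two schedules.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration only: plain C++, not the Halide API.
struct Image {
    int width, height;
    std::vector<int> data;
    Image(int w, int h)
        : width(w), height(h), data(static_cast<std::size_t>(w) * h, 0) {}
    int& at(int x, int y) { return data[static_cast<std::size_t>(y) * width + x]; }
    int at(int x, int y) const { return data[static_cast<std::size_t>(y) * width + x]; }
};

// The "algorithm": what one output pixel is, with no loop structure.
// Here: a 3-tap horizontal box sum with clamped borders.
int blur_at(const Image& in, int x, int y) {
    int left = x > 0 ? x - 1 : 0;
    int right = x < in.width - 1 ? x + 1 : in.width - 1;
    return in.at(left, y) + in.at(x, y) + in.at(right, y);
}

// "Schedule" 1: plain row-major traversal.
Image run_row_major(const Image& in) {
    Image out(in.width, in.height);
    for (int y = 0; y < in.height; ++y)
        for (int x = 0; x < in.width; ++x)
            out.at(x, y) = blur_at(in, x, y);
    return out;
}

// "Schedule" 2: tiled traversal. Same arithmetic, different access pattern.
Image run_tiled(const Image& in, int tile) {
    assert(tile > 0);
    Image out(in.width, in.height);
    for (int ty = 0; ty < in.height; ty += tile)
        for (int tx = 0; tx < in.width; tx += tile)
            for (int y = ty; y < ty + tile && y < in.height; ++y)
                for (int x = tx; x < tx + tile && x < in.width; ++x)
                    out.at(x, y) = blur_at(in, x, y);
    return out;
}
```

Both traversals call the same `blur_at`, so they must produce identical output; only the memory access order differs. In Halide the schedule is expressed declaratively rather than by hand-writing each loop nest as done here.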
- Making Hard Things Easy
- Deepmind Alphadev: Faster sorting algorithms discovered using deep RL
It is not sorting per se that was improved here, but sorting (particularly of short sequences) on modern CPUs, where the real complexity lies in predicting what will run quickly on those CPUs.
Doing an empirical algorithm search to find which algorithms fit well on modern CPUs/memory systems is pretty common, see e.g. FFTW, ATLAS, https://halide-lang.org/
- Two-tier programming language
Halide https://halide-lang.org/
- Best book on writing an optimizing compiler (inlining, types, abstract interpretation)?
- Blog Post: Can You Trust a Compiler to Optimize Your Code?
It doesn’t apply in this case, but in general if you really want the best vectorization I would suggest using https://halide-lang.org instead of trying to coerce your compiler.
- What would make you try a new language?
If we drop the "APL" requirement, wouldn't Halide fit your criteria for the third?
stb
- Lessons learned about how to make a header-file library (2013)
- Nebula is an open-source and free-to-use modern C++ game engine
Have you considered not using an engine at all, in favor of libraries? There are many amazing libraries I've used for game development - all in C/C++ - that you can piece together:
* General: [stb](https://github.com/nothings/stb)
- STB: Single-file public domain libraries for C/C++
- Writing a TrueType font renderer
Great to see more accessible references on font internals. I dabbled in this a bit last year and managed to write a parser and render the points of a glyph's contour (I stopped before the Bézier and shape-filling stuff). I still have not considered hinting, so it's nice that it's covered. What helped me was an article from the Handmade Network [1] and the source of stb_truetype [2] (also used in Dear ImGui).
[1] https://handmade.network/forums/articles/t/7330-implementing....
[2] https://github.com/nothings/stb/blob/master/stb_truetype.h
- Capturing the WebGPU Ecosystem
So I read through the materials on mesh shaders and work graphs and looked at sample code. These won't really work (see below). As I implied previously, it's best to research and discuss these sorts of matters with professional graphics programmers who have experience actually using the technologies under consideration.
So for the sake of future web searchers who discover this thread: there are only two proven ways to efficiently draw thousands of unique textures of different sizes with a single draw call that are actually used by experienced graphics programmers in production code as of 2023.
Proven method #1: Pack these thousands of textures into a texture atlas.
Proven method #2: Use bindless resources, which is still fairly bleeding edge, and will require fallback to atlases if targeting the PC instead of only high end console (Xbox Series S|X...).
Mesh shaders by themselves won't work: These have similar texture access limitations to the old geometry/tessellation stage they improve upon. A limited, fixed number of textures still must be bound before each draw call (say, 16 or 32 textures, not 1000s), unless bindless resources are used. So mesh shaders must be used with an atlas or with bindless resources.
Work graphs by themselves won't work: This feature is bleeding edge shader model 6.8 whereas bindless resources are SM 6.6. (Xbox Series X|S might top out at SM 6.7, I can't find an authoritative answer.) It looks like work graphs might only work well on NVIDIA GPUs and won't work well on Intel GPUs anytime soon (but, again, I'm not knowledgeable enough to say this authoritatively). Furthermore, this feature may have a hard dependency on using bindless to begin with. That is, I can't tell if one is allowed to execute a work graph that binds and unbinds individual texture resources. And if one could do such a thing, it would certainly be slower than using bindless. The cost of bindless is paid "up front" when the textures are uploaded.
Some programmers use Texture2DArray/GL_TEXTURE_2D_ARRAY as an alternative to atlases but two limitations are (1) the max array length (e.g. GL_MAX_ARRAY_TEXTURE_LAYERS) might only be 256 (e.g. for OpenGL 3.0), (2) all textures must be the same size.
Finally, for the sake of any web searcher who lands on this thread in the years to come, to pack an atlas well a good packing algorithm is needed. It's harder to pack triangles than rectangles but triangles use atlas memory more efficiently and a good triangle packing will outperform the fancy new bindless rendering. Some open source starting points for packing:
https://github.com/nothings/stb/blob/master/stb_rect_pack.h
https://github.com/ands/trianglepacker
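As a rough idea of what rectangle packing for an atlas involves, here is a minimal "shelf" packer sketch in C++. It is deliberately simpler than the libraries linked above and is not their algorithm; `Rect` and `pack_shelves` are hypothetical names. Rectangles are sorted tallest first and placed left to right in rows.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical types and names; a sketch, not the stb_rect_pack API.
struct Rect {
    int w, h;            // input: size in texels
    int x = 0, y = 0;    // output: top-left position in the atlas
    bool packed = false; // output: whether the rect was placed
    Rect(int w_, int h_) : w(w_), h(h_) {}
};

// Places rects on horizontal shelves, tallest first.
// Returns true only if every rect fit into the atlas.
bool pack_shelves(std::vector<Rect>& rects, int atlas_w, int atlas_h) {
    assert(atlas_w > 0 && atlas_h > 0);

    // Sort indices rather than rects so the caller's order is preserved.
    std::vector<std::size_t> order(rects.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t a, std::size_t b) {
                         return rects[a].h > rects[b].h;
                     });

    int cursor_x = 0, shelf_y = 0, shelf_h = 0;
    bool all_packed = true;
    for (std::size_t i : order) {
        Rect& r = rects[i];
        r.packed = false;
        if (r.w > atlas_w) { all_packed = false; continue; }
        if (cursor_x + r.w > atlas_w) {
            // Row is full: open a new shelf below the current one.
            shelf_y += shelf_h;
            cursor_x = 0;
            shelf_h = 0;
        }
        if (shelf_y + r.h > atlas_h) { all_packed = false; continue; }
        if (shelf_h == 0) shelf_h = r.h; // first rect on a shelf sets its height
        r.x = cursor_x;
        r.y = shelf_y;
        r.packed = true;
        cursor_x += r.w;
    }
    return all_packed;
}
```

Shelf packing wastes the space above shorter rectangles on each row, which is one reason production packers use more elaborate strategies. It is still a reasonable baseline to measure a better packer against.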
- Www Which WASM Works
The STB headers are mostly built like that: https://github.com/nothings/stb
You could also add an optional 'convenience API' over the lower-level flexible-but-inconvenient core API, as long as core library can be compiled on its own.
In essence it's just a way to decouple the actually important library code from runtime environment details which might be better implemented outside the C/C++ stdlib.
It's already as simple as the stdlib IO functions not being asynchronous while many operating systems provide more modern alternatives. For a specific type of library (such as an image decoder) it's often better to delegate such details to the library user instead of circumventing the stdlib and talking directly to OS APIs.
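The layering described in this comment might look roughly like the following sketch: a core API that only sees a caller-supplied read callback, plus an optional stdio convenience wrapper that can be compiled out. Every name here (`tiny_reader`, `tiny_read_header`, `TINY_NO_STDIO`, and so on) is hypothetical and not taken from any stb header, and the two-byte "format" is invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical names throughout; not the API of any real library.

// Core API: the library never opens files. The caller supplies a reader.
struct tiny_reader {
    // Copies up to `size` bytes into `out`; returns the number of bytes copied.
    std::size_t (*read)(void* user, unsigned char* out, std::size_t size);
    void* user;
};

// Example "decoder": parses an invented 2-byte header (width, height).
bool tiny_read_header(tiny_reader r, int* width, int* height) {
    assert(width && height);
    unsigned char hdr[2];
    if (r.read(r.user, hdr, sizeof hdr) != sizeof hdr) return false;
    *width = hdr[0];
    *height = hdr[1];
    return true;
}

// A reader over a memory buffer, usable on any platform.
struct tiny_mem {
    const unsigned char* data;
    std::size_t size, pos;
};

std::size_t tiny_mem_read(void* user, unsigned char* out, std::size_t size) {
    tiny_mem* m = static_cast<tiny_mem*>(user);
    std::size_t left = m->size - m->pos;
    std::size_t n = left < size ? left : size;
    std::memcpy(out, m->data + m->pos, n);
    m->pos += n;
    return n;
}

// Optional convenience layer on top of the core; define TINY_NO_STDIO to drop it.
#ifndef TINY_NO_STDIO
std::size_t tiny_stdio_read(void* user, unsigned char* out, std::size_t size) {
    return std::fread(out, 1, size, static_cast<std::FILE*>(user));
}

bool tiny_read_header_from_file(const char* path, int* width, int* height) {
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return false;
    tiny_reader r = { tiny_stdio_read, f };
    bool ok = tiny_read_header(r, width, height);
    std::fclose(f);
    return ok;
}
#endif
```

A user with an asynchronous or platform-specific IO layer would write their own callback in place of `tiny_stdio_read`, and the core decoder stays unchanged.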
- File for Divorce from LLVM
My stuff for instance:
https://github.com/floooh/sokol
...inspired by:
https://github.com/nothings/stb
But it's not so much about the build system as about requiring a separate C/C++ compiler toolchain (Rust needs this, Zig currently does not - unless the proposal is implemented).
- What C libraries do you use the most?
STB Libraries: https://github.com/nothings/stb
- [Noob Question] How do C programmers get around not having hash maps?
stb_ds is also very popular.
- Is there an existing multidimensional hash table implementation in C?
What are some alternatives?
taichi - Productive, portable, and performant GPU programming in Python.
Vcpkg - C++ Library Manager for Windows, Linux, and MacOS
futhark - :boom::computer::boom: A data-parallel functional programming language
imgui-node-editor - Node Editor built using Dear ImGui
Image-Convolutaion-OpenCL
ZXing - ZXing ("Zebra Crossing") barcode scanning library for Java, Android
TensorOperations.jl - Julia package for tensor contractions and related operations
freetype-gl - OpenGL text using one vertex buffer, one texture and FreeType
triton - Development repository for the Triton language and compiler
ImageMagick - 🧙‍♂️ ImageMagick 7
ponyc - Pony is an open-source, actor-model, capabilities-secure, high performance programming language
Cppcheck - static analysis of C/C++ code