| | sse2neon | stb |
|---|---|---|
| | 7 | 164 |
| Stars | 1,224 | 25,128 |
| Growth | 1.2% | - |
| Activity | 7.3 | 6.4 |
| | 16 days ago | 3 days ago |
| Language | C++ | C |
| License | MIT License | GNU General Public License v3.0 or later |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
sse2neon
- sse2neon - A C/C++ header file that converts Intel SSE intrinsics to Aarch64 NEON intrinsics
-
Porting Architecture Specific C/C++ Intrinsics to Graviton
The sse2neon project is a quick way to get C/C++ applications compiling and running on Graviton. The sse2neon header file provides NEON implementations for x64 intrinsics so no source code changes are needed. Each function call (intrinsic) is simply replaced with NEON instructions and will just work on Graviton.
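As a rough illustration of the drop-in idea described above, the sketch below selects the intrinsics header per target and leaves the SSE2 calls themselves unchanged. The helper name `add4_i32` and the `HAVE_SSE2_API` macro are made up for this example, and it assumes `sse2neon.h` has been copied next to the source file when building for AArch64; if no suitable header is found it falls back to a plain loop.

```c
#include <assert.h>
#include <stdint.h>

/* Choose the header that provides the _mm_* intrinsics for this target. */
#if defined(__SSE2__) || defined(_M_X64)
#  include <emmintrin.h>          /* native SSE2 on x86 */
#  define HAVE_SSE2_API 1         /* hypothetical macro, local to this sketch */
#elif defined(__aarch64__) && defined(__has_include)
#  if __has_include("sse2neon.h")
#    include "sse2neon.h"         /* same _mm_* names, implemented with NEON */
#    define HAVE_SSE2_API 1
#  endif
#endif

/* Hypothetical helper: adds four 32-bit integers lane by lane. */
void add4_i32(const int32_t *a, const int32_t *b, int32_t *out) {
    assert(a && b && out);
#ifdef HAVE_SSE2_API
    /* Identical source on x86 and (via sse2neon) on AArch64. */
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_add_epi32(va, vb));
#else
    /* Scalar fallback when neither header is available. */
    for (int i = 0; i < 4; ++i) out[i] = a[i] + b[i];
#endif
}
```

Only the include block differs between platforms; the body of `add4_i32` is the part that stays untouched when porting.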
-
An AWS Community Builder Story
To continue our collaboration, I contributed some small changes to KasmVNC on GitHub to use sse2neon for a performance-critical part of the application, which uses SSE intrinsics and needed to be changed to NEON intrinsics.
-
Deserializing JSON Fast
I think the talk is very clearly laid out as an incremental journey, and each stepping stone involves contextual decision-making. I don't think Andreas is saying "you must end up with the SSE2 implementation at the end". Using machine-specific intrinsics is another dependency decision very similar to deciding to use a given library. I would have loved the talk and probably still thought of it and posted it, even if it ended before the intrinsics (but I think he does an excellent job at that part too).
And porting SSE2 to Neon is actually pretty easy -- if you use https://github.com/DLTcollab/sse2neon, IME it's very easy to do incrementally (or avoid or postpone indefinitely, depending on your needs).
-
PortableGL: An MIT licensed implementation of OpenGL 3.x-ish in clean C
I have a private cross-platform port, I’m waiting on the resolution of his latest GitHub issue to submit my changes. sse2neon (https://github.com/DLTcollab/sse2neon) was a big help - I also wrote a very primitive sse2scalar for Raspbian builds where NEON is unavailable. Honestly SIMD doesn’t help much, as you’re usually memory bound under SWGL. The biggest perf win is any amount of asynchronous execution - running off the main thread is good enough and could be applied to your library externally through a command buffer without any changes to your code.
-
Success porting VCV into aarch64 linux! (Usable on Android Devices)
You should go to /include/simd and download sse2neon.h into the folder. Replace the x86 intrinsics include appearing in any source files in that directory with "sse2neon.h". You will still encounter errors; remove the lines causing problems, typically containing the phrase ZERO_MODE. ARM processors do not require it.
stb
- Lessons learned about how to make a header-file library (2013)
-
Nebula is an open-source and free-to-use modern C++ game engine
Have you considered not using an engine at all, in favor of libraries? There are many amazing libraries I've used for game development - all in C/C++ - that you can piece together:
* General: [stb](https://github.com/nothings/stb)
- STB: Single-file public domain libraries for C/C++
-
Writing a TrueType font renderer
Great to see more accessible references on font internals. I dabbled in this a bit last year and managed to write a parser and render the points of a glyph's contour (I stopped before Bezier and shape filling stuff). I still have not considered hinting, so it's nice that it's covered. What helped me was an article from the Handmade Network [1] and the source of stb_truetype [2] (also used in Dear ImGui).
[1] https://handmade.network/forums/articles/t/7330-implementing....
[2] https://github.com/nothings/stb/blob/master/stb_truetype.h
-
Capturing the WebGPU Ecosystem
So I read through the materials on mesh shaders and work graphs and looked at sample code. These won't really work (see below). As I implied previously, it's best to research/discuss these sorts of matters with professional graphics programmers who have experience actually using the technologies under consideration.
So for the sake of future web searchers who discover this thread: there are only two proven ways to efficiently draw thousands of unique textures of different sizes with a single draw call that are actually used by experienced graphics programmers in production code as of 2023.
Proven method #1: Pack these thousands of textures into a texture atlas.
Proven method #2: Use bindless resources, which is still fairly bleeding edge, and will require fallback to atlases if targeting the PC instead of only high end console (Xbox Series S|X...).
Mesh shaders by themselves won't work: These have similar texture access limitations to the old geometry/tessellation stage they improve upon. A limited, fixed number of textures still must be bound before each draw call (say, 16 or 32 textures, not 1000s), unless bindless resources are used. So mesh shaders must be used with an atlas or with bindless resources.
Work graphs by themselves won't work: This feature is bleeding edge shader model 6.8 whereas bindless resources are SM 6.6. (Xbox Series X|S might top out at SM 6.7, I can't find an authoritative answer.) It looks like work graphs might only work well on NVIDIA GPUs and won't work well on Intel GPUs anytime soon (but, again, I'm not knowledgeable enough to say this authoritatively). Furthermore, this feature may have a hard dependency on using bindless to begin with. That is, I can't tell if one is allowed to execute a work graph that binds and unbinds individual texture resources. And if one could do such a thing, it would certainly be slower than using bindless. The cost of bindless is paid "up front" when the textures are uploaded.
Some programmers use Texture2DArray/GL_TEXTURE_2D_ARRAY as an alternative to atlases but two limitations are (1) the max array length (e.g. GL_MAX_ARRAY_TEXTURE_LAYERS) might only be 256 (e.g. for OpenGL 3.0), (2) all textures must be the same size.
Finally, for the sake of any web searcher who lands on this thread in the years to come, to pack an atlas well a good packing algorithm is needed. It's harder to pack triangles than rectangles but triangles use atlas memory more efficiently and a good triangle packing will outperform the fancy new bindless rendering. Some open source starting points for packing:
https://github.com/nothings/stb/blob/master/stb_rect_pack.h
https://github.com/ands/trianglepacker
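To make the atlas idea from the post above a little more concrete, here is a minimal sketch of a "shelf" rectangle packer: rectangles are placed left to right in rows, and a new row starts when the current one is full. This is deliberately simpler than the packers linked above and is not their algorithm; the names `atlas_rect` and `shelf_pack` are invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical rectangle record: w/h are inputs, x/y/placed are outputs. */
typedef struct {
    int w, h;
    int x, y;
    int placed;
} atlas_rect;

/* Places rects in input order on horizontal shelves inside an
   atlas_w x atlas_h atlas. Returns how many rects were placed. */
int shelf_pack(atlas_rect *rects, int count, int atlas_w, int atlas_h) {
    int x = 0, y = 0, shelf_h = 0, placed = 0;
    assert(rects != NULL || count == 0);
    for (int i = 0; i < count; ++i) {
        atlas_rect *r = &rects[i];
        r->placed = 0;
        if (r->w <= 0 || r->h <= 0 || r->w > atlas_w) continue;
        if (x + r->w > atlas_w) {      /* row is full: open a new shelf */
            y += shelf_h;
            x = 0;
            shelf_h = 0;
        }
        if (y + r->h > atlas_h) continue;   /* does not fit vertically */
        r->x = x;
        r->y = y;
        r->placed = 1;
        x += r->w;
        if (r->h > shelf_h) shelf_h = r->h; /* shelf is as tall as its tallest rect */
        ++placed;
    }
    return placed;
}
```

Sorting the input by height before packing usually wastes less space with this scheme, but that step is left out to keep the sketch short.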
-
Www Which WASM Works
The STB headers are mostly built like that: https://github.com/nothings/stb
You could also add an optional 'convenience API' over the lower-level flexible-but-inconvenient core API, as long as core library can be compiled on its own.
In essence it's just a way to decouple the actually important library code from runtime environment details which might be better implemented outside the C/C++ stdlib.
It's already as simple as the stdlib IO functions not being asynchronous while many operating systems provide more modern alternatives. For a specific type of library (such as an image decoder) it's often better to delegate such details to the library user instead of circumventing the stdlib and talking directly to OS APIs.
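The two ideas in this comment, a single-file library and IO delegated to the user, can be sketched roughly as follows. Everything here is hypothetical: `tinysum.h`, `TINYSUM_IMPLEMENTATION`, and the function names are invented for illustration and are not part of stb, although the "define a macro in one file to get the implementation" layout follows the general shape of single-header libraries. The header contents are shown inline so the example fits in one file.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* In exactly one .c file, define this before including the header. */
#define TINYSUM_IMPLEMENTATION

/* ---- contents of the hypothetical tinysum.h ---- */
#ifndef TINYSUM_H
#define TINYSUM_H
/* The caller supplies the read function, so the library never touches
   stdio or OS APIs. It returns the number of bytes read; 0 ends input. */
typedef size_t (*tinysum_read_fn)(void *user, unsigned char *buf, size_t cap);
unsigned long tinysum_checksum(tinysum_read_fn read_fn, void *user);
#endif /* TINYSUM_H */

#ifdef TINYSUM_IMPLEMENTATION
unsigned long tinysum_checksum(tinysum_read_fn read_fn, void *user) {
    unsigned char buf[64];
    unsigned long sum = 0;
    size_t n;
    assert(read_fn != NULL);
    while ((n = read_fn(user, buf, sizeof buf)) > 0) {
        for (size_t i = 0; i < n; ++i) sum += buf[i];
    }
    return sum;
}
#endif /* TINYSUM_IMPLEMENTATION */
/* ---- end of tinysum.h ---- */

/* User side: an in-memory data source plugged into the callback. */
typedef struct {
    const unsigned char *data;
    size_t len;
    size_t pos;
} mem_source;

static size_t mem_read(void *user, unsigned char *buf, size_t cap) {
    mem_source *src = (mem_source *)user;
    size_t left = src->len - src->pos;
    size_t n = left < cap ? left : cap;
    if (n > 0) memcpy(buf, src->data + src->pos, n);
    src->pos += n;
    return n;
}

unsigned long checksum_of_bytes(const unsigned char *data, size_t len) {
    mem_source src = { data, len, 0 };
    return tinysum_checksum(mem_read, &src);
}
```

Swapping `mem_read` for a function that reads from a file, a socket, or an async runtime changes nothing inside the library code, which is the decoupling the comment is describing.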
-
File for Divorce from LLVM
My stuff for instance:
https://github.com/floooh/sokol
...inspired by:
https://github.com/nothings/stb
But it's not so much about the build system as about requiring a separate C/C++ compiler toolchain (Rust needs this, Zig currently does not - unless the proposal is implemented).
-
What C libraries do you use the most?
STB Libraries: https://github.com/nothings/stb
-
[Noob Question] How do C programmers get around not having hash maps?
stb_ds is also very popular.
- Is there an existing multidimensional hash table implementation in C?
What are some alternatives?
yenten-arm-miner-yespowerr16 - ARM 64 CPU miner for Yespower variant algorithms
Vcpkg - C++ Library Manager for Windows, Linux, and MacOS
KasmVNC - Modern VNC Server and client, web based and secure
imgui-node-editor - Node Editor built using Dear ImGui
simde - Implementations of SIMD instruction sets for systems which don't natively support them.
ZXing - ZXing ("Zebra Crossing") barcode scanning library for Java, Android
Tow-Boot - An opinionated distribution of U-Boot. — https://matrix.to/#/#Tow-Boot:matrix.org?via=matrix.org
freetype-gl - OpenGL text using one vertex buffer, one texture and FreeType
libsamplerate - An audio Sample Rate Conversion library
ImageMagick - 🧙♂️ ImageMagick 7
cglm - 📽 Highly Optimized 2D / 3D Graphics Math (glm) for C
Cppcheck - static analysis of C/C++ code