Singeli
tinygrad
Singeli | tinygrad | |
---|---|---|
7 | 58 | |
92 | 17,800 | |
- | - | |
9.1 | 9.7 | |
2 months ago | 10 months ago | |
C | Python | |
ISC License | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Singeli
- Singeli: High-level interface for low-level programming
-
YAML Parser for Dyalog APL
I don't put a lot of stock in the "write-only" accusation. I think it's mostly used by those who don't know APL because, first, it's clever, and second, they can't read the code. However, if I remember I implemented something in J 10 years ago, I will definitely dig out the code because that's the fastest way by far for me to remember how it works.
This project specifically looks to be done in a flat array style similar to Co-dfns[0]. It's not a very common way to use APL. However, I've maintained an array-based compiler [1] for several years, and don't find that reading is a particular difficulty. Debugging is significantly easier than a scalar compiler, because the computation works on arrays drawn from the entire source code, and it's easy to inspect these and figure out what doesn't match expectations. I wrote most of [2] using a more traditional compiler architecture and it's easier to write and extend but feels about the same for reading and small tweaks. See also my review [3] of the denser compiler and precursor Co-dfns.
As for being read by others, short snippets are definitely fine. Taking some from the last week or so in the APL Farm, {⍵÷⍨+/|-/¯9 ¯11+.○?2⍵2⍴0} and {(⍸⍣¯1+\⎕IO,⍺)⊂[⎕IO]⍵} seemed to be easily understood. Forum links at [4]; the APL Orchard is viewable without signup and tends to have a lot of code discussion. There are APL codebases with many programmers, but they tend to be very verbose with long names. Something like the YAML parser here with no comments and single-letter names would be hard to get into. I can recognize, say, that c⌿¨⍨←(∨⍀∧∨⍀U⊖)∘(~⊢∊LF⍪WS⍨)¨c trims leading and trailing whitespace from each string in a few seconds, but in other places there are a lot of magic numbers so I get the "what" but not the "why". Eh, as I look over it things are starting to make sense, could probably get through this in an hour or so. But a lot of APLers don't have experience with the patterns used here.
[0] https://github.com/Co-dfns/Co-dfns
[1] https://github.com/mlochbaum/BQN/blob/master/src/c.bqn
[2] https://github.com/mlochbaum/Singeli/blob/master/singeli.bqn
[3] https://mlochbaum.github.io/BQN/implementation/codfns.html
[4] https://aplwiki.com/wiki/Chat_rooms_and_forums
- Singeli: A DSL for building SIMD algorithms
-
Tolower() in Bulk at Speed
Here's an AVX-2 implementation that assumes it can read up to 31 bytes past the end of the input: https://godbolt.org/z/P7PP1MnK7
Requires -fno-unroll-loops as otherwise clang gets overly unroll-y; the code is fast enough. Tail is dealt with by blending the originally read value with the new one.
(yes, that's autogenerated; from some https://github.com/mlochbaum/singeli code)
-
Jd
It's not ideal, but I've done this in BQN and it took about 15 lines. I didn't need to handle comments or escapes, which would add a little complexity. See functions ParseXml and ParseAttr here: https://github.com/mlochbaum/Singeli/blob/master/data/iintri...
XML is particularly simple though, dealing with something like JPEG would be an entirely different experience.
tinygrad
- tinygrad: extreme simplicity, easiest framework to add new accelerators to
-
GGML – AI at the Edge
Might be a silly question but is GGML a similar/competing library to George Hotz's tinygrad [0]?
[0] https://github.com/geohot/tinygrad
-
Render neural network into CUDA/HIP code
at first glance i thought may its like tinygrad. but looks has many ops than that tiny grad but most maps to underlying hardware provided ops?
i wonder how well tinygrad's apporach will work out, ops fusion sounds easy, just a walk a graph, pattern match it and lower to hardware provided ops?
Anyway if anyone wants to understand the philosophy behind tinygrad, this file is great start https://github.com/geohot/tinygrad/blob/master/docs/abstract...
-
llama.cpp now officially supports GPU acceleration.
There are currently at least 3 ways to run llama on m1 with GPU acceleration. - mlc-llm (pre-built, only 1 model has been ported) - tinygrad (very memory efficient, not that easy to integrate into other projects) - llama-mps (original llama codebase + llama adapter support)
- George Hotz building an AMD competitor to Nvidia.
-
George Hotz ROCm adventures
Hopefully we will see now full support with AMD hardware on https://github.com/geohot/tinygrad. You can read more about it on https://tinygrad.org/
-
The Coming of Local LLMs
tinygrad
https://github.com/geohot/tinygrad/tree/master/accel/ane
But I have not tested it on Linux since Asahi has not yet added support.
llama.cpp runs at 18ms per token (7B) and 200ms per token (65B) without quantization.
- Everything we know about Apple's Neural Engine
- Everything we know about the Apple Neural Engine (ANE)
- How 'Open' Is OpenAI, Really?
What are some alternatives?
emojicode - 😀😜🔂 World’s only programming language that’s bursting with emojis
Pytorch - Tensors and Dynamic neural networks in Python with strong GPU acceleration
data_jd - Jd
llama.cpp - LLM inference in C/C++
rust - Empowering everyone to build reliable and efficient software.
openpilot - openpilot is an open source driver assistance system. openpilot performs the functions of Automated Lane Centering and Adaptive Cruise Control for 250+ supported car makes and models.
BQN-autograd - Autograd library in BQN using (generalized) dual numbers
llama - Inference code for Llama models
CBQN - a BQN implementation in C
tensorflow_macos - TensorFlow for macOS 11.0+ accelerated using Apple's ML Compute framework.
highway - Performance-portable, length-agnostic SIMD with runtime dispatch
GPTQ-for-LLaMa - 4 bits quantization of LLaMA using GPTQ