A64FX
PurefunctionPipelineDataflow
 | A64FX | PurefunctionPipelineDataflow
---|---|---
 | 7 | 172
Stars | 435 | 439
Growth | 0.5% | -
Activity | 2.8 | 7.4
 | 6 months ago | 17 days ago
 | - | -
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
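The page does not publish how these numbers are computed, so the following is only a sketch of how such metrics could work. The exponential decay, the 30-day half-life, the linear percentile-to-score mapping, and all function names are assumptions made for illustration; only the definitions ("month over month growth in stars", "recent commits have higher weight", "9.0 means top 10%") come from the text above.

```python
def month_over_month_growth(stars_prev: int, stars_now: int) -> float:
    """Percentage change in stars between two monthly snapshots."""
    if stars_prev <= 0:
        raise ValueError("previous star count must be positive")
    return 100.0 * (stars_now - stars_prev) / stars_prev


def weighted_commit_score(commit_ages_days, half_life_days: float = 30.0) -> float:
    """Sum of per-commit weights, where a commit's weight halves every
    `half_life_days` (assumed decay; the real weighting is not documented)."""
    return sum(0.5 ** (age / half_life_days) for age in commit_ages_days)


def activity_from_rank(fraction_less_active: float) -> float:
    """Map the fraction of tracked projects that are less active to a 0-10
    score (assumed linear mapping, chosen so that 0.90 -> 9.0)."""
    if not 0.0 <= fraction_less_active <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return round(10.0 * fraction_less_active, 1)
```

Under these assumptions, a commit made 17 days ago contributes more to the score than one made 6 months ago, and a project more active than 90% of tracked projects gets an activity of 9.0.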
A64FX
-
How about HPC on ARM?
This is their main repo https://github.com/fujitsu/A64FX
-
AMD-Powered Frontier Supercomputer Breaks the Exascale Barrier, Now Fastest in the World
You should check the architecture manual https://github.com/fujitsu/A64FX/blob/master/doc/A64FX_Microarchitecture_Manual_en_1.0.pdf
-
How many x86 instructions are there?
> I'm somewhat curmudgeonly w.r.t. SVE, insisting that while the sole system in existence is an HPC machine from Fujitsu, for practical purposes it doesn't really exist and isn't worth learning. I will likely revise this opinion when ARM vendors decide to ship something (likely soon, by most roadmaps).
Fair enough. I have high hopes for SVE, though. The first-faulting memory ops and predicate bisection features look like a vectorization godsend.
> There's only so much space in my brain.
I'm still going to attempt a nerd-sniping with the published architecture manual. Fujitsu includes a detailed pipeline description including instruction latencies. Granted, it's just one part, and it's an HPC-focused part at that. But it's not every day that this level of detail gets published in the ARM world.
https://github.com/fujitsu/A64FX/tree/master/doc
> I was irate to discover that you can't do logic ops on 8b/16b lanes with masking; as usual the 32b/64b mafia strike again.
SVE is blessedly uniform in this regard.
> It would be nice if the explicit mask operations were cheaper. Unfortunately, they crowd out SIMD operations.
This goes both ways, though. A64FX has two vector execution pipelines and one dedicated predicate execution pipeline. Since the vector pipelines cannot execute predicate ops, I expect it is not difficult to construct cases where code gets starved for predicate execution resources.
-
“Is Parallel Programming Hard, and, If So, What Can You Do About It?” v2 Is Out
The A64FX also has hardware synchronization barriers to synchronize cores, which is a pretty GPU-like thing (at least it is very common on GPUs, and rare on CPUs).
https://github.com/fujitsu/A64FX/blob/master/doc/A64FX_Speci...
- Fujitsu A64FX Microarchitecture Manual [pdf]
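The comment above about A64FX having two vector execution pipelines and one dedicated predicate pipeline can be made concrete with a toy throughput model. This is a rough sketch, not a description of the real core: it assumes every operation issues in one cycle, ignores latencies and dependencies, and assumes vector ops cannot run on the predicate pipeline (the source only states the reverse). The function names are hypothetical; the published microarchitecture manual has the actual figures.

```python
from math import ceil

VECTOR_PIPES = 2     # from the comment: two vector execution pipelines
PREDICATE_PIPES = 1  # from the comment: one dedicated predicate pipeline


def min_issue_cycles(vector_ops: int, predicate_ops: int) -> int:
    """Lower bound on cycles to issue a loop body under the toy model:
    each pipeline class drains its own ops, and the slower class wins."""
    vec_cycles = ceil(vector_ops / VECTOR_PIPES)
    pred_cycles = ceil(predicate_ops / PREDICATE_PIPES)
    return max(vec_cycles, pred_cycles)


def bottleneck(vector_ops: int, predicate_ops: int) -> str:
    """Name the pipeline class that limits throughput in the toy model."""
    vec_cycles = ceil(vector_ops / VECTOR_PIPES)
    pred_cycles = ceil(predicate_ops / PREDICATE_PIPES)
    if pred_cycles > vec_cycles:
        return "predicate"
    if vec_cycles > pred_cycles:
        return "vector"
    return "balanced"
```

In this model a loop body with 8 vector ops and 2 predicate ops is vector-bound, while one with 4 vector ops and 6 predicate ops is limited by the single predicate pipeline, which is the starvation case the comment anticipates.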
PurefunctionPipelineDataflow
-
Goodbye, Clean Code
Implement relational data model and programming based on hash-map (NoSQL)
-
How can I learn functional programming?
The Math-based Grand Unified Programming Theory: The Pure Function Pipeline Data Flow with Principle-based Warehouse/Workshop Model
-
Does Intel have an answer (or developing one) for AMDs Infinity Fabric?
I criticized "AMD Infinity Fabric Architecture" at the end of my article "Prediction: Intel will use "RISC-V plus x86 compatibility layer" or "RISC-V plus x86 heterogeneous computing architecture" to develop a new generation of "warehouse/workshop model" CPU".
- The Math-based Grand Unified Programming Theory: The Pure Function Pipeline Data Flow with principle-based Warehouse/Workshop Model
-
What should I do to defend my rights if the architecture of the Apple M1 chip is plagiarized from my theory and architecture?
What's more, you're being somewhat liberal with your "invention" dates here anyway. I'm sure you realize that anyone can review the commit history to see when content was added to the repo. As of Nov 2020, the day Apple announced a fully operational and tested, ready-to-ship silicon package, the repo was just a series of bullet points listing out well-known concepts of functional programming, sprinkled with some religious analogies and inspirational quotes. The farther back you go in the repo commit history, the less content is there.
- Apple M1 Ultra's architecture is a mistake, and Why Apple is not the creator of the M1 architecture? (with comment from chip designer who have worked at Apple for decades)
- M1 Ultra's architecture is a mistake, and Why Apple is not the creator of the M1 architecture? (with comment from chip designers who have worked at Apple for decades)
What are some alternatives?
concurrencpp - Modern concurrency for C++. Tasks, executors, timers and C++20 coroutines to rule them all
gophernotes - The Go kernel for Jupyter notebooks and nteract.
verona - Research programming language for concurrent ownership
clojurust - A proof of concept version of Clojure in Rust.
refterm - Reference monospace terminal renderer
BetterDummy - Unlock your displays on your Mac! Smooth scaling, HiDPI unlock, XDR/HDR extra brightness upscale, DDC, brightness and dimming, dummy displays, PIP and lots more! [Moved to: https://github.com/waydabber/BetterDisplay]
chromium - The official GitHub mirror of the Chromium source
rss-proxy - RSS-proxy allows you to create an RSS or ATOM feed of almost any website, just by analyzing the static HTML structure.
fa - Lin Pengcheng Financial Analyser Homepage (林鹏程财务分析软件)
Lark - Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity.
penpot - Penpot: The open-source design tool for design and code collaboration
clasp - clasp Common Lisp environment