Doesn't RISC-V have an add-with-carry instruction as part of the vector extension? I see it listed here: https://github.com/riscv/riscv-v-spec/releases/tag/v1.0
Flamewars are definitely not expected - they're against the rules and something we try to dampen in every way we know.
https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...
https://news.ycombinator.com/newsguidelines.html
Cortex A78 and Neoverse N1 have 4-wide decode.
ARM uses compressed encoding in their 32-bit A-series CPUs, for example the Cortex A7, A15 and so on. The A15 is pretty fast, running at up to 2.5 GHz. It was used in phones such as the Galaxy S4 and Note 3 back before 64-bit became a selling point.
Several organisations are making wide RISC-V implementations. Most of them aren't disclosing what they are doing, but one has actually published details of how its 4-8 wide RISC-V decoder works: it decodes 16 bytes of code at a time, which is 4 instructions if they are all 32-bit instructions, 8 instructions if they are all 16-bit instructions, and somewhere in between for a mix.
https://github.com/MoonbaseOtago/vroom
Everything is there, in the open, including the GPL-licensed SystemVerilog source code. It's not complex. The decode scheme is modular and extensible to as wide as you want, with no increase in complexity, just slightly longer latency.
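To make the 4-to-8 arithmetic concrete, here is a rough software sketch of finding instruction boundaries in a 16-byte window. It is not taken from the VRoom source and says nothing about how the hardware does it in parallel; the function name `instruction_starts` is made up for illustration. It relies only on the standard RISC-V length rule (low two bits of the first 16-bit parcel equal to `11` means a 32-bit instruction, anything else is a 16-bit compressed one) and assumes no encodings longer than 32 bits appear.

```python
def instruction_starts(window: bytes) -> list[int]:
    """Return the byte offsets at which instructions start in a fetch window.

    Assumption: only 16-bit (compressed) and 32-bit encodings are present.
    """
    starts = []
    offset = 0
    while offset + 2 <= len(window):
        # RISC-V code is little-endian, so the low two bits of the
        # instruction live in the first byte of the parcel.
        length = 4 if (window[offset] & 0b11) == 0b11 else 2
        if offset + length > len(window):
            break  # instruction straddles the end of the window
        starts.append(offset)
        offset += length
    return starts


NOP = bytes([0x13, 0x00, 0x00, 0x00])  # addi x0, x0, 0 (32-bit)
C_NOP = bytes([0x01, 0x00])            # c.nop (16-bit)

all_32 = NOP * 4
all_16 = C_NOP * 8
mixed = C_NOP + NOP + C_NOP + NOP + C_NOP + C_NOP

print(len(instruction_starts(all_32)))  # 4 instructions
print(len(instruction_starts(all_16)))  # 8 instructions
print(instruction_starts(mixed))        # 6 instructions in 16 bytes
```

This loop is sequential, which is exactly what a wide hardware decoder has to avoid; it only shows why one 16-byte window holds between 4 and 8 instructions.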
There are practical limits to how wide is useful, not because you can't build it, but because most code has a branch every 5 or 6 instructions on average. You can build a 20-wide machine if you want; it just won't be any faster, because most of the code you'll be executing can't keep that width busy.
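A back-of-the-envelope way to see the diminishing returns, under a deliberately crude model that is my assumption and not something from the comment above: treat each instruction as independently being a branch with probability `p`, and assume a fetch group ends at the first branch. The expected number of useful instructions in a `W`-wide group is then `(1 - (1 - p)**W) / p`, which can never exceed `1 / p`. Real cores predict branches and can fetch past not-taken ones, so this is pessimistic; the function name `expected_useful` is hypothetical.

```python
def expected_useful(width: int, p_branch: float) -> float:
    """Expected instructions used per fetch group of `width` slots.

    Toy model (assumption): each instruction is a branch with independent
    probability `p_branch`, and the group ends at the first branch.
    """
    # Slot k is used only if none of the previous k-1 instructions was a
    # branch, so the expectation is a geometric sum over the slots.
    return (1.0 - (1.0 - p_branch) ** width) / p_branch


# One branch every 6 instructions on average.
p = 1 / 6
for width in (4, 8, 20):
    print(width, round(expected_useful(width, p), 2))
```

Under this model the useful width approaches `1 / p` (here 6) no matter how many decode slots are added.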