-
Interesting read. I have a lot of respect for people who develop emulators for x86 processors. It is a complicated processor, and from first-hand experience I know that developing and debugging CPU emulators can be very challenging. In the past year, I spent some time developing a very limited i386 emulator [1], including some system calls, to execute the first steps of live-bootstrap [2], primarily to figure out how it works. I learned a lot about system calls and ELF.
[1] https://github.com/FransFaase/Emulator/
[2] https://github.com/fosslinux/live-bootstrap/
-
Over the last year I have been rewriting QEMU's x86 decoder. I am now at a point where it should not be too hard to add APX support.
My decoder is mostly based on the tables in the manual, and the code is mostly okay: not too much indentation, and the phases are mostly easy to separate and identify. Nevertheless, there are several cases in which the manual is wrong or doesn't tell the whole story.
The top comment explains a bit what's going on: https://github.com/qemu/qemu/blob/59084feb256c617063e0dbe7e6...
-
Thanks for the pointer to QEMU's decoder! I actually never looked at it before.
So you coded all the tables manually in C -- interesting, that's quite some effort. I opted to autogenerate the tables (and keep them as data only => smaller memory footprint) [1,2]. That's doable because x86 encodings are mostly fairly consistent. I can also generate an encoder from it (ok, you don't need that). Re 'custom size "xh"': AVX-512 also has fourth and eighth. Also interesting that you have a separate row for "66+F2". I special-case these two instructions (CRC32, MOVBE) with a flag.
I think the prefix decoding is not quite right for x86-64: 26/2e/36/3e are ignored in 64-bit mode, except for 2e/3e as branch-not-taken/taken hints and 3e as notrack. (See SDM Vol. 1 3.3.7.1 "Other segment override prefixes (CS, DS, ES, and SS) are ignored.") Also, REX prefixes that don't immediately precede the opcode (or VEX/EVEX prefix) are ignored. Anyhow, I need to take a closer look at the decoder when I have more time. :-)
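To make the two rules above concrete, here is a minimal sketch of a 64-bit-mode prefix scanner. It is not taken from QEMU or fadec; the `Prefixes` struct and `scan_prefixes64` are hypothetical names. It assumes that an ignored 26/2e/36/3e leaves an earlier FS/GS override in place, and it does not model branch hints, notrack, or VEX/EVEX.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type for this sketch only. */
typedef struct {
    uint8_t seg;    /* effective segment override: 0, 0x64 (FS) or 0x65 (GS) */
    uint8_t rex;    /* REX byte that takes effect, or 0 if none */
    uint8_t rep;    /* 0, 0xf2 or 0xf3 (last one seen) */
    bool opsize;    /* 66 seen */
    bool addrsize;  /* 67 seen */
    bool lock;      /* f0 seen */
    size_t len;     /* number of prefix bytes consumed */
} Prefixes;

static Prefixes scan_prefixes64(const uint8_t *buf, size_t n)
{
    Prefixes p = {0};
    size_t i = 0;

    for (; i < n; i++) {
        uint8_t b = buf[i];

        if (b >= 0x40 && b <= 0x4f) {
            /* Remember the REX byte; it only counts if nothing but the
             * opcode follows it. */
            p.rex = b;
            continue;
        }
        switch (b) {
        case 0x26: case 0x2e: case 0x36: case 0x3e:
            break;                      /* ES/CS/SS/DS: ignored in 64-bit mode */
        case 0x64: case 0x65:
            p.seg = b;                  /* FS/GS still apply */
            break;
        case 0x66: p.opsize = true;   break;
        case 0x67: p.addrsize = true; break;
        case 0xf0: p.lock = true;     break;
        case 0xf2: case 0xf3:
            p.rep = b;
            break;
        default:
            goto done;                  /* first non-prefix byte: opcode starts here */
        }
        /* A legacy prefix after a REX byte cancels that REX. */
        p.rex = 0;
    }
done:
    p.len = i;
    return p;
}
```

With this sketch, the byte sequence 48 66 8b 00 yields `rex == 0` and `opsize == true`, because the 66 prefix sits between the REX byte and the opcode.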
> For EVEX my plan is to keep the raw bits until after the opcode has been read
I came to the same conclusion that this is necessary with APX. The map+prefix+opcode combination identifies how the other fields are to be interpreted. For AVX-512, storing the last byte was sufficient, but with APX, vvvv got a second meaning.
> Nevertheless there are several cases in which the manual is wrong or doesn't say the whole story.
Yes... especially for corner cases, testing on real hardware is the only reliable way to find out how the CPU behaves.
[1]: https://github.com/aengelke/fadec/blob/master/instrs.txt
-
> Other architectures, like [...] ARMv8, are much more consistent.
From an instruction/operation perspective, AArch64 is cleaner. However, from an instruction operand and encoding perspective, AArch64 is a lot less consistent than x86. Consider the different operand types: on x86, there are a dozen register types, immediates (8/16/32/64 bits), and memory operands (always the same layout). On AArch64, there are: GP regs, incremented GP reg (MOPS extension), extended GP reg (e.g., SXTB), shifted GP reg, stack pointer, FP reg, vector register, vector register element, vector register table, vector register table element, a dozen types of memory operands, conditions, and a dozen types of immediate encodings (including the fascinating and very useful, but also very non-trivial encoding of logical immediates [1]).
AArch64 also has some register constraints: some vector operations can only encode registers 0-15 or 0-7; not to mention SVE with its "movprfx" prefix instruction that is only valid in front of a few selected instructions.
[1]: https://github.com/aengelke/disarm/blob/master/encode.c#L19-...
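To give a feel for why logical immediates are non-trivial, here is a minimal sketch of the decode direction (the linked code does the harder encode direction). It follows my reading of the DecodeBitMasks pseudocode in the Arm architecture manual for a 64-bit operand; `decode_logical_imm` is a hypothetical name, not a function from disarm.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decode the N:immr:imms fields of an AArch64 logical immediate into the
 * 64-bit value they denote. Returns false for reserved encodings.
 * Sketch for 64-bit operands; a 32-bit operand additionally requires N == 0. */
static bool decode_logical_imm(unsigned n, unsigned immr, unsigned imms,
                               uint64_t *out)
{
    /* Element size is given by the highest set bit of N:NOT(imms). */
    unsigned combined = ((n & 1u) << 6) | (~imms & 0x3fu);
    int len = -1;
    for (unsigned v = combined; v != 0; v >>= 1) {
        len++;
    }
    if (len < 1) {
        return false;                   /* no valid element size */
    }

    unsigned esize = 1u << len;         /* 2, 4, 8, 16, 32 or 64 */
    unsigned levels = esize - 1;
    unsigned s = imms & levels;         /* run of ones has length s + 1 */
    unsigned r = immr & levels;         /* rotate right amount */
    if (s == levels) {
        return false;                   /* all-ones element is reserved */
    }

    /* s + 1 ones in the low bits; s + 1 <= 63 here, so the shift is safe. */
    uint64_t elem = (UINT64_C(1) << (s + 1)) - 1;

    /* Rotate right by r within an esize-bit element. */
    if (r != 0) {
        uint64_t mask = esize == 64 ? ~UINT64_C(0)
                                    : (UINT64_C(1) << esize) - 1;
        elem = ((elem >> r) | (elem << (esize - r))) & mask;
    }

    /* Replicate the element to fill 64 bits. */
    for (unsigned size = esize; size < 64; size *= 2) {
        elem |= elem << size;
    }
    *out = elem;
    return true;
}
```

For example, N=0, immr=0, imms=0b111100 selects 2-bit elements with a single one bit, which replicates to 0x5555555555555555.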
-
Glad you like it. I used m10c, with a few tweaks: https://github.com/vaga/hugo-theme-m10c
-
> you've written an ARM disassembler
Here's my AArch64 disassembler work in progress:
https://github.com/dlang/dmd/blob/master/compiler/src/dmd/ba...
I add to it in tandem with writing the code generator. Doing this helps flush out bugs in both: generate the instruction, then disassemble it and compare the result with what I thought it should be.
It's quite a bit more complicated than the corresponding x86 disassembler:
https://github.com/dlang/dmd/blob/master/compiler/src/dmd/ba...
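The generate-then-disassemble check described above can be sketched roughly as follows. This is a toy illustration, not code from dmd: all three functions are hypothetical, and they cover only the 64-bit `ADD Xd, Xn, #imm12` form without shift, with registers 0-30 (register number 31 means SP in this encoding and is left out).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Encoder side: 64-bit ADD (immediate), sh = 0. */
static uint32_t encode_add_imm(unsigned rd, unsigned rn, unsigned imm12)
{
    return 0x91000000u | ((imm12 & 0xfffu) << 10) | ((rn & 31u) << 5)
           | (rd & 31u);
}

/* Disassembler side: recognizes only the form produced above. */
static bool disasm_add_imm(uint32_t insn, char *buf, size_t size)
{
    if ((insn & 0xffc00000u) != 0x91000000u) {
        return false;                   /* not 64-bit ADD (immediate), sh = 0 */
    }
    unsigned rd = insn & 31u;
    unsigned rn = (insn >> 5) & 31u;
    unsigned imm12 = (insn >> 10) & 0xfffu;
    if (rd == 31 || rn == 31) {
        return false;                   /* SP operands not handled in this toy */
    }
    snprintf(buf, size, "add x%u, x%u, #%u", rd, rn, imm12);
    return true;
}

/* Round trip: encode, disassemble, compare with the intended text. */
static bool roundtrip_ok(unsigned rd, unsigned rn, unsigned imm12,
                         const char *expected)
{
    char text[64];
    return disasm_add_imm(encode_add_imm(rd, rn, imm12), text, sizeof text)
           && strcmp(text, expected) == 0;
}
```

A call such as `roundtrip_ok(0, 1, 1, "add x0, x1, #1")` passes only if the encoder and the disassembler agree, so a bug in either one shows up as a mismatch.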
-
For an implementation of logical immediate encoding without the loop, see https://github.com/LuaJIT/LuaJIT/blob/04dca7911ea255f37be799...
-
It's probably no longer maintained, but a former colleague of mine did some work on this for C++: https://github.com/ainfosec/shoulder. Obviously if the docs are lying it doesn't help much, but there was another effort he had https://github.com/ainfosec/scapula that tried to automate detecting behavior differences between the docs and the hardware implementation.