Why LuaJIT's interpreter is written in assembly

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • LuaJIT

    Mirror of the LuaJIT git repository

  • > There is nothing to add to it.

    I'm not sure that's true. Maybe LuaJIT was never going to add the features it's missing from Lua 5.2, 5.3 and 5.4. However, when Mike Pall stepped back in 2015 [0], he had still been planning to further improve the implementation - for example with a new garbage collector [1] and "hyperblock scheduling" [2] (which remain unimplemented), plus 64-bit pointer support (which was eventually completed by other people).

    [0] https://www.freelists.org/post/luajit/Looking-for-new-LuaJIT...

    [1] http://wiki.luajit.org/New-Garbage-Collector

    [2] https://github.com/LuaJIT/LuaJIT/issues/37

  • luajit2

    OpenResty's Branch of LuaJIT 2

  • https://github.com/openresty/luajit2

    It has a few extras, but they agree with the original LuaJIT author's opinion that not every Lua 5.2 feature can be supported in the JIT.

  • hexdump

    hexdump.c: Single file C library implementation of the BSD command-line utility (by wahern)

  • It's not just as fast, but the difference is much smaller than it used to be. For example, my hexdump (https://github.com/wahern/hexdump) clone compiles format specifications to bytecode for a tiny built-in VM. If I compile with VM_FASTER=1 (the default if __GNUC__ is defined), I still get an ~10% speed-up on an Intel Core i9 (2019 MacBook Pro) over compiling with VM_FASTER=0. Interestingly, I only get an ~5% speed-up on an M1 (2020 MacBook Air), and on my AMD EPYC 3251 an ~2% slow-down unless I compile with -march=native, in which case there is indeed no appreciable difference.

    Note that I'm rounding speed-up percentages down. I'm using Apple clang 12.0.0 on the i9 and M1, and GCC 9.3.0 on the EPYC, running `LC_ALL=C time ./hexdump -C /dev/null`. Switching between -O2 and -O3 doesn't seem to change the relative gains; ditto for -march=native, except on the EPYC.

    I don't have a pre-Haswell box around to test, but IIRC the difference used to be 30% or greater.

    Also, AFAIU GCC and Clang have improved their switch statement support. For many years they didn't actually optimize switch statements as well as the literature suggested or as students were taught, because aggressive switch optimizations caused too many performance regressions in real-world applications. Only in the past few years do they seem to have figured out how to apply those optimizations more aggressively while avoiding regressions.
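    To make the comparison above concrete, here is a minimal sketch of the two dispatch styles presumably being toggled by VM_FASTER: portable switch dispatch versus GCC/Clang's computed-goto ("labels as values") dispatch, where each opcode handler ends in its own indirect jump and so gives the branch predictor more context. The opcodes and VM below are invented for illustration and are not taken from hexdump.c.

    ```c
    #include <stdio.h>

    enum { OP_PUSH, OP_ADD, OP_HALT };

    /* Run a tiny stack-machine program and return the top of stack at halt.
       The opcode set here is hypothetical, purely to contrast dispatch styles. */
    static int run(const int *code) {
        int stack[16], sp = 0;
    #if defined(__GNUC__) && defined(VM_FASTER)
        /* Computed-goto dispatch: one indirect jump per handler, a GNU C
           extension ("labels as values"), hence the __GNUC__ guard. */
        static void *dispatch[] = { &&op_push, &&op_add, &&op_halt };
        #define NEXT goto *dispatch[*code++]
        NEXT;
    op_push: stack[sp++] = *code++; NEXT;
    op_add:  sp--; stack[sp-1] += stack[sp]; NEXT;
    op_halt: return stack[sp-1];
        #undef NEXT
    #else
        /* Portable switch dispatch: every iteration funnels through one
           shared indirect jump (or comparison tree) at the switch. */
        for (;;) {
            switch (*code++) {
            case OP_PUSH: stack[sp++] = *code++; break;
            case OP_ADD:  sp--; stack[sp-1] += stack[sp]; break;
            case OP_HALT: return stack[sp-1];
            }
        }
    #endif
    }

    int main(void) {
        /* Computes 2 + 3. */
        const int prog[] = { OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_HALT };
        printf("%d\n", run(prog)); /* prints 5 */
        return 0;
    }
    ```

    Both paths compute the same result; the interesting part is the generated dispatch code, which is what the VM_FASTER=0/1 timings above are measuring.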
