Ask HN: Why are there no traditional language compilers that target the JVM?

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • renjin

    JVM-based interpreter for the R language for statistical analysis.

  • There is the GraalVM Python Runtime, Renjin GCC-Bridge (for C, C++, R)...

    I feel like all of this kind of exists but it's quite esoteric "non-standard stuff" and not necessarily something sane people want in production.

    https://github.com/bedatadriven/renjin/tree/master/tools/gcc...

    https://www.graalvm.org/python/

  • asmble

    Compile WebAssembly to JVM and other WASM tools

  • Sure, compile to WASM and then use https://github.com/cretz/asmble to convert to JVM bytecode.

  • oberonc

    An Oberon-07 compiler for the JVM

  • The Oberon programming language is 37 years old. Since it is a memory-safe language, a compiler for the JVM can be written (with some workarounds); for example, see the self-hosting compiler oberonc [0].

    [0] https://github.com/lboasso/oberonc

  • tracer

    Graal-based x86 interpreter with a separate execution trace analyzer (by pekd)

  • > How do you mean, support mmap efficiently? Do you mean being able to close the mapping without a GC? If so then Panama is fixing that.

    Everything relevant that can be done with Panama in this context can already be done in a Truffle language with sun.misc.Unsafe; Sulong, for example, used it for exactly this purpose. In fact, Unsafe allowed a lot more with a much simpler API, because you really get functions for raw memory access to arbitrary addresses.

    But what's the problem anyway? For any normal compiled program, the dynamic linker will mmap the code and data into memory during startup. A standard libc memory allocator, like the one behind malloc, also uses mmap (and sbrk) internally. Some C programs also use mmap directly to map files into memory or to reserve large amounts of memory, potentially at fixed addresses and with custom protection bits. All of this requires a proper implementation of mmap in the VM if you want to run such programs, one in which accesses to unmapped or protected memory can be caught without crashing the VM. Side note: Panama does not provide this. The problem here is that the emulated address space contains only a few mapped regions with a lot of unmapped space in between, so you have to come up with a good way to implement this. You could implement the emulated memory purely in Java, but that is quite slow because you essentially recreate an MMU. For performance reasons you really have to use the hardware MMU in a smart way. It can be done in GraalVM (and was done in the GraalVM-based x86_64 interpreter [1]), but it's not obvious how to do it and it's not particularly efficient either, at least if you want to catch segfaults properly. To catch segfaults efficiently, changes to at least HotSpot would be necessary.

    What's even worse here is that it's perfectly valid for a program to register a segfault handler, then cause a segfault and catch it. A few real world programs do exactly this, including the JVM itself. You might ignore such custom signal handling from the guest program, but you definitely have to avoid VM crashes caused by such signals. And again, Panama cannot do it.

    > Yes, C can do anything and if it does stuff like trying to disassemble itself, then that will clearly fail. But then you could argue it's not really written in C.

    Sure, C programs can do anything, but the problem is that on e.g. a Linux/x86_64 system many things are allowed, certain low level hacks are necessary for performance reasons, and therefore many real world Linux/x86_64 programs do weird things internally, even if it's hidden within some library where you'll never see it. If you want to run an "average" program, you'll have to handle many such cases in your VM. Otherwise you end up with a toy VM which can run a lot of toy programs but fails at larger "real world" programs.

    You can do what Sulong does and say "I don't care, I'll just pass malloc/free/mmap/... calls directly to the OS", but then you run into various problems: for example, you'll be unable to properly sandbox memory, so the guest program can easily crash the VM. You can also do what Sulong in GraalVM Enterprise does and say "we don't support certain features like mmap", but then a lot of interesting real world programs won't run. Or you can do what the x86_64 interpreter does and properly (although with reduced performance) emulate all these features, but then you end up building a Java implementation of e.g. the Linux kernel.

    In case you wonder, the x86_64 interpreter I mentioned started as a tech demo to show that you can in fact emulate x86_64 with a limited Linux userspace in a fully sandboxed and cross-platform way and with somewhat decent performance on the GraalVM. It even supported Truffle interop with standard Linux .so libraries in the past. Of course it also showed some limitations of the Graal compiler and Truffle, after all that was the entire point of the project. Don't expect Sulong-like peak performance, it's much slower than that. One of the more interesting findings was that machine code emulation with Graal is feasible and works even for larger real world programs like GCC or xpdf or CPython and peak performance can (or at least could at some point in the past) somewhat compete with qemu which uses a hand crafted JIT compiler. This was especially interesting since machine code is the worst imaginable "language" for Graal and Graal is absolutely not built for this.

    [1] https://github.com/pekd/tracer/tree/master/vmx86
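The "emulated memory purely in Java" option from the comment above can be illustrated with a minimal sketch. It is not taken from Sulong or the x86_64 interpreter; the class name `SoftMmu`, the `GuestSegfault` exception, the `PROT_*` constants and the 4 KiB page size are all assumptions made for illustration. It shows a sparse guest address space where every load and store goes through a page-table lookup, so an access to unmapped or protected memory becomes a Java exception instead of a VM crash.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical software MMU: a sparse guest address space backed by Java byte arrays. */
final class SoftMmu {
    static final int PAGE_SHIFT = 12;               // assumed 4 KiB pages
    static final int PAGE_SIZE = 1 << PAGE_SHIFT;
    static final int PROT_READ = 1;                 // illustrative protection bits
    static final int PROT_WRITE = 2;

    /** Hypothetical stand-in for a guest SIGSEGV; the host VM keeps running. */
    static final class GuestSegfault extends RuntimeException {
        final long address;
        GuestSegfault(long address) {
            super("guest segfault at 0x" + Long.toHexString(address));
            this.address = address;
        }
    }

    private static final class Page {
        final byte[] data = new byte[PAGE_SIZE];
        final int prot;
        Page(int prot) { this.prot = prot; }
    }

    // Page number -> page. Only mapped pages exist, so the large unmapped
    // gaps between regions cost nothing.
    private final Map<Long, Page> pages = new HashMap<>();

    /** Maps zero-filled pages covering [address, address + length), replacing existing ones. */
    void map(long address, long length, int prot) {
        if ((address & (PAGE_SIZE - 1)) != 0 || length <= 0) {
            throw new IllegalArgumentException("address must be page-aligned, length positive");
        }
        long first = address >>> PAGE_SHIFT;
        long count = (length + PAGE_SIZE - 1) >>> PAGE_SHIFT;
        for (long i = 0; i < count; i++) {
            pages.put(first + i, new Page(prot));
        }
    }

    /** Removes every page that overlaps [address, address + length). */
    void unmap(long address, long length) {
        long first = address >>> PAGE_SHIFT;
        long last = (address + length - 1) >>> PAGE_SHIFT;
        for (long p = first; p <= last; p++) {
            pages.remove(p);
        }
    }

    byte load(long address) {
        return lookup(address, PROT_READ).data[(int) (address & (PAGE_SIZE - 1))];
    }

    void store(long address, byte value) {
        lookup(address, PROT_WRITE).data[(int) (address & (PAGE_SIZE - 1))] = value;
    }

    // Every single access pays for this lookup and permission check, which is
    // the work a hardware MMU would otherwise do for free.
    private Page lookup(long address, int needed) {
        Page page = pages.get(address >>> PAGE_SHIFT);
        if (page == null || (page.prot & needed) == 0) {
            throw new GuestSegfault(address);
        }
        return page;
    }
}
```

A caller would `map` a region, use `load` and `store` for guest accesses, and catch `GuestSegfault` where a guest signal handler would be dispatched. The per-access lookup is the overhead the comment refers to when it says this approach essentially recreates an MMU.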
