The Julia language has a number of correctness flaws

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • julia

    The Julia Programming Language

    So this one is a tough one for me, because Yuri has certainly spent significant time with Julia and I think he's a very competent programmer, so his criticism is certainly to be taken seriously and I'm sad to hear he ended up with a sour opinion.

    There's a lot of different issues mentioned in the post, so I'm not really sure what angle to best go at it from, but let me give it a shot anyway. I think there's a couple of different threads of complaints here. There's certainly one category of issues that are "just bugs" (I'm thinking of things like the HTTP, JSON, etc. issues mentioned). I guess the claim is that this happens more in Julia than in other systems. I don't really know how to judge this. Not that I think that the Julia ecosystem has few bugs, just that in my experience, I basically see 2-3 critical issues whenever I try a new piece of software, independent of what language it's written in.

    I think the other thread is "It's hard to know what's expected to work". I think that's a fair criticism and I agree with Yuri that there's some fundamental design decisions that are contributing here. Basically, Julia tries very hard to make composability work, even if the authors of the packages that you're composing don't know anything about each other. That's a critical feature that makes Julia as powerful as it is, but of course you can easily end up with situations where one or the other package is making implicit assumptions that are not documented (because the author didn't think the assumptions were important in the context of their own package) and you end up with correctness issues. This one is a bit of a tricky design problem. Certainly adding more language support for interfaces and verification thereof could be helpful, but not all implicit assumptions are easily capturable in interfaces. Perhaps there needs to be more explicit documentation around what combinations of packages are "supported". Usually the best way to tell right now is to see what downstream tests are done on CI and if there are any integration tests for the two packages. If there are, they're probably supposed to work together.
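
    To make the "implicit assumptions" point concrete, here is a minimal sketch (the function names are made up for illustration, not from the post) of the most common example: code that iterates `1:length(A)` silently assumes 1-based indexing, and composing it with a package like OffsetArrays.jl breaks that assumption. Declaring the assumption, or writing index-generic code, makes the composition either fail loudly or just work:

    ```julia
    using OffsetArrays

    # Implicitly assumes 1-based indexing; throws a BoundsError for offset arrays
    # (and silently gives wrong answers if bounds checks are disabled).
    naive_mean(A::AbstractArray) = sum(A[i] for i in 1:length(A)) / length(A)

    # Option 1: declare the assumption up front, so violations fail loudly.
    function checked_mean(A::AbstractArray)
        Base.require_one_based_indexing(A)   # throws for offset axes
        return sum(A[i] for i in 1:length(A)) / length(A)
    end

    # Option 2: write index-generic code that works for any AbstractArray.
    generic_mean(A::AbstractArray) = sum(A[i] for i in eachindex(A)) / length(A)

    A = OffsetArray([1.0, 2.0, 3.0], -1:1)   # valid indices are -1, 0, 1
    generic_mean(A)    # 2.0
    # naive_mean(A)    # BoundsError; the implicit assumption does not hold
    ```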

    To be honest, I'm a bit pained by the list of issues in the blog post. I think the bugs linked here will get fixed relatively quickly by the broader community (posts like this tend to have that effect), but as I said, I do agree with Yuri that we should be thinking about some more fundamental improvements to the language to help out. Unfortunately, I can't really say that that is high priority at the moment. The way that most Julia development has worked for the past two-ish years is that there are a number of "flagship" applications that are really pushing the boundary of what Julia can do, but at the same time also need a disproportionate amount of attention. I think it's overall a good development, because these applications are justifying many people's full-time attention on improving Julia, but at the same time, the issues that these applications face (e.g. "LLVM is too slow", better observability tooling, GC latency issues) are quite different from the issues that your average open source Julia developer encounters. Pre-1.0 (i.e. in 2018) there was a good 1-2 year period where all we did was think through and overhaul the generic interfaces in the language. I think we could use another one of those efforts now, but at this precise moment, I don't think we have the bandwidth for it. Hopefully in the future, once things settle down a bit, we'll be able to do that, which would presumably be what becomes Julia 2.0.

    Lastly, some nitpicking on the HN editorialization of the title. Only one of the issues linked (https://github.com/JuliaLang/julia/issues/41096) is actually a bug in the language - the rest are various ecosystem issues. Now, I don't want to disclaim responsibility there, because a lot of those packages are also co-maintained by core Julia developers and we certainly feel responsibility to make those work well, but if you're gonna call my baby ugly, at least point at the right baby ;)

  • StatsBase.jl

    Basic statistics for Julia

    Most of these seem to be about packages in the ecosystem (which, after clicking through all the links, almost all actually got fixed in a very timely manner, sometimes in a newer version of the package than the author was using), not about the language itself. Other than that, the message of this seems to be "newer software has bugs", which, yes, is a thing..?

    For example, the majority of issues referenced are specific to a single package, StatsBase.jl - which apparently was written before OffsetArrays.jl was a thing and thus is known to be incompatible:

    > Yes, lots of JuliaStats packages have been written before offset axes existed. Feel free to make a PR adding checks.

    https://github.com/JuliaStats/StatsBase.jl/issues/646#issuec...

  • Distributions.jl

    A Julia package for probability distributions and associated functions.

  • Optimization.jl

    Mathematical Optimization in Julia. Local, global, gradient-based and derivative-free. Linear, Quadratic, Convex, Mixed-Integer, and Nonlinear Optimization in one simple, fast, and differentiable interface.

    > but would you say most packages follow or enforce SemVer?

    The package ecosystem pretty much requires SemVer. If you just say `PackageX = "1"` inside of a Project.toml [compat] section, then it will assume SemVer, i.e. any version 1.x is non-breaking and thus allowed, but not version 2. Some (but very few) packages do `PackageX = ">=1"`, so you could say Julia doesn't force SemVer (because a package can say that it explicitly believes it's compatible with all future versions), but of course that's nonsense and there will always be some bad actors around. So then:

    > Would enforcing a stricter dependency graph fix some of the foot guns of using packages or would that limit composability of packages too much?

    That's not the issue. As above, the dependency graphs are very strict. The issue is always at the periphery (for any package ecosystem, really). In Julia, one thing that can amplify it is the fact that Requires.jl, the hacky conditional dependency system that is very much not recommended for many reasons, cannot specify version requirements on conditional dependencies. I find this to be the root cause of most issues in the "flow" of the package development ecosystem. Most packages are okay, but then oh, I don't want to depend on CUDA for this feature, so a little bit of Requires.jl here, and oh, let me do a small hack for OffsetArrays. And now these little hacky features on the edge are both less tested and not well versioned.
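
    To make the [compat] behavior described above concrete, here is a minimal sketch of a Project.toml (the package names and UUIDs are placeholders, not from the discussion). A bare major version opts into SemVer bounds; a `>=` bound opts out of them. Requires.jl-style conditional dependencies get no such entry at all, which is exactly the versioning gap described.

    ```toml
    # Project.toml of a hypothetical package (names/UUIDs for illustration only)
    name = "MyPackage"
    uuid = "00000000-0000-0000-0000-000000000000"
    version = "0.3.2"

    [deps]
    PackageX = "11111111-1111-1111-1111-111111111111"

    [compat]
    PackageX = "1"      # SemVer: any 1.x is allowed, 2.0 and later are not
    # PackageX = ">=1"  # discouraged: claims compatibility with all future versions
    julia = "1.6"
    ```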

    Thankfully there's a better way to handle optional dependencies: multi-package repositories with subpackages. For example, https://github.com/SciML/GalacticOptim.jl is a global interface for lots of different optimization libraries, and you can see all of the different subpackages here https://github.com/SciML/GalacticOptim.jl/tree/master/lib. This lets there be a GalacticOptim package and then a GalacticBBO package, each with its own versioning and tests, while still allowing easy co-development of the parts. Very few packages in the Julia ecosystem actually use this (I only know of one other package making use of it) because the tooling only recently became able to support it, but this is how a lot of packages should be going.
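
    For readers unfamiliar with the layout, a rough sketch of such a multi-package repository might look like this (directory names other than GalacticBBO are illustrative); each subpackage carries its own Project.toml, version, and tests, while living alongside the umbrella package:

    ```
    GalacticOptim.jl/
    ├── Project.toml            # umbrella package, with its own version and [compat]
    ├── src/
    ├── test/
    └── lib/
        ├── GalacticBBO/
        │   ├── Project.toml    # separately versioned subpackage
        │   ├── src/
        │   └── test/
        └── AnotherSubPackage/  # illustrative name
            ├── Project.toml
            └── src/
    ```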

    The upside, too, is that Requires.jl optional dependency handling is far and away the main source of loading-time issues in Julia (because it blocks precompilation in many ways). So it's really killing two birds with one stone: decreasing package load times by about 99% (that's not even a joke; it's the huge majority of the load time for most packages that aren't StaticArrays.jl) while making version dependencies stricter. And now you know what I'm doing this week and what the next blog post will be on, haha.

  • StaticLint.jl

    Static Code Analysis for Julia

    It is correct if `A` is of type `Array`, since a normal Array in Julia has 1-based indexing. It is incorrect if `A` is of some other type that subtypes `AbstractArray`, since those may not use 1-based indexing. But that case normally errors thanks to bounds checking. The OP is talking about the case where even bounds checking is turned off using `@inbounds` for speed, which then silently gives wrong answers instead of an error.
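
    A minimal sketch of that failure mode (the helper function here is hypothetical):

    ```julia
    using OffsetArrays

    # Hypothetical helper that assumes 1-based indexing.
    function sum_first_to_length(A::AbstractArray)
        s = zero(eltype(A))
        @inbounds for i in 1:length(A)   # wrong axes for an OffsetArray
            s += A[i]
        end
        return s
    end

    A = OffsetArray([10, 20, 30], -1:1)   # valid indices are -1, 0, 1
    sum(A)                   # 60, correct
    sum_first_to_length(A)   # without @inbounds this throws a BoundsError;
                             # with it, indices 2 and 3 read out of bounds and
                             # the result is silently wrong (or worse)
    ```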

    An issue was created some time ago in StaticLint.jl to fix this: https://github.com/julia-vscode/StaticLint.jl/issues/337

  • Nim

    Nim is a statically typed compiled systems programming language. It combines successful concepts from mature languages like Python, Ada and Modula. Its design focuses on efficiency, expressiveness, and elegance (in that order of priority).

    Or Nim [1]... kind of an Ada with Lisp macros and a more Pythonesque surface syntax.

    All three are ahead-of-time compiled and less REPL-friendly, though. Taking more than 100 milliseconds to compile can be a deal breaker for some, especially in exploratory data analysis/science settings where the mindset is "try out this idea... no wait, this one... oops, I forgot a -1" and so on. In my experience, it's unfortunately hard to get scientist developers on board with "wait for this compile" workflows.

    [1] https://nim-lang.org/

  • diffrax

    Numerical differential equation solvers in JAX. Autodifferentiable and GPU-capable. https://docs.kidger.site/diffrax/

    You may already know of it, but if you want differential-equations-in-JAX then allow me to quickly advertise Diffrax: https://github.com/patrick-kidger/diffrax (of which I am the author, disclaimer).

  • clasp

    clasp Common Lisp environment (by clasp-developers)

  • Petalisp

    Elegant High Performance Computing

  • avm

    Efficient and expressive arrayed vector math library with multi-threading and CUDA support in Common Lisp.

  • Lux.jl

    Explicitly Parameterized Neural Networks in Julia

    Lots of things are being rewritten. Remember, we just released a new neural network library the other day, SimpleChains.jl, and showed that it gave about a 10x speed improvement over JAX's Equinox on modern CPUs with multithreading enabled (and 22x when AVX-512 is enabled) for smaller neural networks and matrix-vector types of cases (https://julialang.org/blog/2022/04/simple-chains/). Then there's Lux.jl fixing some major issues of Flux.jl (https://github.com/avik-pal/Lux.jl). Pretty much everything is switching to Enzyme, which improves performance quite a bit over Zygote and allows for full mutation support (https://github.com/EnzymeAD/Enzyme.jl). So parts of an entire machine learning stack are already being released.

    Right now we're in a bit of an uncomfortable spot where we have to use Zygote for a few things and then Enzyme for everything else, but the custom rules system is rather close and that's the piece that's needed to make the full transition.
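
    For readers wondering what "explicitly parameterized" means in practice, here is a rough sketch in the Lux.jl style (exact API details may vary between versions): the model object holds no weights; parameters and state are created and passed in explicitly, which keeps the model a pure function that AD tools can differentiate cleanly.

    ```julia
    using Lux, Random

    # A small model; the model object itself holds no weights.
    model = Chain(Dense(2 => 16, tanh), Dense(16 => 1))

    rng = Random.default_rng()
    ps, st = Lux.setup(rng, model)   # parameters and state are created explicitly

    x = randn(Float32, 2, 8)         # batch of 8 inputs
    y, st = model(x, ps, st)         # parameters/state passed in, new state returned
    ```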

  • Enzyme.jl

    Julia bindings for the Enzyme automatic differentiator

  • Enzyme

    High-performance automatic differentiation of LLVM and MLIR. (by EnzymeAD)

    Enzyme dev here, so take everything I say as being a bit biased:

    While Enzyme is, by design, able to run very fast by operating within the compiler (see https://proceedings.neurips.cc/paper/2020/file/9332c513ef44b... for details), it aggressively prioritizes correctness. Of course, that doesn't mean there aren't bugs (we're only human and it's a large codebase [https://github.com/EnzymeAD/Enzyme], especially if you're trying out newly-added features).

    Notably, this is where the current rough edges for Julia users are -- Enzyme will throw an error saying it couldn't prove correctness, rather than running anyway (there is a flag for "make a best guess", but that's off by default). The exception to this is garbage collection, for which you can either run a static analysis or stick to the "officially supported" subset of Julia that Enzyme specifies.

    Incidentally, this is also where being a cross-language tool is really nice -- namely we can see edge cases/bug reports from any LLVM-based language (C/C++, Fortran, Swift, Rust, Python, Julia, etc). So far the biggest code we've handled (and verified correctness for) was O(1million) lines of LLVM from some C++ template hell.

    I will also add that, while I absolutely love (and will do everything I can to support) Enzyme being used throughout arbitrary Julia code, there is more to do: in addition to exposing a nice user-facing interface for custom rules in the Enzyme Julia bindings, as Chris mentioned, some Julia-specific features (such as full garbage collection support) also need handling in Enzyme.jl before Enzyme can be considered an "all Julia AD" framework. We are of course working on all of these things (and the more the merrier), but there's only a finite amount of time in the day. [^]

    [^] Incidentally, this is in contrast to, say, C++/Fortran/Swift/etc., where Enzyme has much closer to whole-language coverage than it does for Julia -- this isn't anything against GC/Julia/etc.; we just still have things on our to-do list.

  • OffsetArrays.jl

    Fortran-like arrays with arbitrary, zero or negative starting indices.

    Similar correctness issues are a big part of the reason that, several years ago, I submitted a series of pull requests to Julia so that its entire test suite would run without memory errors under Valgrind, save for a few that either (i) we understood and wrote suppressions for, or (ii) we did not understand and had open issues for. Unfortunately, no one ever integrated Valgrind into the CI system, so the test suite no longer fully runs under it, last time I checked. (The test suite took nearly a day to run under Valgrind on a fast desktop machine back when it worked, so running it on every pull request is infeasible, but it could be done periodically, e.g. once every few days.)

    Even a revived effort to get the core Julia tests passing under Valgrind would not do much to help catch correctness bugs that come from composing different packages in the ecosystem. For that, running tests with `--check-bounds=yes` is probably a better solution, and much quicker to execute as well (see e.g. https://github.com/JuliaArrays/OffsetArrays.jl/issues/282).
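
    As a sketch of that suggestion (the package name is hypothetical; in recent Julia versions `Pkg.test` may already force bounds checking by default):

    ```julia
    using Pkg

    # Run a package's test suite in a subprocess with bounds checking forced on,
    # so `@inbounds` annotations no longer hide out-of-bounds accesses.
    Pkg.test("MyPackage"; julia_args=["--check-bounds=yes"])

    # Equivalent from the shell:
    #   julia --check-bounds=yes --project -e 'using Pkg; Pkg.test()'
    ```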
