“Why I still recommend Julia”

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • julia

    The Julia Programming Language

  • The bugs are not in the correctness of multiple dispatch itself. The issue is that multiple dispatch lets you combine generic programming with abstract data types. When I write a generic implementation, someone can pass in a new user-defined data type, and that combination can easily fail to work. The discussion here therefore tends to focus on defining interfaces and, of course, on better testing of uncommonly used data types.

    In general, we've not had a formal roadmap, but we present a "State of Julia" talk at JuliaCon every year. Very broadly, the list (off the top of my head) includes: improving a lot of the underlying compiler infrastructure, improving support for differentiable programming, improving garbage collection, supporting GPUs from multiple vendors (too many of those now), supporting Apple silicon, and type system support for tools like JET.jl.

    NEWS.md is generally updated over the course of a release cycle and eventually becomes the release notes; after the release, we put together a highlights blog post. https://github.com/JuliaLang/julia/blob/master/NEWS.md

  • SciPy

    SciPy library main repository

  • I don't see how it addresses the original complaint. Vishnevsky basically stated that if you are trying to run a scientific experiment on a supercomputer, it may be risky to use a new programming language with a new stdlib and a bunch of OSS libraries rather than an old language like C with a very stable set of existing code, because new things tend to have unknown bugs. Vishnevsky has a point, but unless you are running some critical computations on supercomputers, maybe it doesn't apply to you?

    To be clear, in supercomputing environments people still use old versions of CentOS just to make sure that library version updates do not change their computation results. I don't think many people here would say "I am sticking to Ubuntu 16.04 because I am afraid that the updates to some library like gmplib will slightly change my computation results in a way that is hard for me to detect".

    Also, just staying with the old doesn't mean it's correct. You can introduce bugs into your own libs as well. I think NASA thought this through a long time ago and solved it by making sure critical parts of the code are implemented twice, using different stacks and different programmers.

    If you are NASA, CERN, LLNL, or a bank, maybe it's a good idea to implement your computations once in Python and once in Julia (by at least two different programmers) and compare the outputs. And I don't think in this situation Julia is any different from other languages (other than you may put too much trust into it and skip this dual implementation). Case in point: https://github.com/scipy/scipy/issues?q=is%3Aissue+is%3Aclos...
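    The cross-checking idea can be sketched in a few lines. This is only an illustration under stated assumptions: both implementations are written in Julia here (the comment suggests different languages and different programmers), and the function names, the sample data, and the tolerance are all hypothetical.

    ```julia
    # Implementation A: textbook two-pass sample variance.
    function variance_two_pass(xs)
        n = length(xs)
        m = sum(xs) / n
        return sum((x - m)^2 for x in xs) / (n - 1)
    end

    # Implementation B: Welford's single-pass update, written independently of A.
    function variance_welford(xs)
        n = 0
        running_mean = 0.0
        m2 = 0.0
        for x in xs
            n += 1
            delta = x - running_mean
            running_mean += delta / n
            m2 += delta * (x - running_mean)
        end
        return m2 / (n - 1)
    end

    # Hypothetical cross-check: the two results must agree within a relative tolerance.
    agree(a, b; rtol = 1e-10) = isapprox(a, b; rtol = rtol)

    data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
    agree(variance_two_pass(data), variance_welford(data))
    ```

    A disagreement does not say which implementation is wrong, only that one of them needs a closer look.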

  • RecursiveArrayTools.jl

    Tools for easily handling objects like arrays of arrays and deeper nestings in scientific machine learning (SciML) and other applications

  • The load times on some core packages were reduced by an order of magnitude this month. For example, RecursiveArrayTools went from 6228.5 ms to 292.7 ms. This was due to the new `@time_imports` macro in Julia v1.8-beta, which helps isolate load time issues. See https://github.com/SciML/RecursiveArrayTools.jl/pull/217 . This of course doesn't mean load times have been solved everywhere, but we now have the tooling to identify the root causes, and it's actively being worked on from multiple directions.
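
    For readers who have not used it, a minimal sketch of invoking `@time_imports` follows. It assumes Julia 1.8 or newer; `Dates` is only a stand-in from the standard library, so substitute the package whose load time you want to inspect. What gets printed depends on the machine and on what is already loaded.

    ```julia
    using InteractiveUtils  # loaded automatically in the REPL; needed in scripts

    # Prints a per-dependency breakdown of load time for the import that follows.
    # `Dates` is a placeholder for the package under investigation.
    @time_imports using Dates
    ```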

  • Lux.jl

    Explicitly Parameterized Neural Networks in Julia

  • Can you point to a concrete example of one that someone would run into when using the differential equation solvers with the default and recommended Enzyme AD for vector-Jacobian products? I'd be happy to look into it, but there do not currently seem to be any live correctness issues in the Enzyme issue tracker (3 issues are open, but they all seem to be fixed, other than https://github.com/EnzymeAD/Enzyme.jl/issues/278, which is actually an activity analysis bug in LLVM). So please be more specific. The issue with Enzyme right now seems to be more about finding functional forms that compile: it throws compile-time errors in the event that it cannot fully analyze the program and if the program has too much dynamic behavior (example: https://github.com/EnzymeAD/Enzyme.jl/issues/368).

    As an additional note, we recently did an overhaul of SciMLSensitivity (https://sensitivity.sciml.ai/dev/) and set up a system which amounts to 15 hours of direct unit tests doing a combinatoric check of arguments, plus 4 hours of downstream testing (https://github.com/SciML/SciMLSensitivity.jl/actions/runs/25...). What that identified is that any remaining issues that can arise are due to the implicit parameters mechanism in Zygote (Zygote.params). To counteract this upstream issue, we (a) avoid defaulting to Zygote VJPs whenever we can (hence defaulting to Enzyme and ReverseDiff first, as previously mentioned), and (b) put in a mechanism that throws an early, explicit error message if Zygote hits any not-implemented derivative case (https://github.com/SciML/SciMLSensitivity.jl/blob/v7.0.1/src...). We have alerted the devs of the machine learning libraries, and from this there has been a lot of movement. In particular, a globals-free machine learning library, Lux.jl, was created with fully explicit parameters (https://lux.csail.mit.edu/dev/), and thus by design it cannot have this issue. In addition, the Flux.jl library itself is looking to do a redesign that eliminates implicit parameters (https://github.com/FluxML/Flux.jl/issues/1986). Which design will win out in the end is uncertain right now, but it's clear that, no matter what, the future designs of the deep learning libraries will fully cut out that part of Zygote.jl. Additionally, the other AD libraries (Enzyme and Diffractor, for example) do not have this "feature", so it's an issue that can only arise from a specific (not recommended) way of using Zygote (which now throws explicit error messages early and often if used anywhere near SciML, because I don't tolerate it).

    So from this, SciML should be rather safe and if not, please share some details and I'd be happy to dig in.

  • Enzyme.jl

    Julia bindings for the Enzyme automatic differentiator

  • SciMLSensitivity.jl

    A component of the DiffEq ecosystem for enabling sensitivity analysis for scientific machine learning (SciML). Optimize-then-discretize, discretize-then-optimize, adjoint methods, and more for ODEs, SDEs, DDEs, DAEs, etc.

  • Flux.jl

    Relax! Flux is the ML library that doesn't make you tensor

  • SciMLStyle

    A style guide for stylish Julia developers

  • No, you do get type errors during runtime. The most common one is a MethodNotFound error (in Julia, a `MethodError`), which corresponds to a dispatch not being found. This is the one that people then complain about for its long stacktraces and for being hard to read (and that's a valid criticism). The reason for it is that if you do `x*y` with a type combination that does not have a corresponding dispatch, i.e. `*(x::T1,y::T2)` is not defined anywhere, then it looks through the method table of the function, does not find one, and throws this MethodNotFound error. You will only get no error if a method is found. Now what can happen is that you have a method on an abstract type, `*(x::T1,y::AbstractArray)`, but `y` does not "actually" act like an AbstractArray in some way. If the way that it's "not an AbstractArray" is that it's missing some method overloads of the AbstractArray interface (https://docs.julialang.org/en/v1/manual/interfaces/#man-inte...), you will get a MethodNotFound error thrown on that interface function. Thus you will only not get an error if someone has declared `typeof(y) <: AbstractArray` and implemented the AbstractArray interface.
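
    Both failure modes described above can be reproduced in a small sketch. The types `T1`, `T2`, and `Hollow` and the helpers `caught` and `count_entries` are hypothetical names made up for illustration; the only assumption about Julia itself is that a missing dispatch surfaces as a `MethodError`.

    ```julia
    struct T1
        v::Float64
    end
    struct T2
        v::Float64
    end

    # Only the (T1, T1) combination is given a method.
    Base.:*(x::T1, y::T1) = T1(x.v * y.v)

    # Hypothetical helper: run `f` and return the exception it throws, if any.
    function caught(f)
        try
            f()
            return nothing
        catch err
            return err
        end
    end

    # Case 1: no *(::T1, ::T2) exists in the method table.
    err1 = caught(() -> T1(2.0) * T2(3.0))
    err1 isa MethodError

    # Case 2: a type declares itself an AbstractVector but implements
    # none of the AbstractArray interface.
    struct Hollow <: AbstractVector{Float64} end
    count_entries(y::AbstractVector) = length(y)  # generic code relying on the interface

    # Dispatch on count_entries succeeds; the error comes from the missing `size`.
    err2 = caught(() -> count_entries(Hollow()))
    err2 isa MethodError && err2.f === size
    ```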

    However, what Yuri pointed out is that there are some packages (specifically in the statistics area) which implemented functions like `f(A::AbstractArray)` but used `for i in 1:length(A)` to iterate through A's values. Notice that the AbstractArray interface has interface functions for "non-traditional indices", including `axes(A)`, which is the function to call to get "a tuple of AbstractUnitRange{<:Integer} of valid indices". Thus these codes are incorrect: by the definition of the interface, you should be doing `for i in eachindex(A)` (or iterating over the ranges returned by `axes(A)`) if you want to support an AbstractArray, because there is no guarantee that its indices go from `1:length(A)`. Note that this was added to the `AbstractArray` interface in the v1.0 change, which is notably after the codes he referenced were written, and thus it's more that they were not updated to handle this expanded interface when the v1.0 transition occurred.

    This is important to understand because the criticisms and proposed "solutions" don't actually match the case... at all. This is not a case of Julia just letting anything through: someone had to purposefully define these functions for them to exist. And interfaces are not a solution here, because there is an interface here; its rules were just not followed. I don't know of an interface system which would actually throw an error if someone writes a loop `for i in 1:length(A)` in code where `A` is then indexed by the element. That analysis is rather difficult at the compiler level because it's non-local: `length(A)` is valid since querying for the length is part of the AbstractArray interface (for good reasons), so `1:length(A)` is valid since that's just range construction on integers, so the for loop construction itself is valid, and it's only invalid because of some other knowledge about how `A[i]` should work (this loop structure could be correct if it's not used for `A[i]` but rather to do something like `sum(i)` without indexing). If you want this to throw an error, the only real thing you could do is remove indexing from the AbstractArray interface and rely solely on iteration, which I'm not opposed to (given the relationship to GPUs, of course). But as you can see, the question in solving this is "what is the right interface?", not "are there even interfaces?" (to which the answer is: yes, but the errors are thrown at runtime as MethodNotFound instead of at compile time as MethodNotImplemented for undefined things; the latter would be cool for better debugging and stacktraces, but isn't a solution).

    This is why the real discussions are not about interfaces as a solution: they don't solve this issue, and languages with interfaces also have this issue. It's about tools for helping code style. You probably should just never do `for i in 1:length(A)`; you should probably always do `for i in eachindex(A)` or `for i in axes(A, 1)`, because those iteration styles work for `Array` but also work for any `AbstractArray`, and thus it's just a safer way to code. That is why there are specific mentions to not do this in style guides (for example, https://github.com/SciML/SciMLStyle#generic-code-is-preferre...), and things like JuliaFormatter automatically flag it as a style break (which would cause CI failures in organizations like SciML, which enforce SciML Style formatting as a CI run with GitHub Actions https://github.com/SciML/ModelingToolkit.jl/blob/v8.14.1/.gi...). There's a call to add linting support for this as well, flagging it any time someone writes this code. If everyone is told not to assume 1-based indexing, formatting CI fails if it is assumed, and the linter underlines every piece of code that does it in red (along with many other measures, including extensive downstream testing, fuzzing against other array types, etc.), then we're at least pretty well guarded against it. And many Julia organizations, like SciML, have these practices in place to guard against it. Yuri's specific discussion is more that JuliaStats does not.
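
    The difference between the two iteration styles can be seen with a small sketch. `ZeroBased` is a hypothetical, deliberately minimal vector type written for this illustration (real code would more likely use a package such as OffsetArrays.jl); the function names are made up as well.

    ```julia
    # Hypothetical minimal vector type whose valid indices are 0:n-1, not 1:n.
    struct ZeroBased{T} <: AbstractVector{T}
        data::Vector{T}
    end
    Base.size(v::ZeroBased) = size(v.data)
    Base.axes(v::ZeroBased) = (0:length(v.data)-1,)
    Base.IndexStyle(::Type{<:ZeroBased}) = IndexLinear()
    Base.getindex(v::ZeroBased, i::Int) = v.data[i + 1]

    # Assume 1-based indexing: fine for Array, wrong for ZeroBased.
    first_naive(A::AbstractVector) = A[1]
    sum_naive(A::AbstractVector) = sum(A[i] for i in 1:length(A))

    # Ask the array for its own indices: work for both.
    first_generic(A::AbstractVector) = A[firstindex(A)]
    sum_generic(A::AbstractVector) = sum(A[i] for i in eachindex(A))

    v = ZeroBased([10, 20, 30])
    first_generic(v), sum_generic(v)  # first element and full total
    first_naive(v)                    # silently picks the second stored element
    ```

    In this sketch `sum_naive(v)` fails loudly with a `BoundsError`, while `first_naive(v)` returns a wrong value without any error, which is the more dangerous of the two outcomes.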

  • ModelingToolkit.jl

    An acausal modeling framework for automatically parallelized scientific machine learning (SciML) in Julia. A computer algebra system for integrated symbolics for physics-informed machine learning and automated transformations of differential equations

  • dex-lang

    Research language for array processing in the Haskell/ML family

  • Dex proves indexing correctness without a full dependent type system, including loops.

    See: https://github.com/google-research/dex-lang/pull/969
