Distributions.jl VS MLJ.jl

Compare Distributions.jl and MLJ.jl and see how they differ.

              Distributions.jl                            MLJ.jl
Mentions      6                                           6
Stars         1,070                                       1,720
Growth        0.9%                                        1.2%
Activity      7.6                                         8.8
Last commit   4 days ago                                  7 days ago
Language      Julia                                       Julia
License       GNU General Public License v3.0 or later    GNU General Public License v3.0 or later
Mentions - the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

Distributions.jl

Posts with mentions or reviews of Distributions.jl. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-02-22.
  • Yann Lecun: ML would have advanced if other lang had been adopted versus Python
    9 projects | news.ycombinator.com | 22 Feb 2023
    If you look at Julia open source projects you'll see that the projects tend to have a lot more contributors than the Python counterparts, even over smaller time periods. A package for defining statistical distributions has had 202 contributors (https://github.com/JuliaStats/Distributions.jl), etc. Julia Base even has had over 1,300 contributors (https://github.com/JuliaLang/julia) which is quite a lot for a core language, and that's mostly because the majority of the core is in Julia itself.

    This is one of the things that was noted quite a bit at this SIAM CSE conference, that Julia development tends to have a lot more code reuse than other ecosystems like Python. For example, the various machine learning libraries like Flux.jl and Lux.jl share a lot of layer intrinsics in NNlib.jl (https://github.com/FluxML/NNlib.jl), the same GPU libraries (https://github.com/JuliaGPU/CUDA.jl), the same automatic differentiation library (https://github.com/FluxML/Zygote.jl), and of course the same JIT compiler (Julia itself). These two libraries are far enough apart that people say "Flux is to PyTorch as Lux is to JAX/flax", but while in the Python world those share almost 0 code or implementation, in the Julia world they share >90% of the core internals but have different higher-level APIs.

    If one hasn't participated in this space it's a bit hard to fathom how much code reuse goes on and how that is influenced by the design of multiple dispatch. This is one of the reasons there is so much cohesion in the community: it doesn't matter if one person is an ecologist and the other is a financial engineer, you may both be contributing to the same library like Distances.jl, just adding a distance function which is then used in thousands of places. With the Python ecosystem you tend to have a lot more "megapackages", PyTorch, SciPy, etc., where the barrier to entry is generally a lot higher (and sometimes requires handling the build systems, fun times). But in the Julia ecosystem you have a lot of core development happening in "small" but central libraries, like Distances.jl or Distributions.jl, which are simple enough for an undergrad to get productive in a week but are then used everywhere (Distributions.jl, for example, is used in every statistics package, in the definitions of prior distributions for Turing.jl's probabilistic programming language, etc.).
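
The reuse pattern described in the comment above can be sketched in plain Julia. This is a minimal illustration and not the Distances.jl API: the names AbstractMetric, evaluate, nearest and the three metric types are hypothetical, chosen only to show how a new method slots into existing generic code through multiple dispatch.

```julia
# Hypothetical mini distance library; illustrative names only,
# not taken from Distances.jl.
abstract type AbstractMetric end

struct Euclidean <: AbstractMetric end
struct Manhattan <: AbstractMetric end

# Each metric is one small method of the same generic function.
evaluate(::Euclidean, a, b) = sqrt(sum(abs2, a .- b))
evaluate(::Manhattan, a, b) = sum(abs, a .- b)

# Generic code written once against the abstract interface.
function nearest(metric::AbstractMetric, query, points)
    dists = [evaluate(metric, query, p) for p in points]
    return points[argmin(dists)]
end

# A later contributor adds a metric without touching `nearest`.
struct Chebyshev <: AbstractMetric end
evaluate(::Chebyshev, a, b) = maximum(abs, a .- b)

points = [[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]]
closest = nearest(Chebyshev(), [2.0, 2.0], points)
```

Because nearest dispatches on the metric type, the Chebyshev method is picked up automatically; nothing in the existing function has to change.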

  • Don't waste your time on Julia
    2 projects | /r/rstats | 14 Aug 2022
    ...so the blog post you've posted 4 times contains a list of issues the author filed in 2020-2021... and at least for the handful I clicked, they indeed have (long) been sorted. e.g., Filed Dec 18th 2020, closed Dec 20th
  • Julia ranks in the top most loved programming languages for 2022
    3 projects | news.ycombinator.com | 23 Jun 2022
    Well, out of the issues mentioned, the ones still open can be categorized as (1) aliasing problems with mutable vectors https://github.com/JuliaLang/julia/issues/39385 https://github.com/JuliaLang/julia/issues/39460 (2) not handling OffsetArrays correctly https://github.com/JuliaStats/StatsBase.jl/issues/646, https://github.com/JuliaStats/StatsBase.jl/issues/638, https://github.com/JuliaStats/Distributions.jl/issues/1265 https://github.com/JuliaStats/StatsBase.jl/issues/643 (3) bad interaction of buffering and I/O redirection https://github.com/JuliaLang/julia/issues/36069 (4) a type dispatch bug https://github.com/JuliaLang/julia/issues/41096

    So if you avoid mutable vectors and OffsetArrays you should generally be fine.

    As far as the argument "Julia is really buggy so it's unusable", I think this can be made for any language - e.g. rand is not random enough, Java's binary search algorithm had an overflow, etc. The fixed issues have tests added so they won't happen again. Maybe copying the test suites from libraries in other languages would have caught these issues earlier, but a new system will have more bugs than a mature system so some amount of bugginess is unavoidable.

  • The Julia language has a number of correctness flaws
    19 projects | news.ycombinator.com | 16 May 2022
  • Does a Julia package have to live in a separate file?
    1 project | /r/Julia | 16 Mar 2021
    See the Distributions.jl package for an example .jl file structure: https://github.com/JuliaStats/Distributions.jl/tree/master/src
  • Organizing a Julia program
    1 project | /r/Julia | 17 Jan 2021
    Structure your program around your domain-specific constraints, e.g., if you look at Distributions.jl, they have folders for univariate/multivariate and discrete/continuous, with one file per distribution containing the struct and all its methods.
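
As a rough sketch of the "one file per type, struct plus its methods" layout described above, a single file might look like the following. Everything here is hypothetical: the file path, the SimpleUniform type, and the method set are illustrative and are not copied from Distributions.jl, which defines its own types and extends shared generic functions.

```julia
# Hypothetical file: src/univariate/continuous/simpleuniform.jl
# Plain Julia with the Random stdlib; not Distributions.jl code.
using Random

# The type, with an inner constructor that validates its parameters.
struct SimpleUniform
    a::Float64
    b::Float64
    function SimpleUniform(a, b)
        a < b || throw(ArgumentError("need a < b"))
        return new(a, b)
    end
end

# All methods for the type live next to its definition.
mean(d::SimpleUniform) = (d.a + d.b) / 2
var(d::SimpleUniform) = (d.b - d.a)^2 / 12
pdf(d::SimpleUniform, x::Real) = d.a <= x <= d.b ? 1 / (d.b - d.a) : 0.0
cdf(d::SimpleUniform, x::Real) = clamp((x - d.a) / (d.b - d.a), 0.0, 1.0)

# Sampling: scale a uniform draw on [0, 1) to [a, b).
Base.rand(rng::AbstractRNG, d::SimpleUniform) = d.a + (d.b - d.a) * rand(rng)

d = SimpleUniform(0, 2)
sample = rand(MersenneTwister(1), d)
```

In this sketch mean and var are new functions local to the script; a real package would more likely extend the functions from Statistics so that other code can call them generically.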

MLJ.jl

Posts with mentions or reviews of MLJ.jl. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2022-12-30.
  • What is the Julia equivalent of Scikit-Learn?
    3 projects | /r/Julia | 30 Dec 2022
    MLJ.jl is a good Julia ML framework. There's also ScikitLearn.jl, but it's more of a wrapper around sklearn, I believe.
  • My experience working as a technical writer for MLJ
    1 project | /r/Julia | 23 Nov 2022
    MLJ is a machine learning framework for Julia, which you can kind of infer from the article but it's not super obvious IMO.
  • [N] New BetaML v0.8: model definition, hyperparameters tuning and fitting in 2 lines
    2 projects | /r/MachineLearning | 2 Oct 2022
    The Beta Machine Learning Toolkit is a package including many algorithms and utilities to implement machine learning workflows in Julia, with a detailed tutorial on its usage from Python or R (no wrapper packages are needed) and an extensive interface to MLJ.
  • Python vs Julia
    3 projects | /r/Julia | 3 Aug 2021
    You should definitely go with Julia. It has a steeper learning curve than Python, but it is way more powerful. As for the ecosystem, you shouldn't worry about that much: DataFrames.jl and friends are way better than pandas, and MLJ.jl (https://github.com/alan-turing-institute/MLJ.jl) and FastAI.jl (https://github.com/FluxML/FastAI.jl) are great frameworks for regular ML and deep nets. And if at any point you get a feeling that you need some Python library, you can always plug it in with PyCall.jl (https://github.com/JuliaPy/PyCall.jl).
  • sklearn equivalent for Julia?
    3 projects | /r/Julia | 14 Apr 2021
    Imho, Julia is more diverse in the sense that there is not a single popular ML library. Maybe the Julian equivalent for scikit-learn is MLJ.jl. There is also ScikitLearn.jl, which defines the usual interface of scikit-learn models, and specific algorithms then implement this interface.
  • Swift for TensorFlow Shuts Down
    13 projects | news.ycombinator.com | 12 Feb 2021
    Then you haven't looked at Julia's ecosystem.

    It may not be quite as mature, but it's getting there quickly.

    It's also far more interoperable because of Julia's multiple dispatch and abstract types.

    For example, the https://github.com/alan-turing-institute/MLJ.jl ML framework (sklearn on steroids), works with any table object that implements the Tables.jl interface out of the box, not just with dataframes.

    That's just one example.

What are some alternatives?

When comparing Distributions.jl and MLJ.jl you can also consider the following projects:

HypothesisTests.jl - Hypothesis tests for Julia

ScikitLearn.jl - Julia implementation of the scikit-learn API https://cstjean.github.io/ScikitLearn.jl/dev/

Optimization.jl - Mathematical Optimization in Julia. Local, global, gradient-based and derivative-free. Linear, Quadratic, Convex, Mixed-Integer, and Nonlinear Optimization in one simple, fast, and differentiable interface.

AutoMLPipeline.jl - A package that makes it trivial to create and evaluate machine learning pipeline architectures.

StatsBase.jl - Basic statistics for Julia

Enzyme.jl - Julia bindings for the Enzyme automatic differentiator

Lux.jl - Explicitly Parameterized Neural Networks in Julia

PythonNet - Python for .NET is a package that gives Python programmers nearly seamless integration with the .NET Common Language Runtime (CLR) and provides a powerful application scripting tool for .NET developers.

StaticLint.jl - Static Code Analysis for Julia

pyTsetlinMachine - Implements the Tsetlin Machine, Convolutional Tsetlin Machine, Regression Tsetlin Machine, Weighted Tsetlin Machine, and Embedding Tsetlin Machine, with support for continuous features, multigranularity, clause indexing, and literal budget

Tumble.jl - Lazy predictive modeling for Julia