dataenforce
MLJ.jl
Metric | dataenforce | MLJ.jl
---|---|---
Mentions | 2 | 6
Stars | 208 | 1,722
Growth | - | 0.6%
Activity | 0.0 | 8.7
Last commit | about 3 years ago | 9 days ago
Language | Python | Julia
License | Apache License 2.0 | GNU General Public License v3.0 or later
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
dataenforce
-
Swift for TensorFlow Shuts Down
The dependence on library authors is always a challenge in any language. You might have one author using `[a]` where another uses `PositiveNumeric a, Fin n => NonEmptyList n a` for the same thing. You can always just annotate whatever the library author used (e.g. they return a list of strings, so you use `List[str]`).
There are some interesting further add-ons that seem very Python and let you go further. For example, with a pandas dataframe you can just say your type is a dataframe, which isn't so useful. But it's possible to hack your own types onto it in the vein of https://github.com/CedricFR/dataenforce, or to use things like https://smarie.github.io/python-vtypes/ to get smarter typing on things the authors didn't type. I expect that trend will continue.
What fascinates me about Python's types is the very fact that they are bolted on. You have a language that lets you do crazy things and a type system trying to catch up and make it convenient to verify those crazy things. It's a nice complement to the usual approach of verifying everything and slowly extending the set of things you can do.
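To make the "hack your own types onto it" idea concrete, here is a minimal sketch using only the standard library. It is not dataenforce's or vtypes' API: the `Columns` marker, the `check_columns` decorator, and the use of a plain dict of lists as a stand-in for a dataframe are all assumptions made for illustration.

```python
import functools
import inspect
from typing import Annotated, get_args, get_origin, get_type_hints

# Stand-in for a dataframe: a mapping from column name to a list of values.
Table = dict


class Columns:
    """Hypothetical annotation payload: the column names a table must have."""

    def __init__(self, *names):
        self.names = frozenset(names)


def check_columns(func):
    """Check Annotated[Table, Columns(...)] arguments on every call to func."""
    hints = get_type_hints(func, include_extras=True)
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            hint = hints.get(name)
            if get_origin(hint) is not Annotated:
                continue
            # get_args returns (base_type, metadata...); skip the base type.
            for meta in get_args(hint)[1:]:
                if isinstance(meta, Columns):
                    missing = meta.names - set(value)
                    if missing:
                        raise TypeError(
                            f"{name} is missing columns: {sorted(missing)}"
                        )
        return func(*args, **kwargs)

    return wrapper


@check_columns
def total_price(orders: Annotated[Table, Columns("id", "price")]) -> float:
    return sum(orders["price"])
```

Calling `total_price({"id": [1, 2], "price": [2.5, 4.0]})` passes the check, while a table without a `price` column raises `TypeError` before the function body runs. The annotation stays an ordinary `dict` to a static type checker; the extra column information is only enforced at call time, which is the "bolted on" character described above.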
-
[D] Question: Do you enforce a data format in Pandas? When collecting data over a long period of time, wouldn't it be useful to use a system with versioned schemas that specify how various data entries must be formatted? Give me feedback on this Open Source idea:
https://github.com/CedricFR/dataenforce enforces column names and types, but has no versioning. My first instinct is that important data should be stored in databases that enforce schemas, and that this should be kept separate from the Python code that reads it.
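For illustration, the versioned-schema idea from the question could look roughly like the sketch below. Everything here is hypothetical (the `SCHEMAS` registry, the column names, and `validate_record`); it is not a feature of dataenforce or pandas, and it checks plain dict records rather than dataframes.

```python
# Hypothetical schema registry: version number -> {column name: expected type}.
SCHEMAS = {
    1: {"id": int, "name": str},
    2: {"id": int, "name": str, "location": str},
}


def validate_record(record, version):
    """Return a list of problems found in record under the given schema version."""
    schema = SCHEMAS[version]
    problems = []
    for column, expected in schema.items():
        if column not in record:
            problems.append(f"missing column {column!r}")
        elif not isinstance(record[column], expected):
            actual = type(record[column]).__name__
            problems.append(
                f"{column!r} should be {expected.__name__}, got {actual}"
            )
    # Sort so the report order does not depend on set iteration order.
    for column in sorted(record.keys() - schema.keys()):
        problems.append(f"unexpected column {column!r}")
    return problems
```

A record collected under version 1 validates cleanly against schema 1 and reports a missing `location` column against schema 2, so old data can be checked against the schema it was written with.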
MLJ.jl
-
What is the Julia equivalent of Scikit-Learn?
MLJ.jl is a good Julia ML framework. There's also ScikitLearn.jl, but I believe it's more of a wrapper around sklearn.
-
My experience working as a technical writer for MLJ
MLJ is a machine learning framework for Julia, which you can kind of infer from the article but it's not super obvious IMO.
-
[N] New BetaML v0.8: model definition, hyperparameters tuning and fitting in 2 lines
The Beta Machine Learning Toolkit is a package including many algorithms and utilities to implement machine learning workflows in Julia, with a detailed tutorial on its usage from Python or R (no wrapper packages are needed) and an extensive interface to MLJ.
-
Python vs Julia
You should definitely go with Julia. It has a steeper learning curve than Python, but it is far more powerful. As for the ecosystem, you shouldn't worry about that much: DataFrames.jl and friends are much better than pandas, and MLJ.jl (https://github.com/alan-turing-institute/MLJ.jl) and FastAI.jl (https://github.com/FluxML/FastAI.jl) are great frameworks for regular ML and deep nets. And if at any point you feel that you need some Python library, you can always plug it in with PyCall.jl (https://github.com/JuliaPy/PyCall.jl).
-
sklearn equivalent for Julia?
Imho, Julia is more diverse in the sense that there is not a single popular ML library. Maybe the Julian equivalent for scikit-learn is MLJ.jl. There is also ScikitLearn.jl, which defines the usual interface of scikit-learn models, and specific algorithms then implement this interface.
-
Swift for TensorFlow Shuts Down
Then you haven't looked at Julia's ecosystem.
It may not be quite as mature, but it's getting there quickly.
It's also far more interoperable because of Julia's multiple dispatch and abstract types.
For example, the https://github.com/alan-turing-institute/MLJ.jl ML framework (sklearn on steroids) works out of the box with any table object that implements the Tables.jl interface, not just with dataframes.
That's just one example.
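As a rough Python analogy for that kind of interface-based interoperability (not Julia code, and not the Tables.jl API), the sketch below has one function accept any object that provides two agreed-upon methods. The `TableLike` protocol and the `DictTable` and `RowTable` classes are hypothetical names invented for this illustration.

```python
from typing import Iterable, Protocol, runtime_checkable


@runtime_checkable
class TableLike(Protocol):
    """Hypothetical minimal table interface: column names plus column access."""

    def column_names(self) -> Iterable[str]: ...

    def column(self, name: str) -> list: ...


class DictTable:
    """Column-oriented storage: a dict mapping column name to values."""

    def __init__(self, data):
        self._data = data

    def column_names(self):
        return list(self._data)

    def column(self, name):
        return list(self._data[name])


class RowTable:
    """Row-oriented storage: a list of dicts, one per row."""

    def __init__(self, rows):
        self._rows = rows

    def column_names(self):
        return list(self._rows[0]) if self._rows else []

    def column(self, name):
        return [row[name] for row in self._rows]


def column_means(table: TableLike) -> dict:
    """Works with any object satisfying TableLike, whatever its storage layout."""
    means = {}
    for name in table.column_names():
        values = table.column(name)
        means[name] = sum(values) / len(values)
    return means
```

Both `DictTable({"x": [1, 2, 3]})` and `RowTable([{"x": 1}, {"x": 2}, {"x": 3}])` can be passed to `column_means` without either class knowing about the other, because the function depends only on the interface.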
What are some alternatives?
swift - Swift for TensorFlow
ScikitLearn.jl - Julia implementation of the scikit-learn API https://cstjean.github.io/ScikitLearn.jl/dev/
PythonNet - Python for .NET is a package that gives Python programmers nearly seamless integration with the .NET Common Language Runtime (CLR) and provides a powerful application scripting tool for .NET developers.
AutoMLPipeline.jl - A package that makes it trivial to create and evaluate machine learning pipeline architectures.
julia - The Julia Programming Language
Enzyme.jl - Julia bindings for the Enzyme automatic differentiator
YOLOv4 - Port of YOLOv4 to C# + TensorFlow
py2many - Transpiler of Python to many other languages
Distributions.jl - A Julia package for probability distributions and associated functions.
pyTsetlinMachine - Implements the Tsetlin Machine, Convolutional Tsetlin Machine, Regression Tsetlin Machine, Weighted Tsetlin Machine, and Embedding Tsetlin Machine, with support for continuous features, multigranularity, clause indexing, and literal budget