nitroml
benchmarks
| | nitroml | benchmarks |
|---|---|---|
| Mentions | 1 | 2 |
| Stars | 40 | 4 |
| Growth | - | - |
| Activity | 0.9 | 1.8 |
| Latest Commit | about 3 years ago | over 2 years ago |
| Language | Jupyter Notebook | Python |
| License | Apache License 2.0 | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
nitroml
-
Launch HN: MindsDB (YC W20) – Machine Learning Inside Your Database
The benchmarking challenges you are facing are pretty common in the AutoML community. My colleagues and I at Google Research are trying to solve this with https://github.com/google/nitroml. It's still super early days (no CI yet), but I think it could help your team benchmark on a set of open standard benchmark tasks as we open source more of the system.
benchmarks
-
Forecast Metro Traffic using MindsDB Cloud and MongoDB Atlas
We will be using the Metro traffic dataset 🚇 that can be downloaded from here. You are also free to use your own dataset and follow along with the tutorial.
-
Launch HN: MindsDB (YC W20) – Machine Learning Inside Your Database
Regarding benchmarks, we have three main dataset collections we focus on currently:
1. Datasets from customers, but obviously those can’t be made public.
2. The OpenML benchmark, which is fairly limited because it's mainly binary classification, but which is good because it's run by a third party and therefore unbiased. We have some intermediary results here (https://docs.google.com/spreadsheets/d/1oAgzzDyBqgmSNC6g9CFO...); they are middle-of-the-road. However, I think the benchmark is pretty limited, i.e. it doesn't cover most of the kinds of inputs, and almost none of the outputs, we support.
3. An internal benchmark suite which currently has 59 datasets, mainly focused on classification and regression tasks with many inputs, timeseries problems, and text. Some part of it is public, but opening all of it up is a bit difficult due to licensing issues. I'm hoping that in the next year it will grow and 90%+ of it can be made public. We benchmark against older versions of mindsdb, against hand-made models we try to adapt to the task, against the state-of-the-art accuracy for the dataset (if we can find it), and a few other AutoML frameworks (well, 1, but I hope to extend that list) [see this repo for the ones we made public: https://github.com/mindsdb/benchmarks, but I'm afraid it's a bit outdated]
That being said, benchmarking for us is still a WIP, since as far as I can tell nobody else is trying to build open-source models as broad as what we're currently doing (for better or worse), and the closed-source services offered by various IaaS providers don't really come with public benchmark results outside of marketing.
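The benchmark suite described above boils down to running a grid of models over a grid of datasets and tabulating a score for each pair. A minimal sketch of that idea, with all names (`run_benchmark`, `majority_baseline`, the toy data) purely illustrative and not part of the mindsdb/benchmarks repo:

```python
def majority_baseline(train, test):
    """Predict the most common training label for every test row."""
    labels = [y for _, y in train]
    majority = max(set(labels), key=labels.count)
    return [majority for _ in test]

def accuracy(predictions, test):
    """Fraction of test rows whose label matches the prediction."""
    correct = sum(p == y for p, (_, y) in zip(predictions, test))
    return correct / len(test)

def run_benchmark(models, datasets):
    """Return {dataset_name: {model_name: accuracy}} over every pair."""
    results = {}
    for ds_name, (train, test) in datasets.items():
        results[ds_name] = {
            m_name: accuracy(model(train, test), test)
            for m_name, model in models.items()
        }
    return results

# Tiny toy dataset: (features, label) pairs.
train = [((0,), "a"), ((1,), "a"), ((2,), "b")]
test = [((3,), "a"), ((4,), "b")]

scores = run_benchmark(
    {"majority": majority_baseline},
    {"toy": (train, test)},
)
print(scores)  # {'toy': {'majority': 0.5}}
```

In a real suite, the entries of `models` would be wrappers around mindsdb versions, hand-made models, and other AutoML frameworks, and `datasets` would hold the 59 internal tasks; the scoring function would also vary by task type (regression, timeseries, text) rather than always being accuracy.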
What are some alternatives?
FLAML - A fast library for AutoML and tuning. Join our Discord: https://discord.gg/Cppx2vSPVP.
MindsDB - The platform for customizing AI from enterprise data
lightwood - Lightwood is Legos for Machine Learning.
PheKnowLator - PheKnowLator: Heterogeneous Biomedical Knowledge Graphs and Benchmarks Constructed Under Alternative Semantic Models
mlops-with-vertex-ai - An end-to-end example of MLOps on Google Cloud using TensorFlow, TFX, and Vertex AI
kraken - OCR engine for all the languages