Deployment automation for ML projects of all shapes and sizes
1 project | news.ycombinator.com | 9 Jun 2021
A tutorial on how to handle prediction uncertainty in production systems, by using Bayesian inference and probabilistic programs
2 projects | reddit.com/r/datascienceproject | 17 May 2021
how to deploy it to Kubernetes using Bodywork.
[P] [D] How are you approaching prediction uncertainty in ML systems?
1 project | reddit.com/r/MachineLearning | 17 May 2021
I usually turn to generative models - e.g. probabilistic programs and Bayesian inference. I’ve written-up my thoughts on how to engineer these into a ‘production system’ deployed to Kubernetes, using PyMC and Bodywork (an open-source ML deployment tool that I contribute to).
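The post above uses PyMC on Kubernetes; as a minimal, self-contained illustration of the underlying idea (quantifying prediction uncertainty with Bayesian inference), here is a sketch of a conjugate Beta-Binomial model in plain NumPy. The counts are hypothetical, chosen only for illustration:

```python
import numpy as np

# Hypothetical monitoring data: 200 predictions served, 23 flagged as errors.
successes, trials = 23, 200

# Beta(1, 1) prior on the error rate; Beta-Binomial conjugacy gives the
# posterior in closed form: Beta(1 + successes, 1 + trials - successes).
alpha_post = 1 + successes
beta_post = 1 + trials - successes

# Sample the posterior to quantify uncertainty in the rate itself, then the
# posterior predictive for the number of errors in the next 50 requests.
rng = np.random.default_rng(42)
rate_samples = rng.beta(alpha_post, beta_post, size=10_000)
predictive = rng.binomial(50, rate_samples)

lo, hi = np.percentile(predictive, [2.5, 97.5])
print(f"posterior mean error rate: {rate_samples.mean():.3f}")
print(f"95% predictive interval for errors in next 50 requests: [{lo:.0f}, {hi:.0f}]")
```

A full probabilistic program in PyMC generalises this beyond conjugate models, but the output is the same in spirit: a distribution over predictions rather than a point estimate.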
Bodywork: MLOps tool for deploying ML projects to Kubernetes
1 project | news.ycombinator.com | 4 May 2021
Tool for mapping executable Python modules to Kubernetes deployments
1 project | reddit.com/r/madeinpython | 4 May 2021
I’m one of the core contributors to Bodywork, an open-source tool for deploying machine learning projects developed in Python, to Kubernetes.
[P] [D] The benefits of training the simplest model you can think of and deploying it to production, as soon as you can.
1 project | reddit.com/r/MachineLearning | 18 Apr 2021
I’ve had many successes with this approach. With this in mind, I’ve put together an example of how to make this Agile approach to developing machine learning systems a reality, by demonstrating that it takes under 15 minutes to deploy a Scikit-Learn model, using FastAPI with Bodywork (an open-source MLOps tool that I have built).
bodywork - MLOps for Python and K8S
1 project | reddit.com/r/mlops | 1 Feb 2021
bodywork-ml/bodywork-core - MLOps automation for Python and Kubernetes
1 project | reddit.com/r/mlops | 1 Feb 2021
MindsDB: Creating machine learning predictive models using SQL.
2 projects | dev.to | 19 Jan 2022
Want to try it out for yourself? Sign up for a free MindsDB account and join our community! Engage with the MindsDB community on Slack or GitHub to ask questions and share ideas!
Enabling predictive capabilities in ClickHouse database
2 projects | dev.to | 16 Dec 2021
Try making your own predictions with MindsDB by signing up for a free cloud account or installing it via Docker. If you need any help, feel free to ask a question in the MindsDB community on Slack or GitHub.
Release 0.3: Using the PyParsing lib in mindsdb
1 project | dev.to | 29 Nov 2021
MindsDB is a predictive platform that makes databases intelligent and machine learning easy to use. It allows data analysts to build and visualize forecasts in BI dashboards without going through the complexity of ML pipelines, all through SQL. It also helps data scientists streamline MLOps by providing advanced instruments for in-database machine learning and by optimizing ML workflows through a declarative JSON-AI syntax. I worked on 3 issues in this project; 1773 and 1771 have been merged, and 1777 is under review.
Launch HN: MindsDB (YC W20) – Machine Learning Inside Your Database
Here's an issue that enumerates all pending tasks for a first iteration of this feature: https://github.com/mindsdb/mindsdb/issues/1116
Adam and Jorge here, and today we’re very excited to share MindsDB with you (http://github.com/mindsdb/mindsdb). MindsDB AutoML Server is an open-source platform designed to accelerate machine learning workflows for people with data inside databases by introducing virtual AI tables. We allow you to create and consume machine learning models as regular database tables.
Jorge and I have been friends for many years, having first met at college. We have previously founded and failed at another startup, but we stuck together as a team to start MindsDB. Initially a passion project, MindsDB began as an idea to help those who could not afford to hire a team of data scientists, which at the time was (and still is) very expensive. It has since grown into a thriving open-source community with contributors and users all over the globe.
With the plethora of data available in databases today, predictive modeling can often be a pain, especially if you need to write complex applications for ingesting data, training encoders and embedders, writing sampling algorithms, training models, optimizing, scheduling, versioning, moving models into production environments, maintaining them and then having to explain the predictions and the degree of confidence… we knew there had to be a better way!
We aim to steer you away from constantly reinventing the wheel by abstracting most of the unnecessary complexities around building, training, and deploying machine learning models. MindsDB provides you with two techniques for this: build and train models as simply as you would write an SQL query, and seamlessly “publish” and manage machine learning models as virtual tables inside your databases (we support ClickHouse, MariaDB, MySQL, PostgreSQL, and MSSQL; MongoDB is coming soon). We also support getting data from other sources, such as Snowflake, S3, SQLite, and any Excel, JSON, or CSV file.
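Because MindsDB exposes its models over the MySQL wire protocol, any MySQL client can train and query them. The sketch below is an assumption-laden illustration, not a verified snippet from the MindsDB docs: the integration name, table, columns, port, and credentials are all hypothetical placeholders:

```python
# Sketch: training and querying a MindsDB model over the MySQL protocol.
# All identifiers (my_postgres, home_rentals, rental_price) are hypothetical.

TRAIN_SQL = """
CREATE PREDICTOR mindsdb.home_rentals_model
FROM my_postgres (SELECT * FROM home_rentals)
PREDICT rental_price;
"""

QUERY_SQL = """
SELECT rental_price
FROM mindsdb.home_rentals_model
WHERE sqft = 900 AND location = 'great';
"""

def run(host="127.0.0.1", port=47335, user="mindsdb", password=""):
    """Send the statements to a running MindsDB instance via a MySQL client."""
    import mysql.connector  # pip install mysql-connector-python
    conn = mysql.connector.connect(host=host, port=port,
                                   user=user, password=password)
    cur = conn.cursor()
    cur.execute(TRAIN_SQL)   # trains a model as if creating a table
    cur.execute(QUERY_SQL)   # queries the model like a regular table
    return cur.fetchall()
```

The point of the design is visible in the SQL itself: the model is created and queried with the same vocabulary as an ordinary database table.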
When we talk to our growing community, we find that they are using MindsDB for anything ranging from reducing financial risk in the payments sector to predicting in-app usage statistics - one user is even trying to predict the price of Bitcoin using sentiment analysis (we wish them luck). No matter what the use-case, what we hear most often is that the two most painful parts of the whole process are model generation (R&D) and/or moving the model into production.
For those who already have models (i.e. who have already done the R&D part), we are launching the ability to bring your own models from frameworks like PyTorch, TensorFlow, scikit-learn, Keras, XGBoost, CatBoost, LightGBM, etc. directly into your database. If you’d like to try this experimental feature, you can sign up here: (https://mindsdb.com/bring-your-own-ml-models)
We currently have a handful of customers who pay us for support. However, we will soon be launching a cloud version of MindsDB for those who do not want to worry about DevOps, scalability, and managing GPU clusters. Nevertheless, MindsDB will always remain free and open-source, because democratizing machine learning is at the core of every decision we make.
We’re making good progress thanks to our open-source community and are also grateful to have the backing of the founders of MySQL & MariaDB. We would love your feedback and invite you to try it out.
Thanks in advance,
I would love to support Scylla, I *love* that database, those guys are magicians. And I assume in supporting that we'd also offer de-facto support for Cassandra.
I don't think either Scylla or DynamoDB is on the roadmap right now, but if you want them, feel free to create an issue asking for them: https://github.com/mindsdb/mindsdb
It should be noted that there are two levels of support:
1. As a source of data (easy to implement)
MindsDB - build and deploy Machine Learning models from inside your databases in minutes using plain SQL.
1 project | reddit.com/r/AutoML | 9 Feb 2021
What are some alternatives?
tensorflow - An Open Source Machine Learning Framework for Everyone
H2O - H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
scikit-learn - scikit-learn: machine learning in Python
CapRover - Scalable PaaS (automated Docker+nginx) - aka Heroku on Steroids
Keras - Deep Learning for humans
xgboost - Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
NuPIC - Numenta Platform for Intelligent Computing is an implementation of Hierarchical Temporal Memory (HTM), a theory of intelligence based strictly on the neuroscience of the neocortex.
Crab - Crab is a flexible, fast recommender engine for Python that integrates classic information filtering recommendation algorithms in the world of scientific Python packages (numpy, scipy, matplotlib).
PaddlePaddle - PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (the core “PaddlePaddle” framework: high-performance single-machine and distributed training, and cross-platform deployment, for deep learning and machine learning)
gym - A toolkit for developing and comparing reinforcement learning algorithms.
gensim - Topic Modelling for Humans
MLP Classifier - A handwritten multilayer perceptron classifier using numpy.