> I am very much a beginner in the space of machine learning
While the (precious and useful) advice around seems to cover mostly the bigger infrastructures, please note that
you can effectively do an important slice of machine learning work (study, personal research) with just a power-efficient CPU (no GPU), in the order of minutes, running on battery. That comes before going to "Big Data".
And there are lightweight tools: I am currently enamoured with Genann («minimal, well-tested open-source library implementing feedforward artificial neural networks (ANN) in C»), a single C file of 400 lines compiling to a 40 kB object, yet well sufficient to solve a number of the problems you may meet.
https://codeplea.com/genann // https://github.com/codeplea/genann
After all, is it a good idea to have tools that automate process optimization while you are still learning the trade? Only partially. In general, and even metaphorically, you should build the legitimacy of your Python operations on a solid C foundation.
And note that you can also build ANNs in R (and other math or stats environments), if that is needed or more comfortable for you.
Also note, as a reminder, that Prof. Patrick Winston's MIT lectures for the Artificial Intelligence course (classical AI, with a few lectures on ANNs) are freely available. They cover the groundwork needed to climb into the newer techniques.
I created something that lets you get a free GPU in VS Code via Google Colab with a single click. Have a look at https://github.com/DerekChia/colab-vscode
This is my default go-to as a poor man's ML setup, with the environment and dependencies configured automatically via a bash script on startup.
If you want to build a web application on top of your ML project, give https://hal9.com a shot. We designed Hal9 for ease of deployment and maximum compatibility with web technologies, so you can build ML apps with React, Vue, etc. We launched a couple of months ago and could use some early feedback and users. Thank you!
In case you want to start creating batch jobs too, I'd recommend checking out Orchest (www.orchest.io). It has a generous free tier and supports GPU instances. The platform itself is also self-hostable and open source (https://github.com/orchest/orchest).
The main advantages are its interactive pipeline editor, support for Jupyter notebooks in the pipeline/DAG context, and a simple way to specify environment dependencies. It also supports automatic starting and stopping of instances, so you only pay for the compute needed to run your data pipelines.
Disclosure, I’m one of the creators.
One of my deal breakers when choosing tooling is how easy it is to move from a local environment to a distributed one. Ideally, you want to start locally and move to a distributed environment only if you need to, so choose a tool that lets you get started quickly and grow from there.
As an example: one of the reasons I don't use Kubeflow is that it requires having a Kubernetes cluster up and running, which is overkill in many cases.
Check out the project I'm working on: https://github.com/ploomber/ploomber