Python Packages Project Generator
Discontinued 🚀 Your next Python package needs a bleeding-edge project structure.
I'm starting to document a Python library with Sphinx and hosting it on a static site (e.g. AWS S3, Netlify, Cloudflare Pages). Most of the docs are Markdown, with the examples being Jupyter notebooks. Docs are built and deployed to summerepi.com on commits to master. It's a little fiddly to set up, but once it's going it's pretty magical. Source code is here.
You need a couple of tools to cover the entire ML lifecycle. For developing and deploying your pipelines, check out Ploomber (disclaimer: I'm the author):
You can easily convert a training pipeline into an online service (this is great for preventing training-serving skew). Here's an example project.
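To illustrate why sharing the pipeline helps with training-serving skew, here is a minimal sketch in plain Python. It does not use Ploomber's API; `build_features`, `train`, `serve`, and the record fields are all hypothetical names made up for the example. The point is only that training and serving call the same feature code, so the two cannot drift apart.

```python
def build_features(record):
    """Single source of truth for feature engineering (hypothetical fields)."""
    return [record["age"] / 100.0, 1.0 if record["is_member"] else 0.0]


def train(records, labels):
    """Toy training step: average the feature vectors of the positive examples."""
    positives = [build_features(r) for r, y in zip(records, labels) if y == 1]
    n = len(positives)
    return [sum(col) / n for col in zip(*positives)]


def serve(model, record):
    """Online prediction: reuses build_features, so inputs match training."""
    features = build_features(record)
    return sum(w * x for w, x in zip(model, features))


if __name__ == "__main__":
    records = [
        {"age": 50, "is_member": True},
        {"age": 30, "is_member": False},
    ]
    model = train(records, labels=[1, 0])
    print(serve(model, {"age": 50, "is_member": True}))
```

If the feature logic were copied into a separate serving codebase instead, any later change to one copy would silently change what the model sees in production.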
Complementing the given answer, you could check https://github.com/replicate/keepsake for model versioning.
Even if detailed unit testing is hard, you can smoke test your models in CI to make sure that they're at least not crashing. More on smoke tests here. Some example smoke tests for a neural net here. Running your tests in GitHub Actions is relatively easy (here).
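As a rough idea of what such a smoke test can look like, here is a sketch using only the standard library. `TinyModel` is a hypothetical stand-in for a real model, and the checks (output length, finite values, probabilities in range) are assumptions about what "not crashing" should mean for your project. A test runner such as pytest would pick up `test_model_smoke` by its name.

```python
import math
import random


class TinyModel:
    """Hypothetical stand-in for a real model: one linear layer plus a sigmoid."""

    def __init__(self, n_features, seed=0):
        rng = random.Random(seed)
        self.weights = [rng.uniform(-1.0, 1.0) for _ in range(n_features)]
        self.bias = 0.0

    def predict(self, rows):
        outputs = []
        for row in rows:
            z = sum(w * x for w, x in zip(self.weights, row)) + self.bias
            outputs.append(1.0 / (1.0 + math.exp(-z)))
        return outputs


def smoke_test_model(model, n_rows=8, n_features=4):
    """Run one forward pass on random input and check basic sanity."""
    rng = random.Random(42)
    batch = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_rows)]
    preds = model.predict(batch)
    assert len(preds) == n_rows                   # one prediction per input row
    assert all(math.isfinite(p) for p in preds)   # no NaN or inf
    assert all(0.0 <= p <= 1.0 for p in preds)    # valid probabilities
    return preds


def test_model_smoke():
    smoke_test_model(TinyModel(n_features=4))
```

A test like this says nothing about model quality; it only catches shape errors, NaNs, and outright crashes before they reach a longer training run.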
Napoleon is a Sphinx extension that enables Sphinx to parse both NumPy and Google style docstrings - the style recommended by Khan Academy.
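For reference, a Google style docstring that Napoleon can parse looks roughly like the sketch below. The `scale` function is a made-up example. Napoleon itself is enabled by adding `"sphinx.ext.napoleon"` to the `extensions` list in the project's `conf.py`.

```python
def scale(values, factor=2.0):
    """Multiply every value by a constant factor.

    Args:
        values (list[float]): Numbers to scale.
        factor (float): Multiplier applied to each value.

    Returns:
        list[float]: The scaled values, in the original order.

    Raises:
        ValueError: If ``values`` is empty.
    """
    if not values:
        raise ValueError("values must not be empty")
    return [v * factor for v in values]
```

With the extension enabled, the `Args`, `Returns`, and `Raises` sections are converted to reStructuredText fields before Sphinx renders the page.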
Cookiecutter and Kedro are the winners. I still think we will stick with the Kedro template, because it offers extra functionality and I like to think of each project as a set of pipelines to be run. That said, some Cookiecutter templates are very good, like this one. If we use both Kedro and ClearML, we'll have to figure out how to integrate Kedro's pipelines with ClearML tasks. There are other teams in the ClearML Slack channel doing the same, so at least it's possible.