What are some good DS/ML repos where I can learn about structuring a DS/ML project?

This page summarizes the projects mentioned and recommended in the original post on /r/datascience.

  • cookiecutter-data-science

    A logical, reasonably standardized, but flexible project structure for doing and sharing data science work.

  • I've found https://github.com/drivendata/cookiecutter-data-science as a guide, but haven't found any repos that solve a problem end to end and actually use it. Are there any good repos or resources that exemplify how to solve a DS/ML case end to end, including any UI (a report, stream, dash, etc.) needed for delivery, data handling, preprocessing, training, and local development?

  • projects

    Sample projects using Ploomber. (by ploomber)

  • We have tons of examples that follow a standard layout. Here's one: https://github.com/ploomber/projects/tree/master/templates/ml-intermediate

  • Kedro

    Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.

  • For the lazy ones out there, here's the link to their GitHub repo.
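
As a rough illustration of the kind of standardized layout the cookiecutter-data-science entry above describes, here is a minimal sketch that scaffolds a project skeleton using only the Python standard library. The directory names in `LAYOUT` and the `scaffold` helper are assumptions made for this example, not the exact template shipped by any of the projects listed; check each repo for its actual structure.

```python
from pathlib import Path

# Hypothetical layout, loosely modeled on common DS/ML project templates.
# These names are illustrative assumptions, not any project's official template.
LAYOUT = [
    "data/raw",         # original, immutable input data
    "data/processed",   # cleaned data ready for modeling
    "models",           # trained model artifacts
    "notebooks",        # exploratory notebooks
    "reports/figures",  # generated reports and plots for delivery
    "src",              # importable source code (preprocessing, training)
]


def scaffold(root):
    """Create every directory in LAYOUT under root and return the paths."""
    root = Path(root)
    created = []
    for rel in LAYOUT:
        path = root / rel
        # parents=True builds intermediate directories;
        # exist_ok=True makes the call safe to repeat.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling `scaffold("my-project")` creates the folders under `my-project/` and can be rerun without error, since existing directories are left in place.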

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.
