|5 months ago||1 day ago|
|BSD 3-clause "New" or "Revised" License||Apache License 2.0|
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Surprise – a simple recommender system library for Python
1 project | reddit.com/r/Python | 1 Mar 2022
1 project | reddit.com/r/recommendersystems | 1 Mar 2022
1 project | reddit.com/r/programming | 1 Mar 2022
1 project | news.ycombinator.com | 1 Mar 2022
Dislike button would improve Spotify's recommendations
4 projects | news.ycombinator.com | 16 Oct 2021
I spent the latter half of 2019 trying to build this as a startup. Ultimately I pivoted (now I do newsletter recommendations instead), but if I hadn't made some mistakes I think it could've gotten more traction. Mostly I should've simplified the idea to make it easier to build. If anyone's interested in working on this, here's what I would do:
(But first some background: The way I saw it, you can split music recommendation into two tasks: (1) picking a song you already know that should be played right now, and (2) picking a new song you've never heard of before. (Music recommendation is unique in this way since in most other domains there isn't much value in re-recommending items). I think #1 is more important, and if you nail that, you can do a so-so job of #2 and still have a good system.)
Make a website that imports your Last.fm history. Organize the history into sessions (say, groups of listen events with a >= 30 minute gap in between). Feed those sessions into a collaborative filtering library like Surprise, as a CSV of `, , 1` (1 being a rating--in this case we only have positive ratings). Then make some UI that lets people create and export playlists. e.g. I pick a couple seed songs from my listening history, then the app uses Surprise to suggest more songs. Present a list of 10 songs at a time. Click a song to add it, and have a "skip all" button that gets a new list of songs. Save these interactions as ratings--e.g. if I skip a song, that's a -1 rating for this playlist. For some percentage of the suggestions (20% by default? Make it configurable), use Last.fm's or Spotify's API to pick a new song not in your history, based on the songs in the current playlist. Also sometimes include songs that were added to the playlist previously--if you skip them, they get removed from the playlist. Then you can spend a couple minutes every week refreshing your playlists. Export the playlists to Spotify/Apple Music/whatever.
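The session-grouping step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the `(played_at, song_id)` event tuples, the function names, and the choice of a session index plus a song ID as the first two CSV columns are all hypothetical, since the comment does not spell out the column contents or the Last.fm export format.

```python
from datetime import datetime, timedelta


def split_into_sessions(events, gap=timedelta(minutes=30)):
    """Group (played_at, song_id) events into sessions.

    A new session starts whenever the pause since the previous
    listen is at least `gap` (30 minutes by default).
    """
    sessions = []
    last_time = None
    for played_at, song_id in sorted(events):
        if last_time is None or played_at - last_time >= gap:
            sessions.append([])  # long pause: open a new session
        sessions[-1].append(song_id)
        last_time = played_at
    return sessions


def to_rating_rows(sessions):
    """Turn sessions into (session_index, song_id, 1) rows.

    The column meaning is an assumption; 1 is the positive-only rating.
    """
    return [
        (session_index, song_id, 1)
        for session_index, songs in enumerate(sessions)
        for song_id in songs
    ]


# Made-up listening history for illustration.
events = [
    (datetime(2021, 10, 16, 9, 0), "song_a"),
    (datetime(2021, 10, 16, 9, 4), "song_b"),
    (datetime(2021, 10, 16, 9, 34), "song_c"),  # 30 min after song_b: new session
    (datetime(2021, 10, 16, 9, 40), "song_a"),
]
rows = to_rating_rows(split_into_sessions(events))
```

Rows like these could then be written out with the standard `csv` module and handed to a collaborative filtering library.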
As you get more users, you can do "regular" collaborative filtering (i.e. with different users) to recommend new songs instead of relying on external APIs. There are probably lots of other things you could do too--e.g. scrape Wikipedia to figure out which artists have done collaborations or something. In general I think the right approach is to build a model for artist similarity rather than individual song similarity. At recommendation time, you pick an artist and then suggest their top songs (and sometimes pick an artist already in the user's history, and suggest songs they haven't heard yet--that's even easier).
This is the simplest thing I can think of that would solve my "I love music but I listen to the same old songs every day because I'm busy and don't want to futz around with curating my music library" problem. You wouldn't have to waste time building a crappy custom music app, and users won't have to use said crappy custom music app (speaking from personal experience...). You wouldn't have to deal with music rights or integrating with Spotify/Apple Music since you're not actually playing any music.
If you want to go further with it, you could get traction first and then launch your own streaming service or something. (Reminds me a bit of Readwise starting with just highlights and then launching their own reader recently). I think it'd be neat to make an indie streaming service--kind of like Bandcamp but with an algorithm to help you find the good stuff. Let users upload and listen to their own MP3s so it can still work with popular music. Of course it'd be nicer for users in the short term if you just made deals with the big record labels; however, uploads would help you not end up in Spotify's position of pivoting to podcasts so you can get out of paying record labels. And then maybe in a few decades all the good music won't be on the big labels anyway :).
Anyway if anyone is remotely interested in building something like this, I'll be your first user. I really need it. Otherwise I'll probably build it myself at some point in the next year or two as a side project.
Show HN: The Sample – newsletters curated for you with machine learning
1 project | news.ycombinator.com | 28 Jun 2021
I'm planning to build a business on this, so probably won't open-source it--but I'm always looking for interesting things to write about! I write a weekly newsletter called Future of Discovery; I might write up some more implementation details there in a week or two. In the meantime, most of the heavy lifting is done by the Surprise Python lib. It's pretty easy to play around with: just give it a csv of , , and then you can start making rating predictions. fastText is easy to mess around with too. Most of the code I've written just layers things on top of that, e.g. to handle exploration-vs-exploitation as discussed in another thread here.
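The exploration-vs-exploitation handling mentioned above is not described in detail, so the following is only a minimal sketch of one common approach (an epsilon-greedy style mix), not the author's actual code. The function name, its parameters, and the 0.2 default are hypothetical; `predicted` stands in for items ranked by a model's rating predictions, and `unexplored` for items the model knows little about.

```python
import random


def mix_suggestions(predicted, unexplored, n=10, explore_rate=0.2, rng=None):
    """Fill up to n slots with unique items.

    Each slot is drawn from `unexplored` with probability `explore_rate`
    (exploration), otherwise from the best-first `predicted` list
    (exploitation). Falls back to whichever list still has items.
    """
    rng = rng or random.Random()
    predicted = list(predicted)    # copy so the caller's lists are untouched
    unexplored = list(unexplored)
    rng.shuffle(unexplored)        # explore in random order
    picks = []
    while len(picks) < n and (predicted or unexplored):
        explore = bool(unexplored) and (
            not predicted or rng.random() < explore_rate
        )
        source = unexplored if explore else predicted
        item = source.pop(0)
        if item not in picks:      # skip items that appear in both lists
            picks.append(item)
    return picks


# Illustrative call with made-up item IDs and a seeded generator.
suggestions = mix_suggestions(
    ["n1", "n2", "n3", "n4"], ["x1", "x2"], n=4, rng=random.Random(0)
)
```

Keeping `explore_rate` as a parameter makes it easy to tune how often less-certain items are shown.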
Recently I've been factoring out the ML code into a separate recommendation service so it can be used by different kinds of apps (I just barely made this essay recommender system start using it, for example).
I'm happy to chat about recommender systems also if you like, email's in my profile.
[RFP] Product idea for BYOD data science platform
1 project | reddit.com/r/datascience | 17 Jun 2022
I think this might be useful for letting small-to-mid-sized DS teams better utilize their computing resources (e.g., if you have multiple GPU workstations and rely on assigning each one to people to SSH into, this might be for you) by pooling them and providing a service like JupyterHub on top as a unified entry point for notebook-based work. Add-ons like MLflow and Kubeflow can be added with a single click once the platform is up.
mlflow: Open source platform for the machine learning lifecycle
1 project | reddit.com/r/u_TsukiZombina | 16 May 2022
MLflow VS VevestaX - a user suggested alternative
2 projects | 12 May 2022
MLOps with MLflow on Kraken CI
2 projects | dev.to | 29 Apr 2022
Besides building, testing and deploying, Kraken CI is also a pretty nice tool to build an MLOps pipeline. In this article, it will be shown how to leverage Kraken CI to build a CI workflow for machine learning using MLflow.
Serving Python Machine Learning Models With Ease
4 projects | dev.to | 12 Apr 2022
MLflow users can now serve models directly in MLflow using MLServer, and if you're a Kubernetes user you should definitely check out Seldon Core, an open source tool that deploys models to Kubernetes (it uses MLServer under the covers).
Data Science Workflows — Notebook to Production
7 projects | dev.to | 8 Feb 2022
But as you can imagine, tracking each experiment with Git can become a hassle. We’d like to automate the logging process of each run. As with large file versioning, many tools have emerged in recent years for experiment logging, such as W&B, MLflow, TensorBoard, and the list goes on. In this case, I believe that it doesn’t matter which hammer you choose to hit the nail with, as long as you drive it through.
[D] Tips for ML workflow on raw data
2 projects | reddit.com/r/MachineLearning | 21 Jan 2022
Machine Learning adventures with MLFlow - Deploying models from local system to Production
1 project | reddit.com/r/learnmachinelearning | 22 Dec 2021
It's a bug with mlflow: https://github.com/mlflow/mlflow/issues/3755. Keep the server running, open another terminal, and export the MLFLOW_TRACKING_URI environment variable there (on Windows, set the environment variable instead). That should work.
Old guy programmer here, need to brush up on Python quickly!
13 projects | reddit.com/r/Python | 6 Dec 2021
mlflow for logging and visualizing ML model experiments
Taking on the ML pipeline challenge: why data scientists need to own their ML workflows in production
4 projects | dev.to | 6 Dec 2021
So, if you want to use MLflow to track your experiments, run the pipeline on Airflow, and then deploy a model to a Neptune Model Registry, ZenML will facilitate this MLOps Stack for you. This decision can be made jointly by the data scientists and engineers. As ZenML is a framework, custom pieces of the puzzle can also be added here to accommodate legacy infrastructure.
What are some alternatives?
LightFM - A Python implementation of LightFM, a hybrid recommendation algorithm.
clearml - ClearML - Auto-Magical CI/CD to streamline your ML workflow. Experiment Manager, MLOps and Data-Management
Sacred - Sacred is a tool to help you configure, organize, log and reproduce experiments developed at IDSIA.
zenml - ZenML 🙏: Build portable, production-ready MLOps pipelines. https://zenml.io.
scikit-learn - scikit-learn: machine learning in Python
dvc - 🦉Data Version Control | Git for Data & Models | ML Experiments Management
guildai - Experiment tracking, ML developer tools
neptune-client - Experiment tracking tool and model registry
tensorflow - An Open Source Machine Learning Framework for Everyone
Prophet - Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.
H2O - H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
gensim - Topic Modelling for Humans