|3 days ago||3 days ago|
|GNU Lesser General Public License v2.1 only||Apache License 2.0|
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Gensim: Topic Modelling for Humans
1 project | news.ycombinator.com | 7 Dec 2021
Gensim – a Python library for topic modelling, document indexing
1 project | news.ycombinator.com | 25 Nov 2021
How to build a search engine with word embeddings
2 projects | dev.to | 22 Nov 2021
We will be using gensim to load our Google News pre-trained word vectors.
The unthinking application of this regex-efficiency check wasted our attention
1 project | news.ycombinator.com | 30 Sep 2021
The Levenshtein Distance in Production
4 projects | news.ycombinator.com | 6 Jun 2021
> Problem statement: the Levenshtein distance is a string metric for measuring the difference between two sequences
Another variant is "I have a bunch of words (a dictionary) and one query word, and want to find all words from the dictionary that are close to the query word".
This leads to an interesting class of problems, because you can do clever things where you precompute search structures (Levenshtein automata) from the dictionary. The similarity queries then run (much) faster – in production, performance matters.
We recently merged a PR like that into Gensim.
This gave a ~1,500x speed-up compared to naively comparing all pairwise strings with Levenshtein distance: the difference between a training step running for years (i.e. unusable) and one running for minutes.
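The idea of precomputing a search structure from the dictionary can be illustrated with a small sketch. Note that this is not the code merged into Gensim and it is not a Levenshtein automaton: it swaps in a simpler, related technique, a trie walked with one dynamic-programming row per node, so that words sharing a prefix share work and whole branches are pruned once no word below them can be within the distance limit. All names here (`levenshtein`, `TrieNode`, `build_trie`, `search`) and the sample word list are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional, Tuple


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, keeping one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete a character from a
                curr[j - 1] + 1,           # insert a character into a
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.word: Optional[str] = None  # set on nodes that end a dictionary word


def build_trie(words: List[str]) -> TrieNode:
    """Precompute the search structure once, from the whole dictionary."""
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.word = word
    return root


def search(root: TrieNode, query: str, max_dist: int) -> List[Tuple[str, int]]:
    """Return (word, distance) for each non-empty dictionary word within max_dist."""
    results: List[Tuple[str, int]] = []

    def walk(node: TrieNode, ch: str, prev_row: List[int]) -> None:
        # Extend the DP table by one row for the trie character `ch`.
        row = [prev_row[0] + 1]
        for j in range(1, len(query) + 1):
            row.append(min(
                row[j - 1] + 1,
                prev_row[j] + 1,
                prev_row[j - 1] + (query[j - 1] != ch),
            ))
        if node.word is not None and row[-1] <= max_dist:
            results.append((node.word, row[-1]))
        # Prune: if every cell exceeds max_dist, no descendant can match.
        if min(row) <= max_dist:
            for next_ch, child in node.children.items():
                walk(child, next_ch, row)

    first_row = list(range(len(query) + 1))
    for ch, child in root.children.items():
        walk(child, ch, first_row)
    return sorted(results)


# Hypothetical dictionary and query, for illustration only.
words = ["gensim", "genism", "tensor", "sensing", "keras"]
trie = build_trie(words)
print(search(trie, "gensin", max_dist=2))
```

The trie is built once and reused for every query, which is where the saving over the naive all-pairs comparison comes from; the size of that saving depends on the dictionary and the distance limit.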
Superior tools to Gensim's similarity
1 project | reddit.com/r/LanguageTechnology | 20 Mar 2021
So Gensim's Similarity module seems like a good fit for this problem, especially soft cosine similarity checking. But I'm not fully comfortable with that choice, because transformers have become very popular lately.
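For readers unfamiliar with the term, soft cosine similarity generalizes ordinary cosine similarity by weighting each pair of terms with a term-to-term similarity matrix S: soft_cos(a, b) = (aᵀSb) / (√(aᵀSa) · √(bᵀSb)). The following plain-Python sketch shows only the formula; it does not use or reproduce Gensim's API, and the function names, toy vectors, and the 0.5 similarity value are assumptions made for illustration.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def bilinear(a: Vector, s: Matrix, b: Vector) -> float:
    """Compute a^T S b for dense lists."""
    return sum(a[i] * s[i][j] * b[j] for i in range(len(a)) for j in range(len(b)))


def soft_cosine(a: Vector, b: Vector, s: Matrix) -> float:
    """Soft cosine similarity of bag-of-words vectors a and b under term similarities s."""
    denom = math.sqrt(bilinear(a, s, a)) * math.sqrt(bilinear(b, s, b))
    return bilinear(a, s, b) / denom if denom else 0.0


# Two documents with no terms in common: document 1 uses only term 0,
# document 2 uses only term 1.
doc1 = [1.0, 0.0]
doc2 = [0.0, 1.0]

identity = [[1.0, 0.0], [0.0, 1.0]]  # terms treated as unrelated: ordinary cosine
related = [[1.0, 0.5], [0.5, 1.0]]   # assumed similarity of 0.5 between the two terms

print(soft_cosine(doc1, doc2, identity))
print(soft_cosine(doc1, doc2, related))
```

With the identity matrix the measure reduces to ordinary cosine similarity, so documents that share no terms score zero; a non-zero off-diagonal entry lets related terms contribute to the score.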
Koan: A word2vec negative sampling implementation with correct CBOW update
2 projects | news.ycombinator.com | 2 Jan 2021
Apparently it did: https://github.com/RaRe-Technologies/gensim/issues/1873
Old guy programmer here, need to brush up on Python quickly!
13 projects | reddit.com/r/Python | 6 Dec 2021
mlflow for logging and visualizing ML model experiments
Taking on the ML pipeline challenge: why data scientists need to own their ML workflows in production
4 projects | dev.to | 6 Dec 2021
So, even if you want to use MLflow to track your experiments, run the pipeline on Airflow, and then deploy a model to a Neptune Model Registry, ZenML will facilitate this MLOps Stack for you. This decision can be made jointly by the data scientists and engineers. As ZenML is a framework, custom pieces of the puzzle can also be added here to accommodate legacy infrastructure.
[D] 5 considerations for Deploying Machine Learning Models in Production – what did I miss?
3 projects | reddit.com/r/MachineLearning | 21 Nov 2021
Consideration #2: Consider using model life-cycle development and management platforms like MLflow, DVC, Weights & Biases, or SageMaker Studio, and Ray, Ray Tune, Ray Train (formerly Ray SGD), PyTorch, and TensorFlow for distributed, compute-intensive, and deep learning ML workloads.
[P] DagYard - DVC x MLflow x Colab x Gdrive - Automatically Configured
2 projects | reddit.com/r/MachineLearning | 18 Nov 2021
MLflow tracking automates the logging process of experiments and sends live information to a local or remote server while the training is still running.
Data Science toolset summary from 2021
13 projects | dev.to | 13 Nov 2021
MLflow - https://mlflow.org/
How to store preprocessing and feature engineering pipeline?
1 project | reddit.com/r/datascience | 21 Oct 2021
MLOps project based template
4 projects | reddit.com/r/mlops | 11 Oct 2021
ML workflow - MLflow
[D] Facebook Visdom vs Google Tensorboard for Pytorch
5 projects | reddit.com/r/MachineLearning | 26 Sep 2021
Oh, I think most of the paid tracking solutions have auto refresh. As for the free ones, at clear.ml we have had it for quite a while; for MLflow there is an open feature request: https://github.com/mlflow/mlflow/issues/2099
[D] How do you ensure reproducibility?
6 projects | reddit.com/r/MachineLearning | 24 Sep 2021
[D] How to maintain ML models?
5 projects | reddit.com/r/MachineLearning | 16 Sep 2021
MLflow is an MLOps tool that may help you: https://mlflow.org/
What are some alternatives?
Sacred - Sacred is a tool to help you configure, organize, log and reproduce experiments developed at IDSIA.
dvc - 🦉Data Version Control | Git for Data & Models | ML Experiments Management
scikit-learn - scikit-learn: machine learning in Python
Prophet - Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.
H2O - H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
tensorflow - An Open Source Machine Learning Framework for Everyone
neptune-client - Neptune client library - integrate your Python scripts with Neptune
zenml - ZenML 🙏: MLOps framework to create reproducible ML pipelines for production machine learning.
Keras - Deep Learning for humans
guildai - Experiment tracking, ML developer tools