Licenses: GNU Affero General Public License v3.0 vs. GNU Lesser General Public License v2.1 only
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Deployment automation for ML projects of all shapes and sizes
1 project | news.ycombinator.com | 9 Jun 2021
A tutorial on how to handle prediction uncertainty in production systems, by using Bayesian inference and probabilistic programs
2 projects | reddit.com/r/datascienceproject | 17 May 2021
how to deploy it to Kubernetes using Bodywork.
[P] [D] How are you approaching prediction uncertainty in ML systems?
1 project | reddit.com/r/MachineLearning | 17 May 2021
I usually turn to generative models - e.g. probabilistic programs and Bayesian inference. I’ve written up my thoughts on how to engineer these into a ‘production system’ deployed to Kubernetes, using PyMC and Bodywork (an open-source ML deployment tool that I contribute to).
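The core idea here, quantifying prediction uncertainty with Bayesian inference, can be illustrated without PyMC. A minimal sketch, assuming a conjugate Normal model with known observation noise (all numbers and priors are illustrative, not from the linked write-up):

```python
import numpy as np

rng = np.random.default_rng(42)

# Observed data: assume a Normal likelihood with known noise sigma.
data = rng.normal(loc=2.0, scale=1.0, size=50)
sigma = 1.0                # known observation noise (assumption)
mu0, tau0 = 0.0, 10.0      # Normal prior on the mean: N(mu0, tau0^2)

# Conjugate update: the posterior over the mean is also Normal.
n = data.size
post_var = 1.0 / (1.0 / tau0**2 + n / sigma**2)
post_mean = post_var * (mu0 / tau0**2 + data.sum() / sigma**2)

# Posterior predictive for a new observation: N(post_mean, post_var + sigma^2),
# so the prediction carries both parameter and observation uncertainty.
pred_sd = np.sqrt(post_var + sigma**2)
print(f"predictive mean={post_mean:.2f}, sd={pred_sd:.2f}")
```

In production you would report the full predictive interval rather than a point estimate; tools like PyMC generalise this beyond conjugate models via sampling.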
Bodywork: MLOps tool for deploying ML projects to Kubernetes
1 project | news.ycombinator.com | 4 May 2021
Tool for mapping executable Python modules to Kubernetes deployments
1 project | reddit.com/r/madeinpython | 4 May 2021
I’m one of the core contributors to Bodywork, an open-source tool for deploying machine learning projects developed in Python, to Kubernetes.
[P] [D] The benefits of training the simplest model you can think of and deploying it to production, as soon as you can.
1 project | reddit.com/r/MachineLearning | 18 Apr 2021
I’ve had many successes with this approach. With this in mind, I’ve put together an example of how to make this Agile approach to developing machine learning systems a reality, by demonstrating that it takes under 15 minutes to deploy a Scikit-Learn model using FastAPI and Bodywork (an open-source MLOps tool that I have built).
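The "simplest model you can think of" can be simpler still than Scikit-Learn: a baseline that predicts the training mean, deployable immediately and useful as the bar any later model must beat. A dependency-free sketch (names and numbers are illustrative):

```python
# A "simplest model you can think of": always predict the training-set mean.
def fit_baseline(y_train):
    """Return a model function that ignores its input and predicts the mean."""
    mean = sum(y_train) / len(y_train)
    return lambda x: mean

def mae(model, X, y):
    """Mean absolute error of the model on held-out data."""
    return sum(abs(model(x) - yi) for x, yi in zip(X, y)) / len(y)

y_train = [3.0, 4.0, 5.0]
model = fit_baseline(y_train)
print(model(None))  # → 4.0
```

Shipping this first forces the serving, monitoring and deployment plumbing into place before any modelling effort is spent.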
bodywork - MLOps for Python and K8S
1 project | reddit.com/r/mlops | 1 Feb 2021
bodywork-ml/bodywork-core - MLOps automation for Python and Kubernetes
1 project | reddit.com/r/mlops | 1 Feb 2021
Topic modelling with Gensim and SpaCy on startup news
3 projects | dev.to | 17 Jan 2022
For the topic modelling itself, I am going to use the Gensim library by Radim Rehurek, which is very developer-friendly and easy to use.
Unsupervised Learning for String Matching in Python - can I have advice on how to go about this?
2 projects | reddit.com/r/learnmachinelearning | 16 Dec 2021
Gensim: Topic Modelling for Humans
1 project | news.ycombinator.com | 7 Dec 2021
Gensim – a Python library for topic modelling, document indexing
1 project | news.ycombinator.com | 25 Nov 2021
How to build a search engine with word embeddings
2 projects | dev.to | 22 Nov 2021
We will be using gensim to load our Google News pre-trained word vectors. Find the code for this here.
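At its core, an embedding-based search engine is nearest-neighbour lookup by cosine similarity. The Google News vectors are too large to load here, so this sketch substitutes random unit vectors for real embeddings (the vocabulary and dimensions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary with random vectors standing in for pre-trained embeddings.
vocab = ["cat", "dog", "car", "truck", "apple"]
vectors = rng.normal(size=(len(vocab), 50))
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)  # unit-normalise rows

def most_similar(word, topn=2):
    """Nearest neighbours by cosine similarity (dot product of unit vectors)."""
    query = vectors[vocab.index(word)]
    scores = vectors @ query
    order = np.argsort(-scores)  # highest similarity first
    return [(vocab[i], float(scores[i])) for i in order if vocab[i] != word][:topn]

print(most_similar("cat"))
```

With real vectors (e.g. loaded via Gensim's `KeyedVectors`), the same dot-product ranking returns semantically related words rather than random ones.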
The unthinking application of this regex-efficiency check wasted our attention
1 project | news.ycombinator.com | 30 Sep 2021
The Levenshtein Distance in Production
4 projects | news.ycombinator.com | 6 Jun 2021
> Problem statement: the Levenshtein distance is a string metric for measuring the difference between two sequences
Another variant is "I have a bunch of words (a dictionary) and one query word, and want to find all words from the dictionary that are close to the query word".
This leads to an interesting class of problems, because you can do clever things where you precompute search structures (Levenshtein automata) from the dictionary. The similarity queries then run (much) faster – in production, performance matters.
We recently merged a PR like that into Gensim.
This gave a ~1,500x speed-up compared to naively comparing all string pairs with the Levenshtein distance – the difference between the training step running for years (i.e. unusable) and running in minutes.
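For reference, the metric under discussion is computed with the classic Wagner–Fischer dynamic programme, which costs O(len(a) * len(b)) per pair; this quadratic cost is exactly why naive all-pairs comparison does not scale and precomputed automata pay off:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (Wagner-Fischer DP, two rows)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # → 3
```

Production implementations typically use C extensions or, as above, automata that prune the search rather than computing the full table per pair.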
Superior tools to Gensim's similarity
1 project | reddit.com/r/LanguageTechnology | 20 Mar 2021
So Gensim's Similarity module seems like a good fit for this problem, especially soft cosine similarity checking. But I can't quite get comfortable with it, because transformers have become so popular lately.
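Soft cosine similarity generalises plain cosine by inserting a term-term similarity matrix S, so documents using related-but-different words still score as similar: soft_cos(a, b) = aᵀSb / √(aᵀSa · bᵀSb). A minimal NumPy sketch with made-up similarity values (Gensim computes S from embeddings instead):

```python
import numpy as np

def soft_cosine(a, b, S):
    """Soft cosine similarity: cosine in a space where the term-term
    similarity matrix S credits related-but-different words."""
    return (a @ S @ b) / np.sqrt((a @ S @ a) * (b @ S @ b))

# Two bag-of-words vectors over the terms [cat, feline, dog].
a = np.array([1.0, 0.0, 0.0])   # document mentioning "cat"
b = np.array([0.0, 1.0, 0.0])   # document mentioning "feline"

S_identity = np.eye(3)          # identity S reduces to ordinary cosine
S_soft = np.array([[1.0, 0.9, 0.1],
                   [0.9, 1.0, 0.1],
                   [0.1, 0.1, 1.0]])   # "cat" ~ "feline" (illustrative values)

print(soft_cosine(a, b, S_identity))  # 0.0: plain cosine sees no overlap
print(soft_cosine(a, b, S_soft))      # 0.9: soft cosine credits the synonymy
```

This is also why it competes with transformers on this task: both inject word-relatedness that raw bag-of-words overlap lacks.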
Koan: A word2vec negative sampling implementation with correct CBOW update
2 projects | news.ycombinator.com | 2 Jan 2021
Apparently it did: https://github.com/RaRe-Technologies/gensim/issues/1873
What are some alternatives?
scikit-learn - scikit-learn: machine learning in Python
tensorflow - An Open Source Machine Learning Framework for Everyone
MLflow - Open source platform for the machine learning lifecycle
Keras - Deep Learning for humans
BERTopic - Leveraging BERT and c-TF-IDF to create easily interpretable topics.
xgboost - Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
gym - A toolkit for developing and comparing reinforcement learning algorithms.
flair - A very simple framework for state-of-the-art Natural Language Processing (NLP)
fuzzywuzzy - Fuzzy String Matching in Python
Crab - Crab is a flexible, fast recommender engine for Python that integrates classic information filtering recommendation algorithms in the world of scientific Python packages (numpy, scipy, matplotlib).
NuPIC - Numenta Platform for Intelligent Computing is an implementation of Hierarchical Temporal Memory (HTM), a theory of intelligence based strictly on the neuroscience of the neocortex.
CNTK - Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit