|over 1 year ago||6 days ago|
|Jupyter Notebook||Jupyter Notebook|
|Apache License 2.0||Apache License 2.0|
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
We haven't tracked posts mentioning image-crop-analysis yet.
Tracking mentions began in Dec 2020.
Deep Learning Pioneer Geoffrey Hinton Publishes New Deep Learning Algorithm
4 projects | news.ycombinator.com | 12 Jan 2023
[Discussion] NLP for products matching
3 projects | reddit.com/r/datascience | 11 Jan 2023
Plus, the graph posted there is rather self-explanatory. It also gives you the names of competing libraries and their benchmarks. As you can see, ScaNN is the best so far, but I use annoy since its speed is sufficient for me (I usually need to match around 10k strings to 80k strings) and its usage is very simple and straightforward.
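As a rough illustration of the matching task described in the comment above, here is a minimal exact (brute-force) baseline in plain Python. Libraries such as annoy or ScaNN approximate this kind of search so that it stays fast at larger scale. The character-trigram vectorization, the function names, and the sample product strings are all assumptions made for this sketch; none of them come from the post.

```python
import math
from collections import Counter


def trigram_vector(text):
    """Count character trigrams of a lowercased, space-padded string."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse trigram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_matches(queries, catalog):
    """Map each query to its most similar catalog entry (exact search)."""
    catalog_vectors = [(item, trigram_vector(item)) for item in catalog]
    results = {}
    for query in queries:
        query_vector = trigram_vector(query)
        # Compare against every catalog entry: O(len(queries) * len(catalog)).
        best_item, _ = max(
            catalog_vectors, key=lambda pair: cosine(query_vector, pair[1])
        )
        results[query] = best_item
    return results


# Hypothetical sample data, for illustration only.
catalog = ["Apple iPhone 13 128GB", "Samsung Galaxy S21", "Google Pixel 6 Pro"]
queries = ["iphone 13 128 gb", "galaxy s21 samsung"]
matches = best_matches(queries, catalog)
```

Every query is compared with every catalog entry here, which is exactly the cost an approximate index is meant to avoid once the lists grow to tens of thousands of strings.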
Nearest-neighbor search in high-dimensional spaces
3 projects | reddit.com/r/compsci | 4 Nov 2022
Don't roll your own solution, use ScaNN (https://github.com/google-research/google-research/tree/master/scann) or Faiss (https://github.com/facebookresearch/faiss). I used the internal version of ScaNN while I was at Google, and found it incredibly well put-together. Can't speak to the open-source version, but it should be similarly good. These might be a bit overkill given your set sizes, but it'll be easier than building your own fix.
The Vector Database Index: Who, what, why now, & how
3 projects | news.ycombinator.com | 20 Sep 2022
We use ScaNN for a large scale/performant neural search. Otherwise this all feels bloated.
Learning Python, from scratch
2 projects | reddit.com/r/france | 19 Sep 2022
[D] Most important AI Papers this year so far in my opinion + Proto AGI speculation at the end
10 projects | reddit.com/r/MachineLearning | 14 Aug 2022
An Evolutionary Approach to Dynamic Introduction of Tasks in Large-scale Multitask Learning Systems - Google 2022 – Pathways - Jeff Dean! - Network grows with amount of tasks and data! Paper: https://arxiv.org/abs/2205.12755 Github: https://github.com/google-research/google-research/tree/master/muNet
[R] LocoProp: Enhancing BackProp via Local Loss Optimization (Google Brain, 2022)
2 projects | reddit.com/r/MachineLearning | 2 Aug 2022
Some ML questions
2 projects | reddit.com/r/DiscoDiffusion | 12 Jul 2022
The research around faster denoising diffusion seems quite active (https://github.com/google-research/google-research/tree/master/diffusion_distillation, https://github.com/NVlabs/denoising-diffusion-gan). Any chance of seeing those models in DD for quicker rendering?
80 million sentence embeddings
4 projects | reddit.com/r/LanguageTechnology | 3 Jun 2022
Nearest neighbour search isn't O(N²), neither is building the index. If you had a machine with enough RAM, then I would recommend scann, as it works well and is incredibly fast. I'm not sure if it works with an on-disk file format, though that's what you would want.
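To make the point about cost in the comment above concrete, here is a small sketch of an approximate index built in a single pass over the data. It uses random-hyperplane locality-sensitive hashing, which is a different and much simpler technique than what ScaNN implements; it is shown only because it fits in a few lines of standard-library Python. All names, sizes, and the random data are assumptions for illustration.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def signature(vector, hyperplanes):
    """Hash a vector to a tuple of bits: its side of each random hyperplane."""
    return tuple(
        sum(v * h for v, h in zip(vector, plane)) >= 0.0 for plane in hyperplanes
    )


def build_index(vectors, hyperplanes):
    """Single pass over the data: O(N * num_planes * dim), not O(N^2)."""
    buckets = {}
    for idx, vector in enumerate(vectors):
        buckets.setdefault(signature(vector, hyperplanes), []).append(idx)
    return buckets


def query(vector, vectors, buckets, hyperplanes):
    """Scan only the query's bucket; return (best index, candidates scanned)."""
    candidates = buckets.get(signature(vector, hyperplanes), [])
    if not candidates:
        return None, 0
    best = max(candidates, key=lambda i: cosine(vector, vectors[i]))
    return best, len(candidates)


# Hypothetical random data standing in for sentence embeddings.
rng = random.Random(0)
dim, num_vectors, num_planes = 16, 2000, 8
vectors = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_vectors)]
hyperplanes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_planes)]

buckets = build_index(vectors, hyperplanes)
best, scanned = query(vectors[42], vectors, buckets, hyperplanes)
```

A query inspects only the vectors that share its bucket rather than the whole collection. The trade-off is that a true neighbour hashed into a different bucket is missed, which is why production libraries combine several tables or use more refined partitioning and quantization.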
What are some alternatives?
milvus - Vector database for scalable similarity search and AI applications.
qdrant - Qdrant - Vector Search Engine and Database for the next generation of AI applications. Also available in the cloud https://qdrant.to/cloud
fast-soft-sort - Fast Differentiable Sorting and Ranking
struct2depth - Models and examples built with TensorFlow
ML-KWS-for-MCU - Keyword spotting on Arm Cortex-M Microcontrollers
rmi - A learned index structure
faiss - A library for efficient similarity search and clustering of dense vectors.
torchsort - Fast, differentiable sorting and ranking in PyTorch
ml-agents - The Unity Machine Learning Agents Toolkit (ML-Agents) is an open-source project that enables games and simulations to serve as environments for training intelligent agents using deep reinforcement learning and imitation learning.
CLIP - Contrastive Language-Image Pretraining
haystack - Haystack is an open source NLP framework that leverages pre-trained Transformer models. It enables developers to quickly implement production-ready semantic search, question answering, summarization and document ranking for a wide range of NLP applications.