Word2vec

Open-source projects categorized as Word2vec

Top 16 Word2vec Open-Source Projects

  • gensim

    Topic Modelling for Humans

    Project mention: Show HN: I built a site that summarizes articles and PDFs using NLP | news.ycombinator.com | 2022-05-05

    Nice work! I wonder if you're going to face the same challenges that gensim had with being generic in summarization.

    For context:

    > Despite its general-sounding name, the module will not satisfy the majority of use cases in production and is likely to waste people's time.

    https://github.com/RaRe-Technologies/gensim/wiki/Migrating-f...

  • flashtext

    Extract Keywords from sentence or Replace keywords in sentences.

    Project mention: What is the most efficient way to find substrings in strings? | reddit.com/r/learnpython | 2022-01-11

    Seems like https://github.com/vi3k6i5/flashtext would be better suited here.

  • scattertext

    Beautiful visualizations of how language differs among document types.

    Project mention: Clustering of text - Where to start? | reddit.com/r/LanguageTechnology | 2021-08-04

    If what you want is to determine how similar two categories are, or to learn something about the structure or words that compose those categories, you might consider word shift graphs or Scattertext.

  • magnitude

    A fast, efficient universal vector embedding utility package.

    Project mention: Text Classification Library for a Quick Baseline | news.ycombinator.com | 2021-06-23

    (3) FastText now supports multiple languages [2].

    [1] https://github.com/plasticityai/magnitude#pre-converted-magn...

  • koan

    A word2vec negative sampling implementation with correct CBOW update.

  • textaugment

    TextAugment: Text Augmentation Library

    Project mention: Prefer volume or quality for BERT-based Text classification model | reddit.com/r/LanguageTechnology | 2021-12-13

  • word2vec

    Go library for performing computations in word2vec binary models

    Project mention: Ask HN: Who is hiring? (August 2021) | news.ycombinator.com | 2021-08-02

    Sajari https://www.sajari.com

    Each of us engages with 10 or more different search technologies per day. But most search experiences fall well below the standards set by Google and Amazon.

    We’re here to change that. We’re on a mission to enable every organization to build smart search and discover experiences.

    Here are a few of our open jobs:

    Australia (Sydney or Remote)

    Software Engineer https://www.linkedin.com/jobs/view/2634656333/

  • textfeatures

    👷‍♂️ A simple package for extracting useful features from character objects 👷‍♀️

    Project mention: Using dictionaries to check text in R, language processing | reddit.com/r/rstats | 2021-08-18

    TextFeatures can be used to create a summary table of occurrences of text features like number of unique words, etc. Don't know if it does nouns or verbs.

  • pyRDF2Vec

    🐍 Python Implementation and Extension of RDF2Vec

  • postgres-word2vec

    utils to use word embedding models like word2vec vectors in a PostgreSQL database

  • text-summarizer

    Python Framework for Extractive Text Summarization

    Project mention: How do you stay on top of news in an organized way that maximizes absorption and minimizes noise? | reddit.com/r/Zettelkasten | 2022-04-15

  • finalfusion-rust

    finalfusion embeddings in Rust

    Project mention: Compressing high-dimensional vectors by 97% | news.ycombinator.com | 2021-09-02

    Nice article that explains product quantization very well!

    PQ is really a nice compression technique. I implemented PQ and Optimized PQ [1] a while back in our word embedding package for Rust:

    https://github.com/finalfusion/finalfusion-rust/

    https://github.com/finalfusion/reductive/

    Particularly Optimized PQ was effective in reducing vector sizes ~10 times with virtually no reconstruction loss. This made it much easier to ship models (no more 3GB embedding matrix with a neural net that is just a few megabytes large).

    [1] http://kaiminghe.com/publications/pami13opq.pdf

  • dutch-word-embeddings

    Dutch word embeddings, trained on a large collection of Dutch social media messages and news/blog/forum posts.

    Project mention: Is there any way we could help to expand the game to other languages? | reddit.com/r/Semantle | 2022-02-26

  • Romanian-Word-Embeddings

    Romanian Word Embeddings. Here you can find pre-trained corpora of word embeddings. Current methods: CBOW, Skip-Gram, Fast-Text (from Gensim library). The .vec and .model files are available for download (all in one archive).

    Project mention: Romanian word embeddings | reddit.com/r/LanguageTechnology | 2021-12-26

  • NLP-CNN-Subreddit-Sorter-Heroku-App

    End-to-end development of an application using a convolutional neural network that suggests to users/moderators which technical subreddit a post actually belongs to. Novel method to determine # of CNN filters. Custom Word2vec embeddings. The subreddits chosen are all technical and similar, and benefit users/moderators interested in data science and related fields. (Exploratory data analysis, feature engineering, custom word2vec embeddings, convolutional neural network, deployment via flask to

    Project mention: The outputs of my jupyter notebooks inside of Github repos only show half of what they used to. Why did this happen and how to fix? I am certain that the outputs used to show everything when viewed in Github, and I have not reuploaded the notebooks to the repo's since then. | reddit.com/r/github | 2022-03-24

  • graph_summarizer

    summarize text using graphs and language vector models

    Project mention: graph_summarizer | dev.to | 2021-10-14

    Here's the link to the repo

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2022-05-05.

Index

What are some of the best open-source Word2vec projects? This list will help you:

#   Project                              Stars
1   gensim                               13,187
2   flashtext                            5,163
3   scattertext                          1,813
4   magnitude                            1,517
5   koan                                 249
6   textaugment                          241
7   word2vec                             164
8   textfeatures                         154
9   pyRDF2Vec                            150
10  postgres-word2vec                    120
11  text-summarizer                      106
12  finalfusion-rust                     47
13  dutch-word-embeddings                30
14  Romanian-Word-Embeddings             8
15  NLP-CNN-Subreddit-Sorter-Heroku-App  1
16  graph_summarizer                     0