Python spacy

Open-source Python projects categorized as spacy

Top 14 Python spacy Projects

  • GitHub repo spaCy

    💫 Industrial-strength Natural Language Processing (NLP) in Python

    Project mention: Two Methods to Scan for PII in Data Warehouses | 2021-11-29

    NLP libraries such as Stanford NER Detector and Spacy

  • GitHub repo rasa

    💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants

    Project mention: How to Create the Perfect README for Your Open Source Project | 2021-11-02

    This example is sourced from RasaHQ

  • GitHub repo thinc

    🔮 A refreshing functional take on deep learning, compatible with your favorite libraries

    Project mention: good examples of functional-like python code that one can study? | 2021-06-29

    thinc - defining neural nets in a functional way. jax, a new deep learning framework, puts emphasis on functions rather than tensors. I've tested it for a couple of applications and it's really cool: you can write stuff like you'd write math expressions in papers using numpy. That speeds up development significantly and makes code much more readable.

  • GitHub repo textacy

    NLP, before and after spaCy

  • GitHub repo pytextrank

    Python implementation of TextRank for phrase extraction and summarization of text documents

    Project mention: Question on easing comprehension | 2021-09-15

  • GitHub repo Dragonfire

    the open-source virtual assistant for Ubuntu based Linux distributions

    Project mention: Why your own Assistant when there are sooo many? | 2021-08-31

  • GitHub repo spacy-models

    💫 Models for the spaCy Natural Language Processing (NLP) library

    Project mention: word similarity vs. sentence similarity | 2021-08-25

    Well, the medium model uses GloVe (Common Crawl) for word vectors. There are only 685K keys, so depending on the corpus you are working with, it's possible that lots of the words you are interested in don't have a corresponding vector and end up as zero vectors. spaCy Document/Span vectors are simply averages of the word vectors. So the higher performance of phrases may simply be because there is a higher chance that at least some of the words are not out-of-vocabulary (OOV), and so less chance of a zero vector.
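
The averaging behaviour described in this mention can be sketched in plain Python. This is a toy illustration only, not spaCy's implementation: the three-dimensional vector table, the whitespace tokenizer, and the helper names `word_vector`, `doc_vector` and `cosine` are all assumptions made up for the example.

```python
from math import sqrt

# Toy vector table; the words and values are invented for illustration.
DIM = 3
VECTORS = {
    "cat": [1.0, 0.0, 0.0],
    "dog": [0.8, 0.6, 0.0],
    "sat": [0.0, 1.0, 0.0],
}


def word_vector(word):
    """Return the stored vector, or a zero vector for out-of-vocabulary words."""
    return list(VECTORS.get(word.lower(), [0.0] * DIM))


def doc_vector(text):
    """Average the word vectors of a whitespace-tokenized text."""
    tokens = text.split()
    if not tokens:
        return [0.0] * DIM
    vecs = [word_vector(t) for t in tokens]
    # OOV tokens contribute zero vectors, which shrink the average
    # but do not change its direction.
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


# A single OOV word has a zero vector, so its similarity to anything is 0.
single_word_score = cosine(word_vector("zyzzyva"), word_vector("cat"))

# A phrase only needs one in-vocabulary word to get a non-zero vector.
phrase_score = cosine(doc_vector("zyzzyva cat"), word_vector("cat"))
```

In this toy setup the single unknown word scores zero against everything, while the phrase still produces a usable vector because one of its words is in the table, which is the effect the commenter suggests.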

  • GitHub repo Klayers

    Python Packages as AWS Lambda Layers

    Project mention: Can a lambda use a layer which is stored in S3 | 2021-03-19

    I like to use this guy’s layers as an arn:

  • GitHub repo projects

    🪐 End-to-end NLP workflows from prototype to production (by explosion)

    Project mention: SpaCy v3.0 Released (Python Natural Language Processing) | 2021-02-01

    The improved transformers support is definitely one of the main features of the release. I'm also really pleased with how the project system and config files work.

    If you're always working with exactly one task model, I think working directly in transformers isn't that different from using spaCy. But if you're orchestrating multiple models, spaCy's pipeline components and Doc object will probably be helpful. A feature in v3 that I think will be particularly useful is the ability to share a transformer model between multiple components, for instance you can have an entity recogniser, text classifier and tagger all using the same transformer, and all backpropagating to it.

    You also might find the projects system useful if you're training a lot of models. For instance, take a look at the project repo. Most of the readme there is actually generated from the project.yml file, which fully specifies the preprocessing steps you need to build the project from the source assets. The project system can also push and pull intermediate or final artifacts to a remote cache, such as an S3 bucket, with the addressing of the artifacts calculated based on hashes of the inputs and the file itself.

    The config file is comprehensive and extensible. The blocks refer to typed functions that you can specify yourself, so you can substitute any of your own layer (or other) functions in, to change some part of the system's behaviour. You don't _have_ to specify your models from the config files like this --- you can instead put it together in code. But the config system means there's a way of fully specifying a pipeline and all of the training settings, which means you can really standardise your training machinery.

    Overall the theme of what we're doing is helping you to line up the workflows you use during development with something you can actually ship. We think one of the problems for ML engineers is that there's quite a gap between how people are iterating in their local dev environment (notebooks, scrappy directories etc) and getting the project into a state that you can get other people working on, try out in automation, and then pilot in some sort of soft production (e.g. directing a small amount of traffic to the model).

    The problem with iterating in the local state is that you're running the model against benchmarks that are not real, and you hit diminishing returns quite quickly this way. It also introduces a lot of rework.

    All that said, there will definitely be usage contexts where it's not worth introducing another technology. For instance, if your main goal is to develop a model, run an experiment and publish a paper, you might find spaCy doesn't do much that makes your life easier.
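
The idea in this mention of config blocks that refer to functions you register yourself can be sketched with a small registry. This is a minimal illustration in plain Python, not spaCy's config system: the `REGISTRY` dict, the `register` decorator, the `resolve` helper, the `@function` key and the names `scale.v1` and `offset.v1` are all hypothetical.

```python
from typing import Any, Callable, Dict

# Hypothetical registry mapping string names to factory functions.
REGISTRY: Dict[str, Callable[..., Any]] = {}


def register(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator that stores a factory function under a string name."""
    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        REGISTRY[name] = func
        return func
    return decorator


@register("scale.v1")
def make_scaler(factor: float) -> Callable[[float], float]:
    """Factory for a toy 'layer' that multiplies its input."""
    return lambda x: x * factor


@register("offset.v1")
def make_offset(amount: float) -> Callable[[float], float]:
    """Factory for a toy 'layer' that adds a constant to its input."""
    return lambda x: x + amount


def resolve(block: Dict[str, Any]) -> Any:
    """Look up the function named in a config block and call it with the rest."""
    params = dict(block)              # copy so the config stays unchanged
    name = params.pop("@function")    # hypothetical key naming the factory
    return REGISTRY[name](**params)


# A config is plain data; swapping "scale.v1" for your own registered
# function changes the behaviour without touching the calling code.
config = {"layer": {"@function": "scale.v1", "factor": 2.0}}
layer = resolve(config["layer"])
```

Because the config is plain data, the whole setup can be written to a file and replayed, which is the standardisation benefit the comment points to.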

  • GitHub repo subreddit-analyzer

    A comprehensive Data and Text Mining workflow for submissions and comments from any given public subreddit.

    Project mention: [For Hire] Data Analysis, Bots, Web Scrapers & Automation Software | 2021-03-23

    Subreddit Analyzer using pandas, matplotlib, Seaborn, spaCy and wordcloud.

  • GitHub repo skweak

    skweak: A software toolkit for weak supervision applied to NLP tasks

    Project mention: Skweak: Weak Supervision for NLP | 2021-08-22

  • GitHub repo medaCy

    🏥 Medical Text Mining and Information Extraction with spaCy

    Project mention: Help / Direction | 2021-02-12

    If you want an easier, more straightforward approach, you can check out Medacy.

  • GitHub repo summarizer

    A Reddit bot that summarizes news articles written in Spanish or English. It uses a custom built algorithm to rank words and sentences.

    Project mention: [For Hire] Data Analysis, Bots, Web Scrapers & Automation Software | 2021-02-23

    Universal Web Scraper that summarizes news articles.

  • GitHub repo thinc-apple-ops

    🍏 Make Thinc faster on macOS by calling into Apple's native Accelerate library

    Project mention: Spacy training on Apple M1 vs. AMD Ryzen 5900X: 55% faster, 16x more efficient | 2021-11-08

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2021-11-29.

What are some of the best open-source spacy projects in Python? This list will help you:

#   Project             Stars
1   spaCy               21,870
2   rasa                13,108
3   thinc               2,412
4   textacy             1,818
5   pytextrank          1,663
6   Dragonfire          1,181
7   spacy-models        969
8   Klayers             874
9   projects            673
10  subreddit-analyzer  472
11  skweak              445
12  medaCy              340
13  summarizer          232
14  thinc-apple-ops     49