Top 14 Python spacy Projects
💫 Industrial-strength Natural Language Processing (NLP) in Python
Project mention: Two Methods to Scan for PII in Data Warehouses | dev.to | 2021-11-29
NLP libraries such as Stanford NER Detector and spaCy
💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants
Project mention: How to Create the Perfect README for Your Open Source Project | dev.to | 2021-11-02
This example is sourced from RasaHQ
🔮 A refreshing functional take on deep learning, compatible with your favorite libraries
Project mention: good examples of functional-like python code that one can study? | reddit.com/r/functionalprogramming | 2021-06-29
thinc: defining neural nets in a functional way.
jax: a new deep learning framework that puts the emphasis on functions rather than tensors. I've tested it for a couple of applications and it's really cool; you can write code the way you'd write math expressions in papers, using numpy. That speeds up development significantly and makes code much more readable.
NLP, before and after spaCy
Python implementation of TextRank for phrase extraction and summarization of text documents
Project mention: Question on easing comprehension | dev.to | 2021-09-15
The open-source virtual assistant for Ubuntu-based Linux distributions
Project mention: Why your own Assistant when there are sooo many? | reddit.com/r/SapphireFramework | 2021-08-31
💫 Models for the spaCy Natural Language Processing (NLP) library
Project mention: word similarity vs. sentence similarity | reddit.com/r/LanguageTechnology | 2021-08-25
Well, the medium model uses GloVe (Common Crawl) word vectors. There are only 685K keys, so depending on the corpus you are working with, it's possible that lots of the words you are interested in don't have a corresponding vector and end up as zero vectors. spaCy Document/Span vectors are simply averages of the word vectors. So the higher performance of phrases may simply be because a phrase has a higher chance of containing words that are not out of vocabulary (OOV), and so less chance of ending up as a zero vector.
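The averaging effect described in that comment can be illustrated with a small sketch. It is a toy in plain Python, not spaCy's implementation: the two-dimensional vector table and the helper names (`TOY_VECTORS`, `word_vector`, `span_vector`, `oov_fraction`) are made up for illustration.

```python
# Toy illustration only: this vector table is hypothetical, not spaCy's data.
TOY_VECTORS = {
    "cat": [1.0, 0.0],
    "dog": [0.0, 1.0],
}
DIM = 2


def word_vector(word):
    """Return the stored vector, or a zero vector for an out-of-vocabulary word."""
    return TOY_VECTORS.get(word.lower(), [0.0] * DIM)


def span_vector(words):
    """Average the word vectors, zero vectors for OOV words included."""
    if not words:
        return [0.0] * DIM
    vectors = [word_vector(w) for w in words]
    return [sum(component) / len(vectors) for component in zip(*vectors)]


def oov_fraction(words):
    """Share of the words that have no entry in the vector table."""
    if not words:
        return 0.0
    return sum(w.lower() not in TOY_VECTORS for w in words) / len(words)
```

A single unknown word yields an all-zero vector, while a phrase that mixes known and unknown words still gets a non-zero average, which is the effect the comment suggests may explain the better phrase results.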
Python Packages as AWS Lambda Layers
Project mention: Can a lambda use a layer which is stored in S3 | reddit.com/r/aws | 2021-03-19
I like to use this guy’s layers as an ARN: https://github.com/keithrozario/Klayers
🪐 End-to-end NLP workflows from prototype to production (by explosion)
Project mention: SpaCy v3.0 Released (Python Natural Language Processing) | news.ycombinator.com | 2021-02-01
The improved transformers support is definitely one of the main features of the release. I'm also really pleased with how the project system and config files work.
If you're always working with exactly one task model, I think working directly in transformers isn't that different from using spaCy. But if you're orchestrating multiple models, spaCy's pipeline components and Doc object will probably be helpful. A feature in v3 that I think will be particularly useful is the ability to share a transformer model between multiple components: for instance, you can have an entity recogniser, text classifier and tagger all using the same transformer, and all backpropagating to it.
You also might find the projects system useful if you're training a lot of models. For instance, take a look at the project repo [here](https://github.com/explosion/projects/tree/v3/benchmarks/ner...). Most of the readme there is actually generated from the project.yml file, which fully specifies the preprocessing steps you need to build the project from the source assets. The project system can also push and pull intermediate or final artifacts to a remote cache, such as an S3 bucket, with the addressing of the artifacts calculated based on hashes of the inputs and the file itself.
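The idea of addressing cached artifacts by hashes of the inputs and of the file itself can be sketched roughly as follows. This is a minimal illustration under assumptions, not spaCy's actual scheme: the function names, the choice of SHA-256, and the `inputs-hash/content-hash` key layout are all hypothetical.

```python
import hashlib
from typing import Sequence


def sha256_hex(data: bytes) -> str:
    """Hex digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def artifact_address(command: str, input_blobs: Sequence[bytes], output_blob: bytes) -> str:
    """Build a cache key from a hash of the inputs plus a hash of the file itself.

    Hypothetical layout: "<hash of command and inputs>/<hash of output content>".
    """
    inputs_hash = hashlib.sha256()
    inputs_hash.update(command.encode("utf-8"))
    for blob in input_blobs:
        # Hash each input separately so that blob boundaries affect the key.
        inputs_hash.update(sha256_hex(blob).encode("ascii"))
    content_hash = sha256_hex(output_blob)
    return f"{inputs_hash.hexdigest()}/{content_hash}"
```

With a key like this, rerunning a step on unchanged inputs maps to the same address, so an artifact already in the remote cache can be pulled instead of rebuilt.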
The config file is comprehensive and extensible. The blocks refer to typed functions that you can specify yourself, so you can substitute any of your own layer (or other) functions in, to change some part of the system's behaviour. You don't _have_ to specify your models from the config files like this --- you can instead put it together in code. But the config system means there's a way of fully specifying a pipeline and all of the training settings, which means you can really standardise your training machinery.
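A config block that refers to a registered function can be sketched in plain Python along these lines. This is not the spaCy or Thinc API: the `LAYERS` registry, the `register` decorator, the `@layers` key and the `scale.v1` / `shift.v1` names are all made up to show the general pattern of swapping in your own function by name.

```python
from typing import Callable, Dict

# Hypothetical registry: maps a name used in the config to a Python factory function.
LAYERS: Dict[str, Callable[..., Callable[[float], float]]] = {}


def register(name: str):
    """Decorator that records a factory function under the given name."""
    def decorator(func):
        LAYERS[name] = func
        return func
    return decorator


@register("scale.v1")
def make_scale(factor: float) -> Callable[[float], float]:
    return lambda x: x * factor


@register("shift.v1")
def make_shift(offset: float) -> Callable[[float], float]:
    return lambda x: x + offset


def build(block: dict) -> Callable[[float], float]:
    """Resolve a config block: look up the factory, pass the other keys as arguments."""
    settings = dict(block)  # copy so the caller's config is left untouched
    factory = LAYERS[settings.pop("@layers")]
    return factory(**settings)


# The config names the function and its settings; changing the name swaps the behaviour.
config = {"model": {"@layers": "scale.v1", "factor": 2.0}}
model = build(config["model"])
```

Because the config only stores a name and arguments, replacing `scale.v1` with a function of your own changes that part of the system without touching the code that consumes `model`.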
Overall the theme of what we're doing is helping you to line up the workflows you use during development with something you can actually ship. We think one of the problems for ML engineers is that there's quite a gap between how people are iterating in their local dev environment (notebooks, scrappy directories etc) and getting the project into a state that you can get other people working on, try out in automation, and then pilot in some sort of soft production (e.g. directing a small amount of traffic to the model).
The problem with iterating in the local state is that you're running the model against benchmarks that are not real, and you hit diminishing returns quite quickly this way. It also introduces a lot of rework.
All that said, there will definitely be usage contexts where it's not worth introducing another technology. For instance, if your main goal is to develop a model, run an experiment and publish a paper, you might find spaCy doesn't do much that makes your life easier.
A comprehensive Data and Text Mining workflow for submissions and comments from any given public subreddit.
Project mention: [For Hire] Data Analysis, Bots, Web Scrapers & Automation Software | reddit.com/r/forhire | 2021-03-23
Subreddit Analyzer using pandas, matplotlib, Seaborn, spaCy and wordcloud.
skweak: A software toolkit for weak supervision applied to NLP tasks
Project mention: Skweak: Weak Supervision for NLP | news.ycombinator.com | 2021-08-22
🏥 Medical Text Mining and Information Extraction with spaCy
Project mention: Help / Direction | reddit.com/r/MLQuestions | 2021-02-12
If you want an easier, more straightforward approach, you can check out medaCy (https://github.com/NLPatVCU/medaCy)
A Reddit bot that summarizes news articles written in Spanish or English. It uses a custom-built algorithm to rank words and sentences.
Project mention: [For Hire] Data Analysis, Bots, Web Scrapers & Automation Software | reddit.com/r/jobbit | 2021-02-23
Universal Web Scraper that summarizes news articles.
🍏 Make Thinc faster on macOS by calling into Apple's native Accelerate library
Project mention: Spacy training on Apple M1 vs. AMD Ryzen 5900X: 55% faster, 16x more efficient | news.ycombinator.com | 2021-11-08
Python spacy related posts
word similarity vs. sentence similarity
1 project | reddit.com/r/LanguageTechnology | 25 Aug 2021
Skweak: Weak Supervision for NLP
1 project | news.ycombinator.com | 22 Aug 2021
Inevitable Manual Work involved in NLP
1 project | reddit.com/r/LanguageTechnology | 4 May 2021
How to get Training data for NER?
2 projects | reddit.com/r/LanguageTechnology | 24 Apr 2021
[For Hire] Data Analysis, Bots, Web Scrapers & Automation Software
1 project | reddit.com/r/forhire | 23 Mar 2021
Word cloud for r/realmadrid
1 project | reddit.com/r/realmadrid | 22 Mar 2021
Can a lambda use a layer which is stored in S3
1 project | reddit.com/r/aws | 19 Mar 2021
What are some of the best open-source spacy projects in Python? This list will help you: