Sonar helps you commit clean code every time. With over 225 unique rules to find Python bugs, code smells & vulnerabilities, Sonar finds the issues while you focus on the work. Learn more →
Top 23 Python spacy Projects
-
Tools: Hugging Face, spaCy, scikit-learn, MLflow. There is no flag to distinguish a human owner from a corporate entity, so you have to figure it out on your own. ML can assist, given there are tens of thousands of records to go through.
-
rasa
💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants
Project mention: How to Installing Rasa & Building Rasa Chatbot on an M1 Macbook. | dev.to | 2023-01-26 -
-
Project mention: Tinygrad: A simple and powerful neural network framework | news.ycombinator.com | 2022-11-03
I love those tiny DNN frameworks, some examples that I studied in the past (I still use PyTorch for work related projects) :
thinc, by the creators of spaCy: https://github.com/explosion/thinc
-
-
Project mention: Are there any good modules for Text to speech generation and Text summarization? | reddit.com/r/Python | 2022-04-04
-
argilla
✨ Open-source tool for data-centric NLP. Argilla helps domain experts and data teams to build better NLP datasets in less time.
Project mention: Rubrix release 0.17.0 with support for the spaCy training format | reddit.com/r/LanguageTechnology | 2022-08-25 -
Use public layers maintained by others, for example Klayers, which are built with CI in GitHub Actions. This way you don't have any layers lying around in your Lambda UI, and you can point to new layer versions whenever you wish. You can interact with the layer data via an API: https://github.com/keithrozario/Klayers/#api
-
-
Project mention: Guidance needed: Extracting diseases and symptoms from medical text | reddit.com/r/LanguageTechnology | 2022-11-05
https://github.com/medspacy/medspacy and https://allenai.github.io/scispacy/ should get you most of the way there
-
You can run it on a Debian or Ubuntu system and install it through the .deb file or by cloning the GitHub repo. There is also a troubleshooting section that is very helpful with the installation and usage of this system.
-
Project mention: spacy Can't find model 'en_core_web_sm' on windows 10 and Python 3.5.3 :: Anaconda custom (64-bit) | reddit.com/r/codehunter | 2022-05-02
(C:\Users\nikhizzz\AppData\Local\conda\conda\envs\tensorflowspyder) C:\Users\nikhizzz>conda install -c conda-forge spacy
Fetching package metadata .............
Solving package specifications: .
Package plan for installation in environment C:\Users\nikhizzz\AppData\Local\conda\conda\envs\tensorflowspyder:
The following NEW packages will be INSTALLED:
    blas: 1.0-mkl
    cymem: 1.31.2-py35h6538335_0 conda-forge
    dill: 0.2.8.2-py35_0 conda-forge
    msgpack-numpy: 0.4.4.2-py_0 conda-forge
    murmurhash: 0.28.0-py35h6538335_1000 conda-forge
    plac: 0.9.6-py_1 conda-forge
    preshed: 1.0.0-py35h6538335_0 conda-forge
    pyreadline: 2.1-py35_1000 conda-forge
    regex: 2017.11.09-py35_0 conda-forge
    spacy: 2.0.12-py35h830ac7b_0 conda-forge
    termcolor: 1.1.0-py_2 conda-forge
    thinc: 6.10.3-py35h830ac7b_2 conda-forge
    tqdm: 4.29.1-py_0 conda-forge
    ujson: 1.35-py35hfa6e2cd_1001 conda-forge
The following packages will be UPDATED:
    msgpack-python: 0.4.8-py35_0 --> 0.5.6-py35he980bc4_3 conda-forge
The following packages will be DOWNGRADED:
    freetype: 2.7-vc14_2 conda-forge --> 2.5.5-vc14_2
Proceed ([y]/n)? y
blas-1.0-mkl.t 100% |###############################| Time: 0:00:00 0.00 B/s
cymem-1.31.2-p 100% |###############################| Time: 0:00:00 1.65 MB/s
msgpack-python 100% |###############################| Time: 0:00:00 5.37 MB/s
murmurhash-0.2 100% |###############################| Time: 0:00:00 1.49 MB/s
plac-0.9.6-py_ 100% |###############################| Time: 0:00:00 0.00 B/s
pyreadline-2.1 100% |###############################| Time: 0:00:00 4.62 MB/s
regex-2017.11. 100% |###############################| Time: 0:00:00 3.31 MB/s
termcolor-1.1. 100% |###############################| Time: 0:00:00 187.81 kB/s
tqdm-4.29.1-py 100% |###############################| Time: 0:00:00 2.51 MB/s
ujson-1.35-py3 100% |###############################| Time: 0:00:00 1.66 MB/s
dill-0.2.8.2-p 100% |###############################| Time: 0:00:00 4.34 MB/s
msgpack-numpy- 100% |###############################| Time: 0:00:00 0.00 B/s
preshed-1.0.0- 100% |###############################| Time: 0:00:00 0.00 B/s
thinc-6.10.3-p 100% |###############################| Time: 0:00:00 5.49 MB/s
spacy-2.0.12-p 100% |###############################| Time: 0:00:10 7.42 MB/s
(C:\Users\nikhizzz\AppData\Local\conda\conda\envs\tensorflowspyder) C:\Users\nikhizzz>python -V
Python 3.5.3 :: Anaconda custom (64-bit)
(C:\Users\nikhizzz\AppData\Local\conda\conda\envs\tensorflowspyder) C:\Users\nikhizzz>python -m spacy download en
Collecting en_core_web_sm==2.0.0 from https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz#egg=en_core_web_sm==2.0.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz (37.4MB)
    100% |################################| 37.4MB ...
Installing collected packages: en-core-web-sm
  Running setup.py install for en-core-web-sm ... done
Successfully installed en-core-web-sm-2.0.0
    Linking successful
    C:\Users\nikhizzz\AppData\Local\conda\conda\envs\tensorflowspyder\lib\site-packages\en_core_web_sm -->
    C:\Users\nikhizzz\AppData\Local\conda\conda\envs\tensorflowspyder\lib\site-packages\spacy\data\en
    You can now load the model via spacy.load('en')
(C:\Users\nikhizzz\AppData\Local\conda\conda\envs\tensorflowspyder) C:\Users\nikhizzz>
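A "Can't find model" error usually means the model package is not importable from the interpreter that is actually running the script. A minimal sketch of a pre-flight check follows; the helper names `model_is_installed` and `load_model_or_explain` are hypothetical and not part of spaCy. It assumes the model is installed as a regular Python package named `en_core_web_sm`, which is how the log above installs it.

```python
import importlib.util


def model_is_installed(package_name: str) -> bool:
    """Return True if a top-level package with this name is importable
    from the interpreter that is running this script."""
    return importlib.util.find_spec(package_name) is not None


def load_model_or_explain(package_name: str = "en_core_web_sm"):
    """Load a spaCy model by package name, or print a hint and return None
    when the package is missing from the current environment."""
    if not model_is_installed(package_name):
        print(
            f"{package_name} is not installed in this environment; "
            f"try: python -m spacy download {package_name}"
        )
        return None
    import spacy  # imported lazily so the check above runs without spaCy

    return spacy.load(package_name)
```

Running the check with the same `python` that runs the failing script can show whether the model landed in a different conda environment than the one in use.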
-
refinery
The data scientist's open-source choice to scale, assess and maintain natural language data. Treat training data like a software artifact.
Project mention: Open-source tool to label, assess and maintain natural language data. Treat training data like a software artifact! | reddit.com/r/LanguageTechnology | 2023-01-25 -
Project mention: Using TensorFlow and the Serverless Framework for deep learning and image recognition | dev.to | 2022-05-31
As a hobby, I port a lot of libraries to make them serverless friendly. You can look at them here. They all have an MIT license, so feel free to modify and use them for your project.
-
You could check out https://github.com/explosion/projects/tree/v3/tutorials for some sample code (this is the official spaCy GitHub).
-
Project mention: Entity Extraction with Predefined List | reddit.com/r/LanguageTechnology | 2023-01-07
Thanks for pointing me in the right direction. Seems like there’s a few other approaches with weak supervision: https://github.com/NorskRegnesentral/skweak
-
Project mention: How can I create a Sanskrit language model? | reddit.com/r/datascience | 2023-01-08
-
subreddit-analyzer
A comprehensive Data and Text Mining workflow for submissions and comments from any given public subreddit.
-
-
summarizer
A Reddit bot that summarizes news articles written in Spanish or English. It uses a custom built algorithm to rank words and sentences.
Project mention: What was the name of the "code soul" that summarized news? Is it still alive? | reddit.com/r/mexicow | 2022-02-21
https://github.com/PhantomInsights/summarizer and https://github.com/PhantomInsights/tweet-transcriber
-
Project mention: Negate: A Python package to negate sentences | reddit.com/r/LanguageTechnology | 2022-12-19
Cool project! How does this compare to, e.g., negspaCy?
-
concise-concepts
This repository contains an easy and intuitive approach to few-shot NER using most similar expansion over spaCy embeddings. Now with entity scoring.
Project mention: No training data, no problem! Few-shot NER with a practical example | reddit.com/r/learnmachinelearning | 2022-05-10
Concise-concepts, few-shot NER for spaCy: https://github.com/Pandora-Intelligence/concise-concepts
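The idea of "most similar expansion" can be illustrated without any library: start from a few seed words per concept and add the vocabulary words whose vectors are closest to a seed. The sketch below is not the concise-concepts API. The function `expand` is hypothetical, and `TOY_VECTORS` holds made-up three-dimensional vectors standing in for real spaCy embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Made-up vectors for illustration only; real embeddings have hundreds of dimensions.
TOY_VECTORS = {
    "apple": [0.9, 0.1, 0.0],
    "pear": [0.8, 0.2, 0.1],
    "banana": [0.85, 0.15, 0.05],
    "hammer": [0.0, 0.9, 0.3],
    "wrench": [0.1, 0.85, 0.35],
    "paris": [0.1, 0.2, 0.9],
}


def expand(seed_words, vectors, top_n=2):
    """Return the known seeds plus the top_n vocabulary words most similar to any seed."""
    seeds = [w for w in seed_words if w in vectors]
    if not seeds:
        return []
    scores = {
        word: max(cosine(vec, vectors[s]) for s in seeds)
        for word, vec in vectors.items()
        if word not in seeds
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return seeds + ranked[:top_n]


fruit_terms = expand(["apple"], TOY_VECTORS)          # ['apple', 'banana', 'pear']
tool_terms = expand(["hammer"], TOY_VECTORS, top_n=1)  # ['hammer', 'wrench']
```

The expanded term lists could then feed a simple matcher that tags those words as entities, which is why no labelled training data is needed in this style of approach.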
-
Project mention: Show HN: Zshot, Zero and Few shot named entity and relationships recognition | news.ycombinator.com | 2022-10-28
-
Project mention: Tools for comprehensive conjugation detection/transformation | reddit.com/r/LanguageTechnology | 2022-06-05
For inflections, LemmInflect
-
WordDumb
A calibre plugin that generates Kindle Word Wise and X-Ray files for KFX, AZW3, MOBI and EPUB eBook. Supports 23 languages.
github page here, mobileread page here
-
Python spacy related posts
- Lambda with Python libraries
- Build Spacy NER Loop for Dataframe
- Newbie question with Spacy Coreference Resolution
- spaCy just got an experimental feature to detect co-references
- SpanFinder is a new experimental spaCy component that identifies span boundaries
- spacy Can't find model 'en_core_web_sm' on windows 10 and Python 3.5.3 :: Anaconda custom (64-bit)
- [P] Programmatic: Powerful Weak Labeling
-
A note from our sponsor - Sonar
www.sonarsource.com | 1 Feb 2023
Index
What are some of the best open-source spacy projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | spaCy | 25,101 |
2 | rasa | 15,493 |
3 | thinc | 2,649 |
4 | textacy | 2,020 |
5 | pytextrank | 1,956 |
6 | argilla | 1,555 |
7 | Klayers | 1,473 |
8 | scispacy | 1,311 |
9 | Dragonfire | 1,310 |
10 | spacy-models | 1,230 |
11 | refinery | 1,095 |
12 | lambda-packs | 1,084 |
13 | projects | 1,012 |
14 | skweak | 858 |
15 | cltk | 760 |
16 | subreddit-analyzer | 482 |
17 | medaCy | 383 |
18 | summarizer | 251 |
19 | negspacy | 250 |
20 | concise-concepts | 200 |
21 | zshot | 199 |
22 | LemmInflect | 194 |
23 | WordDumb | 181 |