pytorch-partial-crf vs usaddress
Metric | pytorch-partial-crf | usaddress
---|---|---
Mentions | 1 | 5
Stars | 30 | 1,488
Growth | - | 0.3%
Activity | 10.0 | 0.0
Last commit | over 1 year ago | 4 months ago
Language | Python | Python
License | MIT License | MIT License
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
pytorch-partial-crf
- Entity Extraction with Predefined List
  Feed it to a bi-LSTM with a fuzzy-CRF on top.
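The "fuzzy-CRF" in the excerpt above is usually understood as a partial CRF: when only some tokens are labeled (for example, by matching a predefined list), the training objective sums over every tag path that agrees with the known labels. The sketch below illustrates that objective in plain Python under that assumption. It is not the pytorch-partial-crf API; the function names and the `UNKNOWN` marker are hypothetical, and in a real model the emission scores would come from the bi-LSTM.

```python
import math

UNKNOWN = -1  # hypothetical marker for a token with no label


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_partition(emissions, transitions, allowed):
    """Forward algorithm restricted to the tags in allowed[t] at each step."""
    num_tags = len(transitions)
    neg_inf = float("-inf")
    alpha = [emissions[0][j] if j in allowed[0] else neg_inf
             for j in range(num_tags)]
    for t in range(1, len(emissions)):
        alpha = [
            logsumexp([alpha[i] + transitions[i][j] for i in range(num_tags)])
            + emissions[t][j]
            if j in allowed[t] else neg_inf
            for j in range(num_tags)
        ]
    return logsumexp(alpha)


def partial_crf_log_likelihood(emissions, transitions, tags):
    """Log-probability of all tag paths consistent with the partial labels."""
    every_tag = set(range(len(transitions)))
    # Unlabeled positions may take any tag; labeled ones are pinned.
    constrained = [every_tag if y == UNKNOWN else {y} for y in tags]
    unconstrained = [every_tag] * len(tags)
    return (log_partition(emissions, transitions, constrained)
            - log_partition(emissions, transitions, unconstrained))


# Three tokens, two tags; the middle token is unlabeled.
emissions = [[1.0, 0.2], [0.3, 0.8], [0.5, 0.5]]
transitions = [[0.1, -0.2], [0.4, 0.0]]
print(partial_crf_log_likelihood(emissions, transitions, [0, UNKNOWN, 0]))
```

With every token labeled this reduces to the ordinary linear-chain CRF log-likelihood, and with every token unlabeled it is zero, since all paths are then consistent with the labels.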
usaddress
- Which of your favorite Python 3.11 packages lack Python 3.11 support?
  Usaddress https://github.com/datamade/usaddress
- Script to split addresses in Google Sheets?
  Assuming you’re working with addresses in the US, here’s a Python package that should help: https://github.com/datamade/usaddress
- PyWhat: Identify Anything
  Some great probabilistic python libraries:
  https://github.com/datamade/usaddress - "usaddress is a Python library for parsing unstructured address strings into address components, using advanced NLP methods."
  https://github.com/datamade/probablepeople - "probablepeople is a python library for parsing unstructured romanized name or company strings into components, using advanced NLP methods."
- Turning unstructured address data into a structure Salesforce Address Field
- Fuzzy Name Matching in Postgres
  For address parsing, I've had good luck with this package: https://github.com/datamade/usaddress
What are some alternatives?
pytorch-crf - (Linear-chain) Conditional random field in PyTorch.
libpostal - A C library for parsing/normalizing street addresses around the world. Powered by statistical NLP and open geo data.
skweak - skweak: A software toolkit for weak supervision applied to NLP tasks
pyWhat - 🐸 Identify anything. pyWhat easily lets you identify emails, IP addresses, and more. Feed it a .pcap file or some text and it'll tell you what it is! 🧙‍♀️
NCRFpp - NCRF++, a Neural Sequence Labeling Toolkit. Easy to use for any sequence labeling task (e.g. NER, POS, Segmentation). It includes character LSTM/CNN, word LSTM/CNN and softmax/CRF components.
probablepeople - A Python library for parsing unstructured western names into name components.
python-crfsuite - A python binding for crfsuite
DataProfiler - What's in your data? Extract schema, statistics and entities from datasets
pytorch-tutorial - PyTorch Tutorial for Deep Learning Researchers
ctparse - Parse natural language time expressions in python
SymSpell - SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm
FuckIt.py - The Python error steamroller.