usaddress vs libpostal
| | usaddress | libpostal |
|---|---|---|
| Mentions | 5 | 5 |
| Stars | 1,547 | 4,140 |
| Growth | 0.3% | 0.7% |
| Activity | 4.7 | 6.0 |
| Latest commit | 7 days ago | 6 days ago |
| Language | Python | C |
| License | MIT License | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
usaddress
-
Which of your favorite Python 3.11 packages lack Python 3.11 support?
Usaddress https://github.com/datamade/usaddress
-
Script to split addresses in Google Sheets?
Assuming you’re working with addresses in the US, here’s a Python package that should help: https://github.com/datamade/usaddress
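As a rough illustration of how that package could split addresses into spreadsheet columns, here is a minimal sketch. It assumes `usaddress` is installed (`pip install usaddress`) and relies on its `tag()` function, which returns an ordered mapping of component labels plus an address type. The helper name `split_address`, the `tag` override parameter, and the choice of columns are hypothetical, picked only for this example.

```python
# Column order for the output row. These are label names that usaddress emits;
# the selection and ordering here are an assumption made for this sketch.
COLUMNS = [
    "AddressNumber",
    "StreetName",
    "StreetNamePostType",
    "PlaceName",
    "StateName",
    "ZipCode",
]


def split_address(address, tag=None):
    """Return one spreadsheet row (a list of strings) for a US address.

    `tag` is any callable with the same return shape as usaddress.tag:
    (mapping of label -> text, address type). It defaults to usaddress.tag.
    """
    if tag is None:
        import usaddress  # third-party dependency, imported lazily

        tag = usaddress.tag
    components, _address_type = tag(address)
    # Labels missing from this particular address become empty cells.
    return [components.get(column, "") for column in COLUMNS]
```

Rows produced this way can be written out with the standard `csv` module and imported into a sheet. Note that `usaddress.tag` raises `usaddress.RepeatedLabelError` when it cannot merge the labels of an ambiguous address, so real input would need a fallback for that case.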
-
PyWhat: Identify Anything
Some great probabilistic python libraries:
https://github.com/datamade/usaddress - "usaddress is a Python library for parsing unstructured address strings into address components, using advanced NLP methods."
https://github.com/datamade/probablepeople - "probablepeople is a python library for parsing unstructured romanized name or company strings into components, using advanced NLP methods."
-
Turning unstructured address data into a structured Salesforce Address Field
-
Fuzzy Name Matching in Postgres
For address parsing, I've had good luck with this package: https://github.com/datamade/usaddress
libpostal
-
Install Python Libraries Using Command Prompt
    @echo off

    REM Check if MSYS2 and MinGW are installed
    where msys2 2>nul >nul
    if %errorlevel% equ 0 (
        echo MSYS2 is already installed. Use --force to reinstall.
    ) else (
        REM Install MSYS2 and MinGW
        choco install msys2
        refreshenv
    )

    REM Check if MSYS2 packages are updated
    pacman -Qu 2>nul >nul
    if %errorlevel% equ 0 (
        echo MSYS2 packages are already updated. Use --force to reinstall.
    ) else (
        REM Update MSYS2 packages
        pacman -Syu
    )

    REM Check if build dependencies are installed
    pacman -Q autoconf automake curl git make libtool gcc mingw-w64-x86_64-gcc 2>nul >nul
    if %errorlevel% equ 0 (
        echo Build dependencies are already installed. Use --force to reinstall.
    ) else (
        REM Install build dependencies
        pacman -S autoconf automake curl git make libtool gcc mingw-w64-x86_64-gcc
    )

    REM Check if libpostal is cloned
    if exist libpostal (
        echo libpostal repository is already cloned. Use --force to reinstall.
    ) else (
        REM Clone libpostal repository
        git clone https://github.com/openvenues/libpostal
    )
    cd libpostal

    REM Check if libpostal is built and installed (path quoted because it contains a space)
    if exist "C:/Program Files/libpostal/bin/libpostal.dll" (
        echo libpostal is already built and installed. Use --force to reinstall.
    ) else (
        REM Build and install libpostal
        cp -rf windows/* ./
        ./bootstrap.sh
        ./configure --datadir=C:/libpostal
        make -j4
        make install
    )

    REM Check if libpostal is added to PATH environment variable
    setx /m PATH "%PATH%;C:\Program Files\libpostal\bin" 2>nul >nul
    if %errorlevel% equ 0 (
        echo libpostal is already added to PATH environment variable. Use --force to reinstall.
    ) else (
        REM Add libpostal to PATH environment variable
        setx PATH "%PATH%;C:\Program Files\libpostal\bin"
    )

    REM Test libpostal installation
    libpostal "100 S Broad St, Philadelphia, PA"
    pause
-
Transforming free-form geospatial directions into addresses - SOTA?
I know of https://github.com/openvenues/libpostal which handles typos and omissions in addresses, but I am looking into a more fuzzy description of a location.
-
[P] Better ways to clean lots of text?
use an address parser library like libpostal https://github.com/openvenues/libpostal
-
complete stack for an analysis team
Also, what OS(s) does IT support for clients and servers? I think Libpostal doesn't officially support Windows, but you can build it to target that. Seems difficult and/or unreliable though: https://github.com/openvenues/libpostal/issues/219
-
Automating a Web Scraper
You can feed libpostal a sequence of candidate strings until it gives good results. Expect a lot of misses and some hits; score the hits. https://github.com/openvenues/libpostal
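The "feed it candidates and score the hits" idea could look roughly like the sketch below. It assumes the `pypostal` bindings are installed, whose `postal.parser.parse_address` returns a list of `(value, label)` pairs. The function names, the window sizes, the set of wanted labels, and the threshold are all hypothetical choices for illustration. libpostal assigns a label to every token it is given, so counting distinct labels is only a crude heuristic, not a confidence score from the library.

```python
# libpostal label names that, taken together, suggest a plausible street address.
# Which labels to require is an assumption of this sketch.
WANTED = {"house_number", "road", "city", "state", "postcode"}


def candidate_windows(tokens, min_len=3, max_len=12):
    """Yield every contiguous run of min_len..max_len tokens as one string."""
    for start in range(len(tokens)):
        last_end = min(start + max_len, len(tokens))
        for end in range(start + min_len, last_end + 1):
            yield " ".join(tokens[start:end])


def score(parsed):
    """Count how many distinct wanted labels appear in a parse result."""
    return len({label for _value, label in parsed} & WANTED)


def best_address(text, parse=None, threshold=4):
    """Return (candidate, score) for the best-scoring window, or (None, 0).

    `parse` is any callable returning (value, label) pairs; it defaults to
    pypostal's parse_address. Ties on score go to the shortest candidate.
    """
    if parse is None:
        from postal.parser import parse_address  # third-party, imported lazily

        parse = parse_address
    best, best_key = None, None
    for candidate in candidate_windows(text.split()):
        points = score(parse(candidate))
        if points < threshold:
            continue  # a miss: too few address-like components
        key = (points, -len(candidate))
        if best_key is None or key > best_key:
            best, best_key = candidate, key
    return best, (best_key[0] if best_key else 0)
```

Trying every window is quadratic in the number of tokens, so for scraped pages it would make sense to run this per paragraph or per line rather than over a whole document.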
What are some alternatives?
DataProfiler - What's in your data? Extract schema, statistics and entities from datasets
neuralcoref - ✨Fast Coreference Resolution in spaCy with Neural Networks
pyWhat - 🐸 Identify anything. pyWhat easily lets you identify emails, IP addresses, and more. Feed it a .pcap file or some text and it'll tell you what it is! 🧙‍♀️
splink - Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends
addok - Search engine for address. Only address.
kvdo - A kernel module which provides a pool of deduplicated and/or compressed block storage.
ctparse - Parse natural language time expressions in python
rmlint - Extremely fast tool to remove duplicates and other lint from your filesystem
probablepeople - :family: a python library for parsing unstructured western names into name components.
link-grammar - The CMU Link Grammar natural language parser
SymSpell - SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm
spaCy - 💫 Industrial-strength Natural Language Processing (NLP) in Python