-
pyWhat
🐸 Identify anything. pyWhat easily lets you identify emails, IP addresses, and more. Feed it a .pcap file or some text and it'll tell you what it is! 🧙‍♀️
-
usaddress
🇺🇸 A Python library for parsing unstructured United States address strings into address components.
-
probablepeople
👪 A Python library for parsing unstructured western names into name components.
In the same vague theme of "I don't know what I'm dealing with": https://github.com/ajalt/fuckitpy
Another one sort of related is hachoir, and specifically the hachoir-metadata script: https://github.com/vstinner/hachoir
Some great probabilistic python libraries:
https://github.com/datamade/usaddress - "usaddress is a Python library for parsing unstructured address strings into address components, using advanced NLP methods."
https://github.com/datamade/probablepeople - "probablepeople is a python library for parsing unstructured romanized name or company strings into components, using advanced NLP methods."
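For anyone curious what using one of these looks like, here is a minimal sketch with usaddress. It assumes the package is installed (pip install usaddress) and falls back to returning None if it is not. The helper name label_address and the sample address are just illustrations, not part of the library.

```python
# Minimal sketch: label the tokens of a free-form US address with usaddress.
# Assumes `pip install usaddress`; label_address is a hypothetical helper name.
try:
    import usaddress
except ImportError:  # library not installed: the sketch degrades to a no-op
    usaddress = None


def label_address(text):
    """Return a list of (token, label) pairs, or None if usaddress is missing."""
    if usaddress is None:
        return None
    # usaddress.parse splits the string into tokens and assigns each one a
    # label predicted by its probabilistic model.
    return usaddress.parse(text)


if __name__ == "__main__":
    pairs = label_address("123 Main St. Suite 100 Chicago, IL")
    if pairs is None:
        print("usaddress is not installed")
    else:
        for token, label in pairs:
            print(f"{token!r:15} {label}")
```

Because the labels are predictions, they can be wrong on unusual input, so treat them as a best guess rather than a validated parse. probablepeople exposes a similar parse function for name and company strings.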
We built a similar tool that uses a CNN. It works on structured (and unstructured) data and provides additional info.
https://github.com/capitalone/DataProfiler
The cool part is that you can "extend" the internal named-entity recognition model by refitting it with new data.
Out of the box, the DataProfiler recognizes something like 18 entities, including most PII data.
Didn't know there was a python version, but as the README says, this is based on the classic fuckitjs: https://github.com/mattdiamond/fuckitjs