PyWhat: Identify Anything

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com.

  • pyWhat

    🐸 Identify anything. pyWhat easily lets you identify emails, IP addresses, and more. Feed it a .pcap file or some text and it'll tell you what it is! 🧙‍♀️

  • FuckIt.py

    The Python error steamroller.

  • In the same vague theme of "I don't know what I'm dealing with": https://github.com/ajalt/fuckitpy

  • hachoir

    Hachoir is a Python library to view and edit a binary stream field by field

  • Another one sort of related is hachoir, and specifically the hachoir-metadata script: https://github.com/vstinner/hachoir

  • usaddress

    A Python library for parsing unstructured United States address strings into address components

  • Some great probabilistic python libraries:

    https://github.com/datamade/usaddress - "usaddress is a Python library for parsing unstructured address strings into address components, using advanced NLP methods."

    https://github.com/datamade/probablepeople - "probablepeople is a python library for parsing unstructured romanized name or company strings into components, using advanced NLP methods."

  • probablepeople

    A Python library for parsing unstructured western names into name components.

  • DataProfiler

    What's in your data? Extract schema, statistics and entities from datasets

  • We built a similar tool, utilizing a CNN. It works on structured (and unstructured) data and provides additional info.

    https://github.com/capitalone/DataProfiler

    The cool part is that you can "extend" the internal named-entity recognition model by refitting it with new data.

    Out of the box, the DataProfiler recognizes something like 18 entities, including most of the PII data.

  • chardet

    Python character encoding detector

  • fuckitjs

    The Original Javascript Error Steamroller

  • Didn't know there was a python version, but as the README says, this is based on the classic fuckitjs: https://github.com/mattdiamond/fuckitjs
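
The kind of identification described in the pyWhat entry above (emails, IP addresses, and more, pulled out of arbitrary text) can be roughly sketched with a few regular expressions from the Python standard library. This is only an illustration of the idea, not pyWhat's implementation or API: the `identify` function and the three patterns are hypothetical and far simpler than what a real tool would ship.

```python
import ipaddress
import re

# Hypothetical, deliberately simple patterns for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
URL_RE = re.compile(r"https?://[^\s\"'<>]+")
# Four dot-separated groups of 1-3 digits; range checking happens below.
IPV4_CANDIDATE_RE = re.compile(r"(?<![\d.])\d{1,3}(?:\.\d{1,3}){3}(?![\d.])")


def identify(text):
    """Return a sorted list of (label, match) pairs found in text."""
    found = set()
    for match in EMAIL_RE.findall(text):
        found.add(("email", match))
    for match in URL_RE.findall(text):
        found.add(("url", match))
    for candidate in IPV4_CANDIDATE_RE.findall(text):
        try:
            # Let the standard library reject out-of-range octets.
            ipaddress.IPv4Address(candidate)
        except ValueError:
            continue
        found.add(("ipv4", candidate))
    return sorted(found)


if __name__ == "__main__":
    sample = "mail admin@example.com from 192.168.0.1 via https://example.org/docs"
    for label, value in identify(sample):
        print(label, value)
```

Validating the IPv4 candidates with `ipaddress` rather than a longer regex keeps the pattern readable while still rejecting strings such as 999.1.1.1.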

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.
