Fuzzy Name Matching in Postgres

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com.

  1. usaddress

    A Python library for parsing unstructured United States address strings into address components

    For address parsing, I've had good luck with this package: https://github.com/datamade/usaddress

  2. dmetaphone

    Double Metaphone for PostgreSQL full text search

    You can use Double Metaphone with Postgres's text search facility directly: https://github.com/jkominek/dmetaphone

  3. SymSpell

    SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm

    I'm glad to see these built into Postgres, as these are the basics of fuzzy string matching.

    A quantum leap would be to integrate an implementation of the symmetric delete algorithm, such as https://github.com/wolfgarbe/SymSpell

    Soundex and Phonex can yield too many false negatives for names that are not phonetically English. Levenshtein and Jaro-Winkler are not indexable on their own, so matching N names against each other requires on the order of N^2 comparisons. SymSpell conceptually combines the two approaches into an indexed string-distance solution. It has the usual index trade-off of being designed for many reads and few writes.
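
The symmetric delete idea from the comment above can be sketched in a few lines of Python. This is a minimal illustration, not SymSpell's actual implementation or API: the names `deletes`, `levenshtein`, and `DeleteIndex` are hypothetical, and the real library adds many optimizations. The sketch precomputes every delete-only variant of each stored name, looks up the delete-only variants of the query in that index, and then verifies the surviving candidates with a plain Levenshtein distance.

```python
from collections import defaultdict
from itertools import combinations


def deletes(word, max_distance):
    """Return the word plus every string made by deleting up to max_distance characters."""
    variants = {word}
    for k in range(1, min(max_distance, len(word)) + 1):
        for keep in combinations(range(len(word)), len(word) - k):
            variants.add("".join(word[i] for i in keep))
    return variants


def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or free match
            ))
        previous = current
    return previous[-1]


class DeleteIndex:
    """Hypothetical in-memory index: delete variant -> names that produce it."""

    def __init__(self, names, max_distance=2):
        self.max_distance = max_distance
        self.index = defaultdict(set)
        # Write path is expensive: every name is expanded into all its variants.
        for name in names:
            for variant in deletes(name.lower(), max_distance):
                self.index[variant].add(name)

    def lookup(self, query):
        """Return (distance, name) pairs within max_distance, closest first."""
        query = query.lower()
        candidates = set()
        # Read path is cheap: only hash lookups, no scan over all names.
        for variant in deletes(query, self.max_distance):
            candidates |= self.index.get(variant, set())
        scored = ((levenshtein(query, c.lower()), c) for c in candidates)
        return sorted((d, c) for d, c in scored if d <= self.max_distance)


if __name__ == "__main__":
    index = DeleteIndex(["Katherine", "Catherine", "Kathryn", "Jonathan"])
    print(index.lookup("Katharine"))
```

The many-reads, few-writes trade-off shows up directly here: inserting a name means generating and storing all of its delete variants, while a lookup only touches the variants of the query and a handful of candidates.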

