Entity Resolution with Magniv

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com.

  • record-linkage-resources

    Resources for tackling record linkage / deduplication / data matching problems

    Entity resolution/record linkage/deduplication is an oddly specialized domain of knowledge given that it's such a common problem. I put together a page of resources a while back if anyone is interested: https://github.com/ropeladder/record-linkage-resources

  • dedupe

    A Python library for accurate and scalable fuzzy matching, record deduplication and entity resolution.

  • faiss

    A library for efficient similarity search and clustering of dense vectors.

    You just can't: similarity metrics (especially cosine) on 768-dimensional arrays are prohibitively slow.

    Any reason you couldn't just dump it in FAISS or Annoy? No need to do pairwise comparisons. https://github.com/facebookresearch/faiss/issues/95
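
A minimal sketch of the suggestion above, assuming the records are already encoded as 768-dimensional embedding vectors. The helper names `normalize` and `top_k_cosine` are hypothetical, not taken from the thread. Cosine similarity on L2-normalized vectors equals the inner product, so a FAISS inner-product index can return each record's nearest neighbours in one batched search instead of a Python loop over pairs. `faiss.IndexFlatIP` is still an exact search; FAISS's approximate index types would be the next step for very large collections. The sketch falls back to a NumPy matrix product when FAISS is not installed.

```python
import numpy as np


def normalize(vectors):
    """L2-normalize each row so that inner product equals cosine similarity."""
    x = np.ascontiguousarray(vectors, dtype=np.float32)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    # Guard against division by zero for all-zero rows.
    return np.ascontiguousarray(x / np.clip(norms, 1e-12, None), dtype=np.float32)


def top_k_cosine(embeddings, k=5):
    """Return (scores, ids): the k most cosine-similar rows for every row.

    Each row's own index normally appears first, with a score close to 1.
    """
    x = normalize(embeddings)
    try:
        import faiss  # optional dependency (assumed installed as faiss-cpu or faiss-gpu)

        index = faiss.IndexFlatIP(x.shape[1])  # exact inner-product index
        index.add(x)
        scores, ids = index.search(x, k)
    except ImportError:
        # Fallback: one dense matrix product instead of a per-pair loop.
        sims = x @ x.T
        ids = np.argsort(-sims, axis=1)[:, :k]
        scores = np.take_along_axis(sims, ids, axis=1)
    return scores, ids
```

Candidate duplicate pairs can then be taken from the returned neighbours whose score exceeds a chosen threshold, which avoids scoring every pair explicitly in Python.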

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.
