-
record-linkage-resources
Resources for tackling record linkage / deduplication / data matching problems
Entity resolution/record linkage/deduplication is an oddly specialized domain of knowledge given that it's such a common problem. I put together a page of resources a while back if anyone is interested: https://github.com/ropeladder/record-linkage-resources
-
dedupe
A Python library for accurate and scalable fuzzy matching, record deduplication and entity resolution.
-
You just can't: computing similarity metrics (especially cosine) pairwise on 768-dimensional arrays is prohibitively slow.
Any reason you couldn't just dump it in FAISS or Annoy? No need to do pairwise comparisons. https://github.com/facebookresearch/faiss/issues/95
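To make the "no pairwise comparisons" point concrete, here is a minimal sketch in plain NumPy rather than FAISS or Annoy. It assumes the embeddings fit in memory as a float32 matrix; the function name `top_k_cosine` and the random data are made up for illustration. Once the vectors are L2-normalized, cosine similarity is just an inner product, so a whole batch of queries is scored with one matrix multiply. An inner-product index such as FAISS's `IndexFlatIP` performs the same search at larger scale, and the approximate indexes avoid scanning every vector at all.

```python
import numpy as np

def top_k_cosine(corpus, queries, k=5):
    """Return (indices, scores) of the k most cosine-similar corpus rows per query."""
    # L2-normalize rows so that a dot product equals cosine similarity.
    corpus_n = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    queries_n = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    # One matrix multiply scores every query against every corpus row.
    scores = queries_n @ corpus_n.T  # shape: (n_queries, n_corpus)
    # argpartition picks the k best per row without sorting the whole row.
    top = np.argpartition(-scores, k - 1, axis=1)[:, :k]
    top_scores = np.take_along_axis(scores, top, axis=1)
    # Sort only those k candidates, best first.
    order = np.argsort(-top_scores, axis=1)
    idx = np.take_along_axis(top, order, axis=1)
    return idx, np.take_along_axis(top_scores, order, axis=1)

# Hypothetical data: 1,000 random 768-dim "embeddings" and 3 noisy copies as queries.
rng = np.random.default_rng(0)
corpus = rng.standard_normal((1000, 768)).astype(np.float32)
queries = corpus[:3] + 0.01 * rng.standard_normal((3, 768)).astype(np.float32)
idx, scores = top_k_cosine(corpus, queries, k=5)
```

Each query's best match should be the corpus row it was derived from. This version is still exact and scans the full corpus per query; it only removes the Python-level loop over pairs.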