entity-embed vs dblink
| | entity-embed | dblink |
|---|---|---|
| Mentions | 2 | 1 |
| Stars | 138 | 54 |
| Growth | 0.0% | - |
| Activity | 0.0 | 0.0 |
| Last commit | over 1 year ago | almost 3 years ago |
| Language | Jupyter Notebook | Scala |
| License | MIT License | GNU General Public License v3.0 or later |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
entity-embed
dblink
- [D] Machine Learning and "Record Linkage"
Fellegi-Sunter is the baseline model in record linkage research. It is implemented in R in fastLink and RecordLinkage, but you will need training data. There are some other options, e.g. dblink, that use Bayesian methods and a latent variable setup so you don’t need training data.
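As a rough illustration of the Fellegi-Sunter idea mentioned above, the sketch below scores a record pair by summing per-field log-likelihood ratios. The field names and the m/u probabilities are hypothetical values chosen for the example. They are not taken from fastLink, RecordLinkage, or dblink, and this is not how any of those packages is implemented.

```python
import math

# Hypothetical parameters, for illustration only.
# m = P(field agrees | records are a true match)
# u = P(field agrees | records are not a match)
FIELD_PARAMS = {
    "surname": (0.95, 0.01),
    "birth_year": (0.90, 0.05),
    "city": (0.80, 0.20),
}


def match_weight(agreements):
    """Sum the log2 likelihood ratios over the compared fields of one record pair.

    `agreements` maps a field name to True (values agree) or False (they disagree).
    """
    total = 0.0
    for field, agrees in agreements.items():
        m, u = FIELD_PARAMS[field]
        if agrees:
            total += math.log2(m / u)              # agreement is evidence for a match
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement is evidence against
    return total


# Two example comparison vectors (assumed data).
pair_a = {"surname": True, "birth_year": True, "city": False}
pair_b = {"surname": False, "birth_year": False, "city": True}

weight_a = match_weight(pair_a)
weight_b = match_weight(pair_b)
print(f"pair_a weight: {weight_a:.2f}")
print(f"pair_b weight: {weight_b:.2f}")
```

Pairs with a high total weight are treated as likely matches and pairs with a low weight as likely non-matches. In practice the m and u values have to be estimated, either from labeled pairs or with an unsupervised procedure.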
What are some alternatives?
splink - Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends
fastai - The fastai deep learning library
mmlspark - Simple and Distributed Machine Learning [Moved to: https://github.com/microsoft/SynapseML]
TensorFlow-Examples - TensorFlow Tutorial and Examples for Beginners (support TF v1 & v2)
sparkMeasure - This is the development repository for sparkMeasure, a tool and library designed for efficient analysis and troubleshooting of Apache Spark jobs. It focuses on easing the collection and examination of Spark metrics, making it a practical choice for both developers and data engineers.
Made-With-ML - Learn how to design, develop, deploy and iterate on production-grade ML applications.
delight - A Spark UI and Spark History Server alternative with CPU and Memory metrics! Delight is free, cross-platform, and open-source.
JedAIToolkit - An open source, high scalability toolkit in Java for Entity Resolution.
record-linkage-resources - Resources for tackling record linkage / deduplication / data matching problems