Seeking Recommendations for a Master Data Management Tool

This page summarizes the projects mentioned and recommended in the original post on /r/dataengineering.

  • fuzzy-item-matching

    Use machine learning and the Databricks Lakehouse Platform for product matching that can be used by marketplaces and suppliers for various purposes. Resolve differences between product definitions and descriptions and determine which items are likely pairs and which are distinct across disparate data sets.

  • The fuzzy matching isn't tool-specific. It can reuse existing quality checks built in business low-code tools (RapidMiner/Alteryx), e.g. custom text-similarity matches on things like vendor/customer names that flag typos back to the ERP. This looks similar to the concept (https://github.com/databricks-industry-solutions/fuzzy-item-matching)

  • delta-rs

    A native Rust library for Delta Lake, with bindings into Python

  • Maybe if I get some free time soon I can formalize this into a working example. I've been wanting an excuse to try a similar concept in delta-rs and polars/duckdb vs. databricks/spark vs. iceberg/polars.
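
The text-similarity check on vendor/customer names mentioned in the comments can be sketched in a few lines. This is only an illustration, not what RapidMiner, Alteryx, or the Databricks solution actually do: it swaps in Python's standard-library `difflib.SequenceMatcher` as the similarity measure, and the function names (`normalize`, `similarity`, `flag_likely_typos`) and the 0.85 threshold are assumptions made up for this example.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(name.lower().split())


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def flag_likely_typos(candidates, master, threshold=0.85):
    """Return (candidate, best_master_match, score) for names that are
    close to a master record but not identical to it.

    The threshold is a hypothetical starting point and would need tuning
    against real vendor/customer data.
    """
    flagged = []
    for cand in candidates:
        # Pick the master record most similar to this candidate.
        best = max(master, key=lambda m: similarity(cand, m))
        score = similarity(cand, best)
        # Exact matches (score 1.0) are fine; distant names are new records.
        if threshold <= score < 1.0:
            flagged.append((cand, best, round(score, 3)))
    return flagged


if __name__ == "__main__":
    master_vendors = ["Acme Corporation", "Globex Inc", "Initech LLC"]
    incoming = ["Acme Corporaton", "Globex Inc", "Umbrella Co"]
    for cand, match, score in flag_likely_typos(incoming, master_vendors):
        print(f"possible typo: {cand!r} looks like {match!r} (score {score})")
```

Names flagged this way could then be routed back to whoever maintains the ERP record. The pairwise comparison is quadratic in the number of names, so a larger data set would likely need blocking or an engine such as the ones named above.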

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.
