rumble vs zingg

rumble

⛈️ RumbleDB 1.21.0 "Hawthorn blossom" 🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more (by RumbleDB)

rumbledb.org

zingg

Scalable identity resolution, entity resolution, data mastering and deduplication using ML (by zinggAI)

Project         rumble                                      zingg
Mentions        1                                           23
Stars           207                                         875
Growth          1.0%                                        2.1%
Activity        8.5                                         9.3
Latest Commit   about 1 month ago                           about 14 hours ago
Language        Java                                        Java
License         GNU General Public License v3.0 or later    GNU Affero General Public License v3.0

The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

rumble

Posts with mentions or reviews of rumble. We have used some of these posts to build our list of alternatives and similar projects.

We haven't tracked posts mentioning rumble yet.
Tracking mentions began in Dec 2020.

zingg

Posts with mentions or reviews of zingg. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2022-11-18.

Ask HN: What is the most impactful thing you've ever built?
33 projects | news.ycombinator.com | 18 Nov 2022

As part of my data consulting, I struggled with identity resolution and started working on scalable no-code identity resolution - https://github.com/zinggAI/zingg/ . It has pushed my limits as a software engineer and product builder, and I had to do a lot of learning to build it. It's cool to see people use Zingg in their workflows and save months of working on custom solutions. A big highlight has been North Carolina Open Campaign Data https://crossroads-cx.medium.com/building-open-access-to-nc-...

Show HN: Zingg – open-source entity resolution for single source of truth
3 projects | news.ycombinator.com | 9 Feb 2022

Hello HN,
I am Sonal, a data consultant from India. For the past few months (and years!), I have been working on an entity resolution tool to build a single source of truth for customers, suppliers, products and parts. Here is a short demo of Zingg in action: https://www.youtube.com/watch?v=zOabyZxN9b0
As a data consultant, I often struggled to build unified views of core entities on the datalake and the warehouse. Data spread across different systems has variations and inconsistencies, making Customer 360, KYC, AML, segmentation, personalization and other analytics difficult.
As I talked with different clients facing this issue, I searched for existing solutions which I could use or recommend. Unfortunately, most of them were very expensive MDM solutions like Tamr, or CDP solutions like Amperity. There were many open source libraries, but they did not tie well into the datalake/warehouse scenarios we were working with, did not scale and/or needed a decent bit of programming or did not generalize. I even tried to build something internally and failed miserably, and that got me hooked :-)
As I dug deeper into the problem, I realized that there were multiple challenges. Data matching, at its very core, becomes a cartesian join, as you need to compare every pair of records to figure out the matches. With millions of records, this becomes extremely tough to scale. I referred to various research papers and then implemented a blocking algorithm to overcome this. More details at https://docs.zingg.ai/docs/zModels.html
The second challenge was to say which pairs are a match. I wanted to have a machine learning-based approach to handle the different types of entities and the variety of differences in real-world data. But I also felt that non-ML experts should be able to use Zingg easily, hence took the approach of abstracting the feature generation and hyper-parameter tuning for the classifier.
Once I settled on the ML approach, the problem of training data quickly arose, which led me to pick up active learning and build an interactive labeler through which sample records can be marked as matches and non-matches to build training sets quickly. I still feel that we should have an unsupervised approach as well, but I have not yet figured out the right way to do so.
The Zingg repository is hosted at https://github.com/zinggAI/zingg and we have close to 60 members on our Slack (https://join.slack.com/t/zinggai/shared_invite/zt-w7zlcnol-vEuqU9m~Q56kLLUVxRgpOA). We are now two developers working full time on Zingg!!! I am super happy that early users have been able to use Zingg and push us to build more stuff - model documentation, using pre-existing training data, native Snowflake integration, etc.
I have been an open source consumer all my dev life, and this is the first time I have made a decent contribution. It is my first time trying to build a community as well. Not sure how the future will unfold, but wanted to reach out to the community here and hear what you think about the problem, the approach, any ideas or suggestions.
Thanks for reading along, and please do post your thoughts in the comments below.
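
The blocking idea described in the post above can be illustrated with a minimal sketch. This is not Zingg's implementation: the hand-written `blocking_key` function, the record fields (`last_name`, `zip`) and the sample records are all hypothetical, chosen only to show how grouping records by a cheap key avoids comparing every pair. Zingg's own blocking approach is described in the documentation linked in the post.

```python
from collections import defaultdict
from itertools import combinations


def blocking_key(record):
    # Hypothetical key: first letter of the surname plus a 3-character zip prefix.
    # Records that are true matches should usually share this key.
    return (record["last_name"][:1].lower(), record["zip"][:3])


def candidate_pairs(records):
    # Group record indices by blocking key.
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[blocking_key(rec)].append(idx)

    # Only records inside the same block are paired for detailed comparison.
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(ids, 2))
    return pairs


# Made-up sample data for illustration only.
records = [
    {"first_name": "Asha", "last_name": "Gupta", "zip": "110001"},
    {"first_name": "A.", "last_name": "Gupta", "zip": "110002"},
    {"first_name": "John", "last_name": "Smith", "zip": "27601"},
    {"first_name": "Jon", "last_name": "Smith", "zip": "27601"},
    {"first_name": "Jane", "last_name": "Doe", "zip": "94105"},
]

all_pairs = len(records) * (len(records) - 1) // 2
blocked = candidate_pairs(records)
print(f"all pairs: {all_pairs}, pairs after blocking: {len(blocked)}")
```

The trade-off is recall: a key that is too strict puts true matches into different blocks, so they are never compared, while a key that is too loose leaves blocks so large that the quadratic cost returns.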

3 projects | news.ycombinator.com | 9 Feb 2022

Thanks for your support. Yes, we do ship with some examples and their models, which can be run out of the box. We have 3 customer demographic datasets and an e-commerce dataset for matching items across Google and Amazon. You can check them here: https://github.com/zinggAI/zingg/tree/main/examples

How do I promote the project appropriately?
3 projects | /r/opensource | 30 Dec 2021

Have you tried posting on Hacker News and subreddits? I am also working on an open source entity resolution tool at https://github.com/zinggAI/zingg and I saw a good response from the data engineering, data science and ML subreddits as well as Hacker News.

GitHub Java Projects to Contribute
2 projects | /r/opensource | 17 Nov 2021

Check Zingg out at https://github.com/zinggAI/zingg and let me know if you would like to contribute.

Match over 1 GB of data with inconsistent names
3 projects | /r/dataengineering | 9 Nov 2021

I am working on an open source tool that uses ML for fuzzy matching - https://github.com/zinggAI/zingg . Hope you find it useful. Happy to help.

3 projects | /r/dataengineering | 9 Nov 2021

This is interesting, would love to get your feedback on Zingg (https://github.com/zinggAI/zingg) if you are up to it. Thanks!

Introducing Zingg: Open Source Entity Resolution and Deduplication Using ML and Spark
2 projects | /r/datascience | 5 Oct 2021

- Zingg scales very well to large volumes of data (https://github.com/zinggAI/zingg/blob/main/docs/hardwareSizing.md).

What are some alternatives?

When comparing rumble and zingg you can also consider the following projects:

splink - Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends

clrs

AA-Tweaker - Tool to apply patches to Google Play Services that will enable some extra functionality on Android Auto

s3proxy - Access other storage backends via the S3 API

CLRS - Algorithms implementation in C++ and solutions of questions (both code and math proof) from “Introduction to Algorithms” (3e) (CLRS) in LaTeX.

skipledger - Differential privacy solution for maintaining and exposing information from evolving, append-only journals / ledgers.

automount - Simple devd(8) based automounter for FreeBSD

lsblk - List information about block devices in the FreeBSD system.

nested-data-reporting-plugin - Jenkins plugin to report data from nested as pie-charts, trend-charts and data tables.

yt-channels-DS-AI-ML-CS - A comprehensive list of 180+ YouTube Channels for Data Science, Data Engineering, Machine Learning, Deep learning, Computer Science, programming, software engineering, etc.

unidecode - ASCII transliterations of Unicode text - GitHub mirror

vscode-data-preview - Data Preview 🈸 extension for importing 📤 viewing 🔎 slicing 🔪 dicing 🎲 charting 📊 & exporting 📥 large JSON array/config, YAML, Apache Arrow, Avro, Parquet & Excel data files
