mmlspark VS metarank

Compare mmlspark vs metarank and see how they differ.

mmlspark

Simple and Distributed Machine Learning [Moved to: https://github.com/microsoft/SynapseML] (by Azure)

metarank

A low code Machine Learning personalized ranking service for articles, listings, search results, recommendations that boosts user engagement. A friendly Learn-to-Rank engine (by metarank)
                mmlspark            metarank
Mentions        2                   13
Stars           2,489               1,977
Growth          -                   1.2%
Activity        9.3                 9.1
Last commit     over 2 years ago    about 11 hours ago
Language        Scala               Scala
License         MIT License         Apache License 2.0
The number of mentions is the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

mmlspark

Posts with mentions or reviews of mmlspark. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2022-03-08.

metarank

Posts with mentions or reviews of metarank. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2022-06-22.
  • My Favorite Off-the-Shelf Data Science Repos, What Are Yours?
    3 projects | news.ycombinator.com | 22 Jun 2022
    Here are my top off-the-shelf data science models for marketing. I would be interested to hear which other marketing data science tools you use.

    Product Recommendation on Your Website with Metarank (https://github.com/metarank/metarank)

    Metarank is a tool that helps you easily build an advanced recommendation engine for your products or content on your website. To get started you only need historical performance data of your products (e.g. number of clicks) and additional metadata like product rating, genre, ingredients or price. In a YAML file, you define the features and the model parameters (e.g. number of iterations, modeling technique). The API service integrates with Apache Flink and can be easily integrated into Kubernetes clusters.

    User Journey Analysis on your Website with Retentioneering (https://github.com/retentioneering/retentioneering-tools)

    Retentioneering helps you understand the user journey on your website. It is a Python library that lets you easily connect your Google Analytics data (in BigQuery). You define the user ID, event type and timestamp. From this input, a comprehensive graph network is built with the gains and losses you know from a customer journey. In addition, customer segments with similar journeys are created. This reduces the complexity of a purely descriptive view of the data.
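
To make the "user ID, event type, timestamp" input concrete, here is a minimal sketch in plain Python of how such rows can be turned into a transition graph. It does not use Retentioneering's API; the function name `transition_counts` and the sample events are made up for illustration.

```python
from collections import Counter, defaultdict


def transition_counts(events):
    """Count step-to-step transitions from (user_id, event_type, timestamp) rows."""
    by_user = defaultdict(list)
    for user_id, event_type, ts in events:
        by_user[user_id].append((ts, event_type))

    edges = Counter()
    for rows in by_user.values():
        ordered = sorted(rows, key=lambda row: row[0])  # order each journey by time
        for (_, src), (_, dst) in zip(ordered, ordered[1:]):
            edges[(src, dst)] += 1  # one edge per consecutive pair of events
    return edges


# Hypothetical clickstream: two users, three events each.
events = [
    ("u1", "home", 1), ("u1", "catalog", 2), ("u1", "cart", 3),
    ("u2", "home", 1), ("u2", "catalog", 5), ("u2", "exit", 9),
]
edges = transition_counts(events)
```

Each key of `edges` is a directed edge of the journey graph and each value is how often that step occurred, which is the raw material for the gains-and-losses view described above.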

    Marketing Mix Modeling with Robyn (https://github.com/facebookexperimental/Robyn)

    Fewer third-party cookies mean fewer attribution models. The answer to this is Marketing Mix Modeling. Marketing mix models are regression models that use statistical probability to calculate the effect size of marketing channels and other independent variables. The advantage is that business context can be modeled much more realistically. For example, Google searches for your own brand can be integrated to determine the share of revenue that comes from brand strength. Likewise, offline advertising measures can be modeled with other metrics in this context (e.g. offline advertising with GRPs). Robyn takes into account adstock effects, ROAS calculation and multicollinearity in the marketing channels. In addition, budgets can be optimized using the predictions, and results from marketing tests can be integrated into the model for calibration.
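
The adstock effect mentioned above is the idea that advertising keeps working after the money is spent. A minimal sketch of one common form, a geometric carry-over, is shown below in plain Python. This is not Robyn's code or API; the function name, the decay value and the spend figures are assumptions chosen for illustration.

```python
def geometric_adstock(spend, decay):
    """Carry a fraction `decay` of the previous period's adstocked value forward."""
    if not 0.0 <= decay < 1.0:
        raise ValueError("decay must be in [0, 1)")
    carried = 0.0
    out = []
    for x in spend:
        carried = x + decay * carried  # today's spend plus the decayed carry-over
        out.append(carried)
    return out


# Hypothetical weekly spend on one channel.
weekly_spend = [100.0, 0.0, 0.0, 50.0]
adstocked = geometric_adstock(weekly_spend, decay=0.5)
# adstocked == [100.0, 50.0, 25.0, 62.5]
```

In a marketing mix model the adstocked series, rather than the raw spend, would typically be used as the regression input for that channel.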

  • Metarank - A low code Machine Learning tool that personalizes product listings, articles, recommendations, and search results in order to boost sales. A friendly Learn-to-Rank engine
    2 projects | /r/scala | 23 Mar 2022
  • Show HN: We made an open-source personalization engine
    7 projects | news.ycombinator.com | 23 Mar 2022
    As people with a heavy e-commerce background, we feel that the main pain point of typical old-school offline personalization solutions is that 80% of customers in medium-sized online stores visit only once:

    * you have a very short window to adapt your store, as the visitor will never come back in the future.

    * even if you have zero past knowledge about a new visitor, there is still something to compare with other similar visitors: are they on mobile? Is it iOS or Android? Are they in the US? Is it a holiday now? Did they come from a Google search or a Facebook ad?

    * this knowledge is ephemeral and makes sense only within their current session. But a visitor can still do a couple of interactions, like browsing different collections of items or clicking on search results, and these can also be taken into account.

    But unlike with Amazon and Google, it's you who defines which features are used for the ranking and how long they are stored (see the "ttl" option on all feature extractors in our docs for details).

    For example, here is https://github.com/metarank/metarank/blob/master/src/test/re... the config of features used in the movie recommendations demo. In the most privacy-sensitive setup you can just drop all the "interacted_with" extractors and get zero private data stored for each visitor.
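
To illustrate what a per-feature "ttl" means in practice, here is a minimal in-memory sketch in Python. It is not Metarank's implementation or config schema; the class `TtlFeatureStore`, the feature name `clicked_genre` and the 30-minute TTL are hypothetical. The current time is passed in explicitly so the behaviour is easy to follow.

```python
class TtlFeatureStore:
    """In-memory store whose values expire `ttl_seconds` after being written."""

    def __init__(self):
        self._data = {}  # (visitor_id, feature) -> (value, expires_at)

    def put(self, visitor_id, feature, value, ttl_seconds, now):
        self._data[(visitor_id, feature)] = (value, now + ttl_seconds)

    def get(self, visitor_id, feature, now, default=None):
        entry = self._data.get((visitor_id, feature))
        if entry is None:
            return default
        value, expires_at = entry
        if now >= expires_at:
            # The value has outlived its TTL: forget it entirely.
            del self._data[(visitor_id, feature)]
            return default
        return value


store = TtlFeatureStore()
store.put("v1", "clicked_genre", "comedy", ttl_seconds=1800, now=0)
fresh = store.get("v1", "clicked_genre", now=60)      # still within the session
expired = store.get("v1", "clicked_genre", now=1800)  # TTL reached, value dropped
```

A short TTL keeps session-level signals available for ranking while ensuring nothing about the visitor is kept once the session is over.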

    7 projects | news.ycombinator.com | 23 Mar 2022
    Right now it runs in a dev-mode on a single EC2 t3.large instance with loadavg ~0.30, but the inference load is quite tiny right now: around 3-4 reranking requests per second. And yes, as a typical open-source project it still crashes from time to time :)

    The training dataset is not that huge (see https://github.com/metarank/ranklens/ for details, it's open-source), so we do a full retraining directly on the node right after the deployment, and it takes around 1 minute to finish. We also run the same process inside the CI: https://github.com/metarank/metarank/blob/master/run_e2e.sh

    There is an option to run this thing in a distributed mode:

    * training is done using a separate batch job running on Apache Flink (and on k8s using flink's integration)

    * feature updates are done in a separate streaming Flink job, writing everything in Redis

    * The API fetches latest feature values from Redis and runs the ML model.

    The dev-mode I mentioned earlier is when all three of these things are bundled together in a single process to make it easier to play with the tool. But we didn't spend much time testing the distributed setup, as this is still a hobby side project and we have limited time to develop it.
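
The three roles described above (batch training, streaming feature updates, and an API that reads the latest features) can be sketched as three functions living in one process, much like the dev-mode. This is a toy illustration only: a plain dict stands in for Redis, there is no Flink, and the hand-made scorer with its arbitrary 0.1 weight replaces whatever model Metarank actually trains. All names are hypothetical.

```python
from collections import Counter


def train(history):
    """Batch step stand-in: per-item click-through rate from (item_id, clicked) rows."""
    shown, clicked = Counter(), Counter()
    for item_id, was_clicked in history:
        shown[item_id] += 1
        clicked[item_id] += int(was_clicked)
    return {item: clicked[item] / shown[item] for item in shown}


def update_features(store, event):
    """Streaming step stand-in: count live clicks per item in the shared store."""
    store[event["item"]] = store.get(event["item"], 0) + 1


def rerank(model, store, items):
    """API step stand-in: read the latest feature values and order items by score."""
    def score(item):
        return model.get(item, 0.0) + 0.1 * store.get(item, 0)  # 0.1 is arbitrary
    return sorted(items, key=score, reverse=True)


history = [("a", True), ("a", False), ("b", False), ("c", True)]
model = train(history)   # offline: a -> 0.5, b -> 0.0, c -> 1.0
store = {}               # a dict standing in for Redis
for _ in range(6):       # six live clicks on item "b" arrive as a stream
    update_features(store, {"item": "b"})
ranking = rerank(model, store, ["a", "b", "c"])
```

In the distributed mode each function would run as its own job or service and share state only through the feature store; bundling them as above is what makes a single-process setup convenient to experiment with.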

    7 projects | news.ycombinator.com | 23 Mar 2022
    This is actually part of our CI process: https://github.com/metarank/metarank/blob/master/run_e2e.sh . This script runs on every PR to retrain the model used on a demo and confirm that it's working fine.

    So you can just download the jar file from the releases page and run ./run_e2e.sh in the checked-out repository. It should do the job.


What are some alternatives?

When comparing mmlspark and metarank you can also consider the following projects:

SynapseML - Simple and Distributed Machine Learning

recommenders - Best Practices on Recommendation Systems

isolation-forest - A Spark/Scala implementation of the isolation forest unsupervised outlier detection algorithm.

Medusa - Building blocks for digital commerce

retentioneering-tools - Retentioneering: product analytics, data-driven CJM optimization, marketing analytics, web analytics, transaction analytics, graph visualization, process mining, and behavioral segmentation in Python. Predictive analytics over clickstream, AB tests, machine learning, and Markov Chain simulations.

feathr - Feathr – A scalable, unified data and AI engineering platform for enterprise

Robyn - Robyn is an experimental, AI/ML-powered and open sourced Marketing Mix Modeling (MMM) package from Meta Marketing Science. Our mission is to democratise modeling knowledge, inspire the industry through innovation, reduce human bias in the modeling process & build a strong open source marketing science community.

polyaxon - MLOps Tools For Managing & Orchestrating The Machine Learning LifeCycle

Activeloop Hub - Data Lake for Deep Learning. Build, manage, query, version, & visualize datasets. Stream data real-time to PyTorch/TensorFlow. https://activeloop.ai [Moved to: https://github.com/activeloopai/deeplake]

eth-phishing-detect - Utility for detecting phishing domains targeting Web3 users

scaladex - The Scala Package Index