Show HN: MetricFlow – open-source metric framework

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • metricflow

    MetricFlow allows you to define, build, and maintain metrics in code.

  • Three things:

    First, MetricFlow does not currently support MySQL. We launched with support for BigQuery, Redshift, and Snowflake. I have opened an issue to add support for MySQL (and similar issues for other SQL engines are coming): https://github.com/transform-data/metricflow/issues/27

    Second, what we call a data source is closer to a table in a database than to the underlying database service itself. MetricFlow is useful when you're using a single SQL engine - indeed, that's all we support today - but it is most useful when you're in a world where joins are a thing. That said, if you have one big data table you might still find it useful to have declarative metric definitions in MetricFlow. Suppose, for example, you had a big NoSQL-style table filled with JSON objects. You might define a few data sources that normalize those JSON objects into top-level elements (identifiers, dimensions, aggregated measures) using the sql_query data source config attribute. That would let you support structured queries on the data consumption end while still pushing unstructured blobs from your application layer. This will be slow at query time, and only as reliable as the level of discipline exerted in your application development workflow, but it's possible.

    Third, if we did support MySQL you'd basically connect to it via standard connection parameters - we have a config file where you can store the required information and then we'll manage the connections for you. However, I'm not familiar with uxwizz, and a quick perusal of their documentation did not turn up how one goes about connecting to the underlying DB. It's likely I just missed this, but at any rate I don't know how it is done. If they don't support standard MySQL client connections you'd need to write an adapter of some kind against whatever DB connection APIs they provide, in which case you'd likely need to roll a custom implementation of MetricFlow's SqlClient interface and initialize the MetricFlowEngine with that.
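
    The adapter idea in the third point can be sketched roughly as follows. This is only an illustration of the pattern: QueryClient is a hypothetical stand-in protocol, not MetricFlow's actual SqlClient interface, and sqlite3 stands in for whatever connection API the vendor exposes. The table and helper names are made up for the example.

```python
import sqlite3
from typing import Any, List, Protocol, Sequence, Tuple

Row = Tuple[Any, ...]


class QueryClient(Protocol):
    """Hypothetical engine-facing interface (NOT MetricFlow's real SqlClient)."""

    def query(self, sql: str, params: Sequence[Any] = ()) -> List[Row]: ...

    def execute(self, sql: str, params: Sequence[Any] = ()) -> None: ...


class VendorConnectionAdapter:
    """Wraps a vendor-provided connection object behind QueryClient.

    sqlite3 is used here only as a stand-in for the vendor's connection API.
    """

    def __init__(self, connection: sqlite3.Connection) -> None:
        self._conn = connection

    def query(self, sql: str, params: Sequence[Any] = ()) -> List[Row]:
        # Run a read query and return all rows as tuples.
        return self._conn.execute(sql, params).fetchall()

    def execute(self, sql: str, params: Sequence[Any] = ()) -> None:
        # Run a statement that returns no rows, then commit it.
        self._conn.execute(sql, params)
        self._conn.commit()


def count_rows(client: QueryClient, table: str) -> int:
    """Example consumer that depends only on the interface, not the engine."""
    return client.query(f"SELECT COUNT(*) FROM {table}")[0][0]


# Hypothetical usage with an in-memory database and made-up data.
client = VendorConnectionAdapter(sqlite3.connect(":memory:"))
client.execute("CREATE TABLE visits (user_id INTEGER, page TEXT)")
client.execute("INSERT INTO visits VALUES (?, ?)", (1, "/home"))
client.execute("INSERT INTO visits VALUES (?, ?)", (2, "/pricing"))
```

    The point of the pattern is that code consuming the client never touches the vendor connection directly, so swapping engines means writing one new adapter class.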

  • dbt_metrics

    Macros for calculating metrics

  • If you’re interested, the longer version:

    Semantics

    MetricFlow requires less configuration than these other frameworks. We accomplish this by choosing abstractions that allow us to handle more on our side at query time through the DataFlow Plan builder. Working with the SQL constructions as a dataflow enables extensions such as non-data-warehouse data sources, or using other languages (Python) for some transformations.

    The dbt spec is relatively new and requires a few extremely unDRY expressions. The most obvious is the lack of support for joins, which means you simply won’t be able to answer most questions unless you build huge tables. There are a few other issues with the abstractions. For example, dimensions are defined multiple times across metrics. A few folks posted more about these challenges in their GitHub issue but they’re sticking to their spec. I’m skeptical it will work at any scale.

    The Cube concept is similar to Explores in Looker. They’re limiting because you end up with a bunch of representations of small domains within the warehouse, and the moment you hit the edge of that domain you need to add a new Cube/Explore. This is not DRY and it’s frustrating. There is also no first-class object for Metrics, which means you’re limited to relatively simple metric types.

    Performance

    MetricFlow has the flexibility of the DataFlow Plan Builder and builds quite efficient queries. The Materialization feature allows you to build roll up tables programmatically to the data warehouse which could then be used as a low-latency serving layer.
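
    As a rough illustration of what a roll-up table used as a serving layer looks like, here is a generic sketch. It is not MetricFlow's Materialization API: materialize_rollup is a hypothetical helper, sqlite3 stands in for the data warehouse, and the orders data is made up.

```python
import sqlite3


def materialize_rollup(
    conn: sqlite3.Connection,
    source: str,
    target: str,
    dimension: str,
    measure: str,
) -> None:
    """Rebuild a pre-aggregated table: one row per dimension value.

    Hypothetical helper; identifiers are interpolated directly, so they
    must come from trusted configuration, never from user input.
    """
    conn.execute(f"DROP TABLE IF EXISTS {target}")
    conn.execute(
        f"CREATE TABLE {target} AS "
        f"SELECT {dimension}, SUM({measure}) AS {measure}_sum "
        f"FROM {source} GROUP BY {dimension}"
    )
    conn.commit()


# Made-up fact table with three orders across two days.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("2022-04-01", 10.0), ("2022-04-01", 5.0), ("2022-04-02", 7.5)],
)

materialize_rollup(conn, "orders", "orders_daily", "order_date", "amount")

# Downstream reads hit the small roll-up table instead of the raw facts.
rows = conn.execute(
    "SELECT order_date, amount_sum FROM orders_daily ORDER BY order_date"
).fetchall()
```

    Reads against the roll-up avoid re-aggregating the raw table on every request, at the cost of having to rebuild it when the underlying data changes.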

    The dbt implementation is a Jinja macro that generates a static query per metric requested: https://github.com/dbt-labs/dbt_metrics/blob/main/macros/get.... This macro will be quite hard to optimize for more complicated metric types. We struggled a ton with this before refactoring our framework to allow the manipulation and optimization of these DataFlow Plans.

    Cube is pretty slick on caching, but I know less about their query optimizations. They have some awesome pre-aggregation and caching features. I think this comes from their background in serving frontend interfaces.

    Interfaces

    Today, MetricFlow supports a Python SDK and our CLI. Transform has a few more interfaces (SQL over JDBC, GraphQL, React) that sit outside the scope of this OSS project.

    dbt only builds a query in the dbt context today. TBD what the dbt server does, but I imagine it will expose a JDBC interface for paying customers.

    Cube seems more focused on building custom data applications but has recently pivoted to the analytics front. I haven’t seen those interfaces in action but I’m curious to learn more there.
