Cache invalidation might no longer be a hard thing in Computer Science

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

InfluxDB - Power Real-Time Data Analytics at Scale
Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
www.influxdata.com
featured
SaaSHub - Software Alternatives and Reviews
SaaSHub helps you find the best software and product alternatives
www.saashub.com
featured
  • materialize

    The data warehouse for operational workloads. (by MaterializeInc)

  • Yep, cache invalidation is hard because you can't just cache /users/1/networkstatus as a function of the request itself; its underlying value is a complex function of values fetched from possibly dozens of tables and services, any of which could change at any time without any immediately discernable connection to User 1.

    OP's response to this, in another comment, is: "Now going back to your example with complicated dependencies. Maybe TTL is a better solution. With many dependencies, any changes from the dependency list might trigger invalidation. At some point, just doing TTL, would be simpler."

    But what if you actually do want your dependency list to trigger invalidation?

    https://materialize.com/ is the closest thing I know of to a "solution" for this - it lets you build nested, realtime materialized views on top of streaming and non-streaming data, all with SQL syntax and Postgres wire compatibility, that fully recalculate and recache their results whenever any input changes, no matter how deep (based on https://timelydataflow.github.io/timely-dataflow/ from Microsoft Research).

    You could either use the outputs of such a system as a cache, or weave together a cache invalidation signaling system that sends a stream of keys that need to be evicted from cache at any time, directly into a Kafka topic. And then, of course, this could plug into Meta's cache invalidation consensus system. But it's remarkably disingenuous for Meta to pretend that the dependency side of this problem is trivial or can be solved by TTLs alone.

  • InfluxDB

    Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.

    InfluxDB logo
NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts

  • We Built a Streaming SQL Engine

    3 projects | news.ycombinator.com | 21 Oct 2023
  • Query Real Time Data in Kafka Using SQL

    7 projects | dev.to | 23 Mar 2023
  • What makes a time series oriented database (ex: QuestDB) more efficient for OLAP on time series than an OLAP "only" oriented database (ex: DuckDB) technically?

    1 project | /r/dataengineering | 23 Jan 2023
  • How to handle partial updates and bulk updates in the source systems

    1 project | /r/dataengineering | 5 Jan 2023
  • Headless BI with streaming data

    2 projects | dev.to | 22 Sep 2022