Cache invalidation might no longer be a hard thing in Computer Science

materialize

117 5,580 10.0 Rust

The data warehouse for operational workloads. (by MaterializeInc)

Yep, cache invalidation is hard because you can't just cache /users/1/networkstatus as a function of the request itself; its underlying value is a complex function of values fetched from possibly dozens of tables and services, any of which could change at any time without any immediately discernible connection to User 1.
OP's response to this, in another comment, is: "Now going back to your example with complicated dependencies. Maybe TTL is a better solution. With many dependencies, any changes from the dependency list might trigger invalidation. At some point, just doing TTL, would be simpler."
But what if you actually do want your dependency list to trigger invalidation?
https://materialize.com/ is the closest thing I know of to a "solution" for this: it lets you build nested, real-time materialized views on top of streaming and non-streaming data, all with SQL syntax and Postgres wire compatibility, that incrementally update their cached results whenever any input changes, no matter how deep the dependency chain (based on https://timelydataflow.github.io/timely-dataflow/ from Microsoft Research).
You could either use the outputs of such a system as the cache itself, or build a cache invalidation signaling system on top of it that emits the keys that need to be evicted, as they become stale, as a stream written directly into a Kafka topic. And then, of course, this could plug into Meta's cache invalidation consensus system. But it's remarkably disingenuous for Meta to pretend that the dependency side of this problem is trivial or can be solved by TTLs alone.
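To make the dependency side concrete, here is a minimal Python sketch of dependency-driven eviction. It is not how Materialize or Meta's system works; it only illustrates the idea of turning "some input changed" into a list of cache keys to evict. The class name `DependencyCache`, its methods, and the source names such as `sessions:1` are all hypothetical, and the returned list stands in for the stream of keys that would be published to a Kafka topic.

```python
from collections import defaultdict, deque


class DependencyCache:
    """Toy cache whose entries are evicted when any recorded dependency changes."""

    def __init__(self):
        self._values = {}                    # cache key -> cached value
        self._dependents = defaultdict(set)  # node -> nodes derived from it

    def add_edge(self, upstream, downstream):
        """Record that `downstream` (an intermediate value) is derived from `upstream`."""
        self._dependents[upstream].add(downstream)

    def put(self, key, value, depends_on):
        """Cache `value` under `key` and remember what it was computed from."""
        self._values[key] = value
        for dep in depends_on:
            self._dependents[dep].add(key)

    def get(self, key):
        return self._values.get(key)

    def keys_to_evict(self, changed):
        """Walk the dependency graph breadth-first from `changed`, yielding cached keys."""
        seen = {changed}
        queue = deque([changed])
        while queue:
            node = queue.popleft()
            for child in sorted(self._dependents.get(node, ())):
                if child in seen:
                    continue
                seen.add(child)
                queue.append(child)
                if child in self._values:
                    yield child

    def invalidate(self, changed):
        """Evict every cached key downstream of `changed`; return the evicted keys."""
        evicted = list(self.keys_to_evict(changed))
        for key in evicted:
            del self._values[key]
        return evicted


cache = DependencyCache()
# Hypothetical dependency chain: raw tables feed intermediate values,
# which in turn feed the cached endpoint response.
cache.add_edge("friendships:1", "network_graph:1")
cache.add_edge("sessions:1", "presence:1")
cache.put("/users/1/networkstatus", {"online": True},
          depends_on=["network_graph:1", "presence:1"])
cache.put("/users/2/networkstatus", {"online": False},
          depends_on=["presence:2"])

# A change two levels away from the endpoint still evicts it.
evicted = cache.invalidate("sessions:1")
print(evicted)
```

The hard part the comment points at is not this graph walk but populating the graph correctly in the first place: every table and service that feeds a cached value has to be registered as a dependency, which is exactly what a TTL lets you skip. The sketch also never prunes stale edges after an eviction, which a real system would need to handle.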

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

We Built a Streaming SQL Engine
3 projects | news.ycombinator.com | 21 Oct 2023

Query Real Time Data in Kafka Using SQL
7 projects | dev.to | 23 Mar 2023

What makes a time series oriented database (ex: QuestDB) more efficient for OLAP on time series than an OLAP "only" oriented database (ex: DuckDB) technically?
1 project | /r/dataengineering | 23 Jan 2023

How to handle partial updates and bulk updates in the source systems
1 project | /r/dataengineering | 5 Jan 2023

Headless BI with streaming data
2 projects | dev.to | 22 Sep 2022

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com
Rust Database SQL Streaming Kafka
Post date: 8 Jun 2022
