The "missing" graph datatype already exists. It was invented in the '70s

Our great sponsors

InfluxDB - Power Real-Time Data Analytics at Scale

WorkOS - The modern identity platform for B2B SaaS

SaaSHub - Software Alternatives and Reviews

Our great sponsors

morel

2 290 8.0 Java

Standard ML interpreter, with relational extensions, implemented in Java

> don't have to have two separate worlds for queries and other logic
That's definitely the dream. Another point along that spectrum (from the author of Apache Calcite): https://github.com/hydromatic/morel

CoinBLAS

3 21 1.8 Jupyter Notebook

Bitcoin blockchain graph analysis with the GraphBLAS.

When you consider that a graph and a matrix are isomorphic, doing vector matrix multiplication takes a vector with a set value, say row 4, and multiplies it by a matrix where row 4 has values present that represent edges to the nodes that are adjacent to it (ie "adjacency" matrix). The result is a vector with the next "step" in a BFS across the graph, do that in a loop and you step across the whole graph.
A cool result of this is, for example, taking an adjacency matrix and squaring it is the "Friend of a Friend" graph. It takes every node/row and multiplies it by itself, returning a matrix that are adjacent to the adjacencies of each node, ie, the friends (adjacencies of the adjacencies) of friends (adjacencies) of the nodes.
Deeper traversal are just higher nodes, a matrix cubed are the friends of the friends of the friends.
A picture is worth a thousand words, see figure 7 of this paper:
https://arxiv.org/pdf/1606.05790.pdf
Also check out figure 8, this shows how incidence matrices can work to represent hyper and multi graphs. An pair of incidence matrices reprsent two graphs, one from nodes to edges and the other from edges to nodes, these are n by m and m by n. When you multiply them, you get a square adjacency matrix that "projects" the incidence into an adjacency. This can be used to collapse hypergraphs into simple graphs that use different semirings to combine the multiple edges.
For some pretty pictures of this kind of stuff, check out CoinBLAS (note I am not a crypto-bro, it was just a very handy extremely large multi-graph that I could easily download in chunks to play with):
https://github.com/Graphegon/CoinBLAS/

InfluxDB

www.influxdata.com sponsored

Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
snakelog

2 9 0.0 Python

A Datalog Framework for Python

You can without too much work transpile datalog to SQL. SQL does have such strong support that it is useful https://github.com/philzook58/snakelog or perhaps just do it manually https://www.philipzucker.com/tiny-sqlite-datalog/

pygraphistry

9 2,055 9.2 Python

PyGraphistry is a Python library to quickly load, shape, embed, and explore big graphs with the GPU-accelerated Graphistry visual graph analyzer

If you enjoy this kind of thinking, we recently released GFQL for dataframe-native graph querying & compute
Imagine Neo4j Cypher, except no need for a database -- just import it -- and automatically vectorizes for significantly faster CPU+GPU performance. This is fundamentally similar to the kinds of implementations a datalog approach enables. (And indeed one of the alternative interfaces we were considering!)
We've run it on 100M+ edge graphs on some of the cheapest GPUs you can get, and are getting ready for the next rev with aggregate compute: https://github.com/graphistry/pygraphistry/blob/master/demos...

Graphony

1 5 - Python

> Abstractions should be seen as models. They are always wrong, but they are sometimes useful. (And sometimes not.)
George Box was very specifically talking about statistical models when he coined that aphorism. Matrices are linear algebra and graphs are graph theory, I find it hard to think they are not correct and useful models.
> A forward traversal >v of node v enters from the left, reads the sequence stored in the node, and exits from the right. A reverse traversal I'm not an expert in this field but I'm guessing you're talking about De Bruijn graphs, which can be very elegantly modeled with incidence matrices, here's an example of one using the GraphBLAS that downloads data from BioPython, loads it into incidence matrices and graphs it. This is just a simple example, SuiteSparse can handle many billions of edges:
https://github.com/Graphegon/Graphony?tab=readme-ov-file#exa...
Traversing bidirectionally is quite easy, the upper triangle of a matrix are the directed outgoing edges, and the lower triangle are the incoming. This style of "push/pull" optimization is common in the GraphBLAS.
> In a good graph representation, you can do this by maintaining a small state that does not grow significantly with the length of the context or the number of underlying paths.
Again if I understand you correctly, in the GraphBLAS this is accomplished by using accumulators and masks. During traversal data can be accumulated, with a stock operator or one you define, into a vector or matrix, and that object can be used to efficiently mask subsequent computations to avoid unnecessary work or determine when you've reached a termination condition.
> Matrices don't feel like a good abstraction for graphs like this.
Mathematically, graphs and matrices are isomorphic. Regardless of algorithm or storage format like edge lists, tuples or CSR, every graph is a matrix, and vice versa. And if you have a matrix, you have linear algebra to operate on it.
Some people don't like Linear Algebra as a graph abstraction, so I guess for them it is "not good", but on the other hand, it's Linear Algebra and Graph Theory, whose roots date back to the 2nd century BC, forward through great minds like Descartes and Euler, permeating every kind of math, science, physics and engineering discipline humans have ever created. That's a strong argument for its goodness.
Now it is entirely possible, likely even, that the current SuiteSparse implementation doesn't have exactly the tool needed or maybe not the precise best storage format, but these missing pieces do not invalidate the underlying mathematical foundation that it's based on.

gbbs

1 189 4.6 C++

GBBS: Graph Based Benchmark Suite

Joins are great for OLTP workloads but are horrible for all but the simplest graph algorithms. A datalog-based system makes for a nice story, but it would not survive benchmarking.
Pick a classic algorithm, say triangle counting, implement it in Datalog, compare against GBBS [1] and come back here to report results.
[1] https://github.com/ParAlg/gbbs

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Graph Data Fits in Memory
1 project | news.ycombinator.com | 15 Apr 2024
Displaying Content as a Graph
1 project | news.ycombinator.com | 1 Jan 2024
NeurIPS 2023 Posters Cluster Visualization
1 project | news.ycombinator.com | 9 Dec 2023
How to pass any first-round interview (even in a terrible talent market)
1 project | news.ycombinator.com | 5 Jul 2023
Don't Bring a Tree to a Mesh Fight
1 project | news.ycombinator.com | 23 Nov 2021

The "missing" graph datatype already exists. It was invented in the '70s

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com
Graph Datalog Visualization Logic GPU
Post date: 5 Mar 2024

morel

CoinBLAS

InfluxDB

snakelog

pygraphistry

Graphony

gbbs

Related posts

The "missing" graph datatype already exists. It was invented in the '70s

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com Graph Datalog Visualization Logic GPU Post date: 5 Mar 2024

morel

CoinBLAS

InfluxDB

snakelog

pygraphistry

Graphony

gbbs

Related posts

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com
Graph Datalog Visualization Logic GPU
Post date: 5 Mar 2024