Vectorizing Graph Neural Networks

Our great sponsors

WorkOS - The modern identity platform for B2B SaaS

InfluxDB - Power Real-Time Data Analytics at Scale

SaaSHub - Software Alternatives and Reviews

Our great sponsors

nodevectors

8 497 0.0 Python

Fastest network node embeddings in the west

Yes, people working on graph based ML realize quickly that the underlying data structures most originally academic libraries (networkX, PyG, etc.) use are bad.
I wrote about this before [1] and based a node embedding library around the concept [2].
The NetworkX style graphs are laid out as a bunch of items in a heap with pointers to each other. That works at extreme scales, because everything is on a cluster's RAM and you don't mind paying the latency costs of fetch operations. But it makes little sense for graphs with < 5B nodes to be honest.
Laying out the graph as a CSR sparse matrix makes way more sense because of data locality. At larger scales, you could just leave the CSR array data on NVMe drives, and you'd still operate at 500mb/s random query throughput with hand coded access, ~150mb/s with mmap. That remains to be implemented by someone.
[1] https://www.singlelunch.com/2019/08/01/700x-faster-node2vec-...
[2] https://github.com/VHRanger/nodevectors

duckdb-pgq

1 32 0.0 C++

DuckDB is an in-process SQL OLAP Database Management System

You might be interested in duckdb-pgq[1], working on implementing graph queries support in duckdb. There are some papers online about it as well if you are interested.
1: https://github.com/cwida/duckdb-pgq

WorkOS

workos.com sponsored

The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.
GCGT

1 7 10.0 Cuda

Source code for the paper: GPU-based Compressed Graph Traversal

> I believe you are not guaranteed for the edge data of adjacent nodes to be adjacent in memory
The edge data of a particular node is contiguous, but yes, the edge data of a collection of nodes is not contiguous. You can reorder (permute) the graph for some metric as a preprocessing step so that you get better locality. This only works for static graphs though.
> For float-based edge data I think quantization works well, and I believe you can further compress the ROW/COL indices
Yes, index compression is pretty well studied and understood. The challenge here is mostly good compression ratio and high decompression performance. There are a couple of works that I'm aware of that do this for gpus. This repo by Mo Sha et al. (https://github.com/desert0616/GCGT) is pretty good, and I also did some work in this space (https://github.com/pgera/efg).

efg

2 14 4.7 C

GPU based Compressed Graph Traversal

> I believe you are not guaranteed for the edge data of adjacent nodes to be adjacent in memory
The edge data of a particular node is contiguous, but yes, the edge data of a collection of nodes is not contiguous. You can reorder (permute) the graph for some metric as a preprocessing step so that you get better locality. This only works for static graphs though.
> For float-based edge data I think quantization works well, and I believe you can further compress the ROW/COL indices
Yes, index compression is pretty well studied and understood. The challenge here is mostly good compression ratio and high decompression performance. There are a couple of works that I'm aware of that do this for gpus. This repo by Mo Sha et al. (https://github.com/desert0616/GCGT) is pretty good, and I also did some work in this space (https://github.com/pgera/efg).

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project