Our great sponsors
-
WorkOS
The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.
Yes, people working on graph based ML realize quickly that the underlying data structures most originally academic libraries (networkX, PyG, etc.) use are bad.
I wrote about this before [1] and based a node embedding library around the concept [2].
The NetworkX style graphs are laid out as a bunch of items in a heap with pointers to each other. That works at extreme scales, because everything is on a cluster's RAM and you don't mind paying the latency costs of fetch operations. But it makes little sense for graphs with < 5B nodes to be honest.
Laying out the graph as a CSR sparse matrix makes way more sense because of data locality. At larger scales, you could just leave the CSR array data on NVMe drives, and you'd still operate at 500mb/s random query throughput with hand coded access, ~150mb/s with mmap. That remains to be implemented by someone.
[1] https://www.singlelunch.com/2019/08/01/700x-faster-node2vec-...
[2] https://github.com/VHRanger/nodevectors
You might be interested in duckdb-pgq[1], working on implementing graph queries support in duckdb. There are some papers online about it as well if you are interested.
1: https://github.com/cwida/duckdb-pgq
> I believe you are not guaranteed for the edge data of adjacent nodes to be adjacent in memory
The edge data of a particular node is contiguous, but yes, the edge data of a collection of nodes is not contiguous. You can reorder (permute) the graph for some metric as a preprocessing step so that you get better locality. This only works for static graphs though.
> For float-based edge data I think quantization works well, and I believe you can further compress the ROW/COL indices
Yes, index compression is pretty well studied and understood. The challenge here is mostly good compression ratio and high decompression performance. There are a couple of works that I'm aware of that do this for gpus. This repo by Mo Sha et al. (https://github.com/desert0616/GCGT) is pretty good, and I also did some work in this space (https://github.com/pgera/efg).
> I believe you are not guaranteed for the edge data of adjacent nodes to be adjacent in memory
The edge data of a particular node is contiguous, but yes, the edge data of a collection of nodes is not contiguous. You can reorder (permute) the graph for some metric as a preprocessing step so that you get better locality. This only works for static graphs though.
> For float-based edge data I think quantization works well, and I believe you can further compress the ROW/COL indices
Yes, index compression is pretty well studied and understood. The challenge here is mostly good compression ratio and high decompression performance. There are a couple of works that I'm aware of that do this for gpus. This repo by Mo Sha et al. (https://github.com/desert0616/GCGT) is pretty good, and I also did some work in this space (https://github.com/pgera/efg).
Related posts
- DCompute: Native execution of D on GPUs and other Accelerators
- Ask HN: Best way to learn GPU programming?
- CuGraph – GPU-accelerated graph analytics
- An efficient C++17 GPU numerical computing library with Python-like syntax
- MatX: Efficient C++17 GPU numerical computing library with Python-like syntax