Vectorizing Graph Neural Networks

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

Our great sponsors
  • WorkOS - The modern identity platform for B2B SaaS
  • InfluxDB - Power Real-Time Data Analytics at Scale
  • SaaSHub - Software Alternatives and Reviews
  • nodevectors

    Fastest network node embeddings in the west

  • Yes, people working on graph based ML realize quickly that the underlying data structures most originally academic libraries (networkX, PyG, etc.) use are bad.

    I wrote about this before [1] and based a node embedding library around the concept [2].

    The NetworkX style graphs are laid out as a bunch of items in a heap with pointers to each other. That works at extreme scales, because everything is on a cluster's RAM and you don't mind paying the latency costs of fetch operations. But it makes little sense for graphs with < 5B nodes to be honest.

    Laying out the graph as a CSR sparse matrix makes way more sense because of data locality. At larger scales, you could just leave the CSR array data on NVMe drives, and you'd still operate at 500mb/s random query throughput with hand coded access, ~150mb/s with mmap. That remains to be implemented by someone.

    [1] https://www.singlelunch.com/2019/08/01/700x-faster-node2vec-...

    [2] https://github.com/VHRanger/nodevectors

  • duckdb-pgq

    DuckDB is an in-process SQL OLAP Database Management System

  • You might be interested in duckdb-pgq[1], working on implementing graph queries support in duckdb. There are some papers online about it as well if you are interested.

    1: https://github.com/cwida/duckdb-pgq

  • WorkOS

    The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.

    WorkOS logo
  • GCGT

    Source code for the paper: GPU-based Compressed Graph Traversal

  • > I believe you are not guaranteed for the edge data of adjacent nodes to be adjacent in memory

    The edge data of a particular node is contiguous, but yes, the edge data of a collection of nodes is not contiguous. You can reorder (permute) the graph for some metric as a preprocessing step so that you get better locality. This only works for static graphs though.

    > For float-based edge data I think quantization works well, and I believe you can further compress the ROW/COL indices

    Yes, index compression is pretty well studied and understood. The challenge here is mostly good compression ratio and high decompression performance. There are a couple of works that I'm aware of that do this for gpus. This repo by Mo Sha et al. (https://github.com/desert0616/GCGT) is pretty good, and I also did some work in this space (https://github.com/pgera/efg).

  • efg

    GPU based Compressed Graph Traversal

  • > I believe you are not guaranteed for the edge data of adjacent nodes to be adjacent in memory

    The edge data of a particular node is contiguous, but yes, the edge data of a collection of nodes is not contiguous. You can reorder (permute) the graph for some metric as a preprocessing step so that you get better locality. This only works for static graphs though.

    > For float-based edge data I think quantization works well, and I believe you can further compress the ROW/COL indices

    Yes, index compression is pretty well studied and understood. The challenge here is mostly good compression ratio and high decompression performance. There are a couple of works that I'm aware of that do this for gpus. This repo by Mo Sha et al. (https://github.com/desert0616/GCGT) is pretty good, and I also did some work in this space (https://github.com/pgera/efg).

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts