[D] Seeking Advice - For graph ML, Neo4j or nah?

This page summarizes the projects mentioned and recommended in the original post on /r/MachineLearning

  • Memgraph

    Open-source graph database, tuned for dynamic analytics environments. Easy to adopt, scale and own.

  • Building your own graph database or structure can be a time-consuming engineering challenge, as you mentioned, and I would personally avoid it. There are existing solutions that may help. One open-source option covers most of the requirements and concerns you mention (functionality, efficiency, and custom low-level optimizations) and is not as bulky as the Neo4j Java backend: Memgraph, an in-memory graph database written in C++ that we built. Its key distinguishing feature is that all data is stored in RAM for fast queries. There is also some cool ML-for-graphs functionality. Take a look at this blog post about node embeddings and recommendation engines; it integrates natively with Python and uses PyTorch. There is also the MAGE library for graph algorithms and ML, which is open source as well, so it can be customized and extended. I share your concern about openCypher. Memgraph has an object graph mapper (similar to an ORM) called GQLAlchemy, written in Python. It has a learning curve too, but it is not an entirely new skill the way Cypher is, and it lets you manipulate graphs directly from Python. There are also other solutions such as TigerGraph and Nebula, but I am not very familiar with them. Feel free to explore. I hope this helps! 😁

  • mage

    MAGE - Memgraph Advanced Graph Extensions :crystal_ball: (by memgraph)


  • gqlalchemy

    GQLAlchemy is a library for writing and running queries on Memgraph. It supports a high-level connection to Memgraph as well as a modular query builder.


  • cugraph

    cuGraph - RAPIDS Graph Analytics Library

  • I feel like you would need to develop a custom solution, which might store part of the data in Neo4j, but you will still have to figure out how to efficiently pull the data you need to train your GNNs. I think this tends to be the bottleneck, since graph DBs are not optimised for the kinds of queries GNN training needs. For what it's worth, I wouldn't really bother with implementing a custom graph data structure (unless I was really keen), as there are some good implementations out there. Have you looked at cuGraph, for example?
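To make the "pull the data you need" step concrete, here is a minimal sketch of turning rows returned by a graph database into the COO edge list that GNN libraries commonly consume (for example, the `edge_index` layout in PyTorch Geometric). The function name `to_edge_index` and the sample rows are hypothetical; the rows stand in for the result of an edge query and do not come from the original thread. It uses only the standard library and is not tied to any particular driver.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


def to_edge_index(
    rows: Iterable[Tuple[Hashable, Hashable]],
) -> Tuple[Dict[Hashable, int], List[List[int]]]:
    """Remap arbitrary database node IDs to contiguous indices 0..N-1
    and return (id -> index mapping, [sources, targets]) in COO form."""
    index_of: Dict[Hashable, int] = {}
    src: List[int] = []
    dst: List[int] = []
    for a, b in rows:
        # Assign the next free index the first time a node ID is seen.
        for node in (a, b):
            if node not in index_of:
                index_of[node] = len(index_of)
        src.append(index_of[a])
        dst.append(index_of[b])
    return index_of, [src, dst]


# Hypothetical (source_id, target_id) rows, as an edge query might return them.
rows = [(1042, 7), (7, 311), (311, 1042), (7, 1042)]
index_of, edge_index = to_edge_index(rows)
print(index_of)
print(edge_index)
```

Doing this remapping once and caching the result, rather than querying the database inside the training loop, is one way to keep the database off the hot path.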

  • graph-data-science

    Source code for the Neo4j Graph Data Science library of graph algorithms.

  • Neo4j is OSS (https://github.com/neo4j/graph-data-science), so if you want to modify any of the algorithms, you can. There's also the Pregel API, which offers algorithm building blocks that simplify implementation.
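For readers unfamiliar with the Pregel model mentioned above, the following is a toy pure-Python sketch of the vertex-centric idea: in each superstep, every vertex reads its incoming messages, updates its value, and sends messages to its neighbors. It is not the Neo4j Pregel API (which is a Java API); the function name `pregel_min_label` and the sample edges are illustrative assumptions. The computation shown is connected components by minimum-label propagation on an undirected graph.

```python
def pregel_min_label(edges, max_supersteps=20):
    """Toy Pregel-style loop: each vertex ends up holding the smallest
    vertex ID in its connected component."""
    # Build an undirected adjacency map.
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    # Every vertex starts with its own ID as its value.
    value = {v: v for v in neighbors}

    # Initial superstep: every vertex sends its value to all neighbors.
    inbox = {v: [] for v in neighbors}
    for v in neighbors:
        for n in neighbors[v]:
            inbox[n].append(value[v])

    for _ in range(max_supersteps):
        outbox = {v: [] for v in neighbors}
        changed = False
        for v, messages in inbox.items():
            # A vertex only acts if it received a smaller label.
            if messages and min(messages) < value[v]:
                value[v] = min(messages)
                changed = True
                for n in neighbors[v]:
                    outbox[n].append(value[v])
        if not changed:
            break  # No vertex updated, so the computation has converged.
        inbox = outbox  # Messages are delivered in the next superstep.
    return value


components = pregel_min_label([(1, 2), (2, 3), (4, 5)])
print(components)
```

Messages written during a superstep only become visible in the next one, which is what keeps the per-vertex logic simple and easy to parallelize.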

  • demo-news-recommendation

    Exploring News Recommendation With Neo4j GDS (discontinued)

  • Rather than keep listing the virtues of Neo (there are many!), I'd recommend checking out some examples and getting a feel for how it works yourself: we have a great repo with lots of different use cases (https://github.com/neo4j-product-examples), but I'd take a look at the recommendation example (blog, code) as a good first step for embeddings on a knowledge graph.

  • Neo4j.rb

    An active model wrapper for the Neo4j Graph Database for Ruby.

  • The native Python client does add overhead when training GNN models. I have also found older reported issues about performance hits with Neo4j's Python driver. These issues may not be strictly pertinent to our current use case, but they reflect an underlying concern: performance degrades in the interaction between Neo4j and native Python.

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.
