Metric | grape | dcai-lab |
---|---|---|
Mentions | 3 | 10 |
Stars | 482 | 401 |
Growth | 2.9% | 3.2% |
Activity | 6.4 | 5.4 |
Last commit | 2 months ago | 4 months ago |
Language | Jupyter Notebook | Jupyter Notebook |
License | MIT License | GNU Affero General Public License v3.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
grape
- Grape (Graph Representation LeArning, Predictions and Evaluation)
-
Zoomable, animated scatterplots in the browser that scales over a billion points
Ideally, you'd embed the graph into 2d or 3d first, then visualize it as a scatterplot.
Visualizing the edges at scale doesn't yield nice results in general.
The way to do it is to reduce the graph to some 300d or 500d embeddings, then use t-SNE/UMAP/PaCMAP to reduce that to 3d. Then visualize.
My preferred way is to use a first-order embedding method like GGVec in this library [1] (disclaimer: I wrote it). Node2Vec and ProNE don't yield great embeddings for visualization (the first is too filamented, the second too close to the unit ball).
Another great library to do this work is GRAPE [2]. Try first-order embedding methods, or short walks on second-order methods, to avoid the embeddings being too filamented by long random walk sampling.
[1] https://github.com/VHRanger/nodevectors
[2] https://github.com/AnacletoLAB/grape/
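The two-step pipeline described in this comment (embed the nodes, then reduce the embeddings to plottable coordinates) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the API of nodevectors or GRAPE: a truncated SVD of the adjacency matrix stands in for a first-order embedding method (it is not GGVec), scikit-learn's t-SNE does the reduction, and `make_toy_graph` and `embed_then_project` are hypothetical helper names. The toy graph has only 60 nodes, so the embedding uses 16 dimensions rather than the 300 to 500 suggested for real graphs.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.manifold import TSNE


def make_toy_graph(n_per_block=30, p_in=0.3, p_out=0.02, seed=0):
    """Hypothetical helper: adjacency matrix of a two-community random graph."""
    rng = np.random.default_rng(seed)
    n = 2 * n_per_block
    labels = np.repeat([0, 1], n_per_block)
    # Edge probability depends on whether two nodes share a community.
    probs = np.where(labels[:, None] == labels[None, :], p_in, p_out)
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    adjacency = (upper | upper.T).astype(float)  # symmetric, no self-loops
    return adjacency, labels


def embed_then_project(adjacency, embed_dim=16, out_dim=2, seed=0):
    """Step 1: node embeddings. Step 2: reduce them to 2d/3d coordinates."""
    # Assumption: truncated SVD of the adjacency matrix as a stand-in
    # for a first-order embedding method (this is NOT GGVec).
    svd = TruncatedSVD(n_components=embed_dim, random_state=seed)
    embeddings = svd.fit_transform(adjacency)
    # t-SNE reduces the embeddings to coordinates that can be plotted.
    tsne = TSNE(n_components=out_dim, perplexity=10, init="pca",
                random_state=seed)
    return tsne.fit_transform(embeddings)


adjacency, labels = make_toy_graph()
coords = embed_then_project(adjacency)
print(coords.shape)  # one coordinate row per node, ready for a scatterplot
```

The resulting `coords` array can be passed to any scatterplot tool, with `labels` as the point colors. For a real graph, the SVD step would be swapped for an embedding method from one of the libraries cited above.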
-
[P] We are building a curated list of open source tooling for data-centric AI workflows, looking for contributions.
For graph embeddings, there's quite a few. I'd recommend this one, but there's also this one (disclaimer: I'm the author) or this one, more of a DGL library.
dcai-lab
-
Resources to learn practical/industry-focused ML (preferably using TensorFlow)?
Data-Centric AI. Honestly, if you've been working on ML pipelines, this might be familiar to you.
-
Andrew NG, github courses
Another great resource inspired by the Andrew Ng data-centric AI movement is the Introduction to Data-Centric AI course taught this past semester at MIT by PhDs.
-
Good Beginner Courses for ML?
Data-centric AI course. Brand new, taught for the first time a few months ago by MIT PhD grads. This covers how to ensure good data quality for your models. More data science heavy.
-
[P] We are building a curated list of open source tooling for data-centric AI workflows, looking for contributions.
Thanks for the kind words! Make sure to check out the current open MIT course if you are just starting out: https://dcai.csail.mit.edu/
-
The Missing Semester of Your CS Education
Introduction to Data-Centric AI https://dcai.csail.mit.edu
- Introduction to Data-Centric AI
-
MIT Introduction to Data-Centric AI
Course homepage | Lecture videos on YouTube | Lab Assignments
What are some alternatives?
deodel - A mixed attributes predictive algorithm implemented in Python.
snorkel - A system for quickly generating training data with weak supervision
refinery - The data scientist's open-source choice to scale, assess and maintain natural language data. Treat training data like a software artifact.
cleanlab - The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
deepscatter - Zoomable, animated scatterplots in the browser that scales over a billion points
BotLibre - An open platform for artificial intelligence, chat bots, virtual agents, social media automation, and live chat automation.
dgl - Python package built to ease deep learning on graph, on top of existing DL frameworks.
llm-course - Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
nanocube
chordviz - A convolutional neural network trained using PyTorch to predict the next chord (as tablature) on a guitar based on image data. Includes labeling software for the image data as well as an iOS app for hosting and running the model.