pyRDF2Vec VS lightly

Compare pyRDF2Vec and lightly and see how they differ.

                pyRDF2Vec       lightly
Mentions        1               12
Stars           154             1,598
Growth          6.5%            4.9%
Activity        7.3             9.4
Last commit     20 days ago     6 days ago
Language        Python          Python
License         MIT License     MIT License
The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

pyRDF2Vec

Posts with mentions or reviews of pyRDF2Vec. We have used some of these posts to build our list of alternatives and similar projects.
  • [P] pyRDF2Vec 0.2.0 is out!
    1 project | reddit.com/r/MachineLearning | 22 Mar 2021
    This release is packed with many new features and optimizations under the hood. A complete overview of what's new can be found in our CHANGELOG (https://github.com/IBCNServices/pyRDF2Vec/releases/tag/0.2.0). An overview of some major updates:

lightly

Posts with mentions or reviews of lightly. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2022-01-10.
  • Self-Supervised Models are More Robust and Fair
    1 project | dev.to | 7 Apr 2022
    If you’re interested in self-supervised learning and want to try it out yourself, you can check out our open-source repository for self-supervised learning.
  • [D] Can a Siamese Neural Network work for invoice classification?
    1 project | reddit.com/r/MachineLearning | 20 Jan 2022
    I assume that you have an image of the invoice. Then using a framework like https://github.com/lightly-ai/lightly, which implements many algorithms, is the way to go. After that step, with a model that produces embeddings, you need to compare the embedding of a query with your known database and check whether the distance is below some threshold. Of course, a pipeline that checks the closest neighbor can be more complicated, but I would start with something really simple.
  • [P] TensorFlow Similarity now self-supervised training
    2 projects | reddit.com/r/MachineLearning | 10 Jan 2022
    https://github.com/lightly-ai/lightly implements a lot of self-supervised models and has been available for a while.
  • Launch HN: Lightly (YC S21): Label only the data which improves your ML model
    4 projects | news.ycombinator.com | 9 Aug 2021
    Hi HackerNews! We’re Matt and Igor from Lightly (https://www.lightly.ai/). Most companies that do machine learning at scale label only 1% of their data because it's too expensive to label all of it. We built Lightly to help companies pick the most valuable 1% to be labeled.

    If you wonder what data labeling looks like for images, think about those captchas that ask you to tag images on the web containing objects such as a bus or a person. When we were working on training machine learning (ML) models from scratch, we often had to do this labeling ourselves. But there was always far too much data for us to be able to label all of it. We talked with more than 250 ML teams ranging from small groups of 2-3 people to large teams at Apple and Google, and they all face the same problem: they have too much data to label.

    Not only that, but there wouldn’t be a lot of value in labeling everything. For example, if you have billions of images, it's a waste of time to get humans to label every one of them, because most of those labels wouldn't add useful information to the model you’re hoping to train. Most of the images are probably similar enough to other images that have already been labeled and they have nothing new to tell your model. Spending more labeling effort on those would be a bit like labeling the same image over and over again—quite wasteful.

    As soon as your ML model surpasses the initial prototype stage, you’re most interested in the edge cases in your dataset — the ones that represent rare events. For example, a few days ago, there was a Twitter thread about failure cases for Tesla vehicles. One Tesla has mistaken a yellow moon for a yellow traffic light: https://twitter.com/JordanTeslaTech/status/14184133078625853.... Another edge case is a truck full of traffic lights: https://twitter.com/haltakov/status/1400797882891091970. Finding and labeling such rare cases is key to having a robust system that will work in difficult situations.

    Rather than labeling everything, a better approach is to first discard all the redundant images and keep only the ones that it's worth spending time/money to label. Let's call those "interesting" images. If you could spend labeling effort only on the "interesting" images, you'd get the same value for a fraction of the cost.

    Many ML companies in a more advanced stage have had to tackle this problem. One approach is to pay people to go through the images and discard the "boring" (nothing-new-to-tell-me) images, leaving the "interesting" (worth-spending-resources-to-label) ones. That can save you money if it's on average cheaper to answer the question "boring or interesting?" about an image than it is to label it. This solution scales as long as you have an increasing human labeling workforce every year. However, ML data doubles every year on average, and therefore the labeling capacity would need to double too.

    Much better than that — the holy grail — would be for a computer to do the work of discarding the "boring" images. Compared to paying humans to do it, you'd get the "interesting" subset of your billion images almost for free. You would have much less work to do (or money to spend) on labeling, and you'd get just as good a model after training. You could split the savings with whoever knew how to make a computer do this for you, and you'd both come out ahead. That’s basically our intention with Lightly.

    My co-founder Matt and I worked on many machine learning projects ourselves, where we also had to manage tooling and annotation budgets. Dealing with data in a production environment is different from academia. In academia, we have well-balanced and manually curated datasets. It is, as some of you know, a huge pain. The solution to the problem boils down to working with unlabeled data.

    Luckily, in recent years, a new subfield of deep learning has emerged called self-supervised learning. It’s a technique to train models to understand data without any labels. In natural language processing (NLP), modern models like BERT or GPT all rely on it. In computer vision, we have had a similar breakthrough in the last year with models such as SimCLR or MoCo. Back in 2020, we started experimenting with self-supervised learning to better understand unlabeled data and improve our software. However, there was no easy-to-use framework available to work with the latest models. To solve that problem, we built our own framework to make the power of self-supervised learning easily accessible. Since we want to foster research in this domain and grow a bigger community around this topic, we decided to open-source the framework in fall 2020 (https://github.com/lightly-ai/lightly). It is now used by universities and research labs all over the world.

    4 projects | news.ycombinator.com | 9 Aug 2021
    modAL indeed has a similar goal of choosing the best subset of data to be labeled. However, it has some notable differences:

    modAL is built on scikit-learn, which is also evident from the suggested workflow. Lightly, on the other hand, was built specifically for deep learning applications and supports active learning for classification as well as object detection and semantic segmentation.

    modAL provides uncertainty-based active learning. However, it has been shown that uncertainty-based AL fails at batch-wise AL for vision datasets and CNNs, see https://arxiv.org/abs/1708.00489. Furthermore, it only works with an initially trained model and thus requires a labeled dataset. Lightly offers self-supervised learning to learn high-dimensional embeddings through its open-source package https://github.com/lightly-ai/lightly. They can be used through our API to choose a diverse subset. Optionally, this sampling can be combined with uncertainty-based AL.

  • Lightly – A Python library for self-supervised learning on images
    1 project | news.ycombinator.com | 7 Jun 2021
    1 project | news.ycombinator.com | 23 Mar 2021
  • Active Learning using Detectron2
    2 projects | dev.to | 30 May 2021
    You can easily train, embed, and upload a dataset using the lightly Python package. First, we need to install the package. We recommend using pip for this. Make sure you're in a Python 3.6+ environment. If you're on Windows, you should create a conda environment.
  • lightly-ai/lightly Lightly is a computer vision framework for self-supervised learning.
    1 project | reddit.com/r/Python | 24 Mar 2021
  • [P] Release of lightly 1.1.3 - A python library for self-supervised learning
    2 projects | reddit.com/r/MachineLearning | 23 Mar 2021
    We just released a new version of lightly (https://github.com/lightly-ai/lightly) and after the valuable feedback from this subreddit, we thought some of you might be interested in the updates.
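
Two ideas recur in the posts above: choosing a diverse subset of embeddings so that only "interesting" samples get labeled, and comparing a query embedding against a known database with a distance threshold. The sketch below illustrates both in plain Python. It is not lightly's API: the function names `select_diverse` and `nearest_match`, the toy 2-D embeddings, and the threshold value are all hypothetical, and greedy farthest-point selection is used here only as one simple stand-in for diversity-based sampling.

```python
import math


def select_diverse(embeddings, k):
    """Greedy farthest-point selection (hypothetical helper, not lightly's API).

    Returns the indices of up to k embeddings, each chosen to be as far as
    possible from the points that were already selected.
    """
    if not embeddings or k <= 0:
        return []
    selected = [0]  # assumption: start from the first point (arbitrary choice)
    # Distance from every point to its nearest already-selected point.
    min_dist = [math.dist(e, embeddings[0]) for e in embeddings]
    while len(selected) < min(k, len(embeddings)):
        # Pick the point that is farthest from the current selection.
        idx = max(range(len(embeddings)), key=min_dist.__getitem__)
        selected.append(idx)
        min_dist = [
            min(d, math.dist(e, embeddings[idx]))
            for d, e in zip(min_dist, embeddings)
        ]
    return selected


def nearest_match(query, database, threshold):
    """Nearest-neighbor lookup with a distance threshold (hypothetical helper).

    database maps a label to its embedding. Returns (label, distance) when the
    closest entry is within threshold, otherwise (None, distance).
    """
    if not database:
        return None, math.inf
    label, dist = min(
        ((lbl, math.dist(query, emb)) for lbl, emb in database.items()),
        key=lambda pair: pair[1],
    )
    return (label, dist) if dist <= threshold else (None, dist)


if __name__ == "__main__":
    # Toy 2-D embeddings: two tight clusters plus one outlier.
    points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
              (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
    print(select_diverse(points, 3))

    known = {"invoice_a": (0.0, 0.0), "invoice_b": (5.0, 5.0)}
    print(nearest_match((0.2, 0.1), known, threshold=0.5))
```

In practice the embeddings would come from a trained model rather than hand-written tuples, and the threshold would need to be tuned on real data.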

What are some alternatives?

When comparing pyRDF2Vec and lightly you can also consider the following projects:

pytorch-metric-learning - The easiest way to use deep metric learning in your application. Modular, flexible, and extensible. Written in PyTorch.

simsiam-cifar10 - Code to train the SimSiam model on cifar10 using PyTorch

byol - Implementation of the BYOL paper.

dino - PyTorch code for Vision Transformers training with the Self-Supervised learning method DINO

Transformer-SSL - This is an official implementation for "Self-Supervised Learning with Swin Transformers".

comma10k - 10k crowdsourced images for training segnets

DataProfiler - What's in your data? Extract schema, statistics and entities from datasets

modAL - A modular active learning framework for Python

Ne2Ne-Image-Denoising - Deep Unsupervised Image Denoising, based on Neighbour2Neighbour training

gensim - Topic Modelling for Humans

byol-pytorch - Usable implementation of "Bootstrap Your Own Latent" self-supervised learning, from DeepMind, in PyTorch

PASS - The PASS dataset: pretrained models and how to get the data