skweak vs AugLy

Project          skweak          AugLy
Mentions         8               14
Stars            909             4,899
Growth           0.2%            0.5%
Activity         6.2             6.0
Latest Commit    6 months ago    about 1 month ago
Language         Python          Python
License          MIT License     GNU General Public License v3.0 or later

The number of mentions indicates the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

skweak

Posts with mentions or reviews of skweak. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-01-07.

Entity Extraction with Predefined List
2 projects | /r/LanguageTechnology | 7 Jan 2023

Thanks for pointing me in the right direction. Seems like there are a few other approaches with weak supervision: https://github.com/NorskRegnesentral/skweak
[P] Programmatic: Powerful Weak Labeling
2 projects | /r/MachineLearning | 20 Apr 2022

Code for https://arxiv.org/abs/2104.09683 found: https://github.com/NorskRegnesentral/skweak
Show HN: Programmatic – a REPL for creating labeled data
1 project | news.ycombinator.com | 8 Apr 2022

Hi, Raza here, one of the other co-founders.
I know that HN likes to nerd out over technical details so thought I’d share a bit more on how we aggregate the noisy labels to clean them up.
At the moment we use the great Skweak [1] open source library to do this. Skweak uses an HMM to infer the most likely unobserved label given the evidence of the votes from each of the labelling functions.
This whole strategy of first training a label model and then training a neural net was pioneered by Snorkel. We’ve used this approach for now but we actually think there are big opportunities for improvement.
We’re working on an end-to-end approach that de-noises the labelling functions and trains the model at the same time. So far we’ve seen improvements on the standard benchmarks [2] and are planning to submit to NeurIPS.
R
[1]: Skweak package: https://github.com/NorskRegnesentral/skweak
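The aggregation step described in the post above (inferring the most likely hidden label from the votes of several labelling functions) can be illustrated with a much simpler stand-in than skweak's HMM. The sketch below is not skweak's API or its algorithm: it assumes each labelling function votes independently given the true label, with a fixed, hypothetical accuracy, and picks the label with the highest posterior for each token (a naive-Bayes-style aggregation). The label set, prior, accuracies and labelling-function names are all illustrative assumptions. Unlike an HMM, it treats every token in isolation and ignores dependencies between neighbouring labels.

```python
import math

# Hypothetical label prior and labelling-function accuracies (assumed, not estimated from data).
PRIOR = {"O": 0.8, "PER": 0.1, "ORG": 0.1}
ACCURACY = {"lf_names": 0.9, "lf_caps": 0.6, "lf_orgs": 0.85}
ABSTAIN = None


def aggregate(votes, prior=PRIOR, accuracy=ACCURACY):
    """Return the most likely label for one token given labelling-function votes.

    votes maps a labelling-function name to a label or ABSTAIN. Assumes the
    functions are conditionally independent given the true label, and that a
    wrong vote is spread uniformly over the remaining labels.
    """
    labels = list(prior)
    scores = {}
    for y in labels:
        log_p = math.log(prior[y])
        for lf, vote in votes.items():
            if vote is ABSTAIN:
                continue  # an abstention carries no evidence
            acc = accuracy[lf]
            if vote == y:
                log_p += math.log(acc)
            else:
                log_p += math.log((1 - acc) / (len(labels) - 1))
        scores[y] = log_p
    return max(scores, key=scores.get)


# Votes for three hypothetical tokens: "Alice", "joined", "Acme".
token_votes = [
    {"lf_names": "PER", "lf_caps": "ORG", "lf_orgs": ABSTAIN},
    {"lf_names": ABSTAIN, "lf_caps": ABSTAIN, "lf_orgs": ABSTAIN},
    {"lf_names": ABSTAIN, "lf_caps": "PER", "lf_orgs": "ORG"},
]
print([aggregate(v) for v in token_votes])  # ['PER', 'O', 'ORG']
```

Working in log space avoids underflow when many labelling functions vote. A real label model such as skweak's learns the accuracies from the data instead of taking them as fixed inputs.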
The hand-picked selection of the best Python libraries released in 2021
12 projects | /r/Python | 21 Dec 2021

skweak.
Skweak: Weak Supervision for NLP
1 project | news.ycombinator.com | 22 Aug 2021
Inevitable Manual Work involved in NLP
1 project | /r/LanguageTechnology | 4 May 2021

For more advanced unsupervised labeling, you should check skweak
How to get Training data for NER?
2 projects | /r/LanguageTechnology | 24 Apr 2021

I'm the main developer behind skweak by the way, happy to hear you're interested in our toolkit :-) We do already have a small list of products (see https://github.com/NorskRegnesentral/skweak/blob/main/data/products.json) extracted from DBPedia and Wikidata, but it may not be exactly the type of products you're looking for.

AugLy

Posts with mentions or reviews of AugLy. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2021-12-21.

Meta's A.I. exodus: Top talent quits as lab tries to keep pace with rivals
1 project | news.ycombinator.com | 1 Apr 2022

Their recent effort to generate training data for spotting stuff that includes unsanctioned narratives comes to mind. https://github.com/facebookresearch/AugLy
Next steps for after classification
1 project | /r/LanguageTechnology | 1 Jan 2022

Data augmentation is usually helpful: https://github.com/facebookresearch/AugLy
The hand-picked selection of the best Python libraries released in 2021
12 projects | /r/Python | 21 Dec 2021

AugLy.
Prefer volume or quality for BERT-based Text classification model
2 projects | /r/LanguageTechnology | 13 Dec 2021
AugLy - An augmentation library for audio, image, video, and text from Facebook
1 project | /r/DataCentricAI | 6 Dec 2021
[D] What's the best method to generate synthetic data for an image with text? Small dataset
3 projects | /r/MachineLearning | 13 Aug 2021
AugLy is open source now.
1 project | /r/technews | 28 Jun 2021
Facebook is open-sourcing AugLy, a library that uses data augmentations to evaluate and improve ML models
1 project | /r/neuralnetworks | 23 Jun 2021
Integration test: Complexity of privacy-preserving bird call bio-sensor for distributed ecological monitoring?
5 projects | /r/SingularityNet | 23 Jun 2021

Some of the technologies which could be integrated include differential privacy, distributed online machine learning, misinformation resilience and multi-party computation, all within the context of smart contracts and bioinformatics.
[N] Facebook AI Open Sources AugLy: A New Python Library For Data Augmentation To Develop Robust Machine Learning Models
5 projects | /r/MachineLearning | 19 Jun 2021

Facebook Blog: https://ai.facebook.com/blog/augly-a-new-data-augmentation-library-to-help-build-more-robust-ai-models/
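As a hedged illustration of what a text augmentation does, the sketch below uses plain Python rather than AugLy's own API, which is not reproduced here. It simulates typos by swapping adjacent letters with probability `p` and keeps each example's label unchanged, so the noisy copies can be added to a training set or used to probe a model's robustness. The function names `swap_typos` and `augment` and the sample data are hypothetical.

```python
import random
from typing import List, Optional, Tuple


def swap_typos(text: str, p: float = 0.1, seed: Optional[int] = None) -> str:
    """Return a copy of text where each adjacent letter pair is swapped with probability p."""
    rng = random.Random(seed)
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < p:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip past the swapped pair so each character moves at most once
        else:
            i += 1
    return "".join(chars)


def augment(dataset: List[Tuple[str, str]], n_copies: int = 2,
            p: float = 0.1, seed: int = 0) -> List[Tuple[str, str]]:
    """Return the original (text, label) pairs followed by n_copies noisy variants of each."""
    rng = random.Random(seed)
    out = list(dataset)
    for text, label in dataset:
        for _ in range(n_copies):
            out.append((swap_typos(text, p, seed=rng.randrange(2**32)), label))
    return out


data = [("the quick brown fox", "pos"), ("a slow dog", "neg")]
aug = augment(data, n_copies=2, p=0.3)
print(len(aug))  # 6
```

Seeding each copy from a single generator keeps the whole augmented set reproducible while still giving every copy different noise.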

What are some alternatives?

When comparing skweak and AugLy you can also consider the following projects:

snorkel - A system for quickly generating training data with weak supervision

imgaug - Image augmentation for machine learning experiments.

argilla - Argilla is a collaboration platform for AI engineers and domain experts that require high-quality outputs, full data ownership, and overall efficiency.

speechbrain - A PyTorch-based Speech Toolkit

DearPy3D - Dear PyGui 3D Engine (prototyping)

PySyft - Perform data science on data that remains in someone else's server

snorkel - A system for quickly generating training data with weak supervision [Moved to: https://github.com/snorkel-team/snorkel]

BlenderProc - A procedural Blender pipeline for photorealistic training image generation

Text-Summarization-using-NLP - Text summarization using NLP: fetches BBC News articles and summarizes their text, and also supports summarizing custom articles

Activeloop Hub - Data Lake for Deep Learning. Build, manage, query, version, & visualize datasets. Stream data real-time to PyTorch/TensorFlow. https://activeloop.ai [Moved to: https://github.com/activeloopai/deeplake]

gradio - Build and share delightful machine learning apps, all in Python.

evidently - Evaluate and monitor ML models from validation to production.