AugLy
awkward
Metric | AugLy | awkward
---|---|---
Mentions | 14 | 4
Stars | 4,898 | 792
Growth | 0.5% | 2.3%
Activity | 6.0 | 9.6
Latest commit | 28 days ago | 7 days ago
Language | Python | Python
License | GNU General Public License v3.0 or later | BSD 3-clause "New" or "Revised" License
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
AugLy
- Meta's A.I. exodus: Top talent quits as lab tries to keep pace with rivals
Their recent effort to generate training data for spotting stuff that includes unsanctioned narratives comes to mind. https://github.com/facebookresearch/AugLy
- Next steps for after classification
Data augmentation is usually helpful: https://github.com/facebookresearch/AugLy
- The hand-picked selection of the best Python libraries released in 2021
AugLy.
- Prefer volume or quality for BERT-based Text classification model
- AugLy - An augmentation library for audio, image, video, and text from Facebook
- [D] What's the best method to generate synthetic data for an image with text? Small dataset
- AugLy is open source now.
- Facebook is open-sourcing AugLy, a library that uses data augmentations to evaluate and improve ML models
- Integration test: Complexity of privacy-preserving bird call bio-sensor for distributed ecological monitoring?
Some of the technologies which could be integrated include differential privacy, distributed online machine learning, misinformation resilience and multi-party computation, all within the context of smart contracts and bioinformatics.
- [N] Facebook AI Open Sources AugLy: A New Python Library For Data Augmentation To Develop Robust Machine Learning Models
Facebook Blog: https://ai.facebook.com/blog/augly-a-new-data-augmentation-library-to-help-build-more-robust-ai-models/
awkward
- Efficient Jagged Arrays
There's a whole ecosystem in Python, originally developed for high-energy physics data processing (https://github.com/scikit-hep/awkward), all because NumPy demands rectangular N-dimensional arrays.
Same technique used everywhere, here's a simple Julia pkg for the same thing: https://github.com/JuliaArrays/ArraysOfArrays.jl/blob/3a6f5b...
But Julia at least has the decency to just support ragged Vector{Vector} out of the box, and it's not that slow
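The technique these comments refer to can be sketched in plain Python. This is a minimal illustration, assuming the common layout for jagged arrays of one flat content buffer plus an offsets list; the helper names `to_offsets` and `get_sublist` are hypothetical and are not part of Awkward Array or ArraysOfArrays.jl.

```python
from typing import List, Sequence, Tuple


def to_offsets(nested: Sequence[Sequence[float]]) -> Tuple[List[float], List[int]]:
    """Flatten a list of variable-length lists into (content, offsets).

    offsets[i] and offsets[i + 1] mark where sublist i starts and ends
    inside the flat content buffer.
    """
    content: List[float] = []
    offsets: List[int] = [0]
    for sub in nested:
        content.extend(sub)
        offsets.append(len(content))
    return content, offsets


def get_sublist(content: List[float], offsets: List[int], i: int) -> List[float]:
    """Recover sublist i by slicing the flat buffer."""
    return content[offsets[i]:offsets[i + 1]]


content, offsets = to_offsets([[1.1, 2.2, 3.3], [], [4.4, 5.5]])
# content is [1.1, 2.2, 3.3, 4.4, 5.5] and offsets is [0, 3, 3, 5]
third = get_sublist(content, offsets, 2)  # [4.4, 5.5]
```

Because the values live in one contiguous buffer, an elementwise operation can run over `content` once and reuse `offsets` unchanged, which is what makes this layout attractive for array libraries.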
- The hand-picked selection of the best Python libraries released in 2021
Awkward Array.
- Awkward: Nested, jagged, differentiable, mixed type, GPU-enabled, JIT'd NumPy
Numba's @vectorize decorator (https://numba.pydata.org/numba-doc/latest/user/vectorize.htm...) makes a ufunc, and Awkward Array knows how to implicitly map ufuncs. (It is necessary to specify the signature in the @vectorize argument; otherwise, it won't be a true ufunc and Awkward won't recognize it.)
When Numba's JIT encounters a ctypes function, it goes to the ABI source and inserts a function pointer in the LLVM IR that it's generating. Unfortunately, that means that there is function-pointer indirection on each call, and whether that matters depends on how long-running the function is. If you mean that your assembly function is 0.1 ns per call or something, then yes, that function-pointer indirection is going to be the bottleneck. If you mean that your assembly function is 1 μs per call and that's fast, given what it does, then I think it would be alright.
If you need to remove the function-pointer indirection and still run on Awkward Arrays, there are other things we can do, but they're more involved. Ping me in a GitHub Issue or Discussion on https://github.com/scikit-hep/awkward-1.0
What are some alternatives?
imgaug - Image augmentation for machine learning experiments.
sqlmodel - SQL databases in Python, designed for simplicity, compatibility, and robustness.
speechbrain - A PyTorch-based Speech Toolkit
DearPyGui - Dear PyGui: A fast and powerful Graphical User Interface Toolkit for Python with minimal dependencies
PySyft - Perform data science on data that remains in someone else's server
uproot5 - ROOT I/O in pure Python and NumPy.
BlenderProc - A procedural Blender pipeline for photorealistic training image generation
django-ninja - 💨 Fast, Async-ready, Openapi, type hints based framework for building APIs
Activeloop Hub - Data Lake for Deep Learning. Build, manage, query, version, & visualize datasets. Stream data real-time to PyTorch/TensorFlow. https://activeloop.ai [Moved to: https://github.com/activeloopai/deeplake]
numba-dpex - Data Parallel Extension for Numba
evidently - Evaluate and monitor ML models from validation to production. Join our Discord: https://discord.com/invite/xZjKRaNp8b
skweak - skweak: A software toolkit for weak supervision applied to NLP tasks