Metric | numpy-groupies | awkward
---|---|---
Mentions | 1 | 4
Stars | 191 | 793
Growth | - | 0.6%
Activity | 5.9 | 9.6
Last commit | 15 days ago | 1 day ago
Language | Python | Python
License | BSD 2-clause "Simplified" License | BSD 3-clause "New" or "Revised" License
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
numpy-groupies
- From Python to NumPy
I have had the same experience and, instead of Pandas, have been using numpy-groupies to handle aggregate/groupby operations. It's quite performant and feels a bit cleaner to use than importing pandas for a couple of operations.
https://github.com/ml31415/numpy-groupies
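For readers unfamiliar with the kind of aggregate/groupby operation the comment refers to, here is a minimal sketch in plain NumPy. The arrays `group_idx` and `values` are made-up illustration data. numpy-groupies wraps this pattern behind a single call, `aggregate(group_idx, a, func=...)`, which is mentioned only in a comment so the sketch depends on nothing but NumPy.

```python
import numpy as np

# Hypothetical data: each value belongs to the group with the same position in group_idx.
group_idx = np.array([0, 0, 1, 2, 2, 2])
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# Per-group sum and count using bincount.
# With numpy-groupies this would roughly be:
#   numpy_groupies.aggregate(group_idx, values, func="sum")
sums = np.bincount(group_idx, weights=values)
counts = np.bincount(group_idx)
means = sums / counts

# Per-group maximum using an unbuffered ufunc reduction.
maxes = np.full(group_idx.max() + 1, -np.inf)
np.maximum.at(maxes, group_idx, values)

print(sums, means, maxes)
```

The point of the pattern is that groups are identified by an integer index array rather than by a DataFrame column, so no pandas import is needed for a handful of aggregations.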
awkward
- Efficient Jagged Arrays
There's a whole ecosystem in Python, originally developed for high-energy physics data processing (https://github.com/scikit-hep/awkward), all because NumPy demands rectangular N-dimensional arrays.
Same technique used everywhere, here's a simple Julia pkg for the same thing: https://github.com/JuliaArrays/ArraysOfArrays.jl/blob/3a6f5b...
But Julia at least has the decency to just support ragged Vector{Vector} out of the box, and it's not that slow
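The "same technique" in the comments above is, as far as I understand it, storing a jagged array as one flat content buffer plus an array of offsets marking where each sublist starts and ends. The sketch below illustrates that layout in plain NumPy. The helper names `to_jagged`, `get_sublist`, and `sublist_sums` are hypothetical and do not come from Awkward Array or ArraysOfArrays.jl.

```python
import numpy as np

def to_jagged(lists):
    """Flatten a list of lists into (content, offsets)."""
    counts = np.array([len(sub) for sub in lists], dtype=np.int64)
    # offsets[i] is where sublist i starts; offsets[i + 1] is where it ends.
    offsets = np.concatenate(([0], np.cumsum(counts)))
    content = np.array([v for sub in lists for v in sub], dtype=np.float64)
    return content, offsets

def get_sublist(content, offsets, i):
    """Return sublist i as a view into the flat content buffer."""
    return content[offsets[i]:offsets[i + 1]]

def sublist_sums(content, offsets):
    """Sum every sublist without a Python loop (empty sublists give 0)."""
    csum = np.concatenate(([0.0], np.cumsum(content)))
    return csum[offsets[1:]] - csum[offsets[:-1]]

content, offsets = to_jagged([[1.0, 2.0, 3.0], [], [4.0, 5.0]])
print(offsets)                         # boundaries of the three sublists
print(get_sublist(content, offsets, 2))
print(sublist_sums(content, offsets))
```

Because the data lives in one contiguous buffer, per-sublist operations can be expressed as vectorized operations on `content` and `offsets` instead of looping over Python lists.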
- The hand-picked selection of the best Python libraries released in 2021
Awkward Array.
- Awkward: Nested, jagged, differentiable, mixed type, GPU-enabled, JIT'd NumPy
Numba's @vectorize decorator (https://numba.pydata.org/numba-doc/latest/user/vectorize.htm...) makes a ufunc, and Awkward Array knows how to implicitly map ufuncs. (It is necessary to specify the signature in the @vectorize argument; otherwise, it won't be a true ufunc and Awkward won't recognize it.)
When Numba's JIT encounters a ctypes function, it goes to the ABI source and inserts a function pointer in the LLVM IR that it's generating. Unfortunately, that means that there is function-pointer indirection on each call, and whether that matters depends on how long-running the function is. If you mean that your assembly function is 0.1 ns per call or something, then yes, that function-pointer indirection is going to be the bottleneck. If you mean that your assembly function is 1 μs per call and that's fast, given what it does, then I think it would be alright.
If you need to remove the function-pointer indirection and still run on Awkward Arrays, there are other things we can do, but they're more involved. Ping me in a GitHub Issue or Discussion on https://github.com/scikit-hep/awkward-1.0
What are some alternatives?
ruck - 🧬 Modularised Evolutionary Algorithms For Python with Optional JIT and Multiprocessing (Ray) support. Inspired by PyTorch Lightning
sqlmodel - SQL databases in Python, designed for simplicity, compatibility, and robustness.
gonum - Gonum is a set of numeric libraries for the Go programming language. It contains libraries for matrices, statistics, optimization, and more
DearPyGui - Dear PyGui: A fast and powerful Graphical User Interface Toolkit for Python with minimal dependencies
hep - hep is the mono repository holding all of go-hep.org/x/hep packages and tools
uproot5 - ROOT I/O in pure Python and NumPy.
python-performance-playground - Performance analysis of Python snippets for scientific computing
django-ninja - 💨 Fast, Async-ready, Openapi, type hints based framework for building APIs
numba-dpex - Data Parallel Extension for Numba
skweak - skweak: A software toolkit for weak supervision applied to NLP tasks
AugLy - A data augmentations library for audio, image, text, and video.
dpbench - Benchmark suite to evaluate Data Parallel Extensions for Python