I suppose...
1) This is an issue from 2018 (https://github.com/pytorch/pytorch/issues/5059), which links to the closed numpy issue (https://github.com/numpy/numpy/issues/9248), which just says: seed your random numbers, folks.
2) The documentation in pytorch covers this (https://pytorch.org/docs/stable/data.html#randomness-in-mult...), but it's not really highlighted specifically in, eg. tutorials.
3) To quote the author:
> I downloaded and analysed over a hundred thousand repositories from GitHub that import PyTorch. I kept projects that use NumPy’s random number generator with multi-process data loading. Out of these, over 95% of the repositories are plagued by this problem.
^ No methodology or data to back that number up, just some vague hand-waving; this just seems like nonsense.
So, I suppose... there's some truth to it being a documentation issue, but I guess the title + (1-3) kind of say to me: OP thought they discovered something significant... turns out, they didn't.
Oh well, spin it into some page views.
Yeah, I'd run into this 2 years ago and ended up also reporting an issue on the CenterNet repo [1].
The solution I have in that repo is adapted from the very helpful discussion in the original PyTorch issue [2].
I will admit that this is *very* easy to mess up, as evidenced by the fact that examples in the official PyTorch tutorials and other well-known codebases suffer from it. In the PyTorch training framework I've developed at work, we've implemented a custom `worker_init_fn` as outlined in [1]; it is the default for all "trainer" instances, which are responsible for instantiating DataLoaders in 99% of our training runs.
[1] https://github.com/xingyizhou/CenterNet/issues/233
[2] https://github.com/pytorch/pytorch/issues/5059
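The gist of the `worker_init_fn` approach is just to give each worker its own NumPy seed instead of the inherited state. A minimal sketch of that idea with plain `multiprocessing` (the `BASE_SEED` and helper names are mine, not from the linked issues; with a real DataLoader you'd pass the init function via the `worker_init_fn` argument and seed from the worker's assigned seed):

```python
import multiprocessing as mp
import numpy as np

BASE_SEED = 1234  # hypothetical base seed; vary it per epoch in real training

def worker_init(worker_id):
    # Same idea as a DataLoader worker_init_fn: re-seed NumPy per worker
    # so children stop sharing the parent's inherited RNG state.
    np.random.seed(BASE_SEED + worker_id)

def worker(worker_id, q):
    worker_init(worker_id)
    q.put(np.random.randint(0, 1_000_000))

if __name__ == "__main__":
    ctx = mp.get_context("fork")
    q = ctx.Queue()
    procs = [ctx.Process(target=worker, args=(i, q)) for i in range(4)]
    for p in procs:
        p.start()
    draws = [q.get() for _ in procs]
    for p in procs:
        p.join()
    print(sorted(draws))  # four distinct values now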
Note that the official TensorFlow tutorials make the exact same mistake. I've reported it, but it hasn't been fixed. [1]
[1]: https://github.com/tensorflow/tensorflow/issues/47755