Using PyTorch and NumPy? You're making a mistake

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • PyTorch

    Tensors and Dynamic neural networks in Python with strong GPU acceleration

  • I suppose...

    1) This is an issue from 2018 (https://github.com/pytorch/pytorch/issues/5059), which links to the closed NumPy issue (https://github.com/numpy/numpy/issues/9248), which just says: seed your random numbers, folks.

    2) The PyTorch documentation covers this (https://pytorch.org/docs/stable/data.html#randomness-in-mult...), but it's not really highlighted in, e.g., the tutorials.

    3) To quote the author:

    > I downloaded and analysed over a hundred thousand repositories from GitHub that import PyTorch. I kept projects that use NumPy’s random number generator with multi-process data loading. Out of these, over 95% of the repositories are plagued by this problem.

    ^ No actual stats, just some vague hand waving; this just seems like nonsense.

    So, I suppose... there's some truth to it being a documentation issue, but I guess the title + (1-3) kind of say to me: OP thought they discovered something significant... turns out, they didn't.

    Oh well, spin it into some page views.
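
The "seed your random numbers" advice in (1) comes down to how forked worker processes inherit the parent's global NumPy generator state. A minimal sketch of that mechanism follows, using only `multiprocessing` and NumPy rather than a PyTorch `DataLoader`. The helper names and the base seed 1234 are made up for illustration, and the sketch assumes a platform where the `fork` start method is available (e.g. Linux).

```python
import multiprocessing as mp

import numpy as np


def _draw(queue, worker_id, reseed):
    """Run in a child process: draw one number from NumPy's global RNG."""
    if reseed:
        # Hypothetical fix: give each child its own seed before drawing.
        np.random.seed(1234 + worker_id)
    queue.put((worker_id, int(np.random.randint(0, 2**31 - 1))))


def draws_from_forked_children(n_children=3, reseed=False):
    """Return one random draw per forked child, ordered by child index."""
    # Assumption: the "fork" start method exists on this platform.
    ctx = mp.get_context("fork")
    queue = ctx.Queue()
    procs = [
        ctx.Process(target=_draw, args=(queue, i, reseed))
        for i in range(n_children)
    ]
    for p in procs:
        p.start()
    # Collect results before joining so no child blocks on a full queue.
    results = dict(queue.get() for _ in procs)
    for p in procs:
        p.join()
    return [results[i] for i in range(n_children)]
```

Called with `reseed=False`, every child starts from the same inherited state and returns the same number; with `reseed=True`, each child draws from its own stream.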

  • NumPy

    The fundamental package for scientific computing with Python.


  • CenterNet

    Object detection, 3D detection, and pose estimation using center point detection:

  • Yeah, I ran into this 2 years ago and also ended up reporting an issue on the CenterNet repo [1].

    The solution I have in that repo is adapted from the very helpful discussions in the original PyTorch issue [2].

    I will admit that this is *very* easy to mess up, as evidenced by the fact that examples in the official PyTorch tutorials and other well-known codebases suffer from it. In the PyTorch training framework I've developed at work, we've implemented a custom `worker_init_fn`, as outlined in [1], that is the default for all "trainer" instances, which are responsible for instantiating DataLoaders in 99% of our training runs.

    [1] https://github.com/xingyizhou/CenterNet/issues/233

    [2] https://github.com/pytorch/pytorch/issues/5059
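
A `worker_init_fn` along these lines might look like the sketch below. It uses a commonly seen pattern (deriving the NumPy seed from `torch.initial_seed()`), and is not necessarily the exact code from [1] or [2]. `seed_worker`, `RandomOffsets`, and `make_loader` are hypothetical names; `make_loader` only stands in for the idea of a trainer that installs the function by default.

```python
import random

import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset


def seed_worker(worker_id: int) -> None:
    """Reseed NumPy and Python's `random` inside a DataLoader worker."""
    # Inside a worker, torch.initial_seed() returns that worker's own seed.
    # NumPy's legacy seeding only accepts values that fit in 32 bits.
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)


class RandomOffsets(Dataset):
    """Toy dataset (hypothetical) whose items depend on NumPy's global RNG."""

    def __init__(self, length: int) -> None:
        self.length = length

    def __len__(self) -> int:
        return self.length

    def __getitem__(self, index: int) -> int:
        # Stand-in for a random augmentation such as a crop offset.
        return int(np.random.randint(0, 1000))


def make_loader(dataset: Dataset, **kwargs) -> DataLoader:
    """Build a DataLoader that uses seed_worker unless the caller overrides it."""
    kwargs.setdefault("worker_init_fn", seed_worker)
    return DataLoader(dataset, **kwargs)
```

A call such as `make_loader(RandomOffsets(64), batch_size=8, num_workers=4)` would then run `seed_worker` once in each worker process. Making this the default, rather than something each training script has to remember, is the design choice the comment describes.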

  • tensorflow

    An Open Source Machine Learning Framework for Everyone

  • Note that the official TensorFlow tutorials make the exact same mistake. I've reported it, but it hasn't been fixed. [1]

    [1]: https://github.com/tensorflow/tensorflow/issues/47755

