What does random.seed([list of int]) do?

I know what random.seed(int) does, like below:
random.seed(10)
But I saw a code which uses random.seed([list of int]), like below:
random.seed([1, 2, 1000])
What is the difference between passing a list and int to random.seed ?

The answer is basically in the comments, but putting it together: it appears the code you found imports random from numpy, instead of importing the standard Python random module:
from numpy import random
random.seed([1, 2, 1000])
Importing it that way is not recommended, precisely to avoid the confusion you're running into.
numpy can use a 1-D array of integers as a seed (presumably because it uses a different pseudo-random generator than Python itself, one that can take a more complex seed), as described in the documentation for numpy.random.RandomState.
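To see the difference side by side, here is a minimal sketch (the exact values drawn depend on your numpy version, so treat the output as illustrative):
import random
import numpy as np

random.seed(10)                # the standard-library seed takes an int (or str/bytes)
# random.seed([1, 2, 1000])    # would raise TypeError: the stdlib seed does not accept a list

np.random.seed([1, 2, 1000])   # the legacy numpy API accepts a 1-D sequence of ints
print(np.random.random(3))     # reproducible for this seed on a given numpy version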

How do I globally seed np.random.default_rng for unit tests

The way recommended by numpy to create random numbers is to create an np.random.Generator, like this:
import numpy as np

def foo():
    # Some more complex logic here; this is the top-level function that creates the rng
    rng = np.random.default_rng()
    return rng.random()
Now suppose I am writing tests for my code base, and I need to seed the rng for reproducible results.
Is it possible to tell numpy to use the same seed every time, regardless where default_rng() is called?
This is basically the old behavior of np.random.seed().
The reason I need this is because I have many such tests and would have to mock the default_rng call to use a seed for each of them since in pytest you have to mock at the location where something is being used, not where it is defined. Thus mocking it globally like in this answer does not work.
With the old way, one could define a fixture that sets the seed for each test automatically inside conftest.py like this:
# conftest.py
import pytest
import numpy as np

@pytest.fixture(autouse=True)
def set_random_seed():
    # seeds any random state in the tests, regardless of where it is defined
    np.random.seed(0)

# test_foo.py
def test_foo():
    assert np.isclose(foo(), 0.84123412)  # That's not the right number, just an example
With the new way of using default_rng, this seems to no longer be possible.
Instead I would need to put a fixture like this in every test module that requires the rng to be seeded.
# inside test_foo.py, but also every other test file
import pytest
from unittest import mock
import numpy as np

@pytest.fixture()
def seed_default_rng():
    seeded_rng = np.random.default_rng(seed=0)
    with mock.patch("module.containing.foo.np.random.default_rng") as mocked:
        mocked.return_value = seeded_rng
        yield

def test_foo(seed_default_rng):
    assert np.isclose(foo(), 0.84123412)
The best I've come up with is to have a parametrizable fixture in conftest.py like this:
# conftest.py
import pytest
from unittest import mock
import numpy as np

@pytest.fixture
def seed_default_rng(request):
    seeded_rng = np.random.default_rng(seed=0)
    mock_location = request.node.get_closest_marker("rng_location").args[0]
    with mock.patch(f"{mock_location}.np.random.default_rng") as mocked:
        mocked.return_value = seeded_rng
        yield
This can then be used in each test like so:
# test_foo.py
import pytest
import numpy as np
from module.containing.foo import foo

@pytest.mark.rng_location("module.containing.foo")
def test_foo(seed_default_rng):
    assert np.isclose(foo(), 0.84123412)  # just an example number
It's still not as convenient as before, but you only need to add the mark to each test instead of mocking the default_rng method.
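One detail not spelled out above: pytest warns about unknown marks, so the custom rng_location mark should be registered, for example in conftest.py:
# conftest.py (sketch; registers the custom mark so pytest does not emit PytestUnknownMarkWarning)
def pytest_configure(config):
    config.addinivalue_line(
        "markers",
        "rng_location(path): module path where np.random.default_rng should be patched",
    )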
If you want the full numpy API with a guarantee of stable random values across numpy versions, the short answer is: you can't.
You can use a workaround with the legacy np.random.RandomState class, but then you sacrifice the current np.random Generator API - there's no good, stable way around this.
Why numpy.random is not stable across versions
As of numpy v1.17, numpy.random.default_rng() constructs a new Generator with the default BitGenerator. But the description of np.random.Generator comes with the following guidance:
No Compatibility Guarantee
Generator does not provide a version compatibility guarantee. In particular, as better algorithms evolve the bit stream may change.
Therefore, using np.random.default_rng() will preserve random numbers for the same versions of numpy across platforms, but not across versions.
This has been true since the adoption of NEP 0019: Random number generator policy. See the abstract:
For the past decade, NumPy has had a strict backwards compatibility policy for the number stream of all of its random number distributions. Unlike other numerical components in numpy, which are usually allowed to return different results when they are modified if they remain correct, we have obligated the random number distributions to always produce the exact same numbers in every version. The objective of our stream-compatibility guarantee was to provide exact reproducibility for simulations across numpy versions in order to promote reproducible research. However, this policy has made it very difficult to enhance any of the distributions with faster or more accurate algorithms. After a decade of experience and improvements in the surrounding ecosystem of scientific software, we believe that there are now better ways to achieve these objectives. We propose relaxing our strict stream-compatibility policy to remove the obstacles that are in the way of accepting contributions to our random number generation capabilities.
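In practical terms: with a fixed seed, default_rng reproduces values within a single numpy version, just not across versions. A minimal sketch of that guarantee:
import numpy as np

a = np.random.default_rng(42).random(3)
b = np.random.default_rng(42).random(3)
assert (a == b).all()   # identical within one numpy version; the stream may change between versions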
Workaround for testing with pytest
A section of the NEP is devoted to Supporting Unit Tests and discusses preserving guaranteed stream compatibility across versions and platforms in the legacy np.random.RandomState class. From the numpy docs on "Legacy Random Generation":
The RandomState provides access to legacy generators. This generator is considered frozen and will have no further improvements. It is guaranteed to produce the same values as the final point release of NumPy v1.16. These all depend on Box-Muller normals or inverse CDF exponentials or gammas. This class should only be used if it is essential to have randoms that are identical to what would have been produced by previous versions of NumPy.
The np.random.RandomState docs provide an example usage, which can be adapted for use with pytest. The important point is that functions making use of np.random.random and other methods must be monkeypatched using a RandomState instance:
Contents of mymod.py:
import numpy as np

def myfunc():
    return np.random.random(size=3)
Contents of test_mymod.py:
import pytest
import numpy as np
from numpy.random import RandomState
from mymod import myfunc

@pytest.fixture(autouse=True)
def mock_random(monkeypatch: pytest.MonkeyPatch):
    def stable_random(*args, **kwargs):
        rs = RandomState(12345)
        return rs.random(*args, **kwargs)
    monkeypatch.setattr('numpy.random.random', stable_random)

def test_myfunc():
    # this test will work across numpy versions
    known_result = np.array([0.929616, 0.316376, 0.183919])
    np.testing.assert_allclose(myfunc(), known_result, atol=1e-6)

The difference between np.function and function [duplicate]

We can import numpy and use its functions directly as:
from numpy import *
a = array([1, 2, 3])  # and it works well.
Why do some people use the following method?
import numpy as np
a = np.array([1, 2, 3])
The difference is easy: from numpy import * imports all names from the top-level NumPy module into your current module's namespace. import numpy as np just makes the top-level NumPy module available under the name np, so you access its contents as np.xxx.
However there is one reason why you shouldn't use from any_module import *: it may silently overwrite existing names. For example, NumPy has its own any, max, all and min functions, which will happily shadow the built-in Python any, max, ... functions (a very common "gotcha").
My advice: avoid from numpy import * even if it seems like less effort than typing np. all the time!
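A quick illustration of that shadowing (a minimal sketch; the exact scalar type depends on your platform):
import builtins
from numpy import *

print(type(sum([1, 2, 3])))           # numpy's sum now shadows the built-in, e.g. <class 'numpy.int64'>
print(type(builtins.sum([1, 2, 3])))  # the original built-in is still reachable: <class 'int'>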
It's a matter of neatness but also consistency: you might have multiple functions with the same name coming from different modules (for instance NumPy exposes a name random, and so do other packages), so it's important to denote which function from which module you are using. This link has a great explanation and makes the point about code readability as well.

Import as statement working differently for different modules?

I am learning Python, and right now I am learning about the import statements in Python. I was testing out some code, and I came across something unusual. Here is the code I was testing.
from math import pow as power
import random as x
print(pow(2, 3))
print(power(2, 3))
print(x.randint(0, 5))
print(random.randint(0, 5))
I learned that in Python, you can reassign the names of modules using as, so I reassigned pow to power. I expected both pow(2, 3) and power(2, 3) to output the exact same stuff because all I did was change the name. However, pow(2, 3) outputs 8, which is an integer, while power(2, 3) outputs 8.0, which is a float. Why is that?
Furthermore, I imported the random module as well, and set its name to be x. In the case of the pow and power, both the old name, pow, and the new name, power, worked. But with this random module, only the new name, x, works, and the old name, random, doesn't work. print(x.randint(0, 5)) works, but random.randint(0, 5) doesn't work. Why is this so?
Can anyone please explain to a Python newbie such as myself why my code is not working the way I expect it to? I am using Python version 3.6.2, if that helps.
That's because when you import pow from math as power and then call pow, the pow you are calling is the built-in function, not the pow from the math module.
For random there is no built-in function of that name in Python, so only the new name x is bound; the name random was never imported.
See the documentation for the built-in pow function.
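A small sketch makes this visible: each function's __module__ attribute tells you where it actually comes from.
from math import pow as power

print(pow.__module__)      # 'builtins' -> the built-in pow, which returns an int for int arguments
print(power.__module__)    # 'math'     -> math.pow, which always returns a float
print(pow(2, 3), power(2, 3))  # 8 8.0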
When you use pow, you are actually using the built-in pow function.
There is no built-in function called random, which is why that name does not work.
Normally in Python, if you import something with as, you can only refer to it by the name you imported it as, not by what it was originally called.

Type hinting / annotation (PEP 484) for numpy.ndarray

Has anyone implemented type hinting for the specific numpy.ndarray class?
Right now, I'm using typing.Any, but it would be nice to have something more specific.
For instance, it would help if the NumPy people added a type alias for their array_like objects. Better yet, they could implement support at the dtype level, so that other objects, as well as ufunc, would be supported.
Update
Recent numpy versions ship a typing module:
https://numpy.org/doc/stable/reference/typing.html#module-numpy.typing
Dated answer
It looks like the typing module was developed at:
https://github.com/python/typing
The main numpy repository is at
https://github.com/numpy/numpy
Python bugs and commits can be tracked at
http://bugs.python.org/
The usual way of adding a feature is to fork the main repository, develop the feature till it is bomb proof, and then submit a pull request. Obviously at various points in the process you want feedback from other developers. If you can't do the development yourself, then you have to convince someone else that it is a worthwhile project.
cython has a form of annotations, which it uses to generate efficient C code.
You referenced the array-like paragraph in the numpy documentation. Note its typing information:
A simple way to find out if the object can be converted to a numpy array using array() is simply to try it interactively and see if it works! (The Python Way).
In other words the numpy developers refuse to be pinned down. They don't, or can't, describe in words what kinds of objects can or cannot be converted to np.ndarray.
In [586]: np.array({'test': 1})   # a dictionary
Out[586]: array({'test': 1}, dtype=object)

In [587]: np.array(['one', 'two'])   # a list
Out[587]: array(['one', 'two'], dtype='<U3')

In [589]: np.array({'one', 'two'})   # a set
Out[589]: array({'one', 'two'}, dtype=object)
For your own functions, an annotation like
def foo(x: np.ndarray) -> np.ndarray:
works. Of course if your function ends up calling some numpy function that passes its argument through asanyarray (as many do), such an annotation would be incomplete, since your input could be a list, or np.matrix, etc.
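If you want the annotation to admit those other inputs too, a simple (if loose) sketch is a Union of the types you actually expect (names here are just illustrative):
from typing import List, Union
import numpy as np

def foo(x: Union[np.ndarray, List[float]]) -> np.ndarray:
    # np.asanyarray accepts either form and hands back an ndarray
    return np.asanyarray(x) * 2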
When evaluating this question and answer, pay attention to the date. 484 was a relatively new PEP back then, and code to make use of it for standard Python still in development. But it looks like the links provided are still valid.
Numpy 1.21 includes a numpy.typing module with an NDArray generic type.
From the Numpy 1.21 docs:
numpy.typing.NDArray = numpy.ndarray[typing.Any, numpy.dtype[+ScalarType]]
A generic version of np.ndarray[Any, np.dtype[+ScalarType]].
Can be used during runtime for typing arrays with a given dtype and unspecified shape.
Examples:
>>> import numpy as np
>>> import numpy.typing as npt
>>> from typing import Any
>>> print(npt.NDArray)
numpy.ndarray[typing.Any, numpy.dtype[+ScalarType]]
>>> print(npt.NDArray[np.float64])
numpy.ndarray[typing.Any, numpy.dtype[numpy.float64]]
>>> NDArrayInt = npt.NDArray[np.int_]
>>> a: NDArrayInt = np.arange(10)
>>> def func(a: npt.ArrayLike) -> npt.NDArray[Any]:
... return np.array(a)
As of 2022-09-05, support for shapes is still a work in progress per numpy/numpy#16544.
At my company we've been using:
from typing import TypeVar, Generic, Tuple, Union, Optional
import numpy as np

Shape = TypeVar("Shape")
DType = TypeVar("DType")

class Array(np.ndarray, Generic[Shape, DType]):
    """
    Use this to type-annotate numpy arrays, e.g.
        image: Array['H,W,3', np.uint8]
        xy_points: Array['N,2', float]
        nd_mask: Array['...', bool]
    """
    pass

def compute_l2_norm(arr: Array['N,2', float]) -> Array['N', float]:
    return (arr ** 2).sum(axis=1) ** 0.5

print(compute_l2_norm(arr=np.array([(1, 2), (3, 1.5), (0, 5.5)])))
We actually have a MyPy checker around this that checks that the shapes work out (which we should release at some point). The only downside is that it doesn't make PyCharm happy (i.e. you still get the nasty warning underlines).
nptyping adds lots of flexibility for specifying numpy type hints.
What I did was to just define it as
Dict[Tuple[int, int], TYPE]
So for example if you want an array of floats you can do:
a = numpy.empty(shape=[2, 2], dtype=float)  # type: Dict[Tuple[int, int], float]
This is of course not exact from a documentation perspective, but for analyzing correct usage and getting proper completion with PyCharm it works great!

Why does built-in sum behave wrongly after "from numpy import *"?

I have some code like:
import math, csv, sys, re, time, datetime, pickle, os, gzip
from numpy import *
x = [1, 2, 3, ... ]
y = sum(x)
The sum of the actual values in x is 2165496761, which is larger than what a 32-bit integer can hold. The reported y value is -2129470535, implying integer overflow.
Why did this happen? I thought the built-in sum was supposed to use Python's arbitrary-size integers?
See How to restore a builtin that I overwrote by accident? if you've accidentally done something like this at the REPL (interpreter prompt).
Doing from numpy import * causes the built-in sum function to be replaced with numpy.sum:
>>> sum(xrange(10**7))
49999995000000L
>>> from numpy import sum
>>> sum(xrange(10**7)) # assuming a 32-bit platform
-2014260032
To verify that numpy.sum is in use, try to check the type of the result:
>>> sum([721832253, 721832254, 721832254])
-2129470535
>>> type(sum([721832253, 721832254, 721832254]))
<type 'numpy.int32'>
To avoid this problem, don't use star import.
If you must use numpy.sum and want an arbitrary-sized integer result, specify a dtype for the result like so:
>>> sum([721832253, 721832254, 721832254],dtype=object)
2165496761L
or refer to the builtin sum explicitly (possibly giving it a more convenient binding):
>>> __builtins__.sum([721832253, 721832254, 721832254])
2165496761L
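Or, simplest of all, keep numpy under its own name so the built-in sum is never shadowed (a minimal sketch of that style):
import numpy as np   # no star import, so the built-in sum stays intact

x = [721832253, 721832254, 721832254]
print(sum(x))                      # built-in sum: 2165496761, Python ints never overflow
print(np.sum(x, dtype=np.int64))   # numpy sum with an explicit 64-bit accumulator: 2165496761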
The reason why you get this invalid value is that np.sum is accumulating the result in a 32-bit integer (np.int32). Nothing prevents you from accumulating in a np.int64 instead. You could for example just use
np.asarray(x, dtype=np.int64).sum()
On a side note, please make sure that you never use from numpy import *. It's a terrible practice and a habit you should get rid of as soon as possible. When you use from ... import *, you might overwrite some Python built-ins, which makes it very difficult to debug. A typical example is exactly this shadowing of functions like sum or max.
Python handles large numbers with arbitrary precision:
>>> sum([721832253, 721832254, 721832254])
2165496761
Just sum them up!
To make sure you don't use numpy.sum, try __builtins__.sum() instead.
