How to access numpy default global random number generator - python

I need to create a class which takes a random number generator (i.e. a numpy.random.RandomState object) as a parameter. If this argument is not specified, I would like it to default to the generator that numpy uses when you call numpy.random.<random-method>. How do I access this global generator? Currently I do this by assigning the module object itself as the random generator (this works because they share methods, i.e. duck typing). However, this causes issues with pickling (module objects cannot be pickled) and deep-copying. I would like to use the RandomState object behind numpy.random instead.
PS: I'm using python-3.4

As well as what kazemakase suggests, we can take advantage of the fact that module-level functions like numpy.random.random are really methods of a hidden numpy.random.RandomState instance by pulling __self__ directly from one of those methods:
numpy_default_rng = numpy.random.random.__self__
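For the use case in the question, that object can serve as the default for a constructor argument. A minimal sketch, assuming the pattern described above (the class name RandomizedThing and its sample method are made up for illustration):
import numpy as np

class RandomizedThing:
    def __init__(self, rng=None):
        # fall back to the hidden global RandomState; unlike the module
        # object, this instance can be pickled and deep-copied
        self.rng = rng if rng is not None else np.random.random.__self__

    def sample(self):
        return self.rng.uniform(size=3)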

numpy.random imports * from numpy.random.mtrand, which is an extension module written in Cython. The source code shows that the global state is stored in the variable _rand. This variable is not imported into the numpy.random scope but you can get it directly from mtrand.
import numpy as np
from numpy.random.mtrand import _rand as global_randstate
np.random.seed(42)
print(np.random.rand())
# 0.3745401188473625
np.random.RandomState().seed(42) # Different object, does not influence global state
print(np.random.rand())
# 0.9507143064099162
global_randstate.seed(42) # this changes the global state
print(np.random.rand())
# 0.3745401188473625
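If you want to convince yourself that this is the same object as the one behind the module-level functions from the previous answer, a quick check (assuming a numpy version where the global legacy state is still exposed this way):
assert np.random.random.__self__ is global_randstate  # same hidden RandomState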

I don't know how to access the global state. However, you can use a RandomState object and pass it along. Random distributions are attached to it, so you call them as methods.
Example:
import numpy as np

def computation(parameter, rs):
    return parameter * np.sum(rs.uniform(size=5) - 0.5)

my_state = np.random.RandomState(seed=3)
print(computation(3, my_state))
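Because the state is self-contained in the object, re-creating a RandomState with the same seed reproduces the result:
print(computation(3, np.random.RandomState(seed=3)))  # same output as above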

Related

Why does mock patching work with random but not with np?

I have a module where a number of different functions use random numbers or random choices.
I am trying to use mock and patch to inject pre-chosen values in place of these random selections but can't understand an error I am receiving.
In the function I am testing, I use
np.random.randint
when I use the code
from unittest import mock
import random

mocked_random_int = lambda: 7
with mock.patch('np.random.randint', mocked_random_int):
    ...
I get an error message saying there is no module named np. However, numpy is imported as np, and other functions call it just fine.
Even more perplexing: if I edit the code above to remove the 'np' at the front, it does what I want:
with mock.patch('random.randint', mocked_random_int):
But I want to understand why the code works without the np. Thank you!
There is a difference between a module or package name and the variable it is assigned to in any given namespace. A simple import
import numpy
tells Python to check its list of already-imported modules, import numpy if necessary, and assign the module to the variable numpy.
import numpy as np
is almost the same, except that the module is assigned to the variable np. It's still the same numpy package; you've just aliased it differently.
mock.patch will import and patch the module regardless of whether you've already imported it, but you need to give it the real module name, not the alias your current module uses for it.
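So the patch should target the real module name, numpy. A minimal sketch of the corrected version (assuming the function under test calls np.random.randint):
from unittest import mock
import numpy as np

mocked_random_int = lambda *args, **kwargs: 7
with mock.patch('numpy.random.randint', mocked_random_int):
    print(np.random.randint(1, 10))  # always 7 inside this block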

How do I globally seed np.random.default_rng for unit tests

numpy's recommended way to create random numbers is to create a np.random.Generator like this:
import numpy as np

def foo():
    # Some more complex logic here; this is the top-level method that creates the rng
    rng = np.random.default_rng()
    return rng.random()
Now suppose I am writing tests for my code base, and I need to seed the rng for reproducible results.
Is it possible to tell numpy to use the same seed every time, regardless where default_rng() is called?
This is basically the old behavior of np.random.seed().
The reason I need this is because I have many such tests and would have to mock the default_rng call to use a seed for each of them since in pytest you have to mock at the location where something is being used, not where it is defined. Thus mocking it globally like in this answer does not work.
With the old way, one could define a fixture inside conftest.py that sets the seed for each test automatically, like this:
# conftest.py
import pytest
import numpy as np

@pytest.fixture(autouse=True)
def set_random_seed():
    # seeds any random state in the tests, regardless of where it is defined
    np.random.seed(0)

# test_foo.py
def test_foo():
    assert np.isclose(foo(), 0.84123412)  # That's not the right number, just an example
With the new way of using default_rng, this seems to no longer be possible.
Instead I would need to put a fixture like this in every test module that requires the rng to be seeded.
# inside test_foo.py, but also every other test file
import pytest
from unittest import mock
import numpy as np

@pytest.fixture()
def seed_default_rng():
    seeded_rng = np.random.default_rng(seed=0)
    with mock.patch("module.containing.foo.np.random.default_rng") as mocked:
        mocked.return_value = seeded_rng
        yield

def test_foo(seed_default_rng):
    assert np.isclose(foo(), 0.84123412)
The best I've come up with is to have a parametrizable fixture in conftest.py like this:
# conftest.py
import pytest
from unittest import mock
import numpy as np

@pytest.fixture
def seed_default_rng(request):
    seeded_rng = np.random.default_rng(seed=0)
    mock_location = request.node.get_closest_marker("rng_location").args[0]
    with mock.patch(f"{mock_location}.np.random.default_rng") as mocked:
        mocked.return_value = seeded_rng
        yield
This can then be used in each test like so:
# test_foo.py
import pytest
from module.containing.foo import foo

@pytest.mark.rng_location("module.containing.foo")
def test_foo(seed_default_rng):
    assert np.isclose(foo(), 0.84123412)  # just an example number
It's still not as convenient as before, but you only need to add the mark to each test instead of mocking the default_rng method.
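As an aside: if, as in the snippet at the top, the code under test calls np.random.default_rng through the np module attribute (rather than importing default_rng directly into its own namespace), a single autouse fixture that patches the attribute on numpy.random itself should also work, since the attribute lookup happens at call time. A minimal sketch under that assumption:
# conftest.py
import numpy as np
import pytest

@pytest.fixture(autouse=True)
def seeded_default_rng(monkeypatch):
    real_default_rng = np.random.default_rng  # keep the unpatched factory
    # ignore whatever seed the caller passes and always seed with 0
    monkeypatch.setattr(np.random, "default_rng",
                        lambda *args, **kwargs: real_default_rng(0))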
If you want the full numpy API with a guarantee of stable random values across numpy versions, the short answer is: you can't.
You can use a workaround with the legacy np.random.RandomState class, but you sacrifice the use of the current np.random.Generator API; there's no good, stable way around this.
Why numpy.random is not stable across versions
As of numpy v1.17, numpy.random.default_rng() constructs a new Generator with the default BitGenerator. But the description of np.random.Generator attaches the following guidance:
No Compatibility Guarantee
Generator does not provide a version compatibility guarantee. In particular, as better algorithms evolve the bit stream may change.
Therefore, using np.random.default_rng() will preserve random numbers for the same versions of numpy across platforms, but not across versions.
This has been true since the adoption of NEP 0019: Random number generator policy. See the abstract:
For the past decade, NumPy has had a strict backwards compatibility policy for the number stream of all of its random number distributions. Unlike other numerical components in numpy, which are usually allowed to return different results when they are modified if they remain correct, we have obligated the random number distributions to always produce the exact same numbers in every version. The objective of our stream-compatibility guarantee was to provide exact reproducibility for simulations across numpy versions in order to promote reproducible research. However, this policy has made it very difficult to enhance any of the distributions with faster or more accurate algorithms. After a decade of experience and improvements in the surrounding ecosystem of scientific software, we believe that there are now better ways to achieve these objectives. We propose relaxing our strict stream-compatibility policy to remove the obstacles that are in the way of accepting contributions to our random number generation capabilities.
Workaround for testing with pytest
A section of the NEP is devoted to Supporting Unit Tests and discusses preserving guaranteed stream compatibility across versions and platforms via the legacy np.random.RandomState class. From the numpy docs on "Legacy Random Generation":
The RandomState provides access to legacy generators. This generator is considered frozen and will have no further improvements. It is guaranteed to produce the same values as the final point release of NumPy v1.16. These all depend on Box-Muller normals or inverse CDF exponentials or gammas. This class should only be used if it is essential to have randoms that are identical to what would have been produced by previous versions of NumPy.
The np.random.RandomState docs provide an example usage, which can be adapted for use with pytest. The important point is that functions making use of np.random.random and other methods must be monkeypatched using a RandomState instance:
Contents of mymod.py:
import numpy as np

def myfunc():
    return np.random.random(size=3)
Contents of test_mymod.py:
import pytest
import numpy as np
from numpy.random import RandomState
from mymod import myfunc

@pytest.fixture(autouse=True)
def mock_random(monkeypatch: pytest.MonkeyPatch):
    def stable_random(*args, **kwargs):
        rs = RandomState(12345)
        return rs.random(*args, **kwargs)
    monkeypatch.setattr('numpy.random.random', stable_random)

def test_myfunc():
    # this test will work across numpy versions
    known_result = np.array([0.929616, 0.316376, 0.183919])
    np.testing.assert_allclose(myfunc(), known_result, atol=1e-6)

Why can I use the random.randint method without instantiating an instance?

To use the random.randint method, I have the following two options:
import random

# one way to use random
random.randint(1, 10)

# second way, with instantiation first
instance = random.Random()
instance.randint(1, 10)
Why can I use the first way even though I did not instantiate an instance?
Importing random instantiates Random and stores it in a private module-level variable.
On line 786 of random.py:
_inst = Random()
And then on line 791:
randint = _inst.randint
random.py resides in the Lib folder of your Python installation if you want to check it out yourself.
Lines 786-808 are the lines of interest: they bind the methods of that private Random instance to module-level names so they can be called this way.
From the docs.
The functions supplied by this module are actually bound methods of a hidden instance of the random.Random class. You can instantiate your own instances of Random to get generators that don’t share state.
https://docs.python.org/3/library/random.html
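A short illustration of that last point (seed value chosen arbitrarily):
import random

a = random.Random(42)  # your own instance, with independent state
b = random.Random(42)  # same seed, so the same number stream
print(a.randint(1, 10) == b.randint(1, 10))  # True
print(random.randint(1, 10))  # uses the hidden shared instance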

Variables set before importing a Python module aren't recognized by the module

Here's my code:
import numpy as np
import matplotlib.pyplot as plt
import astropy
import matplotlib
%matplotlib ipympl
import scatterplot_with_hist as sc
badx = []
bady = []
import badcomp as bc
# things like data5 and list2 are defined in here -- I know that code is functional so I'll omit it for brevity
bc.getlist(start=2000, end=2200)
The module code is as follows:
def getlist(start, end):
    for f in range(1):
        for i in range(1238):
            for n in range(int(start), int(end)):
                if (data[n]['col1'] - list2[i]) == 0:
                    badx.append(data[n]['col2'])
                    bady.append(data[n]['col3'])
If I run this code in the main script (instead of importing it and calling it as a function) it works fine. When I run it as an imported function, it won't recognize variables like data5, list2, badx, and bady.
Why?
Each Python module has its own global namespace. That means that code in different modules that tries to access global variables will see separate ones. You can access another module's global variables by importing the module and interacting with the attributes of the module object.
In your code, the getlist function in the badcomp module is trying to interact with several global variables, including badx and bady for output, and data and list2 for input. It's not working because you've defined those in the interactive session, which uses the namespace of a module with the special name __main__.
While you could import __main__ from badcomp and interact with the global variables defined there via the module's attributes, that would be a really bad design, since it won't work if the module gets imported in any other way (e.g. by a different module you write later). Instead, the function should probably use variables defined in its own global namespace. The __main__ module is already importing badcomp (as bc), and can access things like badx and bady as bc.badx and bc.bady if the definitions are moved into the module.
Or you might reconsider if global variables are the best way for this function to work. It's often much better to use arguments and return values to pass data in and out of a function, rather than global variables. Maybe badx and bady should be defined within getlist and returned at the end. Meanwhile, data and list2 could be added as arguments to the function.
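A minimal sketch of that refactor, keeping the names from the question (data and list2 become parameters, and the results are returned; the subtraction test is rewritten as the equivalent equality check):
# badcomp.py
def getlist(data, list2, start, end):
    badx, bady = [], []
    for value in list2:
        for n in range(int(start), int(end)):
            if data[n]['col1'] == value:
                badx.append(data[n]['col2'])
                bady.append(data[n]['col3'])
    return badx, bady

# in the calling module:
# badx, bady = bc.getlist(data, list2, start=2000, end=2200)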
When a module is imported, it does NOT have access to the global or local namespace of the module that called it. You can get around this by creating a function that creates a variable in the global namespace inside the imported module and run the function from the calling module with each of the variables you need.
Example code (really bad design, but it'll teach you hopefully):
Put THIS in the imported module:
def putVarsInNamespace(variable, variableNameToInject):
    # both statements must run in a single exec so that the "global"
    # declaration applies to the assignment that follows it
    exec("global {0}\n{0} = variable".format(variableNameToInject))
Put THIS in the calling module:
test = 5
from <MODULENAME> import putVarsInNamespace
putVarsInNamespace(test, "test")
How this works: variableNameToInject is the name you want the injected variable to have. The exec'd code declares that name global, using the VALUE of variableNameToInject, and then assigns the value of variable to it; just like that, it's injected. Both statements must run in a single exec call, because a global declaration in one exec does not carry over to a second one. This is useful when you want to inject multiple variables without writing multiple functions.

using module within user defined function

When I need to use numpy within a python function I'm defining, which method is correct/better/preferred/more pythonic?
Method 1
def do_something(arg):
    import numpy as np
    y = np.array(arg)
    return y
or
Method 2
import numpy as np

def do_something(arg):
    y = np.array(arg)
    return y
My expectation is that method 2 is correct because it does not execute the import statement every time the function is called. Also, I would expect that importing within the function only makes numpy available within the scope of that function, which also seems bad.
Yes, method 2 is correct, as is your explanation. An import in Python is similar to #include header_file in C/C++. Importing a module is quite fast, but not instant, so put the imports at the top. It is also not really true that method 1 makes the code slow: after the first import, subsequent executions of the import statement only do a quick lookup in sys.modules and rebind the name.
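A quick way to see why repeated imports are cheap (timing numbers will vary by machine):
import sys
import timeit

import numpy as np             # first import: actually loads the module
print('numpy' in sys.modules)  # True: cached from now on
# re-running the import is just a dict lookup plus a name binding
print(timeit.timeit('import numpy as np', number=1000))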
