I have a large Python codebase that I've been maintaining/updating/expanding since ~2014. Recently I came across NumPy's Random Number Generator Policy (2018-05) and now I'm a bit confused.
I'm not sure what changed, or whether I should update my code to use the new random Generator. For example, the Random sampling docs say:
# Do this
from numpy.random import default_rng
rng = default_rng()
vals = rng.standard_normal(10)
more_vals = rng.standard_normal(10)
# instead of this
from numpy import random
vals = random.standard_normal(10)
more_vals = random.standard_normal(10)
All my code depends on the (old?) syntax shown in the second block (i.e., I don't use default_rng, just plain calls to np.random.seed(), np.random.uniform(), np.random.normal(), etc.), and I don't know why I should use the first block instead of the second.
Could someone shed some light on this, please?
1. default_rng is not available in older NumPy releases (the ones that still supported Python 2), so old code could not have used it.
2. In current NumPy (Python 3), both the first and the second block you mentioned run without error.
3. The legacy functions like np.random.standard_normal may be deprecated in a future NumPy release; that's why the docs recommend using default_rng instead.
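As a minimal sketch of the practical difference, both APIs can be seeded for reproducibility, but the new Generator keeps its state in an object you create and pass around, instead of in hidden global state:
import numpy as np

# legacy API: seeding mutates hidden global state
np.random.seed(42)
legacy_vals = np.random.standard_normal(3)

# new API: state lives in an explicit Generator object
rng = np.random.default_rng(42)
new_vals = rng.standard_normal(3)

# two generators built from the same seed produce the same stream,
# independently of anything else in the program
rng2 = np.random.default_rng(42)
assert (new_vals == rng2.standard_normal(3)).all()
Note that the two APIs use different underlying bit generators (MT19937 vs PCG64), so the same seed does not produce the same numbers across them.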
I have the following code, which I run in a Jupyter notebook:
for j in range(timesteps):
a_int = np.random.randint(largest_number/2) # int version
and I get random numbers, but when I try to move part of the code into a function, I start to receive the same number on each iteration:
def create_train_data():
    np.random.seed(seed=int(time.time()))
    a_int = np.random.randint(largest_number/2) # int version
    return a_int
for j in range(timesteps):
c = create_train_data()
Why does this happen, and how can I fix it? I think maybe it's because of how Jupyter Notebook handles processes.
The offending line of code is
np.random.seed(seed=int(time.time()))
Since you're executing in a loop that completes fairly quickly, calling int() on the time reduces your random seed to the same number for the entire loop. If you really want to manually set the seed, the following is a more robust approach.
def create_train_data():
    a_int = np.random.randint(largest_number/2) # int version
    return a_int

np.random.seed(seed=int(time.time()))
for j in range(timesteps):
    c = create_train_data()
Note how the seed is set once and then used for the entire loop, so that the generator's state advances with every random integer drawn instead of being reset each time.
Note that numpy already takes care of seeding the pseudo-random generator. You're not gaining more random results by seeding it yourself. A common reason for manually setting the seed is to ensure reproducibility: you set the seed at the start of your program (top of your notebook) to some fixed integer (I see 42 in a lot of tutorials), and then all the calculations follow from that seed. If somebody wants to verify your results, the stochasticity of the algorithms can't be a confounding factor.
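As a minimal sketch of that reproducibility pattern (the value 42 is just the conventional example; any fixed integer works):
import numpy as np

np.random.seed(42)  # set once, at the top of the notebook

# the sequence below is now identical on every run
print(np.random.randint(100, size=5))
print(np.random.randint(100, size=5))  # differs from the line above, but is
                                       # also the same on every run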
The other answers are correct in saying that it is because of the seed. If you look at the documentation from SciPy, you will see that seeds are used to create a predictable random sequence. However, I think the following answer to another question about seeds gives a better overview of what it does and why/where to use it.
What does numpy.random.seed(0) do?
Hans Musgrave's answer is great if you are happy with pseudo-random numbers. Pseudo-random numbers are good for most applications, but they are problematic if used for cryptography.
The standard approach for getting one truly random number is to seed the random number generator with the system time before pulling the number, like you tried. However, as Hans Musgrave pointed out, if you cast the time to int you get the time in seconds, which will most likely be the same throughout the loop. The correct solution, which re-seeds the RNG from a fresh source on every call, is:
def create_train_data():
    np.random.seed()
    a_int = np.random.randint(largest_number/2) # int version
    return a_int
This works because Numpy already uses the computer clock or another source of randomness for the seed if you pass no arguments (or None) to np.random.seed:
Parameters: seed : {None, int, array_like}, optional
Random seed used to initialize the pseudo-random number generator. Can be any integer between 0 and 2**32 - 1 inclusive, an array (or other sequence) of such integers, or None (the default). If seed is None, then RandomState will try to read data from /dev/urandom (or the Windows analogue) if available, or seed from the clock otherwise.
It all depends on your application though. Do note the warning in the docs:
Warning: The pseudo-random generators of this module should not be used for security purposes. For security or cryptographic uses, see the secrets module.
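For completeness, a minimal sketch of what using the standard-library secrets module looks like for such cases:
import secrets

token = secrets.token_hex(16)  # cryptographically strong random hex string
n = secrets.randbelow(100)     # cryptographically strong int in [0, 100)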
Using the Python symbolic computation module SymPy in a simulation is very difficult: I need reliable, fixed inputs, and for that I use seed() from the random module.
However, every time I call even a simple sympy function, it seems to overwrite the seed with a new value, so I get new output every time. I have searched a bit and found this, but neither of them has a solution.
Consider this code:
from sympy import *
import random
random.seed(1)
for _ in range(2):
x = symbols('x')
equ = (x** random.randint(1,5)) ** Rational(random.randint(1,5)/2)
print(equ)
This outputs
(x**2)**(5/2)
x**4
on the first run, and
(x**2)**(5/2)
(x**5)**(3/2)
on the second run, and every time I run the script it returns new output. I need a way to fix this so that the use of seed() is enforced.
Does this help? From the docs on random:
"You can instantiate your own instances of Random to get generators that don’t share state"
Usage:
import random
# Create a new pseudo random number generator
prng = random.Random()
prng.seed(1)
This number generator will be unaffected by sympy
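Applied to the loop from the question, a minimal sketch (keeping the original expression) would be:
from sympy import symbols, Rational
import random

# private generator, isolated from the global state sympy may consume
prng = random.Random(1)

x = symbols('x')
for _ in range(2):
    equ = (x ** prng.randint(1, 5)) ** Rational(prng.randint(1, 5) / 2)
    print(equ)
This prints the same two expressions on every run, because sympy can no longer advance the private generator's state.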
I'm trying to use Python/NumPy for a project that I'd normally do in Matlab, so I'm somewhat new to this environment (though I have played with Python/Django on the web-development side). I'm now running into what has to be a super simple issue that occurs when I try to assign an element of one numpy array to another numpy array. The offending code is below. It has some other fluff around it which I don't believe could be causing the issue, but I can provide that code as well if it would help.
import numpy as np
tf = 100
dt = 10
X0 = np.array([6978,0,5.8787,5.8787])
xhist = np.zeros(tf/dt+1)
yhist = np.zeros(tf/dt+1)
xhist[0] = X0[0]
yhist[0] = X0[1]
print(X0[0])
print(xhist[0])
When I run the above code, the first print statement gives me 6978, as expected; however, the second print statement gives me 0, and I can't figure out for the life of me why this is. Any ideas? Thanks in advance!
Given two large arrays of 3D points (I'll call the first "source" and the second "destination"), I needed a function that returns, for each element of "source", the index of its closest element in "destination", with this limitation: I can only use numpy. So no scipy, pandas, numexpr, cython...
To do this I wrote a function based on the "brute force" answer to this question: I iterate over the elements of source, find the closest element in destination, and return its index. Due to performance concerns, and again because I can only use numpy, I tried multithreading to speed it up. Here are both the threaded and unthreaded functions and how they compare in speed on an 8-core machine.
import timeit
import numpy as np
from numpy.core.umath_tests import inner1d
from multiprocessing.pool import ThreadPool
def threaded(sources, destinations):
# Define worker function
def worker(point):
dlt = (destinations-point) # delta between destinations and given point
d = inner1d(dlt,dlt) # get distances
return np.argmin(d) # return closest index
# Multithread!
p = ThreadPool()
return p.map(worker, sources)
def unthreaded(sources, destinations):
    results = []
    for i in range(len(sources)):
        dlt = (destinations-sources[i]) # difference between destinations and given point
        d = inner1d(dlt,dlt) # get distances
        results.append(np.argmin(d)) # append closest index
    return results
# Setup the data
n_destinations = 10000 # 10k random destinations
n_sources = 10000 # 10k random sources
destinations= np.random.rand(n_destinations,3) * 100
sources = np.random.rand(n_sources,3) * 100
#Compare!
print 'threaded: %s'%timeit.Timer(lambda: threaded(sources,destinations)).repeat(1,1)[0]
print 'unthreaded: %s'%timeit.Timer(lambda: unthreaded(sources,destinations)).repeat(1,1)[0]
Results:
threaded: 0.894030461056
unthreaded: 1.97295164054
Multithreading seems beneficial, but I was hoping for more than a 2X speedup, given that the real-life datasets I deal with are much larger.
All recommendations to improve performance (within the limitations described above) will be greatly appreciated!
OK, I've been reading the Maya documentation on Python, and I came to these conclusions/guesses:
They're probably using CPython inside (several references to that documentation and not any other).
They're not fond of threads (lots of non-thread safe methods)
Given the above, I'd say it's better to avoid threads. Because of the GIL, this is a common problem, and there are several ways around it:
Try to build a C/C++ extension. Once that is done, use threads in C/C++. Personally, I'd only try to get SIP to work, and then move on.
Use multiprocessing. Even if your custom Python distribution doesn't include it, you can get to a working version since it's all pure Python code. multiprocessing is not affected by the GIL since it spawns separate processes (see the sketch at the end of this answer).
The above should've worked out for you. If not, try another parallel tool (after some serious praying).
On a side note, if you're using outside modules, be mindful of matching Maya's Python version. This may be the reason why you couldn't build scipy. Of course, scipy has a huge codebase, and the Windows platform is not the most resilient for building stuff.
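For illustration, here is a minimal sketch of the multiprocessing option applied to the code in the question. It assumes a standard Python 3 install (under Maya's Python 2 you'd adapt the prints and the with-statement); np.einsum stands in for inner1d, which newer NumPy no longer exposes, and the worker is bound with functools.partial because functions passed to Pool.map must be picklable:
import numpy as np
from functools import partial
from multiprocessing import Pool

def closest_index(point, destinations):
    dlt = destinations - point           # delta to every destination
    d = np.einsum('ij,ij->i', dlt, dlt)  # squared distances, like inner1d
    return np.argmin(d)                  # index of the closest destination

def multiprocessed(sources, destinations):
    worker = partial(closest_index, destinations=destinations)
    with Pool() as p:                    # one worker process per core
        return p.map(worker, list(sources))

if __name__ == '__main__':
    destinations = np.random.rand(10000, 3) * 100
    sources = np.random.rand(10000, 3) * 100
    print(multiprocessed(sources, destinations)[:5])
Each task pickles its point (and the bound destinations array is shipped to the workers), so for small workloads the serialization overhead can eat the gains; benchmark against the threaded version on your real data.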
I've got an issue in my code; any help would be great.
This is the example code:
from random import *
from numpy import *

r = array([uniform(-R,R), uniform(-R,R), uniform(-R,R)])

def Ft(r):
    for i in range(3):
        ...  # do something here, call r
    return something
However, I found that in the Python shell, every time I run the function Ft it gives me a different result. It seems like each iteration of the for loop inside the function draws fresh random numbers when it calls r, instead of keeping the initial random numbers fixed from when I called the function. How can I fix it? How about using b = copy(r) and then calling b inside Ft?
Thanks
Do you mean that you want the calls to random.uniform() to return the same sequence of values each time you run the function?
If so, you need to call random.seed() to set the start of the sequence to a fixed value. If you don't, the current system time is used to initialise the random number generator, which is intended to cause it to generate a different sequence every time.
Something like this should work:
random.seed(42) # Set the random number generator to a fixed sequence.
r = array([uniform(-R,R), uniform(-R,R), uniform(-R,R)])
I think you mean "list" instead of "array"; you're trying to use functions when you really don't need to. If I understand you correctly, you want to edit a list of random floats:
import random
r=[random.uniform(-R,R) for x in range(3)]
def ft(r):
for i in range(len(r)):
r[i]=???