I've been attempting to use Python to create a script that lets me generate large numbers of points for use in the Monte Carlo method to calculate an estimate of Pi. The script I have so far is this:
import math
import random
random.seed()
n = 10000
for i in range(n):
    x = random.random()
    y = random.random()
    z = (x,y)
    if x**2+y**2 <= 1:
        print z
    else:
        del z
So far, I am able to generate all of the points I need, but what I would like to get is the number of points that are produced when running the script for use in a later calculation. I'm not looking for incredibly precise results, just a good enough estimate. Any suggestions would be greatly appreciated.
If you're doing any kind of heavy-duty numerical calculation, consider learning numpy. Your problem is essentially a one-liner with a numpy setup:
import numpy as np
N = 10000
pts = np.random.random((N,2))
# Select the points according to your condition
idx = (pts**2).sum(axis=1) < 1.0
print pts[idx], idx.sum()
Giving:
[[ 0.61255615 0.44319463]
[ 0.48214768 0.69960483]
[ 0.04735956 0.18509277]
...,
[ 0.37543094 0.2858077 ]
[ 0.43304577 0.45903071]
[ 0.30838206 0.45977162]], 7854
The last number is the count of events that passed the test, i.e. the number of points whose radius is less than one.
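If you also want the Pi estimate itself, which the answer stops short of, the usual Monte Carlo formula applies (a small follow-up sketch; the factor 4 reflects that the points cover only a quarter of the circle):
pi_estimate = 4.0 * idx.sum() / N
print pi_estimate  # roughly 3.14 for N = 10000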
Not sure if this is what you're looking for, but you can run enumerate on range and get the position in your iteration:
In [1]: for index, i in enumerate(xrange(10, 15)):
   ...:     print index + 1, i
   ...:
   ...:
1 10
2 11
3 12
4 13
5 14
In this case, index + 1 would represent the current point being created (index itself would be the total number of points created at the beginning of a given iteration). Also, if you are using Python 2.x, xrange is generally better for these sorts of iterations as it does not load the entire list into memory but rather accesses it on an as-needed basis.
Just add a hits variable before the loop, initialize it to 0, and inside your if statement increment hits by one.
Finally you can calculate the value of PI using hits and n.
import math
import random
random.seed()
n = 10000
hits = 0 # initialize hits with 0
for i in range(n):
    x = random.random()
    y = random.random()
    z = (x,y)
    if x**2+y**2 <= 1:
        hits += 1
    else:
        del z
# use hits and n to compute PI
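A sketch of that final step (not spelled out in the original answer): hits/n approximates the area of the quarter circle, so multiplying by 4 gives the Pi estimate.
pi_estimate = 4.0 * hits / n
print pi_estimate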
I am facing a problem where I need my code to add the elements of a list until the sum gets as close as possible to a constant. Once the constant is reached, I need the code to store the sum and also the sum of indexes (count how many variables it needed to reach that sum). I am a beginner in Python and this problem is giving me a very hard time.
I have tried a while loop as well as a for loop. At this point, I am kind of stuck and not sure whether my method is accurate.
Here is a concrete example of the logic. Assume demand for period 1 is 10, demand for period 2 is 23, and Q is 12 (Q here represents an optimal quantity). What I want to figure out is whether we should place an order in period 1 that includes demand for periods 1 and 2, or if it is better to place one order at period 1 and another one at period 2. Q is what determines it: whether demand for period 1 alone is closer to Q, or the cumulative demand for periods 1 and 2 is closer to Q. In this example, |10-12| < |(10+23)-12|, therefore we want to record one order for period 1 and another one for period 2.
def feeoq(q, demand):
    sum = 0
    prod = []
    for i in demand:
        sum = sum + i
        if abs(sum - q) < abs(sum + i - q):
            return prod.append(sum)
        else:
            sum = sum + i
I am not getting an error message but the function is not returning what I am expecting.
You have repeated the sum = sum + i line: once at the start of the loop and again in the else branch. I guess you should remove the first addition line and append sum + i instead.
Perhaps we can take this as a basis for discussion:
import numpy as np
np.random.seed(42)
L = np.random.randint(0, 10, 10)
q = 7
print(L)
def subs(L, q):
    sum = 0
    for i, e in enumerate(L):
        sum += e
        if sum > q:
            if abs(sum - q) > abs(sum - e - q):
                r = sum - e
                sum = e
                n = i - 1
            else:
                r = sum
                sum = 0
                n = i
            yield n, r
    yield i, sum
print(list(subs(L, q)))
Explanation:
Basically, this function first checks whether sum is bigger than q. Only then do you have two values that are not both smaller or both bigger than q, which is the prerequisite for your test of which of the two has the smaller distance to q.
Now, depending on which one is closer to q, the function returns sum or sum - e.
Note that I use the verb return here while the code uses yield: this is not a usual function but a generator. The main point about this type of function is that when it yields a value, you can in a first step think of it as returning that value, with one important difference: the function itself does not return (i.e. does not end), but falls asleep, waiting for its next call and keeping its complete state, including all the values calculated up to now. It then proceeds right in the line after the yield, as if nothing had happened before, until the next yield keyword.
In short: IMO exactly what you need if you want to sum up until whatever but do not really want to stop when whatever is reached... :)
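A tiny, self-contained illustration of that yield behaviour, separate from the answer's code (the function name here is made up purely for demonstration):
def count_up_to(limit):
    n = 0
    while n < limit:
        yield n       # the function "falls asleep" here and resumes on the next call
        n += 1

gen = count_up_to(3)
print(next(gen))  # 0
print(next(gen))  # 1 -- execution resumed right after the yield
print(list(gen))  # [2]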
Just my point of view on the problem, as stated in my comment below your question:
An algorithm which calculates, for every period, the number of orders of a fixed quantity per order that are needed, such that the demand is always covered.
Additionally, the remainder of every order that was not used up in a given period is carried over, so that the demand of the next period can eventually be covered with one order less.
def calcOrder(demand, Q):
    result = []
    rest = 0
    for i, e in enumerate(demand):
        current = e - rest
        order = np.ceil(current/Q)
        result.append(int(order))
        rest = order * Q - current
    return result
Example:
import numpy as np
np.random.seed(793)
demand = np.random.randint(0, 51, 5)
Q = 12
demand
# array([50, 20, 19, 48, 25])
print(f'demand\tcurrent\torder\trest')
rest = 0
for i, e in enumerate(demand):
    current = e - rest
    order = int(np.ceil(current/Q))
    rest = order * Q - current
    print(f'{e}\t{current}\t{order}\t{rest}')
# demand current order rest
# 50 50 5 10
# 20 10 1 2
# 19 17 2 7
# 48 41 4 7
# 25 18 2 6
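Calling the function defined above on the same data reproduces the order column directly (a short usage sketch):
print(calcOrder(demand, Q))
# [5, 1, 2, 4, 2]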
I'm writing code to simulate Brownian motion.
from random import random
import matplotlib.pyplot as plt
import numpy as np
N=100
p=0.5
l=1
x1=[]
x2=[]
x1.append(0)
x2.append(0)
for i in range(1, N):
    step = -l if random() < p else l
    X1 = x1[i-l] + step
    x1.append(X1)
for i in range(1, N):
    step = -l if random() < p else l
    X2 = x2[i-l] + step
    x2.append(X2)
x1mean=np.array(x1)
x2mean=np.array(x2)
mean=[]
for j in range(0, N):
    mean.append((x1mean[j]+x2mean[j])/2.0)
plt.plot(mean)
plt.plot(x1)
plt.plot(x2)
plt.show()
This code computes the displacement for 2 different particles, but in order to calculate the mean displacement properly I would need a large number of particles, like 100. As you can see, I'm looking for a way to condense the code, because I cannot repeat the same code 100 times.
Is there a way to create a loop that runs all this code as a function of one variable, i.e. the number of particles?
Thanks.
I can't provide you with working Python code, because so far I have not written a single line of Python code. But I can give you an idea of how to solve your problem.
Assumptions:
N : Number of Moves
P : Number of Particles
Step 1:
Create a method that generates your array/list and returns it, so you can re-use it and avoid copying your code.
def createParticleMotion(N, p, l):
    x1 = []
    x1.append(0)
    for i in range(1, N):
        step = -l if random() < p else l
        X1 = x1[i-l] + step
        x1.append(X1)
    return x1
Step 2:
Create a list of lists, let's call it particleMotions. The list itself holds P lists of your N moves. Fill it within a for loop over your number of particles P by calling the method from the first step and appending the returned list/array to particleMotions, as shown in the sketch below.
Maybe the answer for Python: list of lists will help you create this.
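A minimal sketch of this step, assuming the createParticleMotion function and the values N, p, l from Step 1, plus some particle count P:
P = 100
particleMotions = []
for particle in range(0, P):
    particleMotions.append(createParticleMotion(N, p, l))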
Step 3:
After you have created and filled particleMotions, use this list within a double for loop to calculate the mean and store it in a list of means.
mean = []
for n in range(0, N):
    sum = 0
    for p in range(0, P):
        sum = sum + particleMotions[p][n]
    mean.append(sum/P)
And now you can use another for loop to plot your results.
for particle in range(0, P):
    plt.plot(particleMotions[particle])
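To also show the averaged motion and display the figure, something like the following could come last (a small sketch reusing the mean list from Step 3):
plt.plot(mean)
plt.show()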
So again, don't blame me for syntax errors. I am no Python developer; I just want to give you a way to solve your problem.
This?
from random import random
import matplotlib.pyplot as plt
import numpy as np
N=100
p=0.5
l=1
mydict = {}
for n in range(100):
    mydict[n] = []
    mydict[n].append(0)
    for i in range(1, N):
        step = -l if random() < p else l
        X1 = mydict[n][i-l] + step
        mydict[n].append(X1)
for k, v in mydict.iteritems():
    plt.plot(v)
# mean
plt.plot([np.mean(i) for i in mydict.values()])
plt.show()
I'm creating N_MC paths of simulated stock prices S with n points in each path, excluding the initial point. The algorithm to do so is recursive on the previous value of the stock price, for a given path. Here's what I have now:
import numpy as np
import time
N_MC = 1000
n = 10000
S = np.zeros((N_MC, n+1))
S0 = 1.0
S[:, 0] = S0
start_time_normals = time.clock()
Z = np.exp(np.random.normal(size=(N_MC, n)))
print "generate normals time = ", time.clock() - start_time_normals
start_time_prices = time.clock()
for i in xrange(N_MC):
    for j in xrange(1, n+1):
        S[i, j] = S[i, j-1]*Z[i, j-1]
print "prices time = ", time.clock() - start_time_prices
The times were:
generate normals time = 1.07
prices time = 9.98
Is there a much more efficient way to generate the arrays S, perhaps using Numpy's routines? It would be nice if the normal random variables Z could be generated more quickly, too, but I'm not as hopeful.
It's not necessary to loop over 'paths', because they're independent of each other. So, you can remove the outer loop for i in xrange(N_MC) and just operate on entire columns of S and Z.
To speed up the recursive computation, let's just consider a single 'path'. Say z is a vector containing the random values at each timestep (all known ahead of time), s is a vector that should contain the output at each timestep, s0 is the initial output at time zero, and j is time.
Your code defines the output recursively:
s[j] = s[j-1]*z[j-1]
Let's expand this:
s[1] = s[0]*z[0]
s[2] = s[1]*z[1]
     = s[0]*z[0]*z[1]
s[3] = s[2]*z[2]
     = s[0]*z[0]*z[1]*z[2]
s[4] = s[3]*z[3]
     = s[0]*z[0]*z[1]*z[2]*z[3]
Each output s[j] is given by s[0] times the product of the random values from 0 to j-1. You can calculate cumulative products like this using numpy.cumprod(), which should be much more efficient than looping:
s = np.concatenate(([s0], s0 * np.cumprod(z[0:-1])))
You can use the axis parameter for operating along one dimension of a matrix (e.g. for doing this in parallel across 'paths').
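For instance, here is a rough sketch of how the whole S matrix from the question could be filled without the Python loops (variable names as in the question; the leading column of ones makes every path start at S0):
import numpy as np

N_MC, n, S0 = 1000, 10000, 1.0
Z = np.exp(np.random.normal(size=(N_MC, n)))
# prepend a column of ones, then take the running product along the time axis
S = S0 * np.cumprod(np.concatenate((np.ones((N_MC, 1)), Z), axis=1), axis=1)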
Consider points Y given in increasing order from [0,T). We are to consider these points as lying on a circle of circumference T. Now consider points X also from [0,T) and also lying on a circle of circumference T.
We say the distance between X and Y is the sum of the absolute distances between each point in X and its closest point in Y, recalling that both are considered to be lying on a circle. Write this distance as Delta(X, Y).
I am trying to find a quick way of approximating the distribution of the distance between the circles over all possible rotations of X. I currently do this by Monte Carlo simulation. First, here is my code to make some fake data.
import random
import numpy as np
from bisect import bisect_left
def simul(rate, T):
    time = np.random.exponential(rate)
    times = [0]
    newtime = times[-1] + time
    while (newtime < T):
        times.append(newtime)
        newtime = newtime + np.random.exponential(rate)
    return times[1:]
Now the code to find the distance between the two circles.
def takeClosest(myList, myNumber, T):
    """
    Assumes myList is sorted. Returns closest value to myNumber in a circle of circumference T.
    If two numbers are equally close, return the smallest number.
    """
    pos = bisect_left(myList, myNumber)
    if (pos == 0 and myList[pos] != myNumber):
        before = myList[pos - 1] - T
        after = myList[0]
    elif (pos == len(myList)):
        before = myList[pos - 1]
        after = myList[0] + T
    else:
        before = myList[pos - 1]
        after = myList[pos]
    if after - myNumber < myNumber - before:
        return after
    else:
        return before
def circle_dist(timesY, timesX):
    dist = 0
    for t in timesX:
        closest_number = takeClosest(timesY, t, T)
        dist += np.abs(closest_number - t)
    return dist
Now the main code to make the data and to try 1000 different random rotations.
T = 50000
timesX = simul(1, T)
timesY = simul(10, T)
dists=[]
iters = 100
for i in xrange(iters):
    offset = np.random.randint(0, T)
    timesX = [(t+offset) % T for t in timesX]
    dists.append(circle_dist(timesY, timesX))
We can now print out any statistics we like of the distances. I am particularly interested in the variance.
print "Variance is ", np.var(dists)
Unfortunately I need to do this a lot, and it currently takes around 16 seconds. I find it a little surprising that it is so slow. Any suggestions for how to speed it up would be gratefully received.
Edit 1. Reduced the number of iterations to 100 (the previous value didn't correspond to my timings correctly). This now takes around 16 seconds on my computer.
Edit 2. Fixed bug in takeClosest
EDIT: I've just noticed that performance optimization is a little premature here, because the expression closest_number - t is not a valid implementation of any definition of a distance on a "circle"; it is only a distance on an open-ended line.
sample test case (pseudocode):
T = 10
X = [1, 2]
Y = [9]
dist(X, Y) = dist(1, 9) + dist(2, 9)
dist_on_line = 8 + 7 = 15
dist_on_circle = 2 + 3 = 5
Note that the definition of the circle [0,10) implies that dist(0, 10) is not defined, but in the limit it approaches 0: lim(dist(0, t), t->10) = 0
A correct implementation of a distance on a circle would be:
dist_of_t = min(t - closest_number_before_t,
                closest_number_after_t - t,
                T - t + closest_number_before_t,
                T - closest_number_after_t + t)
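An equivalent, more compact way to write the circular distance between two single points, shown only to illustrate the definition (the function name is made up):
def circular_dist(a, b, T):
    # distance between points a and b on a circle of circumference T
    d = abs(a - b) % T
    return min(d, T - d)

# the test case above: T = 10, X = [1, 2], Y = [9]
print(circular_dist(1, 9, 10) + circular_dist(2, 9, 10))  # -> 5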
Original answer:
You could rotate and iterate over timesY instead of timesX, since that array is an order of magnitude smaller; doing bisect_left on timesX is negligible (O(log n)) compared to iterating over all the elements (O(n)).
But IMHO the real slowdown is because of Python's dynamic typing (each of the ~50000 items in timesX has to be checked for type compatibility every time you compare it to some other value), so converting timesX and timesY to numpy arrays should help. If that is not enough, CPU acceleration (cython, numba, ...) is the thing you need.
The function circle_dist can be replaced by a one-liner. So you can plug it into your outer for i loop:
sum(abs(takeClosest(timesY, t, T) - t) for t in timesX)
Furthermore, you should always - if possible - allocate arrays like dists in one step and avoid appending elements many thousand times.
But, unfortunately, both improvements only save a few percent of computing time.
Edit 1: Replacing np.abs(...) with abs(...) decreases computing time by 50 % on my machine (on a reduced data set)!
Edit 2: Updated the one-liner according to Aprillion's comment.
I want to calculate the average of a random walk repeated 1000 times to get a good average, so my code for this random walk is:
import math
import random
from matplotlib import pyplot
position = 0
walk = [position]
steps = 10
for i in xrange(steps):
    step = 1 if random.randint(0, 1) else -1
    position += step
    walk.append(position)
print((walk))
pyplot.hist(walk)
pyplot.show()
So, what is the best way to make Python repeat this many times and calculate the average of these random walks?
Thanks
It will be easier if you break it down into smaller functions, for example by making the main part of your code a function:
def makewalk(steps):
    position = 0
    walk = [position]
    for i in xrange(steps):
        step = 1 if random.randint(0, 1) else -1
        position += step
        walk.append(position)
    return walk  # instead of simply printing it
Also, you could use built-in functions to reduce it to a few lines:
import numpy
def makewalk(N):
    steps = numpy.random.randint(0, 2, N) * 2 - 1
    # an array of length N with random integers between 0 (inclusive) and 2 (exclusive);
    # multiplying it by two and subtracting 1, the numbers 0 and 1 become -1 and 1 respectively
    walk = numpy.cumsum(steps)  # what it says, a cumulative sum
    return walk
Now just loop over it 1000 times
from matplotlib import pyplot
steps = 10000
numwalks = 1000
walks = [makewalk(steps) for i in xrange(numwalks)]
There are your walks; do whatever you like with them. Since the walks are numpy arrays, you can easily compute the elementwise sum without loops:
averagewalk = numpy.sum(walks, 0)*1.0/numwalks # sums along the 0th axis and returns an array of length steps
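To look at the result, for example (a small sketch, using pyplot as imported above):
pyplot.plot(averagewalk)
pyplot.show()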