I have a Matlab code that generates a 1D random walk.
%% probability to move up or down
prob = [0.05, 0.95];
start = 2; %% start with 2
positions(1) = start;
for i=2:1000
rr = rand(1);
down = rr<prob(1) & positions(i-1)>1;
up = rr>prob(2) & positions(i-1)<4;
positions(i) = positions(i-1)-down + up;
figure(1), clf
plot(positions)
This gives me the plot below 1D Random Walk with Matlab
I need to try to translate this in Python and I have came up with this (using numpy):
import random
import numpy as np
import matplotlib.pyplot as plt
prob = [0.05, 0.95] ##probability to move up or down
N = 100 ##length of walk
def randomWalk(N):
positions=np.zeros(N)
start = 2 ##Start at 2
positions[0] = start
for i in range(1,100):
rr = random.randint(0,1)
if rr<prob[0] and positions[i-1]>1:
start -= 1
elif rr>prob[1] and positions[i-1]<4:
start += 1
positions[i] = start
return positions
plt.plot(randomWalk(N))
plt.show()
It looks fairly close to what I want (see figure below):1D Random Walk with Python
But I wonder if they are really equivalent, because they do seem different: The Python code seems spikier than the Matlab one.
What is missing in my Python code to achieve the perfect stepwise increase/decrease (similar to the Matlab code)? Maybe it needs an "else" that tells it to stay the same unless the two conditions are met. How do I implement that?
You are doing a bunch of things differently.
For one, you are using rand in MATLAB, which returns a random float between 0 and 1. In python, you are using randint, which returns a random integer. You are doing randint(0, 1), which means "a random integer from 0 to 1, not including 0". So it will always be 1. You want random.random(), which returns a random float between 0 and 1.
Next, you are computing down and up in MATLAB, but in Python you are computing down or up in Python. For your particular case of probabilities these end up having the same result, but they are syntactically different. You can use an almost identical syntax to MATLAB for Python in this case.
Finally, you are calculating a lot more samples for MATLAB than Python (about a factor of 10 more).
Here is a direct port of your MATLAB code to Python. The result for me is pretty much the same as your MATLAB example (with different random numbers, of course):
import random
import matplotlib.pyplot as plt
prob = [0.05, 0.95] # Probability to move up or down
start = 2 #Start at 2
positions = [start]
for _ in range(1, 1000):
rr = random.random()
down = rr < prob[0] and positions[-1] > 1
up = rr > prob[1] and positions[-1] < 4
positions.append(positions[-1] - down + up)
plt.plot(positions)
plt.show()
If speed is an issue you can probably speed this up by using np.random.random(1000) to generate the random numbers up-front, and do the probability comparisons up-front as well in a vectorized manner.
So something like this:
import random
import numpy as np
import matplotlib.pyplot as plt
prob = [0.05, 0.95] # Probability to move up or down
start = 2 #Start at 2
positions = [start]
rr = np.random.random(1000)
downp = rr < prob[0]
upp = rr > prob[1]
for idownp, iupp in zip(downp, upp):
down = idownp and positions[-1] > 1
up = iupp and positions[-1] < 4
positions.append(positions[-1] - down + up)
plt.plot(positions)
plt.show()
Edit: To explain a bit more about the second example, basically what I am doing is pre-computing whether the probability is below the first threshold or above the second for every step ahead of time. This is much faster than computing a random sample and doing the comparison at each step of the loop. Then I am using zip to combine those two random sequences into one sequence where each element is the pair of corresponding elements from the two sequences. This is assuming python 3, if you are using python 2 you should use itertools.izip instead of zip.
So it is roughly equivalent to this:
import random
import numpy as np
import matplotlib.pyplot as plt
prob = [0.05, 0.95] # Probability to move up or down
start = 2 #Start at 2
positions = [start]
rr = np.random.random(1000)
downp = rr < prob[0]
upp = rr > prob[1]
for i in range(len(rr)):
idownp = downp[i]
iupp = upp[i]
down = idownp and positions[-1] > 1
up = iupp and positions[-1] < 4
positions.append(positions[-1] - down + up)
plt.plot(positions)
plt.show()
In python, it is generally preferred to iterate over values, rather than indexes. There is pretty much never a situation where you need to iterate over an index. If you find yourself doing something like for i in range(len(foo)):, or something equivalent to that, you are almost certainly doing something wrong. You should either iterate over foo directly, or if you need the index for something else you can use something like for i, ifoo in enumerate(foo):, which gets you both the elements of foo and their indexes.
Iterating over indexes is common in MATLAB because of various limitations in the MATLAB language. It is technically possible to do something similar to what I did in that Python example in MATLAB, but in MATLAB it requires a lot of boilerplate to be safe and will be extremely slow in most cases. In Python, however, it is the fastest and cleanest approach.
Related
I am a medical physics student trying to simulate photon detection - I succeeded (below) but I want to make it better by speeding it up: it currently takes 50 seconds to run and I want it to run in some fraction of that time. I assume someone more knowledgeable in Python could optimize it to complete within less than 10 seconds (without reducing num_photons_detected values). Thank you very much for trying out this little optimization challenge.
from random import seed
from random import random
import random
import matplotlib.pyplot as plt
import numpy as np
rows, cols = (25, 25)
num_photons_detected = [10**3, 10**4, 10**5, 10**6, 10**7]
lesionPercentAboveNoiseLevel = [1, 0.20, 0.10, 0.05]
index_range = np.array([i for i in range(rows)])
for l in range(len(lesionPercentAboveNoiseLevel)):
pixels = np.array([[0.0 for i in range(cols)] for j in range(rows)])
for k in range(len(num_photons_detected)):
random.seed(a=None, version=2)
photons_random_pixel_choice = np.array([random.choice(index_range) for z in range(rows)])
counts = 0
while num_photons_detected[k] > counts:
for i in photons_random_pixel_choice:
photons_random_pixel_choice = np.array([random.choice(index_range) for z in range(rows)]) #further ensures random pixel selection
for j in photons_random_pixel_choice:
pixels[i,j] +=1
counts +=1
plt.imshow(pixels, cmap="gray") #in the resulting images/graphs, x is on the vertical and y on the horizontal
plt.show()
I think that, aside from efficiency issues, a problem with the code is that it does not select the positions of photons truly at random. Instead, it selects rows numbers, and then for each selected row, it picks column numbers where photons will be observed in that row. As a result, if a row number is not selected, there will be no photons in that row at all, and if the same row is selected several times, there will be many photons in it. This is visible in the produced plots which have a clear pattern of lighter and darker rows:
Assuming that this is unintended and that each pixel should have equal chances of being selected, here is a function generating an array of a given size, with a given number of randomly selected pixels:
import numpy as np
def generate_photons(rows, cols, num_photons):
rng = np.random.default_rng()
indices = rng.choice(rows*cols, num_photons)
np.add.at(pix:=np.zeros(rows*cols), indices, 1)
return pix.reshape(rows, cols)
You can use it to produce images with specified parameters. E.g.:
import matplotlib.pyplot as plt
pixels = generate_photons(rows=25, cols=25, num_photons=10**4)
plt.imshow(pixels, cmap="gray")
plt.show()
gives:
photons_random_pixel_choice = np.array([random.choice(index_range) for z in range(rows)])
It seems like the goal here is:
Use a pre-made sequence of integers, 0 to 24 inclusive, to select one of those values.
Repeat that process 25 times in a list comprehension, to get a Python list of 25 random values in that range.
Make a 1-d Numpy array from those results.
This is very much missing the point of using Numpy. If we want integers in a range, then we can directly ask for those. But more importantly, we should let Numpy do the looping as much as possible when using Numpy data structures. This is where it pays to read the documentation:
size: int or tuple of ints, optional
Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.
So, just make it directly: photons_random_pixel_choice = random.integers(rows, size=(rows,)).
I am trying to do the inverse method for a discrete variable, but I don't get anything (if I print the sample I get only one number, not a sample). How do I make this work correctly?
import numpy as np
import numpy.random as rd
import matplotlib.pyplot as plt
def inversao_discreto(p,M):
amostra=np.zeros(M)
acum=np.zeros(len(p))
acum[:]=p
for i in range(1, len(acum)):
acum[i]+=acum[i-1]
r=rd.random_sample(M)
for i in range(M):
i=0
k=0
while(r[k]>acum[i]):
i+=1
amostra=i
return amostra
model=np.array([0.1,0.2,0.1,0.2,0.4])
sample=inversao_discreto(model,10000)
As far as I could understand, you want to implement the Inverse transform sampling for discrete variables, which works like this:
Given a probability distribution `p`
Given a number of samples `M`
Calculate the cumulative probability distribution function `acum` from `p`
Generate `M` uniformly distributed samples and store into `r`
For each sample `r[i]`
Get the index where the cumulative distribution `acum` is larger or equal to the sample `r[i]`
Store this value into `amostra[i]`
If my understanding is correct, then your code is almost there. Your mistake is only in the last for loop. Your variable i is tracking the position of your samples stored in r. Then, k will keep track of where in the accumulator acum you are comparing r to. Here is the pseudo-code:
For each sample `r[i]`
Start with `k = 0`
While `r[i] > acum[k]`
Increment `k`
Now that `r[i] <= acum[k]`, store `k` into `amostra[i]`
Translating it to Python:
for i in range(M):
k = 0
while (r[i] > acum[k]):
k += 1
amostra[i] = k
So here is the code fixed:
import numpy as np
import numpy.random as rd
def inversao_discreto(p, M):
amostra = np.zeros(M)
acum = np.zeros(len(p))
acum[:] = p
for i in range(1, len(acum)):
acum[i] += acum[i - 1]
r = rd.random_sample(M)
for i in range(M):
k = 0
while (r[i] > acum[k]):
k += 1
amostra[i] = k
return amostra
model = np.array([0.1, 0.2, 0.1, 0.2, 0.4])
sample = inversao_discreto(model, 10000)
I tried to change as little code as possible to make it work since you are relatively new to Python. Most of the changes I implemented were based on this Style Guide for Python Code, which I recommend you to take a look at since will improve your code, visually speaking.
I'm currently working on a way to find rectangles/polygons in up to 15 given points (Image below).
Given Points
My goal is it to find polygons in that point array, like I marked in the image below. The polygons are rectangles in the real world but they are distorted a bit that's the reason why they can look like polygons or other shapes. I must find the best rectangle/polygon.
My idea was to check all connections between the points but the total amount of that is to big to run in and it took.
Does anyone has an idea how to solve that, I researched in the web and found the k-Nearest algorithm in sklearn for python but I don't have experience with that if this is the right way to solve it and how to do that. Maybe I'll also need a method to filter out some of the outliers to make it even easier for the algorithm to find the right corner points of the polygon.
The code snippet below splits the given point string into separate arrays, the array coordinatesOnly contains just the x and y values of the points.
Many thanks for you help.
Polygon in Given Points
import math
import numpy as np
import matplotlib.pyplot as plt
import time
from sklearn.neighbors import NearestNeighbors
millis = round(int(time.time())) / 1000
####input String
print("2D to 3D convert")
resultString = "0,487.50,399.46,176.84,99.99;1,485.93,423.43,-4.01,95.43;2,380.53,433.28,1.52,94.90;3,454.47,397.68,177.07,90.63;4,490.20,404.10,-6.17,89.90;5,623.56,430.52,-176.09,89.00;6,394.66,385.44,90.22,87.74;7,625.61,416.77,-177.95,87.02;8,597.21,591.66,-91.04,86.49;9,374.03,540.89,-11.20,85.77;10,600.51,552.91,178.29,85.52;11,605.29,530.78,-179.89,85.34;12,583.73,653.92,-82.39,84.42;13,483.56,449.58,-91.12,83.37;14,379.01,451.62,-6.21,81.51"
resultString = resultString.split(";")
resultStringSplitted = list()
coordinatesOnly = list()
for i in range(len(resultString)):
resultStringSplitted .append(resultString[i].split(","))
newList = ((float(resultString[i].split(",")[1]),float(resultString[i].split(",")[2])))
coordinatesOnly.append(newList)
for j in range(len(resultStringSplitted[i])):
resultStringSplitted[i][j] = float(resultStringSplitted[i][j])
#Check if score is valid
validScoreList = list()
for i in range(len(resultStringSplitted)):
if resultStringSplitted[i][len(resultStringSplitted[i])-1] != 0:
validScoreList.append(resultStringSplitted[i])
resultStringSplitted = validScoreList
#Result String array contains all 2D results
# [Point Number, X Coordinate, Y Coordinate, Angle, Point Score]
for i in range(len(resultStringSplitted)):
plt.scatter(resultStringSplitted[i][1],resultStringSplitted[i][2])
plt.show(block=True)
Since you mentioned that you can have a maximum of 15 points, I suggest to check all possible combinations of 4 points and keep all rectangles that are close enough to perfect rectangles. For 15 points, it is "only" 15*14*13*12=32760 potential rectangles.
import math
import itertools
import numpy as np
coordinatesOnly = ((0,0),(0,1),(1,0),(1,1),(2,0),(2,1),(1,3)) # Test data
rectangles = []
# Returns True if l0 and l1 are within 10% deviation
def isValid(l0, l1):
if l0 == 0 or l1 == 0:
return False
return abs(max(l0,l1)/min(l0,l1) - 1) < 0.1
for p in itertools.combinations(np.array(coordinatesOnly),4):
for r in itertools.permutations(p,4):
l01 = np.linalg.norm(r[1]-r[0]) # Side
l12 = np.linalg.norm(r[2]-r[1]) # Side
l23 = np.linalg.norm(r[3]-r[2]) # Side
l30 = np.linalg.norm(r[0]-r[3]) # Side
l02 = np.linalg.norm(r[2]-r[0]) # Diagonal
l13 = np.linalg.norm(r[2]-r[0]) # Diagonal
areSidesEqual = isValid(l01,l23) and isValid(l12,l30)
isDiag1Valid = isValid(math.sqrt(l01*l01+l30*l30),l13) # Pythagore
isDiag2Valid = isValid(math.sqrt(l01*l01+l12*l12),l02) # Pythagore
if areSidesEqual and isDiag1Valid and isDiag2Valid:
rectangles.append(r)
break
print(rectangles)
It takes about 1 second to run on 15 points on my computer. It really depends on what are your requirements for computation time, i.e., real time, interactive time, "I just don't want to spend days waiting for the answer" time.
I'm trying to solve a differential equation numerically, and am writing an equation that will give me an array of the solution to each time point.
import numpy as np
import matplotlib.pylab as plt
pi=np.pi
sin=np.sin
cos=np.cos
sqrt=np.sqrt
alpha=pi/4
g=9.80665
y0=0.0
theta0=0.0
sina = sin(alpha)**2
second_term = g*sin(alpha)*cos(alpha)
x0 = float(raw_input('What is the initial x in meters?'))
x_vel0 = float(raw_input('What is the initial velocity in the x direction in m/s?'))
y_vel0 = float(raw_input('what is the initial velocity in the y direction in m/s?'))
t_f = int(raw_input('What is the maximum time in seconds?'))
r0 = x0
vtan = sqrt(x_vel0**2+y_vel0**2)
dt = 1000
n = range(0,t_f)
r_n = r0*(n*dt)
r_nm1 = r0((n-1)*dt)
F_r = ((vtan**2)/r_n)*sina-second_term
r_np1 = 2*r_n - r_nm1 + dt**2 * F_r
data = [r0]
for time in n:
data.append(float(r_np1))
print data
I'm not sure how to make the equation solve for r_np1 at each time in the range n. I'm still new to Python and would like some help understanding how to do something like this.
First issue is:
n = range(0,t_f)
r_n = r0*(n*dt)
Here you define n as a list and try to multiply the list n with the integer dt. This will not work. Pure Python is NOT a vectorized language like NumPy or Matlab where you can do vector multiplication like this. You could make this line work with
n = np.arange(0,t_f)
r_n = r0*(n*dt),
but you don't have to. Instead, you should move everything inside the for loop to do the calculation at each timestep. At the present point, you do the calculation once, then add the same only result t_f times to the data list.
Of course, you have to leave your initial conditions (which is a key part of ODE solving) OUTSIDE of the loop, because they only affect the first step of the solution, not all of them.
So:
# Initial conditions
r0 = x0
data = [r0]
# Loop along timesteps
for n in range(t_f):
# calculations performed at each timestep
vtan = sqrt(x_vel0**2+y_vel0**2)
dt = 1000
r_n = r0*(n*dt)
r_nm1 = r0*((n-1)*dt)
F_r = ((vtan**2)/r_n)*sina-second_term
r_np1 = 2*r_n - r_nm1 + dt**2 * F_r
# append result to output list
data.append(float(r_np1))
# do something with output list
print data
plt.plot(data)
plt.show()
I did not add any piece of code, only rearranged your lines. Notice that the part:
n = range(0,t_f)
for time in n:
Can be simplified to:
for time in range(0,t_f):
However, you use n as a time variable in the calculation (previously - and wrongly - defined as a list instead of a single number). Thus you can write:
for n in range(0,t_f):
Note 1: I do not know if this code is right mathematically, as I don't even know the equation you're solving. The code runs now and provides a result - you have to check if the result is good.
Note 2: Pure Python is not the best tool for this purpose. You should try some highly optimized built-ins of SciPy for ODE solving, as you have already got hints in the comments.
I have an array where discreet sinewave values are recorded and stored. I want to find the max and min of the waveform. Since the sinewave data is recorded voltages using a DAQ, there will be some noise, so I want to do a weighted average. Assuming self.yArray contains my sinewave values, here is my code so far:
filterarray = []
filtersize = 2
length = len(self.yArray)
for x in range (0, length-(filtersize+1)):
for y in range (0,filtersize):
summation = sum(self.yArray[x+y])
ave = summation/filtersize
filterarray.append(ave)
My issue seems to be in the second for loop, where depending on my averaging window size (filtersize), I want to sum up the values in the window to take the average of them. I receive an error saying:
summation = sum(self.yArray[x+y])
TypeError: 'float' object is not iterable
I am an EE with very little experience in programming, so any help would be greatly appreciated!
The other answers correctly describe your error, but this type of problem really calls out for using numpy. Numpy will run faster, be more memory efficient, and is more expressive and convenient for this type of problem. Here's an example:
import numpy as np
import matplotlib.pyplot as plt
# make a sine wave with noise
times = np.arange(0, 10*np.pi, .01)
noise = .1*np.random.ranf(len(times))
wfm = np.sin(times) + noise
# smoothing it with a running average in one line using a convolution
# using a convolution, you could also easily smooth with other filters
# like a Gaussian, etc.
n_ave = 20
smoothed = np.convolve(wfm, np.ones(n_ave)/n_ave, mode='same')
plt.plot(times, wfm, times, -.5+smoothed)
plt.show()
If you don't want to use numpy, it should also be noted that there's a logical error in your program that results in the TypeError. The problem is that in the line
summation = sum(self.yArray[x+y])
you're using sum within the loop where your also calculating the sum. So either you need to use sum without the loop, or loop through the array and add up all the elements, but not both (and it's doing both, ie, applying sum to the indexed array element, that leads to the error in the first place). That is, here are two solutions:
filterarray = []
filtersize = 2
length = len(self.yArray)
for x in range (0, length-(filtersize+1)):
summation = sum(self.yArray[x:x+filtersize]) # sum over section of array
ave = summation/filtersize
filterarray.append(ave)
or
filterarray = []
filtersize = 2
length = len(self.yArray)
for x in range (0, length-(filtersize+1)):
summation = 0.
for y in range (0,filtersize):
summation = self.yArray[x+y]
ave = summation/filtersize
filterarray.append(ave)
self.yArray[x+y] is returning a single item out of the self.yArray list. If you are trying to get a subset of the yArray, you can use the slice operator instead:
summation = sum(self.yArray[x:y])
to return an iterable that the sum builtin can use.
A bit more information about python slices can be found here (scroll down to the "Sequences" section): http://docs.python.org/2/reference/datamodel.html#the-standard-type-hierarchy
You could use numpy, like:
import numpy
filtersize = 2
ysums = numpy.cumsum(numpy.array(self.yArray, dtype=float))
ylags = numpy.roll(ysums, filtersize)
ylags[0:filtersize] = 0.0
moving_avg = (ysums - ylags) / filtersize
Your original code attempts to call sum on the float value stored at yArray[x+y], where x+y is evaluating to some integer representing the index of that float value.
Try:
summation = sum(self.yArray[x:y])
Indeed numpy is the way to go. One of the nice features of python is list comprehensions, allowing you to do away with the typical nested for loop constructs. Here goes an example, for your particular problem...
import numpy as np
step=2
res=[np.sum(myarr[i:i+step],dtype=np.float)/step for i in range(len(myarr)-step+1)]