Indexing and vectorizing a nested-loop

Indexing and vectorizing a nested-loop - python

I am running a particular script that will calculate the fractal dimension of the input data. While the script does run fine, it is very slow, and a look into it using cProfile showed that the function boxcount is accounting for around 90% of the run time. I have had similar issues in a previous questions,More efficient way to loop?, and Vectorization of a nested for-loop. While looking at cProfile, the function itself does not run slow, but in the script is needs to be called a large number of times. I'm struggling to find a way to re-write this to eliminate the large number of function calls. Here is the code below:
for j in range(starty, endy):
jmin=j-half_tile
jmax=j+half_tile+1
# Loop over columns
for i in range(startx, endx):
imin=i-half_tile
imax=i+half_tile+1
# Extract a subset of points from the input grid, centered on the current
# point. The size of tile is given by the current entry of the tile list.
z = surface[imin:imax, jmin:jmax]
# print 'Tile created. Size:', z.shape
# Calculate fractal dimension of the tile using 3D box-counting
fd, intercept = boxcount(z,dx,nside,cell,slice_size,box_size)
FractalDim[i,j] = fd
Lacunarity[i,j] = intercept
My real problem is that for each loop through i,j, it finds the values of imin,imax,jmin,jmax, which is basically creating a subset of the input data, centered around the values of imin,imax,jmin,jmax. The function of interest, boxcount is evaluated over the range of imin,imax,jmin,jmax as well. For this example, the value of half_tile is 6, and the values for starty,endy,startx,endx are 6,271,5,210 respectively. The values of dx,cell,nside,slice_size,box_size are all just constants used in the boxcount function.
I have done problems similar to this, just not with the added complication of centering the slice of data around a particular point. Can this be vectorized? or improved at all?
EDIT
Here is the code for the function boxcount as requested.
def boxcount(z,dx,nside,cell,slice_size,box_size):
# fractal dimension calculation using box-counting method
n = 5 # number of graph points for simple linear regression
gx = [] # x coordinates of graph points
gy = [] # y coordinates of graph points
boxCount = np.zeros((5))
cell_set = np.reshape(np.zeros((5*(nside**3))), (nside**3,5))
nslice=nside**2
# Box is centered at the mid-point of the tile. Calculate for each point in the
# tile, which voxel the contains the point
z0 = z[nside/2,nside/2]-dx*nside/2
for j in range(1,13):
for i in range(1,13):
ij = (j-1)*12 + i
# print 'i, j:', i, j
delz1 = z[i-1,j-1]-z0
delz2 = z[i-1,j]-z0
delz3 = z[i,j-1]-z0
delz4 = z[i,j]-z0
delz = 0.25*(delz1+delz2+delz3+delz4)
if delz < 0.0:
break
slice = ceil(delz)
# print " delz:",delz," slice:",slice
# Identify the voxel occupied by current point
ijk = int(slice-1.)*nslice + (j-1)*nside + i
for k in range(5):
if cell_set[cell[ijk,k],k] != 1:
cell_set[cell[ijk,k],k] = 1
# Set any cells deeper than this one equal to one aswell
# index = cell[ijk,k]
# for l in range(int(index),box_size[k],slice_size[k]):
# cell_set[l,k] = 1
# Count number of filled boxes for each box size
boxCount = np.sum(cell_set,axis=0)
# print "boxCount:", boxCount
for ib in range(1,n+1):
# print "ib:",ib," x(ib):",math.log(1.0/ib)," y(ib):",math.log(boxCount[ib-1])
gx.append( math.log(1.0/ib) )
gy.append( math.log(boxCount[ib-1]) )
# simple linear regression
m, b = np.polyfit(gx,gy,1)
# print "Polyfit: Slope:", m,' Intercept:', b
# fd = m-1
fd = max(2.,m)
return(fd,b)

Related

Filtering the two frequencies with highest amplitudes of a signal in the frequency domain

I have tried to filter the two frequencies which have the highest amplitudes. I am wondering if the result is correct, because the filtered signal seems less smooth than the original?
Is it correct that the output of the FFT-function contains the fundamental frequency A0/ C0, and is it correct to include it in the search of the highest amplitude (it is indeed the highest!) ?
My code (based on my professors and collegues code, and I did not understand every detail so far):
# signal
data = np.loadtxt("profil.txt")
t = data[:,0]
x = data[:,1]
x = x-np.mean(x) # Reduce signal to mean
n = len(t)
max_ind = int(n/2-1)
dt = (t[n-1]-t[0])/(n-1)
T = n*dt
df = 1./T
# Fast-Fourier-Transformation
c = 2.*np.absolute(fft(x))/n #get the power sprectrum c from the array of complex numbers
c[0] = c[0]/2. #correction for c0 (fundamental frequency)
f = np.fft.fftfreq(n, d=dt)
a = fft(x).real
b = fft(x).imag
n_fft = len(a)
# filter
p = np.ones(len(c))
p[c[0:int(len(c)/2)].argsort()[int(len(c)/2-1)]] = 0 #setting the positions of p to 0 with
p[c[0:int(len(c)/2)].argsort()[int(len(c)/2-2)]] = 0 #the indices from the argsort function
print(c[0:int(len(c)/2-1)].argsort()[int(n_fft/2-2)]) #over the first half of the c array,
ab_filter_2 = fft(x) #because the second half contains the
ab_filter_2.real = a*p #negative frequencies.
ab_filter_2.imag = b*p
x_filter2 = ifft(ab_filter_2)*2
I do not quite get the whole deal about FFT returning negative and positive frequencies. I know they are just mirrored, but then why can I not search over the whole array? And the iFFT function works with an array of just the positive frequencies?
The resulting plot: (blue original, red is filtered):

This part is very wasteful:
a = fft(x).real
b = fft(x).imag
You’re computing the FFT twice for no good reason. You compute it a 3rd time later, and you already computed it once before. You should compute it only once, not 4 times. The FFT is the most expensive part of your code.
Then:
ab_filter_2 = fft(x)
ab_filter_2.real = a*p
ab_filter_2.imag = b*p
x_filter2 = ifft(ab_filter_2)*2
Replace all of that with:
out = ifft(fft(x) * p)
Here you do the same thing twice:
p[c[0:int(len(c)/2)].argsort()[int(len(c)/2-1)]] = 0
p[c[0:int(len(c)/2)].argsort()[int(len(c)/2-2)]] = 0
But you set only the left half of the filter. It is important to make a symmetric filter. There are two locations where abs(f) has the same value (up to rounding errors!), there is a positive and a negative frequency that go together. Those two locations should have the same filter value (actually complex conjugate, but you have a real-valued filter so the difference doesn’t matter in this case).
I’m unsure what that indexing does anyway. I would split out the statement into shorter parts on separate lines for readability.
I would do it this way:
import numpy as np
x = ...
x -= np.mean(x)
fft_x = np.fft.fft(x)
c = np.abs(fft_x) # no point in normalizing, doesn't change the order when sorting
f = c[0:len(c)//2].argsort()
f = f[-2:] # the last two elements are the indices to the largest two frequency components
p = np.zeros(len(c))
p[f] = 1 # preserve the largest two components
p[-f] = 1 # set the same components counting from the end
out = np.fft.ifft(fft_x * p).real
# note that np.fft.ifft(fft_x * p).imag is approximately zero if the filter is created correctly
Is it correct that the output of the FFT-function contains the fundamental frequency A0/ C0 […]?
In principle yes, but you subtracted the mean from the signal, effectively setting the fundamental frequency (DC component) to 0.

replace nested for loops combined with conditions to boost performance

In order to speed up my code I want to exchange my for loops by vectorization or other recommended tools. I found plenty of examples with replacing simple for loops but nothing for replacing nested for loops in combination with conditions, which I was able to comprehend / would have helped me...
With my code I want to check if points (X, Y coordinates) can be connected by lineaments (linear structures). I started pretty simple but over time the code outgrew itself and is now exhausting slow...
Here is an working example of the part taking the most time:
import numpy as np
import matplotlib.pyplot as plt
from shapely.geometry import MultiLineString, LineString, Point
from shapely.affinity import rotate
from math import sqrt
from tqdm import tqdm
import random as rng
# creating random array of points
xys = rng.sample(range(201 * 201), 100)
points = [list(divmod(xy, 201)) for xy in xys]
# plot points
plt.scatter(*zip(*points))
# calculate length for rotating lines -> diagonal of bounds so all points able to be reached
length = sqrt(2)*200
# calculate angles to rotate lines
angles = []
for a in range(0, 360, 1):
angle = np.deg2rad(a)
angles.append(angle)
# copy points array to helper array (points_list) so original array is not manipulated
points_list = points.copy()
# array to save final lines
lines = []
# iterate over every point in points array to search for connecting lines
for point in tqdm(points):
# delete point from helper array to speed up iteration -> so points do not get
# double, triple, ... checked
if len(points_list) > 0:
points_list.remove(point)
else:
break
# create line from original point to point at end of line (x+length) - this line
# gets rotated at calculated angles
start = Point(point)
end = Point(start.x+length, start.y)
line = LineString([start,end])
# iterate over angle Array to rotate line by each angle
for angle in angles:
rot_line = rotate(line, angle, origin=start, use_radians=True)
lst = list(rot_line.coords)
# save starting point (a) and ending point(b) of rotated line for np.cross()
# (cross product to check if points on/near rotated line)
a = np.asarray(lst[0])
b = np.asarray(lst[1])
# counter to count number of points on/near line
count = 0
line_list = []
# iterate manipulated points_list array (only points left for which there has
# not been a line rotated yet)
for poi in points_list:
# check whether point (pio) is on/near rotated line by calculating cross
# product (np.corss())
p = np.asarray(poi)
cross = np.cross(p-a,b-a)
# check if poi is inside accepted deviation from cross product
if cross > -750 and cross < 750:
# check if more than 5 points (poi) are on/near the rotated line
if count < 5:
line_list.append(poi)
count += 1
# if 5 points are connected by the rotated line sort the coordinates
# of the points and check if the length of the line meets the criteria
else:
line_list = sorted(line_list , key=lambda k: [k[1], k[0]])
line_length = LineString(line_list)
if line_length.length >= 10 and line_length.length <= 150:
lines.append(line_list)
break
# use shapeplys' MultiLineString to create lines from coordinates and plot them
# afterwards
multiLines = MultiLineString(lines)
fig, ax = plt.subplots()
ax.set_title("Lines")
for multiLine in MultiLineString(multiLines).geoms:
# print(multiLine)
plt.plot(*multiLine.xy)
As mentioned above it was thinking about using pandas or numpy vectorization and therefore build a pandas df for the points and lines (gdf) and one with the different angles (angles) to rotate the lines:
Name
Type
Size
Value
gdf
DataFrame
(122689, 6)
Column name: x, y, value, start, end, line
angles
DataFrame
(360, 1)
Column name: angle
But I ran out of ideas to replace this nested for loops with conditions with pandas vectorization. I found this article on medium and halfway through the article there are conditions for vectorization mentioned and I was wondering if my code maybe is not suitbale for vectorization because of dependencies within the loops...
If this is right, it does not necessarily needs to be vectoriation everything boosting the performance is welcome!

You can quite easily vectorize the most computationally intensive part: the innermost loop. The idea is to compute the points_list all at once. np.cross can be applied on each lines, np.where can be used to filter the result (and get the IDs).
Here is the (barely tested) modified main loop:
for point in tqdm(points):
if len(points_list) > 0:
points_list.remove(point)
else:
break
start = Point(point)
end = Point(start.x+length, start.y)
line = LineString([start,end])
# CHANGED PART
if len(points_list) == 0:
continue
p = np.asarray(points_list)
for angle in angles:
rot_line = rotate(line, angle, origin=start, use_radians=True)
a, b = np.asarray(rot_line.coords)
cross = np.cross(p-a,b-a)
foundIds = np.where((cross > -750) & (cross < 750))[0]
if foundIds.size > 5:
# Similar to the initial part, not efficient, but rarely executed
line_list = p[foundIds][:5].tolist()
line_list = sorted(line_list, key=lambda k: [k[1], k[0]])
line_length = LineString(line_list)
if line_length.length >= 10 and line_length.length <= 150:
lines.append(line_list)
This is about 15 times faster on my machine.
Most of the time is spent in the shapely module which is very inefficient (especially rotate and even np.asarray(rot_line.coords)). Indeed, each call to rotate takes about 50 microseconds which is simply insane: it should take no more than 50 nanoseconds, that is, 1000 time faster (actually, an optimized native code should be able to to that in less than 20 ns on my machine). If you want a faster code, then please consider not using this package (or improving its performance).

Self Avoiding Random Walk in a 3d lattice using pivot algorithm in python

I was working on a problem from past few days, It is related to creating a self avoiding random walks using pivot algorithm and then to implement another code which places spherical inclusions in a 3d lattice. I have written two codes one code generates coordinates of the spheres by using Self avoiding random walk algorithm and then another code uses the co-ordinates generated by the first program and then creates a 3d lattice in abaqus with spherical inclusions in it.
The result which will be generated after running the codes:
Now the zoomed in portion (marked in red boundary)
Problem: The spheres generated are merging with each other, How to avoid this, i ran out of ideas. I just need some direction or algorithm to work on.
The code which i wrote is as follows:
code 1: generates the coordinates of the spheres
# Functions of the code: 1) This code generates the coordinates for the spherical fillers using self avoiding random walk which was implemented by pivot algorithm
# Algorithm of the code: 1)Prepare an initial configuration of a N steps walk on lattice(equivalent N monomer chain)
# 2)Randomly pick a site along the chain as pivot site
# 3)Randomly pick a side(right to the pivot site or left to it), the chain on this side is used for the next step.
# 4)Randomly apply a rotate operation on the part of the chain we choose at the above step.
# 5)After the rotation, check the overlap between the rotated part of the chain and the rest part of the chain.
# 6)Accept the new configuration if there is no overlap and restart from 2th step.
# 7)Reject the configuration and repeat from 2th step if there are overlaps.
################################################################################################################################################################
# Modules to be imported are below
import numpy as np
import timeit
from scipy.spatial.distance import cdist
import math
import random
import sys
import ast
# define a dot product function used for the rotate operation
def v_dot(a):return lambda b: np.dot(a,b)
def distance(x, y): #The Euclidean Distance to Spread the spheres initiallly
if len(x) != len(y):
raise ValueError, "vectors must be same length"
sum = 0
for i in range(len(x)):
sum += (x[i]-y[i])**2
return math.sqrt(sum)
x=1/math.sqrt(2)
class lattice_SAW: # This is the class which creates the self avoiding random walk coordinates
def __init__(self,N,l0):
self.N = N #No of spheres
self.l0 = l0 #distance between spheres
# initial configuration. Usually we just use a straight chain as inital configuration
self.init_state = np.dstack((np.arange(N),np.zeros(N),np.zeros(N)))[0] #initially set all the coordinates to zeros
self.state = self.init_state.copy()
# define a rotation matrix
# 9 possible rotations: 3 axes * 3 possible rotate angles(45,90,135,180,225,270,315)
self.rotate_matrix = np.array([[[1,0,0],[0,0,-1],[0,1,0]],[[1,0,0],[0,-1,0],[0,0,-1]]
,[[1,0,0],[0,0,1],[0,-1,0]],[[0,0,1],[0,1,0],[-1,0,0]]
,[[-1,0,0],[0,1,0],[0,0,-1]],[[0,0,-1],[0,1,0],[-1,0,0]]
,[[0,-1,0],[1,0,0],[0,0,1]],[[-1,0,0],[0,-1,0],[0,0,1]]
,[[0,1,0],[-1,0,0],[0,0,1]],[[x,-x,0],[x,x,0],[0,0,1]]
,[[1,0,0],[0,x,-x],[0,x,x]]
,[[x,0,x],[0,1,0],[-x,0,x]],[[-x,-x,0],[x,-x,0],[0,0,1]]
,[[1,0,0],[0,-x,-x],[0,x,-x]],[[-x,0,x],[0,1,0],[-x,0,-x]]
,[[-x,x,0],[-x,-x,0],[0,0,1]],[[1,0,0],[0,-x,x],[0,-x,-x]]
,[[-x,0,-x],[0,1,0],[x,0,-x]],[[x,x,0],[-x,x,0],[0,0,1]]
,[[1,0,0],[0,x,x],[0,-x,x]],[[x,0,-x],[0,1,0],[x,0,x]]])
# define pivot algorithm process where t is the number of successful steps
def walk(self,t): #this definitions helps to start walking in 3d
acpt = 0
# while loop until the number of successful step up to t
while acpt <= t:
pick_pivot = np.random.randint(1,self.N-1) # pick a pivot site
pick_side = np.random.choice([-1,1]) # pick a side
if pick_side == 1:
old_chain = self.state[0:pick_pivot+1]
temp_chain = self.state[pick_pivot+1:]
else:
old_chain = self.state[pick_pivot:] # variable to store the coordinates of the old chain
temp_chain = self.state[0:pick_pivot]# for the temp chain
# pick a symmetry operator
symtry_oprtr = self.rotate_matrix[np.random.randint(len(self.rotate_matrix))]
# new chain after symmetry operator
new_chain = np.apply_along_axis(v_dot(symtry_oprtr),1,temp_chain - self.state[pick_pivot]) + self.state[pick_pivot]
# use cdist function of scipy package to calculate the pair-pair distance between old_chain and new_chain
overlap = cdist(new_chain,old_chain) #compare the old chain and the new chain to check if the new chain overlaps on any previous postion
overlap = overlap.flatten() # just to combine the coordinates in a list which will help to check the overlap
# determinte whether the new state is accepted or rejected
if len(np.nonzero(overlap)[0]) != len(overlap):
continue
else:
if pick_side == 1:
self.state = np.concatenate((old_chain,new_chain),axis=0)
elif pick_side == -1:
self.state = np.concatenate((new_chain,old_chain),axis=0)
acpt += 1
# place the center of mass of the chain on the origin, so then we can translate these coordinates to the initial spread spheres
self.state = self.l0*(self.state - np.int_(np.mean(self.state,axis=0)))
#now write the coordinates of the spheres in a file
sys.stdout = open("C:\Users\haris\Desktop\CAME_MINE\HIWI\Codes\Temp\Sphere_Positions.txt", "w")
n = 30
#text_file.write("Number of Spheres: " + str(n) + "\n")
#Radius of one sphere: is calculated based on the volume fraction here it 2 percent
r = (3*0.40)/(4*math.pi*n)
#print r
r = r**(1.0/3.0)
#print "Sphere Radius is " + str(r)
sphereList = [] # list to maintain track of the Spheres using their positions
sphereInstancesList = [] # to maintain track of the instances which will be used to cut and or translate the spheres
sphere_pos=[]
#Create n instances of the sphere
#After creating n instances of the sphere we trie to form cluster around each instance of the sphere
for i in range(1, n+1):
InstanceName = 'Sphere_' + str(i) # creating a name for the instance of the sphere
#print InstanceName
#text_file.write(InstanceName)
#Maximum tries to distribute sphere
maxTries = 10000000
while len(sphereList) < i:
maxTries -= 1
if maxTries < 1:
print "Maximum Distribution tries exceded. Error! Restart the Script!"
break;
# Make sure Spheres dont cut cube sides: this will place the spheres well inside the cube so that they does'nt touch the sides of the cube
# This is done to avoid the periodic boundary condition: later in the next versions it will be used
vecPosition = [(2*r)+(random.random()*(10.0-r-r-r)),(2*r)+(random.random()*(10.0-r-r-r)),(2*r)+(random.random()*(10.0-r-r-r))]
sphere_pos.append(vecPosition)
for pos in sphereList:
if distance(pos, vecPosition) < 2*r: # checking whether the spheres collide or not
break
else:
sphereList.append(vecPosition)
print vecPosition
#text_file.write(str(vecPosition) + "\n")
cluster_Size=[10,12,14,16,18,20,22,24,26,28,30] #list to give the random number of spheres which forms a cluster
for i in range(1,n+1):
Number_of_Spheres_Clustered=random.choice(cluster_Size) #selecting randomly from the list cluster_Size
radius_sphr=2*r #Distance between centers of the spheres
pivot_steps=1000 # for walking the max number of steps
chain = lattice_SAW(Number_of_Spheres_Clustered,radius_sphr) #initializing the object
chain.walk(pivot_steps) # calling the walk function to walk in the 3d space
co_ordinates=chain.state # copying the coordinates into a new variable for processing and converting the coordinates into lists
for c in range(len(co_ordinates)):
temp_cds=co_ordinates[c]
temp_cds=list(temp_cds)
for k in range(len(temp_cds)):
temp_cds[k]=temp_cds[k]+sphere_pos[i-1][k]
#text_file.write(str(temp_cds) + "\n")
print temp_cds #sys.stdout redirected into the file which stores the coordinates as lists
sys.stdout.flush()
f2=open("C:\Users\haris\Desktop\CAME_MINE\HIWI\Codes\Temp\Sphere_Positions.txt", "r")
remove_check=[]
for line in f2:
temp_check=ast.literal_eval(line)
if (temp_check[0]>10 or temp_check[0]<-r or temp_check[1]>10 or temp_check[1]<-r or temp_check[2]>10 or temp_check[2]<-r):
remove_check.append(str(temp_check))
f2.close()
flag=0
f2=open("C:\Users\haris\Desktop\CAME_MINE\HIWI\Codes\Temp\Sphere_Positions.txt", "r")
f3=open("C:\Users\haris\Desktop\CAME_MINE\HIWI\Codes\Temp\Sphere_Positions_corrected.txt", "w")
for line in f2:
line=line.strip()
if any(line in s for s in remove_check):
flag=flag+1
else:
f3.write(line+'\n')
f3.close()
f2.close()
The other code would not be required because there is no geometry computation in the second code. Any help or some direction of how to avoid the collision of spheres is very helpful, Thank you all

To accommodate non-intersecting spheres with turns of 45,135,225,315 (really, only 45 and 315 are issues), you just need to make your spheres a little bit smaller. Take 3 consecutive sphere-centers, with a 45-degree turn in the middle. In the plane containing the 3 points, that makes an isosceles triangle, with a 45-degree center angle:
Note that the bottom circles (spheres) overlap. To avoid this, you shrink the radius by multiplying by a factor of 0.76:

python - combining argsort with masking to get nearest values in moving window

I have some code for calculating missing values in an image, based on neighbouring values in a 2D circular window. It also uses the values from one or more temporally-adjacent images at the same locations (i.e. the same 2D window shifted in the 3rd dimension).
For each position that is missing, I need to calculate the value based not necessarily on all the values available in the whole window, but only on the spatially-nearest n cells that do have values (in both images / Z-axis positions), where n is some value less than the total number of cells in the 2D window.
At the minute, it's much quicker to calculate for everything in the window, because my means of sorting to get the nearest n cells with data is the slowest part of the function as it has to be repeated each time even though the distances in terms of window coordinates do not change. I'm not sure this is necessary and feel I must be able to get the sorted distances once, and then mask those in the process of only selecting available cells.
Here's my code for selecting the data to use within a window of the gap cell location:
# radius will in reality be ~100
radius = 2
y,x = np.ogrid[-radius:radius+1, -radius:radius+1]
dist = np.sqrt(x**2 + y**2)
circle_template = dist > radius
# this will in reality be a very large 3 dimensional array
# representing daily images with some gaps, indicated by 0s
dataStack = np.zeros((2,5,5))
dataStack[1] = (np.random.random(25) * 100).reshape(dist.shape)
dataStack[0] = (np.random.random(25) * 100).reshape(dist.shape)
testdata = dataStack[1]
alternatedata = dataStack[0]
random_gap_locations = (np.random.random(25) * 30).reshape(dist.shape) > testdata
testdata[random_gap_locations] = 0
testdata[radius, radius] = 0
# in reality we will go through every gap (zero) location in the data
# for each image and for each gap use slicing to get a window of
# size (radius*2+1, radius*2+1) around it from each image, with the
# gap being at the centre i.e.
# testgaplocation = [radius, radius]
# and the variables testdata, alternatedata below will refer to these
# slices
locations_to_exclude = np.logical_or(circle_template, np.logical_or
(testdata==0, alternatedata==0))
# the places that are inside the circular mask and where both images
# have data
locations_to_include = ~locations_to_exclude
number_available = np.count_nonzero(locations_to_include)
# we only want to do the interpolation calculations from the nearest n
# locations that have data available, n will be ~100 in reality
number_required = 3
available_distances = dist[locations_to_include]
available_data = testdata[locations_to_include]
available_alternates = alternatedata[locations_to_include]
if number_available > number_required:
# In this case we need to find the closest number_required of elements, based
# on distances recorded in dist, from available_data and available_alternates
# Having to repeat this argsort for each gap cell calculation is slow and feels
# like it should be avoidable
sortedDistanceIndices = available_distances.argsort(kind = 'mergesort',axis=None)
requiredIndices = sortedDistanceIndices[0:number_required]
selected_data = np.take(available_data, requiredIndices)
selected_alternates = np.take(available_alternates , requiredIndices)
else:
# we just use available_data and available_alternates as they are...
# now do stuff with the selected data to calculate a value for the gap cell
This works, but over half of the total time of the function is taken in the argsort of the masked spatial distance data. (~900uS of a total 1.4mS - and this function will be running tens of billions of times, so this is an important difference!)
I am sure that I must be able to just do this argsort once outside of the function, when the spatial distance window is originally set up, and then include those sort indices in the masking, to get the first howManyToCalculate indices without having to re-do the sort. The answer might involve putting the various bits that we are extracting from, into a record array - but I can't figure out how, if so. Can anyone see how I can make this part of the process more efficient?

So you want to do the sorting outside of the loop:
sorted_dist_idcs = dist.argsort(kind='mergesort', axis=None)
Then using some variables from the original code, this is what I could come up with, though it still feels like a major round-trip..
loc_to_incl_sorted = locations_to_include.take(sorted_dist_idcs)
sorted_dist_idcs_to_incl = sorted_dist_idcs[loc_to_incl_sorted]
required_idcs = sorted_dist_idcs_to_incl[:number_required]
selected_data = testdata.take(required_idcs)
selected_alternates = alternatedata.take(required_idcs)
Note the required_idcs refer to locations in the testdata and not available_data as in the original code. And this snippet I used take for the purpose of conveniently indexing the flattened array.

#moarningsun - thanks for the comment and answer. These got me on the right track, but don't quite work for me when the gap is < radius from the edge of the data: in this case I use a window around the gap cell which is "trimmed" to the data bounds. In this situation the indices reflect the "full" window and thus can't be used to select cells from the bounded window.
Unfortunately I edited that part of my code out when I clarified the original question but it's turned out to be relevant.
I've realised now that if you use argsort again on the output of argsort then you get ranks; i.e. the position that each item would have when the overall array was sorted. We can safely mask these and then take the smallest number_required of them (and do this on a structured array to get the corresponding data at the same time).
This implies another sort within the loop, but in fact we can use partitioning rather than a full sort, because all we need is the smallest num_required items. If num_required is substantially less than the number of data items then this is much faster than doing the argsort.
For example with num_required = 80 and num_available = 15000 the full argsort takes ~900µs whereas argpartition followed by index and slice to get the first 80 takes ~110µs. We still need to do the argsort to get the ranks at the outset (rather than just partitioning based on distance) in order to get the stability of the mergesort, and thus get the "right one" when distance is not unique.
My code as shown below now runs in ~610uS on real data, including the actual calculations that aren't shown here. I'm happy with that now, but there seem to be several other apparently minor factors that can have an influence on the runtime that's hard to understand.
For example putting the circle_template in the structured array along with dist, ranks, and another field not shown here, doubles the runtime of the overall function (even if we don't access circle_template in the loop!). Even worse, using np.partition on the structured array with order=['ranks'] increases the overall function runtime by almost two orders of magnitude vs using np.argpartition as shown below!
# radius will in reality be ~100
radius = 2
y,x = np.ogrid[-radius:radius+1, -radius:radius+1]
dist = np.sqrt(x**2 + y**2)
circle_template = dist > radius
ranks = dist.argsort(axis=None,kind='mergesort').argsort().reshape(dist.shape)
diam = radius * 2 + 1
# putting circle_template in this array too doubles overall function runtime!
fullWindowArray = np.zeros((diam,diam),dtype=[('ranks',ranks.dtype.str),
('thisdata',dayDataStack.dtype.str),
('alternatedata',dayDataStack.dtype.str),
('dist',spatialDist.dtype.str)])
fullWindowArray['ranks'] = ranks
fullWindowArray['dist'] = dist
# this will in reality be a very large 3 dimensional array
# representing daily images with some gaps, indicated by 0s
dataStack = np.zeros((2,5,5))
dataStack[1] = (np.random.random(25) * 100).reshape(dist.shape)
dataStack[0] = (np.random.random(25) * 100).reshape(dist.shape)
testdata = dataStack[1]
alternatedata = dataStack[0]
random_gap_locations = (np.random.random(25) * 30).reshape(dist.shape) > testdata
testdata[random_gap_locations] = 0
testdata[radius, radius] = 0
# in reality we will loop here to go through every gap (zero) location in the data
# for each image
gapz, gapy, gapx = 1, radius, radius
desLeft, desRight = gapx - radius, gapx + radius+1
desTop, desBottom = gapy - radius, gapy + radius+1
extentB, extentR = dataStack.shape[1:]
# handle the case where the gap is < search radius from the edge of
# the data. If this is the case, we can't use the full
# diam * diam window
dataL = max(0, desLeft)
maskL = 0 if desLeft >= 0 else abs(dataL - desLeft)
dataT = max(0, desTop)
maskT = 0 if desTop >= 0 else abs(dataT - desTop)
dataR = min(desRight, extentR)
maskR = diam if desRight <= extentR else diam - (desRight - extentR)
dataB = min(desBottom,extentB)
maskB = diam if desBottom <= extentB else diam - (desBottom - extentB)
# get the slice that we will be working within
# ranks, dist and circle are already populated
boundedWindowArray = fullWindowArray[maskT:maskB,maskL:maskR]
boundedWindowArray['alternatedata'] = alternatedata[dataT:dataB, dataL:dataR]
boundedWindowArray['thisdata'] = testdata[dataT:dataB, dataL:dataR]
locations_to_exclude = np.logical_or(boundedWindowArray['circle_template'],
np.logical_or
(boundedWindowArray['thisdata']==0,
boundedWindowArray['alternatedata']==0))
# the places that are inside the circular mask and where both images
# have data
locations_to_include = ~locations_to_exclude
number_available = np.count_nonzero(locations_to_include)
# we only want to do the interpolation calculations from the nearest n
# locations that have data available, n will be ~100 in reality
number_required = 3
data_to_use = boundedWindowArray[locations_to_include]
if number_available > number_required:
# argpartition seems to be v fast when number_required is
# substantially < data_to_use.size
# But partition on the structured array itself with order=['ranks']
# is almost 2 orders of magnitude slower!
reqIndices = np.argpartition(data_to_use['ranks'],number_required)[:number_required]
data_to_use = np.take(data_to_use,reqIndices)
else:
# we just use available_data and available_alternates as they are...
pass
# now do stuff with the selected data to calculate a value for the gap cell

python return array from iteration

I want to plot an approximation of the number "pi" which is generated by a function of two uniformly distributed random variables. The goal is to show that with a higher sample draw the function value approximates "pi".
Here is my function for pi:
def pi(n):
x = rnd.uniform(low = -1, high = 1, size = n) #n = size of draw
y = rnd.uniform(low = -1, high = 1, size = n)
a = x**2 + y**2 <= 1 #1 if rand. draw is inside the unit cirlce, else 0
ac = np.count_nonzero(a) #count 1's
af = np.float(ac) #create float for precision
pi = (af/n)*4 #compute p dependent on size of draw
return pi
My problem:
I want to create a lineplot that plots the values from pi() dependent on n.
My fist attempt was:
def pipl(n):
for i in np.arange(1,n):
plt.plot(np.arange(1,n), pi(i))
print plt.show()
pipl(100)
which returns:
ValueError: x and y must have same first dimension
My seocond guess was to start an iterator:
def y(n):
n = np.arange(1,n)
for i in n:
y = pi(i)
print y
y(1000)
which results in:
3.13165829146
3.16064257028
3.06519558676
3.19839679359
3.13913913914
so the algorithm isn't far off, however i need the output as a data type which matplotlib can read.
I read:
http://docs.scipy.org/doc/numpy/reference/routines.array-creation.html#routines-array-creation
and tried tom implement the function like:
...
y = np.array(pi(i))
...
or
...
y = pi(i)
y = np.array(y)
...
and all the other functions that are available from the website. However, I can't seem to get my iterated y values into one that matplotlib can read.
I am fairly new to python so please be considerate with my simple request. I am really stuck here and can't seem to solve this issue by myself.
Your help is really appreciated.

You can try with this
def pipl(n):
plt.plot(np.arange(1,n), [pi(i) for i in np.arange(1,n)])
print plt.show()
pipl(100)
that give me this plot

If you want to stay with your iterable approach you can use Numpy's fromiter() to collect the results to an array. Like:
def pipl(n):
for i in np.arange(1,n):
yield pi(i)
n = 100
plt.plot(np.arange(1,n), np.fromiter(pipl(n), dtype='f32'))
But i think Numpy's vectorize would be even better in this case, it makes the resulting code much more readable (to me). With this approach you dont need the pipl function anymore.
# vectorize the function pi
pi_vec = np.vectorize(pi)
# define all n's
n = np.arange(1,101)
# and plot
plt.plot(n, pi_vec(n))
A little side note, naming a function pi which does not return a true pi seems kinda tricky to me.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Indexing and vectorizing a nested-loop - python

Related

Filtering the two frequencies with highest amplitudes of a signal in the frequency domain

replace nested for loops combined with conditions to boost performance

Self Avoiding Random Walk in a 3d lattice using pivot algorithm in python

python - combining argsort with masking to get nearest values in moving window

python return array from iteration

Categories

Resources