Add values to an external file in the form of columns - python

I would like to know how I can put the values ​​resulting from my loops into the external file in form of columns. I have this code:
Perhaps a very specific brief explanation of what I would like to obtain could help me a bit, I have a group of particles that fall on the surface of the earth around a point (0.0) each with their respective weights and I would like to know the sum of the weights of the particles that fall within the rings of internal radius Ri and external Rj (the external radius of the first ring becomes the internal radius of the ring that follows it)
#Insert the radio values
#For example Ri_0=-20
#For example Rj_max=4000
#For example Bin=40
data=pd.read_csv("photons.txt", header=0, delim_whitespace=True)
df=pd.DataFrame(data)`
Ri_0=input("Insert the internal minimum radius value: ")
Rj_max=input("Insert the external maximum radius value: ")
Bin= input("Insert the bin: ")
R_internals=range(Ri_0,Rj_max+1,Bin)
Ri=list(R_internals)
Rj=[]
R=[]
for m in Ri:
R_externals=m+Bin
Rj.append(R_externals)
for d,f in zip(Ri,Rj):
R_average=(d+f)/2
R.append(R_average)
import zipfile
#Loops
count=0
#I think the problem is in this loop
for i,j in zip(Ri,Rj):
for r in df["radio"]:
if r >= i and r <= j:
d=df[df['radio']==r]['ParWeight'].iloc[0]
count=count+d
I have a problem adding the sum of the weights of the particles that fall within the internal radius ring Ri and external Rj and then adding it to an external file in the form of a data column and next to it the value of R that comes to be the average of Ri and Rj, I get a systematic error because it adds the value of the sum of weights of all the particles and does not separate them by ring, I attach the file "photons.txt" in the next link [https://drive.google.com/file/d/1YM0U3UN4p1OGvbiajZakMWtQSmyZoCkN/view?usp=sharing]
I was several days to trying to solve that problem.
Thanks so much.

Bryan, before function definition I am using your code and your data. Then my idea is to find where each radius is fall into in your Ri array. Once I find where each radius falls in I add up all weights per each group. Please let me know if you have any questions. Cheers.
import pandas as pd
import numpy as np
data_file = r"D:\workspace\projects\misc\data\photons.txt"
df = pd.read_csv(data_file, delimiter="\t")
Ri_0 = -20
Rj_max = 4000
Bin = 40
R_internals=range(Ri_0,Rj_max+1,Bin)
Ri=list(R_internals)
def getRadiusBoundary(x, listBoundaries):
""" This function will place a specific radius to its boundaries.
Indexes of external and internal radiuses are used
"""
for i in range(len(listBoundaries)):
if x <= listBoundaries[0]:
return 0
elif x > listBoundaries[len(listBoundaries) - 1]:
return len(listBoundaries)
idx = [i+1 for i in range(len(listBoundaries)-1) if listBoundaries[i] < x and x <= listBoundaries[i+1]]
return idx[0]
# this is a dictionary that maps radius groups to its boundaries
dictRadiusMapping = {i+1: [Ri[i], Ri[i+1]] for i in range(len(Ri)-1)}
dictRadiusMapping[0] = [-np.inf, Ri[0]]
dictRadiusMapping[len(Ri)] = [Ri[len(Ri) - 1], np.inf]
# here I am placing each radius into its appropriate group (finding its [external, internal] boundaries)
df["radius_group"] = df["radio"].apply(lambda x: getRadiusBoundary(x, Ri))
radius_sums = df.groupby("radius_group")["ParWeight"].sum().reset_index() # summing for each radius group
radius_sums.columns = ["radius_group", "weight_sum"]
radius_sums["radiuses"] = radius_sums["radius_group"].map(dictRadiusMapping)

Related

replace nested for loops combined with conditions to boost performance

In order to speed up my code I want to exchange my for loops by vectorization or other recommended tools. I found plenty of examples with replacing simple for loops but nothing for replacing nested for loops in combination with conditions, which I was able to comprehend / would have helped me...
With my code I want to check if points (X, Y coordinates) can be connected by lineaments (linear structures). I started pretty simple but over time the code outgrew itself and is now exhausting slow...
Here is an working example of the part taking the most time:
import numpy as np
import matplotlib.pyplot as plt
from shapely.geometry import MultiLineString, LineString, Point
from shapely.affinity import rotate
from math import sqrt
from tqdm import tqdm
import random as rng
# creating random array of points
xys = rng.sample(range(201 * 201), 100)
points = [list(divmod(xy, 201)) for xy in xys]
# plot points
plt.scatter(*zip(*points))
# calculate length for rotating lines -> diagonal of bounds so all points able to be reached
length = sqrt(2)*200
# calculate angles to rotate lines
angles = []
for a in range(0, 360, 1):
angle = np.deg2rad(a)
angles.append(angle)
# copy points array to helper array (points_list) so original array is not manipulated
points_list = points.copy()
# array to save final lines
lines = []
# iterate over every point in points array to search for connecting lines
for point in tqdm(points):
# delete point from helper array to speed up iteration -> so points do not get
# double, triple, ... checked
if len(points_list) > 0:
points_list.remove(point)
else:
break
# create line from original point to point at end of line (x+length) - this line
# gets rotated at calculated angles
start = Point(point)
end = Point(start.x+length, start.y)
line = LineString([start,end])
# iterate over angle Array to rotate line by each angle
for angle in angles:
rot_line = rotate(line, angle, origin=start, use_radians=True)
lst = list(rot_line.coords)
# save starting point (a) and ending point(b) of rotated line for np.cross()
# (cross product to check if points on/near rotated line)
a = np.asarray(lst[0])
b = np.asarray(lst[1])
# counter to count number of points on/near line
count = 0
line_list = []
# iterate manipulated points_list array (only points left for which there has
# not been a line rotated yet)
for poi in points_list:
# check whether point (pio) is on/near rotated line by calculating cross
# product (np.corss())
p = np.asarray(poi)
cross = np.cross(p-a,b-a)
# check if poi is inside accepted deviation from cross product
if cross > -750 and cross < 750:
# check if more than 5 points (poi) are on/near the rotated line
if count < 5:
line_list.append(poi)
count += 1
# if 5 points are connected by the rotated line sort the coordinates
# of the points and check if the length of the line meets the criteria
else:
line_list = sorted(line_list , key=lambda k: [k[1], k[0]])
line_length = LineString(line_list)
if line_length.length >= 10 and line_length.length <= 150:
lines.append(line_list)
break
# use shapeplys' MultiLineString to create lines from coordinates and plot them
# afterwards
multiLines = MultiLineString(lines)
fig, ax = plt.subplots()
ax.set_title("Lines")
for multiLine in MultiLineString(multiLines).geoms:
# print(multiLine)
plt.plot(*multiLine.xy)
As mentioned above it was thinking about using pandas or numpy vectorization and therefore build a pandas df for the points and lines (gdf) and one with the different angles (angles) to rotate the lines:
Name
Type
Size
Value
gdf
DataFrame
(122689, 6)
Column name: x, y, value, start, end, line
angles
DataFrame
(360, 1)
Column name: angle
But I ran out of ideas to replace this nested for loops with conditions with pandas vectorization. I found this article on medium and halfway through the article there are conditions for vectorization mentioned and I was wondering if my code maybe is not suitbale for vectorization because of dependencies within the loops...
If this is right, it does not necessarily needs to be vectoriation everything boosting the performance is welcome!
You can quite easily vectorize the most computationally intensive part: the innermost loop. The idea is to compute the points_list all at once. np.cross can be applied on each lines, np.where can be used to filter the result (and get the IDs).
Here is the (barely tested) modified main loop:
for point in tqdm(points):
if len(points_list) > 0:
points_list.remove(point)
else:
break
start = Point(point)
end = Point(start.x+length, start.y)
line = LineString([start,end])
# CHANGED PART
if len(points_list) == 0:
continue
p = np.asarray(points_list)
for angle in angles:
rot_line = rotate(line, angle, origin=start, use_radians=True)
a, b = np.asarray(rot_line.coords)
cross = np.cross(p-a,b-a)
foundIds = np.where((cross > -750) & (cross < 750))[0]
if foundIds.size > 5:
# Similar to the initial part, not efficient, but rarely executed
line_list = p[foundIds][:5].tolist()
line_list = sorted(line_list, key=lambda k: [k[1], k[0]])
line_length = LineString(line_list)
if line_length.length >= 10 and line_length.length <= 150:
lines.append(line_list)
This is about 15 times faster on my machine.
Most of the time is spent in the shapely module which is very inefficient (especially rotate and even np.asarray(rot_line.coords)). Indeed, each call to rotate takes about 50 microseconds which is simply insane: it should take no more than 50 nanoseconds, that is, 1000 time faster (actually, an optimized native code should be able to to that in less than 20 ns on my machine). If you want a faster code, then please consider not using this package (or improving its performance).

Counterclockwise sorting of x, y data

I have a set of points in a text file: random_shape.dat.
The initial order of points in the file is random. I would like to sort these points in a counter-clockwise order as follows (the red dots are the xy data):
I tried to achieve that by using the polar coordinates: I calculate the polar angle of each point (x,y) then sort by the ascending angles, as follows:
"""
Script: format_file.py
Description: This script will format the xy data file accordingly to be used with a program expecting CCW order of data points, By soting the points in Counterclockwise order
Example: python format_file.py random_shape.dat
"""
import sys
import numpy as np
# Read the file name
filename = sys.argv[1]
# Get the header name from the first line of the file (without the newline character)
with open(filename, 'r') as f:
header = f.readline().rstrip('\n')
angles = []
# Read the data from the file
x, y = np.loadtxt(filename, skiprows=1, unpack=True)
for xi, yi in zip(x, y):
angle = np.arctan2(yi, xi)
if angle < 0:
angle += 2*np.pi # map the angle to 0,2pi interval
angles.append(angle)
# create a numpy array
angles = np.array(angles)
# Get the arguments of sorted 'angles' array
angles_argsort = np.argsort(angles)
# Sort x and y
new_x = x[angles_argsort]
new_y = y[angles_argsort]
print("Length of new x:", len(new_x))
print("Length of new y:", len(new_y))
with open(filename.split('.')[0] + '_formatted.dat', 'w') as f:
print(header, file=f)
for xi, yi in zip(new_x, new_y):
print(xi, yi, file=f)
print("Done!")
By running the script:
python format_file.py random_shape.dat
Unfortunately I don't get the expected results in random_shape_formated.dat! The points are not sorted in the desired order.
Any help is appreciated.
EDIT: The expected resutls:
Create a new file named: filename_formatted.dat that contains the sorted data according to the image above (The first line contains the starting point, the next lines contain the points as shown by the blue arrows in counterclockwise direction in the image).
EDIT 2: The xy data added here instead of using github gist:
random_shape
0.4919261070361315 0.0861956168831175
0.4860816807027076 -0.06601587301587264
0.5023029456281289 -0.18238249845392662
0.5194784026079869 0.24347943722943777
0.5395164357511545 -0.3140611471861465
0.5570497147514262 0.36010146103896146
0.6074231036252226 -0.4142604617604615
0.6397066014669927 0.48590810704447085
0.7048302091822873 -0.5173701298701294
0.7499157837544145 0.5698170011806378
0.8000108666123336 -0.6199254449254443
0.8601249660418364 0.6500974025974031
0.9002010323281716 -0.7196585989767801
0.9703341483292582 0.7299242424242429
1.0104102146155935 -0.7931355765446666
1.0805433306166803 0.8102046438410078
1.1206193969030154 -0.865251869342778
1.1907525129041021 0.8909386068476981
1.2308285791904374 -0.9360074773711129
1.300961695191524 0.971219008264463
1.3410377614778592 -1.0076702085792988
1.4111708774789458 1.051499409681228
1.451246943765281 -1.0788793781975592
1.5213800597663678 1.1317798110979933
1.561456126052703 -1.1509956709956706
1.6315892420537896 1.2120602125147582
1.671665308340125 -1.221751279024005
1.7417984243412115 1.2923406139315234
1.7818744906275468 -1.2943211334120424
1.8520076066286335 1.3726210153482883
1.8920836729149686 -1.3596340023612745
1.9622167889160553 1.4533549783549786
2.0022928552023904 -1.4086186540731989
2.072425971203477 1.5331818181818184
2.1125020374898122 -1.451707005116095
2.182635153490899 1.6134622195985833
2.2227112197772345 -1.4884454939000387
2.292844335778321 1.6937426210153486
2.3329204020646563 -1.5192876820149541
2.403053518065743 1.774476584022039
2.443129584352078 -1.5433264462809912
2.513262700353165 1.8547569854388037
2.5533387666395 -1.561015348288075
2.6234718826405867 1.9345838252656438
2.663547948926922 -1.5719008264462806
2.7336810649280086 1.9858362849271942
2.7737571312143436 -1.5750757575757568
2.8438902472154304 2.009421487603306
2.883966313501766 -1.5687258953168035
2.954099429502852 2.023481896890988
2.9941754957891877 -1.5564797323888229
3.0643086117902745 2.0243890200708385
3.1043846780766096 -1.536523022432113
3.1745177940776963 2.0085143644234558
3.2145938603640314 -1.5088557654466737
3.284726976365118 1.9749508067689887
3.324803042651453 -1.472570838252656
3.39493615865254 1.919162731208186
3.435012224938875 -1.4285753640299088
3.5051453409399618 1.8343467138921687
3.545221407226297 -1.3786835891381335
3.6053355066557997 1.7260966810966811
3.655430589513719 -1.3197205824478546
3.6854876392284703 1.6130086580086582
3.765639771801141 -1.2544077134986225
3.750611246943765 1.5024152236652237
3.805715838087476 1.3785173160173163
3.850244800627849 1.2787337662337666
3.875848954088563 -1.1827449822904361
3.919007794704616 1.1336638361638363
3.9860581363759846 -1.1074537583628485
3.9860581363759846 1.0004485329485333
4.058012891753723 0.876878197560016
4.096267318663407 -1.0303482880755608
4.15638141809291 0.7443374218374221
4.206476500950829 -0.9514285714285711
4.256571583808748 0.6491902794175526
4.3166856832382505 -0.8738695395513574
4.36678076609617 0.593855765446675
4.426894865525672 -0.7981247540338443
4.476989948383592 0.5802489177489183
4.537104047813094 -0.72918339236521
4.587199130671014 0.5902272727272733
4.647313230100516 -0.667045454545454
4.697408312958435 0.6246979535615904
4.757522412387939 -0.6148858717040526
4.807617495245857 0.6754968516332154
4.8677315946753605 -0.5754260133805582
4.917826677533279 0.7163173947264858
4.977940776962782 -0.5500265643447455
5.028035859820701 0.7448917748917752
5.088149959250204 -0.5373268398268394
5.138245042108123 0.7702912239275879
5.198359141537626 -0.5445838252656432
5.2484542243955445 0.7897943722943728
5.308568323825048 -0.5618191656828015
5.358663406682967 0.8052154663518301
5.41877750611247 -0.5844972451790631
5.468872588970389 0.8156473829201105
5.5289866883998915 -0.6067217630853987
5.579081771257811 0.8197294372294377
5.639195870687313 -0.6248642266824076
5.689290953545233 0.8197294372294377
5.749405052974735 -0.6398317591499403
5.799500135832655 0.8142866981503349
5.859614235262157 -0.6493565525383702
5.909709318120076 0.8006798504525783
5.969823417549579 -0.6570670995670991
6.019918500407498 0.7811767020857934
6.080032599837001 -0.6570670995670991
6.13012768269492 0.7562308146399057
6.190241782124423 -0.653438606847697
6.240336864982342 0.7217601338055886
6.300450964411845 -0.6420995670995664
6.350546047269764 0.6777646595828419
6.410660146699267 -0.6225964187327819
6.4607552295571855 0.6242443919716649
6.520869328986689 -0.5922077922077915
6.570964411844607 0.5548494687131056
6.631078511274111 -0.5495730027548205
6.681173594132029 0.4686727666273125
6.7412876935615325 -0.4860743801652889
6.781363759847868 0.3679316979316982
6.84147785927737 -0.39541245791245716
6.861515892420538 0.25880333951762546
6.926639500135833 -0.28237987012986965
6.917336127605076 0.14262677798392165
6.946677533279001 0.05098957832291173
6.967431210462995 -0.13605442176870675
6.965045730326905 -0.03674603174603108
I find that an easy way to sort points with x,y-coordinates like that is to sort them dependent on the angle between the line from the points and the center of mass of the whole polygon and the horizontal line which is called alpha in the example. The coordinates of the center of mass (x0 and y0) can easily be calculated by averaging the x,y coordinates of all points. Then you calculate the angle using numpy.arccos for instance. When y-y0 is larger than 0 you take the angle directly, otherwise you subtract the angle from 360° (2𝜋). I have used numpy.where for the calculation of the angle and then numpy.argsort to produce a mask for indexing the initial x,y-values. The following function sort_xy sorts all x and y coordinates with respect to this angle. If you want to start from any other point you could add an offset angle for that. In your case that would be zero though.
def sort_xy(x, y):
x0 = np.mean(x)
y0 = np.mean(y)
r = np.sqrt((x-x0)**2 + (y-y0)**2)
angles = np.where((y-y0) > 0, np.arccos((x-x0)/r), 2*np.pi-np.arccos((x-x0)/r))
mask = np.argsort(angles)
x_sorted = x[mask]
y_sorted = y[mask]
return x_sorted, y_sorted
Plotting x, y before sorting using matplotlib.pyplot.plot (points are obvisously not sorted):
Plotting x, y using matplotlib.pyplot.plot after sorting with this method:
If it is certain that the curve does not cross the same X coordinate (i.e. any vertical line) more than twice, then you could visit the points in X-sorted order and append a point to one of two tracks you follow: to the one whose last end point is the closest to the new one. One of these tracks will represent the "upper" part of the curve, and the other, the "lower" one.
The logic would be as follows:
dist2 = lambda a,b: (a[0]-b[0])*(a[0]-b[0]) + (a[1]-b[1])*(a[1]-b[1])
z = list(zip(x, y)) # get the list of coordinate pairs
z.sort() # sort by x coordinate
cw = z[0:1] # first point in clockwise direction
ccw = z[1:2] # first point in counter clockwise direction
# reverse the above assignment depending on how first 2 points relate
if z[1][1] > z[0][1]:
cw = z[1:2]
ccw = z[0:1]
for p in z[2:]:
# append to the list to which the next point is closest
if dist2(cw[-1], p) < dist2(ccw[-1], p):
cw.append(p)
else:
ccw.append(p)
cw.reverse()
result = cw + ccw
This would also work for a curve with steep fluctuations in the Y-coordinate, for which an angle-look-around from some central point would fail, like here:
No assumption is made about the range of the X nor of the Y coordinate: like for instance, the curve does not necessarily have to cross the X axis (Y = 0) for this to work.
Counter-clock-wise order depends on the choice of a pivot point. From your question, one good choice of the pivot point is the center of mass.
Something like this:
# Find the Center of Mass: data is a numpy array of shape (Npoints, 2)
mean = np.mean(data, axis=0)
# Compute angles
angles = np.arctan2((data-mean)[:, 1], (data-mean)[:, 0])
# Transform angles from [-pi,pi] -> [0, 2*pi]
angles[angles < 0] = angles[angles < 0] + 2 * np.pi
# Sort
sorting_indices = np.argsort(angles)
sorted_data = data[sorting_indices]
Not really a python question I think, but still I think you could try sorting by - sign(y) * x doing something like:
def counter_clockwise_sort(points):
return sorted(points, key=lambda point: point['x'] * (-1 if point['y'] >= 0 else 1))
should work fine, assuming you read your points properly into a list of dicts of format {'x': 0.12312, 'y': 0.912}
EDIT: This will work as long as you cross the X axis only twice, like in your example.
If:
the shape is arbitrarily complex and
the point spacing is ~random
then I think this is a really hard problem.
For what it's worth, I have faced a similar problem in the past, and I used a traveling salesman solver. In particular, I used the LKH solver. I see there is a Python repo for solving the problem, LKH-TSP. Once you have an order to the points, I don't think it will be too hard to decide on a clockwise vs clockwise ordering.
If we want to answer your specific problem, we need to pick a pivot point.
Since you want to sort according to the starting point you picked, I would take a pivot in the middle (x=4,y=0 will do).
Since we're sorting counterclockwise, we'll take arctan2(-(y-pivot_y),-(x-center_x)) (we're flipping the x axis).
We get the following, with a gradient colored scatter to prove correctness (fyi I removed the first line of the dat file after downloading):
import numpy as np
import matplotlib.pyplot as plt
points = np.loadtxt('points.dat')
#oneliner for ordering points (transform, adjust for 0 to 2pi, argsort, index at points)
ordered_points = points[np.argsort(np.apply_along_axis(lambda x: np.arctan2(-x[1],-x[0]+4) + np.pi*2, axis=1,arr=points)),:]
#color coding 0-1 as str for gray colormap in matplotlib
plt.scatter(ordered_points[:,0], ordered_points[:,1],c=[str(x) for x in np.arange(len(ordered_points)) / len(ordered_points)],cmap='gray')
Result (in the colormap 1 is white and 0 is black), they're numbered in the 0-1 range by order:
For points with comparable distances between their neighbouring pts, we can use KDTree to get two closest pts for each pt. Then draw lines connecting those to give us a closed shape contour. Then, we will make use of OpenCV's findContours to get contour traced always in counter-clockwise manner. Now, since OpenCV works on images, we need to sample data from the provided float format to uint8 image format. Given, comparable distances between two pts, that should be pretty safe. Also, OpenCV handles it well to make sure it traces even sharp corners in curvatures, i.e. smooth or not-smooth data would work just fine. And, there's no pivot requirement, etc. As such all kinds of shapes would be good to work with.
Here'e the implementation -
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial.distance import pdist
from scipy.spatial import cKDTree
import cv2
from scipy.ndimage.morphology import binary_fill_holes
def counter_clockwise_order(a, DEBUG_PLOT=False):
b = a-a.min(0)
d = pdist(b).min()
c = np.round(2*b/d).astype(int)
img = np.zeros(c.max(0)[::-1]+1, dtype=np.uint8)
d1,d2 = cKDTree(c).query(c,k=3)
b = c[d2]
p1,p2,p3 = b[:,0],b[:,1],b[:,2]
for i in range(len(b)):
cv2.line(img,tuple(p1[i]),tuple(p2[i]),255,1)
cv2.line(img,tuple(p1[i]),tuple(p3[i]),255,1)
img = (binary_fill_holes(img==255)*255).astype(np.uint8)
if int(cv2.__version__.split('.')[0])>=3:
_,contours,hierarchy = cv2.findContours(img.copy(),cv2.RETR_TREE,cv2.CHAIN_APPROX_NONE)
else:
contours,hierarchy = cv2.findContours(img.copy(),cv2.RETR_TREE,cv2.CHAIN_APPROX_NONE)
cont = contours[0][:,0]
f1,f2 = cKDTree(cont).query(c,k=1)
ordered_points = a[f2.argsort()[::-1]]
if DEBUG_PLOT==1:
NPOINTS = len(ordered_points)
for i in range(NPOINTS):
plt.plot(ordered_points[i:i+2,0],ordered_points[i:i+2,1],alpha=float(i)/(NPOINTS-1),color='k')
plt.show()
return ordered_points
Sample run -
# Load data in a 2D array with 2 columns
a = np.loadtxt('random_shape.csv',delimiter=' ')
ordered_a = counter_clockwise_order(a, DEBUG_PLOT=1)
Output -

Self Avoiding Random Walk in a 3d lattice using pivot algorithm in python

I was working on a problem from past few days, It is related to creating a self avoiding random walks using pivot algorithm and then to implement another code which places spherical inclusions in a 3d lattice. I have written two codes one code generates coordinates of the spheres by using Self avoiding random walk algorithm and then another code uses the co-ordinates generated by the first program and then creates a 3d lattice in abaqus with spherical inclusions in it.
The result which will be generated after running the codes:
Now the zoomed in portion (marked in red boundary)
Problem: The spheres generated are merging with each other, How to avoid this, i ran out of ideas. I just need some direction or algorithm to work on.
The code which i wrote is as follows:
code 1: generates the coordinates of the spheres
# Functions of the code: 1) This code generates the coordinates for the spherical fillers using self avoiding random walk which was implemented by pivot algorithm
# Algorithm of the code: 1)Prepare an initial configuration of a N steps walk on lattice(equivalent N monomer chain)
# 2)Randomly pick a site along the chain as pivot site
# 3)Randomly pick a side(right to the pivot site or left to it), the chain on this side is used for the next step.
# 4)Randomly apply a rotate operation on the part of the chain we choose at the above step.
# 5)After the rotation, check the overlap between the rotated part of the chain and the rest part of the chain.
# 6)Accept the new configuration if there is no overlap and restart from 2th step.
# 7)Reject the configuration and repeat from 2th step if there are overlaps.
################################################################################################################################################################
# Modules to be imported are below
import numpy as np
import timeit
from scipy.spatial.distance import cdist
import math
import random
import sys
import ast
# define a dot product function used for the rotate operation
def v_dot(a):return lambda b: np.dot(a,b)
def distance(x, y): #The Euclidean Distance to Spread the spheres initiallly
if len(x) != len(y):
raise ValueError, "vectors must be same length"
sum = 0
for i in range(len(x)):
sum += (x[i]-y[i])**2
return math.sqrt(sum)
x=1/math.sqrt(2)
class lattice_SAW: # This is the class which creates the self avoiding random walk coordinates
def __init__(self,N,l0):
self.N = N #No of spheres
self.l0 = l0 #distance between spheres
# initial configuration. Usually we just use a straight chain as inital configuration
self.init_state = np.dstack((np.arange(N),np.zeros(N),np.zeros(N)))[0] #initially set all the coordinates to zeros
self.state = self.init_state.copy()
# define a rotation matrix
# 9 possible rotations: 3 axes * 3 possible rotate angles(45,90,135,180,225,270,315)
self.rotate_matrix = np.array([[[1,0,0],[0,0,-1],[0,1,0]],[[1,0,0],[0,-1,0],[0,0,-1]]
,[[1,0,0],[0,0,1],[0,-1,0]],[[0,0,1],[0,1,0],[-1,0,0]]
,[[-1,0,0],[0,1,0],[0,0,-1]],[[0,0,-1],[0,1,0],[-1,0,0]]
,[[0,-1,0],[1,0,0],[0,0,1]],[[-1,0,0],[0,-1,0],[0,0,1]]
,[[0,1,0],[-1,0,0],[0,0,1]],[[x,-x,0],[x,x,0],[0,0,1]]
,[[1,0,0],[0,x,-x],[0,x,x]]
,[[x,0,x],[0,1,0],[-x,0,x]],[[-x,-x,0],[x,-x,0],[0,0,1]]
,[[1,0,0],[0,-x,-x],[0,x,-x]],[[-x,0,x],[0,1,0],[-x,0,-x]]
,[[-x,x,0],[-x,-x,0],[0,0,1]],[[1,0,0],[0,-x,x],[0,-x,-x]]
,[[-x,0,-x],[0,1,0],[x,0,-x]],[[x,x,0],[-x,x,0],[0,0,1]]
,[[1,0,0],[0,x,x],[0,-x,x]],[[x,0,-x],[0,1,0],[x,0,x]]])
# define pivot algorithm process where t is the number of successful steps
def walk(self,t): #this definitions helps to start walking in 3d
acpt = 0
# while loop until the number of successful step up to t
while acpt <= t:
pick_pivot = np.random.randint(1,self.N-1) # pick a pivot site
pick_side = np.random.choice([-1,1]) # pick a side
if pick_side == 1:
old_chain = self.state[0:pick_pivot+1]
temp_chain = self.state[pick_pivot+1:]
else:
old_chain = self.state[pick_pivot:] # variable to store the coordinates of the old chain
temp_chain = self.state[0:pick_pivot]# for the temp chain
# pick a symmetry operator
symtry_oprtr = self.rotate_matrix[np.random.randint(len(self.rotate_matrix))]
# new chain after symmetry operator
new_chain = np.apply_along_axis(v_dot(symtry_oprtr),1,temp_chain - self.state[pick_pivot]) + self.state[pick_pivot]
# use cdist function of scipy package to calculate the pair-pair distance between old_chain and new_chain
overlap = cdist(new_chain,old_chain) #compare the old chain and the new chain to check if the new chain overlaps on any previous postion
overlap = overlap.flatten() # just to combine the coordinates in a list which will help to check the overlap
# determinte whether the new state is accepted or rejected
if len(np.nonzero(overlap)[0]) != len(overlap):
continue
else:
if pick_side == 1:
self.state = np.concatenate((old_chain,new_chain),axis=0)
elif pick_side == -1:
self.state = np.concatenate((new_chain,old_chain),axis=0)
acpt += 1
# place the center of mass of the chain on the origin, so then we can translate these coordinates to the initial spread spheres
self.state = self.l0*(self.state - np.int_(np.mean(self.state,axis=0)))
#now write the coordinates of the spheres in a file
sys.stdout = open("C:\Users\haris\Desktop\CAME_MINE\HIWI\Codes\Temp\Sphere_Positions.txt", "w")
n = 30
#text_file.write("Number of Spheres: " + str(n) + "\n")
#Radius of one sphere: is calculated based on the volume fraction here it 2 percent
r = (3*0.40)/(4*math.pi*n)
#print r
r = r**(1.0/3.0)
#print "Sphere Radius is " + str(r)
sphereList = [] # list to maintain track of the Spheres using their positions
sphereInstancesList = [] # to maintain track of the instances which will be used to cut and or translate the spheres
sphere_pos=[]
#Create n instances of the sphere
#After creating n instances of the sphere we trie to form cluster around each instance of the sphere
for i in range(1, n+1):
InstanceName = 'Sphere_' + str(i) # creating a name for the instance of the sphere
#print InstanceName
#text_file.write(InstanceName)
#Maximum tries to distribute sphere
maxTries = 10000000
while len(sphereList) < i:
maxTries -= 1
if maxTries < 1:
print "Maximum Distribution tries exceded. Error! Restart the Script!"
break;
# Make sure Spheres dont cut cube sides: this will place the spheres well inside the cube so that they does'nt touch the sides of the cube
# This is done to avoid the periodic boundary condition: later in the next versions it will be used
vecPosition = [(2*r)+(random.random()*(10.0-r-r-r)),(2*r)+(random.random()*(10.0-r-r-r)),(2*r)+(random.random()*(10.0-r-r-r))]
sphere_pos.append(vecPosition)
for pos in sphereList:
if distance(pos, vecPosition) < 2*r: # checking whether the spheres collide or not
break
else:
sphereList.append(vecPosition)
print vecPosition
#text_file.write(str(vecPosition) + "\n")
cluster_Size=[10,12,14,16,18,20,22,24,26,28,30] #list to give the random number of spheres which forms a cluster
for i in range(1,n+1):
Number_of_Spheres_Clustered=random.choice(cluster_Size) #selecting randomly from the list cluster_Size
radius_sphr=2*r #Distance between centers of the spheres
pivot_steps=1000 # for walking the max number of steps
chain = lattice_SAW(Number_of_Spheres_Clustered,radius_sphr) #initializing the object
chain.walk(pivot_steps) # calling the walk function to walk in the 3d space
co_ordinates=chain.state # copying the coordinates into a new variable for processing and converting the coordinates into lists
for c in range(len(co_ordinates)):
temp_cds=co_ordinates[c]
temp_cds=list(temp_cds)
for k in range(len(temp_cds)):
temp_cds[k]=temp_cds[k]+sphere_pos[i-1][k]
#text_file.write(str(temp_cds) + "\n")
print temp_cds #sys.stdout redirected into the file which stores the coordinates as lists
sys.stdout.flush()
f2=open("C:\Users\haris\Desktop\CAME_MINE\HIWI\Codes\Temp\Sphere_Positions.txt", "r")
remove_check=[]
for line in f2:
temp_check=ast.literal_eval(line)
if (temp_check[0]>10 or temp_check[0]<-r or temp_check[1]>10 or temp_check[1]<-r or temp_check[2]>10 or temp_check[2]<-r):
remove_check.append(str(temp_check))
f2.close()
flag=0
f2=open("C:\Users\haris\Desktop\CAME_MINE\HIWI\Codes\Temp\Sphere_Positions.txt", "r")
f3=open("C:\Users\haris\Desktop\CAME_MINE\HIWI\Codes\Temp\Sphere_Positions_corrected.txt", "w")
for line in f2:
line=line.strip()
if any(line in s for s in remove_check):
flag=flag+1
else:
f3.write(line+'\n')
f3.close()
f2.close()
The other code would not be required because there is no geometry computation in the second code. Any help or some direction of how to avoid the collision of spheres is very helpful, Thank you all
To accommodate non-intersecting spheres with turns of 45,135,225,315 (really, only 45 and 315 are issues), you just need to make your spheres a little bit smaller. Take 3 consecutive sphere-centers, with a 45-degree turn in the middle. In the plane containing the 3 points, that makes an isosceles triangle, with a 45-degree center angle:
Note that the bottom circles (spheres) overlap. To avoid this, you shrink the radius by multiplying by a factor of 0.76:

Indexing and vectorizing a nested-loop

I am running a particular script that will calculate the fractal dimension of the input data. While the script does run fine, it is very slow, and a look into it using cProfile showed that the function boxcount is accounting for around 90% of the run time. I have had similar issues in a previous questions,More efficient way to loop?, and Vectorization of a nested for-loop. While looking at cProfile, the function itself does not run slow, but in the script is needs to be called a large number of times. I'm struggling to find a way to re-write this to eliminate the large number of function calls. Here is the code below:
for j in range(starty, endy):
jmin=j-half_tile
jmax=j+half_tile+1
# Loop over columns
for i in range(startx, endx):
imin=i-half_tile
imax=i+half_tile+1
# Extract a subset of points from the input grid, centered on the current
# point. The size of tile is given by the current entry of the tile list.
z = surface[imin:imax, jmin:jmax]
# print 'Tile created. Size:', z.shape
# Calculate fractal dimension of the tile using 3D box-counting
fd, intercept = boxcount(z,dx,nside,cell,slice_size,box_size)
FractalDim[i,j] = fd
Lacunarity[i,j] = intercept
My real problem is that for each loop through i,j, it finds the values of imin,imax,jmin,jmax, which is basically creating a subset of the input data, centered around the values of imin,imax,jmin,jmax. The function of interest, boxcount is evaluated over the range of imin,imax,jmin,jmax as well. For this example, the value of half_tile is 6, and the values for starty,endy,startx,endx are 6,271,5,210 respectively. The values of dx,cell,nside,slice_size,box_size are all just constants used in the boxcount function.
I have done problems similar to this, just not with the added complication of centering the slice of data around a particular point. Can this be vectorized? or improved at all?
EDIT
Here is the code for the function boxcount as requested.
def boxcount(z,dx,nside,cell,slice_size,box_size):
# fractal dimension calculation using box-counting method
n = 5 # number of graph points for simple linear regression
gx = [] # x coordinates of graph points
gy = [] # y coordinates of graph points
boxCount = np.zeros((5))
cell_set = np.reshape(np.zeros((5*(nside**3))), (nside**3,5))
nslice=nside**2
# Box is centered at the mid-point of the tile. Calculate for each point in the
# tile, which voxel the contains the point
z0 = z[nside/2,nside/2]-dx*nside/2
for j in range(1,13):
for i in range(1,13):
ij = (j-1)*12 + i
# print 'i, j:', i, j
delz1 = z[i-1,j-1]-z0
delz2 = z[i-1,j]-z0
delz3 = z[i,j-1]-z0
delz4 = z[i,j]-z0
delz = 0.25*(delz1+delz2+delz3+delz4)
if delz < 0.0:
break
slice = ceil(delz)
# print " delz:",delz," slice:",slice
# Identify the voxel occupied by current point
ijk = int(slice-1.)*nslice + (j-1)*nside + i
for k in range(5):
if cell_set[cell[ijk,k],k] != 1:
cell_set[cell[ijk,k],k] = 1
# Set any cells deeper than this one equal to one aswell
# index = cell[ijk,k]
# for l in range(int(index),box_size[k],slice_size[k]):
# cell_set[l,k] = 1
# Count number of filled boxes for each box size
boxCount = np.sum(cell_set,axis=0)
# print "boxCount:", boxCount
for ib in range(1,n+1):
# print "ib:",ib," x(ib):",math.log(1.0/ib)," y(ib):",math.log(boxCount[ib-1])
gx.append( math.log(1.0/ib) )
gy.append( math.log(boxCount[ib-1]) )
# simple linear regression
m, b = np.polyfit(gx,gy,1)
# print "Polyfit: Slope:", m,' Intercept:', b
# fd = m-1
fd = max(2.,m)
return(fd,b)

python - combining argsort with masking to get nearest values in moving window

I have some code for calculating missing values in an image, based on neighbouring values in a 2D circular window. It also uses the values from one or more temporally-adjacent images at the same locations (i.e. the same 2D window shifted in the 3rd dimension).
For each position that is missing, I need to calculate the value based not necessarily on all the values available in the whole window, but only on the spatially-nearest n cells that do have values (in both images / Z-axis positions), where n is some value less than the total number of cells in the 2D window.
At the minute, it's much quicker to calculate for everything in the window, because my means of sorting to get the nearest n cells with data is the slowest part of the function as it has to be repeated each time even though the distances in terms of window coordinates do not change. I'm not sure this is necessary and feel I must be able to get the sorted distances once, and then mask those in the process of only selecting available cells.
Here's my code for selecting the data to use within a window of the gap cell location:
# radius will in reality be ~100
radius = 2
y,x = np.ogrid[-radius:radius+1, -radius:radius+1]
dist = np.sqrt(x**2 + y**2)
circle_template = dist > radius
# this will in reality be a very large 3 dimensional array
# representing daily images with some gaps, indicated by 0s
dataStack = np.zeros((2,5,5))
dataStack[1] = (np.random.random(25) * 100).reshape(dist.shape)
dataStack[0] = (np.random.random(25) * 100).reshape(dist.shape)
testdata = dataStack[1]
alternatedata = dataStack[0]
random_gap_locations = (np.random.random(25) * 30).reshape(dist.shape) > testdata
testdata[random_gap_locations] = 0
testdata[radius, radius] = 0
# in reality we will go through every gap (zero) location in the data
# for each image and for each gap use slicing to get a window of
# size (radius*2+1, radius*2+1) around it from each image, with the
# gap being at the centre i.e.
# testgaplocation = [radius, radius]
# and the variables testdata, alternatedata below will refer to these
# slices
locations_to_exclude = np.logical_or(circle_template, np.logical_or
(testdata==0, alternatedata==0))
# the places that are inside the circular mask and where both images
# have data
locations_to_include = ~locations_to_exclude
number_available = np.count_nonzero(locations_to_include)
# we only want to do the interpolation calculations from the nearest n
# locations that have data available, n will be ~100 in reality
number_required = 3
available_distances = dist[locations_to_include]
available_data = testdata[locations_to_include]
available_alternates = alternatedata[locations_to_include]
if number_available > number_required:
# In this case we need to find the closest number_required of elements, based
# on distances recorded in dist, from available_data and available_alternates
# Having to repeat this argsort for each gap cell calculation is slow and feels
# like it should be avoidable
sortedDistanceIndices = available_distances.argsort(kind = 'mergesort',axis=None)
requiredIndices = sortedDistanceIndices[0:number_required]
selected_data = np.take(available_data, requiredIndices)
selected_alternates = np.take(available_alternates , requiredIndices)
else:
# we just use available_data and available_alternates as they are...
# now do stuff with the selected data to calculate a value for the gap cell
This works, but over half of the total time of the function is taken in the argsort of the masked spatial distance data. (~900uS of a total 1.4mS - and this function will be running tens of billions of times, so this is an important difference!)
I am sure that I must be able to just do this argsort once outside of the function, when the spatial distance window is originally set up, and then include those sort indices in the masking, to get the first howManyToCalculate indices without having to re-do the sort. The answer might involve putting the various bits that we are extracting from, into a record array - but I can't figure out how, if so. Can anyone see how I can make this part of the process more efficient?
So you want to do the sorting outside of the loop:
sorted_dist_idcs = dist.argsort(kind='mergesort', axis=None)
Then using some variables from the original code, this is what I could come up with, though it still feels like a major round-trip..
loc_to_incl_sorted = locations_to_include.take(sorted_dist_idcs)
sorted_dist_idcs_to_incl = sorted_dist_idcs[loc_to_incl_sorted]
required_idcs = sorted_dist_idcs_to_incl[:number_required]
selected_data = testdata.take(required_idcs)
selected_alternates = alternatedata.take(required_idcs)
Note the required_idcs refer to locations in the testdata and not available_data as in the original code. And this snippet I used take for the purpose of conveniently indexing the flattened array.
#moarningsun - thanks for the comment and answer. These got me on the right track, but don't quite work for me when the gap is < radius from the edge of the data: in this case I use a window around the gap cell which is "trimmed" to the data bounds. In this situation the indices reflect the "full" window and thus can't be used to select cells from the bounded window.
Unfortunately I edited that part of my code out when I clarified the original question but it's turned out to be relevant.
I've realised now that if you use argsort again on the output of argsort then you get ranks; i.e. the position that each item would have when the overall array was sorted. We can safely mask these and then take the smallest number_required of them (and do this on a structured array to get the corresponding data at the same time).
This implies another sort within the loop, but in fact we can use partitioning rather than a full sort, because all we need is the smallest num_required items. If num_required is substantially less than the number of data items then this is much faster than doing the argsort.
For example with num_required = 80 and num_available = 15000 the full argsort takes ~900µs whereas argpartition followed by index and slice to get the first 80 takes ~110µs. We still need to do the argsort to get the ranks at the outset (rather than just partitioning based on distance) in order to get the stability of the mergesort, and thus get the "right one" when distance is not unique.
My code as shown below now runs in ~610uS on real data, including the actual calculations that aren't shown here. I'm happy with that now, but there seem to be several other apparently minor factors that can have an influence on the runtime that's hard to understand.
For example putting the circle_template in the structured array along with dist, ranks, and another field not shown here, doubles the runtime of the overall function (even if we don't access circle_template in the loop!). Even worse, using np.partition on the structured array with order=['ranks'] increases the overall function runtime by almost two orders of magnitude vs using np.argpartition as shown below!
# radius will in reality be ~100
radius = 2
y,x = np.ogrid[-radius:radius+1, -radius:radius+1]
dist = np.sqrt(x**2 + y**2)
circle_template = dist > radius
ranks = dist.argsort(axis=None,kind='mergesort').argsort().reshape(dist.shape)
diam = radius * 2 + 1
# putting circle_template in this array too doubles overall function runtime!
fullWindowArray = np.zeros((diam,diam),dtype=[('ranks',ranks.dtype.str),
('thisdata',dayDataStack.dtype.str),
('alternatedata',dayDataStack.dtype.str),
('dist',spatialDist.dtype.str)])
fullWindowArray['ranks'] = ranks
fullWindowArray['dist'] = dist
# this will in reality be a very large 3 dimensional array
# representing daily images with some gaps, indicated by 0s
dataStack = np.zeros((2,5,5))
dataStack[1] = (np.random.random(25) * 100).reshape(dist.shape)
dataStack[0] = (np.random.random(25) * 100).reshape(dist.shape)
testdata = dataStack[1]
alternatedata = dataStack[0]
random_gap_locations = (np.random.random(25) * 30).reshape(dist.shape) > testdata
testdata[random_gap_locations] = 0
testdata[radius, radius] = 0
# in reality we will loop here to go through every gap (zero) location in the data
# for each image
gapz, gapy, gapx = 1, radius, radius
desLeft, desRight = gapx - radius, gapx + radius+1
desTop, desBottom = gapy - radius, gapy + radius+1
extentB, extentR = dataStack.shape[1:]
# handle the case where the gap is < search radius from the edge of
# the data. If this is the case, we can't use the full
# diam * diam window
dataL = max(0, desLeft)
maskL = 0 if desLeft >= 0 else abs(dataL - desLeft)
dataT = max(0, desTop)
maskT = 0 if desTop >= 0 else abs(dataT - desTop)
dataR = min(desRight, extentR)
maskR = diam if desRight <= extentR else diam - (desRight - extentR)
dataB = min(desBottom,extentB)
maskB = diam if desBottom <= extentB else diam - (desBottom - extentB)
# get the slice that we will be working within
# ranks, dist and circle are already populated
boundedWindowArray = fullWindowArray[maskT:maskB,maskL:maskR]
boundedWindowArray['alternatedata'] = alternatedata[dataT:dataB, dataL:dataR]
boundedWindowArray['thisdata'] = testdata[dataT:dataB, dataL:dataR]
locations_to_exclude = np.logical_or(boundedWindowArray['circle_template'],
np.logical_or
(boundedWindowArray['thisdata']==0,
boundedWindowArray['alternatedata']==0))
# the places that are inside the circular mask and where both images
# have data
locations_to_include = ~locations_to_exclude
number_available = np.count_nonzero(locations_to_include)
# we only want to do the interpolation calculations from the nearest n
# locations that have data available, n will be ~100 in reality
number_required = 3
data_to_use = boundedWindowArray[locations_to_include]
if number_available > number_required:
# argpartition seems to be v fast when number_required is
# substantially < data_to_use.size
# But partition on the structured array itself with order=['ranks']
# is almost 2 orders of magnitude slower!
reqIndices = np.argpartition(data_to_use['ranks'],number_required)[:number_required]
data_to_use = np.take(data_to_use,reqIndices)
else:
# we just use available_data and available_alternates as they are...
pass
# now do stuff with the selected data to calculate a value for the gap cell

Categories