Generating multiple random (x, y) coordinates, excluding duplicates? - python

I want to generate a bunch of (x, y) coordinates from 0 to 2500, excluding points that are within 200 of each other, without recursion.
Right now I have it check through a list of all previous values to see if any are far enough from all the others. This is really inefficient and if I need to generate a large number of points it takes forever.
So how would I go about doing this?

This is a variant on Hank Ditton's suggestion that should be more efficient time- and memory-wise, especially if you're selecting relatively few points out of all possible points. The idea is that, whenever a new point is generated, everything within 200 units of it is added to a set of points to exclude, against which all freshly-generated points are checked.
import random
radius = 200
rangeX = (0, 2500)
rangeY = (0, 2500)
qty = 100 # or however many points you want
# Generate a set of all points within 200 of the origin, to be used as offsets later
# There's probably a more efficient way to do this.
deltas = set()
for x in range(-radius, radius+1):
    for y in range(-radius, radius+1):
        if x*x + y*y <= radius*radius:
            deltas.add((x,y))
randPoints = []
excluded = set()
i = 0
while i<qty:
    x = random.randrange(*rangeX)
    y = random.randrange(*rangeY)
    if (x,y) in excluded: continue
    randPoints.append((x,y))
    i += 1
    # everything within `radius` of the accepted point becomes off-limits
    excluded.update((x+dx, y+dy) for (dx,dy) in deltas)
print(randPoints)

I would overgenerate the points, target_N < input_N, and filter them using a KDTree. For example:
import numpy as np
from scipy.spatial import KDTree
N = 20
pts = 2500*np.random.random((N,2))
tree = KDTree(pts)
print(tree.sparse_distance_matrix(tree, 200))
Would give me points that are "close" to each other. From here it should be simple to apply any filter:
(11, 0) 60.843426339
(0, 11) 60.843426339
(1, 3) 177.853472309
(3, 1) 177.853472309
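One way such a filter could look, as a rough sketch rather than the answerer's own code: it uses `cKDTree.query_pairs` to list every pair of points that is too close and greedily drops the later point of each pair. The helper name `thin_points` and the choice of 400 candidates are assumptions made for illustration.

```python
import numpy as np
from scipy.spatial import cKDTree

def thin_points(pts, min_dist):
    """Greedily drop points until no two survivors are within min_dist."""
    tree = cKDTree(pts)
    keep = np.ones(len(pts), dtype=bool)
    # query_pairs returns index pairs (i, j) with i < j and distance <= min_dist
    for i, j in sorted(tree.query_pairs(min_dist)):
        if keep[i] and keep[j]:
            keep[j] = False  # keep the earlier point, discard the later one
    return pts[keep]

rng = np.random.default_rng(0)
pts = 2500 * rng.random((400, 2))  # overgenerate candidates
kept = thin_points(pts, 200)
print(len(kept), "points survive the filter")
```

The number of survivors is not known in advance, so in practice you would overgenerate, filter, and then take the first target_N survivors (or repeat with more candidates if too few remain).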

Some options:
Use your algorithm but implement it with a kd-tree that would speed up nearest neighbours look-up
Build a regular grid over the [0, 2500]^2 square and 'shake' all points randomly with a bi-dimensional normal distribution centered on each intersection in the grid
Draw a larger number of random points then apply a k-means algorithm and only keep the centroids. They will be far away from one another and the algorithm, though iterative, could converge more quickly than your algorithm.
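The second option (a jittered grid) might be sketched like this. The grid pitch of 250 and the jitter standard deviation of 5 are assumed values, chosen so that neighbours practically never end up closer than 200; a normal distribution has unbounded tails, so the gap is only very likely, not strictly guaranteed.

```python
import numpy as np

rng = np.random.default_rng(1)
spacing = 250  # grid pitch (assumed value)
sigma = 5      # std of the Gaussian jitter (assumed value)

# Grid intersections at 125, 375, ..., 2375 along each axis
g = np.arange(spacing / 2, 2500, spacing)
gx, gy = np.meshgrid(g, g)
grid = np.column_stack([gx.ravel(), gy.ravel()])

# 'Shake' every intersection with a 2D normal centred on it
pts = grid + rng.normal(scale=sigma, size=grid.shape)
print(pts.shape)
```

The trade-off is that a small sigma keeps the spacing safe but leaves the grid clearly visible, while a larger sigma looks more random but needs an extra distance check afterwards.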

This has been answered, but it's very tangentially related to my work so I took a stab at it. I implemented the algorithm described in this note which I found linked from this blog post. Unfortunately it's not faster than the other proposed methods, but I'm sure there are optimizations to be made.
import numpy as np
import matplotlib.pyplot as plt
def lonely(p,X,r):
    # True if no sample has been placed in the 2r x 2r square around p
    m = X.shape[1]
    x0,y0 = p
    x = y = np.arange(-r,r)
    x = x + x0
    y = y + y0
    u,v = np.meshgrid(x,y)
    u[u < 0] = 0
    u[u >= m] = m-1
    v[v < 0] = 0
    v[v >= m] = m-1
    return not np.any(X[u[:],v[:]] > 0)
def generate_samples(m=2500,r=200,k=30):
    # m = extent of sample domain
    # r = minimum distance between points
    # k = samples before rejection
    active_list = []
    # step 0 - initialize n-d background grid
    X = np.ones((m,m))*-1
    # step 1 - select initial sample
    x0,y0 = np.random.randint(0,m), np.random.randint(0,m)
    active_list.append((x0,y0))
    X[active_list[0]] = 1
    # step 2 - iterate over active list
    while active_list:
        i = np.random.randint(0,len(active_list))
        rad = np.random.rand(k)*r+r
        theta = np.random.rand(k)*2*np.pi
        # get a list of random candidates within [r,2r] from the active point
        candidates = np.round((rad*np.cos(theta)+active_list[i][0], rad*np.sin(theta)+active_list[i][1])).astype(np.int32).T
        # trim the list based on boundaries of the array
        candidates = [(x,y) for x,y in candidates if x >= 0 and y >= 0 and x < m and y < m]
        for p in candidates:
            if X[p] < 0 and lonely(p,X,r):
                X[p] = 1
                active_list.append(p)
                break
        else:
            # none of the k candidates fit: retire this active point
            del active_list[i]
    return X
X = generate_samples(2500, 200, 10)
s = np.where(X>0)
plt.plot(s[0],s[1],'.')
plt.show()
And the results:

Per the link, the method from aganders3 is known as Poisson Disc Sampling. You might be able to find more efficient implementations that use a local grid search to find 'overlaps.' For example Poisson Disc Sampling. Because you are constraining the system, it cannot be completely random. The maximum packing for circles with uniform radii in a plane is ~90% and is achieved when the circles are arranged in a perfect hexagonal array. As the number of points you request approaches the theoretical limit, the generated arrangement will become more hexagonal. In my experience, it is difficult to get above ~60% packing with uniform circles using this approach.
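To put rough numbers on those packing fractions for the question's setup (a 2500 x 2500 square, minimum spacing 200), here is a back-of-the-envelope sketch. It treats every point as the centre of a disc of radius 100 and ignores boundary effects, so the figures are estimates, not hard limits.

```python
import math

side = 2500.0              # side of the square domain
min_dist = 200.0           # required minimum separation
disc_radius = min_dist / 2 # points min_dist apart <=> non-overlapping discs of this radius

hex_density = math.pi / (2 * math.sqrt(3))  # densest packing of equal circles, about 0.9069
disc_area = math.pi * disc_radius ** 2

n_hex = hex_density * side ** 2 / disc_area  # estimate at perfect hexagonal packing
n_60 = 0.60 * side ** 2 / disc_area          # estimate at the ~60% packing mentioned above

print(round(n_hex), round(n_60))
```

So asking for 100 points, as in the first answer, is already a large fraction of what random placement can realistically reach in this domain.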

The following method uses a list comprehension. I am generating integers here, but you can use different random generators for other data types:
arr = [[random.randint(-4, 4), random.randint(-4, 4)] for i in range(40)]


Order 2d points based on distance from each other [duplicate]

I have a list of (x,y)-coordinates that represent a line skeleton.
The list is obtained directly from a binary image:
import numpy as np
Now the points in the list are sorted according to their position in the image along one of the axes.
I would like to sort the list such that the order represents a smooth path along the line. (This is currently not the case where the line curves back).
Subsequently, I want to fit a spline to these points.
A similar problem has been described and solved using arcPy here. Is there a convenient way to achieve this using python, numpy, scipy, openCV (or another library?)
Below is an example image. It results in a list of 59 (x,y)-coordinates.
When I send the list to scipy's spline fitting routine, I run into a problem because the points aren't 'ordered' on the line:
I apologize for the long answer in advance :P (the problem is not that simple).
Let's start by rewording the problem. Finding a line that connects all the points can be reformulated as a shortest path problem in a graph, where (1) the graph nodes are the points in the space, (2) each node is connected to its 2 nearest neighbors, and (3) the shortest path passes through each of the nodes only once. That last constraint is a very important one (and quite hard to optimize). Essentially, the problem is to find a permutation of length N, where the permutation refers to the order of each of the nodes (N is the total number of nodes) in the path.
Finding all the possible permutations and evaluating their cost is too expensive (there are N! permutations, which is far too many for all but tiny problems). Below I propose an approach that finds N candidate permutations (the best permutation starting from each of the N points) and then picks the permutation (from those N) that minimizes the error/cost.
1. Create a random problem with unordered points
Now, let's start by creating a sample problem:
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x)
plt.plot(x, y)
And here is the unsorted version of the points [x, y], to simulate random points in space connected in a line:
idx = np.random.permutation(x.size)
x = x[idx]
y = y[idx]
plt.plot(x, y)
The problem is then to order those points to recover their original order so that the line is plotted properly.
2. Create 2-NN graph between nodes
We can first rearrange the points in a [N, 2] array:
points = np.c_[x, y]
Then, we can start by creating a nearest neighbour graph to connect each of the nodes to its 2 nearest neighbors:
from sklearn.neighbors import NearestNeighbors
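clf = NearestNeighbors(n_neighbors=2).fit(points)
G = clf.kneighbors_graph()
G is a sparse N x N matrix, where each row represents a node and the non-zero columns in that row mark its nearest neighbours. (With the default mode='connectivity' the entries are 1; pass mode='distance' to kneighbors_graph to store the Euclidean distances instead.)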
We can then use networkx to construct a graph from this sparse matrix:
import networkx as nx
T = nx.from_scipy_sparse_matrix(G) # networkx >= 3.0: use nx.from_scipy_sparse_array(G)
3. Find shortest path from source
And, here begins the magic: we can extract the paths using dfs_preorder_nodes, which will essentially create a path through all the nodes (passing through each of them exactly once) given a starting node (if not given, the 0 node will be selected).
order = list(nx.dfs_preorder_nodes(T, 0))
xx = x[order]
yy = y[order]
plt.plot(xx, yy)
Well, it is not too bad, but we can notice that the reconstruction is not optimal. This is because point 0 in the unordered list lies in the middle of the line, which is why the path first goes in one direction, and then comes back and finishes in the other direction.
4. Find the path with smallest cost from all sources
So, in order to obtain the optimal order, we can just get the best order for all the nodes:
paths = [list(nx.dfs_preorder_nodes(T, i)) for i in range(len(points))]
Now that we have the best path starting from each of the N = 100 nodes, we can compare them and keep the one that minimizes the distances between the connections (optimization problem):
mindist = np.inf
minidx = 0
for i in range(len(points)):
    p = paths[i] # order of nodes
    ordered = points[p] # ordered nodes
    # find cost of that order by the sum of squared euclidean distances between points (i) and (i+1)
    cost = (((ordered[:-1] - ordered[1:])**2).sum(1)).sum()
    if cost < mindist:
        mindist = cost
        minidx = i
The points are ordered for each of the candidate paths, and then a cost is computed (by summing the squared euclidean distances between consecutive points i and i+1). If the path starts at the start or end point, it will have the smallest cost as all the nodes will be consecutive. On the other hand, if the path starts at a node that lies in the middle of the line, the cost will be very high at some point, as it will need to travel from the end (or beginning) of the line to the initial position to explore the other direction. The path that minimizes that cost is the path starting at an optimal point.
opt_order = paths[minidx]
Now, we can reconstruct the order properly:
xx = x[opt_order]
yy = y[opt_order]
plt.plot(xx, yy)
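Once the points are in path order, the spline fit the question asks for is straightforward. A minimal sketch, assuming `scipy.interpolate.splprep` is the routine in question; the ordered sine curve below is only a stand-in for the reordered `xx`, `yy`.

```python
import numpy as np
from scipy.interpolate import splprep, splev

# Stand-in for the ordered points obtained from the best path
xx = np.linspace(0, 2 * np.pi, 100)
yy = np.sin(xx)

# s=0 forces the parametric spline through every point; u is the curve parameter in [0, 1]
tck, u = splprep([xx, yy], s=0)

# Evaluate the spline on a finer parameter grid
xs, ys = splev(np.linspace(0, 1, 400), tck)
print(len(xs), len(ys))
```

With noisy skeleton pixels you would probably want s > 0 so that the spline smooths rather than interpolates.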
One possible solution is to use a nearest neighbours approach, possibly by using a KDTree. Scikit-learn has a nice interface. This can then be used to build a graph representation using networkx. This will only really work if the line to be drawn should go through the nearest neighbours:
from sklearn.neighbors import KDTree
import numpy as np
import networkx as nx
G = nx.Graph() # A graph to hold the nearest neighbours
X = [(0, 1), (1, 1), (3, 2), (5, 4)] # Some list of points in 2D
tree = KDTree(X, leaf_size=2, metric='euclidean') # Create a distance tree
# Now loop over your points and find the two nearest neighbours
# If the first and last points are also the start and end points of the line you can use X[1:-1]
for p in X:
    dist, ind = tree.query([p], k=3) # query expects a 2D array, hence [p]
    print(ind)
    # ind Indexes represent nodes on a graph
    # Two nearest points are at indexes 1 and 2.
    # Use these to form edges on graph
    # p is the current point in the list
    n1, l1 = X[ind[0][1]], dist[0][1] # The next nearest point
    n2, l2 = X[ind[0][2]], dist[0][2] # The following nearest point
    G.add_edge(p, n1, weight=l1) # store the distance; the clique code below reads 'weight'
    G.add_edge(p, n2, weight=l2)
print(G.edges()) # A list of all the connections between points
print(nx.shortest_path(G, source=(0,1), target=(5,4)))
>>> [(0, 1), (1, 1), (3, 2), (5, 4)] # A list of ordered points
Update: If the start and end points are unknown and your data is reasonably well separated, you can find the ends by looking for cliques in the graph. The start and end points will form a clique. If the longest edge is removed from the clique it will create a free end in the graph which can be used as a start and end point. For example, the start and end points in this list appear in the middle:
X = [(0, 1), (0, 0), (2, 1), (3, 2), (9, 4), (5, 4)]
After building the graph, it's now a case of removing the longest edge from the cliques to find the free ends of the graph:
def find_longest_edge(l):
    e1 = G[l[0]][l[1]]['weight']
    e2 = G[l[0]][l[2]]['weight']
    e3 = G[l[1]][l[2]]['weight']
    if e2 < e1 > e3:
        return (l[0], l[1])
    elif e1 < e2 > e3:
        return (l[0], l[2])
    elif e1 < e3 > e2:
        return (l[1], l[2])
end_cliques = [i for i in list(nx.find_cliques(G)) if len(i) == 3]
edge_lengths = [find_longest_edge(i) for i in end_cliques]
G.remove_edges_from(edge_lengths) # drop the longest edge of each end clique
edges = G.edges()
start_end = [n for n,nbrs in G.adjacency() if len(nbrs.keys()) == 1] # G.adjacency_iter() on networkx < 2.0
print(nx.shortest_path(G, source=start_end[0], target=start_end[1]))
>>> [(0, 0), (0, 1), (2, 1), (3, 2), (5, 4), (9, 4)] # The correct path
I had the exact same problem. If you have two arrays of scattered x and y values that are not too curvy, then you can transform the points into PCA space, sort them in PCA space, and then transform them back. (I've also added in some bonus smoothing functionality).
import numpy as np
from scipy import interpolate
from scipy.signal import savgol_filter
from sklearn.decomposition import PCA
def XYclean(x,y):
    xy = np.concatenate((x.reshape(-1,1), y.reshape(-1,1)), axis=1)
    # make PCA object
    pca = PCA(2)
    # fit on data
    pca.fit(xy)
    #transform into pca space
    xypca = pca.transform(xy)
    newx = xypca[:,0]
    newy = xypca[:,1]
    #sort by position along the first principal component
    indexSort = np.argsort(newx)
    newx = newx[indexSort]
    newy = newy[indexSort]
    #add some more points (optional)
    f = interpolate.interp1d(newx, newy, kind='linear')
    newX=np.linspace(np.min(newx), np.max(newx), 100)
    newY = f(newX)
    #smooth with a filter (optional)
    window = 43
    newY = savgol_filter(newY, window, 2)
    #return back to old coordinates
    xyclean = pca.inverse_transform(np.concatenate((newX.reshape(-1,1), newY.reshape(-1,1)), axis=1) )
    xc = xyclean[:,0]
    yc = xyclean[:,1]
    return xc, yc
I agree with Imanol Luengo's solution, but if you know the index of the first point, then there is a considerably easier solution that uses only NumPy:
def order_points(points, ind):
    points_new = [ points.pop(ind) ] # initialize a new list of points with the known first point
    pcurr = points_new[-1] # initialize the current point (as the known point)
    while len(points)>0:
        d = np.linalg.norm(np.array(points) - np.array(pcurr), axis=1) # distances between pcurr and all other remaining points
        ind = d.argmin() # index of the closest point
        points_new.append( points.pop(ind) ) # append the closest point to points_new
        pcurr = points_new[-1] # update the current point
    return points_new
This approach appears to work well with the sine curve example, especially because it is easy to define the first point as either the leftmost or rightmost point.
For the img_skeleton data cited in the question, it would be similarly easy to algorithmically obtain the first point, for example as the topmost point.
# create sine curve:
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x)
# shuffle the order of the x and y coordinates:
idx = np.random.permutation(x.size)
xs,ys = x[idx], y[idx] # shuffled points
# find the leftmost point:
ind = xs.argmin()
# assemble the x and y coordinates into a list of (x,y) tuples:
points = [(xx,yy) for xx,yy in zip(xs,ys)]
# order the points based on the known first point:
points_new = order_points(points, ind)
# plot:
fig,ax = plt.subplots(1, 2, figsize=(10,4))
xn,yn = np.array(points_new).T
ax[0].plot(xs, ys) # original (shuffled) points
ax[1].plot(xn, yn) # new (ordered) points
I am working on a similar problem, but it has an important constraint (much like the example given by the OP), which is that each pixel has either one or two neighboring pixels, in the 8-connected sense. With this constraint, there is a very simple solution.
def sort_to_form_line(unsorted_list):
    """
    Given a list of neighboring points which forms a line, but in random order,
    sort them to the correct order.
    IMPORTANT: Each point must be a neighbor (8-point sense)
    to at least one other point!
    """
    sorted_list = [unsorted_list.pop(0)]
    while len(unsorted_list) > 0:
        i = 0
        while i < len(unsorted_list):
            if are_neighbours(sorted_list[0], unsorted_list[i]):
                #neighbours at front of list
                sorted_list.insert(0, unsorted_list.pop(i))
            elif are_neighbours(sorted_list[-1], unsorted_list[i]):
                #neighbours at rear of list
                sorted_list.append(unsorted_list.pop(i))
            else:
                i = i+1
    return sorted_list
def are_neighbours(pt1, pt2):
    """
    Check if pt1 and pt2 are neighbours, in the 8-point sense
    pt1 and pt2 have integer coordinates
    """
    return (np.abs(pt1[0]-pt2[0]) < 2) and (np.abs(pt1[1]-pt2[1]) < 2)
Modifying Toddp's answer, you can find the end-points of arbitrarily shaped lines using this code and then order the points as Toddp described. This is much faster than Imanol Luengo's answer; the only constraint is that the line must have exactly 2 end-points:
import cv2
import numpy as np
def order_points(points):
    if isinstance(points,np.ndarray):
        assert points.shape[1]==2
        points = points.tolist()
    exts = get_end_points(points)
    assert len(exts) ==2
    ind = points.index(exts[0])
    points_new = [ points.pop(ind) ] # initialize a new list of points with the known first point
    pcurr = points_new[-1] # initialize the current point (as the known point)
    while len(points)>0:
        d = np.linalg.norm(np.array(points) - np.array(pcurr), axis=1) # distances between pcurr and all other remaining points
        ind = d.argmin() # index of the closest point
        points_new.append( points.pop(ind) ) # append the closest point to points_new
        pcurr = points_new[-1] # update the current point
    return points_new
def get_end_points(ptsxy):
    #source :
    ptsxy = np.asarray(ptsxy, dtype=np.int32) # cv2.boundingRect expects int32 (or float32) points
    assert ptsxy.shape[1]==2
    #translate to (0,0) for faster execution
    xx,yy,w,h = cv2.boundingRect(ptsxy)
    pts_translated = ptsxy -(xx,yy)
    bim = np.zeros((h+1,w+1))
    bim[pts_translated[:,1], pts_translated[:,0]] = 255 # draw the points into a binary image
    extremes = []
    for p in pts_translated:
        x = p[0]
        y = p[1]
        n = 0
        n += bim[y - 1,x]
        n += bim[y - 1,x - 1]
        n += bim[y - 1,x + 1]
        n += bim[y,x - 1]
        n += bim[y,x + 1]
        n += bim[y + 1,x]
        n += bim[y + 1,x - 1]
        n += bim[y + 1,x + 1]
        n /= 255
        if n == 1: # exactly one 8-connected neighbour: an end-point
            extremes.append(p)
    extremes = np.array(extremes)+(xx,yy)
    return extremes.tolist()

Rotating 1D numpy array of radial intensities into 2D array of spatial intensities

I have a numpy array filled with intensity readings at different radii in a uniform circle (for context, this is a 1D radiative transfer project for protostellar formation models: while much better models exist, my supervisor wants me to have the experience of producing one so I understand how others work).
I want to take that 1d array, and "rotate" it through a circle, forming a 2D array of intensities that could then be shown with imshow (or, with a bit of work, aplpy). The final array needs to be 2d, and the projection needs to be Cartesian, not polar.
I can do it with nested for loops, and I can do it with lookup tables, but I have a feeling there must be a neat way of doing it in numpy or something.
Any ideas?
I have had to go back and recreate my (frankly horrible) mess of for loops and if statements that I had before. If I really tried, I could probably get rid of one of the loops and one of the if statements by condensing things down. However, the aim is not to make it work with for loops, but see if there is a built in way to rotate the array.
impB is an array that differs slightly from what I stated it was before. It's actually just a list of radii where particles are detected. I then bin those into radius bins to get the intensity (or frequency if you prefer) in each radius. R is the scale factor for my radius as I run the model in a dimensionless way. iRes is a resolution scale factor, essentially how often I want to sample my radial bins. Everything else should be clear.
radJ = np.ndarray(shape=(2*iRes, 2*iRes)) # Create array of 2xRadius square
for i in range(iRes):
    n = len(impB[np.where(impB[:] < ((i+1.) * (R / iRes)))]) # Count number of things within this radius +1
    m = len(impB[np.where(impB[:] <= ((i) * (R / iRes)))]) # Count number of things in this radius
    a = (((i + 1) * (R / iRes))**2 - ((i) * (R / iRes))**2) * math.pi # A normalisation factor based on area.....dont ask
    for x in range(iRes):
        for y in range(iRes):
            if (x**2 + y**2) < (i * iRes)**2:
                if (x**2 + y**2) >= (i * iRes)**2: # Checks for radius, and puts in cartesian space
                    radJ[x+iRes,y+iRes] = (n-m) / a # Put in actual intensity bins
                    radJ[x+iRes,-y+iRes] = (n-m) / a
                    radJ[-x+iRes,y+iRes] = (n-m) / a
                    radJ[-x+iRes,-y+iRes] = (n-m) / a
Nested loops are a simple approach for that. With ri_data_r and y containing your radius values (difference to the middle pixel) and the array for rotation, respectively, I would suggest:
from scipy import interpolate
import numpy as np
y = np.random.rand(100)
ri_data_r = np.linspace(-len(y)/2,len(y)/2,len(y))
interpol_index = interpolate.interp1d(ri_data_r, y)
xv = np.arange(-1, 1, 0.01) # adjust your matrix values here
X, Y = np.meshgrid(xv, xv)
profilegrid = np.ones(X.shape, float)
for i, x in enumerate(X[0, :]):
    for k, y in enumerate(Y[:, 0]):
        current_radius = np.sqrt(x ** 2 + y ** 2)
        profilegrid[i, k] = interpol_index(current_radius)
This will give you exactly what you are looking for. You just have to take in your array and calculate a symmetric array ri_data_r that has the same length as your data array and contains the distance between the actual data and the middle of the array. The code is doing this automatically.
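For what it's worth, the same idea can be written without any Python loops: compute the radius of every pixel at once and look the intensity up with `np.interp`. This is only a sketch; the function name `radial_to_image`, the made-up profile, and the convention that index k of the profile means a radius of k pixels are all assumptions.

```python
import numpy as np

def radial_to_image(profile, size):
    """Revolve a 1D radial profile (index = radius in pixels) into a size x size image."""
    c = (size - 1) / 2.0                 # centre of the image in pixel coordinates
    yy, xx = np.indices((size, size))
    r = np.hypot(xx - c, yy - c)         # distance of every pixel from the centre
    radii = np.arange(len(profile))
    # Linear interpolation between radial samples; pixels beyond the profile get 0
    img = np.interp(r.ravel(), radii, profile, right=0.0)
    return img.reshape(r.shape)

profile = np.linspace(1.0, 0.0, 50)      # made-up intensity falling off with radius
img = radial_to_image(profile, 101)
print(img.shape)
```

The result is a Cartesian 2D array that can be passed straight to imshow.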
I stumbled upon this question in a different context and I hope I understood it right. Here are two other ways of doing this. The first uses skimage.transform.warp with interpolation of the desired order (here we use order=0, nearest-neighbor). This method is slower but more precise and needs less memory than the second method.
The second one does not use interpolation, therefore is faster but also less precise and needs way more memory because it stores each 2D array containing one tilt until the end, where they are averaged with np.nanmean().
The difference between both solutions stemmed from the problem of handling the center of the final image where the tilts overlap the most, i.e. the first one would just add values with each tilt ending up out of the original range. This was "solved" by clipping the matrix in each step to a global_min and global_max (consult the code). The second one solves it by taking the mean of the tilts where they overlap, which forces us to use the np.nan.
Please, read the Example of usage and Sanity check sections in order to understand the plot titles.
Solution 1:
import numpy as np
from skimage.transform import warp
def rotate_vector(vector, deg_angle):
    # Credit goes to skimage.transform.radon
    assert vector.ndim == 1, 'Pass only 1D vectors, e.g. use array.ravel()'
    center = vector.size // 2
    square = np.zeros((vector.size, vector.size))
    square[center,:] = vector
    rad_angle = np.deg2rad(deg_angle)
    cos_a, sin_a = np.cos(rad_angle), np.sin(rad_angle)
    R = np.array([[cos_a, sin_a, -center * (cos_a + sin_a - 1)],
                  [-sin_a, cos_a, -center * (cos_a - sin_a - 1)],
                  [0, 0, 1]])
    # Approx. 80% of time is spent in this function
    return warp(square, R, clip=False, output_shape=((vector.size, vector.size)))
def place_vectors(vectors, deg_angles):
    matrix = np.zeros((vectors.shape[-1], vectors.shape[-1]))
    global_min, global_max = 0, 0
    for i, deg_angle in enumerate(deg_angles):
        tilt = rotate_vector(vectors[i], deg_angle)
        global_min = tilt.min() if global_min > tilt.min() else global_min
        global_max = tilt.max() if global_max < tilt.max() else global_max
        matrix += tilt
        matrix = np.clip(matrix, global_min, global_max)
    return matrix
Solution 2:
Credit for the idea goes to my colleague Michael Scherbela.
import numpy as np
def rotate_vector(vector, deg_angle):
    assert vector.ndim == 1, 'Pass only 1D vectors, e.g. use array.ravel()'
    square = np.ones([vector.size, vector.size]) * np.nan
    radius = vector.size // 2
    r_values = np.linspace(-radius, radius, vector.size)
    rad_angle = np.deg2rad(deg_angle)
    # indices must be integers to be usable for array indexing
    ind_x = np.round(np.cos(rad_angle) * r_values + vector.size/2).astype(int)
    ind_y = np.round(np.sin(rad_angle) * r_values + vector.size/2).astype(int)
    ind_x = np.clip(ind_x, 0, vector.size-1)
    ind_y = np.clip(ind_y, 0, vector.size-1)
    square[ind_y, ind_x] = vector
    return square
def place_vectors(vectors, deg_angles):
    matrices = []
    for deg_angle, vector in zip(deg_angles, vectors):
        matrices.append(rotate_vector(vector, deg_angle))
    matrix = np.nanmean(np.array(matrices), axis=0)
    return np.nan_to_num(matrix, copy=False, nan=0.0)
Example of usage:
r = 100 # Radius of the circle, i.e. half the length of the vector
n = int(np.pi * r / 8) # Number of vectors, e.g. number of tilts in tomography
v = np.ones(2*r) # One vector, e.g. one tilt in tomography
V = np.array([v]*n) # All vectors, e.g. a sinogram in tomography
# Rotate 1D vector to a specific angle (output is 2D)
angle = 45
rotated = rotate_vector(v, angle)
# Rotate each row of a 2D array according to its angle (output is 2D)
angles = np.linspace(-90, 90, num=n, endpoint=False)
inplace = place_vectors(V, angles)
Sanity check:
These are just simple checks which by no means cover all possible edge cases. Depending on your use case you might want to extend the checks and adjust the method.
# I. Sanity check
# Assuming n <= πr and v = np.ones(2r)
# Then sum(inplace) should be approx. equal to (n * (2πr - n)) / π
# which is an area that should be covered by the tilts
desired_area = (n * (2 * np.pi * r - n)) / np.pi
covered_area = np.sum(inplace)
covered_frac = covered_area / desired_area
print(f'This method covered {covered_frac * 100:.2f}% '
'of the area which should be covered in total.')
# II. Sanity check
# Assuming n <= πr and v = np.ones(2r)
# Then a circle M with radius m <= r should be the largest circle which
# is fully covered by the vectors. I.e. its mean should be no less than 1.
# If n = πr then m = r.
# m = n / π
m = int(n / np.pi)
# Code for circular mask not included
mask = create_circular_mask(2*r, 2*r, center=None, radius=m)
m_area = np.mean(inplace[mask])
print(f'Full radius r={r}, radius m={m}, mean(M)={m_area:.4f}.')
Code for plotting:
import matplotlib.pyplot as plt
plt.figure(figsize=(16, 8))
plt.subplot(121)
rotated = np.nan_to_num(rotated) # not necessary in case of the first method
plt.title(
    f'Output of rotate_vector(), angle={angle}°\n'
    f'Sum is {np.sum(rotated):.2f} and should be {np.sum(v):.2f}')
plt.imshow(rotated)
plt.subplot(122)
plt.title(
    f'Output of place_vectors(), r={r}, n={n}\n'
    f'Covered {covered_frac * 100:.2f}% of the area which should be covered.\n'
    f'Mean of the circle M is {m_area:.4f} and should be 1.0.')
circle=plt.Circle((r, r), m, color='r', fill=False)
plt.gcf().gca().add_artist(circle)
plt.gcf().gca().legend([circle], [f'Circle M (m={m})'])
plt.imshow(inplace)
plt.show()

Difference between two methods of random point generation

In order to do a Monte Carlo simulation to estimate the expected distance between two random points in $n$-dimensional space, I discovered that the following two similar-looking methods of generating random points seem to differ. I'm not able to figure out why.
Method 1:
def expec_distance1(n, N = 10000):
    u = uniform(0,1)
    dist = 0
    for i in range(N):
        x = np.array([u.rvs() for i in range(n)])
        y = np.array([u.rvs() for i in range(n)])
        dist = (dist*i + euclidean_dist(x,y))/(i+1.0)
    return dist
Method 2:
def expec_distance2(n, N = 10000):
    u = uniform(0,1)
    dist = 0
    for i in range(N):
        x = u.rvs(n)
        y = u.rvs(n)
        dist = (dist*i + euclidean_dist(x,y))/(i+1.0)
    return dist
where uniform distribution is scipy.stats.uniform and np stands for numpy.
For 100 runs of the two methods (for n = 2), with method 1, I get $\mu = 0.53810011995126483, \sigma = 0.13064091613389378$
with method 2, $\mu = 0.52155615672453093, \sigma = 0.0023768774304696902$
Why is there such a big difference between std dev of two methods?
Here is the code to try:
(I've replaced scipy with numpy because it's faster, but it shows the same difference between the std devs.)
In Python 2, list comprehensions leak their loop variables.
Since you're looping over i in your list comprehensions ([u.rvs() for i in range(n)]), that is the i used in dist = (dist*i + euclidean_dist(x,y))/(i+1.0). (i always equals n-1 rather than the value of the main loop variable.)
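A minimal sketch of the fix, assuming NumPy's `default_rng` as the random source instead of scipy.stats.uniform: give the comprehension its own throwaway variable so the running-mean index `i` is never overwritten. (In Python 3 comprehension variables no longer leak, so Method 1 behaves like Method 2 there anyway.) The function name `expected_distance` and the `seed` parameter are made up for the example.

```python
import numpy as np

def expected_distance(n, N=10000, seed=0):
    rng = np.random.default_rng(seed)
    dist = 0.0
    for i in range(N):
        # `_` instead of `i`: the comprehension cannot clobber the outer loop index
        x = np.array([rng.random() for _ in range(n)])
        y = np.array([rng.random() for _ in range(n)])
        # running mean over the first i+1 samples
        dist = (dist * i + np.linalg.norm(x - y)) / (i + 1.0)
    return dist

est = expected_distance(2, N=20000)
print(est)
```

For n = 2 the estimate should land close to the 0.52 reported for Method 2 in the question.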

Matrix vector multiplication where the vector has been interpolated - Python

I have used the finite element method to approximate the Laplace equation and thus have turned it into a matrix system AU = F, where A is the stiffness matrix, and solved for U (not massively important for my question).
I have now got my approximation U, and when I compute AU I should get the vector F (or at least something similar), where F is:
AU gives the following plot for x = 0 to x = 1 (say, for 20 nodes):
I then need to interpolate U to a longer vector and find AU (for a bigger A too, but not interpolating that). I interpolate U by the following:
U_inter = interp1d(x,U)
U_rich = U_inter(longer_x)
which seems to work okay until I multiply it with the longer A matrix:
It seems each spike is at a node of x (i.e. the nodes of the original U). Does anybody know what could be causing this? The following is my code to find A, U and F.
import numpy as np
import math
import scipy
from scipy.sparse import diags
import scipy.sparse.linalg
from scipy.interpolate import interp1d
import matplotlib
import matplotlib.pyplot as plt
def Poisson_Stiffness(x0):
    """Finds the Poisson equation stiffness matrix with any non uniform mesh x0"""
    x0 = np.array(x0)
    N = len(x0) - 1 # The amount of elements; x0, x1, ..., xN
    h = x0[1:] - x0[:-1]
    a = np.zeros(N+1)
    a[1:-1] = 1/h[1:] + 1/h[:-1]
    a[-1] = 1/h[-1]
    b = -1/h
    c = -1/h
    data = [a.tolist(), b.tolist(), c.tolist()]
    Positions = [0, 1, -1]
    Stiffness_Matrix = diags(data, Positions, (N+1,N+1))
    return Stiffness_Matrix
def NodalQuadrature(x0):
    """Finds the Nodal Quadrature Approximation of sin(pi x)"""
    x0 = np.array(x0)
    h = x0[1:] - x0[:-1]
    N = len(x0) - 1
    approx = np.zeros(len(x0))
    for i in range(1,N):
        approx[i] = math.sin(math.pi*x0[i])
        approx[i] = (approx[i]*h[i-1] + approx[i]*h[i])/2
    return approx
def Solver(x0):
    Stiff_Matrix = Poisson_Stiffness(x0)
    NodalApproximation = NodalQuadrature(x0)
    NodalApproximation[0] = 0
    U = scipy.sparse.linalg.spsolve(Stiff_Matrix, NodalApproximation)
    return U
x = np.linspace(0,1,10)
rich_x = np.linspace(0,1,50)
U = Solver(x)
A_rich = Poisson_Stiffness(rich_x)
U_inter = interp1d(x,U)
U_rich = U_inter(rich_x)
AUrich =
comment 1:
I added a Stiffness_Matrix = Stiffness_Matrix.tocsr() statement to avoid an efficiency warning. FE calculations are complex enough that I'll have to print out some intermediate values before I can identify what is going on.
comment 2:
plt.plot(rich_x, plots nice. The noise you get is the result of the difference between the interpolated U_rich and the true solution: U_rich-Solver(rich_x).
comment 3:
I don't think there's a problem with your code. The problem is with the idea that you can test an interpolation this way. I'm rusty on FE theory, but I think you need to use the shape functions to interpolate, not a simple linear one.
comment 4:
Intuitively, you are asking what kind of forcing F would produce U_rich. Compared to Solver(rich_x), U_rich has flat spots, regions where its value is less than the true solution. What F would produce that? One that is spiky, with NodalQuadrature(x) at the x points, but near zero values in between. That's what your plot is showing.
A higher order interpolation will eliminate the flat spots, and produce a smoother back calculated F. But you really need to revisit the FE theory.
You might find it instructive to look at
plt.plot(rich_x, NodalQuadrature(rich_x))
The second plot is much smoother, but only about 1/5 as high.
Better yet look at:
plt.plot(rich_x,AUrich,'-*') # the spikes
plt.plot(x,NodalQuadrature(x),'o') # original forcing
plt.plot(rich_x, NodalQuadrature(rich_x),'+') # new forcing
In the model the forcing isn't continuous, it is a value at each node. With more nodes (rich_x) the magnitude at each node is less.
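To see where the "about 1/5" comes from, here is a small sketch that recomputes the nodal forcing on both meshes. `nodal_load` is a hypothetical vectorised rewrite of the NodalQuadrature loop from the question, not code from the thread.

```python
import numpy as np

def nodal_load(x):
    """Nodal forcing sin(pi x_i) * (h_{i-1} + h_i) / 2 at interior nodes, 0 at the ends."""
    x = np.asarray(x, dtype=float)
    h = np.diff(x)
    load = np.zeros_like(x)
    load[1:-1] = np.sin(np.pi * x[1:-1]) * (h[:-1] + h[1:]) / 2
    return load

coarse = nodal_load(np.linspace(0, 1, 10))  # element size h = 1/9
fine = nodal_load(np.linspace(0, 1, 50))    # element size h = 1/49
ratio = fine.max() / coarse.max()
print(ratio)
```

Each nodal value is proportional to the local element size, so going from 10 to 50 nodes shrinks the peaks by roughly 9/49, which matches the factor seen in the plots.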

Can I vectorise this python code?

I have written this python code to get neighbours of a label (a set of pixels sharing some common properties). The neighbours for a label are defined as the other labels that lie on the other side of the boundary (the neighbouring labels share a boundary). So, the code I wrote works but is extremely slow:
# segments: It is a 2-dimensional numpy array (an image really)
# where segments[x, y] = label_index. So each entry defines the
# label associated with a pixel.
# i: The label whose neighbours we want.
def get_boundaries(segments, i):
    neighbors = []
    for y in range(1, segments.shape[1]):
        for x in range(1, segments.shape[0]):
            # Check if current index has the label we want
            if segments[x-1, y] == i:
                # Check if neighbour in the x direction has
                # a different label
                if segments[x-1, y] != segments[x, y]:
                    neighbors.append(segments[x, y])
            # Check if neighbour in the y direction has
            # a different label
            if segments[x, y-1] == i:
                if segments[x, y-1] != segments[x, y]:
                    neighbors.append(segments[x, y])
    return np.unique(np.asarray(neighbors))
As you can imagine, I have probably completely misused python here. I was wondering if there is a way to optimize this code to make it more pythonic.
Here you go:
def get_boundaries2(segments, i):
    x, y = np.where(segments == i) # where i is
    right = x + 1
    rightMask = right < segments.shape[0] # keep in bounds
    down = y + 1
    downMask = down < segments.shape[1]
    rightNeighbors = segments[right[rightMask], y[rightMask]]
    downNeighbors = segments[x[downMask], down[downMask]]
    neighbors = np.union1d(rightNeighbors, downNeighbors)
    return neighbors
As you can see, there are no Python loops at all; I also tried to minimize copies (the first attempt made a copy of segments with a NAN border, but then I devised the "keep in bounds" check).
Note that I did not filter out i itself from the "neighbors" here; you can add that easily at the end if you want. Some timings:
Input 2000x3000: original takes 13 seconds, mine takes 370 milliseconds (35x speedup).
Input 1000x300: original takes 643 ms, mine takes 17.5 ms (36x speedup).
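If you also want neighbours on the left and above, and want i itself filtered out, a variant built on shifted slices could look like the sketch below. The name `label_neighbours` and the tiny test image are made up for illustration; this is one possible approach, not a drop-in replacement that has been timed against the code above.

```python
import numpy as np

def label_neighbours(segments, i):
    """Labels that touch label i in any of the four axis directions, excluding i."""
    mask = segments == i
    found = [
        segments[1:, :][mask[:-1, :]],   # neighbour at +1 along axis 0
        segments[:-1, :][mask[1:, :]],   # neighbour at -1 along axis 0
        segments[:, 1:][mask[:, :-1]],   # neighbour at +1 along axis 1
        segments[:, :-1][mask[:, 1:]],   # neighbour at -1 along axis 1
    ]
    labels = np.unique(np.concatenate(found))
    return labels[labels != i]

seg = np.array([[1, 1, 2],
                [1, 3, 2],
                [4, 3, 2]])
print(label_neighbours(seg, 1))
```

Pairing a slice with the mask shifted the other way avoids explicit bounds checks, because out-of-range neighbours simply never appear in the sliced arrays.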
You need to replace your for loops with numpy's implicit looping.
I don't know enough about your code to convert it directly, but I can give an example.
Suppose you have an array of 100000 random integers, and you need to get an array of each element divided by its neighbor.
import random, numpy as np
a = np.fromiter((random.randint(1, 100) for i in range(100000)), int)
One way to do this would be:
[a[i] / a[i+1] for i in range(len(a)-1)]
Or this, which is much faster:
a / np.roll(a, -1)
import timeit
initcode = 'import random, numpy as np; a = np.fromiter((random.randint(1, 100) for i in range(100000)), int)'
timeit.timeit('[a[i] / a[i+1] for i in range(len(a)-1)]', initcode, number=100)
timeit.timeit('(a / np.roll(a, -1))', initcode, number=100)
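One caveat worth checking for yourself: np.roll wraps around, so the rolled version has one extra element (the last value divided by the first). A small sketch comparing the loop, plain slicing, and np.roll on a shorter array (the size of 1000 is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.integers(1, 101, size=1000)

loop = np.array([a[i] / a[i + 1] for i in range(len(a) - 1)])
sliced = a[:-1] / a[1:]      # the same n-1 ratios, no wrap-around
rolled = a / np.roll(a, -1)  # n ratios; the last one is a[-1] / a[0]

print(np.allclose(loop, sliced), len(sliced), len(rolled))
```

If the wrap-around element is not wanted, the sliced form gives exactly what the list comprehension computes.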
