Binning 2D data with circles instead of rectangles - from pandas df - python

I have a dataframe of x, y data and need to bin it into circles, i.e. a grid of circles of a certain size and spacing centered on some point. As a result, some data would be left out after this sampling/binning. How is this possible?
I have tried np.histogram2d and creating masks/broadcasting. The mask was too slow, and I don't seem to be able to broadcast into a circle. I can only tell whether a point is within said grid of circles, via this answer: Binning 2D data into overlapping circles in x,y.
If there is a way to pass edges or something similar into histogram2d and make the edges circular, please let me know. Cheers

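One straightforward way to do this is to loop over your points and the grid of circle centres, like so:
import numpy as np
import matplotlib.pyplot as plt
def inside_circle(x, y, x0, y0, r):
    # True if (x, y) lies strictly inside the circle of radius r centred at (x0, y0)
    return (x - x0)*(x - x0) + (y - y0)*(y - y0) < r*r
# Circle centres: a 30 x 30 grid
x_bins = np.linspace(-9, 9, 30)
y_bins = np.linspace(-9, 9, 30)
# Positional arrays, so h[i] works whatever the DataFrame index is
h = df['upmm'].to_numpy()
w = df['anode_entrance'].to_numpy()
# One counter per circle; the shape matches the grid of centres
histo = np.zeros((len(x_bins), len(y_bins)))
for i in range(len(h)):
    for j in range(len(x_bins)):
        for k in range(len(y_bins)):
            if inside_circle(h[i], w[i], x_bins[j], y_bins[k], 0.01):
                histo[j][k] += 1
plt.imshow(histo, cmap='hot', interpolation='nearest')
plt.show()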
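If the circles sit on a regular grid and do not overlap (radius at most half the spacing), each point can only fall inside the circle whose centre is nearest to it, so the nested loops can be replaced by array arithmetic. The following is a minimal sketch under those assumptions; the name `bin_into_circles` and its parameters are illustrative and not part of NumPy or pandas.

```python
import numpy as np

def bin_into_circles(x, y, centers_x, centers_y, radius):
    """Count points inside non-overlapping circles centred on a regular grid."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    centers_x = np.asarray(centers_x, dtype=float)
    centers_y = np.asarray(centers_y, dtype=float)
    # Grid spacing (assumes evenly spaced centres, e.g. from np.linspace)
    dx = centers_x[1] - centers_x[0]
    dy = centers_y[1] - centers_y[0]
    # Index of the nearest centre along each axis
    j = np.rint((x - centers_x[0]) / dx).astype(int)
    k = np.rint((y - centers_y[0]) / dy).astype(int)
    # Discard points whose nearest centre would lie outside the grid
    in_grid = (j >= 0) & (j < len(centers_x)) & (k >= 0) & (k < len(centers_y))
    j, k, x, y = j[in_grid], k[in_grid], x[in_grid], y[in_grid]
    # Keep only points strictly inside their nearest circle
    inside = (x - centers_x[j])**2 + (y - centers_y[k])**2 < radius**2
    histo = np.zeros((len(centers_x), len(centers_y)), dtype=int)
    # np.add.at accumulates correctly when the same bin appears several times
    np.add.at(histo, (j[inside], k[inside]), 1)
    return histo

# Small made-up example: 3 x 3 grid of circles with radius 0.2
centers = np.array([0.0, 1.0, 2.0])
px = np.array([0.0, 1.1, 0.5, 2.0, 5.0])
py = np.array([0.0, 1.0, 0.5, 1.95, 5.0])
counts = bin_into_circles(px, py, centers, centers, 0.2)
```

With the DataFrame from the question this would presumably be called as `bin_into_circles(df['upmm'], df['anode_entrance'], x_bins, y_bins, r)`. Points that fall between circles or outside the grid are simply not counted.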

Related

Draw a circle in a numpy array given index and radius without external libraries

I need to draw a circle in a 2D numpy array given [i,j] as indexes of the array, and r as the radius of the circle. Each time a condition is met at index [i,j], a circle should be drawn with that as the center point, increasing all values inside the circle by +1. I want to avoid the for-loops at the end where I draw the circle (where I use p,q to index) because I have to draw possibly millions of circles. Is there a way without for loops? I also don't want to import another library for just a single task.
Here is my current implementation:
for i in range(array_shape[0]):
    for j in range(array_shape[1]):
        if (condition):  # Draw circle if condition is fulfilled
            # Create a square of pixels with side lengths equal to radius of circle
            x_square_min = i-r
            x_square_max = i+r+1
            y_square_min = j-r
            y_square_max = j+r+1
            # Clamp this square to the edges of the array so circles near edges don't wrap around
            if x_square_min < 0:
                x_square_min = 0
            if y_square_min < 0:
                y_square_min = 0
            if x_square_max > array_shape[0]:
                x_square_max = array_shape[0]
            if y_square_max > array_shape[1]:
                y_square_max = array_shape[1]
            # Now loop over the box and draw circle inside of it
            for p in range(x_square_min, x_square_max):
                for q in range(y_square_min, y_square_max):
                    if (p - i) ** 2 + (q - j) ** 2 <= r ** 2:
                        # Incrementing because need to have possibility of
                        # overlapping circles
                        new_array[p, q] += 1
If you're using the same radius for every single circle, you can simplify things significantly by only calculating the circle coordinates once and then adding the center coordinates to the circle points when needed. Here's the code:
# The main array of values is called array.
shape = array.shape
row_indices = np.arange(0, shape[0], 1)
col_indices = np.arange(0, shape[1], 1)
# Returns xy coordinates for a circle with a given radius, centered at (0,0).
def points_in_circle(radius):
    a = np.arange(radius + 1)
    for x, y in zip(*np.where(a[:,np.newaxis]**2 + a**2 <= radius**2)):
        yield from set(((x, y), (x, -y), (-x, y), (-x, -y),))
# Set the radius value before running code.
radius = RADIUS
circle_r = np.array(list(points_in_circle(radius)))
# Note that I'm using x as the row number and y as the column number.
# Center of circle is at (x_center, y_center). shape_0 and shape_1 refer to the main array
# so we can get rid of coordinates outside the bounds of array.
def add_center_to_circle(circle_points, x_center, y_center, shape_0, shape_1):
    circle = np.copy(circle_points)
    circle[:, 0] += x_center
    circle[:, 1] += y_center
    # Get rid of rows where coordinates are below 0 (can't be indexed)
    bad_rows = np.array(np.where(circle < 0)).T[:, 0]
    circle = np.delete(circle, bad_rows, axis=0)
    # Get rid of rows that are outside the upper bounds of the array.
    circle = circle[circle[:, 0] < shape_0, :]
    circle = circle[circle[:, 1] < shape_1, :]
    return circle
for x in row_indices:
    for y in col_indices:
        # You need to set CONDITION before running the code.
        if CONDITION:
            # Because circle_r is the same for all circles, it doesn't need to be
            # recalculated all the time. All you need to do is add x and y to
            # circle_r each time CONDITION is met.
            circle_coords = add_center_to_circle(circle_r, x, y, shape[0], shape[1])
            array[tuple(circle_coords.T)] += 1
When I set radius = 10, array = np.random.rand(1200).reshape(40, 30) and replaced if CONDITION with if (x == 20 and y == 20) or (x == 25 and y == 20), I got this, which seems to be what you want:
Let me know if you have any questions.
Adding each circle can be vectorized. This solution iterates over the coordinates where the condition is met. On a 2-core colab instance ~60k circles with radius 30 can be added per second.
import numpy as np
np.random.seed(42)
arr = np.random.rand(400,300)
r = 30
xx, yy = np.mgrid[-r:r+1, -r:r+1]
circle = xx**2 + yy**2 <= r**2
condition = np.where(arr > .999) # np.where(arr > .5) to benchmark 60k circles
for x,y in zip(*condition):
    # valid indices of the array
    i = slice(max(x-r,0), min(x+r+1, arr.shape[0]))
    j = slice(max(y-r,0), min(y+r+1, arr.shape[1]))
    # visible slice of the circle
    ci = slice(abs(min(x-r, 0)), circle.shape[0] - abs(min(arr.shape[0]-(x+r+1), 0)))
    cj = slice(abs(min(y-r, 0)), circle.shape[1] - abs(min(arr.shape[1]-(y+r+1), 0)))
    arr[i, j] += circle[ci, cj]
Visualizing np.array arr
import matplotlib.pyplot as plt
plt.figure(figsize=(8,8))
plt.imshow(arr)
plt.show()
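The edge handling in the slicing approach is the part that is easiest to get wrong, so it can help to check it against a brute-force mask. Below is a small sketch of such a check; `stamp_disc` is a hypothetical helper name, and the slice offsets are written in an equivalent but slightly different form from the answer above.

```python
import numpy as np

def stamp_disc(arr, disc, x, y, r):
    """Add a precomputed (2r+1, 2r+1) disc mask to arr, centred on (x, y).

    The window is clipped to the array bounds, and the same amount is
    clipped from the disc so that both slices keep identical shapes.
    """
    i0, i1 = max(x - r, 0), min(x + r + 1, arr.shape[0])
    j0, j1 = max(y - r, 0), min(y + r + 1, arr.shape[1])
    # Row/column 0 of the disc corresponds to array index x - r / y - r
    arr[i0:i1, j0:j1] += disc[i0 - (x - r):i1 - (x - r),
                              j0 - (y - r):j1 - (y - r)]

r = 3
xx, yy = np.mgrid[-r:r + 1, -r:r + 1]
disc = (xx**2 + yy**2 <= r**2).astype(int)

# Centre close to two edges, so the disc is clipped on both axes
arr = np.zeros((10, 8), dtype=int)
stamp_disc(arr, disc, 1, 6, r)

# Brute-force reference: test every pixel of the full array
ii, jj = np.indices(arr.shape)
expected = ((ii - 1)**2 + (jj - 6)**2 <= r**2).astype(int)
matches = np.array_equal(arr, expected)
```

The brute-force mask costs one pass over the whole array per circle, which is why it is only suitable as a test and not as a replacement for the slicing.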

Find closest point in 2D meshed array

To give y'all some context, I'm doing this inversion technique where I am trying to reproduce a profile using the integrated values. To do that I need to find the value within an array along a certain line(s). To exemplify my issue I have the following code:
import numpy as np
import matplotlib.pyplot as plt
fig, ax = plt.subplots(1, figsize = (10,10))
#Create the grid (different grid spacing):
X = np.arange(0,10.01,0.25)
Y = np.arange(0,10.01,1.00)
#Create the 2D array to be plotted
Z = []
for i in range(np.size(X)):
    Zaux = []
    for j in range(np.size(Y)):
        Zaux.append(i*j + j)
        ax.scatter(X[i],Y[j], color = 'red', s = 0.25)
    Z.append(Zaux)
#Mesh the 1D grids:
Ymesh, Xmesh = np.meshgrid(Y, X)
#Plot the color plot:
ax.pcolor(Y,X, Z, cmap='viridis', vmin=np.nanmin(Z), vmax=np.nanmax(Z))
#Plot the points in the grid of the color plot:
for i in range(np.size(X)):
    for j in range(np.size(Y)):
        ax.scatter(Y[j],X[i], color = 'red', s = 3)
#Create a set of lines:
for i in np.linspace(0,2,5):
    X_line = np.linspace(0,10,256)
    Y_line = i*X_line*3.1415-4
    #Plot each line:
    ax.plot(X_line,Y_line, color = 'blue')
ax.set_xlim(0,10)
ax.set_ylim(0,10)
plt.show()
That outputs this graph:
I need to find the closest points in Z that are crossed by each of the lines. The idea is to integrate the values in Z that are crossed by the blue lines and plot that as a function of the slope of the lines. Does anyone have a good solution for it? I've tried a set of for loops, but I think it's kind of clunky.
Anyway, thanks for your time...
I am not sure about the closest points thing. That seems "clunky" too. What if the line passes exactly in the middle between two points? Also, I had already written code for another project that weighs the four neighbouring pixels by their closeness, so I am going with that. I am also taking the liberty of not rescaling the picture.
import itertools
import numpy as np
i,j = np.meshgrid(np.arange(41),np.arange(11))
Z = i*j + j
class Image_knn():
    def fit(self, image):
        self.image = image.astype('float')
    def predict(self, x, y):
        image = self.image
        # Fractional parts give the weights of the two neighbours along each axis
        weights_x = [1-(x % 1), x % 1]
        weights_y = [1-(y % 1), y % 1]
        start_x = np.floor(x).astype('int')
        start_y = np.floor(y).astype('int')
        # Weighted sum over the four surrounding pixels; dx, dy are the 0/1 offsets
        return sum([image[np.clip(start_x + dx, 0, image.shape[0]-1),
                          np.clip(start_y + dy, 0, image.shape[1]-1)] * weights_x[dx]*weights_y[dy]
                    for dx, dy in itertools.product(range(2), range(2))])
And a little sanity check: it returns the picture if we give it its own coordinates.
image_model = Image_knn()
image_model.fit(Z)
assert np.allclose(image_model.predict(*np.where(np.ones(Z.shape, dtype='bool'))).reshape((11,41)), Z)
I generate m=100 lines and scale the points on them so that they are evenly spaced. Here is a plot of every 10th of them.
n = 1000
m = 100
slopes = np.linspace(1e-10,10,m)
t, slope = np.meshgrid(np.linspace(0,1,n), slopes)
x_max, y_max = Z.shape[0]-1, Z.shape[1]-1
lines_x = t
lines_y = t*slope
scales = np.broadcast_to(np.stack([x_max/lines_x[:,-1], y_max/lines_y[:,-1]]).min(axis=0), (n,m)).T
lines_x *= scales
lines_y *= scales
And finally I can get the "points" consisting of slope and "integral" and draw them. You should probably take a closer look at the "integral"; it's just a rough guess of mine.
%%timeit
points = np.array([(slope, np.mean(image_model.predict(lines_x[i],lines_y[i]))
                    *np.linalg.norm(np.array((lines_x[i,-1],lines_y[i,-1]))))
                   for i,slope in enumerate(slopes)])
plt.scatter(points[:,0],points[:,1])
Notice the %%timeit in the last block. This takes ~38.3 ms on my machine, so I did not optimize it. As Donald Knuth puts it, "premature optimization is the root of all evil". If you were to optimize this, you would remove the for loop, pass all the coordinates of the line points to the model at once by reshaping and reshaping back, and then match the results up with the slopes. But I saw no reason to put myself through that for a few ms.
And finally we get a nice cusp as a reward. Notice that it makes sense that the maximum is at 4, since the diagonal has a slope of 4 for our 40 by 10 picture. The intuition for the cusp is a bit harder to explain, but I guess you probably have that already. For the length, it comes down to the function (x,y) -> sqrt(x^2+y^2) having different directional derivatives when going up and when going left on the rectangle.
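For reference, the loop-free variant mentioned above could look roughly like the sketch below. It is not the code that was timed: `bilinear` is an illustrative stand-alone function that applies the same four-neighbour weighting, and the sample lines are made up. Because the interpolation works elementwise on arrays of any shape, the coordinates of all lines can be passed as 2D arrays in a single call.

```python
import numpy as np

def bilinear(image, x, y):
    """Four-neighbour weighted lookup at fractional (row, column) coordinates.

    x and y may be arrays of any (identical) shape; the result has that shape.
    """
    image = np.asarray(image, dtype=float)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    fx, fy = x - x0, y - y0
    out = np.zeros(x.shape)
    for dx, wx in ((0, 1 - fx), (1, fx)):
        for dy, wy in ((0, 1 - fy), (1, fy)):
            # Clip so that neighbours past the border reuse the edge pixel
            xi = np.clip(x0 + dx, 0, image.shape[0] - 1)
            yi = np.clip(y0 + dy, 0, image.shape[1] - 1)
            out += image[xi, yi] * wx * wy
    return out

# Same toy image as in the answer: shape (11, 41)
i, j = np.meshgrid(np.arange(41), np.arange(11))
Z = i * j + j

# Made-up sample lines: m lines with n points each, stored as (m, n) arrays
n, m = 50, 5
t = np.linspace(0, 1, n)
lines_x = np.tile(t * 10, (m, 1))                    # row coordinate, 0..10
lines_y = np.outer(np.linspace(0.5, 4, m), t * 10)   # column coordinate, up to 40

values = bilinear(Z, lines_x, lines_y)   # one call for all lines
means = values.mean(axis=1)              # one mean per line
```

Whether this is worth doing depends on how many lines are evaluated; for 100 lines the saving is a few milliseconds, as noted above.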

Efficiently apply function to spheric neighbourhood in numpy array

I have a 3D numpy array of float values in Python.
I need to retrieve all the elements in a sphere of radius r starting from
a center point P(x, y, z). Then, I want to apply to the sphere points a function that
updates their values and needs the distance to the center point to do this. I do these steps a lot of times and for
large radius values, so I would like to have a solution that is as efficient
as possible.
My current solution checks only the points in the bounding box of the sphere,
as indicated here: Using a QuadTree to get all points within a bounding circle.
A sketch of the code looks like this:
# P(x, y, z): center of the sphere
for k1 in range(x - r, x + r + 1):
    for k2 in range(y - r, y + r + 1):
        for k3 in range(z - r, z + r + 1):
            # Sphere center - current point distance
            dist = np.sum((np.array([k1, k2, k3]) - np.array([x, y, z])) ** 2)
            if (dist <= r * r):
                # computeUpdatedValue(distance, radius): function that computes the new value of the matrix in the current point
                newValue = computeUpdatedValue(dist, r)
                # Update the matrix
                mat[k1, k2, k3] = newValue
However, I thought that applying a mask to retrieve the points and then
updating them based on distance in a vectorized manner would be more efficient.
I have seen how to apply a circular kernel
(How to apply a disc shaped mask to a numpy array?),
but I do not know how to efficiently apply the function (which depends on the indices) to each of the mask's elements.
EDIT: If your array is very big compared to the region you are updating, the solution below will take much more memory than necessary. You can apply the same idea but only to the region where the sphere may fall:
def updateSphereBetter(mat, center, radius):
    # Find beginning and end of region of interest
    center = np.asarray(center)
    start = np.minimum(np.maximum(center - radius, 0), mat.shape)
    end = np.minimum(np.maximum(center + radius + 1, 0), mat.shape)
    # Slice region of interest
    mat_sub = mat[tuple(slice(s, e) for s, e in zip(start, end))]
    # Center coordinates relative to the region of interest
    center_rel = center - start
    # Same as before but with mat_sub and center_rel
    ind = np.indices(mat_sub.shape)
    ind = np.moveaxis(ind, 0, -1)
    dist_squared = np.sum(np.square(ind - center_rel), axis=-1)
    mask = dist_squared <= radius * radius
    mat_sub[mask] = computeUpdatedValue(dist_squared[mask], radius)
Note that since mat_sub is a view of mat, updating it updates the original array, so this produces the same result as the full-array version, but with fewer resources.
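The claim that the cropped update gives the same result as the full-array update can be checked directly. The sketch below restates both variants in self-contained form (the snake_case names are mine, not from the answer) and compares them for a sphere that is clipped by the array border.

```python
import numpy as np

def compute_updated_value(dist_squared, radius):
    # 1 at the centre of the sphere and 0 at the surface
    return np.clip(1 - np.sqrt(dist_squared) / radius, 0, 1)

def update_sphere_full(mat, center, radius):
    # Index coordinates of every element of mat, shape mat.shape + (3,)
    ind = np.moveaxis(np.indices(mat.shape), 0, -1)
    dist_squared = np.sum(np.square(ind - np.asarray(center)), axis=-1)
    mask = dist_squared <= radius * radius
    mat[mask] = compute_updated_value(dist_squared[mask], radius)

def update_sphere_cropped(mat, center, radius):
    center = np.asarray(center)
    # Bounding box of the sphere, clamped to the array
    start = np.minimum(np.maximum(center - radius, 0), mat.shape)
    end = np.minimum(np.maximum(center + radius + 1, 0), mat.shape)
    # Basic slicing returns a view, so writes below reach mat itself
    sub = mat[tuple(slice(s, e) for s, e in zip(start, end))]
    ind = np.moveaxis(np.indices(sub.shape), 0, -1)
    dist_squared = np.sum(np.square(ind - (center - start)), axis=-1)
    mask = dist_squared <= radius * radius
    sub[mask] = compute_updated_value(dist_squared[mask], radius)

# Sphere centred near two faces of the array, so the box is clipped
a = np.zeros((30, 20, 25))
b = np.zeros((30, 20, 25))
update_sphere_full(a, [3, 10, 22], 6)
update_sphere_cropped(b, [3, 10, 22], 6)
same = np.array_equal(a, b)
```

The cropped variant only ever allocates index arrays for the bounding box, whose size depends on the radius and not on the size of `mat`.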
Here is a little proof of concept. I defined computeUpdatedValue so that it shows the distance from the center, and then plotted a few "sections" of an example:
import numpy as np
import matplotlib.pyplot as plt
def updateSphere(mat, center, radius):
    # Make array of all index coordinates
    ind = np.indices(mat.shape)
    # Compute the squared distances to each point
    ind = np.moveaxis(ind, 0, -1)
    dist_squared = np.sum(np.square(ind - center), axis=-1)
    # Make a mask for squared distances within squared radius
    mask = dist_squared <= radius * radius
    # Update masked values
    mat[mask] = computeUpdatedValue(dist_squared[mask], radius)
def computeUpdatedValue(dist_squared, radius):
    # 1 at the center of the sphere and 0 at the surface
    return np.clip(1 - np.sqrt(dist_squared) / radius, 0, 1)
mat = np.zeros((100, 60, 80))
updateSphere(mat, [50, 20, 40], 20)
plt.subplot(131)
plt.imshow(mat[:, :, 30], vmin=0, vmax=1)
plt.subplot(132)
plt.imshow(mat[:, :, 40], vmin=0, vmax=1)
plt.subplot(133)
plt.imshow(mat[:, :, 55], vmin=0, vmax=1)
Output:

Generate profiles through a 2D array at an angle without altering pixels

I'd like to plot two profiles through the highest intensity point in a 2D numpy array, which is an image of a blob (i.e. a line through the semi-major axis, and another line through the semi-minor axis). The blob is rotated at an angle theta counterclockwise from the standard x-axis and is asymmetric.
It is a 600x600 array with a max intensity of 1 (at only one pixel) that is located right at the center at (300, 300). The angle rotation from the x-axis (which then gives the location of the semi-major axis when rotated by that angle) is theta = 89.54 degrees. I do not want to use scipy.ndimage.rotate because it uses spline interpolation, and I do not want to change any of my pixel values. But I suppose a nearest-neighbor interpolation method would be okay.
I tried generating lines corresponding to the major and minor axes across the image, but the result was not right at all (the peak was far less than 1), so maybe I did something wrong. The code for this is below:
import numpy as np
import matplotlib.pyplot as plt
from scipy import ndimage
def profiles_at_angle(image, axis, theta):
    theta = np.deg2rad(theta)
    if axis == 'major':
        x_0, y_0 = 0, 300-300*np.tan(theta)
        x_1, y_1 = 599, 300+300*np.tan(theta)
    elif axis=='minor':
        x_0, y_0 = 300-300*np.tan(theta), 599
        x_1, y_1 = 300+300*np.tan(theta), -599
    num = 600
    x, y = np.linspace(x_0, x_1, num), np.linspace(y_0, y_1, num)
    z = ndimage.map_coordinates(image, np.vstack((x,y)))
    fig, axes = plt.subplots(nrows=2)
    axes[0].imshow(image, cmap='gray')
    axes[0].axis('image')
    axes[1].plot(z)
    plt.xlim(250,350)
    plt.show()
profiles_at_angle(image, 'major', theta)
Did I do something obviously wrong in my code above? Or how else can I accomplish this? Thank you.
Edit: Here are some example images. Sorry for the bad quality; my browser crashed every time I tried uploading them anywhere so I had to take photos of the screen.
Figure 1: This is the result of my code above, which is clearly wrong since the peak should be at 1. I'm not sure what I did wrong though.
Figure 2: I made this plot below by just taking the profiles through the standard x and y axes, ignoring any rotation (this only looks good coincidentally because the real angle of rotation is so close to 90 degrees, so I was able to just switch the labels and get this). I want my result to look something like this, but taking the correct rotation angle into account.
Edit: It could be useful to run tests on this method using data very much like my own (it's a 2D Gaussian with nearly the same parameters):
image = np.random.random((600,600))
def generate(data_set):
    xvec = np.arange(0, np.shape(data_set)[1], 1)
    yvec = np.arange(0, np.shape(data_set)[0], 1)
    X, Y = np.meshgrid(xvec, yvec)
    return X, Y
def gaussian_func(xy, x0, y0, sigma_x, sigma_y, amp, theta, offset):
    x, y = xy
    a = (np.cos(theta))**2/(2*sigma_x**2) + (np.sin(theta))**2/(2*sigma_y**2)
    b = -np.sin(2*theta)/(4*sigma_x**2) + np.sin(2*theta)/(4*sigma_y**2)
    c = (np.sin(theta))**2/(2*sigma_x**2) + (np.cos(theta))**2/(2*sigma_y**2)
    inner = a * (x-x0)**2
    inner += 2*b*(x-x0)*(y-y0)
    inner += c * (y-y0)**2
    return (offset + amp * np.exp(-inner)).ravel()
xx, yy = generate(image)
image = gaussian_func((xx.ravel(), yy.ravel()), 300, 300, 5, 4, 1, 1.56, 0)
image = np.reshape(image, (600, 600))
This should do it for you. You just did not properly compute your lines.
# scipy.ndimage must be imported explicitly for the scipy.ndimage calls below
import scipy.ndimage
theta = 65
peak = np.argwhere(image==1)[0]
x = np.linspace(peak[0]-100,peak[0]+100,1000)
y = lambda x: (x-peak[1])*np.tan(np.deg2rad(theta))+peak[0]
y_maj = np.linspace(y(peak[1]-100),y(peak[1]+100),1000)
y = lambda x: -(x-peak[1])/np.tan(np.deg2rad(theta))+peak[0]
y_min = np.linspace(y(peak[1]-100),y(peak[1]+100),1000)
del y
z_min = scipy.ndimage.map_coordinates(image, np.vstack((x,y_min)))
z_maj = scipy.ndimage.map_coordinates(image, np.vstack((x,y_maj)))
fig, axes = plt.subplots(nrows=2)
axes[0].imshow(image)
axes[0].plot(x,y_maj)
axes[0].plot(x,y_min)
axes[0].axis('image')
axes[1].plot(z_min)
axes[1].plot(z_maj)
plt.show()
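Two details are worth keeping in mind here. For an angle as close to 90 degrees as 89.54, tan(theta) is above 100, so line end points computed as 300 ± 300*tan(theta) lie far outside a 600x600 image. Also, map_coordinates defaults to spline interpolation (order=3), which can alter pixel values; order=0 selects nearest-neighbour sampling. The sketch below avoids both issues with plain NumPy by parametrising the line with sin and cos and rounding to the nearest pixel. The function name and its parameters are illustrative, and the angle is measured from the column axis towards the row axis, which on a default imshow plot (row 0 at the top) appears clockwise.

```python
import numpy as np

def profile_through_peak(image, theta_deg, half_length=100, num=201):
    """Nearest-neighbour profile through the brightest pixel at a given angle."""
    image = np.asarray(image)
    row0, col0 = np.unravel_index(np.argmax(image), image.shape)
    theta = np.deg2rad(theta_deg)
    # Signed distance along the line, with 0 at the peak
    s = np.linspace(-half_length, half_length, num)
    rows = row0 + s * np.sin(theta)
    cols = col0 + s * np.cos(theta)
    # Round to the nearest pixel; samples past the border reuse the edge pixel
    ri = np.clip(np.rint(rows).astype(int), 0, image.shape[0] - 1)
    ci = np.clip(np.rint(cols).astype(int), 0, image.shape[1] - 1)
    return s, image[ri, ci]

# Tiny made-up image with a single bright pixel
image = np.zeros((50, 50))
image[20, 30] = 1.0
s, major = profile_through_peak(image, 89.54, half_length=15, num=31)
_, minor = profile_through_peak(image, 89.54 + 90, half_length=15, num=31)
```

Every returned value is an existing pixel of the image, so the peak of the profile equals the peak of the image. The profile through the perpendicular axis is obtained by adding 90 degrees to the angle.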

Non-rectangular Arrays in Python x3

I am creating graphs with matplotlib's contourf in Python, and I want to know if there is any way to set the pixel size to create a "flexible" plot size. The program I am creating will take user-generated data, which means the matrix I am plotting could be as big as [693][983] or as small as [10][10]. The problem I am having is that matplotlib/contourf defaults to a square grid, which means I get a lot of "stretching" if, say, my matrix size is [100][700]. I want to know if I can create a flexible grid size to fix this problem. Here is my current code.
def PlotCreation2(Matrix, MinY, MaxY, X, Y, Lvl, Zeta):
    fig, ax = plt.subplots()
    Intervals = (MaxY - MinY) / 50
    if (Intervals == 0):
        MaxY = MaxY +1
        MinY = MinY -1
        Intervals = (MaxY - MinY) / 50
    contour_levels = arange(MinY,MaxY,Intervals)
    cs = ax.contourf(X, Y, Matrix, contour_levels)
    cbar = plt.colorbar(cs)
    plt.savefig(('plt%d.png') %(Zeta*1000+Lvl))
    print(("Image number %d has been created") %Zeta)
In this case the Matrix could be any size, but the meshgrid from X and Y will always match the Matrix. Don't worry about the Lvl or Zeta variables; I just use them to name the image I save.
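One possible way to avoid the stretching, sketched here as an assumption and not as a confirmed fix for this program, is to give the figure the same proportions as the matrix and to ask matplotlib for an equal aspect ratio on the axes. The helper names `figure_size_for` and `plot_matrix` are hypothetical, and an equal aspect only makes sense if X and Y are in the same units.

```python
def figure_size_for(shape, longest_side=8.0):
    """Return (width, height) in inches with the proportions of a (rows, cols) matrix."""
    rows, cols = shape
    scale = longest_side / max(rows, cols)
    return cols * scale, rows * scale

def plot_matrix(matrix, X, Y, levels=50):
    # Imported here so that figure_size_for can be used without matplotlib
    import matplotlib.pyplot as plt
    fig, ax = plt.subplots(figsize=figure_size_for(matrix.shape))
    cs = ax.contourf(X, Y, matrix, levels)
    # Draw one data unit along x as long as one data unit along y
    ax.set_aspect('equal')
    fig.colorbar(cs)
    return fig, ax

# A wide 100 x 700 matrix gets a wide, short figure
width, height = figure_size_for((100, 700))
```

The colorbar takes some of the figure width, so the figure size may need a small manual allowance on top of what the helper returns.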
