OpenCV-Python: get the set of pixels between two points (a line) [duplicate]

I want to use the LineIterator in OpenCV 3.0 using Python. Is it still available with OpenCV 3.0 built for Python? The answers on the internet all seem to point to cv.InitLineIterator, which is part of the cv module. I've tried importing this module, but it seems like it is not included with the current build. Has it been renamed, or simply removed?

I've solved my own problem. Line iterator seems to be unavailable in the cv2 library. Therefore, I made my own line iterator. No loops are used, so it should be pretty fast. Here is the code if anybody needs it:
import numpy as np

def createLineIterator(P1, P2, img):
    """
    Produces an array that consists of the coordinates and intensities of each pixel in a line between two points

    Parameters:
        -P1: a numpy array that consists of the coordinate of the first point (x,y)
        -P2: a numpy array that consists of the coordinate of the second point (x,y)
        -img: the image being processed

    Returns:
        -it: a numpy array that consists of the coordinates and intensities of each pixel along the line (shape: [numPixels, 3], row = [x,y,intensity])
    """
    # define local variables for readability
    imageH = img.shape[0]
    imageW = img.shape[1]
    P1X = P1[0]
    P1Y = P1[1]
    P2X = P2[0]
    P2Y = P2[1]

    # difference and absolute difference between points
    # used to calculate slope and relative location between points
    dX = P2X - P1X
    dY = P2Y - P1Y
    dXa = np.abs(dX)
    dYa = np.abs(dY)

    # predefine numpy array for output based on distance between points
    itbuffer = np.empty(shape=(np.maximum(dYa, dXa), 3), dtype=np.float32)
    itbuffer.fill(np.nan)

    # obtain coordinates along the line using a form of Bresenham's algorithm
    negY = P1Y > P2Y
    negX = P1X > P2X
    if P1X == P2X:  # vertical line segment
        itbuffer[:, 0] = P1X
        if negY:
            itbuffer[:, 1] = np.arange(P1Y - 1, P1Y - dYa - 1, -1)
        else:
            itbuffer[:, 1] = np.arange(P1Y + 1, P1Y + dYa + 1)
    elif P1Y == P2Y:  # horizontal line segment
        itbuffer[:, 1] = P1Y
        if negX:
            itbuffer[:, 0] = np.arange(P1X - 1, P1X - dXa - 1, -1)
        else:
            itbuffer[:, 0] = np.arange(P1X + 1, P1X + dXa + 1)
    else:  # diagonal line segment
        steepSlope = dYa > dXa
        if steepSlope:
            slope = dX.astype(np.float32) / dY.astype(np.float32)
            if negY:
                itbuffer[:, 1] = np.arange(P1Y - 1, P1Y - dYa - 1, -1)
            else:
                itbuffer[:, 1] = np.arange(P1Y + 1, P1Y + dYa + 1)
            itbuffer[:, 0] = (slope * (itbuffer[:, 1] - P1Y)).astype(int) + P1X
        else:
            slope = dY.astype(np.float32) / dX.astype(np.float32)
            if negX:
                itbuffer[:, 0] = np.arange(P1X - 1, P1X - dXa - 1, -1)
            else:
                itbuffer[:, 0] = np.arange(P1X + 1, P1X + dXa + 1)
            itbuffer[:, 1] = (slope * (itbuffer[:, 0] - P1X)).astype(int) + P1Y

    # remove points outside of the image
    colX = itbuffer[:, 0]
    colY = itbuffer[:, 1]
    itbuffer = itbuffer[(colX >= 0) & (colY >= 0) & (colX < imageW) & (colY < imageH)]

    # get intensities from the img ndarray
    itbuffer[:, 2] = img[itbuffer[:, 1].astype(int), itbuffer[:, 0].astype(int)]
    return itbuffer
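For example, a minimal usage sketch (the image file and endpoints here are placeholders):
import cv2
import numpy as np

img = cv2.imread("example.png", cv2.IMREAD_GRAYSCALE)  # hypothetical grayscale input
P1 = np.array([10, 20])    # (x, y) of the first endpoint
P2 = np.array([200, 150])  # (x, y) of the second endpoint
pixels = createLineIterator(P1, P2, img)
print(pixels[:5])  # first five [x, y, intensity] rows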

Edit:
The line function from scikit-image achieves the same effect and is faster than anything we could code ourselves.
from skimage.draw import line
# being start and end two points (x1,y1), (x2,y2)
discrete_line = list(zip(*line(*start, *end)))
The timeit result is also considerably faster, so use this.
Old "deprecated" answer:
As the previous answer says, it's not implemented, so you must do it yourself.
I didn't do it from scratch; I just rewrote some parts of the function in a fancier and more modern way that should handle all cases correctly, unlike the most-voted answer, which didn't work correctly for me. I took the example from here and did some cleanup and some styling.
Feel free to comment on it. I also added the clipLine test like in the source code, which can be found in drawing.cpp in the OpenCV 4.x source.
Thank you all for the references and the hard work.
import math
import cv2

def bresenham_march(img, p1, p2):
    x1 = p1[0]
    y1 = p1[1]
    x2 = p2[0]
    y2 = p2[1]
    # tests if any coordinate is outside the image
    if (
        x1 >= img.shape[0]
        or x2 >= img.shape[0]
        or y1 >= img.shape[1]
        or y2 >= img.shape[1]
    ):  # tests if the line is in the image; necessary because some part of the line must be inside (handles the case where both points are outside)
        # cv2.clipLine returns (retval, pt1, pt2); retval is False when the segment lies entirely outside the rectangle
        inside, _, _ = cv2.clipLine((0, 0, *img.shape[:2]), p1, p2)
        if not inside:
            print("not in region")
            return []
    steep = math.fabs(y2 - y1) > math.fabs(x2 - x1)
    if steep:
        x1, y1 = y1, x1
        x2, y2 = y2, x2
    # takes left to right
    also_steep = x1 > x2
    if also_steep:
        x1, x2 = x2, x1
        y1, y2 = y2, y1

    dx = x2 - x1
    dy = math.fabs(y2 - y1)
    error = 0.0
    delta_error = 0.0
    # default if dx is zero
    if dx != 0:
        delta_error = math.fabs(dy / dx)
    y_step = 1 if y1 < y2 else -1
    y = y1
    ret = []
    for x in range(x1, x2):
        p = (y, x) if steep else (x, y)
        if p[0] < img.shape[0] and p[1] < img.shape[1]:
            ret.append((p, img[p]))
        error += delta_error
        if error >= 0.5:
            y += y_step
            error -= 1
    if also_steep:  # because we took the left-to-right version instead
        ret.reverse()
    return ret

I compared the 4 methods provided on this page, using Python 2.7.6 and scikit-image 0.9.3 with some minor code changes. Image input is via OpenCV. The test is a line segment from (1, 76) to (867, 190).
Method 1: scikit-image line
Compute time: 0.568 ms
Number of pixels found: 867
Correct start pixel: yes
Correct end pixel: yes
Method 2: code from @trenixjetix
There seems to be a bug where the image width and height are flipped.
Compute time: 0.476 ms
Number of pixels found: 866
Correct start pixel: yes
Correct end pixel: no, off by 1
Method 3: code from ROS.org
https://answers.ros.org/question/10160/opencv-python-lineiterator-returning-position-information/
Compute time: 0.433 ms (should be the same as method 2)
Number of pixels found: 866
Correct start pixel: yes
Correct end pixel: no, off by 1
Method 4: code from @mohikhsan
Compute time: 0.156 ms
Number of pixels found: 866
Correct start pixel: no, off by 1
Correct end pixel: yes
Summary:
Most accurate method: scikit-image line
Fastest method: code from @mohikhsan
It would be nice to have a Python implementation that matches the OpenCV C++ implementation (https://github.com/opencv/opencv/blob/master/modules/imgproc/src/drawing.cpp) or uses a Python generator (https://wiki.python.org/moin/Generators).

Not a fancy way to do this, but an effective and very simple one-liner:
points_on_line = np.linspace(pt_a, pt_b, 100)  # 100 samples on the line
If you want to approximately get each pixel along the way, sample roughly one point per pixel (note that np.linspace needs an integer count):
points_on_line = np.linspace(pt_a, pt_b, int(np.linalg.norm(pt_a - pt_b)))
(e.g. the number of samples equals the number of pixels between point A and point B)
For example:
pt_a = np.array([10, 11])
pt_b = np.array([45, 67])
im = np.zeros((80, 80, 3), np.uint8)
for p in np.linspace(pt_a, pt_b, int(np.linalg.norm(pt_a - pt_b))):
    cv2.circle(im, tuple(np.int32(p)), 1, (255, 0, 0), -1)
plt.imshow(im)

This is not exactly an answer, but I can't add comments, so I'll write it here.
The solution by trenixjetix is really great in covering the two most efficient ways to do this. I just want to give a minor clarification for the scikit-image method he mentioned.
# being start and end two points (x1,y1), (x2,y2)
discrete_line = list(zip(*line(*start, *end)))
In scikit-image, the starting and ending points of the line follow (row, col) order, while OpenCV uses (x, y) coordinates, which is reversed in terms of function parameters. Pay attention to that.
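For example, a small sketch of the coordinate-order difference (img, x1, y1, x2, y2 are hypothetical variables in OpenCV's (x, y) convention):
from skimage.draw import line
# skimage expects (row, col): line(r0, c0, r1, c1), so swap the (x, y) inputs:
rr, cc = line(y1, x1, y2, x2)
values = img[rr, cc]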
Adding to David's answer: I measured the scikit-image execution time as faster than trenixjetix's function, using Python 3.8. The result can vary, but scikit-image is faster almost every time.
trenixjetix time (ms): 0.22279999999996747
scikit-image time (ms): 0.13810000000002987

I had trouble running the skimage example from trenixjetix, so I created a small wrapper function that accepts points from numpy array slices, tuples, or lists all the same:
from skimage.draw import line as skidline

def get_linepnts(p0, p1):
    p0, p1 = np.array(p0).flatten(), np.array(p1).flatten()
    return np.array(list(zip(*skidline(p0[0], p0[1], p1[0], p1[1]))))
The resulting array can be used to retrieve values from numpy arrays in the following way:
l0 = get_linepnts(p0, p1)
#if p0/p1 are in (x,y) format, then this needs to be swapped for retrieval:
vals = yournpmat[l0[:,1], l0[:,0]]

Related

Draw a circle in a numpy array given index and radius without external libraries

I need to draw a circle in a 2D numpy array given [i,j] as indexes of the array, and r as the radius of the circle. Each time a condition is met at index [i,j], a circle should be drawn with that as the center point, increasing all values inside the circle by +1. I want to avoid the for-loops at the end where I draw the circle (where I use p,q to index) because I have to draw possibly millions of circles. Is there a way without for loops? I also don't want to import another library for just a single task.
Here is my current implementation:
for i in range(array_shape[0]):
    for j in range(array_shape[1]):
        if (condition):  # draw circle if condition is fulfilled
            # create a square of pixels with side lengths equal to the radius of the circle
            x_square_min = i - r
            x_square_max = i + r + 1
            y_square_min = j - r
            y_square_max = j + r + 1
            # clamp this square to the edges of the array so circles near edges don't wrap around
            if x_square_min < 0:
                x_square_min = 0
            if y_square_min < 0:
                y_square_min = 0
            if x_square_max > array_shape[0]:
                x_square_max = array_shape[0]
            if y_square_max > array_shape[1]:
                y_square_max = array_shape[1]
            # now loop over the box and draw the circle inside of it
            for p in range(x_square_min, x_square_max):
                for q in range(y_square_min, y_square_max):
                    if (p - i) ** 2 + (q - j) ** 2 <= r ** 2:
                        new_array[p, q] += 1  # incrementing because we need to allow overlapping circles
If you're using the same radius for every single circle, you can simplify things significantly by only calculating the circle coordinates once and then adding the center coordinates to the circle points when needed. Here's the code:
import numpy as np

# The main array of values is called array.
shape = array.shape
row_indices = np.arange(0, shape[0], 1)
col_indices = np.arange(0, shape[1], 1)

# Returns xy coordinates for a circle with a given radius, centered at (0,0).
def points_in_circle(radius):
    a = np.arange(radius + 1)
    for x, y in zip(*np.where(a[:, np.newaxis]**2 + a**2 <= radius**2)):
        yield from set(((x, y), (x, -y), (-x, y), (-x, -y),))

# Set the radius value before running the code.
radius = RADIUS
circle_r = np.array(list(points_in_circle(radius)))

# Note that I'm using x as the row number and y as the column number.
# The center of the circle is at (x_center, y_center). shape_0 and shape_1 refer to the main array,
# so we can get rid of coordinates outside its bounds.
def add_center_to_circle(circle_points, x_center, y_center, shape_0, shape_1):
    circle = np.copy(circle_points)
    circle[:, 0] += x_center
    circle[:, 1] += y_center
    # get rid of rows where coordinates are below 0 (can't be indexed)
    bad_rows = np.array(np.where(circle < 0)).T[:, 0]
    circle = np.delete(circle, bad_rows, axis=0)
    # get rid of rows that are outside the upper bounds of the array
    circle = circle[circle[:, 0] < shape_0, :]
    circle = circle[circle[:, 1] < shape_1, :]
    return circle

for x in row_indices:
    for y in col_indices:
        # You need to set CONDITION before running the code.
        if CONDITION:
            # Because circle_r is the same for all circles, it doesn't need to be recalculated
            # each time. All you need to do is add x and y to circle_r whenever CONDITION is met.
            circle_coords = add_center_to_circle(circle_r, x, y, shape[0], shape[1])
            array[tuple(circle_coords.T)] += 1
When I set radius = 10, array = np.random.rand(1200).reshape(40, 30) and replaced if CONDITION with if (x == 20 and y == 20) or (x == 25 and y == 20), I got this, which seems to be what you want:
Let me know if you have any questions.
Adding each circle can be vectorized. This solution iterates over the coordinates where the condition is met. On a 2-core colab instance ~60k circles with radius 30 can be added per second.
import numpy as np

np.random.seed(42)
arr = np.random.rand(400, 300)
r = 30
xx, yy = np.mgrid[-r:r+1, -r:r+1]
circle = xx**2 + yy**2 <= r**2
condition = np.where(arr > .999)  # np.where(arr > .5) to benchmark 60k circles
for x, y in zip(*condition):
    # valid indices of the array
    i = slice(max(x-r, 0), min(x+r+1, arr.shape[0]))
    j = slice(max(y-r, 0), min(y+r+1, arr.shape[1]))
    # visible slice of the circle
    ci = slice(abs(min(x-r, 0)), circle.shape[0] - abs(min(arr.shape[0]-(x+r+1), 0)))
    cj = slice(abs(min(y-r, 0)), circle.shape[1] - abs(min(arr.shape[1]-(y+r+1), 0)))
    arr[i, j] += circle[ci, cj]
Visualizing the resulting np.array arr:
import matplotlib.pyplot as plt
plt.figure(figsize=(8,8))
plt.imshow(arr)
plt.show()

How can I vectorize this numpy rotation code?

The Goal:
I would like to vectorize (or otherwise speed up) this code. It rotates a 3d numpy model around its center point (let x,y,z denote the dimensions; then we want to rotate around the z-axis). The np model is binary voxels that are either "on" or "off"
I bet some basic matrix operation could do it, like take a layer and apply the rotation matrix to each element. The only issue with that is decimals; where should I have the new value land since cos(pi / 6) == sqrt(3) / 2?
The Code:
import numpy as np
from math import pi, sin, cos

def rotate_model(m, theta):
    '''
    theta in degrees
    '''
    n = np.zeros(m.shape)
    for i, layer in enumerate(m):
        rotated = rotate(layer, theta)
        n[i] = rotated
    return n
where rotate() is:
def rotate(arr, theta):
    '''
    Rotates theta clockwise
    rotated.shape == arr.shape, unlike scipy.ndimage.rotate(), which inflates size and also does some strange mixing
    '''
    if theta == int(theta):
        theta *= pi / 180
    theta = -theta
    # theta = -theta b/c clockwise; otherwise it would default to counterclockwise
    rotated = np.zeros(arr.shape)
    y_mid = arr.shape[0] // 2
    x_mid = arr.shape[1] // 2
    val = 0
    for x_new in range(rotated.shape[1]):
        for y_new in range(rotated.shape[0]):
            x_centered = x_new - x_mid
            y_centered = y_new - y_mid
            x = x_centered * cos(theta) - y_centered * sin(theta)
            y = x_centered * sin(theta) + y_centered * cos(theta)
            x += x_mid
            y += y_mid
            x = int(round(x)); y = int(round(y))  # cast so range() picks it up
            # lossy rotation
            if x in range(arr.shape[1]) and y in range(arr.shape[0]):
                val = arr[y, x]
            rotated[y_new, x_new] = val
    return rotated
You have a couple of problems in your code. First, if you want to fit the original image onto a rotated grid, then you (usually) need a larger grid. Alternatively, imagine a regular grid but with the shape of your object - a rectangle - rotated, thus becoming a "rhomb". Obviously, if you want to fit the entire rhomb, you need a larger output grid (array). On the other hand, you say in the code "rotated.shape == arr.shape, unlike scipy.ndimage.rotate(), which inflates size". If that is the case, maybe you do not want to fit the entire object? So maybe it is OK to do this: rotated = np.zeros(arr.shape). But in general, yes, one has to have a larger grid in order to fit the entire input image after it is rotated.
Another issue is the angle conversion that you are doing:
if theta == int(theta):
    theta *= pi / 180
theta = -theta
Why??? What will happen when I want to rotate the image by 1 radian? Or 2 radians? Am I forbidden to use an integer number of radians? I think you are trying to do too much in this function, and therefore it will be very confusing to use. Just require the caller to convert angles to radians. Or, you can do it inside this function if the input theta is always in degrees. Or, you can add another parameter called, e.g., units, and the caller could set it to radians or degrees. Don't try to guess it based on the "integer-ness" of the input!
Now, let's rewrite your code a little bit:
rotated = np.zeros_like(arr) # instead of np.zero(arr.shape)
y_mid = arr.shape[0] // 2
x_mid = arr.shape[1] // 2
# val = 0 <- this is unnecessary
# pre-compute cos(theta) and sin(theta):
cs = cos(theta)
sn = sin(theta)
for x_new in range(rotated.shape[1]):
for y_new in range(rotated.shape[0]):
x = int(round((x_new - x_mid) * cs - (y_new - y_mid) * sn + x_mid)
y = int(round((x_new - x_mid) * sn - (y_new - y_mid) * cs + y_mid)
# just use comparisons, don't search through many values!
if 0 <= x < arr.shape[1] and 0 <= y < arr.shape[0]:
rotated[y_new, x_new] = arr[y, x]
So, now I can see (more easily) that each pixel of the output array is mapped to a location in the input array. Yes, you can vectorize this.
import numpy as np

def rotate(arr, theta, unit='rad'):
    # deal with theta units:
    if unit.startswith('deg'):
        theta = np.deg2rad(theta)
    # for convenience, store the array size:
    ny, nx = arr.shape
    # generate arrays of indices and flatten them:
    y_new, x_new = np.indices(arr.shape)
    x_new = x_new.ravel()
    y_new = y_new.ravel()
    # compute the center of the array:
    x0 = nx // 2
    y0 = ny // 2
    # compute the old coordinates:
    xc = x_new - x0
    yc = y_new - y0
    x = np.round(np.cos(theta) * xc - np.sin(theta) * yc + x0).astype(int)
    y = np.round(np.sin(theta) * xc + np.cos(theta) * yc + y0).astype(int)
    # the main idea for dealing with indices is to create a mask:
    mask = (x >= 0) & (x < nx) & (y >= 0) & (y < ny)
    # ... and then select only those coordinates (both in
    # input and "new" coordinates) that satisfy the above condition:
    x = x[mask]
    y = y[mask]
    x_new = x_new[mask]
    y_new = y_new[mask]
    # map input values to output pixels *only* for the selected "good" pixels:
    rotated = np.zeros_like(arr)
    rotated[y_new, x_new] = arr[y, x]
    return rotated
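As a quick sanity check of the vectorized version (illustrative values only):
a = np.zeros((5, 5), dtype=int)
a[2, 1:4] = 1  # a short horizontal bar
print(rotate(a, 90, unit='deg'))  # the bar becomes vertical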
Here is some code for anyone also doing 3D modeling. It solved my specific use case pretty well. I'm still figuring out how to rotate in the proper plane. Hope it's helpful to you as well:
import scipy.ndimage

def rotate_model(m, theta):
    '''
    Redefines the previous rotate_model() method
    theta has to be in degrees
    '''
    rotated = scipy.ndimage.rotate(m, theta, axes=(1, 2))
    # have tried (1,0), (2,0), and now (1,2)
    # ^ z is "up" and "2"
    # scipy.ndimage.rotate() shrinks the model
    # TODO: regrow it back
    x_r = rotated.shape[1]
    y_r = rotated.shape[0]
    x_m = m.shape[1]
    y_m = m.shape[0]
    x_diff = abs(x_r - x_m)
    y_diff = abs(y_r - y_m)
    if x_diff % 2 == 0 and y_diff % 2 == 0:
        return rotated[
            x_diff//2 : x_r - x_diff//2,
            y_diff//2 : y_r - y_diff//2,
            :
        ]
    elif x_diff % 2 == 0 and y_diff % 2 == 1:
        # if this shift ends up degrading the model after a few iterations,
        # change the following lines to include a flag that alternates cutting
        # off the top and bottom bits of the array
        return rotated[
            x_diff//2 : x_r - x_diff//2,
            y_diff//2 + 1 : y_r - y_diff//2,
            :
        ]
    elif x_diff % 2 == 1 and y_diff % 2 == 0:
        return rotated[
            x_diff//2 + 1 : x_r - x_diff//2,
            y_diff//2 : y_r - y_diff//2,
            :
        ]
    else:
        # x_diff % 2 == 1 and y_diff % 2 == 1
        return rotated[
            x_diff//2 + 1 : x_r - x_diff//2,
            y_diff//2 + 1 : y_r - y_diff//2,
            :
        ]

How to Expand a Polygon Until One of the Borders Reaches a Point

I have code to expand the polygon; it works by multiplying the xs and ys by a factor, then re-centering the resultant polygon at the center of the original.
I also have code to find the value of the expansion factor, given a point that the polygon needs to reach:
import numpy as np
import itertools as IT
import copy
from shapely.geometry import LineString, Point

def getPolyCenter(points):
    """
    http://stackoverflow.com/a/14115494/190597 (mgamba)
    """
    area = area_of_polygon(*zip(*points))  # area_of_polygon() comes from the linked answer
    result_x = 0
    result_y = 0
    N = len(points)
    points = IT.cycle(points)
    x1, y1 = next(points)
    for i in range(N):
        x0, y0 = x1, y1
        x1, y1 = next(points)
        cross = (x0 * y1) - (x1 * y0)
        result_x += (x0 + x1) * cross
        result_y += (y0 + y1) * cross
    result_x /= (area * 6.0)
    result_y /= (area * 6.0)
    return (result_x, result_y)

def expandPoly(points, factor):
    points = np.array(points, dtype=np.float64)
    expandedPoly = points * factor
    expandedPoly -= getPolyCenter(expandedPoly)
    expandedPoly += getPolyCenter(points)
    return np.array(expandedPoly, dtype=np.int64)

def distanceLine2Point(points, point):
    points = np.array(points, dtype=np.float64)
    point = np.array(point, dtype=np.float64)
    points = LineString(points)
    point = Point(point)
    return points.distance(point)

def distancePolygon2Point(points, point):
    distances = []
    for i in range(len(points)):
        if i == len(points) - 1:
            j = 0
        else:
            j = i + 1
        line = [points[i], points[j]]
        distances.append(distanceLine2Point(line, point))
    minDistance = np.min(distances)
    #index = np.where(distances == minDistance)[0][0]
    return minDistance

def distancePolygon2PointAndCenter(points, point):
    """
    Returns the distance from a point to the nearest line of the polygon,
    AND the distance from where the normal to the line (to reach the point)
    intersects the line to the center of the polygon.
    """
    distances = []
    for i in range(len(points)):
        if i == len(points) - 1:
            j = 0
        else:
            j = i + 1
        line = [points[i], points[j]]
        distances.append(distanceLine2Point(line, point))
    minDistance = np.min(distances)
    i = np.where(distances == minDistance)[0][0]
    if i == len(points) - 1:
        j = 0
    else:
        j = i + 1
    line = copy.deepcopy([points[i], points[j]])
    centerDistance = distanceLine2Point(line, getPolyCenter(points))
    return minDistance, centerDistance

minDistance, centerDistance = distancePolygon2PointAndCenter(points, point)
expandedPoly = expandPoly(points, 1 + minDistance / centerDistance)
This code only works when the point is directly opposite one of the polygon's lines.
Modify your method distancePolygon2PointAndCenter so that, instead of
returning the distance from a point to the nearest line of the polygon,
it returns the distance from the point to the segment intersected by a ray from the center to the point. This is the line that will intersect the point once the polygon is fully expanded. To get this segment, take both endpoints of each segment of your polygon and plug them into the equation for the line through the center and the point (the line containing that ray). That is y = ((centerY - pointY)/(centerX - pointX)) * (x - centerX) + centerY. You want to find segments where either one endpoint lies on the line, or the two endpoints are on opposite sides of the line.
Then, the only thing left to do is make sure that we pick the segment intersecting the right "side" of the line. To do this, there are a few options. The fail-safe method would be to use the formula cos(theta) = (centerX * pointX + centerY * pointY) / sqrt((centerX**2 + centerY**2) * (pointX**2 + pointY**2)); however, you could use methods such as comparing x and y values, taking arctan2(), and so on to figure out which segment is on the correct "side" of the center. You'll just have lots of edge cases to cover. After all this is said and done, your two endpoints (unless the polygon is not convex, in which case take the segment farthest from your center) make up the segment to expand off of.
Determine what the "polygon center" C of expansion is. Perhaps it is the centroid (or some point with other properties?).
Make a segment from your point P to C. Find the intersection point I between PC and the polygon edges. If the polygon is concave and there are several intersection points, choose the one closest to P.
Calculate the coefficient of expansion:
E = Length(PC) / Length(CI)
Calculate the new vertex coordinates. For the i-th vertex of the polygon:
V'[i].X = C.X + (V[i].X - C.X) * E
V'[i].Y = C.Y + (V[i].Y - C.Y) * E
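A minimal numpy sketch of these two steps (assuming the center C, the target point P, and the intersection point I have already been found, and points is an (N, 2) vertex array):
import numpy as np

def expand_to_point(points, C, P, I):
    C, P, I = np.asarray(C), np.asarray(P), np.asarray(I)
    E = np.linalg.norm(P - C) / np.linalg.norm(I - C)  # E = Length(PC) / Length(CI)
    return C + (np.asarray(points) - C) * E            # scale every vertex about C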
Decide which point you want to reach, then calculate by what percentage your polygon needs to expand to reach that point, and use the shapely.affinity.scale function. For example, in my case I just needed to make the polygon 5% bigger:
region = shapely.affinity.scale(myPolygon, xfact=1.05, yfact=1.05)

Find neighbors with cuts efficiently and return index

I have many points in the x,y plane, around 10000 in total; each point (x,y) has an intrinsic radius r. This small data set is only one tiny corner of my entire data set. For an interest point (x1,y1), I want to find the nearby points around (x1,y1) within a distance of 1 that also meet the criterion that the distance between (x,y) and (x1,y1) is less than r. I want to return the index of those good points, not the good points themselves.
import numpy as np
np.random.seed(2000)
x = 20.*np.random.rand(10000)
y = 20.*np.random.rand(10000)
r = 0.3*np.random.rand(10000)
x1 = 10. ### (x1,y1) is an interest point
y1 = 12.
def index_finder(x, y, r, x1, y1):
    idx = (abs(x - x1) < 1.) & (abs(y - y1) < 1.)  ### this cut will probably remove ~90% of the data
    x_temp = x[idx]  ### but if I do it like this, then I lose track of the original indices
    y_temp = y[idx]
    dis_square = (x_temp - x1) * (x_temp - x1) + (y_temp - y1) * (y_temp - y1)
    idx1 = dis_square < r[idx] * r[idx]  ### after this cut, there are only a few points left
    x_good = x_temp[idx1]
    y_good = y_temp[idx1]
In this function, I can find the good points around (x1,y1), but not their indices. HOWEVER, I need the ORIGINAL indices, because they are used to extract other data associated with the coordinates (x,y). As I mentioned, this sample data set is only a tiny corner of my entire data set; I will call the above function around 1,000,000 times for the full data set, so the efficiency of index_finder is also a consideration.
Any thoughts on such a task?
Approach #1
We can simply index into the first mask with the second-stage mask idx1, overwriting its True places, like so:
idx[idx] = idx1
Thus, idx ends up holding the final valid mask (the good-value places) corresponding to the original arrays x and y, i.e.:
x_good = x[idx]
y_good = y[idx]
This mask could then be used to index into other arrays as mentioned in the question.
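Put together, a sketch of the question's index_finder using this trick (same array inputs as in the question):
def index_finder(x, y, r, x1, y1):
    idx = (np.abs(x - x1) < 1.) & (np.abs(y - y1) < 1.)  # coarse box cut
    dis_square = (x[idx] - x1)**2 + (y[idx] - y1)**2
    idx1 = dis_square < r[idx]**2                        # exact distance cut
    idx[idx] = idx1                                      # fold back into the full-length mask
    return np.flatnonzero(idx)                           # original indices of the good points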
Approach #2
As another approach, we could use two conditional statements , thus creating two masks with them. Finally, combine them with AND-ing to get the combined mask, which could be indexed into x and y arrays for the final outputs. We won't need to get the actual indices that way, so that's one more benefit with it.
Hence, the implementation -
X = x-x1
Y = y-y1
mask1 = (np.abs(X) < 1.) & (np.abs(Y) < 1.)
mask2 = X**2 + Y**2 < r**2
comb_mask = mask1 & mask2
x_good = x[comb_mask]
y_good = y[comb_mask]
If for some reason you still need the corresponding indices, just do:
comb_idx = np.flatnonzero(comb_mask)
If you are doing these operations for different x1 and y1 pairs for the same x and y dataset, I would suggest using broadcasting to vectorize it through all those x1, y1 paired datasets, as shown in this post.
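A rough sketch of that broadcasting idea, assuming x1 and y1 are 1D arrays of query coordinates (note the intermediate masks use O(len(x1) * len(x)) memory):
X = x - x1[:, None]  # shape (num_queries, num_points)
Y = y - y1[:, None]
comb_mask = (np.abs(X) < 1.) & (np.abs(Y) < 1.) & (X**2 + Y**2 < r**2)
# row k of comb_mask flags the good points for the query point (x1[k], y1[k])
idx_per_query = [np.flatnonzero(row) for row in comb_mask]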
numpy.where seems made for finding the indices;
the vectorized norm calculation plus np.where() could be faster than a loop:
sq_norm = (x - x1)**2 + (y - y1)**2 # no need to take 10000 sqrt
idcs = np.where(sq_norm < 1.)
len(idcs[0])
Out[193]: 69
np.stack((idcs[0], x[idcs], y[idcs]), axis=1)[:5]
Out[194]:
array([[ 38. , 9.47165956, 11.94250173],
[ 39. , 9.6966941 , 11.67505453],
[ 276. , 10.68835317, 12.11589316],
[ 288. , 9.93632584, 11.07624915],
[ 344. , 9.48644057, 12.04911857]])
The norm calculation can include the r array too, as a 2nd step:
r_sq_norm = (x[idcs] - x1)**2 + (y[idcs] - y1)**2 - r[idcs]**2
r_idcs = np.where(r_sq_norm < 0.)
idcs[0][r_idcs]
Out[11]: array([1575, 3476, 3709], dtype=int64)
You might want to time the 2-step test vs. including r in the first vectorized norm calculation:
sq_norm = (x - x1)**2 + (y - y1)**2 - r**2
idcs = np.where(sq_norm < 0.)
idcs[0]
Out[13]: array([1575, 3476, 3709], dtype=int64)
You can take a mask of your indices, like so:
def index_finder(x, y, r, x1, y1):
    idx = np.nonzero((abs(x - x1) < 1.) & (abs(y - y1) < 1.))  # numerical indices, not a boolean mask
    mask = (x[idx] - x1) * (x[idx] - x1) + (y[idx] - y1) * (y[idx] - y1) < r[idx] * r[idx]
    idx1 = [i[mask] for i in idx]
    x_good = x[tuple(idx1)]
    y_good = y[tuple(idx1)]
now idx1 is the indices you want to extract.
Faster way in general to do this is to use scipy.spatial.KDTree
from scipy.spatial import KDTree
xy = np.stack((x, y), axis=-1)  # KDTree expects one point per row
kdt = KDTree(xy)
kdt.query_ball_point([x1, y1], r)
If you have many points to query against the same dataset, this will be much faster than sequentially calling your index_finder app.
x1y1 = np.stack((x1, y1)) #`x1` and `y1` are arrays of coordinates.
kdt.query_ball_point(x1y1, r)
ALSO WRONG: if you have different distances for each point, you can do:
def query_variable_ball(kdtree, x, y, r):
    out = []
    for x_, y_, r_ in zip(x, y, r):
        out.append(kdtree.query_ball_point([x_, y_], r_))
    return out

xy = np.stack((x, y), axis=-1)
kdt = KDTree(xy)
query_variable_ball(kdt, x1, y1, r)
EDIT 2: This should work with different r values for each point
from scipy.spatial import KDTree
def index_finder_kd(x, y, r, x1, y1):  # all arrays
    xy = np.stack((x, y), axis=-1)
    x1y1 = np.stack((x1, y1), axis=-1)
    xytree = KDTree(xy)
    d, i = xytree.query(x1y1, k=None, distance_upper_bound=1.)
    good_idx = np.zeros(x.size, dtype=bool)
    for idx, dist in zip(i, d):
        good_idx[idx] |= r[idx] > dist
    x_good = x[good_idx]
    y_good = y[good_idx]
    return x_good, y_good, np.flatnonzero(good_idx)
This is very slow for only one (x1, y1) pair as the KDTree takes a while to populate. But if you have millions of pairs, this will be much faster.
(I've assumed you want the union of all good points in the (x, y) data for all (x1, y1), if you want them separately it's also possible using a similar method, removing elements of i[j] based on whether d[j] < r[i[j]])

Efficient method of calculating density of irregularly spaced points

I am attempting to generate map overlay images that would assist in identifying hot-spots, that is areas on the map that have high density of data points. None of the approaches that I've tried are fast enough for my needs.
Note: I forgot to mention that the algorithm should work well under both low and high zoom scenarios (or low and high data point density).
I looked through numpy, pyplot and scipy libraries, and the closest I could find was numpy.histogram2d. As you can see in the image below, the histogram2d output is rather crude. (Each image includes points overlaying the heatmap for better understanding)
My second attempt was to iterate over all the data points, and then calculate the hot-spot value as a function of distance. This produced a better looking image, however it is too slow to use in my application. Since it's O(n), it works ok with 100 points, but blows out when I use my actual dataset of 30000 points.
My final attempt was to store the data in an KDTree, and use the nearest 5 points to calculate the hot-spot value. This algorithm is O(1), so much faster with large dataset. It's still not fast enough, it takes about 20 seconds to generate a 256x256 bitmap, and I would like this to happen in around 1 second time.
Edit
The boxsum smoothing solution provided by 6502 works well at all zoom levels and is much faster than my original methods.
The gaussian filter solution suggested by Luke and Neil G is the fastest.
You can see all four approaches below, using 1000 data points in total, at 3x zoom there are around 60 points visible.
Complete code that generates my original 3 attempts, the boxsum smoothing solution provided by 6502 and gaussian filter suggested by Luke (improved to handle edges better and allow zooming in) is here:
import numpy as np
import matplotlib.cm as cm
import matplotlib.pyplot as plt
import math
from scipy.spatial import KDTree
import time
import scipy.ndimage as ndi

def grid_density_kdtree(xl, yl, xi, yi, dfactor):
    zz = np.empty([len(xi), len(yi)], dtype=np.uint8)
    zipped = list(zip(xl, yl))
    kdtree = KDTree(zipped)
    for xci in range(0, len(xi)):
        xc = xi[xci]
        for yci in range(0, len(yi)):
            yc = yi[yci]
            density = 0.
            retvalset = kdtree.query((xc, yc), k=5)
            for dist in retvalset[0]:
                density = density + math.exp(-dfactor * pow(dist, 2)) / 5
            zz[yci][xci] = min(density, 1.0) * 255
    return zz

def grid_density(xl, yl, xi, yi):
    ximin, ximax = min(xi), max(xi)
    yimin, yimax = min(yi), max(yi)
    xxi, yyi = np.meshgrid(xi, yi)
    zz = np.empty([len(xi), len(yi)])
    for xci in range(0, len(xi)):
        xc = xi[xci]
        for yci in range(0, len(yi)):
            yc = yi[yci]
            density = 0.
            for i in range(0, len(xl)):
                xd = math.fabs(xl[i] - xc)
                yd = math.fabs(yl[i] - yc)
                if xd < 1 and yd < 1:
                    dist = math.sqrt(math.pow(xd, 2) + math.pow(yd, 2))
                    density = density + math.exp(-5.0 * pow(dist, 2))
            zz[yci][xci] = density
    return zz

def boxsum(img, w, h, r):
    st = [0] * (w + 1) * (h + 1)
    for x in range(w):
        st[x + 1] = st[x] + img[x]
    for y in range(h):
        st[(y + 1) * (w + 1)] = st[y * (w + 1)] + img[y * w]
        for x in range(w):
            st[(y + 1) * (w + 1) + (x + 1)] = st[(y + 1) * (w + 1) + x] + st[y * (w + 1) + (x + 1)] - st[y * (w + 1) + x] + img[y * w + x]
    for y in range(h):
        y0 = max(0, y - r)
        y1 = min(h, y + r + 1)
        for x in range(w):
            x0 = max(0, x - r)
            x1 = min(w, x + r + 1)
            img[y * w + x] = st[y0 * (w + 1) + x0] + st[y1 * (w + 1) + x1] - st[y1 * (w + 1) + x0] - st[y0 * (w + 1) + x1]

def grid_density_boxsum(x0, y0, x1, y1, w, h, data):
    kx = (w - 1) / (x1 - x0)
    ky = (h - 1) / (y1 - y0)
    r = 15
    border = r * 2
    imgw = (w + 2 * border)
    imgh = (h + 2 * border)
    img = [0] * (imgw * imgh)
    for x, y in data:
        ix = int((x - x0) * kx) + border
        iy = int((y - y0) * ky) + border
        if 0 <= ix < imgw and 0 <= iy < imgh:
            img[iy * imgw + ix] += 1
    for p in range(4):
        boxsum(img, imgw, imgh, r)
    a = np.array(img).reshape(imgh, imgw)
    b = a[border:(border + h), border:(border + w)]
    return b

def grid_density_gaussian_filter(x0, y0, x1, y1, w, h, data):
    kx = (w - 1) / (x1 - x0)
    ky = (h - 1) / (y1 - y0)
    r = 20
    border = r
    imgw = (w + 2 * border)
    imgh = (h + 2 * border)
    img = np.zeros((imgh, imgw))
    for x, y in data:
        ix = int((x - x0) * kx) + border
        iy = int((y - y0) * ky) + border
        if 0 <= ix < imgw and 0 <= iy < imgh:
            img[iy][ix] += 1
    return ndi.gaussian_filter(img, (r, r))  # gaussian convolution

def generate_graph():
    n = 1000
    # data point range
    data_ymin = -2.
    data_ymax = 2.
    data_xmin = -2.
    data_xmax = 2.
    # view area range
    view_ymin = -.5
    view_ymax = .5
    view_xmin = -.5
    view_xmax = .5
    # generate data
    xl = np.random.uniform(data_xmin, data_xmax, n)
    yl = np.random.uniform(data_ymin, data_ymax, n)
    zl = np.random.uniform(0, 1, n)
    # get visible data points
    xlvis = []
    ylvis = []
    for i in range(0, len(xl)):
        if view_xmin < xl[i] < view_xmax and view_ymin < yl[i] < view_ymax:
            xlvis.append(xl[i])
            ylvis.append(yl[i])
    fig = plt.figure()

    # plot histogram
    plt1 = fig.add_subplot(221)
    plt1.set_axis_off()
    t0 = time.perf_counter()
    zd, xe, ye = np.histogram2d(yl, xl, bins=10, range=[[view_ymin, view_ymax], [view_xmin, view_xmax]], density=True)
    plt.title('numpy.histogram2d - ' + str(time.perf_counter() - t0) + "sec")
    plt.imshow(zd, origin='lower', extent=[view_xmin, view_xmax, view_ymin, view_ymax])
    plt.scatter(xlvis, ylvis)

    # plot density calculated with kdtree
    plt2 = fig.add_subplot(222)
    plt2.set_axis_off()
    xi = np.linspace(view_xmin, view_xmax, 256)
    yi = np.linspace(view_ymin, view_ymax, 256)
    t0 = time.perf_counter()
    zd = grid_density_kdtree(xl, yl, xi, yi, 70)
    plt.title('function of 5 nearest using kdtree\n' + str(time.perf_counter() - t0) + "sec")
    cmap = cm.jet
    A = (cmap(zd / 256.0) * 255).astype(np.uint8)
    plt.imshow(A, origin='lower', extent=[view_xmin, view_xmax, view_ymin, view_ymax])
    plt.scatter(xlvis, ylvis)

    # gaussian filter
    plt3 = fig.add_subplot(223)
    plt3.set_axis_off()
    t0 = time.perf_counter()
    zd = grid_density_gaussian_filter(view_xmin, view_ymin, view_xmax, view_ymax, 256, 256, zip(xl, yl))
    plt.title('ndi.gaussian_filter - ' + str(time.perf_counter() - t0) + "sec")
    plt.imshow(zd, origin='lower', extent=[view_xmin, view_xmax, view_ymin, view_ymax])
    plt.scatter(xlvis, ylvis)

    # boxsum smoothing
    plt4 = fig.add_subplot(224)
    plt4.set_axis_off()
    t0 = time.perf_counter()
    zd = grid_density_boxsum(view_xmin, view_ymin, view_xmax, view_ymax, 256, 256, zip(xl, yl))
    plt.title('boxsum smoothing - ' + str(time.perf_counter() - t0) + "sec")
    plt.imshow(zd, origin='lower', extent=[view_xmin, view_xmax, view_ymin, view_ymax])
    plt.scatter(xlvis, ylvis)

if __name__ == '__main__':
    generate_graph()
    plt.show()
This approach is along the lines of some previous answers: increment a pixel for each spot, then smooth the image with a gaussian filter. A 256x256 image runs in about 350ms on my 6-year-old laptop.
import numpy as np
import scipy.ndimage as ndi

data = np.random.rand(30000, 2)      # create a random dataset
inds = (data * 255).astype('uint')   # convert to indices
img = np.zeros((256, 256))           # blank image
for i in range(data.shape[0]):       # draw pixels
    img[inds[i, 0], inds[i, 1]] += 1
img = ndi.gaussian_filter(img, (10, 10))
A very simple implementation that could be done (with C) in realtime, and that only takes fractions of a second in pure Python, is to just compute the result in screen space.
The algorithm is:
1. Allocate the final matrix (e.g. 256x256) with all zeros
2. For each point in the dataset, increment the corresponding cell
3. Replace each cell in the matrix with the sum of the values of the matrix in an NxN box centered on the cell. Repeat this step a few times.
4. Scale the result and output
The computation of the box sum can be made very fast and independent of N by using a sum table. Every computation just requires two scans of the matrix... the total complexity is O(S + W*H*P), where S is the number of points, W and H are the width and height of the output, and P is the number of smoothing passes.
Below is the code for a pure python implementation (also very un-optimized); with 30000 points and a 256x256 output grayscale image the computation is 0.5sec including linear scaling to 0..255 and saving of a .pgm file (N = 5, 4 passes).
def boxsum(img, w, h, r):
    st = [0] * (w + 1) * (h + 1)
    for x in range(w):
        st[x + 1] = st[x] + img[x]
    for y in range(h):
        st[(y + 1) * (w + 1)] = st[y * (w + 1)] + img[y * w]
        for x in range(w):
            st[(y + 1) * (w + 1) + (x + 1)] = st[(y + 1) * (w + 1) + x] + st[y * (w + 1) + (x + 1)] - st[y * (w + 1) + x] + img[y * w + x]
    for y in range(h):
        y0 = max(0, y - r)
        y1 = min(h, y + r + 1)
        for x in range(w):
            x0 = max(0, x - r)
            x1 = min(w, x + r + 1)
            img[y * w + x] = st[y0 * (w + 1) + x0] + st[y1 * (w + 1) + x1] - st[y1 * (w + 1) + x0] - st[y0 * (w + 1) + x1]

def saveGraph(w, h, data):
    X = [x for x, y in data]
    Y = [y for x, y in data]
    x0, y0, x1, y1 = min(X), min(Y), max(X), max(Y)
    kx = (w - 1) / (x1 - x0)
    ky = (h - 1) / (y1 - y0)
    img = [0] * (w * h)
    for x, y in data:
        ix = int((x - x0) * kx)
        iy = int((y - y0) * ky)
        img[iy * w + ix] += 1
    for p in range(4):
        boxsum(img, w, h, 2)
    mx = max(img)
    k = 255.0 / mx
    out = open("result.pgm", "wb")
    out.write(b"P5\n%i %i 255\n" % (w, h))
    out.write(bytes(int(v * k) for v in img))
    out.close()

import random
data = [(random.random(), random.random())
        for i in range(30000)]
saveGraph(256, 256, data)
Edit
Of course the very definition of density in your case depends on a resolution radius, or is the density just +inf when you hit a point and zero when you don't?
The following is an animation built with the above program with just a few cosmetic changes:
used sqrt(average of squared values) instead of sum for the averaging pass
color-coded the results
stretched the result to always use the full color scale
drew antialiased black dots where the data points are
made an animation by incrementing the radius from 2 to 40
The total computing time of the 39 frames of the following animation with this cosmetic version is 5.4 seconds with PyPy and 26 seconds with standard Python.
Histograms
The histogram way is not the fastest, and can't tell the difference between an arbitrarily small separation of points and 2 * sqrt(2) * b (where b is bin width).
Even if you construct the x bins and y bins separately (O(N)), you still have to perform some a*b convolution (the number of bins each way), which is close to N^2 for any dense system, and even bigger for a sparse one (well, a*b >> N^2 in a sparse system).
Looking at the code above, you have a loop in grid_density() that runs over the number of bins in y inside a loop over the number of bins in x, which is why you're getting O(N^2) performance (although if you are already order N, which you should plot over different numbers of elements to see, then you're just going to have to run less code per cycle).
If you want an actual distance function then you need to start looking at contact detection algorithms.
Contact Detection
Naive contact detection algorithms come in at O(N^2) in either RAM or CPU time, but there is an algorithm, rightly or wrongly attributed to Munjiza at St. Mary's college London, which runs in linear time and RAM.
You can read about it and implement it yourself from his book, if you like.
In fact, I have written this code myself: a Python-wrapped C implementation in 2D, which is not really ready for production (it is still single-threaded, etc.), but it will run in as close to O(N) as your dataset will allow. You set the "element size", which acts as a bin size (the code will call interactions on everything within b of another point, and sometimes between b and 2 * sqrt(2) * b), give it an array (native python list) of objects with an x and y property, and my C module will call back to a python function of your choice to run an interaction function for matched pairs of elements. It's designed for running contact-force DEM simulations, but it will work fine on this problem too.
As I haven't released it yet, because the other bits of the library aren't ready, I'll have to give you a zip of my current source, but the contact detection part is solid. The code is LGPL'd.
You'll need Cython and a C compiler to make it work, and it's only been tested and working under *nix environments; if you're on windows you'll need the mingw C compiler for Cython to work at all.
Once Cython's installed, building/installing pynet should be a case of running setup.py.
The function you are interested in is pynet.d2.run_contact_detection(py_elements, py_interaction_function, py_simulation_parameters) (and you should check out the classes Element and SimulationParameters at the same level if you want it to throw fewer errors - look in the file at archive-root/pynet/d2/__init__.py to see the class implementations; they're trivial data holders with useful constructors.)
(I will update this answer with a public mercurial repo when the code is ready for more general release...)
Your solution is okay, but one clear problem is that you're getting dark regions despite there being a point right in the middle of them.
I would instead center an n-dimensional Gaussian on each point and evaluate the sum over each point you want to display. To reduce it to linear time in the common case, use query_ball_point to consider only points within a couple standard deviations.
If you find that the KDTree is really slow, why not call query_ball_point once every five pixels with a slightly larger threshold? It doesn't hurt too much to evaluate a few too many Gaussians.
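A sketch of that idea, with sigma and the pixel grid as illustrative assumptions (points is the (N, 2) data array, grid is an (M, 2) array of pixel-center coordinates):
from scipy.spatial import cKDTree
import numpy as np

tree = cKDTree(points)
density = np.zeros(len(grid))
# only evaluate Gaussians for neighbors within ~3 standard deviations
for gi, neighbors in enumerate(tree.query_ball_point(grid, 3 * sigma)):
    d2 = ((points[neighbors] - grid[gi])**2).sum(axis=1)
    density[gi] = np.exp(-d2 / (2 * sigma**2)).sum()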
You can do this with a 2D, separable convolution (scipy.ndimage.convolve1d) of your original image with a Gaussian-shaped kernel. With an image size of MxM and a filter size of P, the complexity is O(PM^2) using separable filtering. The "Big-Oh" complexity is no doubt greater, but you can take advantage of numpy's efficient array operations, which should greatly speed up your calculations.
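A minimal sketch of that separable-filtering idea (the kernel sigma and radius are illustrative; img is the binned 2D counts image):
import numpy as np
from scipy.ndimage import convolve1d

def gaussian_kernel(sigma, radius):
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2. * sigma**2))
    return k / k.sum()

k = gaussian_kernel(sigma=10, radius=30)
# one 1D pass per axis instead of a full 2D convolution
smoothed = convolve1d(convolve1d(img, k, axis=0), k, axis=1)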
Just a note, the histogram2d function should work fine for this. Did you play around with different bin sizes? Your initial histogram2d plot seems to just use the default bin sizes... but there's no reason to expect the default sizes to give you the representation you want. Having said that, many of the other solutions are impressive too.
