Does this solution work for the closest pair problem? - python

I have been looking for the solution for the closest pair problem in Python, but all the solutions were completely different from mine.
Here is my code that finds the closest pair of points in 2D.
def closest_pair_two_axes(xarr, yarr):
    pairs = []
    # doesn't really depend if xarr or yarr because the lengths are the same
    for i in range(len(xarr)):
        pairs.append([xarr[i], yarr[i]])
    # Finds the distance between each pair of points:
    # i is the index offset between the two points, j the index of the second point.
    # Offsets run from 1 to len(pairs) - 1; offset len(pairs) - 1 is the (first, last) pair.
    dist_arr = []
    for i in range(1, len(pairs)):
        for j in range(i, len(pairs)):
            diff_x = pairs[j][0] - pairs[j - i][0]
            diff_y = pairs[j][1] - pairs[j - i][1]
            diff = (diff_x ** 2 + diff_y ** 2) ** 0.5
            dist_arr.append(round(diff, 2))  # distances are rounded to 2 decimals
    index = dist_arr.index(min(dist_arr))
    closest_distance = dist_arr[index]
    return closest_distance
xarr and yarr are just the names of the arrays for the x and y coordinates. In the function, I combine both arrays into a list of [x, y] coordinate pairs. After that, I iterate through every possible combination of the points and compute the distance between each pair. In the end, I find the minimum distance and return it to the user.
The code runs, but I am not sure whether it works correctly, i.e. whether the result really is the closest pair. Is the code correct?
EDIT:
I changed [j-1] to [j-i] so that it iterates through every possible pair. Also, I solved the problem using another person's algorithm (the right algorithm) and got the same answer as with my algorithm. It might not be the fastest or cleanest code, but it works!
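For comparison, here is a minimal brute-force sketch of the same idea that tests every unordered pair exactly once via itertools.combinations, so no pair can be skipped by index arithmetic. It is still O(n^2) like the code above; the function name closest_pair is hypothetical, and math.dist requires Python 3.8 or later.

```python
import math
from itertools import combinations


def closest_pair(xarr, yarr):
    """Return (smallest distance, (p, q)) over all unordered pairs of points."""
    points = list(zip(xarr, yarr))  # [(x0, y0), (x1, y1), ...]
    if len(points) < 2:
        raise ValueError("need at least two points")
    # combinations() yields every unordered pair exactly once: n*(n-1)/2 pairs
    p, q = min(combinations(points, 2), key=lambda pair: math.dist(*pair))
    return math.dist(p, q), (p, q)


print(closest_pair([0, 5, 1], [0, 5, 1]))
```

Returning the pair as well as the distance makes it easy to check the result by hand. For large inputs, the standard O(n log n) divide-and-conquer algorithm is the usual next step.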

Related

How to find the distance between two elements in a 2D array

Let's say you have the grid:
grid = [["-", "O", "-", "-", "O", "-"],
        ["O", "-", "-", "-", "-", "O"],
        ["O", "O", "-", "-", "X", "-"],
        ["-", "-", "O", "-", "-", "-"]]
How would you get the coordinates of all O's that are within a distance of 3 from X?
From what I saw in other answers, using scipy.spatial.KDTree.query_ball_point seemed like a common approach, but I was unable to figure out how to adapt it to my use case. One possible idea I had was to store every coordinate of the list, such as
coords=[[0,0],[0,1]...], then use the scipy method and pass in the coordinate of the X and the search distance. Once I received the list of candidate coordinates, I would iterate through it and check which ones are equal to O. I was wondering, however, if there was a more efficient or more optimized solution I could use. Any help would be greatly appreciated.
You don't need to overcomplicate it with SciPy; this problem can easily be solved with a bit of math.
A cell (x, y) is within distance d of X when (x - x_X)^2 + (y - y_X)^2 <= d^2, where (x_X, y_X) is the coordinate of X. So just check the cells that lie inside that circle.
from math import sqrt

grid = [["-", "O", "-", "-", "O", "-"],
        ["O", "-", "-", "-", "-", "O"],
        ["O", "O", "-", "-", "X", "-"],
        ["-", "-", "O", "-", "-", "-"]]
x0, y0 = 4, 2  # coordinate of X: column x = 4, row y = 2
d = 3          # maximum distance from X (an integer number of cells)
total = 0
O_coor = []    # store coordinates of all O near X
# only visit rows that can be within d of X (range end is exclusive, hence + 1)
for y in range(max(0, y0 - d), min(len(grid) - 1, y0 + d) + 1):
    # half-width of the circle on this row, rounded down to whole cells
    half = int(sqrt(d**2 - (y - y0)**2))
    for x in range(max(0, x0 - half), min(len(grid[y]) - 1, x0 + half) + 1):
        if grid[y][x] == "O":
            total += 1
            O_coor.append([x, y])
print(total)
print(O_coor)
It's a long piece of code, but feel free to ask about any part you don't understand.
Note: This solution checks only the cells inside the circle, not the entire grid, so it stays fast even for a large grid: the work grows with d^2, not with the grid size.
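The KDTree idea from the question also works if only the O cells go into the tree, so there is nothing to filter afterwards. A minimal sketch, assuming SciPy is installed; o_within is a hypothetical helper name, and coordinates are (column, row) pairs.

```python
from scipy.spatial import KDTree


def o_within(grid, x0, y0, d):
    """Return the (x, y) coordinates of all "O" cells within distance d of (x0, y0)."""
    # Put only the O cells into the tree, as (column, row) points
    o_coords = [(x, y)
                for y, row in enumerate(grid)
                for x, cell in enumerate(row)
                if cell == "O"]
    if not o_coords:
        return []
    tree = KDTree(o_coords)
    # Indices of all stored points within Euclidean distance d of the query point
    idx = tree.query_ball_point((x0, y0), r=d)
    return sorted(o_coords[i] for i in idx)


grid = [["-", "O", "-", "-", "O", "-"],
        ["O", "-", "-", "-", "-", "O"],
        ["O", "O", "-", "-", "X", "-"],
        ["-", "-", "O", "-", "-", "-"]]
print(o_within(grid, 4, 2, 3))
```

Building the tree only pays off if you query the same grid many times (for example, several X cells); for a single query the direct scan above is simpler.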

Analyzing the complexity of matrix path-finding

Recently in my homework, I was assigned to solve the following problem:
Given a matrix of order nxn of zeros and ones, find the number of paths from [0,0] to [n-1,n-1] that go only through zeros (they are not necessarily disjoint), where you can only walk down or to the right, never up or left. Return a matrix of the same order where the [i,j] entry is the number of paths in the original matrix that go through [i,j]. The solution has to be recursive.
My solution in python:
def find_zero_paths(M):
    n,m = len(M),len(M[0])
    dict = {}
    for i in range(n):
        for j in range(m):
            M_top,M_bot = blocks(M,i,j)
            X,Y = find_num_paths(M_top),find_num_paths(M_bot)
            dict[(i,j)] = X*Y
    L = [[dict[(i,j)] for j in range(m)] for i in range(n)]
    return L[0][0],L

def blocks(M,k,l):
    n,m = len(M),len(M[0])
    assert k<n and l<m
    M_top = [[M[i][j] for i in range(k+1)] for j in range(l+1)]
    M_bot = [[M[i][j] for i in range(k,n)] for j in range(l,m)]
    return [M_top,M_bot]

def find_num_paths(M):
    dict = {(1, 1): 1}
    X = find_num_mem(M, dict)
    return X

def find_num_mem(M,dict):
    n, m = len(M), len(M[0])
    if M[n-1][m-1] != 0:
        return 0
    elif (n,m) in dict:
        return dict[(n,m)]
    elif n == 1 and m > 1:
        new_M = [M[0][:m-1]]
        X = find_num_mem(new_M,dict)
        dict[(n,m-1)] = X
        return X
    elif m == 1 and n>1:
        new_M = M[:n-1]
        X = find_num_mem(new_M, dict)
        dict[(n-1,m)] = X
        return X
    new_M1 = M[:n-1]
    new_M2 = [M[i][:m-1] for i in range(n)]
    X,Y = find_num_mem(new_M1, dict),find_num_mem(new_M2, dict)
    dict[(n-1,m)],dict[(n,m-1)] = X,Y
    return X+Y
My code is based on the idea that the number of paths that go through [i,j] in the original matrix is equal to the product of the number of paths from [0,0] to [i,j] and the number of paths from [i,j] to [n-1,n-1]. Another idea is that the number of paths from [0,0] to [i,j] is the sum of the number of paths from [0,0] to [i-1,j] and from [0,0] to [i,j-1]. Hence I decided to use a dictionary whose keys are matrices of the form [[M[i][j] for j in range(k)] for i in range(l)] or [[M[i][j] for j in range(k+1,n)] for i in range(l+1,n)] for some 0<=k,l<=n-1, where M is the original matrix, and whose values are the number of paths from the top of the matrix to the bottom. After analyzing the complexity of my code, I arrived at the conclusion that it is O(n^6).
Now, my instructor said this code is exponential (for find_zero_paths), however, I disagree.
The recursion tree (for find_num_paths) size is bounded by the number of submatrices of the form above which is O(n^2). Also, each time we add a new matrix to the dictionary we do it in polynomial time (only slicing lists), SO... the total complexity is polynomial (poly*poly = poly). Also, the function 'blocks' runs in polynomial time, and hence 'find_zero_paths' runs in polynomial time (2 lists of polynomial-size times a function which runs in polynomial time) so all in all the code runs in polynomial time.
My question: Is the code polynomial and my O(n^6) bound is wrong or is it exponential and I am missing something?
Unfortunately, your instructor is right.
There is a lot to unpack here:
Before we start, a quick note: please don't use dict as a variable name. It hurts ^^. dict is the name of Python's built-in dictionary type (not a reserved keyword), and shadowing it with your own variable is bad practice.
First, your approach of computing M_top * M_bot is good if you only need the result for one cell of the matrix. The way you go about it, though, you unnecessarily compute some blocks over and over again. That is why I questioned the recursion: I would use dynamic programming for this one, with one pass from start to end and one from end to start, and then compute the products and be done with it. There is no need for O(n^6) separate computations. Since you have to use recursion, I would recommend caching the partial results and reusing them wherever possible.
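A minimal sketch of that suggestion, assuming the matrix is a list of lists of 0/1 values: two memoized recursive counters (paths from the start into a cell, paths from a cell to the end) and a product per cell. functools.lru_cache does the caching, and the function names are hypothetical. Each (i, j) is computed once per direction, so the work is O(n^2), though very large matrices may hit Python's default recursion limit because the recursion can go n + m calls deep.

```python
from functools import lru_cache


def paths_through_each_cell(M):
    """Return (number of zero-only down/right paths, matrix of paths through each cell)."""
    n, m = len(M), len(M[0])

    @lru_cache(maxsize=None)
    def from_start(i, j):
        # paths from (0, 0) to (i, j); blocked or out-of-range cells contribute 0
        if i < 0 or j < 0 or M[i][j] != 0:
            return 0
        if i == 0 and j == 0:
            return 1
        return from_start(i - 1, j) + from_start(i, j - 1)

    @lru_cache(maxsize=None)
    def to_end(i, j):
        # paths from (i, j) to (n-1, m-1)
        if i >= n or j >= m or M[i][j] != 0:
            return 0
        if i == n - 1 and j == m - 1:
            return 1
        return to_end(i + 1, j) + to_end(i, j + 1)

    # paths through (i, j) = paths into it * paths out of it
    result = [[from_start(i, j) * to_end(i, j) for j in range(m)] for i in range(n)]
    return result[0][0], result
```

The return value mirrors find_zero_paths: the [0][0] entry is the total number of paths, since every path passes through the start cell.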
Second, the root of the issue and the cause of your invisible-ish exponent is hidden in the find_num_mem function. Say you compute the last element in the matrix, the result[N][N] field, and consider the simplest case, where the matrix is full of zeroes so every possible path exists.
In the first step, your recursion creates branches [N][N-1] and [N-1][N].
In the second step, [N-1][N-1], [N][N-2], [N-2][N], [N-1][N-1]
In the third step, you once again create two branches from every previous step - a beautiful example of an exponential explosion.
Now how to go about it: You will quickly notice that some of the branches are being duplicated over and over. Cache the results.
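To see the branching described above concretely, the following sketch counts how many times a plain recursive path counter runs on an all-zero n×n matrix, with and without functools.lru_cache. count_calls is a hypothetical stand-in that recurses on indices rather than slicing matrices like find_num_mem, so it illustrates the branching pattern rather than measuring the original code.

```python
from functools import lru_cache


def count_calls(n, cached):
    """Count how many times the recursive body runs for an all-zero n x n matrix."""
    calls = 0

    def paths(i, j):
        # number of down/right paths from (0, 0) to (i, j) in an all-zero matrix
        nonlocal calls
        calls += 1
        if i == 0 or j == 0:
            return 1  # a single straight path along the first row or column
        return paths(i - 1, j) + paths(i, j - 1)

    if cached:
        # rebinding the name makes the recursive calls go through the cache too
        paths = lru_cache(maxsize=None)(paths)

    paths(n - 1, n - 1)
    return calls


print(count_calls(10, False), count_calls(10, True))
```

For n = 10 this gives 97239 calls without the cache (the count is 2*C(2n-2, n-1) - 1, which grows exponentially) and 99 with it (one call per distinct (i, j) that is reached).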

Efficient Particle-Pair Interactions Calculation

I have an N-body simulation that generates a list of particle positions, for multiple timesteps in the simulation. For a given frame, I want to generate a list of the pairs of particles' indices (i, j) such that dist(p[i], p[j]) < masking_radius. Essentially I'm creating a list of "interaction" pairs, where the pairs are within a certain distance of each other. My current implementation looks something like this:
interaction_pairs = []
# going through each unique pair (order doesn't matter)
for i in range(num_particles):
    for j in range(i + 1, num_particles):
        if dist(p[i], p[j]) < masking_radius:
            interaction_pairs.append((i,j))
Because of the large number of particles, this process takes a long time (>1 hr per test), and it is severely limiting to what I need to do with the data. I was wondering if there was any more efficient way to structure the data such that calculating these pairs would be more efficient instead of comparing every possible combination of particles. I was looking into KDTrees, but I couldn't figure out a way to utilize them to compute this more efficiently. Any help is appreciated, thank you!
Since you are using Python, scikit-learn has multiple implementations for nearest-neighbour search:
http://scikit-learn.org/stable/modules/neighbors.html
It provides both KDTree and BallTree.
With a KDTree, the main idea is to insert all of your particles into the tree and then, for each particle, ask the query: "give me all particles within distance X". A KD-tree usually answers this much faster than a brute-force search.
You can read more for example here: https://www.cs.cmu.edu/~ckingsf/bioinfo-lectures/kdtrees.pdf
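A minimal sketch of the KD-tree route, using SciPy's cKDTree.query_pairs instead of scikit-learn (this assumes SciPy is installed). query_pairs returns every pair (i, j) with i < j whose distance is at most r, which is the interaction list itself, so there is no need to loop over particles and query them one by one. Note that it uses <= where the question uses <; with floating-point positions an exact tie is unlikely, but keep it in mind.

```python
from scipy.spatial import cKDTree


def interaction_pairs(positions, masking_radius):
    """Return sorted (i, j) index pairs, i < j, whose distance is <= masking_radius."""
    tree = cKDTree(positions)  # positions: sequence of N points, each 2D or 3D
    return sorted(tree.query_pairs(r=masking_radius))


print(interaction_pairs([(0.0, 0.0), (0.5, 0.0), (3.0, 0.0)], 1.0))
```

For a simulation you would call it once per frame with that frame's positions; the tree build is O(N log N), and the query cost depends mainly on how many pairs are actually within range.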
If you are working in 2D or 3D space, another option is to cut the space into a coarse grid (with a cell size equal to the masking radius) and assign each particle to one grid cell. You can then find candidate interaction partners by checking only the particle's own cell and the neighbouring cells. You still have to do a distance check, but for far fewer particle pairs.
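The grid idea can be sketched in plain Python. grid_pairs is a hypothetical helper, not a library function: it buckets every particle into a cubic cell whose side equals the radius, then compares each particle only with particles in its own and adjacent cells (3^d cells in d dimensions). It keeps the strict < from the question.

```python
import math
from collections import defaultdict
from itertools import product


def grid_pairs(points, radius):
    """Return sorted (i, j), i < j, with distance(points[i], points[j]) < radius."""
    # Bucket each particle index by the integer coordinates of its cell
    cells = defaultdict(list)
    for idx, p in enumerate(points):
        cells[tuple(math.floor(c / radius) for c in p)].append(idx)

    r2 = radius * radius
    dim = len(points[0]) if points else 0
    offsets = list(product((-1, 0, 1), repeat=dim))  # own cell + neighbours
    pairs = []
    for cell, members in cells.items():
        for off in offsets:
            neighbour = tuple(c + o for c, o in zip(cell, off))
            for i in members:
                for j in cells.get(neighbour, ()):
                    # i < j keeps each unordered pair once; squared distance avoids sqrt
                    if i < j and sum((a - b) ** 2 for a, b in zip(points[i], points[j])) < r2:
                        pairs.append((i, j))
    return sorted(pairs)
```

Two points closer than the radius can differ by at most one cell index along each axis, which is why checking the neighbouring cells is enough.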
Here's a fairly simple technique using plain Python that can reduce the number of comparisons required.
We first sort the points along either the X, Y, or Z axis (selected by axis in the code below). Let's say we choose the X axis. Then we loop over point pairs like your code does, but when we find a pair whose distance is greater than the masking_radius, we test whether the difference in their X coordinates is also greater than the masking_radius. If it is, then we can bail out of the inner j loop, because all points with a greater j have an X coordinate at least as large.
My dist2 function calculates the squared distance. This is faster than calculating the actual distance because computing the square root is relatively slow.
I've also included code that behaves similarly to your code, i.e., it tests every pair of points, for speed comparison purposes; it also serves to check that the fast code is correct. ;)
from random import seed, uniform
from operator import itemgetter

seed(42)

# Make some fake data
def make_point(hi=10.0):
    return [uniform(-hi, hi) for _ in range(3)]

psize = 1000
points = [make_point() for _ in range(psize)]
masking_radius = 4.0
masking_radius2 = masking_radius ** 2

def dist2(p, q):
    return (p[0] - q[0])**2 + (p[1] - q[1])**2 + (p[2] - q[2])**2

pair_count = 0
test_count = 0
do_fast = 1
if do_fast:
    # Sort the points on one axis (i and j below index the sorted list)
    axis = 0
    points.sort(key=itemgetter(axis))
    # Fast
    for i, p in enumerate(points):
        for j in range(i + 1, psize):
            test_count += 1
            q = points[j]
            if dist2(p, q) < masking_radius2:
                #interaction_pairs.append((i, j))
                pair_count += 1
            elif q[axis] - p[axis] >= masking_radius:
                break
        if i % 100 == 0:
            print('\r {:3} '.format(i), flush=True, end='')
    total_pairs = psize * (psize - 1) // 2
    print('\r {} / {} tests'.format(test_count, total_pairs))
else:
    # Slow
    for i, p in enumerate(points):
        for j in range(i+1, psize):
            q = points[j]
            if dist2(p, q) < masking_radius2:
                #interaction_pairs.append((i, j))
                pair_count += 1
        if i % 100 == 0:
            print('\r {:3} '.format(i), flush=True, end='')
print('\n', pair_count, 'pairs')
output with do_fast = 1
181937 / 499500 tests
13295 pairs
output with do_fast = 0
13295 pairs
Of course, if most of the point pairs are within masking_radius of each other, there won't be much benefit in using this technique. And sorting the points adds a little bit of time, but Python's TimSort is rather efficient, especially if the data is already partially sorted, so if the masking_radius is sufficiently small you should see a noticeable improvement in the speed.

Find maximum sum of sublist in list of positive integers under O(n^2) of specified length Python 3.5

For one of my programming questions, I am required to define a function that accepts two arguments: a list of length l and an integer w. I then have to find the maximum sum of a sublist of length w within the list.
Conditions:
1<=w<=l<=100000
Each element in the list ranges from [1, 100]
Currently, my solution runs in O(n^2) (correct me if I'm wrong, code attached below), which the autograder does not accept, since we are required to find a more efficient solution.
My code:
def find_best_location(w, lst):
    best = 0
    n = 0
    while n <= len(lst) - w:
        lists = lst[n: n + w]
        cur = sum(lists)
        best = cur if cur>best else best
        n+=1
    return best
If anyone is able to find a more efficient solution, please do let me know! Also if I computed my big-O notation wrongly do let me know as well!
Thanks in advance!
1) Compute current, the sum of the first w elements, and assign it to best.
2) For each i from w to len(lst) - 1: current = current + lst[i] - lst[i-w], best = max(best, current).
3) Done.
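The three steps above translate almost line for line into Python. A minimal sketch, keeping the question's function name and argument order; each step does one addition and one subtraction instead of re-summing the window, so it runs in O(n).

```python
def find_best_location(w, lst):
    """Maximum sum of any w consecutive elements of lst, in O(len(lst))."""
    current = sum(lst[:w])  # step 1: sum of the first window
    best = current
    for i in range(w, len(lst)):
        # step 2: slide the window one position to the right
        current += lst[i] - lst[i - w]
        if current > best:
            best = current
    return best  # step 3


print(find_best_location(2, [1, 3, 2, 5, 1]))
```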
Your solution is indeed O(n^2) (or O(n*W) if you want a tighter bound)
You can do it in O(n) by creating an auxiliary array sums, where:
sums[0] = l[0]
sums[i] = sums[i-1] + l[i]
Then, by iterating over it and checking sums[i] - sums[i-W] for i >= W (plus sums[W-1] for the first window), you can find your solution in linear time.
You can even compute the sums array on the fly to reduce space complexity, but if I were you, I'd start with the array and see whether I can improve the solution afterwards.
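A sketch of the prefix-sum variant, with one small change of convention (an assumption, not the answer's exact indexing): padding with a leading 0 so that sums[i] is the sum of the first i elements, which makes every window, including the first, equal to sums[i] - sums[i - w]. The function name is hypothetical; itertools.accumulate exists since Python 3.2, so this also runs on 3.5.

```python
from itertools import accumulate


def find_best_location_prefix(w, lst):
    """Maximum sum of any w consecutive elements, via prefix sums."""
    sums = [0] + list(accumulate(lst))  # sums[i] == sum(lst[:i])
    # window lst[i-w:i] has sum sums[i] - sums[i-w], for i = w .. len(lst)
    return max(sums[i] - sums[i - w] for i in range(w, len(lst) + 1))
```

The leading 0 avoids the special case for the first window at the cost of one extra list element.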

Using Sin-1 or inverse sin in python

Here is my code:
import math
import pygame

# point of intersection between opposite and hypotenuse
x,y = pygame.mouse.get_pos()
# using formula for length of line
lenline1 = (x-x)**2 + (300-y)**2
lenline2 = (x-700)**2 + (y-300)**2
opposite = math.sqrt(lenline1)
adjacent = math.sqrt(lenline2)
# Converting length of lines to angle
PQ = opposite/adjacent
k = math.sin(PQ)
j = math.asin(k)
print(j)
I'm not getting the results I expected; after messing around with it I got close, but it wasn't quite right. Could someone please tell me what I'm doing wrong? I have two lines:
opposite and adjacent
and I wish to get the angle using the inverse of sin. What am I doing wrong? I'm only a beginner, so please don't give too detailed info. I can't imagine this is hard to do.
Thanks.
To find the angle between two lines, use the following relation:
cos(angle) = (l1 dot l2) / (|l1| |l2|)
That is,
dotproduct = l1x * l2x + l1y * l2y
lenproduct = |l1| * |l2|
angle = acos(dotproduct / lenproduct)
where l1x, l1y are the x,y components of the line l1.
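The relation above as a small Python function: a sketch where angle_between is a hypothetical name and each line is given as an (x, y) direction vector. The cosine is clamped to [-1, 1] because floating-point rounding can push it slightly outside the domain of acos.

```python
import math


def angle_between(l1, l2):
    """Angle in radians between direction vectors l1 = (l1x, l1y) and l2 = (l2x, l2y)."""
    dotproduct = l1[0] * l2[0] + l1[1] * l2[1]
    lenproduct = math.hypot(l1[0], l1[1]) * math.hypot(l2[0], l2[1])
    # clamp: rounding error can give values like 1.0000000000000002
    cos_angle = max(-1.0, min(1.0, dotproduct / lenproduct))
    return math.acos(cos_angle)


print(math.degrees(angle_between((1, 0), (1, 1))))
```

The result is always between 0 and pi; it does not tell you which way one line turns relative to the other.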
Don't bother with the k computation; it's meaningless. Take the arcsine of the ratio directly:
j = math.asin(PQ)
However, this only works for right-angled triangles, and you have to put the appropriate side lengths in the right places (the sine of an angle is opposite / hypotenuse). In general this will not work, and you need to use the dot product method.
Looks like you're trying to find the angle of the triangle (700,300), (x,300), (x,y). You're making it much more complicated than it needs to be. The length of the hypotenuse is math.hypot((700-x),(300-y)), and the angle at (700,300) is math.atan2((300-y), (700-x)); note that atan2 takes the opposite side (the y difference) first.
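A sketch of the right-triangle approach for (700,300), (x,300), (x,y), with the right angle at (x,300). It computes the angle at (700,300) both with asin(opposite / hypotenuse) and with atan2(opposite, adjacent), so you can check that they agree. The function name angle_at_anchor is hypothetical; abs() is used so the result doesn't depend on which side of the anchor the mouse is (pygame's y axis points down), and the asin form raises ZeroDivisionError if the mouse is exactly on (700, 300).

```python
import math


def angle_at_anchor(x, y, ax=700, ay=300):
    """Angle in radians at (ax, ay) in the right triangle (ax, ay), (x, ay), (x, y)."""
    opposite = abs(ay - y)                   # vertical side, from (x, ay) to (x, y)
    adjacent = abs(ax - x)                   # horizontal side, from (ax, ay) to (x, ay)
    hypotenuse = math.hypot(ax - x, ay - y)  # side from (ax, ay) to (x, y)
    via_asin = math.asin(opposite / hypotenuse)
    via_atan2 = math.atan2(opposite, adjacent)
    return via_asin, via_atan2


a, b = angle_at_anchor(400, 0)
print(math.degrees(a), math.degrees(b))
```

atan2 is usually the safer choice: it never divides by zero and, if you drop the abs() calls, it also tells you which quadrant the mouse is in.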
