How do recursive calls work (Sierpinski triangle)? - python

I came across a program that draws the Sierpinski Triangle with recursion.
The way I interpret this code, sierpinski1 is called repeatedly until n == 0, and then only 3 small triangles (one triangle per call) would be drawn, because n == 0 is the only case in which something is drawn (panel.canvas.create_polygon). However, this is not how the code works: when run, the number of triangles drawn depends on n, not just the 3 small triangles I expected.
Can someone explain to me how so many things can be drawn when the function sierpinski1 only has 1 condition for when something can be drawn? That is the one part of the program that I can't understand. I looked up everything I could on recursion, but nothing I found explains why this format of recursion works.
def sierpinski(n):
    x1 = 250
    y1 = 120
    x2 = 400
    y2 = 380
    x3 = 100
    y3 = 380
    panel = DrawingPanel(500,500)
    sierpinski1(n,x1,y1,x2,y2,x3,y3,panel)

def sierpinski1(n,x1,y1,x2,y2,x3,y3,panel):
    if n == 0:
        panel.canvas.create_polygon(x1,y1,x2,y2,x3,y3, fill = 'yellow', outline = 'black')
    else:
        sierpinski1(n-1,x1,y1,(x1+x2)/2,(y1+y2)/2,(x1+x3)/2,(y1+y3)/2, panel)
        sierpinski1(n-1,(x1+x3)/2,(y1+y3)/2,(x2+x3)/2,(y2+y3)/2,x3,y3,panel)
        sierpinski1(n-1,(x1+x2)/2,(y1+y2)/2,x2,y2,(x2+x3)/2,(y2+y3)/2,panel)

This is the principle of how recursion works: there is a base case and there is a recursive case. Since recursion makes use of a LIFO structure (such as a call stack), we have to know when to stop adding calls to the stack.
The base case:
Occurs when n == 0
Performs the actual drawing action
Means that there are no more triangles to be generated, so it's okay to start drawing them.
The recursive case:
Occurs when n > 0 (and, strictly speaking, when n < 0, although a negative n would recurse forever)
Makes three distinct calls to itself, each with different values for the coordinates x1, y1, x2, y2, x3, y3.
Means that there are still more triangles to be generated.
Think of it like this. The number of triangles to be drawn is given by the formula T(n) = 3^n.
This holds for simple cases: if n = 1, then only three triangles are drawn. If n = 2, then 9 are drawn, and so forth.
Why will it work? The call stack plays a big role in this.
For brevity, here's a trace of n = 1:
sierpinski1(n,x1,y1,x2,y2,x3,y3,panel)
  condition n == 0 FAILS
  sierpinski1(n-1,x1,y1,(x1+x2)/2,(y1+y2)/2,(x1+x3)/2,(y1+y3)/2, panel)
    condition n == 0 PASSES
    panel.canvas.create_polygon(x1,y1,x2,y2,x3,y3, fill = 'yellow', outline = 'black')
  sierpinski1(n-1,(x1+x3)/2,(y1+y3)/2,(x2+x3)/2,(y2+y3)/2,x3,y3,panel)
    condition n == 0 PASSES
    panel.canvas.create_polygon(x1,y1,x2,y2,x3,y3, fill = 'yellow', outline = 'black')
  sierpinski1(n-1,(x1+x2)/2,(y1+y2)/2,x2,y2,(x2+x3)/2,(y2+y3)/2,panel)
    condition n == 0 PASSES
    panel.canvas.create_polygon(x1,y1,x2,y2,x3,y3, fill = 'yellow', outline = 'black')
So, for n = 1, exactly three triangles are drawn. For higher values of n, things get trickier to trace at a high pseudocode level, but the same principle applies.

Things are only drawn when n = 0, but if it is called with n = 1, then three separate calls are made to it with n = 0. Similarly, if it is called with n = 2, then three calls are made to it with n = 1, each of which makes three calls to it with n = 0, for a total of nine drawings. In general, as the number of calls is multiplied by three each layer, there are 3^n small triangles drawn when it is called with n.

Can someone explain to me how many things can be drawn when the function sierpinski1 only has 1 condition for when something can be drawn?

Because the function makes three recursive calls at each non-zero step. That means that for every n greater than 0, the function branches into three distinct paths whose value of n is smaller by 1. You will end up reaching n == 0 a total of 3^n times.
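You can watch this multiplication without any drawing at all. Here is a minimal sketch (the function name count_draws is mine) that mirrors the branching of sierpinski1 and just counts how often the base case fires:

from __future__ import print_function

def count_draws(n):
    # Base case: the only place where a triangle would be drawn.
    if n == 0:
        return 1
    # Recursive case: three branches, each one level closer to the base case.
    return 3 * count_draws(n - 1)

for n in range(5):
    print(n, count_draws(n))  # prints 1, 3, 9, 27, 81 for n = 0..4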

Related

How to get 'n' random points between start and end coordinates in python?

I have a starting coordinate (x1,y1) and an ending coordinate (x2, y2). I want to generate 'n' random points between start and end coordinates without any duplicates. How to do this with python?
I know that a simple way would be to generate 'n' x values and 'n' y values. That gives n*n pairs, from which I choose 'n' with no duplicates. This way I may not get a uniform distribution of random points. Is there any other way to do this?
Edit: I require floating point coordinates in the rectangle formed by the start and end coordinates as opposite corners.
TL;DR:
from random import uniform

def gen_coords(x1, y1, x2, y2, n):
    result = set()
    # loops for each addition, avoiding duplicates
    while len(result) < n:
        result.add((uniform(x1, x2), uniform(y1, y2)))
    return result
Arguably, practically:
from random import uniform

def gen_coords(x1, y1, x2, y2, n):
    return [(uniform(x1, x2), uniform(y1, y2)) for _ in range(n)]
Considering that the odds of collisions are tiny.
Assuming that "between start and end coordinates" means in a rectangular section between these two corners in a Cartesian coordinate system (i.e. flat, 2D).
And assuming that a "uniform distribution" is achieved well enough if we ignore the non-uniform distribution of floating point values (i.e. there is neither the exact same number of floating point values in every interval of equal length, nor a constant distance between adjacent floating point values in a continuum).
There's basically three ways of ensuring the randomly generated points are not duplicated:
pick them from a collection of possible values, removing each pick to avoid picking it again;
generate values within the allowed space, checking each pick against previous picks to avoid adding duplicates (and re-picking values until a new one is generated);
generate values and add to the set until the desired set size, removing duplicates after generation if any and repeating the process until done.
The first option can be a good choice if the space from which values are picked is of similar size to the target set size. However, when picking points with random floating point coordinates in some space, this is unlikely.
The second choice is the most straightforward, but can be expensive to compute if the target set size is large, as every new pick causes more comparisons.
The third choice is a bit more involved, but it avoids comparisons until a candidate target set has been completed, and it is certainly the best choice if the odds of collisions are small.
As a variant of the second choice, you could pick a target data structure that simply avoids the addition of duplicates altogether, relying on the language / interpreter to perform the checking more efficiently than any algorithm written in the language would be able to.
In Python, this means using a set instead of a list, which is the fastest way to achieve the result and would likely be the way you'd check for duplicates in the third option anyway - so you may as well use it right away and go with the variant of the second option.
Note that both the 2nd and 3rd options have a major flaw in case you're trying to create a set that's larger than the number of distinct values the selection function can produce: they will loop forever. But for the given problem that's unlikely, except for extremely large 'n'.
A solution (pitting the second option against the third):
from random import uniform
from timeit import timeit

def pick_coords_restricted(x1, y1, x2, y2, n):
    result = set()
    # loops for each addition, avoiding duplicates
    while len(result) < n:
        result.add((uniform(x1, x2), uniform(y1, y2)))
    return result

def pick_coords_checked(x1, y1, x2, y2, n):
    result = []
    # loops once per attempt, checking after each iteration
    while len(set(result)) < n:
        if len(result) > 0:
            result = list(set(result))
            result += [(uniform(x1, x2), uniform(y1, y2)) for _ in range(n - len(result))]
        else:
            result = [(uniform(x1, x2), uniform(y1, y2)) for _ in range(n)]
    return result

print(timeit(lambda: pick_coords_restricted(0, 0, 1, 1, 1000), number=10000))
print(timeit(lambda: pick_coords_checked(0, 0, 1, 1, 1000), number=10000))
Result (on my hardware):
4.3799341
3.9363368000000003
I get consistently, if marginally, better results for the pick_coords_checked function - but I would favour the clarity of the first implementation.

My code suddenly stops writing beyond 15500 iterations

I'm studying how to code in Python and I'm trying to recreate a program I wrote in college.
The code is based on a 2D Ising model applied to epidemiology. What it does is:
it constructs a 2D 100x100 array using numpy, and assigns a value of -1 to each element.
The energy is calculated based on the function calc_h in the script below.
Then, the code randomly selects a cell from the lattice, changes the value to 1, then calculates the energy of the system again.
Then, the code checks whether the energy of the system is less than or equal to that of the previous configuration. If it is, it "accepts" the change. If it isn't, a probability is compared to a random number to determine whether the change is "accepted". This part is done in the metropolis function.
The code repeats the process using a while loop until the maximum specified iteration, max_iterations.
The code tallies the number of elements with a -1 value (the s variable) and the number of elements with a 1 value (the i variable) in the countSI function. The script appends to a text file every 500 iterations.
THE PROBLEM
I ran the script and, besides taking too long to execute, the tallying stops at 15500. The code doesn't throw any error; it just keeps going. I waited around 3 hours for it to finish, but it still only gets up to 15500 iterations.
I've tried commenting out the writing-to-csv block and instead printing the values to observe them as they happen, and there I see it stops at 15500 again.
I have no idea what's wrong, as it doesn't throw any error and the code doesn't stop.
Here's the whole script. I put a description of what the part does below each block:
import numpy as np
import random as r
import math as m
import csv
init_size = input("Input array size: ")
size = int(init_size)
This part initializes the size of the 2D array. For observation purposes, I selected a 100 by 100 lattice.
def check_overflow(indx, size):
    if indx == size - 1:
        return -indx
    else:
        return 1
I use this function for the calc_h function, to initialize a circular boundary condition. Simply put, the edges of the lattice are connected to one another.
def calc_h(pop, J1, size):
    h_sum = 0
    r = 0
    c = 0
    while r < size:
        buffr = check_overflow(r, size)
        while c < size:
            buffc = check_overflow(c, size)
            h_sum = h_sum + J1*pop[r,c] * pop[r,c-1] * pop[r-1,c] * pop[r+buffr,c] * pop[r,c+buffc]
            c = c + 1
        c = 0
        r = r + 1
    return h_sum
This function calculates the energy of the system by summing, over all cells, the product of a cell's value and the values of its top, bottom, left and right neighbours, multiplied by a constant J.
def metropolis(h, h0, T_):
    if h <= h0:
        return 1
    else:
        rand_h = r.random()
        p = m.exp(-(h - h0)/T_)
        if rand_h <= p:
            return 1
        else:
            return 0
This determines whether the change from -1 to 1 is accepted depending on what calc_h gets.
def countSI(pop, sz, iter):
    s = np.count_nonzero(pop == -1)
    i = np.count_nonzero(pop == 1)
    row = [iter, s, i]
    with open('data.txt', 'a') as si_csv:
        tally_data = csv.writer(si_csv)
        tally_data.writerow(row)
        si_csv.seek(0)
This part tallies the number of -1's and 1's in the lattice.
def main():
    J = 1
    T = 4.0
    max_iterations = 150000
    population = np.full((size, size), -1, np.int8)  # initialize population array
The 2D array is initialized in population.
    h_0 = calc_h(population, J, size)
    turn = 1
    while turn <= max_iterations:
        inf_x = r.randint(1, size) - 1
        inf_y = r.randint(1, size) - 1
        while population[inf_x, inf_y] == 1:
            inf_x = r.randint(1, size) - 1
            inf_y = r.randint(1, size) - 1
        population[inf_x, inf_y] = 1
        h = calc_h(population, J, size)
        accept_i = metropolis(h, h_0, T)
This is the main loop, where a random cell is selected, and whether the change is accepted or not is determined by the function metropolis.
        if (accept_i == 0):
            population[inf_x, inf_y] = -1
        if turn % 500 == 0:
            countSI(population, size, turn)
The script tallies every 500th iteration.
        turn = turn + 1
        h_0 = h
main()
The expected output is a text file with the tallies of s and i at every 500th iteration, something that looks like this:
500,9736,264
1000,9472,528
1500,9197,803
2000,8913,1087
2500,8611,1389
3000,8292,1708
3500,7968,2032
4000,7643,2357
4500,7312,2688
5000,6960,3040
5500,6613,3387
6000,6257,3743
6500,5913,4087
7000,5570,4430
7500,5212,4788
I have no idea where to start looking for a solution. At first, I thought it was the writing to csv that was causing the problem, but probing with the print function proved otherwise. I've tried to make this as concise as I can.
I hope you guys can help! I really wanna learn this language and start simulating a lot of stuff, and I think this mini project is a great starting step for me.
Thanks a lot!
Answer provided by #randomir in the comments:
Your code is probably wrong. It will block in that nested while loop whenever the number of spins left to flip is smaller than the number of iterations. In your example from the previous comment, the size of the population is 10000 and you want to flip 15500 spins. Note that once a spin is flipped up (with 100% prob), it will be flipped down with smaller prob, due to metropolis sampling.
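Building on that diagnosis, one minimal way to avoid the hang (a sketch of my own, an assumption about the intent rather than part of the original answer) is to replace the inner while loop in main() with a direct draw from the cells that are still -1, stopping when none remain:

# Sketch: replaces the rejection-sampling inner while loop inside main()'s turn loop.
candidates = np.argwhere(population == -1)   # all cells still at -1
if len(candidates) == 0:
    break                                    # lattice is saturated with 1's; stop the main loop
inf_x, inf_y = candidates[r.randint(0, len(candidates) - 1)]
population[inf_x, inf_y] = 1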

Minimum number of iterations in matrix where cell value replaced by maximum of neighbour cell value in single iteration

I have a matrix with a value in each cell (minimum value 1), where the maximum value is 'max'.
In one iteration, each cell's value is replaced by the highest value among its 8 neighbouring cells, and this happens for the whole matrix simultaneously. I want to find the minimum number of iterations after which the value of all cells will be max.
One brute force method of doing this is by padding the matrix with zeros, and

for i in range(1, x_max+1):
    for j in range(1, y_max+1):
        maximum = 0
        for k in range(-1, 2):
            for l in range(-1, 2):
                if matrix[i+k][j+l] > maximum:
                    maximum = matrix[i+k][j+l]
        # note: writing back in place means later cells see already-updated values;
        # a separate output matrix would be needed for a truly simultaneous update
        matrix[i][j] = maximum
But is there an intelligent and faster way of doing this?
Thanks in advance.
I think this can be solved by BFS (breadth-first search).
Start the BFS simultaneously from all the matrix cells with the 'max' value.
dis[][] = infinite   // min. distance of a cell from the nearest cell with 'max' value, initially infinite for all
Q                    // queue
M[][]                // matrix

for all i,j:         // traverse the matrix, enqueue all cells with 'max'
    if M[i][j] == 'max':
        dis[i][j] = 0, Q.push( cell(i,j) )

while !Q.empty:
    cell Current = Q.front
    for all neighbours Cell(p,q) of Current:
        if dis[p][q] == infinite:
            dis[p][q] = dis[Current.row][Current.column] + 1
            Q.push( cell(p,q) )
    Q.pop()
The maximum of dis[i][j] over all i,j is the number of iterations needed.
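A runnable Python version of this multi-source BFS (a sketch; the function name and the assumption that 'max' is simply the largest value in the matrix are mine):

from collections import deque

def min_iterations(matrix):
    h, w = len(matrix), len(matrix[0])
    target = max(max(row) for row in matrix)
    dist = [[None] * w for _ in range(h)]
    q = deque()
    # Enqueue every cell that already holds the maximum value, at distance 0.
    for i in range(h):
        for j in range(w):
            if matrix[i][j] == target:
                dist[i][j] = 0
                q.append((i, j))
    # Expand outwards; each BFS layer corresponds to one iteration of the process.
    while q:
        i, j = q.popleft()
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                p, s = i + di, j + dj
                if 0 <= p < h and 0 <= s < w and dist[p][s] is None:
                    dist[p][s] = dist[i][j] + 1
                    q.append((p, s))
    return max(max(row) for row in dist)

print(min_iterations([[1, 1, 1], [1, 9, 1], [1, 1, 1]]))  # 1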
Use an array with a "border".
Testing the edge conditions is tedious and can be avoided by making the array 1-bigger around the edge, each element with the value of INT_MIN.
Additionally, consider 8 tests, rather than a double nested loop
// Data is in matrix[1...N][1...M], yet it is sized matrix[N+2][M+2]
for (i = 1; i <= N; i++) {
    for (j = 1; j <= M; j++) {
        maximum = matrix[i-1][j-1];
        if (matrix[i-1][j+0] > maximum) maximum = matrix[i-1][j+0];
        if (matrix[i-1][j+1] > maximum) maximum = matrix[i-1][j+1];
        if (matrix[i+0][j-1] > maximum) maximum = matrix[i+0][j-1];
        if (matrix[i+0][j+0] > maximum) maximum = matrix[i+0][j+0];
        if (matrix[i+0][j+1] > maximum) maximum = matrix[i+0][j+1];
        if (matrix[i+1][j-1] > maximum) maximum = matrix[i+1][j-1];
        if (matrix[i+1][j+0] > maximum) maximum = matrix[i+1][j+0];
        if (matrix[i+1][j+1] > maximum) maximum = matrix[i+1][j+1];
        newmatrix[i][j] = maximum;
    }
}
All existing answers require examining every cell in the matrix. If you don't already know what the locations of the maximum value are, this is unavoidable, and in that case, Amit Kumar's BFS algorithm has optimal time complexity: O(wh), if the matrix has width w and height h.
OTOH, perhaps you already know the locations of the k maximum values, and k is relatively small. In that case, the following algorithm will find the answer in just O(k^2*(log(k)+log(max(w, h)))) time, which is much faster when either w or h is large. It doesn't actually look at any matrix entries; instead, it runs a binary search to look for candidate stopping times (that is, answers). For each candidate stopping time it builds the set of rectangles that would be occupied by max by that time, and checks whether any matrix cell remains uncovered by a rectangle.
To explain the idea, we first need some terms. Call the top row of a rectangle a "starting vertical event", and the row below its bottom edge an "ending vertical event". A "basic interval" is the interval of rows spanned by any pair of vertical events that does not have a third vertical event anywhere between them (the event pairs defining these intervals can be from the same or different rectangles). Notice that with k rectangles, there can never be more than 2k+1 basic intervals -- there is no dependence here on h.
The basic idea is to walk left-to-right through the columns of the matrix that correspond to horizontal events: columns in which either a new rectangle "starts" (the left vertical edge of a rectangle), or an existing rectangle "finishes" (the column to the right of the right vertical edge of a rectangle), keeping track of how many rectangles are currently covering every basic interval. If we ever detect a basic interval covered by 0 rectangles, we can stop: we have found a column containing one or more cells that are not yet covered at time t. If we get to the right edge of the matrix without this happening, then all cells are covered at time t.
Here is pseudocode for a function that checks whether any matrix cell remains uncovered by time t, given a length-k array peak, where (peak[i].x, peak[i].y) is the location of the i-th max-containing cell in the original matrix, in increasing order of x co-ordinate (so the leftmost max-containing cell is at (peak[1].x, peak[1].y)).
Function IsMatrixCovered(t, peak[]) {
    # Discover all vertical events and basic intervals
    Let vertEvents[] be an empty array of integers.
    For i from 1 to k:
        top = max(1, peak[i].y - t)
        bot = min(h, peak[i].y + t)
        Append top to vertEvents[]
        Append bot+1 to vertEvents[]
    Sort vertEvents in increasing order, and remove duplicates.

    x = 1
    Let horizEvents[] be an empty array of { col, type, top, bot } structures.
    For i from 1 to k:
        # Calculate the (clipped) rectangle that peak[i] will cover at time t:
        lft = max(1, peak[i].x - t)
        rgt = min(w, peak[i].x + t)
        top = max(1, peak[i].y - t)
        bot = min(h, peak[i].y + t)
        # Convert vertical positions to vertical event indices
        top = LookupIndexUsingBinarySearch(top, vertEvents[])
        bot = LookupIndexUsingBinarySearch(bot+1, vertEvents[])
        # Record horizontal events
        Append (lft, START, top, bot) to horizEvents[]
        Append (rgt+1, STOP, top, bot) to horizEvents[]
    Sort horizEvents in increasing order by its first 2 fields, with START considered < STOP.

    # Walk through all horizontal events, from left to right.
    Let basicIntervals[] be an array of size(vertEvents[]) integers, initially all 0.
    nOccupiedBasicIntervalsFirstCol = 0
    For i from 1 to size(horizEvents[]):
        If horizEvents[i].type = START:
            d = 1
        Else (if it is STOP):
            d = -1
        If horizEvents[i].col <= w:
            For j from horizEvents[i].top to horizEvents[i].bot:
                If horizEvents[i].col = 1 and basicIntervals[j] = 0:
                    ++nOccupiedBasicIntervalsFirstCol   # Must be START
                basicIntervals[j] += d
                If basicIntervals[j] = 0:
                    return FALSE
    If nOccupiedBasicIntervalsFirstCol < size(basicIntervals):
        return FALSE   # Could have checked earlier, but the code is simpler this way
    return TRUE
}
The above function can simply be called inside a binary search on t, that looks for the smallest value of t for which the function returns TRUE.
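For instance, the outer search might look like this (a sketch; is_matrix_covered stands in for a real implementation of IsMatrixCovered above, and w + h is a safe upper bound on the answer):

def min_stopping_time(peaks, w, h):
    # Coverage is monotone in t, so binary search finds the smallest t
    # for which is_matrix_covered(t, peaks) returns True.
    lo, hi = 0, w + h
    while lo < hi:
        mid = (lo + hi) // 2
        if is_matrix_covered(mid, peaks):
            hi = mid
        else:
            lo = mid + 1
    return lo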
A further factor of k/log(k) could be removed by exploiting the fact that the set of basic intervals affected by any rectangle starting or ending is always an interval, through the use of Fenwick trees.

python codefights Count The Black Boxes, modeling the diagonal of a rectangle

Given the dimensions of a rectangle, (m,n), made up of unit squares, output the number of unit squares the diagonal of the rectangle touches - that includes borders and vertices.
My algorithm approaches this by cycling through all the unit squares (under the assumption that we can draw our diagonal from (0,0) to (m,n)).
My algorithm solves 9 of 10 tests, but is too inefficient to pass the tenth test in the given time.
I'm open to all efficiency suggestions, but in the name of asking a specific question... I seem to have a disconnect in my own logic concerning adding a break statement to cut some steps out of the process. My thinking is that this shouldn't matter, but it does affect the result, and I haven't been able to figure out why.
So, can someone help me understand how to insert a break that doesn't affect the output?
Or how to eliminate a loop? I'm currently using nested loops.
So, yeah, I think my problems are algorithmic rather than syntactic.
def countBlackCells(m, n):
    counter = 0
    y = [0, 0]
    testV = 0
    for i in xrange(n):  # loop over m/x first
        y[0] = float(m)/n * i
        y[1] = float(m)/n * (i+1)
        #print(str(y))
        for j in xrange(m):  # loop over every n/y for each x
            if (y[0] <= (j+1) and y[0] >= j) or (y[1] >= j and y[1] <= j+1):  # is min of line inside the box? is max of line?
                counter += 1
                #testV += 1
            else: pass  # break  # thinking that once you are beyond the line in either direction, you're not coming back to it by ranging up m anymore. THAT DOESN'T SEEM TO BE THE CASE
            # tried adding a flag (testV), so that the inner loop would only break if the line was found and then lost again; still didn't count ALL boxes. There's something I'm not understanding here.
    return counter
Some sample, input/output
Input:
n: 3
m: 4
Output:
6
Input:
n: 3
m: 3
Output:
7
Input:
n: 33
m: 44
Output:
86
Find G - the greatest common divisor of m and n.
If G > 1, then the diagonal intersects G-1 inner vertices, touching (not intersecting) 2*(G-1) cells.
And between these inner vertices there are G sub-rectangles with mutually prime sides M x N (M = m/G, N = n/G).
Now consider the case of mutually prime M and N. The diagonal of such a rectangle does not intersect any vertex except the starting and ending ones. But it must intersect M vertical lines and N horizontal lines, and at every intersection the diagonal enters a new cell, so it intersects M + N - 1 cells (subtract 1 to account for the first corner, where a vertical and a horizontal line are met together).
So use these clues to deduce the final solution.
I used math.gcd() to solve the problem in Python.

import math

def countBlackCells(n, m):
    return m + n + math.gcd(m, n) - 2
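Checking the formula against the sample inputs from the question:

print(countBlackCells(3, 4))    # 6
print(countBlackCells(3, 3))    # 7
print(countBlackCells(33, 44))  # 86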

Efficient way to find overlapping of N rectangles

I am trying to find an efficient solution for detecting overlaps among n rectangles, where the rectangles are stored in two separate lists. We are looking for all rectangles in listA that overlap with rectangles in listB (and vice versa). Comparing each element of the first list against the whole second list takes an immense amount of time. I am looking for an efficient solution.
I have two list of rectangles
rect = Rectangle(10, 12, 56, 15)
rect2 = Rectangle(0, 0,1, 15)
rect3 = Rectangle (10, 12, 56, 15)
listA = [rect, rect2]
listB = [rect3]
which is created from the class:
import numpy as np
import itertools as it

class Rectangle(object):
    def __init__(self, left, right, bottom, top):
        self.left = left
        self.right = right
        self.bottom = bottom
        self.top = top

    def overlap(r1, r2):
        hoverlaps = True
        voverlaps = True
        if (r1.left > r2.right) or (r1.right < r2.left):
            hoverlaps = False
        if (r1.top < r2.bottom) or (r1.bottom > r2.top):
            voverlaps = False
        return hoverlaps and voverlaps
I need to compare each rectangle in listA to each rectangle in listB; my code does this one by one, which is highly inefficient:

for a in listB:
    for b in listA:
        if a.overlap(b):
            ...  # overlapping pair found

Is there a more efficient way to deal with the problem?
First off: As with many a problem from computational geometry, specifying the parameters for order-of-growth analysis needs care: calling the lengths of the lists m and n, the worst case in just those parameters is Ω(m×n), as all areas might overlap (in this regard, the algorithm from the question is asymptotically optimal). It is usual to include the size of the output: t = f(m, n, o) (Output-sensitive algorithm).
Trivially, f ∈ Ω(m+n+o) for the problem presented.
Line Sweep is a paradigm to reduce geometrical problems by one dimension - in its original form, from 2D to 1D, plane to line.
Imagine all the rectangles in the plane, with different colours for the lists.
Now sweep a line across this plane - left to right, conventionally, and infinitesimally further to the right "for low y-coordinates" (handle coordinates in increasing x-order, increasing y-order for equal x).
For all of this sweep (or scan), keep, per colour, one set representing the "y-intervals" of all rectangles at the current x-coordinate, starting empty. (In a data structure supporting insertion, deletion, and enumerating all intervals that overlap a query interval: see below.)
On meeting the left side of a rectangle, add its segment to the data structure for its colour. Report overlapping intervals/rectangles in any other colour.
At a right side, remove the segment.
Depending on the definition of "overlapping", handle left sides before right sides - or the other way round.
There are many data structures supporting insertion and deletion of intervals, and finding all intervals that overlap a query interval. Currently, I think Augmented Search-Trees may be easiest to understand, implement, test, analyse…
Using this, enumerating all o intersecting pairs of axis-aligned rectangles (a, b) from listA and listB should be possible in O((m+n)log(m+n)+o) time and O(m+n) space. For sizeable problem instances, avoid data structures needing more than linear space ((original) Segment Trees, for one example pertaining to interval overlap).
Another paradigm in algorithm design is Divide&Conquer: with a computational geometry problem, choose one dimension in which the problem can be divided into independent parts, and a coordinate such that the sub-problems for "coordinates below" and "coordinates above" are close in expected run-time. Quite possibly, another (and different) sub-problem "including the coordinate" needs to be solved. This tends to be beneficial when a) the run-time for solving sub-problems is "super-log-linear", and b) there is a cheap (linear) way to construct the overall solution from the solutions for the sub-problems.
This lends itself to concurrent problem solving, and can be used with any other approach for sub-problems, including line sweep.
There will be many ways to tweak each approach, starting with disregarding input items that can't possibly contribute to the output. To "fairly" compare implementations of algorithms of like order of growth, don't aim for a fair "level of tweakedness": try to invest fair amounts of time for tweaking.
A couple of potential minor efficiency improvements. First, fix your overlap() function; it potentially does calculations it needn't:

def overlap(r1, r2):
    if r1.left > r2.right or r1.right < r2.left:
        return False
    if r1.top < r2.bottom or r1.bottom > r2.top:
        return False
    return True
Second, calculate the containing rectangle for one of the lists and use it to screen the other list -- any rectangle that doesn't overlap the container doesn't need to be tested against all the rectangles that contributed to it:

def containing_rectangle(rectangles):
    return Rectangle(min(rectangles, key=lambda r: r.left).left,
                     max(rectangles, key=lambda r: r.right).right,
                     min(rectangles, key=lambda r: r.bottom).bottom,
                     max(rectangles, key=lambda r: r.top).top
                     )

c = containing_rectangle(listA)
for b in listB:
    if b.overlap(c):
        for a in listA:
            if b.overlap(a):
                ...  # handle the overlapping pair
In my testing with hundreds of random rectangles, this avoided comparisons on the order of single digit percentages (e.g. 2% or 3%) and occasionally increased the number of comparisons. However, presumably your data isn't random and might fare better with this type of screening.
Depending on the nature of your data, you could break this up into a container rectangle check for each batch of 10K rectangles out of 50K or what ever slice gives you maximum efficiency. Possibly presorting the rectangles (e.g. by their centers) before assigning them to container batches.
We can break up and batch both lists with container rectangles:
listAA = [listA[x:x + 10] for x in range(0, len(listA), 10)]
for i, arrays in enumerate(listAA):
    listAA[i] = [containing_rectangle(arrays)] + arrays

listBB = [listB[x:x + 10] for x in range(0, len(listB), 10)]
for i, arrays in enumerate(listBB):
    listBB[i] = [containing_rectangle(arrays)] + arrays

for bb in listBB:
    for aa in listAA:
        if bb[0].overlap(aa[0]):
            for b in bb[1:]:
                if b.overlap(aa[0]):
                    for a in aa[1:]:
                        if b.overlap(a):
                            ...  # handle the overlapping pair
With my random data, this decreased the comparisons on the order of 15% to 20%, even counting the container rectangle comparisons. The batching of rectangles above is arbitrary and you can likely do better.
The exception you're getting comes from the last line of the code you show. The expression list[rect] is not valid, since list is a class, and the [] syntax in that context is trying to index it. You probably want just [rect] (which creates a new list containing the single item rect).
There are several other basic issues with your code. For instance, your Rect.__init__ method doesn't set a left attribute, which you seem to expect in your collision testing method. You've also used different capitalization for r1 and r2 in different parts of the overlap method (Python doesn't consider r1 to be the same as R1).
Those issues don't really have anything to do with testing more than two rectangles, which your question asks about. The simplest way to do that (and I strongly advise sticking to simple algorithms if you're having basic issues like the ones mentioned above), is to simply compare each rectangle with each other rectangle using the existing pairwise test. You can use itertools.combinations to easily get all pairs of items from an iterable (like a list):
list_of_rects = [rect1, rect2, rect3, rect4]  # assume these are defined elsewhere
for a, b in itertools.combinations(list_of_rects, 2):
    if a.overlap(b):
        pass  # do whatever you want to do when two rectangles overlap here
This implementation using numpy is about 35-40 times faster according to a test I did. For 2 lists each with 10000 random rectangles this method took 2.5 secs and the method in the question took ~90 sec. In terms of complexity it's still O(N^2) like the method in the question.
import numpy as np

rects1 = [
    [0, 10, 0, 10],
    [0, 100, 0, 100],
]
rects2 = [
    [20, 50, 20, 50],
    [200, 500, 200, 500],
    [0, 12, 0, 12]
]
data = np.asarray(rects2)

# keep only the rectangles whose x- and y-ranges both intersect rect's
def find_overlaps(rect, data):
    data = data[data[:, 0] < rect[1]]
    data = data[data[:, 1] > rect[0]]
    data = data[data[:, 2] < rect[3]]
    data = data[data[:, 3] > rect[2]]
    return data

for rect in rects1:
    overlaps = find_overlaps(rect, data)
    for overlap in overlaps:
        pass  # do something here
Obviously, if your list (at least listB) is sorted by r2.xmin, you can search for r1.xmax in listB and stop testing r1 against the rest of listB (everything after that point lies entirely to the right). This will be O(n·log(n)).
A sorted vector has faster access than a sorted list.
I'm supposing that the rectangles' edges are aligned with the axes.
Also fix your overlap() function as cdlane explained.
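A small sketch of that idea with the standard bisect module (the names are mine; it assumes the fixed Rectangle class and that listB has been pre-sorted by left edge):

import bisect

def overlaps_against_sorted(r1, sorted_listB, lefts):
    # lefts is the parallel, sorted list of left edges of sorted_listB.
    # Rectangles whose left edge lies beyond r1.right cannot overlap r1,
    # so only the prefix up to that point needs testing.
    hi = bisect.bisect_right(lefts, r1.right)
    return [r2 for r2 in sorted_listB[:hi]
            if r2.right >= r1.left                           # horizontal overlap
            and r1.top >= r2.bottom and r1.bottom <= r2.top]  # vertical overlap

Sorting listB once costs O(n·log(n)); each query then scans only the prefix that can overlap horizontally.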
If you know the upper and lower limits for coordinates, you can narrow the search by partitioning the coordinate space into squares e.g. 100x100.
Make one "set" per coordinate square.
Go through all the rectangles, putting each in the "set" of any square it overlaps.
See also Tiled Rendering which uses partitions to speed up graphical operations.
// Stores rectangles which overlap (x, y)..(x+w-1, y+h-1)
public class RectangleSet
{
    private List<Rectangle> _overlaps;

    public RectangleSet(int x, int y, int w, int h);
}

// Partitions the coordinate space into squares
public class CoordinateArea
{
    private const int SquareSize = 100;

    public List<RectangleSet> Squares = new List<RectangleSet>();

    public CoordinateArea(int xmin, int ymin, int xmax, int ymax)
    {
        for (int x = xmin; x <= xmax; x += SquareSize)
            for (int y = ymin; y <= ymax; y += SquareSize)
            {
                Squares.Add(new RectangleSet(x, y, SquareSize, SquareSize));
            }
    }

    // Adds a list of rectangles to the coordinate space
    public void AddRectangles(IEnumerable<Rectangle> list)
    {
        foreach (Rectangle r in list)
        {
            foreach (RectangleSet set in Squares)
            {
                if (r.Overlaps(set))
                    set.Add(r);
            }
        }
    }
}
Now you have a much smaller set of rectangles for comparison, which should speed things up nicely.
CoordinateArea A = new CoordinateArea(-500, -500, +1000, +1000);
CoordinateArea B = new CoordinateArea(-500, -500, +1000, +1000); // same limits for A, B

A.AddRectangles(listA);
B.AddRectangles(listB);

for (int i = 0; i < A.Squares.Count; i++)
{
    RectangleSet setA = A.Squares[i];
    RectangleSet setB = B.Squares[i];
    // *** small number of rectangles, which you can now check thoroughly for overlaps ***
}
I think you have to set up an additional data structure (a spatial index) in order to have fast access to nearby rectangles that potentially overlap, reducing the time complexity from quadratic to linearithmic.
See also:
https://en.wikipedia.org/wiki/Spatial_database
Spatial Index for Rectangles With Fast Insert
find overlapping rectangles algorithm
Here is what I use to calculate overlap areas of many candidate rectangles (with candidate_coords [[l, t, r, b], ...]) with a target one (target_coords [l, t, r, b]):
comb_tensor = np.zeros((2, candidate_coords.shape[0], 4))
comb_tensor[0, :] = target_coords
comb_tensor[1] = candidate_coords
dx = np.amin(comb_tensor[:, :, 2].T, axis=1) - np.amax(comb_tensor[:, :, 0].T, axis=1)
dy = np.amin(comb_tensor[:, :, 3].T, axis=1) - np.amax(comb_tensor[:, :, 1].T, axis=1)
dx[dx < 0] = 0
dy[dy < 0] = 0
overlap_areas = dx * dy
This should be fairly efficient especially if there are many candidate rectangles as all is done using numpy functions operating on ndarrays. You can either do a loop calculating the overlap areas or perhaps add one more dimension to comb_tensor.
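A self-contained usage sketch (the array values are mine, chosen so the first candidate overlaps the target in a 5x5 patch and the second not at all):

import numpy as np

target_coords = np.array([0, 0, 10, 10])         # [l, t, r, b]
candidate_coords = np.array([[5, 5, 15, 15],
                             [20, 20, 30, 30]])

comb_tensor = np.zeros((2, candidate_coords.shape[0], 4))
comb_tensor[0, :] = target_coords
comb_tensor[1] = candidate_coords
dx = np.amin(comb_tensor[:, :, 2].T, axis=1) - np.amax(comb_tensor[:, :, 0].T, axis=1)
dy = np.amin(comb_tensor[:, :, 3].T, axis=1) - np.amax(comb_tensor[:, :, 1].T, axis=1)
dx[dx < 0] = 0
dy[dy < 0] = 0
print(dx * dy)  # [25.  0.]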
I think the below code will be useful.
print("Identifying Overlap between n number of rectangle")
#List to be used in set and get_coordinate_checked_list
coordinate_checked_list = []
def get_coordinate_checked_list():
#returns the overlapping coordinates of rectangles
"""
:return: list of overlapping coordinates
"""
return coordinate_checked_list
def set_coordinate_checked_list(coordinates):
#appends the overlapping coordinates of rectangles
"""
:param coordinates: list of overlapping coordinates to be appended in coordinate_checked_list
:return:
"""
coordinate_checked_list.append(coordinates)
def overlap_checked_for(coordinates):
# to find rectangle overlap is already checked, if checked "True" will be returned else coordinates will be added
# to coordinate_checked_list and return "False"
"""
:param coordinates: coordinates of two rectangles
:return: True if already checked, else False
"""
if coordinates in get_coordinate_checked_list():
return True
else:
set_coordinate_checked_list(coordinates)
return False
def __isRectangleOverlap(R1, R2):
#checks if two rectangles overlap
"""
:param R1: Rectangle1 with cordinates [x0,y0,x1,y1]
:param R2: Rectangle1 with cordinates [x0,y0,x1,y1]
:return: True if rectangles overlaps else False
"""
if (R1[0] >= R2[2]) or (R1[2] <= R2[0]) or (R1[3] <= R2[1]) or (R1[1] >= R2[3]):
return False
else:
print("Rectangle1 {} overlaps with Rectangle2 {}".format(R1,R2))
return True
def __splitByHeightandWidth(rectangles):
# Gets the list of rectangle, divide the paged with respect to height and width and position
# the rectangle in suitable section say left_up,left_down,right_up,right_down and returns the list of rectangle
# grouped with respect to section
"""
:param rectangles: list of rectangle coordinates each designed as designed as [x0,y0,x1,y1]
:return:list of rectangle grouped with respect to section, suspect list which holds the rectangles
positioned in more than one section
"""
lu_Rect = []
ll_Rect = []
ru_Rect = []
rl_Rect = []
sus_list = []
min_h = 0
max_h = 0
min_w = 0
max_w = 0
value_assigned = False
for rectangle in rectangles:
if not value_assigned:
min_h = rectangle[1]
max_h = rectangle[3]
min_w = rectangle[0]
max_w = rectangle[2]
value_assigned = True
if rectangle[1] < min_h:
min_h = rectangle[1]
if rectangle[3] > max_h:
max_h = rectangle[3]
if rectangle[0] < min_w:
min_w = rectangle[0]
if rectangle[2] > max_w:
max_w = rectangle[2]
for rectangle in rectangles:
if rectangle[3] <= (max_h - min_h) / 2:
if rectangle[2] <= (max_w - min_w) / 2:
ll_Rect.append(rectangle)
elif rectangle[0] >= (max_w - min_w) / 2:
rl_Rect.append(rectangle)
else:
# if rectangle[0] < (max_w - min_w) / 2 and rectangle[2] > (max_w - min_w) / 2:
ll_Rect.append(rectangle)
rl_Rect.append(rectangle)
sus_list.append(rectangle)
if rectangle[1] >= (max_h - min_h) / 2:
if rectangle[2] <= (max_w - min_w) / 2:
lu_Rect.append(rectangle)
elif rectangle[0] >= (max_w - min_w) / 2:
ru_Rect.append(rectangle)
else:
# if rectangle[0] < (max_w - min_w) / 2 and rectangle[2] > (max_w - min_w) / 2:
lu_Rect.append(rectangle)
ru_Rect.append(rectangle)
sus_list.append(rectangle)
if rectangle[1] < (max_h - min_h) / 2 and rectangle[3] > (max_h - min_h) / 2:
if rectangle[0] < (max_w - min_w) / 2 and rectangle[2] > (max_w - min_w) / 2:
lu_Rect.append(rectangle)
ll_Rect.append(rectangle)
ru_Rect.append(rectangle)
rl_Rect.append(rectangle)
sus_list.append(rectangle)
elif rectangle[2] <= (max_w - min_w) / 2:
lu_Rect.append(rectangle)
ll_Rect.append(rectangle)
sus_list.append(rectangle)
else:
# if rectangle[0] >= (max_w - min_w) / 2:
ru_Rect.append(rectangle)
rl_Rect.append(rectangle)
sus_list.append(rectangle)
return [lu_Rect, ll_Rect, ru_Rect, rl_Rect], sus_list
def find_overlap(rectangles):
#Find all possible overlap between the list of rectangles
"""
:param rectangles: list of rectangle grouped with respect to section
:return:
"""
split_Rectangles , sus_list = __splitByHeightandWidth(rectangles)
for section in split_Rectangles:
for rect in range(len(section)-1):
for i in range(len(section)-1):
if section[0] and section[i+1] in sus_list:
if not overlap_checked_for([section[0],section[i+1]]):
__isRectangleOverlap(section[0],section[i+1])
else:
__isRectangleOverlap(section[0],section[i+1])
section.pop(0)
arr =[[0,0,2,2],[0,0,2,7],[0,2,10,3],[3,0,4,1],[6,1,8,8],[0,7,2,8],[4,5,5,6],[4,6,10,7],[9,3,10,5],[5,3,6,4],[4,3,6,5],[4,3,5`enter code here`,6]]
find_overlap(arr)
For a simple solution that improves on pure brute force if the rectangles are relatively sparse:
sort all Y ordinates in a single list, and for every ordinate store the index of the rectangle, the originating list and a flag to distinguish bottom and top;
scan the list from bottom to top, maintaining two "active lists", one per rectangle set;
when you meet a bottom, insert the rectangle index in its active list and compare to all rectangles in the other list to detect overlaps on X;
when you meet a top, remove the rectangle index from its active list.
Assuming simple linear lists, the updates and searches will take time linear in the size of the active lists. So instead of M x N comparisons, you will perform M x n + m x N comparisons, where m and n denote the average list sizes. (If the rectangles do not overlap within their set, one can expect an average list length not exceeding √M and √N.)
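A minimal Python sketch of this sweep (the names are mine; it assumes the corrected Rectangle class from the question, with bottom <= top):

def sweep_overlaps(listA, listB):
    # Events: (y, kind, side, rect); kind 0 = bottom (insert), 1 = top (remove).
    # At equal y, bottoms are handled before tops, so touching edges count as overlap.
    events = []
    for side, rects in ((0, listA), (1, listB)):
        for rect in rects:
            events.append((rect.bottom, 0, side, rect))
            events.append((rect.top, 1, side, rect))
    events.sort(key=lambda e: (e[0], e[1]))

    active = ([], [])  # one active list per input list
    pairs = []
    for _, kind, side, rect in events:
        if kind == 0:
            # Bottom: compare on X against the rectangles currently active in the other list.
            for other in active[1 - side]:
                if rect.left <= other.right and rect.right >= other.left:
                    pairs.append((rect, other) if side == 0 else (other, rect))
            active[side].append(rect)
        else:
            # Top: retire the rectangle from its active list.
            active[side].remove(rect)
    return pairs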
