rect = cv2.minAreaRect(largest_contour)
rect = ((rect[0][0] * self.scale_down, rect[0][1] * self.scale_down),
        (rect[1][0] * self.scale_down, rect[1][1] * self.scale_down),
        rect[2])
box = cv2.cv.BoxPoints(rect)
print box
box = np.int0(box)
cv2.drawContours(frame,[box], 0, (0, 0, 255), 2)
This is what my code looks like. I tried printing the box to see what it was, and I got a result such as ((200.0, 472.0), (200.0, 228.0), (420.0, 228.0), (420.0, 472.0)). It should have something to do with x and y coordinates, right? I guess those are the four corners of the rectangle? So what are they exactly? Thanks!
The common misconception about the "box" values is that the first sub-list of the "box" ndarray is always the bottom-left point of the rectangle.
For example, in the rectangle shown below, the first sub-list of the "box" ndarray need not always represent point A.
So here is what the "box" values represent:
As the question rightly points out, when you print box, you will get an ndarray that looks something like this:
Then I went the extra mile on description and wrote this simple for loop to really understand what the "box" values actually represent:
import matplotlib.pyplot as plt

for i in box:
    # draw each corner one at a time, in the order it appears in "box"
    cv2.circle(image, (i[0], i[1]), 3, (0, 255, 0), -1)
    imgplot = plt.imshow(image)
    plt.show()
And the results are: (the images are in order)
I think the images should have cleared up anybody's doubts about the "box" values, but here is a summary anyway:
The lowest point of the rectangle (it does not matter whether left or right) will always be the first sub-list of the "box" ndarray. So in the example I have given, the first sub-list [169 144] represents the bottom right of this rectangle.
Now this point becomes the reference point for deciding what the next sub-list represents. That is, the next sub-list will always be the first point you reach when moving in the clockwise direction (as shown in the second image of the for loop).
Keep moving in the clockwise direction to see what the remaining sub-lists represent.
PS: It is sometimes very hard to read the OpenCV documentation (which is not the best in the world, btw) and understand a function and its return values properly. So I suggest churning out little chunks of code, like the for loop and cv2.circle above, to really visualize the return values of a function. That should clear up all your doubts about any function you come across in OpenCV. After all, OpenCV is all about "visual"izing!
Those are the 4 points defining the rotated rectangle provided to it. Keep in mind that in OpenCV, points are given as (x, y), not (row, column), and the y axis is positive downward. So the first point would be plotted 200 pixels right of the left side of the image and 472 pixels down from the top of the image. In other words, the first point is the bottom-left point of the rectangle.
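As a quick sketch of that convention (the blank canvas and its size here are made up for illustration):
import cv2
import numpy as np

# (x, y) order: 200 px right of the left edge, 472 px down from the top
img = np.zeros((600, 600, 3), dtype=np.uint8)
cv2.circle(img, (200, 472), 5, (0, 0, 255), -1)  # marks the first box point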
Leaving this here for whoever, like me, finds this and reads the (currently) most-voted answer: that answer now seems to be outdated.
Currently (using OpenCV 4.5.4; I don't know since when this has been the case), the behaviour of cv.boxPoints() seems to match the behaviour of cv::RotatedRect::points(), i.e., the order of the returned points is: [bottom-left, top-left, top-right, bottom-right].
There is no explicit confirmation of this in the documentation, but the docs for cv.boxPoints() mention using cv::RotatedRect::points() directly in C++, and the following example shows that the solution by Sushanth seems to be wrong now (forgive the weird numbers; this comes directly out of the debugger in one of my projects):
rotrec = (
(27.425756454467773, 947.3493041992188), # center pt
(14.5321683883667, 50.921504974365234), # W, H
70.49755096435547 # angle
)
cv2.boxPoints(rotrec)
output:
array([[ 0.99999475, 949.0001 ],
[ 48.999996 , 932.0001 ],
[ 53.851517 , 945.6985 ],
[ 5.8515167 , 962.6985 ]], dtype=float32)
(note that the last point has a higher Y coordinate and should thus be the first point in the returned polygon, according to the algorithm described by Sushanth)
I think the first point will always be the bottom-most point, and it will actually be the bottom right (if there are multiple points that could be the bottom-most point).
I was facing the same issue.
First of all, the syntax should be boxPoints, not BoxPoints.
Then run the program with Python 3; that fixed my issue.
For example: python3 test.py
The return type is <class 'numpy.ndarray'>.
You can find this out with type(element_you_want_to_inspect).
It's strange that the OpenCV Python documentation does not state what the methods return.
I tried this example from GeeksForGeeks, where you create a blue picture. I wanted to try something more with it, so I tried to change a single pixel from blue to red. While I did that successfully, I noticed that the position of the red pixel was reversed: instead of [1,4], the [4,1] pixel turned red. I noticed the same problem of switched x and y with the function Image.frombytes. I tried reading the PixelAccess class documentation but haven't found anything. I am using Python 3.10.6 and 9.2.0, the latest version of PIL, which makes this post not relevant.
The easiest solution is to switch x and y in the code, but I can't find a reason why they are swapped.
from PIL import Image

input = Image.new(mode="RGB", size=(10, 10), color="blue")
input.save("input", format="png")
pixel_map = input.load()
pixel_map[1, 4] = (255, 0, 0)
input.save("path\\example.png", format="png")
edit:
I have added a thick red line in the middle.
Regarding this code, the line should be vertical, not horizontal like it is.
# this code goes instead of the line: pixel_map[1, 4] = (255, 0, 0)
for i in range(10):
    for j in range(10):
        if j == 4 or j == 5:
            pixel_map[i, j] = (255, 0, 0)
Summary of my comments:
It is pretty standard to access digital images via [x, y] coordinates, as opposed to [y, x] or [y][x]. In mathematics, arrays are usually indexed by row and then column, but with images, the width component is conventionally first - hence why we say resolutions like "1920x1080", which is the X and then Y value. And, just like when accessing a cartesian coordinate plane in mathematics, X refers to the horizontal component, and is first, while Y is second and refers to the vertical component. So, images tend to be treated more like a coordinate system than a matrix, at least when a layer of abstraction is added like PIL is doing. Hence, I think this can be confusing for those who are used to how 2D arrays are typically indexed.
Here is a post which does a great job explaining why there's different coordinate systems. It's far more detailed and well-researched than what I'm capable of coming up with right now.
Like I said I think there's just some understandable confusion when it comes to transitioning from thinking of the first index as the row, and the second as the column, when with digital images it's the other way around usually. In the end, the order is just determined by the tool you are using. Some tools use pixel coordinates (x, y) while others use matrix coordinates (y, x). Matrix coordinates are indeed how images are usually internally stored, but I think the (x, y) order is a layer of "convenience" that is added sometimes. This thread has some related discussion: why should I use (y,x) instead of (x,y) to access a pixel in opencv?
If you look at the Pillow source code that actually gets the pixel when accessing the loaded image data, you'll see that it indexes self.pixels[y][x]. So, internally, it's being stored how you expect; the PixelAccess object just deliberately chose its index to be in (x, y) order and swaps it for you. You don't have access to this internal representation as far as I know; it's just an implementation detail anyway.
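To see the two conventions side by side, here is a small sketch (the numpy conversion is just for illustration): PIL's pixel access takes (x, y), while the underlying array layout is row-major, i.e. [y][x]:
from PIL import Image
import numpy as np

img = Image.new(mode="RGB", size=(10, 10), color="blue")
pixels = img.load()
pixels[1, 4] = (255, 0, 0)  # PIL access: (x, y)

arr = np.array(img)  # numpy view: arr[row, col], i.e. arr[y, x]
print(arr[4, 1])     # [255   0   0] -- same pixel, indices swapped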
This is what I am currently doing:
Creating 4 axes that are perpendicular to the 4 edges of the 2 rectangles. Since they are rectangles, I do not need to generate an axis (normal) per edge.
I then loop over my 4 axes.
So for each axis:
I get the projection of every corner of a rectangle onto the axis.
There are 2 lists (arrays) containing those projections, one for each rectangle.
I then get the dot product of each projection and the axis. This returns a scalar value that can be used to determine the min and max.
Now the 2 lists contain scalars and not vectors. I sort the lists so I can easily select the min and max values. If the min of box B >= the max of box A OR the max of box B <= the min of box A then there is no collision on that axis and no collision between the objects.
At this point the function finishes and the loop breaks.
If those conditions are never met for any of the axes, then we have a collision.
I hope this was the correct way of doing it.
The python code itself can be found here http://pastebin.com/vNFP3mAb
Also:
http://www.gamedev.net/page/reference/index.html/_/reference/programming/game-programming/collision-detection/2d-rotated-rectangle-collision-r2604
The problem I was having is that the code above does not work. It always detects a collision, even when there is no collision. What I typed out is exactly what the code is doing. If I am missing any steps or just not understanding how SAT works, please let me know.
In general it is necessary to carry out the steps outlined in the Question to determine if the rectangles "collide" (intersect), noting as the OP does that we can break (with a conclusion of non-intersection) as soon as a separating axis is found.
There are a couple of simple ways to "optimize" in the sense of providing chances for earlier exits. The practical value of these depends on the distribution of rectangles being checked, but both are easily incorporated in the existing framework.
(1) Bounding Circle Check
One quick way to prove non-intersection is by showing the bounding circles of the two rectangles do not intersect. The bounding circle of a rectangle shares its center, the midpoint of either diagonal, and has diameter equal to the length of either diagonal. If the distance between the two centers exceeds the sum of the two circles' radii, then the circles do not intersect. Thus the rectangles also cannot intersect. If the purpose was to find an axis of separation, we haven't accomplished that yet. However if we only want to know if the rectangles "collide", this allows an early exit.
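For illustration, here is a minimal sketch of that check, assuming each rectangle is given as a list of four corner points in order (so indices 0 and 2 are diagonally opposite); the function name and representation are mine, not from the OP's code:
import math

def bounding_circles_disjoint(corners_a, corners_b):
    # center = midpoint of a diagonal; radius = half the diagonal's length
    ax = (corners_a[0][0] + corners_a[2][0]) / 2.0
    ay = (corners_a[0][1] + corners_a[2][1]) / 2.0
    ra = math.hypot(corners_a[2][0] - corners_a[0][0],
                    corners_a[2][1] - corners_a[0][1]) / 2.0
    bx = (corners_b[0][0] + corners_b[2][0]) / 2.0
    by = (corners_b[0][1] + corners_b[2][1]) / 2.0
    rb = math.hypot(corners_b[2][0] - corners_b[0][0],
                    corners_b[2][1] - corners_b[0][1]) / 2.0
    # disjoint circles imply the rectangles cannot intersect
    return math.hypot(bx - ax, by - ay) > ra + rb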
(2) Vertex of one rectangle inside the other
The projection of a vertex of one rectangle on axes parallel to the other rectangle's edges provides enough information to detect when that vertex is inside the other rectangle. This check is especially easy when the latter rectangle has been translated and unrotated to the origin (with edges parallel to the ordinary axes). If it happens that a vertex of one rectangle is inside the other, the rectangles obviously intersect. Of course this is a sufficient condition for intersection, not a necessary one. But it allows for an early exit with a conclusion of intersection (and of course without finding an axis of separation because none will exist).
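A minimal sketch of this check (the representation is hypothetical: center, unit vectors along the edges, and half-extents of the rectangle):
def vertex_inside(vertex, center, edge_u, edge_v, half_w, half_h):
    # project the offset from the center onto the rectangle's own edge axes;
    # the vertex is inside iff both projections fall within the half-extents
    dx = vertex[0] - center[0]
    dy = vertex[1] - center[1]
    return (abs(dx * edge_u[0] + dy * edge_u[1]) <= half_w and
            abs(dx * edge_v[0] + dy * edge_v[1]) <= half_h)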
I see two things wrong. First, the projection should simply be the dot product of a vertex with the axis. What you're doing is way too complicated. Second, the way you get your axis is incorrect. You write:
Axis1 = [ -(A_TR[0] - A_TL[0]),
A_TR[1] - A_TL[1] ]
Where it should read:
Axis1 = [ -(A_TR[1] - A_TL[1]),
A_TR[0] - A_TL[0] ]
The difference in coordinates does give you a vector, but to get the perpendicular you need to exchange the x and y values and negate one of them.
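Written as a tiny helper (the name is mine):
def perpendicular(v):
    # (x, y) -> (-y, x): exchange the components and negate one
    return [-v[1], v[0]]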
Hope that helps.
EDIT Found another bug
In this code:
if not ( B_Scalars[0] <= A_Scalars[3] or B_Scalars[3] >= A_Scalars[0] ):
#no overlap so no collision
return 0
That should read:
if not ( B_Scalars[3] <= A_Scalars[0] or A_Scalars[3] <= B_Scalars[0] ):
Sort gives you a list increasing in value. So [1,2,3,4] and [10,11,12,13] do not overlap because the minimum of the latter is greater than the maximum of the former. The second comparison is for when the input sets are swapped.
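As a small sketch (assuming both projection lists are already sorted ascending, as in the original code):
def projections_overlap(a_scalars, b_scalars):
    # sorted ascending: index 0 is the min, index -1 is the max
    return not (b_scalars[-1] <= a_scalars[0] or a_scalars[-1] <= b_scalars[0])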
I'm writing a python spirograph program, and I need some help with converting part of it into a function. The code is attempting to reproduce the result illustrated in the video I found here. One line rotates around the origin, and then another rotates off the end of that, etc.
With a little bit of research into (what I think is) trigonometry, I put together a function rotate(point, angle, center=(0, 0)). The user inputs a point to be rotated, the angle (clockwise) that it is to be rotated by, and the centerpoint for it to be rotated around.
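For reference, a minimal version of such a function might look like this (a sketch of my reconstruction, not necessarily the OP's exact code; note that on a Tkinter canvas, where y grows downward, the visual sense of rotation is mirrored):
import math

def rotate(point, angle, center=(0, 0)):
    # rotate `point` by `angle` degrees around `center`
    rad = math.radians(angle)
    dx = point[0] - center[0]
    dy = point[1] - center[1]
    return (center[0] + dx * math.cos(rad) + dy * math.sin(rad),
            center[1] - dx * math.sin(rad) + dy * math.cos(rad))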
Then, I implemented an initial test, whereby one line rotates around the other. The end of the second line draws as if it were holding a pen. The code's a little messy, but it looks like this.
x, y = 0, 0
lines = []
while 1:
    point1 = rotate((0, 50), x)
    point2 = map(sum, zip(rotate((0, 50), y), point1))
    if x == 0:
        oldpoint2 = point2
    else:
        canvas.create_line(oldpoint2[0], oldpoint2[1], point2[0], point2[1])
    lines.append(canvas.create_line(0, 0, point1[0], point1[1]))
    lines.append(canvas.create_line(point1[0], point1[1], point2[0], point2[1]))
    oldpoint2 = point2
    tk.update()
    x += 5
    if x > 360 and y > 360:
        x -= 360
        canvas.delete("all")
        time.sleep(1)
    y += 8.8
    if y > 360:
        y -= 360
    for line in lines:
        canvas.delete(line)
    lines = []
Great, works perfectly. My ultimate goal is what's in the video, however. In the video, the user can input any arbitrary number of arms, then define the length and angular velocity for each arm. Mine only works with two arms. My question, ultimately, is how to put the code I posted into a function that looks like drawSpiral(arms, lenlist, velocitylist). It would take the number of arms, a list of the velocities for each arm, and a list of the length of each arm as arguments.
What I've Tried
I've already attempted this several times. Initially, I had something that didn't work at all. I got some cool shapes, but definitely not the desired output. I've worked for a few hours, and the closest I could get was this:
def drawSpiral(arms, lenlist, velocitylist):
    if not arms == len(lenlist) == len(velocitylist):
        raise ValueError("The lists don't match the provided number of arms")
    iteration = 0
    while 1:
        tk.update()
        iteration += 1
        # Empty the list of points
        pointlist = []
        pointlist.append((0, 0))
        # Create a list of the final rotation degrees for each point
        rotations = []
        for vel in velocitylist:
            rotations.append(vel * iteration)
        for n in range(arms):
            point = tuple(map(sum, zip(rotate((0, lenlist[n]), rotations[n], pointlist[n]))))
            pointlist.append(point)
        for point in pointlist:
            create_point(point)
        for n in range(arms):
            print pointlist[n], pointlist[n+1]
This is fairly close to my solution, I feel, but not quite there. Calling drawSpiral(2, [50, 75], [1, 5]) looks like it might be producing some of the right points, but not connecting the right sets. Staring at it for about an hour and trying a few things, I haven't made any progress. I've also gotten pretty confused looking at my own code. I'm stuck! The point rotating around the center is attached to a point that is just flying diagonally across the screen and back. The line attached to the center is stretching back and forth. Can someone point me in the right direction?
Results of further tests
I've set up both functions to plot points at the ends of each arm, and found some interesting results. The first arm, in both cases, is rotating at a speed of 5, and the second at a speed of -3. The loop, outside of the function, is producing the pattern:
The function, called with drawSpiral(2, [50, 50], [5, -3]), produces the result of
It seems to be stretching the top half. With both arms having a velocity of 5, the function would be expected to produce two circles, one larger than the other. However, it produces an upside-down cardioid shape, with the point connected to the center.
Now that there's more evidence, can anyone who understands the math better than I do help me out?
Your error is in
for n in range(arms):
    point = tuple(map(sum, zip(rotate((0, lenlist[n]), rotations[n], pointlist[n]))))
    pointlist.append(point)
Specifically,
rotate((0, lenlist[n]), ...)
replace it with
for n in range(arms):
    point = tuple(map(sum, zip(rotate((pointlist[n][0], lenlist[n]), rotations[n], pointlist[n]))))
    pointlist.append(point)
You go against the usual mathematical notation for polars (circular graphs), and that caused your confusion and eventual issues. As far as I can tell, your function plots an (X, Y) point (0, length), then finds the difference between that point and the center point (which is correctly defined as the last point you found) and rotates it around that center. The issue is that (0, length) is not 'length' away from the center. Replacing (0, lenlist[n]) with (pointlist[n][0], lenlist[n]) bases the next point on the last point.
Also, I would recommend changing your rotate function to rotate(length, angle, centerpoint), which would simplify the inputs to a more traditional representation.
So, I'm teaching myself Python with this tutorial, and I'm stuck on exercise number 13, which says:
Write a function to uniformly shrink or enlarge an image. Your function should take an image along with a scaling factor. To shrink the image, the scale factor should be between 0 and 1; to enlarge the image, the scaling factor should be greater than 1.
This is not meant as a question about PIL, but to ask which algorithm to use so I can code it myself.
I've found some similar questions, like this one, but I don't know how to translate them into Python.
Any help would be appreciated.
I've come to this:
import image

win = image.ImageWin()
img = image.Image("cy.png")
factor = 2
W = img.getWidth()
H = img.getHeight()
newW = int(W * factor)
newH = int(H * factor)
newImage = image.EmptyImage(newW, newH)
for col in range(newW):
    for row in range(newH):
        p = img.getPixel(col, row)
        newImage.setPixel(col * factor, row * factor, p)
newImage.draw(win)
win.exitonclick()
I should do this in a function, but that doesn't matter right now. The arguments for the function would be (image, factor). You can try it in the ActiveCode on the OP's tutorial. It makes a stretched image with empty columns.
Your code as shown is simple and effective for what's known as a Nearest Neighbor resize, except for one little bug:
p = img.getPixel(col/factor,row/factor)
newImage.setPixel(col,row,p)
Edit: since you're sending a floating point coordinate into getPixel, you're not limited to Nearest Neighbor - you can implement any interpolation algorithm you want inside. The simplest thing to do is simply truncate the coordinates to int, which will cause pixels to be replicated when the factor is greater than 1, or skipped when the factor is less than 1.
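Putting that fix together, the inner loop might read as follows (a sketch reusing the question's img, newImage, newW, newH, and factor, under the same image module):
for col in range(newW):
    for row in range(newH):
        # truncating the source coordinates to int gives Nearest Neighbor
        p = img.getPixel(int(col / factor), int(row / factor))
        newImage.setPixel(col, row, p)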
Mark has the correct approach. To get a smoother result, you replace:
p = img.getPixel(col/factor,row/factor)
with a function that takes floating point coordinates and returns a pixel interpolated from several neighboring points in the source image. For linear interpolation it takes the four nearest neighbors; for higher-order interpolation it takes a larger number of surrounding pixels.
For example, if col/factor = 3.75 and row/factor = 1.9, a linear interpolation would take the source pixels at (3,1), (3,2), (4,1), and (4,2) and give a result between those 4 rgb values, weighted most heavily to the pixel at (4,2).
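A minimal bilinear sketch of that idea (assuming getPixel returns an (r, g, b) tuple and that x + 1 and y + 1 stay inside the image; edge handling is omitted):
def bilinear(img, x, y):
    x0, y0 = int(x), int(y)
    x1, y1 = x0 + 1, y0 + 1
    fx, fy = x - x0, y - y0  # fractional parts act as weights
    p00 = img.getPixel(x0, y0)
    p10 = img.getPixel(x1, y0)
    p01 = img.getPixel(x0, y1)
    p11 = img.getPixel(x1, y1)
    # weight each of the four neighbors by how close (x, y) is to it
    return tuple(int((1 - fx) * (1 - fy) * a + fx * (1 - fy) * b +
                     (1 - fx) * fy * c + fx * fy * d)
                 for a, b, c, d in zip(p00, p10, p01, p11))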
You can do that using the Python Imaging Library.
Image.resize() should do what you want.
See http://effbot.org/imagingbook/image.htm
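For example (a quick sketch with Pillow; the file names are placeholders):
from PIL import Image

img = Image.open("cy.png")
factor = 2
w, h = img.size
resized = img.resize((int(w * factor), int(h * factor)))
resized.save("cy_resized.png")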
EDIT
Since you want to program this yourself without using a module, I have added an extra solution.
You will have to use the following algorithm.
load your image
extract its size
calculate the desired size (height * factor, width * factor)
create a new EmptyImage with the desired size
use a nested loop through the pixels (row by column) in your image
Then (for shrinking) you remove some pixels every once in a while, or (for enlarging) you duplicate some pixels in your image.
If you want to get fancy, you could smooth the added or removed pixels by averaging the rgb values with their neighbours.