I tried this example from GeeksForGeeks where you create a blue picture. I wanted to try something more with it, so I changed a single pixel from blue to red. While I did that successfully, I noticed that the position of the red pixel is reversed: instead of [1,4], the pixel at [4,1] turned red. I noticed the same x/y switching with the Image.frombytes function. I tried reading the PixelAccess class documentation but haven't found anything. I am using Python 3.10.6 and PIL 9.2.0 (the latest version), which makes this post not relevant.
The easiest solution is to switch x and y in the code, but I can't find a reason why they are swapped.
from PIL import Image

input = Image.new(mode="RGB", size=(10, 10), color="blue")
input.save("input", format="png")
pixel_map = input.load()
pixel_map[1, 4] = (255, 0, 0)
input.save("path\\example.png", format="png")
edit:
I have added a thick red line in the middle.
Regarding this code, the line should be vertical, not horizontal like it is.
# this code goes instead of the line: pixel_map[1,4] = (255,0,0)
for i in range(10):
    for j in range(10):
        if j == 4 or j == 5:
            pixel_map[i, j] = (255, 0, 0)
Summary of my comments:
It is pretty standard to access digital images via [x, y] coordinates, as opposed to [y, x] or [y][x]. In mathematics, arrays are usually indexed by row and then column, but with images the width component conventionally comes first, which is why we write resolutions like "1920x1080": the X value and then the Y value. And, just like on a Cartesian coordinate plane in mathematics, X refers to the horizontal component and comes first, while Y comes second and refers to the vertical component. So images tend to be treated more like a coordinate system than a matrix, at least once a layer of abstraction is added like PIL is doing. Hence, I think this can be confusing for those who are used to how 2D arrays are typically indexed.
Here is a post which does a great job explaining why there's different coordinate systems. It's far more detailed and well-researched than what I'm capable of coming up with right now.
Like I said I think there's just some understandable confusion when it comes to transitioning from thinking of the first index as the row, and the second as the column, when with digital images it's the other way around usually. In the end, the order is just determined by the tool you are using. Some tools use pixel coordinates (x, y) while others use matrix coordinates (y, x). Matrix coordinates are indeed how images are usually internally stored, but I think the (x, y) order is a layer of "convenience" that is added sometimes. This thread has some related discussion: why should I use (y,x) instead of (x,y) to access a pixel in opencv?
If you look at the Pillow source code that actually fetches the pixel when you access the loaded image data, you'll see that it indexes self.pixels[y][x]. So internally the pixels are stored the way you expect; the PixelAccess object simply accepts the index in (x, y) order and swaps it before looking up the internal [y][x] storage. As far as I know you do not have access to that internal representation, and it's just an implementation detail anyway.
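As a quick illustration (a minimal sketch, assuming only Pillow and numpy are installed), the same pixel shows up at [x, y] through the PixelAccess object and at [y, x] when the image is viewed as a numpy array:

from PIL import Image
import numpy as np

img = Image.new(mode="RGB", size=(10, 10), color="blue")
img.load()[1, 4] = (255, 0, 0)   # PixelAccess order: (x, y) = (column 1, row 4)

arr = np.array(img)              # numpy order: arr[row, col], i.e. arr[y, x]
print(arr[4, 1])                 # [255   0   0] -- the same pixel, indices swapped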
Related
So I have a hobby project of strapping a USB microscope onto a 3D printer to take pictures of an object at different X, Y, and Z locations, which are then stitched into a full image. I started with the code from this stack to generate a 2D rectangular raster pattern, then upgraded that to do 2.5D imaging (stacking and stitching) by looping repeatedly to move the Z axis after each 2D scan, and finally did the same with a 4th axis to enable 3D imaging.
The problem is that most things are not rectangles, and the scan can end up being very wasteful, stopping to take pictures in areas that are not part of the object or are obviously out of focus. I would love to be able to 1: shift the "scan plane", for instance to scan a flat object tilted upwards, and 2: in general be able to generate arbitrary scan patterns, or at least simple ideal shapes.
How would you take information about a 3-dimensional shape (for instance, from an STL), and wrap an otherwise 2D "dot matrix" raster pattern around the surface (or at least the part pointing up)?
I figured out the basics of doing this with my original rectangular/cuboid scans. The math to rotate about, for instance, the Y axis is simple:
from math import sin, cos
import numpy as np

def RotateArray(ScanLocations, degrees=30):
    # Rotate a 3D set of scan locations around a specified axis (currently Y).
    # Ideally this would rotate around an arbitrary point in space.
    # X location minus offset becomes the new hypotenuse after rotating:
    #   sin(degrees) * X + Z gives the new Z
    #   cos(degrees) * X gives the new X. Right? Y should be unchanged.
    XLocations, ZLocations = ScanLocations['X'], ScanLocations['Z']
    sinof = sin(np.deg2rad(degrees))
    cosof = cos(np.deg2rad(degrees))
    XOffset = min(XLocations)  # not fair to assume it is the zeroth position
    ZLocations = [round((x - XOffset) * sinof + z, 2) for x, z in zip(XLocations, ZLocations)]
    XLocations = [round((x - XOffset) * cosof + XOffset, 2) for x in XLocations]
    ScanLocations['X'] = XLocations
    ScanLocations['Z'] = ZLocations
    return ScanLocations
Visualization using ncviewer.com:
original scan
Rotated Scan around Y axis
A new problem I need to solve now is rearranging my movements to be more efficient. I'd like to selectively prioritize one axis, for instance Z, so that images which will be stacked are taken without X/Y movements in between; one way to order the points is sketched below.
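A minimal sketch of that ordering, assuming ScanLocations is a dict of parallel 'X', 'Y', 'Z' lists as in RotateArray above (the function name order_for_z_stacking is just illustrative): sort the points so X/Y stay fixed while Z sweeps, letting each focus stack be captured without lateral moves in between.

def order_for_z_stacking(scan_locations):
    points = list(zip(scan_locations['X'], scan_locations['Y'], scan_locations['Z']))
    points.sort(key=lambda p: (p[0], p[1], p[2]))   # group by (X, Y), then sweep Z
    xs, ys, zs = zip(*points)
    return {'X': list(xs), 'Y': list(ys), 'Z': list(zs)}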
I'm trying to compute the optical phenomenon called gravitational lensing. In simple words, it occurs when a massive object sits between me, as an observer, and a star or some other light source. Because of its large mass, the light bends, so to us it appears to come from a location other than its real position. There is a particular (and simpler) case where we suppose the mass is spherical, so from our perspective it is circular in a 2D plane (or photo).
My idea for the code was to change the coordinates of a 2D plane as a function of where my light source is. In other words, if the light source is far from the massive object the image will not change, but if it is close to the spherical mass it will change (in fact, if it is exactly behind the massive object I, as an observer, will see the so-called Einstein ring).
To compute that, I first wrote a mapping for this. I take the approximation a = x + sin(t)/exp(x), b = y + cos(t)/exp(y). When the light source is far from the mass the 1/exp terms are approximately zero, and if it is just behind the mass the source coordinates will be (0,0), so the image will return (sin(t), cos(t)), the Einstein circle I expected to get.
I coded that in this way; first I define my approximation:
from numpy import arange, sin, cos, exp, pi
import matplotlib.pyplot as plt

def coso1(x, y):
    t = arange(0, 2 * pi, .01)
    a = x + sin(t) / exp(x)
    b = y + cos(t) / exp(y)
    plt.plot(a, b)
    plt.show()
Then I try to plot that to see how the coordinate map is changing:
from numpy import linspace

x = linspace(-10, 10, 10)
y = linspace(-10, 10, 10)
x = x.reshape(x.size, 1)
y = y.reshape(y.size, 1)
coso1(x, y)   # coso1 plots and shows the figure itself
And I get this plot.
Graphic
Notice that it looks that way because of the interval I chose for the x and y values. If I instead take the "frontier" case where x = {-1, 0, 1} and y = {-1, 0, 1}, it shows how the space is being deformed (or I'm guessing that's what I'm seeing).
I then have a few questions. An easy one that I haven't found an easy answer to: can I manipulate this transformation interactively (rotate it with the mouse to appreciate the deformation, or add a controller for how x or y change)? And the two hard questions: can I plot the contour lines to see exactly how the topography of my map changes at every level of x (suppose I keep y constant)? And, if this is my "new" coordinate map, can I use it as a tool so that any image I project through it gets distorted according to this map, analogous to how cameras produce the fisheye lens effect?
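For the contour-line part, here is a minimal sketch of one way it could look: holding y fixed and plotting level curves of the deformed x-coordinate a(x, t) = x + sin(t)/exp(x) with matplotlib's contour (the grid ranges are just illustrative):

import numpy as np
import matplotlib.pyplot as plt

t = np.arange(0, 2 * np.pi, .01)
x = np.linspace(-1, 1, 200)
T, X = np.meshgrid(t, x)
A = X + np.sin(T) / np.exp(X)     # deformed x-coordinate at fixed y

plt.contour(T, X, A, levels=20)
plt.xlabel("t")
plt.ylabel("x")
plt.show()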
rect = cv2.minAreaRect(largest_contour)
rect = ((rect[0][0] * self.scale_down, rect[0][1] * self.scale_down), (rect[1][0] * self.scale_down, rect[1][1] * self.scale_down), rect[2])
box = cv2.cv.BoxPoints(rect)
print box
box = np.int0(box)
cv2.drawContours(frame,[box], 0, (0, 0, 255), 2)
This is what my code looks like. I tried to print the box to see what it is, and I got a result such as ((200.0, 472.0), (200.0, 228.0), (420.0, 228.0), (420.0, 472.0)). It should have something to do with x and y coordinates, right? I guess those are the four corners of the rectangle? So what are they exactly? Thanks!
The common misconception of the "box" values is that the first sub-list of the "box" ndarray is always the bottom-left point of the rectangle.
For example, in the rectangle shown below, the first sub-list of "box" ndarray need not represent point A always.
So here is what "box" values represent:
As the question rightly points out, when you print box, you will get a ndarray that looks something like this:
I then went the extra mile and wrote this simple for loop to really understand what the "box" values actually represent:
for i in box:
    cv2.circle(image, (i[0], i[1]), 3, (0, 255, 0), -1)
    imgplot = plt.imshow(image)
    plt.show()
And the results are: (the images are in order)
I think the images should have cleared anybody's doubt about "box" values, but here is a summary anyway:
The lowest point of the rectangle (it does not matter whether it is on the left or the right) will always be the first sub-list of the "box" ndarray. So in the example I have given, the first sub-list [169 144] represents the bottom right of this rectangle.
Now this point will be the reference point to decide what the next sub-list represents. Meaning, the next sub-list will always represent the point that you first get when you move in the clockwise direction. (as shown in the second image of the for loop)
And keep moving in the clockwise direction to see what the next sub-lists represent.
PS: It is sometimes very hard to read the OpenCV documentation (which is not the best in the world, btw) and understand a function and its return values properly. So I suggest churning out little chunks of code, like the for loop and cv2.circle above, to really visualize the return values of a function. That should clear up your doubts about any function you come across in OpenCV. After all, OpenCV is all about "visual"izing!
Those are the 4 points defining the rotated rectangle provided to it. Keep in mind that in OpenCV points are given as (x, y), not (row, column), and the y axis is positive downward. So the first point would be plotted 200 pixels right of the left side of the image and 472 pixels down from the top of the image. In other words, the first point is the bottom-left corner of the rectangle.
Leaving this here for whoever, like me, finds this and reads the (currently) most-voted answer: it now seems to be outdated.
Currently (using OpenCV 4.5.4; I don't know since when this has been the case), the behaviour of cv.boxPoints() seems to match the behaviour of cv::RotatedRect::points(), i.e., the order of the returned points is: [bottom-left, top-left, top-right, bottom-right].
There is no explicit confirmation of this in the documentation, but the docs for cv.boxPoints() mention using cv::RotatedRect::points() directly in C++, and the following example shows that the solution by Sushanth seems to be wrong now (forgive the weird numbers, this comes directly out of the debugger in one of my projects):
rotrec = (
(27.425756454467773, 947.3493041992188), # center pt
(14.5321683883667, 50.921504974365234), # W, H
70.49755096435547 # angle
)
cv2.boxPoints(rotrec)
output:
array([[ 0.99999475, 949.0001 ],
[ 48.999996 , 932.0001 ],
[ 53.851517 , 945.6985 ],
[ 5.8515167 , 962.6985 ]], dtype=float32)
(note that the last point has a higher Y coordinate and should thus be the first point in the returned polygon, according to the algorithm described by Sushanth)
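Given that the returned order has changed across versions, a version-independent alternative is to sort the four corners yourself. This is only a sketch (the helper name order_corners is illustrative, and it assumes the usual image convention of y growing downward), using the common sum/difference trick:

import numpy as np

def order_corners(pts):
    pts = np.asarray(pts, dtype=np.float32)
    s = pts.sum(axis=1)               # x + y: smallest at top-left, largest at bottom-right
    d = np.diff(pts, axis=1)[:, 0]    # y - x: smallest at top-right, largest at bottom-left
    return np.array([pts[np.argmin(s)],   # top-left
                     pts[np.argmin(d)],   # top-right
                     pts[np.argmax(s)],   # bottom-right
                     pts[np.argmax(d)]])  # bottom-left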
I think the first point will always be the bottom-most point, and it will actually be the bottom-right one if there are multiple points that could be the bottom-most point.
I was facing the same issue.
First of all, the syntax should be boxPoints, not BoxPoints.
Then, run the program with Python 3; that fixed my issue.
For example: python3 test.py
<class 'numpy.ndarray'>
You can find this out with type(the element you want to inspect).
It's strange that the OpenCV Python documentation doesn't state what the methods return.
I am trying to create a foolproof graphing calculator using Python and pygame.
I created a graphing calculator that works for most functions. It takes a user's infix expression string and converts it to postfix for easier calculation. I then loop through x values and pass them into the postfix expression to get a y value for graphing with pygame.
The first problem I ran into was with impossible calculations (like dividing by zero, the square root of -1, or 0 ^ a non-positive number). If something like this happened, I would output None and that pixel wouldn't be added to the list of points to be graphed.
* I have shown all the different attempts I have made at this to help you understand where I am coming from. If you would like to see only my most current code and method, jump down to where it says "Current".
Method 1
My first method was: after I acquired all my pixel values, I would paint them using pygame's aalines function. This worked, except when there were missing points in between actual points, because it would just draw the line straight across the gap. (1/x would not work, but something like 0^x would.)
This is what 1/x looks like using the aalines method
Method 1.1
My next idea was to split the line into two lines every time a None was returned. This worked for 1/x, but I quickly realized it would only work if one of the passed-in x values landed exactly on a y value of None. 1/x might work, but 1/(x+0.0001) wouldn't.
Method 2
My next method was to convert each pixel's x value into the corresponding x value in the graphing window (for example, (0,0) on the graphing window would actually be pixel (249,249) in a 500x500 program window). I would then calculate every y value from the x values I just created. This works for any line that doesn't have a slope > 1 or < -1.
This is what 1/x would look like using this method.
Current
My most current method is supposed to be an advanced, working version of method 2.
It's kind of hard to explain. Basically, for each pixel column on the display window I take the x value just to its left and just to its right. I then plug those two values into the expression to get two y values. I then loop through each y value in that column and check whether it lies between the two y values calculated earlier.
size is a list of size two that is the dimensions of the program window.
xWin is a list of size two that holds the x Min and x Max of the graphing window.
yWin is a list of size two that holds the y Min and y Max of the graphing window.
pixelToPoint is a function that takes a scalar pixel value (just x or just y) and converts it to its corresponding value on the graphing window.
pixels = []
for x in range(size[0]):
    leftX = pixelToPoint(x, size[0] + 1, xWin, False)
    rightX = pixelToPoint(x + 1, size[0] + 1, xWin, False)
    leftY = calcPostfix(postfix, leftX)
    rightY = calcPostfix(postfix, rightX)
    for y in range(size[1]):
        if leftY is not None and rightY is not None:
            yPoint = pixelToPoint(y, size[1], yWin, True)
            if (rightY <= yPoint <= leftY) or (rightY >= yPoint >= leftY):
                pixels.append((x, y))

for p in pixels:
    screen.fill(BLACK, (p, (1, 1)))
This fixed the problem from method 2 of the pixels not being connected into a continuous line. However, it didn't fix the problem from method 1: when graphing 1/x, it looked exactly the same as with the aalines method.
-------------------------------------------------------------------------------------------------------------------------------
I am stuck and can't think of a solution. The only way I can think of to fix this is to use a whole bunch of x values, but that seems really inefficient. Also, I am trying to make my program as resizable and customizable as possible, so everything must be variable-driven, and I am not sure what calculation would tell me how many x values are needed given the program window size and the graph window size.
I'm not sure if I am on the right track or if there is a completely different method of doing this, but I want my graphing calculator to be able to graph any function (just like my actual graphing calculator).
Edit 1
I just tried using as many x values as there are pixels (a 500x500 display window calculates 250,000 y values).
It worked for every function I've tried, but it is really slow: it takes about 4 seconds to calculate (it fluctuates depending on the equation). I've looked around online and found graphing calculators that graph almost instantaneously, but I can't figure out how they do it.
This online graphing calculator is extremely fast and effective. There must be some algorithm other than using a huge number of x values that can achieve what I want, because that site is doing it.
The problem you have is that, to know whether you can reasonably draw a line between two points, you have to know whether the function is continuous on the interval between them.
That is a hard problem in general. What you can do is use the following heuristic: if the slope of the line has changed too much from the previous one, guess that the function is not continuous on that interval and don't draw a line there; a rough sketch of this idea follows.
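A minimal sketch of that heuristic, assuming the samples are a list of x values and a list of y values (with None for undefined points) as in the question; the function name split_segments and the max_jump threshold are just illustrative:

def split_segments(xs, ys, max_jump=50.0):
    # Break the sampled curve into segments, starting a new segment whenever
    # y is undefined or the slope between neighbours jumps by more than max_jump.
    segments, current, prev_slope = [], [], None
    for i in range(len(xs)):
        if ys[i] is None:                      # undefined value: close the current segment
            if len(current) > 1:
                segments.append(current)
            current, prev_slope = [], None
            continue
        if current:
            slope = (ys[i] - current[-1][1]) / (xs[i] - current[-1][0])
            if prev_slope is not None and abs(slope - prev_slope) > max_jump:
                if len(current) > 1:           # likely discontinuity: break the line here
                    segments.append(current)
                current, prev_slope = [(xs[i], ys[i])], None
                continue
            prev_slope = slope
        current.append((xs[i], ys[i]))
    if len(current) > 1:
        segments.append(current)
    return segments

Each segment can then be drawn separately, e.g. with pygame.draw.aalines(screen, color, False, segment).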
Another solution would build on method 2.
After drawing the points that correspond to every value on the x axis, try to draw, for every pair of adjacent x values (x1, x2), the y values within (y1 = f(x1), y2 = f(x2)) that can be reached by some x within (x1, x2).
This can be done by searching for such an x by bisection (dichotomy) or with a Newton-style search heuristic; a bisection sketch is shown below.
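A minimal bisection sketch under those assumptions (f returns None where the expression is undefined; the name bisect_for_y and the iteration count are illustrative): within a pixel column (x1, x2), look for an x whose f(x) reaches a target y lying between f(x1) and f(x2).

def bisect_for_y(f, x1, x2, y_target, iters=30):
    lo, hi = x1, x2
    f_lo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_lo is None or f_mid is None:
            return None                          # undefined inside the interval: give up
        if (f_lo - y_target) * (f_mid - y_target) <= 0:
            hi = mid                             # the target is crossed between lo and mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)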
So, I'm teaching myself Python with this tutorial and I'm stuck on exercise number 13, which says:
Write a function to uniformly shrink or enlarge an image. Your function should take an image along with a scaling factor. To shrink the image the scaling factor should be between 0 and 1; to enlarge the image the scaling factor should be greater than 1.
This is not meant as a question about PIL, but to ask which algorithm to use so I can code it myself.
I've found some similar questions like this one, but I don't know how to translate that into Python.
Any help would be appreciated.
I've come to this:
import image

win = image.ImageWin()
img = image.Image("cy.png")
factor = 2
W = img.getWidth()
H = img.getHeight()
newW = int(W * factor)
newH = int(H * factor)
newImage = image.EmptyImage(newW, newH)
for col in range(newW):
    for row in range(newH):
        p = img.getPixel(col, row)
        newImage.setPixel(col * factor, row * factor, p)
newImage.draw(win)
win.exitonclick()
I should do this in a function, but that doesn't matter right now. The function's arguments would be (image, factor). You can try it in the tutorial's ActiveCode. It makes a stretched image with empty columns.
Your code as shown is simple and effective for what's known as a Nearest Neighbor resize, except for one little bug:
p = img.getPixel(col/factor,row/factor)
newImage.setPixel(col,row,p)
Edit: since you're sending a floating point coordinate into getPixel, you're not limited to nearest neighbor - you can implement any interpolation algorithm you want inside it. The simplest thing to do is to truncate the coordinates to int, which will cause pixels to be replicated when factor is greater than 1, or skipped when factor is less than 1.
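A minimal sketch of the corrected loop with that truncation applied, reusing the names (img, newImage, newW, newH, factor) from the question's code:

for col in range(newW):
    for row in range(newH):
        # map the destination pixel back to its nearest source pixel
        p = img.getPixel(int(col / factor), int(row / factor))
        newImage.setPixel(col, row, p)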
Mark has the correct approach. To get a smoother result, you replace:
p = img.getPixel(col/factor,row/factor)
with a function that takes floating point coordinates and returns a pixel interpolated from several neighboring points in the source image. For linear interpolation it takes the four nearest neighbors; for higher-order interpolation it takes a larger number of surrounding pixels.
For example, if col/factor = 3.75 and row/factor = 1.9, a linear interpolation would take the source pixels at (3,1), (3,2), (4,1), and (4,2) and give a result between those 4 rgb values, weighted most heavily to the pixel at (4,2).
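As a rough sketch of that bilinear idea (assuming getPixel returns an (r, g, b) tuple and ignoring clamping at the image border; the helper name bilinear_pixel is illustrative):

def bilinear_pixel(img, fx, fy):
    x0, y0 = int(fx), int(fy)          # top-left of the 2x2 neighborhood
    wx, wy = fx - x0, fy - y0          # fractional weights
    p00 = img.getPixel(x0, y0)
    p10 = img.getPixel(x0 + 1, y0)
    p01 = img.getPixel(x0, y0 + 1)
    p11 = img.getPixel(x0 + 1, y0 + 1)
    top = [(1 - wx) * a + wx * b for a, b in zip(p00, p10)]
    bottom = [(1 - wx) * a + wx * b for a, b in zip(p01, p11)]
    return tuple(int(round((1 - wy) * t + wy * u)) for t, u in zip(top, bottom))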
You can do that using the Python Imaging Library.
Image.resize() should do what you want.
See http://effbot.org/imagingbook/image.htm
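For reference, a minimal usage sketch with Pillow (the filename comes from the question's code; the output name is just illustrative):

from PIL import Image

factor = 2
img = Image.open("cy.png")
resized = img.resize((int(img.width * factor), int(img.height * factor)))
resized.save("cy_resized.png")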
EDIT
Since you want to program this yourself without using a module, I have added an extra solution.
You will have to use the following algorithm (a rough sketch follows the list):
load your image
extract its size
calculate the desired size (height * factor, width * factor)
create a new EmptyImage with the desired size
use a nested loop through the pixels (row by column) of your image
then, for shrinking, remove some pixels every once in a while, or, for enlarging, duplicate some pixels in your image
If you want to get fancy, you could smooth the added or removed pixels by averaging the rgb values with their neighbours.
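A minimal sketch of those steps, written as the scale(image, factor) function the exercise asks for and assuming the tutorial's image module API (getWidth/getHeight/getPixel/setPixel/EmptyImage):

import image

def scale(img, factor):
    newW = int(img.getWidth() * factor)
    newH = int(img.getHeight() * factor)
    newImage = image.EmptyImage(newW, newH)
    for col in range(newW):
        for row in range(newH):
            # duplicates pixels when enlarging, skips pixels when shrinking
            p = img.getPixel(int(col / factor), int(row / factor))
            newImage.setPixel(col, row, p)
    return newImage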