I am trying to make a tracking program that takes an image and displays where the object with the specified color is:
example: https://imgur.com/a/8LR40
To do this I am using RGB right now, but it is really hard to work with, so I want to convert it into a hue, which is easier to work with. I am trying to use colorsys, but after doing some research I have no idea what parameters it expects or what it returns. I have tried to get a match using colorizer.org but I get some nonsense.
>>> import colorsys
>>> colorsys.rgb_to_hsv(45,201,18)
(0.3087431693989071, 0.9104477611940298, 201)
Already colorsys is not acting as documented: https://docs.python.org/2/library/colorsys.html says the output is always a float between 0 and 1, yet the value here is 201. That is also impossible in standard HSV, where the value is between 0 and 100.
My questions are:
What does colorsys expect as input?
How do I convert the output to standard HSV? (hue = 0-360, saturation = 0-100, value = 0-100)
Coordinates in all of these color spaces are floating point values. In the YIQ space, the Y coordinate is between 0 and 1, but the I and Q coordinates can be positive or negative. In all other spaces, the coordinates are all between 0 and 1.
https://docs.python.org/3/library/colorsys.html
You must scale from 0-255 to 0-1, i.e. divide your RGB values by 255. If you are using Python 2, make sure not to do floor division.
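For example, a minimal sketch of the full round trip (the helper name is mine, not part of colorsys): scale the RGB input to 0-1, then rescale the output to the standard hue = 0-360, saturation = 0-100, value = 0-100 ranges:
import colorsys

def rgb_to_standard_hsv(r, g, b):
    # 0-255 RGB in; HSV out with H in 0-360 and S, V in 0-100
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h * 360, s * 100, v * 100

print(rgb_to_standard_hsv(45, 201, 18))  # roughly (111.1, 91.0, 78.8)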
I can't find this question anywhere so...
What are Kivy color codes (like the ones with 4 numbers)?
For example:
Window.clearcolor = (1, 1, 1, 1)
I know I can easily use RGB with a few lines of code, but I'm really just curious what this type of color system is and how it works.
Kivy just uses RGB values divided by 255. Additionally, there is an alpha channel, which indicates transparency (a higher value is more opaque, a lower value more transparent).
Therefore, if you want white (#FFFFFF, opaque) you take (255, 255, 255, 255) and divide each term by 255 to get (1, 1, 1, 1).
kivy.utils offers a hex interpreter, if you prefer that: kivy.utils.get_color_from_hex(). It takes a string such as '#FFFFFF' for white, and returns a color kivy can interpret.
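For example, a quick sketch of both approaches (the helper function is mine, not a Kivy API):
from kivy.utils import get_color_from_hex

def rgba255_to_kivy(r, g, b, a=255):
    # convert 0-255 RGBA values to Kivy's 0-1 floats
    return (r / 255.0, g / 255.0, b / 255.0, a / 255.0)

print(rgba255_to_kivy(255, 255, 255))   # (1.0, 1.0, 1.0, 1.0), opaque white
print(get_color_from_hex('#FFFFFF'))    # [1.0, 1.0, 1.0, 1.0]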
Kivy's colours are RGB(A) - don't confuse the colour space with the in-memory representation of the colour.
Ultimately RGB measures each of the red, green and blue colour components on the scale from lightness zero (black) to maximum lightness (pure red/blue/green, if you like). Mathematically, this is a continuous interval consisting of infinite lightness choices, but in practice you can't represent all of those so you have to pick whatever is useful.
RGB colour codes with each component in the range 0-255 are often used because they map well to physical devices. For instance, each pixel of your monitor probably supports 256 (that is, 2^8) levels for each colour, from 0 to 255, so if you want to write code that directly addresses these it's natural to use an integer in this range: in this case 255 represents maximum lightness.
However, the choice of 255 here is quite arbitrary and exposes an implementation detail of the colour handling. It is especially arbitrary if you want to represent continuous colour intervals in which you perform colour transformations. For instance, when rendering a complex scene you could imagine lighting transformations where lights bounce off surfaces (getting darker if the surface is not 100% reflective) or combine where two lights shine in the same place. If you clipped the values to integers in 0-255 at every step, the result would look rubbish, because it totally fails to represent the continuous spectrum the light has passed through. In a case like this it makes much more sense to use a data type representing something as close as possible to a continuous interval: floats in the range 0-1 are the obvious choice, with 1 representing maximum lightness.
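A toy illustration of that point in plain Python, with values as fractions of maximum lightness:
# Two light contributions hitting the same point.
light_a = 0.7
light_b = 0.6

# Float workflow: carry the unclipped sum through the intermediate steps
# and clamp only once, when finally writing to the display.
combined = light_a + light_b            # 1.3 -- the overshoot is preserved
after_surface = combined * 0.8          # bounce off an 80%-reflective surface -> 1.04
displayed = min(after_surface, 1.0)     # clamp at the very end

# Integer workflow: clipping to 0-255 at every step throws information away.
a8, b8 = int(light_a * 255), int(light_b * 255)
combined8 = min(a8 + b8, 255)           # already saturated at 255
after_surface8 = int(combined8 * 0.8)   # 204, darker than the float result implies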
You might consider that this is also more mathematically pure, avoiding unnecessarily exposing an implementation detail of how the colours are stored, although there is some value in directly controlling pixel colours.
This is also why you can convert to Kivy's convention by dividing the 0-255 values by 255: they represent exactly the same thing.
In any case, the 0-1 range is standard in e.g. OpenGL for this reason. Kivy chooses this convention, rather than the 0-255 one.
I was working with the Laplacian function to detect edges in OpenCV when I ran into some confusion regarding the underlying principles behind the code.
The documentation has us read an image with the following code and pass it through the Laplacian function.
img = cv2.imread("messi5.jpg", cv2.IMREAD_GRAYSCALE)
lap = cv2.Laplacian(img, cv2.CV_32F, ksize=1)
Now, I am able to understand the code written above pretty well. As I understand it, we read in an image and calculate the Laplacian at each pixel. This value can be bigger or smaller than the original 8-bit unsigned int pixel, so we store it in an array of 32-bit floats.
My confusion begins with the next few lines of code. In the documentation, the image is converted back to an 8-bit unsigned integer using the convertScaleAbs() function, and then displayed as seen below.
lap = cv2.convertScaleAbs(lap)
cv2.imshow('laplacian', lap)
However, my instructor showed me the following method of converting back to uint8:
lap = np.uint8(np.absolute(lap))
cv2.imshow('laplacian', lap)
Surprisingly, both solutions display identical images. However, I am unable to understand why this occurs. From what I've seen, np.uint8 simply truncates values (floats, etc.) down to unsigned 8-bit integers, so for example 1025 becomes 1, as all the bits beyond the 8th bit are discarded.
Yet this would mean that any Laplacian value would become heavily reduced and muddled. If our Laplacian for a pixel was 1024 (signaling a non-zero second derivative in both the x and y dimensions), we would instead have the value 0 on hand (signaling a second derivative of zero, and thus a possible local max/min, or in other words an edge). By my logic, my instructor's solution should fail miserably, but surprisingly everything works fine. Why is this?
On the other hand, I do not have any idea how convertScaleAbs() works. I'm going to assume it works similarly to my instructor's solution, but I'm not sure. Can someone please explain what's going on?
OpenCV BGR or grayscale images have pixel values from 0 to 255 when stored as CV_8U (8-bit), which corresponds to np.uint8.
So when you use the Laplacian function with ddepth (the desired depth of the destination image) set to cv2.CV_32F, you get this:
lap = cv2.Laplacian(img, cv2.CV_32F, ksize=1)
print(np.amax(lap)) #=> 317.0
print(np.amin(lap)) #=> -315.0
So, you need to convert back to np.uint8, for example:
lap_uint8 = lap.copy()
lap_uint8[lap > 255] = 255
lap_uint8[lap < 0] = 0
lap_uint8 = lap_uint8.astype(np.uint8)
print(np.amax(lap_uint8)) #=> 255
print(np.amin(lap_uint8)) #=> 0
Or any other, more straightforward way that does the same thing.
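For instance, np.clip does the same clamping in one step (assuming the CV_32F lap from above and import numpy as np):
lap_uint8 = np.clip(lap, 0, 255).astype(np.uint8)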
But you can also pass -1 as the ddepth argument (see the documentation), which keeps the source depth (np.uint8 here), to get:
lap = cv2.Laplacian(img, -1, ksize=1)
print(np.amax(lap)) #=> 255
print(np.amin(lap)) #=> 0
On the other hand, simply taking the absolute value of the CV_32F result still leaves values above 255, so casting it straight to np.uint8 would wrap around and give a wrong result:
lap_abs = np.absolute(lap) # lap from the cv2.CV_32F call above
print(np.amax(lap_abs)) #=> 317.0
print(np.amin(lap_abs)) #=> 0.0
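As for the original question about the two conversions: per OpenCV's documentation, cv2.convertScaleAbs takes the absolute value and then saturates (clips) to 0-255, whereas np.uint8(np.absolute(lap)) takes the absolute value and then does a wrapping cast (effectively modulo 256). The two agree for every pixel whose absolute Laplacian is at most 255; in a typical photo only a handful of pixels exceed that, which is why both displayed images look the same. A small sketch to check it, reusing the CV_32F lap from above:
import cv2
import numpy as np

img = cv2.imread("messi5.jpg", cv2.IMREAD_GRAYSCALE)
lap = cv2.Laplacian(img, cv2.CV_32F, ksize=1)

a = cv2.convertScaleAbs(lap)        # |lap|, then saturate to 0-255
b = np.uint8(np.absolute(lap))      # |lap|, then wrapping cast (modulo 256)

# The two only differ where |lap| > 255, usually a tiny fraction of the image.
print((a != b).sum(), "of", lap.size, "pixels differ")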
I am trying to read the x and y positions of the pixels in images. This is an example of what is shown when I run:
import matplotlib.pyplot as plt

# img was loaded earlier (e.g. img = plt.imread('image.png'))
plt.figure(1)
plt.imshow(img)
plt.title('image')
plt.show()
Why are they non-integer values? My best guess is that some scaling is occurring. I am running Python in Spyder as my IDE.
Edit: Here is the image:
Edit 2: Upon closer inspection, pixel by pixel, the pixel edges appear to be at the .5 marks rather than running from 0 to 1. And here is a screenshot of my axis settings... something is definitely funky here. Does anybody have an idea why?
My guess is that the float values you are worried about while hovering over the shown image with your mouse are just the mouse pointer position, which does not have to be an integer. It still lies within a pixel (a square, integer-aligned area) and thus gives you information about the channels at that pixel's position.
Another, more controlled way to get information about your pixels is given here:
Here is my working code snippet printing the pixel colours from an image:
from PIL import Image

im = Image.open("image.jpg")
x = 3
y = 4
pix = im.load()      # pixel access object
print(pix[x, y])     # channel values at (x, y)
Answer to edit 2: It makes sense that way. The pixel centers fall on the integer values you expect the pixels to have; if the edges fell on the .0 values instead, a direct mapping between pixel coordinates and pixel values would not be possible within the visualization. Also, each pixel having a height and width of 1 (with edges at the .5 marks) is exactly what we would expect.
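A quick sketch that shows this convention directly, with a made-up 3x3 image:
import numpy as np
import matplotlib.pyplot as plt

img = np.arange(9).reshape(3, 3)
ax = plt.gca()
ax.imshow(img)

# Pixel (row, col) is centred on the integer coordinate (col, row), so the
# drawn extent of a 3x3 image runs from -0.5 to 2.5 on each axis.
print(ax.images[0].get_extent())   # (-0.5, 2.5, 2.5, -0.5)
plt.show()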
I have a matrix consisting of True and False values. I want to print this as an image where all the True values are white and the False values are black. The matrix is called indices. I have tried the following:
indices = indices.astype(int) #To convert the true to 1 and false to 0
indices*=255 #To change all the 1's to 255
cv2.imshow('Indices',indices)
cv2.waitKey()
This is printing a fully black image. When I try print((indices == 255).sum()), it returns a value of 669, which means that there are 669 elements/pixels in the indices matrix that should be white. But I can only see a pure black image. How can I fix this?
As far as I know, OpenCV represents an image either as a matrix of floats ranging from 0 to 1, or as integers with values between the minimum and the maximum of that type.
A plain Python int has no bounds (except the limits of what can be represented with the available memory). If you instead use np.uint8, you are working with unsigned bytes, where the minimum is 0 and the maximum is 255.
So there are several options. The two most popular would be:
cast to np.uint8 and then multiply with 255:
indices = indices.astype(np.uint8) #convert to an unsigned byte
indices*=255
cv2.imshow('Indices',indices)
cv2.waitKey()
Use a float representation:
indices = indices.astype(float)
cv2.imshow('Indices',indices)
cv2.waitKey()
Note that you could also choose np.uint16, for instance, to use unsigned 16-bit integers. In that case you will have to multiply by 65,535. The advantage of this approach is that you can use an arbitrary color depth: although most image formats use 24-bit color (8 bits per channel), there is no reason not to use 48-bit color. If, for instance, you are doing image processing for a glossy magazine, it can be beneficial to work with more color depth.
Furthermore, even if the end result is a 24-bit color palette, it can sometimes be better to use a higher color depth for the intermediate steps of image processing.
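A minimal, self-contained sketch of both options, using a made-up boolean mask in place of the original indices matrix:
import cv2
import numpy as np

# hypothetical stand-in for the original True/False matrix
indices = np.zeros((200, 200), dtype=bool)
indices[50:150, 50:150] = True

# option 1: unsigned bytes, True -> 255
cv2.imshow('uint8 mask', indices.astype(np.uint8) * 255)

# option 2: floats, where imshow maps 0.0 to black and 1.0 to white
cv2.imshow('float mask', indices.astype(float))

cv2.waitKey()
cv2.destroyAllWindows()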
I'm using Python's colorsys library:
import colorsys
colorsys.rgb_to_hsv(64, 208, 61)
Output: (0.16666666666666666, 0, 208)
But this output is wrong; this is the true value according to an RGB to HSV online converter:
RGB to HSV
What's going on?
colorsys takes its values in the range 0 to 1:
Coordinates in all of these color spaces are floating point values. In the YIQ space, the Y coordinate is between 0 and 1, but the I and Q coordinates can be positive or negative. In all other spaces, the coordinates are all between 0 and 1.
You need to divide each of the values by 255. (note the trailing dot, which forces float division in Python 2) to get the expected output:
>>> colorsys.rgb_to_hsv(64/255., 208/255., 61/255.)
(0.3299319727891157, 0.7067307692307692, 0.8156862745098039)
To avoid errors like this you can use colorir. It uses colorsys under the hood but allows for format specification:
>>> from colorir import sRGB
>>> sRGB(64, 208, 61).hsv(round_to=1)
HSV(118.8, 0.7, 0.8)