Python/OpenCV: Converting images taken from capture

I'm trying to convert images taken from a capture (webcam) and do some processing on them with OpenCV, but I'm having a difficult time.
When trying to convert the image to grayscale, the program crashes. (Python.exe has stopped working)
Here is the main snippet of my code:
newFrameImageGS = cv.CreateImage((320, 240), cv.IPL_DEPTH_8U, 1)
for i in range(0, 5):
    newFrameImage = cv.QueryFrame(ps3eye)
    cv.CvtColor(newFrameImage, newFrameImageGS, cv.CV_BGR2GRAY)
    golfSwing.append(newFrameImageGS)
When I try using cvConvertScale I get the assertion error:
src.size() == dst.size() && src.channels() == dst.channels()
which makes sense, but I'm pretty confused about how to convert the input images from my webcam into images that can be used by functions like cvUpdateMotionHistory() and cvCalcOpticalFlowLK().
Any ideas? Thanks.
UPDATE:
I converted the image to grayscale manually with this:
for row in range(0, newFrameImage.height):
    for col in range(0, newFrameImage.width):
        newFrameImageGS[row, col] = (newFrameImage8U[row, col][0] * 0.114 + # B
                                     newFrameImage8U[row, col][1] * 0.587 + # G
                                     newFrameImage8U[row, col][2] * 0.299)  # R
But this takes quite a while, and I still can't figure out why cvCvtColor is causing the program to crash.

For some reason, CvtColor caused the program to crash when the image depths were 8-bit. When I converted the images to 32-bit, the program no longer crashed and everything seemed to work OK. I have no idea why, but at least it works now.
newFrameImage = cv.QueryFrame(ps3eye)
newFrameImage32F = cv.CreateImage((320, 240), cv.IPL_DEPTH_32F, 3)
cv.ConvertScale(newFrameImage, newFrameImage32F)
newFrameImageGS_32F = cv.CreateImage((320, 240), cv.IPL_DEPTH_32F, 1)
cv.CvtColor(newFrameImage32F, newFrameImageGS_32F, cv.CV_RGB2GRAY)
newFrameImageGS = cv.CreateImage((320, 240), cv.IPL_DEPTH_8U, 1)
cv.ConvertScale(newFrameImageGS_32F, newFrameImageGS)

There is a common mistake here:
You create a single image in the newFrameImageGS variable before the loop, then overwrite its contents inside the loop and append it to a list. Only the object reference is appended; no copy of the object is made this way, so the list ends up containing five references to the same image instance. That image holds the very last frame, so you get five copies of that frame, which is presumably not what you want. Please review the Python tutorial if this is unclear. You can solve it by moving the first line of the above code into the body of the for loop, as in the sketch below.
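A minimal sketch of the fixed loop (assuming the same capture setup as in the question; the only change is that cv.CreateImage now runs inside the loop, so every frame gets its own buffer):

golfSwing = []
for i in range(0, 5):
    newFrameImageGS = cv.CreateImage((320, 240), cv.IPL_DEPTH_8U, 1) # fresh image per iteration
    newFrameImage = cv.QueryFrame(ps3eye)
    cv.CvtColor(newFrameImage, newFrameImageGS, cv.CV_BGR2GRAY)
    golfSwing.append(newFrameImageGS)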
Other possibilities, if fixing the above does not help:
The CvtColor function seems to be the correct one for conversion to grayscale, since it can convert to a different number of channels.
According to this manual, the CvtColor function requires a destination image of the same data type as the source. Please double-check that newFrameImage is an IPL_DEPTH_8U image, for example with the check below.
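A quick sketch of such a check (assuming the old cv bindings expose depth and nChannels attributes on IplImage objects, as they do in OpenCV 2.x):

frame = cv.QueryFrame(ps3eye)
print frame.depth, frame.nChannels # expect cv.IPL_DEPTH_8U and 3 for a BGR webcam frame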


EasyOCR not recognizing simple numbers

I am trying to analyze a page footer in a video and retrieve the current page number. I got the frame collection working, but I am struggling to read the page number itself using EasyOCR.
I already tried pytesseract, but that doesn't work well. I get misinterpreted numbers: 10 gets recognized as 113, 6 as 41, and so on. Overall it's very inconsistent, even though I format my input image correctly with grayscaling, thresholding, and cropping (analyzing only the page-number area of the footer).
Here is the code:
import cv2
import easyocr

reader = easyocr.Reader(['en']) # assumed reader setup (not shown in the question)

def getPageNumberTest(path, psm): # psm is unused here (leftover from the pytesseract tests)
    image = cv2.imread(path)
    height = len(image)
    width = len(image[0])
    # the height of the footer
    footerHeight = 90 # int(height / 15.5)
    # retrieve only the footer from the image
    cropped = image[height-footerHeight:height, 0:width]
    results = reader.readtext(cropped)
    return results
Which gives me the following output:
Is there a setting I am missing? Is there a way to instruct EasyOCR to look for numbers only?
Any help or hint is appreciated!
EDIT:
After some fiddling around with optimizations of the number images, I am now back at the beginning, not optimizing the images at all. All that's left is the conversion to grayscale and a resize.
This is what a normal input looks like:
But the results are:
Which is weird, because for most numbers (especially for single digits) this works flawlessly, yielding certainties of over 95%...
I tried deblurring, thresholding, denoising with cv2.filter2D(), blurring, ...
When I use thresholding, for example, my output looks like this (ignoring the "1"; the same applies for the single digit "1"):
I had a look at pattern matching, which isn't an option because I don't know the page-number shape beforehand...
With pytesseract you can restrict recognition to digits via a character whitelist:
txt = pytesseract.image_to_string(final_image, config='--psm 13 --oem 3 -c tessedit_char_whitelist=0123456789')
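EasyOCR has a comparable option: readtext accepts an allowlist parameter that restricts recognition to the given characters. A minimal sketch, assuming the reader and cropped image from the question:

results = reader.readtext(cropped, allowlist='0123456789') # digits only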
According to my tests, PaddleOCR works better than EasyOCR in most scenarios.
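A minimal PaddleOCR sketch for comparison (assuming paddleocr is installed; 'footer_crop.png' is a hypothetical cropped footer image, and the result structure can vary between versions):

from paddleocr import PaddleOCR

ocr = PaddleOCR(lang='en')          # loads detection + recognition models
result = ocr.ocr('footer_crop.png') # run OCR on the cropped footer
for line in result[0]:
    print(line[1])                  # each entry holds (text, confidence)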

Making a copy of an image

I am supposed to create several functions for my Python program, and each function requires me to work with a copy of an input image. Hence, I need to write img = image.copy() in every function in my code. However, when I run the code, I get an AttributeError saying "'tuple' object has no attribute 'copy'".
Given that I still have to include the statement img = image.copy() somewhere inside my function, how do I go about changing my code to remove this error? Do I need to change the image into a numpy array first before I can use copy()?
Code:
def func(image):
    img = image.copy() # error appeared here
    np_img = np.array(image)
    rsize, csize = len(img), len(img[0]) # rows and columns of pixels of the image, respectively
    # (the rest of the code)
Error message: AttributeError: 'tuple' object has no attribute 'copy'
Given that you have to put img = image.copy() in your functions, the easiest fix is to swap the order of the two lines: convert to a numpy array first with np_img = np.array(image), then take the copy from that array. I'm assuming that your argument image has not been converted into a numpy array prior to what we see here. After that, change the remaining references to point at the appropriate variables.
That said, I think it's best to load the image as a numpy array right away, before doing anything else. That way you can make a single copy before calling any of your functions, lowering each function's cost too. A sketch of the reordered function follows.
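A minimal sketch of that reordering (assuming image arrives as a tuple of pixel rows rather than a numpy array):

import numpy as np

def func(image):
    np_img = np.array(image) # convert to a numpy array first
    img = np_img.copy()      # .copy() exists on numpy arrays
    rsize, csize = img.shape[0], img.shape[1] # rows and columns of pixels
    # (the rest of the code)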

Working with truncated images with PIL

I am trying to get the Python 2.7 PIL library to work with JPEG images that are only available as a stream coming from an HDD image and are not complete.
I have set the option:
ImageFile.LOAD_TRUNCATED_IMAGES = True
And I load the stream as far as it is available (or better said: as far as I am 100% sure that the data still belongs to the image, not some other file type). I have tested different things, and as far as I can tell (for JPEGs) PIL only accepts the data as a valid JPEG image if it finds the 0xFFDA (Start of Scan) marker. This is a short example of how I load the data:
from PIL import Image, ImageFile
from StringIO import StringIO

ImageFile.LOAD_TRUNCATED_IMAGES = True

with open("/path/to/image.raw", 'rb') as fp:
    fp.seek("""jump to position in image where JPEG starts""")
    data = fp.read("""number of bytes I know that those belong to that jpeg""")

img = Image.open(StringIO(data)) # throws an exception if the data does
                                 # not contain the 0xFFDA marker
pixel = img.load()               # throws an exception if LOAD_TRUNCATED_IMAGES = False
width, height = img.size         # note: size is (width, height), not (height, width)
for y in range(height):
    for x in range(width):
        print pixel[x, y]        # pixel access is (x, y), i.e. (column, row)
On the very last line I expected (or hoped) to see at least the pixel data that was read, but for every pixel it returns (0, 0, 0).
The question: Is what I am trying here not possible with PIL?
Some weeks ago I tried the same with an image file I truncated myself, simply by cutting data off with an editor. It worked for the pixel data that was available; as soon as it reached a pixel that I had cut off, the program threw an exception (I will try this again later today to make sure that I am not remembering it wrong).
If somebody is wondering why I am doing this: I need to make sure that the picture inside that HDD image is stored in consecutive blocks/clusters and is not fragmented. To verify this I wanted to use pixel matching.
EDIT:
I have tried it again, and this is what I have seen:
I opened a truncated image in GIMP and it showed me a few pixel lines in the upper part, but PIL was not able to give me even the RGB values of those pixels. It always returned (0, 0, 0).
I made the image slightly bigger, such that the lower 4/5 of the image was not visible, and that was enough for PIL to show me the RGB values that were available. Everything else was (0, 0, 0).
So I am still not 100% sure whether PIL can show me the RGB values when only a few rows of pixel data are available.
I would try it with an uncompressed format like TGA. Since JPEG is a compressed format, it may not make sense to extract pixels from an incomplete image: JPEG stores the parameters for equations that describe the image, not pixel values. When you query a JPEG for a pixel value, it evaluates the equations at that point and returns the result.
I had the same problem with Pillow==9.2.0.
Downgrading to Pillow==8.3.2 (pip install Pillow==8.3.2) made it work for me.
I don't really know about streaming, but I think you simply cannot access RGB values the way you do.
Try:
rgb_im = img.convert('RGB')
r, g, b = rgb_im.getpixel((i, j))
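Putting that together with the loop from the question, as a sketch (note that getpixel takes (x, y), i.e. column first, and that img.size is (width, height)):

rgb_im = img.convert('RGB')
width, height = rgb_im.size
for y in range(height):
    for x in range(width):
        r, g, b = rgb_im.getpixel((x, y))
        print r, g, b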

Processing Multiple Images

I am writing an image processing program, which works well, but I need to process multiple images.
First, I made an array of images:
images = ((image1.tif),
          (image2.tif),
          (image3.tif))
Then, I created a for loop:
for image in images:
    dna = cv2.imread(image)
    # {code}
The problem is, whenever I run the code, the console returns an error of
TypeError: expected string or Unicode object, tuple found
At this line:
dna = cv2.imread(image)
It seems that the program is trying to process the whole array at once. I thought the loop processed one image in the array at a time? Can anybody help me with this?
You should wrap the filenames using single or double quotes:
images = (('image1.tif'),
          ('image2.tif'),
          ('image3.tif'))
You can also use a list instead of a tuple:
images = ['image1.tif', 'image2.tif', 'image3.tif']
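If the files follow a naming pattern, the list can also be built automatically; a sketch using the standard glob module (assuming the .tif files sit in the working directory):

import glob

images = sorted(glob.glob('*.tif')) # all .tif files, in a stable order
for image in images:
    dna = cv2.imread(image)
    # {code}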
Use:
images = (("image1.tif"),
("image2.tif"),
("image3.tif"))

opencv zoom function strange results

I am trying to write a zoom function, which looks something like this:
centre = ((im.width-1)/2, (im.height-1)/2)
width = int(im.width/(2.0*level))
height = int(im.height/(2.0*level))
rect = (centre[0]-width, centre[1]-height, width*2, height*2)
dst = cv.GetSubRect(im, rect)
cv.Resize(dst, im)
When I use exactly what is written above, I get an odd result where the bottom half of the resulting image is distorted and blurry. However, when I replace the line cv.Resize(dst, im) with
size = cv.CloneImage(im)
cv.Resize(dst, size)
im = size
it works fine. Why is this? Is there something fundamentally wrong with the way I am performing the zoom?
cv.Resize requires the source and destination to be separate memory locations.
In the first snippet of your code, you use cv.GetSubRect to generate an object pointing to the area of the image you wish to zoom into. This new object is NOT pointing to a new memory location; it points to a memory location that is a subset of the original object's.
Since cv.Resize requires the two memory locations to be different, what you are getting is the result of undefined behavior.
In the second part of your code you fulfill this criterion by using cv.CloneImage: you first create a copy of im (i.e. size; you could have used a blank image as well), and then use cv.Resize to resize dst and write the resulting image into size.
My advice is to go through a function's documentation before using it.
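A minimal corrected sketch (same old cv API as the question, assuming IplImage exposes depth and nChannels as in OpenCV 2.x; the key point is that the destination of cv.Resize is a freshly allocated image, not im itself):

centre = ((im.width - 1) / 2, (im.height - 1) / 2)
width = int(im.width / (2.0 * level))
height = int(im.height / (2.0 * level))
rect = (centre[0] - width, centre[1] - height, width * 2, height * 2)
sub = cv.GetSubRect(im, rect) # a view into im, no copy is made
zoomed = cv.CreateImage((im.width, im.height), im.depth, im.nChannels) # separate memory
cv.Resize(sub, zoomed)
im = zoomed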
