I've been working on building a machine learning algorithm to recognize images, starting by creating my own h5 database. I've been following this tutorial, and it's been useful, but I keep running into one major error: when using OpenCV in the image processing section of the code, the program is unable to save the processed image because it keeps flipping the height and width of my images. When I run the script, I get the following error:
Traceback (most recent call last):
File "array+and+label+data.py", line 79, in <module>
hdf5_file["train_img"][i, ...] = img[None]
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "/Users/USER/miniconda2/lib/python2.7/site-packages/h5py/_hl/dataset.py", line 631, in __setitem__
for fspace in selection.broadcast(mshape):
File "/Users/USER/miniconda2/lib/python2.7/site-packages/h5py/_hl/selections.py", line 299, in broadcast
raise TypeError("Can't broadcast %s -> %s" % (target_shape, count))
TypeError: Can't broadcast (1, 240, 320, 3) -> (1, 320, 240, 3)
My images are supposed to all be sized to 320 by 240, but you can see that this is being flipped somehow. Researching around has shown me that this is because OpenCV and NumPy use different conventions for height and width, but I'm not sure how to reconcile this issue within this code without patching my installation of OpenCV. Any ideas on how I can fix this? I'm a relative newbie to Python and all its libraries (though I know Java well)!
Thank you in advance!
Edit: adding more code for context, which is very similar to what's in the tutorial under the "Load images and save them" code example.
The size of my arrays:
train_shape = (len(train_addrs), 320, 240, 3)
val_shape = (len(val_addrs), 320, 240, 3)
test_shape = (len(test_addrs), 320, 240, 3)
The code that loops over the image addresses and resizes them:
# Loop over training image addresses
for i in range(len(train_addrs)):
    # print how many images are saved every 1000 images
    if i % 1000 == 0 and i > 1:
        print ('Train data: {}/{}'.format(i, len(train_addrs)))
    # read an image and resize to (320, 240)
    # cv2 load images as BGR, convert it to RGB
    addr = train_addrs[i]
    img = cv2.imread(addr)
    img = cv2.resize(img, (320, 240), interpolation=cv2.INTER_CUBIC)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    # save the image and calculate the mean so far
    hdf5_file["train_img"][i, ...] = img[None]
    mean += img / float(len(train_labels))
Researching around has shown me that this is because OpenCV and NumPy use different conventions for height and width
Not exactly. The tricky part is that 2D arrays/matrices are indexed as (row, col), which is the opposite order from the Cartesian (x, y) coordinates we normally use for images. Because of this, OpenCV functions that take points want them as (x, y), and functions that take an image size want (w, h) rather than the (h, w) of an array's shape. That is the case for OpenCV's resize() function: you're passing it (h, w), but it actually wants (w, h). From the docs for resize():
dsize – output image size; if it equals zero, it is computed as:
dsize = Size(round(fx*src.cols), round(fy*src.rows))
Either dsize or both fx and fy must be non-zero.
So you can see here that the number of columns is the first dimension (the width) and the number of rows is the second (the height).
The simple fix is just to swap your (h, w) to (w, h) inside the resize() function:
img = cv2.resize(img, (240, 320), interpolation=cv2.INTER_CUBIC)
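To make the convention concrete, here is a minimal sketch in plain Python (no OpenCV needed). The helper name dsize_to_shape is hypothetical and not part of OpenCV; it only mirrors the (w, h) to (h, w, channels) relationship described above.

```python
def dsize_to_shape(dsize, channels=3):
    """Return the NumPy-style shape produced by resizing to an OpenCV dsize.

    dsize is (width, height); an array's shape is (rows, cols, channels),
    i.e. (height, width, channels).
    """
    width, height = dsize
    return (height, width, channels)


# The call from the question: dsize=(320, 240) gives 240 rows and 320 columns.
print(dsize_to_shape((320, 240)))  # (240, 320, 3)
# The corrected call matches the (320, 240, 3) slots of the HDF5 dataset.
print(dsize_to_shape((240, 320)))  # (320, 240, 3)
```

The first result is the (240, 320, 3) shape that shows up in the broadcast error.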
This is similar to nkint's question from September 11, 2013. Link is here:
how to get all undistorted image with opencv
I'm a new user, so I didn't have enough reputation/clout to comment on the OP.
I have tried to emulate the code andrewmkeller posted, using Python instead of C++, with some minor changes based on Josh Bosch's response. The result is the following:
#!/usr/bin/env python
import cv2
import numpy as np

def loadUndistortedImage(fileName):
    # load image
    image = cv2.imread(fileName)
    #print(image)

    # set distortion coeff and intrinsic camera matrix (focal length, centerpoint offset, x-y skew)
    cameraMatrix = np.array([[894.96803896,0,470.38713516],[0,901.32629374,922.41232898], [0,0,1]])
    distCoeffs = np.array([[-0.340671222,0.110426603,-.000867987573,0.000189669273,-0.0160049526]])

    # setup enlargement and offset for new image
    y_shift = 60 #experiment with
    x_shift = 70 #experiment with
    imageShape = image.shape #image.size
    print(imageShape)
    imageSize = (int(imageShape[0])+2*y_shift, int(imageShape[1])+2*x_shift, 3)
    print(imageSize)

    # create a new camera matrix with the principal point offset according to the offset above
    newCameraMatrix, validPixROI = cv2.getOptimalNewCameraMatrix(cameraMatrix, distCoeffs, imageSize,
                                                                 1)
    #newCameraMatrix = cv2.getDefaultNewCameraMatrix(cameraMatrix, imageSize, True) # imageSize, True

    # create undistortion maps
    R = np.array([[1,0,0],[0,1,0],[0,0,1]])
    map1, map2 = cv2.initUndistortRectifyMap(cameraMatrix, distCoeffs, R, newCameraMatrix, imageSize,
                                             cv2.CV_16SC2)

    # remap
    outputImage = cv2.remap(image, map1, map2, cv2.INTER_LINEAR)

    # save output image as file with "_undistorted" appended to name - only works with .jpg files at the moment
    index = fileName.find('.jpg')
    fixed_filename = fileName[:index] +'_undistorted'+fileName[index:]
    cv2.imwrite(fixed_filename, outputImage)
    cv2.imshow('fix_img',outputImage)
    cv2.waitKey(0)
    return

#Undistort the images, then save the restored images
loadUndistortedImage('./calib/WIN_20200626_11_29_16_Pro.jpg')
This seemed good to me, but then problems came up when trying to use cv2.getOptimalNewCameraMatrix or cv2.getDefaultNewCameraMatrix and cv2.initUndistortRectifyMap. I kept getting told that 'the argument takes exactly 2 arguments (3 given)' even though I am putting the parameters as specified in their documentation here:
https://docs.opencv.org/2.4/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html
https://docs.opencv.org/2.4/modules/imgproc/doc/geometric_transformations.html
I can remove the error from "...getDefault..." if I remove the optional params, but I'd rather not do that.
Stacktrace:
Traceback (most recent call last):
File ".\main.py", line 46, in <module>
loadUndistortedImage('./<image file name>.jpg')
File ".\main.py", line 27, in loadUndistortedImage
newCameraMatrix, validPixROI = cv2.getOptimalNewCameraMatrix(cameraMatrix, distCoeffs, imageSize, 1)
TypeError: function takes exactly 2 arguments (3 given)
I don't have enough reputation to comment, but you could try:
newcameramatrix, _ = cv2.getOptimalNewCameraMatrix(
    camera_matrix, dist_coeffs, (width, height), 1, (width, height)
)
According to this, that's how the function should be called.
Now, instead of getting the undistorted image with cv2.initUndistortRectifyMap, you could just do:
undistorted_image = cv2.undistort(
    image, camera_matrix, dist_coeffs, None, newcameramatrix
)
cv2.imshow("undistorted", undistorted_image)
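The "takes exactly 2 arguments (3 given)" message is consistent with imageSize being a 3-element tuple (the question builds it from image.shape, which includes the channel count), whereas the size argument is a (width, height) pair. Below is a minimal sketch of building such a pair, with an optional margin like the x_shift/y_shift in the question. padded_size is a hypothetical helper, not an OpenCV function.

```python
def padded_size(shape, x_shift=0, y_shift=0):
    """Build a (width, height) pair from a NumPy-style image shape.

    shape is (rows, cols) or (rows, cols, channels); the channel count is
    dropped and the order is swapped, with an optional margin on each side.
    """
    height, width = shape[:2]
    return (int(width) + 2 * x_shift, int(height) + 2 * y_shift)


# A 1080-row by 1920-column BGR image has shape (1080, 1920, 3).
print(padded_size((1080, 1920, 3)))          # (1920, 1080)
print(padded_size((1080, 1920, 3), 70, 60))  # (2060, 1200)
```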
Following up on my comment on Sebastian Liendo's answer, and also thanks to a Finnish responder on GitHub (whose Issues are not for this sort of general question, I learned), here is 1) the updated documentation for the Python functions, and 2) the heart of my revised code, which does a decent job of getting around the cropping. (Don't do what I did in the question and post the ENTIRE code, just the part essential to your question.)
https://docs.opencv.org/4.3.0/d9/d0c/group_calib3d.html#ga7a6c4e032c97f03ba747966e6ad862b1
#load image
image = cv2.imread(fileName)
#images = glob.glob(pathName + '*.jpg') #loop within a specified directory
#for fileName in images:
#image = cv2.imread(fileName)
#set camera parameters
height, width = image.shape[:2]
cameraMatrix = np.array([[894.96803896,0,470.38713516],[0,901.32629374,922.41232898], [0,0,1]])
distCoeffs = np.array([[-0.340671222,0.110426603,-.000867987573,0.000189669273,-0.0160049526]])
#create new camera matrix
newCameraMatrix, validPixROI = cv2.getOptimalNewCameraMatrix(cameraMatrix, distCoeffs,(width, height), 1, (width, height))
#undistort
outputImage = cv2.undistort(image, cameraMatrix, distCoeffs, None, newCameraMatrix)
#crop, modified
x, y, w, h = validPixROI #(211, 991, 547, 755)
outputImage = outputImage[y-200:y+h+200, x-40:x+w+80] #fudge factor to minimize cropping
THE ONE CAVEAT: this code still crops a bit of the outer edge of the original capture, but not by much. Minimizing that cropping is the reason for the fudge factor I put in the outputImage = outputImage[...] line.
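One thing to watch with a fixed fudge factor: if y - 200 or x - 40 goes negative, NumPy slicing treats it as an index counted from the end instead of clipping at the border. Here is a small sketch that clamps the padded ROI to the image bounds. padded_crop_bounds and its parameter names are made up for illustration, and the image size in the example is an assumption.

```python
def padded_crop_bounds(roi, image_shape, pad_x=(40, 80), pad_y=(200, 200)):
    """Expand an (x, y, w, h) ROI by a margin and clamp it to the image.

    Returns (top, bottom, left, right), usable as image[top:bottom, left:right].
    """
    x, y, w, h = roi
    height, width = image_shape[:2]
    top = max(y - pad_y[0], 0)
    bottom = min(y + h + pad_y[1], height)
    left = max(x - pad_x[0], 0)
    right = min(x + w + pad_x[1], width)
    return top, bottom, left, right


# ROI from the comment above, on a hypothetical 1920-row by 1080-column image.
top, bottom, left, right = padded_crop_bounds((211, 991, 547, 755), (1920, 1080, 3))
print(top, bottom, left, right)
```

The crop then becomes outputImage[top:bottom, left:right].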
I'm trying to zoom in on an image.
import numpy as np
from scipy.ndimage.interpolation import zoom
import Image
zoom_factor = 0.05 # 5% of the original image
img = Image.open(filename)
image_array = misc.fromimage(img)
zoomed_img = clipped_zoom(image_array, zoom_factor)
misc.imsave('output.png', zoomed_img)
Clipped Zoom Reference:
Scipy rotate and zoom an image without changing its dimensions
This doesn't work and throws this error:
ValueError: could not broadcast input array from shape
Any help or suggestions would be appreciated.
Is there a way to zoom an image given a zoom factor? And what is the problem here?
Traceback:
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/tornado/web.py", line 1443, in _execute
result = method(*self.path_args, **self.path_kwargs)
File "title_apis_proxy.py", line 798, in get
image, msg = resize_image(image_local_file, aspect_ratio, image_url, scheme, radius, sigma)
File "title_apis_proxy.py", line 722, in resize_image
z = clipped_zoom(face, 0.5, order=0)
File "title_apis_proxy.py", line 745, in clipped_zoom
out[top:top+zh, left:left+zw] = zoom(img, zoom_factor, **kwargs)
ValueError: could not broadcast input array from shape (963,1291,2) into shape (963,1291,3)
The clipped_zoom function you're using from my previous answer was written for single-channel images only.
At the moment it's applying the same zoom factor to the "color" dimension as well as the width and height dimensions of your input array. The ValueError occurs because the out array is initialized with the same number of channels as the input, but the result of zoom has fewer channels because of the zoom factor.
To make it work for multichannel images you could either pass each color channel separately to clipped_zoom and concatenate the results, or you could pass a tuple rather than a scalar as the zoom_factor argument to scipy.ndimage.zoom.
I've updated my previous answer using the latter approach, so that it will now work for multichannel images as well as monochrome.
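A rough sketch of the tuple approach, in plain Python so it runs without SciPy. Both helper names are hypothetical, and the shape calculation assumes the output size along each axis is the input size times the zoom factor, rounded, which is my understanding of how scipy.ndimage.zoom sizes its output.

```python
def per_axis_zoom(zoom_factor, ndim):
    """Zoom only the first two axes (height, width); leave the rest at 1."""
    return (zoom_factor,) * 2 + (1,) * (ndim - 2)


def zoomed_shape(shape, zoom):
    """Approximate output shape: each axis scaled by its factor and rounded."""
    if not isinstance(zoom, tuple):
        zoom = (zoom,) * len(shape)
    return tuple(int(round(n * z)) for n, z in zip(shape, zoom))


shape = (963, 1291, 3)
# Scalar factor: the channel axis is scaled too, so it no longer has 3 entries.
print(zoomed_shape(shape, 0.5))
# Tuple factor: the channel axis is left alone.
print(zoomed_shape(shape, per_axis_zoom(0.5, len(shape))))
```

The tuple returned by per_axis_zoom is what would be passed as the zoom argument in place of the scalar.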
I've tried to resize an image with scipy, and everything seems to work fine until I try to save the image. When I try to save it, I get the error "ValueError: Too many dimensions: 3 > 2". The full traceback is available below.
import numpy as np
import scipy.misc
from PIL import Image
image_path = "img0.jpg"
def load_image(img_path):
    img = Image.open(img_path)
    img.load()
    data = np.asarray(img, dtype="int32")
    return data

def save_image(npdata, outfilename):
    img = Image.fromarray(np.asarray(np.clip(npdata, 0, 255), dtype="uint8"), "L")
    img.save(outfilename)

array_image = load_image(image_path)
array_resized_image = scipy.misc.imresize(array_image, (320, 240), interp='nearest', mode=None)
save_image(array_resized_image, "i1.jpg")
Full traceback of the error:
Traceback (most recent call last):
File "D:/Python/Playground/resize image with scipy.py", line 26, in <module>
save_image(array_resized_image, "i1.jpg")
File "D:/Python/Playground/resize image with scipy.py", line 16, in save_image
img = Image.fromarray(np.asarray(np.clip(npdata, 0, 255), dtype="uint8"), "L")
File "C:\Anaconda2\lib\site-packages\PIL\Image.py", line 2154, in fromarray
raise ValueError("Too many dimensions: %d > %d." % (ndim, ndmax))
ValueError: Too many dimensions: 3 > 2.
Don't you need to convert it to a two-dimensional array before calling fromarray(..., 'L')?
You can do that using a scipy function or, more quickly, by multiplying the RGB channels by weighting factors and summing them, like this:
npdata = (npdata[:,:,:3] * [0.2989, 0.5870, 0.1140]).sum(axis=2)
array_resized_image has a shape of (320, 240, 3). It is three-dimensional because the red, green and blue components are stored along the last axis. You can use scipy.misc.imread and scipy.misc.imsave to simplify loading and saving files, so your example boils down to this:
import scipy.misc
image_path = "img0.jpg"
array_image = scipy.misc.imread(image_path)
array_resized_image = scipy.misc.imresize(array_image, (320, 240), interp='nearest', mode=None)
scipy.misc.imsave("i1.jpg", array_resized_image)
I'm using OpenCV 3.0.0 and Python 3.4.3 to process a very large RGB image (107162,79553,3). While I'm trying to resize it using the following code:
import cv2
image = cv2.resize(img, (0,0), fx=0.5, fy=0.5, interpolation=cv2.INTER_AREA)
I had this error message coming up:
cv2.error: C:\opencv-3.0.0\source\modules\imgproc\src\imgwarp.cpp:3208: error: (-215) ssize.area() > 0 in function cv::resize
I'm certain there is image content in the image array because I can save them into small tiles in jpg format. When I try to resize just a small part of the image, there is no problem and I end up with correctly resized image. (Taking a rather big chunk (50000,50000,3) still won't work, but it will work on a (10000,10000,3) chunk)
What could cause this problem and how can I solve this?
So it turns out that the problem comes from one line in modules\imgproc\src\imgwarp.cpp:
CV_Assert( ssize.area() > 0 );
When the product of the rows and columns of the image to be resized is larger than 2^31, ssize.area() results in a negative number. This appears to be a bug in OpenCV and will hopefully be fixed in a future release. A temporary fix is to build OpenCV with this line commented out. While not ideal, it works for me.
I also recently found out that the above applies only to images whose width is larger than their height. For images whose height is larger than their width, it's the following line that causes the error:
CV_Assert( dsize.area() > 0 );
So this has to be commented out as well.
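The sign flip can be reproduced without OpenCV. The sketch below assumes the area is computed as a 32-bit signed integer, which is what the behaviour described above suggests. int32_area is a hypothetical helper that mimics that wrap-around in plain Python, using the image sizes from the question.

```python
def int32_area(rows, cols):
    """Multiply rows by cols the way a 32-bit signed integer would."""
    product = (rows * cols) & 0xFFFFFFFF   # keep the low 32 bits
    if product >= 0x80000000:              # reinterpret the top bit as the sign
        product -= 0x100000000
    return product


for rows, cols in [(107162, 79553), (50000, 50000), (10000, 10000)]:
    area = int32_area(rows, cols)
    print(rows, cols, area, "fails the > 0 check" if area <= 0 else "ok")
```

Only the (10000, 10000) case keeps a positive area, which matches the chunk sizes that did and did not work in the question.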
Turns out for me this error was actually telling the truth - I was trying to resize a Null image, which was usually the 'last' frame of a video file, so the assertion was valid.
Now I have an extra step before attempting the resize operation, which is to do the assertion myself:
def getSizedFrame(self, width, height):
    """Function to return an image with the size I want"""
    s, img = self.cam.read()
    # Only process valid image frames
    if s:
        img = cv2.resize(img, (width, height), interpolation = cv2.INTER_AREA)
    return s, img
Now I don't see the error.
Also pay attention to the dtype of your NumPy array; converting it with .astype('uint8') resolved the issue for me.
I know this is a very old thread, but I had the same problem, which was due to spaces in the image names.
e.g.
Image name: "hello o.jpg"
Weirdly, after removing the spaces the function worked just fine.
Image name: "hello_o.jpg"
I'm using OpenCV version 3.4.3 on macOS.
I was getting the same error as above.
I changed my code from
frame = cv2.resize(frame, (0,0), fx=0.5, fy=0.5)
to
frame = cv2.resize(frame, None, fx=0.5, fy=0.5)
Now it's working fine for me.
This type of error also occurs when resize cannot get the image at all. Put simply, the path to the image may be wrong.
In my case I had left out a forward slash when giving the location of the file, which caused this error; after I added the slash, the problem was solved.
For me the following workaround worked:
1. split the array up into smaller sub-arrays
2. resize the sub-arrays
3. merge the sub-arrays again
Here is the code:
import cv2
import numpy as np

def split_up_resize(arr, res):
    """
    function which resizes large array (direct resize yields error)
    res is the target size as (width, height)
    """
    # compute destination resolution for subarrays
    # (integer division, so the sizes stay ints on Python 3)
    res_1 = (res[0], res[1] // 2)
    res_2 = (res[0], res[1] - res[1] // 2)

    # get sub-arrays (top and bottom halves of the rows)
    arr_1 = arr[0 : len(arr) // 2]
    arr_2 = arr[len(arr) // 2 :]

    # resize sub arrays
    arr_1 = cv2.resize(arr_1, res_1, interpolation = cv2.INTER_LINEAR)
    arr_2 = cv2.resize(arr_2, res_2, interpolation = cv2.INTER_LINEAR)

    # init resized array (2-D, i.e. single-channel, as in the original)
    arr = np.zeros((res[1], res[0]))

    # merge resized sub arrays
    arr[0 : len(arr) // 2] = arr_1
    arr[len(arr) // 2 :] = arr_2

    return arr
You can manually place a check in your code. Like this:
if result != []:
    for face in result:
        bounding_box = face['box']
        x, y, w, h = bounding_box[0], bounding_box[1], bounding_box[2], bounding_box[3]
        rect_face = cv2.rectangle(frame, (x, y), (x+w, y+h), (46, 204, 113), 2)
        face = rgb[y:y+h, x:x+w]
        #CHECK FACE SIZE (EXIST OR NOT)
        if face.shape[0]*face.shape[1] > 0:
            predicted_name, class_probability = face_recognition(face)
            print("Result: ", predicted_name, class_probability)
Turns out I had a .csv file at the end of the folder from which I was reading all the images.
Once I deleted it, everything worked.
Make sure the folder contains only images and no other type of file.
In my case I did a wrong modification in the image.
I was able to find the problem checking the image shape.
print(img.shape)
In my case,
image = cv2.imread(filepath)
final_img = cv2.resize(image, size_img)
filepath was incorrect. cv2.imread didn't raise any error in this case (it returns None for a path it cannot read), so the wrong path only surfaced as an error in cv2.resize.
I came across the same error message while trying to enlarge an image. Casting the image to uint8 did the trick for me, and I was able to resize the image to 30 times its original size. Here is an example for anyone else who has this issue.
scale_percent = 3000
width = int(img.shape[1] * scale_percent / 100)
height = int(img.shape[0] * scale_percent /100)
dim = (width, height)
image = cv2.resize(img.astype('uint8'), dim, interpolation=cv2.INTER_AREA)
Same error message for me but issue was different:
The interpolation method 'INTER_AREA' was NOT compatible with int8 !
cv2.resize(frame_rgb, tuple([None, None]))
gives similar error. Notice the None values in the resizing tuple.
In my case there were some corrupt or unsupported images. What I did was simply check that the image is not None before processing it, as shown below.
img = cv2.imread(image_path)
if img is not None:
    cv2.resize(img,(150,150)) # You can give your own desired image size
I was working with 3 files: The python script, the image, and the trained model.
Everything worked when I moved these 3 files into their own folder instead of in the directory with the other python scripts.
I had the same error. Resizing the images resolved the issue. However, I used online tools to resize the images because using pillow to resize them did not solve my problem.
If Windows is in 32-bit color depth mode, then the following code gets a nice PIL Image from a window:
def image_grab_native(window):
    hwnd = win32gui.GetDesktopWindow()
    left, top, right, bot = get_rect(window)
    w = right - left
    h = bot - top

    hwndDC = win32gui.GetWindowDC(hwnd)
    mfcDC = win32ui.CreateDCFromHandle(hwndDC)
    saveDC = mfcDC.CreateCompatibleDC()

    saveBitMap = win32ui.CreateBitmap()
    saveBitMap.CreateCompatibleBitmap(mfcDC, w, h)
    saveDC.SelectObject(saveBitMap)
    saveDC.BitBlt((0, 0), (w, h), mfcDC, (left, top), win32con.SRCCOPY)

    bmpinfo = saveBitMap.GetInfo()
    bmpstr = saveBitMap.GetBitmapBits(True)
    im = Image.frombuffer(
        'RGB',
        (bmpinfo['bmWidth'], bmpinfo['bmHeight']),
        bmpstr, 'raw', 'BGRX', 0, 1)

    win32gui.DeleteObject(saveBitMap.GetHandle())
    saveDC.DeleteDC()
    mfcDC.DeleteDC()
    win32gui.ReleaseDC(hwnd, hwndDC)
    return im
However, when running in 16-bit mode, I get the error:
>>> image_grab_native(win)
Traceback (most recent call last):
File "<pyshell#3>", line 1, in <module>
image_grab_native(win)
File "C:\claudiu\bumhunter\finderbot\ezpoker\utils\win32.py", line 204, in image_grab_native
bmpstr, 'raw', 'BGRX', 0, 1)
File "c:\python25\lib\site-packages\PIL\Image.py", line 1808, in frombuffer
return apply(fromstring, (mode, size, data, decoder_name, args))
File "c:\python25\lib\site-packages\PIL\Image.py", line 1747, in fromstring
im.fromstring(data, decoder_name, args)
File "c:\python25\lib\site-packages\PIL\Image.py", line 575, in fromstring
raise ValueError("not enough image data")
ValueError: not enough image data
How should I form the frombuffer call to work in 16-bit mode? Also how can I make this function work in any bit depth mode, instead of say having to pass it as a parameter?
UPDATE: From this question I learned I must use "BGR;16" instead of "BGRX" for the 2nd mode parameter. It takes a correct picture, either specifying stride or not. The problem is that the pixel values are slightly off on some values:
x    y    native           ImageGrab
280  0    (213, 210, 205)  (214, 211, 206)
280  20   (156, 153, 156)  (156, 154, 156)
280  40   (213, 210, 205)  (214, 211, 206)
300  0    (213, 210, 205)  (214, 211, 206)
This is just a sample of values taken from the same window. The screenshots look identical to the naked eye, but I have to do some pixel manipulation. Also, the reason I want to use the native approach at all is that it's a bit faster and it behaves better when running inside virtual machines with dual monitors (yes, pretty randomly complicated, I know).
For the stride parameter, you need to give the row size in bytes. Your pixels are 16 bits each, so you might naively assume stride = 2*bmpinfo['bmWidth']; unfortunately, Windows adds padding so that each row's stride is a multiple of 32 bits. That means you'll have to round it up to the next multiple of 4 bytes: stride = ((stride + 3) // 4) * 4.
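The rounding can be sketched as a small helper. The function name dib_stride and the example widths are made up for illustration; the arithmetic is just the "round up to a multiple of 4 bytes" rule described above.

```python
def dib_stride(width, bits_per_pixel):
    """Bytes per bitmap row, padded up to a multiple of 4 bytes (32 bits)."""
    row_bytes = (width * bits_per_pixel + 7) // 8   # unpadded row size
    return ((row_bytes + 3) // 4) * 4               # round up to a 4-byte boundary


# 16 bits per pixel: an odd width needs 2 bytes of padding per row.
print(dib_stride(101, 16))  # 204 rather than 202
print(dib_stride(100, 16))  # 200, already aligned
```

In the function from the question, the width would be bmpinfo['bmWidth'] and the result would go in the stride position of the frombuffer call.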
The documentation doesn't mention a 16-bit raw format so you'll have to check the Unpack.c module to see what's available.
The final thing you'll notice is that Windows likes to make its bitmaps upside down.
Edit: Your final little problem is easily explained - the conversion from 16 bit to 24 bit is not precisely defined, and an off-by-one difference between two different conversions is perfectly normal. It wouldn't be hard to adjust the data after you've converted it, as I'm sure the differences are constant based on the value.
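As an illustration of how two reasonable conversions can disagree by one, here is a sketch that expands a 5-bit channel value to 8 bits in two common ways. Which methods Windows and ImageGrab actually use is not established here; the two functions are only examples, and the 5-bit channel width is an assumption about the 16-bit format.

```python
def expand_by_scaling(v5):
    """Scale a 5-bit value (0..31) to 0..255, truncating the fraction."""
    return v5 * 255 // 31


def expand_by_replication(v5):
    """Shift a 5-bit value up to 8 bits and copy its top bits into the gap."""
    return (v5 << 3) | (v5 >> 2)


for v5 in (0, 19, 26, 31):
    print(v5, expand_by_scaling(v5), expand_by_replication(v5))

# Largest disagreement between the two methods across all 32 inputs.
print(max(abs(expand_by_scaling(v) - expand_by_replication(v)) for v in range(32)))
```

Both methods agree at the ends of the range (0 and 255) and drift apart by at most one in between, which is the kind of difference shown in the table above.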