In one of my personal projects, I am applying the following horizontal edge mask to a grayscale image in order to detect horizontal edges.
[1 2 1
0 0 0
-1 -2 -1]
When I convolve my image matrix with the mask given above, the output image is rotated by 180 degrees. I am not sure whether this is expected behavior or whether I am doing something wrong.
Here is the code snippet of the convolution:
def convolution(self):
    result = np.zeros((self.mat_width, self.mat_height))
    print(self.mat_width)
    print(self.mat_height)
    for i in range(0, self.mat_width-self.window_width):
        for j in range(0, self.mat_height-self.window_height):
            # deflate both mat and mask
            # if j+self.window_height >= self.mat_height:
            #     row_index = j+self.window_height + 1
            # else:
            row_index = j+self.window_height
            col_index = i+self.window_width
            mat_masked = self.mat[j:row_index, i:col_index]
            # pixel position
            index_i = i + int(self.window_width / 2)
            index_j = j + int(self.window_height / 2)
            prod = np.sum(mat_masked*self.mask)
            if prod >= 255:
                result[index_i, index_j] = 255
            else:
                result[index_i, index_j] = 0
    return result
The original grayscale input image is here -
Here is the output that is getting generated.
The indices when writing to the output are reversed: you are swapping the horizontal and vertical coordinates, which transposes your output. The result you see is a transposed image.
In addition, you aren't declaring the output size of your image correctly. The first dimension spans the rows or the height while the second dimension spans the columns or the width. The first change you must make is swapping the input dimensions of the output image:
result = np.zeros((self.mat_height, self.mat_width))
Secondly, the variable index_i is traversing horizontally while the variable index_j is traversing vertically. You just have to flip the order so that you are writing the results correctly:
if prod >= 255:
    result[index_j, index_i] = 255
else:
    result[index_j, index_i] = 0
If for some reason you don't want to change the order, then leave your code the way it is, including how you declared the output dimensions of your image and simply return the result transposed:
return result.T
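Putting both fixes together, a minimal sketch of the corrected method might look like this (assuming self.mat, self.mask, self.window_width and self.window_height are set up exactly as in your class):

def convolution(self):
    # note the order: rows (height) first, then columns (width)
    result = np.zeros((self.mat_height, self.mat_width))
    for i in range(0, self.mat_width - self.window_width):
        for j in range(0, self.mat_height - self.window_height):
            row_index = j + self.window_height
            col_index = i + self.window_width
            mat_masked = self.mat[j:row_index, i:col_index]
            # pixel position: j indexes rows, i indexes columns
            index_i = i + int(self.window_width / 2)
            index_j = j + int(self.window_height / 2)
            prod = np.sum(mat_masked * self.mask)
            # write as [row, column], i.e. [index_j, index_i]
            result[index_j, index_i] = 255 if prod >= 255 else 0
    return result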
I am using Python and I need to implement a function to clean pattern noise from images. I wrote this function and I get an OK result on some images, but on others I get a bad result. Can you help me improve the code?
I can't use cv2, only numpy and 'from scipy.signal import convolve'.
# imports assumed at module level (only numpy is used; the FFT helpers come from numpy.fft)
import numpy as np
from numpy.fft import fft2, ifft2, fftshift, ifftshift

def fft_clean(img1: np.array, img2: np.array, img3: np.array, img4: np.array) \
        -> (np.array, np.array, np.array, np.array):
    '''
    This function receives 4 grayscale images and cleans them using the FFT algorithm.

    Args:
        img1: image array in float format (range: 0..1) - the source grayscale
              image.
        img2: image array in float format (range: 0..1) - the source grayscale
              image.
        img3: image array in float format (range: 0..1) - the source grayscale
              image.
        img4: image array in float format (range: 0..1) - the source grayscale
              image.
    Returns:
        img_c1: array in int format (values: 0, 1) - the cleaned image.
        img_c2: array in int format (values: 0, 1) - the cleaned image.
        img_c3: array in int format (values: 0, 1) - the cleaned image.
        img_c4: array in int format (values: 0, 1) - the cleaned image.
    '''
    ######################################################################
    # TODO: Implement The fft algorithm.
    ######################################################################
    def clean(img):
        # perform FFT on the input image
        f = fft2(img)
        # shift the FFT result to center the image
        f = fftshift(f)
        # get the magnitude of the FFT result
        mag = np.abs(f)
        # find the maximum magnitude value
        max_val = np.max(mag)
        # set a threshold that eliminates most of the noise and leaves the line patterns
        threshold = max_val * 0.1
        # create a binary mask with the same size as the image
        mask = np.zeros_like(mag)
        # set the values in the mask corresponding to the line patterns to 1
        mask[np.where(mag < threshold)] = 1
        # apply the mask to the FFT result
        f *= mask
        # shift the result back to its original position
        f = ifftshift(f)
        # perform the inverse FFT to obtain the cleaned image
        img_c = ifft2(f)
        return img_c

    # clean each of the input images and store the result
    img_c1 = clean(img1)
    img_c2 = clean(img2)
    img_c3 = clean(img3)
    img_c4 = clean(img4)
    ######################################################################
    #                         END OF YOUR CODE                           #
    ######################################################################
    return img_c1.real, img_c2.real, img_c3.real, img_c4.real
As you can see, img1 has washed-out colors, img2 is great, img3 is OK but the colors are a little washed out, and img4 looks terrible.
https://github.com/dnevo/ImageProcessing/tree/main/images
fft1 - ff4.tiff
I changed the clean function to this:
def clean(img):
    # perform FFT on the input image
    f = fft2(img)
    # shift the FFT result to center the image
    f = fftshift(f)
    # get the magnitude of the FFT result
    mag = np.abs(f)
    # find the maximum magnitude value
    max_val = np.max(mag)
    # set a threshold that eliminates most of the noise and leaves the line patterns
    threshold = max_val * 0.094
    # create a binary mask with the same size as the image
    mask = np.zeros_like(mag)
    # set the values in the mask corresponding to the line patterns to 1
    mask[np.where(mag < threshold)] = 1
    # apply the mask to the FFT result
    f *= mask
    # shift the result back to its original position
    f = ifftshift(f)
    # perform the inverse FFT to obtain the cleaned image
    img_c = ifft2(f)
    return img_c
I just changed this line:
threshold = max_val * 0.094
Instead of 0.1 I set 0.094, and the last picture is OK now, but the first is still washed out.
I also tried changing it to this:
def clean(img):
    # perform FFT on the input image
    f = fft2(img)
    # shift the FFT result to center the image
    f = fftshift(f)
    # get the magnitude of the FFT result
    mag = np.abs(f)
    # set a threshold that eliminates most of the noise and leaves the line patterns
    threshold = np.max(mag) * 0.094
    # create a binary mask with the same size as the image
    mask = np.zeros_like(mag)
    # set the values in the mask corresponding to the line patterns to 1
    mask[np.where(mag < threshold)] = 1
    # find the coordinates of the peaks in the magnitude spectrum that are above the threshold
    peaks = np.array(np.where(mag > threshold)).T
    # remove the significant peaks that are not at the origin by setting their values in the mask to 0
    for peak in peaks:
        if peak[0] != mag.shape[0]//2 or peak[1] != mag.shape[1]//2:
            mask[peak[0], peak[1]] = 0
    # apply the mask to the FFT result
    f *= mask
    # shift the result back to its original position
    f = ifftshift(f)
    # perform the inverse FFT to obtain the cleaned image
    img_c = ifft2(f)
    return img_c
as @CrisLuengo suggested, but I get the same results.
Please let me know where I can improve the code to get better results.
Thanks!
I need to split an image into multiple images, based on the white borders between them.
for example:
output:
I need to do this using Python, and I don't know how to start this task.
Here is a solution for the "easy" case where we know the grid configuration. I provide this solution even though I doubt this is what you were asked to do.
In your example image of the cat, if we are given the grid configuration, 2x2, we can do:
from PIL import Image

def subdivide(file, nx, ny):
    im = Image.open(file)
    wid, hgt = im.size   # Size of input image
    w = int(wid/nx)      # Width of each subimage
    h = int(hgt/ny)      # Height of each subimage
    for i in range(nx):
        x1 = i*w         # Horizontal extent...
        x2 = x1+w        # of subimage
        for j in range(ny):
            y1 = j*h     # Vertical extent...
            y2 = y1+h    # of subimage
            subim = im.crop((x1, y1, x2, y2))
            subim.save(f'{i}x{j}.png')

subdivide("cat.png", 2, 2)
The above will create these images:
My previous answer depended on knowing the grid configuration of the input image. This solution does not.
The main challenge is to detect where the borders are and, thus, where the rectangles that contain the images are located.
To detect the borders, we'll look for (vertical and horizontal) image lines where all pixels are "white". Since the borders in the image are not really pure white, we'll use a value less than 255 as the whiteness threshold (WHITE_THRESH in the code.)
The gist of the algorithm is in the following lines of code:
whitespace = [np.all(gray[:,i] > WHITE_THRESH) for i in range(gray.shape[1])]
Here "whitespace" is a list of Booleans that looks like
TTTTTFFFFF...FFFFFFFFTTTTTTTFFFFFFFF...FFFFTTTTT
where "T" indicates the corresponding horizontal location is part of the border (white).
We are interested in the x-locations where there are transitions between T and F. The call to the function slices(whitespace) returns a list of tuples of indices
[(x1, x2), (x1, x2), ...]
where each (x1, x2) pair indicates the xmin and xmax location of images in the x-axis direction.
The slices function finds the "edges" where there are transitions between True and False using the exclusive-or operator and then returns the locations of the transitions as a list of tuples (pairs of indices).
Similar code is used to detect the vertical location of borders and images.
The complete runnable code below takes as input the OP's image "cat.png" and:
Extracts the sub-images into 4 PNG files "fragment-0-0.png", "fragment-0-1.png", "fragment-1-0.png" and "fragment-1-1.png".
Creates a (borderless) version of the original image by pasting together the above fragments.
The runnable code and resulting images follow. The program runs in about 0.25 seconds.
from PIL import Image
import numpy as np

def slices(lst):
    """ Finds the indices where lst changes value and returns them in pairs
        lst is a list of booleans
    """
    edges = [lst[i-1] ^ lst[i] for i in range(len(lst))]
    indices = [i for i, v in enumerate(edges) if v]
    pairs = [(indices[i], indices[i+1]) for i in range(0, len(indices), 2)]
    return pairs

def extract(xx_locs, yy_locs, image, prefix="image"):
    """ Locate and save the subimages """
    data = np.asarray(image)
    for i in range(len(xx_locs)):
        x1, x2 = xx_locs[i]
        for j in range(len(yy_locs)):
            y1, y2 = yy_locs[j]
            arr = data[y1:y2, x1:x2, :]
            Image.fromarray(arr).save(f'{prefix}-{i}-{j}.png')

def assemble(xx_locs, yy_locs, prefix="image", result='composite'):
    """ Paste the subimages into a single image and save """
    wid = sum([p[1]-p[0] for p in xx_locs])
    hgt = sum([p[1]-p[0] for p in yy_locs])
    dst = Image.new('RGB', (wid, hgt))
    x = y = 0
    for i in range(len(xx_locs)):
        for j in range(len(yy_locs)):
            img = Image.open(f'{prefix}-{i}-{j}.png')
            dst.paste(img, (x, y))
            y += img.height
        x += img.width
        y = 0
    dst.save(f'{result}.png')

WHITE_THRESH = 110  # The original image borders are not actually white

image_file = 'cat.png'
image = Image.open(image_file)

# To detect the (almost) white borders, we make a grayscale version of the image
gray = np.asarray(image.convert('L'))

# Detect location of images along the x axis
whitespace = [np.all(gray[:, i] > WHITE_THRESH) for i in range(gray.shape[1])]
xx_locs = slices(whitespace)

# Detect location of images along the y axis
whitespace = [np.all(gray[i, :] > WHITE_THRESH) for i in range(gray.shape[0])]
yy_locs = slices(whitespace)

extract(xx_locs, yy_locs, image, prefix='fragment')
assemble(xx_locs, yy_locs, prefix='fragment', result='composite')
Individual fragments:
The composite image:
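As a quick illustration of how the slices helper behaves, here it is applied to a small hand-made boolean list (hypothetical values, not taken from the cat image):

# hypothetical toy input: True marks an all-white (border) column
whitespace = [True, True, False, False, False, True, True, False, False, True]
# transitions occur at indices 2, 5, 7 and 9, so each returned pair gives the
# start (inclusive) and end (exclusive) of a non-white region
print(slices(whitespace))   # [(2, 5), (7, 9)]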
After preprocessing an image of a sudoku board (from the web) with OpenCV, I managed to get the following picture:
Looping through the contours and extracting each value using pytesseract and psm 10 (single character) resulted in junk values.
Thus I would like to slice the image into rows and try to extract the values using the config psm 6, hoping it might work.
The approach I took is simply numpy-slicing the row and trying to extract the values. However, it doesn't work, giving me SystemError: tile cannot extend outside image after the first iteration, although I'm sure the slicing occurs inside the image:
y = 1
for x in range(1, 9):
    cropped_row = mask[y*33-33:y*33-1][x*33-33:x*33-1]
    text = tess.image_to_string(np.array(cropped_row), config='--psm 6')
    y += 1
    print(text)
I would like some guidance on the correct approach to OCRing rows from the image.
In the end I took a slightly different approach, as explained by natancy in this answer.
I focused on the grid lines and removed all values, so that findContours() will locate all grid cells.
Then I looped through all contours and checked whether each one is a cell (size-wise) or some other contour.
If it is a cell, a mask makes only the current cell visible (and its value, when combined with bitwise_and(original_image, mask)).
That way I could get a blank image with only a single number, and I ran that image through tesseract.
Some text cleaning later, I got my desired output.
Extraction of the numbers:
list_of_clues = []
for contour in contours:
    extracted_value = ''
    # create black mask
    mask = np.zeros(processed.shape, dtype=np.uint8)
    # check if contour is a cell
    area = cv2.contourArea(contour)
    if 700 <= area <= 1000:  # contour is a cell
        cv2.drawContours(mask, [contour], -1, WHITE, -1)  # fill the contour area in the mask with white
        isolated_cell = cv2.bitwise_and(processed, mask)
        isolated_cell[mask == 0] = 255  # turn everything outside the cell white (for tess)
        # extract text from isolated_cell
        text = tess.image_to_string(isolated_cell, config='--psm 10')
        # clean non-numbers:
        for ch in text:
            if ch.isdigit():
                extracted_value = ch
        # calculate cell coordinates only if extracted_value exists
        if extracted_value:
            # relevant for my project: extract the grid coordinates of the extracted value
            [x_pos, y_pos, wid, hei] = cv2.boundingRect(contour)  # get the contour's bounding box
            x_coord = int(x_pos // (grid_size_pixels / 9))  # x grid coordinate (column)
            y_coord = int(y_pos // (grid_size_pixels / 9))  # y grid coordinate (row)
            list_of_clues.append(((x_coord, y_coord), int(extracted_value)))
    else:  # contour isn't a cell
        continue
I have tried this:
custom_oem_psm_config = r'--oem 3 --psm 6 -c tessedit_char_whitelist="0123456789"'# -c preserve_interword_spaces=0'
text= pytesseract.pytesseract.image_to_string(otsu, config=custom_oem_psm_config)
print(text)
Output:
2 91
4 67 13
2 976
4 9
9816 2754
3 1
653 7
24 85 1
46 2
If you want to get the exact positions of the numbers, try numpy slicing and sort them from left to right and top to bottom, then pass each number to tesseract.
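For example, here is a rough sketch of that idea (assuming the thresholded board otsu is already cropped to the 9x9 grid; note that a 2-D crop is written as img[y1:y2, x1:x2], not img[y1:y2][x1:x2]):

h, w = otsu.shape[:2]
cell_h, cell_w = h // 9, w // 9

for row in range(9):          # top to bottom
    for col in range(9):      # left to right
        # slice rows first, then columns
        cell = otsu[row * cell_h:(row + 1) * cell_h,
                    col * cell_w:(col + 1) * cell_w]
        text = pytesseract.image_to_string(
            cell, config='--psm 10 -c tessedit_char_whitelist=0123456789')
        print(row, col, text.strip())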
Given the coordinates of four arbitrary points in an image (which are guaranteed to form a rectangle), I want to extract the patch that they represent and get a vectorized (flat) representation of the same. How can I do this?
I saw the answer to this question, and using it I am able to get to the patch that I require. For example, given the image coordinates of the 4 corners of the green rectangle in this image:
I am able to get to the patch and get something like:
using the following code:
p1 = (334,128)
p2 = (438,189)
p3 = (396,261)
p4 = (292,200)
pts = np.array([p1, p2, p3, p4])
mask = np.zeros((img.shape[0], img.shape[1]))
cv2.fillConvexPoly(mask, pts, 1)
mask = mask.astype(np.bool)
out = np.zeros_like(img)
out[mask] = img[mask]
patch = img[mask]
cv2.imwrite(img_name, out)
However, the problem is that the patch variable that I obtain is simply an array of all pixels of the image that belong to the patch, when the image is read as a matrix in row-major order.
What I want is for the patch variable to contain the pixels in an order that forms a genuine image, so that I can perform operations on it. Is there an OpenCV function I should be aware of that would help me do this?
Thanks!
This is how you can implement this:
Code:
# create a subimage with the outer limits of the points
subimg = out[128:261,292:438]
# calculate the angle between the 2 'lowest' points, the 'bottom' line
myradians = math.atan2(p3[0]-p4[0], p3[1]-p4[1])
# convert to degrees
mydegrees = 90-math.degrees(myradians)
# create rotationmatrix
h,w = subimg.shape[:2]
center = (h/2,w/2)
M = cv2.getRotationMatrix2D(center, mydegrees, 1)
# rotate subimage
rotatedImg = cv2.warpAffine(subimg, M, (h, w))
Result:
Next, the black areas in the image can be easily cropped by removing all rows/columns that are 100% black.
Final result:
Code:
# convert image to grayscale
img = cv2.cvtColor(rotatedImg, cv2.COLOR_BGR2GRAY)

# sum each row and each column of the image
sumOfCols = np.sum(img, axis=0)
sumOfRows = np.sum(img, axis=1)

# Find the first and last row / column that has a sum value greater than zero,
# which means it's not all black. Store the found values in variables
for i in range(len(sumOfCols)):
    if sumOfCols[i] > 0:
        x1 = i
        print('First col: ' + str(i))
        break

for i in range(len(sumOfCols)-1, -1, -1):
    if sumOfCols[i] > 0:
        x2 = i
        print('Last col: ' + str(i))
        break

for i in range(len(sumOfRows)):
    if sumOfRows[i] > 0:
        y1 = i
        print('First row: ' + str(i))
        break

for i in range(len(sumOfRows)-1, -1, -1):
    if sumOfRows[i] > 0:
        y2 = i
        print('Last row: ' + str(i))
        break

# create a new image based on the found values
finalImage = rotatedImg[y1:y2, x1:x2]
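As an aside, the same crop can be computed more compactly with NumPy (a sketch under the same assumption that the background of rotatedImg is pure black):

gray = cv2.cvtColor(rotatedImg, cv2.COLOR_BGR2GRAY)
# row and column indices of all non-black pixels
ys, xs = np.nonzero(gray)
# bounding box of the non-black region
finalImage = rotatedImg[ys.min():ys.max() + 1, xs.min():xs.max() + 1]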
I am making a stitcher with OpenCV and Python. All works well except one thing: I can't manage to compute the exact final size of the resulting picture.
My image is always too big and I have black borders. Moreover, the offset doesn't seem to be correct, because there is a black line where the pictures have merged.
Here is my function:
def calculate_size(size_image1, size_image2, homography):
    ## Calculate the size and offset of the stitched panorama.
    offset = abs((homography*(size_image2[0]-1,size_image2[1]-1,1))[0:2,2])
    print offset
    size = (size_image1[1] + int(offset[0]), size_image1[0] + int(offset[1]))
    if (homography*(0,0,1))[0][1] > 0:
        offset[0] = 0
    if (homography*(0,0,1))[1][2] > 0:
        offset[1] = 0
    ## Update the homography to shift by the offset
    homography[0:2,2] += offset
    return (size, offset)
## 4. Combine images into a panorama. [4] --------------------------------
def merge_images(image1, image2, homography, size, offset, keypoints):
    ## Combine the two images into one.
    panorama = cv2.warpPerspective(image2, homography, size)
    (h1, w1) = image1.shape[:2]
    for h in range(h1):
        for w in range(w1):
            if image1[h][w][0] != 0 or image1[h][w][3] != 0 or image1[h][w][4] != 0:
                panorama[h+offset[1]][w + offset[0]] = image1[h][w]
    ## TODO: Draw the common feature keypoints.
    return panorama
And my results:
1st image :
2nd image :
Stitched image :
What am I doing wrong?
if (homography*(0,0,1))[0][1] > 0:
    offset[0] = 0
if (homography*(0,0,1))[1][2] > 0:
    offset[1] = 0
Your code is wrong. The correct version is as follows:
if (homography*(0,0,1))[0][2] > 0:
    offset[0] = 0
if (homography*(0,0,1))[1][2] > 0:
    offset[1] = 0
Well, I don't know a lot about Python, but basically I had the same problem.
To solve the size issues I did the following:
perspectiveTransform( obj_original_corners, scene_corners, homography);
After that, I just searched both images for the smallest_X, smallest_Y, biggest_X and biggest_Y.
These numbers I then used in:
cv::warpPerspective(img_2,WarpedImage,homography,cv::Size(biggestX-smallestX,biggestY-smallestY));
So in that case the new image itself will have the proper size even if the 2nd image has a negative x or negative y.
The only thing I'm still struggling with at this moment is how to apply the shift to warpPerspective, because now part of my image is cut off due to negative numbers.
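In Python, that corner-transform idea, including the shift, might be sketched roughly like this (an illustration only, with img1, img2 and homography standing in for the two source images and the 3x3 homography that maps img2 into img1's frame):

import cv2
import numpy as np

h2, w2 = img2.shape[:2]
corners = np.float32([[0, 0], [w2, 0], [w2, h2], [0, h2]]).reshape(-1, 1, 2)
warped_corners = cv2.perspectiveTransform(corners, homography)

h1, w1 = img1.shape[:2]
all_corners = np.concatenate(
    (warped_corners, np.float32([[[0, 0]], [[w1, 0]], [[w1, h1]], [[0, h1]]])))

x_min, y_min = np.floor(all_corners.min(axis=0).ravel()).astype(int)
x_max, y_max = np.ceil(all_corners.max(axis=0).ravel()).astype(int)

# translate everything so the most negative corner lands at (0, 0)
shift = np.array([[1, 0, -x_min],
                  [0, 1, -y_min],
                  [0, 0, 1]], dtype=np.float64)
panorama = cv2.warpPerspective(img2, shift.dot(homography),
                               (x_max - x_min, y_max - y_min))
# simple paste of the first image at its shifted position
panorama[-y_min:-y_min + h1, -x_min:-x_min + w1] = img1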
As far as the stitching goes, all your steps are right. The result comes from your source pictures:
for h in range(h1):
    for w in range(w1):
        if image1[h][w][0] != 0 or image1[h][w][3] != 0 or image1[h][w][4] != 0:
            panorama[h+offset[1]][w + offset[0]] = image1[h][w]
This check only filters out pixels whose color is exactly zero. In fact, some pixels look black but are not pure black, only very close to black, so these near-black pixels are not filtered out by your program.
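Following that diagnosis, one possible adjustment (a sketch, not the original code; it assumes numpy is imported as np and the same h1, w1, offset and panorama as above) is to copy a pixel from image1 only when it is brighter than a small tolerance, so near-black pixels are skipped as well:

# copy a pixel from image1 only if at least one channel exceeds a small
# tolerance, so near-black seam pixels are left out of the panorama
BLACK_TOLERANCE = 10
for h in range(h1):
    for w in range(w1):
        if np.any(image1[h, w] > BLACK_TOLERANCE):
            panorama[h + offset[1]][w + offset[0]] = image1[h][w]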