I have a scanned document image of some black text on a white background. I first invert the image so background = black and all text = white.
I then use OpenCV's reduce() function to give me a vertical projection of the image (the sum of all pixel values for each row), which looks a little like:
0,
0,
0,
434,
34,
0,
0,
From this, I can tell that 0 values denote background rows, whereas values > 0 indicate rows containing some text.
From here, I've looped through the pixel values looking for any 0 value where the next value != 0 (i.e. the next row contains text) and stored the positions in a list of two-tuples named pairedCoordinates: [(23, 43), (54, 554)], etc.
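For reference, a minimal sketch of that pairing pass, assuming `inverted` is the inverted image (the variable names here are illustrative, not from the original code):

import cv2

# One summed value per row of the inverted image (dtype widened to avoid overflow).
hist = cv2.reduce(inverted, 1, cv2.REDUCE_SUM, dtype=cv2.CV_32S).reshape(-1)

pairedCoordinates = []
start = None
for y in range(len(hist) - 1):
    if hist[y] == 0 and hist[y + 1] != 0:
        start = y                                  # last background row before text
    elif start is not None and hist[y] != 0 and hist[y + 1] == 0:
        pairedCoordinates.append((start, y + 1))   # first background row after text
        start = None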
From that point I can then loop through pairedCoordinates and draw boundingRects around my regions of interest:
for start, finish in pairedCoordinates:
    height = finish - start
    cv2.rectangle(small, (0, start), (0 + cols, start + height), (255, 255, 255), 1)
Up to this point, all works fine. What I'm trying to do next is, for each rectangle, append its inner content (pixels) to another list so I can perform further computations on only the sections of the image contained within the rects.
I've attempted the following:
# Initialise an empty list
roi = []
and then, within the above for loop, I add the following:
for start, finish in pairedCoordinates:
    height = finish - start
    cv2.rectangle(small, (0, start), (0 + cols, start + height), (255, 255, 255), 1)
    # cols = the number of columns in the image, small being the image
    glyph = small[y: start + height, x:0+cols]
    roi.append(glyph)
When attempting this code, I'm getting 'Unresolved reference' errors for x and y, and I'm a little unsure why.
Could someone just point me in the right direction of how to actually achieve what I've explained above?
UPDATE
As mentioned by Miki in the comments, I forgot to initialise x and y.
I simply defined x = 0, y = 0 before computing the glyph and that seemed to do the trick.
I'm having some issues looping through each of the areas and writing them to a file, though. Instead of each bounding rect being written out individually, each new image file just appends the next region to the previous one.
for i, r in enumerate(roi):
    cv2.imwrite("roi_%02d.png" % i, r)
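A hedged note on the likely cause: with y = 0, the slice small[y: start + height] always begins at the top of the image, so every saved ROI also contains all the rows above its own band. Starting the slice at start instead should give each band on its own; a minimal sketch:

roi = []
for start, finish in pairedCoordinates:
    # Rows from `start` to `finish` only: the band of text itself.
    glyph = small[start:finish, 0:cols]
    roi.append(glyph)

for i, r in enumerate(roi):
    cv2.imwrite("roi_%02d.png" % i, r)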
Related
I am trying to create a function that will swap every red and blue pixel of an image. However, when running the function, the new image does not change or do what I intended. So far, I am only trying to change the image to a blue-only filter to test the function.
from CSE8AImage import *

img = load_img('images/cat.jpg')

def complement(img):
    for r in range(len(img)):
        for c in range(len(img[r])):
            pix = img[r][c]
            img[r][c] = (0, 0, pix[2])
    return img

save_img(img, 'complement_cat.jpg')
What you're doing in your code is simply setting the red and green channels to 0 (assuming it's RGB? I couldn't find anything about the CSE8AImage library outside of this page, which perfectly matches your question). I will continue assuming it's RGB.
What you should change in your code to make it work is img[r][c] = (0, 0, pix[2]) to img[r][c] = (pix[2], pix[1], pix[0]), which reorders the channels from RGB (index 0, 1, 2) to BGR (index 2, 1, 0). If the pixels are NumPy arrays rather than plain tuples, pix[[2, 1, 0]] does the same thing.
A simpler way would just be to do the whole array at once:
def complement(img):
    return img[:, :, [2, 1, 0]]
This will only work if you can index the image like a NumPy array. Ignore this if that is not the case.
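Since CSE8AImage is undocumented, here is the same channel swap demonstrated with NumPy (an assumption on my part: images load as H x W x 3 arrays). Note also that in the original snippet complement(img) is never actually called before save_img, so the file would be written unchanged regardless of the function body.

import numpy as np

def swap_red_blue(img):
    # Reorder the last axis from (R, G, B) to (B, G, R).
    return img[:, :, [2, 1, 0]]

# Quick check on a single pure-red pixel.
img = np.zeros((1, 1, 3), dtype=np.uint8)
img[0, 0] = (255, 0, 0)           # red in RGB order
print(swap_red_blue(img)[0, 0])   # [  0   0 255] -> blue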
Good afternoon. My task is to take a questionnaire and write the selected options into the database as true or false. With the help of Microsoft Recognize I get the desired table, and then with the help of cv2 I crop the image to determine whether a cell is true or false.
imagePage = cv2.imread(image.path)
x = int(coordinates[0])
y = int(coordinates[1])
w = int(coordinates[2])
h = int(coordinates[3])
imageCell = imagePage[y:y+h, x:x+w]
cv2.rectangle(imagePage, (x, y), (x + w, y + h), (0, 255, 0), 3)
Table with selected cells:
Next, the idea was to check for the presence of black pixels: if there are many of them, the item has been selected and it is true. Unfortunately, this does not work, because the cell borders, which are also counted as black pixels, fall inside the crop.
def countBlackPixels(imageCell):
    lowerBlack = np.array([0, 0, 0], np.uint8)
    blackRange = cv2.inRange(imageCell, lowerBlack, lowerBlack)
    blackPixels = cv2.countNonZero(blackRange)
    return blackPixels
For example, one cell with X shows 258 black pixels, the second cell with X shows 98.
Are there any ways to read the answers in the questionnaire for cells that are marked in the format: X, V?
The difference between the two cells in terms of the number of black pixels feels way too high.
This is possibly the result of the cell borders being included in a given cell.
I wonder what countBlackPixels(imageCell) returns for an empty cell.
Maybe we can set a threshold from that value to decide whether or not a cell is checked.
If that does not look promising, maybe we can rewrite countBlackPixels() to look specifically at the center area of the cell.
Something like this:
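(The snippet originally posted here has not survived; the following is a minimal reconstruction of the idea, with the margin fraction as an illustrative parameter.)

import cv2
import numpy as np

def countBlackPixelsCenter(imageCell, margin=0.25):
    # Skip a `margin` fraction on every side so the cell borders
    # never enter the count; only the centre of the cell is inspected.
    h, w = imageCell.shape[:2]
    dy, dx = int(h * margin), int(w * margin)
    center = imageCell[dy:h - dy, dx:w - dx]
    lowerBlack = np.array([0, 0, 0], np.uint8)
    blackRange = cv2.inRange(center, lowerBlack, lowerBlack)
    return cv2.countNonZero(blackRange)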
Waiting to hear from you!
Actually, I have to find the number of text lines in a given image. For example, suppose I have the following two images:
from PIL import ImageGrab
img1=ImageGrab.grab([0,0,200,80])
img2=ImageGrab.grab([300,0,500,80])
The first one is img1, and the second one is img2.
How can I get the number of text lines in an image, so that it outputs 5 for img1, and 4 for img2?
If you want to do this without OCR-ing the text, the typical approach is to determine, for each row of pixels in the image, whether it contains one color or more than one.
Rows with a single color can be assumed to be background; any transition from more than one color back to a single color marks the bottom of a text row. Count those transitions and you'll have the number of lines of text in the image (see the sketch after the assumptions below).
This assumes:
characters of one line do not extend completely to the bottom of the cell they are drawn in (otherwise there might never be an empty row, e.g. if the top line has a g and the bottom one an f, or similar configurations)
there is only text and no pictures (as in your samples).
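A minimal sketch of this row-transition idea, assuming a grayscale image whose background rows are perfectly uniform (the file path is illustrative):

import numpy as np
from PIL import Image

def count_text_lines(path):
    img = np.array(Image.open(path).convert('L'))
    # A row is "background" if it contains a single pixel value.
    multi = np.array([len(np.unique(row)) > 1 for row in img])
    # Each transition from a multi-color row to a single-color row
    # is the bottom of one text line.
    return int(np.sum(multi[:-1] & ~multi[1:]))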
You can find the number of lines in a text image using OpenCV:
import cv2

grayscale = cv2.cvtColor(your_text_image, cv2.COLOR_BGR2GRAY)
# converting to binary image
_, binary = cv2.threshold(grayscale, 0, 255, cv2.THRESH_OTSU)
# inverting to have white text on black background
binary = 255 - binary
# calculating the y-axis histogram (average value of each row)
hist = cv2.reduce(binary, 1, cv2.REDUCE_AVG).reshape(-1)
# append every y position corresponding to the bottom of a text line
h = binary.shape[0]
lines = []
for y in range(h - 1):
    if hist[y + 1] <= 2 < hist[y]:
        lines.append(y)
number_of_lines = len(lines)
First, threshold the image.
Calculate the mean pixel value of each row, horizontally (top to bottom).
After getting all the values, find the transitions/significant gaps. A significant gap of white rows separates two lines (you need to decide a white-pixel threshold: how many white rows must lie between two lines).
The number of contiguous black-pixel clusters is your answer.
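A minimal sketch of that recipe (the white-row gap is implicit here: any run of pure-white rows separates two clusters; the file path is illustrative):

import cv2
import numpy as np

def count_text_rows(image_path):
    # Threshold so text is black (0) on a white (255) background.
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_OTSU)
    # Mean pixel value of each row, top to bottom.
    row_means = binary.mean(axis=1)
    # Rows that are not pure white contain at least one black pixel.
    text_rows = row_means < 255
    # Every background -> text transition starts a new cluster.
    starts = np.sum(~text_rows[:-1] & text_rows[1:]) + int(text_rows[0])
    return int(starts)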
I am trying to paste an image onto another one. I am actually using the second answer by Joseph here because I am trying to do something very similar: resize my foreground to the background image, and then copy only the black pixels of the foreground onto the background. My foreground is a color image with black contours, and I want only the contours to be pasted on the background. The line
mask = pixel_filter(mask, (0, 0, 0), (0, 0, 0, 255), (0, 0, 0, 0))
returns the error "image index out of range".
When I skip this filtering step to see if pasting at least works, I get a "bad mask transparency" error. I have set the background and foreground to both RGB and RGBA to see if any combination solves the problem; it doesn't.
What am I doing wrong in the pixel_filter() line, and what am I missing about the paste process? Thanks for any help.
The pixel_filter function you are referencing has a slight bug: it converts a 1-dimensional list index into a 2-D coordinate incorrectly. getdata() returns pixels row by row, so the mapping should be index => (x, y) = (index % width, index // width), with integer division (see here). Below is the function (full attribution to the original author) rewritten.
from PIL import Image

def pixel_filter(image, condition, true_colour, false_colour):
    width = image.size[0]
    filtered = Image.new("RGBA", image.size)
    pixels = list(image.getdata())
    for index, colour in enumerate(pixels):
        # getdata() is row-major: x = index % width, y = index // width
        if colour == condition:
            filtered.putpixel((index % width, index // width), true_colour)
        else:
            filtered.putpixel((index % width, index // width), false_colour)
    return filtered
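With that fix in place, a hedged sketch of the paste step itself (file names are illustrative): resize the foreground to the background, build the mask, and pass it as the transparency argument to paste(), so only the black contours are copied.

from PIL import Image

fg = Image.open('foreground.png').convert('RGB')
bg = Image.open('background.png').convert('RGBA')
fg = fg.resize(bg.size)

# Opaque where the foreground is black, fully transparent elsewhere.
mask = pixel_filter(fg, (0, 0, 0), (0, 0, 0, 255), (0, 0, 0, 0))

# paste() uses the alpha channel of an RGBA mask as per-pixel transparency.
bg.paste(mask, (0, 0), mask)
bg.save('combined.png')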
I'm working on a program in which I need to separate spritesheets, or in other words, separate an image into contiguous regions of color.
I've never done any image processing before, so I'm wondering how I would go about this. What would I do after I test for pixel color? What's the best way to determine which pixel goes with each sprite?
All the input images have uniform backgrounds, and an alpha channel different from that of the background counts as color. The order of the output images needs to be left-right, up-down. My project is written in PySide, so I'm hoping to use it for this task too, but I could import more libraries if necessary.
Thanks for your replies!
P.S.:
I'm not sure if the PySide tag is appropriate or not, since I'm using PySide, but the question doesn't involve the GUI aspects of it. If a mod feels it doesn't belong, feel free to remove it.
For example, I have a spritesheet that looks like this:
I want to separate it into these:
That sounds like something that should be implemented in anything that deals with sprites, but here we will implement our own sprite-splitter.
The first thing we need here is to extract the individual objects. In this situation, it is only a matter of deciding whether a pixel is a background one or not. If we assume the point at origin is a background pixel, then we are done:
from PIL import Image

def sprite_mask(img, bg_point=(0, 0)):
    width, height = img.size
    im = img.load()
    bg = im[bg_point]
    mask_img = Image.new('L', img.size)
    mask = mask_img.load()
    for x in range(width):
        for y in range(height):
            if im[x, y] != bg:
                mask[x, y] = 255
    return mask_img, bg
If you save the mask image created above and open it, here is what you would see on it (I added a rectangle inside your empty window):
With the image above, the next thing we need is to fill its holes if we want to join sprites that are inside others (like the rectangle added, see figure above). This is another simple rule: if a point cannot be reached from the point at [0, 0], then it is a hole and it must be filled.

All that is left then is separating each sprite into an individual image. This is done by connected component labeling. For each component we get its axis-aligned bounding box in order to define the dimensions of the piece, and then we copy from the original image the points that belong to the given component. To keep it short, the following code uses scipy for these tasks:
import sys

import numpy
from scipy.ndimage import label, morphology

def split_sprite(img, mask, bg, join_interior=True, basename='sprite_%d.png'):
    im = img.load()
    m = numpy.array(mask, dtype=numpy.uint8)
    if join_interior:
        # Fill holes so sprites enclosing background pixels stay in one piece.
        m = morphology.binary_fill_holes(m)
    lbl, ncc = label(m, numpy.ones((3, 3)))
    for i in range(1, ncc + 1):
        px, py = numpy.nonzero(lbl == i)
        xmin, xmax, ymin, ymax = px.min(), px.max(), py.min(), py.max()
        sprite = Image.new(img.mode, (ymax - ymin + 1, xmax - xmin + 1), bg)
        sp = sprite.load()
        for x, y in zip(px, py):
            x, y = int(x), int(y)
            sp[y - int(ymin), x - int(xmin)] = im[y, x]
        name = basename % i
        sprite.save(name)
        print("Wrote %s" % name)

sprite = Image.open(sys.argv[1])
mask, bg = sprite_mask(sprite)
split_sprite(sprite, mask, bg)
Now you have all the pieces (sprite_1.png, sprite_2.png, ..., sprite_8.png) exactly as you included in the question.