Use Python / PIL or similar to shrink whitespace - python

Any ideas how to use Python with the PIL module to shrink select all? I know this can be achieved with Gimp. I'm trying to package my app as small as possible, a GIMP install is not an option for the EU.
Say you have 2 images, one is 400x500, other is 200x100. They both are white with a 100x100 textblock somewhere within each image's boundaries. What I'm trying to do is automatically strip the whitespace around that text, load that 100x100 image textblock into a variable for further text extraction.
It's obviously not this simple, so just running the text extraction on the whole image won't work! I just wanted to query about the basic process. There is not much available on Google about this topic. If solved, perhaps it could help someone else as well...
Thanks for reading!

If you put the image into a numpy array, it's simple to find the edges which you can use PIL to crop. Here I'm assuming that the whitespace is the color (255,255,255), you can adjust to your needs:
from PIL import Image
import numpy as np
im = Image.open("test.png")
pix = np.asarray(im)
pix = pix[:,:,0:3] # Drop the alpha channel
idx = np.where(pix-255)[0:2] # Drop the color when finding edges
box = map(min,idx)[::-1] + map(max,idx)[::-1]
region = im.crop(box)
region_pix = np.asarray(region)
To show what the results look like, I've left the axis labels on so you can see the size of the box region:
from pylab import *
subplot(121)
imshow(pix)
subplot(122)
imshow(region_pix)
show()

The general algorithmn would be to find the color of the top left pixel, and then do a spiral scan inwards until you find a pixel not of that color. That will define one edge of your bounding box. Keep scanning until you hit one more of each edge.

http://blog.damiles.com/2008/11/basic-ocr-in-opencv/
might be of some help. You can use the simple bounding box method described in that tutorial or #Tyler Eaves spiral suggestion which works equally as well

Related

How do I crop multiple images from one picture based on the background color?

So I have an image and I want to cut it up into multiple images to feed into OCR to read.
image example
I only want the messages with the white bubbles and exclude anything with the grey bubbles. I can't figure out how to make a loop to separate each white bubble.
import numpy as np
from PIL import ImageGrab, Image, ImageFilter
img = Image.open('test1.png').convert('RGB')
na = np.array(img)
orig = na.copy()
img = img.filter(ImageFilter.MedianFilter(3))
whiteY, whiteX = np.where(np.all(na==[255,255,255],axis=2))
top, bottom = whiteY[1], whiteY[-1]
left, right = whiteX[1], whiteX[-1]
You could try using the opencv threshold function, followed by the findContours function. This will, if you threshold the image correctly, give you the 'borders' of the bubbles above. Using that, you could then crop out each text bubble.
Here's a simple example of contours being used:
https://www.geeksforgeeks.org/find-and-draw-contours-using-opencv-python/
Otherwise if you'd like to understand better how the opencv functions I mentioned or those that are used in the article above, have a look at the opencv documentation.

Pixels retrieved from Viewer Node within Blender are darker than the actually rendered image... Why?

I am trying to create a pipeline in which I first render an image using the blender python API (I am using Blender 2.90) and then perform some image processing in python. I want to fetch the image directly from blender without first writing the rendered image to disk and then loading it again. I ran the following code within the blender GUI to do so:
import bpy
import numpy as np
import PIL.Image as Image
from skimage.util import img_as_ubyte
resolution_x = 512
resolution_y = 512
# render settings
scene = bpy.context.scene
scene.render.engine = 'BLENDER_EEVEE'
scene.render.resolution_x = resolution_x
scene.render.resolution_y = resolution_y
scene.render.image_settings.file_format = 'PNG'
scene.render.filepath = "path/to/good_image.png"
# create Viewer Layer in Compositor
scene.use_nodes = True
tree = scene.node_tree
nodes = tree.nodes
links = tree.links
for node in nodes:
nodes.remove(node)
render_layer_node = nodes.new('CompositorNodeRLayers')
viewer_node = nodes.new('CompositorNodeViewer')
links.new(viewer_node.inputs[0], render_layer_node.outputs[0])
# render scene and get pixels from Viewer Node
bpy.ops.render.render(write_still=True)
pixels = bpy.data.images['Viewer Node'].pixels
# do some processing and save
img = np.flip(img_as_ubyte(np.array(pixels[:]).reshape((resolution_y, resolution_x, 4))), axis=0)
Image.fromarray(img).save("path/to/bad_image.png")
Problem: The image I get from the Viewer Node is much darker (bad image) than the image saved in the conventional way (good image). Does anyone have an idea why this happens and how to fix it? Does blender maybe treat pixel values differently than I expect?
Some additional information:
Before conversion to uint8, the values of the alpha channel within the dark image are 1.0 (as they actually should be). Background values in the dark image are not 0.0 or negative (as one might guess from appearance), but 0.05...
What I tried:
I thought that pixels might be scaled within range -1 to 1, so I rescaled the pixels to range 0 to 1 before transforming to uint8... Did not lead to the correct image either :(
It's because the image that you get from the Viewer Node is the one "straight from compositing" before color management takes place. You can have a look at the documentation here: this image is still in the linear space.
Your good_image.png on the other hand is obtained after transformation into the "Display Space" (see diagram in the doc). Hence it was transformed into a log-space, maybe gamma-corrected, etc.
Finally, you can get an image that is close to (but slightly different though) to the good image from the viewer node by calling bpy.data.images['Viewer Node'].save_render(filepath) instead, but there is no way to directly extract the color-managed version without rendering to a file first. You can probably do it yourself by adding PyOpenColorIO to your script and applying the color management from this module.

Programming a picture maker template in Python possible?

I'm looking for a library that enables to "create pictures" (or even videos) with the following functions:
Accepting picture inputs
Resizing said inputs to fit given template / scheme
Positioning the pictures in pre-set up layers or coordinates
A rather schematic approach to look at this:
whereas the red spots are supposed to represent e.g. text, picture (or if possible video) elements.
The end goal would be to give the .py script multiple input pictures and the .py creating a finished version like mentioned above.
Solutions I tried were looking into Python PIL, but I wasn't able to find what I was looking for.
Yes, it is possible to do this with Python.
The library you are looking for is OpenCV([https://opencv.org][1]/).
Some basic OpenCV python tutorials (https://docs.opencv.org/master/d9/df8/tutorial_root.html).
1) You can use imread() function to read images from files.
2) You can use resize() function to resize the images.
3) You can create a empty master numpy array matching the size and depth(color depth) of the black rectangle in the figure you have shown, resize your image and copy the contents into the empty array starting from the position you want.
Below is a sample code which does something close to what you might need, you can modify this to suit your actual needs. (Since your requirements are not clear I have written the code like this so that it can at least guide you.)
import numpy as np
import cv2
import matplotlib.pyplot as plt
# You can store most of these values in another file and load them.
# You can modify this to set the dimensions of the background image.
BG_IMAGE_WIDTH = 100
BG_IMAGE_HEIGHT = 100
BG_IMAGE_COLOR_DEPTH = 3
# This will act as the black bounding box you have shown in your figure.
# You can also load another image instead of creating empty background image.
empty_background_image = np.zeros(
(BG_IMAGE_HEIGHT, BG_IMAGE_WIDTH, BG_IMAGE_COLOR_DEPTH),
dtype=np.int
)
# Loading an image.
# This will be copied later into one of those red boxes you have shown.
IMAGE_PATH = "./image1.jpg"
foreground_image = cv2.imread(IMAGE_PATH)
# Setting the resize target and top left position with respect to bg image.
X_POS = 4
Y_POS = 10
RESIZE_TARGET_WIDTH = 30
RESIZE_TARGET_HEIGHT = 30
# Resizing
foreground_image= cv2.resize(
src=foreground_image,
dsize=(RESIZE_TARGET_WIDTH, RESIZE_TARGET_HEIGHT),
)
# Copying this into background image
empty_background_image[
Y_POS: Y_POS + RESIZE_TARGET_HEIGHT,
X_POS: X_POS + RESIZE_TARGET_WIDTH
] = foreground_image
plt.imshow(empty_background_image)
plt.show()

How to present numpy array into pygame surface?

I'm writing a code that part of it is reading an image source and displaying it on the screen for the user to interact with. I also need the sharpened image data. I use the following to read the data and display it in pyGame
def image_and_sharpen_array(file_name):
#read the image data and return it, with the sharpened image
image = misc.imread(file_name)
blurred = ndimage.gaussian_filter(image,3)
edge = ndimage.gaussian_filter(blurred,1)
alpha = 20
out = blurred + alpha*(blurred - edge)
return image,out
#get image data
scan,sharpen = image_and_sharpen_array('foo.jpg')
w,h,c = scan.shape
#setting up pygame
pygame.init()
screen = pygame.display.set_mode((w,h))
pygame.surfarray.blit_array(screen,scan)
pygame.display.update()
And the image is displayed on the screen only rotated and inverted. Is this due to differences between misc.imread and pyGame? Or is this due to something wrong in my code?
Is there other way to do this? The majority of solution I read involved saving the figure and then reading it with ``pyGame''.
I often use the numpy swapaxes() method:
In this case we only need to invert x and y axis (axis number 0 and 1) before displaying our array :
return image.swapaxes(0,1),out
I thought technico provided a good solution - just a little lean on info. Assuming get_arr() is a function that returns the pixel array:
pixl_arr = get_arr()
pixl_arr = numpy.swapaxes(pixl_arr, 0, 1)
new_surf = pygame.pixelcopy.make_surface(pixl_arr)
screen.blit(new_surf, (dest_x, dest_y))
Alternatively, if you know that the image will always be of the same dimensions (as in iterating through frames of a video or gif file), it would be more efficient to reuse the same surface:
pixl_arr = get_arr()
pixl_arr = numpy.swapaxes(pixl_arr, 0, 1)
pygame.pixelcopy.array_to_surface(old_surf, pixl_arr)
screen.blit(old_surf, (dest_x, dest_y))
YMMV, but so far this is working well for me.
Every lib has its own way of interpreting image arrays. By 'rotated' I suppose you mean transposed. That's the way PyGame shows up numpy arrays. There are many ways to make it look 'correct'. Actually there are many ways even to show up the array, which gives you full control over channel representation and so on. In pygame version 1.9.2, this is the fastest array rendering that I could ever achieve. (Note for earlier version this will not work!).
This function will fill the surface with array:
def put_array(surface, myarr): # put array into surface
bv = surface.get_view("0")
bv.write(myarr.tostring())
If that is not working, use this, should work everywhere:
# put array data into a pygame surface
def put_arr(surface, myarr):
bv = surface.get_buffer()
bv.write(myarr.tostring(), 0)
You probably still get not what you want, so it is transposed or have swapped color channels. The idea is, manage your arrays in that form, which suites this surface buffer. To find out what is correct channel order and axes order, use openCV library (cv2.imread(filename)). With openCV you open images in BGR order as standard, and it has a lot of conversion functions. If I remember correctly, when writing directly to surface buffer, BGR is the correct order for 24 bit and BGRA for a 32 bit surface. So you can try to put the image array which you get out of file with this function and blit to the screen.
There are other ways to draw arrays e.g. here is whole set of helper functions http://www.pygame.org/docs/ref/surfarray.html
But I would not recommend using it, since surfaces are not for direct pixel manipulating, you will probably get lost in references.
Small tip: To do 'signalling test' use a picture, like this. So you will immediately see if something is wrong, just load as array and try to render.
My suggestion is to use the pygame.transform module. There are the flip and rotate methods, which you can use to however your transformation is. Look up the docs on this.
My recommendation is to save the output image to a new Surface, and then apply the transformations, and blit to the display.
temp_surf = pygame.Surface((w,h))
pygame.surfarray.blit(temp_surf, scan)
'''transform temp_surf'''
screen.blit(temp_surf, (0,0))
I have no idea why this is. It is probably something to do with the order in which the axes are transferred from a 2d array to a pygame Surface.

Using PIL to fill empty image space with nearby colors (aka inpainting)

I create an image with PIL:
I need to fill in the empty space (depicted as black). I could easily fill it with a static color, but what I'd like to do is fill the pixels in with nearby colors. For example, the first pixel after the border might be a Gaussian blur of the filled-in pixels. Or perhaps a push-pull type algorithm described in The Lumigraph, Gortler, et al..
I need something that is not too slow because I have to run this on many images. I have access to other libraries, like numpy, and you can assume that I know the borders or a mask of the outside region or inside region. Any suggestions on how to approach this?
UPDATE:
As suggested by belisarius, opencv's inpaint method is perfect for this. Here's some python code that uses opencv to achieve what I wanted:
import Image, ImageDraw, cv
im = Image.open("u7XVL.png")
pix = im.load()
#create a mask of the background colors
# this is slow, but easy for example purposes
mask = Image.new('L', im.size)
maskdraw = ImageDraw.Draw(mask)
for x in range(im.size[0]):
for y in range(im.size[1]):
if pix[(x,y)] == (0,0,0):
maskdraw.point((x,y), 255)
#convert image and mask to opencv format
cv_im = cv.CreateImageHeader(im.size, cv.IPL_DEPTH_8U, 3)
cv.SetData(cv_im, im.tostring())
cv_mask = cv.CreateImageHeader(mask.size, cv.IPL_DEPTH_8U, 1)
cv.SetData(cv_mask, mask.tostring())
#do the inpainting
cv_painted_im = cv.CloneImage(cv_im)
cv.Inpaint(cv_im, cv_mask, cv_painted_im, 3, cv.CV_INPAINT_NS)
#convert back to PIL
painted_im = Image.fromstring("RGB", cv.GetSize(cv_painted_im), cv_painted_im.tostring())
painted_im.show()
And the resulting image:
A method with nice results is the Navier-Stokes Image Restoration. I know OpenCV has it, don't know about PIL.
Your example:
I did it with Mathematica.
Edit
As per your reuquest, the code is:
i = Import["http://i.stack.imgur.com/uEPqc.png"];
Inpaint[i, ColorNegate#Binarize#i, Method -> "NavierStokes"]
The ColorNegate# ... part creates the replacement mask.
The filling is done with just the Inpaint[] command.
Depending on how you're deploying this application, another option might be to use the Gimp's python interface to do the image manipulation.
The doc page I linked to is oriented more towards writing GIMP plugins in python, rather than interacting with a background gimp instance from a python app, but I'm pretty sure that's also possible (it's been a while since I played with the gimp/python interface, I'm a little hazy).
You can also create the mask with the function CreateImage(), for instance:
inpaint_mask = cv.CreateImage(cv.GetSize(im), 8, 1)

Categories