Accounting for 'i' and 'j' dots in OCR in Python

I am trying to create an OCR system in python - the first part involves extracting all characters from an image. This works fine and all characters are separated into their own bounding boxes.
Code attached below:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
from scipy.misc import imread, imresize
from skimage.segmentation import clear_border
from skimage.morphology import label
from skimage.measure import regionprops

# load as grayscale and binarize (text is dark on a light background)
image = imread('./ocr/testing/adobe.png', 1)
bw = image < 120

# remove components touching the border, then label connected components
cleared = bw.copy()
clear_border(cleared)
label_image = label(cleared, neighbors=8)
borders = np.logical_xor(bw, cleared)
label_image[borders] = -1
print(label_image.max())

# draw a bounding box around every sufficiently large component
fig, ax = plt.subplots(ncols=1, nrows=1, figsize=(6, 6))
ax.imshow(bw, cmap='jet')
for region in regionprops(label_image):
    if region.area > 20:
        minr, minc, maxr, maxc = region.bbox
        rect = mpatches.Rectangle((minc, minr), maxc - minc, maxr - minr,
                                  fill=False, edgecolor='red', linewidth=2)
        ax.add_patch(rect)
plt.show()
However, since the letters i and j have dots on top of them, the code treats the dots as separate bounding boxes. I am using the regionprops library. Would it also be a good idea to resize and normalise each bounding box?
How would I modify this code to account for i and j? My understanding is that I would need to merge bounding boxes that are close by. I tried that with no luck. Thanks.

Yes, you generally want to normalize the content of your bounding boxes to fit your character classifier's input dimensions (assuming you are working on character classifiers with explicit segmentation and not with sequence classifiers segmenting implicitly).
For merging vertically isolated CCs of the same letter, e.g. i and j, I'd try an anisotropic Gaussian filter (very small sigma in x-direction, larger in y-direction). The exact parameterization will depend on your input data, but it should be easy to find a suitable value experimentally such that all letters result in exactly one CC.
An alternative would be to analyze CCs which exhibit horizontal overlap with other CCs and merge those pairs where the overlap exceeds a certain relative threshold.
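A rough sketch of that overlap-based merging, assuming the label_image from the question's code (the helper name and the 0.5 relative-overlap threshold are illustrative choices, not tested values):
from skimage.measure import regionprops

def merge_overlapping_boxes(label_image, min_overlap=0.5):
    """Merge bounding boxes whose column ranges overlap (e.g. an i-dot above its stem)."""
    boxes = [r.bbox for r in regionprops(label_image)]  # (minr, minc, maxr, maxc)
    merged = True
    while merged:
        merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                r1, c1, r2, c2 = boxes[i]
                R1, C1, R2, C2 = boxes[j]
                overlap = min(c2, C2) - max(c1, C1)      # shared width in columns
                smaller = min(c2 - c1, C2 - C1)
                if overlap > 0 and overlap / smaller >= min_overlap:
                    # replace the two boxes by their union and start over
                    boxes[i] = (min(r1, R1), min(c1, C1), max(r2, R2), max(c2, C2))
                    del boxes[j]
                    merged = True
                    break
            if merged:
                break
    return boxes
You would then draw or crop the merged boxes instead of the per-region ones.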
Illustrating the Gaussian approach on the given example:
# Anisotropic Gaussian: blur along the vertical (row) axis only
from scipy.ndimage import gaussian_filter
filtered = gaussian_filter(f, (2, 0))   # f is the grayscale input image
plt.imshow(filtered, cmap=plt.cm.gray)
# Now threshold
binarized = filtered < 1
plt.imshow(binarized, cmap=plt.cm.gray)
It's easy to see that each character is now represented by exactly one CC. Now we pretty much only have to apply each mask and crop away the white areas to end up with a bounding box for each character. After normalizing their size we can feed them directly to the classifier. Note that we lose ascender/descender line information as well as the width/height ratio, and those may be useful features for the classifier, so it can help to feed them to the classifier explicitly in addition to the normalized bounding-box content.
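For the normalization step, a minimal sketch, assuming the labelled components and binary image from the question's code and an arbitrary 32x32 target size (use whatever input size your classifier expects):
import numpy as np
from skimage.transform import resize
from skimage.measure import regionprops

def extract_normalized_chars(label_image, bw, out_size=(32, 32)):
    """Crop each labelled component's bounding box and rescale it to a fixed size."""
    chars = []
    for region in regionprops(label_image):
        minr, minc, maxr, maxc = region.bbox
        crop = bw[minr:maxr, minc:maxc].astype(float)
        # anti_aliasing smooths the binary crop before resampling
        chars.append(resize(crop, out_size, anti_aliasing=True))
    return np.stack(chars) if chars else np.empty((0,) + tuple(out_size))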


How to remove white lines from an image in Python [duplicate]

I have an image of skin colour with a repetitive pattern (horizontal white lines) generated by a scanner that uses a line of sensors to capture the photo.
My question is how to denoise the image effectively using the FFT without affecting its quality much. Somebody told me that I have to suppress the lines that appear in the magnitude spectrum manually, but I don't know how to do that. Can you please tell me how?
My approach is to use the Fast Fourier Transform (FFT) to denoise the image channel by channel.
I have tried an HPF and an LPF in the Fourier domain, but the results were not good, as you can see:
My Code:
from skimage.io import imread, imsave
from matplotlib import pyplot as plt
import numpy as np
img = imread('skin.jpg')
# skimage.io.imread returns channels in RGB order
R = img[..., 0]
G = img[..., 1]
B = img[..., 2]

f1 = np.fft.fft2(R)
fshift1 = np.fft.fftshift(f1)
phase_spectrumR = np.angle(fshift1)
magnitude_spectrumR = 20*np.log(np.abs(fshift1))

f2 = np.fft.fft2(G)
fshift2 = np.fft.fftshift(f2)
phase_spectrumG = np.angle(fshift2)
magnitude_spectrumG = 20*np.log(np.abs(fshift2))

f3 = np.fft.fft2(B)
fshift3 = np.fft.fftshift(f3)
phase_spectrumB = np.angle(fshift3)
magnitude_spectrumB = 20*np.log(np.abs(fshift3))

#===============================
# LPF: keep only the low-frequency centre of the shifted spectrum.
# (For the HPF variant, start from fshift1 and zero out the centre instead.)
magR = np.zeros_like(R)
magR[magR.shape[0]//4:3*magR.shape[0]//4,
     magR.shape[1]//4:3*magR.shape[1]//4] = np.abs(
         fshift1[magR.shape[0]//4:3*magR.shape[0]//4,
                 magR.shape[1]//4:3*magR.shape[1]//4])
resR = np.abs(np.fft.ifft2(np.fft.ifftshift(magR)))
resR = R - resR
#===============================
plt.subplot(221)
plt.imshow(R, cmap='gray')
plt.title('Original')
plt.subplot(222)
plt.imshow(magnitude_spectrumR, cmap='gray')
plt.title('Magnitude Spectrum')
plt.subplot(223)
plt.imshow(phase_spectrumR, cmap='gray')
plt.title('Phase Spectrum')
plt.subplot(224)
plt.imshow(resR, cmap='gray')
plt.title('Processed')
plt.show()
Here is a simple and effective linear filtering strategy to remove the horizontal line artifact:
Outline:
1. Estimate the frequency of the distortion by looking for a peak in the image's power spectrum in the vertical dimension. The function scipy.signal.welch is useful for this.
2. Design two filters: a highpass filter with cutoff just below the distortion frequency and a lowpass filter with cutoff near DC. We'll apply the highpass filter vertically and the lowpass filter horizontally to try to isolate the distortion. We'll use scipy.signal.firwin to design these filters, though there are many ways this could be done.
3. Compute the restored image as "image − (hpf ⊗ lpf) ∗ image".
Code:
# Copyright 2021 Google LLC.
# SPDX-License-Identifier: Apache-2.0
import numpy as np
from scipy.ndimage import convolve1d
from scipy.signal import firwin, welch
def remove_lines(image, distortion_freq=None, num_taps=65, eps=0.025):
  """Removes horizontal line artifacts from a scanned image.

  Args:
    image: 2D or 3D array.
    distortion_freq: Float, distortion frequency in cycles/pixel, or
      `None` to estimate from the spectrum.
    num_taps: Integer, number of filter taps to use in each dimension.
    eps: Small positive param to adjust the filter cutoffs (cycles/pixel).

  Returns:
    Denoised image.
  """
  image = np.asarray(image, float)
  if distortion_freq is None:
    distortion_freq = estimate_distortion_freq(image)

  hpf = firwin(num_taps, distortion_freq - eps, pass_zero='highpass', fs=1)
  lpf = firwin(num_taps, eps, pass_zero='lowpass', fs=1)
  return image - convolve1d(convolve1d(image, hpf, axis=0), lpf, axis=1)


def estimate_distortion_freq(image, min_frequency=1/25):
  """Estimates the distortion frequency as a spectral peak in the vertical dim."""
  f, pxx = welch(np.reshape(image, (len(image), -1), 'C').sum(axis=1))
  pxx[f < min_frequency] = 0.0
  return f[pxx.argmax()]
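A minimal usage sketch for the functions above, assuming the scan is stored as skin.jpg:
import numpy as np
from skimage.io import imread, imsave

scan = imread('skin.jpg')            # RGB scan with horizontal line artifacts
restored = remove_lines(scan)        # the distortion frequency is estimated automatically
imsave('skin_restored.jpg', np.clip(restored, 0, 255).astype(np.uint8))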
Examples:
On the portrait image, estimate_distortion_freq estimates that the frequency of the distortion is 0.1094 cycles/pixel (period of 9.14 pixels). The transfer function of the filtering "image − (hpf ⊗ lpf) ∗ image" looks like this:
Here is the filtered output from remove_lines:
On the skin image, estimate_distortion_freq estimates that the frequency of the distortion is 0.08333 cycles/pixel (period of 12.0 pixels). Filtered output from remove_lines:
The distortion is mostly removed in both examples. It isn't perfect: on the portrait image, a couple of ripples are still visible near the top and bottom borders, a typical defect when using large filters or Fourier methods. Still, it's a good improvement over the original images.

Extract street network from a raster image

I have a 512x512 image of a street grid:
I'd like to extract polylines for each of the streets in this image (large blue dots = intersections, small blue dots = points along polylines):
I've tried a few techniques! One idea was to start with skeletonize to compress the streets down to 1px wide lines:
from skimage import morphology
morphology.skeletonize(streets_data)
Unfortunately this has some gaps that break the connectivity of the street network; I'm not entirely sure why, but my guess is that this is because some of the streets are 1px narrower in some places and 1px wider in others. (update: the gaps aren't real; they're entirely artifacts of how I was displaying the skeleton. See this comment for the sad tale. The skeleton is well-connected.)
I can patch these using a binary_dilation, at the cost of making the streets somewhat variable width again:
out = morphology.skeletonize(streets_data)
out = morphology.binary_dilation(out, morphology.selem.disk(1))
With a re-connected grid, I can run the Hough transform to find line segments:
import cv2
rho = 1 # distance resolution in pixels of the Hough grid
theta = np.pi / 180 # angular resolution in radians of the Hough grid
threshold = 8 # minimum number of votes (intersections in Hough grid cell)
min_line_length = 10 # minimum number of pixels making up a line
max_line_gap = 2 # maximum gap in pixels between connectable line segments
# Run Hough on edge detected image
# Output "lines" is an array containing endpoints of detected line segments
lines = cv2.HoughLinesP(
    out, rho, theta, threshold, np.array([]),
    min_line_length, max_line_gap
)
line_image = streets_data.copy()
for i, line in enumerate(lines):
    for x1, y1, x2, y2 in line:
        cv2.line(line_image, (x1, y1), (x2, y2), 2, 1)
This produces a whole jumble of overlapping line segments, along with some gaps (look at the T intersection on the right side):
At this point I could try to de-dupe overlapping line segments, but it's not really clear to me that this is a path towards a solution, especially given that gap.
Are there more direct methods available to get at the network of polylines I'm looking for? In particular, what are some methods for:
Finding the intersections (both four-way and T intersections).
Shrinking the streets to all be 1px wide, allowing that there may be some variable width.
Finding the polylines between intersections.
If you want to improve your "skeletonization", you could try the following algorithm to obtain the "1-px wide streets":
import imageio
import numpy as np
from matplotlib import pyplot as plt
from scipy.ndimage import distance_transform_edt
from skimage.segmentation import watershed
# read image
image_rgb = imageio.imread('1mYBD.png')
# convert to binary
image_bin = np.max(image_rgb, axis=2) > 0
# compute the distance transform (only > 0)
distance = distance_transform_edt(image_bin)
# segment the image into "cells" (i.e. the reciprocal of the network)
cells = watershed(distance)
# compute the image gradients
grad_v = np.pad(cells[1:, :] - cells[:-1, :], ((0, 1), (0, 0)))
grad_h = np.pad(cells[:, 1:] - cells[:, :-1], ((0, 0), (0, 1)))
# given that the cells have a constant value,
# only the edges will have non-zero gradient
edges = (abs(grad_v) > 0) + (abs(grad_h) > 0)
# extract points into (x, y) coordinate pairs
pos_v, pos_h = np.nonzero(edges)
# display points on top of image
plt.imshow(image_bin, cmap='gray_r')
plt.scatter(pos_h, pos_v, 1, np.arange(pos_h.size), cmap='Spectral')
The algorithm works on the "blocks" rather than the "streets"; take a look at the cells image:
I was on the right track with skeleton; it does produce a connected, 1px wide version of the street grid. It's just that there was a bug in my display code (see this comment). Here's what the skeleton actually looks like:
from skimage import morphology
morphology.skeletonize(streets_data)
From here the reference to NEFI2 in RJ Adriaansen's comment was extremely helpful. I wasn't able to get an extraction pipeline to run on my image using their GUI, but I was able to cobble together something using their code and approach.
Here's the general procedure I wound up with:
1. Find candidate nodes. These are pixels in the skeleton with either one neighbor (the end of a line) or 3+ neighbors (an intersection). NEFI2 calls this the Zhang-Suen algorithm. For four-way intersections it produces multiple nodes, so we need to merge them (a rough sketch of this step follows the list).
2. Repeat until no two nodes are too close together:
   - Use breadth-first search (flood fill) to connect nodes.
   - If two connected nodes are within D of each other, merge them.
3. Run shapely's simplify on the paths between nodes to get polylines.
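As a rough sketch of the candidate-node step (not the code from the repo), neighbor counts on the skeleton can be computed with a single convolution; pixels with one neighbor are line endpoints and pixels with three or more neighbors are intersection candidates:
import numpy as np
from scipy.ndimage import convolve

def find_candidate_nodes(skeleton):
    """Return masks of endpoints and intersection candidates in a boolean skeleton."""
    skel = skeleton.astype(np.uint8)
    kernel = np.array([[1, 1, 1],
                       [1, 0, 1],
                       [1, 1, 1]], dtype=np.uint8)
    # number of skeleton pixels in each pixel's 8-neighborhood
    neighbor_count = convolve(skel, kernel, mode='constant', cval=0)
    endpoints = skel.astype(bool) & (neighbor_count == 1)
    intersections = skel.astype(bool) & (neighbor_count >= 3)
    return endpoints, intersections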
I put this all together in a repo here: extract-raster-network.
This works pretty well on my test images! Here are some sample images:

How to clean binary image using horizontal projection?

I want to remove anything other than text from a license plate with a binary filter.
I have the projections on each axis, but I don't know how to apply them. My idea is to erase the white outlines.
This is the image I'm working with for now:
This is the projection in Axis X:
from matplotlib import pyplot as plt
import pylab
(rows,cols)=img.shape
h_projection = np.array([ x/255/rows for x in img.sum(axis=0)])
plt.plot(range(cols), h_projection.T)
And this is the result:
As you can see in the graph, the curve shoots up at the ends because of the white contour.
How can I erase everything in the photo that exceeds a certain threshold? Any help is appreciated.
So, you want to extract the black areas within the white characters.
For example, you can select the columns (or rows) in your histograms where the value is less than a certain threshold.
from matplotlib import pyplot as plt
import numpy as np

img = plt.imread('binary_image/iXWgw.png')
(rows, cols) = img.shape

# horizontal projection: mean value of each column
h_projection = np.array([x/rows for x in img.sum(axis=0)])
threshold = (np.max(h_projection) - np.min(h_projection)) / 4
print("we will use threshold {} for horizontal".format(threshold))

# select the dark columns
black_areas = np.where(h_projection < threshold)

fig = plt.figure(figsize=(16, 8))
fig.add_subplot(121)
for j in black_areas:
    img[:, j] = 0
    plt.plot((j, j), (0, 1), 'g-')
plt.plot(range(cols), h_projection.T)

# vertical projection: mean value of each row
v_projection = np.array([x/cols for x in img.sum(axis=1)])
threshold = (np.max(v_projection) - np.min(v_projection)) / 4
print("we will use threshold {} for vertical".format(threshold))
black_areas = np.where(v_projection < threshold)

fig.add_subplot(122)
for j in black_areas:
    img[j, :] = 0
    plt.plot((0, 1), (j, j), 'g-')
plt.plot(v_projection, range(rows))
plt.show()

# show the masked image
plt.figure(figsize=(16, 12))
plt.subplot(211)
plt.title("Image with the projection mask")
plt.imshow(img)

# erode the features
import scipy.ndimage
plt.subplot(212)
plt.title("Image after erosion (suggestion)")
eroded_img = scipy.ndimage.binary_erosion(img, structure=np.ones((5, 5))).astype(img.dtype)
plt.imshow(eroded_img)
plt.show()
So now you have the horizontal and vertical projections, which look like this:
After that you can apply the mask. There are several ways of doing this; in the code above it is already applied within the for loops, where we set img[:, j] = 0 for the columns and img[j, :] = 0 for the rows. It is easy and I think intuitive, but you can look for other methods.
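For example, an equivalent vectorized version of the same masking (a sketch that assumes the h_projection and v_projection arrays from the code above):
import numpy as np

# boolean masks over columns and rows, derived from the projections
col_mask = h_projection < (np.max(h_projection) - np.min(h_projection)) / 4
row_mask = v_projection < (np.max(v_projection) - np.min(v_projection)) / 4

# zero out the selected columns and rows in one step instead of looping
img[:, col_mask] = 0
img[row_mask, :] = 0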
As a suggestion, I would say you can look into the morphological operator of erosion that can help to separate the white parts.
So the output would look like this.
Unfortunately, the upper and lower parts still show white regions. You could manually set those rows to zero (img[:10,:] = 0, img[100:,:] = 0), but that probably would not work on all the images you have (if you are trying to train a neural network, I assume you have lots of them, so you need code that works on all of them).
Since you now ask about segmentation as well, that opens another topic. Segmentation is a complex task, and it is not as straightforward as a binary mask. I would strongly suggest you read some material on it before you just apply something without understanding it. For example, here is a guide on image processing with scipy, but you may look for more.
As a suggestion and a small snippet to make it work, you can use the labeling from scipy.ndimage.
Here a small part of code (from the guide)
label_im, nb_labels = scipy.ndimage.label(eroded_img)
plt.figure(figsize=(16,12))
plt.subplot(211)
plt.title("Segmentation")
plt.imshow(label_im)
plt.subplot(212)
plt.title("One Object as an example")
plt.imshow(label_im == 6) # change number for the others!
Which will output:
As an example I showed the letter S. If you change label_im == 6 you will get the next letter. As you will see, it is not always correct, and other little pieces of the image are also considered objects, so you will have to work a little bit more on that.

Image segmentation to find cells in biological images

I have a bunch of images of cells and I want to extract where the cells are. I'm currently using circular Hough transforms and it works alright, but it screws up regularly. I'm wondering if people have any pointers. Sorry this isn't a question specifically about software; it's about how to get better performance on this image segmentation problem.
I've tried other stuff in skimage with limited success, like the contour finding, edge detection and active contours. Nothing worked well out of the box, although it could just be that I didn't fiddle with the parameters correctly. I haven't done much image segmentation, and I don't really know how this stuff works or what the best ways are to jury-rig it.
Here is the code I currently am using that takes a grayscale image as a numpy array and looks for the cell as a circle:
import cv2
import numpy as np
smallest_dim = min(img.shape)
min_rad = int(img.shape[0]*0.05)
max_rad = int(img.shape[0]*0.5) #0.5
circles = cv2.HoughCircles((img*255).astype(np.uint8), cv2.HOUGH_GRADIENT, 1, 50,
                           param1=50, param2=30, minRadius=min_rad, maxRadius=max_rad)
circles = np.uint16(np.around(circles))
x, y, r = circles[0,:][:1][0]
Here is an example where the code found the wrong circle as the boundary of the cell. It seems like it got confused by the gunk that is surrounding the cell:
I think one issue may be the plotting of the circle (the coordinates may be wrong).
Also, as @Nicos mentioned, traditional image processing involves a lot of tweaking to make specific cases work (whereas with more recent machine-learning approaches, the tweaking is aimed at keeping models from over-training). My attempt with skimage is shown below. The radius range, the number of circles, and the edge-detection image all need to be tweaked, given the potential variation among and within images. Within this image there are, at least to me, 3 circles with varying gradient; from the Canny edge-detection image you can see that we are getting more than 3 circles, and furthermore the "illumination" seems to vary at different locations (due to this being an SEM image).
import matplotlib.pyplot as plt
import numpy as np
import imageio
from skimage import data, color
from skimage.transform import hough_circle, hough_circle_peaks
from skimage.feature import canny
from skimage.draw import circle_perimeter
from skimage.util import img_as_ubyte
!wget https://i.stack.imgur.com/2tsWw.jpg
# rgb to gray https://stackoverflow.com/a/51571053/868736
im = imageio.imread('2tsWw.jpg')
gray = lambda rgb : np.dot(rgb[... , :3] , [0.299 , 0.587, 0.114])
gray = gray(im)
image = np.array(gray[60:220,210:450])
plt.imshow(image,cmap='gray')
edges = canny(image, sigma=3,)
plt.imshow(edges,cmap='gray')
overlayimage = np.copy(image)
# https://scikit-image.org/docs/dev/auto_examples/edges/plot_circular_elliptical_hough_transform.html
hough_radii = np.arange(30, 60, 2)
hough_res = hough_circle(edges, hough_radii)
# Select the most prominent X circles
x = 1
accums, cx, cy, radii = hough_circle_peaks(hough_res, hough_radii,
                                           total_num_peaks=x)

# Draw them
fig, ax = plt.subplots(ncols=1, nrows=1, figsize=(10, 4))
#image = color.gray2rgb(image)
for center_y, center_x, radius in zip(cy, cx, radii):
    circy, circx = circle_perimeter(center_y, center_x, radius)
    overlayimage[circy, circx] = 255
print(radii)
ax.imshow(overlayimage, cmap='gray')
plt.show()

how to locate the center of a bright spot in an image?

Here is an example of the kinds of images I'll be dealing with:
(source: csverma at pages.cs.wisc.edu)
There is one bright spot on each ball. I want to locate the coordinates of the centre of the bright spot. How can I do it in Python or Matlab? The problem I'm having right now is that more than one point on the spot has the same (or roughly the same) white colour, but what I need is to find the centre of this 'cluster' of white points.
Also, for the leftmost and rightmost images, how can I find the centre of the whole circular object?
You can simply threshold the image and find the average coordinates of what is remaining. This handles the case when there are multiple values that have the same intensity. When you threshold the image, there will obviously be more than one bright white pixel, so if you want to bring it all together, find the centroid or the average coordinates to determine the centre of all of these white bright pixels. There isn't a need to filter in this particular case. Here's something to go with in MATLAB.
I've read in that image directly, converted to grayscale and cleared off the white border that surrounds each of the images. Next, I split up the image into 5 chunks, threshold the image, find the average coordinates that remain and place a dot on where each centre would be:
im = imread('http://pages.cs.wisc.edu/~csverma/CS766_09/Stereo/callight.jpg');
im = rgb2gray(im);
im = imclearborder(im);
%// Split up images and place into individual cells
split_point = floor(size(im,2) / 5);
images = mat2cell(im, size(im,1), split_point*ones(5,1));
%// Show image to place dots
imshow(im);
hold on;
%// For each image...
for idx = 1 : 5
    %// Get image
    img = images{idx};

    %// Threshold
    thresh = img > 200;

    %// Find coordinates of thresholded image
    [y,x] = find(thresh);

    %// Find average
    xmean = mean(x);
    ymean = mean(y);

    %// Place dot at centre
    %// Make sure you offset by the right number of columns
    plot(xmean + (idx-1)*split_point, ymean, 'r.', 'MarkerSize', 18);
end
I get this:
If you want a Python solution, I recommend using scikit-image combined with numpy and matplotlib for plotting. Here's the above code transcribed in Python. Note that I saved the image referenced by the link manually on disk and named it balls.jpg:
import skimage.io
import skimage.segmentation
import numpy as np
import matplotlib.pyplot as plt
# Read in the image
# Note - intensities are floating point from [0,1]
im = skimage.io.imread('balls.jpg', True)
# Threshold the image first then clear the border
im_clear = skimage.segmentation.clear_border(im > (200.0/255.0))
# Determine where to split up the image
split_point = int(im.shape[1]/5)
# Show image in figure and hold to place dots in
plt.figure()
plt.imshow(np.dstack([im,im,im]))
# For each image...
for idx in range(5):
    # Extract sub image
    img = im_clear[:, idx*split_point:(idx+1)*split_point]

    # Find coordinates of thresholded image
    y, x = np.nonzero(img)

    # Find average
    xmean = x.mean()
    ymean = y.mean()

    # Plot on figure
    plt.plot(xmean + idx*split_point, ymean, 'r.', markersize=14)

# Show image and make sure axis is removed
plt.axis('off')
plt.show()
We get this figure:
Small sidenote
I could have totally skipped the above code and used regionprops (MATLAB link, scikit-image link). You could simply threshold the image, then apply regionprops to find the centroids of each cluster of white pixels, but I figured I'd show you a more manual way so you can appreciate the algorithm and understand it for yourself.
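For reference, here is a short sketch of that regionprops route in Python, assuming the im array loaded above (untested):
from skimage.measure import label, regionprops

# threshold, label the clusters of bright pixels, and take each cluster's centroid
mask = im > (200.0 / 255.0)
labels = label(mask)
centroids = [region.centroid for region in regionprops(labels)]  # (row, col) pairs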
Hope this helps!
Use a 2D convolution and then find the point with the highest intensity. You can apply a convex, increasing non-linear function (such as exp) to the intensity values before applying the 2D convolution, to intensify the bright spots relative to the dimmer parts of the image. Something like conv2(exp(img), ker).
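A rough Python sketch of that idea (the kernel size and the exponential scaling are arbitrary choices; conv2(exp(img), ker) would be the MATLAB equivalent):
import numpy as np
from scipy.signal import convolve2d

def brightest_spot(img, ksize=15):
    """Locate the brightest spot by smoothing an exponentially boosted image."""
    boosted = np.exp(img / img.max())           # boost bright pixels relative to dim ones
    ker = np.ones((ksize, ksize)) / ksize**2    # simple averaging kernel
    response = convolve2d(boosted, ker, mode='same')
    y, x = np.unravel_index(response.argmax(), response.shape)
    return x, y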
