I'm trying to write code that takes TEM (Transmission Electron Microscope) TIFF images and computes the FFT. But I always get plain red, green, or blue images.
Here's what the RAW TEM images look like:
Here's what the FFT image should look like:
But instead I get:
Here's my code:
import numpy as np
import diplib as dip
import matplotlib.pyplot as plt
from PIL import Image
from ncempy.io import dm
img1 = dip.ImageReadTIFF('RAW_FFT.tif')
f = np.fft.fft2(img1)
f = np.fft.fftshift(f)
plt.imshow(abs(f))
plt.show()
Do you have any idea what could be the problem? I even tried to convert the image to np.array and do FFT step by step but I get the same result.
The FFT output is complex, and without a logarithmic scaling the zero-frequency (DC) component of the Fourier transform is so much brighter than all the other points that everything else appears black.
See for details: https://homepages.inf.ed.ac.uk/rbf/HIPR2/fourier.htm
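Here is a small NumPy sketch of why this happens. It uses synthetic random data, not the original TEM image. The zero-frequency (DC) coefficient equals the sum of all pixel values, and `np.fft.fftshift` moves it to the center of the array. A log scaling such as `np.log1p` then shrinks its ratio to a typical coefficient by orders of magnitude. The helper name `log_magnitude_spectrum` is made up for illustration.

```python
import numpy as np


def log_magnitude_spectrum(image):
    """Return (centered complex spectrum, log-scaled magnitude)."""
    f = np.fft.fftshift(np.fft.fft2(image))
    # log1p(x) = log(1 + x) stays finite where the magnitude is exactly 0
    return f, np.log1p(np.abs(f))


rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64)).astype(float)

f, logmag = log_magnitude_spectrum(img)
center = (img.shape[0] // 2, img.shape[1] // 2)
mag = np.abs(f)

# After fftshift the DC term sits at the center and equals the pixel sum
print(np.isclose(f[center], img.sum()))
# Brightest-to-median ratio, linear vs. log-scaled
print(mag.max() / np.median(mag), logmag.max() / np.median(logmag))
```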
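import cv2
import numpy as np
# read as single-channel grayscale (the flag 0 is cv2.IMREAD_GRAYSCALE)
img = cv2.imread('inputfolder/yourimage.jpg', cv2.IMREAD_GRAYSCALE)
def fft_image_inv(image):
    f = np.fft.fft2(image)
    fshift = np.fft.fftshift(f)  # move the zero-frequency term to the center
    # log1p(x) = log(1 + x) avoids log(0) = -inf for zero-magnitude coefficients
    magnitude_spectrum = 15*np.log1p(np.abs(fshift))
    return magnitude_spectrum
fft = fft_image_inv(img)
# JPEG needs 8-bit data, so rescale the float spectrum to 0..255 before saving
fft8 = cv2.normalize(fft, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
cv2.imwrite('outputfolder/yourimage.jpg', fft8)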
output:
There are multiple issues here. First, grayscale images are sometimes written to file as if they were RGB images. In a TIFF file, this can be as simple as storing a grayscale color map: the pixel values are interpreted as indices into the map, and the loaded image is an RGB image instead of a grayscale image, even though it contains only gray colors.
This is the case here. All three channels have exactly the same information, but there are three channels stored, and your FFT will compute the same thing three times!
After loading the image with dip.ImageReadTIFF(), you can use parentheses to index one of the channels:
img1 = dip.ImageReadTIFF('RAW_FFT.tif')
img1 = img1(0)
We now have an actual gray-scale image. This should get rid of the red color in the output.
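Here is a caveat worth checking. Assume the three-channel image ends up in NumPy as an array of shape (rows, columns, 3). By default, `np.fft.fft2` transforms the *last two* axes, which would then be columns and channels rather than rows and columns. This synthetic sketch compares three calls: the default one, one with an explicit `axes=(0, 1)`, and one that transforms a single channel.

```python
import numpy as np

rng = np.random.default_rng(1)
gray = rng.random((32, 48))
rgb = np.stack([gray, gray, gray], axis=-1)  # shape (32, 48, 3), identical channels

wrong = np.fft.fft2(rgb)               # default axes=(-2, -1): columns x channels
right = np.fft.fft2(rgb, axes=(0, 1))  # rows x columns, separately for each channel
single = np.fft.fft2(rgb[..., 0])      # or keep one channel, like img1(0) in DIPlib

print(np.allclose(right[..., 0], single))  # the intended 2D spectrum
print(np.allclose(wrong[..., 0], single))  # the default axes give something else
```

If that layout assumption holds for your data, selecting one channel does more than avoid redundant work: it also makes the transform run over the spatial axes.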
After computing the FFT, we have a floating-point image with a very high dynamic range (the largest magnitude, at the middle pixel, is 437536704). For a 2D (scalar) array, pyplot by default stretches the full data range over the color map, so that single huge value at the center makes almost everything else look black. For a 3-channel floating-point array, as in your original code, pyplot instead treats the values as RGB and clips them to [0, 1]: 0 and all negative values become black, and 1 and all larger values become fully saturated. This is consistent with the plain red, green, or blue output you see. Use the vmax parameter of imshow to set the value shown as white. Setting it to 1e6 should give you a display similar to the one in the GMS software.
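You can imitate the effect of `vmin`/`vmax` in plain NumPy, along with a logarithmic mapping like DIPlib's `'log'`, to see how much of the spectrum becomes visible. `to_display` is a hypothetical helper, and the image is random noise rather than TEM data. With pyplot you would simply pass `vmin=0, vmax=...` to `imshow`.

```python
import numpy as np


def to_display(mag, vmax=None, log=False):
    """Map a non-negative magnitude image onto [0, 1] (0 = black, 1 = white)."""
    mag = np.asarray(mag, dtype=float)
    if log:
        mag = np.log1p(mag)  # compress the dynamic range first
        vmax = None if vmax is None else np.log1p(vmax)
    if vmax is None:
        vmax = mag.max()  # default: only the brightest pixel is fully white
    return np.clip(mag / vmax, 0.0, 1.0)


rng = np.random.default_rng(2)
img = rng.random((64, 64)) * 1000
mag = np.abs(np.fft.fftshift(np.fft.fft2(img)))

auto = to_display(mag)               # like imshow's default autoscaling
capped = to_display(mag, vmax=1e4)   # like imshow(..., vmin=0, vmax=1e4)
logged = to_display(mag, log=True)   # logarithmic mapping

# Fraction of pixels brighter than 10% gray under each mapping
for name, d in [("auto", auto), ("vmax", capped), ("log", logged)]:
    print(name, (d > 0.1).mean())
```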
Instead of pyplot you can use DIPlib for display. Its interactive viewer lets you use a slider to manually set the grayscale limits, and you can choose to display the magnitude and select a logarithmic mapping (which tends to be the most useful for displaying the frequency domain).
f = dip.FourierTransform(img1)
dip.viewer.ShowModal(f)
Alternatively, you can use a static display, which uses pyplot under the hood:
f.Show((0, 1e6))
or
f.Show('log')
I would like to ask one question: I want to write code that cleans up a picture drawn by hand (with a pen). Consider this image:
It was drawn with a blue pen, and I convert it to a grayscale image using the following code:
import cv2
import matplotlib.pyplot as plt
from PIL import Image
user_test = filename
col = Image.open(user_test)
gray = col.convert('L')
bw = gray.point(lambda x: 0 if x<100 else 255, '1')
bw.save("bw_image.jpg")
bw
img_array = cv2.imread("bw_image.jpg", cv2.IMREAD_GRAYSCALE)
img_array = cv2.bitwise_not(img_array)
print(img_array.size)
plt.imshow(img_array, cmap = plt.cm.binary)
plt.show()
img_size = 28
new_array = cv2.resize(img_array, (img_size,img_size))
plt.imshow(new_array, cmap = plt.cm.binary)
plt.show()
The idea is that I take the image directly from a camera, but the digit loses its structure and I get only an empty, black picture, like this:
Therefore the computer can't tell which digit it is, and the neural network fails to predict its label correctly. Could you please tell me which transformation I should apply to detect this image much more precisely?
Edit:
I applied the following code:
from PIL import Image
user_test = filename
col = Image.open(user_test)
gray = col.convert('L')
plt.hist(img_array)
plt.show()
and got
You have several issues here, and you can methodically address them.
First of all you're having an issue with thresholding properly.
As I suggested in the comments, a histogram of the grayscale values easily shows why your original thresholding was unsuccessful.
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
from matplotlib import cm
im = Image.open('whatever_path_you_choose.jpg').convert("L")
im = np.asarray(im)
plt.hist(im.flatten(), bins=np.arange(257));  # one bin per gray level 0..255
Looking at the image you gave:
Clearly the threshold should be somewhere between 100 and 200, not at 100 as in your original code. Also note that this distribution isn't very bimodal, so I'm not sure Otsu's method would work well here.
If we eyeball it (this can be tuned), we can see that thresholding at 145-ish gives decent results in terms of segmentation.
im_thresh = (im >= 145)
plt.imshow(im_thresh, cmap=cm.gray)
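You can compare the eyeballed 145 with Otsu's method on your own image. The Otsu threshold can be computed straight from the histogram by choosing the level that maximizes the between-class variance. This is a minimal NumPy sketch of the textbook algorithm, not scikit-image's `threshold_otsu` code, run on synthetic, clearly bimodal data. On a weakly bimodal histogram like yours, the result may still need manual tuning.

```python
import numpy as np


def otsu_threshold(gray):
    """Otsu's threshold for a uint8 image: pixels > t form the bright class."""
    hist = np.bincount(np.asarray(gray, dtype=np.uint8).ravel(), minlength=256)
    p = hist / hist.sum()                 # probability of each gray level
    omega = np.cumsum(p)                  # weight of the dark class (levels <= t)
    mu = np.cumsum(p * np.arange(256))    # first moment of the dark class
    mu_t = mu[-1]                         # global mean
    # between-class variance for every candidate t (undefined where a class is empty)
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1.0 - omega))
    sigma_b[~np.isfinite(sigma_b)] = 0.0
    return int(np.argmax(sigma_b))


# Synthetic data: dark ink around 60, bright paper around 190
rng = np.random.default_rng(3)
pixels = np.concatenate([rng.normal(60, 15, 2000), rng.normal(190, 15, 8000)])
gray = np.clip(pixels, 0, 255).astype(np.uint8)
t = otsu_threshold(gray)
print(t)
```

With a real photo, you would call `otsu_threshold(im)` on the `uint8` array from above and compare `im > t` with `im >= 145`.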
Now you might have an additional issue: the horizontal lines. You can address this by writing on blank paper, as suggested. This wasn't exactly your question, but I will try to address it anyway (in a naive fashion). One naive solution is a Sobel filter (think of it as the derivative of the image, which highlights the lines), followed by a median filter, which replaces each pixel with the median intensity of its neighborhood. The size of the median filter might have to vary for different digits, though. This should clear up some of the lines. For a more rigorous approach, read up on the Hough line transform for detecting horizontal lines, and then whiten them out.
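Ruled lines can also be found by projection, a much cruder alternative to the Hough transform. This is my own assumption, not part of the approach above. In the thresholded image, paper is `True` and ink is `False`, so a ruled line is a row that is mostly ink, and such rows can be set back to white. The helper name and the 0.6 cutoff are illustrative. The method also erases the part of a digit stroke that lies on the line, and it assumes the paper is not rotated.

```python
import numpy as np


def whiten_ruled_lines(binary, min_dark_fraction=0.6):
    """binary: 2D bool array, True = paper, False = ink.

    Rows whose ink fraction exceeds min_dark_fraction are treated as ruled
    lines and set to white (True). Returns a new array.
    """
    out = binary.copy()
    dark_fraction = 1.0 - binary.mean(axis=1)  # per-row fraction of ink pixels
    out[dark_fraction > min_dark_fraction, :] = True
    return out


# Synthetic page: one ruled line (row 10) crossed by a vertical "digit" stroke
page = np.ones((20, 30), dtype=bool)
page[10, :] = False       # ruled line spanning the whole width
page[3:17, 14] = False    # stroke from row 3 to row 16
cleaned = whiten_ruled_lines(page)
print(cleaned[10].all(), (~cleaned[:, 14]).sum())
```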
This is my very naive approach:
from skimage.filters import sobel
from scipy.ndimage import median_filter
# sobel() returns the edge magnitude (bright along edges, dark elsewhere); subtracting it from 1.0 gives dark strokes on a white background
plt.imshow(1.0 - median_filter(sobel(im_thresh), [10, 3]), cmap=cm.gray)
You can try cropping automatically afterwards. Honestly I think most neural networks that could recognize MNIST-like digits could recognize the result I posted at the end as well.
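Automatic cropping could be sketched like this in NumPy, using a hypothetical helper that assumes a 2D boolean array that is `True` on ink. It crops to the ink's bounding box, pads the box to a square so the aspect ratio is kept, and samples it down to 28x28. Nearest-neighbour sampling is used only to stay dependency-free; in practice, `cv2.resize(..., interpolation=cv2.INTER_AREA)` would give smoother results. MNIST digits are white on black, so make sure the ink polarity matches what your network was trained on.

```python
import numpy as np


def crop_pad_resize(ink, size=28, margin=2):
    """ink: 2D bool array, True where there is ink. Returns a size x size float image."""
    rows = np.flatnonzero(ink.any(axis=1))
    cols = np.flatnonzero(ink.any(axis=0))
    if rows.size == 0:
        return np.zeros((size, size))  # nothing to crop
    box = ink[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
    # pad the shorter side (plus a small margin) so the digit keeps its aspect ratio
    side = max(box.shape) + 2 * margin
    top = (side - box.shape[0]) // 2
    left = (side - box.shape[1]) // 2
    square = np.zeros((side, side), dtype=float)
    square[top:top + box.shape[0], left:left + box.shape[1]] = box
    # nearest-neighbour downsampling: pick evenly spaced source rows and columns
    idx = (np.arange(size) * side / size).astype(int)
    return square[np.ix_(idx, idx)]


ink = np.zeros((200, 150), dtype=bool)
ink[40:160, 70:80] = True  # a thick vertical stroke, like a "1"
small = crop_pad_resize(ink)
print(small.shape, small.max())
```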
Try using the scikit-image (skimage) package like this. It has built-in functions for image processing:
from skimage import io
from skimage.restoration import denoise_tv_chambolle
from skimage.filters import threshold_otsu
image = io.imread('path/to/your/image', as_gray=True)
# Denoising
denoised_image = denoise_tv_chambolle(image, weight=0.1)  # image is already single-channel (as_gray=True), so no channel axis
# Thresholding
threshold = threshold_otsu(denoised_image)
thresholded_image = denoised_image > threshold
I have some images and their associated ground-truth outlined objects. For example, this image shows the outlined objects (in blue) for one of the original images:
Given this image and its original source, I would like to create masks based on these outlines using OpenCV or skimage.
Using contours I can roughly achieve that, but I have two problems:
1. Why do I get repeated masks? (Please refer to the attached snippet.)
2. How can I handle two touching objects?
import cv2
import numpy as np
import matplotlib.pyplot as plt
from scipy import ndimage
from skimage import io
from skimage import measure
image = io.imread('path/to/the/attached/image')
# skimage loads images in RGB order, not BGR
gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
contours = measure.find_contours(gray, 0.1)
for n, contour in enumerate(contours):
    r_mask = np.zeros_like(gray, dtype='bool')
    # mark the contour pixels, then fill the region they enclose
    r_mask[np.round(contour[:, 0]).astype('int'),
           np.round(contour[:, 1]).astype('int')] = 1
    r_mask = ndimage.binary_fill_holes(r_mask)
    io.imshow(r_mask)
    plt.show()
Thank you
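On question 1, a plausible cause (my reading, not confirmed here) is that each drawn outline has some thickness. `find_contours` then returns both its outer and its inner edge, and filling each of them gives two nearly identical masks. One contour-free alternative is sketched below in pure NumPy, with hypothetical helper names:

- Flood-fill the background from the image border.
- Treat every pixel that is neither outline nor reachable background as enclosed.
- Split the enclosed pixels into connected regions.

A shared outline separates two touching objects, so each interior becomes its own mask, which may also help with question 2. If SciPy is available, `scipy.ndimage.label` applied to `~outline` (dropping the components that touch the image border) does the same work much faster.

```python
from collections import deque

import numpy as np


def flood(mask, seeds):
    """Return a bool array of the True pixels in `mask` 4-connected to any seed."""
    h, w = mask.shape
    seen = np.zeros(mask.shape, dtype=bool)
    queue = deque()
    for r, c in seeds:
        if mask[r, c] and not seen[r, c]:
            seen[r, c] = True
            queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and mask[nr, nc] and not seen[nr, nc]:
                seen[nr, nc] = True
                queue.append((nr, nc))
    return seen


def masks_from_outlines(outline):
    """outline: 2D bool array, True on drawn outline pixels.

    Returns one bool mask per enclosed region (interior only, outline excluded).
    The fill is 4-connected, so diagonally drawn (8-connected) outlines still close.
    """
    free = ~outline
    h, w = outline.shape
    border = [(r, c) for r in range(h) for c in (0, w - 1)]
    border += [(r, c) for c in range(w) for r in (0, h - 1)]
    # free pixels that cannot be reached from the border are enclosed by outlines
    remaining = free & ~flood(free, border)
    masks = []
    while remaining.any():
        seed = tuple(np.argwhere(remaining)[0])
        region = flood(remaining, [seed])
        masks.append(region)
        remaining &= ~region
    return masks


# Two squares that share their middle edge, i.e. two touching objects
outline = np.zeros((12, 20), dtype=bool)
outline[2, 2:17] = True
outline[9, 2:17] = True
outline[2:10, 2] = True
outline[2:10, 9] = True
outline[2:10, 16] = True
masks = masks_from_outlines(outline)
print(len(masks), [int(m.sum()) for m in masks])
```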
I'm using the Google Vision API to extract the text from some pictures, however, I have been trying to improve the accuracy (confidence) of the results with no luck.
Every time I change the image from the original, I lose accuracy in detecting some characters.
I have narrowed the issue down to the different words having different colors: words in red, for example, are misread more often than the other words.
Example:
Some variations of the image in grayscale or black and white:
What ideas can I try to make this work better, specifically changing the colors of the text to a uniform color, or just black on a white background, since most algorithms expect that?
Some ideas I have already tried, plus some thresholding:
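from PIL import ImageOps, ImageEnhance
# im: the original PIL image
dimg = ImageOps.grayscale(im)
cimg = ImageOps.invert(dimg)
contrast = ImageEnhance.Contrast(dimg)
eimg = contrast.enhance(1)  # note: a factor of 1.0 returns the original image unchanged
sharp = ImageEnhance.Sharpness(dimg)
eimg = sharp.enhance(1)  # likewise a no-op at 1.0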
I can only offer a butcher's solution, potentially a nightmare to maintain.
In my own, very limited scenario, it worked like a charm where several other OCR engines either failed or had unacceptable running times.
My prerequisites:
- I knew exactly in which area of the screen the text was going to appear.
- I knew exactly which fonts and colors were going to be used.
- The text was semitransparent, so the underlying image interfered, and it was a changing image to boot.
- I could not reliably detect text changes, so I could not average frames to reduce the interference.
What I did:
- I measured the kerning width of each character. I only had A-Za-z0-9 and a bunch of punctuation characters to worry about.
- The program would start at position (0,0), measure the average color to determine the text color, then access the whole set of bitmaps generated from the characters of all available fonts in that color. Then it would determine which bitmap was closest to the corresponding rectangle on the screen, and advance to the next one.
(Months later, needing more performance, I added a varying probability matrix to test the most likely characters first.)
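The core matching loop described above works in three steps: compare the rectangle under the cursor with a bitmap of every candidate character, keep the closest match, and advance by that character's width. It might look like this NumPy sketch. The glyph bitmaps, the sum-of-squared-differences score, and the helper name are illustrative assumptions. The original was a C program whose details are not given, and the colour-detection and probability-ordering steps are left out.

```python
import numpy as np


def read_line(strip, glyphs):
    """Greedy left-to-right reading of a text strip by template matching.

    strip:  2D float array holding one line of text (same colour/scale as the glyphs)
    glyphs: dict mapping each character to a 2D bitmap with the strip's height
    """
    text, x = [], 0
    while True:
        # candidates: glyphs that still fit in the remaining width
        fits = {ch: bmp for ch, bmp in glyphs.items() if x + bmp.shape[1] <= strip.shape[1]}
        if not fits:
            break
        # sum of squared differences between each glyph and the rectangle under it
        scores = {ch: float(np.sum((strip[:, x:x + bmp.shape[1]] - bmp) ** 2))
                  for ch, bmp in fits.items()}
        best = min(scores, key=scores.get)
        text.append(best)
        x += glyphs[best].shape[1]  # advance by the matched glyph's width
    return "".join(text)


glyphs = {
    "I": np.array([[0, 1, 0], [0, 1, 0], [0, 1, 0]], dtype=float),
    "L": np.array([[1, 0, 0], [1, 0, 0], [1, 1, 1]], dtype=float),
    "T": np.array([[1, 1, 1], [0, 1, 0], [0, 1, 0]], dtype=float),
}
strip = np.hstack([glyphs["T"], glyphs["I"], glyphs["L"]])
rng = np.random.default_rng(4)
noisy = strip + rng.uniform(-0.2, 0.2, strip.shape)  # mild interference from the background
print(read_line(noisy, glyphs))
```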
In the end, the resulting C program was able to read the subtitles out of the video stream with 100% accuracy in real time.
You tried almost every standard step. I would advise you to try some of PIL's built-in filters, like the sharpness filter. Apply sharpness and contrast to the RGB image, then binarise it. Perhaps use Image.split() and Image.merge() to binarise each colour separately and then bring them back together.
Or convert your image to YUV and then use just Y channel for further processing.
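Taking the Y channel boils down to a weighted sum of R, G and B. This minimal sketch assumes the ITU-R BT.601 luma weights (0.299, 0.587, 0.114), which are also what PIL's `convert('L')` uses. It shows, for example, that pure red maps to a much darker gray than white does, which is worth keeping in mind when choosing a threshold.

```python
import numpy as np


def luma(rgb):
    """Y (luma) channel of an RGB image with values in 0..255, BT.601 weights."""
    rgb = np.asarray(rgb, dtype=float)
    return rgb @ np.array([0.299, 0.587, 0.114])  # weighted sum over the last axis


pixels = np.array([[[255, 255, 255], [255, 0, 0], [0, 0, 255]]])  # white, red, blue
print(luma(pixels))
```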
Also, if you do not have a monochrome background, consider performing some background subtraction.
When detecting scanned text, Tesseract works best with frames removed, so try to remove as much non-character space from the image as possible. (If you need to keep the picture size, replace that space with white instead.) Tesseract also likes straight lines, so some deskewing might be in order if your text is recorded at an angle. Tesseract also sometimes gives better results if you resize the image to twice its original size.
I suspect that Google Vision uses Tesseract, or portions of it, but I have no idea what other preprocessing it does for you. So some of my advice here might already be implemented, and following it would be unnecessary and repetitive.
You will need to pre-process the image more than once, and use a bitwise_or operation to combine the results. To extract the colors, you could use
import cv2
import numpy as np
# image: the BGR image loaded with cv2.imread
boundaries = [  # BGR colorspace for OpenCV, *not* RGB
    ([15, 15, 100], [50, 60, 200]),    # red
    ([85, 30, 2], [220, 90, 50]),      # blue
    ([25, 145, 190], [65, 175, 250]),  # yellow
]
for (low, high) in boundaries:
    low = np.array(low, dtype="uint8")
    high = np.array(high, dtype="uint8")
    # find the colors within the specified boundaries and apply
    # the mask
    mask = cv2.inRange(image, low, high)
    bitWise = cv2.bitwise_and(image, image, mask=mask)
    # now here is the image masked with the specific color boundary...
Once you have the masked image, you can do another bitwise_or operation on your to-be "final" image, essentially adding this mask to it.
This specific implementation requires OpenCV, but the same principle applies to other image packages.
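For reference, the same principle works without OpenCV. `cv2.inRange` marks the pixels where every channel lies within `[low, high]`, and merging the per-colour masks is a bitwise OR. This NumPy sketch reuses the BGR bounds from the snippet above; the helper names and the four test pixels are made up.

```python
import numpy as np


def in_range(image, low, high):
    """NumPy stand-in for cv2.inRange: 255 where every channel is within [low, high]."""
    image = np.asarray(image)
    inside = np.all((image >= np.asarray(low)) & (image <= np.asarray(high)), axis=-1)
    return inside.astype(np.uint8) * 255


def combined_text_mask(image, boundaries):
    """OR together the masks for all colour ranges."""
    mask = np.zeros(image.shape[:2], dtype=np.uint8)
    for low, high in boundaries:
        mask = np.bitwise_or(mask, in_range(image, low, high))
    return mask


boundaries = [  # BGR, as in the OpenCV snippet
    ([15, 15, 100], [50, 60, 200]),    # red
    ([85, 30, 2], [220, 90, 50]),      # blue
    ([25, 145, 190], [65, 175, 250]),  # yellow
]
# one red, one blue, one yellow and one gray pixel (BGR order)
image = np.array([[[30, 40, 150], [150, 60, 20], [40, 160, 220], [128, 128, 128]]],
                 dtype=np.uint8)
print(combined_text_mask(image, boundaries))
```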
I need a little more context on this.
How many calls are you going to do to the Google Vision API? If you are doing this throughout a whole stream, you'd probably need to get a paid subscription.
What are you going to do with this data? How accurate does the OCR need to be?
Assuming you get this snapshot from someone else's Twitch stream, and have to deal with the streamer's video compression and network connectivity, you're going to get a pretty blurry snapshot, so OCR is going to be pretty tough.
The image is far too blurry because of video compression, so even preprocessing the image to improve quality may not get the image quality high enough for accurate OCR. If you are set on OCR, one approach you could try:
1. Binarize the image to get the non-red text in white and the background in black, as in your binarized image:
from PIL import Image
def binarize_image(im, threshold):
    """Binarize an image."""
    image = im.convert('L')  # convert image to monochrome
    # pixels above the threshold become 255, all others 0
    bin_im = image.point(lambda p: 255 if p > threshold else 0)
    return bin_im
im = Image.open("game_text.JPG")
binarized = binarize_image(im, 100)
2. Extract only the red text values with a filter, then binarize it:
import cv2
import numpy as np
from matplotlib import pyplot as plt
# cv2.inRange needs a NumPy array; cv2.imread loads it in BGR order, matching these bounds
im_bgr = cv2.imread("game_text.JPG")
lower = np.array([15, 15, 100], dtype="uint8")
upper = np.array([50, 60, 200], dtype="uint8")
mask = cv2.inRange(im_bgr, lower, upper)  # 255 inside the red range, 0 elsewhere
red_binarized = cv2.bitwise_and(im_bgr, im_bgr, mask=mask)
plt.imshow(cv2.cvtColor(red_binarized, cv2.COLOR_BGR2RGB))
plt.show()
However, even with this filtering, it still doesn't extract red well.
3. Combine the images obtained in (1.) and (2.) with a bitwise OR (mask is the binary result of step 2):
combined_image = cv2.bitwise_or(np.array(binarized), mask)
4. Do OCR on (3.).
This is not a full solution, but it may lead to something better.
By converting your data from BGR (or RGB) to CIE-Lab, you can process a grayscale image computed as the weighted sum of the colour channels a* and b*.
This grayscale image will enhance the colour regions of the text.
By adapting the threshold, you can segment the coloured words of your original image from this grayscale image, and get the other words by thresholding the L channel.
A bitwise AND operator should be enough to merge the two segmentation images.
If you can get an image with better contrast, a very last step could be a filling based on the contours.
For that, take a look at the RETR_FLOODFILL mode of the function cv2.findContours.
Any other hole-filling function from another package may also fit that purpose.
Here is code that shows the first part of my idea.
import cv2
import numpy as np
from matplotlib import pyplot as plt
I = cv2.UMat(cv2.imread('/home/smile/QSKN.png',cv2.IMREAD_ANYCOLOR))
Lab = cv2.cvtColor(I,cv2.COLOR_BGR2Lab)
L,a,b = cv2.split(Lab)
Ig = cv2.addWeighted(cv2.UMat(a),0.5,cv2.UMat(b),0.5,0,dtype=cv2.CV_32F)
Ig = cv2.normalize(Ig,None,0.,255.,cv2.NORM_MINMAX,cv2.CV_8U)
#k = np.ones((3,3),np.float32)
#k[2,2] = 0
#k*=-1
#
#Ig = cv2.filter2D(Ig,cv2.CV_32F,k)
#Ig = cv2.absdiff(Ig,0)
#Ig = cv2.normalize(Ig,None,0.,255.,cv2.NORM_MINMAX,cv2.CV_8U)
_, Ib = cv2.threshold(Ig,0.,255.,cv2.THRESH_OTSU)
_, Lb = cv2.threshold(cv2.UMat(L),0.,255.,cv2.THRESH_OTSU)
_, ax = plt.subplots(2,2)
ax[0,0].imshow(Ig.get(),cmap='gray')
ax[0,1].imshow(L,cmap='gray')
ax[1,0].imshow(Ib.get(),cmap='gray')
ax[1,1].imshow(Lb.get(),cmap='gray')
plt.show()
import numpy as np
from skimage.morphology import disk  # skimage.morphology.selem is deprecated/removed in newer versions
from skimage.filters import rank, threshold_otsu
from skimage.util import img_as_float
from PIL import ImageGrab
import matplotlib.pyplot as plt
def preprocessing(image, strelem, s0=30, s1=30, p0=.3, p1=1.):
    # edge-preserving smoothing with a bilateral mean
    image = rank.mean_bilateral(image, strelem, s0=s0, s1=s1)
    # keep only pixels whose local maximum is above the Otsu threshold
    condition = (lambda x: x>threshold_otsu(x))(rank.maximum(image, strelem))
    # stretch the local contrast between the p0 and p1 percentiles
    normalize_image = rank.autolevel_percentile(image, strelem, p0=p0, p1=p1)
    return np.where(condition, normalize_image, 0)
#Grab image from clipboard
image = np.array(ImageGrab.grabclipboard())
sel = disk(4)
# process each RGB channel separately and average the results
a = sum([img_as_float(preprocessing(image[:, :, x], sel, p0=0.3)) for x in range(3)])/3
fig, ax = plt.subplots(1, 2, sharey=True, sharex=True)
ax[0].imshow(image)
ax[1].imshow(rank.autolevel_percentile(a, sel, p0=.4))
plt.show()
This is my code for clearing text from noise and creating uniform brightness for characters.
With minor modifications, I used it to solve your problem.
I have the following code:
import cv2
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from skimage.filters import threshold_otsu
matplotlib.rcParams['font.size'] = 9
nomeimg = 'frame1_depth.png'
i = cv2.imread(nomeimg, -1)  # -1 == cv2.IMREAD_UNCHANGED: keep the 16-bit depth values
# convert from 16-bit to 32-bit uint
img = np.array(i, dtype=np.uint32)
img *= 65536
print(img.dtype)
# thresholding with Otsu's method
thresh = threshold_otsu(img)
binary = img > thresh
print(thresh)
plt.figure(1)
fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(8, 2.5))
ax1.imshow(img)
ax1.set_title('Original')
ax1.axis('off')
ax2.hist(img.ravel(), bins=256)  # flatten first; a 2D array would give one histogram per column
ax2.set_title('Histogram')
ax2.axvline(x=thresh, color='r', linestyle='dashed', linewidth=2)
ax3.imshow(binary, cmap=plt.cm.gray)
ax3.set_title('Thresholded')
ax3.axis('off')
plt.figure(2)
f, ax = plt.subplots(figsize=(8, 2.5))
ax.imshow(binary, cmap=plt.cm.gray)
ax.set_title('Thresholded')
ax.axis('off')
plt.show()
I have a set of depth images from the Xbox Kinect. After a data type conversion, so that I can use some OpenCV functions that work only with 8- or 32-bit images, I've thresholded the image with the Otsu algorithm and shown the results.
I have obtained a subplot with my original image, the histogram, and the thresholded black-and-white image. Now I want to work only on this black-and-white image: I want to save it and compute contours, the convex hull, and other geometric features. So far, however, I have only thresholded it with Otsu's method.
How can I compute this?
If you want to use findContours and other image-analysis functions in OpenCV with your binary image, you simply need to convert binary to uint8. Also, make sure you scale your image so that the non-zero values become 255. uint32 will only work under certain modes, and to avoid the unnecessary headache of remembering which modes of which functions allow it, just stick with uint8.
As such, do this:
binary = 255*(binary.astype('uint8'))
Once you do this conversion, you can then call findContours:
contours, hierarchy = cv2.findContours(binary,cv2.RETR_LIST,cv2.CHAIN_APPROX_NONE)  # OpenCV 4.x; OpenCV 3.x returns (image, contours, hierarchy)
The above is just one way to call it. I would recommend looking at the documentation for more details.
As another example, suppose you want to use convexHull to find the convex hull of your thresholded image, which contains a bunch of shapes. You need a set of points that represent your contours, which is exactly what the contours output of cv2.findContours gives you. However, the convexHull function assumes there is only a single object, represented by one contour. If you have multiple objects and thus multiple contours, you'll have to iterate over each contour and store the results.
As such, do something like this:
hull = [cv2.convexHull(cnt) for cnt in contours]
Each element of hull contains the coordinates that make up the convex hull of the corresponding contour. As such, to access the coordinates of the convex hull for contour i, you would do:
points = hull[i]
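To see what `cv2.convexHull` returns for one contour, here is a short pure-Python sketch of Andrew's monotone chain algorithm. It is a standard convex hull algorithm, not necessarily the one OpenCV uses internally. It takes a list of (x, y) points; an OpenCV contour of shape (N, 1, 2) would first need `cnt.reshape(-1, 2)`.

```python
def convex_hull(points):
    """Andrew's monotone chain. Returns hull vertices counter-clockwise, no repeats."""
    pts = sorted(set(map(tuple, points)))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); > 0 means a counter-clockwise turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:  # build the lower hull left to right
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):  # build the upper hull right to left
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # the last point of each half is the first point of the other
    return lower[:-1] + upper[:-1]


contour = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2), (1, 3), (2, 0)]
print(convex_hull(contour))
```

Applied per contour, `[convex_hull(c.reshape(-1, 2)) for c in contours]` mirrors the list comprehension above.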
BTW, here are a couple of great links to get you started with OpenCV's shape analysis functions. Here's a link that talks about how to use cv2.findContours in a more general context:
http://opencv-python-tutroals.readthedocs.org/en/latest/py_tutorials/py_imgproc/py_contours/py_contours_begin/py_contours_begin.html
Here's another link that talks about OpenCV's other shape analysis functions, such as convex hull, image moments, etc.:
http://opencv-python-tutroals.readthedocs.org/en/latest/py_tutorials/py_imgproc/py_contours/py_contour_features/py_contour_features.html
Have fun!