I'm trying to write code that takes TEM (Transmission Electron Microscope) TIFF images and computes their FFT. But I always get plain red, green or blue images.
Here's what the RAW TEM images look like :
Here's what the FFT image should look like :
But instead I get :
Here's my code :
import numpy as np
import diplib as dip
import matplotlib.pyplot as plt
from PIL import Image
from ncempy.io import dm
img1 = dip.ImageReadTIFF('RAW_FFT.tif')
f = np.fft.fft2(img1)
f = np.fft.fftshift(f)
plt.imshow(abs(f))
plt.show()
Do you have any idea what could be the problem? I even tried to convert the image to np.array and do FFT step by step but I get the same result.
The FFT output is complex, so you need to display its magnitude. Without a logarithm, the DC component of the Fourier transform is so much brighter than all the other points that everything else appears black.
See for details: https://homepages.inf.ed.ac.uk/rbf/HIPR2/fourier.htm
import cv2
import numpy as np
img=cv2.imread('inputfolder/yourimage.jpg',0)
def fft_image_inv(image):
    f = np.fft.fft2(image)
    fshift = np.fft.fftshift(f)
    magnitude_spectrum = 15*np.log(np.abs(fshift))
    return magnitude_spectrum
fft= fft_image_inv(img)
cv2.imwrite('outputfolder/yourimage.jpg',fft)
output:
There are multiple issues here. First, grayscale images are sometimes written to file as if they were RGB images. In a TIFF file, this could be as simple as storing a grayscale color map: the pixel values are then interpreted as indices into the map, and the loaded image is an RGB image instead of a grayscale image, even though it contains only gray colors.
This is the case here. All three channels have exactly the same information, but there are three channels stored, and your FFT will compute the same thing three times!
After loading the image with dip.ImageReadTIFF(), you can use parentheses to index one of the channels:
img1 = dip.ImageReadTIFF('RAW_FFT.tif')
img1 = img1(0)
We now have an actual gray-scale image. This should get rid of the red color in the output.
After computing the FFT, we have a floating-point image with a very high dynamic range (the largest magnitude, at the middle pixel, is 437536704). pyplot, by default, will show floating-point images with 0 and all negative values as black, and 1 and all larger values as white (actual colors depend of course on the color map it uses). So your display will be all white. Use the vmax parameter to imshow to determine the value shown as white. Setting this to 1e6 should give you a similar display as in the GMS software.
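As a rough illustration of the two points above (keep a single channel, then tame the dynamic range before display), here is a minimal NumPy-only sketch. The synthetic lattice image, the helper names `fft_magnitude` and `to_display`, and the `vmax=1e4` value are assumptions made for the example, not part of the original code.

```python
import numpy as np

def fft_magnitude(image):
    """Return the centred FFT magnitude of a 2-D (or channel-last 3-D) image."""
    image = np.asarray(image, dtype=np.float64)
    if image.ndim == 3:
        # Keep one channel only; assumes all channels hold the same data
        image = image[..., 0]
    return np.abs(np.fft.fftshift(np.fft.fft2(image)))

def to_display(magnitude, vmax=None):
    """Map magnitudes to [0, 1], either clipped at vmax or log-scaled."""
    if vmax is not None:
        return np.clip(magnitude / vmax, 0.0, 1.0)
    log_mag = np.log1p(magnitude)      # log(1 + x) stays finite at zero
    return log_mag / log_mag.max()     # assumes the image is not all zeros

# Synthetic stand-in for the TEM image: a periodic lattice stored as RGB
y, x = np.mgrid[0:64, 0:64]
gray = 128 + 100 * np.cos(2 * np.pi * 8 * x / 64)
rgb = np.stack([gray, gray, gray], axis=-1)

mag = fft_magnitude(rgb)
clipped = to_display(mag, vmax=1e4)    # linear display with a chosen white point
logged = to_display(mag)               # logarithmic display
```

Either `clipped` or `logged` can then be handed to `plt.imshow(..., cmap='gray')`; passing `mag` together with `vmax` directly to `imshow` should have the same effect as the clipped version.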
Instead of pyplot, you can use DIPlib for display. Its interactive viewer lets you use a slider to set the grayscale limits manually, and you can choose to display the magnitude as well as select a logarithmic mapping (which tends to be most useful for displaying the frequency domain).
f = dip.FourierTransform(img1)
dip.viewer.ShowModal(f)
Alternatively, you can use a static display, which uses pyplot under the hood:
f.Show((0, 1e6))
or
f.Show('log')
I have a 300*500 image. It is in grayscale and ranges from 0-255. I want to iterate value by value and apply a heat map (say viridis, but it doesn't matter) to each value.
My final heatmap image is in Red, Blue, Green and Alpha. I imagine the specific heat map function would take the grayscale values and output three values for each Red, Blue, Green and their appropriate weights.
f(0-255) = weightr(Red), weightb(Blue), weightg(Green).
My ending image would have dimensions (300,500,4) The four channels are r,b,g and an alpha channel.
What is the function that would achieve this? Almost certain it's going to be highly dependent on the specific heat map. Viridis is what I'm after, but I want to understand the concept as well.
The code below reads in a random image (the fact it's from unsplash does not matter) and turns it into a (300,500), 0-255 image called imgarray. I know matplotlib defaults to viridis, but I included the extra step to show what I would like to achieve with my own function.
import numpy as np
import matplotlib.pyplot as plt
import requests
from PIL import Image
from io import BytesIO
img_src = 'https://unsplash.it/500/300'
response = requests.get(img_src)
imgarray = Image.open(BytesIO(response.content))
imgarray = np.asarray(imgarray.convert('L'))
from matplotlib import cm
print(cm.viridis(imgarray))
plt.imshow(cm.viridis(imgarray))
Matplotlib defines the viridis colormap as 256 RGB colors (one for each 8 bit gray scale value), where each color channel is a floating point value from [0, 1]. The definition can be found on github. The following code demonstrates how matplotlib applies the viridis colormap to a gray scale image.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm
from matplotlib._cm_listed import _viridis_data # the colormap look-up table
import requests
from PIL import Image
from io import BytesIO
img_src = 'https://unsplash.it/id/767/500/300'
response = requests.get(img_src)
imgarray = Image.open(BytesIO(response.content))
imgarray = np.asarray(imgarray.convert('L'))
plt.imshow(cm.viridis(imgarray))
plt.show()
# look-up table: from grayscale to RGB
viridis_lut = np.array(_viridis_data)
print(viridis_lut.shape) # (256, 3)
# convert grayscale to RGB using the LUT
img_viridis = viridis_lut.take(imgarray, axis=0, mode='clip')
plt.imshow(img_viridis)
plt.show()
# add an alpha channel
alpha = np.full(imgarray.shape + (1,), 1.) # shape: (300, 500, 1)
img_viridis_alpha = np.concatenate((img_viridis, alpha), axis=2)
assert (cm.viridis(imgarray) == img_viridis_alpha).all() # are both equal
Produces the following image:
The actual magic happens in the np.take(a, indices) method, which takes values from array a (the viridis LUT) at the given indices (gray scale values from 0..255 from the image). To get the same result as from the cm.viridis function, we just need to add an alpha channel (full of 1. = full opacity).
For reference, the same conversion happens around here in the matplotlib source code.
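To make the look-up idea independent of matplotlib internals, here is a small sketch with a made-up table. The `toy_lut` ramp is purely hypothetical (it is not the viridis data) and the random image stands in for the downloaded photo; the point is only that indexing a (256, 3) table with a uint8 image yields an RGB image.

```python
import numpy as np

# Hypothetical 256-entry LUT (a plain blue-to-yellow ramp, NOT the real viridis data)
ramp = np.linspace(0.0, 1.0, 256)
toy_lut = np.stack([ramp, ramp, 1.0 - ramp], axis=1)   # shape (256, 3)

# Random stand-in for the 300x500 grayscale image
rng = np.random.default_rng(0)
gray = rng.integers(0, 256, size=(300, 500), dtype=np.uint8)

rgb = toy_lut[gray]                    # fancy indexing, same result as toy_lut.take(gray, axis=0)
alpha = np.ones(gray.shape + (1,))     # fully opaque alpha channel
rgba = np.concatenate((rgb, alpha), axis=2)
```

Swapping `toy_lut` for any other (256, 3) table changes the colors but not the mechanism.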
I'm using the Google Vision API to extract the text from some pictures, however, I have been trying to improve the accuracy (confidence) of the results with no luck.
Every time I change the image from the original, I lose accuracy in detecting some characters.
I have isolated the issue to the words having different colors: words in red, for example, are recognized incorrectly more often than the other words.
Example:
Some variations on the image, in grayscale or black and white:
What ideas can I try to make this work better, specifically changing the colors of text to a uniform color or just black on a white background since most algorithms expect that?
Some ideas I have already tried (I also tried some thresholding):
dimg = ImageOps.grayscale(im)
cimg = ImageOps.invert(dimg)
contrast = ImageEnhance.Contrast(dimg)
eimg = contrast.enhance(1)
sharp = ImageEnhance.Sharpness(dimg)
eimg = sharp.enhance(1)
I can only offer a butcher's solution, potentially a nightmare to maintain.
In my own, very limited scenario, it worked like a charm where several other OCR engines either failed or had unacceptable running times.
My prerequisites:
- I knew exactly in which area of the screen the text was going to appear.
- I knew exactly which fonts and colors were going to be used.
- The text was semitransparent, so the underlying image interfered, and it was a variable image to boot.
- I could not reliably detect text changes, so I could not average frames to reduce the interference.
What I did:
- I measured the kerning width of each character. I only had A-Za-z0-9 and a bunch of punctuation characters to worry about.
- The program would start at position (0,0), measure the average color to determine the color, then access the whole set of bitmaps generated from characters in all available fonts in that color. Then it would determine which rectangle was closest to the corresponding rectangle on the screen, and advance to the next one.
(Months later, needing more performance, I added a varying probability matrix so that the most likely characters were tested first.)
In the end, the resulting C program was able to read the subtitles out of the video stream with 100% accuracy in real time.
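The matching loop described above could look roughly like the sketch below. It is written in Python rather than C, uses a fixed glyph width instead of per-character kerning, and the 3x3 `glyphs` bitmaps are invented for the example, so treat it as an outline of the idea rather than the original program.

```python
import numpy as np

def best_glyph(patch, glyphs):
    """Return the label of the glyph bitmap closest to patch (sum of squared differences)."""
    scores = {label: float(((patch - bitmap) ** 2).sum())
              for label, bitmap in glyphs.items()}
    return min(scores, key=scores.get)

def read_line(strip, glyphs, width):
    """Walk a fixed-height strip left to right in steps of `width` pixels."""
    return "".join(best_glyph(strip[:, x:x + width], glyphs)
                   for x in range(0, strip.shape[1] - width + 1, width))

# Hypothetical 3x3 glyph bitmaps; a real set would be rendered from the known fonts
glyphs = {
    "I": np.array([[0, 1, 0], [0, 1, 0], [0, 1, 0]], dtype=np.float64),
    "-": np.array([[0, 0, 0], [1, 1, 1], [0, 0, 0]], dtype=np.float64),
}

rng = np.random.default_rng(0)
strip = np.hstack([glyphs["I"], glyphs["-"], glyphs["I"]])
strip = strip + rng.uniform(-0.2, 0.2, size=strip.shape)   # simulated background interference
text = read_line(strip, glyphs, width=3)
```

Trying the most likely characters first, as mentioned above, would amount to ordering the candidates and stopping early once a score falls below some threshold.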
You tried almost every standard step. I would advise you to try some PIL built-in filters like sharpness filter. Apply sharpness and contrast on the RGB image, then binarise it. Perhaps use Image.split() and Image.merge() to binarise each colour separately and then bring them back together.
Or convert your image to YUV and then use just Y channel for further processing.
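A minimal sketch of the Y-channel idea, assuming the common BT.601 luma weights and an RGB channel order. The three-pixel image and the threshold of 50 are arbitrary illustrative values.

```python
import numpy as np

def luma_bt601(rgb):
    """Y channel of an RGB image using the BT.601 weights."""
    rgb = np.asarray(rgb, dtype=np.float64)
    return rgb @ np.array([0.299, 0.587, 0.114])

def binarize(gray, threshold):
    """White (255) where gray is above the threshold, black (0) elsewhere."""
    return np.where(gray > threshold, 255, 0).astype(np.uint8)

# Synthetic 1x3 RGB image: red, white and dark grey pixels
rgb = np.array([[[255, 0, 0], [255, 255, 255], [20, 20, 20]]], dtype=np.uint8)
y = luma_bt601(rgb)
bw = binarize(y, threshold=50)   # the threshold is an arbitrary illustrative value
```

Pure red ends up with roughly 30% of the luma of white, which may be part of why red words fade after a plain grayscale conversion and why the threshold has to sit fairly low to keep them.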
Also, if you do not have a monochrome background, consider performing some background subtraction.
What tesseract likes when detecting scanned text is an image without frames, so you can try to remove as much of the non-character space from the image as possible. (You might need to keep the picture size though, so you should replace it with white colour.) Tesseract also likes straight lines, so some deskewing might be in order if your text is recorded at an angle. Tesseract also sometimes gives better results if you resize the image to twice its original size.
I suspect that Google Vision uses tesseract, or portions of it, but I have no idea what other preprocessing it does for you. So some of my advice here might already be implemented, and doing it again would be unnecessary and repetitive.
You will need to pre-process the image more than once, and use a bitwise_or operation to combine the results. To extract the colors, you could use
import cv2
import numpy as np
# `image` is assumed to be a BGR image already loaded with cv2.imread()
boundaries = [ #BGR colorspace for opencv, *not* RGB
    ([15, 15, 100], [50, 60, 200]), #red
    ([85, 30, 2], [220, 90, 50]), #blue
    ([25, 145, 190], [65, 175, 250]), #yellow
]
for (low, high) in boundaries:
    low = np.array(low, dtype = "uint8")
    high = np.array(high, dtype = "uint8")
    # find the colors within the specified boundaries and apply
    # the mask
    mask = cv2.inRange(image, low, high)
    bitWise = cv2.bitwise_and(image, image, mask=mask)
    #now here is the image masked with the specific color boundary...
Once you have the masked image, you can do another bitwise_or operation on your to-be "final" image, essentially adding this mask to it.
This specific implementation requires opencv, but the same principle applies to other image packages.
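For readers without OpenCV, the same accumulate-with-OR principle can be sketched in plain NumPy. The helper names `in_range` and `combined_text_mask` and the four-pixel test image are assumptions for illustration; the BGR bounds are the ones from the snippet above.

```python
import numpy as np

# BGR bounds copied from the snippet above
boundaries = [
    ([15, 15, 100], [50, 60, 200]),     # red
    ([85, 30, 2], [220, 90, 50]),       # blue
    ([25, 145, 190], [65, 175, 250]),   # yellow
]

def in_range(image, low, high):
    """NumPy stand-in for cv2.inRange: True where every channel lies within [low, high]."""
    low = np.asarray(low, dtype=image.dtype)
    high = np.asarray(high, dtype=image.dtype)
    return np.all((image >= low) & (image <= high), axis=-1)

def combined_text_mask(image):
    """OR together the masks of all colour ranges into one 0/255 image."""
    final = np.zeros(image.shape[:2], dtype=bool)
    for low, high in boundaries:
        final |= in_range(image, low, high)   # bitwise OR accumulates each colour's mask
    return final.astype(np.uint8) * 255

# Tiny synthetic BGR image: one red, one blue, one yellow and one black pixel
image = np.array([[[30, 30, 150], [100, 50, 20]],
                  [[40, 160, 220], [0, 0, 0]]], dtype=np.uint8)
mask = combined_text_mask(image)
```

The resulting single-channel mask is white wherever any of the text colours was found, which is the uniform black-and-white input that OCR engines tend to prefer.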
I need a little more context on this.
How many calls are you going to do to the Google Vision API? If you are doing this throughout a whole stream, you'd probably need to get a paid subscription.
What are you going to do with this data? How accurate does the OCR need to be?
Assuming you get this snapshot from someone else's Twitch stream, you are dealing with the streamer's video compression and network connectivity, so you're going to get a pretty blurry snapshot and OCR is going to be pretty tough.
The image is far too blurry because of video compression, so even preprocessing the image to improve quality may not get the image quality high enough for accurate OCR. If you are set on OCR, one approach you could try:
1. Binarize the image to get the non-red text in white and the background in black, as in your binarized image:
from PIL import Image
def binarize_image(im, threshold):
    """Binarize an image."""
    image = im.convert('L') # convert image to monochrome
    bin_im = image.point(lambda p: p > threshold and 255)
    return bin_im
im = Image.open("game_text.JPG")
binarized = binarize_image(im, 100)
2. Extract only the red text values with a filter, then binarize it:
import cv2
import numpy as np
from matplotlib import pyplot as plt
# OpenCV needs a BGR NumPy array, not a PIL image, so load the file again
im_bgr = cv2.imread("game_text.JPG")
lower = np.array([15, 15, 100], dtype = "uint8")
upper = np.array([50, 60, 200], dtype = "uint8")
mask = cv2.inRange(im_bgr, lower, upper)
red_binarized = cv2.bitwise_and(im_bgr, im_bgr, mask = mask)
plt.imshow(cv2.cvtColor(red_binarized, cv2.COLOR_BGR2RGB))
plt.show()
However, even with this filtering, it still doesn't extract red well.
3. Add the images obtained in (1.) and (2.):
combined_image = cv2.bitwise_or(np.asarray(binarized), mask) # both are single-channel uint8 images
4. Do OCR on (3.).
This is not a full solution, but it may lead to something better.
By converting your data from BGR (or RGB) to CIE-Lab, you can build a grayscale image as the weighted sum of the colour channels a* and b*.
This grayscale image will enhance the coloured regions of the text.
By adapting the threshold on this grayscale image, you can segment the coloured words in your original image, and get the other words by thresholding the L channel.
A bitwise and operator should be enough to merge the two segmentation images.
If you can get an image with better contrast, a very last step could be a filling based on the contours.
For that, take a look at RETR_FLOODFILL of the function 'cv2.findContours'.
Any hole-filling function from another package may also fit that purpose.
Here is some code that shows the first part of my idea.
import cv2
import numpy as np
from matplotlib import pyplot as plt
I = cv2.UMat(cv2.imread('/home/smile/QSKN.png',cv2.IMREAD_ANYCOLOR))
Lab = cv2.cvtColor(I,cv2.COLOR_BGR2Lab)
L,a,b = cv2.split(Lab)
Ig = cv2.addWeighted(cv2.UMat(a),0.5,cv2.UMat(b),0.5,0,dtype=cv2.CV_32F)
Ig = cv2.normalize(Ig,None,0.,255.,cv2.NORM_MINMAX,cv2.CV_8U)
#k = np.ones((3,3),np.float32)
#k[2,2] = 0
#k*=-1
#
#Ig = cv2.filter2D(Ig,cv2.CV_32F,k)
#Ig = cv2.absdiff(Ig,0)
#Ig = cv2.normalize(Ig,None,0.,255.,cv2.NORM_MINMAX,cv2.CV_8U)
_, Ib = cv2.threshold(Ig,0.,255.,cv2.THRESH_OTSU)
_, Lb = cv2.threshold(cv2.UMat(L),0.,255.,cv2.THRESH_OTSU)
_, ax = plt.subplots(2,2)
ax[0,0].imshow(Ig.get(),cmap='gray')
ax[0,1].imshow(L.get(),cmap='gray')
ax[1,0].imshow(Ib.get(),cmap='gray')
ax[1,1].imshow(Lb.get(),cmap='gray')
import numpy as np
from skimage.morphology import selem
from skimage.filters import rank, threshold_otsu
from skimage.util import img_as_float
from PIL import ImageGrab
import matplotlib.pyplot as plt
def preprocessing(image, strelem, s0=30, s1=30, p0=.3, p1=1.):
    image = rank.mean_bilateral(image, strelem, s0=s0, s1=s1)
    condition = (lambda x: x>threshold_otsu(x))(rank.maximum(image, strelem))
    normalize_image = rank.autolevel_percentile(image, strelem, p0=p0, p1=p1)
    return np.where(condition, normalize_image, 0)
#Grab image from clipboard
image = np.array(ImageGrab.grabclipboard())
sel = selem.disk(4)
a = sum([img_as_float(preprocessing(image[:, :, x], sel, p0=0.3)) for x in range(3)])/3
fig, ax = plt.subplots(1, 2, sharey=True, sharex=True)
ax[0].imshow(image)
ax[1].imshow(rank.autolevel_percentile(a, sel, p0=.4))
This is my code for clearing text from noise and creating uniform brightness for characters.
With minor modifications, I used it to solve your problem.
I'm trying to stretch an image's histogram using a logarithmic transformation. Basically, I am applying a log operation to each pixel's intensity. When I'm trying to change image's value in each pixel, the new values are not saved but the histogram looks OK. Also, the maximum value is not correct. This is my code:
import cv2
import numpy as np
import math
from matplotlib import pyplot as plt
img = cv2.imread('messi.jpg',0)
img2 = img
for i in range(0,img2.shape[0]-1):
    for j in range(0,img2.shape[1]-1):
        if (math.log(1+img2[i,j],2)) < 0:
            img2[i,j]=0
        else:
            img2[i,j] = np.int(math.log(1+img2[i,j],2))
            print (np.int(math.log(1+img2[i,j],2)))
print (img2.ravel().max())
cv2.imshow('LSP',img2)
cv2.waitKey(0)
fig = plt.gcf()
fig.canvas.set_window_title('LSP histogram')
plt.hist(img2.ravel(),256,[0,256]); plt.show()
img3 = img2
B = np.int(img3.max())
A = np.int(img3.min())
print ("Maximum intensity = ", B)
print ("minimum intensity = ", A)
This is also the histogram I get:
However, the maximum intensity shows 186! This isn't applying the proper logarithmic operation at all.
Any ideas?
The code you wrote does apply a logarithmic transformation to the image intensities. The reason you are getting such a high spurious maximum intensity is that your for loops are wrong. Specifically, your range is incorrect: range excludes its end value, which means that you must go up to img.shape[0] and img.shape[1] respectively, and not img.shape[0]-1 or img.shape[1]-1. As written, you are missing the last row and last column of the image, and these don't get touched by the logarithmic operation. The maximum that is reported comes from one of these pixels in the last row or column that you didn't touch.
Once you correct this, you don't get those bad intensities anymore:
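for i in range(0,img2.shape[0]): # Change
    for j in range(0,img2.shape[1]): # Change
        # int() replaces np.int, which newer NumPy releases no longer provide;
        # casting the pixel first keeps 1 + 255 from overflowing uint8
        if (math.log(1+int(img2[i,j]),2)) < 0:
            img2[i,j]=0
        else:
            img2[i,j] = int(math.log(1+int(img2[i,j]),2))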
Doing that now gives us:
('Maximum intensity = ', 7)
('minimum intensity = ', 0)
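The effect of the off-by-one range can be reproduced on a tiny array. This is only a sketch: the constant 4x5 image filled with 186 and the helper name `log_transform_loop` are made up for the demonstration.

```python
import math
import numpy as np

def log_transform_loop(img, skip_last):
    """Apply floor(log2(1 + p)) pixel by pixel, optionally with the off-by-one ranges."""
    out = img.copy()
    rows = out.shape[0] - 1 if skip_last else out.shape[0]
    cols = out.shape[1] - 1 if skip_last else out.shape[1]
    for i in range(rows):
        for j in range(cols):
            out[i, j] = int(math.log(1 + int(out[i, j]), 2))
    return out

img = np.full((4, 5), 186, dtype=np.uint8)          # synthetic constant image
buggy = log_transform_loop(img, skip_last=True)     # ranges end at shape - 1
fixed = log_transform_loop(img, skip_last=False)    # ranges cover the full image

print(buggy.max(), fixed.max())
```

With the shortened ranges the last row and column keep their original value, so the reported maximum is an untransformed pixel; with the full ranges every pixel is transformed.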
However, what you're going to get now is a very dark image. The histogram that you have shown us illustrates that all of the image pixels are in the dark range... roughly between [0-7]. Because of that, the majority of your image is going to be dark if you use uint8 as the data type for visualization. Take note that I searched for the Lionel Messi image that's part of the OpenCV tutorials, and this is the image I found:
Source: https://opencv-python-tutroals.readthedocs.org/en/latest/_images/roi.jpg
Your code is converting this to grayscale, and that's fine for the purpose of your question. Now, using the above image, if you actually show what the histogram count looks like as well as what the intensities are per bin in the histogram, this is what we get for img2:
In [41]: np.unique(img2)
Out[41]: array([0, 1, 2, 3, 4, 5, 6, 7], dtype=uint8)
In [42]: np.bincount(img2.ravel())
Out[42]: array([ 86, 88, 394, 3159, 14841, 29765, 58012, 19655])
As you can see, all of the image pixels fall in the [0-7] range, which is why everything looks black. If you want to see this better, scale the image by roughly 255 / 7 ≈ 36 so that the intensities span the displayable range:
img2 = 36*img2
cv2.imshow('LSP',img2)
cv2.waitKey(0)
We get this image:
I also get this histogram:
That personally looks very ugly... at least to me. As such, I would recommend that you choose a more meaningful image transformation if you want to stretch the histogram. In fact, the log operation compresses the dynamic range of the histogram. If you want to stretch the histogram, go the opposite way and try a power-law operation. Specifically, given an input intensity, the output is defined as:
out = c*in^(p)
in is the input intensity, p is a power and c is a constant to ensure that you scale the image so that the maximum intensity gets mapped to the same maximum intensity of the input when you're finished and not anything larger. That can be done by calculating c so that:
c = (img2.max()) / (img2.max()**p)
... where p is the power you want. In addition, the transformation via power-law can be explained with this nice diagram:
Source: http://www.nptel.ac.in/courses/117104069/chapter_8/8_14.html
Basically, powers that are less than 1 perform an intensity expansion where darker intensities get pushed towards the lighter side. Similarly, powers that are greater than 1 perform an intensity compression where lighter intensities get pushed to the darker side. In your case, you want to expand the histogram, and so you want the first option. Specifically, try making the intensities that are smaller go towards the larger range. This can be done by choosing a power that's smaller than 1... try 0.5 for example.
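A few sample intensities make the expansion and compression concrete. This is a minimal vectorized sketch; the helper name `power_law` and the four sample levels are chosen only for illustration.

```python
import numpy as np

def power_law(img, p):
    """out = c * in**p, with c chosen so the input maximum maps onto itself."""
    img = np.asarray(img, dtype=np.float64)
    peak = img.max()                 # assumes the image is not all zeros
    c = peak / peak**p
    return c * img**p

levels = np.array([0.0, 16.0, 64.0, 255.0])
expanded = power_law(levels, 0.5)    # p < 1: dark values are pushed towards the light side
compressed = power_law(levels, 2.0)  # p > 1: values are pushed towards the dark side

print(np.round(expanded, 1))
print(np.round(compressed, 1))
```

In both cases 0 stays at 0 and 255 stays at 255; only the values in between move.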
You'd modify your code so that it is like this:
img2 = img2.astype(np.float64) # Cast to float (np.float was removed from newer NumPy)
c = (img2.max()) / (img2.max()**(0.5))
for i in range(0,img2.shape[0]): # full range, including the last row
    for j in range(0,img2.shape[1]): # ... and the last column
        img2[i,j] = int(c*img2[i,j]**(0.5))
# Cast back to uint8 for display
img2 = img2.astype(np.uint8)
Doing that, I get this image:
I also get this histogram:
Minor Note
If I can suggest something in terms of efficiency, I wouldn't recommend that you loop through the entire image and set each pixel individually... that's not how numpy arrays are supposed to be used. You can achieve what you want vectorized in a single line of code.
With your old code, use np.log2, not math.log with the base 2 with numpy arrays:
import cv2
import numpy as np
from matplotlib import pyplot as plt
# Your code
img = cv2.imread('messi.jpg',0)
# New code
img2 = np.log2(1 + img.astype(np.float64)).astype(np.uint8)
# Back to your code
img2 = 36*img2 # Edit from before
cv2.imshow('LSP',img2)
cv2.waitKey(0)
fig = plt.gcf()
fig.canvas.manager.set_window_title('LSP histogram') # canvas.set_window_title was removed from newer matplotlib
plt.hist(img2.ravel(),256,[0,256]); plt.show()
img3 = img2
B = int(img3.max())
A = int(img3.min())
print ("Maximum intensity = ", B)
print ("minimum intensity = ", A)
cv2.destroyAllWindows() # Don't forget this
Similarly, if you want to apply a power-law transformation, it's very simple:
import cv2
import numpy as np
from matplotlib import pyplot as plt
# Your code
img = cv2.imread('messi.jpg',0)
# New code
c = (img.max()) / (img.max()**(0.5)) # use img here: img2 does not exist yet
img2 = (c*img.astype(np.float64)**(0.5)).astype(np.uint8)
#... rest of code as before
I can only ever find examples in C/C++ and they never seem to map well to the OpenCV API. I'm loading video frames (both from files and from a webcam) and want to reduce them to 16 color, but mapped to a 24-bit RGB color-space (this is what my output requires - a giant LED display).
I read the data like this:
ret, frame = self._vid.read()
image = cv2.cvtColor(frame, cv2.COLOR_RGB2BGRA)
I did find the below python example, but cannot figure out how to map that to the type of output data I need:
import numpy as np
import cv2
img = cv2.imread('home.jpg')
Z = img.reshape((-1,3))
# convert to np.float32
Z = np.float32(Z)
# define criteria, number of clusters(K) and apply kmeans()
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
K = 8
ret,label,center=cv2.kmeans(Z,K,None,criteria,10,cv2.KMEANS_RANDOM_CENTERS)
# Now convert back into uint8, and make original image
center = np.uint8(center)
res = center[label.flatten()]
res2 = res.reshape((img.shape))
cv2.imshow('res2',res2)
cv2.waitKey(0)
cv2.destroyAllWindows()
That obviously works for the OpenCV image viewer, but doing the same with my output code produces errors, since I need an RGB or RGBA format. My output works like this:
for y in range(self.height):
    for x in range(self.width):
        self._led.set(x,y,tuple(image[y,x][0:3]))
Each color is represented as an (r,g,b) tuple.
Any thoughts on how to make this work?
I think the following could be faster than kmeans, especially with k = 16.
1. Convert the color image to gray.
2. Contrast stretch this gray image so that the resulting gray levels are between 0 and 255 (use normalize with NORM_MINMAX).
3. Calculate the histogram of this stretched gray image using 16 as the number of bins (calcHist).
4. Now you can modify these 16 values of the histogram. For example, you can sort and assign ranks (say 0 to 15), or assign 16 uniformly distributed values between 0 and 255 (I think these could give you a consistent output for a video).
5. Backproject this histogram onto the stretched gray image (calcBackProject).
6. Apply a color map to this backprojected image using applyColorMap (you might want to scale the backprojected image before applying the color map).
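The steps above can be approximated without the histogram and back-projection calls. The sketch below swaps them for a direct uniform binning in NumPy, which is a simplification and not the same operations; the helper name `quantize_to_palette` and the grey-ramp `palette` are hypothetical placeholders for whatever 16 colors the display needs.

```python
import numpy as np

def quantize_to_palette(gray, palette):
    """Stretch gray to 0..255, split it into len(palette) uniform bins, look up colours."""
    gray = np.asarray(gray, dtype=np.float64)
    lo, hi = gray.min(), gray.max()
    span = hi - lo if hi > lo else 1.0               # avoid dividing by zero on flat frames
    stretched = (gray - lo) / span * 255.0           # contrast stretch
    n = len(palette)
    bins = np.minimum((stretched * n / 256.0).astype(np.intp), n - 1)
    return palette[bins]                             # shape (H, W, 3), dtype of palette

# Hypothetical 16-entry RGB palette (a grey ramp); swap in the colours the display needs
ramp = np.linspace(0, 255, 16).astype(np.uint8)
palette = np.stack([ramp, ramp, ramp], axis=1)

gray = np.arange(256, dtype=np.uint8).reshape(16, 16)   # synthetic frame
rgb = quantize_to_palette(gray, palette)
pixel = tuple(int(v) for v in rgb[0, 0])                # plain (r, g, b) tuple
```

Because the palette is fixed, consecutive video frames map to the same 16 colors, and each pixel can be passed on as an ordinary `(r, g, b)` tuple.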
Tip for kmeans:
If you are using kmeans for video, you can use the cluster centers from the previous frame as the initial positions in kmeans for the current frame. That way, it'll take less time to converge, so kmeans in the subsequent frames will most probably run faster.
You can speed up your processing by applying k-means to a downscaled version of your image. This will give you the cluster centroids. You can then quantize each pixel of the original image by picking the closest centroid.
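The second half of that idea, assigning every full-resolution pixel to its nearest centroid, might look like this in NumPy. The three `centroids` are invented stand-ins for what k-means would return on the downscaled frame, and `map_to_centroids` is a hypothetical helper name.

```python
import numpy as np

def map_to_centroids(image, centroids):
    """Replace every pixel with the closest centroid (squared Euclidean distance)."""
    pixels = image.reshape(-1, 1, 3).astype(np.float64)
    centres = np.asarray(centroids, dtype=np.float64)[None, :, :]
    # Temporary array of shape (num_pixels, num_centroids, 3)
    nearest = ((pixels - centres) ** 2).sum(axis=2).argmin(axis=1)
    return np.asarray(centroids)[nearest].reshape(image.shape)

# Hypothetical centroids, standing in for the output of k-means on a downscaled frame
centroids = np.array([[0, 0, 0], [255, 255, 255], [200, 30, 30]], dtype=np.uint8)
image = np.array([[[10, 12, 9], [240, 250, 245]],
                  [[190, 40, 35], [120, 120, 120]]], dtype=np.uint8)
quantized = map_to_centroids(image, centroids)
```

For large frames the temporary distance array grows with the number of pixels times the number of centroids, so processing the image in row blocks may be worth considering.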
I want to perform a Gaussian blur on an image, but I don't want it to be converted to grey scale. Is there any way to perform this operation and keep the color?
from scipy import misc, ndimage
import scipy
import numpy as np
a = misc.imread('A.jpg')
# A retains its color
misc.imsave('color.jpg', a)
# A_G_Blur gets converted to grey scale, I want to prevent this
a_g_blure = ndimage.uniform_filter(a, size=11)
# I want it to keep its color
misc.imsave('now_grey.jpg', a_g_blure)
a is a 3-d array with shape (M, N, 3). The problem is that ndimage.uniform_filter(a, size=11) applies a filter with length 11 to each dimension of a, including the third axis that holds the color channels. When you apply the filter with length 11 to an axis with length 3, the resulting values are all pretty close to the average of the three values, so you get something pretty close to a gray scale. (Depending on the image, you might have some color left.)
What you actually want is to apply a 2-d filter to each color channel separately. You can do this by giving a tuple as the size argument, using a size of 1 for the last axis:
a_g_blure = ndimage.uniform_filter(a, size=(11, 11, 1))
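The difference between the two size arguments is easy to see on a synthetic image. The sketch below assumes SciPy is installed and uses a made-up uniformly red array instead of a real photo.

```python
import numpy as np
from scipy import ndimage

# Synthetic stand-in for the photo: a uniformly red 32x32 RGB image
red = np.zeros((32, 32, 3), dtype=np.float64)
red[..., 0] = 200.0

mixed = ndimage.uniform_filter(red, size=11)                 # also averages across R, G, B
per_channel = ndimage.uniform_filter(red, size=(11, 11, 1))  # spatial axes only

print(mixed[16, 16])        # channels bleed into each other
print(per_channel[16, 16])  # still pure red
```

With `size=11` the green and blue channels pick up part of the red value, which is the graying effect described above; with `size=(11, 11, 1)` the channels stay independent.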
Note: uniform_filter is not a Gaussian blur. For that, you would use scipy.ndimage.gaussian_filter. You might also be interested in the filters provided by scikit-image. In particular, see skimage.filters.gaussian_filter.
For a gaussian blur, I recommend using skimage.filters.gaussian_filter.
from skimage.io import imread
from skimage.filters import gaussian_filter
sigma=5 # blur radius
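img = imread('path/to/img')
# note: newer scikit-image releases rename gaussian_filter to skimage.filters.gaussian
# and replace the multichannel flag with channel_axis
# this will only return grayscale
grayscale_blur = gaussian_filter(img, sigma=sigma)
# passing multichannel param as True returns colors
color_blur = gaussian_filter(img, sigma=sigma, multichannel=True)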