I have a grayscale, noisy image. I want to apply PCA for noise reduction and see the output after the application.
Here's what I tried to do:
[in]:
from sklearn.datasets import load_sample_image
from sklearn.feature_extraction import image
from sklearn.decomposition import PCA
# Create patches of size 25 by 25 and create a matrix from all patches
patches = image.extract_patches_2d(grayscale_image, (25, 25), random_state = 42)
print(patches.shape)
# reshape patches because fit_transform raised: ValueError: Found array with dim 3. Estimator expected <= 2.
patches_reshaped = patches.reshape(2,-1)
#apply PCA
pca = PCA()
projected = pca.fit_transform(patches_reshaped.data)
denoised_image = pca.inverse_transform(projected)
imshow(denoised_image)
[out]:
I get an array as a result. How do I see the de-noised image?
To see your de-noised image, you need to convert the data, which is represented in a low-dimensional space by the principal components, back to the original space. You can do that with the inverse_transform() function. As the documentation describes, this function accepts the projected data and returns an array like the original image. So you can do something like,
denoised_image = pca.inverse_transform(projected)
# then view denoised_image
Edit:
Here are some of the issues to look in to:
You have 53824 patches from your original image, each of size (25, 25). As the PCA documentation states, fit_transform expects an array of shape (n_samples, n_features). Your number of samples is 53824, so the patches should be reshaped as:
patches_reshaped = patches.reshape(patches.shape[0],-1)
# this should return a (53824, 625) shaped data
Now transform this reshaped data with PCA and apply the inverse transform to get back to your original domain. After that, denoised_image is a set of reconstructed patches. You need to combine these patches back into an image with the function image.reconstruct_from_patches_2d (see its documentation). So you can do something like,
denoised_image = image.reconstruct_from_patches_2d(denoised_image.reshape(-1,25,25), grayscale_image.shape)
Now you can view the denoised_image, which should look like grayscale_image.
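Putting the steps together, here is a minimal end-to-end sketch. Note that with the default PCA() all components are kept, so inverse_transform reproduces the noisy input almost exactly; for actual denoising you would keep only the leading components (the n_components=50 below is an arbitrary illustration, not a tuned value):
import numpy as np
from sklearn.feature_extraction import image
from sklearn.decomposition import PCA
# assume grayscale_image is already loaded as a 2D float array
patches = image.extract_patches_2d(grayscale_image, (25, 25))
patches_reshaped = patches.reshape(patches.shape[0], -1)  # (n_patches, 625)
# keep only the leading components so the reconstruction drops the noise
pca = PCA(n_components=50)  # arbitrary choice for illustration
projected = pca.fit_transform(patches_reshaped)
reconstructed = pca.inverse_transform(projected)
# stitch the reconstructed patches back into an image
denoised_image = image.reconstruct_from_patches_2d(
    reconstructed.reshape(-1, 25, 25), grayscale_image.shape)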
I've just started to learn about the frequency domain of images.
I have this function:
def fourier_transform(img):
    f = np.fft.fft2(img)
    fshift = np.fft.fftshift(f)
    magnitude_spectrum = 20*np.log(np.abs(fshift))
    return magnitude_spectrum
And I want to implement this function:
def inverse_fourier_transform(magnitude_spectrum):
    return img
But I don't know how.
My idea is to use magnitude_spectrum to get the original img.
How can I do it?
You are losing the phases here: np.abs(fshift).
np.abs returns the magnitude (modulus) of the complex data, discarding the phase. You can separate the magnitudes and phases with:
mag = np.abs(fshift)
ph = np.angle(fshift)
In theory, you could work on the magnitudes and later recombine them with the phases before reversing the FFT with np.fft.ifft2.
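For illustration, a minimal sketch (using a random 2D array as the image) of splitting the spectrum into magnitude and phase and recombining them:
import numpy as np
img = np.random.random((100, 100))
fshift = np.fft.fftshift(np.fft.fft2(img))
mag = np.abs(fshift)    # magnitude spectrum
ph = np.angle(fshift)   # phase spectrum
# mag * exp(1j * phase) restores the complex spectrum exactly
restored = np.fft.ifft2(np.fft.ifftshift(mag * np.exp(1j * ph)))
print(np.allclose(restored.real, img))  # True, up to numerical error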
EDIT:
You could try this approach:
import numpy as np
import matplotlib.pyplot as plt
# single-channel image: random placeholder data, or one channel of a color image
img = np.random.random((100, 100))
# img = plt.imread(r'path/to/color/img.jpg')[:,:,0]
# should be only width and height
print(img.shape)
# do the 2D fourier transform
fft_img = np.fft.fft2(img)
# shift FFT to the center
fft_img_shift = np.fft.fftshift(fft_img)
# extract the real and imaginary parts (note: the imaginary part is not
# the phase; the phase would be np.angle(fft_img_shift))
real = fft_img_shift.real
imag = fft_img_shift.imag
# modify the real part; put your own modification here
real_mod = real/3
# create an empty complex array with the shape of the input image
fft_img_shift_mod = np.empty(real.shape, dtype=complex)
# insert the modified real part and the original imaginary part
fft_img_shift_mod.real = real_mod
fft_img_shift_mod.imag = imag
# reverse shift
fft_img_mod = np.fft.ifftshift(fft_img_shift_mod)
# reverse the 2D fourier transform
img_mod = np.fft.ifft2(fft_img_mod)
# np.abs gives the magnitude of the complex result, while img_mod.real
# keeps only the real part; for a real input image the imaginary part of
# the inverse FFT is negligible, so both give essentially the same result
img_mod = np.abs(img_mod)
# show differences
plt.subplot(121)
plt.imshow(img, cmap='gray')
plt.subplot(122)
plt.imshow(img_mod, cmap='gray')
plt.show()
You cannot recover the exact original image without the phase information, so the magnitude of the fft2 alone is not enough.
To use the fft2 to recover the image, you just need to call numpy.fft.ifft2. See the code below:
import numpy as np
from numpy.fft import fft2, ifft2, fftshift, ifftshift
# do the 2D fourier transform (img is your 2D grayscale image)
fft_img = fftshift(fft2(img))
# reverse the 2D fourier transform
freq_filt_img = ifft2(ifftshift(fft_img))
freq_filt_img = np.abs(freq_filt_img)
freq_filt_img = freq_filt_img.astype(np.uint8)
Note that calling fftshift and ifftshift is not necessary if you just want to recover the original image directly, but I added them in case some plotting or another operation in the middle requires centering the zero frequency.
Recovering the image with numpy.abs() or with freq_filt_img.real (assuming positive values for each pixel) should give the same result, because the imaginary part of the ifft2 output should be very small. Of course, numpy.abs() is O(n) while freq_filt_img.real is O(1), since .real is just a view.
I have a question on resampling a 2-d array.
Sometimes the original size of geoscience data needs to be transformed to another size. If the ratio for each axis is equal, the task is simple: np.reshape allows a 2-d array of 100x100 to be coarsened to 50x50 without data loss. The code is shown as:
## create the original data
xc1, xc2, yc1, yc2 = 100, 110, 35, 45
XSIZE,YSIZE=100,100
lon,lat = np.linspace(xc1,xc2,XSIZE),np.linspace(yc1,yc2,YSIZE)
pop = np.random.uniform(low=1000, high=50000, size=(XSIZE*YSIZE,)).reshape(YSIZE,XSIZE)
## reshape
shape = np.array(pop.shape, dtype=float)
coarseness = 2 # the new shape will be 50 x 50
new_shape = coarseness * np.ceil(shape/coarseness).astype(int)
zp_pop = np.zeros(new_shape)
zp_pop[:int(shape[0]), :int(shape[1])] = pop
temp = zp_pop.reshape((new_shape[0] // coarseness, coarseness,
new_shape[1] // coarseness, coarseness))
coarse_pop = np.sum(temp, axis=(1,3))
print (pop.sum())
print (coarse_pop.sum())
However, when the coarsening factor is different for each axis, this method cannot be applied directly, so I turned to another method. Here is an example where I tried to use the FFT to generate a 60x80 array as output:
from scipy import fftpack
pop_fft = fftpack.fft2(pop,shape = (60,80))
pop_res = fftpack.ifft2(pop_fft).real
print(pop.sum())
print(pop_res.sum())
254208134.8356425
122048754.13639387
The data loss was significant, so I posted my issue here. Maybe the resampling function I used was not correct, or there is a better approach for this situation. Any advice or comments are highly appreciated!
When you set up the 'coarse' array yourself, you sum over adjacent entries instead of computing the average or interpolating.
This way the sums over all elements in the coarse and original arrays are identical: (coarse_pop.sum()-pop.sum())/(0.5*(pop.sum()+coarse_pop.sum())) gives -1.1638426077573779e-16, only a tiny numerical error.
If you compare the mean instead, the fftpack-resampled coarse array matches up:
print(pop.mean())
print(pop_res.mean())
25606.832220313503
25496.03271480075
Alternatively, you can correct for the number of elements yourself:
print(pop.sum())
print(pop_res.sum()*100*100/(60*80))
256068322.20313504
254960327.14800745
I don't know the details of your problem, but the fftpack way of downsampling the array makes more sense to me. If it's not what you want, you can apply the prefactor to the original array instead, like pop_fft = fftpack.fft2(pop*100*100/(60*80), shape=(60,80)).
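As an aside, the reshape trick itself does extend to a different integer factor per axis (it only fails for non-integer ratios such as 100 -> 60); a small sketch, with made-up factors 2 and 4:
import numpy as np
pop = np.random.uniform(low=1000, high=50000, size=(100, 100))
fy, fx = 2, 4  # per-axis coarsening factors (must divide the shape evenly)
coarse = pop.reshape(pop.shape[0]//fy, fy, pop.shape[1]//fx, fx).sum(axis=(1, 3))
print(pop.sum())
print(coarse.sum())  # identical up to floating-point error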
I want to detect whether an event is happening on my screen; every time it happens, a particular box/image shows up in a screen region with a very similar structure.
I have collected a bunch of 84x94 .png RGB images from that screen region and I'd like to build a classifier to tell me if the event is happening or not.
Therefore my idea was to create a pd.DataFrame (df) containing 2 columns: df['np_array'] contains every picture as a np.array, and df['is_category'] contains boolean values telling whether that image indicates that the event is happening or not.
The structure looks like this (with a different size):
I have resized the images to 10x10 for training and converted them to greyscale
df = pd.DataFrame(
    {'np_array': [np.random.random((10, 10)) for x in range(0, 10)],
     'is_category': [bool(random.getrandbits(1)) for x in range(0, 10)]
    })
My problem is that I can't fit a scikit-learn classifier by doing clf.fit(df['np_array'], df['is_category'])
I've never tried image recognition before, thanks upfront for any help!
If it's a 10x10 grayscale image, you can flatten it:
import numpy as np
from sklearn import ensemble
# generate 100 random 10x10 images, samples along the first axis
image_data = np.random.rand(100, 10, 10)
# generate random binary labels
labels = np.random.randint(0, 2, 100)
# flatten each image into a feature vector: shape (100, 100)
X = image_data.reshape(100, -1)
# then use any scikit-learn classification model
clf = ensemble.RandomForestClassifier()
clf.fit(X, labels)
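Applied to your DataFrame, a small sketch of the same idea (assuming df is built as in your question, with 2-D grayscale arrays in df['np_array']):
import numpy as np
# stack the per-image arrays into one (n_samples, n_features) matrix
X = np.stack(df['np_array'].to_list()).reshape(len(df), -1)
y = df['is_category'].to_numpy()
clf.fit(X, y)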
By the way, for images the best performing algorithms are convolutional neural networks.
I am trying to train an image classifier in scikit-learn. I have a bunch of input images and I am using Pillow to process them. My question is about what shape to give the Pillow data to scikit-learn.
This is my code now:
training = glob.glob('./img/training/*/*.bmp')
data = []
classes = []
for imagefile in training:
    edges = Image.open(imagefile).filter(ImageFilter.FIND_EDGES).convert("L")
    in_data = np.asarray(edges, dtype=np.uint8)
    data.append(in_data[0])
    if 'class1' in imagefile:
        classes.append('class1')
    else:
        classes.append('class2')
clf = svm.SVC(gamma=0.001, C=100.)
clf.fit(data, classes)
This runs without errors, but I have put the code together fairly crudely and I am not sure it is correct.
In particular, I'm not sure whether I should be using in_data[0]. I just did this because using in_data gives me an error: ValueError: Found array with dim 3. Estimator expected <= 2.
Unless you want only the first row of each image matrix (in_data[0] returns the first row), you probably want to flatten the image.
Flattening takes each row of the image matrix and puts the rows one after another into a one-dimensional vector.
So it becomes data.append(in_data.flatten())
You could also resize your images to a smaller format first, to reduce the number of columns of your data matrix.
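A sketch of the corrected loop (same structure as yours; the 32x32 resize is an arbitrary illustration of the size reduction, not a recommended value):
import glob
import numpy as np
from PIL import Image, ImageFilter
from sklearn import svm
data = []
classes = []
for imagefile in glob.glob('./img/training/*/*.bmp'):
    edges = Image.open(imagefile).filter(ImageFilter.FIND_EDGES).convert("L")
    edges = edges.resize((32, 32))  # optional: shrink to reduce feature count
    data.append(np.asarray(edges, dtype=np.uint8).flatten())
    classes.append('class1' if 'class1' in imagefile else 'class2')
clf = svm.SVC(gamma=0.001, C=100.)
clf.fit(data, classes)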
I am trying to convert stft of a wav file into chromagram.
Here's my code :-
def stft(x, fs, framesize, hopsize):
    frame = int(framesize*fs)
    hop = int(hopsize*fs)
    w = scipy.hamming(frame)
    X = scipy.array([scipy.fft(w*x[i:i+frame])
                     for i in range(0, len(x)-frame, hop)])
    return X
Here's the code for chromagram :-
def chromagram(x, fs, framesize, hopsize):
    X = stft(x, fs, framesize, hopsize)
    chroma = np.fmod(np.round(np.log2(X / 440) * 12), 12)
    return chroma
When I calculate the FFT I get an array of complex values, so I have to cast the result to float before calculating the chroma. Am I doing anything wrong here?
Also, how do I plot the result?
I don't think that is the way to do it. In X you have the complex-valued STFT. You can get its magnitude values with np.abs(X). Did you want to apply this formula? It converts frequencies to musical notes, but in X there are no frequencies. You can get the corresponding frequencies with np.fft.fftfreq(framesize, 1.0/fs).
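For illustration, a rough sketch of applying that formula to the bin frequencies rather than to the STFT values themselves (fs and framesize are made-up example parameters; np.mod is used instead of np.fmod so negative note numbers fold into the 0-11 range):
import numpy as np
fs, framesize = 44100, 2048
freqs = np.fft.fftfreq(framesize, 1.0/fs)[1:framesize//2]  # positive bins, skip DC
# pitch class of each bin relative to A4 = 440 Hz
pitch_class = np.mod(np.round(12 * np.log2(freqs / 440.0)), 12)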
If you don't want to use the Bregman Audio-Visual Information Toolbox for chroma features and want to implement them on your own, you could port the Matlab Chroma Toolbox. I think they use filterbanks instead of the FFT. At the bottom of that page you will find references where chroma features are explained in detail.
Anyway, once you have chroma features, you can plot them like any 2-dimensional array with imshow.
from matplotlib import pyplot as plt
import numpy as np
X = np.random.random((30, 30))
plt.imshow(X)
plt.show()