I have two images:
Original Image
Binarized Image
I have applied the Discrete Cosine Transform to the two images by dividing each 256x256 image into 8x8 blocks. Afterwards, I want to compare their DCT coefficient distributions.
import matplotlib.mlab as mlab
import matplotlib.pyplot as plt
import matplotlib.pylab as pylab
import numpy as np
import os.path
import scipy
import statistics
from numpy import pi
from numpy import sin
from numpy import zeros
from numpy import r_
from PIL import Image
from scipy.fftpack import fft, dct
from scipy import signal
from scipy import misc
if __name__ == '__main__':
    image_counter = 1

    # Opens the noisy image.
    noise_image_path = 'noise_images/' + str(image_counter) + '.png'
    noise_image = Image.open(noise_image_path)

    # Opens the binarized image.
    ground_truth_image_path = 'ground_truth_noise_patches/' + str(image_counter) + '.png'
    ground_truth_image = Image.open(ground_truth_image_path)

    # Converts the images into ndarrays.
    noise_image = np.array(noise_image)
    ground_truth_image = np.array(ground_truth_image)

    # Create variables `noise_dct_data` and `ground_truth_dct_data` where the
    # DCT coefficients of the two images will be stored.
    noise_image_size = noise_image.shape
    noise_dct_data = np.zeros(noise_image_size)
    ground_truth_image_size = ground_truth_image.shape
    ground_truth_dct_data = np.zeros(ground_truth_image_size)

    for i in r_[:noise_image_size[0]:8]:
        for j in r_[:noise_image_size[1]:8]:
            # Apply the DCT to each 8x8 block of the noisy image.
            noise_dct_data[i:(i+8), j:(j+8)] = dct(noise_image[i:(i+8), j:(j+8)])
            # Apply the DCT to each 8x8 block of the binarized image.
            ground_truth_dct_data[i:(i+8), j:(j+8)] = dct(ground_truth_image[i:(i+8), j:(j+8)])
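A side note: scipy's dct computes a one-dimensional DCT along the last axis, so the loop above transforms each block row-wise only. If a full 2-D block DCT (as in JPEG) is what you're after, one common approach is to apply the 1-D transform along both axes. A minimal sketch, where the helper name dct2 is my own:

import numpy as np
from scipy.fftpack import dct

def dct2(block):
    # 2-D DCT-II: 1-D DCT along the rows, then along the columns.
    return dct(dct(block.T, norm='ortho').T, norm='ortho')

# Example on a single 8x8 block of random data:
block = np.random.rand(8, 8)
coefficients = dct2(block)
print(coefficients.shape)  # (8, 8)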
The above code gets the DCT of the two images. I want to create their DCT coefficient distributions, just like the image below:
The thing is, I don't know how to plot it. Below is what I did:
# Convert the 2D arrays to 1D arrays.
noise_dct_data = noise_dct_data.ravel()
ground_truth_dct_data = ground_truth_dct_data.ravel()
# I just used a histogram!
n, bins, patches = plt.hist(ground_truth_dct_data, 2000, facecolor='blue', alpha=0.5)
plt.show()
n, bins, patches = plt.hist(noise_dct_data, 2000, facecolor='blue', alpha=0.5)
plt.show()
image_counter = image_counter + 1
My questions are:

1. What do the X- and Y-axes in the figure represent?
2. Are the values stored in noise_dct_data and ground_truth_dct_data the DCT coefficients?
3. Does the Y-axis represent the frequency of its corresponding DCT coefficients?
4. Is the histogram appropriate to represent the DCT coefficient distribution?
5. The DCT coefficients are normally classified into three sub-bands based on their frequencies, namely the low, middle, and high frequency bands. What threshold values can we use to classify a DCT coefficient into the low, middle, or high frequency band? In other words, how can we classify the DCT coefficient frequency bands radially? Below is an example of the radial classification of the DCT coefficient frequency bands.
The idea is based on the paper: Noise Characterization in Ancient Document Images Based on DCT Coefficient Distribution
The plot example you shared looks, to me, like a kernel density plot. A density plot is "a variation of a Histogram that uses kernel smoothing to plot values, allowing for smoother distributions by smoothing out the noise." (See https://datavizcatalogue.com/methods/density_plot.html)
The seaborn library, which is built on top of matplotlib, has a kdeplot function, and it can handle two sets of data. Here's a toy example:
import numpy as np
from scipy.fftpack import dct
import seaborn
sample1 = dct(np.random.rand(100))
sample2 = dct(np.random.rand(30))
seaborn.kdeplot(sample1, color="r")
seaborn.kdeplot(sample2, color="b")
Note that rerunning this code will produce a slightly different image, as I'm using randomly generated data.
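Applied to your own arrays (after the .ravel() calls in your code), that would look roughly like this:

seaborn.kdeplot(noise_dct_data, color="r")
seaborn.kdeplot(ground_truth_dct_data, color="b")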
To answer your numbered questions directly:
1. What do the X- and Y-axes in the figure represent?
In a kdeplot, the X-axis represents the values the data can take (here, the DCT coefficient values), and the Y-axis represents the estimated density of observations at each value. Unlike a histogram, it applies a smoothing method to try to estimate the "true" distribution behind the noisy observed data.
2. Are the values stored in noise_dct_data and ground_truth_dct_data the DCT coefficients?
Based on the way you've set up your code, yes, those variables store the results of the DCT transformations you perform.
3. Does the Y-axis represent the frequency of its corresponding DCT coefficients?
Yes, but with smoothing. Analogous to a histogram but not exactly the same.
4. Is the histogram appropriate to represent the DCT coefficient distribution?
It depends on the number of observations but, if you have enough data, a histogram should give you very similar results.
5. The DCT coefficients are normally classified into three sub-bands based on their frequencies, namely the low, middle, and high frequency bands. What threshold values can we use to classify a DCT coefficient into the low, middle, or high frequency band? In other words, how can we classify the DCT coefficient frequency bands radially?
I think this question is possibly too complicated to answer satisfactorily on Stack Overflow, but my advice here is to try and figure out how the authors of the article did this task. The cited article, "Blind Image Quality Assessment: A Natural Scene Statistics Approach in the DCT Domain", appears to be talking about a Radial Basis Function (RBF), but this looks like a way of training a supervised model on the frequency data to predict the overall quality of the scan.
Regarding data partitions, they state, "In order to capture directional information from the local image patches, the DCT block is partitioned directionally. ... The upper, middle, and lower partitions correspond to the low-frequency, mid-frequency, and high-frequency DCT subbands, respectively."
I take this to mean that, in at least one of their scenarios, the partitions are determined by a subband DCT (see https://ieeexplore.ieee.org/document/499836). There appears to be a great deal of literature on these types of approaches.
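If you just want a simple, explicit partition to experiment with, one common convention (an assumption on my part, not a value taken from either paper) is to classify each coefficient of an 8x8 block by its radial index u + v:

import numpy as np

def band_masks(block_size=8, low_end=2, mid_end=5):
    # Classify each (u, v) position of a DCT block by its radial index u + v.
    # The thresholds low_end and mid_end are illustrative choices only.
    u, v = np.indices((block_size, block_size))
    radius = u + v
    low = radius <= low_end
    mid = (radius > low_end) & (radius <= mid_end)
    high = radius > mid_end
    return low, mid, high

low, mid, high = band_masks()
# For a block of coefficients c: c[low], c[mid], c[high] select each sub-band.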
I am using Python to perform a Fast Fourier Transform on some data. I then need to extract the locations of the peaks in the transform in the form of the x-values. Right now I am using Scipy's fft tool to perform the transform, which seems to be working. However, when I use Scipy's find_peaks, I only get the y-values, not the x-positions that I need. I also get the warning:
ComplexWarning: Casting complex values to real discards the imaginary part
Is there a better way for me to do this? Here is my code at the moment:
import pandas as pd
import matplotlib.pyplot as plt
from scipy.fft import fft
from scipy.signal import find_peaks
headers = ["X","Y"]
original_data = pd.read_csv("testdata.csv",names=headers)
x = original_data["X"]
y = original_data["Y"]
a = fft(y)
peaks = find_peaks(a)
print(peaks)
plt.plot(x,a)
plt.title("Fast Fourier transform")
plt.xlabel("Frequency")
plt.ylabel("Amplitude")
plt.show()
There seem to be two points of confusion here:
What find_peaks is returning.
How to interpret complex values that the FFT is returning.
I will answer them separately.
Point #1
find_peaks returns the indices in "a" that correspond to peaks, so I believe they ARE the values you seek; however, you must plot them differently. You can see the first example from the documentation here. Basically, "peaks" holds the indices (the x-values), and a[peaks] gives the y-values. So to plot all your frequencies and mark the peaks, you could do:
plt.plot(a)
plt.plot(peaks, a[peaks], "x")
Point #2
As for the second point, you should probably read up more on the output of FFTs; this post is a short summary, but you may need more background to understand it.
But basically, an FFT will return an array of complex numbers, which carry both phase and magnitude information. What you are currently doing implicitly looks only at the real part of the solution (hence the warning that the imaginary portion is being discarded). What you probably want instead is to take the magnitude of your "a" array, but without more information on your application it is impossible to say.
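As a sketch of what that could look like (using a synthetic 5 Hz tone as a stand-in for your CSV data, since I don't know its sample spacing):

import numpy as np
from scipy.fft import fft, fftfreq
from scipy.signal import find_peaks

dt = 0.01                             # assumed sample spacing
x = np.arange(0, 10, dt)
y = np.sin(2 * np.pi * 5 * x)         # stand-in signal

magnitude = np.abs(fft(y))            # magnitude -> no ComplexWarning
freqs = fftfreq(len(y), d=dt)         # frequency of each FFT bin
peaks, _ = find_peaks(magnitude, height=100)
print(freqs[peaks])                   # the x-positions (frequencies) of the peaks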
I tried to put in as much detail as possible:
import pandas as pd
import matplotlib.pyplot as plt
from scipy.fft import fft, fftfreq
from scipy.signal import find_peaks
# First: Let's generate a dummy dataframe with X,Y
# The signal consists in 3 cosine signals with noise added. We terminate by creating
# a pandas dataframe.
import numpy as np
X=np.arange(start=0,stop=20,step=0.01) # 20 seconds long signal sampled every 0.01[s]
# Signal components given by [frequency, phase shift, Amplitude]
GeneratedSignal=np.array([[5.50, 1.60, 1.0], [10.2, 0.25, 0.5], [18.3, 0.70, 0.2]])
Y=np.zeros(len(X))
# Let's add the components one by one
for P in GeneratedSignal:
    Y += np.cos(2*np.pi*P[0]*X - P[1])*P[2]
# Let's add some gaussian random noise (mu=0, sigma=noise):
noise=0.5
Y+=np.random.randn(len(X))*noise
# Let's build the dataframe:
dummy_data=pd.DataFrame({'X':X,'Y':Y})
print('Dummy dataframe: ')
print(dummy_data.head())
# Figure-1: The dummy data
plt.plot(X,Y)
plt.title('Dummy data')
plt.xlabel('time [s]')
plt.ylabel('Amplitude')
plt.show()
# ----------------------------------------------------
# Processing:
headers = ["X","Y"]
#original_data = pd.read_csv("testdata.csv",names=headers)
# Let's take our dummy data:
original_data = dummy_data
x = np.array(original_data["X"])
y = np.array(original_data["Y"])
# Assuming the time step is constant:
# (otherwise you'll need to resample the data at a constant rate).
dt=x[1]-x[0] # time step of the data
# The fourier transform of y:
yf=fft(y, norm='forward')
# Note: see help(fft) --> norm. I chose 'forward' because it gives the amplitudes we put in.
# Otherwise, by default, yf will be scaled by a factor of n: the number of points
# The frequency scale
n = x.size # The number of points in the data
freq = fftfreq(n, d=dt)
# Let's find the peaks with height_threshold >=0.05
# Note: We use the magnitude (i.e the absolute value) of the Fourier transform
height_threshold=0.05 # We need a threshold.
# peaks_index contains the indices in x that correspond to peaks:
peaks_index, properties = find_peaks(np.abs(yf), height=height_threshold)
# Notes:
# 1) peaks_index does not contain the frequency values but indices
# 2) In this case, properties will contain only one property: 'peak_heights'
# for each element in peaks_index (See help(find_peaks) )
# Let's first output the result to the terminal window:
print('Positions and magnitude of frequency peaks:')
[print("%4.4f \t %3.4f" %(freq[peaks_index[i]], properties['peak_heights'][i])) for i in range(len(peaks_index))]
# Figure-2: The frequencies
plt.plot(freq, np.abs(yf),'-', freq[peaks_index],properties['peak_heights'],'x')
plt.xlabel("Frequency")
plt.ylabel("Amplitude")
plt.show()
The terminal output:
Dummy dataframe:
X Y
0 0.00 0.611829
1 0.01 0.723775
2 0.02 0.768813
3 0.03 0.798328
Positions and magnitude of frequency peaks:
5.5000 0.4980
10.2000 0.2575
18.3000 0.0999
-18.3000 0.0999
-10.2000 0.2575
-5.5000 0.4980
NOTE: Since the signal is real-valued, each frequency component will have a "double" that is negative (this is a property of the Fourier transform). This also explains why the amplitudes are half of those we gave at the beginning. But if, for a particular frequency, we add the amplitudes for the negative and positive components, we get the original amplitude of the real-valued signal.
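Continuing the code above, a quick way to fold the two halves together and recover the input amplitudes (an extra illustration, not part of the original script):

# One-sided spectrum: double the interior bins to account for their
# negative-frequency twins; the DC bin has no twin.
yf_onesided = 2 * np.abs(yf[:n//2])
yf_onesided[0] /= 2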
For further exploration: You can change the length of the signal to 1 [s] (at the beginning of the script):
X=np.arange(start=0,stop=1,step=0.01) # 1 second long signal sampled every 0.01[s]
Since the length of the signal is now reduced, the frequencies are less well defined (the peaks now have a width).
So, add width=0 to the line containing the find_peaks instruction:
peaks_index, properties = find_peaks(np.abs(yf), height=height_threshold, width=0)
Then look at what is contained inside properties:
print(properties)
You'll see that find_peaks gives you much more information than just the peak positions. For more info about what is inside properties:
help(find_peaks)
I have vibration data in the time domain and want to convert it to the frequency domain with FFT. However, the plot of the FFT only shows a big spike at zero and nothing else.
This is my vibration data: https://pastebin.com/7RK57kJW
My code:
import numpy as np
import matplotlib.pyplot as plt
t = np.arange(3000)
a1_fft= np.fft.fft(a1, axis=0)
freq = np.fft.fftfreq(t.shape[-1])
plt.plot(freq, a1_fft)
My FFT Plot:
What am I doing wrong here? I am pretty sure my data is uniformly sampled (non-uniform sampling provokes a similar problem with fft in other cases).
The bins of the FFT correspond to the frequencies at 0, df, 2df, 3df, ..., F-2df, F-df, where df = F/N is determined by the number of bins N, and F is the sampling rate (here 1 cycle per sample, since fftfreq was called without a sample spacing).
Notice the zero frequency at the beginning. This is called the DC offset. It's the mean of your data. In the data that you show, the mean is ~1.32, while the amplitude of the sine wave is around 0.04. It's not surprising that you can't see a peak that's 33x smaller than the DC term.
There are some common ways to visualize the data that help you get around this. One common method is to keep the DC offset but use a log scale, at least for the y-axis:
plt.semilogy(freq, a1_fft)
OR
plt.loglog(freq, a1_fft)
Another thing you can do is zoom in on the bottom 1/33rd or so of the plot. You can do this manually, or by adjusting the span of the displayed Y-axis:
p = np.abs(a1_fft[1:]).max() * np.array([-1.1, 1.1])
plt.ylim(p)
If you are plotting the absolute values already, use
p = np.abs(a1_fft[1:]).max() * np.array([-0.1, 1.1])
Another method is to remove the DC offset. A more elegant way of doing this than what @J. Schmidt suggests is to simply not display the DC term:
plt.plot(freq[1:], a1_fft[1:])
Or for the positive frequencies only:
n = freq.size
plt.plot(freq[1:n//2], a1_fft[1:n//2])
The cutoff at n // 2 is only approximate. The correct cutoff depends on whether the FFT has an even or odd number of elements. For even numbers, the middle bin actually has energy from both sides of the spectrum and often gets special treatment.
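An alternative that sidesteps the cutoff question entirely is np.fft.rfft, which computes only the non-negative half of the spectrum in the first place. A sketch, assuming the data is saved as a1.txt as in the answer below:

import numpy as np
import matplotlib.pyplot as plt

a1 = np.loadtxt('a1.txt')                 # the vibration data
a1_rfft = np.fft.rfft(a1)                 # non-negative frequencies only
freq = np.fft.rfftfreq(len(a1))           # matching frequency axis
plt.plot(freq[1:], np.abs(a1_rfft)[1:])   # skip the DC bin
plt.show()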
The peak at 0 is the DC gain, which is very high since you didn't normalize your data. Also, the Fourier transform is complex-valued, so you should plot the absolute value and phase separately. In this code I also plotted only the positive frequencies:
import numpy as np
import matplotlib.pyplot as plt
#Import data
a1 = np.loadtxt('a1.txt')
plt.plot(a1)
#Normalize a1
a1 -= np.mean(a1)
#Your code
t = np.arange(3000)
a1_fft= np.fft.fft(a1, axis=0)
freq = np.fft.fftfreq(t.shape[-1])
#Only plot positive frequencies
plt.figure()
plt.plot(freq[freq>=0], np.abs(a1_fft)[freq>=0])
I realize there are several articles that demonstrate how to fit a GMM to a 1D Gaussian with sklearn ([1] and [2], to name a few). However, in all of those cases, the data is present as single points where the distribution is Gaussian. In my case, I essentially have a frequency table (I'm working with spectroscopic data), where the distribution is Gaussian, but the individual points are unknown.
My distribution (i.e., the data I'm trying to fit) looks like this: 1D Gaussian Peak
I'd like to use GMM to deconvolve the 2 initial Gaussian distributions that make up this peak.
So far, I've tried the following (assume my data is a 200x2 array, with position in one column and AFU in the second):
import numpy as np
from sklearn import mixture
import matplotlib.pyplot as plt
def gengmm(nc=4, n_iter=2):
    g = mixture.GMM(n_components=nc)  # number of components
    g.init_params = ""                # no initialization
    g.n_iter = n_iter                 # iterations of the EM method
    return g
I tried to see if I could fit this peak to just a single Gaussian:
g = gengmm(1, 100)
g.fit(data)
However, the mean and covariance I get don't define my data particularly well (notably, the mean for that Gaussian distribution is 127.5, which is not what is recovered with a 1 component GMM).
Is there an easier way to do this? (I realize I can just use a least-squares fit to recover the initial Gaussian, but again, I'm ultimately trying to use this to determine the two underlying Gaussian distributions that make up the final one.)
Thanks!
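One workaround worth sketching (an assumption on my part, not a tested answer to the question): a GMM expects individual samples, so you can synthesize pseudo-samples by drawing positions with probability proportional to the measured intensities, then fit those. Note that mixture.GMM is the old API; current scikit-learn calls it mixture.GaussianMixture.

import numpy as np
from sklearn.mixture import GaussianMixture

# Toy stand-in for the 200x2 spectroscopic data:
# column 0 = position, column 1 = AFU (two overlapping Gaussians).
position = np.linspace(0, 200, 200)
afu = np.exp(-(position - 90)**2 / 50) + np.exp(-(position - 110)**2 / 50)

# Draw pseudo-samples with probability proportional to the intensities.
rng = np.random.default_rng(0)
samples = rng.choice(position, size=10000, p=afu / afu.sum()).reshape(-1, 1)

gmm = GaussianMixture(n_components=2).fit(samples)
print(gmm.means_.ravel())  # should land near 90 and 110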
I did a PCA in Python on audio spectrograms and face the following problem: I have a matrix where each row consists of flattened song features. After applying PCA, it's clear to me that the dimensions are reduced. But I can't find that dimension-reduced data anywhere in the original dataset.
import sys
import glob
from scipy.io.wavfile import read
from scipy import signal
from scipy.fftpack import fft
import numpy as np
import matplotlib.pyplot as plt
import pylab
# Read file to get samplerate and numpy array containing the signal
files = glob.glob('../some/*.wav')
song_list = []
for wav in files:
    (fs, x) = read(wav)
    channels = [
        np.array(x[:, 0]),
        np.array(x[:, 1])
    ]
    # Combine channels to make a mono signal out of stereo
    channel = np.mean(channels, axis=0)
    channel = channel[0:1024,]
    # Generate spectrogram
    ## Freqs is the same with different songs, t differs slightly
    Pxx, freqs, t, plot = pylab.specgram(
        channel,
        NFFT=128,
        Fs=44100,
        detrend=pylab.detrend_none,
        window=pylab.window_hanning,
        noverlap=int(128 * 0.5))
    # Magnitude spectrum to use
    Pxx = Pxx[0:2]
    X_flat = Pxx.flatten()
    song_list.append(X_flat)
song_matrix = np.vstack(song_list)
If I now apply PCA to the song_matrix...
import matplotlib
from matplotlib.mlab import PCA
from sklearn import decomposition
#test = matplotlib.mlab.PCA(song_matrix.T)
pca = decomposition.PCA(n_components=2)
song_matrix_pca = pca.fit_transform(song_matrix.T)
pca.components_  # These components should be most helpful to discriminate between the songs due to their high variance
...the final 2 components are the following:
Final components - two dimensions from 15 wav-files
The problem is that I can't find those two vectors in the original dataset with all its dimensions. What am I doing wrong, or am I misinterpreting the whole thing?
PCA doesn't give you the vectors in your dataset.
From Wikipedia:
Principal component analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. The number of principal components is less than or equal to the number of original variables. This transformation is defined in such a way that the first principal component has the largest possible variance (that is, accounts for as much of the variability in the data as possible), and each succeeding component in turn has the highest variance possible under the constraint that it is orthogonal to the preceding components.
Say you have a column vector V containing ONE flattened spectrogram. PCA will find a matrix M whose columns are orthogonal vectors (think of them as being at right angles to every other column in M).
Multiplying the transpose of M by V gives you a vector T of "scores". Each column of M captures some share of the variance in the original data, and each succeeding column captures progressively less.
Multiplying the transpose of M' (the first 2 columns of M) by V produces a 2x1 vector T' representing the "dimension-reduced spectrogram". You can reconstruct an approximation of V by multiplying M' by T'. This would work if you had a matrix of spectrograms, too. Keeping only two principal components produces an extremely lossy compression of your data.
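A small sketch of that round trip with scikit-learn, using toy data in place of the spectrogram matrix:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(15, 256))         # 15 "songs", 256 flattened features each

pca = PCA(n_components=2)
T = pca.fit_transform(X)               # the scores T: shape (15, 2)
X_approx = pca.inverse_transform(T)    # lossy reconstruction, shape (15, 256)

Note that pca.components_ holds the basis vectors (the columns of M above), not rows of your dataset, which is why you can't find them anywhere in song_matrix.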
But what if you want to add a new song to your dataset? Unless it is very much like the original songs (meaning it introduces little variance to the original dataset), there's no reason to think that the vectors of M will describe the new song well. For that matter, even multiplying all the elements of V by a constant would render M useless. PCA is quite data-specific, which is why it's not used in image/audio compression.
The good news? You can use a Discrete Cosine Transform to compress your training data. Instead of basis vectors derived from the data, it uses a fixed basis of cosines, so it doesn't suffer from the data-specific limitation. The DCT is used in JPEG, MP3, and other compression schemes.
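As a minimal illustration of that idea on a toy 1-D signal (not your spectrograms):

import numpy as np
from scipy.fftpack import dct, idct

signal = np.cos(np.linspace(0, 4 * np.pi, 256)) + 0.1 * np.random.randn(256)
coeffs = dct(signal, norm='ortho')
coeffs[32:] = 0                        # keep only the 32 lowest-frequency cosines
approx = idct(coeffs, norm='ortho')    # lossy reconstruction
print(np.abs(signal - approx).max())   # small residual, mostly the discarded noise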
I have a question concerning fitting and getting random numbers.
Situation is as such:
Firstly I have a histogram from data points.
import numpy as np
"""create random data points """
mu = 10
sigma = 5
n = 1000
datapoints = np.random.normal(mu,sigma,n)
""" create normalized histrogram of the data """
bins = np.linspace(0,20,21)
H, bins = np.histogram(data,bins,density=True)
I would like to interpret this histogram as a probability density function (with e.g. 2 free parameters) so that I can use it to produce random numbers, AND I would also like to use that function to fit another histogram.
Thanks for your help
You can use a cumulative density function to generate random numbers from an arbitrary distribution, as described here.
Using a histogram to produce a smooth cumulative density function is not entirely trivial; you can use interpolation, for example scipy.interpolate.interp1d(), for values in between the centers of your bins, and that will work fine for a histogram with a reasonably large number of bins and items. However, you have to decide on the form of the tails of the probability function, i.e. for values less than the smallest bin or greater than the largest bin. You could give your distribution Gaussian tails (based on, for example, fitting a Gaussian to your histogram), or any other form of tail appropriate to your problem, or simply truncate the distribution.
Example:
import numpy
import scipy.interpolate
import random
import matplotlib.pyplot as pyplot
# create some normally distributed values and make a histogram
a = numpy.random.normal(size=10000)
counts, bins = numpy.histogram(a, bins=100, density=True)
cum_counts = numpy.cumsum(counts)
bin_widths = (bins[1:] - bins[:-1])
# generate more values with same distribution
x = cum_counts*bin_widths
y = bins[1:]
inverse_density_function = scipy.interpolate.interp1d(x, y)
b = numpy.zeros(10000)
for i in range(len(b)):
    u = random.uniform(x[0], x[-1])
    b[i] = inverse_density_function(u)
# plot both
pyplot.hist(a, 100)
pyplot.hist(b, 100)
pyplot.show()
This doesn't handle tails, and it could handle bin edges better, but it would get you started on using a histogram to generate more values with the same distribution.
P.S. You could also try to fit a specific known distribution described by a few values (which I think is what you mentioned in the question), but the above non-parametric approach is more general-purpose.
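For the parametric route, a minimal sketch using scipy.stats, assuming a normal distribution with 2 free parameters as in the question:

import numpy as np
from scipy import stats

datapoints = np.random.normal(10, 5, 1000)      # same toy data as the question

mu_fit, sigma_fit = stats.norm.fit(datapoints)  # maximum-likelihood fit
new_samples = stats.norm(mu_fit, sigma_fit).rvs(size=1000)
print(mu_fit, sigma_fit)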