Interpret numpy.fft.fft2 output - python

My goal is to obtain a plot of the spatial frequencies of an image, similar to taking a Fourier transform of it. I don't care about the position in the image of features with a given frequency f; I'd just like a graphic that tells me how much of every frequency is present (the amplitude for a frequency band could be represented by the sum of contrasts at that frequency).
I am trying to do this via the numpy.fft.fft2 function.
Here is a link to a minimal example portraying my use case.
As it turns out I only get distinctly larger values for frequencies[:30,:30], and of these the absolute highest value is frequencies[0,0]. How can I interpret this?
What exactly does the amplitude of each value stand for?
What does it mean that my highest value is in frequencies[0,0]? What is a 0 Hz frequency?
Can I bin the values somehow so that my frequency spectrum is orientation agnostic?

freq has a few very large values, and lots of small values. You can see that by plotting
plt.hist(freq.ravel(), bins=100)
(See below.) So, when you use
ax1.imshow(freq, interpolation="none")
Matplotlib uses freq.min() as the lowest value in the color range (which is by default colored blue), and freq.max() as the highest value in the color range (which is by default colored red). Since almost all the values in freq are near the blue end, the plot as a whole looks blue.
You can get a more informative plot by rescaling the values in freq so that the low values are more widely distributed on the color range.
For example, you can get a better distribution of values by taking the log of freq. (You probably don't want to throw away the highest values, since they correspond to frequencies with the highest power.)
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image  # modern Pillow import; the original used the old `import Image`
file_path = "data"
image = np.asarray(Image.open(file_path).convert('L'))  # convert to grayscale
freq = np.fft.fft2(image)
freq = np.abs(freq)  # keep only the magnitude of each complex coefficient
fig, ax = plt.subplots(nrows=2, ncols=2, figsize=(14, 6))
ax[0,0].hist(freq.ravel(), bins=100)
ax[0,0].set_title('hist(freq)')
ax[0,1].hist(np.log(freq).ravel(), bins=100)
ax[0,1].set_title('hist(log(freq))')
ax[1,0].imshow(np.log(freq), interpolation="none")
ax[1,0].set_title('log(freq)')
ax[1,1].imshow(image, interpolation="none")
plt.show()
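As a side note, fft2 puts the zero-frequency term in the corner of the array; np.fft.fftshift moves it to the center, which usually makes the 2-D spectrum easier to read. For example, the imshow line above could become:
ax[1,0].imshow(np.log(np.fft.fftshift(freq)), interpolation="none")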
From the docs:
The output, analogously to fft, contains the term for zero frequency
in the low-order corner of the transformed axes,
Thus, freq[0,0] is the "zero frequency" term. In other words, it is the constant term in the discrete Fourier Transform.
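To make the spectrum orientation agnostic (your third question), a common approach is to bin the 2-D spectrum radially, averaging all amplitudes at the same distance from the zero-frequency term. A minimal sketch, assuming freq is the magnitude array computed above (radial_profile is a made-up helper name):
import numpy as np
def radial_profile(freq):
    f = np.fft.fftshift(freq)  # move the zero-frequency term to the center
    cy, cx = np.array(f.shape) // 2
    y, x = np.indices(f.shape)
    r = np.sqrt((x - cx)**2 + (y - cy)**2).astype(int)  # integer radius of each pixel
    total = np.bincount(r.ravel(), weights=f.ravel())   # sum of amplitudes per radius bin
    counts = np.bincount(r.ravel())                     # number of pixels per radius bin
    return total / np.maximum(counts, 1)                # mean amplitude per radius
profile = radial_profile(freq)
# plt.plot(profile) then shows amplitude as a function of radial frequency,
# independent of orientation.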

Related

Scipy.stats.gaussian_kde gives a pdf that is outside the range (0,1) [duplicate]

Sometimes when I create a histogram, using say seaborn's distplot function with norm_hist = True, the y-axis stays below 1 as expected for a PDF. Other times it takes on values greater than one.
For example if I run
import numpy as np
import seaborn as sns
sns.set()
x = np.random.randn(10000)
ax = sns.distplot(x)
Then the y-axis on the histogram goes from 0.0 to 0.4 as expected, but if the data is not normal the y-axis can be as large as 30 even if norm_hist = True.
What am I missing about the normalization arguments for histogram functions, e.g. norm_hist for sns.distplot? Even if I normalize the data myself by creating a new variable thus:
new_var = data/sum(data)
so that the data sums to 1, the y-axis will still show values far larger than 1 (like 30 for example) whether the norm_hist argument is True or not.
What interpretation can I give when the y-axis has such a large range?
I think what is happening is that my data is concentrated closely around zero, so for the data to have an area equal to 1 (under the KDE, for example) the height of the histogram has to be larger than 1... but since probabilities can't be above 1, what does the result mean?
Also, how can I get these functions to show probability on the y-axis?
The rule isn't that all the bar heights should sum to one. The rule is that the areas of all the bars should sum to one. When the bars are very narrow, their heights can sum to quite a large number even though their areas sum to one. The height of a bar times its width is the probability that a value falls in that range. For the height to equal the probability, you need bars of width one.
Here is an example to illustrate what's going on.
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns
fig, axs = plt.subplots(ncols=2, figsize=(14, 3))
a = np.random.normal(0, 0.01, 100000)
sns.distplot(a, bins=np.arange(-0.04, 0.04, 0.001), ax=axs[0])
axs[0].set_title('Measuring in meters')
axs[0].containers[0][40].set_color('r')
a *= 1000
sns.distplot(a, bins=np.arange(-40, 40, 1), ax=axs[1])
axs[1].set_title('Measuring in millimeters')
axs[1].containers[0][40].set_color('r')
plt.show()
The plot on the left uses bins 0.001 m wide. The highest bin (in red) is about 40 high. The probability that a value falls into that bin is 40*0.001 = 0.04.
The plot on the right uses exactly the same data, but measured in millimeters, so the bins are 1 mm wide. The highest bin is about 0.04 high. The probability that a value falls into that bin is also 0.04, because the bin width is 1.
PS: As an example of a distribution for which the probability density function has zones larger than 1, see the Pareto distribution with α = 3.
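As a quick numeric check of the area rule, numpy can produce a density-normalized histogram directly, and one can verify that the bar areas, not the heights, sum to one. A minimal sketch:
import numpy as np
a = np.random.normal(0, 0.01, 100000)
heights, edges = np.histogram(a, bins=80, density=True)  # density=True normalizes to unit area
widths = np.diff(edges)
print(heights.max())             # much larger than 1 for this narrow distribution
print((heights * widths).sum())  # 1.0: the areas sum to one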

Cumulative histogram for 2D data in Python

My data consists of a 2-D array of masses and distances. I want to produce a plot where the x-axis is distance and the y-axis is the number of data elements with distance <= x (i.e. a cumulative histogram). What is the most efficient way to do this with Python?
PS: the masses are irrelevant since I already have filtered by mass, so all I am trying to produce is a plot using the distance data.
Example plot below:
You can combine numpy.cumsum() and plt.step():
import matplotlib.pyplot as plt
import numpy as np
N = 15
distances = np.random.uniform(1, 4, N).cumsum()  # sorted example distances
counts = np.random.uniform(0.5, 3, N)
plt.step(distances, counts.cumsum())
plt.show()
Alternatively, plt.bar can be used to draw a histogram, with the bar widths defined by the differences between successive distances. An extra distance needs to be appended to give the last bar a width.
plt.bar(distances, counts.cumsum(), width=np.diff(distances, append=distances[-1]+1), align='edge')
plt.autoscale(enable=True, axis='x', tight=True) # make x-axis tight
Instead of appending a value, a zero could be prepended, depending on the exact interpretation of the data.
plt.bar(distances, counts.cumsum(), width=-np.diff(distances, prepend=0), align='edge')
This is what I figured I can do given a 1D array of data:
plt.figure()
counts = np.ones(len(data))  # data is the 1-D array of distances
plt.step(np.sort(data), counts.cumsum())
plt.show()
This apparently also works with duplicate elements, since the cumulative counts still increase by one for each repeated x.
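For completeness, matplotlib can also accumulate the histogram itself via the cumulative flag of plt.hist; a minimal sketch, assuming data is the same 1-D array of distances:
import matplotlib.pyplot as plt
plt.hist(data, bins=50, cumulative=True, histtype='step')  # cumulative counts per bin
plt.xlabel('distance')
plt.ylabel('number of elements with distance <= x')
plt.show()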

Python plt.contour colorbar

I am trying to do a plot of a seismic wave using plt.contour.
I have 3 arrays:
time (x-axis)
frequency (y-axis)
amplitude (z-axis)
This is my result so far:
The problem is that I want to change the scaling of the colorbar: I'd like a smooth gradation instead of this white color where the amplitude is low. But I haven't been able to do so, even after spending a lot of time browsing the docs.
I read that plt.pcolormesh is not appropriate here (it only works here because I am in a special case), but this is what I want to get regarding the colours and colorbar:
This is the code I wrote:
T = len(time[0])*(time[0][1] - time[0][0]) # multiply ampFFT with T to offset
Z = abs(ampFFT)*(T) # abbreviation
# freq = frequency, ampFFT = Fast Fourier Transform of the amplitude of the wave
# freq, amFFT and time have same dimensions: 40 x 1418 (40 steps of time discretization x steps to have the total time. 2D because it is easier to use)
maxFreq = abs(freq).max() # maximum frequency for plot boundaries
maxAmpFFT = abs(Z).max()/2 # maximum ampFFT for the colorbar boundary, halved to scale better with the colors
minAmpFFT = abs(Z).min()
plt.figure(1)
plt.contour(time, freq, Z, vmin=minAmpFFT, vmax=maxAmpFFT)
plt.colorbar()
plt.ylim(0, maxFreq) # start at 0 to hide the negative frequencies, which are useless here
plt.title("Amplitude intensity with respect to time and frequency")
plt.xlabel('time (in seconds)')
plt.ylabel('frequency (in Hz)')
plt.show()
Thank you for your attention!
NB: in case you were wondering about plt.pcolormesh: the plot is completely messed up when I increase the time discretization (here I split the time into 40 steps, but with 1000 steps the plot is no longer correct, and I want to be able to split the time into smaller pieces).
EDIT: When I use plt.contourf instead of plt.contour, I get this plot:
Which is not really convincing either. I understand why the yellow colour takes so much space (it is because I set a low vmax), but I don't understand why there is still white colour in my plot.
EDIT 2: My teacher plotted my data, and I have the correct data. The only remaining problem is the white background in my plot (and the deep blue on the left and right borders for no apparent reason when I use plt.contourf). Despite those problems, the highest amplitude is located around 0.5 Hz, which agrees with the work of my teacher.
He used gnuplot, but since I don't know gnuplot, I prefer to use python.
Solution/Workaround I found
Here is what I did to display my data like contourf does, but without the display problems:
Explanation: for the surface, I took abs(freq) instead of just freq because I have negative frequencies. When computing the frequencies of an FFT, the spectrum repeats itself a second time, and there are two ways of laying out the frequency axis:
- all frequencies are positive; the array spans 2 x the Nyquist frequency (so if you keep only the first half of the array, you have the whole wave and it doesn't repeat itself).
- the frequencies start negative and go to positive; the array also spans 2 x the Nyquist frequency (so if you remove the negative values, you have the whole wave and it doesn't repeat itself).
NumPy's fft.fftfreq uses the second option. plot_surface doesn't work well with removing part of an array's data (for me it was still displayed), so I made the frequency values absolute and the problem disappeared.
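For instance, the second layout can be seen directly:
import numpy as np
print(np.fft.fftfreq(8))
# [ 0.     0.125  0.25   0.375 -0.5   -0.375 -0.25  -0.125]
Here is the full workaround code: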
from matplotlib import cm
import matplotlib.pyplot as plt
fig = plt.figure(1, figsize=(18,15)) # figsize: increase plot size
ax = fig.add_subplot(projection='3d') # fig.gca(projection='3d') is deprecated in recent matplotlib
surf = ax.plot_surface(time, abs(freq), Z, rstride=1, cstride=1, cmap=cm.magma, linewidth=0, antialiased=False, vmin=minAmpFFT, vmax=maxAmpFFT)
ax.set_zlim(0, maxAmpFFT)
ax.set_ylim(0, maxFreq)
ax.view_init(azim=90, elev=90) # change to a top view, with the axes in the right direction
plt.title("Amplitude intensity (m/Hz^0.5) with respect to time and frequency")
plt.xlabel('x: time (in seconds)')
plt.ylabel('y: frequency (in Hz)')
# ax.set_yscale('log') # should be in log, but does not work with plot_surface
plt.gca().invert_xaxis() # invert x axis !! MUST BE AFTER X,Y,Z LIM
plt.gca().invert_yaxis() # invert y axis !! MUST BE AFTER X,Y,Z LIM
plt.colorbar(surf)
fig.tight_layout()
plt.show()
This is the plot I got:
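As an aside, the white background in the 2-D plots can often be avoided without switching to 3-D: plt.contour draws only isolines, so the space between them stays white, whereas plt.contourf fills the whole domain as long as the levels cover the full data range. A minimal sketch, assuming the same time, freq, Z, minAmpFFT, maxAmpFFT and maxFreq as above:
import numpy as np
import matplotlib.pyplot as plt
levels = np.linspace(minAmpFFT, maxAmpFFT, 50)  # explicit levels spanning the data range
# extend='max' fills values above the top level instead of leaving them blank
plt.contourf(time, abs(freq), Z, levels=levels, cmap='magma', extend='max')
plt.colorbar()
plt.ylim(0, maxFreq)
plt.show()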

Plot 2 histograms with different length of data points in one graph using matplotlib

I have two sets of data, one containing around 11 million data points and the other around 5000. I would like to plot them both on one histogram, but because of the difference in size I need to normalise the frequencies so I can plot them on the same figure. Below I have simulated what I have done with my data. I have used normed=True.
from numpy.random import randn
import matplotlib.pyplot as plt
import random
datalist1 = []
for x in range(1, 50000):
    datalist1.append(random.uniform(1, 2))
datalist2 = randn(5000000)
fig= plt.figure(1)
plt.hist(datalist1, bins=20, color='b', alpha=0.3, label='theoretical', histtype='stepfilled', normed=True)
plt.hist(datalist2, bins=20, alpha=0.5, color='g', label='experimental', histtype='stepfilled', normed=True)
plt.xlabel("Value")
plt.ylabel("Normalised Frequency")
plt.legend()
plt.show()
Can you please tell me if this is a good way to get around this issue? I would like the tallest bar of each histogram to reach a height of 1 (or 100%).
The normed=True setting (renamed density=True in recent Matplotlib versions) normalizes the histogram to an area of 1. That gives the histogram an interpretation as an estimate of a probability density function.
In short, it actually makes sense not to normalize on the peak but on the area.
But if you really want to normalize by height you can modify the polygon data of the histogram:
h = plt.hist(datalist1, bins=20, color='b', alpha=0.3, label='theoretical', histtype='stepfilled', normed=True)
p = h[2][0] # with histtype='stepfilled', the histogram is drawn as a single Polygon patch
p.xy[:, 1] /= p.xy[:, 1].max() # rescale the vertex heights so the tallest bar is 1
h = plt.hist(datalist2, bins=20, alpha=0.5, color='g', label='experimental', histtype='stepfilled', normed=True)
p = h[2][0]
p.xy[:, 1] /= p.xy[:, 1].max()
This solution feels a bit hackish, but at least it's quick and dirty :)
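A less hackish alternative, sketched below under the same variable names, is to compute the counts with np.histogram, divide by their maximum, and draw the steps yourself (plt.stairs requires Matplotlib >= 3.4):
import numpy as np
import matplotlib.pyplot as plt
for data, color, label in [(datalist1, 'b', 'theoretical'), (datalist2, 'g', 'experimental')]:
    counts, edges = np.histogram(data, bins=20)
    heights = counts / counts.max()  # tallest bar becomes exactly 1
    plt.stairs(heights, edges, fill=True, alpha=0.4, color=color, label=label)
plt.ylabel('frequency / max frequency')
plt.legend()
plt.show()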

Different luminance of Python imshow with transposed data

This might be a trivial question.
I store a series of spectra, each with 1025 frequency bins, in a list, and I want to plot it with imshow. The data is a list with 345 entries (the number of time frames), each of which has 1025 dimensions (the frequency bins). The conventional way to display a spectrogram is with time frames on the x-axis and frequency bins on the y-axis.
My attempts are as follows:
from matplotlib.pyplot import imshow, show
import numpy as np
imshow(X, aspect='auto'); show()
imshow(np.array(X), aspect='auto'); show()  # seems to be the same as the first one
# The correct display would have time on the x-axis and frequency bins on the
# y-axis, ordered from low to high:
imshow(np.array(X).T, aspect='auto', origin='lower'); show()
However, the third plot seems to have dimmer luminance, which could be a normalization issue. Why does imshow behave differently with transposed data?
EDIT:
Trying to specify the figure size up front:
plt.figure(figsize=(7,5))
imshow(np.array(X).T, aspect='auto', origin='lower')
Though the figure size alters the luminance of the image, the relative magnitude of the components along the y-axis still doesn't look the same to me compared with the former images, i.e., the first and second ones. How exactly does imshow adjust the luminance for transposed data or a different orientation?
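One way to rule out color normalization as the cause is to pin vmin/vmax and the interpolation explicitly: by default imshow scales the colormap to the data's minimum and maximum, and it also resamples the array to the figure's pixel grid, which can change apparent luminance between orientations. A minimal sketch, assuming X is the 345-frame list described above:
import matplotlib.pyplot as plt
import numpy as np
A = np.array(X)  # shape (345, 1025): time frames x frequency bins
vmin, vmax = A.min(), A.max()  # identical color scaling for both orientations
fig, (ax0, ax1) = plt.subplots(ncols=2, figsize=(12, 4))
ax0.imshow(A, aspect='auto', vmin=vmin, vmax=vmax, interpolation='none')
ax1.imshow(A.T, aspect='auto', origin='lower', vmin=vmin, vmax=vmax, interpolation='none')
plt.show()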