I have a dataframe with 120 000 rows and I want to plot two of its features over the whole time span of 120 000 seconds. They are the Y of my regression and the prediction of my Random Forest.
I am doing it with this code:
laenge_xcheck = list(range(len(X_gute_vorhersage)))  # artificial time axis: 0, 1, 2, ...
X_gute_vorhersage['Zeit_artificial'] = laenge_xcheck
import matplotlib.pyplot as plt
plt.figure(figsize=(910,50))  # size in inches; the pixel size is inches * dpi
plt.plot(X_gute_vorhersage.Zeit_artificial, X_gute_vorhersage.Y_test, 'r')  # actual Y values in red
plt.plot(X_gute_vorhersage.Zeit_artificial, X_gute_vorhersage.predicitons_rf, 'b')  # Random Forest predictions in blue
#plt.savefig('test1.png')
plt.show()
My goal is to get a picture with a resolution high enough to zoom into, higher than 910,50, because not much is visible at that size. I would like a really long x axis and a short y axis, and then be able to zoom in. I already get it in this shape:
Is there any way to get this at a resolution where I can zoom in and still see the details? It doesn't matter if the picture ends up at 200 MB. Is there any way to go beyond the size of 910,50 without getting this error message:
ValueError: Image size of 72000x3600 pixels is too large. It must be less than 2^16 in each direction.
Thank you,
R
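The 2^16 limit in the error message applies to the size in pixels, which is the figure size in inches multiplied by the DPI. Below is a minimal sketch of that arithmetic. It assumes a DPI of 72, inferred from the 72000x3600 in the error message (which corresponds to a width of 1000 inches), and the helper names are made up for illustration.

```python
MAX_PIXELS = 2**16  # limit per direction, as quoted in the error message


def pixel_size(figsize_inches, dpi):
    """Return (width, height) in pixels for a figure size given in inches."""
    width, height = figsize_inches
    return int(round(width * dpi)), int(round(height * dpi))


def fits(figsize_inches, dpi):
    """True if both pixel dimensions stay below the 2^16 limit."""
    return all(p < MAX_PIXELS for p in pixel_size(figsize_inches, dpi))


def max_inches(dpi):
    """Largest whole number of inches that still fits at the given DPI."""
    return int((MAX_PIXELS - 1) // dpi)


print(pixel_size((1000, 50), 72))  # the size reported in the error message
print(fits((1000, 50), 72))        # too wide at 72 DPI
print(fits((910, 50), 72))         # the size that still works
print(max_inches(72))
```

Since it is the pixel count that is limited, a single raster image cannot be made arbitrarily wide. Options that may be worth trying are saving to a vector format such as SVG or PDF with plt.savefig, or plotting slices of the 120 000 rows as separate images.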
I'm new to Python and signal processing, and I'm having a problem with FFT.
I'm supposed to analyze a set of data and find the modulation frequencies from it. I wrote a basic FFT script to do this, and the output looked kinda weird. It does show the peaks like a normal FFT graph. However, for each line it has a horizontal line that connects the two ends, instead of the ends spreading out.
I would like to ask what might be the problem here.
This is my script:
from numpy.fft import fft, fftfreq
import numpy as np
import matplotlib.pyplot as plt
Nsample = len(data)
n = 96  # FFT length after zero-padding
window = np.hanning(Nsample // 2)  # Hann window; the length must be an integer (one window per condition)
data1 = data[(data['condition']=='a')]
data2 = data[(data['condition']=='b')]
#Apply Hann window then do FFT
data1_Hann = data1['val']*window
data1_FFT = fft(data1_Hann, n)
freq1 = fftfreq(n, d=0.026)
data2_Hann = data2['val']*window
data2_FFT = fft(data2_Hann, n)
freq2 = fftfreq(n, d=0.026)
plt.plot(freq1, np.abs(data1_FFT), freq2, np.abs(data2_FFT))
plt.xlabel("Frequency (Hz)")
plt.ylabel("Amplitude")
data is a DataFrame containing the values of 2 different conditions shuffled together. That's why I separate them, and why the Hann window is built for half the initial sample number. For each condition there are only 12 values, so I zero-pad to make the peaks appear clearly and the graph look smoother (I did try a smaller number for the zero padding, but the line still remains).
Answer: The graph looks like that because of the order of the FFT output: it starts at 0 Hz, runs up through the positive frequencies, and then continues with the negative frequencies, so the plotted line jumps back from one end of the axis to the other (more details here: https://numpy.org/doc/stable/reference/generated/numpy.fft.fftfreq.html)
I fixed this by using the numpy.fft.fftshift() function (more details here: https://numpy.org/doc/stable/reference/generated/numpy.fft.fftshift.html).
The solution was suggested by #gavinb and #yanziselman.
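A minimal sketch of that fix, keeping n and d from the question. The 12-sample signal is a made-up stand-in for data1['val'], and the 5 Hz tone is arbitrary.

```python
import numpy as np

n = 96      # FFT length after zero-padding, as in the question
d = 0.026   # sample spacing in seconds, as in the question

# Hypothetical 12-sample signal standing in for data1['val'] (5 Hz tone, Hann-windowed)
samples = np.hanning(12) * np.sin(2 * np.pi * 5.0 * d * np.arange(12))

spectrum = np.fft.fft(samples, n)
freq = np.fft.fftfreq(n, d=d)  # order: 0, positive frequencies, then negative frequencies

# fftshift reorders both arrays so the frequencies increase monotonically
freq_sorted = np.fft.fftshift(freq)
spectrum_sorted = np.fft.fftshift(spectrum)

# plt.plot(freq_sorted, np.abs(spectrum_sorted)) then draws no connecting line
print(freq[:3], freq[-3:])
print(freq_sorted[:3], freq_sorted[-3:])
```

Plotting the shifted arrays removes the horizontal line, because consecutive points are now neighbours on the frequency axis.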
I've been getting really confused with FFT in Python. What I'm trying to do is plot the FFT of note number 61 (the C# just above middle C). Here is the code that I tried to use, which I found here, using this wav file. After running that code, I got this output after zooming in a bit.
I think this is completely wrong: after looking online, I found that note number 61 has a frequency of 277.2 Hz, so there should be a peak around that value, right? But to me the values seem completely off. This is the code that I'm running right now to get the plot.
import matplotlib.pyplot as plt
from scipy.fftpack import fft
from scipy.io import wavfile # get the api
fs, data = wavfile.read("MAPS_ISOL_NO_P_S0_M61_AkPnBsdf.wav") # load the data
a = data.T[0] # this is a two channel soundtrack, I get the first track
b=[(ele/2**8.)*2-1 for ele in a] # this is 8-bit track, b is now normalized on [-1,1)
c = fft(b) # create a list of complex number
d = len(c) // 2 # you only need half of the fft list; integer division so d can be used in a slice
plt.plot(abs(c[:(d-1)]),'r')
plt.xlabel('Frequency')
plt.ylabel('Magnitude')
plt.show()
I'm also not sure whether I have the x and y axes labeled correctly, as I believe each entry in the array is a bin of width Fs / N, where Fs is the sample rate and N is the size of the FFT. I'm just really confused and overwhelmed after looking online for weeks about all this. Thanks for any help!
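The Fs / N bin width can be turned into a frequency axis in Hz. Here is a minimal sketch using a synthetic 277.2 Hz sine instead of the wav file. The sample rate of 44100 Hz and the one-second length are assumptions chosen for illustration, and numpy.fft.rfft is used here in place of scipy.fftpack.fft.

```python
import numpy as np

fs = 44100                 # assumed sample rate in Hz
N = fs                     # one second of samples, so each bin is fs / N = 1 Hz wide
t = np.arange(N) / fs
tone = np.sin(2 * np.pi * 277.2 * t)  # synthetic stand-in for the piano note

magnitude = np.abs(np.fft.rfft(tone))   # non-negative frequencies only
freqs = np.fft.rfftfreq(N, d=1.0 / fs)  # bin k sits at k * fs / N Hz

peak_hz = freqs[np.argmax(magnitude)]
print(peak_hz)  # lands within one bin width of 277.2 Hz
```

With the real file, fs would come from wavfile.read and N from the length of the channel. Plotting freqs against magnitude, rather than the magnitude alone, puts the x axis in Hz.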
I have a 2D numpy array containing X and Y data. The X axis contains time information with a resolution of nanoseconds. My problem occurs because I need to compare a simulated signal with a real signal. The problem with the simulated signal is that the simulator, for optimization purposes, uses different step sizes, as shown in fig. 1.
On the other hand, my real data was acquired by an oscilloscope, and its data has exactly 1 ns of difference between each recorded point. Because of this, I need the same scale on the X axis to make a correct comparison. How can I get the extra points so that my data has a constant step between the points?
EDIT 1
I need these new points to fill my array so that the simulated data has a constant step, as shown in fig. 2.
The green points show an example of data extracted from extrapolated data.
A common way to do this is to simply duplicate some points (adding a point with the same average value doesn't change most statistical values much).
The drawback is that you have to change the dataset every time you change the scale. That takes a lot of time on every scale change, but it is very easy. If you don't have to change the scale too often, you can try it.
This problem was solved using the scipy interpolate module, e.g.:
interpolate.py
import matplotlib.pyplot as plt
from scipy import interpolate as inter
import numpy as np
Fs = 0.1
f = 0.01
sample = 10
x = np.arange(sample)
y = np.sin(2 * np.pi * f * x / Fs)
inte = inter.interp1d(x,y)
new_x = np.arange(0,9,0.1)
new_y = inte(new_x)
fig = plt.figure()
ax1 = fig.add_subplot(111)
ax1.scatter(new_x,new_y,s=5,marker='.')
ax1.scatter(x,y,s=50,marker='*')
plt.show()
This code gives the following result.
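Applied to the original problem, the same idea can resample non-uniform simulator output onto a 1 ns grid. This is a minimal sketch with made-up time stamps and values. np.interp is used here instead of interp1d; both interpolate linearly by default.

```python
import numpy as np

# Made-up simulator output with irregular time steps (time in ns)
t_sim = np.array([0.0, 0.5, 2.0, 2.25, 5.0, 9.0])
v_sim = np.sin(2 * np.pi * 0.1 * t_sim)

# Uniform grid with a 1 ns step covering the simulated time span
step = 1.0
n_points = int(np.floor((t_sim[-1] - t_sim[0]) / step)) + 1
t_uniform = t_sim[0] + step * np.arange(n_points)

# Linear interpolation of the simulated values onto the uniform grid
v_uniform = np.interp(t_uniform, t_sim, v_sim)

print(t_uniform)
print(v_uniform)
```

After this step the simulated array has the same 1 ns spacing as the oscilloscope data, so the two can be compared point by point.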
I am trying to plot a curve from molecular dynamics potential energy data stored in a numpy array. As you can see in the attached figure, a large number appears at the top left, and it is related to the labels on the y-axis.
Even if I rescale the data, a number still appears there. I do not want it. Can you please suggest how to sort out this issue? Thank you very much.
This is likely happening because your data is a small variation on top of a large offset. The number in the corner is that offset: the tick labels show values relative to it, so the actual value is the tick label plus the signed offset, and a leading - sign means the offset is negative. You can remove it by plotting with the mean subtracted. Here's an example:
import numpy as np
import matplotlib.pyplot as plt
y = -1.5*1e7 + np.random.random(100)
plt.plot(y)
plt.ylabel("units")
gives the form you don't like:
but subtracting the mean (or some other number close to that, like min or max, etc) will remove the large offset:
plt.figure()
plt.plot(y - np.mean(y))
plt.ylabel("offset units")
plt.show()
You can remove the offset by using:
plt.ticklabel_format(useOffset=False)
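A minimal sketch of this option on the same kind of data, reading back the offset text that matplotlib would draw. The Agg backend is only there so the sketch runs without a display. The style="plain" argument is my addition, on the assumption that a scientific-notation multiplier such as 1e7 may otherwise appear in place of the offset for values this large.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
y = -1.5e7 + rng.random(100)  # small variation on top of a large offset


def offset_label(plain):
    """Plot y and return the text matplotlib puts at the top of the y axis."""
    fig, ax = plt.subplots()
    ax.plot(y)
    if plain:
        # Disable both the additive offset and the scientific multiplier
        ax.ticklabel_format(axis="y", useOffset=False, style="plain")
    fig.canvas.draw()  # the offset text is only computed when the figure is drawn
    text = ax.yaxis.get_offset_text().get_text()
    plt.close(fig)
    return text


offset_default = offset_label(plain=False)
offset_plain = offset_label(plain=True)
print(repr(offset_default), repr(offset_plain))
```

The trade-off is that the tick labels then carry the full values, so they become long; subtracting the mean keeps them short instead.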
It seems your data is displayed in exponential form like: 1e+10, 2e+10, etc.
This question here might help:
How to prevent numbers being changed to exponential form in Python matplotlib figure
I am new to Python.
I intend to apply a Fourier transform to an array of discrete points, (time, acceleration), and plot the result.
I copied and pasted sample FFT code and modified it accordingly.
Please see the code:
import numpy as np
import matplotlib.pyplot as plt
# Load the .txt file
myData = np.loadtxt('twenty_z_up.txt')
# Extract the time column
time = np.copy(myData[:,0])
# Extract the z acceleration column
zAcc = np.copy(myData[:,3])
t = np.arange(10080)
sp = np.fft.fft(zAcc)
freq = np.fft.fftfreq(t.shape[-1])
plt.plot(freq, sp.real)
myData is a rectangular matrix with 10080 rows and 10 columns.
Thus, zAcc is the column with index 3 extracted from the matrix.
In the plot drawn by Spyder, most of the harmonics are concentrated around 0.
They are all extremely small.
But my data are actually the accelerations of a phone carried by a walking person (including gravity). So I expect the most significant harmonic to be around 2 Hz.
Why does the graph make no sense?
Thanks in advance!
==============UPDATES: My Graphs======================
The first time domain one:
x-axis is in millisecond.
y-axis is in m/s^2, due to earth gravity, it has a DC offset of ~10.
You do get two spikes at (approximately) 2 Hz. Your sampling period is around 28 ms (as best as I can infer from your first plot), which gives +/-2 Hz a normalized frequency of +/-0.056 (2 Hz x 0.028 s), and that is about where your spikes are. fft.fftfreq by default returns the normalized frequency, in cycles per sample (the frequency in Hz multiplied by the sampling period). You can set the d argument to the sampling period in seconds, and you'll get a vector containing the actual frequencies in Hz.
Your huge spike in the middle is obviously the DC offset (which you can trivially remove by subtracting the mean).
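A minimal sketch of both points on synthetic data: a 2 Hz sine on top of a DC offset of 10, with a little noise. The 50 Hz sampling rate is an assumption for illustration only and should be replaced by the real sampling period of the recording.

```python
import numpy as np

timestep = 0.02   # assumed sampling period in seconds (50 Hz); use the real value
N = 10080         # number of samples, as in the question
t = np.arange(N) * timestep

rng = np.random.default_rng(0)
# Synthetic acceleration: gravity-like offset of 10, a 2 Hz component, and noise
acc = 10.0 + np.sin(2 * np.pi * 2.0 * t) + 0.2 * rng.random(N)

freq = np.fft.rfftfreq(N, d=timestep)             # frequency axis in Hz
raw = np.abs(np.fft.rfft(acc))                    # dominated by the DC bin
centered = np.abs(np.fft.rfft(acc - acc.mean()))  # DC offset removed

peak_raw = freq[np.argmax(raw)]
peak_centered = freq[np.argmax(centered)]
print(peak_raw, peak_centered)
```

Without the mean subtraction the largest bin sits at 0 Hz. With it, the largest bin falls next to 2 Hz, limited by the bin width of 1 / (N * timestep).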
As others said, we need to see the data, so please post it somewhere. Just to check, first try fixing the timestep size in fftfreq, then plot this synthetic signal, and then plot your own signal to see how they compare:
import numpy as np
import matplotlib.pyplot as plt
timestep = 1. / 50.  # assume sampling at 50 Hz; change this accordingly
N = 10080  # the number of samples
T = N * timestep
t = np.linspace(0, T, N, endpoint=False)  # needed only to generate xAcc_synthetic; spacing is exactly timestep
f_signal = 2.  # peak frequency at 2 Hz
# generate a synthetic signal at 2 Hz and add some noise to it
xAcc_synthetic = np.sin((2 * np.pi) * f_signal * t) + np.random.rand(N) * 0.2
sp_synthetic = np.fft.fft(xAcc_synthetic)
freq = np.fft.fftfreq(t.size, d=timestep)
print(np.isclose(max(abs(freq)), (1 / timestep) / 2.))  # simple check of the highest frequency
plt.plot(freq, abs(sp_synthetic))
plt.xlabel('Hz')
plt.show()
Now, at the x axis equal to 2 you actually have a physical frequency of 2Hz, and you may spot the more pronounced peak you are looking for. Moreover, you may want to have a look also at yAcc and zAcc.