I have to calculate the Fourier transform of acceleration data that I've already loaded in code. I have to do it the old-fashioned way (I mean, without the numpy np.fft.fft command, even though I don't fully understand that either). So, this is what I have for the integration:
ri = 1j  # first time defining a complex number in python
Fmax = 50  # Hz, the maximum frequency to consider
df = 0.01  # frequency differential
nf = int(Fmax / df)  # number of sample points for frequency
# and I already have UD_Acc defined as a 1D numpy array, then the "for loop":
Int_UD = []
for i in range(UD_Acc.size):
    w = []
    for j in range(nf):
        w.append(2 * np.pi * df * (j - 1))
    Int_UD.append(Int_UD[i - 1] + UD_Acc[i] * np.exp(ri * w * (i - 1) * dt1))
First of all, in the for loop the w variable triggers this warning:
Expected type 'complex', got 'List[Union[Union[float, int], Any]]' instead
And then, even when I run it, it says that the list index is out of range.
I know it may seem a little rudimentary to integrate like this, or to find a Fourier transform without using scipy or np.fft, but it's for class and I'm trying to understand the basics, so thanks in advance.
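For what it's worth, here is a minimal sketch of how a correct naive loop could be organized (my illustration, not the poster's code: it accumulates one complex sum per frequency instead of indexing Int_UD before it is filled, and dt1 is an assumed sampling interval):

import numpy as np

# Stand-ins for illustration only: dt1 is assumed to be the sampling
# interval and UD_Acc the acceleration record from the question.
dt1 = 0.01
UD_Acc = np.random.randn(1000)

Fmax = 50            # Hz, maximum frequency to consider
df = 0.01            # frequency step
nf = int(Fmax / df)  # number of frequency samples

Int_UD = np.zeros(nf, dtype=complex)  # one accumulator per frequency
for j in range(nf):
    w = 2 * np.pi * df * j  # angular frequency of bin j
    # Riemann-sum approximation of the Fourier integral at frequency w
    for i in range(UD_Acc.size):
        Int_UD[j] += UD_Acc[i] * np.exp(-1j * w * i * dt1) * dt1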
I'm converting a MATLAB script to Python and I am getting different results on the order of 10**-4.
In MATLAB:
f_mean=f_mean+nanmean(f);
f = f - nanmean(f);
f_t = gradient(f);
f_tt = gradient(f_t);
if n_loop==1
    theta = atan2( sum(f.*f_tt), sum(f.^2) );
end
theta = -2.2011167e+03
In Python:
f_mean = f_mean + np.nanmean(vel)
vel = vel - np.nanmean(vel)
firstDerivative = np.gradient(vel)
secondDerivative = np.gradient(firstDerivative)
if numberLoop == 1:
    theta = np.arctan2(np.sum(vel * secondDerivative),
                       np.sum([vel**2]))
Although firstDerivative and secondDerivative give the same results in Python and MATLAB, f_mean is slightly different: -0.0066412 (MATLAB) and -0.0066414 (Python); and so is theta: -0.4126186 (M) and -0.4124718 (P). It is a small difference, but in the end it leads to different results in my scripts.
I know some people have asked about this kind of difference before, but always regarding std, which I understand, never regarding mean values. I wonder why that is.
One possible source of the initial difference you describe (between the means) could be numpy's use of pairwise summation, which on large arrays will typically be appreciably more accurate than the naive running sum:
a = np.random.uniform(-1, 1, (10**6,))
a = np.r_[-a, a]
# so the sum should be zero
a.sum()
# 7.815970093361102e-14
# use cumsum to get naive summation:
a.cumsum()[-1]
# -1.3716805469243809e-11
Edit (thanks @sascha): for the last word, and as a "provably exact" reference, you could use math.fsum:
import math
math.fsum(a)
# 0.0
I don't have MATLAB, so I can't check what it is doing.
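To make the pairwise idea concrete, here is a simplified sketch of the recursion (my illustration; numpy's actual implementation uses a much larger, unrolled base case of around 128 elements):

def pairwise_sum(x):
    """Simplified pairwise summation: split in half, sum each half, combine.

    The rounding error grows like O(log n) instead of the O(n) of a
    naive running sum, which is why the two means differ slightly.
    """
    if len(x) <= 8:  # small base case for illustration
        total = 0.0
        for v in x:
            total += v
        return total
    mid = len(x) // 2
    return pairwise_sum(x[:mid]) + pairwise_sum(x[mid:])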
When doing some estimations, calculations, and other fun stuff in Python, I came across something really weird and upsetting.
I have a routine that estimates some parameters using ML estimation, and until now I assumed everything was peachy and fine. I read CSV data with pandas and use it for the estimation, so the data was originally passed to the ML-estimation function as a Pandas Series. Today, for kicks and giggles, I wanted to try some matrix operations on part of the calculation, and converted the input data to numpy arrays. However, when I ran the code, the estimation results were different. After reverting some of the multiplications, they were still different. Then I changed back to using a Pandas Series, and the previously expected result returned.
This is where I got curious, and now I turn to you. Is the rounding behaviour of float64 NumPy arrays and float64 Pandas Series really so different that my calculations come out drastically different?
Consider the following code example containing a sample from my ML estimator:
import pandas as pd
import numpy as np
import math
values = [3.41527085753, 3.606855606852, 3.5550625070226231, 3.680327020956565, \
3.30270511221, 3.704752803295, 3.6307205395804001, 3.200863997609199, \
2.90599272353, 3.555062501231, 2.8047528032711295, 3.415270760685753, \
3.50599277872, 3.445622506242, 3.3047528084632258, 3.219431344191376, \
3.68032756565, 3.451542245654, 3.2244456543387564, 2.999848273256456]
Ps = pd.Series(values, dtype=np.float64)
Narr = np.array(values, dtype=np.float64)
def getLambda(S, delta=1/255):
    n = len(S) - 1
    Sx = sum(S[0:-1])
    Sy = sum(S[1:])
    Sxx = sum(S[0:-1]**2)
    Sxy = sum(S[0:-1]*S[1:])
    mu = (Sy*Sxx - Sx*Sxy) / (n*(Sxx - Sxy) - (Sx**2 - Sx*Sy))
    lambd = np.log((Sxy - mu*Sx - mu*Sy + n*mu**2) / (Sxx - 2*mu*Sx + n*mu**2)) / delta
    a = math.exp(-lambd*delta)
    return mu, a, lambd
print("Numpy Array calculation gives me mu = {}, alpha = {} and Lambda = {}".format(getLambda(Narr)[0], getLambda(Narr)[1], getLambda(Narr)[2]))
print("Pandas Series calculation gives me mu = {}, alpha = {} and Lambda = {}".format(getLambda(Ps)[0], getLambda(Ps)[1], getLambda(Ps)[2]))
The values are just some random value picked from my original data in my larger project.
This will, at least for me, print:
>> Numpy Array calculation gives me mu = 3.378432651661709, alpha = 102.09644146650535 and Lambda = -1179.6090571432392
>> Pandas Series calculation gives me mu = 3.3981019891871247, alpha = nan and Lambda = nan
The procedure, method, and original data are identical, and there is still a difference of about 0.019669 already in the calculation of mu, which to me is really weird and upsetting.
If this is due to a difference in rounding between the two ways of handling the data (keep in mind that I explicitly stated float64 in both cases), that is strange, as it makes me question which of them I should use and why. Otherwise, is there a bug in one of them? Or is there a third explanation that I did not know of to begin with?
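A likely culprit (my note, not part of the original question): arithmetic on Pandas Series aligns operands by index label, not by position, so an expression like S[0:-1]*S[1:] means something different for a Series than for a NumPy array:

import numpy as np
import pandas as pd

a = np.array([1.0, 2.0, 3.0, 4.0])
s = pd.Series(a)

# NumPy multiplies strictly by position:
print(a[0:-1] * a[1:])  # [ 2.  6. 12.]

# Pandas first aligns on the index labels: label 0 only exists in
# s[0:-1] and label 3 only in s[1:], so those become NaN, and the
# matched labels multiply each element with itself.
print(s[0:-1] * s[1:])
# 0    NaN
# 1    4.0
# 2    9.0
# 3    NaN

So the Sxy term in getLambda is built from different products in the two cases; converting to a plain array (e.g. with S.to_numpy()) before slicing makes both paths positional.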
I need to understand how I can translate these few lines of MATLAB code. I don't understand how to create a vector n1 of n elements and how to fill it using the same formula as in MATLAB.
Here's the MATLAB code:
nc = 200; ncmax = 600; dx = 0.15e-04;
r = (dx/2):dx:dx*(ncmax+3);
n1(1:nc) =(1 ./ (s.*sqrt(2*pi).*r(1:nc))).*exp(-((log(r(1:nc)) - med).^2)./(2*s^2));
I have the following in Python, but n1 is always an empty array of nc elements:
import numpy as np
r =np.arange((dx/2),(dx*(ncmax+3)),dx)
count=1
n1=np.empty(nc)
while (count<nc)
    n1[count]=(1/(s*np.sqrt(2*pi)*r[count]))*np.exp(-((np.log(r[count]))-med)**2)/(2*s**2)
    count=count+1
You have a beautifully vectorized solution in MATLAB. One of the main reasons for using NumPy is that it also allows for vectorization, so you shouldn't be introducing loops.
As suggested in the comments by lucianopaz, there is a guide to NumPy for MATLAB users which explains the differences and similarities between the two. It also has a nice list of MATLAB functions and their NumPy equivalents, which can be of great help when translating MATLAB programs.
Some hints and comments:
Use the NumPy versions of all functions, i.e. np.sqrt, np.exp (as you were previously) and np.power (instead of **). These functions can be called in a vectorized fashion, just like in MATLAB.
As noticed by @Elisha, you are missing the definitions of s and med, so I'll just assume these are scalars and set them to 1.
Instead of importing math just for the math.pi, you can also use np.pi, which is exactly the same.
You are creating a large r vector and only use the first nc elements. Why not make r only of size nc from the start, as shown below?
Resulting NumPy code:
import numpy as np
nc = 200
ncmax = 600
dx = 0.15e-04
s = 1
med = 1
r = np.arange(dx / 2, dx * nc, dx)
n1 = 1 / (s * np.sqrt(2 * np.pi) * r) * \
np.exp(-np.power(np.log(r) - med, 2) /
(2 * np.power(s, 2)))
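As a quick sanity check (my addition, not part of the original answer), the vectorized result can be compared with the scalar formula at a single index:

import math

k = 10  # arbitrary index to spot-check
expected = (1 / (s * math.sqrt(2 * math.pi) * r[k])) * \
    math.exp(-(math.log(r[k]) - med) ** 2 / (2 * s ** 2))
assert np.isclose(n1[k], expected)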
You have several problems:
pi should be math.pi (after you add import math)
add : to your while line: while (count < nc):
s and med are not defined in the scope you wrote
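Putting those fixes together, a minimal loop-based sketch (my addition; s and med are assumed to be 1, as in the other answer, the counter starts at 0 so n1[0] actually gets filled, and the division by 2*s^2 is moved inside the exp to match the MATLAB line):

import math
import numpy as np

nc = 200
ncmax = 600
dx = 0.15e-04
s = 1.0    # assumed scalar
med = 1.0  # assumed scalar

r = np.arange(dx / 2, dx * (ncmax + 3), dx)
n1 = np.empty(nc)
count = 0
while count < nc:  # note the colon
    n1[count] = (1 / (s * math.sqrt(2 * math.pi) * r[count])) * \
        math.exp(-((math.log(r[count]) - med) ** 2) / (2 * s ** 2))
    count = count + 1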
I wrote the following script to integrate (average) data over intervals in Python:
# N = points to mean in the array
# data = original data
# data_mean = average data each N points
data_mean = np.array([np.mean(i) for i in np.array_split(data, len(data)/N)])
How could I do that in IDL?
There is a mean function, but is there an array_split-like one?
The array_split functionality is usually done via REFORM, which reinterprets a 1-dimensional array as a two- (or higher-) dimensional array over the same values. Keep in mind that IDL arrays are column-major, so to average contiguous blocks of n points the block size must be the first dimension. So for example:
n = 20
data = randomu(seed, 100)
data = reform(data, n, 100 / n)
print, mean(data, dimension=1)
The IDL mean function is equivalent to the numpy mean function, and IDL's reform plays the role of numpy's array_split here:
data_mean = mean(reform(data, N, n_elements(data) / N), dimension=1)
If you don't mind data ending up with different dimensions, you can greatly speed this up using the /overwrite keyword:
data_mean = mean(reform(data, N, n_elements(data) / N, /overwrite), dimension=1)
Finally, if you have a version of IDL before IDL 8.0, the mean function does not have the dimension keyword. Use this (less elegant) pattern instead:
data_mean = total(reform(data, N, n_elements(data) / N), 1) / N
Note that this version with total also accepts the /nan keyword, so that it works even when some data are missing.
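For comparison, the same reshape trick in NumPy looks like this (a sketch assuming len(data) is divisible by N; NumPy is row-major, so the block size goes last):

import numpy as np

N = 20
data = np.random.uniform(size=100)

# Reshape to (num_blocks, N) and average each row of N consecutive points;
# equivalent to the array_split one-liner when len(data) is divisible by N.
data_mean = data.reshape(len(data) // N, N).mean(axis=1)
print(data_mean)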
I'm trying to solve a differential equation numerically, and am writing a script that will give me an array with the solution at each time point.
import numpy as np
import matplotlib.pylab as plt
pi=np.pi
sin=np.sin
cos=np.cos
sqrt=np.sqrt
alpha=pi/4
g=9.80665
y0=0.0
theta0=0.0
sina = sin(alpha)**2
second_term = g*sin(alpha)*cos(alpha)
x0 = float(raw_input('What is the initial x in meters?'))
x_vel0 = float(raw_input('What is the initial velocity in the x direction in m/s?'))
y_vel0 = float(raw_input('what is the initial velocity in the y direction in m/s?'))
t_f = int(raw_input('What is the maximum time in seconds?'))
r0 = x0
vtan = sqrt(x_vel0**2+y_vel0**2)
dt = 1000
n = range(0,t_f)
r_n = r0*(n*dt)
r_nm1 = r0((n-1)*dt)
F_r = ((vtan**2)/r_n)*sina-second_term
r_np1 = 2*r_n - r_nm1 + dt**2 * F_r
data = [r0]
for time in n:
    data.append(float(r_np1))
print data
I'm not sure how to make the equation solve for r_np1 at each time in the range n. I'm still new to Python and would like some help understanding how to do something like this.
The first issue is:
n = range(0,t_f)
r_n = r0*(n*dt)
Here you define n as a list and then try to use it in arithmetic as if it were a number. Multiplying the list n by the integer dt just repeats the list, and multiplying the result by the float r0 raises a TypeError. Pure Python is NOT a vectorized language like NumPy or MATLAB, where you can do vector arithmetic like this. You could make this line work with
n = np.arange(0,t_f)
r_n = r0*(n*dt)
but you don't have to. Instead, you should move everything inside the for loop so the calculation is performed at each timestep. As it stands, you do the calculation once and then append that same single result to the data list t_f times.
Of course, you have to leave your initial conditions (which is a key part of ODE solving) OUTSIDE of the loop, because they only affect the first step of the solution, not all of them.
So:
# Initial conditions
r0 = x0
data = [r0]

# Loop over timesteps
for n in range(t_f):
    # calculations performed at each timestep
    vtan = sqrt(x_vel0**2 + y_vel0**2)
    dt = 1000
    r_n = r0*(n*dt)
    r_nm1 = r0*((n-1)*dt)
    F_r = ((vtan**2)/r_n)*sina - second_term
    r_np1 = 2*r_n - r_nm1 + dt**2 * F_r
    # append result to output list
    data.append(float(r_np1))

# do something with the output list
print data
plt.plot(data)
plt.show()
I did not add any code, only rearranged your lines. Notice that the part:
n = range(0,t_f)
for time in n:
can be simplified to:
for time in range(0,t_f):
However, you use n as the time variable in the calculation (previously, and wrongly, defined as a list instead of a single number). Thus you can write:
for n in range(0,t_f):
Note 1: I do not know if this code is mathematically right, as I don't even know which equation you're solving. The code runs now and produces a result; you have to check whether that result is correct.
Note 2: pure Python is not the best tool for this purpose. You should try some of the highly optimized built-ins of SciPy for ODE solving, as already hinted in the comments.
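As a rough sketch of what Note 2 suggests (my addition: vtan and r0 are assumed values, and the right-hand side simply mirrors the question's F_r expression read as r'' = (vtan**2/r)*sina - second_term, which you should verify against the actual equation):

import numpy as np
from scipy.integrate import solve_ivp

g = 9.80665
alpha = np.pi / 4
sina = np.sin(alpha) ** 2
second_term = g * np.sin(alpha) * np.cos(alpha)

vtan = 2.0  # assumed tangential speed
r0 = 0.2    # assumed initial r, nonzero since F_r divides by r

def rhs(t, y):
    # y[0] = r, y[1] = dr/dt: the 2nd-order ODE as a 1st-order system
    r, r_dot = y
    return [r_dot, (vtan ** 2 / r) * sina - second_term]

sol = solve_ivp(rhs, (0.0, 5.0), [r0, 0.0],
                t_eval=np.linspace(0.0, 5.0, 200))
print(sol.y[0])  # r(t) at the requested time points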