I have 44100 Hz audio, which means there are 44100 samples per second. I would like to analyze it, so I split the data into sub-arrays of length 1024.
For each array, I apply a Fourier transform (fft), which returns an array of complex numbers. Each complex number should encode the amplitude and phase of one frequency component.
The result has length 1024, just like a chunk, but I don't know which element of the array corresponds to which frequency. I checked the documentation, but the only thing I could find out was that the result is symmetric, so half of it can be skipped.
from scipy.fftpack import fft
res = fft(chunk)
But how can I find out which frequency corresponds to a given index in the result?
You can see this directly by taking the FFT of pure tones. Here I compare a constant function (zero frequency), frequency 1 (period = the sampled interval), frequency 2 (period = half of the sampled interval), and so on:
import numpy as np
from scipy.fftpack import fft
arr = np.linspace(0, 2*np.pi, 9)[:-1]   # 8 evenly spaced sample points
for k in range(5):
    print(np.round(np.abs(fft(np.cos(k*arr))), 10))
Result:
[ 8. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 4. 0. 0. 0. 0. 0. 4.]
[ 0. 0. 4. 0. 0. 0. 4. 0.]
[ 0. 0. 0. 4. 0. 4. 0. 0.]
[ 0. 0. 0. 0. 8. 0. 0. 0.]
So the 0th entry is the constant term; entries 1 and -1 correspond to the frequency whose period is the whole sampled interval; entries 2 and -2 to a period of half the sampled interval; 3 and -3 to a period of one third of the sampled interval, and so on, until we reach the Nyquist frequency.
For a sample of size 1024:
1 and -1 are for frequency 1/1024 of sampling rate
2 and -2 are for frequency 2/1024 of sampling rate
3 and -3 are for frequency 3/1024 of sampling rate
...
512 is for Nyquist frequency, 1/2 = 512/1024 of sampling rate
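If you want that index-to-frequency mapping computed for you, numpy's fftfreq helper does exactly this bookkeeping. A small sketch (not part of the original answer) for your 44100 Hz audio and 1024-sample chunks:
import numpy as np
fs = 44100                           # sampling rate in Hz
n = 1024                             # chunk length
# index k maps to k * fs / n; the second half of the array holds
# the corresponding negative frequencies
freqs = np.fft.fftfreq(n, d=1.0/fs)
print(freqs[0])     # 0.0, the constant (DC) term
print(freqs[1])     # about 43.07 Hz = 1 * 44100 / 1024
print(freqs[511])   # about 22006.9 Hz, just below Nyquist
print(freqs[512])   # -22050.0, the Nyquist bin (the sign is a convention)
print(freqs[-1])    # about -43.07 Hz, the mirror of bin 1
So the frequency of bin k is simply k * 44100 / 1024 Hz, for k up to 512.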
Assume I have a 2D array in Python and I add some padding. How can I iterate over the new padded area only?
For example
1 2 3
4 5 6
7 8 9
Becomes
x x x x x x x
x x x x x x x
x x 1 2 3 x x
x x 4 5 6 x x
x x 7 8 9 x x
x x x x x x x
x x x x x x x
How can I loop over only the x's?
Not sure if I understand what you are trying to do, but if you are using numpy, you can use masks:
import numpy as np
arr = np.arange(1, 10).reshape(3, 3)
# mask full of True's
mask = np.ones((7, 7), dtype=bool)
# set the interior of the mask to False
mask[2:-2, 2:-2] = False
# using zero padding as an example
pad_arr = np.zeros((7, 7))
pad_arr[2:-2, 2:-2] = arr
print(pad_arr)
# loop over the elements of the padding, where mask == True
for value in pad_arr[mask]:
    print(value)
Returns:
[[0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0.]
[0. 0. 1. 2. 3. 0. 0.]
[0. 0. 4. 5. 6. 0. 0.]
[0. 0. 7. 8. 9. 0. 0.]
[0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0.]]
followed by 0.0 printed 40 times (the padded values).
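If what you need is the positions of the padded cells rather than their values, the same mask can give you the indices via np.argwhere. A small sketch building on the example above (not part of the original answer):
import numpy as np
mask = np.ones((7, 7), dtype=bool)
mask[2:-2, 2:-2] = False
# each entry of np.argwhere(mask) is a (row, col) index belonging to the padding
for row, col in np.argwhere(mask):
    print(row, col)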
I want to create two matrices, then change the numbers in the second matrix depending on the numbers in the first. So I write an if statement that checks my first matrix and, when it is true, changes my second matrix. However, it changes both matrices?
My code works perfectly with plain scalar variables; the problem only occurs when I apply it to matrices.
import numpy as np
n = 3
matr = np.zeros((n,n))
matr[0][0] = 1
matr2 = matr
print(matr)
[[1. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]]
print(matr2)
[[1. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]]
if matr[0][0] == 1:
    matr2[0][0] = 9
print(matr)
[[9. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]]
print(matr2)
[[9. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]]
Because "matr" doesn't occur as a subject in my if statement it shouldn't be altered right?
x = 1
y = x
if x == 1:
    y = 9
print(x)
1
print(y)
9
Those two variables are just two references to the same matrix, not two different matrices; matr2 = matr only creates a new name for the existing array.
The statement matr2[0][0] = 9 modifies the one and only matrix that exists in your example, and it is exactly the same as using matr[0][0] = 9.
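If you want matr2 to be an independent matrix, make an explicit copy instead of a second reference. A minimal sketch of the usual fix (not from the original answer):
import numpy as np
n = 3
matr = np.zeros((n, n))
matr[0][0] = 1
matr2 = matr.copy()      # a new array with the same contents
if matr[0][0] == 1:
    matr2[0][0] = 9
print(matr)    # still [[1. 0. 0.] ...] -- unchanged
print(matr2)   # [[9. 0. 0.] ...] -- only the copy was modified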
I have experimented with XGBClassifier() on a large dataset of shape [400000, 93]. The data contains a lot of NaN values, so I have used imputation from the sklearn package:
from sklearn.preprocessing import Imputer  # replaced by sklearn.impute.SimpleImputer in newer scikit-learn
imputer = Imputer()
imputed_x = imputer.fit_transform(data)
data = imputed_x
but the feature importance values look like this:
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
Notice there is only a 1 and the rest are 0. For this reason, the resulting metrics are:
precision: 1.0
recall: 1.0
accuracy: 1.0
traning_accuracy: 1.0
Why can't the model fit the data properly?
Example code fragment:
model_xboost = XGBClassifier(max_depth=5, n_estimators=100)
#train
model_xboost.fit(train_data, train_labels)
print(model_xboost.feature_importances_)
From the feature importances, there is a single 1 and the rest are 0. It looks as if you have included a column in the training data that is essentially a stand-in for the target, so that one feature ends up perfectly correlated with the target!
For example I've come across a classification problem where I used the patient's background and medical parameters to predict whether or not the patient has cancer. There was 1 column called "data_source" which became the most significant. That's purely because patients who come from "XXX Cancer Hospital" will surely have cancer!
This is a good example of unintended data leakage.
You have one feature that is fully correlated with the target (correlation 1.0), which means you have effectively trained your model on the target itself. You must remove that column before training.
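A quick way to confirm and fix this is to look at which column received all of the importance and drop it before retraining. A rough sketch, assuming train_data and train_labels are the objects from the code above and that train_data is a pandas DataFrame (the names for the cleaned model are mine, not from the question):
import numpy as np
from xgboost import XGBClassifier
model_xboost = XGBClassifier(max_depth=5, n_estimators=100)
model_xboost.fit(train_data, train_labels)
# the column that received (almost) all of the importance
leaky_idx = int(np.argmax(model_xboost.feature_importances_))
leaky_col = train_data.columns[leaky_idx]
print("suspicious feature:", leaky_col)
# drop it and retrain; the metrics should become realistic again
clean_data = train_data.drop(columns=[leaky_col])
model_clean = XGBClassifier(max_depth=5, n_estimators=100)
model_clean.fit(clean_data, train_labels)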
Say I have two options for generating the adjacency matrix of a network: nx.adjacency_matrix() and my own code. I wanted to test the correctness of my code and ran into some strange discrepancies.
Example: a 3x3 lattice network.
import networkx as nx
import matplotlib.pyplot as plt
N = 3
G = nx.grid_2d_graph(N, N)
pos = dict((n, n) for n in G.nodes())
labels = dict(((i, j), i + (N-1-j) * N) for i, j in G.nodes())
nx.relabel_nodes(G, labels, False)
inds = sorted(labels.keys())
vals = sorted(labels.values())
pos2 = dict(zip(vals, inds))
plt.figure()
nx.draw_networkx(G, pos=pos2, with_labels=True, node_size=200)
This is the visualization:
The adjacency matrix with nx.adjacency_matrix():
B=nx.adjacency_matrix(G)
B1=B.todense()
[[0 0 0 0 0 1 0 0 1]
[0 0 0 1 0 1 0 0 0]
[0 0 0 1 0 1 0 1 1]
[0 1 1 0 0 0 1 0 0]
[0 0 0 0 0 0 0 1 1]
[1 1 1 0 0 0 0 0 0]
[0 0 0 1 0 0 0 1 0]
[0 0 1 0 1 0 1 0 0]
[1 0 1 0 1 0 0 0 0]]
According to it, node 0 (entire 1st row and entire 1st column) is connected to nodes 5 and 8. But if you look at the image above this is wrong, as it connects to nodes 1 and 3.
Now my code (to be run in the same script as the above):
import numpy
import math
P = 3
def nodes_connected(i, j):
    try:
        if i in G.neighbors(j):
            return 1
    except nx.NetworkXError:
        return False
A = numpy.zeros((P*P, P*P))
for i in range(0, P*P, 1):
    for j in range(0, P*P, 1):
        if i not in G.nodes():
            A[i][:] = 0
            A[:][i] = 0
        elif i in G.nodes():
            A[i][j] = nodes_connected(i, j)
            A[j][i] = A[i][j]
for i in range(0, P*P, 1):
    for j in range(0, P*P, 1):
        if math.isnan(A[i][j]):
            A[i][j] = 0
print(A)
This yields:
[[ 0. 1. 0. 1. 0. 0. 0. 0. 0.]
[ 1. 0. 1. 0. 1. 0. 0. 0. 0.]
[ 0. 1. 0. 0. 0. 1. 0. 0. 0.]
[ 1. 0. 0. 0. 1. 0. 1. 0. 0.]
[ 0. 1. 0. 1. 0. 1. 0. 1. 0.]
[ 0. 0. 1. 0. 1. 0. 0. 0. 1.]
[ 0. 0. 0. 1. 0. 0. 0. 1. 0.]
[ 0. 0. 0. 0. 1. 0. 1. 0. 1.]
[ 0. 0. 0. 0. 0. 1. 0. 1. 0.]]
which says that node 0 is connected to nodes 1 and 3. Why does this difference exist? What is going wrong here?
Networkx doesn't know what order you want the nodes to be in.
Here is how to call it: adjacency_matrix(G, nodelist=None, weight='weight').
If you want a specific order, set nodelist to be a list in that order.
So for example adjacency_matrix(G, nodelist=range(9)) should get what you want.
Why is this? Well, because a graph can have just about anything as its nodes (anything hashable). One of your nodes could have been "parrot" or (1,2). So it stores the nodes as keys in a dict, rather than assuming it's the non-negative integers starting at 0. Dict keys have an arbitrary order.
A more general solution, if your nodes have some logical ordering (as is the case if you generate a graph with G=nx.grid_2d_graph(3,3), which uses tuples from (0,0) to (2,2) as nodes, or as in your example), is to use:
adjacency_matrix(G,nodelist=sorted(G.nodes()))
This sorts the list of nodes of G and passes it as the nodelist.
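For example, on the relabelled grid from the question, fixing the node order reproduces the matrix built by hand. A short sketch (the graph construction is copied from the question):
import networkx as nx
N = 3
G = nx.grid_2d_graph(N, N)
labels = {(i, j): i + (N-1-j) * N for i, j in G.nodes()}
nx.relabel_nodes(G, labels, copy=False)
# row/column k of the matrix now corresponds to node k
A = nx.adjacency_matrix(G, nodelist=sorted(G.nodes())).todense()
print(A)   # row 0 has ones in columns 1 and 3, matching the drawing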
I'm converting some MATLAB code to Python and am observing large numerical discrepancies between the \ operator and scipy.linalg.lstsq, which apparently are interchangeable.
In my code I calculate the LU decomposition of some matrix; however, Python and MATLAB give different answers for 'L'.
Given this input matrix, B:
B = [7.6822 0 -1.0000 0;
0 0.2896 -1.0000 0;
-6.4018 0 0 -1.0000;
0 -0.9350 0 -1.0000]
In Python, using P,L,U = scipy.linalg.lu(B):
L = [ 1. 0. 0. 0. ]
[ 0. 1. 0. 0. ]
[ 0. -0.30972791 1. 0. ]
[-0.83333333 -0. 0.83333333 1. ]
With Matlab [L,U] = lu(B):
L = 1.0000 0 0 0
0 -0.3097 1.0000 0
-0.8333 0 0.8333 1.0000
0 1.0000 0 0
In both cases U is this:
U = [ 7.6822128 0. -1. 0. ]
[ 0. -0.93502772 0. -1. ]
[ 0. 0. -1. -0.30972791]
[ 0. 0. 0. -0.74189341]
So I figured it out... in MATLAB, [L,U] = lu(A) returns L already premultiplied by the permutation matrix P.
Notice that scipy.linalg.lu() has the optional parameter permute_l set to False. You can either set it to True, e.g.
(L,U) = scipy.linalg.lu(A,permute_l=True)
or alternatively perform the permutation yourself afterwards, e.g.,
(P,L,U) = scipy.linalg.lu(A)
L = P @ L
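Putting it together, both routes give the same permuted factor and reconstruct the original matrix. A small sketch using the matrix B from the question:
import numpy as np
import scipy.linalg
B = np.array([[ 7.6822,  0.0,    -1.0,  0.0],
              [ 0.0,     0.2896, -1.0,  0.0],
              [-6.4018,  0.0,     0.0, -1.0],
              [ 0.0,    -0.9350,  0.0, -1.0]])
# option 1: let scipy apply the permutation for you
PL, U1 = scipy.linalg.lu(B, permute_l=True)
# option 2: apply the permutation yourself
P, L, U2 = scipy.linalg.lu(B)
print(np.allclose(PL, P @ L))      # True: the two factors agree
print(np.allclose(P @ L @ U2, B))  # True: B == P @ L @ U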