Dimensions of array don't match - python

I have a numpy array, and when I inspect it I get the output below. I expected print(feat.shape) to return (105835, 99, 13), i.e. I expected feat to have 3 dimensions.
print(feat.ndim)
print(feat.shape)
print(feat.size)
print(feat[1].ndim)
print(feat[1].shape)
print(feat[1].size)
1
(105835,)
105835
2
(99, 13)
1287
I don't know how to resolve this. feat holds MFCC features. If I print feat, this is what I get:
array([array([[-1.0160675e+01, -1.3804866e+01, 9.1880971e-01, ...,
1.5415058e+00, 1.1875046e-02, -5.8664594e+00],
[-9.9697800e+00, -1.3823588e+01, -7.0778362e-02, ...,
1.5948311e+00, 4.3481258e-01, -5.1646194e+00],
[-9.9518738e+00, -1.2771760e+01, -1.2623003e-01, ...,
3.4290311e+00, 2.7361808e+00, -6.0621500e+00],
...,
[-11.605266 , -7.1909204, -33.44656 , ..., -11.974911 ,
12.825395 , 10.635098 ],
[-11.769397 , -9.340318 , -34.413307 , ..., -10.077869 ,
8.821722 , 7.704534 ],
[-12.301968 , -10.67318 , -32.46104 , ..., -6.829077 ,
15.29837 , 13.100596 ]], dtype=float32)], dtype=object)

The same structure can be created in a simpler way:
import numpy as np

ain = np.random.rand(2, 2)
a = np.empty(3, dtype=object)   # 1-D container of Python objects
a[:] = [ain] * 3                # each cell holds the same 2x2 float array
#array([array([[ 0.14, 0.56],
# [ 0.9 , 0.9 ]]),
# array([[ 0.14, 0.56],
# [ 0.9 , 0.9 ]]),
# array([[ 0.14, 0.56],
# [ 0.9 , 0.9 ]])], dtype=object)
The problem arises because a.dtype is object. You can reconstruct your data with:
a = np.array(list(a))
#array([
# [[ 0.14, 0.56],
# [ 0.9 , 0.9 ]],
# [[ 0.14, 0.56],
# [ 0.9 , 0.9 ]],
# [[ 0.14, 0.56],
# [ 0.9 , 0.9 ]]])
The result will have the float dtype inherited from the inner arrays.
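Applied to the original question, the same fix should work on feat, assuming every inner array really has shape (99, 13); if the inner shapes differ, the stacking will fail and the data cannot form a regular 3-D array. A minimal sketch:
import numpy as np

# feat is a 1-D object array whose cells are (99, 13) float32 arrays
feat3d = np.stack(feat)      # or: np.array(list(feat))
print(feat3d.shape)          # expected (105835, 99, 13)
print(feat3d.dtype)          # float32, inherited from the inner arrays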

numpy - align 2 vectors with potentially missing values

I have 2 numpy matrices with slightly different alignment:
X
id, value
1, 0.78
2, 0.65
3, 0.77
...
...
98, 0.88
99, 0.77
100, 0.87
Y
id, value
1, 0.79
2, 0.65
3, 0.78
...
...
98, 0.89
100, 0.80
Y is simply missing a particular ID.
I would like to perform vector operations on X and Y (e.g. correlation, difference, etc.). That means I need to drop the corresponding value in X. How would I do that?
The two arrays contain the same values except for one extra element in x, so that extra element will equal the difference between the sums.
This solution is O(n); the other solutions here are O(n^2).
Data generation:
import numpy as np
# x = np.arange(10)
x = np.random.rand(10)
y = np.r_[x[:6], x[7:]] # exclude 6
print(x)
np.random.shuffle(y)
print(y)
Solution:
Notice that np.isclose() is used for the floating-point comparison.
sum_x = np.sum(x)
sum_y = np.sum(y)
diff = sum_x - sum_y
value_index = np.argwhere(np.isclose(x, diff))
print(value_index)
Delete the relevant index:
deleted = np.delete(x, value_index)
print(deleted)
Output:
[0.36373441 0.5030346 0.895204 0.03352821 0.20693263 0.28651572
0.25859596 0.97969841 0.77368822 0.80105397]
[0.97969841 0.77368822 0.28651572 0.36373441 0.5030346 0.895204
0.03352821 0.80105397 0.20693263]
[[6]]
[0.36373441 0.5030346 0.895204 0.03352821 0.20693263 0.28651572
0.97969841 0.77368822 0.80105397]
Use in1d:
>>> X
array([[ 1. , 0.53],
[ 2. , 0.72],
[ 3. , 0.44],
[ 4. , 0.35],
[ 5. , 0.32],
[ 6. , 0.14],
[ 7. , 0.52],
[ 8. , 0.4 ],
[ 9. , 0.1 ],
[10. , 0.1 ]])
>>> Y
array([[ 1. , 0.19],
[ 2. , 0.96],
[ 3. , 0.24],
[ 4. , 0.44],
[ 5. , 0.12],
[ 6. , 0.91],
[ 7. , 0.7 ],
[ 8. , 0.54],
[10. , 0.09]])
>>> X[np.in1d(X[:, 0], Y[:, 0])]
array([[ 1. , 0.53],
[ 2. , 0.72],
[ 3. , 0.44],
[ 4. , 0.35],
[ 5. , 0.32],
[ 6. , 0.14],
[ 7. , 0.52],
[ 8. , 0.4 ],
[10. , 0.1 ]])
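If both arrays need to be aligned on their common IDs (rather than only filtering X), a sketch along the same lines, assuming both arrays are sorted by ID; the variable names are just for illustration:
common = np.intersect1d(X[:, 0], Y[:, 0])        # IDs present in both
x_vals = X[np.in1d(X[:, 0], common), 1]
y_vals = Y[np.in1d(Y[:, 0], common), 1]
# the value columns now line up row by row
corr = np.corrcoef(x_vals, y_vals)[0, 1]
diff = x_vals - y_vals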
You can try this (it assumes the missing entries are stored as NaN):
X = X[~numpy.isnan(X)]
Y = Y[~numpy.isnan(Y)]
And then you can do whatever operation you want.

matrix.dot(inv(matrix)) isn't equal to identity matrix

I've been stuck on an issue for hours; I don't understand why D = V.dot(inv(V)) below doesn't equal the identity matrix:
import numpy as np
from numpy.linalg import inv

A = np.random.randint(50, size=(100, 2))
V = A.dot(A.T)
D = V.dot(inv(V))
D
The result I get is the following:
array([[ 3.26611328, 7.87890625, 14.1953125 , ..., 2. ,
-5. , -24. ],
[ -5.91061401, -26.05834961, 5.30126953, ..., -10. ,
8. , -16. ],
[ -2.64431763, 3.55639648, 3.10107422, ..., -0.5 ,
-5. , -4. ],
...,
[ -2.62512207, -7.78222656, 10.26367188, ..., -6. ,
18. , 0. ],
[ -3.0625 , 14. , -4. , ..., -0.0625 ,
0. , 8. ],
[ 2. , -7. , 16. , ..., -7.5 ,
-8. , -4. ]])
Thank you for your help
I've found my issue:
I was trying to take the inv() of a matrix whose det(matrix) = 0, which is why the calculation wasn't correct.
D = A.T.dot(A)   # 2x2 Gram matrix, invertible in general
inv(D).dot(D)
Then I get the identity matrix.
Thank you
Habib
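For context (a minimal check, not part of the original answer): A has shape (100, 2), so A.dot(A.T) is a 100x100 matrix of rank at most 2 and is therefore singular, while the 2x2 Gram matrix A.T.dot(A) is generally invertible.
import numpy as np
from numpy.linalg import inv, matrix_rank

A = np.random.randint(50, size=(100, 2))
V = A.dot(A.T)                                   # 100x100, rank <= 2 -> singular
print(matrix_rank(V))                            # 2
G = A.T.dot(A)                                   # 2x2, full rank in general
print(np.allclose(inv(G).dot(G), np.eye(2)))     # True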

Add complementary values to numpy array

I have a 1D numpy array, for example the following:
import numpy as np
arr = np.array([0.33, 0.2, 0.8, 0.9])
Now I would like to change the array so that one minus each value is also included. That means the array should look like:
[[0.67, 0.33],
[0.8, 0.2],
[0.2, 0.8],
[0.1, 0.9]]
How can this be done?
>>> np.vstack((1 - arr, arr)).T
array([[0.67, 0.33],
[0.8 , 0.2 ],
[0.2 , 0.8 ],
[0.1 , 0.9 ]])
Alternatively, you can create an empty array and fill in entries:
>>> x = np.empty((*arr.shape, 2))
>>> x[..., 0] = 1 - arr
>>> x[..., 1] = arr
>>> x
array([[0.67, 0.33],
[0.8 , 0.2 ],
[0.2 , 0.8 ],
[0.1 , 0.9 ]])
Try column_stack
np.column_stack([1 - arr, arr])
Out[33]:
array([[0.67, 0.33],
[0.8 , 0.2 ],
[0.2 , 0.8 ],
[0.1 , 0.9 ]])
Use:
arr = np.insert(1 - arr, np.arange(len(arr)), arr).reshape(-1, 2)
arr
Output:
array([[0.33, 0.67],
[0.2 , 0.8 ],
[0.8 , 0.2 ],
[0.9 , 0.1 ]])
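Another equivalent option, if you prefer an explicit stacking axis (a small sketch, not from the original answers):
np.stack([1 - arr, arr], axis=1)
# array([[0.67, 0.33],
#        [0.8 , 0.2 ],
#        [0.2 , 0.8 ],
#        [0.1 , 0.9 ]])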

Inserting rows of zeros at specific places along the rows of a NumPy array

I have a two-column numpy array. I want to go down the 2nd column and take the difference between each pair of consecutive numbers (9.6-0, 19.13-9.6, etc.). If the difference is > 15, I want to insert a row of 0s in both columns. In the end I really only need the values in the first column (the second is only used to decide where to put the 0s), so if it's easier to split them up, that would be fine.
This is my input array:
[[0.00 0.00]
[1.85 9.60]
[2.73 19.13]
[0.30 28.70]
[2.64 38.25]
[2.29 47.77]
[2.01 57.28]
[2.61 66.82]
[2.20 76.33]
[2.49 85.85]
[2.55 104.90]
[2.65 114.47]
[1.79 123.98]
[2.86 133.55]]
and it should turn into:
[[0.00 0.00]
[1.85 9.60]
[2.73 19.13]
[0.30 28.70]
[2.64 38.25]
[2.29 47.77]
[2.01 57.28]
[2.61 66.82]
[2.20 76.33]
[2.49 85.85]
[0.00 0.00]
[2.55 104.90]
[2.65 114.47]
[1.79 123.98]
[2.86 133.55]]
You can do it in a one-liner using ediff1d, argmax and insert from numpy:
np.insert(arr, np.argmax(np.append(False, np.ediff1d(arr[:,1])>15)), 0, axis=0)
#array([[ 0. , 0. ],
# [ 1.85, 9.6 ],
# [ 2.73, 19.13],
# [ 0.3 , 28.7 ],
# [ 2.64, 38.25],
# [ 2.29, 47.77],
# [ 2.01, 57.28],
# [ 2.61, 66.82],
# [ 2.2 , 76.33],
# [ 2.49, 85.85],
# [ 0. , 0. ],
# [ 2.55, 104.9 ],
# [ 2.65, 114.47],
# [ 1.79, 123.98],
# [ 2.86, 133.55]])
Assuming A as the input array, here's a vectorized approach based on initialization with zeros -
# Get indices at which such diff>15 occur
cut_idx = np.where(np.diff(A[:,1]) > 15)[0]
# Initialize output array
out = np.zeros((A.shape[0]+len(cut_idx),2),dtype=A.dtype)
# Get row indices in the output array at which rows from A are to be inserted.
# In other words, avoid rows to be kept as zeros. Finally, insert rows from A.
idx = ~np.in1d(np.arange(out.shape[0]),cut_idx + np.arange(1,len(cut_idx)+1))
out[idx] = A
Sample input, output -
In [50]: A # Different from the one posted in question to show variety
Out[50]:
array([[ 0. , 0. ],
[ 1.85, 0.6 ],
[ 2.73, 19.13],
[ 2.2 , 76.33],
[ 2.49, 85.85],
[ 2.55, 104.9 ],
[ 2.65, 114.47],
[ 1.79, 163.98],
[ 2.86, 169.55]])
In [51]: out
Out[51]:
array([[ 0. , 0. ],
[ 1.85, 0.6 ],
[ 0. , 0. ],
[ 2.73, 19.13],
[ 0. , 0. ],
[ 2.2 , 76.33],
[ 2.49, 85.85],
[ 0. , 0. ],
[ 2.55, 104.9 ],
[ 2.65, 114.47],
[ 0. , 0. ],
[ 1.79, 163.98],
[ 2.86, 169.55]])
a=[[0.00, 0.00],
[1.85, 9.60],
[2.73, 19.13],
[0.30, 28.70],
[2.64, 38.25],
[2.29, 47.77],
[2.01, 57.28],
[2.61, 66.82],
[2.20, 76.33],
[2.49, 85.85],
[2.55, 104.90],
[2.65, 114.47],
[1.79, 123.98],
[2.86, 133.55]]
i = 0
while i < len(a) - 1:
    if (a[i+1][1] - a[i][1]) > 15:
        a.insert(i+1, [0, 0])
        i = i + 1
    i = i + 1
for line in a:
    print(line)
output:
[0.0, 0.0]
[1.85, 9.6]
[2.73, 19.13]
[0.3, 28.7]
[2.64, 38.25]
[2.29, 47.77]
[2.01, 57.28]
[2.61, 66.82]
[2.2, 76.33]
[2.49, 85.85]
[0, 0]
[2.55, 104.9]
[2.65, 114.47]
[1.79, 123.98]
[2.86, 133.55]
Here's the right algorithm:
arr = [ ... ]
result = []
result.append(arr[0])
for i in range(1, len(arr)):
    if arr[i][1] - arr[i-1][1] > 15:
        result.append([0.0, 0.0])
    result.append(arr[i])
print(result)
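Since the question starts from a numpy array, the list result can be converted back afterwards (a trivial follow-up, not part of the original answer):
result = np.array(result)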
A one-liner that can handle more than one fill slot. Here I'm testing it on the OP's example, with one modified value.
In [70]: np.insert(a, np.where(np.diff(a[:,1])>15)[0]+2,0, axis=0)
Out[70]:
array([[ 0. , 0. ],
[ 1.85, 9.6 ],
[ 2.73, 19.13],
[ 0.3 , 28.7 ],
[ 2.64, 38.25],
[ 2.29, 140. ], # modified
[ 0. , 0. ],
[ 2.01, 57.28],
[ 2.61, 66.82],
[ 2.2 , 76.33],
[ 2.49, 85.85],
[ 2.55, 104.9 ],
[ 0. , 0. ],
[ 2.65, 114.47],
[ 1.79, 123.98],
[ 2.86, 133.55]])
The use of where instead of argmax (Colonel's answer) handles more than one slot. The +2 is required because diff is one short, and we are inserting after. ediff1d has more options for handling the end points.
np.insert has various strategies for filling. In this case it probably is doing something similar to Divakar's answer - create an out, and copy values to the correct slots.
Another answer uses np.abs(). That might be needed, but in my example that would add another 0 row, after the 140 drops back to 57.
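As a sketch of that idea (not one of the original answers), ediff1d's to_begin argument pads the front so the diff indices line up with the rows, and the zero row then lands before the row that jumps, matching the expected output in the question:
gap_idx = np.where(np.ediff1d(a[:, 1], to_begin=0) > 15)[0]
out = np.insert(a, gap_idx, 0, axis=0)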
I'd be surprised if numpy didn't have some native methods for doing this sort of thing but I think this will work too:
i = 1
while i < len(lst):
    if abs(lst[i][1] - lst[i-1][1]) > 15:
        lst[i] = [0.0, 0.0]
        # uncomment to change only the second column
        # lst[i][1] = 0.0
        i += 1
    i += 1
Output:
>>> lst
array([[ 0. , 0. ],
[ 1.85, 9.6 ],
[ 2.73, 19.13],
[ 0.3 , 28.7 ],
[ 2.64, 38.25],
[ 2.29, 47.77],
[ 2.01, 57.28],
[ 2.61, 66.82],
[ 2.2 , 76.33],
[ 2.49, 85.85],
[ 2.55, 104.9 ],
[ 2.65, 114.47],
[ 1.79, 123.98],
[ 2.86, 133.55]])
>>>
>>> i = 1
>>> while i < len(lst):
...     if abs(lst[i][1] - lst[i-1][1]) > 15:
...         lst[i] = [0.0, 0.0]
...         i += 1
...     i += 1
...
>>> lst
array([[ 0. , 0. ],
[ 1.85, 9.6 ],
[ 2.73, 19.13],
[ 0.3 , 28.7 ],
[ 2.64, 38.25],
[ 2.29, 47.77],
[ 2.01, 57.28],
[ 2.61, 66.82],
[ 2.2 , 76.33],
[ 2.49, 85.85],
[ 0. , 0. ],
[ 2.65, 114.47],
[ 1.79, 123.98],
[ 2.86, 133.55]])
>>>

Flip non-zero values along each row of a lower triangular numpy array

I have a lower triangular array, like B:
B = np.array([[1,0,0,0],[.25,.75,0,0], [.1,.2,.7,0],[.2,.3,.4,.1]])
>>> B
array([[ 1. , 0. , 0. , 0. ],
[ 0.25, 0.75, 0. , 0. ],
[ 0.1 , 0.2 , 0.7 , 0. ],
[ 0.2 , 0.3 , 0.4 , 0.1 ]])
I want to flip it to look like:
array([[ 1. , 0. , 0. , 0. ],
[ 0.75, 0.25, 0. , 0. ],
[ 0.7 , 0.2 , 0.1 , 0. ],
[ 0.1 , 0.4 , 0.3 , 0.2 ]])
That is, I want to take all the positive values, and reverse within the positive values, leaving the trailing zeros in place. This is not what fliplr does:
>>> np.fliplr(B)
array([[ 0. , 0. , 0. , 1. ],
[ 0. , 0. , 0.75, 0.25],
[ 0. , 0.7 , 0.2 , 0.1 ],
[ 0.1 , 0.4 , 0.3 , 0.2 ]])
Any tips? Also, the actual array I am working with would be something like B.shape = (200,20,4,4) instead of (4,4). Each (4,4) block looks like the above example (with different numbers across the 200, 20 different entries).
How about this:
# row, column indices of the lower triangle of B
r, c = np.tril_indices_from(B)
# flip the column indices by subtracting them from r, which is equal to the number
# of nonzero elements in each row minus one
B[r, c] = B[r, r - c]
print(repr(B))
# array([[ 1. , 0. , 0. , 0. ],
# [ 0.75, 0.25, 0. , 0. ],
# [ 0.7 , 0.2 , 0.1 , 0. ],
# [ 0.1 , 0.4 , 0.3 , 0.2 ]])
The same approach will generalize to any arbitrary N-dimensional array that consists of multiple lower triangular submatrices:
# creates a (200, 20, 4, 4) array consisting of tiled copies of B
B2 = np.tile(B[None, None, ...], (200, 20, 1, 1))
print(repr(B2[100, 10]))
# array([[ 1. , 0. , 0. , 0. ],
# [ 0.25, 0.75, 0. , 0. ],
# [ 0.1 , 0.2 , 0.7 , 0. ],
# [ 0.2 , 0.3 , 0.4 , 0.1 ]])
r, c = np.tril_indices_from(B2[0, 0])
B2[:, :, r, c] = B2[:, :, r, r - c]
print(repr(B2[100, 10]))
# array([[ 1. , 0. , 0. , 0. ],
# [ 0.75, 0.25, 0. , 0. ],
# [ 0.7 , 0.2 , 0.1 , 0. ],
# [ 0.1 , 0.4 , 0.3 , 0.2 ]])
For an upper triangular matrix you could simply subtract r from c instead, e.g.:
r, c = np.triu_indices_from(B.T)
B.T[r, c] = B.T[c - r, c]
Here's one approach for a 2D array case -
mask = np.tril(np.ones((4,4),dtype=bool))
out = np.zeros_like(B)
out[mask] = B[:,::-1][mask[:,::-1]]
You can extend it to a 3D array case using the same 2D mask by masking the last two axes with it, like so -
out = np.zeros_like(B)
out[:,mask] = B[:,:,::-1][:,mask[:,::-1]]
.. and similarly for a 4D array case, like so -
out = np.zeros_like(B)
out[:,:,mask] = B[:,:,:,::-1][:,:,mask[:,::-1]]
As one can see, we keep the masking confined to the last two (4, 4) axes, and the solution otherwise stays the same.
Sample run -
In [95]: B
Out[95]:
array([[ 1. , 0. , 0. , 0. ],
[ 0.25, 0.75, 0. , 0. ],
[ 0.1 , 0.2 , 0.7 , 0. ],
[ 0.2 , 0.3 , 0.4 , 0.1 ]])
In [96]: mask = np.tril(np.ones((4,4),dtype=bool))
...: out = np.zeros_like(B)
...: out[mask] = B[:,::-1][mask[:,::-1]]
...:
In [97]: out
Out[97]:
array([[ 1. , 0. , 0. , 0. ],
[ 0.75, 0.25, 0. , 0. ],
[ 0.7 , 0.2 , 0.1 , 0. ],
[ 0.1 , 0.4 , 0.3 , 0.2 ]])
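The same masking idea can be written once for any number of leading dimensions using ellipsis indexing (a sketch, not part of the original answer; B here may be (4, 4), (200, 20, 4, 4), etc.):
mask = np.tril(np.ones((4, 4), dtype=bool))
out = np.zeros_like(B)
out[..., mask] = B[..., ::-1][..., mask[:, ::-1]]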
