I would like to create in Python (using NumPy) an upper triangular matrix of the form:
[[ 1, c, c^2],
 [ 0, 1, c  ],
 [ 0, 0, 1  ]]
where c is a rational number and the size of the matrix may vary (2, 3, 4, ...). Is there any smart way to do it other than creating rows and stacking them?
import numpy as np

r = 3  # matrix size
c = 3
i, j = np.indices((r, r))      # row- and column-index grids
np.triu(float(c) ** (j - i))   # c**(j - i) everywhere, then zero below the diagonal
Result:
array([[1., 3., 9.],
       [0., 1., 3.],
       [0., 0., 1.]])
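Wrapped in a small function (a hypothetical helper, not part of the answer above), the same idea works for any size n:

import numpy as np

def power_triu(c, n):
    # n x n upper-triangular matrix with c**(j - i) on and above the diagonal.
    # Assumes c != 0: negative exponents are evaluated below the diagonal
    # before np.triu zeroes those entries out.
    i, j = np.indices((n, n))
    return np.triu(float(c) ** (j - i))

power_triu(3, 4)   # the 4 x 4 version of the matrix above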
There are probably more straightforward solutions but this is what I came up with:
import numpy as np

c = 5
m = np.triu(c ** np.triu(np.ones((3, 3)), 1).cumsum(axis=1))
print(m)
output:
[[ 1.  5. 25.]
 [ 0.  1.  5.]
 [ 0.  0.  1.]]
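To see why this works, look at the inner expression on its own: it builds the matrix of exponents before c is raised to it.

np.triu(np.ones((3, 3)), 1).cumsum(axis=1)
# array([[0., 1., 2.],
#        [0., 0., 1.],
#        [0., 0., 0.]])
# c ** (this) puts 1s at and below the diagonal and c, c^2, ... above it;
# the outer np.triu then zeroes everything strictly below the diagonal.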
A small intro to my question.
I want to plot a sound file with NumPy in dB. Just doing 20 * np.log10(arr) doesn't work because of negative numbers.
So I was looking into using np.where(). Because where is part of every ufunc, I wanted to do it that way (easier and more readable than indexing with brackets).
I had some difficulties with it (unexpected results in a small test with random numbers), so I updated to the latest version of NumPy (was 1.18.x, now 1.19.1).
I also updated Spyder to 4.1.4.
So I did the following steps in the console for checking.
I also added comments for some steps.
In [1]: import numpy as np
In [2]: a = np.round(np.random.rand(10) * 3, 0)
In [3]: a
Out[3]: array([2., 1., 2., 1., 1., 3., 2., 0., 2., 1.])
In [4]: tf = np.where(a==2, True, False)
In [5]: b = np.power(a,3, where=np.where(a==2, True, False))
In [6]: b
Out[6]:
array([8.        , 1.42180731, 8.        , 1.31799691, 1.01436297,
       2.82985094, 8.        , 0.35036821, 8.        , 0.73520376])
In [7]: np.power(a,3, where=tf)
Out[7]: array([8., 1., 8., 1., 1., 3., 8., 0., 8., 1.])
In[8]: np.power(a,3, where=np.where(a==2, True, False))
Out[8]: array([8., 1., 8., 1., 1., 3., 8., 0., 8., 1.])
So when just calculating in the console it looks fine, but when the result is assigned to a variable, the values outside the mask get messed up.
In[9]: b=np.power(a,3, where=tf)
In[10]: b
Out[10]:
array([8.        , 1.42180731, 8.        , 1.31799691, 1.01436297,
       2.82985094, 8.        , 0.35036821, 8.        , 0.73520376])
In[11]: np.log10(a, where=np.where(a>2, True, False))
Out[11]:
array([8.        , 1.42180731, 8.        , 1.31799691, 1.01436297,
       0.47712125, 8.        , 0.35036821, 8.        , 0.73520376])
Because of the weird results I checked whether a was still intact:
In[12]: a  # check if a is still right
Out[12]: array([2., 1., 2., 1., 1., 3., 2., 0., 2., 1.])
Looks like it is. So I tried np.log10() next. Check what happens:
In[13]: np.log10(a, where=np.where(a>2, True, False))
Out[13]:
array([2.        , 1.        , 2.        , 1.        , 1.        ,
       0.47712125, 2.        , 0.        , 2.        , 1.        ])
In[14]: c = np.log10(a, where=np.where(a>2, True, False))
In[15]: c
Out[15]:
array([2.        , 1.        , 2.        , 1.        , 1.        ,
       0.47712125, 2.        , 0.        , 2.        , 1.        ])
Somehow with c (log10) everything works / looks fine. So I removed b (I use Spyder, so I removed it from the Variable Explorer).
Then I recreated b:
In[16]: b = np.power(a, 3, where=np.where(a==2, True, False))
In[17]: b
Out[17]: array([8., 1., 8., 1., 1., 3., 8., 0., 8., 1.])
In[18]: b1 = np.power(a, 3, where=tf)
In[19]: b1
Out[19]:
array([8.00000000e+000, 1.82804289e-322, 8.00000000e+000, 0.00000000e+000,
       0.00000000e+000, 6.52741159e-038, 8.00000000e+000, 7.63251534e+169,
       8.00000000e+000, 1.23967276e+224])
So I can't follow why this is the case. Did I do something wrong? (If yes, please explain.) Is this a bug in NumPy?
Edit: This occurs on multiple laptops, so I created a mini script that should reproduce it on your PC/laptop. One addition before the script: I use Anaconda on all my machines, in case that matters.
import numpy as np

a = np.round(np.random.rand(10) * 3, 0)
tf = np.where(a==2, True, False)
b = np.power(a, 3, where=np.where(a==2, True, False))
b1 = np.power(a, 3, where=tf)
c = np.log10(a, where=np.where(a>2, True, False))
bits = 16
linarr = np.arange(2 ** bits) - 2 ** (bits - 1)
logarr = np.copy(linarr)
logarr = 20 * np.log10(logarr, where=np.where(linarr > 0, True, False))
I would expect that at least for logarr I would get something along the lines of array([-32768, -32767, -32766, ..., 0, ..., 90.3085, 90.3087]), but I get array([1.76e-314, 1.72e-314, 2.12e-312, ..., 0, ..., 90.3085, 90.3087]).
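For reference, this matches what the NumPy docs describe for the where argument rather than a bug: when where= is given without out=, a new uninitialized array is allocated and only the positions where the condition is True are written; the rest contain whatever bytes happened to be in that memory. A minimal sketch of the safe pattern, assuming the masked entries should default to zero:

import numpy as np

bits = 16
linarr = np.arange(2 ** bits) - 2 ** (bits - 1)

# Pre-initialize the output; positions where the condition is False
# keep this value instead of leftover memory contents.
logarr = np.zeros(linarr.shape)
np.log10(linarr, out=logarr, where=linarr > 0)
logarr *= 20

Also note that the plain boolean array linarr > 0 already works as a mask; wrapping it in np.where(cond, True, False) is redundant.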
I am using the following code and getting an output NumPy ndarray of size (2,9) that I am then trying to reshape into size (3,3,2). My hope was that calling reshape with (3,3,2) as the new dimensions would take each row of the 2x9 array, shape it into a 3x3 array, and wrap those two 3x3 arrays in another array.
For instance, when I index the result I would like the following behavior:
input: print(result)
output: [[ 2.  2.  1.  0.  8.  5.  2.  4.  5.]
         [ 4.  7.  5.  6.  4.  3. -3.  2.  1.]]
result = result.reshape((3,3,2))
DESIRED NEW BEHAVIOR
input: print(result[:,:,0])
output: [[2. 2. 1.]
         [0. 8. 5.]
         [2. 4. 5.]]
input: print(result[:,:,1])
output: [[ 4.  7.  5.]
         [ 6.  4.  3.]
         [-3.  2.  1.]]
ACTUAL NEW BEHAVIOR
input: print(result[:,:,0])
output: [[2. 1. 8.]
         [2. 5. 7.]
         [6. 3. 2.]]
input: print(result[:,:,1])
output: [[ 2.  0.  5.]
         [ 4.  4.  5.]
         [ 4. -3.  1.]]
Is there a way to tell reshape that I would like to go row by row along the depth dimension? I'm very confused as to why NumPy makes the default choice it does for reshape.
Here is the code I am using to produce the result matrix; it may not be necessary for analyzing my issue, but I'm including it for completeness:
import numpy as np

# im2col implementation assuming width/height dimensions of filter and input_vol
# are the same (i.e. input_vol_width is equal to input_vol_height and the same
# for the filter spatial dimensions, although input_vol_width need not equal
# filter_vol_width)
def im2col(input, filters, input_vol_dims, filter_size_dims, stride):
    receptive_field_size = 1
    for dim in filter_size_dims:
        receptive_field_size *= dim

    output_width = output_height = int((input_vol_dims[0] - filter_size_dims[0]) / stride + 1)

    X_col = np.zeros((receptive_field_size, output_width * output_height))
    W_row = np.zeros((len(filters), receptive_field_size))

    pos = 0
    for i in range(0, input_vol_dims[0] - 1, stride):
        for j in range(0, input_vol_dims[1] - 1, stride):
            X_col[:, pos] = input[i:i+stride+1, j:j+stride+1, :].ravel()
            pos += 1

    for i in range(len(filters)):
        W_row[i, :] = filters[i].ravel()

    bias = np.array([[1], [0]])
    result = np.dot(W_row, X_col) + bias
    print(result)
if __name__ == '__main__':
    x = np.zeros((7, 7, 3))
    x[:,:,0] = np.array([[0,0,0,0,0,0,0],
                         [0,1,1,0,0,1,0],
                         [0,2,2,1,1,1,0],
                         [0,2,0,2,1,0,0],
                         [0,2,0,0,1,0,0],
                         [0,0,0,1,1,0,0],
                         [0,0,0,0,0,0,0]])
    x[:,:,1] = np.array([[0,0,0,0,0,0,0],
                         [0,2,0,1,0,2,0],
                         [0,0,1,2,1,0,0],
                         [0,2,0,0,2,0,0],
                         [0,2,1,0,0,0,0],
                         [0,1,2,2,2,0,0],
                         [0,0,0,0,0,0,0]])
    x[:,:,2] = np.array([[0,0,0,0,0,0,0],
                         [0,0,0,2,1,1,0],
                         [0,0,0,2,2,0,0],
                         [0,2,1,0,2,2,0],
                         [0,0,1,2,1,2,0],
                         [0,2,0,0,2,1,0],
                         [0,0,0,0,0,0,0]])

    w0 = np.zeros((3,3,3))
    w0[:,:,0] = np.array([[1,1,0],
                          [1,-1,1],
                          [-1,1,1]])
    w0[:,:,1] = np.array([[-1,-1,0],
                          [1,-1,1],
                          [1,-1,-1]])
    w0[:,:,2] = np.array([[0,0,0],
                          [0,0,1],
                          [1,0,1]])

    w1 = np.zeros((3,3,3))
    w1[:,:,0] = np.array([[0,-1,1],
                          [1,1,0],
                          [1,1,0]])
    w1[:,:,1] = np.array([[-1,-1,1],
                          [1,0,1],
                          [0,1,1]])
    w1[:,:,2] = np.array([[-1,-1,0],
                          [1,-1,0],
                          [1,1,0]])

    filters = np.array([w0, w1])
    im2col(x, filters, x.shape, w0.shape, 2)
Let's reshape into the two 3x3 blocks first, then do a depth-wise dstack:
arr = np.dstack(result.reshape((-1,3,3)))
arr[..., 0]
array([[2., 2., 1.],
       [0., 8., 5.],
       [2., 4., 5.]])
Reshape keeps the original order of the elements:
In [215]: x=np.array(x)
In [216]: x.shape
Out[216]: (2, 9)
Reshaping the size 9 dimension into a 3x3 keeps the element order that you want:
In [217]: x.reshape(2,3,3)
Out[217]:
array([[[ 2.,  2.,  1.],
        [ 0.,  8.,  5.],
        [ 2.,  4.,  5.]],

       [[ 4.,  7.,  5.],
        [ 6.,  4.,  3.],
        [-3.,  2.,  1.]]])
But you have to index it with [0,:,:] to see one of those blocks.
To see the same blocks with [:,:,0], you have to move that size-2 dimension to the end. COLDSPEED's dstack does that by iterating on the first dimension and joining the 2 blocks (each 3x3) on a new third dimension. Another way is to use transpose to reorder the dimensions:
In [218]: x.reshape(2,3,3).transpose(1,2,0)
Out[218]:
array([[[ 2.,  4.],
        [ 2.,  7.],
        [ 1.,  5.]],

       [[ 0.,  6.],
        [ 8.,  4.],
        [ 5.,  3.]],

       [[ 2., -3.],
        [ 4.,  2.],
        [ 5.,  1.]]])
In [219]: y = _
In [220]: y.shape
Out[220]: (3, 3, 2)
In [221]: y[:,:,0]
Out[221]:
array([[2., 2., 1.],
       [0., 8., 5.],
       [2., 4., 5.]])
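Equivalently (an alternative not in the original answer), np.moveaxis expresses the same reordering without spelling out the whole permutation:

y = np.moveaxis(x.reshape(2, 3, 3), 0, -1)   # shape (3, 3, 2), same as transpose(1, 2, 0)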
I see Cholesky decomposition in numpy.linalg.cholesky, but could not find an LDU decomposition. Can anyone suggest a function to use?
Scipy has an LU decomposition function: scipy.linalg.lu. Note that this also introduces a permutation matrix P into the mix. This answer gives a nice explanation of why this happens.
If you specifically need LDU, then you can just normalize the U matrix to pull out D.
Here's how you might do it:
>>> import numpy as np
>>> import scipy.linalg as la
>>> a = np.array([[2, 4, 5],
...               [1, 3, 2],
...               [4, 2, 1]])
>>> (P, L, U) = la.lu(a)
>>> P
array([[ 0.,  1.,  0.],
       [ 0.,  0.,  1.],
       [ 1.,  0.,  0.]])
>>> L
array([[ 1.        ,  0.        ,  0.        ],
       [ 0.5       ,  1.        ,  0.        ],
       [ 0.25      ,  0.83333333,  1.        ]])
>>> U
array([[ 4. ,  2. ,  1. ],
       [ 0. ,  3. ,  4.5],
       [ 0. ,  0. , -2. ]])
>>> D = np.diag(np.diag(U))   # D is just the diagonal of U
>>> U /= np.diag(U)[:, None]  # normalize rows of U
>>> P.dot(L.dot(D.dot(U)))    # check: should reproduce a
array([[ 2.,  4.,  5.],
       [ 1.,  3.,  2.],
       [ 4.,  2.,  1.]])
If what you actually want is the additive splitting A = L + D + U (strictly lower + diagonal + strictly upper), rather than the multiplicative factorization, try this:
import numpy as np
A = np.array([[1,2,3,4],[1,2,3,4],[1,2,3,4],[1,2,3,4]])
U = np.triu(A,1)
L = np.tril(A,-1)
D = np.tril(np.triu(A))
print(A)
print(L)
print(D)
print(U)
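A quick sanity check (not part of the original answer) that these three pieces really add back up to A:

np.array_equal(L + D + U, A)   # True: strictly-lower + diagonal + strictly-upper == A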
Since collections.Counter is so slow, I am pursuing a faster method of summing mapped values in Python 2.7. It seems like a simple concept and I'm kind of disappointed in the built-in Counter method.
Basically, I need to be able to take arrays like this:
array([[ 0.,  2.],
       [ 2.,  2.],
       [ 3.,  1.]])
array([[ 0.,  3.],
       [ 1.,  1.],
       [ 2.,  5.]])
And then "add" them so they look like this:
array([[ 0.,  5.],
       [ 1.,  1.],
       [ 2.,  7.],
       [ 3.,  1.]])
If there isn't a good way to do this quickly and efficiently, I'm open to any other ideas that will allow me to do something similar to this, and I'm open to modules other than Numpy.
Thanks!
Edit: Ready for some speedtests?
Intel win 64bit machine. All of the following values are in seconds; 20000 loops.
collections.Counter results:
2.131000, 2.125000, 2.125000
Divakar's union1d + masking results:
1.641000, 1.633000, 1.625000
Divakar's union1d + indexing results:
0.625000, 0.625000, 0.641000
Histogram results:
1.844000, 1.938000, 1.858000
Pandas results:
16.659000, 16.686000, 16.885000
Conclusions: union1d + indexing wins; the array size is too small for Pandas to be effective; and the histogram approach blew my mind with its simplicity, but I'm guessing it incurs too much overhead in creating the bins. All of the responses I received were very good, though. This is what I used to get the numbers. Thanks again!
Edit: And it should be mentioned that using Counter1.update(Counter2.elements()) is terrible despite doing exactly the same thing (65.671000 sec).
Later Edit: I've been thinking about this a lot, and I've come to realize that, with NumPy, it might be more effective to fill each array with zeros so that the first column isn't even needed, since we can just use the index; that would also make it much easier to add multiple arrays together, as well as to apply other functions. Additionally, Pandas makes more sense than NumPy here since there would be no need to zero-fill, and it would definitely be more effective with large data sets (however, NumPy has the advantage of being compatible with more platforms, like GAE, if that matters at all). Lastly, the answer I accepted was definitely the best answer for the exact question I asked--adding the two arrays in the way I showed--but I think what I needed was a change in perspective.
Here's one approach with np.union1d and masking -
def app1(a, b):
    c0 = np.union1d(a[:,0], b[:,0])
    out = np.zeros((len(c0), 2))
    out[:,0] = c0
    mask1 = np.in1d(c0, a[:,0])
    out[mask1,1] = a[:,1]
    mask2 = np.in1d(c0, b[:,0])
    out[mask2,1] += b[:,1]
    return out
Sample run -
In [174]: a
Out[174]:
array([[  0.,   2.],
       [ 12.,   2.],
       [ 23.,   1.]])
In [175]: b
Out[175]:
array([[  0.,   3.],
       [  1.,   1.],
       [ 12.,   5.]])
In [176]: app1(a,b)
Out[176]:
array([[  0.,   5.],
       [  1.,   1.],
       [ 12.,   7.],
       [ 23.,   1.]])
Here's another with np.union1d and indexing -
def app2(a, b):
    n = np.maximum(a[:,0].max(), b[:,0].max()) + 1
    c0 = np.union1d(a[:,0], b[:,0])
    out0 = np.zeros((int(n), 2))
    out0[a[:,0].astype(int),1] = a[:,1]
    out0[b[:,0].astype(int),1] += b[:,1]
    out = out0[c0.astype(int)]
    out[:,0] = c0
    return out
For the case where all indices are covered by the first column values in a and b -
def app2_specific(a, b):
    c0 = np.union1d(a[:,0], b[:,0])
    n = c0[-1] + 1
    out0 = np.zeros((int(n), 2))
    out0[a[:,0].astype(int),1] = a[:,1]
    out0[b[:,0].astype(int),1] += b[:,1]
    out0[:,0] = c0
    return out0
Sample run -
In [234]: a
Out[234]:
array([[ 0.,  2.],
       [ 2.,  2.],
       [ 3.,  1.]])
In [235]: b
Out[235]:
array([[ 0.,  3.],
       [ 1.,  1.],
       [ 2.,  5.]])
In [236]: app2_specific(a,b)
Out[236]:
array([[ 0.,  5.],
       [ 1.,  1.],
       [ 2.,  7.],
       [ 3.,  1.]])
If you know the number of fields, use np.bincount.
c = np.vstack([a, b])
counts = np.bincount(c[:, 0].astype(int), weights=c[:, 1], minlength=numFields)  # bincount needs int bins
out = np.vstack([np.arange(numFields), counts]).T
This works if you're getting all your data at once. Make a list of your arrays and vstack them. If you're getting data chunks sequentially, you can use np.add.at to do the same thing.
out = np.zeros((numFields, 2))
out[:, 0] = np.arange(numFields)
np.add.at(out[:, 1], a[:, 0].astype(int), a[:, 1])  # indices must be ints
np.add.at(out[:, 1], b[:, 0].astype(int), b[:, 1])
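A quick check with the sample arrays from the question (numFields = 4 here, an assumption based on the labels 0 through 3 in the sample data):

a = np.array([[0., 2.], [2., 2.], [3., 1.]])
b = np.array([[0., 3.], [1., 1.], [2., 5.]])
numFields = 4

c = np.vstack([a, b])
counts = np.bincount(c[:, 0].astype(int), weights=c[:, 1], minlength=numFields)
out = np.vstack([np.arange(numFields), counts]).T
# out -> [[0., 5.], [1., 1.], [2., 7.], [3., 1.]]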
You can use a basic histogram; this will deal with gaps, too. You can filter out zero-count entries if need be.
import numpy as np
x = np.array([[ 0., 2.],
              [ 2., 2.],
              [ 3., 1.]])
y = np.array([[ 0., 3.],
              [ 1., 1.],
              [ 2., 5.],
              [ 5., 3.]])
c, w = np.vstack((x, y)).T
h, b = np.histogram(c, weights=w,
                    bins=np.arange(c.min(), c.max() + 2))  # +2 so the last (half-open) bin still catches c.max()
r = np.vstack((b[:-1], h)).T
print(r)
# [[ 0.  5.]
#  [ 1.  1.]
#  [ 2.  7.]
#  [ 3.  1.]
#  [ 4.  0.]
#  [ 5.  3.]]
r_nonzero = r[r[:,1]!=0]
Pandas has some functions that do exactly what you intend:
import pandas as pd
pda = pd.DataFrame(a).set_index(0)
pdb = pd.DataFrame(b).set_index(0)
result = pd.concat([pda, pdb], axis=1).fillna(0).sum(axis=1)
Edit: If you actually need the data back in numpy format, just do
array_res = result.reset_index(name=1).values
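An alternative pandas route (my sketch, not part of the original answer) is concat followed by groupby, which keeps everything in one chain:

import numpy as np
import pandas as pd

a = np.array([[0., 2.], [2., 2.], [3., 1.]])
b = np.array([[0., 3.], [1., 1.], [2., 5.]])

df = pd.DataFrame(np.vstack([a, b]), columns=['key', 'val'])
array_res = df.groupby('key', as_index=False).sum().values
# array([[0., 5.], [1., 1.], [2., 7.], [3., 1.]])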
This is a quintessential grouping problem, which numpy_indexed (disclaimer: I am its author) was created to solve elegantly and efficiently:
import numpy_indexed as npi
C = np.concatenate([A, B], axis=0)
labels, sums = npi.group_by(C[:, 0]).sum(C[:, 1])
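To recover the two-column layout from the question (a small usage sketch, assuming A and B are the two sample arrays):

out = np.column_stack([labels, sums])
# array([[0., 5.], [1., 1.], [2., 7.], [3., 1.]])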
Note: it's cleaner to maintain your label arrays as a separate int array; floats are finicky when it comes to labeling things, with positive and negative zeros, and printed values not conveying their full binary state. Better to use ints for that.