I have a large sparse matrix globalGrid (lil_matrix) and a smaller matrix localGrid (coo_matrix). The localGrid represents a subset of the globalGrid and I want to update the globalGrid with the localGrid. For this I use the following code (in Python Scipy):
globalGrid[xLocalGrid:xLocalGrid + localGrid.shape[0], yLocalGrid:yLocalGrid + localGrid.shape[1]] = localGrid
where xLocalGrid and yLocalGrid are the offsets of the localGrid origin with respect to the globalGrid.
The problem is that the localGrid is sparse, but also the zero elements are assigned to the globalGrid. Is there a way I can only assign the stored elements and not the 0-elements?
I have found out about masked arrays in numpy; however, that does not seem to apply to sparse scipy matrices.
edit: In response to the comments below, here is an example to illustrate what I mean:
First setup the matrices:
M=sparse.lil_matrix(2*np.ones([5,5]))
m = sparse.eye(3)
M.todense()
matrix([[ 2., 2., 2., 2., 2.],
[ 2., 2., 2., 2., 2.],
[ 2., 2., 2., 2., 2.],
[ 2., 2., 2., 2., 2.],
[ 2., 2., 2., 2., 2.]])
m.todense()
matrix([[ 1., 0., 0.],
[ 0., 1., 0.],
[ 0., 0., 1.]])
Then assign:
M[1:4, 1:4] = m
Now the result is:
M.todense()
matrix([[ 2., 2., 2., 2., 2.],
[ 2., 1., 0., 0., 2.],
[ 2., 0., 1., 0., 2.],
[ 2., 0., 0., 1., 2.],
[ 2., 2., 2., 2., 2.]])
Whereas I need the result to be:
matrix([[ 2., 2., 2., 2., 2.],
[ 2., 1., 2., 2., 2.],
[ 2., 2., 1., 2., 2.],
[ 2., 2., 2., 1., 2.],
[ 2., 2., 2., 2., 2.]])
Should this line
The problem is that the localGrid is sparse, but also the non-zero elements are assigned to the globalGrid. Is there a way I can only assign the stored elements and not the 0-elements?
be changed to?
The problem is that the localGrid is sparse, but also the zero elements are assigned to the globalGrid. Is there a way I can only assign the stored elements and not the 0-elements?
Your question isn't quite clear, but I'm guessing that, because the globalGrid[a:b, c:d] indexing spans values that should be 0 in both arrays, you are worried that 0's are being copied.
Let's try this with real matrices.
In [13]: M=sparse.lil_matrix((10,10))
In [14]: m=sparse.eye(3)
In [15]: M[4:7,5:8]=m
In [16]: m
Out[16]:
<3x3 sparse matrix of type '<class 'numpy.float64'>'
with 3 stored elements (1 diagonals) in DIAgonal format>
In [17]: M
Out[17]:
<10x10 sparse matrix of type '<class 'numpy.float64'>'
with 3 stored elements in LInked List format>
In [18]: M.data
Out[18]: array([[], [], [], [], [1.0], [1.0], [1.0], [], [], []], dtype=object)
In [19]: M.rows
Out[19]: array([[], [], [], [], [5], [6], [7], [], [], []], dtype=object)
M does not have any unnecessary 0's.
If there are unnecessary 0's in a sparse matrix, a round trip to csr format should take care of them
M.tocsr().tolil()
csr format also has an inplace .eliminate_zeros() method.
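To make the idea of an unnecessary stored zero concrete, here is a minimal sketch. The matrix and its values are made up for illustration; it builds a csr matrix directly from (data, (row, col)) triples so that one stored value is an explicit 0, then removes it in place.

```python
import numpy as np
from scipy import sparse

# Illustrative data (an assumption, not from the question):
# the middle entry is an explicitly stored zero.
data = np.array([1.0, 0.0, 3.0])
row = np.array([0, 1, 2])
col = np.array([0, 1, 2])
A = sparse.csr_matrix((data, (row, col)), shape=(3, 3))

stored_before = A.nnz   # counts stored entries, including the explicit zero
A.eliminate_zeros()     # in place: drops stored entries whose value is 0
stored_after = A.nnz

print(stored_before, stored_after)
```

Note that nnz counts stored entries rather than entries that are actually non-zero, which is why the two numbers differ here.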
So your concern is with overwriting the nonzeros of the target array.
With dense arrays, the use of nonzero (or where) takes care of this:
In [87]: X=np.ones((10,10),int)*2
In [88]: y=np.eye(3)
In [89]: I,J=np.nonzero(y)
In [90]: X[I+3,J+2]=y[I,J]
In [91]: X
Out[91]:
array([[2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
[2, 2, 1, 2, 2, 2, 2, 2, 2, 2],
[2, 2, 2, 1, 2, 2, 2, 2, 2, 2],
[2, 2, 2, 2, 1, 2, 2, 2, 2, 2],
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2]])
Trying the sparse equivalent:
In [92]: M=sparse.lil_matrix(X)
In [93]: M
Out[93]:
<10x10 sparse matrix of type '<class 'numpy.int32'>'
with 100 stored elements in LInked List format>
In [94]: m=sparse.coo_matrix(y)
In [95]: m
Out[95]:
<3x3 sparse matrix of type '<class 'numpy.float64'>'
with 3 stored elements in COOrdinate format>
In [96]: I,J=np.nonzero(m)
In [97]: I
Out[97]: array([0, 1, 2], dtype=int32)
In [98]: J
Out[98]: array([0, 1, 2], dtype=int32)
In [99]: M[I+3,J+2]=m[I,J]
...
TypeError: 'coo_matrix' object is not subscriptable
I could have used the sparse matrix's own nonzero method.
In [106]: I,J=m.nonzero()
For coo format, this is the same as
In [109]: I,J=m.row, m.col
In which case I can also use the data attribute:
In [100]: M[I+3,J+2]=m.data
In [101]: M.A
Out[101]:
array([[2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
[2, 2, 1, 2, 2, 2, 2, 2, 2, 2],
[2, 2, 2, 1, 2, 2, 2, 2, 2, 2],
[2, 2, 2, 2, 1, 2, 2, 2, 2, 2],
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2]], dtype=int32)
The code for m.nonzero may be instructive
A = self.tocoo()
nz_mask = A.data != 0
return (A.row[nz_mask],A.col[nz_mask])
So you need to be careful to make sure that index and data attributes of the sparse matrix match.
And also pay attention as to which sparse formats allow indexing. lil is good for changing values. csr allows element by element indexing, but raises an efficiency warning if you try to change zero values to nonzero (or v.v.). coo has this nice pairing of indices and data, but doesn't allow indexing.
Another subtle point: in constructing a coo you may repeat coordinates. When converted to csr format those values are summed. But the assignment that I'm suggesting will only use the last value, not the sum. So make sure you understand how your local matrix was constructed, and know whether it is a 'clean' representation of the data.
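Putting the pieces above together, here is a minimal sketch of a helper that pastes only the stored non-zero entries of a local matrix into a lil global matrix. The function name paste_stored and its offset parameter names are hypothetical. Summing repeated coordinates first is an assumption about what a 'clean' local matrix should mean in your case; drop that line if you want last-value-wins behaviour instead.

```python
import numpy as np
from scipy import sparse

def paste_stored(global_grid, local_grid, row_offset, col_offset):
    """Copy only the stored non-zero entries of local_grid into global_grid.

    global_grid is expected to be a lil_matrix (cheap to modify);
    local_grid may be any scipy sparse matrix.
    """
    local = local_grid.tocoo(copy=True)  # work on a copy, leave the input untouched
    local.sum_duplicates()               # assumption: repeated coordinates should be summed
    keep = local.data != 0               # skip explicitly stored zeros
    rows = local.row[keep] + row_offset
    cols = local.col[keep] + col_offset
    global_grid[rows, cols] = local.data[keep]
    return global_grid

# Usage with the 5x5 example from the question
M = sparse.lil_matrix(2 * np.ones((5, 5)))
m = sparse.eye(3)
paste_stored(M, m, 1, 1)
print(M.toarray())
```

Because only the (row, col, data) triples are touched, the 2's surrounding the diagonal of the pasted block are left alone.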
I have found a working solution. Instead of using an assignment I loop over the data in the sparse matrix (using M.data and M.rows) and replace the elements one by one.
for idx, row in enumerate(m.rows):
    for idy, col in enumerate(row):
        M[yOffset+col, xOffset+idx] = m.data[idx][idy]
I am still curious though if there is no simpler/faster method to achieve this result.
My implementation of the above answer:
I,J = m.nonzero()
M[I+yOffset,J+xOffset] = m[I,J]
return M
This was however marginally slower.
I am trying to fill nan values in an array with values from another array. Since the arrays I am working on are 1-D, np.where is not working. However, following the tip in the documentation, I tried the following:
import numpy as np
sample = [1, 2, np.nan, 4, 5, 6, np.nan]
replace = [3, 7]
new_sample = [new_value if condition else old_value for (new_value, condition, old_value) in zip(replace, np.isnan(sample), sample)]
However, instead of the output I expected, [1, 2, 3, 4, 5, 6, 7], I get:
[Out]: [1, 2]
What am I doing wrong?
np.where works
In [561]: sample = np.array([1, 2, np.nan, 4, 5, 6, np.nan])
Use isnan to identify the nan values (don't use ==)
In [562]: np.isnan(sample)
Out[562]: array([False, False, True, False, False, False, True])
In [564]: np.where(np.isnan(sample))
Out[564]: (array([2, 6], dtype=int32),)
Either one, the boolean mask or the where tuple, can index the nan values:
In [565]: sample[Out[564]]
Out[565]: array([nan, nan])
In [566]: sample[Out[562]]
Out[566]: array([nan, nan])
and be used to replace:
In [567]: sample[Out[562]]=[1,2]
In [568]: sample
Out[568]: array([1., 2., 1., 4., 5., 6., 2.])
The three-parameter form of np.where also works, but it returns a copy.
In [571]: np.where(np.isnan(sample),999,sample)
Out[571]: array([ 1., 2., 999., 4., 5., 6., 999.])
You can use numpy.argwhere. But @hpaulj shows that numpy.where works just as well.
import numpy as np
sample = np.array([1, 2, np.nan, 4, 5, 6, np.nan])
replace = np.array([3, 7])
sample[np.argwhere(np.isnan(sample)).ravel()] = replace
# array([ 1., 2., 3., 4., 5., 6., 7.])
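As for why the list comprehension in the question returned only [1, 2]: zip stops at its shortest input, and replace has just two elements, so only the first two positions of sample were ever visited. Below is a minimal sketch of the boolean-mask replacement wrapped in a function; the name fill_nans and the length check are additions for illustration, not something from the answers above.

```python
import numpy as np

def fill_nans(sample, replacements):
    """Return a float copy of sample with its NaNs replaced, in order."""
    out = np.array(sample, dtype=float)   # copy, so the input is not modified
    mask = np.isnan(out)
    if mask.sum() != len(replacements):
        raise ValueError("need exactly one replacement per NaN")
    out[mask] = replacements              # boolean-mask assignment fills in order
    return out

filled = fill_nans([1, 2, np.nan, 4, 5, 6, np.nan], [3, 7])
print(filled)
```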
I have a numpy array like this
import numpy as np
ar = np.array([1, 2, 3, 4])
and I want to create an array that looks like this:
array([[4, 1, 2, 3],
[3, 4, 1, 2],
[2, 3, 4, 1],
[1, 2, 3, 4]])
Thereby, each row corresponds to ar which is shifted by the row index + 1.
A straightforward implementation could look like this:
ar_roll = np.tile(ar, ar.shape[0]).reshape(ar.shape[0], ar.shape[0])
for indi, ri in enumerate(ar_roll):
ar_roll[indi, :] = np.roll(ri, indi + 1)
which gives me the desired output.
My question is whether there is a smarter way of doing this which avoids the loop.
Here's one approach using NumPy strides. It pads the array with its leftover elements (ar[:-1]), and the strides then create the shifted rows efficiently as a view -
def strided_method(ar):
    a = np.concatenate(( ar, ar[:-1] ))
    L = len(ar)
    n = a.strides[0]
    return np.lib.stride_tricks.as_strided(a[L-1:], (L,L), (-n,n))
Sample runs -
In [42]: ar = np.array([1, 2, 3, 4])
In [43]: strided_method(ar)
Out[43]:
array([[4, 1, 2, 3],
[3, 4, 1, 2],
[2, 3, 4, 1],
[1, 2, 3, 4]])
In [44]: ar = np.array([4,9,3,6,1,2])
In [45]: strided_method(ar)
Out[45]:
array([[2, 4, 9, 3, 6, 1],
[1, 2, 4, 9, 3, 6],
[6, 1, 2, 4, 9, 3],
[3, 6, 1, 2, 4, 9],
[9, 3, 6, 1, 2, 4],
[4, 9, 3, 6, 1, 2]])
Runtime test -
In [5]: a = np.random.randint(0,9,(1000))
# @Eric's soln
In [6]: %timeit roll_matrix(a)
100 loops, best of 3: 3.39 ms per loop
# @Warren Weckesser's soln
In [8]: %timeit circulant(a[::-1])
100 loops, best of 3: 2.03 ms per loop
# Strides method
In [18]: %timeit strided_method(a)
100000 loops, best of 3: 6.7 µs per loop
Making a copy (if you want to make changes and not just use as a read only array) won't hurt us too badly for the strides method -
In [19]: %timeit strided_method(a).copy()
1000 loops, best of 3: 381 µs per loop
Both of the existing answers are fine; this answer is probably only of interest if you are already using scipy.
The matrix that you describe is known as a circulant matrix. If you don't mind the dependency on scipy, you can use scipy.linalg.circulant to create one:
In [136]: from scipy.linalg import circulant
In [137]: ar = np.array([1, 2, 3, 4])
In [138]: circulant(ar[::-1])
Out[138]:
array([[4, 1, 2, 3],
[3, 4, 1, 2],
[2, 3, 4, 1],
[1, 2, 3, 4]])
Here's one approach
def roll_matrix(vec):
    N = len(vec)
    buffer = np.empty((N, N*2 - 1))
    # generate a wider array that we want a slice into
    buffer[:,:N] = vec
    buffer[:,N:] = vec[:-1]
    rolled = buffer.reshape(-1)[N-1:-1].reshape(N, -1)
    return rolled[:,:N]
In your case, we build buffer to be
array([[ 1., 2., 3., 4., 1., 2., 3.],
[ 1., 2., 3., 4., 1., 2., 3.],
[ 1., 2., 3., 4., 1., 2., 3.],
[ 1., 2., 3., 4., 1., 2., 3.]])
Then flatten it, trim it, reshape it to get rolled:
array([[ 4., 1., 2., 3., 1., 2.],
[ 3., 4., 1., 2., 3., 1.],
[ 2., 3., 4., 1., 2., 3.],
[ 1., 2., 3., 4., 1., 2.]])
And finally, slice off the garbage last columns
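For comparison, another way to avoid the loop is plain integer indexing with modular arithmetic. This is not one of the approaches timed above, and the function name is made up; it is a sketch that builds a grid of source indices and lets fancy indexing do the rolling.

```python
import numpy as np

def roll_matrix_indexed(vec):
    """Row i of the result is vec rolled right by i + 1 positions."""
    vec = np.asarray(vec)
    n = len(vec)
    shifts = np.arange(1, n + 1)[:, None]   # shift applied to each row
    cols = np.arange(n)[None, :]            # output column positions
    # Element (i, j) comes from position (j - shift_i) mod n of vec.
    return vec[(cols - shifts) % n]

print(roll_matrix_indexed(np.array([1, 2, 3, 4])))
```

Unlike the strided version, this returns an independent array, so it is safe to modify without an extra copy.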
Let's say S is a large sparse SciPy csr matrix and D is a dictionary whose keys are the indices (positions) of row vectors A in S and whose values are lists l of the indices (positions) of other row vectors in S. For each row vector in l, you subtract it from A; the result is a new row vector to be placed in a new sparse matrix.
dictionary of form -> { 1 : [4 , 5 ,... ,63] }
then have to create a new sparse matrix with....
new_row_vector_1 -> S_vec1 - S_vec4
new_row_vector_2 -> S_vec1 - S_vec5
.
new_row_vector_n -> S_vec1 - S_vec63
where S_vecX is the Xth row vector of matrix S
Check out the pictorial explanation of the above statements
Numpy Example:
>>> import numpy as np
>>> s = np.array([[1,5,3,4],[3,0,12,7],[5,6,2,4],[4,6,6,4],[7,12,5,67]])
>>> s
array([[ 1, 5, 3, 4],
[ 3, 0, 12, 7],
[ 5, 6, 2, 4],
[ 4, 6, 6, 4],
[ 7, 12, 5, 67]])
>>> index_dictionary = {0: [2, 4], 1: [3, 4], 2: [1], 3: [1, 2], 4: [1, 3, 2]}
>>> n = np.zeros((10,4)) #sum of all lengths of values in index_dictionary would be the number of rows for the new array(n) and columns remain the same as s.
>>> n
array([[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.]])
>>> idx = 0
>>> for index in index_dictionary:
...     for k in index_dictionary[index]:
...         n[idx] = s[index]-s[k]
...         idx += 1
...
>>> n
array([[ -4., -1., 1., 0.],
[ -6., -7., -2., -63.],
[ -1., -6., 6., 3.],
[ -4., -12., 7., -60.],
[ 2., 6., -10., -3.],
[ 1., 6., -6., -3.],
[ -1., 0., 4., 0.],
[ 4., 12., -7., 60.],
[ 3., 6., -1., 63.],
[ 2., 6., 3., 63.]])
n is what I want.
Here's a simple demonstration of what I think you are trying to do:
First the numpy array version:
In [619]: arr=np.arange(12).reshape(4,3)
In [620]: arr[[1,0,2,3]]-arr[0]
Out[620]:
array([[3, 3, 3],
[0, 0, 0],
[6, 6, 6],
[9, 9, 9]])
Now the sparse equivalent:
In [622]: M=sparse.csr_matrix(arr)
csr implements row indexing:
In [624]: M[[1,0,2,3]]
Out[624]:
<4x3 sparse matrix of type '<class 'numpy.int32'>'
with 11 stored elements in Compressed Sparse Row format>
In [625]: M[[1,0,2,3]].A
Out[625]:
array([[ 3, 4, 5],
[ 0, 1, 2],
[ 6, 7, 8],
[ 9, 10, 11]], dtype=int32)
But not broadcasting:
In [626]: M[[1,0,2,3]]-M[0]
....
ValueError: inconsistent shapes
So we can use an explicit form of broadcasting
In [627]: M[[1,0,2,3]]-M[[0,0,0,0]] # or M[[0]*4]
Out[627]:
<4x3 sparse matrix of type '<class 'numpy.int32'>'
with 9 stored elements in Compressed Sparse Row format>
In [628]: _.A
Out[628]:
array([[3, 3, 3],
[0, 0, 0],
[6, 6, 6],
[9, 9, 9]], dtype=int32)
This may not be the fastest or most efficient, but it's a start.
I found in a previous SO question that M[[1,0,2,3]] indexing is performed with a matrix multiplication, in this case the equivalent of:
idxM = sparse.csr_matrix(([1,1,1,1],([0,1,2,3],[1,0,2,3])),(4,4))
M1 = idxM * M
(The SO question is titled "Sparse matrix slicing using list of int".)
So my difference expression requires two such multiplications along with the subtraction.
We could try a row by row iteration, and building a new matrix from the result, but there's no guarantee that it will be faster. Depending on the arrays, converting to dense and back might even be faster.
=================
I can imagine 2 ways of applying this to the dictionary.
One is to iterate through the dictionary (what order?), perform this difference for each key, collect the results in a list (list of sparse matrices), and use sparse.bmat to join them into one matrix.
Another is to collect two lists of indexes, and apply the above indexed difference just once.
In [8]: index_dictionary = {0: [2, 4], 1: [3, 4], 2: [1], 3: [1, 2], 4: [1, 3, 2]}
In [10]: alist=[]
    ...: for index in index_dictionary:
    ...:     for k in index_dictionary[index]:
    ...:         alist.append((index, k))
In [11]: idx = np.array(alist)
In [12]: idx
Out[12]:
array([[0, 2],
[0, 4],
[1, 3],
[1, 4],
[2, 1],
[3, 1],
[3, 2],
[4, 1],
[4, 3],
[4, 2]])
applied to your dense s:
In [15]: s = np.array([[1,5,3,4],[3,0,12,7],[5,6,2,4],[4,6,6,4],[7,12,5,67]])
In [16]: s[idx[:,0]]-s[idx[:,1]]
Out[16]:
array([[ -4, -1, 1, 0],
[ -6, -7, -2, -63],
[ -1, -6, 6, 3],
[ -4, -12, 7, -60],
[ 2, 6, -10, -3],
[ 1, 6, -6, -3],
[ -1, 0, 4, 0],
[ 4, 12, -7, 60],
[ 3, 6, -1, 63],
[ 2, 6, 3, 63]])
and to the sparse equivalent
In [19]: arr= sparse.csr_matrix(s)
In [20]: arr
Out[20]:
<5x4 sparse matrix of type '<class 'numpy.int32'>'
with 19 stored elements in Compressed Sparse Row format>
In [21]: res=arr[idx[:,0]]-arr[idx[:,1]]
In [22]: res
Out[22]:
<10x4 sparse matrix of type '<class 'numpy.int32'>'
with 37 stored elements in Compressed Sparse Row format>
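The steps above can be collected into one function. This is a minimal sketch, assuming the dictionary maps a row index to a list of row indices as in the question; the function name is hypothetical, and the row order follows the dictionary's insertion order (guaranteed in Python 3.7+).

```python
import numpy as np
from scipy import sparse

def pairwise_row_differences(S, index_dictionary):
    """For every (key, k) pair in the dictionary, return row S[key] - S[k]."""
    pairs = np.array([(key, k)
                      for key, others in index_dictionary.items()
                      for k in others])
    # csr row indexing is applied twice, then a single sparse subtraction
    return S[pairs[:, 0]] - S[pairs[:, 1]]

s = np.array([[1, 5, 3, 4], [3, 0, 12, 7], [5, 6, 2, 4],
              [4, 6, 6, 4], [7, 12, 5, 67]])
index_dictionary = {0: [2, 4], 1: [3, 4], 2: [1], 3: [1, 2], 4: [1, 3, 2]}
res = pairwise_row_differences(sparse.csr_matrix(s), index_dictionary)
print(res.toarray())
```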
Is there a way to specify the dtype for numpy.gradient?
I'm using an array of subarrays and it's throwing the following error:
ValueError: setting an array element with a sequence.
Here is an example:
import numpy as np
a = np.empty([3, 3], dtype=object)
it = np.nditer(a, flags=['multi_index', 'refs_ok'])
while not it.finished:
    i = it.multi_index[0]
    j = it.multi_index[1]
    a[it.multi_index] = np.array([i, j])
    it.iternext()
print(a)
which outputs
[[array([0, 0]) array([0, 1]) array([0, 2])]
[array([1, 0]) array([1, 1]) array([1, 2])]
[array([2, 0]) array([2, 1]) array([2, 2])]]
I would like print(np.gradient(a)) to return
array(
[[array([[1, 0],[0, 1]]), array([[1, 0], [0, 1]]), array([[1, 0], [0, 1]])],
[array([[1, 0], [0, 1]]), array([[1, 0], [0, 1]]), array([[1, 0],[0, 1]])],
[array([[1, 0], [0, 1]]), array([[1, 0], [0, 1]]), array([[1, 0],[0, 1]])]],
dtype=object)
Notice that, in this case, the gradient of the vector field is an identity tensor field.
Why are you working with an array of dtype object? That's more work than using a 2d array.
e.g.
In [53]: a1=np.array([[1,2],[3,4],[5,6]])
In [54]: a1
Out[54]:
array([[1, 2],
[3, 4],
[5, 6]])
In [55]: np.gradient(a1)
Out[55]:
[array([[ 2., 2.],
[ 2., 2.],
[ 2., 2.]]),
array([[ 1., 1.],
[ 1., 1.],
[ 1., 1.]])]
or working column by column, or row by row
In [61]: [np.gradient(i) for i in a1.T]
Out[61]: [array([ 2., 2., 2.]), array([ 2., 2., 2.])]
In [62]: [np.gradient(i) for i in a1]
Out[62]: [array([ 1., 1.]), array([ 1., 1.]), array([ 1., 1.])]
dtype=object only makes sense if the subarrays/lists differ in type and/or shape. And even then it doesn't add much to a regular Python list.
==============================
I can take your 2d a, and make a 3d array with:
In [126]: a1=np.zeros((3,3,2),int)
In [127]: a1.flat[:]=[i for i in a.flatten()]
In [128]: a1
Out[128]:
array([[[0, 0],
[0, 1],
[0, 2]],
[[1, 0],
[1, 1],
[1, 2]],
[[2, 0],
[2, 1],
[2, 2]]])
Or I could produce the same thing with meshgrid:
In [129]: X,Y=np.meshgrid(np.arange(3),np.arange(3),indexing='ij')
In [130]: a2=np.array([Y,X]).T
When I apply np.gradient to that I get 3 arrays, each (3,3,2) in shape.
In [136]: ga1=np.gradient(a1)
In [137]: len(ga1)
Out[137]: 3
In [138]: ga1[0].shape
Out[138]: (3, 3, 2)
It looks like the 1st 2 arrays have the values you want, so it's just a matter of rearranging them.
In [141]: np.array(ga1[:2]).shape
Out[141]: (2, 3, 3, 2)
In [143]: gga1=np.array(ga1[:2]).transpose([1,2,0,3])
In [144]: gga1.shape
Out[144]: (3, 3, 2, 2)
In [145]: gga1[0,0]
Out[145]:
array([[ 1., -0.],
[-0., 1.]])
If they must go back into a (3,3) object array, I could do:
In [146]: goa1=np.empty([3,3],dtype=object)
In [147]: for i in range(3):
   .....:     for j in range(3):
   .....:         goa1[i,j]=gga1[i,j]
   .....:
In [148]: goa1
Out[148]:
array([[array([[ 1., -0.],
[-0., 1.]]),
array([[ 1., -0.],
[ 0., 1.]]),
array([[ 1., -0.],
...
[ 0., 1.]]),
array([[ 1., 0.],
[ 0., 1.]])]], dtype=object)
I still wonder what the point of working with an object array is.
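The same rearrangement can be written more compactly by asking np.gradient for the two grid axes only. This is a sketch under the assumption that the vector field is stored as a regular (3, 3, 2) array rather than an object array; the variable names are illustrative.

```python
import numpy as np

# vector_field[i, j] == [i, j], stored as a regular (3, 3, 2) array
X, Y = np.meshgrid(np.arange(3), np.arange(3), indexing='ij')
vector_field = np.stack([X, Y], axis=-1)

# Differentiate along the two grid axes only, not along the component axis.
grads = np.gradient(vector_field, axis=(0, 1))   # two arrays, each (3, 3, 2)

# Stack to (2, 3, 3, 2), then move the derivative axis next to the component axis.
jacobians = np.stack(grads, axis=0).transpose(1, 2, 0, 3)

print(jacobians.shape)
print(jacobians[0, 0])
```

Passing axis=(0, 1) skips the gradient along the last (component) axis, so there is no third array to discard.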
I'm making a program where I need to make a matrix looking like this:
A = np.array([[ 1., 2., 3.],
[ 1., 2., 3.],
[ 1., 2., 3.],
[ 1., 2., 3.]])
So I started thinking about this np.arange(1,4)
But, how to append n columns of np.arange(1,4) to A?
As mentioned in docs you can use concatenate
>>> a = np.array([[1, 2], [3, 4]])
>>> b = np.array([[5, 6]])
>>> np.concatenate((a, b), axis=0)
array([[1, 2],
[3, 4],
[5, 6]])
>>> np.concatenate((a, b.T), axis=1)
array([[1, 2, 5],
[3, 4, 6]])
Here's another way, using broadcasting:
In [69]: np.arange(1,4)*np.ones((4,1))
Out[69]:
array([[ 1., 2., 3.],
[ 1., 2., 3.],
[ 1., 2., 3.],
[ 1., 2., 3.]])
You can get something like what you typed in your question with:
N = 3
A = np.tile(np.arange(1, N+1), (N, 1))
I'm assuming you want a square array?
>>> np.repeat([np.arange(1, 4)], 4, 0)
array([[1, 2, 3],
[1, 2, 3],
[1, 2, 3],
[1, 2, 3]])
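If a read-only result is enough, np.broadcast_to gives the same matrix without copying the row. This is a sketch with illustrative variable names; call .copy() on the result if you need to modify it.

```python
import numpy as np

row = np.arange(1, 4)

# Read-only view: every row of A_view refers to the same three values.
A_view = np.broadcast_to(row, (4, 3))

# Independent, writable array with the same contents.
A = A_view.copy()
A[0, 0] = 10    # does not affect row or A_view

print(A_view)
print(A)
```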