Adding a non-zero scalar to sparse matrix - python

I want to add a value to each non-zero element in my sparse matrix. Can someone give me a method to do that.
y=sparse.csc_matrix((df[column_name].values,(df['user_id'].values, df['anime_id'].values)),shape=(rows, cols))
x=np.random.laplace(0,scale)
y=y+x
The above code is giving me an error.

Offered without comment:
In [166]: from scipy import sparse
In [167]: M = sparse.random(5,5,.2,'csc')
In [168]: M
Out[168]:
<5x5 sparse matrix of type '<class 'numpy.float64'>'
with 5 stored elements in Compressed Sparse Column format>
In [169]: M.A
Out[169]:
array([[0.24975586, 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0.6863175 , 0. ],
[0.43488131, 0.19245474, 0.26190903, 0. , 0. ],
[0. , 0. , 0. , 0. , 0. ]])
In [171]: x=np.random.laplace(0,10)
In [172]: x
Out[172]: 0.4773577605565098
In [173]: M+x
Traceback (most recent call last):
Input In [173] in <cell line: 1>
M+x
File /usr/local/lib/python3.8/dist-packages/scipy/sparse/_base.py:464 in __add__
raise NotImplementedError('adding a nonzero scalar to a '
NotImplementedError: adding a nonzero scalar to a sparse matrix is not supported
This is the error message you should have shown initially.
In [174]: M.data += x
In [175]: M.A
Out[175]:
array([[0.72711362, 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 1.16367526, 0. ],
[0.91223907, 0.6698125 , 0.73926679, 0. , 0. ],
[0. , 0. , 0. , 0. , 0. ]])

Related

Tensorflow padding in map function

I currently have a dataset of nxm tensors that I need to pad to 13xm tensor (n <= 13); however, I cannot figure out how to do this without Tensorflow losing the shape of my tensor.
I am trying to apply the map function to these, but tf.constant cannot accept a tensor as part of the padding specification and because of map's requirement I cannot just use the numpy method.
def pad_tensor(x):
current_length = tf.shape(x)[0]
additional_length = 13 - current_length
padding = tf.constant([[0, additional_length], [0, 0]])
return tf.pad(x, padding, "CONSTANT")
I know I can use py_func but when I do that, tensorflow loses the shape of the data in the dataset.
Any help would be appreciated
PS: I'm not sure exactly what you mean by apply the map function to these and because of map's requirement I cannot just use the numpy method, if you still have problem after following example then please make the definition of the problem more clear
FWIW, add .numpy() and code running without any error:
import tensorflow as tf
import numpy as np
def pad_tensor(x):
current_length = tf.shape(x)[0]
additional_length = 13 - current_length.numpy()
padding = tf.constant([[0, additional_length], [0, 0]])
return tf.pad(x, padding, "CONSTANT")
n = 3
m = 5
x = tf.constant(np.random.rand(n, m))
pad_tensor(x)
Outputs:
<tf.Tensor: shape=(13, 5), dtype=float64, numpy=
array([[0.35710346, 0.49611589, 0.18744049, 0.91046784, 0.19934265],
[0.51464596, 0.96416921, 0.87008494, 0.52756893, 0.23010099],
[0.05335277, 0.88451633, 0.25949178, 0.91156944, 0.03638372],
[0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0. ]])>

Split numpy array into sub-arrays when a sequence has a particular value on either end

I have a numpy array as follows:
array([0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0.00791667, 0. , 0. , 0. , 0. ,
0. , 0.06837452, 0.09166667, 0.00370881, 0. ,
0. , 0.00489809, 0. , 0. , 0. ,
0. , 0. , 0.23888889, 0. , 0.05927778,
0.12138889, 0. , 0. , 0. , 0.36069444,
0.31711111, 0.16333333, 0.15005556, 0.01 , 0.005 ,
0.14357413, 0. , 0.15722222, 0.29494444, 0.3245 ,
0.31276639, 0.095 , 0.04750292, 0.09127039, 0. ,
0.06847222, 0.17 , 0.18039233, 0.21567804, 0.15913079,
0.4579781 , 0. , 0.2459 , 0.14886556, 0.08447222,
0. , 0.13722222, 0.28336984, 0.0725 , 0.077355 ,
0.45166391, 0. , 0.24892933, 0.25360062, 0. ,
0.12923041, 0.16145892, 0.48771795, 0.38527778, 0.29432968,
0.31983305, 1.07573089, 0.30611111, 0. , 0.0216475 ,
0. , 0.62268056, 0.16829156, 0.46239719, 0.6415958 ,
0.02138889, 0.76457155, 0.05711551, 0.35050949, 0.34856278,
0.15686164, 0.23158889, 0.16593262, 0.34961111, 0.21247575,
0.14116667, 0.19414785, 0.09166667, 0.93376627, 0.12772222,
0.00366667, 0.10297222, 0.173 , 0.0381225 , 0.22441667,
0.46686111, 0.18761111, 0.56037889, 0.47566111])
From this array, I need to calculate the area under the curve for each sub-array where the first value is 0, where it goes above 0, and the last number should be the 0 after a non-zero number. Obviously the array lengths will vary. It may also occur that two of these sub-arrays will share a 0 value (the last 0 of the first array will be the fist 0 if the second array).
The expected first two arrays should be:
[0. , 0.00791667, 0. ]
[0. , 0.06837452, 0.09166667, 0.00370881, 0. ]
I've tried and splitting python lists based on a character being equal to 0, but haven't found anything useful. What can I do?
See the code below - I think this is the most efficient you'll be able to do.
First, split the array using the indices of all of the zeroes. Where multiple zeroes are together, this produces several [ 0. ] arrays, so filter those out (based on length, as all arrays must necessarily begin with a zero) to produce C. Finally, since they all begin with zero, but none end with zero, append a zero to each array.
import numpy as np
# <Your array here>
A = np.array(...)
# Split into arrays based on zeroes
B = np.split(A, np.where(A == 0)[0])
# Filter out arrays of length 1
# (just a zero, caused by multiple zeroes together)
f = np.vectorize(lambda a: len(a) > 1)
C = np.extract(f(B), B)
# Append a zero to each array
g = np.vectorize(lambda a: np.append(a, 0), otypes=[object])
D = g(C)
# Output result
for array in D:
print(array)
This gives the following output:
[ 0. 0.00791667 0. ]
[ 0. 0.06837452 0.09166667 0.00370881 0. ]
[ 0. 0.00489809 0. ]
[ 0. 0.23888889 0. ]
[ 0. 0.05927778 0.12138889 0. ]
[ 0. 0.36069444 0.31711111 0.16333333 0.15005556 0.01 0.005
0.14357413 0. ]
[ 0. 0.15722222 0.29494444 0.3245 0.31276639 0.095
0.04750292 0.09127039 0. ]
[ 0. 0.06847222 0.17 0.18039233 0.21567804 0.15913079
0.4579781 0. ]
[ 0. 0.2459 0.14886556 0.08447222 0. ]
[ 0. 0.13722222 0.28336984 0.0725 0.077355 0.45166391
0. ]
[ 0. 0.24892933 0.25360062 0. ]
[ 0. 0.12923041 0.16145892 0.48771795 0.38527778 0.29432968
0.31983305 1.07573089 0.30611111 0. ]
[ 0. 0.0216475 0. ]
[ 0. 0.62268056 0.16829156 0.46239719 0.6415958 0.02138889
0.76457155 0.05711551 0.35050949 0.34856278 0.15686164 0.23158889
0.16593262 0.34961111 0.21247575 0.14116667 0.19414785 0.09166667
0.93376627 0.12772222 0.00366667 0.10297222 0.173 0.0381225
0.22441667 0.46686111 0.18761111 0.56037889 0.47566111 0. ]

How do round off the elements of scipy.sparse.csr matrix to 2 decimal places ?

It can't be converted to numpy array due to memory error.
We can apply np.round to the data attribute of the matrix:
In [34]: from scipy import sparse
In [35]: M = sparse.random(5,5,.2,'csr')
In [36]: M
Out[36]:
<5x5 sparse matrix of type '<class 'numpy.float64'>'
with 5 stored elements in Compressed Sparse Row format>
In [37]: M.A
Out[37]:
array([[0. , 0. , 0.28058287, 0. , 0. ],
[0. , 0. , 0. , 0. , 0. ],
[0. , 0.81478819, 0. , 0. , 0. ],
[0. , 0. , 0. , 0.06805299, 0.51048128],
[0. , 0. , 0. , 0.64388578, 0. ]])
In [38]: M.data
Out[38]: array([0.28058287, 0.81478819, 0.06805299, 0.51048128, 0.64388578])
In [39]: M.data=np.round(M.data,2)
In [40]: M.data
Out[40]: array([0.28, 0.81, 0.07, 0.51, 0.64])
In [41]: M.A
Out[41]:
array([[0. , 0. , 0.28, 0. , 0. ],
[0. , 0. , 0. , 0. , 0. ],
[0. , 0.81, 0. , 0. , 0. ],
[0. , 0. , 0. , 0.07, 0.51],
[0. , 0. , 0. , 0.64, 0. ]])

Padding 1D NumPy array with zeros to form 2D array

I have a numpy array:
arr=np.array([0,1,0,0.5])
I need to form a new array from it as follows, such that every zero elements is repeated thrice and every non-zero element has 2 preceding zeroes, followed by the non-zero number. In short, every element is repeated thrice, zero as it is and non-zero has 2 preceding 0 and then the number itself. It is as follows:
([0,1,0,0.5])=0,0,0, [for index 0]
0,0,1 [for index 1]
0,0,0 [for index 2, which again has a zero] and
0,0,0.5
final output should be:
new_arr=[0,0,0,0,0,1,0,0,0,0,0,0.5]
np.repeat() repeats all the array elements n number of times, but i dont want that exactly. How should this be done? Thanks for the help.
A quick reshape followed by a call to np.pad will do it:
np.pad(arr.reshape(-1, 1), ((0, 0), (2, 0)), 'constant')
Output:
array([[ 0. , 0. , 0. ],
[ 0. , 0. , 1. ],
[ 0. , 0. , 0. ],
[ 0. , 0. , 0.5]])
You'll want to flatten it back again. That's simply done by calling .reshape(-1, ).
>>> np.pad(arr.reshape(-1, 1), ((0, 0), (2, 0)), 'constant').reshape(-1, )
array([ 0. , 0. , 0. , 0. , 0. , 1. , 0. , 0. , 0. , 0. , 0. ,
0.5])
A variant on the pad idea is to concatenate a 2d array of zeros
In [477]: arr=np.array([0,1,0,0.5])
In [478]: np.column_stack([np.zeros((len(arr),2)),arr])
Out[478]:
array([[ 0. , 0. , 0. ],
[ 0. , 0. , 1. ],
[ 0. , 0. , 0. ],
[ 0. , 0. , 0.5]])
In [479]: _.ravel()
Out[479]:
array([ 0. , 0. , 0. , 0. , 0. , 1. , 0. , 0. , 0. , 0. , 0. ,
0.5])
or padding in the other direction:
In [481]: np.vstack([np.zeros((2,len(arr))),arr])
Out[481]:
array([[ 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. ],
[ 0. , 1. , 0. , 0.5]])
In [482]: _.T.ravel()
Out[482]:
array([ 0. , 0. , 0. , 0. , 0. , 1. , 0. , 0. , 0. , 0. , 0. ,
0.5])

How to randomly select some non-zero elements from a numpy.ndarray?

I've implemented a matrix factorization model, say R = U*V, and now I would to train and test this model.
To this end, given a sparse matrix R (zero for missing value), I want to first hide some non-zero elements in the training and use these non-zero elements as test set later.
How can I randomly select some non-zero elements from a numpy.ndarray? Besides, I need to remember the index and column position of these selected elements to use these elements in testing.
for example:
In [2]: import numpy as np
In [4]: mtr = np.random.rand(10,10)
In [5]: mtr
Out[5]:
array([[ 0.92685787, 0.95496193, 0.76878455, 0.12304856, 0.13804963,
0.30867502, 0.60245974, 0.00797898, 0.1060602 , 0.98277982],
[ 0.88879888, 0.40209901, 0.35274404, 0.73097713, 0.56238248,
0.380625 , 0.16432029, 0.5383006 , 0.0678564 , 0.42875591],
[ 0.42343761, 0.31957986, 0.5991212 , 0.04898903, 0.2908878 ,
0.13160296, 0.26938537, 0.91442668, 0.72827097, 0.4511198 ],
[ 0.63979934, 0.33421621, 0.09218392, 0.71520048, 0.57100522,
0.37205284, 0.59726293, 0.58224992, 0.58690505, 0.4791199 ],
[ 0.35219557, 0.34954002, 0.93837312, 0.2745864 , 0.89569075,
0.81244084, 0.09661341, 0.80673646, 0.83756759, 0.7948081 ],
[ 0.09173706, 0.86250006, 0.22121994, 0.21097563, 0.55090202,
0.80954817, 0.97159981, 0.95888693, 0.43151554, 0.2265607 ],
[ 0.00723128, 0.95690539, 0.94214806, 0.01721733, 0.12552314,
0.65977765, 0.20845669, 0.44663729, 0.98392716, 0.36258081],
[ 0.65994805, 0.47697842, 0.35449045, 0.73937445, 0.68578224,
0.44278095, 0.86743906, 0.5126411 , 0.75683392, 0.73354572],
[ 0.4814301 , 0.92410622, 0.85267402, 0.44856078, 0.03887269,
0.48868498, 0.83618382, 0.49404473, 0.37328248, 0.18134919],
[ 0.63999748, 0.48718656, 0.54826717, 0.1001681 , 0.1940816 ,
0.3937014 , 0.48768013, 0.70610649, 0.03213063, 0.88371607]])
In [6]: mtr = np.where(mtr>0.5, 0, mtr)
In [7]: %clear
In [8]: mtr
Out[8]:
array([[ 0. , 0. , 0. , 0.12304856, 0.13804963,
0.30867502, 0. , 0.00797898, 0.1060602 , 0. ],
[ 0. , 0.40209901, 0.35274404, 0. , 0. ,
0.380625 , 0.16432029, 0. , 0.0678564 , 0.42875591],
[ 0.42343761, 0.31957986, 0. , 0.04898903, 0.2908878 ,
0.13160296, 0.26938537, 0. , 0. , 0.4511198 ],
[ 0. , 0.33421621, 0.09218392, 0. , 0. ,
0.37205284, 0. , 0. , 0. , 0.4791199 ],
[ 0.35219557, 0.34954002, 0. , 0.2745864 , 0. ,
0. , 0.09661341, 0. , 0. , 0. ],
[ 0.09173706, 0. , 0.22121994, 0.21097563, 0. ,
0. , 0. , 0. , 0.43151554, 0.2265607 ],
[ 0.00723128, 0. , 0. , 0.01721733, 0.12552314,
0. , 0.20845669, 0.44663729, 0. , 0.36258081],
[ 0. , 0.47697842, 0.35449045, 0. , 0. ,
0.44278095, 0. , 0. , 0. , 0. ],
[ 0.4814301 , 0. , 0. , 0.44856078, 0.03887269,
0.48868498, 0. , 0.49404473, 0.37328248, 0.18134919],
[ 0. , 0.48718656, 0. , 0.1001681 , 0.1940816 ,
0.3937014 , 0.48768013, 0. , 0.03213063, 0. ]])
Given such sparse ndarray, how can I select 20% of the non-zero elements and remember their position?
We'll use numpy.random.choice. First, we get arrays of the (i,j) indices where the data is nonzero:
i,j = np.nonzero(x)
Then we'll select 20% of these:
ix = np.random.choice(len(i), int(np.floor(0.2 * len(i))), replace=False)
Here ix is a list of random, unique indices, 20% the length of i and j (the length of i and j is the number of nonzero entries). To recover the indices, we do i[ix] and j[ix], so we can then select 20% of the nonzero entries of x by writing:
print x[i[ix], j[ix]]

Categories