I have a numpy array:
arr=np.array([0,1,0,0.5])
I need to form a new array from it such that every zero element is repeated three times and every non-zero element is preceded by two zeros. In short, every element expands to three values: a zero becomes 0,0,0 and a non-zero x becomes 0,0,x. It is as follows:
([0,1,0,0.5])=0,0,0, [for index 0]
0,0,1 [for index 1]
0,0,0 [for index 2, which again has a zero] and
0,0,0.5
final output should be:
new_arr=[0,0,0,0,0,1,0,0,0,0,0,0.5]
np.repeat() repeats every array element n times, but that is not exactly what I want. How should this be done? Thanks for the help.
A quick reshape followed by a call to np.pad will do it:
np.pad(arr.reshape(-1, 1), ((0, 0), (2, 0)), 'constant')
Output:
array([[ 0. , 0. , 0. ],
[ 0. , 0. , 1. ],
[ 0. , 0. , 0. ],
[ 0. , 0. , 0.5]])
You'll want to flatten it back again. That's simply done by calling .reshape(-1, ).
>>> np.pad(arr.reshape(-1, 1), ((0, 0), (2, 0)), 'constant').reshape(-1, )
array([ 0. , 0. , 0. , 0. , 0. , 1. , 0. , 0. , 0. , 0. , 0. ,
0.5])
A variant on the pad idea is to concatenate a 2d array of zeros:
In [477]: arr=np.array([0,1,0,0.5])
In [478]: np.column_stack([np.zeros((len(arr),2)),arr])
Out[478]:
array([[ 0. , 0. , 0. ],
[ 0. , 0. , 1. ],
[ 0. , 0. , 0. ],
[ 0. , 0. , 0.5]])
In [479]: _.ravel()
Out[479]:
array([ 0. , 0. , 0. , 0. , 0. , 1. , 0. , 0. , 0. , 0. , 0. ,
0.5])
or padding in the other direction:
In [481]: np.vstack([np.zeros((2,len(arr))),arr])
Out[481]:
array([[ 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. ],
[ 0. , 1. , 0. , 0.5]])
In [482]: _.T.ravel()
Out[482]:
array([ 0. , 0. , 0. , 0. , 0. , 1. , 0. , 0. , 0. , 0. , 0. ,
0.5])
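As a further variant (not taken from the answers above), the same layout can be built without any padding or stacking: preallocate the zeros and write the original values into every third slot. This is a minimal sketch; the names n and new_arr are chosen for illustration.

```python
import numpy as np

arr = np.array([0, 1, 0, 0.5])
n = 3  # each input element occupies n slots in the output

# Start from all zeros, then drop each original value into the last
# slot of its group of n (indices n-1, 2n-1, 3n-1, ...).
new_arr = np.zeros(arr.size * n, dtype=arr.dtype)
new_arr[n - 1::n] = arr

print(new_arr)
```

Zero elements need no special handling here, because writing a 0 into an already-zero slot leaves the group as n zeros.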
I have a list of arrays, Pe. For each array I want to keep the rows listed in J unchanged and set every other row to zero. For example, for Pe[0], J[0]=[0,1] means rows 0 and 1 of Pe[0] are kept as they are, while row 2 of Pe[0] should contain only zeros. Similarly for Pe[1]. But I am getting an error. I also present the expected output.
import numpy as np
Pe = [np.array([[402.93473651, 0. , 230.97804127, 407.01354328,
0. , 414.17017965, 0. , 0. ,
0. , 0. , 0. , 0. ],
[ 0. , 423.81345923, 0. , 407.01354328,
419.14952534, 0. , 316.58460442, 0. ,
0. , 0. , 0. , 0. ],
[402.93473651, 0. , 230.97804127, 407.01354328,
0. , 414.17017965, 0. , 0. ,
0. , 0. , 0. , 0. ]]),
np.array([[402.93473651, 0. , 230.97804127, 407.01354328,
0. , 414.17017965, 0. , 0. ,
0. , 0. , 0. , 0. ],
[ 0. , 423.81345923, 0. , 407.01354328,
419.14952534, 0. , 316.58460442, 0. ,
0. , 0. , 0. , 0. ],
[402.93473651, 0. , 230.97804127, 407.01354328,
0. , 414.17017965, 0. , 0. ,
0. , 0. , 0. , 0. ]])] #Entry pressure
J = [[0,1],[2]]
for i in range(0,len(Pe)):
    out = np.zeros_like(Pe[i])
    for j in range(0,len(J)):
        out[i][J[j]] = Pe[i][J[j]]
print([out])
The error is
in <module>
out[i][J[j]] = Pe[i][J[j]]
ValueError: shape mismatch: value array of shape (2,12) could not be broadcast to indexing result of shape (2,)
The expected output is
[np.array([[402.93473651, 0. , 230.97804127, 407.01354328,
0. , 414.17017965, 0. , 0. ,
0. , 0. , 0. , 0. ],
[ 0. , 423.81345923, 0. , 407.01354328,
419.14952534, 0. , 316.58460442, 0. ,
0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. ]]),
np.array([[0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. ,
0. , 0. , 0., 0. ,
0. , 0. , 0. , 0. ],
[402.93473651, 0. , 230.97804127, 407.01354328,
0. , 414.17017965, 0. , 0. ,
0. , 0. , 0. , 0. ]])]
Using lists and loops in Numpy is often an anti-pattern, and that is the case here. You should be using vectorised operations throughout. J is jagged so you need to reinterpret it as a boolean indexer. Also, Pe should not start with repeated dimensions; it should start as a single two-dimensional array without a list.
import numpy as np
Pe = np.array([[402.93473651, 0. , 230.97804127, 407.01354328,
0. , 414.17017965, 0. , 0. ,
0. , 0. , 0. , 0. ],
[ 0. , 423.81345923, 0. , 407.01354328,
419.14952534, 0. , 316.58460442, 0. ,
0. , 0. , 0. , 0. ],
[402.93473651, 0. , 230.97804127, 407.01354328,
0. , 414.17017965, 0. , 0. ,
0. , 0. , 0. , 0. ]])
J = np.ones((2, Pe.shape[0]), dtype=bool)
J[0, 0:2] = 0
J[1, 2] = 0
Pe_indexed = np.tile(Pe, (J.shape[0], 1, 1))
Pe_indexed[J, :] = 0
Pe_indexed will now be a proper three-dimensional array, no lists.
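The boolean indexer above is filled in by hand. As a rough sketch of how it could be derived from the jagged list in the question instead, the snippet below builds a "keep" mask (the inverse of the zeroing mask above) and applies it with broadcasting. The small Pe and the names J_keep and keep are made up for illustration and are not from the original post.

```python
import numpy as np

# Small stand-in for the question's 3x12 Pe (values are illustrative only).
Pe = np.arange(1.0, 10.0).reshape(3, 3)
J_keep = [[0, 1], [2]]  # jagged list: rows to keep in each output slice

# keep[k, r] is True when row r should survive in slice k.
keep = np.zeros((len(J_keep), Pe.shape[0]), dtype=bool)
for k, rows in enumerate(J_keep):
    keep[k, rows] = True

# A (2, 3, 1) mask broadcast against the (3, 3) data gives a (2, 3, 3) result:
# kept rows are copied from Pe, all other rows become zero.
Pe_indexed = np.where(keep[:, :, None], Pe, 0.0)
print(Pe_indexed)
```

The short loop over J_keep only builds the mask; the work on the data itself stays vectorised.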
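# Alternative: keep Pe as the list of arrays and J = [[0,1],[2]] from the question
out = []
for arr, ind in zip(Pe, J):
    x = np.zeros_like(arr)   # start from all zeros
    x[ind] = arr[ind]        # copy back only the rows listed in J
    out.append(x)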
I have a matrix like this:
profile=np.array([[0,0,0.5,0.1],
[0.3,0,0,0],
[0,0,0.1,0.9],
[0,0,0,0.1],
[0,0.5,0,0]])
And I want to add a row filled with zeros before and after it. How can I do that?
I thought of using np.pad, but I'm not sure how.
Output should be:
np.array([[0,0,0,0],
[0,0,0.5,0.1],
[0.3,0,0,0],
[0,0,0.1,0.9],
[0,0,0,0.1],
[0,0.5,0,0],
[0,0,0,0]])
The np.pad function allows you to specify the axes you want to pad:
In [3]: np.pad(profile, ((1, 1), (0, 0)))
Out[3]:
array([[0. , 0. , 0. , 0. ],
[0. , 0. , 0.5, 0.1],
[0.3, 0. , 0. , 0. ],
[0. , 0. , 0.1, 0.9],
[0. , 0. , 0. , 0.1],
[0. , 0.5, 0. , 0. ],
[0. , 0. , 0. , 0. ]])
The nested tuple can be read as: pad 1 array "above", and 1 array "below" axis 0, and pad 0 arrays "above" and 0 arrays "below" axis 1.
Another example, which pads five columns "after" on axis 1:
In [4]: np.pad(profile, ((0, 0), (0, 5)))
Out[4]:
array([[0. , 0. , 0.5, 0.1, 0. , 0. , 0. , 0. , 0. ],
[0.3, 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0.1, 0.9, 0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0.1, 0. , 0. , 0. , 0. , 0. ],
[0. , 0.5, 0. , 0. , 0. , 0. , 0. , 0. , 0. ]])
You can use np.pad:
out = np.pad(profile, 1)[:, 1:-1]
Output:
>>> out
array([[0. , 0. , 0. , 0. ],
[0. , 0. , 0.5, 0.1],
[0.3, 0. , 0. , 0. ],
[0. , 0. , 0.1, 0.9],
[0. , 0. , 0. , 0.1],
[0. , 0.5, 0. , 0. ],
[0. , 0. , 0. , 0. ]])
Because np.pad pads it on all sides (left and right, in addition to top and bottom), [:, 1:-1] slices off the first and last columns.
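For comparison, here is a sketch of the same result without np.pad, concatenating explicit zero rows. This is a different technique from the answers above, shown only as a cross-check; the name zero_row is illustrative.

```python
import numpy as np

profile = np.array([[0, 0, 0.5, 0.1],
                    [0.3, 0, 0, 0],
                    [0, 0, 0.1, 0.9],
                    [0, 0, 0, 0.1],
                    [0, 0.5, 0, 0]])

# One row of zeros with the same number of columns as profile.
zero_row = np.zeros((1, profile.shape[1]), dtype=profile.dtype)

# Stack: zero row, original rows, zero row.
out = np.concatenate([zero_row, profile, zero_row], axis=0)

# Should agree with padding axis 0 by one on each side.
print(np.array_equal(out, np.pad(profile, ((1, 1), (0, 0)))))
```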
I have a numpy array as follows:
array([0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0.00791667, 0. , 0. , 0. , 0. ,
0. , 0.06837452, 0.09166667, 0.00370881, 0. ,
0. , 0.00489809, 0. , 0. , 0. ,
0. , 0. , 0.23888889, 0. , 0.05927778,
0.12138889, 0. , 0. , 0. , 0.36069444,
0.31711111, 0.16333333, 0.15005556, 0.01 , 0.005 ,
0.14357413, 0. , 0.15722222, 0.29494444, 0.3245 ,
0.31276639, 0.095 , 0.04750292, 0.09127039, 0. ,
0.06847222, 0.17 , 0.18039233, 0.21567804, 0.15913079,
0.4579781 , 0. , 0.2459 , 0.14886556, 0.08447222,
0. , 0.13722222, 0.28336984, 0.0725 , 0.077355 ,
0.45166391, 0. , 0.24892933, 0.25360062, 0. ,
0.12923041, 0.16145892, 0.48771795, 0.38527778, 0.29432968,
0.31983305, 1.07573089, 0.30611111, 0. , 0.0216475 ,
0. , 0.62268056, 0.16829156, 0.46239719, 0.6415958 ,
0.02138889, 0.76457155, 0.05711551, 0.35050949, 0.34856278,
0.15686164, 0.23158889, 0.16593262, 0.34961111, 0.21247575,
0.14116667, 0.19414785, 0.09166667, 0.93376627, 0.12772222,
0.00366667, 0.10297222, 0.173 , 0.0381225 , 0.22441667,
0.46686111, 0.18761111, 0.56037889, 0.47566111])
From this array, I need to calculate the area under the curve for each sub-array that starts at a 0, rises above 0, and ends at the first 0 after the non-zero values. Obviously the sub-array lengths will vary. It may also happen that two of these sub-arrays share a 0 value (the last 0 of the first array will be the first 0 of the second array).
The expected first two arrays should be:
[0. , 0.00791667, 0. ]
[0. , 0.06837452, 0.09166667, 0.00370881, 0. ]
I've tried splitting Python lists based on an element being equal to 0, but haven't found anything useful. What can I do?
See the code below - I think this is the most efficient you'll be able to do.
First, split the array using the indices of all of the zeroes. Where multiple zeroes are together, this produces several [ 0. ] arrays, so filter those out (based on length, as all arrays must necessarily begin with a zero) to produce C. Finally, since they all begin with zero, but none end with zero, append a zero to each array.
import numpy as np
# <Your array here>
A = np.array(...)
# Split into arrays at the index of every zero
B = np.split(A, np.where(A == 0)[0])
# Filter out arrays of length 1 or less
# (just a zero, caused by multiple zeroes together, or an empty leading piece).
# A list comprehension is used because the pieces have different lengths.
C = [b for b in B if len(b) > 1]
# Append a zero to each array
D = [np.append(c, 0) for c in C]
# Output result
for array in D:
    print(array)
This gives the following output:
[ 0. 0.00791667 0. ]
[ 0. 0.06837452 0.09166667 0.00370881 0. ]
[ 0. 0.00489809 0. ]
[ 0. 0.23888889 0. ]
[ 0. 0.05927778 0.12138889 0. ]
[ 0. 0.36069444 0.31711111 0.16333333 0.15005556 0.01 0.005
0.14357413 0. ]
[ 0. 0.15722222 0.29494444 0.3245 0.31276639 0.095
0.04750292 0.09127039 0. ]
[ 0. 0.06847222 0.17 0.18039233 0.21567804 0.15913079
0.4579781 0. ]
[ 0. 0.2459 0.14886556 0.08447222 0. ]
[ 0. 0.13722222 0.28336984 0.0725 0.077355 0.45166391
0. ]
[ 0. 0.24892933 0.25360062 0. ]
[ 0. 0.12923041 0.16145892 0.48771795 0.38527778 0.29432968
0.31983305 1.07573089 0.30611111 0. ]
[ 0. 0.0216475 0. ]
[ 0. 0.62268056 0.16829156 0.46239719 0.6415958 0.02138889
0.76457155 0.05711551 0.35050949 0.34856278 0.15686164 0.23158889
0.16593262 0.34961111 0.21247575 0.14116667 0.19414785 0.09166667
0.93376627 0.12772222 0.00366667 0.10297222 0.173 0.0381225
0.22441667 0.46686111 0.18761111 0.56037889 0.47566111 0. ]
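The question ultimately asks for the area under each sub-array, which the answer above stops short of. A minimal sketch of that last step follows, assuming unit spacing between samples (the question does not state the spacing) and using a short made-up signal. The helper trapezoid_area is hypothetical and simply applies the trapezoidal rule by hand.

```python
import numpy as np

# Short made-up signal standing in for the question's array.
A = np.array([0.0, 1.0, 0.0, 0.0, 2.0, 2.0, 0.0])

# Same splitting idea as above: cut at every zero, drop pieces that hold
# no non-zero value, then close each piece with a trailing zero.
pieces = [p for p in np.split(A, np.flatnonzero(A == 0)) if len(p) > 1]
segments = [np.append(p, 0.0) for p in pieces]

def trapezoid_area(y, dx=1.0):
    """Trapezoidal rule for uniformly spaced samples (dx=1 is an assumption)."""
    return dx * 0.5 * np.sum(y[1:] + y[:-1])

areas = [float(trapezoid_area(s)) for s in segments]
print(areas)
```

Note that the trailing zero is appended to every piece, so a final piece that does not actually return to zero in the data (as in the last line of the output above) is closed artificially and its area should be treated with care.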
Let W be some matrix of dimension (x, nP) [see end of question]
Right now, I'm using the following code:
uUpperDraw = np.zeros(W.shape)
for p in np.arange(0, nP):
    uUpperDraw[s, p] = (W[s+1,:(p+1)]).sum()
I want to vectorize this for efficiency gains. Given a pGrid = [0, 1, ...], how can I reproduce the following?
uUpperDraw = np.array([sum(W[x, 0]), sum(W[x,0] + W[x, 1]), sum(W[x,0] + W[x, 1] + W[x, 2]) ...
Here is some reproducible example.
>>> s, nP
(3, 10)
>>> W
array([[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ],
[ 2. , 1.63636364, 1.38461538, 1.2 , 1.05882353,
0.94736842, 0.85714286, 0.7826087 , 0.72 , 0.66666667]])
>>> uUpperDraw
array([[ 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. ,
0. , 0. ],
[ 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. ,
0. , 0. ],
[ 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. ,
0. , 0. ],
[ 2. , 3.63636364, 5.02097902, 6.22097902,
7.27980255, 8.22717097, 9.08431383, 9.86692252,
10.58692252, 11.25358919],
[ 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. ,
0. , 0. ]])
This looks like a cumulative sum. To get the cumulative sum of each row separately, use:
uUpperDraw = np.cumsum(W,axis=1)
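One detail worth checking: the loop in the question stores the running sum of row s+1 in row s, whereas np.cumsum(W, axis=1) keeps each row's running sum in its own row. The sketch below compares the loop with a cumsum that applies the same one-row shift. The formula used to fill W is only an assumption that approximately matches the last row shown in the question.

```python
import numpy as np

nP, s = 10, 3
W = np.zeros((5, nP))
# Assumed formula; it approximately matches the last row printed in the question.
W[4] = 18.0 / (9.0 + 2.0 * np.arange(nP))

# Loop from the question, for the single row index s.
loop_result = np.zeros(W.shape)
for p in np.arange(0, nP):
    loop_result[s, p] = W[s + 1, :(p + 1)].sum()

# Vectorised: one cumulative sum of row s+1, stored in row s.
uUpperDraw = np.zeros(W.shape)
uUpperDraw[s] = np.cumsum(W[s + 1])

print(np.allclose(loop_result, uUpperDraw))
```

If the loop is meant to run over every s, the same shift could be written as uUpperDraw[:-1] = np.cumsum(W[1:], axis=1); whether that is the intent is not clear from the question.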
I've implemented a matrix factorization model, say R = U*V, and now I would like to train and test this model.
To this end, given a sparse matrix R (zero for a missing value), I want to first hide some non-zero elements during training and use these non-zero elements as the test set later.
How can I randomly select some non-zero elements from a numpy.ndarray? Besides, I need to remember the row and column positions of these selected elements so that I can use them in testing.
for example:
In [2]: import numpy as np
In [4]: mtr = np.random.rand(10,10)
In [5]: mtr
Out[5]:
array([[ 0.92685787, 0.95496193, 0.76878455, 0.12304856, 0.13804963,
0.30867502, 0.60245974, 0.00797898, 0.1060602 , 0.98277982],
[ 0.88879888, 0.40209901, 0.35274404, 0.73097713, 0.56238248,
0.380625 , 0.16432029, 0.5383006 , 0.0678564 , 0.42875591],
[ 0.42343761, 0.31957986, 0.5991212 , 0.04898903, 0.2908878 ,
0.13160296, 0.26938537, 0.91442668, 0.72827097, 0.4511198 ],
[ 0.63979934, 0.33421621, 0.09218392, 0.71520048, 0.57100522,
0.37205284, 0.59726293, 0.58224992, 0.58690505, 0.4791199 ],
[ 0.35219557, 0.34954002, 0.93837312, 0.2745864 , 0.89569075,
0.81244084, 0.09661341, 0.80673646, 0.83756759, 0.7948081 ],
[ 0.09173706, 0.86250006, 0.22121994, 0.21097563, 0.55090202,
0.80954817, 0.97159981, 0.95888693, 0.43151554, 0.2265607 ],
[ 0.00723128, 0.95690539, 0.94214806, 0.01721733, 0.12552314,
0.65977765, 0.20845669, 0.44663729, 0.98392716, 0.36258081],
[ 0.65994805, 0.47697842, 0.35449045, 0.73937445, 0.68578224,
0.44278095, 0.86743906, 0.5126411 , 0.75683392, 0.73354572],
[ 0.4814301 , 0.92410622, 0.85267402, 0.44856078, 0.03887269,
0.48868498, 0.83618382, 0.49404473, 0.37328248, 0.18134919],
[ 0.63999748, 0.48718656, 0.54826717, 0.1001681 , 0.1940816 ,
0.3937014 , 0.48768013, 0.70610649, 0.03213063, 0.88371607]])
In [6]: mtr = np.where(mtr>0.5, 0, mtr)
In [7]: %clear
In [8]: mtr
Out[8]:
array([[ 0. , 0. , 0. , 0.12304856, 0.13804963,
0.30867502, 0. , 0.00797898, 0.1060602 , 0. ],
[ 0. , 0.40209901, 0.35274404, 0. , 0. ,
0.380625 , 0.16432029, 0. , 0.0678564 , 0.42875591],
[ 0.42343761, 0.31957986, 0. , 0.04898903, 0.2908878 ,
0.13160296, 0.26938537, 0. , 0. , 0.4511198 ],
[ 0. , 0.33421621, 0.09218392, 0. , 0. ,
0.37205284, 0. , 0. , 0. , 0.4791199 ],
[ 0.35219557, 0.34954002, 0. , 0.2745864 , 0. ,
0. , 0.09661341, 0. , 0. , 0. ],
[ 0.09173706, 0. , 0.22121994, 0.21097563, 0. ,
0. , 0. , 0. , 0.43151554, 0.2265607 ],
[ 0.00723128, 0. , 0. , 0.01721733, 0.12552314,
0. , 0.20845669, 0.44663729, 0. , 0.36258081],
[ 0. , 0.47697842, 0.35449045, 0. , 0. ,
0.44278095, 0. , 0. , 0. , 0. ],
[ 0.4814301 , 0. , 0. , 0.44856078, 0.03887269,
0.48868498, 0. , 0.49404473, 0.37328248, 0.18134919],
[ 0. , 0.48718656, 0. , 0.1001681 , 0.1940816 ,
0.3937014 , 0.48768013, 0. , 0.03213063, 0. ]])
Given such sparse ndarray, how can I select 20% of the non-zero elements and remember their position?
We'll use numpy.random.choice. First, we get arrays of the (i,j) indices where the data is nonzero (x here stands for your matrix, mtr in the question):
i,j = np.nonzero(x)
Then we'll select 20% of these:
ix = np.random.choice(len(i), int(np.floor(0.2 * len(i))), replace=False)
Here ix is an array of random, unique indices, 20% the length of i and j (the length of i and j is the number of nonzero entries). To recover the positions, we use i[ix] and j[ix], so we can then select 20% of the nonzero entries of x by writing:
print(x[i[ix], j[ix]])
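Putting the steps together for the train/test use case in the question, a minimal sketch might look like the following. It uses np.random.default_rng rather than the legacy np.random.choice, and the names train, test_rows, test_cols and test_vals are chosen here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable

# Stand-in for the question's sparse matrix (zeros mark missing values).
mtr = rng.random((10, 10))
mtr[mtr > 0.5] = 0.0

# Positions of all non-zero entries.
i, j = np.nonzero(mtr)

# Pick 20% of them without replacement.
n_test = int(np.floor(0.2 * len(i)))
ix = rng.choice(len(i), size=n_test, replace=False)

# Remember where the held-out entries are and what they were.
test_rows, test_cols = i[ix], j[ix]
test_vals = mtr[test_rows, test_cols]

# Hide them in a copy that is used for training.
train = mtr.copy()
train[test_rows, test_cols] = 0.0

print(n_test, np.count_nonzero(mtr), np.count_nonzero(train))
```

After training, predictions at (test_rows, test_cols) can be compared against test_vals.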