How can I join two numpy ndarrays to accomplish the following in a fast way, using optimized numpy, without any looping?
>>> a = np.random.rand(2,2)
>>> a
array([[ 0.09028802, 0.2274419 ],
[ 0.35402772, 0.87834376]])
>>> b = np.random.rand(2,2)
>>> b
array([[ 0.4776325 , 0.73690098],
[ 0.69181444, 0.672248 ]])
>>> c = ???
>>> c
array([[ 0.09028802, 0.2274419, 0.4776325 , 0.73690098],
[ 0.09028802, 0.2274419, 0.69181444, 0.672248 ],
[ 0.35402772, 0.87834376, 0.4776325 , 0.73690098],
[ 0.35402772, 0.87834376, 0.69181444, 0.672248 ]])
Not the prettiest, but you could combine hstack, repeat, and tile:
>>> a = np.arange(4).reshape(2,2)
>>> b = a+10
>>> a
array([[0, 1],
[2, 3]])
>>> b
array([[10, 11],
[12, 13]])
>>> np.hstack([np.repeat(a,len(a),0),np.tile(b,(len(b),1))])
array([[ 0, 1, 10, 11],
[ 0, 1, 12, 13],
[ 2, 3, 10, 11],
[ 2, 3, 12, 13]])
Or for a 3x3 case:
>>> a = np.arange(9).reshape(3,3)
>>> b = a+10
>>> np.hstack([np.repeat(a,len(a),0),np.tile(b,(len(b),1))])
array([[ 0, 1, 2, 10, 11, 12],
[ 0, 1, 2, 13, 14, 15],
[ 0, 1, 2, 16, 17, 18],
[ 3, 4, 5, 10, 11, 12],
[ 3, 4, 5, 13, 14, 15],
[ 3, 4, 5, 16, 17, 18],
[ 6, 7, 8, 10, 11, 12],
[ 6, 7, 8, 13, 14, 15],
[ 6, 7, 8, 16, 17, 18]])
What you want is, apparently, the cartesian product of a and b, stacked horizontally. You can use the itertools module to generate the indices for the numpy arrays, then numpy.hstack to stack them:
import numpy as np
from itertools import product
a = np.array([[ 0.09028802, 0.2274419 ],
[ 0.35402772, 0.87834376]])
b = np.array([[ 0.4776325 , 0.73690098],
[ 0.69181444, 0.672248 ],
[ 0.79941110, 0.52273 ]])
a_inds, b_inds = map(list, zip(*product(range(len(a)), range(len(b)))))
c = np.hstack((a[a_inds], b[b_inds]))
This results in a c of:
array([[ 0.09028802, 0.2274419 , 0.4776325 , 0.73690098],
[ 0.09028802, 0.2274419 , 0.69181444, 0.672248 ],
[ 0.09028802, 0.2274419 , 0.7994111 , 0.52273 ],
[ 0.35402772, 0.87834376, 0.4776325 , 0.73690098],
[ 0.35402772, 0.87834376, 0.69181444, 0.672248 ],
[ 0.35402772, 0.87834376, 0.7994111 , 0.52273 ]])
Breaking down the indices thing:
product(range(len(a)), range(len(b))) will generate something that looks like this if you convert it to a list:
[(0, 0), (0, 1), (1, 0), (1, 1)]
You want something like this: [0, 0, 1, 1], [0, 1, 0, 1], so you need to transpose the generator. The idiomatic way to do this is with zip(*zipped_thing). However, if you just directly assign these, you'll get tuples, like this:
[(0, 0, 1, 1), (0, 1, 0, 1)]
But numpy arrays interpret tuples as multi-dimensional indices, so you want to turn them into lists, which is why I mapped the list constructor onto the transposed index tuples.
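If you'd rather stay in NumPy for large inputs, the same index arrays can be built with np.repeat and np.tile (a sketch, reusing the a and b defined above):
import numpy as np
m, n = len(a), len(b)
a_inds = np.repeat(np.arange(m), n)  # [0, 0, 0, 1, 1, 1] for m=2, n=3
b_inds = np.tile(np.arange(n), m)    # [0, 1, 2, 0, 1, 2]
c = np.hstack((a[a_inds], b[b_inds]))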
Let's walk through a prospective solution that handles the generic case of differently shaped arrays, with inline comments to explain the method involved.
(1) First off, we store shapes of input arrays.
ma,na = a.shape
mb,nb = b.shape
(2) Next, initialize a 3D array whose number of columns is the sum of the numbers of columns in the input arrays a and b. Use np.empty for this task.
out = np.empty((ma,mb,na+nb),dtype=a.dtype)
(3) Then, set the first "na" columns along the last axis of the 3D array to the rows of a with a[:,None,:]. When we assign to out[:,:,:na], the singleton second dimension of a[:,None,:] tells NumPy to broadcast the assignment along out's second axis, as always happens with singleton dims in NumPy arrays. In effect, this is the same as tiling/repeating, but done efficiently.
out[:,:,:na] = a[:,None,:]
(4) Repeat to set the elements from b into the output array. This time we broadcast along the first axis of out with out[:,:,na:]; b's shape (mb,nb) is stretched across that leading axis.
out[:,:,na:] = b
(5) The final step is to reshape the output to 2D. This can be done by simply assigning the required 2D shape tuple to out.shape. Reshaping only changes the view and is effectively zero-cost.
out.shape = (ma*mb,na+nb)
Condensing everything, the full implementation would look like this -
ma,na = a.shape
mb,nb = b.shape
out = np.empty((ma,mb,na+nb),dtype=a.dtype)
out[:,:,:na] = a[:,None,:]
out[:,:,na:] = b
out.shape = (ma*mb,na+nb)
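As a quick sanity check of this sketch, here it is compared against a row-by-row construction built from itertools.product (assuming a 2x2 a and a 3x2 b):
import numpy as np
from itertools import product
a = np.random.rand(2, 2)
b = np.random.rand(3, 2)
ma, na = a.shape
mb, nb = b.shape
out = np.empty((ma, mb, na+nb), dtype=a.dtype)
out[:, :, :na] = a[:, None, :]
out[:, :, na:] = b
out.shape = (ma*mb, na+nb)
# reference: concatenate each pair of rows from the cartesian product of row indices
ref = np.array([np.concatenate((a[i], b[j])) for i, j in product(range(ma), range(mb))])
assert np.allclose(out, ref)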
You can use dstack() and broadcast_arrays():
import numpy as np
a = np.random.randint(0, 10, (3, 2))
b = np.random.randint(10, 20, (4, 2))
np.dstack(np.broadcast_arrays(a[:, None], b)).reshape(-1, a.shape[-1] + b.shape[-1])
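To see why this works, here is the same one-liner unpacked step by step, with the shapes of the arrays above annotated:
aa, bb = np.broadcast_arrays(a[:, None], b)  # a[:, None] is (3, 1, 2); both broadcast to (3, 4, 2)
c = np.dstack((aa, bb))                      # stack along the last axis: (3, 4, 4)
c = c.reshape(-1, a.shape[-1] + b.shape[-1]) # collapse the first two axes: (12, 4)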
Try either np.hstack or np.vstack. This would work even for arrays that are not the same length. All you would need to do is this:
np.hstack(appendedarray[:]) or np.vstack(appendedarray[:])
All arrays are indexable, so you can merge them by just calling:
a[:2],b[:2]
or you can use the core numpy stacking functions; note that they take a sequence of arrays, so it should look something like this:
c = np.vstack((a, b))
I am translating some J language code into Python, but the way Python's apply function works seems a little unclear to me...
I currently have a (3, 3, 2) matrix A, and a (3, 3) matrix B.
I want to divide each matrix in A by the corresponding row in B:
A = np.arange(1,19).reshape(3,3,2)
array([[[ 1, 2],
[ 3, 4],
[ 5, 6]],
[[ 7, 8],
[ 9, 10],
[11, 12]],
[[13, 14],
[15, 16],
[17, 18]]])
B = np.arange(1,10).reshape(3,3)
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
That is, the result would be like:
1 2
1.5 2
1.66667 2
1.75 2
1.8 2
1.83333 2
1.85714 2
1.875 2
1.88889 2
For the first matrix of the result, the way I want to compute it is the following:
1/1 2/1
3/2 4/2
5/3 6/3
I have tried
np.apply_along_axis(np.divide,1,A,B)
but it says
operands could not be broadcast together with shapes (10,) (10,10,2)
Any advice?
Thank you in advance = ]
ps. the J code is
A %"2 1 B
This means "divide each matrix("2) from A by each row ("1) from B"
or just simply
A % B
Broadcasting works if the trailing dimensions match or are one! So we can basically add a dummy dimension!
import numpy as np
A = np.arange(1,19).reshape(3,3,2)
B = np.arange(1,10).reshape(3,3)
B = B[...,np.newaxis] # This adds new dummy dimension in the end, B's new shape is (3,3,1)
A/B
array([[[1. , 2. ],
[1.5 , 2. ],
[1.66666667, 2. ]],
[[1.75 , 2. ],
[1.8 , 2. ],
[1.83333333, 2. ]],
[[1.85714286, 2. ],
[1.875 , 2. ],
[1.88888889, 2. ]]])
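Equivalently, the dummy axis can be added inline; a minimal sketch verifying the result against the row-by-row division described in the question:
import numpy as np
A = np.arange(1, 19).reshape(3, 3, 2)
B = np.arange(1, 10).reshape(3, 3)
C = A / B[..., np.newaxis]  # same as reshaping B beforehand
for i in range(3):
    for j in range(3):
        assert np.allclose(C[i, j], A[i, j] / B[i, j])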
For a numpy array X, the location of its element X[k[0], ..., k[d-1]] is offset from the location of X[0,..., 0] by k[0]*s[0] + ... + k[d-1]*s[d-1], where (s[0],...,s[d-1]) is the tuple representing X.strides.
As far as I understand, nothing in the numpy array specs requires that distinct indexes of an array X correspond to distinct addresses in memory, the simplest instance of this being a zero-valued stride, e.g. see the advanced NumPy section of the scipy lectures.
Does numpy have a built-in predicate to test whether the strides and the shape are such that distinct indexes map to distinct memory addresses?
If not, how does one write one, preferably so as to avoid sorting of the strides?
edit: It took me a bit to figure out what you are asking about. With striding tricks it's possible to index the same element in a data buffer in different ways, and broadcasting actually does this under the covers. Normally we don't worry about it because it is either hidden or intentional.
Recreating the strided mapping and looking for duplicates may be the only way to test this. I'm not aware of any existing function that checks it.
==================
I'm not quite sure what you're concerned with. But let me illustrate how shape and strides work.
Define a 3x4 array:
In [453]: X=np.arange(12).reshape(3,4)
In [454]: X.shape
Out[454]: (3, 4)
In [455]: X.strides
Out[455]: (16, 4)
Index an item
In [456]: X[1,2]
Out[456]: 6
I can get its index in a flattened version of the array (e.g. the original arange) with ravel_multi_index:
In [457]: np.ravel_multi_index((1,2),X.shape)
Out[457]: 6
I can also get this location using strides - keeping in mind that strides are in bytes (here 4 bytes per item):
In [458]: 1*16+2*4
Out[458]: 24
In [459]: (1*16+2*4)/4
Out[459]: 6.0
All these numbers are relative to the start of the data buffer. We can get the data buffer address from X.data or X.__array_interface__['data'], but usually don't need to.
So these strides tell us that to go from one entry to the next, step 4 bytes, and to go from one row to the next, step 16. The 6 is located one row down and 2 over, or 24 bytes into the buffer.
In the as_strided example of your link, strides=(1*2, 0) produces repeated indexing of specific values.
With my X:
In [460]: y=np.lib.stride_tricks.as_strided(X,strides=(16,0), shape=(3,4))
In [461]: y
Out[461]:
array([[0, 0, 0, 0],
[4, 4, 4, 4],
[8, 8, 8, 8]])
y is a 3x4 that repeatedly indexes the 1st column of X.
Changing one item in y ends up changing one value in X but a whole row in y:
In [462]: y[1,2]=10
In [463]: y
Out[463]:
array([[ 0, 0, 0, 0],
[10, 10, 10, 10],
[ 8, 8, 8, 8]])
In [464]: X
Out[464]:
array([[ 0, 1, 2, 3],
[10, 5, 6, 7],
[ 8, 9, 10, 11]])
as_strided can produce some weird effects if you aren't careful.
OK, maybe I've figured out what's bothering you - can I identify a situation like this, where two different indexing tuples end up pointing to the same location in the data buffer? Not that I'm aware of. That y's strides contain a 0 is a pretty good indicator.
as_strided is often used to create overlapping windows:
In [465]: y=np.lib.stride_tricks.as_strided(X,strides=(8,4), shape=(3,4))
In [466]: y
Out[466]:
array([[ 0, 1, 2, 3],
[ 2, 3, 10, 5],
[10, 5, 6, 7]])
In [467]: y[1,2]=20
In [469]: y
Out[469]:
array([[ 0, 1, 2, 3],
[ 2, 3, 20, 5],
[20, 5, 6, 7]])
Again changing 1 item in y ends up changing 2 values in y, but only 1 in X.
Ordinary array creation and indexing does not have this duplicate indexing issue. Broadcasting may do something like this under the covers, where a (4,) array is changed to (1,4) and then to (3,4), effectively replicating rows. I think there's another stride_tricks function that does this explicitly.
In [475]: x,y=np.lib.stride_tricks.broadcast_arrays(X,np.array([.1,.2,.3,.4]))
In [476]: x
Out[476]:
array([[ 0, 1, 2, 3],
[20, 5, 6, 7],
[ 8, 9, 10, 11]])
In [477]: y
Out[477]:
array([[ 0.1, 0.2, 0.3, 0.4],
[ 0.1, 0.2, 0.3, 0.4],
[ 0.1, 0.2, 0.3, 0.4]])
In [478]: y.strides
Out[478]: (0, 8)
In any case, in normal array use we don't have to worry about this ambiguity. We get it only with intentional actions, not accidental ones.
==============
How about this for a test:
def dupstrides(x):
    uniq = {sum(s*j for s, j in zip(x.strides, i)) for i in np.ndindex(x.shape)}
    print(uniq)
    print(len(uniq))
    print(x.size)
    return len(uniq) < x.size
In [508]: dupstrides(X)
{0, 32, 4, 36, 8, 40, 12, 44, 16, 20, 24, 28}
12
12
Out[508]: False
In [509]: dupstrides(y)
{0, 4, 8, 12, 16, 20, 24, 28}
8
12
Out[509]: True
It turns out this test is already implemented in numpy, see mem_overlap.c:842.
The test is exposed as numpy.core.multiarray_tests.internal_overlap(x).
Example:
>>> import numpy as np
>>> from numpy.core.multiarray_tests import internal_overlap
>>> from numpy.lib.stride_tricks import as_strided
Now, create a contiguous array, use as_strided to create an array with internal overlap, and confirm this with the test:
>>> x = np.arange(3*4, dtype=np.float64).reshape((3,4))
>>> y = as_strided(x, shape=(5,4), strides=(16, 8))
>>> y
array([[ 0., 1., 2., 3.],
[ 2., 3., 4., 5.],
[ 4., 5., 6., 7.],
[ 6., 7., 8., 9.],
[ 8., 9., 10., 11.]])
>>> internal_overlap(x)
False
>>> internal_overlap(y)
True
The function is optimized to quickly return False for Fortran- or C-contiguous arrays.
How can I read a Numpy array from a string? Take a string like:
"[[ 0.5544 0.4456], [ 0.8811 0.1189]]"
and convert it to an array:
a = from_string("[[ 0.5544 0.4456], [ 0.8811 0.1189]]")
where a becomes the object: np.array([[0.5544, 0.4456], [0.8811, 0.1189]]).
I'm looking for a very simple interface. A way to convert 2D arrays (of floats) to a string and then a way to read them back to reconstruct the array:
arr_to_string(array([[0.5544, 0.4456], [0.8811, 0.1189]])) should return "[[ 0.5544 0.4456], [ 0.8811 0.1189]]".
string_to_arr("[[ 0.5544 0.4456], [ 0.8811 0.1189]]") should return the object array([[0.5544, 0.4456], [0.8811, 0.1189]]).
Ideally arr_to_string would have a precision parameter that controlled the precision of floating points converted to strings, so that you wouldn't get entries like 0.4444444999999999999999999.
There's nothing I can find in the NumPy docs that does this both ways. np.save lets you make a string but then there's no way to load it back in (np.load only works for files).
The challenge is to save not only the data buffer, but also the shape and dtype. np.fromstring reads the data buffer, but as a 1d array; you have to get the dtype and shape from elsewhere.
In [184]: a=np.arange(12).reshape(3,4)
In [185]: np.fromstring(a.tostring(),int)
Out[185]: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
In [186]: np.fromstring(a.tostring(),a.dtype).reshape(a.shape)
Out[186]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
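Note that np.fromstring on binary data has since been deprecated; a modern equivalent (a sketch, using tobytes and np.frombuffer) is:
import numpy as np
a = np.arange(12).reshape(3, 4)
b = np.frombuffer(a.tobytes(), dtype=a.dtype).reshape(a.shape)  # read-only view of the bytes
assert (a == b).all()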
A time-honored mechanism for saving Python objects is pickle, and numpy is pickle-compliant:
In [169]: import pickle
In [170]: a=np.arange(12).reshape(3,4)
In [171]: s=pickle.dumps(a*2)
In [172]: s
Out[172]: "cnumpy.core.multiarray\n_reconstruct\np0\n(cnumpy\nndarray\np1\n(I0\ntp2\nS'b'\np3\ntp4\nRp5\n(I1\n(I3\nI4\ntp6\ncnumpy\ndtype\np7\n(S'i4'\np8\nI0\nI1\ntp9\nRp10\n(I3\nS'<'\np11\nNNNI-1\nI-1\nI0\ntp12\nbI00\nS'\\x00\\x00\\x00\\x00\\x02\\x00\\x00\\x00\\x04\\x00\\x00\\x00\\x06\\x00\\x00\\x00\\x08\\x00\\x00\\x00\\n\\x00\\x00\\x00\\x0c\\x00\\x00\\x00\\x0e\\x00\\x00\\x00\\x10\\x00\\x00\\x00\\x12\\x00\\x00\\x00\\x14\\x00\\x00\\x00\\x16\\x00\\x00\\x00'\np13\ntp14\nb."
In [173]: pickle.loads(s)
Out[173]:
array([[ 0, 2, 4, 6],
[ 8, 10, 12, 14],
[16, 18, 20, 22]])
There's a numpy function that can read the pickle string:
In [181]: np.loads(s)
Out[181]:
array([[ 0, 2, 4, 6],
[ 8, 10, 12, 14],
[16, 18, 20, 22]])
You mentioned np.save to a string, but that you can't use np.load. A way around that is to step further into the code, and use np.lib.npyio.format.
In [174]: import StringIO
In [175]: S=StringIO.StringIO() # a file like string buffer
In [176]: np.lib.npyio.format.write_array(S,a*3.3)
In [177]: S.seek(0) # rewind the string
In [178]: np.lib.npyio.format.read_array(S)
Out[178]:
array([[ 0. , 3.3, 6.6, 9.9],
[ 13.2, 16.5, 19.8, 23.1],
[ 26.4, 29.7, 33. , 36.3]])
The saved string has a header with dtype and shape info:
In [179]: S.seek(0)
In [180]: S.readlines()
Out[180]:
["\x93NUMPY\x01\x00F\x00{'descr': '<f8', 'fortran_order': False, 'shape': (3, 4), } \n",
'\x00\x00\x00\x00\x00\x00\x00\x00ffffff\n',
'#ffffff\x1a#\xcc\xcc\xcc\xcc\xcc\xcc##ffffff*#\x00\x00\x00\x00\x00\x800#\xcc\xcc\xcc\xcc\xcc\xcc3#\x99\x99\x99\x99\x99\x197#ffffff:#33333\xb3=#\x00\x00\x00\x00\x00\x80##fffff&B#']
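In Python 3 the same round trip works through the public API with an io.BytesIO buffer, a minimal sketch:
import io
import numpy as np
a = np.arange(12).reshape(3, 4)
buf = io.BytesIO()
np.save(buf, a)   # writes the .npy header (dtype, shape, order) plus the data buffer
buf.seek(0)       # rewind
b = np.load(buf)  # reconstructs the array from the header and data
assert (a == b).all()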
If you want a human readable string, you might try json.
In [196]: import json
In [197]: js=json.dumps(a.tolist())
In [198]: js
Out[198]: '[[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]'
In [199]: np.array(json.loads(js))
Out[199]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
Going to/from the list representation of the array is the most obvious use of json. Someone may have written a more elaborate json representation of arrays.
You could also go the csv format route - there have been lots of questions about reading/writing csv arrays.
'[[ 0.5544 0.4456], [ 0.8811 0.1189]]'
is a poor string representation for this purpose. It does look a lot like the str() of an array, but with , instead of \n. There isn't a clean way of parsing the nested [], and the missing delimiter between numbers is a pain. If it consistently used , then json could convert it to a list.
np.matrix accepts a MATLAB like string:
In [207]: np.matrix(' 0.5544, 0.4456;0.8811, 0.1189')
Out[207]:
matrix([[ 0.5544, 0.4456],
[ 0.8811, 0.1189]])
In [208]: str(np.matrix(' 0.5544, 0.4456;0.8811, 0.1189'))
Out[208]: '[[ 0.5544 0.4456]\n [ 0.8811 0.1189]]'
Forward to string:
import numpy as np
def array2str(arr, precision=None):
    s = np.array_str(arr, precision=precision)
    return s.replace('\n', ',')
Backward to array:
import re
import ast
import numpy as np
def str2array(s):
    # Remove spaces after [
    s = re.sub(r'\[ +', '[', s.strip())
    # Replace commas and runs of whitespace with ", "
    s = re.sub(r'[,\s]+', ', ', s)
    return np.array(ast.literal_eval(s))
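A round-trip example (the exact spacing inside the string may vary between NumPy versions):
>>> s = array2str(np.array([[0.5544, 0.4456], [0.8811, 0.1189]]))
>>> s
'[[0.5544 0.4456], [0.8811 0.1189]]'
>>> str2array(s)
array([[0.5544, 0.4456],
       [0.8811, 0.1189]])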
If you use repr() to convert the array to a string, the conversion back will be trivial.
I'm not sure there's an easy way to do this if you don't have commas between the numbers in your inner lists, but if you do, then you can use ast.literal_eval:
import ast
import numpy as np
s = '[[ 0.5544, 0.4456], [ 0.8811, 0.1189]]'
np.array(ast.literal_eval(s))
array([[ 0.5544, 0.4456],
[ 0.8811, 0.1189]])
EDIT: I haven't tested it very much, but you could use re to insert commas where you need them:
import re
s1 = '[[ 0.5544 0.4456], [ 0.8811 -0.1189]]'
# Replace spaces between numbers with commas:
s2 = re.sub(r'(\d) +(-|\d)', r'\1,\2', s1)
s2
'[[ 0.5544,0.4456], [ 0.8811,-0.1189]]'
and then hand on to ast.literal_eval:
np.array(ast.literal_eval(s2))
array([[ 0.5544, 0.4456],
[ 0.8811, -0.1189]])
(you need to be careful to match spaces between digits, but also spaces between a digit and a minus sign).
In my case I found the following command helpful for dumping:
string = str(array.tolist())
And for reloading:
array = np.array( eval(string) )
This should work for any dimensionality of numpy array.
numpy.fromstring() allows you to easily create 1D arrays from a string. Here's a simple function to create a 2D numpy array from a string:
import numpy as np
def str2np(strArray):
    lItems = []
    width = None
    for line in strArray.split("\n"):
        lParts = line.split()
        n = len(lParts)
        if n == 0:
            continue
        if width is None:
            width = n
        else:
            assert n == width, "invalid array spec"
        lItems.append([float(part) for part in lParts])
    return np.array(lItems)
Usage:
X = str2np("""
-2 2
-1 3
0 1
1 1
2 -1
""")
print(f"X = {X}")
Output:
X = [[-2. 2.]
[-1. 3.]
[ 0. 1.]
[ 1. 1.]
[ 2. -1.]]
I have a numpy array consisting of a lot of 0s and a few non-zero entries e.g. like this (just a toy example):
myArray = np.array([[ 0. , 0. , 0.79],
[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ],
[ 0. , 0.435 , 0. ]])
Now I would like to move each of the non-zero entries with a given probability which means that some of the entries are moved, some might remain at the current position. Some of the rows are not allowed to contain a non-zero entry which means that values are not allowed to be moved there. I implemented that as follows:
import numpy as np
# for reproducibility
np.random.seed(2)
myArray = np.array([[ 0. , 0. , 0.79],
[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ],
[ 0. , 0.435 , 0. ]])
# list of rows where numbers are not allowed to be moved to
ignoreRows = [2]
# moving probability
probMove = 0.3
# get non-zero entries
nzEntries = np.nonzero(myArray)
# indices of the non-zero entries as tuples
indNZ = zip(nzEntries[0], nzEntries[1])
# store values
valNZ = [myArray[i] for i in indNZ]
# generating probabilities for moving for each non-zero entry
lProb = np.random.rand(len(valNZ))
allowedRows = [ind for ind in xrange(myArray.shape[0]) if ind not in ignoreRows] # replace by "range" in python 3.x
allowedCols = [ind for ind in xrange(myArray.shape[1])] # replace by "range" in python 3.x
for indProb, prob in enumerate(lProb):
    # only move with a certain probability
    if prob <= probMove:
        # randomly change position
        myArray[np.random.choice(allowedRows), np.random.choice(allowedCols)] = valNZ[indProb]
        # set old position to zero
        myArray[indNZ[indProb]] = 0.
print myArray
First, I determine all the indices and values of the non-zero entries. Then I assign a certain probability to each of these entries which determines whether the entry will be moved. Then I get the allowed target rows.
In the second step, I loop through the list of indices and move them according to their moving probability which is done by choosing from the allowed rows and columns, assigning the respective value to these new indices and set the "old" value to 0.
It works fine with the code above, however, speed really matters in this case and I wonder whether there is a more efficient way of doing this.
EDIT:
hpaulj's answer helped me get rid of the for-loop, which is nice and the reason why I accepted his answer. I incorporated his comments and posted an answer below as well, just in case someone else stumbles over this example and wonders how I used his answer in the end.
You can index elements with arrays, so:
valNZ=myArray[nzEntries]
can replace the zip and comprehension.
Simplify these 2 assignments:
allowedCols=np.arange(myArray.shape[1]);
allowedRows=np.delete(np.arange(myArray.shape[0]), ignoreRows)
With:
I = lProb < probMove; valNZ = valNZ[I]; indNZ = indNZ[I]
you don't need to perform the prob < probMove test each time in the loop; just iterate over valNZ and indNZ.
I think your random.choice destinations can be generated for all of the valNZ at once:
np.random.choice(np.arange(10), 10, True)
# 10 choices from the range with replacement
With that it should be possible to move all of the points without a loop.
I haven't worked out the details yet.
There is one way in which your iterative move will be different from any parallel one. If a destination choice is another value's position, the iterative approach can overwrite it, possibly moving a given value a couple of times. Parallel code will not perform the sequential moves. You have to decide which behavior is correct.
There is a ufunc method, .at, that performs unbuffered operations. It works for operations like add, but I don't know if it would apply to an indexing move like this.
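For reference, a minimal illustration of the buffered/unbuffered difference with add (whether this mechanism can be adapted to an indexed move is, as said, unclear):
import numpy as np
arr = np.zeros(5)
idx = np.array([0, 0, 1])
arr[idx] += 1           # buffered: the duplicate index 0 is only incremented once
print(arr)              # [1. 1. 0. 0. 0.]
arr = np.zeros(5)
np.add.at(arr, idx, 1)  # unbuffered: duplicate indices accumulate
print(arr)              # [2. 1. 0. 0. 0.]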
simplified version of the iterative moving:
In [106]: arr=np.arange(20).reshape(4,5)
In [107]: I=np.nonzero(arr>10)
In [108]: v=arr[I]
In [109]: rows,cols=np.arange(4),np.arange(5)
In [110]: for i in range(len(v)):
   .....:     dest=(np.random.choice(rows),np.random.choice(cols))
   .....:     arr[dest]=v[i]
   .....:     arr[I[0][i],I[1][i]] = 0
In [111]: arr
Out[111]:
array([[ 0, 18, 2, 14, 11],
[ 5, 16, 7, 13, 19],
[10, 0, 0, 0, 0],
[ 0, 17, 0, 0, 0]])
possible vectorized version:
In [117]: dest=(np.random.choice(rows,len(v),True),np.random.choice(cols,len(v),True))
In [118]: dest
Out[118]: (array([1, 1, 3, 1, 3, 2, 3, 0, 0]), array([3, 0, 0, 1, 2, 3, 4, 0, 1]))
In [119]: arr[dest]
Out[119]: array([ 8, 5, 15, 6, 17, 13, 19, 0, 1])
In [120]: arr[I]=0
In [121]: arr[dest]=v
In [122]: arr
Out[122]:
array([[18, 19, 2, 3, 4],
[12, 14, 7, 11, 9],
[10, 0, 0, 16, 0],
[13, 0, 15, 0, 17]])
If the old positions in I are zeroed after the move, there are more zeros:
In [124]: arr[dest]=v
In [125]: arr[I]=0
In [126]: arr
Out[126]:
array([[18, 19, 2, 3, 4],
[12, 14, 7, 11, 9],
[10, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0]])
same dest, but done iteratively:
In [129]: for i in range(len(v)):
   .....:     arr[dest[0][i],dest[1][i]] = v[i]
   .....:     arr[I[0][i],I[1][i]] = 0
In [130]: arr
Out[130]:
array([[18, 19, 2, 3, 4],
[12, 14, 7, 11, 9],
[10, 0, 0, 16, 0],
[ 0, 0, 0, 0, 0]])
With this small size and high moving density, the differences between the iterative and vectorized solutions are large. For a sparse array they would be smaller.
Below you can find the code I came up with after incorporating hpaulj's answer and the answer from this question. This way, I got rid of the for-loop which improved the code a lot. Therefore, I accepted hpaulj's answer. Maybe the code below helps someone else in a similar situation.
import numpy as np
from itertools import compress
# for reproducibility
np.random.seed(2)
myArray = np.array([[ 0. , 0.2 , 0.79],
[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ],
[ 0. , 0.435 , 0. ]])
# list of rows where numbers are not allowed to be moved to
ignoreRows= []
# moving probability
probMove = 0.5
# get non-zero entries
nzEntries = np.nonzero(myArray)
# indices of the non-zero entries as tuples
indNZ = zip(nzEntries[0],nzEntries[1])
# store values
valNZ = myArray[nzEntries]
# generating probabilities for moving for each non-zero entry
lProb = np.random.rand(len(valNZ))
# get the rows/columns where the entries are allowed to be moved
allowedCols = np.arange(myArray.shape[1]);
allowedRows = np.delete(np.arange(myArray.shape[0]), ignoreRows)
# get the entries that are actually moved
I = lProb < probMove
print I
# get the values of the entries that are moved
valNZ = valNZ[I]
# get the indices of the entries that are moved
indNZ = list(compress(indNZ, I))
# get the destination for the entries that are moved
dest = (np.random.choice(allowedRows, len(valNZ), True), np.random.choice(allowedCols, len(valNZ), True))
print myArray
print indNZ
print dest
# set the old indices to 0
myArray[zip(*indNZ)] = 0
# move the values to their respective destination
myArray[dest] = valNZ
print myArray
I have a 1D array in NumPy that implicitly represents some 2D data in row-major order. Here's a trivial example:
import numpy as np
# My data looks like [[1,2,3,4], [5,6,7,8]]
a = np.array([1,2,3,4,5,6,7,8])
I want to get a 1D array in column-major order (ie. b = [1,5,2,6,3,7,4,8] in the example above).
Normally, I would just do the following:
mat = np.reshape(a, (-1,4))
b = mat.flatten('F')
Unfortunately, the length of my input array is not an exact multiple of the row length I want (ie. a = [1,2,3,4,5,6,7]), so I can't call reshape. I want to keep that extra data, though, which might be quite a lot since my rows are pretty long. Is there any straightforward way to do this in NumPy?
The simplest way I can think of is not to try and use reshape with methods such as ravel('F'), but just to concatenate sliced views of your array.
For example:
>>> cols = 4
>>> a = np.array([1,2,3,4,5,6,7])
>>> np.concatenate([a[i::cols] for i in range(cols)])
array([1, 5, 2, 6, 3, 7, 4])
This works for any length of array and any number of columns:
>>> cols = 5
>>> b = np.arange(17)
>>> np.concatenate([b[i::cols] for i in range(cols)])
array([ 0, 5, 10, 15, 1, 6, 11, 16, 2, 7, 12, 3, 8, 13, 4, 9, 14])
Alternatively, use as_strided to reshape. The fact that the array a is too small to fit the (2, 4) shape doesn't matter: you'll just get junk (i.e. whatever's in memory) in the last place:
>>> np.lib.stride_tricks.as_strided(a, shape=(2, 4))
array([[ 1, 2, 3, 4],
[ 5, 6, 7, 168430121]])
>>> _.flatten('F')[:7]
array([1, 5, 2, 6, 3, 7, 4])
In the general case, given an array b and a desired number of columns cols you can do this:
>>> x = np.lib.stride_tricks.as_strided(b, shape=(len(b)//cols + 1, cols)) # reshape to min 2d array needed to hold array b
>>> np.concatenate((x[:,:len(b)%cols].ravel('F'), x[:-1, len(b)%cols:].ravel('F')))
This unravels the "good" part of the array (those columns not containing junk values) and the bad part (except for the junk values which lie in the bottom row) and concatenates the two unraveled arrays. For example:
>>> cols = 5
>>> b = np.arange(17)
>>> x = np.lib.stride_tricks.as_strided(b, shape=(len(b)//cols + 1, cols))
>>> np.concatenate((x[:,:len(b)%cols].ravel('F'), x[:-1, len(b)%cols:].ravel('F')))
array([ 0, 5, 10, 15, 1, 6, 11, 16, 2, 7, 12, 3, 8, 13, 4, 9, 14])
Pad the array with a null value so that its length is a multiple of the row length you want. If casting to float is acceptable, you could use NaNs to represent the added elements. Then reshape to 2D, transpose, reshape back to 1D, and finally eliminate the nulls.
import numpy as np
a = np.array([1,2,3,4,5,6,7]) # input
b = np.concatenate( (a, [np.NaN]) ) # add a NaN to make it 8 = 4x2
c = b.reshape(2,4).transpose().reshape(8,) # reshape to 2x4, transpose, reshape to 8x1
d = c[~np.isnan(c)] # remove the NaN entries (boolean negation with ~)
print d
[ 1. 5. 2. 6. 3. 7. 4.]
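A variant of the same padding idea that avoids the cast to float: flatten a boolean mask in the same 'F' order and use it to drop the padding, so integer arrays keep their dtype (a sketch, assuming 4 columns):
import numpy as np
cols = 4
a = np.array([1, 2, 3, 4, 5, 6, 7])
pad = (-len(a)) % cols  # number of elements needed to complete the last row
padded = np.concatenate((a, np.zeros(pad, a.dtype)))
mask = np.concatenate((np.ones(len(a), bool), np.zeros(pad, bool)))
b = padded.reshape(-1, cols).flatten('F')[mask.reshape(-1, cols).flatten('F')]
print(b)  # [1 5 2 6 3 7 4]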