Is it possible, in a fast way, to create a (large) 2D numpy array which contains a value n times per row (randomly placed)? E.g., for n = 3:
1 0 1 0 1
0 0 1 1 1
1 1 1 0 0
...
Same as 1., but place a contiguous group of size n randomly in each row, e.g.
1 1 1 0 0
0 0 1 1 1
1 1 1 0 0
...
Of course, I could loop over all rows, but I am wondering if there's a way to create the array using np.fromfunction or some faster way?
The answer to your first question has a simple one-line solution, which I imagine is pretty efficient. Functions like np.random.shuffle or np.random.permutation must be doing something similar under the hood, but they require a Python loop over the rows, which can become a problem if you have very many short rows.
The second question also has a pure numpy solution which should be quite efficient, although it is a little less elegant.
import numpy as np
rows = 20
cols = 10
n = 3
# fixed number of ones per row, in random places
print((np.argsort(np.random.rand(rows, cols)) < n).view(np.uint8))

# fixed number of ones per row, in a random contiguous place
data = np.zeros((rows, cols), np.uint8)
I = np.arange(rows * n) // n  # each row index repeated n times
J = (np.random.randint(0, cols - n + 1, (rows, 1)) + np.arange(n)).flatten()
data[I, J] = 1
print(data)
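As an aside, newer NumPy (>= 1.20) can shuffle each row independently without an explicit Python loop via Generator.permuted; a minimal sketch of that alternative, reusing the rows/cols/n above:

rng = np.random.default_rng()
base = np.zeros((rows, cols), np.uint8)
base[:, :n] = 1                     # start every row with n ones on the left
print(rng.permuted(base, axis=1))   # shuffles each row independently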
Edit: here is a slightly longer, but more elegant and more performant solution to your second question:
import numpy as np
rows = 20
cols = 10
n = 3
def running_view(arr, window, axis=-1):
    """
    Return a running view of length 'window' over 'axis'.
    The returned array has an extra last dimension, which spans the window.
    """
    shape = list(arr.shape)
    shape[axis] -= (window - 1)
    assert shape[axis] > 0
    return np.lib.stride_tricks.as_strided(
        arr,
        shape + [window],
        arr.strides + (arr.strides[axis],))
# fixed number of ones per row, in a random contiguous place
data = np.zeros((rows, cols), np.uint8)
I = np.arange(rows)
J = np.random.randint(0, cols - n + 1, rows)
running_view(data, n)[I, J, :] = 1
print(data)
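For what it's worth, NumPy >= 1.20 ships a built-in equivalent of this helper, sliding_window_view; a sketch of the same assignment using it (the writeable=True flag is needed because the view is read-only by default):

from numpy.lib.stride_tricks import sliding_window_view
view = sliding_window_view(data, n, axis=1, writeable=True)  # shape (rows, cols - n + 1, n)
view[I, J, :] = 1
print(data)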
First of all, you need to import some functions from numpy:
from numpy.random import rand, randint
from numpy import array, argsort
Case 1:
a = rand(10, 5)
n = 3  # number of 1's per row
b = []
for i in range(len(a)):
    b.append((argsort(a[i]) >= (len(a[i]) - n)) * 1)
b = array(b)
Result:
print(b)
array([[ 1, 0, 0, 1, 1],
[ 1, 0, 0, 1, 1],
[ 0, 1, 0, 1, 1],
[ 1, 0, 1, 0, 1],
[ 1, 0, 0, 1, 1],
[ 1, 1, 0, 0, 1],
[ 0, 1, 1, 1, 0],
[ 0, 1, 1, 0, 1],
[ 1, 0, 1, 0, 1],
[ 0, 1, 1, 1, 0]])
Case 2:
a = rand(10, 5)
n = 3  # max number of 1's per row
b = []
for i in range(len(a)):
    k = randint(0, n + 1)  # this row gets k ones, 0 <= k <= n
    b.append((argsort(a[i]) >= (len(a[i]) - k)) * 1)
b = array(b)
Result:
print(b)
array([[ 0, 0, 1, 0, 0],
[ 0, 1, 0, 1, 0],
[ 1, 0, 1, 0, 1],
[ 0, 1, 1, 0, 0],
[ 1, 0, 1, 0, 0],
[ 1, 0, 0, 1, 1],
[ 0, 1, 1, 0, 1],
[ 1, 0, 1, 0, 0],
[ 1, 1, 0, 1, 0],
[ 1, 0, 1, 1, 0]])
I think that could work. To get the result, I generate rows of random floats and use argsort to see which of them are the n biggest in each row, then I convert the booleans to ints (boolean * 1 -> int).
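For what it's worth, the same argsort idea also works on the whole array at once, without the row loop; a minimal sketch using the imports above:

a = rand(10, 5)
n = 3
b = (argsort(a, axis=1) >= (a.shape[1] - n)) * 1  # exactly n ones per row, random positions
print(b)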
Just for the fun of it, I tried to find a solution for your first question even though I'm quite new to Python. Here is what I have so far:
np.vstack([np.hstack(np.random.permutation([np.random.randint(0,2),
np.random.randint(0,2), np.random.randint(0,2), 0, 0, 0])),
np.hstack(np.random.permutation([np.random.randint(0,2),
np.random.randint(0,2), np.random.randint(0,2), 0, 0, 0])),
np.hstack(np.random.permutation([np.random.randint(0,2),
np.random.randint(0,2), np.random.randint(0,2), 0, 0, 0])),
np.hstack(np.random.permutation([np.random.randint(0,2),
np.random.randint(0,2), np.random.randint(0,2), 0, 0, 0])),
np.hstack(np.random.permutation([np.random.randint(0,2),
np.random.randint(0,2), np.random.randint(0,2), 0, 0, 0])),
np.hstack(np.random.permutation([np.random.randint(0,2),
np.random.randint(0,2), np.random.randint(0,2), 0, 0, 0]))])
array([[1, 0, 0, 0, 0, 0],
[0, 1, 0, 1, 0, 0],
[0, 1, 0, 1, 0, 1],
[0, 1, 0, 1, 0, 0],
[0, 1, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 1]])
It is not the final answer, but maybe it can help you find an alternate solution using random numbers and permutation.
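If it helps, the six repeated hstack/permutation calls can be collapsed with a list comprehension; a compact sketch of the same idea (still up to three ones per row, not exactly three):

np.vstack([np.random.permutation([np.random.randint(0, 2), np.random.randint(0, 2),
                                  np.random.randint(0, 2), 0, 0, 0])
           for _ in range(6)])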
How would I transform
a=[[0,6],
[3,7],
[5,5]]
into
b=[[1,0,0,0,0,0,1,0],
[0,0,0,1,0,0,0,1],
[0,0,0,0,0,1,0,0]]
Note how the final row of b has only one value set to 1, due to the repeated index in the final row of a.
Using indexing:
a = np.array([[0,6],
[3,7],
[5,5]])
b = np.zeros((len(a), a.max()+1), dtype=int)
b[np.arange(len(a)), a.T] = 1
Output:
array([[1, 0, 0, 0, 0, 0, 1, 0],
[0, 0, 0, 1, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 1, 0, 0]])
This can also be done using numpy broadcasting and boolean comparison in the following way:
a = np.array([[0,6],
[3,7],
[5,5]])
# Add a new axis so each element can be compared against the whole
# range of possible indices; matching positions become True.
b = (a[:,:,None] == np.arange(a.max()+1))
# Check if any element along axis 1 is true and convert the type to int
b = b.any(1).astype(int)
Output:
array([[1, 0, 0, 0, 0, 0, 1, 0],
[0, 0, 0, 1, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 1, 0, 0]])
I need to insert a matrix inside another one using numpy
The matrix I need to insert is like this one:
tetraminos = [[0, 1, 0],
[1, 1, 1]]
While the other matrix is like this:
board = numpy.array([
[6,0,0,0,0,0,0,0,0,0],
[6,0,0,0,0,0,0,0,0,0]
])
The code I'm currently using is this:
board[0:0 + len(tetraminos), 0:0 + len(tetraminos[0])] = tetraminos
The wrong matrix I'm getting is this one:
wrong_matrix = numpy.array([
    [0,1,0,0,0,0,0,0,0,0],
    [1,1,1,0,0,0,0,0,0,0]
])
while the expected result is:
expected_result = numpy.array([
[6,1,0,0,0,0,0,0,0,0],
[1,1,1,0,0,0,0,0,0,0]
])
The problem is that, since the inserted matrix contains 0s, I lose the first value in the first row of the board (the number 6), which I wanted to keep.
Full code:
import numpy

if __name__ == '__main__':
    board = numpy.array([
        [6,0,0,0,0,0,0,0,0,0],
        [6,0,0,0,0,0,0,0,0,0]
    ])
    tetraminos = [[0, 1, 0], [1, 1, 1]]
    board[0:0 + len(tetraminos), 0:0 + len(tetraminos[0])] = tetraminos
    print(board)
    expected_result = numpy.array([
        [6,1,0,0,0,0,0,0,0,0],
        [1,1,1,0,0,0,0,0,0,0]
    ])
    exit(1)
As long as you always want to put a constant value in there, you can treat your tetramino as a mask and use the np.putmask function:
>>> board = np.array([[6,0,0,0,0,0,0,0,0],[6,0,0,0,0,0,0,0,0]])
>>> board
array([[6, 0, 0, 0, 0, 0, 0, 0, 0],
[6, 0, 0, 0, 0, 0, 0, 0, 0]])
>>> tetraminos = [[0,1,0],[1,1,1]]
>>> np.putmask(board[0:len(tetraminos),0:len(tetraminos[0])], tetraminos,1)
>>> board
array([[6, 1, 0, 0, 0, 0, 0, 0, 0],
[1, 1, 1, 0, 0, 0, 0, 0, 0]])
You might do it in two steps:
tetraminos = np.array([[0, 1, 0], [1, 1, 1]])
temp = board[0:0 + tetraminos.shape[0], 0:0 + tetraminos.shape[1]]
board[0:0 + tetraminos.shape[0], 0:0 + tetraminos.shape[1]] = np.where(tetraminos == 0, temp, tetraminos)
Output:
array([[6, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 1, 1, 0, 0, 0, 0, 0, 0, 0]])
A variant on the putmask approach:
In [243]: board = np.array([
...: [6,0,0,0,0,0,0,0,0,0],
...: [6,0,0,0,0,0,0,0,0,0]
...: ])
In [244]: tetraminos = np.array([[0, 1, 0],
...: [1, 1, 1]])
In [245]: aview = board[:tetraminos.shape[0],:tetraminos.shape[1]]
Use the tetraminos as a boolean mask to select the slots of aview to put values into:
In [246]: aview[tetraminos.astype(bool)]=1
In [247]: board
Out[247]:
array([[6, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 1, 1, 0, 0, 0, 0, 0, 0, 0]])
This could be generalized if tetraminos has other non-zero values.
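For instance, a minimal sketch of that generalization, assuming tetraminos may hold arbitrary non-zero values that should overwrite the board while zeros leave it untouched:

mask = tetraminos != 0
aview[mask] = tetraminos[mask]  # copy only the non-zero cells; the board shows through elsewhere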
I have a one-dimensional numpy array consisting of ones and zeroes, like this:
[0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1]
For each non-zero element of the array, I want to calculate the "distance" to the next non-zero element. That is, I want to answer the question "How far away is the next non-zero element?" So the result for the above array would be:
[0, 0, 0, 1, 3, 0, 0, 6, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0]
Is there a built-in numpy function for this? And if not, what's the most efficient way to implement this in numpy?
Here is a compact in-place solution. If you don't want to overwrite the original a, work on a.copy():
import numpy as np
a = np.array([0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1])
ix = np.where(a)[0]
a[ix[:-1]] = np.diff(ix)
a[ix[-1]] = 0  # the last non-zero element has no successor
print(a)  # --> array([0, 0, 0, 1, 3, 0, 0, 6, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0])
Probably not the best answer.
np.where will give you the locations of the non-zero indices in increasing order. By iterating through the result, you know the location of each 1 and the location of the following 1, and can build the result array yourself easily. If the 1s are sparse, this is probably pretty efficient.
Let me see if I can think of something more numpy-ish.
== UPDATE ==
Ah, it just came to me:
# Find the ones in the array
temp = np.where(x)[0]
# find the difference between adjacent elements
deltas = temp[1:] - temp[:-1]
# Build the result based on these
result = np.zeros_like(x)
result[temp[:-1]] = deltas
Let's try:
import numpy as np
arr = np.array([0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1])
# create output
res = np.zeros_like(arr)
# select indices non-zero
where, = np.where(arr)
# assign the indices of the non-zero the diff
res[where[:-1]] = np.diff(where)
print(res)
Output
[0 0 0 1 3 0 0 6 0 0 0 0 0 4 0 0 0 0]
I have a matrix A =
np.matrix([[0, 0, 0, 0, 0],
           [0, 0, 0, 1, 1],
           [0, 0, 1, 0, 0],
           [0, 0, 0, 0, 0],
           [0, 0, 1, 1, 1]])
I want to build a matrix B where B[i,j] = 5 if A[i,j] = 1 and (j+1) % 3 == 0 (as the expected output below shows, the condition is on the column index); B[i,j] = 0 otherwise.
The B should be: B =
np.matrix([[0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0],
           [0, 0, 5, 0, 0],
           [0, 0, 0, 0, 0],
           [0, 0, 5, 0, 0]])
Is there any way to achieve this without using a for loop, just with matrix calculations? Thank you.
UPDATED ANSWER:
I have not thought of a way to eliminate the for-loop in the list comprehension for filtering on the remainder condition, but the "heavy" part of the computation exploits numpy's optimizations.
import numpy as np
newdata = 5 * np.array([(i + 1) % 3 == 0 for i in range(data.shape[-1])]) * np.array(data)
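For what it's worth, the remaining loop can also be eliminated with np.arange; a minimal sketch, assuming data holds the matrix A from the question:

import numpy as np
data = np.array([[0, 0, 0, 0, 0],
                 [0, 0, 0, 1, 1],
                 [0, 0, 1, 0, 0],
                 [0, 0, 0, 0, 0],
                 [0, 0, 1, 1, 1]])
mask = (np.arange(data.shape[-1]) + 1) % 3 == 0  # True where (j + 1) % 3 == 0
newdata = 5 * mask * data                        # the mask broadcasts across every row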
ORIGINAL ANSWER (prior to condition that for-loops cannot be used):
Assuming your matrix is stored as data, you can use list comprehension syntax to get what you want.
newdata = [[5 if val == 1 and (idx + 1) % 3 == 0 else 0
for idx, val in enumerate(row)]
for row in data]
I have a Python numpy array like this. Let's call it my_numpy_array; it can go up to a million values!
>>> my_numpy_array
array([[1, 0, 0, 0, 0, 0, 1, 0],
[1, 1, 0, 0, 0, 0, 1, 1],
[1, 1, 0, 0, 0, 0, 1, 0],
. . . . . . . . . . . .
. . . . . . . . . . . .
[0, 0, 0, 0, 0, 0, 1, 1],
[1, 1, 1, 0, 0, 0, 1, 1],
[0, 0, 0, 0, 0, 0, 1, 1],
[0, 0, 1, 0, 0, 0, 1, 1]])
and another numpy array like this, call it second_array (which is not so huge):
array([[1, 1, 1, 0, 0, 0, 0, 1], #row 1
[1, 1, 0, 0, 0, 0, 1, 0], #row 2
[1, 1, 1, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 0, 1, 1],
[1, 1, 1, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 0, 1, 1],
[1, 1, 1, 0, 0, 0, 0, 1],
[1, 1, 0, 0, 0, 0, 1, 0],
....................... #row 9
[1, 1, 1, 0, 0, 0, 0, 1]]) #can be any number of ROWS!!!
I want to XOR these 9 rows (9 here is X, i.e. it can be any number) with every 9 rows in my_numpy_array. I tried working with np.logical_xor() but couldn't do what I wanted!
Also note that if the number of rows in my_numpy_array is not a multiple of 9 (i.e. X), say the number of rows is 2701, then for the first 2700 there is no problem, but the last row should be XOR-ed with only the first row of second_array; if it were 2702, then with only the first two rows of second_array.
Any help much appreciated! Thanks
If the XOR filter is just one row, you could simply use numpy broadcasting:
arr = np.asarray([ [1, 0, 0, 0, 0, 0, 1, 0],
[1, 1, 0, 0, 0, 0, 1, 1],
[1, 1, 0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 0, 1, 1],
[1, 1, 1, 0, 0, 0, 1, 1],
[0, 0, 0, 0, 0, 0, 1, 1],
[0, 0, 1, 0, 0, 0, 1, 1]])
filt = np.asarray([1, 1, 1, 0, 0, 0, 0, 1])
res = arr ^ filt
If not, it doesn't look so pretty:
arr = np.asarray([ [1, 0, 0, 0, 0, 0, 1, 0],
[1, 1, 0, 0, 0, 0, 1, 1],
[1, 1, 0, 0, 0, 0, 1, 0],
[1, 1, 1, 0, 0, 0, 1, 1],
[0, 0, 0, 0, 0, 0, 1, 1],
[1, 1, 1, 0, 0, 0, 1, 1],
[0, 0, 0, 0, 0, 0, 1, 1]])
filt = np.asarray([[1, 1, 1, 0, 0, 0, 0, 1],
[0, 0, 1, 0, 1, 0, 0, 0]])
filt_rows = filt.shape[0]
arr_rows = arr.shape[0]
res = arr ^ np.tile(filt, (1 + arr_rows // filt_rows ,1))[:arr_rows,:]
The filter rows are tiled to a larger array than your my_numpy_array and then cut back by slicing, so both arrays have the same shape. Not sure how this performs at larger sizes, since it makes a copy of the array rather than working in place.
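A slightly more direct sketch builds the cyclic filter with modular fancy indexing (it still materializes an array the size of arr, but skips the over-allocation and slicing), assuming row k of arr should be XOR-ed with row k % filt_rows of filt:

res = arr ^ filt[np.arange(arr.shape[0]) % filt.shape[0]]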
Method 1: tile
x = np.ones((271, 10))
y = np.zeros((9, 10))
np.logical_xor(x, np.tile(y, (x.shape[0]//y.shape[0]+1, 1))[:x.shape[0],:])
Method 1 tiles y enough times and slices the result down to x's number of rows. (Note that np.tile cycles through y's rows as required; np.repeat with axis=0 would instead repeat each individual row consecutively, which is not what is wanted here.)
Method 2: reshape
def method2(x, y):
    ry, ly = y.shape
    rx, lx = x.shape
    full = rx // ry * ry  # number of rows forming whole blocks of ry rows
    arr1 = np.logical_xor(x[:full].reshape((-1, ry, ly)),  # (rx//ry, ry, ly) blocks
                          y.reshape((1, ry, ly)))          # y broadcasts over the blocks
    arr2 = np.logical_xor(x[full:], y[:rx % ry, :])        # remainder part
    return np.append(arr1.reshape((full, lx)), arr2, axis=0)
For method 2, we split x into two parts: the part whose row count is a multiple of y's row count, and the remainder. Taking the OP's example, we split 2702 rows into 2700 rows and 2 rows, because 2700 is a multiple of 9 and 2 rows remain. (The full = rx // ry * ry computation does the split.)
For the 2700-row part, we can reshape it into a three-dimensional array of shape (300, 9, 10), i.e. 300 blocks of 9 rows. We then reshape y to (1, 9, 10), so while performing np.logical_xor, y is broadcast against each of the 300 blocks. See broadcasting for more information.
We also perform xor on the 2-row remainder and then use np.append to glue the two results together.
Timing: for large x, method 2 is faster
x = np.ones((2720000, 10))
y = np.zeros((9, 10))
%timeit method2(x, y)
52.5 ms ± 342 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit np.logical_xor(x, np.tile(y, (x.shape[0]//y.shape[0]+1, 1))[:x.shape[0],:])
175 ms ± 5.51 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Tiling creates a new array the same size as x, while the broadcasting/reshape method does not. Thus, method 1 can cost more time when x is big.
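A quick sanity check (a sketch with small toy arrays) that the two methods agree:

x = (np.arange(50).reshape(10, 5) % 3 == 0)
y = (np.arange(15).reshape(3, 5) % 2 == 0)
m1 = np.logical_xor(x, np.tile(y, (x.shape[0]//y.shape[0]+1, 1))[:x.shape[0], :])
assert (method2(x, y) == m1).all()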