Numpy array creation with patterns - python

Is it possible, in a fast way, to create a (large) 2D numpy array which
1. contains a value n times per row (randomly placed), e.g., for n = 3
1 0 1 0 1
0 0 1 1 1
1 1 1 0 0
...
2. same as 1., but with a contiguous group of size n placed randomly in each row, e.g.
1 1 1 0 0
0 0 1 1 1
1 1 1 0 0
...
Of course, I could enumerate all rows, but I am wondering if there's a way to create the array using np.fromfunction or some faster way?

The answer to your first question has a simple one-line solution, which I imagine is pretty efficient. Functions like np.random.shuffle or np.random.permutation must be doing something similar under the hood, but they require a python loop over the rows, which might become a problem if you have very many short rows.
The second question also has a pure numpy solution which should be quite efficient, although it is a little less elegant.
import numpy as np
rows = 20
cols = 10
n = 3
# fixed number of ones per row in random places:
# argsort of random floats is a random permutation of 0..cols-1 per row,
# so exactly n entries in each row are < n
print((np.argsort(np.random.rand(rows, cols)) < n).view(np.uint8))
# fixed number of ones per row in a random contiguous place
data = np.zeros((rows, cols), np.uint8)
I = np.arange(rows*n)//n  # row index repeated n times (integer division)
J = (np.random.randint(0,cols-n+1, (rows,1))+np.arange(n)).flatten()
data[I, J] = 1
print(data)
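As a cross-check on the argsort one-liner, newer NumPy versions (1.20 and later, as far as I know) provide Generator.permuted, which shuffles each row independently without a Python loop. The sketch below assumes that API is available; the helper name random_ones_per_row is made up for illustration.

```python
import numpy as np

def random_ones_per_row(rows, cols, n, seed=None):
    """Return a (rows, cols) uint8 array with exactly n ones per row."""
    rng = np.random.default_rng(seed)
    # every row starts as n ones followed by zeros
    base = np.zeros((rows, cols), dtype=np.uint8)
    base[:, :n] = 1
    # permuted with axis=1 shuffles each row independently of the others
    return rng.permuted(base, axis=1)

data = random_ones_per_row(rows=20, cols=10, n=3, seed=0)
print(data)
```

Whether this is faster than the argsort version likely depends on the array shape, so it is worth timing both on your own sizes.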
Edit: here is a slightly longer, but more elegant and more performant solution to your second question:
import numpy as np
rows = 20
cols = 10
n = 3
def running_view(arr, window, axis=-1):
    """
    return a running view of length 'window' over 'axis'
    the returned array has an extra last dimension, which spans the window
    """
    shape = list(arr.shape)
    shape[axis] -= (window-1)
    assert(shape[axis]>0)
    # as_strided lives in np.lib.stride_tricks
    return np.lib.stride_tricks.as_strided(
        arr,
        shape + [window],
        arr.strides + (arr.strides[axis],))
# fixed number of ones per row in a random contiguous place
data = np.zeros((rows, cols), np.uint8)
I = np.arange(rows)
J = np.random.randint(0,cols-n+1, rows)
running_view(data, n)[I,J,:] = 1
print(data)
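If the strided view feels too low-level, the contiguous case can also be sketched with plain broadcasting: pick a random start column per row and compare it against the column indices. This is an alternative technique, not the one used above, and the helper name random_block_per_row is hypothetical.

```python
import numpy as np

def random_block_per_row(rows, cols, n, seed=None):
    """Return a (rows, cols) uint8 array with one contiguous run of n ones per row."""
    rng = np.random.default_rng(seed)
    # random start column for each row, shape (rows, 1) so it broadcasts
    start = rng.integers(0, cols - n + 1, size=(rows, 1))
    col = np.arange(cols)
    # True where start <= column < start + n
    return ((col >= start) & (col < start + n)).astype(np.uint8)

data = random_block_per_row(rows=20, cols=10, n=3, seed=0)
print(data)
```

This builds a full boolean array of the output size, so it trades some memory for not needing as_strided.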

First of all you need to import some functions of numpy:
from numpy.random import rand, randint
from numpy import array, argsort
Case 1:
a = rand(10,5)
b=[]
for i in range(len(a)):
    n=3 #number of 1's
    b.append((argsort(a[i])>=(len(a[i])-n))*1)
b=array(b)
Result:
print(b)
array([[ 1, 0, 0, 1, 1],
[ 1, 0, 0, 1, 1],
[ 0, 1, 0, 1, 1],
[ 1, 0, 1, 0, 1],
[ 1, 0, 0, 1, 1],
[ 1, 1, 0, 0, 1],
[ 0, 1, 1, 1, 0],
[ 0, 1, 1, 0, 1],
[ 1, 0, 1, 0, 1],
[ 0, 1, 1, 1, 0]])
Case 2:
a = rand(10,5)
b=[]
for i in range(len(a)):
    n=3 #max number of 1's
    n=randint(0,(n+1)) #random count between 0 and 3 for this row
    b.append((argsort(a[i])>=(len(a[i])-n))*1)
b=array(b)
Result:
print(b)
array([[ 0, 0, 1, 0, 0],
[ 0, 1, 0, 1, 0],
[ 1, 0, 1, 0, 1],
[ 0, 1, 1, 0, 0],
[ 1, 0, 1, 0, 0],
[ 1, 0, 0, 1, 1],
[ 0, 1, 1, 0, 1],
[ 1, 0, 1, 0, 0],
[ 1, 1, 0, 1, 0],
[ 1, 0, 1, 1, 0]])
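I think that should work. To get the result I generate rows of random floats and apply argsort, which turns each row into a random permutation of the column indices. Comparing that permutation against len(a[i]) - n marks exactly n random positions, and multiplying the boolean result by 1 converts it to ints.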

Just for the fun of it, I tried to find a solution for your first question, even though I'm quite new to Python. Here is what I have so far:
np.vstack([np.hstack(np.random.permutation([np.random.randint(0,2),
np.random.randint(0,2), np.random.randint(0,2), 0, 0, 0])),
np.hstack(np.random.permutation([np.random.randint(0,2),
np.random.randint(0,2), np.random.randint(0,2), 0, 0, 0])),
np.hstack(np.random.permutation([np.random.randint(0,2),
np.random.randint(0,2), np.random.randint(0,2), 0, 0, 0])),
np.hstack(np.random.permutation([np.random.randint(0,2),
np.random.randint(0,2), np.random.randint(0,2), 0, 0, 0])),
np.hstack(np.random.permutation([np.random.randint(0,2),
np.random.randint(0,2), np.random.randint(0,2), 0, 0, 0])),
np.hstack(np.random.permutation([np.random.randint(0,2),
np.random.randint(0,2), np.random.randint(0,2), 0, 0, 0]))])
array([[1, 0, 0, 0, 0, 0],
[0, 1, 0, 1, 0, 0],
[0, 1, 0, 1, 0, 1],
[0, 1, 0, 1, 0, 0],
[0, 1, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 1]])
It is not the final answer, but maybe it can help you find an alternate solution using random numbers and permutation.
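Building on the permutation idea, one way to get exactly n ones per row (rather than a random count) is to permute a fixed template row once per output row. This is only a sketch with made-up variable names; it still loops over rows in Python, so it is probably slower than the argsort approach when there are many rows.

```python
import numpy as np

rows, cols, n = 6, 6, 3
# template row: n ones followed by zeros
base = np.array([1] * n + [0] * (cols - n))
# one independent permutation of the template per row (Python-level loop)
result = np.vstack([np.random.permutation(base) for _ in range(rows)])
print(result)
```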

Related

Transform 2d numpy array into 2d one hot encoding

How would I transform
a=[[0,6],
[3,7],
[5,5]]
into
b=[[1,0,0,0,0,0,1,0],
[0,0,0,1,0,0,0,1],
[0,0,0,0,0,1,0,0]]
Note that the last row of b has only one value set to 1, because the last row of a repeats the same index.
Using indexing:
a = np.array([[0,6],
[3,7],
[5,5]])
b = np.zeros((len(a), a.max()+1), dtype=int)
b[np.arange(len(a)), a.T] = 1
Output:
array([[1, 0, 0, 0, 0, 0, 1, 0],
[0, 0, 0, 1, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 1, 0, 0]])
This can also be done using numpy broadcasting and boolean comparison in the following way:
a = np.array([[0,6],
[3,7],
[5,5]])
# Add a trailing axis so every element can be compared against each index 0..max
# The comparison is True where an element equals the index along the last axis
b = (a[:,:,None] == np.arange(a.max()+1))
# Check if any element along axis 1 is true and convert the type to int
b = b.any(1).astype(int)
Output:
array([[1, 0, 0, 0, 0, 0, 1, 0],
[0, 0, 0, 1, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 1, 0, 0]])
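Both approaches infer the number of columns from a.max(), which may be too narrow if the largest class happens not to appear in a. A small wrapper with an explicit width could look like the sketch below; the function name multi_hot and the num_classes parameter are assumptions for illustration, not part of the answers above.

```python
import numpy as np

def multi_hot(a, num_classes=None):
    """Set out[i, a[i, k]] = 1 for every index listed in row i of a."""
    a = np.asarray(a)
    if num_classes is None:
        num_classes = a.max() + 1
    out = np.zeros((a.shape[0], num_classes), dtype=int)
    rows = np.arange(a.shape[0])[:, None]  # shape (rows, 1), broadcasts against a
    out[rows, a] = 1                       # repeated indices just set the same cell
    return out

b = multi_hot([[0, 6], [3, 7], [5, 5]])
print(b)
```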

Insert matrix inside another matrix using numpy without overwriting some original values

I need to insert a matrix inside another one using numpy.
The matrix I need to insert looks like this:
tetraminos = [[0, 1, 0],
[1, 1, 1]]
While the other matrix is like this:
board = numpy.array([
[6,0,0,0,0,0,0,0,0,0],
[6,0,0,0,0,0,0,0,0,0]
])
The code I'm currently using is this:
board[0:0 + len(tetraminos), 0:0 + len(tetraminos[0])] = tetraminos
The wrong matrix that I'm getting is this one:
wrong_matrix = numpy.array([
[0,1,0,0,0,0,0,0,0,0],
[1,1,1,0,0,0,0,0,0,0]
])
while the expected result is:
expected_result = numpy.array([
[6,1,0,0,0,0,0,0,0,0],
[1,1,1,0,0,0,0,0,0,0]
])
The problem is that, since the tetramino matrix contains 0s, inserting it into the board overwrites the first value in the first row (the number 6), which I wanted to keep.
Full code:
import numpy
if __name__ == '__main__':
    board = numpy.array([
        [6,0,0,0,0,0,0,0,0,0],
        [6,0,0,0,0,0,0,0,0,0]
    ])
    tetraminos = [[0, 1, 0], [1, 1, 1]]
    board[0:0 + len(tetraminos), 0:0 + len(tetraminos[0])] = tetraminos
    print(board)
    expected_result = numpy.array([
        [6,1,0,0,0,0,0,0,0,0],
        [1,1,1,0,0,0,0,0,0,0]
    ])
    exit(1)
As long as you always want to put a constant value in there, you can treat your tetramino as a mask and use the np.putmask function:
>>> board = np.array([[6,0,0,0,0,0,0,0,0],[6,0,0,0,0,0,0,0,0]])
>>> board
array([[6, 0, 0, 0, 0, 0, 0, 0, 0],
[6, 0, 0, 0, 0, 0, 0, 0, 0]])
>>> tetraminos = [[0,1,0],[1,1,1]]
>>> np.putmask(board[0:len(tetraminos),0:len(tetraminos[0])], tetraminos,1)
>>> board
array([[6, 1, 0, 0, 0, 0, 0, 0, 0],
[1, 1, 1, 0, 0, 0, 0, 0, 0]])
You might do it in two steps:
tetraminos = np.array([[0, 1, 0], [1, 1, 1]])  # note the outer brackets: one nested list
temp = board[0:0 + len(tetraminos), 0:0 + len(tetraminos[0])]
board[0:0 + tetraminos.shape[0], 0:0 + tetraminos.shape[1]] = np.where(tetraminos == 0, temp, tetraminos)
Output:
array([[6, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 1, 1, 0, 0, 0, 0, 0, 0, 0]])
A variant on the putmask approach:
In [243]: board = np.array([
...: [6,0,0,0,0,0,0,0,0,0],
...: [6,0,0,0,0,0,0,0,0,0]
...: ])
In [244]: tetraminos = np.array([[0, 1, 0],
...: [1, 1, 1]])
In [245]: aview = board[:tetraminos.shape[0],:tetraminos.shape[1]]
Use tetraminos as a boolean mask to select the slots of aview that receive values:
In [246]: aview[tetraminos.astype(bool)]=1
In [247]: board
Out[247]:
array([[6, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 1, 1, 0, 0, 0, 0, 0, 0, 0]])
This could be generalized if tetraminos has other non-zero values.
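One way that generalization might look is sketched below: instead of writing a constant 1, copy the piece's own non-zero values through the mask. The function name place_piece and its row/col offset parameters are hypothetical, and the sketch assumes the piece fits entirely inside the board at the given offset.

```python
import numpy as np

def place_piece(board, piece, row=0, col=0):
    """Write the non-zero cells of piece into board at (row, col), in place."""
    piece = np.asarray(piece)
    h, w = piece.shape
    view = board[row:row + h, col:col + w]  # basic slicing gives a view of board
    mask = piece != 0
    view[mask] = piece[mask]                # zeros in piece leave board untouched
    return board

board = np.array([[6, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                  [6, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
place_piece(board, [[0, 2, 0], [3, 3, 3]])
print(board)
```

Because view is a slice of board, the masked assignment modifies board directly and no copy back is needed.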

Distance to next non-zero element in one-dimensional numpy array

I have a one-dimensional numpy array consisting of ones and zeroes, like this:
[0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1]
For each non-zero element of the array, I want to calculate the "distance" to the next non-zero element. That is, I want to answer the question "How far away is the next non-zero element?" So the result for the above array would be:
[0, 0, 0, 1, 3, 0, 0, 6, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0]
Is there a built-in numpy function for this? And if not, what's the most efficient way to implement this in numpy?
Here is a two-liner. It overwrites a in place; if you want to keep the original, work on a.copy() instead.
import numpy as np
a = np.array([0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1])
ix = np.where(a)[0]
a[ix[:-1]] = np.diff(ix)
# in this example the array ends with the last 1, which has no successor, so it is sliced off
print(a[:-1]) # --> [0 0 0 1 3 0 0 6 0 0 0 0 0 4 0 0 0]
Probably not the best answer.
np.where will give you the locations of the non-zero indices in increasing order. By iterating through the result, you know the location of each 1 and the location of the following 1, and can build the result array yourself easily. If the 1s are sparse, this is probably pretty efficient.
Let me see if I can think of something more numpy-ish.
== UPDATE ==
Ah, just came to me
# Find the ones in the array (x is the input array of ones and zeros)
temp = np.where(x)[0]
# find the difference between adjacent elements
deltas = temp[1:] - temp[:-1]
# Build the result based on these
result = np.zeros_like(x)
result[temp[:-1]] = deltas
Let's try:
import numpy as np
arr = np.array([0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1])
# create output
res = np.zeros_like(arr)
# indices of the non-zero elements
where, = np.where(arr)
# at each non-zero index except the last, store the gap to the next one
res[where[:-1]] = np.diff(where)
print(res)
Output
[0 0 0 1 3 0 0 6 0 0 0 0 0 4 0 0 0 0]
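The answers above all rely on the same idea, so it may help to see it wrapped in a function that also copes with arrays containing fewer than two non-zero entries. This is a sketch; the function name distance_to_next_nonzero is made up, and the output dtype is chosen here as an assumption so that boolean inputs do not truncate the distances.

```python
import numpy as np

def distance_to_next_nonzero(arr):
    """For each non-zero entry, return the distance to the next non-zero entry (0 elsewhere)."""
    arr = np.asarray(arr)
    res = np.zeros(arr.shape, dtype=np.intp)
    idx = np.flatnonzero(arr)
    # with fewer than two non-zero entries both sides are empty, so nothing is assigned
    res[idx[:-1]] = np.diff(idx)
    return res

a = [0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1]
print(distance_to_next_nonzero(a))
```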

How to construct matrix based on condition and position?

I have a matrix A=
np.matrix([[0, 0, 0, 0, 0],
[0, 0, 0, 1, 1],
[0, 0, 1, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 1, 1, 1]])
I want to build a matrix B where B[i,j]=5 if A[i,j]=1 and (j+1)%3=0 (j being the column index), and B[i,j]=0 otherwise.
The B should be: B=
np.matrix([[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 5, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 5, 0, 0]])
Is there a way to achieve this without a for loop, using only matrix-style operations? Thank you.
UPDATED ANSWER:
I have not thought of a way to eliminate the for-loop in the list comprehension for filtering on the remainder condition, but the "heavy" part of the computation exploits numpy's optimizations.
import numpy as np
# data is the input matrix (A in the question); the 1-D boolean column mask broadcasts across rows
newdata = 5 * np.array([(i + 1) % 3 == 0 for i in range(data.shape[-1])]) * np.array(data)
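The remaining list comprehension can probably be dropped as well by building the column mask with np.arange. The sketch below follows the expected B from the question, where the remainder condition applies to the column index; the variable names are chosen for illustration and a plain ndarray is used instead of np.matrix.

```python
import numpy as np

A = np.array([[0, 0, 0, 0, 0],
              [0, 0, 0, 1, 1],
              [0, 0, 1, 0, 0],
              [0, 0, 0, 0, 0],
              [0, 0, 1, 1, 1]])
# True for columns j where (j + 1) % 3 == 0
col_mask = (np.arange(A.shape[1]) + 1) % 3 == 0
# the 1-D column mask broadcasts across every row of A
B = np.where((A == 1) & col_mask, 5, 0)
print(B)
```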
ORIGINAL ANSWER (prior to condition that for-loops cannot be used):
Assuming your matrix is stored as data, then you can use list comprehension syntax to get what you want.
newdata = [[5 if val == 1 and (idx + 1) % 3 == 0 else 0
            for idx, val in enumerate(row)]
           for row in data]

How to perform XOR operations on every X consecutive rows numpy Python

I have a numpy array like this; let's call it my_numpy_array.
It can go up to a million values!
>>> my_numpy_array
array([[1, 0, 0, 0, 0, 0, 1, 0],
[1, 1, 0, 0, 0, 0, 1, 1],
[1, 1, 0, 0, 0, 0, 1, 0],
. . . . . . . . . . . .
. . . . . . . . . . . .
[0, 0, 0, 0, 0, 0, 1, 1],
[1, 1, 1, 0, 0, 0, 1, 1],
[0, 0, 0, 0, 0, 0, 1, 1],
[0, 0, 1, 0, 0, 0, 1, 1]])
and another numpy array like this, call it second_array (which is not so huge):
array([[1, 1, 1, 0, 0, 0, 0, 1], #row 1
[1, 1, 0, 0, 0, 0, 1, 0], #row 2
[1, 1, 1, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 0, 1, 1],
[1, 1, 1, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 0, 1, 1],
[1, 1, 1, 0, 0, 0, 0, 1],
[1, 1, 0, 0, 0, 0, 1, 0],
....................... #row 9
[1, 1, 1, 0, 0, 0, 0, 1]]) #can be any number of ROWS!!!
I want to XOR these 9 rows (call this number X; it can be any number) with every 9 consecutive rows in my_numpy_array. I tried working with np.logical_xor() but couldn't get what I wanted.
Also note that the number of rows in my_numpy_array may not be a multiple of 9 (i.e. X). Say the number of rows is 2701:
the first 2700 rows are no problem, but the last one should be XOR-ed with only the first row of second_array.
If it were 2702, then the last two rows would be XOR-ed with only the first two rows of second_array.
Any help much appreciated! Thanks
If the XOR filter is just one row, you could simply use numpy broadcasting:
arr = np.asarray([ [1, 0, 0, 0, 0, 0, 1, 0],
[1, 1, 0, 0, 0, 0, 1, 1],
[1, 1, 0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 0, 1, 1],
[1, 1, 1, 0, 0, 0, 1, 1],
[0, 0, 0, 0, 0, 0, 1, 1],
[0, 0, 1, 0, 0, 0, 1, 1]])
filt = np.asarray([1, 1, 1, 0, 0, 0, 0, 1])
res = arr ^ filt
If not, it doesn't look so pretty:
arr = np.asarray([ [1, 0, 0, 0, 0, 0, 1, 0],
[1, 1, 0, 0, 0, 0, 1, 1],
[1, 1, 0, 0, 0, 0, 1, 0],
[1, 1, 1, 0, 0, 0, 1, 1],
[0, 0, 0, 0, 0, 0, 1, 1],
[1, 1, 1, 0, 0, 0, 1, 1],
[0, 0, 0, 0, 0, 0, 1, 1]])
filt = np.asarray([[1, 1, 1, 0, 0, 0, 0, 1],
[0, 0, 1, 0, 1, 0, 0, 0]])
filt_rows = filt.shape[0]
arr_rows = arr.shape[0]
res = arr ^ np.tile(filt, (1 + arr_rows // filt_rows ,1))[:arr_rows,:]
The filter rows are tiled into an array at least as large as your my_numpy_array and then cut back by indexing, so both arrays have the same shape. I'm not sure how well this works with larger sizes, since it makes a tiled copy of the filter and doesn't work in place.
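If the extra tiled copy is a concern, an in-place variant can be sketched by viewing the full blocks of arr as a 3-D array and letting the filter broadcast over them. This is not part of the answer above; the function name xor_cyclic_inplace is hypothetical, and the sketch assumes arr is a C-contiguous integer (or boolean) array so that the reshape yields a view rather than a copy.

```python
import numpy as np

def xor_cyclic_inplace(arr, filt):
    """XOR the rows of arr with the rows of filt, cycling through filt, in place."""
    assert arr.flags.c_contiguous  # otherwise reshape below could return a copy
    n_rows, n_cols = arr.shape
    k = filt.shape[0]
    full = (n_rows // k) * k       # rows covered by complete blocks of k
    # view of the complete blocks with shape (blocks, k, n_cols); filt broadcasts over blocks
    blocks = arr[:full].reshape(n_rows // k, k, n_cols)
    blocks ^= filt
    # leftover rows pair with the first rows of filt
    arr[full:] ^= filt[:n_rows - full]
    return arr

arr = np.array([[1, 0, 0, 0, 0, 0, 1, 0],
                [1, 1, 0, 0, 0, 0, 1, 1],
                [1, 1, 0, 0, 0, 0, 1, 0]])
filt = np.array([[1, 1, 1, 0, 0, 0, 0, 1],
                 [0, 0, 1, 0, 1, 0, 0, 0]])
print(xor_cyclic_inplace(arr, filt))
```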
Method 1: repeat
x = np.ones((271, 10))
y = np.zeros((9, 10))
np.logical_xor(x, np.repeat(y, x.shape[0]//y.shape[0]+1, axis=0)[:x.shape[0],:])
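Method 1 repeats y enough times and then keeps only as many rows as x has. Note that np.repeat with axis=0 duplicates each row consecutively (row 0, row 0, ..., row 1, ...); for a y with distinct rows, np.tile(y, (k, 1)) gives the cyclic row order the question asks for. With the all-zeros y used here the result is the same either way.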
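The row order produced by np.repeat and np.tile differs, which matters as soon as y has distinct rows. A tiny example with a made-up 2x2 y shows the difference:

```python
import numpy as np

y = np.array([[1, 0],
              [0, 1]])
repeated = np.repeat(y, 2, axis=0)  # each row duplicated in place
tiled = np.tile(y, (2, 1))          # whole block stacked twice
print(repeated.tolist())  # [[1, 0], [1, 0], [0, 1], [0, 1]]
print(tiled.tolist())     # [[1, 0], [0, 1], [1, 0], [0, 1]]
```

For XOR-ing every block of rows against the same filter, the tiled order is the one that lines up with the question.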
Method 2: reshape
def method2(x, y):
    ry, ly = y.shape
    rx, lx = x.shape
    full = rx // ry * ry  # largest multiple of ry that fits in rx
    # view the first `full` rows as blocks of ry rows; y broadcasts over the blocks
    arr1 = np.logical_xor(x[:full].reshape((rx // ry, ry, lx)), y)
    arr2 = np.logical_xor(x[full:], y[:rx % ry, :]) # remainder part
    return np.append(arr1.reshape((full, lx)), arr2, axis=0)
For method 2, we split the original x into two parts: the part whose row count is a multiple of y's row count, and the remainder. Taking the OP's problem as an example, we split 2702 rows into 2700 rows and 2 rows, because 2700 is a multiple of 9 and 2 is the remainder. (The purpose of the messy slices such as [:full] and [full:] is to do the split.)
For the 2700-row part, we reshape it into a 3-dimensional array of shape (300, 9, ncols), so that each block of 9 consecutive rows lines up with the 9 rows of y. y has shape (9, ncols) and is broadcast across the 300 blocks by np.logical_xor. See broadcasting for more information.
We also perform xor for the 2-row part and then use np.append to glue these two results.
Timing: For Large x, method 2 is faster
x = np.ones((2720000, 10))
y = np.zeros((9, 10))
%timeit method2(x, y)
52.5 ms ± 342 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit np.logical_xor(x, np.repeat(y, x.shape[0]//y.shape[0]+1, axis=0)[:x.shape[0],:])
175 ms ± 5.51 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
repeat creates a new array that is the same size as x, while the reshape/broadcasting method does not. Thus, repeat could cost more time when x is big.
