stacking numpy arrays? - python

I am trying to stack arrays horizontally using numpy's hstack, but can't get it to work. Everything comes out in one flat list instead of a 'matrix-looking' 2D array.
import numpy as np
y = np.array([0,2,-6,4,1])
y_bool = y > 0
y_bool = [1 if l == True else 0 for l in y_bool] # convert to integers for classification
y_range = list(range(0, len(y)))
print(y)
print(y_bool)
print(y_range)
print(np.hstack((y, y_bool, y_range)))
Prints this:
[ 0 2 -6 4 1]
[0, 1, 0, 1, 1]
[0, 1, 2, 3, 4]
[ 0 2 -6 4 1 0 1 0 1 1 0 1 2 3 4]
How do I instead get the last line to look like this:
[0 0 0
2 1 1
-6 0 2
4 1 3]

If you want to create a 2D array, do:
print(np.transpose(np.array((y, y_bool, y_range))))
# [[ 0 0 0]
# [ 2 1 1]
# [-6 0 2]
# [ 4 1 3]
# [ 1 1 4]]
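For this particular case, np.column_stack may be a more direct fit, since it treats each 1-D input as one column. A minimal sketch reusing the question's variables (the name stacked is only illustrative, and y_bool is built with astype here rather than the question's list comprehension):

```python
import numpy as np

y = np.array([0, 2, -6, 4, 1])
y_bool = (y > 0).astype(int)      # 1 where y is positive, else 0
y_range = np.arange(len(y))       # 0, 1, ..., len(y) - 1

# column_stack turns each 1-D input into one column of a 2-D array
stacked = np.column_stack((y, y_bool, y_range))
print(stacked)
```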

Well, close enough: the h in hstack stands for horizontal/column-wise. If you check its help, you will see the following under See Also:
vstack : Stack arrays in sequence vertically (row wise).
dstack : Stack arrays in sequence depth wise (along third axis).
concatenate : Join a sequence of arrays together.
Edit: At first I thought vstack would do it, but you would need np.vstack(...).T or np.dstack(...).squeeze(). Other than that, the "problem" is that the arrays are 1D and you want them to act like 2D, so you could do:
print(np.hstack([np.asarray(a)[:,np.newaxis] for a in (y,y_bool,y_range)]))
The np.asarray is there just in case one of the variables is a list. The np.newaxis makes them 2D, which makes it clearer what happens when concatenating.
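To make the shape change concrete, here is a small sketch (the names col, parts, via_hstack and via_vstack are illustrative, not from the answer). It shows that np.newaxis turns a shape (5,) array into shape (5, 1), and that the hstack and vstack(...).T routes give the same result:

```python
import numpy as np

y = np.array([0, 2, -6, 4, 1])
col = y[:, np.newaxis]            # shape (5,) becomes shape (5, 1)
print(y.shape, col.shape)

parts = (y, [0, 1, 0, 1, 1], range(5))
# Each input becomes a (5, 1) column, then columns are joined side by side
via_hstack = np.hstack([np.asarray(a)[:, np.newaxis] for a in parts])
# Alternatively: stack the inputs as rows, then transpose
via_vstack = np.vstack(parts).T
print(np.array_equal(via_hstack, via_vstack))
```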

Related

Is np.argpartition giving me the wrong results?

Take the following code:
import numpy as np
one_dim = np.array([2, 3, 1, 5, 4])
partitioned = np.argpartition(one_dim, 0)
print(f'Unpartitioned array: {one_dim}')
print(f'Partitioned array index: {partitioned}')
print(f'Partitioned array: {one_dim[partitioned]}')
The following output results:
Unpartitioned array: [2 3 1 5 4]
Partitioned array index: [2 1 0 3 4]
Partitioned array: [1 3 2 5 4]
The output for the partitioned array should be [1 2 3 5 4]. How is three on the left side of two? It seems to me the function is making an error or am I missing something?
The second argument (kth) is the index that will be in its sorted position after partitioning. So the result is correct: index 0 of the partitioned array (element value 1) is in its sorted position, and everything to its right is greater than or equal to it. The order of the remaining elements is not guaranteed.
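A short sketch of what argpartition does and does not promise, using kth = 2 as an arbitrary illustrative choice. Only the checked properties are guaranteed; the order within each side can vary, so no exact output is shown for the partitioned array:

```python
import numpy as np

one_dim = np.array([2, 3, 1, 5, 4])
kth = 2                                   # position that must end up sorted
idx = np.argpartition(one_dim, kth)
part = one_dim[idx]

# Guaranteed: the kth element is the one a full sort would put there,
# smaller-or-equal values sit before it, larger-or-equal values after it.
print(part[kth] == np.sort(one_dim)[kth])
print((part[:kth] <= part[kth]).all(), (part[kth:] >= part[kth]).all())

# For a fully ordered result, use argsort instead.
fully_sorted = one_dim[np.argsort(one_dim)]
print(fully_sorted)
```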

Create variable padding around 1d numpy array

arr= [1,2,3,4]
k = 4 (can be different)
The result should be a 2D array like the one below. How can I do this without using any loop and without hard-coding k?
Both k and arr can vary with the input.
The solution must use numpy.pad.
[[1,2,3,4,0,0,0], #k-1 zeros
[0,1,2,3,4,0,0],
[0,0,1,2,3,4,0],
[0,0,0,1,2,3,4]]
If you really have to do it without a loop (for educational purposes)
np.pad(np.tile(arr,[k,1]), [(0,0),(0,k)]).reshape(-1)[:-k].reshape(k,-1)
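The one-liner works because each tiled copy of arr is followed by k zeros, so in the flattened array every copy starts n + k positions after the previous one (n being the length of arr). Regrouping into rows of width n + k - 1 then shifts each copy one column further right. A sketch that spells out the steps, assuming k >= 1 (the function name shifted_rows is hypothetical):

```python
import numpy as np

def shifted_rows(arr, k):
    """Return k rows, where row i holds arr preceded by i zeros."""
    arr = np.asarray(arr)
    tiled = np.tile(arr, (k, 1))                 # k copies of arr, shape (k, n)
    padded = np.pad(tiled, [(0, 0), (0, k)])     # k zeros after each copy
    flat = padded.reshape(-1)[:-k]               # drop the last k zeros
    return flat.reshape(k, -1)                   # rows of width n + k - 1

print(shifted_rows([1, 2, 3, 4], 4))
```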
Using a list comprehension as a one-liner:
import numpy as np
arr= np.array([1,2,3,4])
k = 4
print( np.array( [ np.pad(arr, (0+i , k-1-i ) ) for i in range(0,k)] ) )
Out :
[[1 2 3 4 0 0 0]
[0 1 2 3 4 0 0]
[0 0 1 2 3 4 0]
[0 0 0 1 2 3 4]]

How to find numpy array shape in a larger array?

big_array = np.array((
[0,1,0,0,1,0,0,1],
[0,1,0,0,0,0,0,0],
[0,1,0,0,1,0,0,0],
[0,0,0,0,1,0,0,0],
[1,0,0,0,1,0,0,0]))
print(big_array)
[[0 1 0 0 1 0 0 1]
[0 1 0 0 0 0 0 0]
[0 1 0 0 1 0 0 0]
[0 0 0 0 1 0 0 0]
[1 0 0 0 1 0 0 0]]
Is there a way to iterate over this numpy array and for each 2x2 cluster of 0s, set all values within that cluster = 5? This is what the output would look like.
[[0 1 5 5 1 5 5 1]
[0 1 5 5 0 5 5 0]
[0 1 5 5 1 5 5 0]
[0 0 5 5 1 5 5 0]
[1 0 5 5 1 5 5 0]]
My thoughts are to use advanced indexing to set the 2x2 shape = to 5, but I think it would be really slow to simply iterate like:
1) check if array[x][y] is 0
2) check if adjacent array elements are 0
3) if all elements are 0, set all those values to 5.
big_array = [1, 7, 0, 0, 3]
i = 0
p = 0
while i <= len(big_array) - 1 and p <= len(big_array) - 2:
    if big_array[i] == big_array[p + 1]:
        big_array[i] = 5
        big_array[p + 1] = 5
        print(big_array)
    i = i + 1
    p = p + 1
Output:
[1, 7, 5, 5, 3]
This is just an example, not complete or fully correct code.
Here's a solution by viewing the array as blocks.
First you need to define this function rolling_window from here https://gist.github.com/seberg/3866040/revisions
Then break the array big, your starting array, into 2x2 blocks using this function.
Also generate an array which has indices of every element in big and break it similarly into 2x2 blocks.
Then generate a boolean mask where the 2x2 blocks of big are all zero, and use the index array to get those elements.
blks = rolling_window(big,window=(2,2)) # 2x2 blocks of original array
inds = np.indices(big.shape).transpose(1,2,0) # array of indices into big
blkinds = rolling_window(inds,window=(2,2,0)).transpose(0,1,4,3,2) # 2x2 blocks of indices into big
mask = blks == np.zeros((2,2)) # generate a mask of every 2x2 block which is all zero
mask = mask.reshape(*mask.shape[:-2],-1).all(-1) # still generating the mask
# now blks[mask] is every block which is zero..
# but you actually want the original indices in the array 'big' instead
inds = blkinds[mask].reshape(-1,2).T # indices into big where elements need replacing
big[inds[0],inds[1]] = 5 #reassign
You need to test this: I did not. But the idea is to break the array into blocks, and an array of indices into blocks, then develop a boolean condition on the blocks, use those to get the indices, and then reassign.
An alternative would be to iterate through blkinds as defined here, then test the 2x2 block obtained from big at each blkinds element and reassign if necessary.
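On NumPy 1.20 or later, numpy.lib.stride_tricks.sliding_window_view can stand in for the rolling_window helper. This is a swapped-in technique, not the code from the answer above. A minimal sketch, assuming every cell that belongs to any all-zero 2x2 block (overlapping blocks included) should become 5; the names zero_block, mask and result are illustrative:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

big = np.array((
    [0, 1, 0, 0, 1, 0, 0, 1],
    [0, 1, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 0],
    [1, 0, 0, 0, 1, 0, 0, 0]))

# windows[r, c] is the 2x2 block whose top-left corner is big[r, c]
windows = sliding_window_view(big, (2, 2))
zero_block = (windows == 0).all(axis=(2, 3))     # shape (rows - 1, cols - 1)

# Spread each block's flag onto the four cells that the block covers
mask = np.zeros(big.shape, dtype=bool)
mask[:-1, :-1] |= zero_block    # top-left cell of each block
mask[:-1, 1:] |= zero_block     # top-right cell
mask[1:, :-1] |= zero_block     # bottom-left cell
mask[1:, 1:] |= zero_block      # bottom-right cell

result = big.copy()
result[mask] = 5
print(result)
```

Because overlapping blocks are all marked, this can produce more 5s than the output requested in the question (for example in the last column), so adjust the mask if only some blocks should count.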
This is my attempt to help you solve your problem. My solution may be subject to fair criticism.
import numpy as np
from itertools import product
m = np.array((
[0,1,0,0,1,0,0,1],
[0,1,0,0,0,0,0,0],
[0,1,0,0,1,0,0,0],
[0,0,0,0,1,0,0,0],
[1,0,0,0,1,0,0,0]))
h = 2
w = 2
rr, cc = tuple(d + 1 - q for d, q in zip(m.shape, (h, w)))
slices = [(slice(r, r + h), slice(c, c + w))
          for r, c in product(range(rr), range(cc))
          if not m[r:r + h, c:c + w].any()]
for s in slices:
    m[s] = 5
print(m)
[[0 1 5 5 1 5 5 1]
[0 1 5 5 0 5 5 5]
[0 1 5 5 1 5 5 5]
[0 5 5 5 1 5 5 5]
[1 5 5 5 1 5 5 5]]

repeat indices with different repeat values in numpy

I'm looking for an efficient way to do the following with Numpy:
Given an array counts of non-negative integers containing, for instance:
[3, 1, 0, 6, 3, 2]
I would like to generate another array containing the indices of the first one, where the index i is repeated counts[i] times:
[0 0 0 1 3 3 3 3 3 3 4 4 4 5 5]
My problem is that this array is potentially very large and I'm looking for a vectorized (or otherwise fast) way to do this.
You can do it with numpy.repeat:
import numpy as np
arr = np.array([3, 1, 0, 6, 3, 2])
repix = np.repeat(np.arange(arr.size), arr)
print(repix)
Output:
[0 0 0 1 3 3 3 3 3 3 4 4 4 5 5]

indexing numpy multidimensional arrays

I need to access this numpy array, sometimes with only the rows where the last column is 0, and sometimes the rows where the value of the last column is 1.
y = [0 0 0 0
1 2 1 1
2 -6 0 1
3 4 1 0]
I have to do this over and over, but would prefer to avoid creating duplicate arrays or recalculating each time. Is there some way that I can identify the indices concerned and just call them? So that I can do this:
>>print y[LAST_COLUMN_IS_0]
[0 0 0 0
3 4 1 0]
>>print y[LAST_COLUMN_IS_1]
[1 2 1 1
2 -6 0 1]
P.S. The number of columns in the array never changes, it's always going to have 4 columns.
You can use numpy's boolean indexing to identify which rows you want to select, and numpy's fancy indexing/slicing to select the whole row.
print(y[y[:,-1] == 0, :])
print(y[y[:,-1] == 1, :])
You can save y[:,-1] == 0 and ... == 1 as usual, since they are just numpy arrays.
(The y[:,-1] selects the whole of the last column, and the == equality check happens element-wise, resulting in an array of booleans.)
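A small sketch of saving the masks once and reusing them. The variable names mirror the placeholders in the question and are otherwise illustrative:

```python
import numpy as np

y = np.array([[0, 0, 0, 0],
              [1, 2, 1, 1],
              [2, -6, 0, 1],
              [3, 4, 1, 0]])

last_column_is_0 = y[:, -1] == 0          # boolean mask, one entry per row
last_column_is_1 = y[:, -1] == 1

print(y[last_column_is_0])
print(y[last_column_is_1])

# np.flatnonzero converts a mask to integer row indices, if those are preferred
rows_0 = np.flatnonzero(last_column_is_0)
print(rows_0)
```

Note that boolean indexing returns a copy of the selected rows rather than a view, and the saved masks need to be recomputed if the last column of y changes.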
