Zero pad array based on other array's shape - python

I've got K feature vectors that all share dimension n but have a variable dimension m (n x m). They all live in a list together.
to_be_padded = []
to_be_padded.append(np.reshape(np.arange(9),(3,3)))
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
to_be_padded.append(np.reshape(np.arange(18),(3,6)))
array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17]])
to_be_padded.append(np.reshape(np.arange(15),(3,5)))
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
What I am looking for is a smart way to zero pad the rows of these np.arrays such that they all share the same dimension m. I've tried solving it with np.pad but I have not been able to come up with a pretty solution. Any help or nudges in the right direction would be greatly appreciated!
The result should leave the arrays looking like this:
array([[0, 1, 2, 0, 0, 0],
[3, 4, 5, 0, 0, 0],
[6, 7, 8, 0, 0, 0]])
array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17]])
array([[ 0, 1, 2, 3, 4, 0],
[ 5, 6, 7, 8, 9, 0],
[10, 11, 12, 13, 14, 0]])

You could use np.pad for that, which can also pad 2-D arrays using a tuple of values specifying the padding width, ((top, bottom), (left, right)). For that you could define:
def pad_to_length(x, m):
    return np.pad(x, ((0, 0), (0, m - x.shape[1])), mode='constant')
Usage
You could start by finding the ndarray with the largest number of columns. Say you have two of them, a and b:
a = np.array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
b = np.array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
m = max(i.shape[1] for i in [a,b])
# 5
And then use this parameter to pad the ndarrays:
pad_to_length(a, m)
array([[0, 1, 2, 0, 0],
[3, 4, 5, 0, 0],
[6, 7, 8, 0, 0]])
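Applied to the list from the question, the same idea might look like the sketch below. It assumes the list is called to_be_padded, as in the question, and that every array has the same number of rows; the name padded is only illustrative.

```python
import numpy as np

def pad_to_length(x, m):
    # Pad only on the right-hand side of axis 1 so x ends up with m columns.
    return np.pad(x, ((0, 0), (0, m - x.shape[1])), mode="constant")

to_be_padded = [
    np.arange(9).reshape(3, 3),
    np.arange(18).reshape(3, 6),
    np.arange(15).reshape(3, 5),
]

# The widest array in the list determines the target number of columns.
m = max(arr.shape[1] for arr in to_be_padded)
padded = [pad_to_length(arr, m) for arr in to_be_padded]

print([p.shape for p in padded])
```

Because np.pad returns a new array, the originals in to_be_padded are left untouched.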

I don't believe there is a fully vectorized solution for this, since the arrays have different shapes. I think you will need to loop over the list and treat every array individually:
for i in range(len(to_be_padded)):
    # np.zeros defaults to float64, so the padded arrays become float
    padded = np.zeros((n, maxM))
    padded[:, :to_be_padded[i].shape[1]] = to_be_padded[i]
    to_be_padded[i] = padded
where n is the shared number of rows and maxM is the largest m among the matrices in your list.
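If a single 3-D array is acceptable as the result, a variant of this loop could fill one preallocated block instead of replacing the list entries. This is only a sketch: the names max_m and stacked are illustrative, and it assumes all arrays share the same dtype and row count.

```python
import numpy as np

to_be_padded = [
    np.arange(9).reshape(3, 3),
    np.arange(18).reshape(3, 6),
    np.arange(15).reshape(3, 5),
]

n = to_be_padded[0].shape[0]                       # shared number of rows
max_m = max(arr.shape[1] for arr in to_be_padded)  # widest array

# One zero-filled block of shape (K, n, max_m); dtype is taken from the
# first array (assumption: all arrays share it).
stacked = np.zeros((len(to_be_padded), n, max_m), dtype=to_be_padded[0].dtype)
for k, arr in enumerate(to_be_padded):
    # Copy each array into the left part of its slot; the rest stays zero.
    stacked[k, :, :arr.shape[1]] = arr

print(stacked.shape)
```

Passing dtype explicitly keeps integer input as integers, whereas np.zeros without dtype would produce floats.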

Related

Select non-consecutive row and column indices from 2d numpy array

I have an array a
a = np.arange(5*5).reshape(5,5)
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24]])
and want to select the last two columns from row one and two, and the first two columns of row three and four.
The result should look like this
array([[3, 4, 10, 11],
[8, 9, 15, 16]])
How to do that in one go without indexing twice and concatenation?
I tried using take
a.take([[0,1,2,3], [3,4,0,1]])
array([[0, 1, 2, 3],
[3, 4, 0, 1]])
ix_
a[np.ix_([0,1,2,3], [3,4,0,1])]
array([[ 3, 4, 0, 1],
[ 8, 9, 5, 6],
[13, 14, 10, 11],
[18, 19, 15, 16]])
and r_
a[np.r_[0:2, 2:4], np.r_[3:5, 0:2]]
array([ 3, 9, 10, 16])
and a combination of ix_ and r_
a[np.ix_([0,1,2,3], np.r_[3:4, 0:1])]
array([[ 3, 0],
[ 8, 5],
[13, 10],
[18, 15]])
Using integer advanced indexing, you can do something like this
index_rows = np.array([
[0, 0, 2, 2],
[1, 1, 3, 3],
])
index_cols = np.array([
[-2, -1, 0, 1],
[-2, -1, 0, 1],
])
a[index_rows, index_cols]
where you just select directly what elements you want.
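As a quick check, the sketch below runs this selection on the array from the question and also builds the same row indices through broadcasting instead of writing them out. The names result, rows, cols and same are illustrative only.

```python
import numpy as np

a = np.arange(5 * 5).reshape(5, 5)

index_rows = np.array([[0, 0, 2, 2],
                       [1, 1, 3, 3]])
index_cols = np.array([[-2, -1, 0, 1],
                       [-2, -1, 0, 1]])
# Element [i, j] of the result is a[index_rows[i, j], index_cols[i, j]].
result = a[index_rows, index_cols]

# Same selection: a (2, 1) column of row offsets plus a (4,) pattern
# broadcasts to the (2, 4) row-index array above; the 1-D column
# indices are broadcast across both output rows.
rows = np.array([0, 1])[:, None] + np.array([0, 0, 2, 2])
cols = np.array([-2, -1, 0, 1])
same = a[rows, cols]

print(result)
```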

Adding the previous n rows as columns to a NumPy array

I want to add the previous n rows as columns to a NumPy array.
For example, if n=2, the array below...
[[ 1, 2]
[ 3, 4]
[ 5, 6]
[ 7, 8]
[ 9, 10]
[11, 12]]
...should be turned into the following one:
[[ 1, 2, 0, 0, 0, 0]
[ 3, 4, 1, 2, 0, 0]
[ 5, 6, 3, 4, 1, 2]
[ 7, 8, 5, 6, 3, 4]
[ 9, 10, 7, 8, 5, 6]
[11, 12, 9, 10, 7, 8]]
Any ideas how I could do that without going over the entire array in a for loop?
Here's a vectorized approach -
def vectorized_app(a,n):
    M,N = a.shape
    idx = np.arange(a.shape[0])[:,None] - np.arange(n+1)
    out = a[idx.ravel(),:].reshape(-1,N*(n+1))
    out[N*(np.arange(1,M+1))[:,None] <= np.arange(N*(n+1))] = 0
    return out
Sample run -
In [255]: a
Out[255]:
array([[ 1, 2, 3],
[ 4, 5, 6],
[ 7, 8, 9],
[10, 11, 12],
[13, 14, 15],
[16, 17, 18]])
In [256]: vectorized_app(a,3)
Out[256]:
array([[ 1, 2, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 4, 5, 6, 1, 2, 3, 0, 0, 0, 0, 0, 0],
[ 7, 8, 9, 4, 5, 6, 1, 2, 3, 0, 0, 0],
[10, 11, 12, 7, 8, 9, 4, 5, 6, 1, 2, 3],
[13, 14, 15, 10, 11, 12, 7, 8, 9, 4, 5, 6],
[16, 17, 18, 13, 14, 15, 10, 11, 12, 7, 8, 9]])
Runtime test -
I am timing @Psidom's loop-comprehension based method and the vectorized method listed in this post on a 100x scaled up version (in terms of size) of the sample posted in the question:
In [246]: a = np.random.randint(0,9,(600,200))
In [247]: n = 200
In [248]: %timeit np.column_stack(mypad(a, i) for i in range(n + 1))
1 loops, best of 3: 748 ms per loop
In [249]: %timeit vectorized_app(a,n)
1 loops, best of 3: 224 ms per loop
Here is a way to pad 0 in the beginning of the array and then column stack them:
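import numpy as np
n = 2
arr = np.arange(1, 13).reshape(6, 2)  # the sample array from the question
def mypad(myArr, n):
    # Shift the rows down by n, filling the top n rows with zeros.
    if n == 0:
        return myArr
    else:
        return np.pad(myArr, ((n, 0), (0, 0)), mode="constant")[:-n]
# Pass a list rather than a generator; NumPy's stacking functions expect a sequence.
np.column_stack([mypad(arr, i) for i in range(n + 1)])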
# array([[ 1, 2, 0, 0, 0, 0],
# [ 3, 4, 1, 2, 0, 0],
# [ 5, 6, 3, 4, 1, 2],
# [ 7, 8, 5, 6, 3, 4],
# [ 9, 10, 7, 8, 5, 6],
# [11, 12, 9, 10, 7, 8]])
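A third option, not taken from either answer, is to preallocate the output and copy one shifted block per lag. The sketch below loops n + 1 times rather than once per row; shift_stack is a hypothetical helper name.

```python
import numpy as np

def shift_stack(a, n):
    # Hypothetical helper: block k of the output holds a shifted down by k rows.
    M, N = a.shape
    out = np.zeros((M, N * (n + 1)), dtype=a.dtype)
    for k in range(n + 1):
        # Rows k..M-1 of block k receive rows 0..M-k-1 of a; the rows
        # above stay zero. max(..., 0) keeps the slice empty when k > M.
        out[k:, k * N:(k + 1) * N] = a[:max(M - k, 0)]
    return out

arr = np.arange(1, 13).reshape(6, 2)  # the sample array from the question
print(shift_stack(arr, 2))
```

This avoids the intermediate padded copies, at the cost of an explicit loop over the lags.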

Numpy array: Changing the values of the last column when lines of a 2d-array are equal to lines of another 2d-array

I have a huge 2D numpy array (called DATA). I want to change the last value (column) of every line of DATA that is equal to a same-shaped external line (called ExtLine).
# -*- coding: utf-8 -*-
import numpy
DATA=numpy.array([
[1,2,3,4,5,6,0],
[2,5,6,84,1,6,0],
[9,9,9,9,9,9,0],
[1,2,3,4,5,6,0],
[2,5,6,84,1,6,0],
[0,2,5,4,8,9,0] ])
# Pool of lines that will be compared to DATA
PoolOfExtLines=numpy.array([[1,2,3,4,5,6,0],[2,5,6,84,1,6,0]])
for j in range(PoolOfExtLines.shape[0]): # loop on pool of lines
    # convert ExtLine into a contiguous code (to be compared to the lines of DATA)
    b=numpy.ascontiguousarray(PoolOfExtLines[j]).view(numpy.dtype((numpy.void, PoolOfExtLines[j].dtype.itemsize * PoolOfExtLines[j].shape[0])))
    for i in range(DATA.shape[0]): # loop on DATA lines
        # convert the current line into a contiguous code (to be compared to b)
        a=numpy.ascontiguousarray(DATA[i]).view(numpy.dtype((numpy.void, DATA[i].dtype.itemsize * DATA[i].shape[0])))
        if a == b:
            DATA[i,-1]=-1
This results in a DATA array modified as I want (tag -1 at the end of the lines that were equal to those of PoolOfExtLines):
[[ 1, 2, 3, 4, 5, 6, -1],
[ 2, 5, 6, 84, 1, 6, -1],
[ 9, 9, 9, 9, 9, 9, 0],
[ 1, 2, 3, 4, 5, 6, -1],
[ 2, 5, 6, 84, 1, 6, -1],
[ 0, 2, 5, 4, 8, 9, 0]]
My question: I feel that this code can be improved and is quite complicated for what I want to do. I feel that by using some (built-in) methods I missed, or smart direct comparisons (how?), I can make the code clearer and faster. Thanks in advance for your help.
You can use NumPy's broadcasting capability along with boolean indexing to solve it in a vectorized manner -
DATA[((DATA == PoolOfExtLines[:,None,:]).all(2)).any(0),-1] = -1
Sample run -
In [17]: DATA
Out[17]:
array([[ 1, 2, 3, 4, 5, 6, 0],
[ 2, 5, 6, 84, 1, 6, 0],
[ 9, 9, 9, 9, 9, 9, 0],
[ 1, 2, 3, 4, 5, 6, 0],
[ 2, 5, 6, 84, 1, 6, 0],
[ 0, 2, 5, 4, 8, 9, 0]])
In [18]: PoolOfExtLines
Out[18]:
array([[ 1, 2, 3, 4, 5, 6, 0],
[ 2, 5, 6, 84, 1, 6, 0]])
In [19]: DATA[((DATA == PoolOfExtLines[:,None,:]).all(2)).any(0),-1] = -1
In [20]: DATA
Out[20]:
array([[ 1, 2, 3, 4, 5, 6, -1],
[ 2, 5, 6, 84, 1, 6, -1],
[ 9, 9, 9, 9, 9, 9, 0],
[ 1, 2, 3, 4, 5, 6, -1],
[ 2, 5, 6, 84, 1, 6, -1],
[ 0, 2, 5, 4, 8, 9, 0]])
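To see what the one-liner does, it can be split into its intermediate steps. The sketch below uses the sample data from the question; the names pool, eq, row_matches and hit are illustrative only.

```python
import numpy as np

DATA = np.array([
    [1, 2, 3, 4, 5, 6, 0],
    [2, 5, 6, 84, 1, 6, 0],
    [9, 9, 9, 9, 9, 9, 0],
    [1, 2, 3, 4, 5, 6, 0],
    [2, 5, 6, 84, 1, 6, 0],
    [0, 2, 5, 4, 8, 9, 0]])
pool = np.array([[1, 2, 3, 4, 5, 6, 0], [2, 5, 6, 84, 1, 6, 0]])

# (6, 7) compared against (2, 1, 7) broadcasts to (2, 6, 7):
# eq[p, r, c] is True where DATA[r, c] == pool[p, c].
eq = DATA == pool[:, None, :]
# A DATA row matches a pool row only if all 7 columns agree -> shape (2, 6).
row_matches = eq.all(axis=2)
# A DATA row is tagged if it matches any pool row -> shape (6,).
hit = row_matches.any(axis=0)

DATA[hit, -1] = -1
print(hit)
```

Note that the intermediate boolean array has one entry per (pool row, DATA row, column) triple, so memory use grows with the product of the two row counts.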

Indexing a 2d array with a 3d array in numpy

I have two arrays.
"a", a 2d numpy array.
import numpy as np
import numpy.random as npr
a = np.array([[5,6,7,8,9],[10,11,12,14,15]])
array([[ 5, 6, 7, 8, 9],
[10, 11, 12, 14, 15]])
"idx", a 3d numpy array constituting three index variants I want to use to index "a".
nsamp = 3
idx = npr.randint(5, size=(nsamp, a.shape[0], a.shape[1]))
array([[[1, 2, 1, 3, 4],
[2, 0, 2, 0, 1]],
[[0, 0, 3, 2, 0],
[1, 3, 2, 0, 3]],
[[2, 1, 0, 1, 4],
[1, 1, 0, 1, 0]]])
Now I want to index "a" three times with the indices in "idx" to obtain an object as follows:
array([[[6, 7, 6, 8, 9],
[12, 10, 12, 10, 11]],
[[5, 5, 8, 7, 5],
[11, 14, 12, 10, 14]],
[[7, 6, 5, 6, 9],
[11, 11, 10, 11, 10]]])
The naive "a[idx]" does not work. Any ideas as to how to do this? (I use Python 3.4 and numpy 1.9)
You can use choose to make the selection from a:
>>> np.choose(idx, a.T[:,:,np.newaxis])
array([[[ 6, 7, 6, 8, 9],
[12, 10, 12, 10, 11]],
[[ 5, 5, 8, 7, 5],
[11, 14, 12, 10, 14]],
[[ 7, 6, 5, 6, 9],
[11, 11, 10, 11, 10]]])
As you can see, a has to be reshaped from an array with shape (2, 5) to an array with shape (5, 2, 1) first. This is essentially so that it is broadcastable with idx, which has shape (3, 2, 5).
(I learned this method from @immerrr's answer here: https://stackoverflow.com/a/26225395/3923281)
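You can use the take array method. Note that without an axis argument, take indexes the flattened array, so the indices 0 to 4 always select from the first row of a:
import numpy
a = numpy.array([[5,6,7,8,9],[10,11,12,14,15]])
idx = numpy.random.randint(5, size=(3, a.shape[0], a.shape[1]))
print(a.take(idx))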
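For the per-row selection asked for in the question, two other routes appear to give the expected result: integer advanced indexing with a broadcast row index, and np.take_along_axis (available in NumPy 1.15 and later). The sketch below uses the idx values shown in the question; rows, by_index, by_take and flat are illustrative names.

```python
import numpy as np

a = np.array([[5, 6, 7, 8, 9], [10, 11, 12, 14, 15]])
idx = np.array([[[1, 2, 1, 3, 4], [2, 0, 2, 0, 1]],
                [[0, 0, 3, 2, 0], [1, 3, 2, 0, 3]],
                [[2, 1, 0, 1, 4], [1, 1, 0, 1, 0]]])

# Advanced indexing: the (2, 1) row index broadcasts against the (3, 2, 5)
# column index, so by_index[s, i, j] == a[i, idx[s, i, j]].
rows = np.arange(a.shape[0])[:, None]
by_index = a[rows, idx]

# take_along_axis needs equal ndim; a[None] has shape (1, 2, 5) and is
# broadcast along the first axis to match idx.
by_take = np.take_along_axis(a[None, :, :], idx, axis=2)

# For comparison: plain take without axis reads from the flattened array.
flat = a.take(idx)

print(by_index)
```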

Python: what's an elegant way to select specific rows of a 2-d array into a new array?

For example, given a python numpy.ndarray a = array([[1, 2], [3, 4], [5, 6]]), I want to select the 0th and 2nd rows of array a into a new array b, such that b becomes array([[1,2],[5,6]]).
I need the solution to work on more general problems, where the original 2d array can have more rows and I should be able to select the rows based on some disjoint ranges. In general, I was looking for something like a[i:j] + a[k:p], which works for 1-d lists, but it seems 2d arrays won't add up this way.
update
It seems that I can use vstack((a[i:j], a[k:p])) to get this working, but is there any elegant way to do this?
You can use list indexing:
a[ [0,2], ]
More generally, to select rows i:j and k:p (I'm assuming in the python sense, meaning rows i to j but not including j):
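a[ list(range(i,j)) + list(range(k,p)) , ]
Note that list(range(i,j)) + list(range(k,p)) creates a flat list of [ i, i+1, ..., j-1, k, k+1, ..., p-1 ], which is then used to index the rows of a. (In Python 2, range already returns a list, so range(i,j) + range(k,p) works directly; in Python 3 the list() calls are needed.)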
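NumPy's np.r_ helper can build the same flat index from slice notation, which may read closer to the a[i:j] + a[k:p] form the question was after. A minimal sketch, with i, j, k, p chosen arbitrarily for illustration:

```python
import numpy as np

a = np.arange(20).reshape(10, 2)
i, j, k, p = 0, 2, 5, 8  # arbitrary example bounds

# np.r_[i:j, k:p] concatenates the two ranges into array([0, 1, 5, 6, 7]).
row_index = np.r_[i:j, k:p]
b = a[row_index]

print(b)
```

The result is a copy, the same as what vstack((a[i:j], a[k:p])) would produce.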
NumPy is flexible when it comes to indexing. You can give it a list of row indices and it will return the corresponding rows.
In : a = numpy.array([[i]*10 for i in range(10)])
In : a
Out:
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
[3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
[4, 4, 4, 4, 4, 4, 4, 4, 4, 4],
[5, 5, 5, 5, 5, 5, 5, 5, 5, 5],
[6, 6, 6, 6, 6, 6, 6, 6, 6, 6],
[7, 7, 7, 7, 7, 7, 7, 7, 7, 7],
[8, 8, 8, 8, 8, 8, 8, 8, 8, 8],
[9, 9, 9, 9, 9, 9, 9, 9, 9, 9]])
In : a[[0,5,9]]
Out:
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[5, 5, 5, 5, 5, 5, 5, 5, 5, 5],
[9, 9, 9, 9, 9, 9, 9, 9, 9, 9]])
In : a[list(range(0,2)) + list(range(5,8))]
Out:
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[5, 5, 5, 5, 5, 5, 5, 5, 5, 5],
[6, 6, 6, 6, 6, 6, 6, 6, 6, 6],
[7, 7, 7, 7, 7, 7, 7, 7, 7, 7]])
