Adding the previous n rows as columns to a NumPy array

I want to add the previous n rows as columns to a NumPy array.
For example, if n=2, the array below...
[[ 1, 2]
[ 3, 4]
[ 5, 6]
[ 7, 8]
[ 9, 10]
[11, 12]]
...should be turned into the following one:
[[ 1, 2, 0, 0, 0, 0]
[ 3, 4, 1, 2, 0, 0]
[ 5, 6, 3, 4, 1, 2]
[ 7, 8, 5, 6, 3, 4]
[ 9, 10, 7, 8, 5, 6]
[11, 12, 9, 10, 7, 8]]
Any ideas how I could do that without going over the entire array in a for loop?

Here's a vectorized approach -
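import numpy as np

def vectorized_app(a, n):
    M, N = a.shape
    # Row i gathers rows i, i-1, ..., i-n (negative indices wrap around for now)
    idx = np.arange(M)[:, None] - np.arange(n + 1)
    out = a[idx.ravel(), :].reshape(-1, N * (n + 1))
    # Zero the blocks that wrapped around: columns at or beyond N*(i+1) in row i
    out[N * np.arange(1, M + 1)[:, None] <= np.arange(N * (n + 1))] = 0
    return out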
Sample run -
In [255]: a
Out[255]:
array([[ 1, 2, 3],
[ 4, 5, 6],
[ 7, 8, 9],
[10, 11, 12],
[13, 14, 15],
[16, 17, 18]])
In [256]: vectorized_app(a,3)
Out[256]:
array([[ 1, 2, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 4, 5, 6, 1, 2, 3, 0, 0, 0, 0, 0, 0],
[ 7, 8, 9, 4, 5, 6, 1, 2, 3, 0, 0, 0],
[10, 11, 12, 7, 8, 9, 4, 5, 6, 1, 2, 3],
[13, 14, 15, 10, 11, 12, 7, 8, 9, 4, 5, 6],
[16, 17, 18, 13, 14, 15, 10, 11, 12, 7, 8, 9]])
Runtime test -
I am timing @Psidom's loop-comprehension based method and the vectorized method listed in this post on a 100x scaled-up version (in terms of size) of the sample posted in the question:
In [246]: a = np.random.randint(0,9,(600,200))
In [247]: n = 200
In [248]: %timeit np.column_stack(mypad(a, i) for i in range(n + 1))
1 loops, best of 3: 748 ms per loop
In [249]: %timeit vectorized_app(a,n)
1 loops, best of 3: 224 ms per loop
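For comparison, here is a minimal sketch of another loop-free route. It assumes NumPy 1.20 or later, where `numpy.lib.stride_tricks.sliding_window_view` is available. The name `lag_columns` is hypothetical and not part of the answer above. The idea is to zero-pad the top of the array and read each row together with its n predecessors from a sliding window, so no wrapped-around entries need to be masked afterwards.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def lag_columns(a, n):
    """Append the previous n rows of `a` as extra columns, zero-filled at the top."""
    M, N = a.shape
    # Prepend n rows of zeros so that every row has n predecessors.
    padded = np.vstack([np.zeros((n, N), dtype=a.dtype), a])
    # windows has shape (M, N, n + 1); windows[i, j, k] == padded[i + k, j].
    windows = sliding_window_view(padded, n + 1, axis=0)
    # Reverse the window axis so the current row comes first, then flatten
    # each (n + 1, N) block into one output row. The reshape makes a copy.
    return windows[:, :, ::-1].transpose(0, 2, 1).reshape(M, N * (n + 1))

a = np.arange(1, 13).reshape(6, 2)  # the example array from the question
result = lag_columns(a, 2)
```

No timing is claimed for this variant; it is only meant to show the same result obtained without index arithmetic.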

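Here is a way to pad zeros at the beginning of the array and then column-stack the shifted copies:
import numpy as np

arr = np.arange(1, 13).reshape(6, 2)  # the example array from the question
n = 2

def mypad(myArr, n):
    # Shift the rows down by n, filling the top with zeros
    if n == 0:
        return myArr
    else:
        return np.pad(myArr, ((n, 0), (0, 0)), mode="constant")[:-n]

# Pass a list: NumPy's stacking functions expect a sequence, not a generator
np.column_stack([mypad(arr, i) for i in range(n + 1)])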
# array([[ 1, 2, 0, 0, 0, 0],
# [ 3, 4, 1, 2, 0, 0],
# [ 5, 6, 3, 4, 1, 2],
# [ 7, 8, 5, 6, 3, 4],
# [ 9, 10, 7, 8, 5, 6],
# [11, 12, 9, 10, 7, 8]])
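A minimal sketch of the same shifting idea that avoids the intermediate padded copies: allocate the zero-filled result once and write each shifted copy into its column block. The name `lag_columns_prealloc` is hypothetical. The loop runs over the n + 1 lags, not over the rows of the array.

```python
import numpy as np

def lag_columns_prealloc(arr, n):
    """Write arr shifted down by 0..n rows into consecutive column blocks."""
    M, N = arr.shape
    out = np.zeros((M, N * (n + 1)), dtype=arr.dtype)
    # Lags of M or more contribute nothing, so stop at M - 1.
    for k in range(min(n, M - 1) + 1):
        # Row i of block k receives arr[i - k]; the first k rows stay zero.
        out[k:, k * N:(k + 1) * N] = arr[:M - k]
    return out

arr = np.arange(1, 13).reshape(6, 2)
result = lag_columns_prealloc(arr, 2)
wide = lag_columns_prealloc(arr, 10)  # n larger than the number of rows
```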

Related

Subtracting Row From Column in NumPy

I have an m-dimensional NumPy array A and an n-dimensional NumPy array B (that is, 1-D arrays of length m and n).
I want to create an m x n matrix C such that C[i, j] = B[j] - A[i].
Is there an efficient/vectorized way to do this in NumPy?
Currently I am using:
C = np.zeros((M, N))
for i in range(0, M):
    C[i, :] = (B - A[i])
Edit:
m and n are big numbers, so C is an even bigger matrix (m*n entries).
I tried np.repeat and np.subtract.outer, but both exhaust my RAM.

I think you are looking for np.subtract.outer
M = 5
N = 10
A = np.arange(N)
B = np.arange(M)
np.subtract.outer(A,B)
output:
array([[ 0, -1, -2, -3, -4],
[ 1, 0, -1, -2, -3],
[ 2, 1, 0, -1, -2],
[ 3, 2, 1, 0, -1],
[ 4, 3, 2, 1, 0],
[ 5, 4, 3, 2, 1],
[ 6, 5, 4, 3, 2],
[ 7, 6, 5, 4, 3],
[ 8, 7, 6, 5, 4],
[ 9, 8, 7, 6, 5]])
It will apply subtraction to all pairs in A and B. For 1-D inputs it is equivalent to
C = np.empty((len(A), len(B)))
for i in range(len(A)):
    for j in range(len(B)):
        C[i, j] = A[i] - B[j]
EDIT
If you have memory issues you could try allocating the output buffer before doing the operation and setting the out keyword appropriately:
C = np.zeros((M, N))
np.subtract.outer(B, A, out=C)
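Since the argument order decides both the sign and the orientation of the result, a small check against the question's explicit loop may help. This is only a sketch on tiny arrays; the names `C_loop`, `C_outer` and `C_neg` are made up for the illustration.

```python
import numpy as np

A = np.arange(3)  # length m
B = np.arange(4)  # length n

# Reference: the explicit loop from the question, C[i, j] = B[j] - A[i].
C_loop = np.zeros((len(A), len(B)))
for i in range(len(A)):
    C_loop[i, :] = B - A[i]

# outer(B, A)[j, i] = B[j] - A[i], so transpose to get shape (m, n).
C_outer = np.subtract.outer(B, A).T
# Equivalently, negate outer(A, B), whose entries are A[i] - B[j].
C_neg = -np.subtract.outer(A, B)
```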

You could repeat one of the arrays on a new axis, and then subtract the other array.
Example:
m = 10
n = 20
A = np.array(range(m))
B = np.array(range(n))
C = np.repeat(B[:, np.newaxis], m, axis=1) - A
Then C would contain:
array([[ 0, -1, -2, -3, -4, -5, -6, -7, -8, -9],
[ 1, 0, -1, -2, -3, -4, -5, -6, -7, -8],
[ 2, 1, 0, -1, -2, -3, -4, -5, -6, -7],
[ 3, 2, 1, 0, -1, -2, -3, -4, -5, -6],
[ 4, 3, 2, 1, 0, -1, -2, -3, -4, -5],
[ 5, 4, 3, 2, 1, 0, -1, -2, -3, -4],
[ 6, 5, 4, 3, 2, 1, 0, -1, -2, -3],
[ 7, 6, 5, 4, 3, 2, 1, 0, -1, -2],
[ 8, 7, 6, 5, 4, 3, 2, 1, 0, -1],
[ 9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
[10, 9, 8, 7, 6, 5, 4, 3, 2, 1],
[11, 10, 9, 8, 7, 6, 5, 4, 3, 2],
[12, 11, 10, 9, 8, 7, 6, 5, 4, 3],
[13, 12, 11, 10, 9, 8, 7, 6, 5, 4],
[14, 13, 12, 11, 10, 9, 8, 7, 6, 5],
[15, 14, 13, 12, 11, 10, 9, 8, 7, 6],
[16, 15, 14, 13, 12, 11, 10, 9, 8, 7],
[17, 16, 15, 14, 13, 12, 11, 10, 9, 8],
[18, 17, 16, 15, 14, 13, 12, 11, 10, 9],
[19, 18, 17, 16, 15, 14, 13, 12, 11, 10]])

This is a simple broadcasting task:
In [31]: A =np.arange(3); B=np.arange(4)
In [32]: C = B - A[:,None]
In [33]: C.shape
Out[33]: (3, 4)
In [34]: C
Out[34]:
array([[ 0, 1, 2, 3],
[-1, 0, 1, 2],
[-2, -1, 0, 1]])
This is like the https://stackoverflow.com/a/66842410/901925 answer, but there's no need to use repeat. That should cut down the memory usage a bit, but if M*N*8 is anywhere close to your memory limits, this or subsequent operations using C could hit that limit.
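If the full m x n result does not fit in memory, one option is to build it in row blocks and reduce each block before moving on. The following is a minimal sketch under that assumption: the generator name `iter_difference_blocks`, the `block_rows` parameter and the nearest-distance reduction are all hypothetical, chosen only to show the pattern.

```python
import numpy as np

def iter_difference_blocks(A, B, block_rows=1024):
    """Yield (start, block) with block[i, j] = B[j] - A[start + i]."""
    for start in range(0, len(A), block_rows):
        # Broadcasting a (rows, 1) slice of A against B gives a (rows, n) block.
        yield start, B - A[start:start + block_rows, None]

A = np.arange(5)
B = np.arange(4)

# Example reduction: the smallest |B[j] - A[i]| per row, computed block by
# block so that only block_rows * n entries exist at any one time.
nearest = np.empty(len(A))
for start, block in iter_difference_blocks(A, B, block_rows=2):
    nearest[start:start + len(block)] = np.abs(block).min(axis=1)
```

This only helps when the later computation can consume C a block at a time; if the whole matrix is needed at once, a smaller dtype or more memory are the remaining options.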

Get ndarray from pandas column when cell elements are list

I have a pandas data frame that looks like this:
>>> df = pd.DataFrame({'a': list(range(10))})
>>> df['a'] = df.a.apply(lambda x: x*np.array([1,2,3]))
>>> df.head()
a
0 [0, 0, 0]
1 [1, 2, 3]
2 [2, 4, 6]
3 [3, 6, 9]
4 [4, 8, 12]
I would like to get column a from the df as a 2-D ndarray, but when I do that I get an array of arrays:
>>> df.a.values
array([array([0, 0, 0]), array([1, 2, 3]), array([2, 4, 6]),
array([3, 6, 9]), array([ 4, 8, 12]), array([ 5, 10, 15]),
array([ 6, 12, 18]), array([ 7, 14, 21]), array([ 8, 16, 24]),
array([ 9, 18, 27])], dtype=object)
How can I get the returned output to be
array([[ 0, 0, 0],
[ 1, 2, 3],
[ 2, 4, 6],
[ 3, 6, 9],
[ 4, 8, 12],
# ...
])

Using pandas,
df.a.apply(pd.Series).values
Using numpy,
np.vstack(df.a.values)
You get
array([[ 0, 0, 0],
[ 1, 2, 3],
[ 2, 4, 6],
[ 3, 6, 9],
[ 4, 8, 12],
[ 5, 10, 15],
[ 6, 12, 18],
[ 7, 14, 21],
[ 8, 16, 24],
[ 9, 18, 27]])

Check
np.array(df['a'].tolist())
array([[ 0, 0, 0],
[ 1, 2, 3],
[ 2, 4, 6],
[ 3, 6, 9],
[ 4, 8, 12],
[ 5, 10, 15],
[ 6, 12, 18],
[ 7, 14, 21],
[ 8, 16, 24],
[ 9, 18, 27]], dtype=int64)
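These approaches rely on every cell holding an array of the same length. A minimal NumPy-only sketch of that point, using a hand-built object array named `cells` as a stand-in for `df.a.values` (an assumption made so the example runs without pandas): `np.stack` produces the same kind of 2-D result, and cells of unequal length are rejected.

```python
import numpy as np

# Stand-in for df.a.values: a 1-D object array whose elements are arrays.
cells = np.empty(4, dtype=object)
for k in range(4):
    cells[k] = k * np.array([1, 2, 3])

stacked = np.stack(cells)  # shape (4, 3) with a numeric dtype

# Cells of unequal length cannot form a rectangular array.
try:
    np.stack([np.array([1, 2]), np.array([1, 2, 3])])
    ragged_ok = True
except ValueError:
    ragged_ok = False
```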

Zero pad array based on other array's shape

I've got K feature vectors that all share dimension n but have a variable dimension m (n x m). They all live in a list together.
to_be_padded = []
to_be_padded.append(np.reshape(np.arange(9),(3,3)))
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
to_be_padded.append(np.reshape(np.arange(18),(3,6)))
array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17]])
to_be_padded.append(np.reshape(np.arange(15),(3,5)))
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
What I am looking for is a smart way to zero pad the rows of these np.arrays such that they all share the same dimension m. I've tried solving it with np.pad but I have not been able to come up with a pretty solution. Any help or nudges in the right direction would be greatly appreciated!
The result should leave the arrays looking like this:
array([[0, 1, 2, 0, 0, 0],
[3, 4, 5, 0, 0, 0],
[6, 7, 8, 0, 0, 0]])
array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17]])
array([[ 0, 1, 2, 3, 4, 0],
[ 5, 6, 7, 8, 9, 0],
[10, 11, 12, 13, 14, 0]])

You could use np.pad for that, which can also pad 2-D arrays using a tuple of values specifying the padding width, ((top, bottom), (left, right)). For that you could define:
def pad_to_length(x, m):
    return np.pad(x, ((0, 0), (0, m - x.shape[1])), mode='constant')
Usage
You could start by finding the ndarray with the largest number of columns. Say you have two of them, a and b:
a = np.array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
b = np.array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
m = max(i.shape[1] for i in [a,b])
# 5
And then use this parameter to pad the ndarrays:
pad_to_length(a, m)
array([[0, 1, 2, 0, 0],
[3, 4, 5, 0, 0],
[6, 7, 8, 0, 0]])
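Applied to the whole list from the question, the same idea might look like the sketch below. The helper name `pad_all` is hypothetical; it computes the widest column count once and right-pads every array to it.

```python
import numpy as np

def pad_all(arrays):
    """Right-pad every 2-D array with zeros to the widest column count."""
    m = max(x.shape[1] for x in arrays)
    return [np.pad(x, ((0, 0), (0, m - x.shape[1])), mode='constant')
            for x in arrays]

to_be_padded = [
    np.reshape(np.arange(9), (3, 3)),
    np.reshape(np.arange(18), (3, 6)),
    np.reshape(np.arange(15), (3, 5)),
]
padded = pad_all(to_be_padded)
```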

I believe there is no very efficient solution for this. I think you will need to loop over the list with a for loop and treat every array individually:
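for i in range(len(to_be_padded)):
    # Match the input dtype; np.zeros would otherwise create a float array
    padded = np.zeros((n, maxM), dtype=to_be_padded[i].dtype)
    padded[:, :to_be_padded[i].shape[1]] = to_be_padded[i]
    to_be_padded[i] = padded
where n is the shared number of rows and maxM is the largest m among the matrices in your list.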
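If the padded arrays are going to be processed together, a variant of this loop can write them straight into one preallocated 3-D array. This is a sketch under the assumption that all inputs share the same number of rows; the name `pad_into_batch` is hypothetical.

```python
import numpy as np

def pad_into_batch(arrays):
    """Copy K arrays of shape (n, m_k) into one zero-filled (K, n, max_m) array."""
    n = arrays[0].shape[0]
    max_m = max(x.shape[1] for x in arrays)
    batch = np.zeros((len(arrays), n, max_m), dtype=np.result_type(*arrays))
    for k, x in enumerate(arrays):
        batch[k, :, :x.shape[1]] = x  # columns past m_k stay zero
    return batch

to_be_padded = [
    np.reshape(np.arange(9), (3, 3)),
    np.reshape(np.arange(18), (3, 6)),
    np.reshape(np.arange(15), (3, 5)),
]
batch = pad_into_batch(to_be_padded)
```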

Select Multiple slices from Numpy array at once

I want to implement a vectorized SGD algorithm and would like to generate multiple mini batches at once.
Suppose data = np.arange(0, 100), miniBatchSize=10, n_miniBatches=10 and indices = np.random.randint(0, n_miniBatches, 5) (5 mini batches). What I would like to achieve is
miniBatches = np.zeros((5, miniBatchSize))
for i in range(5):
    miniBatches[i] = data[indices[i]: indices[i] + miniBatchSize]
Is there any way to avoid the for loop?
Thanks!

It can be done using stride tricks:
from numpy.lib.stride_tricks import as_strided
a = as_strided(data[:n_miniBatches], shape=(miniBatchSize, n_miniBatches), strides=2*data.strides, writeable=False)
miniBatches = a[:, indices].T
# E.g. indices = array([0, 7, 1, 0, 0])
Output:
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[ 7, 8, 9, 10, 11, 12, 13, 14, 15, 16],
[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
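A plainer alternative, sketched here with fixed start offsets so the result is reproducible, is to build a 2-D index array by broadcasting and let integer-array indexing gather all slices at once. It assumes every start offset plus `miniBatchSize` stays within `len(data)`.

```python
import numpy as np

data = np.arange(0, 100)
miniBatchSize = 10
indices = np.array([0, 7, 1, 0, 0])  # fixed start offsets for the example

# offsets[i, j] = indices[i] + j, shape (5, miniBatchSize)
offsets = indices[:, None] + np.arange(miniBatchSize)
miniBatches = data[offsets]  # row i is data[indices[i]: indices[i] + miniBatchSize]
```

Unlike the strided view, this always copies the selected elements, which is usually what a mini-batch needs anyway.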

Accessing elements in array Python

This is my first time handling multidimensional arrays and I'm having problems accessing elements. I'm trying to get the red pixels of a picture but just the first 8 elements within the array. Here's the code
from PIL import Image
import numpy as np
# Raw string, so backslash sequences such as "\1" are not treated as escapes
im = Image.open(r"C:\Users\Jones\Pictures\1.jpg")
pix = im.load()
r, g, b = np.array(im).T
print(r[0:8])

Since you're dealing with images, r is a 2-D array. To get the first 8 pixels in the image, try
r.flatten()[:8]
This will continue into the following rows automatically if the first row has fewer than 8 pixels.
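A small sketch with a synthetic array in place of the image file (so the shapes are assumptions about a typical RGB image, where `np.array(im)` has shape (height, width, 3)). It shows that taking the channel with `img[..., 0]` keeps the (height, width) layout, while unpacking `.T` as in the question yields transposed channels of shape (width, height).

```python
import numpy as np

# Synthetic stand-in for np.array(im): height 2, width 5, 3 channels.
img = np.arange(2 * 5 * 3).reshape(2, 5, 3)

r = img[..., 0]         # red channel, shape (height, width) = (2, 5)
first8 = r.ravel()[:8]  # row-major order: runs on into the second row

# Unpacking the transpose, as in the question, gives (width, height) channels,
# so r_t[0] is the first image column, not the first image row.
r_t, g_t, b_t = img.T
```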

Do you want all rows too? Try r[:,:8]
Only want the first row? Try r[0,:8]

You can do it like this:
r[0][:8]
Note, however, that this will not work if the first row has fewer than 8 pixels. To fix that, do this:
from itertools import chain
r = list(chain.from_iterable(r))
r[:8]
or (if you don't want to import an entire module):
r = [val for element in r for val in element]
r[:8]

I think it could be simpler. This example uses a random matrix (this will be your r matrix):
In [7]: from pylab import * # convention
In [8]: r = randint(0,10,(10,10)) # this is your image
In [9]: r
array([[7, 9, 5, 5, 6, 8, 1, 4, 3, 4],
[5, 4, 4, 4, 2, 6, 2, 6, 4, 2],
[1, 4, 9, 9, 2, 6, 1, 9, 0, 6],
[5, 9, 0, 7, 9, 9, 5, 2, 0, 7],
[8, 3, 3, 9, 0, 0, 5, 9, 2, 2],
[5, 3, 7, 8, 8, 1, 6, 3, 2, 0],
[0, 2, 5, 7, 0, 1, 0, 2, 1, 2],
[4, 0, 4, 5, 9, 9, 3, 8, 3, 7],
[4, 6, 9, 9, 5, 9, 3, 0, 5, 1],
[6, 9, 9, 0, 3, 4, 9, 7, 9, 6]])
Then, extract first 8 columns and do something
In [17]: r_8 = r[:,:8] # extract columns
In [18]: r_8
Out[18]:
array([[7, 9, 5, 5, 6, 8, 1, 4],
[5, 4, 4, 4, 2, 6, 2, 6],
[1, 4, 9, 9, 2, 6, 1, 9],
[5, 9, 0, 7, 9, 9, 5, 2],
[8, 3, 3, 9, 0, 0, 5, 9],
[5, 3, 7, 8, 8, 1, 6, 3],
[0, 2, 5, 7, 0, 1, 0, 2],
[4, 0, 4, 5, 9, 9, 3, 8],
[4, 6, 9, 9, 5, 9, 3, 0],
[6, 9, 9, 0, 3, 4, 9, 7]])
In [19]: r_8 = r_8 * 2 # do something
In [20]: r_8
Out[20]:
array([[14, 18, 10, 10, 12, 16, 2, 8],
[10, 8, 8, 8, 4, 12, 4, 12],
[ 2, 8, 18, 18, 4, 12, 2, 18],
[10, 18, 0, 14, 18, 18, 10, 4],
[16, 6, 6, 18, 0, 0, 10, 18],
[10, 6, 14, 16, 16, 2, 12, 6],
[ 0, 4, 10, 14, 0, 2, 0, 4],
[ 8, 0, 8, 10, 18, 18, 6, 16],
[ 8, 12, 18, 18, 10, 18, 6, 0],
[12, 18, 18, 0, 6, 8, 18, 14]])
Now, this is the trick. Replace the first 8 columns in r using hstack:
In [21]: r = hstack((r_8, r[:,8:])) # it replaces the FIRST 8 columns, note the indexing notation
In [22]: r
Out[22]:
array([[14, 18, 10, 10, 12, 16, 2, 8, 3, 4], # it does not touch the last 2 columns
[10, 8, 8, 8, 4, 12, 4, 12, 4, 2],
[ 2, 8, 18, 18, 4, 12, 2, 18, 0, 6],
[10, 18, 0, 14, 18, 18, 10, 4, 0, 7],
[16, 6, 6, 18, 0, 0, 10, 18, 2, 2],
[10, 6, 14, 16, 16, 2, 12, 6, 2, 0],
[ 0, 4, 10, 14, 0, 2, 0, 4, 1, 2],
[ 8, 0, 8, 10, 18, 18, 6, 16, 3, 7],
[ 8, 12, 18, 18, 10, 18, 6, 0, 5, 1],
[12, 18, 18, 0, 6, 8, 18, 14, 9, 6]])
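For this particular kind of update, slice assignment can do the same thing without building a new array. A minimal sketch, using `numpy.random.default_rng` with an arbitrary seed in place of the pylab `randint` call above:

```python
import numpy as np

rng = np.random.default_rng(0)
r = rng.integers(0, 10, size=(10, 10))  # stand-in for the image channel
original = r.copy()

r[:, :8] *= 2  # doubles the first 8 columns in place; the last 2 are untouched
```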

EDIT: As DSM pointed out, the OP is in fact using a NumPy array.
I retract my answer, as nneonneo's is correct.
