Creating 4D numpy array by repeating values from 2D numpy array - python

I have a 2D numpy array array1 of dimensions 2x4. I want to create a 4D numpy array array2 with dimensions 20x20x2x4 by replicating array1 across the first two dimensions.
That is, if array1 was
[[1, 2, 3, 4],
 [5, 6, 7, 8]]
I want
array2[0, 0] = array1
array2[0, 1] = array1
array2[0, 2] = array1
array2[0, 3] = array1
# etc.
How can I do this?

One approach with initialization -
array2 = np.empty((20, 20) + array1.shape, dtype=array1.dtype)
array2[:] = array1
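The runtime test below calls this via a function; since the post refers to initialization_based by name without defining it, here is the straightforward reconstruction wrapping the two lines above:
def initialization_based(array1):
    array2 = np.empty((20, 20) + array1.shape, dtype=array1.dtype)
    array2[:] = array1
    return array2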
Runtime test -
In [400]: array1 = np.arange(1,9).reshape(2,4)
In [401]: array1
Out[401]:
array([[1, 2, 3, 4],
       [5, 6, 7, 8]])
# @MSeifert's soln
In [402]: %timeit np.tile(array1, (20, 20, 1, 1))
100000 loops, best of 3: 8.01 µs per loop
# Proposed soln in this post
In [403]: %timeit initialization_based(array1)
100000 loops, best of 3: 4.11 µs per loop
# @MSeifert's soln for READONLY-view
In [406]: %timeit np.broadcast_to(array1, (20, 20, 2, 4))
100000 loops, best of 3: 2.78 µs per loop

There are two easy ways:
np.broadcast_to:
array2 = np.broadcast_to(array1, (20, 20, 2, 4)) # array2 is a READONLY-view
and np.tile:
array2 = np.tile(array1, (20, 20, 1, 1)) # array2 is a normal numpy array
If you don't need to modify array2, then np.broadcast_to is really fast and simple. Otherwise np.tile, or assignment into a newly allocated array (see Divakar's answer), should be preferred.
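To illustrate the read-only caveat, here's a minimal sketch (not from the original answer): writing into the broadcast view raises, and .copy() materializes a writable array when needed.
import numpy as np

array1 = np.arange(1, 9).reshape(2, 4)
view = np.broadcast_to(array1, (20, 20, 2, 4))
try:
    view[0, 0, 0, 0] = 99        # fails: the view is read-only
except ValueError as e:
    print(e)                     # assignment destination is read-only
writable = view.copy()           # materialize a normal, writable array
writable[0, 0, 0, 0] = 99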

I got the answer:
array2[:, :, :, :] = array1
This works fine; the explicit .copy() is unnecessary, because the broadcast assignment copies the data anyway.
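A quick check (a minimal sketch, not from the original answer) confirms that the broadcast assignment copies the data rather than aliasing it:
import numpy as np

array1 = np.arange(1, 9).reshape(2, 4)
array2 = np.empty((20, 20, 2, 4), dtype=array1.dtype)
array2[:, :, :, :] = array1
assert not np.shares_memory(array1, array2)
assert np.array_equal(array2[5, 7], array1)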

Related

Create a new array with timesteps and multiple features, e.g. for an LSTM

Hi, I am using numpy to create a new array with timesteps and multiple features, for an LSTM.
I have looked at a number of approaches using strides and reshaping, but haven't managed to find an efficient solution.
Here is a function that solves a toy problem; however, I have 30,000 samples, each with 100 features.
def make_timesteps(a, timesteps):
    array = []
    for j in np.arange(len(a)):
        unit = []
        for i in range(timesteps):
            unit.append(np.roll(a, -i, axis=0)[j])
        array.append(unit)
    return np.array(array)
inArr = np.array([[1, 2], [3,4], [5,6]])
inArr.shape => (3, 2)
outArr = make_timesteps(inArr, 2)
outArr.shape => (3, 2, 2)
assert(np.array_equal(outArr,
       np.array([[[1, 2], [3, 4]], [[3, 4], [5, 6]], [[5, 6], [1, 2]]])))
=> True
Is there a more efficient way of doing this (there must be!)? Can someone please help?
One trick would be to slice off L-1 rows from one end of the array and stack them onto the other end (the last L-1 rows at the start for backward striding, the first L-1 rows at the end for forward striding). Then it becomes a simple case of using the very efficient NumPy strides. For people wondering about the cost of this trick: as we will see later through the timing tests, it's as good as nothing.
The trick, supporting both forward and backward striding, would look something like this -
Backward striding :
def strided_axis0_backward(inArr, L=2):
    # INPUTS :
    # inArr : Input array
    # L : Length along rows to be cut to create per subarray
    # Append the last L-1 rows to the start. It just helps in keeping a view output.
    a = np.vstack((inArr[-L+1:], inArr))
    # Store shape and strides info
    m, n = a.shape
    s0, s1 = a.strides
    # Length of 3D output array along its axis=0
    nd0 = m - L + 1
    strided = np.lib.stride_tricks.as_strided
    return strided(a[L-1:], shape=(nd0, L, n), strides=(s0, -s0, s1))
Forward striding :
def strided_axis0_forward(inArr, L=2):
    # INPUTS :
    # inArr : Input array
    # L : Length along rows to be cut to create per subarray
    # Append the first L-1 rows to the end. It just helps in keeping a view output.
    a = np.vstack((inArr, inArr[:L-1]))
    # Store shape and strides info
    m, n = a.shape
    s0, s1 = a.strides
    # Length of 3D output array along its axis=0
    nd0 = m - L + 1
    strided = np.lib.stride_tricks.as_strided
    return strided(a[:L-1], shape=(nd0, L, n), strides=(s0, s0, s1))
Sample run -
In [42]: inArr
Out[42]:
array([[1, 2],
       [3, 4],
       [5, 6]])
In [43]: strided_axis0_backward(inArr, 2)
Out[43]:
array([[[1, 2],
        [5, 6]],

       [[3, 4],
        [1, 2]],

       [[5, 6],
        [3, 4]]])
In [44]: strided_axis0_forward(inArr, 2)
Out[44]:
array([[[1, 2],
        [3, 4]],

       [[3, 4],
        [5, 6]],

       [[5, 6],
        [1, 2]]])
Runtime test -
In [53]: inArr = np.random.randint(0,9,(1000,10))
In [54]: %timeit make_timesteps(inArr, 2)
...: %timeit strided_axis0_forward(inArr, 2)
...: %timeit strided_axis0_backward(inArr, 2)
...:
10 loops, best of 3: 33.9 ms per loop
100000 loops, best of 3: 12.1 µs per loop
100000 loops, best of 3: 12.2 µs per loop
In [55]: %timeit make_timesteps(inArr, 10)
...: %timeit strided_axis0_forward(inArr, 10)
...: %timeit strided_axis0_backward(inArr, 10)
...:
1 loops, best of 3: 152 ms per loop
100000 loops, best of 3: 12 µs per loop
100000 loops, best of 3: 12.1 µs per loop
In [56]: 152000/12.1 # Speedup figure
Out[56]: 12561.98347107438
The timings of strided_axis0 stay the same even as we increase the length of the subarrays in the output. That goes to show the massive benefit of strides, and of course the huge speedup over the original loopy version.
As promised at the start, here's the timings on stacking cost with np.vstack -
In [417]: inArr = np.random.randint(0,9,(1000,10))
In [418]: L = 10
In [419]: %timeit np.vstack(( inArr[-L+1:], inArr ))
100000 loops, best of 3: 5.41 µs per loop
The timings confirm that the stacking step is quite cheap.
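For reference, NumPy 1.20+ ships np.lib.stride_tricks.sliding_window_view, a safer wrapper around the same strides machinery. A sketch of the forward case using it (assuming that NumPy version):
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

inArr = np.array([[1, 2], [3, 4], [5, 6]])
L = 2
a = np.vstack((inArr, inArr[:L-1]))       # wrap the first L-1 rows to the end
out = sliding_window_view(a, L, axis=0)   # shape (3, 2, L); the window is the last axis
out = out.transpose(0, 2, 1)              # -> (3, L, 2), matching strided_axis0_forward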

Altering arrays of different dimensions so they can be broadcast together

I am looking for a more optimized way to convert an (n,n) or (n,n,1) matrix to an (n,n,3) matrix. I start out with an (n,n,3), but its dimensions get reduced to (n,n) after I perform a sum over the last axis. Essentially, I want to keep the original size of the array and just have the last axis repeated 3 times. The reason I need this is that I will later be broadcasting it with another (n,n,3) array, but they need the same dimensions.
My current method works, but does not seem elegant.
a = np.random.random((n, n))
b = a.flatten().tolist()
a = np.array(list(zip(b, b, b)))
a.shape = n, n, 3
This setup has the desired result, but is clunky and hard to follow. Is there perhaps a way to go directly from an (n,n) to an (n,n,3) by duplicating the second index? or perhaps a way to not downsize the array to begin with?
None or np.newaxis is a common way of adding a dimension to an array. reshape with (3,3,1) works just as well:
In [64]: arr=np.arange(9).reshape(3,3)
In [65]: arr1 = arr[...,None]
In [66]: arr1.shape
Out[66]: (3, 3, 1)
repeat, as a function or method, replicates the array along that new axis.
In [72]: arr2=arr1.repeat(3,axis=2)
In [73]: arr2.shape
Out[73]: (3, 3, 3)
In [74]: arr2[0,0,:]
Out[74]: array([0, 0, 0])
But you might not need to do this. With broadcasting a (3,3,1) works with a (3,3,3).
In [75]: (arr1+arr2).shape
Out[75]: (3, 3, 3)
In fact it will broadcast with a (3,) to produce (3,3,3).
In [77]: arr1+np.ones(3,int)
Out[77]:
array([[[1, 1, 1],
        [2, 2, 2],
        ...
       [[7, 7, 7],
        [8, 8, 8],
        [9, 9, 9]]])
So arr1+np.zeros(3,int) is another way of expanding that (3,3,1) to (3,3,3).
The broadcasting rules are:
(3,3,1) + (3,) => (3,3,1) + (1,1,3) => (3,3,3)
broadcasting adds dimensions at the start as needed.
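In NumPy 1.20+, np.broadcast_shapes can verify that rule directly (a small sketch assuming that version):
import numpy as np

print(np.broadcast_shapes((3, 3, 1), (3,)))   # (3, 3, 3)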
When you sum on an axis, you can keep the original number of dimensions with a parameter:
In [78]: arr2.sum(axis=2).shape
Out[78]: (3, 3)
In [79]: arr2.sum(axis=2, keepdims=True).shape
Out[79]: (3, 3, 1)
This is handy if you want to subtract the mean from an array along any dimension:
arr2-arr2.mean(axis=2, keepdims=True)
You can first create a new axis (axis=2) on a and then use np.repeat along it:
np.repeat(a[:,:,None], 3, axis = 2)
Or another approach, flatten the array, repeat elements and then reshape:
np.repeat(a.ravel(), 3).reshape(n,n,3)
The result comparison:
import numpy as np
n = 4
a = np.random.random((n, n))
b = a.flatten().tolist()
a1 = np.array(list(zip(b, b, b)))
a1.shape = n, n, 3
# a1 is the result from the original method
(np.repeat(a[:,:,None], 3, axis = 2) == a1).all()
# True
(np.repeat(a.ravel(), 3).reshape(4,4,3) == a1).all()
# True
Timing shows that the built-in numpy.repeat is also faster:
import numpy as np
n = 4
a=np.random.random((n,n))
def rep():
    b = a.flatten().tolist()
    a1 = np.array(list(zip(b, b, b)))
    a1.shape = n, n, 3
%timeit rep()
# 100000 loops, best of 3: 7.11 µs per loop
%timeit np.repeat(a[:,:,None], 3, axis = 2)
# 1000000 loops, best of 3: 1.64 µs per loop
%timeit np.repeat(a.ravel(), 3).reshape(4,4,3)
# 1000000 loops, best of 3: 1.9 µs per loop
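If the (n,n,3) array is only needed so it can broadcast against another (n,n,3) array, a read-only view avoids the copy entirely. A sketch using np.broadcast_to (fine as long as the result is never written to):
import numpy as np

n = 4
a = np.random.random((n, n))
a3 = np.broadcast_to(a[:, :, None], (n, n, 3))   # read-only view, no data copied
other = np.random.random((n, n, 3))
result = a3 * other                              # broadcasts like a real (n, n, 3)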

How to append a tuple to a numpy array without it being performed element-wise?

If I try
x = np.append(x, (2,3))
the tuple (2,3) does not get appended to the end of the array, rather 2 and 3 get appended individually, even if I originally declared x as
x = np.array([], dtype = tuple)
or
x = np.array([], dtype = (int,2))
What is the proper way to do this?
I agree with @user2357112's comment:
appending to NumPy arrays is catastrophically slower than appending to ordinary lists. It's an operation that they are not at all designed for
Here's a little benchmark:
# measure execution time
import timeit
import numpy as np

def f1(num_iterations):
    x = np.dtype((np.int32, (2, 1)))
    for i in range(num_iterations):
        x = np.append(x, (i, i))

def f2(num_iterations):
    x = np.array([(0, 0)])
    for i in range(num_iterations):
        x = np.vstack((x, (i, i)))

def f3(num_iterations):
    x = []
    for i in range(num_iterations):
        x.append((i, i))
    x = np.array(x)

N = 50000
print('append:', timeit.timeit('f1(N)', setup='from __main__ import f1, N', number=1))
print('vstack:', timeit.timeit('f2(N)', setup='from __main__ import f2, N', number=1))
print('list:  ', timeit.timeit('f3(N)', setup='from __main__ import f3, N', number=1))
I wouldn't use either np.append or vstack; I'd just build the Python list properly and then use it to construct the np.array.
EDIT
Here's the benchmark output on my laptop:
append: 12.4983000173
vstack: 1.60663705793
list: 0.0252208517006
[Finished in 14.3s]
You need to supply the shape to numpy dtype, like so:
x = np.dtype((np.int32, (1,2)))
x = np.append(x,(2,3))
Outputs
array([dtype(('<i4', (1, 2))), 2, 3], dtype=object)
Reference: http://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html
If I understand what you mean, you can use vstack:
>>> a = np.array([(1,2),(3,4)])
>>> a = np.vstack((a, (4,5)))
>>> a
array([[1, 2],
[3, 4],
[4, 5]])
I do not have any special insight as to why this works, but:
x = np.array([1, 3, 2, (5,7), 4])
mytuple = [(2, 3)]
mytuplearray = np.empty(len(mytuple), dtype=object)
mytuplearray[:] = mytuple
y = np.append(x, mytuplearray)
print(y) # [1 3 2 (5, 7) 4 (2, 3)]
As others have correctly pointed out, this is a slow operation with numpy arrays. If you're just building some code from scratch, try to use some other data type. But if you know your array will always remain small or you're not going to append much or if you have existing code that you need to tweak quickly, then go ahead.
Simplest way:
x=np.append(x,None)
x[-1]=(2,3)
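A minimal check of why this trick works (not from the original answer): appending None promotes the result to object dtype, whose slots can then hold a tuple:
import numpy as np

x = np.array([1, 3, 2, 4])
x = np.append(x, None)   # promotes to dtype=object; last slot is None
x[-1] = (2, 3)
print(x)                 # [1 3 2 4 (2, 3)]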
np.append is easy to use with a case like:
In [94]: np.append([1,2,3],4)
Out[94]: array([1, 2, 3, 4])
but its first example is harder to understand. It shows the same sort of flat concatenate that bothers you:
>>> np.append([1, 2, 3], [[4, 5, 6], [7, 8, 9]])
array([1, 2, 3, 4, 5, 6, 7, 8, 9])
Stripped of dimensional tests, np.append does
In [166]: np.append(np.array([1,2],int),(2,3))
Out[166]: array([1, 2, 2, 3])
In [167]: np.concatenate([np.array([1,2],int),np.array((2,3))])
Out[167]: array([1, 2, 2, 3])
So except for the simplest cases you need to understand what np.array((2,3)) does, and how concatenate handles dimensions.
So apart from the speed issues, np.append can be trickier to use than the interface suggests. The parallels to list append are only superficial.
As for append (or concatenate) with dtype=object (not dtype=tuple) or a compound dtype ('i,i'), I couldn't tell you what happens without testing. At a minimum the inputs should already be arrays, and should have a matching dtype. Otherwise the results can be unpredictable.
edit
Don't trust the timings in https://stackoverflow.com/a/38985245/901925. The functions don't produce the same things.
Corrected functions:
In [233]: def g1(num_iterations):
     ...:     x = np.ones((0, 2), int)
     ...:     for i in range(num_iterations):
     ...:         x = np.append(x, [(i, i)], axis=0)
     ...:     return x
     ...:
     ...: def g2(num_iterations):
     ...:     x = np.ones((0, 2), int)
     ...:     for i in range(num_iterations):
     ...:         x = np.vstack((x, (i, i)))
     ...:     return x
     ...:
     ...: def g3(num_iterations):
     ...:     x = []
     ...:     for i in range(num_iterations):
     ...:         x.append((i, i))
     ...:     x = np.array(x)
     ...:     return x
     ...:
In [234]: g1(3)
Out[234]:
array([[0, 0],
       [1, 1],
       [2, 2]])
In [235]: g2(3)
Out[235]:
array([[0, 0],
       [1, 1],
       [2, 2]])
In [236]: g3(3)
Out[236]:
array([[0, 0],
       [1, 1],
       [2, 2]])
np.append and np.vstack timings are much closer. Both use np.concatenate to do the actual joining. They differ in how the inputs are processed prior to sending them to concatenate.
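A rough sketch of that preprocessing difference (assuming current NumPy behavior): with an axis argument, np.append is essentially concatenate, while vstack first promotes each input to at least 2d:
import numpy as np

x = np.ones((0, 2), int)
print(np.append(x, [(0, 0)], axis=0))            # like concatenate along axis 0: [[0 0]]
print(np.concatenate([x, np.array([(0, 0)])]))   # same result
print(np.vstack((x, (0, 0))))                    # (0, 0) promoted to shape (1, 2) first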
In [237]: timeit g1(1000)
9.69 ms ± 6.25 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [238]: timeit g2(1000)
12.8 ms ± 7.53 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [239]: timeit g3(1000)
537 µs ± 2.22 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
Now the wrong results: f1 produces a 1d object dtype array, because the starting value is an object dtype array and there's no axis parameter. f2 duplicates the starting array.
In [240]: f1(3)
Out[240]: array([dtype(('<i4', (2, 1))), 0, 0, 1, 1, 2, 2], dtype=object)
In [241]: f2(3)
Out[241]:
array([[0, 0],
       [0, 0],
       [1, 1],
       [2, 2]])
Not only is it slower to use np.append or np.vstack in a loop, it is also hard to do it right.

How can I select values along an axis of an nD array with an (n-1)D array of indices of that axis?

This is motivated by my answer here.
Given array A with shape (n0,n1), and array J with shape (n0), I'd like to create an array B with shape (n0) such that
B[i] = A[i,J[i]]
I'd also like to be able to generalize this to k-dimensional arrays, where A has shape (n0,n1,...,nk) and J has shape (n0,n1,...,n(k-1))
There are messy, flattening ways of doing this that make assumptions about index order:
import numpy as np
B = A.ravel()[ J+A.shape[-1]*np.arange(0,np.prod(J.shape)).reshape(J.shape) ]
The question is, is there a way to do this that doesn't rely on flattening arrays and dealing with indexes manually?
For the 2 and 1d case, this indexing works:
A[np.arange(J.shape[0]), J]
Which can be applied to more dimensions by reshaping to 2d (and back):
A.reshape(-1, A.shape[-1])[np.arange(np.prod(A.shape[:-1])).reshape(J.shape), J]
For 3d A this works:
A[np.arange(J.shape[0])[:,None], np.arange(J.shape[1])[None,:], J]
where the first two arange indices broadcast to the same shape as J.
With functions in lib.index_tricks, this can be expressed as:
A[np.ogrid[0:J.shape[0],0:J.shape[1]]+[J]]
A[np.ogrid[slice(J.shape[0]),slice(J.shape[1])]+[J]]
or for multiple dimensions:
A[np.ix_(*[np.arange(x) for x in J.shape])+(J,)]
A[np.ogrid[[slice(k) for k in J.shape]]+[J]]
For small A and J (e.g. 2×3×4), J.choose(np.rollaxis(A,-1)) is faster. All of the extra time is in preparing the index tuple. np.ix_ is faster than np.ogrid.
np.choose has a size limit. At its upper end it is slower than ix_:
In [610]: Abig=np.arange(31*31).reshape(31,31)
In [611]: Jbig=np.arange(31)
In [612]: Jbig.choose(np.rollaxis(Abig,-1))
Out[612]:
array([ 0, 32, 64, 96, 128, 160, ... 960])
In [613]: timeit Jbig.choose(np.rollaxis(Abig,-1))
10000 loops, best of 3: 73.1 µs per loop
In [614]: timeit Abig[np.ix_(*[np.arange(x) for x in Jbig.shape])+(Jbig,)]
10000 loops, best of 3: 22.7 µs per loop
In [635]: timeit Abig.ravel()[Jbig+Abig.shape[-1]*np.arange(0,np.prod(Jbig.shape)).reshape(Jbig.shape) ]
10000 loops, best of 3: 44.8 µs per loop
I did similar indexing tests at https://stackoverflow.com/a/28007256/901925, and found that flat indexing was faster for much larger arrays (e.g. n0=1000). That's where I learned about the 32 limit for choose.
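For completeness, NumPy 1.15+ has np.take_along_axis, which handles exactly this pattern without building the index tuple by hand. A sketch assuming that version:
import numpy as np

A = np.arange(2 * 3 * 4).reshape(2, 3, 4)
J = np.random.randint(0, 4, size=(2, 3))
# J needs a trailing length-1 axis so its ndim matches A's.
B = np.take_along_axis(A, J[..., None], axis=-1)[..., 0]
# Agrees with the explicit broadcast-index version for the 3d case:
I0, I1 = np.ogrid[:2, :3]
assert np.array_equal(B, A[I0, I1, J])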
It doesn't solve your problem exactly, but choose() should nevertheless help:
>>> A = np.array(range(1, 28)).reshape(3, 3, 3)
>>> B = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2]).reshape(3, 3)
>>> B.choose(A)
array([[ 1,  2,  3],
       [13, 14, 15],
       [25, 26, 27]])
It selects among the first dimension instead of the last.

Fastest way to mix arrays in numpy?

a = np.array([1,3,5,7,9])
b = np.array([2,4,6,8,10])
I want to interleave these pairs of arrays element by element.
Example: using a and b, the result should be
c = np.array([1,2,3,4,5,6,7,8,9,10])
I need to do that with pairs of long arrays (more than one hundred elements) on thousands of sequences. Any smarter ideas than picking element by element from each array?
thanks
c = np.empty(len(a)+len(b), dtype=a.dtype)
c[::2] = a
c[1::2] = b
(That assumes a and b have the same dtype.)
You asked for the fastest, so here's a timing comparison (vstack, ravel and empty are all numpy functions):
In [40]: a = np.random.randint(0, 10, size=150)
In [41]: b = np.random.randint(0, 10, size=150)
In [42]: %timeit vstack((a,b)).T.flatten()
100000 loops, best of 3: 5.6 µs per loop
In [43]: %timeit ravel([a, b], order='F')
100000 loops, best of 3: 3.1 µs per loop
In [44]: %timeit c = empty(len(a)+len(b), dtype=a.dtype); c[::2] = a; c[1::2] = b
1000000 loops, best of 3: 1.94 µs per loop
With vstack((a,b)).T.flatten(), a and b are copied to create vstack((a,b)), and then the data is copied again by the flatten() method.
ravel([a, b], order='F') is implemented as asarray([a, b]).ravel(order), which requires copying a and b, and then copying the result to create an array with order='F'. (If you do just ravel([a, b]), it is about the same speed as my answer, because it doesn't have to copy the data again. Unfortunately, order='F' is needed to get the alternating pattern.)
So the other two methods copy the data twice. In my version, each array is copied once.
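Since the question mentions thousands of sequences, the same slice-assignment idea extends to a whole batch at once. A sketch assuming the pairs are stacked as two (k, n) arrays:
import numpy as np

k, n = 1000, 150
A = np.random.randint(0, 10, size=(k, n))
B = np.random.randint(0, 10, size=(k, n))
C = np.empty((k, 2 * n), dtype=A.dtype)   # interleave along the last axis
C[:, ::2] = A    # rows of A go to even columns
C[:, 1::2] = B   # rows of B go to odd columns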
This'll do it:
vstack((a,b)).T.flatten()
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
Using numpy.ravel:
>>> np.ravel([a, b], order='F')
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
