Simplify numpy.dot within loop - python

Is it possible to simplify this:
import numpy as np
a = np.random.random_sample((40, 3))
data_base = np.random.random_sample((20, 3))
mean = np.random.random_sample((40,))
data = []
for s in data_base:
    data.append(mean + np.dot(a, s))
data should have shape (20, 40). I was wondering whether I could use broadcasting instead of the loop. I was not able to do it with np.add and some [:, None] indexing; I am probably not using it correctly.

Your loop produces a list of 20 arrays, which converts to a (20, 40) array:
In [385]: len(data)
Out[385]: 20
In [386]: data = np.array(data)
In [387]: data.shape
Out[387]: (20, 40)
The straightforward application of dot produces the same result:
In [388]: M2=mean+np.dot(data_base, a.T)
In [389]: np.allclose(M2,data)
Out[389]: True
The matmul operator @ also works with these arrays (no need to expand and squeeze):
M3 = data_base @ a.T + mean
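Here is a quick sketch to check that the loop, np.dot, the @ operator and np.einsum all give the same (20, 40) result. In every version, mean has shape (40,) and is broadcast across the 20 rows. The seeded generator and the einsum spelling are illustrative additions, not part of the original answer.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((40, 3))
data_base = rng.random((20, 3))
mean = rng.random(40)

# Original loop: one row of length 40 per row of data_base
data_loop = np.array([mean + np.dot(a, s) for s in data_base])

# Vectorised: (20, 3) @ (3, 40) -> (20, 40); mean (40,) broadcasts over the rows
data_dot = mean + np.dot(data_base, a.T)
data_matmul = data_base @ a.T + mean
# Same contraction written with einsum: sum over the shared length-3 axis j
data_einsum = np.einsum('ij,kj->ik', data_base, a) + mean

print(data_matmul.shape)  # (20, 40)
print(np.allclose(data_loop, data_matmul), np.allclose(data_dot, data_einsum))
```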

Related

How to stack numpy array along an axis

I have two NumPy arrays, one with shape, say, (10, 5, 200) and another with shape (1, 200). How can I stack them to get an array of shape (10, 6, 200)? Basically, I want to append b as an extra row to each of the 2-D arrays along the first dimension.
a = np.random.random((10, 5, 200))
b = np.zeros((1, 200))
I've tried hstack and vstack, but I get an error about an incorrect number of axes.
Let's say:
a = np.random.random((10, 5, 200))
b = np.zeros((1, 200))
Let's look at the volume (number of elements) of each array:
The volume of a is 10*5*200 = 10000.
The volume of an array with shape (10, 6, 200) is 10*6*200 = 12000.
That is, you want to create an array with 2000 more elements.
However, the volume of b is only 1*200 = 200.
This means a and b can't be stacked directly: b has to be repeated for each of the 10 blocks along the first axis.
As hpaulj mentioned in the comments, one way is to create an empty NumPy array of the target shape and then fill it. The second assignment broadcasts b from (1, 200) to (10, 1, 200):
result = np.empty((a.shape[0], a.shape[1] + b.shape[0], a.shape[2]))
result[:, :a.shape[1], :] = a
result[:, a.shape[1]:, :] = b
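Here is an alternative sketch that avoids the manual fill. It uses np.broadcast_to to view b as (10, 1, 200) without copying, then concatenates along axis 1. This is a different technique from hpaulj's suggestion, shown only for comparison.

```python
import numpy as np

a = np.random.random((10, 5, 200))
b = np.zeros((1, 200))

# View b as (10, 1, 200): the single row is repeated for each of the 10 blocks
b3 = np.broadcast_to(b, (a.shape[0], b.shape[0], a.shape[2]))
result = np.concatenate((a, b3), axis=1)
print(result.shape)  # (10, 6, 200)
```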

How to add a given number of rows and columns to a numpy array?

I have a NumPy array of shape (2000, 1) and I need its shape to be (2000, 7).
The values in the 6 columns being added can be anything.
Is there a function or method to accomplish this?
Thanks.
You can try numpy.hstack for this.
>>> x = np.zeros((2000, 1))
>>> x.shape
(2000, 1)
>>> x = np.hstack((x, np.zeros((2000, 6))))
>>> x.shape
(2000, 7)
An interesting option is the np.pad function.
The second parameter (pad_width, a list of 2-tuples) defines how many
elements to add at the beginning and end of each dimension.
Example:
arr = np.arange(1,7)[:, np.newaxis]
arr.shape # (6, 1)
result = np.pad(arr, [(0, 0), (0, 6)])
result.shape # (6, 7)
A third parameter, mode, controls which values are used for padding
(the default, 'constant', pads with zeros). For details see the documentation.
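Below is a short sketch of two padding choices. With the default 'constant' mode, constant_values picks the fill value. With mode='edge', every new column copies the last existing one. The fill value -1 is just an example.

```python
import numpy as np

arr = np.arange(1, 7)[:, np.newaxis]  # shape (6, 1)

# Default mode='constant': pad 6 columns on the right, filled with -1 instead of 0
filled = np.pad(arr, [(0, 0), (0, 6)], constant_values=-1)
# mode='edge': every new column copies the last existing column
edged = np.pad(arr, [(0, 0), (0, 6)], mode='edge')

print(filled[0])  # [ 1 -1 -1 -1 -1 -1 -1]
print(edged[0])   # [1 1 1 1 1 1 1]
```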
You can do this with broadcasting:
x = np.zeros((2000, 1))
np.broadcast_to(x, (2000,7))
In this case, the single value in each row is repeated along the second axis, across all 7 columns.
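np.broadcast_to returns a read-only view rather than a new array, so writing into it raises an error. Here is a minimal sketch that gets a writable (2000, 7) array by copying the view. The example data in x is made up.

```python
import numpy as np

x = np.arange(2000.0).reshape(2000, 1)  # example data, shape (2000, 1)

view = np.broadcast_to(x, (2000, 7))
print(view.flags.writeable)  # False: the view shares x's memory

result = view.copy()  # independent, writable array
result[:, 1:] = 0.0   # the extra 6 columns can now be overwritten
print(result.shape)   # (2000, 7)
```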

How to join two 3D numpy arrays so that np.arr(1,m,n) + np.arr(1,m,n) = np.arr(2,m,n)

I have several 3-dimensional NumPy arrays that I want to join together to feed them as a training set to my LSTM neural network. Most of them have shape (1, m, n).
I want to join them so that, e.g., np.arr(1,50,20) + np.arr(1,50,20) = np.arr(2,50,20) and np.arr(1,50,20) + np.arr(3,50,20) = np.arr(4,50,20).
Which of NumPy's stack functions would suit my problem? Or is there a more efficient way to solve it?
Use np.concatenate along the first axis (axis=0):
import numpy as np
rng = np.random.default_rng()
a = rng.integers(0, 10, (1, 3, 20))
b = rng.integers(-10, -1, (2, 3, 20))
c = np.concatenate((a, b), axis=0)
print(c.shape)
(3, 3, 20)
Use np.vstack, which joins arrays along the first axis:
x = np.array([[[2,3,5],[4,5,1]]])
y = np.array([[[1,5,8],[8,0,9]]])
x.shape
(1,2,3)
np.vstack((x,y)).shape
(2,2,3)
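With several arrays, it is usually cheaper to collect them in a list and call np.concatenate once. Concatenating inside a loop allocates a new array on every call. The sketch below uses made-up shapes; the stack example assumes the pieces are plain 2-D (m, n) arrays.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical training pieces: each has shape (k, 50, 20) with varying k
pieces = [rng.random((k, 50, 20)) for k in (1, 1, 3)]

# One concatenation along axis 0: (1 + 1 + 3, 50, 20)
train = np.concatenate(pieces, axis=0)
print(train.shape)  # (5, 50, 20)

# If the pieces were 2-D (50, 20) arrays, np.stack would add the new first axis
train2 = np.stack([p[0] for p in pieces])
print(train2.shape)  # (3, 50, 20)
```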

Binning of data along one axis in numpy

I have a large two dimensional array arr which I would like to bin over the second axis using numpy. Because np.histogram flattens the array I'm currently using a for loop:
import numpy as np
arr = np.random.randn(100, 100)
nbins = 10
binned = np.empty((arr.shape[0], nbins))
for i in range(arr.shape[0]):
    binned[i, :] = np.histogram(arr[i, :], bins=nbins)[0]
I feel like there should be a more direct and more efficient way to do that within numpy but I failed to find one.
You could use np.apply_along_axis:
x = np.array([range(20), range(1, 21), range(2, 22)])
nbins = 2
>>> np.apply_along_axis(lambda a: np.histogram(a, bins=nbins)[0], 1, x)
array([[10, 10],
       [10, 10],
       [10, 10]])
The main advantage (if any) is that it's slightly shorter. I wouldn't expect much of a performance gain, since apply_along_axis still calls np.histogram once per row in a Python-level loop. It may be marginally more efficient at assembling the per-row results.
I was a bit confused by the lambda in Ami's solution, so I expanded it to show what it does:
def hist_1d(a):
    # histogram counts for one row, using the same nbins as above
    return np.histogram(a, bins=nbins)[0]
counts = np.apply_along_axis(hist_1d, axis=1, arr=x)
To bin a NumPy array along any axis, averaging each group of bin_n consecutive elements, you can use:
def bin_nd_data(arr, bin_n=2, axis=-1):
    """Average groups of bin_n consecutive elements along one axis of an nD array."""
    axis = axis % arr.ndim  # turn a negative axis such as -1 into a positive one
    ss = list(arr.shape)
    if ss[axis] % bin_n != 0:
        print('bin_nd_data: array shape', arr.shape, 'is not divisible by bin', bin_n)
        return None
    # Split `axis` into (n // bin_n, bin_n); the default C-order reshape keeps
    # each group of bin_n neighbouring elements together
    ss[axis] //= bin_n
    ss.insert(axis + 1, bin_n)
    return np.mean(np.reshape(arr, ss), axis=axis + 1)
Normalizing the axis first (axis % arr.ndim) avoids having to treat axis=-1 as a special case.
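You can use numpy.histogramdd to compute all the per-row histograms in a single call, provided every row uses the same bin edges. Treat each element as a 2-D sample (row index, value), and give the row axis one bin per row.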
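Here is a minimal sketch of the histogramdd approach. It assumes all rows share one set of value edges, spanning the global min and max. This differs from the loop in the question, where np.histogram(..., bins=nbins) picks each row's own range. The row edges at -0.5, 0.5, ... are one way to get one bin per row.

```python
import numpy as np

rng = np.random.default_rng(0)
arr = rng.standard_normal((100, 100))
nbins = 10

# Assumption: all rows share one set of value edges
value_edges = np.linspace(arr.min(), arr.max(), nbins + 1)
# One bin per row: edges at -0.5, 0.5, ..., nrows - 0.5
row_edges = np.arange(arr.shape[0] + 1) - 0.5

# Each element becomes a 2-D sample (row index, value)
rows = np.repeat(np.arange(arr.shape[0]), arr.shape[1])
sample = np.column_stack((rows, arr.ravel()))

counts, _ = np.histogramdd(sample, bins=[row_edges, value_edges])
print(counts.shape)  # (100, 10)
```

The counts come back as floats. With shared edges, they match np.histogram(arr[i], bins=value_edges)[0] row by row.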

Vectorized way of accessing row specific elements in a numpy array

I have a 2-D NumPy array and an array of indices whose length equals the first dimension of the array.
X = np.random.rand(5, 3)
a = np.random.randint(0, 3, 5)
I need to do something like
for i, ind in enumerate(a):
    print(X[i][ind])
Is there a vectorized way of doing this?
Here you go:
X = np.random.rand(5, 3)
a = np.random.randint(0, 3, 5)
In [12]: X[np.arange(a.size), a]
Out[12]: array([ 0.99653335, 0.30275346, 0.92844957, 0.54728781, 0.43535668])
In [13]: for i, ind in enumerate(a):
   ....:     print(X[i][ind])
   ....:
0.996533345844
0.30275345582
0.92844956619
0.54728781105
0.435356681672
I'm assuming here that you don't need each value on a separate line and just want to extract the values.
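Another option, assuming NumPy 1.15 or later, is np.take_along_axis. It expects the index array to have the same number of dimensions as X, so a[:, None] turns the (5,) indices into a (5, 1) column. This is an alternative to the answer above, not part of it.

```python
import numpy as np

X = np.random.rand(5, 3)
a = np.random.randint(0, 3, 5)

# Fancy indexing: pairs (0, a[0]), (1, a[1]), ...
picked = X[np.arange(a.size), a]

# Same values via take_along_axis; the result has shape (5, 1), so flatten it
picked2 = np.take_along_axis(X, a[:, None], axis=1).ravel()
print(np.array_equal(picked, picked2))  # True
```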
