How can I map over numpy dataset? - python

I am working with Keras and the provided MNIST data set. I believe the dataset is a numpy array. I have reshaped it as follows:
X_train = X_train.reshape(X_train.shape[0], 1, 28, 28)
X_test = X_test.reshape(X_test.shape[0], 1, 28, 28)
This gives a (60000, 1, 28, 28) numpy array. This can be read as there are 60000 28 x 28 images. I want to extract every single 28 x 28 image and apply some sort of function f to it. I have tried the following:
f = lambda a, _: print a.shape
np.apply_over_axes(f, data, [2,3])
But I am unsure exactly how the second axis parameter comes into play...
I have also tried:
f = lambda a: print a.shape
np.apply_along_axis(f, 0, data)
But the shape is always (60000,) instead of what I would expect (1, 28, 28). How do I get each subimage?

There is no performance gained by using np.apply_along_axis, np.vectorize, etc. Just use a loop:
import numpy as np
s = (4,1,28,28)
a = np.zeros(s)
for img in a[:,0]:
    print(img.shape)
# (28, 28)
# (28, 28)
# (28, 28)
# (28, 28)
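If the goal is to apply a function to each image and collect the results, the same loop can feed a list that is stacked back into one array. A minimal sketch, where f is a hypothetical per-image function (a left-right flip here) standing in for whatever you actually need, and data is a small stand-in for the MNIST array:

```python
import numpy as np

def f(img):
    # Hypothetical per-image function: flip the image left to right.
    return img[:, ::-1]

# Small stand-in for the (60000, 1, 28, 28) MNIST array.
data = np.arange(4 * 1 * 28 * 28, dtype=float).reshape(4, 1, 28, 28)

# data[:, 0] drops the size-1 channel axis, so each img is (28, 28).
mapped = np.stack([f(img) for img in data[:, 0]])
print(mapped.shape)
```

np.stack only works here because every call to f returns an array of the same shape; if f returns scalars or differently shaped results, keep the plain list instead.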

This lambda doesn't make sense:
lambda a, _: print a.shape
it's equivalent to
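def foo(a, x):
    return print a.shape
In Python 2, print is a statement, so print a.shape cannot appear inside a lambda or after return; it is a syntax error. In Python 3, print(a.shape) prints the shape and returns None.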
lambda a,x: a.shape is better, returning the shape of a, and ignoring the x argument.
If the size 1 dimension is in the way, why not just omit it?
X_train = X_train.reshape(X_train.shape[0], 28, 28)
or remove it
X_train[:,0,...]
np.squeeze(X_train)
But what's the point of the apply_over? Just to find the shape of a set of submatrices?
In [304]: X = np.ones((6,1,2,3))
In [305]: [x.shape for x in X]
Out[305]: [(1, 2, 3), (1, 2, 3), (1, 2, 3), (1, 2, 3), (1, 2, 3), (1, 2, 3)]
or
[x.shape for x in X[:,0]]
to remove the 2nd dimension, getting just the shape of the last 2.
This apply_along_axis call iterates over the last 3 dimensions, passing a 1d array to the lambda each time. So in effect it returns X[:,0,i,j].shape for each i, j.
In [308]: np.apply_along_axis(lambda a: a.shape, 0, X)
Out[308]:
array([[[[6, 6, 6],
         [6, 6, 6]]]])
Generally, iterations like this aren't needed, and when they are used, they are slow compared to 'full-array' operations.
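To illustrate what a 'full-array' operation looks like: many per-image reductions can be written with an axis argument instead of a loop. A small sketch, using the per-image mean as an assumed example of the function you want to apply:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((6, 1, 28, 28))

# Loop version: one mean per image.
loop_means = np.array([img.mean() for img in X[:, 0]])

# Full-array version: reduce over the two image axes in a single call.
vec_means = X[:, 0].mean(axis=(1, 2))

print(np.allclose(loop_means, vec_means))
```

Both give one value per image; the second form avoids the Python-level loop entirely.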

Related

Python/Numpy broadcast join between two arrays

The question is how to join two arrays more efficiently in this case. There is a numpy array one of shape (N, M, 1) and an array two of shape (M, F). The second array must be joined with the first to create an array of shape (N, M, F+1). The elements of the second array will be broadcast along N.
One solution is to copy array two until it matches the size of the first (along all dims but one) and then concatenate. But if the copying could be done as a broadcast during the join/concat, it would use much less memory.
Any suggestions on how to make this more efficient?
The setup:
import numpy as np
arr1 = np.random.randint(0,10,(5,10))
arr1 = np.expand_dims(arr1, axis=-1) #(5,10, 1)
arr2 = np.random.randint(0,4,(10,15))
arr2 = np.expand_dims(arr2, axis=0) #(1, 10, 15)
arr2_2 = arr2
for i in range(len(arr1)-1):
    arr2_2 = np.concatenate([arr2_2, arr2],axis=0)
arr2_2.shape #(5, 10, 15)
np.concatenate([arr1, arr2_2],axis=-1) # (5, 10, 16) -> correct end result
Joining arr1 and arr2 to get
Try this:
>>> a = np.random.randint(0, 10, (5, 10))
>>> b = np.random.randint(0, 4, (10, 15))
>>> c = np.dstack((a[:, :, np.newaxis], np.broadcast_to(b, (a.shape[0], *b.shape))))
>>> a.shape, b.shape, c.shape
((5, 10), (10, 15), (5, 10, 16))
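On the memory point: np.broadcast_to returns a read-only view, so the repeated copies of the second array are never materialized; only the final joined array is allocated. A rough sketch of the same idea with np.concatenate, reusing the arr1/arr2 names and shapes from the question:

```python
import numpy as np

arr1 = np.random.randint(0, 10, (5, 10, 1))   # (N, M, 1)
arr2 = np.random.randint(0, 4, (10, 15))      # (M, F)

# Read-only view of arr2 repeated along a new leading axis; no data is copied here.
arr2_view = np.broadcast_to(arr2, (arr1.shape[0],) + arr2.shape)
print(arr2_view.strides[0])  # stride along the broadcast axis

# Only this call allocates memory, for the (N, M, F+1) result.
joined = np.concatenate([arr1, arr2_view], axis=-1)
print(joined.shape)
```

The stride of the broadcast axis is 0, which is how the view reuses the same (M, F) block for every index along N.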

combining multi numpy arrays (images) in one array (image) in python

I have some numpy arrays whose elements are the pixels of 28*28 images.
25 of these arrays are stored in one array of shape (25,28,28) or (5,5,28,28). Is there an efficient way to stack them into a single image: a 5*5 grid of 28*28 images?
I tried np.reshape to a (140,140) array and plt.imshow, but the output was a scrambled image.
"I tried np.reshape to (140,140)..." That will work if you first transpose the input appropriately.
Suppose the input x has shape (5, 5, 28, 28). To get the array y with shape (140, 140) that contains the images arranged the way you want, you can do:
xshp = x.shape
y = x.transpose((0, 2, 1, 3)).reshape((xshp[0]*xshp[2], xshp[1]*xshp[3]))
If x always has shape (5, 5, 28, 28), you can hardcode the constant 140:
y = x.transpose((0, 2, 1, 3)).reshape((140, 140))
For example, here I create x with shape (5, 5, 28, 28) where each 28x28 image is a constant. The constants are chosen randomly. The transposed, reshaped array y is plotted, and you can see that all the constant blocks are arranged correctly.
In [148]: rng = np.random.default_rng()
In [149]: x = np.repeat(rng.integers(0, 256, size=(5, 5)), 28*28, axis=-1).reshape((5, 5, 28, 28))
In [150]: y = x.transpose((0, 2, 1, 3)).reshape((140, 140))
In [151]: imshow(y)
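If you want to check the arrangement numerically instead of by eye, a small sketch along these lines should do (random pixel data is assumed here in place of real images):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.integers(0, 256, size=(5, 5, 28, 28))

# Interleave grid rows with image rows, and grid columns with image columns.
y = x.transpose((0, 2, 1, 3)).reshape((140, 140))

# Tile (i, j) of the mosaic should be exactly image x[i, j].
ok = all(
    np.array_equal(y[i*28:(i+1)*28, j*28:(j+1)*28], x[i, j])
    for i in range(5) for j in range(5)
)
print(ok)
```

After the transpose the axes are ordered (grid row, image row, grid column, image column), so the reshape merges each adjacent pair into one 140-long axis. A plain reshape without the transpose would merge (grid row, grid column) with image rows instead, which is what scrambles the picture.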

Keras: how to access specific index for multiplication

I am building a function that multiplies input from one model branch in a particular way with inputs from another model branch, but accessing specific parts of the tensors isn't doing what I expect.
Minimal example: Imagine we get two tensors, one of which contains [1, 2] and the other [10, 20, 30] and one of the outputs should be [1] x [10, 20, 30] by taking the first value of the first tensor.
If I start by making variables like this:
import keras.backend as K
import numpy as np
from keras.layers import Multiply
x = K.variable(value=np.array([1,2]))
y = K.variable(value=np.array([[10,20,30]]))
Then I can access x[0] easily enough:
print(K.eval(x[0]))
gives: 1.0
But it seems like that same indexing doesn't work for Multiply, as this code:
z = Multiply()([x[0], y])
Generates:
IndexError: tuple index out of range
Thus the question: how can I access specific value indexes within a Multiply layer in keras (or how else can I do the equivalent)?
Just to show you an example of how one could achieve what you want. Let's assume that we have two inputs:
from keras.layers import Input
input_1 = Input(shape=(2,))
input_2 = Input(shape=(3,))
Now - let's define the following function:
def custom_multiply(list_):
    x, y = list_[0], list_[1]
    y = K.reshape(y, (-1, 1, 3)) # (1, 2, 3) -> ((1), (2), (3))
    x = K.reshape(x, (-1, 2, 1)) # (1, 2) -> ((1, 2))
    partial_result = K.batch_dot(x, y)
    return K.reshape(partial_result, (-1, 6))
Now output = custom_multiply([input_1, input_2]) should do what you expect. Called on the pair [(1, 2), (3, 4, 5)], it should return (3, 4, 5, 6, 8, 10).
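The arithmetic can be checked outside Keras. The sketch below is a NumPy analogue of the reshape and batch-product steps, not the Keras code itself; np.matmul stands in for K.batch_dot on the assumption that both compute a per-sample matrix product here:

```python
import numpy as np

x = np.array([[1, 2]])        # batch of one, shape (1, 2)
y = np.array([[3, 4, 5]])     # batch of one, shape (1, 3)

# Per-sample outer product: (batch, 2, 1) @ (batch, 1, 3) -> (batch, 2, 3)
outer = np.matmul(x[:, :, None], y[:, None, :])

# Flatten each (2, 3) block into a row of 6 values.
flat = outer.reshape(-1, 6)
print(flat)
```

The first three values are 1 x (3, 4, 5), which is the "first element of one tensor times the other tensor" output the question asks for; slicing flat[:, :3] would isolate it.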

Change shape of nparray

import numpy as np

image1 = np.zeros((120, 120))
image2 = np.zeros((120, 120))
image3 = np.zeros((120, 120))

pack1 = np.array([image1,image2,image3])
pack2 = np.array([image1,image2,image3])

result = np.array([pack1,pack2])
print(result.shape)
the result is :
(2, 3, 120, 120)
Question: how can I make an array with shape (2,120,120,3) from the same data without mixing it up?
Use np.rollaxis to move (OK, roll) a single axis to a specified position:
>>> a.shape
(2, 3, 11, 11)
>>> np.rollaxis(a, 0, 4).shape
(3, 11, 11, 2)
Here the syntax is "roll the zeroth axis so that it becomes the 4th in the new array".
Notice that rollaxis creates a view and does not copy:
>>> np.rollaxis(a, 0, 4).base is a
True
An alternative (and often more readable) way would be to use the fact that np.transpose accepts a tuple of where to place the axes. Observe:
>>> np.transpose(a, (1, 2, 3, 0)).shape
(3, 11, 11, 2)
>>> np.transpose(a, (1, 2, 3, 0)).base is a
True
Here the syntax is "permute the axes so that what was the zeroth axis in the original array becomes the 4th axis in the new array"
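Applied to the shapes in the question, where axis 1 (the 3 images) has to move to the end while axis 0 stays put, this could look like the sketch below. The constant-valued images are an assumption made only so the result can be checked for mixing:

```python
import numpy as np

# Label each of the 3 images per pack with a distinct constant so the
# result can be checked for mixing.
result = np.stack([np.stack([np.full((120, 120), 10 * p + k) for k in range(3)])
                   for p in range(2)])
print(result.shape)   # (2, 3, 120, 120)

moved = np.transpose(result, (0, 2, 3, 1))   # keep axis 0, send axis 1 to the end
also = np.moveaxis(result, 1, -1)            # same permutation, written differently
print(moved.shape)    # (2, 120, 120, 3)
```

moved[p, :, :, k] is then the same 120x120 image as result[p, k], with rows and columns in their original order.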
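You can transpose your packs:
pack1 = np.array([image1,image2,image3]).T
pack2 = np.array([image1,image2,image3]).T
and the result has your desired shape. Note that .T reverses all axes, so each image is also transposed (rows and columns swapped); np.transpose(pack, (1, 2, 0)) keeps the image orientation.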
The (relatively) new stack function gives more control than np.array over how arrays are joined.
Use stack to join them on a new last axis:
In [24]: pack1=np.stack((image1,image2,image3),axis=2)
In [25]: pack1.shape
Out[25]: (120, 120, 3)
In [26]: pack2=np.stack((image1,image2,image3),axis=2)
then join on a new first axis (same as np.array()):
In [27]: result=np.stack((pack1,pack2),axis=0)
In [28]: result.shape
Out[28]: (2, 120, 120, 3)

Dynamically indexing/choosing the dimension of numpy array

Just working on a CNN and am stuck on a tensor algorithm.
I want to be able to iterate through a list, or tuple, of dimensions and choose a range of elements of X (a multi dimensional array) from that dimension, while leaving the other dimensions alone.
x = np.random.random((10,3,32,32)) #some multi dimensional array
dims = [2,3] #aka the 32s
#for a dimension in dims
#I want the array of numbers from i:i+window in that dimension
#something like
arr1 = x.index(i:i+3,axis = dim[0])
#returns shape 10,3,3,32
arr2 = arr1.index(i:i+3,axis = dim[1])
#returns shape 10,3,3,3
np.take should work for you (read its docs)
In [237]: x=np.ones((10,3,32,32),int)
In [238]: dims=[2,3]
In [239]: arr1=x.take(range(1,1+3), axis=dims[0])
In [240]: arr1.shape
Out[240]: (10, 3, 3, 32)
In [241]: arr2=x.take(range(1,1+3), axis=dims[1])
In [242]: arr2.shape
Out[242]: (10, 3, 32, 3)
You can try slicing with
arr1 = x[:,:,i:i+3,:]
and
arr2 = arr1[:,:,:,i:i+3]
Shape is then
>>> x[:,:,i:i+3,:].shape
(10, 3, 3, 32)
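To choose the axis at run time while still using basic slicing, one option is to build the index tuple programmatically. A minimal sketch, where slice_along is a hypothetical helper name and i and window are assumed example values:

```python
import numpy as np

def slice_along(arr, start, stop, axis):
    # Hypothetical helper: arr[..., start:stop, ...] along the given axis.
    index = [slice(None)] * arr.ndim
    index[axis] = slice(start, stop)
    return arr[tuple(index)]

x = np.random.random((10, 3, 32, 32))
dims = [2, 3]
i, window = 1, 3

# Narrow one listed dimension at a time, leaving the others alone.
out = x
for dim in dims:
    out = slice_along(out, i, i + window, dim)
print(out.shape)
```

Unlike take, which copies the selected elements, this returns a view of x, so it is cheap but writes to out will also change x.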
