Change shape of nparray - python

import numpy as np

image1 = np.zeros((120, 120))
image2 = np.zeros((120, 120))
image3 = np.zeros((120, 120))

pack1 = np.array([image1, image2, image3])
pack2 = np.array([image1, image2, image3])

result = np.array([pack1, pack2])
print(result.shape)
The result is:
(2, 3, 120, 120)
Question: how can I make an array with shape (2, 120, 120, 3) from the same data, without mixing the values up?

Use np.rollaxis to move (OK, roll) a single axis to a specified position:
>>> a.shape
(2, 3, 11, 11)
>>> np.rollaxis(a, 0, 4).shape
(3, 11, 11, 2)
Here the syntax is "roll the zeroth axis so that it becomes the 4th in the new array".
Notice that rollaxis creates a view and does not copy:
>>> np.rollaxis(a, 0, 4).base is a
True
An alternative (and often more readable) way would be to use the fact that np.transpose accepts a tuple of where to place the axes. Observe:
>>> np.transpose(a, (1, 2, 3, 0)).shape
(3, 11, 11, 2)
>>> np.transpose(a, (1, 2, 3, 0)).base is a
True
Here the syntax is "permute the axes so that what was the zeroth axis in the original array becomes the 4th axis in the new array"

You can transpose your packs:
pack1 = np.array([image1, image2, image3]).T
pack2 = np.array([image1, image2, image3]).T
and the result has your desired shape. Keep in mind, though, that .T reverses all axes, so each individual image is also transposed (its rows and columns are swapped); with the all-zero 120x120 arrays above this is invisible, but with real image data it would mix rows and columns. An axis-order-preserving variant is sketched below.
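A minimal sketch of that axis-preserving variant (same image1/image2/image3 as in the question), using np.transpose with an explicit axis order so each image keeps its orientation:
pack1 = np.transpose(np.array([image1, image2, image3]), (1, 2, 0))  # (120, 120, 3)
pack2 = np.transpose(np.array([image1, image2, image3]), (1, 2, 0))
result = np.array([pack1, pack2])
print(result.shape)  # (2, 120, 120, 3)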

The (relatively) new stack function gives more control than np.array over how arrays are joined.
Use stack to join them on a new last axis:
In [24]: pack1=np.stack((image1,image2,image3),axis=2)
In [25]: pack1.shape
Out[25]: (120, 120, 3)
In [26]: pack2=np.stack((image1,image2,image3),axis=2)
then join on a new first axis (same as np.array()):
In [27]: result=np.stack((pack1,pack2),axis=0)
In [28]: result.shape
Out[28]: (2, 120, 120, 3)

Related

Python/Numpy broadcast join between two arrays

The question is how to join two arrays more efficiently in this case. There is a numpy array one of shape (N, M, 1) and an array two of shape (M, F). The second array needs to be joined to the first to create an array of shape (N, M, F+1); the elements of the second array are broadcast along N.
One solution is to copy array two so that it matches the size of the first (along all dims but the last) and then concatenate. But if the copying could instead happen as a broadcast during the join/concat, it would use much less memory.
Any suggestions on how to make this more efficient?
The setup:
import numpy as np
arr1 = np.random.randint(0,10,(5,10))
arr1 = np.expand_dims(arr1, axis=-1) #(5,10, 1)
arr2 = np.random.randint(0,4,(10,15))
arr2 = np.expand_dims(arr2, axis=0) #(1, 10, 15)
arr2_2 = arr2
for i in range(len(arr1) - 1):
    arr2_2 = np.concatenate([arr2_2, arr2], axis=0)
arr2_2.shape  # (5, 10, 15)
np.concatenate([arr1, arr2_2],axis=-1) # (5, 10, 16) -> correct end result
The goal is to join arr1 and arr2 to get this (5, 10, 16) result without building the repeated copies of arr2 first.
Try this:
>>> a = np.random.randint(0, 10, (5, 10))
>>> b = np.random.randint(0, 4, (10, 15))
>>> c = np.dstack((a[:, :, np.newaxis], np.broadcast_to(b, (a.shape[0], *b.shape))))
>>> a.shape, b.shape, c.shape
((5, 10), (10, 15), (5, 10, 16))
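A close variant (a sketch under the same setup, not from the original answer) keeps the question's concatenate call and only swaps the copy loop for np.broadcast_to, which returns a read-only view, so no repeated copies of b exist until the final concatenate:
>>> a = np.random.randint(0, 10, (5, 10, 1))
>>> b = np.random.randint(0, 4, (10, 15))
>>> out = np.concatenate([a, np.broadcast_to(b, (a.shape[0], *b.shape))], axis=-1)
>>> out.shape
(5, 10, 16)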

How to find the index of a tuple in a 2D array in python?

I have an array of the following form (with many more elements):
coords = np.array(
    [[(2, 1), 1613, 655],
     [(2, 5), 906, 245],
     [(5, 2), 0, 0]])
And I would like to find the index of a specific tuple. For example, I might be looking for the position of the tuple (2, 5), which should be in position 1 in this case.
I have tried with np.where and np.argwhere, with no luck:
pos = np.argwhere(coords == (2,5))
print(pos)
>> DeprecationWarning: elementwise comparison failed; this will raise an error in the future.
pos = np.where(coords == (2,5))
print(pos)
>> DeprecationWarning: elementwise comparison failed; this will raise an error in the future.
How can I get the index of a tuple?
If you intend to use a numpy array containing objects, all comparisons will be done using python itself. At that point, you have given up almost all the advantages of numpy and may as well use a list:
coords = coords.tolist()
index = next((i for i, n in enumerate(coords) if n[0] == (2, 5)), -1)
If you really want to use numpy, I suggest you transform your data appropriately. Two simple options come to mind. You can either expand your tuple and create an array of shape (N, 4), or you can create a structured array that preserves the arrangement of the data as a unit, and has shape (N,). The former is much simpler, while the latter is, in my opinion, more elegant.
If you flatten the coordinates:
coords = np.array([[x[0][0], x[0][1], x[1], x[2]] for x in coords])
index = np.flatnonzero(np.all(coords[:, :2] == [2, 5], axis=1))
The structured solution:
coordt = np.dtype([('x', np.int_), ('y', np.int_)])
dt = np.dtype([('coord', coordt), ('a', np.int_), ('b', np.int_)])
coords = np.array([((2, 1), 1613, 655), ((2, 5), 906, 245), ((5, 2), 0, 0)], dtype=dt)
index = np.flatnonzero(coords['coord'] == np.array((2, 5), dtype=coordt))
You can also just transform the first part of your data to a real numpy array, and operate on that:
coords = np.array(coords[:, 0].tolist())
index = np.flatnonzero((coords == [2, 5]).all(axis=1))
You should not compare (2, 5) and coords, but compare (2, 5) and coords[:, 0].
Try this code.
np.where([np.array_equal(coords[:, 0][i], (2, 5)) for i in range(len(coords))])[0]
Try this one
import numpy as np
coords = np.array([[(2, 1), 1613, 655], [(2, 5), 906, 245], [(5, 2), 0, 0]])
tpl=(2,5)
i=0 # index of the column in which the tuple you are looking for is listed
pos=([t[i] for t in coords].index(tpl))
print(pos)
Assuming your target tuple (e.g. (2, 5)) is always in the first column of the numpy array coords, i.e. coords[:,0], you can simply do the following without any explicit loop:
[*coords[:,0]].index((2,5))
If the tuples aren't necessarily always in the first column, then you can use
[*coords.flatten()].index((2,5))//3
(the //3 assumes three entries per row). Hope that helps.
First of all, the tuple (2, 5) is in position 0 as it is the first element of the list [(2, 5), 906, 245].
And second of all, you can use basic python functions to check the index of a tuple in that array. Here's how you do it:
>>> coords = np.array([[(2, 1), 1613, 655], [(2, 5), 906, 245], [(5, 2), 0, 0]])
>>>
>>> coords_list = cl = list(coords)
>>> cl
[[(2, 1), 1613, 655], [(2, 5), 906, 245], [(5, 2), 0, 0]]
>>>
>>> tuple_to_be_checked = tuple_ = (2, 5)
>>> tuple_
(2, 5)
>>>
>>> for i in range(0, len(cl), 1): # Dynamically works for any array `cl`
...     for j in range(0, len(cl[i]), 1): # Dynamic; works for any list `cl[i]`
...         if cl[i][j] == tuple_: # Found the tuple
...             # Print tuple index and containing list index
...             print(f'Tuple at index {j} of list at index {i}')
...             break # Break to avoid unwanted loops
...
Tuple at index 0 of list at index 1
>>>

How can I map over numpy dataset?

I am working with Keras and the provided MNIST data set. I believe the dataset is a numpy array. I have reshaped it as follows:
X_train = X_train.reshape(X_train.shape[0], 1, 28, 28)
X_test = X_test.reshape(X_test.shape[0], 1, 28, 28)
This gives a (60000, 1, 28, 28) numpy array. This can be read as: there are 60000 28x28 images. I want to extract every single 28x28 image and apply some sort of function f to it. I have tried the following:
f = lambda a, _: print(a.shape)
np.apply_over_axes(f, data, [2,3])
But I am unsure exactly how the second axes argument comes into play here...
I have also tried:
f = lambda a: print(a.shape)
np.apply_along_axis(f, 0, data)
But the shape is always (60000,) instead of the (1, 28, 28) I would expect. How do I get each subimage?
There is no performance gained by using np.apply_along_axis, np.vectorize, etc. Just use a loop:
import numpy as np
s = (4,1,28,28)
a = np.zeros(s)
for img in a[:, 0]:
    print(img.shape)
# (28, 28)
# (28, 28)
# (28, 28)
# (28, 28)
This lambda doesn't make sense:
lambda a, _: print(a.shape)
It's equivalent to
def foo(a, x):
    return print(a.shape)
print(a.shape) prints something but returns None, which is nothing useful and may even cause an error downstream.
lambda a, x: a.shape is better, returning the shape of a and ignoring the x argument.
If the size 1 dimension is in the way, why not just omit it?
X_train = X_train.reshape(X_train.shape[0], 28, 28)
or remove it
X_train[:,0,...]
np.squeeze(X_train)
But what's the point of the apply_over? Just to find the shape of a set of submatrices?
In [304]: X = np.ones((6,1,2,3))
In [305]: [x.shape for x in X]
Out[305]: [(1, 2, 3), (1, 2, 3), (1, 2, 3), (1, 2, 3), (1, 2, 3), (1, 2, 3)]
or
[x.shape for x in X[:,0]]
to remove the 2nd dimension, getting just the shape of the last 2.
This apply_along_axis call iterates over the last 3 dims, passing a 1d slice along axis 0 to the lambda. So in effect it is returning X[:,0,i,j].shape for every index combination.
In [308]: np.apply_along_axis(lambda a: a.shape, 0, X)
Out[308]:
array([[[[6, 6, 6],
         [6, 6, 6]]]])
Generally iterations like this aren't needed, and when they are used they are slow compared to 'full-array' operations.
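To illustrate that last point, here is a sketch with a made-up per-image operation (scaling each image by its maximum; the array is smaller than MNIST just to keep it quick). The loop over the first axis and the single whole-array expression give the same result, but the latter is much faster:
import numpy as np

X = np.random.rand(1000, 1, 28, 28)

# per-image loop: clear, but Python-level iteration is slow
scaled_loop = np.empty_like(X)
for i, img in enumerate(X[:, 0]):
    scaled_loop[i, 0] = img / img.max()

# whole-array version: one vectorized expression over all images at once
scaled_vec = X / X.max(axis=(2, 3), keepdims=True)

print(np.allclose(scaled_loop, scaled_vec))  # True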

How to add names to a numpy array without changing its dimension?

I have an existing two-column numpy array to which I need to add column names. Passing those in via dtype works in the toy example shown in Block 1 below. With my actual array, though, as shown in Block 2, the same approach has an unexpected (to me!) side effect of changing the array's dimensions.
How can I convert my actual array, the one named Y in the second block below, to an array having named columns, like I did for array A in the first block?
Block 1: (Columns of A named without reshaping dimension)
import numpy as np
A = np.array(((1,2),(3,4),(50,100)))
A
# array([[  1,   2],
#        [  3,   4],
#        [ 50, 100]])
dt = {'names':['ID', 'Ring'], 'formats':[np.int32, np.int32]}
A.dtype=dt
A
# array([[(1, 2)],
#        [(3, 4)],
#        [(50, 100)]],
#       dtype=[('ID', '<i4'), ('Ring', '<i4')])
Block 2: (Naming columns of my actual array, Y, reshapes its dimension)
import numpy as np
## Code to reproduce Y, the array I'm actually dealing with
RING = [1,2,2,3,3,3]
ID = [1,2,3,4,5,6]
X = np.array([ID, RING])
Y = X.T
Y
# array([[1, 3],
#        [2, 2],
#        [3, 2],
#        [4, 1],
#        [5, 1],
#        [6, 1]])
## My unsuccessful attempt to add names to the array's columns
dt = {'names':['ID', 'Ring'], 'formats':[np.int32, np.int32]}
Y.dtype=dt
Y
# array([[(1, 2), (3, 2)],
#        [(3, 4), (2, 1)],
#        [(5, 6), (1, 1)]],
#       dtype=[('ID', '<i4'), ('Ring', '<i4')])
## What I'd like instead of the results shown just above
# array([[(1, 3)],
#        [(2, 2)],
#        [(3, 2)],
#        [(4, 1)],
#        [(5, 1)],
#        [(6, 1)]],
#       dtype=[('ID', '<i4'), ('Ring', '<i4')])
First, because your question asks about giving names to arrays, I feel obligated to point out that using "structured arrays" just for the sake of giving names is probably not the best approach. We often like to give names to rows/columns when we're working with tables; if that is the case, I suggest you try something like pandas, which is awesome. If you simply want to organize some data in your code, a dictionary of arrays is often much better than a structured array, so for example you can do:
Y = {'ID':X[0], 'Ring':X[1]}
With that out of the way, if you want to use a structured array, here is the clearest way to do it in my opinion:
import numpy as np
RING = [1,2,2,3,3,3]
ID = [1,2,3,4,5,6]
X = np.array([ID, RING])
dt = {'names':['ID', 'Ring'], 'formats':[int, int]}
Y = np.zeros(len(RING), dtype=dt)
Y['ID'] = X[0]
Y['Ring'] = X[1]
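For reference, a small usage sketch (not part of the original answer): the structured array built this way stays one-dimensional, and each column is reachable by name:
print(Y.shape)    # (6,)
print(Y['ID'])    # [1 2 3 4 5 6]
print(Y['Ring'])  # [1 2 2 3 3 3]
print(Y[0])       # (1, 1) -- the first (ID, Ring) record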
store-different-datatypes-in-one-numpy-array is another page that includes a nice solution for adding names to an array, so the names can be used as columns.
Example:
r = np.core.records.fromarrays([x1, x2, x3], names='a,b,c')
# x1, x2, x3 are flat (1-D) arrays
# a, b, c are the field names
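A quick usage sketch (assuming x1, x2, x3 are 1-D arrays of equal length; the fields of the resulting record array can be read both as attributes and by key):
import numpy as np

x1 = np.array([1, 2, 3])
x2 = np.array([10, 20, 30])
x3 = np.array([0.1, 0.2, 0.3])

r = np.core.records.fromarrays([x1, x2, x3], names='a,b,c')
print(r.a)     # [1 2 3]
print(r['b'])  # [10 20 30]
print(r[0])    # (1, 10, 0.1)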
This is because Y is not C_CONTIGUOUS, you can check it by Y.flags:
C_CONTIGUOUS : False
F_CONTIGUOUS : True
OWNDATA : False
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False
You can call Y.copy() or Y.ravel() first:
dt = {'names':['ID', 'Ring'], 'formats':[np.int32, np.int32]}
print(Y.ravel().view(dt))  # the result shape is (6,)
print(Y.copy().view(dt))   # the result shape is (6, 1)
Are you completely sure about the outputs for A and Y? I get something different using Python 2.7.6 and numpy 1.8.1.
My initial output for A is the same as yours, as it should be. After running the following code for the first example
dt = {'names':['ID', 'Ring'], 'formats':[np.int32, np.int32]}
A.dtype=dt
the contents of array A are actually
array([[(1, 0), (3, 0)],
       [(2, 0), (2, 0)],
       [(3, 0), (2, 0)],
       [(4, 0), (1, 0)],
       [(5, 0), (1, 0)],
       [(6, 0), (1, 0)]],
      dtype=[('ID', '<i4'), ('Ring', '<i4')])
This makes somewhat more sense to me than the output you added, because dtype determines the data type of every element in the array, and the new definition states that every element should contain two fields. So it does, but the value of the second field is set to 0 because there was no preexisting value for it.
However, if you would like to make numpy group columns of your existing array so that every row contains only one element, but with each element having two fields, you could introduce a small code change.
Since a tuple is needed to make numpy group elements into a more complex data-type, you could make this happen by creating a new array and turning every row of the existing array into a tuple. Here is a simple working example
import numpy as np
A = np.array(((1,2),(3,4),(50,100)))
dt = np.dtype([('ID', np.int32), ('Ring', np.int32)])
B = np.array(list(map(tuple, A)), dtype=dt)
Using this short piece of code, array B becomes
array([(1, 2), (3, 4), (50, 100)],
      dtype=[('ID', '<i4'), ('Ring', '<i4')])
To make B a 2D array, it is enough to write
B.reshape(len(B), 1) # in this case, even B.size would work instead of len(B)
For the second example, the similar thing needs to be done to make Y a structured array:
Y = np.array(list(map(tuple, X.T)), dtype=dt)
After doing this for your second example, array Y looks like this
array([(1, 3), (2, 2), (3, 2), (4, 1), (5, 1), (6, 1)],
      dtype=[('ID', '<i4'), ('Ring', '<i4')])
You will notice that the output is not quite the one you said you expect, but this form is simpler, because instead of writing Y[0,0] to get the first element you can just write Y[0]. To make this array 2D as well, you can use reshape, just as with B.
Try re-writing the definition of X:
X = np.array(list(zip(ID, RING)))
and then you don't need to define Y = X.T

Shape of array python

Suppose I create a 2-dimensional array:
m = np.random.normal(0, 1, size=(1000, 2))
q = np.zeros(shape=(1000, 1))
print(m[:, 0] - q)
When I take m[:, 0].shape I get (1000,) as opposed to (1000, 1), which is what I want. How do I coerce m[:, 0] into a (1000, 1) array?
By selecting the 0th column in particular, as you've noticed, you reduce the dimensionality:
>>> m = np.random.normal(0, 1, size=(5, 2))
>>> m[:,0].shape
(5,)
You have a lot of options to get a 5x1 object back out. You can index using a list, rather than an integer:
>>> m[:, [0]].shape
(5, 1)
You can ask for "all the columns up to but not including 1":
>>> m[:,:1].shape
(5, 1)
Or you can use None (or np.newaxis), which is a general trick to extend the dimensions:
>>> m[:,0,None].shape
(5, 1)
>>> m[:,0][:,None].shape
(5, 1)
>>> m[:,0, None, None].shape
(5, 1, 1)
Finally, you can reshape:
>>> m[:,0].reshape(5,1).shape
(5, 1)
but I'd use one of the other methods for a case like this.
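One more point worth noting (a sketch, not part of the original answer): the distinction matters for the subtraction in the question, because a (1000,) array broadcast against a (1000, 1) array produces a (1000, 1000) result, whereas a (1000, 1) column subtracts elementwise:
>>> m = np.random.normal(0, 1, size=(1000, 2))
>>> q = np.zeros(shape=(1000, 1))
>>> (m[:, 0] - q).shape      # (1000,) vs (1000, 1) broadcasts to a full matrix
(1000, 1000)
>>> (m[:, [0]] - q).shape    # (1000, 1) vs (1000, 1) stays a column
(1000, 1)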
