I have 2 arrays. Call them 'A' and 'B'.
The shape of array 'A' is (10,10,3), and the shape of array 'B' is (10,10).
Now, I have acquired the coordinates of certain elements of array 'B' as a list of tuples. Let's call the list 'TupleList'.
I want to now make the values of all elements in array 'A' equal to 0, except for those elements present at the coordinates in 'TupleList'.
Think of the arrays as images: 'A' is an RGB image, which is why it has the 3rd dimension.
How can I do so?
I am having trouble doing this because of the extra 3rd dimension that array 'A' has. Otherwise, it is quite straightforward with the use of np.where(), given that I know the limited number of values that array 'B' can take.
Here is a solution that transposes the list of index tuples into one list of indices for the first dimension and one list of indices for the second dimension:
import numpy as np
nrows, ncols = 5, 5
arr = np.arange(nrows * ncols * 3, dtype=float).reshape(nrows, ncols, 3)
idxs = [(0, 1), (1, 3), (3, 2), (4, 2)]
idxs_dim0, idxs_dim1 = zip(*idxs)
res = np.zeros(arr.shape)
res[idxs_dim0, idxs_dim1] = arr[idxs_dim0, idxs_dim1]
Checking the result:
from itertools import product
for idx_dim0, idx_dim1 in product(range(nrows), range(ncols)):
    value = res[idx_dim0, idx_dim1]
    if (idx_dim0, idx_dim1) in idxs:
        assert np.all(value == arr[idx_dim0, idx_dim1])
    else:
        assert np.all(value == (0, 0, 0))
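The same result can probably also be reached with a boolean mask, which may feel closer to the np.where() approach mentioned in the question. The following is only a sketch under the same toy setup as above; the names mask and res_masked are made up for illustration.

```python
import numpy as np

nrows, ncols = 5, 5
arr = np.arange(nrows * ncols * 3, dtype=float).reshape(nrows, ncols, 3)
idxs = [(0, 1), (1, 3), (3, 2), (4, 2)]

# Build a 2-D boolean mask that is True only at the listed coordinates.
rows, cols = zip(*idxs)
mask = np.zeros((nrows, ncols), dtype=bool)
mask[rows, cols] = True

# mask[..., None] has shape (5, 5, 1), so it broadcasts over the 3 channels.
res_masked = np.where(mask[..., None], arr, 0.0)
```

Adding the trailing axis with None is what lets the 2-D mask line up with the 3-D image.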
Given a NumPy array:
array = np.random.randn(1,1,2)
print(array.shape)
# (1, 1, 2)
calling tuple() on it eats the first dimension:
tupled_array = tuple(array)
print(tupled_array[0].shape)
# (1, 2) <- why?
I am curious why this happens.
If we wrap the NumPy array with a list:
tupled_list_array = tuple([array])
print(tupled_list_array[0].shape)
# (1, 1, 2)
tuple() extracts elements based on the first dimension. np.random.randn(1,1,2) is a 1x1x2 matrix. tuple() turns it into the following tuple: (1x2 matrix, )
On the other hand, if you use np.random.randn(2,1,1), tuple() turns it into: (1x1 matrix, 1x1 matrix)
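As a small illustration of the point above, tuple() simply iterates over the array, and iterating over an ndarray yields its sub-arrays along the first axis. The sketch below uses an array of zeros with shape (2, 1, 3), chosen only for the example.

```python
import numpy as np

array = np.zeros((2, 1, 3))

# tuple() iterates over the first axis, so we get two items of shape (1, 3).
as_tuple = tuple(array)
shapes = [item.shape for item in as_tuple]

# Each item is the same as indexing the array along its first axis.
matches_indexing = all(
    np.array_equal(item, array[i]) for i, item in enumerate(as_tuple)
)
print(len(as_tuple), shapes, matches_indexing)
```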
I want to create a 3D np.array named output of varying size. An array of size (5,a,b); with a and b varying (b decreasing):
(a,b) = (1000,20)
(a,b) = (1000,19)
(a,b) = (1000,18)
(a,b) = (1000,17)
(a,b) = (1000,16)
I could create an array of arrays to do so, but later on I want to get the first column of all the arrays (without a loop), and then I cannot use:
output[:,:,0]
Concatenating them won't work either, since it requires the arrays to have the same size...
Any alternatives to be able to have a varying single array instead of an array of arrays?
Thanks!
Like @Divakar said, create an empty array with type object and assign the different sized arrays to their respective indices.
import numpy as np
arrs = [np.ones((5, i, 10 - i)) for i in range(10)]
arrs[0].shape
(5, 0, 10)
arrs[1].shape
(5, 1, 9)
out = np.empty(len(arrs), dtype=object)
out[:] = arrs
out[0].shape
(5, 0, 10)
out[1].shape
(5, 1, 9)
Maybe you could make a list and add these 5 arrays to it.
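For the shapes in the question, the object-array idea might look roughly like the sketch below. The arrays are filled with ones purely as placeholder data, and filling the container one element at a time is just one cautious way to do it. Note that getting the first column still needs a comprehension here, because a ragged container cannot be sliced as output[:,:,0].

```python
import numpy as np

# Five blocks of shape (1000, b) with b decreasing, as in the question.
# The contents (ones) are placeholder data for illustration only.
widths = [20, 19, 18, 17, 16]
arrs = [np.ones((1000, b)) for b in widths]

# A 1-D object array holding one differently shaped block per slot.
out = np.empty(len(arrs), dtype=object)
for i, block in enumerate(arrs):
    out[i] = block

# The first column of every block has length 1000, so those can be stacked
# into a regular 2-D array.
first_cols = np.stack([block[:, 0] for block in out])
```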
I want to convert pandas dataframe to 3d array, but cannot get the real shape of the 3d array:
df = pd.DataFrame(np.random.rand(4,5), columns = list('abcde'))
df['a'][3:]=1
df['a'][:3]=2
a3d = np.array(list(df.groupby('a').apply(pd.DataFrame.as_matrix)))
a3d.shape
(2,)
But when I set it up like this, I get the expected shape:
df = pd.DataFrame(np.random.rand(4,5), columns = list('abcde'))
df['a'][2:]=1
df['a'][:2]=2
a3d = np.array(list(df.groupby('a').apply(pd.DataFrame.as_matrix)))
a3d.shape
(2,2,5)
Is there something wrong with the code?
Thanks!
Nothing is wrong with the code; in the first case you simply don't have a 3-d array. By the definition of an N-d array (here 3-d), all sub-arrays along a given dimension must have the same size. In the first case:
df = pd.DataFrame(np.random.rand(4,5), columns = list('abcde'))
df['a'][3:]=1
df['a'][:3]=2
a3d = np.array(list(df.groupby('a').apply(pd.DataFrame.as_matrix)))
You have a 1-d array of size 2 (that is what a3d.shape shows you), which contains 2-d arrays of shape (1,5) and (3,5):
a3d[0].shape
Out[173]: (1, 5)
a3d[1].shape
Out[174]: (3, 5)
so the two elements along the first dimension of what you call a3d do not have the same shape, and their dimensions can't be treated as further dimensions of this ndarray.
While in the second case,
df = pd.DataFrame(np.random.rand(4,5), columns = list('abcde'))
df['a'][2:]=1
df['a'][:2]=2
a3d = np.array(list(df.groupby('a').apply(pd.DataFrame.as_matrix)))
a3d[0].shape
Out[176]: (2, 5)
a3d[1].shape
Out[177]: (2, 5)
both elements of your first dimension have the same size, so a3d is a 3-d array.
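The difference between the two cases can be reproduced without pandas. The sketch below uses arrays of zeros as stand-ins for the per-group matrices (an assumption made only for illustration): equal shapes stack into a 3-d array, while unequal shapes can only be held in a 1-d object array.

```python
import numpy as np

# Stand-ins for the per-group matrices; the values do not matter here.
equal = [np.zeros((2, 5)), np.zeros((2, 5))]
ragged = [np.zeros((1, 5)), np.zeros((3, 5))]

# Groups of equal shape combine into a regular 3-d array.
stacked = np.stack(equal)

# Groups of unequal shape cannot be stacked.
try:
    np.stack(ragged)
    ragged_stackable = True
except ValueError:
    ragged_stackable = False

# They can still be stored, but only as a 1-d array of objects.
holder = np.empty(len(ragged), dtype=object)
for i, group in enumerate(ragged):
    holder[i] = group
```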
The NumPy indexing docs say that
Ellipsis expand to the number of : objects needed to make a selection
tuple of the same length as x.ndim.
However, this seems to hold only when the other indexing arguments are ints and slice objects. For example, None doesn't seem to count towards the selection tuple length for the purposes of Ellipsis:
>>> import numpy
>>> numpy.zeros([2, 2]).shape
(2, 2)
>>> numpy.zeros([2, 2])[..., None].shape
(2, 2, 1)
>>> numpy.zeros([2, 2])[:, None].shape
(2, 1, 2)
>>> numpy.zeros([2, 2])[:, :, None].shape
(2, 2, 1)
Similar odd effects can be observed with boolean indexes, which may count as multiple tuple elements or none at all.
How does NumPy expand Ellipsis in the general case?
Ellipsis does expand to be equivalent to a number of :s, but that number is not always whatever makes the selection tuple length match the array's ndim. Rather, it expands to enough :s for the selection tuple to use every dimension of the array.
In most NumPy indexing, each element of the selection tuple matches up to some dimension of the original array. For example, in
>>> x = numpy.arange(9).reshape([3, 3])
>>> x[1, :]
array([3, 4, 5])
the 1 matches up to the first dimension of x, and the : matches up to the second dimension. The 1 and the : use those dimensions.
Indexing elements don't always use exactly one array dimension, though. If an indexing element corresponds to no input dimensions, or multiple input dimensions, that indexing element will use that many dimensions of the input. For example, None creates a new dimension in the output not corresponding to any dimension of the input. None doesn't use an input dimension, which is why
numpy.zeros([2, 2])[..., None]
expands to
numpy.zeros([2, 2])[:, :, None]
instead of numpy.zeros([2, 2])[:, None].
Similarly, a boolean index uses a number of dimensions corresponding to the number of dimensions of the boolean index itself. For example, a boolean scalar index uses none:
>>> x[..., False].shape
(3, 3, 0)
>>> x[:, False].shape
(3, 0, 3)
>>> x[:, :, False].shape
(3, 3, 0)
And in the common case of a boolean array index with the same shape as the array it's indexing, the boolean array will use every dimension of the other array, and inserting a ... will do nothing:
>>> x.shape
(3, 3)
>>> (x < 5).shape
(3, 3)
>>> x[x<5]
array([0, 1, 2, 3, 4])
>>> x[..., x<5]
array([0, 1, 2, 3, 4])
If you want to see the source code that handles ... expansion and used dimension calculation, it's in the NumPy github repository in the prepare_index function under numpy/core/src/multiarray/mapping.c. Look for the used_ndim variable.
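To make the rule concrete, here is a short sketch on a 3-d array (the array and mask are chosen only for illustration). In each case the comment says how many input dimensions the other index elements use, and therefore how many colons the Ellipsis stands for.

```python
import numpy as np

x = np.arange(24).reshape(2, 3, 4)

# None uses no input dimension, so ... still stands for three colons.
with_newaxis = x[..., None]

# An integer uses one dimension, leaving two for ... to cover.
int_then_newaxis = x[0, ..., None]

# A 2-D boolean mask uses two dimensions, so ... stands for a single colon.
mask = np.zeros((3, 4), dtype=bool)
mask[0, 0] = mask[1, 2] = True
masked = x[..., mask]
```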
I am writing a program that is supposed to be able to import numpy arrays of some higher dimension, e.g. something like an array a:
a = numpy.zeros([3,5,7,2])
Further, each dimension will correspond to some physical dimension, e.g. frequency, distance, ... and I will also import arrays with information about these dimensions, e.g. for a above:
freq = [1,2,3]
time = [0,1,2,3,4,5,6]
distance = [0,0,0,4,1]
angle = [0,180]
Clearly, from this example and the signature, it can be figured out that freq belongs to dimension 0, time to dimension 2, and so on. But since this is not known in advance, I cannot take a frequency slice like
a_f1 = a[1,:,:,:]
since I do not know which dimension frequency is indexed by.
So, what I would like is to have some way to choose which dimension to index with an index; in some Python'ish code something like
a_f1 = a.get_slice([0,], [[1],])
This is supposed to return the slice with index 1 from dimension 0 and the full other dimensions.
Doing
a_p = a[0, 1:, ::2, :-1]
would then correspond to something like
a_p = a.get_slice([0, 1, 2, 3], [[0,], [1,2,3,4], [0,2,4,6], [0,]])
You can fairly easily construct a tuple of indices, using slice objects where needed, and then use this to index into your array. The basic recipe is this:
indices = {
    0: # put here whatever you want to get on dimension 0,
    1: # put here whatever you want to get on dimension 1,
    # leave out whatever dimensions you want to get all of
}
# build a tuple: recent NumPy versions treat a plain list as an index array
ix = tuple(indices.get(dim, slice(None)) for dim in range(arr.ndim))
arr[ix]
Here I have done it with a dictionary since I think that makes it easier to see which dimension goes with which indexer.
So with your example data:
x = np.zeros([3,5,7,2])
We do this:
indices = {0: 1}
ix = tuple(indices.get(dim, slice(None)) for dim in range(x.ndim))
>>> x[ix].shape
(5, 7, 2)
Because your array is all zeros, I'm just showing the shape of the result to indicate that it is what we want. (Even if it weren't all zeros, it's hard to read a 3D array in text form.)
For your second example:
indices = {
    0: 0,
    1: slice(1, None),
    2: slice(None, None, 2),
    3: slice(None, -1)
}
ix = tuple(indices.get(dim, slice(None)) for dim in range(x.ndim))
>>> x[ix].shape
(4, 4, 1)
You can see that the shape corresponds to the number of values in your a_p example. One thing to note is that the first dimension is gone, since you only specified one value for that index. The last dimension still exists, but with a length of one, because you specified a slice that happens to just get one element. (This is the same reason that some_list[0] gives you a single value, but some_list[:1] gives you a one-element list.)
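The recipe above could be wrapped in a small helper. This is only a sketch: the name take_dims is made up here and is not a NumPy function, and the array is filled with distinct values so that the results can be compared against ordinary indexing.

```python
import numpy as np

def take_dims(arr, indices):
    """Index arr using a {dimension: indexer} dict (hypothetical helper).

    Dimensions missing from the dict are taken in full.
    """
    ix = tuple(indices.get(dim, slice(None)) for dim in range(arr.ndim))
    return arr[ix]

x = np.arange(3 * 5 * 7 * 2).reshape(3, 5, 7, 2)

# Index 1 on dimension 0, everything else in full.
a_f1 = take_dims(x, {0: 1})

# Equivalent of x[0, 1:, ::2, :-1].
a_p = take_dims(x, {
    0: 0,
    1: slice(1, None),
    2: slice(None, None, 2),
    3: slice(None, -1),
})
```

Because the dictionary is keyed by dimension number, the caller can decide at run time which axis holds frequency, time, and so on.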
You can use advanced indexing to achieve this.
The index for each dimension needs to be shaped appropriately so that the indices will broadcast correctly across the array. For example, the index for the first dimension of a 3-d array needs to be shaped (x, 1, 1) so that it will broadcast across the first dimension. The index for the second dimension of a 3-d array needs to be shaped (1, y, 1) so that it will broadcast across the second dimension.
import numpy as np
a = np.zeros([3,5,7,2])
b = a[0, 1:, ::2, :-1]
indices = [[0,], [1,2,3,4], [0,2,4,6], [0,]]
def get_aslice(a, indices):
    n_dim_ = len(indices)
    index_array = [np.array(thing) for thing in indices]
    idx = []
    # reshape the arrays by adding single-dimensional entries
    # based on the position in the index array
    for d, thing in enumerate(index_array):
        shape = [1] * n_dim_
        shape[d] = thing.shape[0]
        #print(d, shape)
        idx.append(thing.reshape(shape))
    # index with a tuple; a plain list is not accepted as a multidimensional index
    c = a[tuple(idx)]
    # to remove leading single-dimensional entries from the shape
    #while c.shape[0] == 1:
    #    c = np.squeeze(c, 0)
    # To remove all single-dimensional entries from the shape
    #c = np.squeeze(c)
    return c
For a as an input, it returns an array with shape (1,4,4,1), whereas your a_p example has a shape of (4,4,1). If the extra dimensions need to be removed, un-comment the np.squeeze lines in the function.
Now I feel silly. While reading the docs more slowly, I noticed that NumPy has an indexing routine that does what you want: numpy.ix_
>>> import numpy as np
>>> a = np.zeros([3,5,7,2])
>>> indices = [[0,], [1,2,3,4], [0,2,4,6], [0,]]
>>> index_arrays = np.ix_(*indices)
>>> a_p = a[index_arrays]
>>> a_p.shape
(1, 4, 4, 1)
>>> a_p = np.squeeze(a_p)
>>> a_p.shape
(4, 4)
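To see what np.ix_ is doing, it may help to look at the shapes of the index arrays it returns and to compare the result with the equivalent slices. The sketch below uses an array with distinct values (rather than zeros) so that the comparison is meaningful; the variable names are chosen for illustration.

```python
import numpy as np

a = np.arange(3 * 5 * 7 * 2).reshape(3, 5, 7, 2)
indices = [[0], [1, 2, 3, 4], [0, 2, 4, 6], [0]]

# np.ix_ reshapes each list so that it varies along its own axis only.
index_arrays = np.ix_(*indices)
index_shapes = [ia.shape for ia in index_arrays]

# Indexing with the broadcastable arrays keeps all four dimensions.
picked = a[index_arrays]

# The same elements via slices that keep the length-1 axes.
sliced = a[0:1, 1:, ::2, :-1]
```

One difference to keep in mind: indexing with arrays returns a copy, whereas slicing returns a view of the original data.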