I have a 3d Numpy array and would like to take the mean over one axis considering certain elements from the other two dimensions.
Here is some example code showing my problem:
import numpy as np
myarray = np.random.random((5,10,30))
yy = [1,2,3,4]
xx = [20,21,22,23,24,25,26,27,28,29]
mymean = [ np.mean(myarray[t,yy,xx]) for t in np.arange(5) ]
However, this results in:
ValueError: shape mismatch: objects cannot be broadcast to a single shape
Why does an indexing like e.g. myarray[:,[1,2,3,4],[1,2,3,4]] work, but not my code above?
This is how you fancy-index over more than one dimension:
>>> np.mean(myarray[np.arange(5)[:, None, None], np.array(yy)[:, None], xx],
axis=(-1, -2))
array([ 0.49482768, 0.53013301, 0.4485054 , 0.49516017, 0.47034123])
When you use fancy indexing, i.e. a list or array as an index, over more than one dimension, numpy broadcasts those arrays to a common shape, and uses them to index the array. You need to add those extra dimensions of length 1 at the end of the first indexing arrays, for the broadcast to work properly. Here are the rules of the game.
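To make the broadcasting of index arrays concrete, here is a minimal sketch using the shapes from the question. It assumes that every yy should be paired with every xx (a rectangular block), which is what the question appears to want.

```python
import numpy as np

myarray = np.random.random((5, 10, 30))
yy = np.array([1, 2, 3, 4])   # 4 indices along axis 1
xx = np.arange(20, 30)        # 10 indices along axis 2

# Index arrays of shape (4,) and (10,) cannot be broadcast together,
# so this raises (IndexError in recent NumPy, ValueError in older releases).
try:
    myarray[0, yy, xx]
    raised = False
except (IndexError, ValueError):
    raised = True

# Shapes (4, 1) and (10,) broadcast to (4, 10): every yy is paired with every xx.
block = myarray[:, yy[:, None], xx]
print(block.shape)            # (5, 4, 10)

mymean = block.mean(axis=(1, 2))   # one mean per index along axis 0
```

The indexing myarray[:,[1,2,3,4],[1,2,3,4]] from the question works only because both lists have the same length, so they are paired element by element rather than combined into a block.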
Since you use consecutive elements you can use a slice:
import numpy as np
myarray = np.random.random((5,10,30))
yy = slice(1,5)
xx = slice(20, 30)
mymean = [np.mean(myarray[t, yy, xx]) for t in np.arange(5)]
To answer your question about why it doesn't work: when you use lists/arrays as indices, Numpy uses a different set of indexing semantics than it does if you use slices. You can see the full story in the documentation and, as that page says, it "can be somewhat mind-boggling".
If you want to do it for nonconsecutive elements, you must grok that complex indexing mechanism.
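For the nonconsecutive case, one option is np.ix_, which builds the broadcastable index arrays for you. The index lists below are made-up values for illustration only.

```python
import numpy as np

myarray = np.random.random((5, 10, 30))
yy = [0, 3, 7]          # hypothetical non-consecutive indices along axis 1
xx = [2, 5, 11, 29]     # hypothetical non-consecutive indices along axis 2

# np.ix_ returns index arrays of shapes (5, 1, 1), (1, 3, 1) and (1, 1, 4),
# which broadcast to the full (5, 3, 4) block.
sub = myarray[np.ix_(np.arange(5), yy, xx)]
print(sub.shape)        # (5, 3, 4)

mymean = sub.mean(axis=(1, 2))
```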
I initialise an array as a=numpy.array([1,2,3]).
On running the statement print(a[0,:]), I get an error. Does this slicing method only work for 2D arrays?
Just replace "a[0,:]" with "a[0:]".
import numpy as np
a = np.array([1, 2, 3])
print(a[0:])
You could solve this issue with
a = a[np.newaxis, :]
before printing, which turns it into a 1 x 3 array instead of one with shape (3,). Obviously this only makes sense if you also need the same print statement for other multidimensional arrays and want it to work in a generalized way.
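If the same code has to cope with both 1-D and 2-D inputs, np.atleast_2d is another way to get there. This is just a small sketch of that idea, not something the answers above used.

```python
import numpy as np

a = np.array([1, 2, 3])
print(a.ndim, a.shape)    # 1 (3,)

# atleast_2d turns shape (3,) into (1, 3) and leaves 2-D input unchanged
a2 = np.atleast_2d(a)
print(a2[0, :])           # [1 2 3]
```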
I'm currently learning about broadcasting in NumPy, and in the book I'm reading (Python for Data Analysis by Wes McKinney) the author gives the following example to "demean" a two-dimensional array:
import numpy as np
arr = np.random.randn(4, 3)
print(arr.mean(0))
demeaned = arr - arr.mean(0)
print(demeaned)
print(demeaned.mean(0))
Which effectively causes the array demeaned to have a mean of 0.
I had the idea to apply this to an image-like, three-dimensional array:
import numpy as np
arr = np.random.randint(0, 256, (400,400,3))
demeaned = arr - arr.mean(2)
Which of course failed, because according to the broadcasting rule, the trailing dimensions have to match, and that's not the case here:
print(arr.shape) # (400, 400, 3)
print(arr.mean(2).shape) # (400, 400)
Now, I have mostly gotten it to work by subtracting the mean from every single index in the third dimension of the array:
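means = arr.mean(2)  # per-pixel mean over the channel axis, shape (400, 400)
demeaned = np.ones(arr.shape)
for i in range(3):
    demeaned[..., i] = arr[..., i] - means
print(demeaned.mean(2))  # mean over the same axis that was demeaned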
At this point, the returned values are very close to zero, and I think that's a floating-point precision error. Am I right, or is there another caveat that I missed?
Also, this doesn't seem to be the cleanest, most "numpy" way to achieve what I want. Is there a function or a principle that I can use to improve the code?
As of numpy version 1.7.0, np.mean, and several other functions, accept a tuple in their axis parameter. This means that you can perform the operation on the planes of the image all at once:
m = arr.mean(axis=(0, 1))
This mean will have shape (3,), with one element for each plane of the image.
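As a rough sketch of the per-plane case: because m has shape (3,), it lines up with the trailing axis of arr and can be subtracted directly. The seed and the use of default_rng are my own choices for reproducibility, not part of the question.

```python
import numpy as np

rng = np.random.default_rng(0)
arr = rng.integers(0, 256, size=(400, 400, 3))

m = arr.mean(axis=(0, 1))      # shape (3,), one mean per plane
demeaned = arr - m             # (400, 400, 3) - (3,) broadcasts on the last axis

# The remaining per-plane mean is not exactly 0, only zero up to round-off
residual = demeaned.mean(axis=(0, 1))
print(np.abs(residual).max())
```

The tiny non-zero residual is what you would expect from floating-point arithmetic, so checking with np.allclose rather than == 0 is the safer test.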
If you want to subtract the means of each pixel individually, you have to remember that broadcasting aligns shape tuples on the right edge. That means that you need to insert an extra dimension:
n = arr.mean(axis=2)
n = n.reshape(*n.shape, 1)
Or
n = arr.mean(axis=2)[..., None]
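A third spelling, assuming a NumPy version recent enough to have the keepdims argument, keeps the reduced axis as length 1 so that no reshape is needed:

```python
import numpy as np

arr = np.random.randint(0, 256, (400, 400, 3))

# keepdims=True leaves the reduced axis in place with length 1
n = arr.mean(axis=2, keepdims=True)
print(n.shape)                 # (400, 400, 1)

demeaned = arr - n             # broadcasts against (400, 400, 3)
```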
Try np.apply_along_axis().
np.apply_along_axis(lambda x: x - np.mean(x), 2, arr)
The result is an array of the same shape in which every element is demeaned along the axis you chose (the second argument, here 2).
This is an example of my error. Say I created a NumPy array
X = np.zeros((1000, 50))
where 1000 is the number of features (rows) and 50 is the number of examples (columns).
Since I am adding examples one by one, I have to replace the columns of the array one at a time to get the final feature array. I tried this:
X[:,i] = example
where example has shape (1000, 1), and i is iterated over every example. This does not work because X[:,i] has shape (1000,), a rank-1 array. How do I code it so that each example replaces a column of the X array without throwing the broadcast error? Thank you.
Reshape your vector before assigning it.
X[:,i] = example.reshape(-1,)
This drops the second dimension and turns example into an array of shape (1000,).
Alternatively, to avoid assigning one by one in a loop, you can put all of your arrays in a list, call np.array on the list, and transpose the result so that the arrays become columns. This will probably work best if you can build the list of arrays in a list comprehension.
Example:
arrs = [np.random.randint(10, size=5) for _ in range(5)]
X = np.array(arrs).T
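If the examples really arrive as column vectors of shape (1000, 1), as in the question, np.hstack can join them without any reshaping. The examples list below is a made-up stand-in for the real data.

```python
import numpy as np

n_features, n_examples = 1000, 50

# Hypothetical stand-ins for the real examples, each of shape (1000, 1)
examples = [np.full((n_features, 1), float(i)) for i in range(n_examples)]

# hstack joins along axis 1, so 50 arrays of (1000, 1) give (1000, 50)
X = np.hstack(examples)
print(X.shape)                 # (1000, 50)
```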
Consider the following simple example:
X = numpy.zeros([10, 4]) # 2D array
x = numpy.arange(0,10) # 1D array
X[:,0] = x # WORKS
X[:,0:1] = x # returns ERROR:
# ValueError: could not broadcast input array from shape (10) into shape (10,1)
X[:,0:1] = (x.reshape(-1, 1)) # WORKS
Can someone explain why NumPy has vectors of shape (N,) rather than (N,1)?
What is the best way to do the casting from 1D array into 2D array?
Why do I need this?
Because I have code which inserts a result x into a 2D array X, and the size of x changes from time to time. So I have X[:, idx1:idx2] = x, which works if x is 2D too, but not if x is 1D.
Do you really need to be able to handle both 1D and 2D inputs with the same function? If you know the input is going to be 1D, use
X[:, i] = x
If you know the input is going to be 2D, use
X[:, start:end] = x
If you don't know the input dimensions, I recommend switching between one line or the other with an if, though there might be some indexing trick I'm not aware of that would handle both identically.
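One possible way to handle both cases with a single line is to reshape x to two dimensions before assigning. The helper name assign_columns is hypothetical, and this is only a sketch that assumes x always has N rows.

```python
import numpy as np

def assign_columns(X, start, x):
    """Write x into X[:, start:start+k], where x has shape (N,) or (N, k)."""
    x = np.asarray(x)
    # (N,) becomes (N, 1); an (N, k) array keeps its shape
    x2d = x.reshape(x.shape[0], -1)
    X[:, start:start + x2d.shape[1]] = x2d
    return X

X = np.zeros((10, 4))
assign_columns(X, 0, np.arange(10))       # 1-D input fills column 0
assign_columns(X, 1, np.ones((10, 2)))    # 2-D input fills columns 1 and 2
```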
Your x has shape (N,) rather than shape (N, 1) (or (1, N)) because numpy isn't built for just matrix math. ndarrays are n-dimensional; they support efficient, consistent vectorized operations for any non-negative number of dimensions (including 0). While this may occasionally make matrix operations a bit less concise (especially in the case of dot for matrix multiplication), it produces more generally applicable code for when your data is naturally 1-dimensional or 3-, 4-, or n-dimensional.
I think the answer is already included in your question. NumPy allows arrays to be of any dimensionality (while, as far as I know, Matlab prefers two dimensions where possible), so you need to be precise about this and always distinguish between (n,) and (n,1). By giving a single number as one of the indices (like 0 in the 3rd line), you reduce the dimensionality by one. By giving a range as one of the indices (like 0:1 in the 4th line), you don't reduce the dimensionality.
Line 3 makes perfect sense to me, and I would assign to the 2-D array this way.
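A quick way to see the difference is to print the shapes that each kind of index produces:

```python
import numpy as np

X = np.zeros((10, 4))

col_int = X[:, 0]        # integer index: the axis is dropped
col_slice = X[:, 0:1]    # slice: the axis is kept with length 1
col_list = X[:, [0]]     # one-element list: the axis is also kept

print(col_int.shape)     # (10,)
print(col_slice.shape)   # (10, 1)
print(col_list.shape)    # (10, 1)
```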
Here are two tricks that make the code a little shorter.
X = numpy.zeros([10, 4]) # 2D array
x = numpy.arange(0,10) # 1D array
X.T[:1, :] = x
X[:, 2:3] = x[:, None]
I have three arrays: longitude (400,600), latitude (400,600), and data (30,400,600). What I am trying to do is extract the values in the data array according to their location (latitude and longitude).
Here is my code:
import numpy
import tables
hdf = "data.hdf5"
h5file = tables.open_file(hdf, mode = "r")  # spelled openFile in PyTables before 3.0
lon = numpy.array(h5file.root.Lonitude)
lat = numpy.array(h5file.root.Latitude)
arr = numpy.array(h5file.root.data)
lon = numpy.array(lon.flat)
lat = numpy.array(lat.flat)
arr = numpy.array(arr.flat)
lonlist=[]
latlist=[]
layer=[]
fre=[]
for i in range(0,len(lon)):
    for j in range(0,30):
        longi = lon[j]
        lati = lat[j]
        layers=[j]
        frequency= arr[i]
        lonlist.append(longi)
        latlist.append(lati)
        layer.append(layers)
        fre.append(frequency)
output = numpy.column_stack((lonlist,latlist,layer,fre))
The problem is that "frequency" is not what I want. I want the data array to be flattened along axis zero, so that "frequency" would be the 30 values at one location. Is there a function in NumPy to flatten an ndarray along a particular axis?
You can try np.ravel(your_array), or your_array.shape=-1. The np.ravel function lets you use an optional argument order: choose C for a row-major order or F for a column-major order.
I guess what you actually wanted was just a transpose to change the axis order. Depending on what you do with it, it might be useful to call .copy() after the transpose to optimize the memory layout, since transpose will not create a copy itself.
Just to add: if you want something beyond F and C order, you can use transposed = ndarray.transpose([1,2,0]) to move the first axis to the end and the last into second position, and then call transposed.ravel() (I assumed C order, so I moved axis 0 to the end). You can also use reshape, which is more powerful than the simple ravel (the returned shape can have any number of dimensions).
Note that unless the strides add up exactly, NumPy will have to make a copy of the array; in many cases you can avoid that with the very nice transposed.flat iterator (an attribute, not a method).
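Applied to the question's layout, a minimal sketch of the transpose-then-reshape idea could look like this. The small shapes and the meshgrid-generated coordinates are stand-ins I made up for the real (400, 600) grids and (30, 400, 600) data.

```python
import numpy as np

# Small hypothetical stand-ins for the real arrays
n_layers, ny, nx = 3, 4, 5
lon, lat = np.meshgrid(np.arange(nx, dtype=float), np.arange(ny, dtype=float))
data = np.random.random((n_layers, ny, nx))

# Move the layer axis to the end, then collapse the two spatial axes:
# row k now holds the n_layers values at location k
per_location = np.moveaxis(data, 0, -1).reshape(-1, n_layers)

# One row per location: lon, lat, then the values of every layer
output = np.column_stack((lon.ravel(), lat.ravel(), per_location))
print(output.shape)            # (20, 5)
```

This replaces the double loop in the question; lon.ravel() and lat.ravel() follow the same C order as the reshape, so the rows stay aligned.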
>>> a = np.random.rand(2,2,2)
>>> a
array([[[ 0.67379148, 0.95508303],
[ 0.80520281, 0.34666202]],
[[ 0.01862911, 0.33851973],
[ 0.18464121, 0.64637853]]])
>>> np.ravel(a)
array([ 0.67379148, 0.95508303, 0.80520281, 0.34666202, 0.01862911,
0.33851973, 0.18464121, 0.64637853])
You are essentially unfolding a high-dimensional tensor. Try tensorly.unfold(arr, mode=the_direction_you_want). For example,
import numpy as np
import tensorly as tl
a = np.zeros((3, 4, 5))
b = tl.unfold(a, mode=1)
b.shape # (4, 15)
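If you would rather not add a dependency, a NumPy-only sketch of the same kind of unfolding is below. I believe the column ordering matches what tensorly does, but treat that as an assumption and check it against your version; the function name unfold here is my own.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-`mode` unfolding: rows index the chosen axis, columns the rest."""
    # Bring the chosen axis to the front, then flatten all remaining axes
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

a = np.arange(3 * 4 * 5).reshape(3, 4, 5)
b = unfold(a, 1)
print(b.shape)                 # (4, 15)
```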