I have three arrays: longitude (400, 600), latitude (400, 600), and data (30, 400, 600). What I am trying to do is extract values from the data array according to their location (latitude and longitude).
Here is my code:
import numpy
import tables
hdf = "data.hdf5"
h5file = tables.openFile(hdf, mode = "r")
lon = numpy.array(h5file.root.Lonitude)
lat = numpy.array(h5file.root.Latitude)
arr = numpy.array(h5file.root.data)
lon = numpy.array(lon.flat)
lat = numpy.array(lat.flat)
arr = numpy.array(arr.flat)
lonlist=[]
latlist=[]
layer=[]
fre=[]
for i in range(0, len(lon)):
    for j in range(0, 30):
        longi = lon[j]
        lati = lat[j]
        layers = [j]
        frequency = arr[i]
        lonlist.append(longi)
        latlist.append(lati)
        layer.append(layers)
        fre.append(frequency)
output = numpy.column_stack((lonlist,latlist,layer,fre))
The problem is that "frequency" is not what I want. I want the data array to be flattened along axis zero, so that "frequency" would be the 30 values at one location. Is there a function in numpy to flatten an ndarray along a particular axis?
You can try np.ravel(your_array), or your_array.shape = -1. The np.ravel function takes an optional order argument: 'C' for row-major order or 'F' for column-major (Fortran) order.
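For instance (a small illustrative sketch), the two orders walk a 2-D array differently:
import numpy as np
a = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
np.ravel(a)                      # array([0, 1, 2, 3, 4, 5])   row-major ('C')
np.ravel(a, order='F')           # array([0, 3, 1, 4, 2, 5])   column-major ('F')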
I guess what you actually wanted was just transpose to change the axis order. Depending on what you do with the result, it might be useful to call .copy() after the transpose to optimize the memory layout, since transpose itself does not create a copy.
Just to add, if you want something beyond 'F' and 'C' order, you can use transposed = ndarray.transpose([1, 2, 0]) to move the first axis to the end and the last axis into second position, and then call transposed.ravel() (I assumed C order, so the 0 axis moves to the end). You can also use reshape, which is more powerful than a simple ravel (the returned shape can have any number of dimensions).
Note that unless the strides line up exactly, numpy will have to make a copy of the array; in many cases you can avoid that by using the very handy transposed.flat iterator.
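Applied to the original question, a minimal sketch might look like this (assuming arr is still the 3-D data array of shape (30, 400, 600), i.e. before the .flat step):
transposed = arr.transpose([1, 2, 0])       # shape (400, 600, 30): location first, layers last
per_location = transposed.reshape(-1, 30)   # one row of 30 layer values per (lat, lon) point
flat = transposed.ravel()                   # fully flattened, the 30 layers contiguous per location
Each row of per_location then holds the 30 values at one grid point, which is what "frequency" was meant to be.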
>>> a = np.random.rand(2,2,2)
>>> a
array([[[ 0.67379148,  0.95508303],
        [ 0.80520281,  0.34666202]],

       [[ 0.01862911,  0.33851973],
        [ 0.18464121,  0.64637853]]])
>>> np.ravel(a)
array([ 0.67379148,  0.95508303,  0.80520281,  0.34666202,  0.01862911,
        0.33851973,  0.18464121,  0.64637853])
You are essentially unfolding a high-dimensional tensor. Try tensorly.unfold(arr, mode=the_direction_you_want). For example,
import numpy as np
import tensorly as tl
a = np.zeros((3, 4, 5))
b = tl.unfold(a, mode=1)
b.shape # (4, 15)
I'm currently learning about broadcasting in Numpy, and in the book I'm reading (Python for Data Analysis by Wes McKinney) the author mentions the following example to "demean" a two-dimensional array:
import numpy as np
arr = np.random.randn(4, 3)
print(arr.mean(0))
demeaned = arr - arr.mean(0)
print(demeaned)
print(demeaned.mean(0))
Which effectively causes the array demeaned to have a mean of 0.
I had the idea to apply this to an image-like, three-dimensional array:
import numpy as np
arr = np.random.randint(0, 256, (400,400,3))
demeaned = arr - arr.mean(2)
Which of course failed, because according to the broadcasting rule, the trailing dimensions have to match, and that's not the case here:
print(arr.shape) # (400, 400, 3)
print(arr.mean(2).shape) # (400, 400)
Now, I have mostly gotten it to work by subtracting the mean from every single index in the third dimension of the array:
means = arr.mean(2)   # per-pixel mean over the three channels
demeaned = np.ones(arr.shape)
for i in range(3):
    demeaned[..., i] = arr[..., i] - means
print(demeaned.mean(0))
At this point, the returned values are very close to zero, and I think that's a precision error. Am I actually right with this thought, or is there another caveat that I missed?
Also, this doesn't seem to be the cleanest, most 'numpy'-way to achieve what I wanted. Is there a function or a principle that I can make use of to improve the code?
As of numpy version 1.7.0, np.mean, and several other functions, accept a tuple in their axis parameter. This means that you can perform the operation on the planes of the image all at once:
m = arr.mean(axis=(0, 1))
This mean will have shape (3,), with one element for each plane of the image.
If you want to subtract the means of each pixel individually, you have to remember that broadcasting aligns shape tuples on the right edge. That means that you need to insert an extra dimension:
n = arr.mean(axis=2)
n = n.reshape(*n.shape, 1)
Or
n = arr.mean(axis=2)[..., None]
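Putting the pieces together, a minimal sketch of the per-pixel demeaning using the trailing-axis trick above:
import numpy as np

arr = np.random.randint(0, 256, (400, 400, 3))
demeaned = arr - arr.mean(axis=2)[..., None]   # (400, 400, 3) minus (400, 400, 1) broadcasts fine
print(np.abs(demeaned.mean(axis=2)).max())     # effectively zero, up to floating-point noise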
Try np.apply_along_axis().
np.apply_along_axis(lambda x: x - np.mean(x), 2, arr)
Output: you get an array of the same shape, where each cell is demeaned along the axis you want (the second parameter; here it is 2).
I have 3D matrices of different shapes, such as:
Matrix shape = [5,10,2048]
Matrix shape = [5,6,2048]
Matrix shape = [5,1,2048]
and so on....
I would like to put them into one big matrix, but I normally get a shape error (since they have different shapes) when I try to use the numpy.asarray(list_of_matrix) function.
What would be your recommendation to handle such a case?
My implementation was like the following:
matrices = []
matrices.append(mat1)
matrices.append(mat2)
matrices.append(mat3)
result_matrix = numpy.asarray(matrices)
and I get a shape error!
UPDATE
I would like the result matrix to be 4D.
Thank you.
I'm not entirely certain this would work for you, but it looks as though your matrices disagree only along axis 1, so why not concatenate them along that axis:
e.g.
>>> import numpy as np
>>> c=np.zeros((5,10,2048))
>>> d=np.zeros((5,6,2048))
>>> e=np.zeros((5,1,2048))
>>> f=np.concatenate((c,d,e),axis=1)
>>> f.shape
(5, 17, 2048)
Now, you'd have to keep track of which indices along axis 1 correspond to which matrix, but maybe this could work for you?
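For instance (a small sketch, reusing the arrays above), you could record where each block starts so you can slice it back out later:
sizes = [m.shape[1] for m in (c, d, e)]    # [10, 6, 1]
offsets = np.cumsum([0] + sizes)           # [ 0, 10, 16, 17]
second = f[:, offsets[1]:offsets[2], :]    # recovers d, shape (5, 6, 2048)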
I have an array of values and would like to create a matrix from it, where each row is my starting vector multiplied by a sample from a (normal) distribution.
The number of rows of this matrix will then depend on the number of samples I want.
%pylab
my_vec = array([1,2,3])
my_rand_vec = my_vec*randn(100)
The last command does not work, because the array shapes do not match.
I could use a for loop, but I am trying to leverage array operations.
Try this
my_rand_vec = my_vec[None,:]*randn(100)[:,None]
For a small number of samples I get, for example:
import numpy as np
my_vec = np.array([1,2,3])
my_rand_vec = my_vec[None,:]*np.random.randn(5)[:,None]
my_rand_vec
# array([[ 0.45422416,  0.90844831,  1.36267247],
#        [-0.80639766, -1.61279531, -2.41919297],
#        [ 0.34203295,  0.6840659 ,  1.02609885],
#        [-0.55246431, -1.10492863, -1.65739294],
#        [-0.83023829, -1.66047658, -2.49071486]])
Your attempt my_vec*randn(100) does not work because * is element-wise multiplication, which only works if both arrays have compatible shapes.
What you have to do is add an additional dimension using [None,:] and [:,None] so that numpy's broadcasting works.
As a side note, I would recommend not using pylab. Instead, use explicit imports such as import numpy as np to include the modules you need.
It is the outer product of vectors:
my_rand_vec = numpy.outer(randn(100), my_vec)
You can pass the dimensions of the array you require to numpy.random.randn:
my_rand_vec = my_vec*np.random.randn(100,3)
To multiply each vector by the same random number, you need to add an extra axis:
my_rand_vec = my_vec*np.random.randn(100)[:,np.newaxis]
I have a 3d Numpy array and would like to take the mean over one axis considering certain elements from the other two dimensions.
This is an example code depicting my problem:
import numpy as np
myarray = np.random.random((5,10,30))
yy = [1,2,3,4]
xx = [20,21,22,23,24,25,26,27,28,29]
mymean = [ np.mean(myarray[t,yy,xx]) for t in np.arange(5) ]
However, this results in:
ValueError: shape mismatch: objects cannot be broadcast to a single shape
Why does an indexing like e.g. myarray[:,[1,2,3,4],[1,2,3,4]] work, but not my code above?
This is how you fancy-index over more than one dimension:
>>> np.mean(myarray[np.arange(5)[:, None, None], np.array(yy)[:, None], xx],
...         axis=(-1, -2))
array([ 0.49482768,  0.53013301,  0.4485054 ,  0.49516017,  0.47034123])
When you use fancy indexing, i.e. a list or array as an index, over more than one dimension, numpy broadcasts those index arrays to a common shape and uses the result to index the array. You need to add extra dimensions of length 1 at the end of the first index arrays for the broadcast to work properly; the numpy broadcasting rules spell out the details.
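As an aside (not part of the original answer), numpy ships a helper, np.ix_, that builds exactly this kind of open mesh from 1-D index sequences, which may read more clearly:
idx = np.ix_(np.arange(5), yy, xx)        # index arrays shaped (5,1,1), (1,4,1), (1,1,10)
mymean = myarray[idx].mean(axis=(1, 2))   # shape (5,), one mean per t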
Since you use consecutive elements you can use a slice:
import numpy as np
myarray = np.random.random((5,10,30))
yy = slice(1,5)
xx = slice(20, 30)
mymean = [np.mean(myarray[t, yy, xx]) for t in np.arange(5)]
To answer your question about why it doesn't work: when you use lists/arrays as indices, Numpy uses a different set of indexing semantics than it does if you use slices. You can see the full story in the documentation and, as that page says, it "can be somewhat mind-boggling".
If you want to do it for nonconsecutive elements, you must grok that complex indexing mechanism.
I have a multidimensional numpy array, and I need to iterate across a given dimension. Problem is, I won't know which dimension until runtime. In other words, given an array m, I could want
m[:,:,:,i] for i in xrange(n)
or I could want
m[:,:,i,:] for i in xrange(n)
etc.
I imagine that there must be a straightforward feature in numpy to write this, but I can't figure out what it is/what it might be called. Any thoughts?
There are many ways to do this. You could build the right index with a list of slices, or perhaps alter m's strides. However, the simplest way may be to use np.swapaxes:
import numpy as np
m=np.arange(24).reshape(2,3,4)
print(m.shape)
# (2, 3, 4)
Let axis be the axis you wish to loop over. m_swapped is the same as m except the axis=1 axis is swapped with the last (axis=-1) axis.
axis=1
m_swapped=m.swapaxes(axis,-1)
print(m_swapped.shape)
# (2, 4, 3)
Now you can just loop over the last axis:
for i in xrange(m_swapped.shape[-1]):
    assert np.all(m[:,i,:] == m_swapped[...,i])
Note that m_swapped is a view, not a copy, of m. Altering m_swapped will alter m.
m_swapped[1,2,0]=100
print(m)
assert(m[1,0,2]==100)
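A related option (an alternative not mentioned in the original answer): on newer numpy versions, np.moveaxis does the same job and also returns a view:
m_moved = np.moveaxis(m, axis, -1)   # move the chosen axis to the end; the other axes keep their order
print(m_moved.shape)
# (2, 4, 3)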
You can use slice(None) in place of the :. For example,
from numpy import *
d = 2 # the dimension to iterate
x = arange(5*5*5).reshape((5,5,5))
s = slice(None) # :
for i in range(5):
    slicer = [s]*3   # [:, :, :]
    slicer[d] = i    # [:, :, i]
    print x[slicer]  # x[:, :, i]
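One caveat worth adding: recent numpy versions warn about (and newer ones reject) indexing with a list of slices, so it is safer to convert the list to a tuple first:
print x[tuple(slicer)]   # works on both old and new numpy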