I have n equal length arrays whose transpose corresponds to the coordinates in an n dimensional parameter space:
x = np.array([800,800,800,800,900,900,900,900,900,1000,1000,1000,1000,1000])
y = np.array([4.5,5.0,4.5,5.0,4.5,5.0,5.5,5.0,5.5,4.5,5.0,5.5,5.0,5.5])
z = np.array([2,2,4,4,2,2,4,4,4,2,2,4,4,4])
Each coordinate in parameter space also has a value:
v = np.array([1,2,3,4,5,6,7,8,9,10,11,12,13,14])
I want to interpolate between the grid points to get the v value at a given arbitrary xyz coordinate, e.g. [934, 5.1, 3.3].
I've been trying to use scipy.interpolate.RegularGridInterpolator, which takes (x, y, z) as the first argument, but I can't figure out how to construct the second argument of the values at each point.
Any suggestions would be greatly appreciated! Thanks!
Your input would fit better with LinearNDInterpolator or NearestNDInterpolator:
from scipy.interpolate import LinearNDInterpolator
ex = LinearNDInterpolator((x, y, z), v)
ex((800, 4.5, 2))
#array(1.0)
ex([[800, 4.5, 2], [800, 4.5, 3]])
#array([ 1., 2.])
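You can then query the arbitrary point from the question directly (a quick sketch; note that LinearNDInterpolator returns nan for points that fall outside the convex hull of your data):
ex([934, 5.1, 3.3])
# an interpolated scalar, or nan if the point is outside the hull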
To use RegularGridInterpolator you need to define v as an array of values on a regular grid. For example, assume that:
x = np.array([800., 900., 1000.])
y = np.array([4.5, 5.0, 5.5, 6.0])
z = np.array([2., 4.])
The array v could be something like:
v = np.array([[[  1.,   2.],
               [  1.,   2.],
               [  1.,   2.],
               [  1.,   2.]],
              [[ 10.,  20.],
               [ 10.,  20.],
               [ 10.,  20.],
               [ 10.,  20.]],
              [[100., 200.],
               [100., 200.],
               [100., 200.],
               [100., 200.]]])
And then you would be able to interpolate:
from scipy.interpolate import RegularGridInterpolator
rgi = RegularGridInterpolator((x, y, z), v)
rgi((850., 4.5, 3.))
#array(8.25)
rgi([[850., 4.5, 3.], [800, 4.5, 3]])
#array([ 8.25, 1.5 ])
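As a hedged sketch (not part of the original answer): if your flat x, y, z and v arrays covered every grid combination, you could build this regular values array by sorting and reshaping. Note that the 14 points in the question do not cover all 3 x 3 x 2 combinations, so the missing ones would need filling in first:
xg, yg, zg = np.unique(x), np.unique(y), np.unique(z)
order = np.lexsort((z, y, x))   # sort by x, then y, then z
v_grid = v[order].reshape(len(xg), len(yg), len(zg))
rgi = RegularGridInterpolator((xg, yg, zg), v_grid)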
I have big 3D matrices indicating the position of agents in a 3D space. The values of the matrix are 0 if there is no agent on it and 1 if there is an agent on it.
My problem is that I want the agents to 'grow', in the sense that I want them to be represented by, let's say, a cube (3x3x3) of ones. I've already gotten a way to do it, but I'm having trouble when the agent is close to the borders.
For example, I have a matrix of positions 100x100x100, if I know my agent is at position (x, y, z) I will do:
positions_matrix = numpy.zeros((100, 100, 100))
positions_matrix[x - 1: x + 2, y - 1: y + 2, z - 1: z + 2] += numpy.ones((3, 3, 3))
Of course in my real code I'm looping over more positions, but this is basically it. It works, but the problem comes when the agent is too close to the border, in which case the sum can't be made because the matrix resulting from the slicing would be smaller than the ones matrix.
Any idea how to solve it, or whether numpy or any other package has an implementation for this? I couldn't manage to find it, although I'm pretty sure I'm not the first one to face this.
A slightly more programmatic way of solving the problem:
import numpy as np
m = np.zeros((100, 100, 100))
slicing = tuple(
    slice(max(0, x_i - 1), min(x_i + 2, d))  # slice stops are exclusive, so clip at d, not d - 1
    for x_i, d in zip((x, y, z), m.shape))
ones_shape = tuple(s.stop - s.start for s in slicing)
m[slicing] += np.ones(ones_shape)
But it is otherwise the same as the accepted answer.
You should cut at the lower and upper bounds, using something like:
import numpy as np
m = np.zeros((100, 100, 100))
x_min, x_max = np.max([0, x-1]), np.min([x+2, m.shape[0]])  # slice stops are exclusive
y_min, y_max = np.max([0, y-1]), np.min([y+2, m.shape[1]])
z_min, z_max = np.max([0, z-1]), np.min([z+2, m.shape[2]])
m[x_min:x_max, y_min:y_max, z_min:z_max] += np.ones((x_max-x_min, y_max-y_min, z_max-z_min))
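A quick sanity check at a corner (hypothetical position values), showing how the block gets clipped:
x, y, z = 0, 0, 99
x_min, x_max = np.max([0, x-1]), np.min([x+2, m.shape[0]])
y_min, y_max = np.max([0, y-1]), np.min([y+2, m.shape[1]])
z_min, z_max = np.max([0, z-1]), np.min([z+2, m.shape[2]])
m[x_min:x_max, y_min:y_max, z_min:z_max] += np.ones((x_max-x_min, y_max-y_min, z_max-z_min))
# only the 2 x 2 x 2 part of the cube that lies inside the array is incremented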
There is a solution using np.put, and its 'clip' option.
It just requires a little gymnastics, because put expects indices into the flattened matrix; fortunately, the function np.ravel_multi_index does the job:
import itertools
import numpy as np
x, y, z = 2, 0, 4
positions_matrix = np.zeros((100,100,100))
indices = np.array( list( itertools.product( (x-1, x, x+1), (y-1, y, y+1), (z-1, z, z+1)) ))
flat_indices = np.ravel_multi_index(indices.T, positions_matrix.shape, mode='clip')
positions_matrix.put(flat_indices, 1+positions_matrix.take(flat_indices))
# positions_matrix[2,1,4] is now 1.0
The nice thing about this solution is that you can play with other modes, for instance 'wrap' (if your agents live on a donut, i.e. in a periodic space ;-)).
I'll explain how it works on a smaller 2D matrix:
import itertools
import numpy as np
positions_matrix = np.zeros((8,8))
ones = np.ones((3,3))
x, y = 0, 4
indices = np.array( list( itertools.product( (x-1, x, x+1), (y-1, y, y+1) )))
# array([[-1,  3],
#        [-1,  4],
#        [-1,  5],
#        [ 0,  3],
#        [ 0,  4],
#        [ 0,  5],
#        [ 1,  3],
#        [ 1,  4],
#        [ 1,  5]])
flat_indices = np.ravel_multi_index(indices.T, positions_matrix.shape, mode='clip')
# array([ 3, 4, 5, 3, 4, 5, 11, 12, 13])
positions_matrix.put(flat_indices, ones, mode='clip')
# positions_matrix is now:
# array([[0., 0., 0., 1., 1., 1., 0., 0.],
#        [0., 0., 0., 1., 1., 1., 0., 0.],
#        [0., 0., 0., 0., 0., 0., 0., 0.],
#        ...
By the way, in this case mode='clip' was redundant for put, since np.ravel_multi_index had already clipped the indices.
Well, I cheated a little above: put does an assignment. The += 1 requires both take and put:
positions_matrix.put(flat_indices, ones.flat + positions_matrix.take(flat_indices))
# notice that ones has to be flattened, or alternatively the result of take could be reshaped to (3, 3)
# positions_matrix is now:
# array([[0., 0., 0., 2., 2., 2., 0., 0.],
#        [0., 0., 0., 2., 2., 2., 0., 0.],
#        [0., 0., 0., 0., 0., 0., 0., 0.],
#        ...
There is one important difference in this solution compared to the others: the ones matrix is always (3,3),
which may or may not be an advantage.
The trick is in this flat_indices list, which has repeating entries (a result of the clip).
It may thus require some precautions if you add a non-constant sub-matrix at max indices:
x, y = 1, 7
values = 1 + np.arange(9)
indices = np.array( list( itertools.product( (x-1, x, x+1), (y-1, y, y+1) )))
flat_indices = np.ravel_multi_index(indices.T, positions_matrix.shape, mode='clip')
positions_matrix.put(flat_indices, values, mode='clip')
# positions_matrix is now:
# array([[0., 0., 0., 2., 2., 2., 1., 3.],
#        [0., 0., 0., 2., 2., 2., 4., 6.],
#        [0., 0., 0., 0., 0., 0., 7., 9.],
... you were probably expecting the last column to be 2, 5, 8.
Currently, you could work on flat_indices, for example by putting -1 in the out-of-bounds locations.
But it'd all be easier if np.put accepted non-flat indices, or if there was a clip mode='ignore'.
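In the meantime, a hedged sketch of that missing 'ignore' behaviour: drop the out-of-bounds triples yourself before flattening, so nothing gets clipped onto the border cells and no index repeats (np.add.at then handles the increment):
in_bounds = np.all((indices >= 0) &
                   (indices < np.array(positions_matrix.shape)), axis=1)
np.add.at(positions_matrix, tuple(indices[in_bounds].T), 1)
# for a non-constant sub-matrix, select its values with the same mask, e.g. values.ravel()[in_bounds]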
Since collections.Counter is so slow, I am pursuing a faster method of summing mapped values in Python 2.7. It seems like a simple concept and I'm kind of disappointed in the built-in Counter method.
Basically, I need to be able to take arrays like this:
array([[ 0.,  2.],
       [ 2.,  2.],
       [ 3.,  1.]])
array([[ 0.,  3.],
       [ 1.,  1.],
       [ 2.,  5.]])
And then "add" them so they look like this:
array([[ 0.,  5.],
       [ 1.,  1.],
       [ 2.,  7.],
       [ 3.,  1.]])
If there isn't a good way to do this quickly and efficiently, I'm open to any other ideas that will allow me to do something similar to this, and I'm open to modules other than Numpy.
Thanks!
Edit: Ready for some speedtests?
Intel win 64bit machine. All of the following values are in seconds; 20000 loops.
collections.Counter results:
2.131000, 2.125000, 2.125000
Divakar's union1d + masking results:
1.641000, 1.633000, 1.625000
Divakar's union1d + indexing results:
0.625000, 0.625000, 0.641000
Histogram results:
1.844000, 1.938000, 1.858000
Pandas results:
16.659000, 16.686000, 16.885000
Conclusions: union1d + indexing wins; the array size is too small for Pandas to be effective; and the histogram approach blew my mind with its simplicity, but I'm guessing it takes too much overhead to create. All of the responses I received were very good, though. This is what I used to get the numbers. Thanks again!
Edit: And it should be mentioned that using Counter1.update(Counter2.elements()) is terrible despite doing the same exact thing (65.671000 sec).
Later Edit: I've been thinking about this a lot, and I've come to realize that, with Numpy, it might be more effective to fill each array with zeros so that the first column isn't even needed, since we can just use the index; that would also make it much easier to add multiple arrays together, as well as apply other functions. Additionally, Pandas makes more sense than Numpy since there would be no need to 0-fill, and it would definitely be more effective with large data sets (however, Numpy has the advantage of being compatible on more platforms, like GAE, if that matters at all). Lastly, the answer I accepted was definitely the best answer for the exact question I asked, adding the two arrays in the way I showed, but I think what I needed was a change in perspective.
Here's one approach with np.union1d and masking -
def app1(a,b):
    c0 = np.union1d(a[:,0],b[:,0])
    out = np.zeros((len(c0),2))
    out[:,0] = c0
    mask1 = np.in1d(c0,a[:,0])
    out[mask1,1] = a[:,1]
    mask2 = np.in1d(c0,b[:,0])
    out[mask2,1] += b[:,1]
    return out
Sample run -
In [174]: a
Out[174]:
array([[  0.,   2.],
       [ 12.,   2.],
       [ 23.,   1.]])
In [175]: b
Out[175]:
array([[  0.,   3.],
       [  1.,   1.],
       [ 12.,   5.]])
In [176]: app1(a,b)
Out[176]:
array([[  0.,   5.],
       [  1.,   1.],
       [ 12.,   7.],
       [ 23.,   1.]])
Here's another with np.union1d and indexing -
def app2(a,b):
    n = np.maximum(a[:,0].max(), b[:,0].max())+1
    c0 = np.union1d(a[:,0],b[:,0])
    out0 = np.zeros((int(n), 2))
    out0[a[:,0].astype(int),1] = a[:,1]
    out0[b[:,0].astype(int),1] += b[:,1]
    out = out0[c0.astype(int)]
    out[:,0] = c0
    return out
For the case where all indices are covered by the first column values in a and b -
def app2_specific(a,b):
    c0 = np.union1d(a[:,0],b[:,0])
    n = c0[-1]+1
    out0 = np.zeros((int(n), 2))
    out0[a[:,0].astype(int),1] = a[:,1]
    out0[b[:,0].astype(int),1] += b[:,1]
    out0[:,0] = c0
    return out0
Sample run -
In [234]: a
Out[234]:
array([[ 0.,  2.],
       [ 2.,  2.],
       [ 3.,  1.]])
In [235]: b
Out[235]:
array([[ 0.,  3.],
       [ 1.,  1.],
       [ 2.,  5.]])
In [236]: app2_specific(a,b)
Out[236]:
array([[ 0.,  5.],
       [ 1.,  1.],
       [ 2.,  7.],
       [ 3.,  1.]])
If you know the number of fields, use np.bincount.
c = np.vstack([a, b])
counts = np.bincount(c[:, 0].astype(int), weights=c[:, 1], minlength=numFields)  # bincount needs integer bins
out = np.vstack([np.arange(numFields), counts]).T
This works if you're getting all your data at once. Make a list of your arrays and vstack them. If you're getting data chunks sequentially, you can use np.add.at to do the same thing.
out = np.zeros((numFields, 2))
out[:, 0] = np.arange(numFields)
np.add.at(out[:, 1], a[:, 0].astype(int), a[:, 1])  # add.at needs integer indices
np.add.at(out[:, 1], b[:, 0].astype(int), b[:, 1])
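A quick check with the sample arrays from the question, assuming numFields = 4 here:
a = np.array([[0., 2.], [2., 2.], [3., 1.]])
b = np.array([[0., 3.], [1., 1.], [2., 5.]])
numFields = 4
c = np.vstack([a, b])
counts = np.bincount(c[:, 0].astype(int), weights=c[:, 1], minlength=numFields)
np.vstack([np.arange(numFields), counts]).T
# array([[ 0.,  5.],
#        [ 1.,  1.],
#        [ 2.,  7.],
#        [ 3.,  1.]])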
You can use a basic histogram, this will deal with gaps, too. You can filter out zero-count entries if need be.
import numpy as np
x = np.array([[ 0., 2.],
              [ 2., 2.],
              [ 3., 1.]])
y = np.array([[ 0., 3.],
              [ 1., 1.],
              [ 2., 5.],
              [ 5., 3.]])
c, w = np.vstack((x,y)).T
h, b = np.histogram(c, weights=w,
                    bins=np.arange(c.min(), c.max()+2))
r = np.vstack((b[:-1], h)).T
print(r)
# [[ 0.  5.]
#  [ 1.  1.]
#  [ 2.  7.]
#  [ 3.  1.]
#  [ 4.  0.]
#  [ 5.  3.]]
r_nonzero = r[r[:,1]!=0]
Pandas has some functions that do exactly what you intend:
import pandas as pd
pda = pd.DataFrame(a).set_index(0)
pdb = pd.DataFrame(b).set_index(0)
result = pd.concat([pda, pdb], axis=1).fillna(0).sum(axis=1)
Edit: If you actually need the data back in numpy format, just do
array_res = result.reset_index(name=1).values
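An alternative sketch (not verified against the pandas version used in the thread) is to concatenate the frames long-wise and group by the index:
result = pd.concat([pda, pdb]).groupby(level=0)[1].sum()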
This is a quintessential grouping problem, which numpy_indexed (disclaimer: I am its author) was created to solve elegantly and efficiently:
import numpy_indexed as npi
C = np.concatenate([A, B], axis=0)
labels, sums = npi.group_by(C[:, 0]).sum(C[:, 1])
Note: it's cleaner to maintain your label arrays as a separate int array; floats are finicky when it comes to labeling things, with positive and negative zeros, and printed values not conveying their full binary state. Better to use ints for that.
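If you need the two-column layout from the question back, a small follow-up sketch on top of the snippet above:
out = np.column_stack([labels, sums])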
What is a pythonic way to access the shifted, either right or left, of a numpy array? A clear example:
a = np.array([1.0, 2.0, 3.0, 4.0])
Is there a way to access:
a_shifted_1_left = np.array([2.0, 3.0, 4.0, 1.0])
from the numpy library?
You are looking for np.roll -
np.roll(a,-1) # shifted left
np.roll(a,1) # shifted right
Sample run -
In [28]: a
Out[28]: array([ 1., 2., 3., 4.])
In [29]: np.roll(a,-1) # shifted left
Out[29]: array([ 2., 3., 4., 1.])
In [30]: np.roll(a,1) # shifted right
Out[30]: array([ 4., 1., 2., 3.])
If you want bigger shifts, just use np.roll(a,-2), np.roll(a,2), and so on.
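np.roll also works on multi-dimensional arrays; pass axis to shift along one dimension only. For instance (a quick illustration):
m = np.arange(6).reshape(2, 3)
np.roll(m, 1, axis=1)   # shift each row right by one
# array([[2, 0, 1],
#        [5, 3, 4]])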
I have a number of time series, each containing measurements across weeks of the year, but not all of them start and end on the same weeks. I know the offsets, that is I know in what weeks each one starts and ends. Now I would like to combine them into a matrix respecting the inherent offsets, such that all values will align with the correct week numbers.
If the horizontal direction contains the series and vertical direction represents the weeks, given two series a and b, where values correspond to week numbers:
a = np.array([[1,2,3,4,5,6]])
b = np.array([[0,1,2,3,4,5]])
I want to know if is it possible to combine them, e.g. using some method that takes an offset argument in a fashion like combine((a, b), axis=0, offset=-1), such that the resulting array (lets call it c) looks like this:
print c
[[NaN  1   2   3   4   5   6 ]
 [ 0   1   2   3   4   5  NaN]]
What's more, since the time series are enormous, I must stream them through my program and therefore cannot know all offsets at the same time. I thought of using Pandas because it has nice indexing, but I felt there had to be a simpler way, since the essence of what I'm trying to do is super simple.
Update:
This seems to work
def offset_stack(a, b, offset=0):
    if offset < 0:
        a = np.insert(a, [0] * abs(offset), np.nan)
        b = np.append(b, [np.nan] * abs(offset))
    if offset > 0:
        a = np.append(a, [np.nan] * abs(offset))
        b = np.insert(b, [0] * abs(offset), np.nan)
    return np.concatenate(([a],[b]), axis=0)
You can do it in numpy:
def f(a, b, n):
    v = np.empty(abs(n))*np.nan
    if np.sign(n)==-1:
        return np.vstack((np.append(a,v), np.append(v,b)))
    elif np.sign(n)==1:
        return np.vstack((np.append(v,a), np.append(b,v)))
    else:
        return np.vstack((a,b))
#In [148]: a = np.array([23, 13, 4, 12, 4, 4])
#In [149]: b = np.array([4, 12, 3, 41, 45, 6])
#In [150]: f(a,b,-2)
#Out[150]:
#array([[ 23.,  13.,   4.,  12.,   4.,   4.,  nan,  nan],
#       [ nan,  nan,   4.,  12.,   3.,  41.,  45.,   6.]])
#In [151]: f(a,b,2)
#Out[151]:
#array([[ nan,  nan,  23.,  13.,   4.,  12.,   4.,   4.],
#       [  4.,  12.,   3.,  41.,  45.,   6.,  nan,  nan]])
#In [152]: f(a,b,0)
#Out[152]:
#array([[23, 13,  4, 12,  4,  4],
#       [ 4, 12,  3, 41, 45,  6]])
There is a really simple way to accomplish this.
You basically want to pad and then stack your arrays, and for both there are numpy functions:
numpy.lib.pad() aka offset
a = np.array([[1,2,3,4,5,6]], dtype=np.float_) # float because NaN is a float value!
b = np.array([[0,1,2,3,4,5]], dtype=np.float_)
from numpy.lib import pad
print(pad(a, ((0,0),(1,0)), mode='constant', constant_values=np.nan))
# [[ nan   1.   2.   3.   4.   5.   6.]]
print(pad(b, ((0,0),(0,1)), mode='constant', constant_values=np.nan))
# [[  0.   1.   2.   3.   4.   5.  nan]]
The ((0,0),(1,0)) means: no padding along the first axis (top/bottom), and one element of padding on the left but none on the right along the second axis. So you have to tweak these if you want more/less shift.
numpy.vstack() aka stack along axis=0
import numpy as np
a_padded = pad(a, ((0,0),(1,0)), mode='constant', constant_values=np.nan)
b_padded = pad(b, ((0,0),(0,1)), mode='constant', constant_values=np.nan)
np.vstack([a_padded, b_padded])
# array([[ nan,   1.,   2.,   3.,   4.,   5.,   6.],
#        [  0.,   1.,   2.,   3.,   4.,   5.,  nan]])
Your function:
Combining these two would be very easy and is easy to extend:
from numpy.lib import pad
import numpy as np
def offset_stack(a, b, axis=0, offsets=(0, 1)):
    if (len(offsets) != a.ndim) or (a.ndim != b.ndim):
        raise ValueError('Offsets and dimensions of the arrays do not match.')
    offset1 = [(0, -offset) if offset < 0 else (offset, 0) for offset in offsets]
    offset2 = [(-offset, 0) if offset < 0 else (0, offset) for offset in offsets]
    a_padded = pad(a, offset1, mode='constant', constant_values=np.nan)
    b_padded = pad(b, offset2, mode='constant', constant_values=np.nan)
    return np.concatenate([a_padded, b_padded], axis=axis)
offset_stack(a, b)
This function works for generalized offsets in arbitrary dimensions and can stack along arbitrary dimensions. It doesn't work in quite the same way as the original: here the offsets are given per dimension, so to shift along the second dimension you pass offsets=(0, 1), whereas offsets=(1, 0) would pad the first dimension instead. But if you keep track of the dimensions of your arrays it should work fine.
For example:
offset_stack(a, b, offsets=(1,2))
array([[ nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan],
       [ nan,  nan,   1.,   2.,   3.,   4.,   5.,   6.],
       [  0.,   1.,   2.,   3.,   4.,   5.,  nan,  nan],
       [ nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan]])
or for 3d arrays:
a = np.array([1,2,3], dtype=np.float_)[None, :, None] # makes it 3d
b = np.array([0,1,2], dtype=np.float_)[None, :, None] # makes it 3d
offset_stack(a, b, offsets=(0,1,0), axis=2)
array([[[ nan,   0.],
        [  1.,   1.],
        [  2.,   2.],
        [  3.,  nan]]])
pad and concatenate (and the various stack and insert functions) create a target array of the right size and fill in values from the input arrays. So we can do the same ourselves, and potentially do it faster.
Just for example using your 2 arrays and the 1 step offset:
In [283]: a = np.array([[1,2,3,4,5,6]])
In [284]: b = np.array([[0,1,2,3,4,5]])
Create the target array and fill it with the pad value; np.nan is a float (even though a is int):
In [285]: m=a.shape[0]+b.shape[0]
In [286]: n=a.shape[1]+1
In [287]: c=np.zeros((m,n),float)
In [288]: c.fill(np.nan)
Now just copy values into the right places on the target. More arrays and offsets would require some generalization here (a sketch follows after the output below).
In [289]: c[:a.shape[0],1:]=a
In [290]: c[-b.shape[0]:,:-1]=b
In [291]: c
Out[291]:
array([[ nan,   1.,   2.,   3.,   4.,   5.,   6.],
       [  0.,   1.,   2.,   3.,   4.,   5.,  nan]])
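That generalization could look something like this (a hedged sketch with a hypothetical name; it assumes 1-D series and known non-negative start offsets):
def stack_with_offsets(arrays, offsets):
    # arrays: list of 1-D series; offsets: starting column of each series
    width = max(off + len(arr) for arr, off in zip(arrays, offsets))
    c = np.full((len(arrays), width), np.nan)
    for i, (arr, off) in enumerate(zip(arrays, offsets)):
        c[i, off:off + len(arr)] = arr
    return c

stack_with_offsets([np.array([1., 2., 3., 4., 5., 6.]),
                    np.array([0., 1., 2., 3., 4., 5.])], [1, 0])
# array([[ nan,   1.,   2.,   3.,   4.,   5.,   6.],
#        [  0.,   1.,   2.,   3.,   4.,   5.,  nan]])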
I want to center multi-dimensional data in an n x m matrix (<class 'numpy.matrixlib.defmatrix.matrix'>), let's say X. I defined a new array, ones(645), let's say centVector, to hold the mean of every row in matrix X. Now I want to iterate over every row in X, compute the mean, and assign this value to the corresponding index in centVector. Isn't this possible in a single line in scipy/numpy? I am not used to this language and am thinking of something like:
centVector = ones(645)
for key, val in X:
centVector[key] = centVector[key] * (val.sum/val.size)
Afterwards I just need to subtract the mean in every row:
X = X - centVector
How can I simplify this?
EDIT: And besides, the above code is not actually working - for a key-value loop I need something like enumerate(X). And I am not sure if X - centVector is returning the proper solution.
First, some example data:
>>> import numpy as np
>>> X = np.matrix(np.arange(25).reshape((5,5)))
>>> print X
[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]
 [20 21 22 23 24]]
numpy conveniently has a mean function. By default, however, it'll give you the mean over all the values in the array. Since you want the mean of each row, you need to specify the axis of the operation:
>>> np.mean(X, axis=1)
matrix([[  2.],
        [  7.],
        [ 12.],
        [ 17.],
        [ 22.]])
Note that axis=1 says: find the mean along the columns (for each row), where 0 = rows and 1 = columns (and so on). Now, you can subtract this mean from your X, as you did originally.
Unsolicited advice
Usually, it's best to avoid the matrix class (see docs). If you remove the np.matrix call from the example data, then you get a normal numpy array.
Unfortunately, in this particular case, using an array slightly complicates things because np.mean will return a 1D array:
>>> X = np.arange(25).reshape((5,5))
>>> r_means = np.mean(X, axis=1)
>>> print r_means
[ 2. 7. 12. 17. 22.]
If you try to subtract this from X, r_means gets broadcast to a row vector, instead of a column vector:
>>> X - r_means
array([[ -2.,  -6., -10., -14., -18.],
       [  3.,  -1.,  -5.,  -9., -13.],
       [  8.,   4.,   0.,  -4.,  -8.],
       [ 13.,   9.,   5.,   1.,  -3.],
       [ 18.,  14.,  10.,   6.,   2.]])
So, you'll have to reshape the 1D array into an N x 1 column vector:
>>> X - r_means.reshape((-1, 1))
array([[-2., -1.,  0.,  1.,  2.],
       [-2., -1.,  0.,  1.,  2.],
       [-2., -1.,  0.,  1.,  2.],
       [-2., -1.,  0.,  1.,  2.],
       [-2., -1.,  0.,  1.,  2.]])
The -1 passed to reshape tells numpy to figure out this dimension based on the original array shape and the rest of the dimensions of the new array. Alternatively, you could have reshaped the array using r_means[:, np.newaxis].
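As a small aside (not part of the original answer): np.mean also accepts keepdims=True, which keeps the reduced axis as length one, so the result broadcasts as a column vector without any reshape:
X_centered = X - X.mean(axis=1, keepdims=True)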