I'd like to sample indices of a 2D NumPy array, where each index is weighted by the number stored at that position in the array. The way I know of is numpy.random.choice, but that returns the value itself rather than its index. Is there an efficient way of doing this?
Here is my code:
import numpy as np
A = np.arange(1, 10).reshape(3, 3)
A_flat = A.flatten()
d = np.random.choice(A_flat, size=10, p=A_flat / float(np.sum(A_flat)))
print(d)
You could do something like:
import numpy as np
def wc(weights):
    # cumulative sum of the (flattened) weights, then invert with searchsorted
    cs = np.cumsum(weights)
    idx = cs.searchsorted(np.random.random() * cs[-1], 'right')
    return np.unravel_index(idx, weights.shape)
Notice that the cumsum is the slowest part of this, so if you need to do this repeatedly for the same array I'd suggest computing the cumsum ahead of time and reusing it.
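For example, a minimal sketch that reuses one precomputed cumsum for a whole batch of draws (the function name and n_draws are just illustrative):
import numpy as np

def wc_many(weights, n_draws):
    cs = np.cumsum(weights)  # flattened cumulative weights, computed once
    flat_idx = cs.searchsorted(np.random.random(n_draws) * cs[-1], 'right')
    return np.unravel_index(flat_idx, weights.shape)  # tuple of index arrays, one per axis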
To expand on my comment: Adapting the weighted choice method presented here https://stackoverflow.com/a/10803136/553404
def weighted_choice_indices(weights):
    cs = np.cumsum(weights.flatten()) / np.sum(weights)
    idx = np.sum(cs < np.random.rand())
    return np.unravel_index(idx, weights.shape)
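An illustrative usage, with the example array from the question:
A = np.arange(1, 10).reshape(3, 3)
draws = [weighted_choice_indices(A) for _ in range(10)]  # ten weighted (row, col) index tuples
print(draws)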
A is a k dimensional numpy array of floats (k could be pretty big, e.g. up to 10)
I need to implement an update to A by incrementing each of the values (as described below). I'm wondering if there is a numpy-style way that would be fast.
Let L_i be the length of axis i
An update to this array is generated in two steps, as follows:
For each axis of A a corresponding vector G is generated.
For example, corresponding to axis i a vector G_i of length L_i is generated (from data).
Update A at all positions by calculating an increment from the G vectors for each position in A
To do this at any particular position, let p be an array of k indices, corresponding to a position in A. Then A at p is incremented by a value calculated as the product:
Product(G_i[p[i]], for i from 0 to k-1)
A full update to A involves doing this operation for all locations in A (i.e. all possible values of p)
Doing this position by position with explicit loops would be very slow.
Is there a numpy style way to do this that would be fast?
Edit
## this is for three dimensions; the final matrix at position i,j,k has the
## product c[i]*b[j]*a[k]
## but for an arbitrary number of dimensions it will need a loop in a loop
## and will be slow
import numpy as np

a = np.array([1, 2])
b = np.array([3, 4, 5])
c = np.array([6, 7, 8, 9])

ab = []
for bval in b:
    ab.append(bval * a)
ab = np.stack(ab)

abc = []
for cval in c:
    abc.append(cval * ab)
abc = np.stack(abc)
As a function:
def loopfunc(arraylist):
    ndim = len(arraylist)
    m = arraylist[0]
    for i in range(1, ndim):
        ml = []
        for val in arraylist[i]:
            ml.append(val * m)
        m = np.stack(ml)
    return m
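For reference, calling the loop version on the example arrays reproduces the stacked result from above:
print(np.array_equal(loopfunc([a, b, c]), abc))  # True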
This is a wacky problem, but I like it.
If I understand what you need from your example, you can accomplish this with some reshaping trickery and NumPy's usual broadcasting rules. The idea is to reshape each array so it has the right number of dimensions, then just directly multiply.
Here's a function that implements this.
from functools import reduce
import operator
import numpy as np
import scipy.linalg
def wacky_outer_product(*arrays):
    assert len(arrays) >= 2
    assert all(arr.ndim == 1 for arr in arrays)
    ndim = len(arrays)
    shapes = scipy.linalg.toeplitz((-1,) + (1,) * (ndim - 1))
    reshaped = (arr.reshape(new_shape) for arr, new_shape in zip(arrays, shapes))
    return reduce(operator.mul, reshaped).T
Testing this on your example arrays, we have:
>>> foo = wacky_outer_product(a, b, c)
>>> np.array_equal(foo, abc)
True
Edit
Ok, the above function is fun, but the below is probably much better. No transposing, clearer, and much smaller:
from functools import reduce
import operator
import numpy as np
def wacky_outer_product(*arrays):
    return reduce(operator.mul, np.ix_(*reversed(arrays)))
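A quick check on the example arrays from the question (illustrative only):
a = np.array([1, 2])
b = np.array([3, 4, 5])
c = np.array([6, 7, 8, 9])
abc = wacky_outer_product(a, b, c)  # shape (4, 3, 2)
print(abc[1, 2, 0])  # 7 * 5 * 1 = 35, i.e. c[1] * b[2] * a[0]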
How can I transpose a 3D array in a similar fashion to a 2D array, except that the entries at the lowest level are arrays of three elements instead of scalar values?
This is what I mean:
M = [[[0,0,0], [1,1,1], [2,2,2]],
     [[0,0,0], [0,0,0], [3,3,3]],
     [[0,0,0], [0,0,0], [0,0,0]]]

N = some_operation(M)

N = [[[0,0,0], [0,0,0], [0,0,0]],
     [[1,1,1], [0,0,0], [0,0,0]],
     [[2,2,2], [3,3,3], [0,0,0]]]
I have an example in python code that shows what I mean as well:
import numpy as np
M = np.array([[[0,0,0],[1,1,1],[2,2,2]],[[0,0,0],[0,0,0],[3,3,3]],[[0,0,0],[0,0,0],[0,0,0]]])
N = np.array([[[0,0,0],[0,0,0],[0,0,0]],[[1,1,1],[0,0,0],[0,0,0]],[[2,2,2],[3,3,3],[0,0,0]]])
print(M)
print('\n\n')
print(N)
The np.transpose() function doesn't seem to be adaptable for my case.
Thanks in advance.
Simply permute axes with np.transpose -
N = M.transpose(1,0,2)
Or with np.moveaxis -
N = np.moveaxis(M,0,1)
With np.rollaxis -
N = np.rollaxis(M,1,0)
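A quick sanity check (illustrative) that all three calls agree on the example from the question:
import numpy as np
M = np.array([[[0,0,0],[1,1,1],[2,2,2]],
              [[0,0,0],[0,0,0],[3,3,3]],
              [[0,0,0],[0,0,0],[0,0,0]]])
N = M.transpose(1, 0, 2)
print(np.array_equal(N, np.moveaxis(M, 0, 1)))  # True
print(np.array_equal(N, np.rollaxis(M, 1, 0)))  # True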
I have two sets of arrays, and what I am looking for is the index of the closest point in array2 to each value in array1, for example:
import numpy as np
from scipy.spatial import distance
array1 = np.array([[1,2,1], [4,2,6]])
array2 = np.array([[0,0,1], [4,5,0], [1,2,0], [6,5,0]])
def f(x):
    return distance.cdist([x], array2).argmin()

def array_map(x):
    return np.array(list(map(f, x)))
array_map(array1)
This code returns the correct results, but it is slow when both arrays are very big. I was wondering if it is possible to make it any quicker?
Thanks to @Max7CD, here is a solution that works quite efficiently (at least for my purposes):
from scipy import spatial
tree = spatial.KDTree(array2)
# I split the data so that the KDTree query doesn't take forever
# and so that I can monitor progress; probably unnecessary
splitArray = np.split(array1, 2)
listFinal = []
for elem in splitArray:
    a = tree.query(elem)
    listFinal.append(a[1])
    print("Finished")
b = np.array(listFinal).ravel()
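For what it's worth, scipy also has a C implementation, scipy.spatial.cKDTree, which usually builds and queries noticeably faster, and the manual split is optional. A sketch, assuming the same array1/array2 as above:
from scipy.spatial import cKDTree

tree = cKDTree(array2)
dist, idx = tree.query(array1)  # idx[i] is the index in array2 closest to array1[i]
print(idx)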
I need to speed up an interpolation over a large (N x M x T) matrix MTR, where:
N is about 8000;
M is about 10000;
T represents the number of times at which each NxM matrix is calculated (in my case it's 23).
I have to compute the interpolation element-wise, over all T times, and return the interpolated values over a different array of times (T_interp, in my case with length 47), so, as output, I want an N x M x T_interp matrix.
The code snippet below defines the function I built for the interpolation, using scipy.interpolate.Rbf (y is the array MTR[i,j,:], x is the times array with length T, and x_interp is the new array of times with length T_interp):
#==============================================================================
# Interpolate without nans
#==============================================================================
def interp(x, y, x_interp, **kwargs):
    import numpy as np
    from scipy.interpolate import Rbf
    mask = np.isnan(y)
    y_mask = np.ma.array(y, mask=mask)
    x_new = [x[i] for i in np.where(~mask)[0]]
    if len(y_mask.compressed()) == 0:
        return [np.nan for i, n in enumerate(x_interp)]
    elif len(y_mask.compressed()) == 1:
        return [y_mask.compressed() for i, n in enumerate(x_interp)]
    interp = Rbf(x_new, y_mask.compressed(), **kwargs)
    y_interp = interp(x_interp)
    return y_interp
I tried to achieve my goal either by looping over the NxM elements of the MTR matrix:
new_MTR = np.empty((N, M, T_interp))
for i in range(N):
    for j in range(M):
        new_MTR[i, j, :] = interp(times, MTR[i, j, :], New_times, function='linear')
or by using the np.apply_along_axis function:
new_MTR = np.apply_along_axis(lambda x: interp(times,x,New_times,function = 'linear'),2,MTR)
In both cases I estimated the time it takes to perform the whole operation, and it appears to be slightly better with np.apply_along_axis, but it will still take about 15 hours!
Is there a way to reduce this time? Maybe by vectorizing the entire operation? I don't know much about vectorizing and how it can be done in a situation like mine so any help would be much appreciated. Thank you!
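One possible direction (a sketch, not a full solution): for the function='linear' case, and assuming the NaN handling can be skipped or done separately, scipy.interpolate.interp1d can interpolate along the time axis of the whole cube in a single vectorized call. Sizes below are made up for illustration:
import numpy as np
from scipy.interpolate import interp1d

N, M, T, T_interp = 4, 5, 23, 47  # illustrative sizes only
times = np.linspace(0.0, 1.0, T)
New_times = np.linspace(0.0, 1.0, T_interp)
MTR = np.random.rand(N, M, T)

f = interp1d(times, MTR, axis=2, kind='linear')  # one interpolator for the whole cube
new_MTR = f(New_times)  # shape (N, M, T_interp)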
I have many arrays of different lengths, and what I want is for those arrays to have a fixed length, let's say 100 samples. These arrays contain time series, and I do not want to lose the shape of those series while reducing the size of the array. What I think I need here is an undersampling algorithm. Is there an easy way to reduce the number of samples in an array, for example by averaging some of the values?
Thanks
Here's a little script to do it without numpy. It maintains the shape even if the required length is larger than the length of the array.
from math import floor
def sample(input, count):
    output = []
    sample_size = float(len(input)) / count
    for i in range(count):
        output.append(input[int(floor(i * sample_size))])
    return output
You can index with randomly generated indices, keeping your original array (or only its shape, to reduce memory usage):
import numpy as np
input_data = somearray
shape = input_data.shape
n_samples= 100
inds = np.random.randint(0,shape[0], size=n_samples)
sub_samples = input_data[inds]
Here's a shorter version of Nick Fellingham's answer.
from math import floor
def sample(input, count):
    ss = float(len(input)) / count
    return [input[int(floor(i * ss))] for i in range(count)]
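Since the question also mentions averaging, here is a sketch (my own suggestion, not taken from the answers above) that splits a 1-D series into a fixed number of chunks and averages each chunk, which keeps the overall shape while shrinking the length (assumes count <= len(arr)):
import numpy as np

def downsample_mean(arr, count=100):
    # split into `count` nearly equal chunks and average each one
    arr = np.asarray(arr, dtype=float)
    return np.array([chunk.mean() for chunk in np.array_split(arr, count)])

series = np.sin(np.linspace(0, 10, 1234))
print(downsample_mean(series).shape)  # (100,)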