I have three parameter arrays, each containing n parameter values. Now I need to draw m independent samples using the same parameter settings, and I was wondering if there is an efficient way of doing this?
Example:
p1 = [1, 2, 3, 4], p2 = [4,4,4,4], p3 = [6,7,7,5]
One sample would be generated as:
np.random.triangular(left=p1, mode=p2, right=p3)
resulting in
[3, 6, 3, 4.5]
But I would like to get m of those, in a single ndarray ideally.
A solution could of course be to initiate a sample ndarray of size [n, m] and fill each column using a loop. However, generating all random values simultaneously is generally quicker, hence I would like to figure out if that's possible.
NOTE:
adding the parameter 'size=(n,m)' does not work for array valued parameter values
It's true that strictly speaking, adding the parameter size=(n, m) doesn't work. But size=(m, n) does!
In general, in numpy sizes, the number of rows comes first.
>>> numpy.random.triangular(left=p1, mode=p2, right=p3, size=(10, 4))
array([[2.90526206, 3.90549642, 4.17820463, 4.49103927],
[4.128539 , 5.64750789, 4.2343925 , 4.14951323],
[4.55117141, 4.18380231, 4.94283228, 4.17310084],
[3.7047425 , 6.19969199, 3.9318881 , 4.73317286],
[5.0613046 , 4.88435654, 4.04345036, 4.41236136],
[3.6946254 , 2.28868213, 4.29268451, 4.61406735],
[4.26315216, 3.84219428, 4.79651309, 4.02510467],
[3.1213574 , 3.87407067, 4.20976142, 4.11963155],
[2.89005644, 4.43081604, 5.96604977, 4.0194683 ],
[5.28800737, 3.80200832, 4.45966515, 4.46419704]])
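If the (n, m) layout from the question is really needed, the (m, n) result can simply be transposed. The following is a minimal sketch using the question's parameters; the Generator API, the seed, and the names samples and samples_nm are assumptions added for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only for reproducibility

p1 = np.array([1, 2, 3, 4])  # left
p2 = np.array([4, 4, 4, 4])  # mode
p3 = np.array([6, 7, 7, 5])  # right
n, m = len(p1), 10

# size=(m, n): the trailing axis matches the length-n parameter arrays,
# so each of the m rows is one independent draw for all n parameter sets
samples = rng.triangular(left=p1, mode=p2, right=p3, size=(m, n))

# transpose to get one row per parameter set, one column per draw
samples_nm = samples.T
```

Here samples has shape (m, n) and samples_nm has shape (n, m); the transpose is a view, so no data is copied.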
This can be generalized for arrays that broadcast in more complex ways. Here's an example that creates four distinct samples of a 2x2x2 array based on broadcasted parameters. Note that again, the first value is the number of samples, and the remaining ones describe the shape of each sample:
>>> numpy.random.triangular(a[:, None, None],
... a[None, :, None] + 2,
... a[None, None, :] + 4,
... size=(4, 2, 2, 2))
array([[[[1.96335621, 1.88351682],
[2.27347214, 3.23075503]],
[[2.53612351, 2.33322979],
[2.73651868, 2.7414705 ]]],
[[[3.80046148, 3.83468891],
[3.43258814, 3.33174839]],
[[3.05200913, 4.47039698],
[2.89013357, 1.99638614]]],
[[[1.91325759, 2.64773446],
[1.73132514, 3.47843725]],
[[1.88526414, 2.86937885],
[3.12001437, 1.58742945]]],
[[[0.58692663, 1.08249125],
[3.4744866 , 1.95300333]],
[[1.72887756, 2.68527515],
[1.95189437, 4.49416249]]]])
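The answer does not show how a was defined. As a sketch, assuming a = np.arange(2.0) (a guess that is consistent with the value ranges in the output above), the sample shape can be derived from the broadcast parameters instead of being typed by hand:

```python
import numpy as np

rng = np.random.default_rng(0)
a = np.arange(2.0)  # assumption: `a` is not defined in the original answer

left = a[:, None, None]        # shape (2, 1, 1)
mode = a[None, :, None] + 2    # shape (1, 2, 1)
right = a[None, None, :] + 4   # shape (1, 1, 2)

# shape of a single sample after broadcasting the three parameters
sample_shape = np.broadcast(left, mode, right).shape

# leading axis = number of samples, remaining axes = shape of one sample
n_samples = 4
samples = rng.triangular(left, mode, right, size=(n_samples,) + sample_shape)
```

Building size as (n_samples,) + sample_shape keeps the call valid if the parameter shapes change later.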
I want to extract parts of an numpy ndarray based on arrays of index positions for some of the dimensions. Let me show this on an example
Example data
dummy = np.random.rand(5,2,100)
X = np.array([[0,1],[4,1],[2,0]])
dummy is the original ndarray with dimensionality 5x2x100. This dimensionality is arbitrary, it could as well be 5x2x4x100.
X is a matrix of index values, here X[:,0] are the indices of the first dimension of dummy, X[:,1] those of the second dimension. The number of columns in X is always the number of dimensions in dummy minus 1.
Example output
I want to extract an ndarray of the following form for this example
[
dummy[0,1,:],
dummy[4,1,:],
dummy[2,0,:]
]
Complications
If the number of dimensions in dummy were fixed, this could just be done by dummy[X[:,0],X[:,1],:] . Sadly the dimensionality can be different, e.g. dummy could be a 5x2x4x6x100 ndarray and X correspondingly would then be 3x4 . My attempts at dealing with it have not yielded the desired result.
dummy[X,:] yields a 3x2x2x100 ndarray for this example same as dummy[X]
Iteratively reducing dummy by doing something like dummy = dummy[X[:,i],:] with i an iterator over the number of columns of X also does not reduce the ndarray in the example past 3x2x100
I have a feeling that this should be pretty simple with numpy indexing, but I guess my search for a solution was missing the right terms for this.
Does anyone have a solution to this?
I will try to explain @Michael Szczesny's answer.
First, notice that if you have an np.array with n dimensions and pass m indices where m < n, the result is the same as using : for each of the remaining dimensions. In your case, for example:
dummy[(0, 0)] == dummy[0, 0, :]
Given that, note that you can also pass an array as an index. Thus:
dummy[([0, 1], [0, 0])]
It would be the same as:
np.array([dummy[(0,0)], dummy[(1,0)]])
You can validate that using:
np.array_equal(dummy[([0, 1], [0, 0])], np.array([dummy[(0,0)], dummy[(1,0)]]))
Finally, notice that:
(*X.T,)
# (array([0, 4, 2]), array([1, 1, 0]))
You are here getting each dimension as an array, and then you will get:
[
dummy[0,1],
dummy[4,1],
dummy[2,0]
]
Which is the same as:
[
dummy[0,1,:],
dummy[4,1,:],
dummy[2,0,:]
]
Edit: Instead of using (*X.T,), you can use tuple(X.T), which to me makes more sense.
As Michael Szczesny wrote, the best solution is dummy[(*X.T,)].
Since X[:,0] are the indices of the first dimension of dummy and X[:,1] are the indices of the second dimension of dummy, if you transpose X (X.T) you'll have the indices of the first dimension of dummy as X.T[0] and the indices of the second dimension of dummy as X.T[1].
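To check the explanation above against the question's data, here is a small sketch; the names picked and expected and the seeded generator are assumptions added for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
dummy = rng.random((5, 2, 100))
X = np.array([[0, 1], [4, 1], [2, 0]])

# tuple(X.T) is (array([0, 4, 2]), array([1, 1, 0])):
# one index array per leading dimension of dummy
picked = dummy[tuple(X.T)]

# the result the question asks for, built by hand
expected = np.stack([dummy[0, 1, :], dummy[4, 1, :], dummy[2, 0, :]])
```

picked and expected are both 3x100 arrays holding the same values.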
Now to slice dummy as you want, you can specify the indices of the first and of the second dimension in this way:
dummy[(X.T[0], X.T[1])]  # i.e. dummy[(first_dim_indices, second_dim_indices)]
To simplify the code (and since you don't want to transpose the X matrix twice), you can unpack X.T into a tuple as (*X.T,), so writing dummy[(*X.T,)] is the same as writing dummy[(X.T[0], X.T[1])].
This notation is also useful if you have a variable number of dimensions to index, because you unpack from X.T as many rows as there are dimensions to index in dummy. For example, suppose you want to retrieve a 1D array from dummy given the following indices:
first_dim: (0, 4, 2)
second_dim: (1, 1, 0)
third_dim: (9, 8, 7)
You can specify the indices of the 3 dimensions as X = np.array([[0,1,9],[4,1,8],[2,0,7]]) and dummy[(*X.T,)] is still valid.
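The same expression works unchanged for the 5x2x4x6x100 case mentioned in the question. This is a sketch with made-up index values; X, X_full, picked, and scalars are illustrative names, not part of the original posts.

```python
import numpy as np

rng = np.random.default_rng(0)
dummy = rng.random((5, 2, 4, 6, 100))

# one row per item to extract, one column per leading dimension of dummy
X = np.array([[0, 1, 3, 5],
              [4, 1, 0, 2],
              [2, 0, 1, 1]])

# four index arrays -> the last dimension is kept whole
picked = dummy[tuple(X.T)]

# adding a fifth column indexes every dimension, leaving one scalar per row
X_full = np.column_stack([X, [9, 8, 7]])
scalars = dummy[tuple(X_full.T)]
```

picked has shape (3, 100) and scalars has shape (3,), so the number of columns in X decides how many trailing dimensions survive.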
Say I have an array:
x = np.array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
And a multi-labeled mask:
labels = np.array([[0, 0, 2],
[1, 1, 2],
[1, 1, 2]])
My goal is to sum the entries of x together, grouped by labels. For example:
n_labels = np.max(labels) + 1
out = np.empty(n_labels)
for label in range(n_labels):
    mask = labels == label
    out[label] = np.sum(x[mask])
>>> out
np.array([1, 20, 15])
However, as x and n_labels become large, I see this being inefficient. Each iteration, you are only summing together a small fraction of the number of entries of x, but still recompute the mask over all of labels (in the expression labels == label) and subsequently index over all of x (in the expression x[mask]). Is there a more efficient way to do this as x and n_labels grow large?
You can use bincount with weights:
np.bincount(labels.ravel(), weights=x.ravel())
out:
array([ 1., 20., 15.])
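As a quick check, the bincount call can be compared against the loop from the question. This is a sketch; passing minlength is an addition here, so that labels missing from the mask still get a slot in the output.

```python
import numpy as np

x = np.array([[0, 1, 2],
              [3, 4, 5],
              [6, 7, 8]])
labels = np.array([[0, 0, 2],
                   [1, 1, 2],
                   [1, 1, 2]])
n_labels = labels.max() + 1

# reference: one boolean mask per label, as in the question
loop_sums = np.array([x[labels == k].sum() for k in range(n_labels)])

# single pass: entry k accumulates the weights of every element labelled k
fast_sums = np.bincount(labels.ravel(), weights=x.ravel(), minlength=n_labels)
```

Note that bincount with weights returns a float array, even when x holds integers.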
You don't really have a reason to operate on 2D arrays, so ravel them first:
labels = labels.ravel()
x = x.ravel()
If your labels are already indices, you can use np.argsort along with np.diff and np.add.reduceat:
index = labels.argsort()
splits = np.r_[0, np.flatnonzero(np.diff(labels[index])) + 1]
result = np.add.reduceat(x[index], splits)
labels[index] is the sorted index. Whenever that changes, you enter a new group, and the diff is nonzero. That's what np.flatnonzero(np.diff(labels[index])) finds for you. Since reduceat takes the stop index past the end of the run, you need to add one. np.r_ allows you to prepend zero easily to a 1D array, which is necessary for reduceat to regard the first run (the last is always automatic).
Before you run reduceat, you need to order x into the runs defined by labels, which is what x[index] does.
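The intermediate values are easier to follow on the small example from the question. A sketch, using the raveled arrays; kind="stable" and the names sorted_labels and present are additions for illustration.

```python
import numpy as np

labels = np.array([0, 0, 2, 1, 1, 2, 1, 1, 2])  # raveled labels from the question
x = np.arange(9)                                # raveled x from the question

index = labels.argsort(kind="stable")
sorted_labels = labels[index]                   # [0 0 1 1 1 1 2 2 2]

# start position of each run of equal labels in the sorted order
splits = np.r_[0, np.flatnonzero(np.diff(sorted_labels)) + 1]  # [0 2 6]

# sum x over each run: x[index][0:2], x[index][2:6], x[index][6:]
result = np.add.reduceat(x[index], splits)      # [ 1 20 15]

# label value that each entry of result belongs to
present = sorted_labels[splits]                 # [0 1 2]
```

Unlike bincount, this only produces entries for labels that actually occur, so present is what maps result back to label values when some labels are missing.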
You can use 2D arrays with another slow and over-engineered approach using np.add.at
sums = np.zeros(labels.max()+1, x.dtype)
np.add.at(sums, labels, x)
sums
Output
array([ 1, 20, 15])
Given an x-dataset,
x = np.array([1, 2, 3, 4, 5])
what is the most efficient way to create the NumPy array where each x coordinate is paired with a y-coordinate of value 0? I am wondering if there is a way specifically that doesn't require any hard coding, so that x could vary in length without causing failure.
As per your problem statement, the following is one way to do it.
# initialize an array of zeros
In [36]: res = np.zeros((2, *x.shape), dtype=x.dtype)
# fill `x` as first row
In [37]: res[0] = x
In [38]: res
Out[38]:
array([[1, 2, 3, 4, 5],
       [0, 0, 0, 0, 0]])
When we initialize the array of zeros, we use 2 for the axis-0 dimension since your requirement is to create a 2D array. For the column size we simply take the length from the x array. For reasonably large arrays, this approach should be among the fastest, since the result is allocated once and x is copied into it directly.
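For comparison, the same result can be written without pre-allocating, and the layout can be flipped if each (x, y) pair should be its own row. This is a sketch of an alternative, not the answer's method; rows and pairs are illustrative names.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])

# x in the first row, zeros in the second: shape (2, len(x))
rows = np.stack([x, np.zeros_like(x)])

# one (x, 0) pair per row: shape (len(x), 2)
pairs = np.column_stack([x, np.zeros_like(x)])
```

Neither version hard-codes the length of x, so both keep working when x grows or shrinks.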
I have an nd-array A
A.shape
(2, 500, 3)
What's the difference between A[:] and A[:,2]
Coming from Python, the ',' in the array access is confusing me a lot.
The commas separate the subscripts for each dimension. So, for example, if the matrix M is defined as
M = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
then M[2, 1] would be 8 (third row, second column).
The subscript for each dimension can also be a slice, where : represents a full slice, like a slice in normal Python sequences. For example, M[:, 2] would select from every row the third column, which would be [3, 6, 9].
Any additional dimensions for which a subscript is not provided are implicitly full slices. In your example, A[:,2] is equivalent to A[:, 2, :]. If you consider the (2, 500, 3) shaped array to be two stacked matrices with 500 rows and 3 columns, then A[:, 2, :] would select from both matrices the third row (and every column of the third row), which should have a shape of (2, 3).
When you have multidimensional NumPy arrays, the indexing operation [] accepts a tuple of indices, which may be integers or slice() objects. If the number of elements in the tuple is smaller than your number of dimensions, this is equivalent to having a slice(None) (which abbreviates to :) in all the remaining dimensions. Note also that NumPy accepts ..., which means "fill the rest of the dimensions with :" and is especially useful if you want to "fill" the initial dimensions.
So to recapitulate, the following expressions give identical results on your A array with A.ndim == 3:
A[:, 2]
A[:, 2, :]
A[:, 2, ...]
A[slice(None), 2]
A[slice(None), 2, slice(None)]
A[(slice(None), 2) + tuple(slice(None) for _ in range(A.ndim - 2))]
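The equivalence can be checked directly. A sketch, assuming an array filled with np.arange in the shape from the question; candidates and all_equal are illustrative names.

```python
import numpy as np

A = np.arange(2 * 500 * 3).reshape(2, 500, 3)

candidates = [
    A[:, 2],
    A[:, 2, :],
    A[:, 2, ...],
    A[slice(None), 2],
    A[slice(None), 2, slice(None)],
    A[(slice(None), 2) + tuple(slice(None) for _ in range(A.ndim - 2))],
]

# every spelling selects row 2 of both stacked matrices
all_equal = all(np.array_equal(c, candidates[0]) for c in candidates)

full_shape = A[:].shape      # A[:] keeps every dimension
picked_shape = A[:, 2].shape # the integer index drops the second dimension
```

Here full_shape is (2, 500, 3) while picked_shape is (2, 3), which is the difference the question asks about.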
I would like a numpy-ish way of vectorizing the calculation of eigenvalues, such that I can feed it a matrix of matrices and it would return a matrix of the respective eigenvalues.
For example, in the code below, B is the block 6x6 matrix composed of 4 copies of the 3x3 matrix A.
C is what I would like to see as output, i.e. an array of dimension (2,2,3) (because A has 3 eigenvalues).
This is of course a very simplified example, in the general case the matrices A can have any size (although they are still square), and the matrix B is not necessarily formed of copies of A, but different A1, A2, etc (all of same size but containing different elements).
import numpy as np
A = np.array([[0, 1, 0],
[0, 2, 0],
[0, 0, 3]])
B = np.bmat([[A, A], [A,A]])
C = np.array([[np.linalg.eigvals(B[0:3,0:3]),np.linalg.eigvals(B[0:3,3:6])],
[np.linalg.eigvals(B[3:6,0:3]),np.linalg.eigvals(B[3:6,3:6])]])
Edit: if you're using a version of numpy >= 1.8.0, then np.linalg.eigvals operates over the last two dimensions of whatever array you hand it, so if you reshape your input to an (n_subarrays, nrows, ncols) array you'll only have to call eigvals once:
import numpy as np
A = np.array([[0, 1, 0],
[0, 2, 0],
[0, 0, 3]])
# the input needs to be an array, since matrices can only be 2D.
B = np.repeat(A[np.newaxis,...], 4, 0)
# for arbitrary input arrays you could do something like:
# B = np.vstack([a[np.newaxis,...] for a in input_arrays])
# but for this to work it will be necessary for each element in
# 'input_arrays' to have the same shape
# eigvals will operate over the last two dimensions of the array and return
# a (4, 3) array of eigenvalues
C = np.linalg.eigvals(B)
# reshape this output so that it matches your original example
C.shape = (2, 2, 3)
If your input arrays don't all have the same dimensions, e.g. input_arrays[0].shape == (2, 2), input_arrays[1].shape == (3, 3) etc. then you could only vectorize this calculation across subsets with matching dimensions.
If you're using an older version of numpy then unfortunately I don't think there's any way to vectorize the calculation of the eigenvalues over multiple input arrays - you'll just have to loop over your inputs in Python instead.
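If the input really arrives as one large block matrix, as in the question, it can be viewed as a grid of blocks without copying and then passed to eigvals in a single call. A sketch under stated assumptions: B is a plain ndarray built with np.block (not np.bmat), the blocks are scaled copies of A so that they differ, and k, blocks, n_rows, and n_cols are illustrative names.

```python
import numpy as np

A = np.array([[0, 1, 0],
              [0, 2, 0],
              [0, 0, 3]], dtype=float)

# 6x6 ndarray made of four different 3x3 blocks
B = np.block([[A, 2 * A],
              [3 * A, 4 * A]])

k = A.shape[0]
n_rows, n_cols = B.shape[0] // k, B.shape[1] // k

# blocks[i, j] is B[i*k:(i+1)*k, j*k:(j+1)*k]; shape (2, 2, 3, 3)
blocks = B.reshape(n_rows, k, n_cols, k).swapaxes(1, 2)

# eigvals works on the last two axes, so this yields shape (2, 2, 3)
C = np.linalg.eigvals(blocks)
```

The order of the eigenvalues within each block is not guaranteed, so sort along the last axis before comparing results.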
You could just do something like this
C = np.array([[np.linalg.eigvals(B[i:i+3, j:j+3])
               for j in range(0, B.shape[1], 3)]
              for i in range(0, B.shape[0], 3)])
Perhaps a nicer approach is to use the block_view function from https://stackoverflow.com/a/5078155/1352250:
B_blocks = block_view(B)
C = np.array([[np.linalg.eigvals(m) for m in v] for v in B_blocks])
Update
As ali_m points out, this method is a form of syntactic sugar that will not reduce the overhead incurred from calling eigvals a large number of times. While this overhead should be small if each matrix it is applied to is large-ish, for the 6x6 matrices that the OP is interested in, it is not trivial (see the comments below; according to ali_m, there might be a factor of three difference between the version I give above, and the version he posted that uses Numpy >= 1.8.0).