Building a numpy array through iteration - python

I start with an array of shape (n,d) containing n particle-vectors of length d dimensions, and want to construct an array containing the inter-particle vectors, of shape (n,n,d) which I'll go on to use for calculating forces etc in a Newtonian simulation.
I want to be able to generalise this for any number of dimensions, so the position vectors could be of any d, and have come up with the below, building the new dimension one array at a time into a list that I then convert back into an array. But this seems clunky, and, since it must be such a common operation, I can't help thinking there's some inbuilt numpy magic that will perform this operation more quickly.
def delta_matrix(pos_vec):
build=[]
for i in pos_vec:
build.append(i-pos_vec)
return np.array(build)
In particular then, is there a numpy method that performs this iterative type of operation?

What about list comprehension? That seems simple yet powerful enough.
def delta_matrix(pos_vec):
return np.array([i-pos_vec for i in pos_vec])

Related

Is there a faster method to use a 2d numpy array of booleans to select elements from a 2d array, but with a 2d output?

If I have an array like this
arr=np.array([['a','b','c'],
['d','e','f']])
and an array of booleans of the same shape, like this:
boolarr=np.array([[False,True,False],
[True,True,True]])
I want to be able to only select the elements from the first array, that correspond to a True in the boolean array. So the output would be:
out=[['b'],
['d','e','f']]
I managed to solve this with a simple for loop
out=[]
for n, i in enumerate(arr):
out.append(i[boolarr[n]])
out=np.array(out)
but the problem is this solution is slow for large arrays, and was wondering if there was an easier solution with numpys indexing. Just using the normal notation arr[boolarr] returns a single flat array ['b','d','e','f']. I also tried using a slice with arr[:,[True,False,True]], which keeps the shape but can only use one boolean array.
Thanks for the comments. I misunderstood how an array works. For those curious this is my solution (I'm actually working with numbers):
arr[boolarr]=np.nan
And then I just changed how the rest of the function handles nan values

Combinations of features using Python NumPy

For an assignment I have to use different combinations of features belonging to some data, to evaluate a classification system. By features I mean measurements, e.g. height, weight, age, income. So for instance I want to see how well a classifier performs when given just the height and weight to work with, and then the height and age say. I not only want to be able to test what two features work best together, but also what 3 features work best together and would like to be able to generalise this to n features.
I've been attempting this using numpy's mgrid, to create n dimensional arrays, flattening them, and then making arrays that use the same elements from each array to create new ones. Tricky to explain so here is some code and psuedo code:
import numpy as np
def test_feature_combos(data, combinations):
dimensions = combinations.shape[0]
grid = np.empty(dimensions)
for i in xrange(dimensions):
grid[i] = combinations[i].flatten()
#The above code throws an error "setting an array element with a sequence" error which I understand, but this shows my approach.
**Pseudo code begin**
For each element of each element of this new array,
create a new array like so:
[[1,1,2,2],[1,2,1,2]] ---> [[1,1],[1,2],[2,1],[2,2]]
Call this new array combo_indices
Then choose the columns (features) from the data in a loop using:
new_data = data[:, combo_indices[j]]
combinations = np.mgrid[1:5,1:5]
test_feature_combos(data, combinations)
I concede that this approach means a lot of unnecessary combinations due to repeats, however I cannot even implement this so beggars can not be choosers.
Please can someone advise me on how I can either a) implement my approach or b) achieve this goal in a much more elegant way.
Thanks in advance, and let me know if any clarification needs to be made, this was tough to explain.
To generate all combinations of k elements drawn without replacement from a set of size n you can use itertools.combinations, e.g.:
idx = np.vstack(itertools.combinations(range(n), k)) # an (n, k) array of indices
For the special case where k=2 it's often faster to use the indices of the upper triangle of an n x n matrix, e.g.:
idx = np.vstack(np.triu_indices(n, 1)).T

Numpy: average over one dimension in "jagged" 3D array

Suppose I have an N*M*X-dimensional array "data", where N and M are fixed, but X is variable for each entry data[n][m].
(Edit: To clarify, I just used np.array() on the 3D python list which I used for reading in the data, so the numpy array is of dimensions N*M and its entries are variable-length lists)
I'd now like to compute the average over the X-dimension, so that I'm left with an N*M-dimensional array. Using np.average/mean with the axis-argument doesn't work, so the way I'm doing it right now is just iterating over N and M and appending the manually computed average to a new list, but that just doesn't feel very "python":
avgData=[]
for n in data:
temp=[]
for m in n:
temp.append(np.average(m))
avgData.append(temp)
Am I missing something obvious here? I'm trying to freshen up my python skills while I'm at it, so interesting/varied responses are more than welcome! :)
Thanks!
What about using np.vectorize:
do_avg = np.vectorize(np.average)
data_2d = do_avg(data)
data = np.array([[1,2,3],[0,3,2,4],[0,2],[1]]).reshape(2,2)
avg=np.zeros(data.shape)
avg.flat=[np.average(x) for x in data.flat]
print avg
#array([[ 2. , 2.25],
# [ 1. , 1. ]])
This still iterates over the elements of data (nothing un-Pythonic about that). But since there's nothing special about the shape or axes of data, I'm just using data.flat. While appending to Python list, with numpy it is better to assign values to the elements of an existing array.
There are fast numeric methods to work with numpy arrays, but most (if not all) work with simple numeric dtypes. Here the array elements are object (either list or array), numpy has to resort to the usual Python iteration and list operations.
For this small example, this solution is a bit faster than Zwicker's vectorize. For larger data the two solutions take about the same time.

How to return an array of at least 4D: efficient method to simulate numpy.atleast_4d

numpy provides three handy routines to turn an array into at least a 1D, 2D, or 3D array, e.g. through numpy.atleast_3d
I need the equivalent for one more dimension: atleast_4d. I can think of various ways using nested if statements but I was wondering whether there is a more efficient and faster method of returning the array in question. In you answer, I would be interested to see an estimate (O(n)) of the speed of execution if you can.
The np.array method has an optional ndmin keyword argument that:
Specifies the minimum number of dimensions that the resulting array
should have. Ones will be pre-pended to the shape as needed to meet
this requirement.
If you also set copy=False you should get close to what you are after.
As a do-it-yourself alternative, if you want extra dimensions trailing rather than leading:
arr.shape += (1,) * (4 - arr.ndim)
Why couldn't it just be something as simple as this:
import numpy as np
def atleast_4d(x):
if x.ndim < 4:
y = np.expand_dims(np.atleast_3d(x), axis=3)
else:
y = x
return y
ie. if the number of dimensions is less than four, call atleast_3d and append an extra dimension on the end, otherwise just return the array unchanged.

Compute outer product of arrays with arbitrary dimensions

I have two arrays A,B and want to take the outer product on their last dimension,
e.g.
result[:,i,j]=A[:,i]*B[:,j]
when A,B are 2-dimensional.
How can I do this if I don't know whether they will be 2 or 3 dimensional?
In my specific problem A,B are slices out of a bigger 3-dimensional array Z,
Sometimes this may be called with integer indices A=Z[:,1,:], B=Z[:,2,:] and other times
with slices A=Z[:,1:3,:],B=Z[:,4:6,:].
Since scipy "squeezes" singleton dimensions, I won't know what dimensions my inputs
will be.
The array-outer-product I'm trying to define should satisfy
array_outer_product( Y[a,b,:], Z[i,j,:] ) == scipy.outer( Y[a,b,:], Z[i,j,:] )
array_outer_product( Y[a:a+N,b,:], Z[i:i+N,j,:])[n,:,:] == scipy.outer( Y[a+n,b,:], Z[i+n,j,:] )
array_outer_product( Y[a:a+N,b:b+M,:], Z[i:i+N, j:j+M,:] )[n,m,:,:]==scipy.outer( Y[a+n,b+m,:] , Z[i+n,j+m,:] )
for any rank-3 arrays Y,Z and integers a,b,...i,j,k...n,N,...
The kind of problem I'm dealing with involves a 2-D spatial grid, with a vector-valued function at each grid point. I want to be able to calculate the covariance matrix (outer product) of these vectors, over regions defined by slices in the first two axes.
You may have some luck with einsum :
http://docs.scipy.org/doc/numpy/reference/generated/numpy.einsum.html
After discovering the use of ellipsis in numpy/scipy arrays
I ended up implementing it as a recursive function:
def array_outer_product(A, B, result=None):
''' Compute the outer-product in the final two dimensions of the given arrays.
If the result array is provided, the results are written into it.
'''
assert(A.shape[:-1] == B.shape[:-1])
if result is None:
result=scipy.zeros(A.shape+B.shape[-1:], dtype=A.dtype)
if A.ndim==1:
result[:,:]=scipy.outer(A, B)
else:
for idx in xrange(A.shape[0]):
array_outer_product(A[idx,...], B[idx,...], result[idx,...])
return result
Assuming I've understood you correctly, I encountered a similar issue in my research a couple weeks ago. I realized that the Kronecker product is simply an outer product which preserves dimensionality. Thus, you could do something like this:
import numpy as np
# Generate some data
a = np.random.random((3,2,4))
b = np.random.random((2,5))
# Now compute the Kronecker delta function
c = np.kron(a,b)
# Check the shape
np.prod(c.shape) == np.prod(a.shape)*np.prod(b.shape)
I'm not sure what shape you want at the end, but you could use array slicing in combination with np.rollaxis, np.reshape, np.ravel (etc.) to shuffle things around as you wish. I guess the downside of this is that it does some extra calculations. This may or may not matter, depending on your limitations.

Categories