Numpy Vectorization While Indexing Two Arrays - python

I'm trying to vectorize the following function using numpy and am completely lost.
A = ndarray: Z x 3
B = ndarray: Z x 3
C = integer
D = ndarray: C x 3
Pseudocode:
entries = []
means = []
for i in range(C):
    for p in range(len(B)):
        if B[p] == D[i]:
            entries.append(A[p])
    means.append(columnwise_means(entries))
return means
An example would be:
A = [[1,2,3],[1,2,3],[4,5,6],[4,5,6]]
B = [[9,8,7],[7,6,5],[1,2,3],[3,4,5]]
C = 2
D = [[1,2,3],[4,5,6]]
Returns:
[average([9,8,7],[7,6,5]), average([1,2,3],[3,4,5])] = [[8,7,6],[2,3,4]]
I've tried using np.where, np.argwhere, np.mean, etc., but can't seem to get the desired effect. Any help would be greatly appreciated.
Thanks!

Going by the expected output in the question, I am assuming that the actual code has:
the conditional as: if A[p] == D[i], and
entries appended from B: entries.append(B[p]).
So, here's one vectorized approach with NumPy broadcasting and dot-product -
mask = (D[:,None,:] == A).all(-1)
out = mask.dot(B)/(mask.sum(1)[:,None])
If the input arrays hold non-negative integers, you can save memory and improve performance by treating each row as the index of an n-dimensional array, which builds the 2D mask directly without the intermediate 3D array, like so -
dims = np.maximum(A.max(0),D.max(0))+1
mask = np.ravel_multi_index(D.T,dims)[:,None] == np.ravel_multi_index(A.T,dims)
Sample run -
In [107]: A
Out[107]:
array([[1, 2, 3],
[1, 2, 3],
[4, 5, 6],
[4, 5, 6]])
In [108]: B
Out[108]:
array([[9, 8, 7],
[7, 6, 5],
[1, 2, 3],
[3, 4, 5]])
In [109]: mask = (D[:,None,:] == A).all(-1)
...: out = mask.dot(B)/(mask.sum(1)[:,None])
...:
In [110]: out
Out[110]:
array([[8, 7, 6],
[2, 3, 4]])
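For reference, here is a self-contained sketch that runs both mask constructions from this answer on the question's sample data and checks that they agree. It assumes non-negative integer inputs for the ravel_multi_index route, and the names mask_3d, mask_2d and counts are only illustrative. Under Python 3 the division returns floats rather than the integers shown in the session above.

```python
import numpy as np

A = np.array([[1, 2, 3], [1, 2, 3], [4, 5, 6], [4, 5, 6]])
B = np.array([[9, 8, 7], [7, 6, 5], [1, 2, 3], [3, 4, 5]])
D = np.array([[1, 2, 3], [4, 5, 6]])

# Route 1: broadcast D against A, giving a (len(D), len(A), 3) comparison
# that is collapsed to a (len(D), len(A)) row-match mask.
mask_3d = (D[:, None, :] == A).all(-1)

# Route 2 (non-negative integers only): map each row to one flat index,
# then compare the 1D keys, so no 3D intermediate is created.
dims = tuple(int(n) for n in np.maximum(A.max(0), D.max(0)) + 1)
keys_D = np.ravel_multi_index(tuple(D.T), dims)
keys_A = np.ravel_multi_index(tuple(A.T), dims)
mask_2d = keys_D[:, None] == keys_A

# Sum the matching rows of B with a dot product, then divide by the
# number of matches per row of D to get the column-wise means.
counts = mask_2d.sum(1)[:, None]
out = mask_2d.dot(B) / counts
print(out)
```

Note that a row of D with no match in A would give a count of zero, so the division would produce nan for that row.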

I see two hints:
First, comparing arrays by rows. One way to do that is to collapse each row into a single 1D key:
def indexer(M,base=256):
    return (M*base**np.arange(3)).sum(axis=1)
base must be an integer greater than the largest value in A and D. Then the selection can be done like this:
indices=np.equal.outer(indexer(D),indexer(A))
which gives:
array([[ True, True, False, False],
[False, False, True, True]], dtype=bool)
Second, each group can have a different length, so vectorising the last step is difficult. Here is one way to do the job:
B=np.array(B)
means=[B[i].mean(axis=0) for i in indices]
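Putting the two hints together, a runnable sketch might look like the following. It assumes non-negative integer data. The helper name row_keys and the way base is derived from the data are my own choices for illustration, not part of the answer.

```python
import numpy as np

def row_keys(M, base):
    """Collapse each row of M into one integer by reading it as base-`base` digits."""
    M = np.asarray(M)
    weights = base ** np.arange(M.shape[1])
    return (M * weights).sum(axis=1)

A = np.array([[1, 2, 3], [1, 2, 3], [4, 5, 6], [4, 5, 6]])
B = np.array([[9, 8, 7], [7, 6, 5], [1, 2, 3], [3, 4, 5]])
D = np.array([[1, 2, 3], [4, 5, 6]])

# Assumption: base only has to exceed every value that appears in A or D.
base = int(max(A.max(), D.max())) + 1

# indices[i, p] is True where row p of A equals row i of D.
indices = np.equal.outer(row_keys(D, base), row_keys(A, base))

# Groups may differ in size, so the means are taken one group at a time.
means = [B[row].mean(axis=0) for row in indices]
print(np.array(means))
```

For large values or many columns the keys can overflow the integer dtype, so this is only a sketch for small inputs.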

Related

Numpy: multiply the first n elements along an axis where n is given by an array

I have a 3D numpy array, e.g. A = np.random.rand(a, b, c), and I want to compute the product of the first n elements across the last axis (i.e. axis=2), where each n is given by an array N with N.shape = (a, b). The end result is a 2d array with shape (a, b).
For instance, let's say one slice (e.g. for a=0, b=0) is [3, 2, 5, 7], and N[0, 0] = 3, then I want the product 3*2*5, that is, multiply the first n=3 elements.
Is there any efficient way of doing this without resorting to very slow quasi-for-loop solutions like np.fromiter or np.vectorize?
Edit: as per request, a minimal example
A = np.array([
    [[1, 2, 3], [4, 5, 6]],
    [[1, 2, 1], [3, 2, 4]]
])
N = np.array([
    [2, 1],
    [3, 2]
])
# desired result using a for loop:
desired_result = np.full(N.shape, np.nan)
for a in range(A.shape[0]):
    for b in range(A.shape[1]):
        # multiply the first n=N[a, b] elements
        # (np.prod replaces np.product, which was removed in NumPy 2.0)
        desired_result[a, b] = np.prod(A[a, b][:N[a, b]])
print(desired_result)
# output = array([[2., 4.], [2., 6.]])
Approach #1
Here's one vectorized way leveraging broadcasting -
# Mask with the same shape as the 3D input array that is True from index 0
# up to (but not including) N[a, b] for each element of N.
In [22]: m = N[...,None] > np.arange(A.shape[2])
# Use it to build an array that keeps A where the mask is True and is 1
# elsewhere. When prod reduces along the last axis, the masked-out positions
# do not affect the result, while the valid ones are multiplied as usual.
In [23]: np.where(m,A,1).prod(-1)
Out[23]:
array([[2, 4],
[2, 6]])
Alternatively, use numexpr to leverage multiple cores by translating the masking step from earlier into arithmetic -
In [14]: import numexpr as ne
In [15]: ne.evaluate('prod(m*A + ~m,2)')
Out[15]:
array([[2, 4],
[2, 6]], dtype=int64)
Approach #2
Based on this idea, here's one with np.multiply.reduceat -
s0 = np.arange(0, A.size, A.shape[2])
p = np.stack((s0, N.ravel()+s0),axis=1).ravel()
out = np.multiply.reduceat(A.ravel(), p)[::2].reshape(N.shape)
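As a quick sanity check, the sketch below runs Approach #1 and Approach #2 on the question's minimal example and compares both with the explicit loop. The variable names expected, masked and reduced are only illustrative.

```python
import numpy as np

A = np.array([[[1, 2, 3], [4, 5, 6]],
              [[1, 2, 1], [3, 2, 4]]])
N = np.array([[2, 1],
              [3, 2]])

# Reference result from the explicit double loop.
expected = np.empty(N.shape, dtype=A.dtype)
for a in range(A.shape[0]):
    for b in range(A.shape[1]):
        expected[a, b] = np.prod(A[a, b, :N[a, b]])

# Approach #1: replace everything past the first N[a, b] entries with 1,
# then take the product along the last axis.
m = N[..., None] > np.arange(A.shape[2])
masked = np.where(m, A, 1).prod(-1)

# Approach #2: reduceat over (start, stop) pairs on the flattened array,
# keeping every other result (the start -> stop segments).
# Caveat, as far as I can tell: if the last entry of N equals A.shape[2],
# the final stop index equals A.size, which reduceat rejects as out of bounds.
s0 = np.arange(0, A.size, A.shape[2])
p = np.stack((s0, N.ravel() + s0), axis=1).ravel()
reduced = np.multiply.reduceat(A.ravel(), p)[::2].reshape(N.shape)

print(expected)
print(masked)
print(reduced)
```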

How to join/connect two lists in Python? [duplicate]

I want to concatenate two arrays vertically in Python using the NumPy package:
a = array([1,2,3,4])
b = array([5,6,7,8])
I want something like this:
c = array([[1,2,3,4],[5,6,7,8]])
How can we do that using the concatenate function? I tried these two calls, but the results are the same:
c = concatenate((a,b),axis=0)
# or
c = concatenate((a,b),axis=1)
Both calls give this:
c = array([1,2,3,4,5,6,7,8])
The problem is that both a and b are 1D arrays and so there's only one axis to join them on.
Instead, you can use vstack (v for vertical):
>>> np.vstack((a,b))
array([[1, 2, 3, 4],
[5, 6, 7, 8]])
Also, row_stack is an alias of the vstack function (it is deprecated as of NumPy 2.0, so prefer vstack in new code):
>>> np.row_stack((a,b))
array([[1, 2, 3, 4],
[5, 6, 7, 8]])
It's also worth noting that multiple arrays of the same length can be stacked at once. For instance, np.vstack((a,b,x,y)) would have four rows.
Under the hood, vstack works by making sure that each array has at least two dimensions (using atleast_2d) and then calling concatenate to join these arrays on the first axis (axis=0).
Use np.vstack:
In [4]:
import numpy as np
a = np.array([1,2,3,4])
b = np.array([5,6,7,8])
c = np.vstack((a,b))
c
Out[4]:
array([[1, 2, 3, 4],
[5, 6, 7, 8]])
In [5]:
d = np.array([[1,2,3,4],[5,6,7,8]])
d
Out[5]:
array([[1, 2, 3, 4],
[5, 6, 7, 8]])
In [6]:
np.equal(c,d)
Out[6]:
array([[ True, True, True, True],
[ True, True, True, True]], dtype=bool)
Maybe it's not the best solution, but a simple way to make your code work is to add a reshape:
a = array([1,2,3,4])
b = array([5,6,7,8])
c = concatenate((a,b),axis=0).reshape((2,4))
print(c)
Output:
[[1 2 3 4]
[5 6 7 8]]
In general if you have more than 2 arrays with the same length:
reshape((number_of_arrays, length_of_array))
To use concatenate, you need to make a and b 2D arrays instead of 1D, as in
c = concatenate((atleast_2d(a), atleast_2d(b)))
Alternatively, you can simply do
c = array((a,b))
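To compare the suggestions from these answers side by side, here is a small sketch that builds the same 2 x 4 array four ways. np.stack is not mentioned in the answers; I include it as one more option that exists in current NumPy. The variable names are only illustrative.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])

# vstack promotes each 1D array to shape (1, 4) and joins along axis 0.
via_vstack = np.vstack((a, b))

# The same thing spelled out with concatenate.
via_concat = np.concatenate((np.atleast_2d(a), np.atleast_2d(b)), axis=0)

# Building a new array from the pair of rows.
via_array = np.array((a, b))

# stack adds a new axis and joins along it.
via_stack = np.stack((a, b), axis=0)

print(via_vstack)
```

All four results have shape (2, 4) and hold the same values.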

NumPy/Python array

I created a NumPy array,
a = numpy.array([[1,2,3],[4,5,6]])
I want to make the array look like [[1,4],[2,5],[3,6]], and after making a change I want to return to the original structure.
Is there a NumPy command to run a function on all values, like a[0] * 2?
The result should be
[[2,8],[2,5],[3,6]]
You want to transpose the array (think matrices). Numpy arrays have a method for that:
a = np.array([[1,2,3],[4,5,6]])
b = a.T # or a.transpose()
But note that b is now a view of a; if you change b, a changes as well (this saves memory and time otherwise spent copying).
You can change the first row of b (which is the first column of a) with
b[0] *= 2
Which gives you the result you want, but a has also changed! If you don't want that, use
b = a.T.copy()
If you do want to change a, note that you can also immediately change the values you want in a itself:
a[:, 0] *= 2
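The view-versus-copy behaviour described above can be checked with a short sketch like this one. The names snapshot and c are only for illustration.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

# The transpose is a view: it shares memory with a.
b = a.T
print(np.shares_memory(a, b))

# Doubling the first row of b doubles the first column of a.
b[0] *= 2
print(b)
print(a)

# A copy of the transpose is independent of a.
snapshot = a.copy()
c = a.T.copy()
c[0] *= 2
print(np.array_equal(a, snapshot))
```

Transposing b again (b.T) gives back an array with the original 2 x 3 layout.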
You can use zip on the ndarray and pass it to numpy.array (in Python 3, zip returns an iterator, so wrap it in list first):
In [36]: a = np.array([[1,2,3], [4,5,6]])
In [37]: b = np.array(list(zip(*a)))
In [38]: b
Out[38]:
array([[1, 4],
[2, 5],
[3, 6]])
In [39]: b*2
Out[39]:
array([[ 2, 8],
[ 4, 10],
[ 6, 12]])
Use numpy.column_stack for a pure NumPy solution:
In [44]: b = np.column_stack(a)
In [45]: b
Out[45]:
array([[1, 4],
[2, 5],
[3, 6]])
In [46]: b*2
Out[46]:
array([[ 2, 8],
[ 4, 10],
[ 6, 12]])

Select a submatrix based on diagonal value

I want to select a submatrix of a numpy matrix based on whether the diagonal is less than some cutoff value. For example, given the matrix:
Test = array([[1,2,3,4,5],
[2,3,4,5,6],
[3,4,5,6,7],
[4,5,6,7,8],
[5,6,7,8,9]])
I want to select the rows and columns where the diagonal value is less than, say, 6. In this example, the diagonal values are sorted, so that I could just take Test[:3,:3], but in the general problem I want to solve this isn't the case.
The following snippet works:
def MatrixCut(M,Ecut):
    D = diag(M)
    indices = D<Ecut
    n = sum(indices)
    NewM = zeros((n,n),'d')
    ii = -1
    for i,ibool in enumerate(indices):
        if ibool:
            ii += 1
            jj = -1
            for j,jbool in enumerate(indices):
                if jbool:
                    jj += 1
                    NewM[ii,jj] = M[i,j]
    return NewM
print(MatrixCut(Test,6))
[[ 1. 2. 3.]
[ 2. 3. 4.]
[ 3. 4. 5.]]
However, this is fugly code, with all kinds of dangerous things like initializing the ii/jj indices to -1, which won't cause an error if somehow I get into the loop and take M[-1,-1].
Plus, there must be a numpythonic way of doing this. For a one-dimensional array, you could do:
D = diag(A)
A[D<Ecut]
But the analogous thing for a 2d array doesn't work:
D = diag(Test)
Test[D<6,D<6]
array([1, 3, 5])
Is there a good way to do this? Thanks in advance.
This also works when the diagonals are not sorted:
In [7]: Test = array([[1,2,3,4,5],
[2,3,4,5,6],
[3,4,5,6,7],
[4,5,6,7,8],
[5,6,7,8,9]])
In [8]: d = np.argwhere(np.diag(Test) < 6).squeeze()
In [9]: Test[d][:,d]
Out[9]:
array([[1, 2, 3],
[2, 3, 4],
[3, 4, 5]])
Alternately, to use a single subscript call, you could do:
In [10]: d = np.argwhere(np.diag(Test) < 6)
In [11]: Test[d, d.flat]
Out[11]:
array([[1, 2, 3],
[2, 3, 4],
[3, 4, 5]])
[UPDATE]: Explanation of the second form.
At first, it may be tempting to just try Test[d, d] but that will only extract elements from the diagonal of the array:
In [75]: Test[d, d]
Out[75]:
array([[1],
[3],
[5]])
The problem is that d has shape (3, 1) so if we use d in both subscripts, the output array will have the same shape as d. The d.flat is equivalent to using d.flatten() or d.ravel() (except flat just returns an iterator instead of an array). The effect is that the result has shape (3,):
In [76]: d
Out[76]:
array([[0],
[1],
[2]])
In [77]: d.flatten()
Out[77]: array([0, 1, 2])
In [79]: print(d.shape, d.flatten().shape)
(3, 1) (3,)
The reason Test[d, d.flat] works is because numpy's general broadcasting rules cause the last dimension of d (which is 1) to be broadcast to the last (and only) dimension of d.flat (which is 3). Similarly, d.flat is broadcast to match the first dimension of d. The result is two (3,3) index arrays, which are equivalent to the following arrays i and j:
In [80]: dd = d.flatten()
In [81]: i = np.hstack((d, d, d))
In [82]: j = np.vstack((dd, dd, dd))
In [83]: print(i)
[[0 0 0]
[1 1 1]
[2 2 2]]
In [84]: print(j)
[[0 1 2]
[0 1 2]
[0 1 2]]
And just to make sure they work:
In [85]: Test[i, j]
Out[85]:
array([[1, 2, 3],
[2, 3, 4],
[3, 4, 5]])
The only way I found to solve your task is somewhat tricky
>>> Test[[[i] for i,x in enumerate(D<6) if x], D<6]
array([[1, 2, 3],
[2, 3, 4],
[3, 4, 5]])
possibly not the best one. Based on this answer.
Or (thanks to @bogatron for reminding me of argwhere):
>>> Test[np.argwhere(D<6), D<6]
array([[1, 2, 3],
[2, 3, 4],
[3, 4, 5]])
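Another option, not used in the answers above, is np.ix_, which builds the open mesh of row and column indices from a boolean mask. The sketch below is only an illustration under that substitution. The function name matrix_cut and the unsorted example matrix are made up for the demonstration.

```python
import numpy as np

def matrix_cut(M, cutoff):
    """Keep the rows and columns whose diagonal entry is below cutoff."""
    keep = np.diag(M) < cutoff
    # np.ix_ turns the 1D mask into index arrays that select the
    # full cross product of kept rows and kept columns.
    return M[np.ix_(keep, keep)]

test = np.array([[1, 2, 3, 4, 5],
                 [2, 3, 4, 5, 6],
                 [3, 4, 5, 6, 7],
                 [4, 5, 6, 7, 8],
                 [5, 6, 7, 8, 9]])
sub = matrix_cut(test, 6)
print(sub)

# A matrix whose diagonal (7, 4, 0) is not sorted: rows/columns 1 and 2 are kept.
unsorted = np.array([[7, 1, 2],
                     [3, 4, 5],
                     [6, 8, 0]])
sub_unsorted = matrix_cut(unsorted, 6)
print(sub_unsorted)
```

Unlike the squeeze-based version, this does not depend on how many diagonal entries pass the cutoff.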

removing columns from an array in Python

I have a 2D Python array, from which I would like to remove certain columns, but I don't know how many I would like to remove until the code runs.
I want to loop over the columns in the original array, and if the sum of the rows in any one column is above a certain value I want to remove the whole column.
I started to do this the following way:
for i in range(original_number_of_columns):
    if sum(original_array[:,i]) < certain_value:
        new_array[:,new_index] = original_array[:,i]
        new_index+=1
But then I realised that I was going to have to define new_array first, and tell Python what size it is. But I don't know what size it is going to be beforehand.
I have got around it before by firstly looping over the columns to find out how many I will lose, then defining the new_array, and then lastly running the loop above - but obviously there will be much more efficient ways to do such things!
Thank you.
You can use the following:
import numpy as np
a = np.array([
[1, 2, 3],
[4, 5, 6],
[7, 8, 9]
]
)
print(a.compress(a.sum(0) > 15, 1))
[[3]
[6]
[9]]
Without numpy:
my_2d_table = [[...],[...],...]
only_cols_that_sum_lt_x = [col for col in zip(*my_2d_table) if sum(col) < some_threshold]
new_table = list(map(list, zip(*only_cols_that_sum_lt_x)))
With numpy:
a = np.array(my_2d_table)
a[:, np.sum(a, 0) < some_threshold]
I suggest using numpy.compress.
>>> import numpy as np
>>> a = np.array([[1, 2, 3], [1, -3, 2], [4, 5, 7]])
>>> a
array([[ 1, 2, 3],
[ 1, -3, 2],
[ 4, 5, 7]])
>>> a.sum(axis=0) # sums each column
array([ 6, 4, 12])
>>> a.sum(0) < 5
array([ False, True, False], dtype=bool)
>>> a.compress(a.sum(0) < 5, axis=1) # applies the condition to the elements of each row so that only those elements in the rows whose column indices correspond to True values in the condition array will be kept
array([[ 2],
[-3],
[ 5]])
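To tie the answers together, here is a minimal sketch showing that boolean-mask indexing and compress select the same columns, with no need to size the result in advance. The function name keep_columns_below is hypothetical.

```python
import numpy as np

def keep_columns_below(arr, threshold):
    """Return only the columns of arr whose sum is below threshold."""
    keep = arr.sum(axis=0) < threshold  # one boolean per column
    return arr[:, keep]

a = np.array([[1, 2, 3],
              [1, -3, 2],
              [4, 5, 7]])

kept = keep_columns_below(a, 5)
same = a.compress(a.sum(axis=0) < 5, axis=1)

print(kept)
```

Both calls return a new array and leave a unchanged.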
