Numpy mixing arrays with multiple index arrays - python

I have a 3d mesh with points and the locations of the points are
in an array that looks like this:
mesh_vectors = np.array([[-0.85758871, 0.8965745 , -0.1427767 ],
[-0.23945311, 1.00544977, 1.45797086],
[-0.57341832, -1.07448494, -0.11827722],
[ 0.05894491, -0.97208506, 1.47583127],
[-0.71402085, -0.08872638, -0.12916484],
[-0.09181146, 1.01235461, 0.47418442],
[-0.09025362, 0.01668115, 1.46690106],
[ 0.19773833, -0.95349348, 0.49089319],
[ 0.05055711, 0.02909645, 0.48503664]])
I have two indexing arrays:
idx1 = np.array([4, 2, 1, 6, 5, 0, 1, 5])
idx2 = np.array([6, 3, 0, 4, 7, 2, 3, 7])
These translations correspond to the index arrays:
translate_1 = np.array([[ 0.00323021, 0.00047712, -0.00422925],
[ 0.00153422, 0.00022654, -0.00203258],
[ 0.00273207, 0.00039626, 0.00038201],
[ 0.0052439 , 0.00075993, 0.00068843],
[-0.00414245, -0.00053918, 0.00543974],
[-0.00681844, -0.00084955, 0.00894626],
[ 0. , 0. , 0. ],
[-0.00672519, -0.00099897, -0.00090189]])
translate_2 = np.array([[ 0.00523871, 0.00079512, 0.00068814],
[ 0.00251901, 0.00038234, 0.00033379],
[ 0.00169134, 0.00021078, -0.00218737],
[ 0.00324106, 0.00040338, -0.00422859],
[-0.00413547, -0.00058669, 0.00544016],
[-0.00681223, -0.0008921 , 0.00894669],
[ 0. , 0. , 0. ],
[-0.00672553, -0.00099677, -0.00090191]])
They are currently added to the mesh like this:
mesh_vectors[idx1] += translate_1
mesh_vectors[idx2] += translate_2
The trouble is, what I really need to add isn't the translations
themselves but the mean of the translations wherever multiple
translations are applied to the same mesh point. An index can occur any number of times in either indexing array, e.g. [2,2,2,3,4,5] and [1,2,1,1,5,4], though the two arrays will always be the same size. I'm trying to do this with numpy for speed, but I have the option of using loops at startup to generate the indexing arrays if needed.
Thanks in advance!

This works:
scaled_tr1 = translate_1 / np.bincount(idx1)[idx1,None]
np.add.at(mesh_vectors, idx1, scaled_tr1)
Note that the use of np.add.at instead of fancy indexing is required:
ufunc.at(a, indices, b=None)
Performs unbuffered in place operation on operand a for elements specified by indices. For addition ufunc, this method is equivalent to a[indices] += b, except that results are accumulated for elements that are indexed more than once. For example, a[[0,0]] += 1 will only increment the first element once because of buffering, whereas add.at(a, [0,0], 1) will increment the first element twice.
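The snippet above only handles idx1. If the mean is meant to cover every translation that lands on a point, whichever index array it came from, one possible approach is to pool both index arrays before counting. This is a minimal sketch under that assumption; the helper name add_mean_translations and the toy data are made up for illustration and are not from the question.

```python
import numpy as np

def add_mean_translations(points, index_arrays, translations):
    """Add, in place, the mean of all translations aimed at each point.

    Hypothetical helper: pools every (index, translation) pair first,
    so a point hit three times in total receives the mean of those three.
    """
    idx = np.concatenate(index_arrays)
    tr = np.concatenate(translations)
    # How many translations target each point (0 for untouched points).
    counts = np.bincount(idx, minlength=len(points))
    # Unbuffered add: repeated indices accumulate instead of overwriting.
    np.add.at(points, idx, tr / counts[idx, None])
    return points

# Toy data: point 0 is hit three times, point 1 once, point 2 never.
points = np.zeros((3, 2))
idx1 = np.array([0, 0])
idx2 = np.array([0, 1])
t1 = np.array([[1.0, 1.0], [2.0, 2.0]])
t2 = np.array([[3.0, 3.0], [4.0, 4.0]])
add_mean_translations(points, [idx1, idx2], [t1, t2])
```

Dividing before accumulating keeps everything in a single np.add.at call; summing first and dividing by the counts afterwards would work as well, as long as untouched points (count 0) are skipped.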

Related

Replacing entries in a numpy array with their quantile index with python

I have a one-dimensional numpy array with numbers, and I want each number replaced with the index of the quantile it belongs to.
This is my code for quintile indices:
import numpy as np
def get_quintile_indices( a ):
    result = np.ones( a.shape[ 0 ] ) * 4
    quintiles = [
        np.percentile( a, 20 ),
        np.percentile( a, 40 ),
        np.percentile( a, 60 ),
        np.percentile( a, 80 )
    ]
    for q in quintiles:
        result -= np.less_equal( a, q ) * 1
    return result
a = np.array( [ 58, 54, 98, 76, 35, 13, 62, 18, 62, 97, 44, 43 ] )
print( get_quintile_indices( a ) )
Output:
[ 2. 2. 4. 4. 0. 0. 3. 0. 3. 4. 1. 1.]
As you can see, I start with an array initialized with the highest possible index and, for every quintile cutpoint, subtract 1 from each entry that is less than or equal to that cutpoint. Is there a better way to do this? Is there a built-in function that can be used to map numbers against a list of cutpoints?
First off, we can generate those quintiles in one go -
quintiles = np.percentile( a, [20,40,60,80] )
For the final step to get the offsets, we can simply use np.searchsorted and this might be the built-in you were looking for, like so -
out = np.searchsorted(quintiles, a)
Alternatively, a direct translation of your loopy code to a vectorized version would be with broadcasting, like so -
# Use broadcasting to perform those comparisons in one go.
# Then, simply sum along the first axis and subtract from 4.
out = 4 - (quintiles[:,None] >= a).sum(0)
If quintiles is a list, we need to assign it as an array and then use broadcasting, like so -
out = 4 - (np.asarray(quintiles)[:,None] >= a).sum(0)
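As a quick check that these approaches agree with the loop in the question, here is a small sketch using the question's own data. np.digitize with right=True is included as another built-in that maps values against cutpoints; the variable names are mine.

```python
import numpy as np

a = np.array([58, 54, 98, 76, 35, 13, 62, 18, 62, 97, 44, 43])
cutpoints = np.percentile(a, [20, 40, 60, 80])

# Number of cutpoints strictly below each value (side='left' is the default).
by_search = np.searchsorted(cutpoints, a)

# Broadcasting version: 4 minus the number of cutpoints >= each value.
by_broadcast = 4 - (cutpoints[:, None] >= a).sum(0)

# digitize with right=True uses the same "bins[i-1] < x <= bins[i]" rule.
by_digitize = np.digitize(a, cutpoints, right=True)

print(by_search)
```

All three return integer arrays, whereas the original loop returns floats because it starts from np.ones.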

Changing a numpy array in place based on values in a given column?

I have a numpy array:
array([[ 0.68597575, 0.05544651, -1. ],
[ 0.33494648, 0.46368367, 1. ],
[ 0.42486765, 0.89427025, 1. ],
[ 0.62408611, 0.64633939, 1. ],
[ 0.37087957, 0.53077302, -1. ],
[ 0.21664159, 0.10786084, -1. ],
[ 0.13003626, 0.18425347, -1. ]])
I want the rows having last values -1 to be multiplied by -1 and also replaced in the actual matrix.
I tried this:
def transform(data):
    for row in data:
        if row[-1] == -1:
            row = row * -1
but I know there would be something simpler than this.
You can avoid the for loop by doing:
data[data[:, -1] == -1] *= -1
I prefer to do this in two steps for two reasons: (i) the code is easier to understand; and (ii) the boolean index created in the first step can often be reused elsewhere in the data pipeline.
Create the boolean index that selects the rows:
idx = M[:,-1] == -1
Do the transform on the indexed data:
M[idx,] *= -1
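To see why the loop in the question leaves the array unchanged while the masked assignment does not, here is a small sketch with made-up data (the values and variable names are only for illustration).

```python
import numpy as np

data = np.array([[0.5, 0.25, -1.0],
                 [0.75, 0.5, 1.0],
                 [0.125, 0.375, -1.0]])

# `row = row * -1` builds a new array and rebinds the loop variable;
# nothing is written back into the array being iterated over.
untouched = data.copy()
for row in untouched:
    if row[-1] == -1:
        row = row * -1

# Boolean-mask assignment writes the result back into the array itself.
flipped = data.copy()
mask = flipped[:, -1] == -1
flipped[mask] *= -1

print(flipped)
```

Inside the loop, row *= -1 would also modify the array, because each row produced by iterating over a 2-D array is a view, but the masked version avoids the Python-level loop altogether.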

Applying several functions to each row of an array

I have a numpy array which has only a few non-zero entries which can be either positive or negative. E.g. something like this:
myArray = np.array([[ 0. , 0. , 0. ],
[ 0.32, -6.79, 0. ],
[ 0. , 0. , 0. ],
[ 0. , 1.5 , 0. ],
[ 0. , 0. , -1.71]])
In the end, I would like to receive a list where each entry of this list corresponds to a row of myArray and is a cumulative product of function outputs which depend on the entries of the respective row of myArray and another list (in the example below it is called l).
The individual terms depend on the sign of the myArray entry: When it is positive, I apply "funPos", when it is negative, I apply "funNeg" and if the entry is 0, the term will be 1. So in the example array from above it would be:
output = [1*1*1 ,
funPos(0.32, l[0])*funNeg(-6.79,l[1])*1,
1*1*1,
1*funPos(1.5, l[1])*1,
1*1*funNeg(-1.71, l[2])]
I implemented this as shown below and it gives me the desired output (note: that is just a highly simplified toy example; the actual matrices are far bigger and the functions more complicated). I go through each row of the array, if the sum of the row is 0, I don't have to do any calculations and the output is just 1. If it is not equal 0, I go through this row, check the sign of each value and apply the appropriate function.
import numpy as np
def doCalcOnArray(Array1, myList):
    output = np.ones(Array1.shape[0]) #initialize output
    for indRow,row in enumerate(Array1):
        if sum(row) != 0: #only then calculations are needed
            tempProd = 1. #initialize the product that corresponds to the row
            for indCol, valCol in enumerate(row):
                if valCol > 0:
                    tempVal = funPos(valCol, myList[indCol])
                elif valCol < 0:
                    tempVal = funNeg(valCol, myList[indCol])
                elif valCol == 0:
                    tempVal = 1
                tempProd = tempProd*tempVal
            output[indRow] = tempProd
    return output
def funPos(val1,val2):
    return val1*val2
def funNeg(val1,val2):
    return val1*(val2+1)
myArray = np.array([[ 0. , 0. , 0. ],
[ 0.32, -6.79, 0. ],
[ 0. , 0. , 0. ],
[ 0. , 1.5 , 0. ],
[ 0. , 0. , -1.71]])
l = [1.1, 2., 3.4]
op = doCalcOnArray(myArray,l)
print(op)
The output is
[ 1. -7.17024 1. 3. -7.524 ]
which is the desired one.
My question is whether there is a more efficient way for doing that since that is quite "expensive" for large arrays.
EDIT:
I accepted gabhijit's answer because the pure numpy solution he came up with seems to be the fastest one for the arrays I am dealing with. Please note that there is also a nice working solution from RaJa that requires pandas, and the solution from dave also works fine and can serve as a nice example of how to use generators and numpy's "apply_along_axis".
Here's what I have tried - using reduce, map. I am not sure how fast this is - but is this what you are trying to do?
Edit 4: Simplest and most readable: make l a numpy array, which greatly simplifies the np.where calls.
import numpy as np
import time
from functools import reduce  # reduce is not a builtin in Python 3
l = np.array([1.0, 2.0, 3.0])
def posFunc(x,y):
    return x*y
def negFunc(x,y):
    return x*(y+1)
def myFunc(x, y):
    if x > 0:
        return posFunc(x, y)
    if x < 0:
        return negFunc(x, y)
    else:
        return 1.0
myArray = np.array([
    [ 0.,0.,0.],
    [ 0.32, -6.79, 0.],
    [ 0.,0.,0.],
    [ 0.,1.5,0.],
    [ 0.,0., -1.71]])
t1 = time.time()
# Python 3 dropped tuple parameters in lambdas, so the (index, value) pair is unpacked by hand.
a = np.array([reduce(lambda acc, iv: acc*myFunc(iv[1], l[iv[0]]), enumerate(row), 1) for row in myArray])
t2 = time.time()
print((t2-t1)*1000000)
print(a)
Basically, just look at the line that calls reduce: it cumulatively multiplies the terms produced from each enumerated row, starting with 1 (the last argument to reduce). myFunc simply takes an element of the row and the element of l at the same column index, and multiplies them as needed.
My output is not the same as yours, so I am not sure whether this is exactly what you want, but maybe you can follow the logic.
Also, I am not sure how fast this will be for huge arrays.
edit: Following is a 'pure numpy way' to do this.
my = myArray # just for brevity
t1 = time.time()
# First set the positive and negative values
# complicated - [my.itemset((x,y), posFunc(my.item(x,y), l[y])) for (x,y) in zip(*np.where(my > 0))]
# changed to
my = np.where(my > 0, my*l, my)
# complicated - [my.itemset((x,y), negFunc(my.item(x,y), l[y])) for (x,y) in zip(*np.where(my < 0))]
# changed to
my = np.where(my < 0, my*(l+1), my)
# print my - commented out to time it.
# Now set the zeroes to 1.0s
my = np.where(my == 0.0, 1.0, my)
# print my - commented out to time it
a = np.prod(my, axis=1)
t2 = time.time()
print((t2-t1)*1000000)
print(a)
Let me try to explain the zip(*np.where(my != 0)) part as best as I can. np.where returns two numpy arrays: the first holds the row indices and the second the column indices of the elements that match the condition (my != 0 in this case). We zip those into (row, column) tuples and then use array.itemset and array.item. Conveniently, the column index comes for free, so we can just take the element at that index in the list l. This should be faster than the previous version (and far more readable!). It still needs a timeit run to find out whether it really is.
Edit 2: There is no need to call np.where separately for positive and negative values; it can be done with a single call, np.where(my != 0).
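One thing to watch with chained np.where calls is that a value transformed by the first call is tested again by the second, which matters if an entry of l is zero or negative. A possible variant, sketched here under the assumption that funPos and funNeg are the simple arithmetic functions from the question, decides every branch from the original array. The function name row_products is made up for illustration.

```python
import numpy as np

def row_products(arr, weights):
    """Per-row product of funPos/funNeg terms, with zeros contributing 1."""
    weights = np.asarray(weights, dtype=float)
    pos = arr * weights          # funPos applied everywhere
    neg = arr * (weights + 1)    # funNeg applied everywhere
    # Pick the right term per element, based on the sign of the ORIGINAL value.
    terms = np.where(arr > 0, pos, np.where(arr < 0, neg, 1.0))
    return terms.prod(axis=1)

myArray = np.array([[0.0, 0.0, 0.0],
                    [0.32, -6.79, 0.0],
                    [0.0, 0.0, 0.0],
                    [0.0, 1.5, 0.0],
                    [0.0, 0.0, -1.71]])
result = row_products(myArray, [1.1, 2.0, 3.4])
print(result)
```

With the question's list l = [1.1, 2., 3.4] this reproduces the output given in the question. Both branches are evaluated for every element, so this only fits functions that are cheap and safe to call on any input.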
So, let's see if I understand your question.
1. You want to map elements of your matrix to a new matrix such that:
0 maps to 1
x>0 maps to funPos(x)
x<0 maps to funNeg(x)
2. You want to calculate the product of all elements in each row of this new matrix.
So, here's how I would go about doing it:
For 1:
def myFun(a):
    if a==0:
        return 1
    if a>0:
        return funPos(a)
    if a<0:
        return funNeg(a)
newFun = np.vectorize(myFun)
newArray = newFun(myArray)
And for 2:
np.prod(newArray, axis = 1)
Edit: To pass the index to funPos, funNeg, you can probably do something like this:
# Python 2.7
r,c = myArray.shape
ctr = -1 # I don't understand why this should be -1 instead of 0
def myFun(a):
    global ctr
    global c
    ind = ctr % c
    ctr += 1
    if a==0:
        return 1
    if a>0:
        return funPos(a,l[ind])
    if a<0:
        return funNeg(a,l[ind])
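When otypes is not given, np.vectorize calls the function once on the first element to work out the output type, which is presumably why the counter above has to start at -1. A way to avoid the global counter entirely, sketched here as an assumption rather than as the answerer's method, is to pass l as a second argument and let broadcasting pair each element with the entry of l for its column. The names term and vec_term are mine.

```python
import numpy as np

def funPos(a, w):
    return a * w

def funNeg(a, w):
    return a * (w + 1)

def term(a, w):
    """Scalar rule from the question: zero entries contribute a factor of 1."""
    if a > 0:
        return funPos(a, w)
    if a < 0:
        return funNeg(a, w)
    return 1.0

# otypes fixes the output dtype up front, so no probing call is needed.
vec_term = np.vectorize(term, otypes=[float])

myArray = np.array([[0.0, 0.0, 0.0],
                    [0.32, -6.79, 0.0],
                    [0.0, 0.0, 0.0],
                    [0.0, 1.5, 0.0],
                    [0.0, 0.0, -1.71]])
l = np.array([1.1, 2.0, 3.4])

# l has shape (3,) and broadcasts across the rows of the (5, 3) array.
result = vec_term(myArray, l).prod(axis=1)
print(result)
```

np.vectorize is still a Python-level loop internally, so this is about readability rather than speed.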
I think this numpy function would be helpful to you
numpy.apply_along_axis
Here is one implementation. Also, I would warn against checking whether the sum of the row is 0. Comparing floats to 0 can give unexpected behavior due to machine accuracy constraints. Also, if you have -5 and 5 the sum is zero, and I'm not sure that's what you want. I used numpy's any() function to see if anything was nonzero. For simplicity I also pulled your list (my_list) into global scope.
import numpy as np
my_list = 1.1, 2., 3.4
def func_pos(val1, val2):
    return val1 * val2
def func_neg(val1, val2):
    return val1 *(val2 + 1)
def my_generator(row):
    for i, a in enumerate(row):
        if a > 0:
            yield func_pos(a, my_list[i])
        elif a < 0:
            yield func_neg(a, my_list[i])
        else:
            yield 1
def reduce_row(row):
    if not row.any():
        return 1.0
    else:
        return np.prod(np.fromiter(my_generator(row), dtype=float))
def main():
    myArray = np.array([
        [ 0. , 0. , 0. ],
        [ 0.32, -6.79, 0. ],
        [ 0. , 0. , 0. ],
        [ 0. , 1.5 , 0. ],
        [ 0. , 0. , -1.71]])
    return np.apply_along_axis(reduce_row, axis=1, arr=myArray)
There are probably faster implementations; I think apply_along_axis is really just a loop under the covers.
I didn't test, but I bet this is faster than what you started with, and should be more memory efficient.
I've tried your example with the masking function of numpy arrays. However, I couldn't find a solution to replace the values in your array by funPos or funNeg.
So my suggestion would be to try this using pandas instead as it conserves indices while masking.
See my example:
import numpy as np
import pandas as pd
def funPos(a, b):
    return a * b
def funNeg(a, b):
    return a * (b + 1)
myPosFunc = np.vectorize(funPos) #vectorized form of funPos
myNegFunc = np.vectorize(funNeg) #vectorized form of funNeg
#Input
I = [1.0, 2.0, 3.0]
x = pd.DataFrame([
    [ 0.,0.,0.],
    [ 0.32, -6.79, 0.],
    [ 0.,0.,0.],
    [ 0.,1.5,0.],
    [ 0.,0., -1.71]])
b = pd.DataFrame(myPosFunc(x[x>0], I)) #calculate all positive values
c = pd.DataFrame(myNegFunc(x[x<0], I)) #calculate all negative values
b = b.mul(c, fill_value=1.) #put values of c in b (replaces the removed combineMult)
b = b.fillna(1) #replace all missing values that were '0' in the raw array
y = b.product(axis=1) #multiply all elements in one row
#Output
print ('final result')
print (y)
print (y.tolist())

Get indices of matrix from upper triangle

I have a symmetric matrix represented as a numpy array, like the following example:
[[ 1. 0.01735908 0.01628629 0.0183845 0.01678901 0.00990739 0.03326491 0.0167446 ]
[ 0.01735908 1. 0.0213712 0.02364181 0.02603567 0.01807505 0.0130358 0.0107082 ]
[ 0.01628629 0.0213712 1. 0.01293289 0.02041379 0.01791615 0.00991932 0.01632739]
[ 0.0183845 0.02364181 0.01293289 1. 0.02429031 0.01190878 0.02007371 0.01399866]
[ 0.01678901 0.02603567 0.02041379 0.02429031 1. 0.01496896 0.00924174 0.00698689]
[ 0.00990739 0.01807505 0.01791615 0.01190878 0.01496896 1. 0.0110924 0.01514519]
[ 0.03326491 0.0130358 0.00991932 0.02007371 0.00924174 0.0110924 1. 0.00808803]
[ 0.0167446 0.0107082 0.01632739 0.01399866 0.00698689 0.01514519 0.00808803 1. ]]
And I need to find the indices (row and column) of the greatest value without considering the diagonal. Since it is a symmetric matrix, I just took the upper triangle of the matrix.
ind = np.triu_indices(M_size, 1)
And then the index of the max value
max_ind = np.argmax(H[ind])
However max_ind is the index of the vector resulting after taking the upper triangle with triu_indices, how do I know which are the row and column of the value I've just found?
The matrix could be any size but it's always symmetric. Do you know a better method to achieve the same?
Thank you
Couldn't you do this by using np.triu to return a copy of your matrix with all but the upper triangle zeroed, then just use np.argmax and np.unravel_index to get the row/column indices?
Example:
x = np.zeros((10,10))
x[3, 8] = 1
upper = np.triu(x, 1)
idx = np.argmax(upper)
row, col = np.unravel_index(idx, upper.shape)
The drawback of this method is that it creates a copy of the input matrix, but it should still be a lot quicker than looping over elements in Python. It also assumes that the maximum value in the upper triangle is > 0.
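If the entries above the diagonal can be zero or negative, one way around that last assumption is to mask everything outside the strict upper triangle with -inf instead of 0. This is a rough sketch; the helper name argmax_upper and the test matrix are made up for illustration.

```python
import numpy as np

def argmax_upper(m):
    """Row/column of the largest entry strictly above the diagonal."""
    # True only for positions strictly above the diagonal.
    above = np.triu(np.ones(m.shape, dtype=bool), k=1)
    # Everything else becomes -inf so it can never win the argmax.
    masked = np.where(above, m, -np.inf)
    return np.unravel_index(np.argmax(masked), m.shape)

# Symmetric matrix whose off-diagonal entries are all negative.
x = -np.ones((4, 4))
x[1, 3] = x[3, 1] = -0.5
row, col = argmax_upper(x)
print(row, col)
```

Like np.triu, this still allocates a full-size temporary array.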
You can use the value of max_ind as an index into the ind data
max_ind = np.argmax(H[ind])
Out: 23
ind[0][max_ind], ind[1][max_ind],
Out: (4, 6)
Validate this by looking for the maximum in the entire matrix (won't always work -- data-dependent):
np.unravel_index(np.argmax(H), H.shape)
Out: (4, 6)
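Here is the same idea as a self-contained sketch on a small made-up symmetric matrix (not the one from the question), so the mapping from the flat position back to row and column is easy to follow.

```python
import numpy as np

H = np.array([[1.0, 0.2, 0.7, 0.1],
              [0.2, 1.0, 0.3, 0.9],
              [0.7, 0.3, 1.0, 0.4],
              [0.1, 0.9, 0.4, 1.0]])

# Row and column indices of the strict upper triangle, in row-major order:
# (0,1), (0,2), (0,3), (1,2), (1,3), (2,3)
rows, cols = np.triu_indices(H.shape[0], k=1)

# Position of the maximum within that flattened upper triangle ...
max_ind = np.argmax(H[rows, cols])

# ... mapped back to matrix coordinates by indexing the index arrays.
row, col = rows[max_ind], cols[max_ind]
print(max_ind, row, col)
```

This avoids copying the matrix and makes no assumption about the sign of the values, since the diagonal and lower triangle are never looked at.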
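There's probably a neater "numpy way" to do this, but this is what comes to mind first:
import operator
answer = None
biggest = 0
# The last row has nothing to the right of the diagonal, so skip it.
for r,row in enumerate(matrix[:-1]):
    i,elem = max(enumerate(row[r+1:]), key=operator.itemgetter(1))
    if elem > biggest:
        # i is an offset into row[r+1:], so the column is r+1+i
        biggest, answer = elem, (r, r+1+i)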

Assigning identical array indices at once in Python/Numpy

I want to find a fast way (without a for loop) in Python to add values at recurring indices of an array.
This is the desired result using a for loop:
import numpy as np
a=np.arange(9, dtype=np.float64).reshape((3,3))
# The 2nd, 3rd and 4th index pairs are identical.
Px = np.uint64(np.array([0,1,1,1,2]))
Py = np.uint64(np.array([0,0,0,0,0]))
# The array to be added at the array indices (may also contain random numbers).
x = np.array([.1,.1,.1,.1,.1])
for m in np.arange(len(x)):
    a[Px[m]][Py[m]] += x[m]
print(a)
# [[ 0.1 1. 2.]
# [ 3.3 4. 5.]
# [ 6.1 7. 8.]]
When I try to add x to a at the indices Px,Py I obviously do not get the same result (3.3 vs. 3.1):
a[Px,Py] += x
print(a)
# [[ 0.1 1. 2.]
# [ 3.1 4. 5.]
# [ 6.1 7. 8.]]
Is there a way to do this with numpy? Thanks.
Yes, it can be done, but it is a little tricky:
# convert your multi-dim indices to flat indices
flat_idx = np.ravel_multi_index((Px, Py), dims=a.shape)
# extract the unique indices and their position
unique_idx, idx_idx = np.unique(flat_idx, return_inverse=True)
# Aggregate the repeated indices
deltas = np.bincount(idx_idx, weights=x)
# Sum them to your array
a.flat[unique_idx] += deltas
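A shorter alternative, on NumPy versions that provide ufunc.at, is np.add.at, which performs an unbuffered in-place add so repeated indices accumulate. The sketch below reuses the question's data but with plain integer index arrays, which is my simplification.

```python
import numpy as np

a = np.arange(9, dtype=np.float64).reshape((3, 3))
Px = np.array([0, 1, 1, 1, 2])
Py = np.array([0, 0, 0, 0, 0])
x = np.array([.1, .1, .1, .1, .1])

# Unlike a[Px, Py] += x, every occurrence of a repeated index is applied.
np.add.at(a, (Px, Py), x)
print(a)
```

For very large index arrays the bincount approach may well be faster, so it is worth timing both on real data.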
