Value in an array between two numbers in Python

Making a title that actually explains what I want is harder than I thought, so here goes me explaining it.
I have an array filled with zeros that gains values every time a condition is met, so after one time-step iteration I get something like this (minus the headers):
current_array =
bubble_size   y_coord
14040         42
3943          71
6345          11
0             0
0             0
...
After the time step completes, current_array gets saved as previous_array and is wiped back to zeros, because the number of entries is not guaranteed each time.
Now the real question: I want to check all rows in the first column of previous_array, see whether the current bubble size is within, say, 5% either side of the stored size, and if so subtract the stored y coordinate (in previous_array's second column) from the current y position.
Currently I have something like:
if bubble_size in current_array[:, 0]:
    do_whatever
but I don't know how to pull out the associated y_coord without using a loop. I am fine with looping (there are about 100 rows in the array and at least 1000 time steps, so I want this as efficient as possible), but I would like to avoid it.
I have included my thoughts on the for loop (note that current_array and previous_array are actually called current_frame and previous_frame):
for y in range(array_size):
    if previous_frame[y, 0] * 0.95 < bubble_size < previous_frame[y, 0] * 1.05:
        distance_travelled = current_y_coord - previous_frame[y, 1]  # y coord is in column 1
Any help is greatly appreciated :)

I may have misunderstood your issue, but if you first want to check whether each bubble size is within 5% of the size in the same row of the previous array, you can use the following:
import numpy as np

def apply(p, c):  # for each element, check whether the bubble size stayed within bounds
    if p * 0.95 < c < p * 1.05:
        return 1
    else:
        return 0

def dist(p, c):  # calculate the distance
    return c - p

def update(prev, cur):
    assert isinstance(
        cur, np.ndarray), 'Current array is not a valid numpy array'
    assert isinstance(
        prev, np.ndarray), 'Previous array is not a valid numpy array'
    assert prev.shape == cur.shape, 'Arrays size mismatch'
    applyvec = np.vectorize(apply)
    toapply = applyvec(prev[:, 0], cur[:, 0])
    print(toapply)
    distvec = np.vectorize(dist)
    distance = distvec(prev[:, 1], cur[:, 1])
    print(distance)

current = np.array([[14040, 42],
                    [3943, 71],
                    [6345, 11],
                    [0, 0],
                    [0, 0]])
previous = np.array([[14039, 32],
                     [3942, 61],
                     [6344, 1],
                     [0, 0],
                     [0, 0]])
update(previous, current)
PS: Could you tell us what final array you are looking for, based on my example?

As I understand it (correct me if I'm wrong):
You have a current bubble size (integer) and a current y value (integer)
You have a 2D array (prev_array) that contains bubble sizes and y coords
You want to check whether your current bubble size is within 5% (either way) of each stored bubble size in prev_array
If they are within range, subtract your current y value from the stored y coord
This will result in a new array, containing only bubble sizes that are within range, and the newly subtracted y value
You want to do this without an explicit loop
You can do that using boolean indexing in numpy...
Setup the previous array:
prev_array = np.array([[14040, 42], [3943, 71], [6345, 11], [3945, 0], [0, 0]])
prev_array
array([[14040,    42],
       [ 3943,    71],
       [ 6345,    11],
       [ 3945,     0],
       [    0,     0]])
You have a current bubble size you want to use for comparison, and a current y coord value:
bubble_size = 3750
cur_y = 10
Next we can create a boolean mask where we only select rows of prev_array that meets the 5% criteria:
ind = (bubble_size > prev_array[:, 0] * 0.95) & (bubble_size < prev_array[:, 0] * 1.05)
# ind is a boolean array that looks like this: [False, True, False, True, False]
Then we use ind to index prev_array, and calculate the new (subtracted) y coords:
new_array = prev_array[ind]
new_array[:,1] = cur_y - new_array[:,1]
Giving your final output array:
array([[3943,  -61],
       [3945,   10]])
As it's not clear what you want your output to actually look like, instead of creating a new array you can also just update prev_array with the new y values:
ind = (bubble_size > prev_array[:,0]*.95) & (bubble_size < prev_array[:,0]*1.05)
prev_array[ind,1] = cur_y - prev_array[ind,1]
Which gives:
array([[14040,    42],
       [ 3943,   -61],
       [ 6345,    11],
       [ 3945,    10],
       [    0,     0]])
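If you need this for every row of the current frame at once (the loop the question wants to avoid), the same mask extends across both arrays with broadcasting. A minimal sketch, assuming the 5% test from the question and reusing the example data from the first answer:
import numpy as np

previous = np.array([[14039, 32], [3942, 61], [6344, 1], [0, 0], [0, 0]])
current = np.array([[14040, 42], [3943, 71], [6345, 11], [0, 0], [0, 0]])

# Compare every current bubble size against every previous one: shape (C, P)
sizes_prev = previous[:, 0]
sizes_cur = current[:, 0][:, None]
matches = (sizes_cur > sizes_prev * 0.95) & (sizes_cur < sizes_prev * 1.05)

# Row-index pairs of matching bubbles, and the y-distance for each pair
cur_idx, prev_idx = np.nonzero(matches)
distance_travelled = current[cur_idx, 1] - previous[prev_idx, 1]
print(distance_travelled)  # [10 10 10] for this data
Note that the all-zero padding rows never match, since 0 is not strictly inside (0*0.95, 0*1.05).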

Related

Recursive python function to make two arrays equal?

I'm attempting to write Python code to solve a transportation problem using the Least Cost method. I have a 2D numpy array that I am iterating through to find the minimum, perform calculations with that minimum, and then replace it with a 0, so that the loop stops when values matches constantarray, an array of the same shape containing only 0s. The values array contains distances from points in supply to points in demand. I'm currently using a while loop to do so, but the loop isn't running because values.all() != constantarray.all() evaluates to False.
I also need the process to repeat once the arrays have been edited, to move on to the next lowest number in values.
constantarray = np.zeros((len(supply),len(demand)))  # create array of 0s
sandmoved = np.zeros((len(supply),len(demand)))  # used to store information needed for later
totalcost = 0
while values.all() != constantarray.all():  # iterate until `values` only contains 0s
    m = np.argmin(values,axis = 0)[0]  # find coordinates of minimum value
    n = np.argmin(values,axis = 1)[0]
    if supply[m] > abs(demand[n]):  # all demand numbers are negative
        supply[m] += demand[n]  # subtract demand from supply
        totalcost += abs(demand[n])*values[m,n]
        sandmoved[m,n] = demand[n]  # add amount of 'sand' moved to an empty array
        values[m,0:-1] = 0  # replace entire m row with 0s since demand has been filled
        demand[n] = 0  # replace demand value with 0
    elif supply[m] < abs(demand[n]):
        demand[n] += supply[m]  # combine positive supply with negative demand
        sandmoved[m,n] = supply[m]
        totalcost += supply[m]*values[m,n]
        values[:-1,n] = 0  # replace entire column with 0s since supply has been depleted
        supply[m] = 0
There is an additional if statement for when supply[m] == demand[n], but I feel that isn't necessary. I've already tried nested for loops and many different syntax combinations for a while loop, but I just can't get it to work the way I want. Even when running the code block over and over by itself, m and n stay the same, and the function removes one value from values but doesn't add it to sandmoved. Any ideas are greatly appreciated!!
Well, here is an example from an old implementation of mine:
import numpy as np

values = np.array([[3, 1, 7, 4],
                   [2, 6, 5, 9],
                   [8, 3, 3, 2]])
demand = np.array([250, 350, 400, 200])
supply = np.array([300, 400, 500])

totCost = 0
MAX_VAL = 2 * np.max(values)  # choose MAX_VAL higher than all values

while np.any(values.ravel() < MAX_VAL):
    # find row and col indices of the minimum
    m, n = np.unravel_index(np.argmin(values), values.shape)
    if supply[m] < demand[n]:
        totCost += supply[m] * values[m,n]
        demand[n] -= supply[m]
        values[m,:] = MAX_VAL  # set whole row to MAX_VAL
    else:
        totCost += demand[n] * values[m,n]
        supply[m] -= demand[n]
        values[:,n] = MAX_VAL  # set whole col to MAX_VAL
Solution:
print(totCost)
# 2850
Basically, start by choosing a MAX_VAL higher than all given values, and totCost = 0. Then follow the standard steps of the algorithm: find the row and column indices of the smallest cell, say (m, n); select the m-th supply or the n-th demand, whichever is smaller; add the selected amount multiplied by values[m,n] to totCost; and set all entries of the selected row or column to MAX_VAL so it is skipped in later iterations. Update the larger of the two by subtracting the selected amount, and repeat until all values equal MAX_VAL.
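The question also tracked how much 'sand' moved on each route (the sandmoved array). A sketch of one way to record that, restating the loop above (run it on fresh copies of values, supply and demand, since the loop consumes them) with qty = min(supply[m], demand[n]), which also covers the supply == demand case; my adaptation, not part of the original answer:
moved = np.zeros((len(supply), len(demand)), dtype=int)  # plays the role of sandmoved
totCost = 0
while np.any(values.ravel() < MAX_VAL):
    m, n = np.unravel_index(np.argmin(values), values.shape)
    qty = min(supply[m], demand[n])  # amount shipped on this route
    moved[m, n] = qty
    totCost += qty * values[m, n]
    if supply[m] < demand[n]:
        demand[n] -= qty
        values[m, :] = MAX_VAL  # this supply row is exhausted
    else:
        supply[m] -= qty
        values[:, n] = MAX_VAL  # this demand column is filled

print(totCost)  # 2850, as before
print(moved)    # per-route quantities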

Removing 2D points that are close to each other

I would like to remove coordinates that lie close to each other, or that are just duplicates.
For example,
x = [[9, 169], [5, 164],[340,210],[1020,102],[210,312],[12,150]]
In the above list, the first and second elements lie close to each other. How do I remove the second element while preserving the first?
Following is what I have tried,
def process(input_list, thresh=(10, 10)):
    buffer = input_list.copy()
    n = 0
    prev_cx, prev_cy = 0, 0
    for i in range(len(input_list)):
        elem = input_list[i]
        cx, cy = elem
        if n == 0:
            prev_cx, prev_cy = cx, cy
        else:
            ab_cx, ab_cy = abs(prev_cx - cx), abs(prev_cy - cy)
            if ab_cx <= thresh[0] and ab_cy <= thresh[1]:
                del buffer[i]
        n += 1
    return buffer
x = [[9, 169], [5, 164], [340, 210], [1020, 102], [210, 312], [12, 150]]
processed = process(x)
print(processed)
The problem is that it doesn't recursively check whether there are any other duplicates, since it only checks adjacent coordinates. What is an efficient way of filtering the coordinates?
Sample Input with thresh = (10,10):
x = [[12,24], [5, 12],[100,1020], [20,30], [121,214], [15,12]]
Sample output:
x = [[12,24],[100,1020], [121,214]]
Your question is a bit vague, but I'm taking it to mean:
You want to compare all combinations of points
If a combination contains points closer than a threshold
Then remove the point further from the start of the input list
Try this:
import itertools

def process(input_list, threshold=(10, 10)):
    combos = itertools.combinations(input_list, 2)
    points_to_remove = [point2
                        for point1, point2 in combos
                        if abs(point1[0] - point2[0]) <= threshold[0]
                        and abs(point1[1] - point2[1]) <= threshold[1]]
    points_to_keep = [point for point in input_list if point not in points_to_remove]
    return points_to_keep

coords = [[12, 24], [5, 12], [100, 1020], [20, 30], [121, 214], [15, 12]]
print(process(coords))
>>> [[12, 24], [5, 12], [100, 1020], [121, 214]]
The way this works is to generate all combinations of points using itertools (which leaves the points in the original order), and then create a list of points to remove using the threshold. Then it returns a list of points not in that list of points to remove.
You'll notice that I end up with one more point than you. I simply copied what seemed to be your intended functionality (i.e. both dy AND dx <= thresh for point removal). However, if I change the line with the AND statement to remove a point when dy OR dx <= thresh, I get the same output as your sample.
So I'm going to ask you to recheck your sample output.
BTW, it might be useful for you to confirm if checking for x and y proximity separately is what you really want. So as a bonus, I've included a version using the Euclidean distance as well:
import itertools
import math

def process(input_list, threshold=100):
    combos = itertools.combinations(input_list, 2)
    points_to_remove = [point2 for point1, point2 in combos
                        if math.dist(point1, point2) <= threshold]  # math.dist needs Python 3.8+
    points_to_keep = [point for point in input_list if point not in points_to_remove]
    return points_to_keep

coords = [[12, 24], [5, 12], [100, 1020], [20, 30], [121, 214], [15, 12]]
print(process(coords))
>>> [[12, 24], [100, 1020], [121, 214]]
This version fits your original sample when I used a threshold radius of 100.
I'd split this up a bit differently. It's also tricky, of course, because of the way you have to modify the list.
def remove_close_neighbors(input_list, thresh, position):
    target_item = input_list[position]
    return [item for i, item in enumerate(input_list)
            if i == position or not is_close(target_item, item, thresh)]
This will remove all the "duplicate" (or close) points, other than the item under consideration.
(Then define is_close to check the threshold condition)
And then we can go over our items:
def process(input_list, thresh):
    pos = 0
    while pos < len(input_list):
        input_list = remove_close_neighbors(input_list, thresh, pos)
        pos += 1
    return input_list  # return the pruned list
This is by no means the most efficient way to achieve this. It depends on how scalable this needs to be for you. If we're talking "a bajillion points", you will need to look into clever data structures and algorithms. I think a tree structure could be good then, to group points "by sector", so you don't have to compare each point against every other point all the time; see the sketch below.
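To make the tree idea concrete, here is a hedged sketch using scipy.spatial.cKDTree (my assumption of a suitable structure, not code from the answers above). query_pairs returns every pair of points within a Euclidean radius without the explicit O(n^2) pairwise scan, and dropping the later point of each close pair reproduces the Euclidean-distance output from earlier:
import numpy as np
from scipy.spatial import cKDTree

def process_kdtree(points, radius=100):
    tree = cKDTree(np.asarray(points, dtype=float))
    remove = set()
    for i, j in tree.query_pairs(radius):  # index pairs closer than radius
        remove.add(max(i, j))              # drop the point later in the input
    return [p for k, p in enumerate(points) if k not in remove]

coords = [[12, 24], [5, 12], [100, 1020], [20, 30], [121, 214], [15, 12]]
print(process_kdtree(coords))
>>> [[12, 24], [100, 1020], [121, 214]]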

Finding singulars/sets of local maxima/minima in a 1D-NumPy array (once again)

I would like to have a function that can detect where the local maxima/minima are in an array (even if there is a set of local maxima/minima). Example:
Given the array
test03 = np.array([2,2,10,4,4,4,5,6,7,2,6,5,5,7,7,1,1])
I would like to have an output like:
set of 2 local minima => array[0]:array[1]
set of 3 local minima => array[3]:array[5]
local minima, i = 9
set of 2 local minima => array[11]:array[12]
set of 2 local minima => array[15]:array[16]
As you can see from the example, not only are singular values detected, but also sets of local maxima/minima.
I know in this question there are a lot of good answers and ideas, but none of them do the job described: some of them simply ignore the extreme points of the array, and all of them ignore sets of local minima/maxima.
Before asking the question, I wrote a function by myself that does exactly what I described above (the function local_min(a) is at the end of this question; in the tests I ran, it works properly).
Question: However, I am also sure that is NOT the best way to work with Python. Are there built-in functions, APIs, libraries, etc. that I can use? Any other function suggestion? A one-line instruction? A fully vectorized solution?
def local_min(a):
    candidate_min = 0
    for i in range(len(a)):
        # Controlling the first left element
        if i == 0 and len(a) >= 1:
            # If the first element is a singular local minimum
            if a[0] < a[1]:
                print("local minima, i = 0")
            # If the element is a candidate to be part of a set of local minima
            elif a[0] == a[1]:
                candidate_min = 1
        # Controlling the last right element
        if i == (len(a) - 1) and len(a) >= 1:
            if candidate_min > 0:
                if a[len(a) - 1] == a[len(a) - 2]:
                    print("set of " + str(candidate_min + 1) + " local minima => array[" + str(i - candidate_min) + "]:array[" + str(i) + "]")
            if a[len(a) - 1] < a[len(a) - 2]:
                print("local minima, i = " + str(len(a) - 1))
        # Controlling the other values in the middle of the array
        if i > 0 and i < len(a) - 1 and len(a) > 2:
            # If a singular local minimum
            if a[i] < a[i - 1] and a[i] < a[i + 1]:
                print("local minima, i = " + str(i))
                # print(str(a[i-1]) + " > " + str(a[i]) + " < " + str(a[i+1]))  # debug
            # If a set of candidate local minima was found
            if candidate_min > 0:
                # The candidate set IS a set of local minima
                if a[i] < a[i + 1]:
                    print("set of " + str(candidate_min + 1) + " local minima => array[" + str(i - candidate_min) + "]:array[" + str(i) + "]")
                    candidate_min = 0
                # The candidate set IS NOT a set of local minima
                elif a[i] > a[i + 1]:
                    candidate_min = 0
                # The set of local minima is growing
                elif a[i] == a[i + 1]:
                    candidate_min = candidate_min + 1
                # It should never reach this last else
                else:
                    print("Something strange happened")
                    return -1
            # If there is a set of candidate local minima (first value found)
            if a[i] < a[i - 1] and a[i] == a[i + 1]:
                candidate_min = candidate_min + 1
Note: I tried to enrich the code with comments to explain what I do. I know that the function I propose is not clean and just prints the results, which could instead be stored and returned at the end. It was written to give an example. The algorithm I propose should be O(n).
UPDATE:
Somebody suggested importing argrelextrema from scipy.signal and using it like:
from scipy.signal import argrelextrema

def local_min_scipy(a):
    minima = argrelextrema(a, np.less_equal)[0]
    return minima

def local_max_scipy(a):
    maxima = argrelextrema(a, np.greater_equal)[0]
    return maxima
Something like that is what I am really looking for. However, it doesn't work properly when the sets of local minima/maxima have more than two values. For example:
test03 = np.array([2,2,10,4,4,4,5,6,7,2,6,5,5,7,7,1,1])
print(local_max_scipy(test03))
The output is:
[ 0 2 4 8 10 13 14 16]
Of course at test03[4] I have a minimum, not a maximum. How do I fix this behavior? (I don't know if this is another question or if this is the right place to ask it.)
A fully vectorized solution:
test03 = np.array([2,2,10,4,4,4,5,6,7,2,6,5,5,7,7,1,1])  # size 17

extended = np.empty(len(test03)+2)  # room to manage the edges, size 19
extended[1:-1] = test03
extended[0] = extended[-1] = np.inf

flag_left = extended[:-1] <= extended[1:]    # <= successor, size 18
flag_right = extended[1:] <= extended[:-1]   # <= predecessor, size 18
flagmini = flag_left[1:] & flag_right[:-1]   # local minimum, size 17

mini = np.where(flagmini)[0]              # indices of the minima
spl = np.where(np.diff(mini) > 1)[0] + 1  # places to split
result = np.split(mini, spl)
result:
[0, 1] [3, 4, 5] [9] [11, 12] [15, 16]
EDIT
Unfortunately, this also detects maxima as soon as they are at least 3 items wide, since their interiors are seen as flat local minima. A numpy patch would be ugly this way.
To solve this problem I propose 2 other solutions, first with numpy, then with numba.
With numpy, using np.diff:
import numpy as np

test03 = np.array([12,13,12,4,4,4,5,6,7,2,6,5,5,7,7,17,17])

extended = np.full(len(test03)+2, np.inf)
extended[1:-1] = test03
slope = np.sign(np.diff(extended))  # 1 if ascending, 0 if flat, -1 if descending
not_flat, = slope.nonzero()         # indices where the data is not flat
local_min_inds, = np.where(np.diff(slope[not_flat]) == 2)
# local_min_inds contains indices in not_flat of the beginnings of local minima.
# Indices of the ends of local minima are shifted by +1:
start = not_flat[local_min_inds]
stop = not_flat[local_min_inds+1] - 1
print(*zip(start, stop))
# (0, 1) (3, 5) (9, 9) (11, 12) (15, 16)
A direct solution compatible with numba acceleration:
import numba
import numpy as np

@numba.njit
def localmins(a):
    begin = np.empty(a.size//2+1, np.int32)
    end = np.empty(a.size//2+1, np.int32)
    i = k = 0
    begin[k] = 0
    search_end = True
    while i < a.size-1:
        if a[i] > a[i+1]:
            begin[k] = i+1
            search_end = True
        if search_end and a[i] < a[i+1]:
            end[k] = i
            k += 1
            search_end = False
        i += 1
    if search_end and i > 0:  # final plateau, if it exists
        end[k] = i
        k += 1
    return begin[:k], end[:k]

print(*zip(*localmins(test03)))
# (0, 1) (3, 5) (9, 9) (11, 12) (15, 16)
I think another function from scipy.signal would be interesting.
from scipy.signal import find_peaks

test03 = np.array([2,2,10,4,4,4,5,6,7,2,6,5,5,7,7,1,1])
find_peaks(test03)
Out[]: (array([ 2,  8, 10, 13], dtype=int64), {})
find_peaks has lots of options and might be quite useful, especially for noisy signals.
Update
The function is really powerful and versatile. You can set several parameters for minimal peak width, height, distance from each other, and so on. For example:
test04 = np.array([1,1,5,5,5,5,5,5,5,5,1,1,1,1,1,5,5,5,1,5,1,5,1])
find_peaks(test04, width=1)
Out[]:
(array([ 5, 16, 19, 21], dtype=int64),
 {'prominences': array([4., 4., 4., 4.]),
  'left_bases': array([ 1, 14, 18, 20], dtype=int64),
  'right_bases': array([10, 18, 20, 22], dtype=int64),
  'widths': array([8., 3., 1., 1.]),
  'width_heights': array([3., 3., 3., 3.]),
  'left_ips': array([ 1.5, 14.5, 18.5, 20.5]),
  'right_ips': array([ 9.5, 17.5, 19.5, 21.5])})
See documentation for more examples.
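find_peaks reports maxima only, while the question asks for minima; a common trick (my addition, not part of this answer) is to negate the signal. With the plateau_size argument (scipy >= 1.2) the flat runs are reported too. Note that, like argrelextrema, find_peaks never reports the first or last sample, so the edge plateaus of test03 are missed:
import numpy as np
from scipy.signal import find_peaks

test03 = np.array([2,2,10,4,4,4,5,6,7,2,6,5,5,7,7,1,1])
minima, props = find_peaks(-test03, plateau_size=1)  # peaks of -x are minima of x
print(props['left_edges'], props['right_edges'])     # extents of the flat minima runs
# [ 3  9 11] [ 5  9 12]  (the edge sets [0, 1] and [15, 16] are not detected)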
There can be multiple ways to solve this. One approach is listed here.
You can create a custom function, and use the maximum to handle edge cases while finding minima.
import numpy as np

a = np.array([2,2,10,4,4,4,5,6,7,2,6,5,5,7,7,1,1])

def local_min(a):
    temp_list = list(a)
    maxval = max(a)                    # use the max while finding minima
    temp_list = temp_list + [maxval]   # handles the last-value edge case
    prev = maxval                      # prev stores the last value seen
    loc = 0                            # stores the starting index of a minima run
    count = 0                          # counts repeated values
    matches = []
    for i in range(0, len(temp_list)):  # check all values, including the padded one
        if prev == temp_list[i]:
            if count > 0:               # only increment for minima candidates
                count += 1
        elif prev > temp_list[i]:
            count = 1
            loc = i
        else:  # prev < temp_list[i]
            if count > 0:
                matches.append((loc, count))
            count = 0
            loc = i
        prev = temp_list[i]
    return matches

result = local_min(a)
for match in result:
    print("{} minima found starting at location {} and ending at location {}".format(
        match[1],
        match[0],
        match[0] + match[1] - 1))
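For the sample array, this prints (my trace of the code above):
2 minima found starting at location 0 and ending at location 1
3 minima found starting at location 3 and ending at location 5
1 minima found starting at location 9 and ending at location 9
2 minima found starting at location 11 and ending at location 12
2 minima found starting at location 15 and ending at location 16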
Let me know if this does the trick for you. The idea is simple: iterate through the list once and store minima as you see them. Handle the edges by padding with maximum values on either end (or by padding only the end of the list and using the max value for the initial comparison).
Here's an answer based on restriding the array into an iterable of windows:
import numpy as np
from numpy.lib.stride_tricks import as_strided

def windowstride(a, window):
    return as_strided(a, shape=(a.size - window + 1, window), strides=2*a.strides)

def local_min(a, maxwindow=None, doends=True):
    if doends:
        a = np.pad(a.astype(float), 1, 'constant', constant_values=np.inf)
    if maxwindow is None:
        maxwindow = a.size - 1
    mins = []
    for i in range(3, maxwindow + 1):
        for j, w in enumerate(windowstride(a, i)):
            if (w[0] > w[1]) and (w[-2] < w[-1]):
                if (w[1:-1] == w[1]).all():
                    mins.append((j, j + i - 2))
    mins.sort()
    return mins
Testing it out:
test03=np.array([2,2,10,4,4,4,5,6,7,2,6,5,5,7,7,1,1])
local_min(test03)
Output:
[(0, 2), (3, 6), (9, 10), (11, 13), (15, 17)]
Not the most efficient algorithm, but at least it's short. I'm pretty sure it's O(n^2), since there are roughly (n^2 + n)/2 windows to iterate over. This is only partially vectorized, so there may be a way to improve it.
Edit
To clarify, the output is the indices of the slices that contain the runs of local minimum values. The fact that they go one past the end of the run is intentional (someone just tried to "fix" that in an edit). You can use the output to iterate over the slices of minimum values in your input array like this:
for s in local_min(test03):
    print(test03[slice(*s)])
Output:
[2 2]
[4 4 4]
[2]
[5 5]
[1 1]
A pure numpy solution (revised answer):
import numpy as np

y = np.array([2,2,10,4,4,4,5,6,7,2,6,5,5,7,7,1,1])
x = np.r_[y[0]+1, y, y[-1]+1]  # pad the edges, so edge minima are possible
ups, = np.where(x[:-1] < x[1:])
downs, = np.where(x[:-1] > x[1:])
minend = ups[np.unique(np.searchsorted(ups, downs))]
minbeg = downs[::-1][np.unique(np.searchsorted(-downs[::-1], -ups[::-1]))][::-1]
minlen = minend - minbeg
for line in zip(minlen, minbeg, minend-1):
    print("set of %d minima %d - %d" % line)
This gives
set of 2 minima 0 - 1
set of 3 minima 3 - 5
set of 1 minima 9 - 9
set of 2 minima 11 - 12
set of 2 minima 15 - 16
np.searchsorted(ups, downs) finds the first up after every down. This is the "true" end of a minimum.
For the start of the minima, we do the same, but in reverse order.
It works for the example, though it is not fully tested. But I would say it is a good starting point.
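To illustrate the pairing, here is my worked trace of the intermediate arrays for the example above:
# x      = [3, 2, 2, 10, 4, 4, 4, 5, 6, 7, 2, 6, 5, 5, 7, 7, 1, 1, 2]
# ups    = [2, 6, 7, 8, 10, 13, 17]   rises: x[i] < x[i+1]
# downs  = [0, 3, 9, 11, 15]          falls: x[i] > x[i+1], where a minimum must begin
# For each down, the first up at or after it closes that minimum:
# minbeg = [0, 3, 9, 11, 15], minend = [2, 6, 10, 13, 17]  ->  minlen = [2, 3, 1, 2, 2]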
You can use argrelmax, as long as there are no multiple consecutive equal elements, so first you need to run-length encode the array, then use argrelmax (or argrelmin):
import numpy as np
from scipy.signal import argrelmax
from itertools import groupby

def local_max_scipy(a):
    start = 0
    result = [[a[0] - 1, 0, 0]]  # guarantees the left edge is included
    for k, g in groupby(a):
        length = sum(1 for _ in g)
        result.append([k, start, length])
        start += length
    result.append([a[-1] - 1, 0, 0])  # guarantees the right edge is included
    arr = np.array(result)
    maxima, = argrelmax(arr[:, 0])
    return arr[maxima]
test03 = np.array([2, 2, 10, 4, 4, 4, 5, 6, 7, 2, 6, 5, 5, 7, 7, 1, 1])
output = local_max_scipy(test03)
for val, start, length in output:
    print(f'set of {length} maxima start:{start} end:{start + length}')
Output
set of 1 maxima start:2 end:3
set of 1 maxima start:8 end:9
set of 1 maxima start:10 end:11
set of 2 maxima start:13 end:15
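Since the question actually asks for minima, the same run-length-encoding trick works with argrelmin and sentinels padded above the edge values. My adaptation, not part of the original answer:
import numpy as np
from scipy.signal import argrelmin
from itertools import groupby

def local_min_scipy(a):
    start = 0
    result = [[a[0] + 1, 0, 0]]   # sentinel above the left edge
    for k, g in groupby(a):
        length = sum(1 for _ in g)
        result.append([k, start, length])
        start += length
    result.append([a[-1] + 1, 0, 0])  # sentinel above the right edge
    arr = np.array(result)
    minima, = argrelmin(arr[:, 0])
    return arr[minima]

for val, start, length in local_min_scipy(test03):
    print(f'set of {length} minima start:{start} end:{start + length}')
This recovers all five sets from the question, including the ones at the edges.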

Vectorize a numpy.argmin search with a variable range per matrix row

Is there a way to get rid of the loop in the code below and replace it with vectorized operation?
Given a data matrix, for each row I want to find the index of the minimal value that fits within ranges defined (per row) in a separate array.
Here's an example:
import numpy as np
np.random.seed(10)

# Values of interest, for this example a random 6 x 100 matrix
data = np.random.random((6,100))

# For each row, define an inclusive min/max range
ranges = np.array([[0.3, 0.4],
                   [0.35, 0.5],
                   [0.45, 0.6],
                   [0.52, 0.65],
                   [0.6, 0.8],
                   [0.75, 0.92]])

# For each row, find the index of the minimum value that fits inside the given range
result = np.zeros(6).astype(np.int)
for i in xrange(6):
    ind = np.where((ranges[i][0] <= data[i]) & (data[i] <= ranges[i][1]))[0]
    result[i] = ind[np.argmin(data[i,ind])]

print result
# Result: [35 8 22 8 34 78]
print data[np.arange(6),result]
# Result: [ 0.30070006 0.35065639 0.45784951 0.52885388 0.61393513 0.75449247]
Approach #1 : Using broadcasting and np.minimum.reduceat -
mask = (ranges[:,None,0] <= data) & (data <= ranges[:,None,1])
r,c = np.nonzero(mask)
cut_idx = np.unique(r, return_index=1)[1]
out = np.minimum.reduceat(data[mask], cut_idx)
Improvement to avoid np.nonzero and compute cut_idx directly from mask :
cut_idx = np.concatenate(( [0], np.count_nonzero(mask[:-1],1).cumsum() ))
Approach #2 : Using broadcasting and filling invalid places with NaNs and then using np.nanargmin -
mask = (ranges[:,None,0] <= data) & (data <= ranges[:,None,1])
result = np.nanargmin(np.where(mask, data, np.nan), axis=1)
out = data[np.arange(6),result]
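A quick sanity check (my addition, not part of the answer) that Approach #2 reproduces the question's loop on the sample data, using the indices and values printed in the question:
# Approach #2 picks identical indices to the original loop:
assert np.array_equal(result, [35, 8, 22, 8, 34, 78])
# and the selected values match the loop's printed output:
assert np.allclose(data[np.arange(6), result],
                   [0.30070006, 0.35065639, 0.45784951,
                    0.52885388, 0.61393513, 0.75449247])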
Approach #3 : If you are not iterating enough (just like you have a loop of 6 iterations in the sample), you might want to stick to a loop for memory efficiency, but make use of more efficient masking with a boolean array instead -
out = np.zeros(6)
for i in xrange(6):
    mask_i = (ranges[i,0] <= data[i]) & (data[i] <= ranges[i,1])
    out[i] = np.min(data[i,mask_i])
Approach #4 : There is one more loopy solution possible here. The idea would be to sort each row of data. Then, use the two range limits for each row to decide on the start and stop indices with help from np.searchsorted. Further, we would use those indices to slice and then get the minimum values. Benefit with slicing that way is, we would be working with views and as such would be very efficient, both on memory and performance.
The implementation would look something like this -
out = np.zeros(6)
sdata = np.sort(data, axis=1)
for i in xrange(6):
    start = np.searchsorted(sdata[i], ranges[i,0])
    stop = np.searchsorted(sdata[i], ranges[i,1], 'right')
    out[i] = np.min(sdata[i,start:stop])
Furthermore, we could get those start, stop indices in a vectorized manner following an implementation of vectorized searchsorted.
Based on a suggestion by @Daniel F, for the case when the ranges are within the limits of the given data, we could simply use the start indices -
out[i] = sdata[i, start]
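A hedged sketch of that fully vectorized variant (my assumption, not code from the answer), reusing data and ranges from the question: np.random.random values lie in [0, 1), so each sorted row can be shifted into its own disjoint interval, and a single flat searchsorted call then handles all rows at once, combined with the start-index shortcut above.
# Assumes values in [0, 1) and at least one value inside each range.
sdata = np.sort(data, axis=1)
shift = np.arange(6)                     # per-row offset
flat = (sdata + shift[:, None]).ravel()  # globally sorted; row i occupies [i, i+1)
starts = np.searchsorted(flat, ranges[:, 0] + shift)
out = flat[starts] - shift               # smallest in-range value per row
# should equal data[np.arange(6), result] from the question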
Assuming at least one value in range, you don't even have to bother with the upper limit:
result = np.empty(6, dtype=int)  # the results are indices
for i in xrange(6):
    lt = (ranges[i,0] >= data[i]).sum()
    result[i] = np.argpartition(data[i], lt)[lt]
Actually, you could even vectorize the whole thing using argpartition
lt = (ranges[:,None,0] >= data).sum(1)
result = np.argpartition(data, lt)[np.arange(data.shape[0]), lt]
Of course, this is only efficient if data.shape[0] << data.shape[1], as otherwise you're basically sorting.

Averaging out sections of a multiple row array in Python

I've got a 2-row array called C like this:
from numpy import *

A = array([1,2,3,4,5])      # arrays (not plain lists), so the boolean indexing below works
B = array([50,40,30,20,10])
C = vstack((A,B))
I want to take all the columns in C where the value in the first row falls between i and i+2, and average them. I can do this with just A no problem:
i = 0
A_avg = []
while i < 6:
    selection = A[logical_and(A >= i, A < i+2)]
    A_avg.append(mean(selection))
    i += 2
then A_avg is:
[1.0,2.5,4.5]
I want to carry out the same process with my two-row array C, but I want to take the average of each row separately, while doing it in a way that's dictated by the first row. For example, for C, I want to end up with a 2 x 3 array that looks like:
[[1.0,2.5,4.5],
[50,35,15]]
Where the first row is A averaged in blocks between i and i+2 as before, and the second row is B averaged in the same blocks as A, regardless of the values it has. So the first entry is unchanged, the next two get averaged together, and the next two get averaged together, for each row separately. Anyone know of a clever way to do this? Many thanks!
I hope this is not too clever. TIL that boolean indexing does not broadcast, so I had to do the broadcasting manually. Let me know if anything is unclear.
import numpy as np

A = [1,2,3,4,5]
B = [50,40,30,20,10]
C = np.vstack((A,B))

i = np.arange(0, 6, 2)[:, None]
selections = np.logical_and(A >= i, A < i+2)[None]
D, selections = np.broadcast_arrays(C[:, None], selections)
D = D.astype(float)      # float so we can use np.nan; also copies, preventing surprises on repeated runs
D[~selections] = np.nan  # exclude these elements from the mean
D = np.nanmean(D, axis=-1)
Then,
>>> D
array([[ 1. ,  2.5,  4.5],
       [50. , 35. , 15. ]])
Another way, using np.histogram to bin your data. This may be faster for large arrays, but it is only useful for few rows, since a histogram must be computed with different weights for each row:
bins = np.arange(0, 7, 2) # include the end
n = np.histogram(A, bins)[0] # number of columns in each bin
a_mean = np.histogram(A, bins, weights=A)[0]/n
b_mean = np.histogram(A, bins, weights=B)[0]/n
D = np.vstack([a_mean, b_mean])
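For the sample data this reproduces the desired 2 x 3 result:
print(D)
# [[ 1.   2.5  4.5]
#  [50.  35.  15. ]]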
