Find the closest value above and below in an array? - Python

I have an array whose rows consist of [element number, x-coordinate, y-coordinate, z-coordinate, radius (polar coordinates), θ (polar coordinates)].
In this array I need to find the two values closest to a specified number, the one above and the one below.
These need to be found in the last column of the array, which holds the θ value.
The values range from 0 to 1.5707 radians (0 to 90 degrees), and in our case we want to be able to choose how many specified numbers we use:
import math
import numpy as np

number = 9
anglestep = math.pi/2 / number
anglerange = np.arange(0, math.pi/2 + anglestep, anglestep)  # math.pi/2 + anglestep so that we get math.pi/2 in the array
For example, I need to find the two values above and below the specified value 0.17:
[...['4549', '4.2158604', '49.4799309', '0.0833661', 49.65920902290997, 0.0849981532744405],
['4535', '4.2867651', '49.4913025', '0.0813997', 49.67660795755971, 0.08640089283783374],
['4537', '5.6042995', '49.4534569', '0.0811241', 49.7699967073121, 0.11284330708918186],
['4538', '6.2840257', '49.4676971', '0.0809942', 49.86523874780516, 0.12635612935285648],
['4539', '6.9654546', '49.4909363', '0.0814121', 49.97869879894153, 0.13982362821749783],
['4540', '7.6476088', '49.5210190', '0.0813955', 50.10805567128103, 0.1532211602749019],
['4541', '8.3298655', '49.5605049', '0.0812513', 50.25564948531672, 0.16651831290560243],
['4542', '9.0141211', '49.6065178', '0.0811457', 50.41885547537927, 0.17975113416156624],
['4529', '9.3985014', '49.6320610', '0.0812080', 50.51409018950577, 0.18714756393388338],
['4531', '10.3884563', '49.7157669', '0.0812043', 50.78954127329902, 0.2059930152826599]..]
So what I want as output in this case is the two values (0.16651831290560243, 0.17975113416156624).

In [30]: np.max(arr[arr < .17])
Out[30]: 0.16651831290560243
In [31]: np.min(arr[arr > .17])
Out[31]: 0.17975113416156624
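For reference, here is a minimal self-contained sketch of that approach applied to the angle column of the sample data (the variable name angles is my addition):
import numpy as np

# Last column (θ) of the sample rows above
angles = np.array([0.0849981532744405, 0.08640089283783374, 0.11284330708918186,
                   0.12635612935285648, 0.13982362821749783, 0.1532211602749019,
                   0.16651831290560243, 0.17975113416156624, 0.18714756393388338,
                   0.2059930152826599])

below = np.max(angles[angles < 0.17])  # closest value below 0.17
above = np.min(angles[angles > 0.17])  # closest value above 0.17
print(below, above)  # 0.16651831290560243 0.17975113416156624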

@NPE's answer is correct for a 1D array, but you must first access the angle column of your array. How to do that depends on the dtype (data type) of your array: your array seems to mix strings and floats, which a single plain numpy array does not allow. There are two ways it might be solved, one by making everything a float, the other by using a structured dtype:
All floats
arr = np.array([
['4549', '4.2158604', '49.4799309', '0.0833661', 49.65920902290997, 0.0849981532744405 ],
['4535', '4.2867651', '49.4913025', '0.0813997', 49.67660795755971, 0.08640089283783374],
['4537', '5.6042995', '49.4534569', '0.0811241', 49.7699967073121 , 0.11284330708918186],
['4538', '6.2840257', '49.4676971', '0.0809942', 49.86523874780516, 0.12635612935285648],
['4539', '6.9654546', '49.4909363', '0.0814121', 49.97869879894153, 0.13982362821749783],
['4540', '7.6476088', '49.5210190', '0.0813955', 50.10805567128103, 0.1532211602749019 ],
['4541', '8.3298655', '49.5605049', '0.0812513', 50.25564948531672, 0.16651831290560243],
['4542', '9.0141211', '49.6065178', '0.0811457', 50.41885547537927, 0.17975113416156624],
['4529', '9.3985014', '49.6320610', '0.0812080', 50.51409018950577, 0.18714756393388338],
['4531', '10.3884563', '49.7157669', '0.0812043', 50.78954127329902, 0.2059930152826599 ]], dtype=float)
Then, to apply @Jaime's method, use
i = np.searchsorted(arr[:, -1], 0.17)
below = arr[i-1]
above = arr[i]
below
# array([ 4.54100000e+03, 8.32986550e+00, 4.95605049e+01, 8.12513000e-02, 5.02556495e+01, 1.66518313e-01])
above
# array([ 4.54200000e+03, 9.01412110e+00, 4.96065178e+01, 8.11457000e-02, 5.04188555e+01, 1.79751134e-01])
If you want just the angles, then just slice by column as well:
below_ang = arr[i-1, -1]
above_ang = arr[i, -1]
below_ang, above_ang
#(0.166518313, 0.179751134)
Note that this assumes that arr is sorted by angle.
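If your array is not already sorted, a one-line sketch to sort it by the angle column first (assuming the angle is in the last column, as above):
arr = arr[arr[:, -1].argsort()]  # sort rows by the angle column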
Structured array:
arr = np.array([('4549', '4.2158604', '49.4799309', '0.0833661', 49.65920902290997, 0.0849981532744405 ),
('4535', '4.2867651', '49.4913025', '0.0813997', 49.67660795755971, 0.08640089283783374),
('4537', '5.6042995', '49.4534569', '0.0811241', 49.7699967073121 , 0.11284330708918186),
('4538', '6.2840257', '49.4676971', '0.0809942', 49.86523874780516, 0.12635612935285648),
('4539', '6.9654546', '49.4909363', '0.0814121', 49.97869879894153, 0.13982362821749783),
('4540', '7.6476088', '49.5210190', '0.0813955', 50.10805567128103, 0.1532211602749019 ),
('4541', '8.3298655', '49.5605049', '0.0812513', 50.25564948531672, 0.16651831290560243),
('4542', '9.0141211', '49.6065178', '0.0811457', 50.41885547537927, 0.17975113416156624),
('4529', '9.3985014', '49.6320610', '0.0812080', 50.51409018950577, 0.18714756393388338),
('4531', '10.3884563', '49.7157669', '0.0812043', 50.78954127329902, 0.2059930152826599)],
dtype=[('id', 'S4'), ('x', 'S10'), ('y', 'S10'), ('z', 'S9'), ('rad', '<f8'), ('ang', '<f8')])
i = np.searchsorted(arr['ang'], 0.17)
below = arr[i-1]
above = arr[i]
below
# ('4541', '8.3298655', '49.5605049', '0.0812513', 50.25564948531672, 0.16651831290560243)
above
# ('4542', '9.0141211', '49.6065178', '0.0811457', 50.41885547537927, 0.17975113416156624)
Doing it for several values
First, an easier way to set up your range is with linspace, which automatically includes both the start and the end, and is specified by the number of points rather than the step. Instead of:
number=9
anglestep = math.pi/2 / number
anglerange = np.arange(0,math.pi/2+anglestep,anglestep) #math.pi/2+anglestep so that we get math.pi/2 in the array
Use
number = 9
anglerange = np.linspace(0, math.pi/2, number) # start, end, number
Now, searchsorted will actually find several points for you just as easily:
locs = np.searchsorted(arr['ang'], anglerange)
belows = arr['ang'][locs-1]
aboves = arr['ang'][locs]
For example, I'll set anglerange = [0.1, 0.17, 0.2] since the full range isn't in your sample data:
belows
# array([ 0.08640089, 0.16651831, 0.18714756])
aboves
# array([ 0.11284331, 0.17975113, 0.20599302])
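One caveat (my addition, not part of the original answer): if a query angle is at or below the first stored angle, searchsorted returns 0 and locs - 1 wraps around to the last row. A minimal guard, assuming you want to clamp to the ends of the data:
locs = np.clip(np.searchsorted(arr['ang'], anglerange), 1, len(arr) - 1)
belows = arr['ang'][locs - 1]
aboves = arr['ang'][locs]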

Related

Create normally distributed columns, with different mean values

I have the following numpy matrix:
import numpy as np
import scipy.stats

R = np.matrix(np.ones([3,3]))
# Update R matrix based on sales statistics
for i in range(0, len(R)):
    for j in range(0, len(R)):
        R[j,i] = scipy.stats.norm(2, 1).pdf(i) * 100
print(R)
[[ 5.39909665 24.19707245 39.89422804]
[ 5.39909665 24.19707245 39.89422804]
[ 5.39909665 24.19707245 39.89422804]]
I would like to transform each column by evaluating, at each row index (0, 1, 2), the density of a normal distribution whose mean is that column's current value: specifically, 5.39909665 for the first column, 24.19707245 for the second, and 39.8942280 for the third, with standard deviation equal to 1.
Ultimately, creating a matrix as:
[[norm(5.39, 1).pdf(0), norm(24.197, 1).pdf(0), ...],
 [norm(5.39, 1).pdf(1), norm(24.197, 1).pdf(1), ...],
 [norm(5.39, 1).pdf(2), norm(24.197, 1).pdf(2), ...]]
How can I create the final matrix?
The pdf method broadcasts much like any numpy function, in the sense that you can pass in arrays of matching shapes, in combination with scalars. You can create R with something like:
ix = np.repeat(np.arange(3),3).reshape((3,3)) #row index, or ix.T for column index
R = scipy.stats.norm(2,1).pdf(ix.T)*100
>>> array([[ 5.39909665, 24.19707245, 39.89422804],
           [ 5.39909665, 24.19707245, 39.89422804],
           [ 5.39909665, 24.19707245, 39.89422804]])
Following the same logic, if you want the [i,j] entry to be scipy.stats.norm(scipy.stats.norm(2,1).pdf(j) * 100, 1).pdf(i) (as in the matrix you gave as the desired result), use:
scipy.stats.norm(scipy.stats.norm(2,1).pdf(ix.T) * 100, 1).pdf(ix)
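As a side note, a sketch of an alternative that avoids building the index grid by hand: np.indices returns both the row and column index grids in one call (this is my variant, not part of the answer above):
import numpy as np
import scipy.stats

i, j = np.indices((3, 3))                    # row and column index grids
means = scipy.stats.norm(2, 1).pdf(j) * 100  # one mean per column, as in R above
final = scipy.stats.norm(means, 1).pdf(i)    # evaluate each column's pdf at the row index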

Value in an array between two numbers in python

Coming up with a title that actually explains what I want is harder than I thought, so here goes.
I have an array filled with zeros that gains values every time a condition is met, so after one time-step iteration I get something like this (minus the headers):
current_array =
bubble_size y_coord
14040 42
3943 71
6345 11
0 0
0 0
....
After this time step is complete, current_array gets set as previous_array and is wiped with zeros, because the number of entries is not guaranteed each time.
NOW the real question: I want to check all rows in the first column of previous_array, see whether the current bubble size is within say 5% either side, and if so take the current y position away from the value associated with the matching bubble size in previous_array's second column.
Currently I have something like:
if bubble_size in current_array[:, 0]:
    do_whatever
but I don't know how to pull out the associated y_coord without using a loop. I am fine with using one (there are about 100 rows to the array and at least 1000 time steps, so I want to make it as efficient as possible) but would like to avoid it.
I have included my thoughts on the for loop (note that current_array and previous_array are actually current_frame and previous_frame):
for y in range(0, array_size):
    if previous_frame[y, 0] * 0.95 < bubble_size < previous_frame[y, 0] * 1.05:
        distance_travelled = current_y_coord - previous_frame[y, 1]  # column 1 holds the y coord
Any help is greatly appreciated :)
I may not have fully understood your issue, but if you first want to check whether each bubble size is within 5% of the corresponding row's previous value, you can use the following:
import numpy as np

def apply(p, c):  # For each element, check the bubble-size growth
    if p * 0.95 < c < p * 1.05:
        return 1
    else:
        return 0

def dist(p, c):  # Calculate the distance
    return c - p

def update(prev, cur):
    assert isinstance(
        cur, np.ndarray), 'Current array is not a valid numpy array'
    assert isinstance(
        prev, np.ndarray), 'Previous array is not a valid numpy array'
    assert prev.shape == cur.shape, 'Arrays size mismatch'
    applyvec = np.vectorize(apply)
    toapply = applyvec(prev[:, 0], cur[:, 0])
    print(toapply)
    distvec = np.vectorize(dist)
    distance = distvec(prev[:, 1], cur[:, 1])
    print(distance)

current = np.array([[14040, 42],
                    [3943, 71],
                    [6345, 11],
                    [0, 0],
                    [0, 0]])
previous = np.array([[14039, 32],
                     [3942, 61],
                     [6344, 1],
                     [0, 0],
                     [0, 0]])
update(previous, current)
PS: Could you tell us what final array you are looking for, based on my example?
As I understand it (correct me if I'm wrong):
- You have a current bubble size (integer) and a current y value (integer)
- You have a 2D array (prev_array) that contains bubble sizes and y coords
- You want to check whether your current bubble size is within 5% (either way) of each stored bubble size in prev_array
- If they are within range, subtract your current y value from the stored y coord
- This will result in a new array, containing only bubble sizes that are within range, and the newly subtracted y value
- You want to do this without an explicit loop
You can do that using boolean indexing in numpy...
Set up the previous array:
prev_array = np.array([[14040, 42], [3943, 71], [6345, 11], [3945,0], [0,0]])
prev_array
array([[14040,    42],
       [ 3943,    71],
       [ 6345,    11],
       [ 3945,     0],
       [    0,     0]])
You have your stored bubble size you want to use for comparison, and a current y coord value:
bubble_size = 3750
cur_y = 10
Next we can create a boolean mask where we only select rows of prev_array that meets the 5% criteria:
ind = (bubble_size > prev_array[:,0]*.95) & (bubble_size < prev_array[:,0]*1.05)
# ind is a boolean array that looks like this: [False, True, False, True, False]
Then we use ind to index prev_array, and calculate the new (subtracted) y coords:
new_array = prev_array[ind]
new_array[:,1] = cur_y - new_array[:,1]
Giving your final output array:
array([[3943, -61],
       [3945,  10]])
As it's not clear what you want your output to actually look like, instead of creating a new array you can also just update prev_array with the new y values:
ind = (bubble_size > prev_array[:,0]*.95) & (bubble_size < prev_array[:,0]*1.05)
prev_array[ind,1] = cur_y - prev_array[ind,1]
Which gives:
array([[14040,    42],
       [ 3943,   -61],
       [ 6345,    11],
       [ 3945,    10],
       [    0,     0]])
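If this runs once per time step, here is a hedged sketch of how it might all be wrapped up (the function name and the tolerance argument are my additions):
import numpy as np

def distances_for_bubble(prev_array, bubble_size, cur_y, tol=0.05):
    # Rows whose stored bubble size is within +/- tol of the current one
    sizes = prev_array[:, 0]
    ind = (bubble_size > sizes * (1 - tol)) & (bubble_size < sizes * (1 + tol))
    out = prev_array[ind]          # boolean indexing returns a copy, so prev_array is untouched
    out[:, 1] = cur_y - out[:, 1]  # distance travelled since the previous frame
    return out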

Vectorize a numpy.argmin search with a variable range per matrix row

Is there a way to get rid of the loop in the code below and replace it with vectorized operation?
Given a data matrix, for each row I want to find the index of the minimal value that fits within ranges defined (per row) in a separate array.
Here's an example:
import numpy as np
np.random.seed(10)
# Values of interest, for this example a random 6 x 100 matrix
data = np.random.random((6,100))
# For each row, define an inclusive min/max range
ranges = np.array([[0.3, 0.4],
                   [0.35, 0.5],
                   [0.45, 0.6],
                   [0.52, 0.65],
                   [0.6, 0.8],
                   [0.75, 0.92]])
# For each row, find the index of the minimum value that fits inside the given range
result = np.zeros(6).astype(np.int)
for i in xrange(6):
    ind = np.where((ranges[i][0] <= data[i]) & (data[i] <= ranges[i][1]))[0]
    result[i] = ind[np.argmin(data[i,ind])]
print result
# Result: [35 8 22 8 34 78]
print data[np.arange(6),result]
# Result: [ 0.30070006 0.35065639 0.45784951 0.52885388 0.61393513 0.75449247]
Approach #1 : Using broadcasting and np.minimum.reduceat -
mask = (ranges[:,None,0] <= data) & (data <= ranges[:,None,1])
r,c = np.nonzero(mask)
cut_idx = np.unique(r, return_index=1)[1]
out = np.minimum.reduceat(data[mask], cut_idx)
Improvement to avoid np.nonzero and compute cut_idx directly from mask :
cut_idx = np.concatenate(( [0], np.count_nonzero(mask[:-1],1).cumsum() ))
Approach #2 : Using broadcasting and filling invalid places with NaNs and then using np.nanargmin -
mask = (ranges[:,None,0] <= data) & (data <= ranges[:,None,1])
result = np.nanargmin(np.where(mask, data, np.nan), axis=1)
out = data[np.arange(6),result]
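One caveat of my own: np.nanargmin raises a ValueError for any row whose values all fall outside its range (the whole row becomes NaN). A minimal guard, assuming you want to flag such rows instead:
valid = mask.any(axis=1)                      # rows with at least one in-range value
masked = np.where(mask, data, np.nan)
result = np.full(len(data), -1)               # -1 marks rows with no valid entry
result[valid] = np.nanargmin(masked[valid], axis=1)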
Approach #3 : If you are not iterating enough (just like you have a loop of 6 iterations in the sample), you might want to stick to a loop for memory efficiency, but make use of more efficient masking with a boolean array instead -
out = np.zeros(6)
for i in xrange(6):
    mask_i = (ranges[i,0] <= data[i]) & (data[i] <= ranges[i,1])
    out[i] = np.min(data[i,mask_i])
Approach #4 : There is one more loopy solution possible here. The idea would be to sort each row of data. Then, use the two range limits for each row to decide on the start and stop indices with help from np.searchsorted. Further, we would use those indices to slice and then get the minimum values. The benefit of slicing that way is that we would be working with views, and as such it would be very efficient, both in memory and performance.
The implementation would look something like this -
out = np.zeros(6)
sdata = np.sort(data, axis=1)
for i in xrange(6):
    start = np.searchsorted(sdata[i], ranges[i,0])
    stop = np.searchsorted(sdata[i], ranges[i,1], 'right')
    out[i] = np.min(sdata[i,start:stop])
Furthermore, we could get those start, stop indices in a vectorized manner following an implementation of vectorized searchsorted.
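For illustration, a sketch of one such offset trick (my adaptation, assuming non-negative data, as with the random values here): shifting each sorted row into its own disjoint interval lets a single flat searchsorted handle all rows at once -
n, m = sdata.shape
step = sdata.max() + 1                                     # large enough to keep shifted rows disjoint
shifted = (sdata + np.arange(n)[:, None] * step).ravel()   # globally sorted 1D array
starts = np.searchsorted(shifted, ranges[:, 0] + np.arange(n) * step) - np.arange(n) * m
stops = np.searchsorted(shifted, ranges[:, 1] + np.arange(n) * step, 'right') - np.arange(n) * m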
Based on a suggestion by @Daniel F, for the case when the ranges are within the limits of the given data, we could simply use the start indices -
out[i] = sdata[i, start]
Assuming at least one value in range, you don't even have to bother with the upper limit:
result = np.empty(6)
for i in xrange(6):
    lt = (ranges[i,0] >= data[i]).sum()
    result[i] = np.argpartition(data[i], lt)[lt]
Actually, you could even vectorize the whole thing using argpartition
lt = (ranges[:,None,0] >= data).sum(1)
result = np.argpartition(data, lt)[np.arange(data.shape[0]), lt]
Of course, this is only efficient if data.shape[0] << data.shape[1], as otherwise you're basically sorting
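As a quick self-contained sanity check (my assembly, using the seeded data from the question):
import numpy as np

np.random.seed(10)
data = np.random.random((6, 100))
ranges = np.array([[0.3, 0.4], [0.35, 0.5], [0.45, 0.6],
                   [0.52, 0.65], [0.6, 0.8], [0.75, 0.92]])

lt = (ranges[:, None, 0] >= data).sum(1)
result = np.argpartition(data, lt)[np.arange(data.shape[0]), lt]
print(result)                      # expected to reproduce [35  8 22  8 34 78]
print(data[np.arange(6), result])  # expected to match the in-range minima printed in the question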

Averaging out sections of a multiple row array in Python

I've got a 2-row array called C like this:
from numpy import *
A = array([1,2,3,4,5])   # arrays, not lists, so the boolean indexing below works
B = array([50,40,30,20,10])
C = vstack((A,B))
I want to take all the columns in C where the value in the first row falls between i and i+2, and average them. I can do this with just A no problem:
i = 0
A_avg = []
while(i<6):
    selection = A[logical_and(A >= i, A < i+2)]
    A_avg.append(mean(selection))
    i += 2
then A_avg is:
[1.0,2.5,4.5]
I want to carry out the same process with my two-row array C, but I want to take the average of each row separately, while doing it in a way that's dictated by the first row. For example, for C, I want to end up with a 2 x 3 array that looks like:
[[1.0,2.5,4.5],
[50,35,15]]
Where the first row is A averaged in blocks between i and i+2 as before, and the second row is B averaged in the same blocks as A, regardless of the values it has. So the first entry is unchanged, the next two get averaged together, and the next two get averaged together, for each row separately. Anyone know of a clever way to do this? Many thanks!
I hope this is not too clever. TIL boolean indexing does not broadcast, so I had to manually do the broadcasting. Let me know if anything is unclear.
import numpy as np
A = [1,2,3,4,5]
B = [50,40,30,20,10]
C = np.vstack((A,B)) # converted to float below so that np.nan can be used
i = np.arange(0, 6, 2)[:, None]
selections = np.logical_and(A >= i, A < i+2)[None]
D, selections = np.broadcast_arrays(C[:, None], selections)
D = D.astype(float) # allows use of nan, and makes a copy to prevent repeated behavior
D[~selections] = np.nan # exclude these elements from mean
D = np.nanmean(D, axis=-1)
Then,
>>> D
array([[  1. ,   2.5,   4.5],
       [ 50. ,  35. ,  15. ]])
Another way is to use np.histogram to bin your data. This may be faster for large arrays, but it is only useful for a few rows, since a histogram must be computed with different weights for each row:
bins = np.arange(0, 7, 2) # include the end
n = np.histogram(A, bins)[0] # number of columns in each bin
a_mean = np.histogram(A, bins, weights=A)[0]/n
b_mean = np.histogram(A, bins, weights=B)[0]/n
D = np.vstack([a_mean, b_mean])
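For what it's worth, a self-contained run of the histogram approach on the sample data (my assembly, not the answerer's):
import numpy as np

A = np.array([1, 2, 3, 4, 5])
B = np.array([50, 40, 30, 20, 10])

bins = np.arange(0, 7, 2)                         # [0, 2, 4, 6], includes the end
n = np.histogram(A, bins)[0]                      # columns per bin: [1, 2, 2]
a_mean = np.histogram(A, bins, weights=A)[0] / n  # [ 1. ,  2.5,  4.5]
b_mean = np.histogram(A, bins, weights=B)[0] / n  # [50. , 35. , 15. ]
D = np.vstack([a_mean, b_mean])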

Get indices of matrix from upper triangle

I have a symmetric matrix represented as a numpy array, like the following example:
[[ 1. 0.01735908 0.01628629 0.0183845 0.01678901 0.00990739 0.03326491 0.0167446 ]
[ 0.01735908 1. 0.0213712 0.02364181 0.02603567 0.01807505 0.0130358 0.0107082 ]
[ 0.01628629 0.0213712 1. 0.01293289 0.02041379 0.01791615 0.00991932 0.01632739]
[ 0.0183845 0.02364181 0.01293289 1. 0.02429031 0.01190878 0.02007371 0.01399866]
[ 0.01678901 0.02603567 0.02041379 0.02429031 1. 0.01496896 0.00924174 0.00698689]
[ 0.00990739 0.01807505 0.01791615 0.01190878 0.01496896 1. 0.0110924 0.01514519]
[ 0.03326491 0.0130358 0.00991932 0.02007371 0.00924174 0.0110924 1. 0.00808803]
[ 0.0167446 0.0107082 0.01632739 0.01399866 0.00698689 0.01514519 0.00808803 1. ]]
And I need to find the indices (row and column) of the greatest value without considering the diagonal. Since it is a symmetric matrix, I just took the upper triangle of the matrix.
ind = np.triu_indices(M_size, 1)
And then the index of the max value
max_ind = np.argmax(H[ind])
However, max_ind is an index into the vector that results from taking the upper triangle with triu_indices; how do I know which row and column the value I've just found belongs to?
The matrix could be any size but it's always symmetric. Do you know a better method to achieve the same?
Thank you
Couldn't you do this by using np.triu to return a copy of your matrix with all but the upper triangle zeroed, then just use np.argmax and np.unravel_index to get the row/column indices?
Example:
x = np.zeros((10,10))
x[3, 8] = 1
upper = np.triu(x, 1)
idx = np.argmax(upper)
row, col = np.unravel_index(idx, upper.shape)
The drawback of this method is that it creates a copy of the input matrix, but it should still be a lot quicker than looping over elements in Python. It also assumes that the maximum value in the upper triangle is > 0.
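If the maximum could be less than or equal to zero, one workaround (my suggestion, not the answerer's) is to fill the lower triangle with -inf instead of zeros:
masked = np.where(np.triu(np.ones_like(H, dtype=bool), 1), H, -np.inf)
row, col = np.unravel_index(np.argmax(masked), H.shape)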
You can use the value of max_ind as an index into the ind data
max_ind = np.argmax(H[ind])
Out: 23
ind[0][max_ind], ind[1][max_ind]
Out: (4, 6)
Validate this by looking for the maximum in the entire matrix (won't always work -- data-dependent):
np.unravel_index(np.argmax(H), H.shape)
Out: (4, 6)
There's probably a neater "numpy way" to do this, but this is what comes to mind first:
import operator

answer = None
biggest = 0
for r, row in enumerate(matrix):
    if r + 1 >= len(row):
        continue  # nothing to the right of the diagonal in the last row
    i, elem = max(enumerate(row[r+1:]), key=operator.itemgetter(1))
    if elem > biggest:
        biggest, answer = elem, (r, r + 1 + i)  # offset i by the slice start to get the column
