Sort on multiple NumPy arrays - python

I am creating a 2 dimensional numpy array that contains stock returns. I want to sum the return every 2 days, and if the sum is in the top two, I will set every element in a similar shaped array to True.
For example, returns below is the daily returns for four different stocks.
returns=np.array([
[0, 0, 4, 8],
[7, 5, 4, 1],
[10, 5, 7, 6],
[7, 5, 4, 2]])
For the first two days, columns 2 and 3 (using 0 based indexing) have the highest sums. For the second set of two days, columns 0 and 2 have the highest sums. The output array I want is
bools=np.array([
[False, False, True, True],
[False, False, True, True],
[True, False, True, False],
[True, False, True, False]])
What are good ways to accomplish this?
If there are ties with the sums of two days, I want to use another similarly shaped numpy array as tiebreakers.
For example, if
returns=np.array([
[0, 9, 4, 8],
[7, 5, 4, 0],
[10, 5, 7, 6],
[7, 5, 4, 2]])
For the first set of two days, columns 2 and 3 are tied for the second-highest sum. I want to break the tie using the greatest value in the last row of the window for the tied columns, so that the tie between columns 2 and 3 is decided by tiebreaks[1][2] vs. tiebreaks[1][3] (4 vs. 5), and the final output is bools2.
tiebreaks=np.array([
[0, 0, 1, 1],
[2, 3, 4, 5],
[0, 5, 7, 6],
[-7, 5, -4, 2]])
bools2=np.array([
[False, True, False, True],
[False, True, False, True],
[True, False, True, False],
[True, False, True, False]])
Thanks for your help.

You can use numpy.lexsort() to get the indices that sort your arrays using prices as primary key and names as secondary key. Applying advanced indexing using these indices yields the sorted arrays:
col_indices = numpy.lexsort((names, prices))
row_indices = numpy.arange(len(names))[:, None]
print(prices[row_indices, col_indices])
print(names[row_indices, col_indices])
(Note that in your example, names and prices don't have compatible shapes.)
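For the two-day returns question above, the same lexsort idea could be sketched roughly as follows. This is only a minimal sketch under stated assumptions: the function name top_k_mask and its window and k parameters are made up for illustration, the number of rows is assumed to be a multiple of window, and ties are assumed to be decided by the last row of each window in tiebreaks.

```python
import numpy as np

def top_k_mask(returns, tiebreaks, window=2, k=2):
    """Hypothetical helper: mark the k best columns of every `window`-row block."""
    n_rows, n_cols = returns.shape
    # Sum the returns inside each block of `window` consecutive rows.
    sums = returns.reshape(-1, window, n_cols).sum(axis=1)
    # Assumption: the last row of each block supplies the tiebreak values.
    ties = tiebreaks[window - 1::window]
    # lexsort treats the LAST key as primary, so sort by sums, then by ties.
    order = np.lexsort((ties, sums), axis=-1)
    top = order[:, -k:]                      # column indices of the k largest
    mask = np.zeros(sums.shape, dtype=bool)
    np.put_along_axis(mask, top, True, axis=1)
    # Repeat each block's result so the mask has the shape of `returns`.
    return np.repeat(mask, window, axis=0)

returns = np.array([[0, 9, 4, 8],
                    [7, 5, 4, 0],
                    [10, 5, 7, 6],
                    [7, 5, 4, 2]])
tiebreaks = np.array([[0, 0, 1, 1],
                      [2, 3, 4, 5],
                      [0, 5, 7, 6],
                      [-7, 5, -4, 2]])
mask = top_k_mask(returns, tiebreaks)
print(mask)
```

With the tied example from the question this should reproduce bools2, because column 3 beats column 2 on the tiebreak value (5 vs. 4).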

Related

Identifying all consecutive positive triplets in a 1D numpy array

Consider the 1D array arr shown below, and assume n = 3.
I want to identify all 'islands' holding >= n consecutive positive values.
The following code successfully finds the FIRST set of 3 consecutive positive numbers by determining the initial index, but it does not find all such sets.
import numpy as np
arr = np.array([1, -1, 5, 6, 3, -4, 2, 5, 9, 2, 1, -6, 8])
def find_consec_pos(arr, n):
    mask = np.convolve(np.greater(arr,0), np.ones(n, dtype=int)) >= n
    if mask.any():
        return mask.argmax() - n + 1
    else:
        return None
find_consec_pos(arr, 3)
This gives output 2, the index of the 1st triplet of consecutive positive values.
I want to know how to modify the code to get the output [2, 6, 7, 8], identifying all consecutive positive triples.
This code does the job and is simple while being relatively efficient:
positive = arr > 0
np.where(positive[:-2] & positive[1:-1] & positive[2:])
You could use sliding_window_view:
In [1]: from numpy.lib.stride_tricks import sliding_window_view
In [2]: sliding_window_view(arr, 3) > 0
Out[2]:
array([[ True, False, True],
[False, True, True],
[ True, True, True],
[ True, True, False],
[ True, False, True],
[False, True, True],
[ True, True, True],
[ True, True, True],
[ True, True, True],
[ True, True, False],
[ True, False, True]])
Turning this into your desired function (and assuming you want a list as output):
def find_consec_pos(arr, n):
    all_n_positive = np.all(sliding_window_view(arr > 0, n), axis=1)
    return np.argwhere(all_n_positive).flatten().tolist()
Demo of some different "window" sizes:
In [4]: arr
Out[4]: array([ 1, -1, 5, 6, 3, -4, 2, 5, 9, 2, 1, -6, 8])
In [5]: find_consec_pos(arr, 3)
Out[5]: [2, 6, 7, 8]
In [6]: find_consec_pos(arr, 4)
Out[6]: [6, 7]
In [7]: find_consec_pos(arr, 2)
Out[7]: [2, 3, 6, 7, 8, 9]
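If you would rather keep the convolution from the question, a small change might be enough: count the positives in every full window and return all positions where the count equals n. The sketch below assumes mode="valid" is acceptable (so only complete windows are considered); the name find_consec_pos_conv is hypothetical.

```python
import numpy as np

def find_consec_pos_conv(arr, n):
    """Hypothetical variant of the question's function using np.convolve."""
    positive = (np.asarray(arr) > 0).astype(int)
    # With mode="valid", entry i is the number of positives in arr[i:i+n].
    counts = np.convolve(positive, np.ones(n, dtype=int), mode="valid")
    # A window is all-positive exactly when its count equals n.
    return np.flatnonzero(counts == n).tolist()

arr = np.array([1, -1, 5, 6, 3, -4, 2, 5, 9, 2, 1, -6, 8])
triples = find_consec_pos_conv(arr, 3)
print(triples)
```

Using mode="valid" also removes the need for the "- n + 1" index correction in the original function, since each output position already corresponds to the start of a window.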

How would I calculate how many selectable values would be left in a numpy array given a finite budget constraint and cost of each value

Say I had a simple numpy array representing the cost of each item
x = np.array([1,2,3,4,5])
and a finite budget of 5.
I want to return another numpy array that tells me how many selectable items would be left over for each item selected.
So for example in this case the array returned would be:
y = np.array([3,2,2,1,0])
Because if I selected the first item of cost 1, I can still take the 2nd, 3rd, or 4th items while staying less than or equal to the budget constraint. If I select the second item of cost 2, I can still take the 1st or 3rd items. If I select the third item of cost 3, I can still take the 1st or 2nd items. If I select the 4th item, I can only still take the 1st item. And of course if I select the item that costs 5, I can't select any other items given a finite budget constraint of 5.
How would you go about this in an efficient way without doing an explicit for loop over each item?
Thanks so much for your help.
Numpy broadcasting
m = (x[:, None] + x) <= 5
y = m.sum(1) - m.diagonal()
Explanations
Calculate the outer sum on array x using numpy broadcasting
>>> x[:, None] + x
array([[ 2, 3, 4, 5, 6],
[ 3, 4, 5, 6, 7],
[ 4, 5, 6, 7, 8],
[ 5, 6, 7, 8, 9],
[ 6, 7, 8, 9, 10]])
Compare the outer sum with budget constraint to create a boolean mask
>>> m = (x[:, None] + x) <= 5
>>> m
array([[ True, True, True, True, False],
[ True, True, True, False, False],
[ True, True, False, False, False],
[ True, False, False, False, False],
[False, False, False, False, False]])
Now take sum along axis=1 on this boolean mask to get the total number of selectable items including the current selected item
>>> m.sum(1)
array([4, 3, 2, 1, 0])
Subtract the diagonal values from the above calculated sum to exclude the current selected item.
>>> m.diagonal().view('i1')
array([1, 1, 0, 0, 0])
>>> y = m.sum(1) - m.diagonal()
>>> y
array([3, 2, 2, 1, 0])
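The outer sum builds an n-by-n matrix, which may get large for long arrays. As a possible alternative (a different technique from the broadcasting answer above, not a replacement for it), the same counts could be obtained by sorting and using np.searchsorted. This is a sketch only; the function name count_remaining is made up for illustration.

```python
import numpy as np

def count_remaining(x, budget):
    """Hypothetical helper: for each item, count the other items that still fit."""
    x = np.asarray(x)
    sorted_x = np.sort(x)
    # For item i, count all items whose cost is <= budget - x[i].
    fits = np.searchsorted(sorted_x, budget - x, side="right")
    # That count includes item i itself whenever x[i] + x[i] <= budget.
    counted_self = 2 * x <= budget
    return fits - counted_self

x = np.array([1, 2, 3, 4, 5])
y = count_remaining(x, 5)
print(y)
```

This needs O(n log n) time and O(n) memory instead of the O(n^2) mask, at the cost of being a little less obvious to read.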

Different starting indices for slices in NumPy

I'm wondering if it's possible without iterating with a for loop to do something like this:
a = np.array([[1, 2, 5, 3, 4],
[4, 5, 6, 7, 8]])
cleaver = np.argmax(a == 5, axis=1) # np.array([2, 1])
foo(a, cleaver)
>>> np.array([[False, False, True, True, True],
              [False, True, True, True, True]])
Is there a way to accomplish this through slicing or some other non-iterative function? The arrays I'm using are quite large and iterating over them row by row is prohibitively expensive.
You can use some broadcasting magic -
cleaver[:,None] <= np.arange(a.shape[1])
Sample run -
In [60]: a
Out[60]:
array([[1, 2, 5, 3, 4],
[4, 5, 6, 7, 8]])
In [61]: cleaver
Out[61]: array([2, 1])
In [62]: cleaver[:,None] <= np.arange(a.shape[1])
Out[62]:
array([[False, False, True, True, True],
[False, True, True, True, True]], dtype=bool)
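One caveat worth keeping in mind: np.argmax returns 0 for a row that contains no match at all, so the comparison above would mark such a row as all True. If that case can occur in your data, a sketch along these lines might help. The function name mask_from_first_match is hypothetical, and treating rows without a match as all False is an assumption about the desired behavior.

```python
import numpy as np

def mask_from_first_match(a, target):
    """Hypothetical helper: True from the first occurrence of `target` onward."""
    hits = a == target
    cleaver = hits.argmax(axis=1)          # first match per row (0 if none)
    found = hits.any(axis=1)               # rows that really contain target
    mask = cleaver[:, None] <= np.arange(a.shape[1])
    # Assumption: rows without any match should be entirely False.
    return mask & found[:, None]

a = np.array([[1, 2, 5, 3, 4],
              [4, 5, 6, 7, 8],
              [1, 1, 1, 1, 1]])   # no 5 in this row
result = mask_from_first_match(a, 5)
print(result)
```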

Using numpy.where() to iterate through a matrix

There's something about numpy.where() I do not understand:
Let's say I have a 2D numpy ndarray:
import numpy as np
twodim = np.array([[1, 2, 3, 4], [1, 6, 7, 8], [1, 1, 1, 12], [17, 3, 15, 16], [17, 3, 18, 18]])
Now, I would like to create a function which "checks" this numpy array for a variety of conditions.
array([[ 1, 2, 3, 4],
[ 1, 6, 7, 8],
[ 1, 1, 1, 12],
[17, 3, 15, 16],
[17, 3, 18, 18]])
For example, which entries in this array have (A) even numbers (B) greater than 7 (C) divisible by 3?
I would like to use numpy.where() for this, and iterate through each entry of this array, finally finding the elements which match all conditions (if such an entry exists):
even_entries = np.where(twodim % 2 == 0)
greater_seven = np.where(twodim > 7 )
divisible_three = np.where(twodim % 3 == 0)
How does one do this? I am not sure how to iterate through Booleans...
I could access the indices of the matrix (i,j) via
np.argwhere(even_entries)
We could do something like
import numpy as np
twodim = np.array([[1, 2, 3, 4], [1, 6, 7, 8], [1, 1, 1, 12], [17, 3, 15, 16], [17, 3, 18, 18]])
even_entries = np.where(twodim % 2 == 0)
greater_seven = np.where(twodim > 7 )
divisible_three = np.where(twodim % 3 == 0)
for row in even_entries:
    for item in row:
        if item: #equivalent to `if item == True`
            for row in greater_seven:
                for item in row:
                    if item: #equivalent to `if item == True`
                        for row in divisible_three:
                            for item in row:
                                if item: #equivalent to `if item == True`
                                    pass # something like print(np.argwhere())
Any advice?
EDIT1: Great ideas below. As @hpaulj mentions, "Your tests produce a boolean matrix of the same shape as twodim".
This is a problem I'm running into as I toy around---not all conditionals produce matrices the same shape as my starting matrix. For instance, let's say I'm comparing whether the array element has a matching array to the left or right (i.e. horizontally)
twodim[:, :-1] == twodim[:, 1:]
That results in a (5,3) Boolean array, whereas our original matrix is a (5,4) array
array([[False, False, False],
[False, False, False],
[ True, True, False],
[False, False, False],
[False, False, True]], dtype=bool)
If we do the same vertically, that results in a (4,4) Boolean array, whereas the original matrix is (5,4)
twodim[:-1] == twodim[1:]
array([[ True, False, False, False],
[ True, False, False, False],
[False, False, False, False],
[ True, True, False, False]], dtype=bool)
If we wished to know which entries have both vertical and horizontal pairs, it is non-trivial to figure out which dimension we are in.
Your tests produce a boolean matrix of the same shape as twodim:
In [487]: mask3 = twodim%3==0
In [488]: mask3
Out[488]:
array([[False, False, True, False],
[False, True, False, False],
[False, False, False, True],
[False, True, True, False],
[False, True, True, True]], dtype=bool)
As other answers noted, you can combine tests logically with the element-wise operators & (and) and | (or).
np.where is the same as np.nonzero (in this use), and just returns the coordinates of the True values - as a tuple of 2 arrays.
In [489]: np.nonzero(mask3)
Out[489]:
(array([0, 1, 2, 3, 3, 4, 4, 4], dtype=int32),
array([2, 1, 3, 1, 2, 1, 2, 3], dtype=int32))
argwhere returns the same values, but as a transposed 2d array.
In [490]: np.argwhere(mask3)
Out[490]:
array([[0, 2],
[1, 1],
[2, 3],
[3, 1],
[3, 2],
[4, 1],
[4, 2],
[4, 3]], dtype=int32)
Both the mask and tuple can be used to index your array directly:
In [494]: twodim[mask3]
Out[494]: array([ 3, 6, 12, 3, 15, 3, 18, 18])
In [495]: twodim[np.nonzero(mask3)]
Out[495]: array([ 3, 6, 12, 3, 15, 3, 18, 18])
The argwhere can't be used directly for indexing, but may be more suitable for iteration, especially if you want the indexes as well as the values:
In [496]: for i,j in np.argwhere(mask3):
.....: print(i,j,twodim[i,j])
.....:
0 2 3
1 1 6
2 3 12
3 1 3
3 2 15
4 1 3
4 2 18
4 3 18
The same thing with where requires a zip:
for i,j in zip(*np.nonzero(mask3)): print(i,j,twodim[i,j])
BUT in general in numpy we try to avoid iteration. If you can use twodim[mask] directly your code will be much faster.
Logical combinations of the boolean masks are easier to produce than combinations of the where indices. To use the indices I'd probably resort to set operations (union, intersect, difference).
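To illustrate the difference, here is a rough sketch that combines the three tests from the question both ways: once with boolean masks and once with set operations on flattened indices. The variable names are made up for this example, and np.intersect1d on flat indices is just one possible way to do the set-based version.

```python
import numpy as np
from functools import reduce

twodim = np.array([[1, 2, 3, 4], [1, 6, 7, 8], [1, 1, 1, 12],
                   [17, 3, 15, 16], [17, 3, 18, 18]])
masks = [twodim % 2 == 0, twodim > 7, twodim % 3 == 0]

# Boolean route: combine the masks element-wise, then ask for positions.
via_masks = np.flatnonzero(masks[0] & masks[1] & masks[2])

# Index route: turn each mask into flat indices and intersect the sets.
flat_sets = [np.flatnonzero(m) for m in masks]
via_sets = reduce(np.intersect1d, flat_sets)

# Convert the flat indices back to (row, column) coordinates.
rows, cols = np.unravel_index(via_sets, twodim.shape)
print(via_masks, via_sets, twodim[rows, cols])
```

Both routes should select the same elements; the mask version is shorter and avoids the round trip through flat indices.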
As for a reduced size test, you have to decide how that maps on to the original array (and other tests). e.g.
A (5,3) mask (difference between columns):
In [505]: dmask=np.diff(twodim, 1).astype(bool)
In [506]: dmask
Out[506]:
array([[ True, True, True],
[ True, True, True],
[False, False, True],
[ True, True, True],
[ True, True, False]], dtype=bool)
It can index 3 columns of the original array
In [507]: twodim[:,:-1][dmask]
Out[507]: array([ 1, 2, 3, 1, 6, 7, 1, 17, 3, 15, 17, 3])
In [508]: twodim[:,1:][dmask]
Out[508]: array([ 2, 3, 4, 6, 7, 8, 12, 3, 15, 16, 3, 18])
It can also be combined with 3 columns of another mask:
In [509]: dmask & mask3[:,:-1]
Out[509]:
array([[False, False, True],
[False, True, False],
[False, False, False],
[False, True, True],
[False, True, False]], dtype=bool)
It is still easier to combine tests in the boolean array form than with where indices.
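For the horizontal and vertical pair tests from the EDIT, one possible mapping back to the original shape is to mark both members of every equal pair in a full-size mask. The sketch below assumes that "has a horizontal pair" means equal to the left or right neighbor, and likewise up or down for vertical; the names h, v and both are made up for illustration.

```python
import numpy as np

twodim = np.array([[1, 2, 3, 4], [1, 6, 7, 8], [1, 1, 1, 12],
                   [17, 3, 15, 16], [17, 3, 18, 18]])

eq_h = twodim[:, :-1] == twodim[:, 1:]    # (5, 3): equal to right neighbor
eq_v = twodim[:-1] == twodim[1:]          # (4, 4): equal to neighbor below

# Full-size masks: an element is flagged if it is either half of a pair.
h = np.zeros(twodim.shape, dtype=bool)
h[:, :-1] |= eq_h                         # left member of each pair
h[:, 1:] |= eq_h                          # right member of each pair

v = np.zeros(twodim.shape, dtype=bool)
v[:-1] |= eq_v                            # upper member of each pair
v[1:] |= eq_v                             # lower member of each pair

both = h & v                              # same shape as twodim
print(np.argwhere(both))
```

Because h and v now have the same shape as twodim, they can be combined with any of the other full-size masks using & and |.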
import numpy as np
twodim = np.array([[1, 2, 3, 4], [1, 6, 7, 8], [1, 1, 1, 12], [17, 3, 15, 16], [17, 3, 18, 18]])
condition = (twodim % 2 == 0) & (twodim > 7) & (twodim % 3 == 0)
location = np.argwhere(condition)
for i in location:
    print(i, twodim[i[0], i[1]], end=" ")
>>> [2 3] 12 [4 2] 18 [4 3] 18
If you want to find where all three conditions are satisfied:
import numpy as np
twodim = np.array([[1, 2, 3, 4], [1, 6, 7, 8], [1, 1, 1, 12], [17, 3, 15, 16], [17, 3, 18, 18]])
mask = (twodim % 2 == 0) & (twodim > 7) & (twodim % 3 == 0)
print(twodim[mask])
[12 18 18]
I am not sure what you want in the end: either the rows in which all elements satisfy the condition, or the individual elements that do.

Numpy arrays - Multidimensional logic comparison

I am trying to find the entries in a two-dimensional array that are above a certain threshold. The thresholds for the individual columns are given by a one-dimensional array. To exemplify,
[[1, 2, 3],
[4, 5, 6],
[2, 0, 4]]
is the two-dimensional array, and I want to see where the values in each column are bigger than
[2, 1, 3]
so the output of running the operation should be
[[False, True, False],
[True, True, True],
[False, False, True]]
Thanks!
Well, assuming there's an error in the example, I would simply do:
import numpy as np
A = np.array([[1, 2, 3],[4, 5, 6],[2, 0, 4]])
T = np.array([2, 1, 3])
X = A > T
Which gives
array([[False, True, False],
[ True, True, True],
[False, False, True]], dtype=bool)
I think there may be inconsistencies in your example (e.g. 2 > 1 is True, yet 2 > 4 is True) - can you clarify this?
Assuming you want to know, for each row, which columns of the values in the first list are greater than the [2,1,3] list you gave, I suggest the following:
import numpy as np
tmp = [[1, 2, 3],
[4, 5, 6],
[2, 0, 4]]
output = [ np.less([2, 1, 3], tmp[i]) for i in range(len(tmp))]
Similarly, try greater or greater_equal or less_equal for the result you're after:
http://docs.scipy.org/doc/numpy/reference/routines.logic.html
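As a quick sanity check, the row-by-row np.less version and the broadcast comparison A > T should agree. The sketch below reuses the example data from this question; the names by_broadcast and by_rows are made up for illustration.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [2, 0, 4]])
T = np.array([2, 1, 3])

# Broadcasting compares every row of A against T in one step.
by_broadcast = A > T

# Row-by-row version: T < row is the same test as row > T.
by_rows = np.array([np.less(T, row) for row in A])

print(by_broadcast)
print(np.array_equal(by_broadcast, by_rows))
```

The broadcast form avoids the Python-level loop, which usually matters once the array has many rows.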
