Select certain rows (condition met), but only some columns in Python/Numpy - python

I have an numpy array with 4 columns and want to select columns 1, 3 and 4, where the value of the second column meets a certain condition (i.e. a fixed value). I tried to first select only the rows, but with all 4 columns via:
I = A[A[:,1] == i]
which works. Then I further tried (similarly to matlab which I know very well):
I = A[A[:,1] == i, [0,2,3]]
which doesn't work. How to do it?
EXAMPLE DATA:
>>> A = np.array([[1,2,3,4],[6,1,3,4],[3,2,5,6]])
>>> print A
[[1 2 3 4]
[6 1 3 4]
[3 2 5 6]]
>>> i = 2
# I want to get the columns 1, 3 and 4
# for every row which has the value i in the second column.
# In this case, this would be row 1 and 3 with columns 1, 3 and 4:
[[1 3 4]
[3 5 6]]
I am now currently using this:
I = A[A[:,1] == i]
I = I[:, [0,2,3]]
But I thought that there had to be a nicer way of doing it... (I am used to MATLAB)

>>> a = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12]])
>>> a
array([[ 1, 2, 3, 4],
[ 5, 6, 7, 8],
[ 9, 10, 11, 12]])
>>> a[a[:,0] > 3] # select rows where first column is greater than 3
array([[ 5, 6, 7, 8],
[ 9, 10, 11, 12]])
>>> a[a[:,0] > 3][:,np.array([True, True, False, True])] # select columns
array([[ 5, 6, 8],
[ 9, 10, 12]])
# fancier equivalent of the previous
>>> a[np.ix_(a[:,0] > 3, np.array([True, True, False, True]))]
array([[ 5, 6, 8],
[ 9, 10, 12]])
For an explanation of the obscure np.ix_(), see https://stackoverflow.com/a/13599843/4323
Finally, we can simplify by giving the list of column numbers instead of the tedious boolean mask:
>>> a[np.ix_(a[:,0] > 3, (0,1,3))]
array([[ 5, 6, 8],
[ 9, 10, 12]])

If you do not want to use boolean positions but the indexes, you can write it this way:
A[:, [0, 2, 3]][A[:, 1] == i]
Going back to your example:
>>> A = np.array([[1,2,3,4],[6,1,3,4],[3,2,5,6]])
>>> print A
[[1 2 3 4]
[6 1 3 4]
[3 2 5 6]]
>>> i = 2
>>> print A[:, [0, 2, 3]][A[:, 1] == i]
[[1 3 4]
[3 5 6]]
Seriously,

>>> a=np.array([[1,2,3], [1,3,4], [2,2,5]])
>>> a[a[:,0]==1][:,[0,1]]
array([[1, 2],
[1, 3]])
>>>

This also works.
I = np.array([row[[x for x in range(A.shape[1]) if x != i-1]] for row in A if row[i-1] == i])
print I
Edit: Since indexing starts from 0, so
i-1
should be used.

I am hoping this answers your question but a piece of script I have implemented using pandas is:
df_targetrows = df.loc[df[col2filter]*somecondition*, [col1,col2,...,coln]]
For example,
targets = stockdf.loc[stockdf['rtns'] > .04, ['symbol','date','rtns']]
this will return a dataframe with only columns ['symbol','date','rtns'] from stockdf where the row value of rtns satisfies, stockdf['rtns'] > .04
hope this helps

Related

How to get the indexes of the greatest N values greater than a threshold in Numpy?

For a project I need to be able to get, from a vector with shape (k, m), the indexes of the N greatest values of each row greater than a fixed threshold.
For example, if k=3, m=5, N=3 and the threshold is 5 and the vector is :
[[3 2 6 7 0],
[4 1 6 4 0],
[7 10 6 9 8]]
I should get the result (or the flattened version, I don't care) :
[[2, 3],
[2],
[1, 3, 4]]
The indexes don't have to be sorted.
My code is currently :
indexes = []
for row, inds in enumerate(np.argsort(results, axis=1)[:, -N:]):
for index in inds:
if results[row, index] > threshold:
indexes.append(index)
but I feel like I am not using Numpy to its full capacity.
Does anybody know a better and more elegant solution ?
How about this method:
import numpy as np
arr = np.array(
[[3, 2, 6, 7, 0],
[4, 1, 6, 4, 0],
[7, 10, 6, 9, 8]]
)
t = 5
n = 3
sorted_idxs = arr.argsort(1)[:, -n:]
sorted_arr = np.sort(arr, 1)[:, -n:]
item_nums = np.cumsum((sorted_arr > t).sum(1))
masked_idxs = sorted_idxs[sorted_arr > t]
idx_lists = np.split(masked_idxs, item_nums)
output:
[array([2, 3]), array([2]), array([4, 3, 1])]

How to remove a column from a numpy array if the column has a specified element? (Python)

For example, I have a 2D Array with dimensions 3 x 3.
[1 2 7
4 5 6
7 8 9]
And I want to remove all columns which contain 7 - so first and third, outputting a 3 x 1 matrix of:
[2
5
8]
How do I go about doing this in python? I want to apply it to a large matrix of n x n dimensions.
Thank You!
#Creating array
x = np.array([[1, 2, 7],[4,5, 6],[7,8,9]])
x
Out[]:
array([[1, 2, 7],
[4, 5, 6],
[7, 8, 9]])
#Deletion
a = np.delete(x,np.where(x ==7),axis=1)
a
Out[]:
array([[2],
[5],
[8]])
numpy can help you do this!
import numpy as np
a = np.array([1, 2, 7, 4, 5, 6, 7, 8, 9]).reshape((3, 3))
b = np.array([col for col in a.T if 7 not in col]).T
print(b)
If you don't actually want to delete parts of the original matrix, you can just use boolean indexing:
a
Out[]:
array([[1, 2, 7],
[4, 5, 6],
[7, 8, 9]])
a[:, ~np.any(a == 7, axis = 1)]
Out[]:
array([[2],
[5],
[8]])
you can use argwhere for column index and then delete.
import numpy
a = numpy.array([[5, 2, 4],[1, 7, 3],[1, 2, 7]])
index = numpy.argwhere(a==7)
y = numpy.delete(a, index, axis=1)
print(y)
A = np.array([[1,2,7],[4,5,6],[7,8,9]])
for i in range(0,3):
... B=A[:,i]
... if(7 in B):
... A=np.delete(A,i,1)

How to append a "label" to a numpy array

I have a numpy array created such as:
x = np.array([[1,2,3,4],[5,6,7,8]])
y = np.asarray([x])
which prints out
x=[[1 2 3 4]
[5 6 7 8]]
y=[[[1 2 3 4]
[5 6 7 8]]]
What I would like is an array such as
[0 [[1 2 3 4]
[5 6 7 8]]]
What's the easiest way to go about this?
Thanks!
To do what you're asking, just use the phrase
labeledArray = [0, x]
This way, you will get a standard list with 0 as the first element and a Numpy array as the second element.
However, in practice, you are probably trying to label for the purpose of later recall. In that case, I'd recommend you use a dictionary, as it is less confusing to keep track of:
myArrays = {}
myArrays[0] = x
Which can be used as follows:
>>> myArrays
{0: array([[1, 2, 3, 4],
[5, 6, 7, 8]])}
>>> myArrays[0]
array([[1, 2, 3, 4],
[5, 6, 7, 8]])

print rows that have x value in the last column

I´m quite new using numpy, and I have this problem:
Having this array:
x = np.array([[ 1, 2, 0],[ 4, 5, 0],[ 7, 8, 1],[10, 11, 1]])
>[[ 1 2 0]
[ 4 5 0]
[ 7 8 1]
[10 11 1]]
How could I print the rows with 1 in the last column?
I would like to get something like this:
>[[ 7 8 1]
[10 11 1]]
Get a slice of the array on last column and find which of those equal 1. Based on the resulting boolean array filter your main array:
>>> x[:,-1]
array([0, 0, 1, 1])
>>> x[:,-1]==1
array([False, False, True, True], dtype=bool)
>>> x[x[:,-1]==1]
array([[ 7, 8, 1],
[10, 11, 1]])
Please try this:
y = [ a for a in x if a[-1] == 1 ]
print y
Cheers,
Alex

Numpy: calculate edges of a matrix

I have the following to calculate the difference of a matrix, i.e. the i-th element - the (i-1) element.
How can I (easily) calculate the difference for each element horizontally and vertically? With a transpose?
inputarr = np.arange(12)
inputarr.shape = (3,4)
inputarr+=1
#shift one position
newarr = list()
for x in inputarr:
newarr.append(np.hstack((np.array([0]),x[:-1])))
z = np.array(newarr)
print inputarr
print 'first differences'
print inputarr-z
Output
[[ 1 2 3 4]
[ 5 6 7 8]
[ 9 10 11 12]]
first differences
[[1 1 1 1]
[5 1 1 1]
[9 1 1 1]]
Check out numpy.diff.
From the documentation:
Calculate the n-th order discrete difference along given axis.
The first order difference is given by out[n] = a[n+1] - a[n] along
the given axis, higher order differences are calculated by using diff
recursively.
An example:
>>> import numpy as np
>>> a = np.arange(12).reshape((3,4))
>>> a
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
>>> np.diff(a,axis = 1) # row-wise
array([[1, 1, 1],
[1, 1, 1],
[1, 1, 1]])
>>> np.diff(a, axis = 0) # column-wise
array([[4, 4, 4, 4],
[4, 4, 4, 4]])

Categories