Python equivalent of the R operator "%in%"

What is the Python equivalent of R's %in% operator? I am trying to filter a pandas DataFrame so that a row remains only if a given column in that row has a value found in my list.
I tried using any() and am having immense difficulty with this.

The pandas documentation has a "Comparison with R" page that covers this.
s <- 0:4
s %in% c(2,4)
The isin method is similar to the R %in% operator:
In [13]: s = pd.Series(np.arange(5),dtype=np.float32)
In [14]: s.isin([2, 4])
Out[14]:
0 False
1 False
2 True
3 False
4 True
dtype: bool
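Since the question is about filtering rows, here is a minimal sketch of using the boolean Series from isin as a row filter. The DataFrame, its column names, and the wanted list are made up for illustration and are not from the original post.

```python
import pandas as pd

# Hypothetical data: column names and values are illustrative only.
df = pd.DataFrame({"name": ["a", "b", "c", "d"], "score": [1, 2, 3, 4]})
wanted = [2, 4]

# isin returns a boolean Series; indexing with it keeps the matching rows.
filtered = df[df["score"].isin(wanted)]

# ~ negates the mask, which gives the "not in" counterpart.
excluded = df[~df["score"].isin(wanted)]

print(filtered)
print(excluded)
```

The same pattern should work for any column whose values can be compared against the list.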

FWIW: without having to call pandas, here's the answer using a for loop and a list comprehension in pure Python
x = [2, 3, 5]
y = [1, 2, 3]
# for loop
result = []
for i in x: result.append(i in y)
result
Out: [True, True, False]
# list comprehension
[i in y for i in x]
Out: [True, True, False]
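If the lookup list is long, it may help to turn it into a set first, since membership tests on a set do not scan every element. This is only a sketch built on the same x and y as above; the names y_set, flags, and kept are mine.

```python
x = [2, 3, 5]
y = [1, 2, 3]

# Build the set once so each membership test avoids scanning the whole list.
y_set = set(y)

# Element-wise membership flags, like R's x %in% y.
flags = [i in y_set for i in x]

# Keep only the elements of x that are found in y.
kept = [i for i in x if i in y_set]

print(flags)
print(kept)
```

This assumes the elements are hashable, which is true for numbers and strings.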

If you want to use only numpy without pandas (as in a use case I had), you can:
import numpy as np
x = np.array([1, 2, 3, 10])
y = np.array([10, 11, 2])
np.isin(y, x)
This is equivalent to:
c(10, 11, 2) %in% c(1, 2, 3, 10)
Note that np.isin works only for numpy >= 1.13.0; for older versions you'll need to use np.in1d.
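To go from the boolean result to the actual filtered values, the mask can be used as an index. A small sketch, assuming a numpy version that has np.isin; the variable names are illustrative.

```python
import numpy as np

values = np.array([10, 11, 2])
allowed = np.array([1, 2, 3, 10])

# Boolean mask: True where an element of values appears in allowed.
mask = np.isin(values, allowed)

# Index with the mask to keep only the matching elements.
kept = values[mask]

# invert=True flips the test, giving the elements that are not in allowed.
dropped = values[np.isin(values, allowed, invert=True)]

print(mask, kept, dropped)
```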

As others indicate, the in operator of base Python works well.
myList = ["a00", "b000", "c0"]
"a00" in myList
# True
"a" in myList
# False

Related

How do I remove separate elements in a vector without using the range function?

I've created vector x and I need to create a vector z by removing the 3rd and 6th elements of x. I cannot just create a vector by simply typing in the elements that should be in z. I have to index them or use a separate function.
x = [5,2,0,6,-10,12]
np.array(x)
print x
z = np.delete(x,)
I am not sure if using np.delete is best or if there is a better approach. Help?
You can index and concatenate slices of the list, excluding the elements you want to "delete":
x = [5,2,0,6,-10,12]
print ( x[0:2]+x[3:5] )
[5, 2, 6, -10]
If x is a numpy array, first convert it to a list:
x = list(x)
Once x is a list (or if it already was one):
z = [x.pop(2), x.pop(-1)]
This removes the 3rd and 6th elements from x and places them in z. Then convert it to a numpy array if needed.
In [69]: x = np.array([5,2,0,6,-10,12])
Using delete is straightforward:
In [70]: np.delete(x,[2,5])
Out[70]: array([ 5, 2, 6, -10])
delete is a general function that takes various approaches based on the delete object, but in a case like this it uses a boolean mask:
In [71]: mask = np.ones(x.shape, bool); mask[[2,5]] = False; mask
Out[71]: array([ True, True, False, True, True, False])
In [72]: x[mask]
Out[72]: array([ 5, 2, 6, -10])
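If you would rather stay with a plain list, the same "keep everything except these positions" idea can be sketched with enumerate. The name drop is just illustrative, and the indices are zero-based, so the 3rd and 6th elements are positions 2 and 5.

```python
x = [5, 2, 0, 6, -10, 12]

# Zero-based positions to remove (3rd and 6th elements).
drop = {2, 5}

# Keep every element whose position is not in the drop set.
z = [value for index, value in enumerate(x) if index not in drop]

print(z)
```

Unlike pop, this leaves x untouched and builds z as a new list.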

What's the best way of comparing slices of a list in Python?

I attempted to compare slices of a list in Python, but to no avail. Is there a better way to do this?
My Code (Attempt to make slice return True)
a = [1,2,3]
# Slice Assignment
a[0:1] = [0,0]
print(a)
# Slice Comparisons???
print(a[0:2])
print(a[0:2] == True)
print(a[0:2] == [True, True])
My Results
[0, 0, 2, 3]
[0, 0]
False
False
Since slicing returns lists and lists automatically compare element-wise, all you need to do is use ==:
>>> a = [1, 2, 3, 1, 2, 3]
>>> a[:3] == a[3:]
True
To compare to a fixed value, you need a little more effort:
>>> b = [1, 1, 1, 3]
>>> all(e == 1 for e in b[:3])
True
>>> all(e == 1 for e in b[2:])
False
Bonus: if you are doing lots of array calculations, you might benefit from using numpy arrays:
>>> import numpy as np
>>> c = np.array(b)
>>> c[:3] == 1 # this automatically gets applied to all elements
array([ True, True, True])
>>> (c[:3] == 1).all()
True
It is not quite clear what you're trying to do exactly.
As you printed, a[0:2] is [0, 0]. In the first comparison you are comparing that list to a boolean; they are different types, so they are never equal.
In the second one, you are comparing [0, 0] to [True, True]. Python compares the lists element by element, and 0 == True evaluates to False (0 equals False, not True), so the two lists are not equal.
Could you edit your question and add what you want the code to do? I would add this in a comment but I don't have enough rep yet :)
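To make the comparisons concrete, here is a small sketch using the list from the question after the slice assignment. The variable names are mine; if the goal was to test truthiness rather than equality, all() and any() are probably closer to what was intended.

```python
a = [0, 0, 2, 3]

# A list never equals a bare boolean, whatever it contains.
list_vs_bool = (a[0:2] == True)

# Element-wise: 0 == True is False, so the lists differ.
zeros_vs_trues = (a[0:2] == [True, True])

# 1 == True is True in Python, so this comparison succeeds.
ones_vs_trues = ([1, 1] == [True, True])

# Truthiness checks on slices.
all_truthy = all(a[2:4])        # 2 and 3 are both truthy
none_truthy = not any(a[0:2])   # both zeros are falsy

print(list_vs_bool, zeros_vs_trues, ones_vs_trues, all_truthy, none_truthy)
```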

numpy.where for 2+ specific values

Can the numpy.where function be used for more than one specific value?
I can specify a specific value:
>>> x = numpy.arange(5)
>>> numpy.where(x == 2)[0][0]
2
But I would like to do something like the following. It gives an error of course.
>>> numpy.where(x in [3,4])[0][0]
[3,4]
Is there a way to do this without iterating through the list and combining the resulting arrays?
EDIT: I also have lists of lists of unknown lengths and unknown values, so I cannot easily form the parameters of np.where() to search for multiple items. It would be much easier to pass a list.
You can use the numpy.in1d function with numpy.where:
import numpy
numpy.where(numpy.in1d(x, [2,3]))
# (array([2, 3]),)
I guess np.in1d might help you, instead:
>>> x = np.arange(5)
>>> np.in1d(x, [3,4])
array([False, False, False, True, True], dtype=bool)
>>> np.argwhere(_)
array([[3],
[4]])
If you only need to check for a few values you can:
import numpy as np
x = np.arange(4)
ret_arr = np.where([x == 1, x == 2, x == 4, x == 0])[1]
print "Ret arr = ",ret_arr
Output:
Ret arr = [1 2 0]
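For the "lists of lists of unknown lengths" case mentioned in the edit, one possible sketch is to run a membership test per sub-list and collect the matching positions. This uses np.isin (numpy >= 1.13.0) with np.flatnonzero; the targets list is made up for illustration.

```python
import numpy as np

x = np.arange(5)

# Hypothetical search lists of varying length; 9 does not occur in x.
targets = [[3, 4], [0], [1, 2, 9]]

# For each sub-list, get the positions in x whose value is in that sub-list.
positions = [np.flatnonzero(np.isin(x, t)) for t in targets]

for t, p in zip(targets, positions):
    print(t, "->", p)
```

Values that do not occur in x simply contribute no positions.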

Python: Elegant and efficient ways to mask a list

Example:
from __future__ import division
import numpy as np
n = 8
"""masking lists"""
lst = range(n)
print lst
# the mask (filter)
msk = [(el>3) and (el<=6) for el in lst]
print msk
# use of the mask
print [lst[i] for i in xrange(len(lst)) if msk[i]]
"""masking arrays"""
ary = np.arange(n)
print ary
# the mask (filter)
msk = (ary>3)&(ary<=6)
print msk
# use of the mask
print ary[msk] # very elegant
and the results are:
>>>
[0, 1, 2, 3, 4, 5, 6, 7]
[False, False, False, False, True, True, True, False]
[4, 5, 6]
[0 1 2 3 4 5 6 7]
[False False False False True True True False]
[4 5 6]
As you see the operation of masking on array is more elegant compared to list. If you try to use the array masking scheme on list you'll get an error:
>>> lst[msk]
Traceback (most recent call last):
File "<interactive input>", line 1, in <module>
TypeError: only integer arrays with one element can be converted to an index
The question is to find an elegant masking for lists.
Updates:
The answer by jamylak was accepted for introducing compress; however, the points mentioned by Joel Cornett completed the solution into the form I was interested in.
>>> mlist = MaskableList
>>> mlist(lst)[msk]
[4, 5, 6]
If you are using numpy:
>>> import numpy as np
>>> a = np.arange(8)
>>> mask = np.array([False, False, False, False, True, True, True, False], dtype=bool)
>>> a[mask]
array([4, 5, 6])
If you are not using numpy you are looking for itertools.compress
>>> from itertools import compress
>>> a = range(8)
>>> mask = [False, False, False, False, True, True, True, False]
>>> list(compress(a, mask))
[4, 5, 6]
If you are already using NumPy, you can do it easily with a NumPy array, without installing any other library:
>>> import numpy as np
>>> a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> msk = [True, False, False, True, True, True, True, False, False, False]
>>> a = np.array(a) # convert list to numpy array
>>> result = a[msk] # mask a
>>> result.tolist()
[0, 3, 4, 5, 6]
Since jamylak already answered the question with a practical answer, here is my example of a list with builtin masking support (totally unnecessary, btw):
from itertools import compress
class MaskableList(list):
    def __getitem__(self, index):
        try: return super(MaskableList, self).__getitem__(index)
        except TypeError: return MaskableList(compress(self, index))
Usage:
>>> myList = MaskableList(range(10))
>>> myList
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> mask = [0, 1, 1, 0]
>>> myList[mask]
[1, 2]
Note that compress stops when either the data or the mask runs out. If you wish to keep the portion of the list that extends past the length of the mask, you could try something like:
from itertools import izip_longest
[i[0] for i in izip_longest(myList, mask[:len(myList)], fillvalue=True) if i[1]]
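On Python 3, izip_longest was renamed to zip_longest. A sketch of the same idea under that assumption, using a plain list and the short mask from the usage example above; the names data and kept are mine.

```python
from itertools import zip_longest

data = list(range(10))
mask = [0, 1, 1, 0]

# Pair each item with its mask value; positions past the end of the mask
# are filled with True, so the tail of the data is kept.
kept = [
    item
    for item, keep in zip_longest(data, mask[:len(data)], fillvalue=True)
    if keep
]

print(kept)
```

Slicing the mask to len(data) guards against a mask that is longer than the data, which would otherwise pad the data side with True as well.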
I don't consider it elegant. It's compact, but tends to be confusing, as the construct is very different from most languages.
As Guido van Rossum has said about language design, we spend more time reading code than writing it. The more obscure the construction of a line of code, the more confusing it becomes to others, who may lack familiarity with Python even though they have full competency in any number of other languages.
Readability trumps short-form notation every day in the real world of servicing code. It's just like fixing your car: big drawings with lots of information make troubleshooting a lot easier.
For me, I would much rather troubleshoot someone's code that uses the long form
print [lst[i] for i in xrange(len(lst)) if msk[i]]
than the numpy short notation mask. I don't need to have any special knowledge of a specific Python package to interpret it.
The following works perfectly well in Python 3:
np.array(lst)[msk]
If you need a list back as the result:
np.array(lst)[msk].tolist()
You could also just use a list comprehension and zip.
Define a function:
def masklist(mylist, mymask):
    return [a for a, b in zip(mylist, mymask) if b]
Use it:
n = 8
lst = range(n)
msk = [(el>3) and (el<=6) for el in lst]
lst_msk = masklist(lst,msk)
print(lst_msk)

How to invert numpy.where (np.where) function

I frequently use the numpy.where function to gather a tuple of indices of a matrix having some property. For example
import numpy as np
X = np.random.rand(3,3)
>>> X
array([[ 0.51035326, 0.41536004, 0.37821622],
[ 0.32285063, 0.29847402, 0.82969935],
[ 0.74340225, 0.51553363, 0.22528989]])
>>> ix = np.where(X > 0.5)
>>> ix
(array([0, 1, 2, 2]), array([0, 2, 0, 1]))
ix is now a tuple of ndarray objects that contain the row and column indices, whereas the sub-expression X>0.5 contains a single boolean matrix indicating which cells had the >0.5 property. Each representation has its own advantages.
What is the best way to take ix object and convert it back to the boolean form later when it is desired? For example
G = np.zeros(X.shape, dtype=bool)
>>> G[ix] = True
Is there a one-liner that accomplishes the same thing?
Something like this maybe?
mask = np.zeros(X.shape, dtype='bool')
mask[ix] = True
but if it's something simple like X > 0, you're probably better off doing mask = X > 0 unless mask is very sparse or you no longer have a reference to X. For example:
mask = X > 0
imask = np.logical_not(mask)
Edit: Sorry for being so concise before. Shouldn't be answering things on the phone :P
As I noted in the example, it's better to just invert the boolean mask. Much more efficient/easier than going back from the result of where.
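For completeness, a sketch of the round trip from np.where output back to a boolean mask, wrapped in a small helper. The name indices_to_mask is hypothetical, and the seeded random matrix is only there so the example is reproducible.

```python
import numpy as np

def indices_to_mask(ix, shape):
    """Hypothetical helper: rebuild a boolean mask from np.where output."""
    mask = np.zeros(shape, dtype=bool)
    mask[ix] = True
    return mask

rng = np.random.default_rng(0)  # seeded only for reproducibility
X = rng.random((3, 3))

ix = np.where(X > 0.5)
mask = indices_to_mask(ix, X.shape)

# The rebuilt mask should match the original boolean expression.
roundtrip_ok = np.array_equal(mask, X > 0.5)

# Inverting the mask is a single operator once it is in boolean form.
inverse = ~mask

print(roundtrip_ok)
```

If X is still available, recomputing X > 0.5 directly is likely simpler than rebuilding the mask from the indices.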
The bottom of the np.where docstring suggests using np.in1d for this.
>>> x = np.array([1, 3, 4, 1, 2, 7, 6])
>>> indices = np.where(x % 3 == 1)[0]
>>> indices
array([0, 2, 3, 5])
>>> np.in1d(np.arange(len(x)), indices)
array([ True, False, True, True, False, True, False], dtype=bool)
(While this is a nice one-liner, it is a lot slower than @Bi Rico's solution.)
