Here is a toy series for illustrative purposes.
import pandas as pd
test = pd.Series([True, False, 2.2, 6.6, 0, True])
I have a Pandas series that contains True, False, and a bunch of different numeric values. I want to replace all numerics with False so that the entire column is Boolean. How do I accomplish this?
I want it to end up like:
0 True
1 False
2 False
3 False
4 False
5 True
Thanks!
The simplest solution is to compare with True:
test = test == True
print (test)
0 True
1 False
2 False
3 False
4 False
5 True
dtype: bool
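One caveat with this approach (a small sketch, assuming the data could also contain the numeric value 1, which is not the case in the toy series above): since 1 == True evaluates to True in Python, such a value would survive the comparison:
s = pd.Series([True, False, 1, 2.2])
print (s == True)
0 True
1 False
2 True
3 False
dtype: bool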
To handle both floats and integers, check the type:
test = test.apply(lambda x: False if type(x) in (float, int) else x)
print (test)
0 True
1 False
2 False
3 False
4 False
5 True
dtype: bool
Solution with isinstance:
def testing(x):
    if isinstance(x, bool):
        return x
    elif isinstance(x, (float, int)):
        return False
    else:
        return x
test = test.apply(testing)
print (test)
0 True
1 False
2 False
3 False
4 False
5 True
dtype: bool
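One detail worth keeping in mind: bool is a subclass of int in Python, so the isinstance(x, bool) branch has to come before the isinstance(x, (float, int)) branch, otherwise the existing booleans would also be replaced. A quick check:
print (isinstance(True, int))
True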
Try this:
>>> test[test != True] = False
>>> test
0 True
1 False
2 False
3 False
4 False
5 True
dtype: object
This worked for floats. I can repeat for ints. I'm sure there's a better way though.
df.col_1.apply(lambda x: False if type(x) == float else x)
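A sketch of one way to handle floats and ints in a single pass (col_1 is the hypothetical column name taken from the snippet above); checking for bool first avoids touching the existing booleans:
df['col_1'] = df['col_1'].apply(lambda x: x if isinstance(x, bool) else False).astype(bool)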
Related
Say I have a dataframe. (The original dataframe has 91 columns and 1000 rows.)
0 1 2 3
0 False False False True
1 True False False False
2 True False False False
3 False False True False
4 False True True False
5 False False False False
6 True True True True
I need to get the AND/OR values for all the columns in my dataframe. So the resultant OR and AND values would be:
OR AND
0 True False
1 True False
2 True False
3 True False
4 True False
5 False False
6 True True
I can do this by looping over all my columns and calculating the Boolean for each one, but I was looking for a more DataFrame-level approach that does not go through the columns explicitly.
You can use any and all.
df = df.assign(OR=df.any(axis=1), AND=df.all(axis=1))
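For reference, a self-contained sketch that reconstructs the example frame from the question (column names 0-3 as shown there) and applies the line above:
import pandas as pd
df = pd.DataFrame({0: [False, True, True, False, False, False, True],
                   1: [False, False, False, False, True, False, True],
                   2: [False, False, False, True, True, False, True],
                   3: [True, False, False, False, False, False, True]})
df = df.assign(OR=df.any(axis=1), AND=df.all(axis=1))
print (df[['OR', 'AND']])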
You can sum along the columns; then OR is indicated by sum > 0, and AND by sum == len(df.columns):
total = df.sum(axis=1)
res = pd.DataFrame({"OR": total > 0, "AND": total == len(df.columns)})
If you have many columns this can be more efficient, as it iterates over the entire matrix only once in the worst case (depending on the input distribution and the implementation of any/all, iterating twice can still be faster).
I have a dataset, which has two columns:
index Value
0 True
1 True
2 False
3 True
Is it possible to obtain a matrix that looks like
index 0 1 2 3
0 True True False True
1 True True False True
2 False False False False
3 True True False True
I tried pd.crosstab but am still not able to get the matrix. Can anyone please help?
A possible way:
import numpy as np
m = np.tile(df['Value'], len(df)).reshape(-1, len(df)) * df[['Value']].values
out = pd.DataFrame(m)
print(out)
# Output
0 1 2 3
0 True True False True
1 True True False True
2 False False False False
3 True True False True
First, convert the values of the Value column to a NumPy array using to_numpy. Then take advantage of NumPy broadcasting by creating an extra axis with [:,None] and computing the bitwise AND operation:
vals = df['Value'].to_numpy()
res = pd.DataFrame(vals[:,None] & vals, index=df.index)
Output:
>>> res
0 1 2 3
index
0 True True False True
1 True True False True
2 False False False False
3 True True False True
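For completeness, a runnable sketch of the broadcasting approach, reconstructing the example frame from the question:
import pandas as pd
df = pd.DataFrame({'Value': [True, True, False, True]})
df.index.name = 'index'
vals = df['Value'].to_numpy()
res = pd.DataFrame(vals[:, None] & vals, index=df.index)
print (res)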
I have a df like this:
a b c
0 True False True
1 False False False
2 True True True
I want this:
a b c Result
0 True False True True
1 False False False False
2 True True True True
If any one value is True then Result should be True, else False.
You can use any():
df['result'] = df.any(1)
# or with pd.assign
df = df.assign(result = df.any(1))
both will print:
a b c result
0 True False True True
1 False False False False
2 True True True True
Note that 1 is short for axis=1, i.e. perform the operation row-wise. (In newer pandas versions the axis argument must be passed as a keyword, so write axis=1 explicitly.)
It's quite easy...
if a or b or c:
#do stuff
or you could also use
if a | b | c:
#do stuff
Use any with axis=1 to check for the existence of any True in each row.
df['result'] = df.any(axis=1)
If the values are strings rather than booleans, then:
df['result'] = df.eq('True').any(axis=1)
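A small usage sketch for the string case (a hypothetical frame where the cells hold the strings 'True' and 'False'):
df = pd.DataFrame({'a': ['True', 'False', 'True'],
                   'b': ['False', 'False', 'True'],
                   'c': ['True', 'False', 'True']})
df['result'] = df.eq('True').any(axis=1)
print (df)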
I'm trying to change the first instance of True to False in each row of my DataFrame:
A B C
Number
1 True True True
2 False True True
3 False False True
I want it to end up as:
A B C
Number
1 False True True
2 False False True
3 False False False
Every time I try using a for index, row in target_df.iterrows(): loop, it never finds any True values when I look through the row.
Thanks in advance!
You can use the cumulative sum of the Boolean values (False corresponds to 0; True to 1) for each row, along with DataFrame.mask():
>>> condition = df.cumsum(axis=1) == 1
>>> df.mask(condition, False)
a b c
0 False True True
1 False False True
2 False False False
DataFrame.mask(cond, other=nan)
Return an object of same shape as self and whose corresponding entries
are from self where cond is False and otherwise are from other.
In this case, condition is False everywhere except the points at which you want to switch True -> False:
>>> condition
a b c
0 True False False
1 False True False
2 False False True
One other option would be to use NumPy:
>>> row, col = np.where(df.cumsum(axis=1) == 1)
>>> df.values[row, col] = False
>>> df
a b c
0 False True True
1 False False True
2 False False False
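A self-contained sketch of the mask approach, reconstructing the small example frame used in this answer:
import pandas as pd
df = pd.DataFrame({'a': [True, False, False],
                   'b': [True, True, False],
                   'c': [True, True, True]})
condition = df.cumsum(axis=1) == 1   # True only at the first True in each row
print (df.mask(condition, False))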
I have been applying some binary Boolean operators in my code base and came across a bug that really surprised me. I've reconstructed a minimal working example to demonstrate the behavior below...
import pandas
s = pandas.Series( [True]*4 )
d = pandas.DataFrame( { 'a':[True, False, True, False] , 'b':[True]*4 } )
print(d)
a b
0 True True
1 False True
2 True True
3 False True
print( s[0:2] )
0 True
1 True
dtype: bool
print( d.loc[ d['a'] , 'b' ] )
0 True
2 True
dtype: bool
print( s[0:2] & d.loc[ d['a'] , 'b' ] )
0 True
1 False
2 False
The value of this last statement caught me entirely by surprise by yielding 3 elements. Realizing the influence of the indices here, I manually reset the index to get the result I expected.
s[0:2].reset_index(drop=True) & d.loc[ d['a'] , 'b' ].reset_index( drop=True )
0 True
1 True
Needless to say, I'll need to revisit the documentation to get a grip on how the indexing rules apply here. Can anyone explain step by step how this operator behaves with mixed indexes?
=============================================
Just to add a comparison for those coming from a similar R background, the equivalent data.frame operation in R yields what I'd expect...
> a = c(TRUE,FALSE,TRUE,FALSE)
> b = c(TRUE,TRUE,TRUE,TRUE)
>
> d = data.frame( a, b )
> d
a b
1 TRUE TRUE
2 FALSE TRUE
3 TRUE TRUE
4 FALSE TRUE
> s = c( TRUE,TRUE,TRUE,TRUE)
> s
[1] TRUE TRUE TRUE TRUE
>
> d[ d$a , 'b']
[1] TRUE TRUE
>
> s[0:2]
[1] TRUE TRUE
> s[0:2] & d[ d$a , 'b']
[1] TRUE TRUE
You are comparing two Series with different indices:
s[0:2]
0 True
1 True
dtype: bool
and
d.loc[ d['a'] , 'b']
0 True
2 True
dtype: bool
pandas needs to align the indices before it compares.
s[0:2] & d.loc[ d['a'] , 'b']
0 True # True in both Series, therefore True
1 False # only present in s[0:2], missing from the other, therefore False
2 False # only present in d.loc[d['a'], 'b'], missing from the other, therefore False
dtype: bool
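If positional (rather than label-based) combination is what you actually want, one possible sketch is to drop the index from one side before combining, for example by converting to NumPy arrays (this assumes both selections have the same length):
lhs = s[0:2].to_numpy()
rhs = d.loc[d['a'], 'b'].to_numpy()
print (lhs & rhs)
[ True  True]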