I have two DataFrames where each column contains True/False values. I am looking for a way to test all possible combinations and find where a "True" in a row of df1 is also "True" in the corresponding row of df2.
In reference to the data below, the logic would be something like this:
For each row, starting with column "Main1", test whether the value is True and the value in column "Sub1" is also True. Next, test whether "Main1" is True and both "Sub1" and "Sub2" are True. In each case, the output is True only if all the values tested are True. Then repeat for all columns and all possible combinations.
df1:
Main1 Main2 Main3
0 True False True
1 False False False
2 False True True
3 False False True
4 False True True
5 True True True
6 True False False
df2:
Sub1 Sub2 Sub3
0 False False True
1 False True False
2 True False True
3 False False False
4 True True False
5 False False False
6 True True True
Of course, I could do this manually, but it would be time-consuming and would leave room for errors.
The output would look something like this:
Main1Sub1 Main1Sub1Sub2 ... Main3Sub2Sub3 Main3Sub3
0 False False ... False True
1 False False ... False False
2 False False ... False True
3 False False ... False False
4 False False ... False False
5 False False ... False False
6 True True ... False False
[7 rows x 18 columns]
Any help on how to tackle this problem is appreciated!
You can use the combinations() function from itertools to enumerate all possible combinations of the columns of the two DataFrames, and then use the pandas product() method to identify the rows where all columns in a given combination are True (the row-wise product of booleans is nonzero only when every value is True). The example below considers all combinations of either 2 or 3 columns of the joined DataFrame.
import pandas as pd
from itertools import combinations
df1 = pd.DataFrame({"Main1": [True, False, False, False, False, True, True],
                    "Main2": [False, False, True, False, True, True, False],
                    "Main3": [True, False, True, True, True, True, False]})
df2 = pd.DataFrame({"Sub1": [False, False, True, False, True, False, True],
                    "Sub2": [False, True, False, False, True, False, True],
                    "Sub3": [True, False, True, False, False, False, True]})
df3 = df1.join(df2)
all_combinations = list(combinations(df3.columns, 2)) + \
                   list(combinations(df3.columns, 3))
for combination in all_combinations:
    df3["".join(list(combination))] = df3[list(combination)].product(axis=1).astype(bool)
df3.drop(labels=["Main1", "Main2", "Main3", "Sub1", "Sub2", "Sub3"], axis=1, inplace=True)
df3
Main1Main2 Main1Main3 ... Main3Sub2Sub3 Sub1Sub2Sub3
0 False True ... False False
1 False False ... False False
2 False False ... False False
3 False False ... False False
4 False False ... False False
5 True True ... False False
6 False False ... False True
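If only the pairings described in the question are wanted (one Main column together with one or more Sub columns), the combinations can be built per Main column instead of over the joined frame. The following is a minimal sketch under that reading of the question. The helper name main_sub_combinations is hypothetical, and for the sample data it yields 3 x 7 = 21 columns rather than the 18 shown in the question, whose exact selection rule is not stated.

```python
from itertools import combinations

import pandas as pd

df1 = pd.DataFrame({"Main1": [True, False, False, False, False, True, True],
                    "Main2": [False, False, True, False, True, True, False],
                    "Main3": [True, False, True, True, True, True, False]})
df2 = pd.DataFrame({"Sub1": [False, False, True, False, True, False, True],
                    "Sub2": [False, True, False, False, True, False, True],
                    "Sub3": [True, False, True, False, False, False, True]})


def main_sub_combinations(main_df, sub_df):
    """Hypothetical helper: AND each main column with every non-empty subset of sub columns."""
    columns = {}
    for main in main_df.columns:
        for size in range(1, len(sub_df.columns) + 1):
            for subs in combinations(sub_df.columns, size):
                # True only where the main column and all chosen sub columns are True
                columns[main + "".join(subs)] = main_df[main] & sub_df[list(subs)].all(axis=1)
    return pd.DataFrame(columns)


result = main_sub_combinations(df1, df2)
print(result[["Main1Sub1", "Main1Sub1Sub2", "Main3Sub3"]])
```

Using all(axis=1) here instead of product(axis=1).astype(bool) is a stylistic choice; both give True only when every selected column is True in that row.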
Related
I have two columns like below.
Column A  Column B
True      False
True      True
False     True
False     False
I want to get
Column A  Column B  Column C  Column D
True      False     True      False
True      True      False     False
False     True      False     True
False     False     False     False
I was trying to use the XOR operator, but couldn't figure out how to make it only return true if the specific column was true.
XOR is the wrong operation here. You want C = A AND NOT B, and D = NOT A AND B.
import pandas as pd
df = pd.DataFrame({
    'A': [True, True, False, False],
    'B': [False, True, True, False]})
df['C'] = df['A'] & ~df['B']
df['D'] = ~df['A'] & df['B']
df
A B C D
0 True False True False
1 True True False False
2 False True False True
3 False False False False
If it helps, these operations are called non-implication and converse non-implication, and Wikipedia has a table here: Template:Logical connectives
I have a dataframe where two columns represent the start and end points of intervals on a real number line. I want to generate a third column containing, for each row, a list of the indices of the rows it overlaps with. I'm having difficulty creating an inequality boolean matrix for this natively in pandas. I assume logic like s1<=e2 and e1>=s2 will do the trick, but I don't know how to broadcast it effectively.
As a toy example, I'm hoping for a simple way to at least generate a 6x6 boolean matrix (with all True down the diagonal) given this dataframe:
import pandas as pd
intervals_df = pd.DataFrame({"Starts":[0,1,5,10,15,20],"Ends":[4,2,9,14,19,24]})
Starts Ends
0 0 4
1 1 2
2 5 9
3 10 14
4 15 19
5 20 24
The condition for two intervals (s1,e1) and (s2,e2) to intersect is max(s1,s2) <= min(e1,e2). So you can do a cross merge (this is the broadcast), calculate the condition, then pivot:
d = (intervals_df.reset_index()
     .merge(intervals_df.reset_index(), how='cross')
     .assign(cond=lambda x: x.filter(like='Starts').max(axis=1) <= x.filter(like='Ends').min(axis=1))
     # pivot takes keyword arguments only in pandas 2.0 and later
     .pivot(index='index_x', columns='index_y', values='cond')
)
You would get:
index_y 0 1 2 3 4 5
index_x
0 True True False False False False
1 True True False False False False
2 False False True False False False
3 False False False True False False
4 False False False False True False
5 False False False False False True
Or you can make do with numpy's broadcasting:
import numpy as np
starts = intervals_df[['Starts']].to_numpy()
ends = intervals_df[['Ends']].to_numpy()
np.maximum(starts, starts.T) <= np.minimum(ends, ends.T)
Output:
array([[ True, True, False, False, False, False],
[ True, True, False, False, False, False],
[False, False, True, False, False, False],
[False, False, False, True, False, False],
[False, False, False, False, True, False],
[False, False, False, False, False, True]])
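The question ultimately asks for a column holding, per row, the indices of the overlapping rows. One way to get there from the boolean matrix is sketched below. The column name Overlaps is hypothetical, and excluding each row's own index is an assumption; drop the fill_diagonal line to keep it.

```python
import numpy as np
import pandas as pd

intervals_df = pd.DataFrame({"Starts": [0, 1, 5, 10, 15, 20],
                             "Ends": [4, 2, 9, 14, 19, 24]})

# Column vectors of shape (n, 1); their transposes have shape (1, n),
# so the comparisons below broadcast to an (n, n) matrix.
starts = intervals_df[["Starts"]].to_numpy()
ends = intervals_df[["Ends"]].to_numpy()
overlap = np.maximum(starts, starts.T) <= np.minimum(ends, ends.T)

# Assumption: a row should not list itself as an overlap
np.fill_diagonal(overlap, False)

# Turn each boolean row into the list of index labels where it is True
labels = intervals_df.index.to_numpy()
intervals_df["Overlaps"] = [labels[row].tolist() for row in overlap]
print(intervals_df)
```

The boolean matrix grows quadratically with the number of rows, so this is best suited to frames of modest size.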
In [4]: df = pd.DataFrame({'a': [True, False, True], 'b': [False, False, False],
...: 'c': [False, False, False], 'd': [False, True, False],
...: 'e': [False, False, False]})
In [5]: df
Out[5]:
a b c d e
0 True False False False False
1 False False False True False
2 True False False False False
In [6]: df[df.any()[df.any()].index]
Out[6]:
a d
0 True False
1 False True
2 True False
The code under [6] does what I want. My question, however, is: is there a better solution? That is, more concise and/or more elegant.
One direct method is using df.loc with the mask generated by df.any() as input:
df.loc[:, df.any()]
a d
0 True False
1 False True
2 True False
Another option is to index df.columns,
df[df.columns[df.any()]]
Or, df.keys():
df[df.keys()[df.any()]]
a d
0 True False
1 False True
2 True False
How can I imitate the following Excel formula in a pandas DataFrame?
=IF(AND(A1=TRUE,B1=TRUE),TRUE,FALSE)
A B C
TRUE FALSE FALSE
TRUE TRUE TRUE
FALSE FALSE FALSE
FALSE TRUE FALSE
I tried this:
def check(sig1,sig2):
    if sig1 == True and sig2 == True:
        return True
    else:
        return False
df['chk'] = df.apply(check,df['up_signal1',df['up_signal2']],axis=1)
You can do this:
# DataFrame that checks all possible combinations
df = pd.DataFrame({
    'up_signal1': [False, False, True, True],
    'up_signal2': [False, True, False, True]
})
df['chk'] = df.up_signal1 & df.up_signal2
df
up_signal1 up_signal2 chk
0 False False False
1 False True False
2 True False False
3 True True True
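If more than two signal columns need to be combined, chaining & becomes unwieldy. A minimal sketch of the same check using all(axis=1) follows; the list name signal_columns is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "up_signal1": [False, False, True, True],
    "up_signal2": [False, True, False, True],
})

# Columns to AND together; extend this list to cover more signals
signal_columns = ["up_signal1", "up_signal2"]

# True only in rows where every listed column is True
df["chk"] = df[signal_columns].all(axis=1)
print(df)
```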
I have two columns (FN1 and FN2), and based on these I have to create one more column (Final):
FN1 FN2 Final
False False 1
True True 1
False False 1
True False 2
True True 2
False False 1
True True 1
True True 1
If FN1 is False, Final will be 1.
Otherwise, if FN2 is True, Final will be the previous value of Final.
But if FN2 is False, I need to update it with the previous value of Final + 1 (i.e. increment by 1).
I tried doing it using shift(), but that does not help in this scenario.
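Use np.select. The shifted copy below reads df1['Final'], so df must already contain a Final column:
import numpy as np
df1 = df.shift()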
cond1 = df['FN1'] == False
cond2 = (df['FN1']==True) & (df['FN2'] ==True)
cond3 = (df['FN1']==True) & (df['FN2'] == False)
df['Final'] = np.select([cond1,cond2,cond3], [1, df1['Final'], df1['Final']+1])
print(df)
Using a lambda with a running value kept in a global variable:
df = pd.DataFrame({'FN1': [False, True, False, True, True, False, True, True],
                   'FN2': [False, True, False, False, True, False, True, True]
                   })
def f(fn1, fn2):
    global previousfinal
    # Reset to 1 when FN1 is False; otherwise increment when FN2 is False, else keep the previous value
    previousfinal = 1 if not fn1 else previousfinal + 1 if not fn2 else previousfinal
    return previousfinal
previousfinal = 1
df['Final'] = df[['FN1', 'FN2']].apply(lambda x: f(*x), axis=1)
print(df)
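Because each value of Final depends on the previous one, the rules can also be expressed without a global variable by treating every FN1 == False row as the start of a new block and counting the increments inside it. This is a sketch of that alternative, not part of the answers above; the names block and step are hypothetical, and it assumes the value before the first row is 1, as the lambda version does.

```python
import pandas as pd

df = pd.DataFrame({"FN1": [False, True, False, True, True, False, True, True],
                   "FN2": [False, True, False, False, True, False, True, True]})

# Each row with FN1 == False starts a new block in which Final restarts at 1
block = (~df["FN1"]).cumsum()

# Within a block, Final goes up by one on every row where FN1 is True and FN2 is False
step = (df["FN1"] & ~df["FN2"]).astype(int)

# Running count of increments per block, offset by the starting value of 1
df["Final"] = 1 + step.groupby(block).cumsum()
print(df)
```

For the sample data this reproduces the Final column from the question.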