Transform false in between trues - python

Hello I have a dataframe like the following one:
df = pd.DataFrame({"a": [True, True, False, True, True], "b": [True, True, False, False, True]})
df
I would like to be able to transform the False values in between Trues to obtain a result like this (depending on a threshold).
# Threshold = 1
df = pd.DataFrame({"a": [True, True, True, True, True], "b": [True, True, False, False, True]})
df
# Threshold = 2
df = pd.DataFrame({"a": [True, True, True, True, True], "b": [True, True, True, True, True]})
df
Any suggestions to do this apart from a for loop?
Edit: The threshold value defines how many consecutive Falses you will take into account to do the transformation.
Edit 2: No special handling is needed at the beginning or end of the array.

To replace groups of consecutive False values whose length is at most the Threshold value, first label the separate groups with DataFrame.cumsum and DataFrame.mask, then count the size of each group with Series.map and Series.value_counts, and finally compare the sizes to the threshold with DataFrame.le and pass the result to DataFrame.mask:
Threshold = 1
m = df.cumsum().mask(df).apply(lambda x: x.map(x.value_counts())).le(Threshold)
df = df.mask(m, True)
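To see what each step of that chain contributes, here is a minimal sketch on the question's data. The intermediate names `groups`, `sizes` and `fill` are illustrative only and do not appear in the answer.

```python
import pandas as pd

df = pd.DataFrame({"a": [True, True, False, True, True],
                   "b": [True, True, False, False, True]})
threshold = 1

# cumsum() stays constant along a run of False values, so every run gets
# one label; mask(df) blanks out (NaN) the positions that are already True.
groups = df.cumsum().mask(df)

# Replace each label by how often it occurs in its column, i.e. the run length.
sizes = groups.apply(lambda col: col.map(col.value_counts()))

# Runs no longer than the threshold get filled; NaN compares as False.
fill = sizes.le(threshold)
result = df.mask(fill, True)
print(result)
```

With `threshold = 1` only the single False in column `a` is filled; the run of two in column `b` is left alone.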
If groups of False values at the start or end of a column should not be replaced:
df = pd.DataFrame({"a": [False, False, True, False, True, False],
"b": [True, True, False, False, True, True]})
print (df)
a b
0 False True
1 False True
2 True False
3 False False
4 True True
5 False True
Threshold = 1
df1 = df.cumsum().mask(df)
m1 = df1.apply(lambda x: x.map(x.value_counts())).le(Threshold)
m2 = df1.ne(df1.iloc[0]) & df1.ne(df1.iloc[-1])
df = df.mask(m1 & m2, True)
print (df)
a b
0 False True
1 False True
2 True False
3 True False
4 True True
5 False True

One way would be to use itertools.groupby to count the length of each run of adjacent identical items, though it does involve a couple of loops:
from itertools import groupby
def how_many_identical_elements(itter):
    return sum([[x]*x for x in [len(list(v)) for g,v in groupby(itter)]], [])
def fill_up_df(df, th):
    df = df.copy()
    for c in df.columns:
        df[f'{c}_count'] = how_many_identical_elements(df[c].values)
        df[c] = [False if x[0]==False and x[1]>th else True for x in zip(df[c], df[f'{c}_count'])]
    return df[[c for c in df.columns if 'count' not in c]]
then
fill_up_df(df, 1)
      a      b
0  True   True
1  True   True
2  True  False
3  True  False
4  True   True
fill_up_df(df, 2)
      a     b
0  True  True
1  True  True
2  True  True
3  True  True
4  True  True
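As a rough illustration of what the run-length helper produces, the sketch below applies the same groupby idea to a plain list. The name `run_lengths` is hypothetical; it is meant to behave like `how_many_identical_elements` above, written with an explicit loop.

```python
from itertools import groupby

def run_lengths(values):
    """For each element, return the length of the run it belongs to."""
    lengths = []
    for _, run in groupby(values):
        n = len(list(run))
        lengths.extend([n] * n)
    return lengths

col_b = [True, True, False, False, True]
print(run_lengths(col_b))  # [2, 2, 2, 2, 1]

# A False value is filled when its run is no longer than the threshold.
threshold = 1
filled = [v or n <= threshold for v, n in zip(col_b, run_lengths(col_b))]
print(filled)  # [True, True, False, False, True]
```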

For each offset i from 1 to the threshold, this code checks, column by column, whether the values i rows before and i rows after a position are both True, and ORs the results together to create a masking dataframe that meets your criteria. The last line is just the logical OR of your original data and the new mask, as we only need to fill False values. It should be one of the faster solutions if speed is an issue.
from functools import reduce
threshold = 2
filling_mask = reduce(
    lambda x, y: x | y,
    (
        df.shift(-i, fill_value=True) & df.shift(i, fill_value=True)
        for i in range(1, threshold + 1)
    )
)
df |= filling_mask
Threshold 1:
>>> df # Threshold 1
a b
0 True True
1 True True
2 True False
3 True False
4 True True
Threshold 2:
>>> df # Threshold 2
a b
0 True True
1 True True
2 True True
3 True True
4 True True
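A self-contained sketch of the shift-based mask, wrapped in a hypothetical helper named `fill_between` so both thresholds can be tried on the question's data. The last example is an added check, not something the answer claims: it suggests that the mask tests pairs at equal distance on both sides, so a run longer than the threshold can still be partly filled.

```python
from functools import reduce
import pandas as pd

def fill_between(df, threshold):
    """Hypothetical wrapper around the shift-based mask shown above."""
    mask = reduce(
        lambda x, y: x | y,
        (
            # True where the values i rows below and i rows above are both True;
            # positions outside the frame count as True (fill_value=True).
            df.shift(-i, fill_value=True) & df.shift(i, fill_value=True)
            for i in range(1, threshold + 1)
        ),
    )
    return df | mask

df = pd.DataFrame({"a": [True, True, False, True, True],
                   "b": [True, True, False, False, True]})
print(fill_between(df, 1))
print(fill_between(df, 2))

# A run of three False values with threshold 2: only the centre is filled.
s = pd.DataFrame({"c": [True, False, False, False, True]})
print(fill_between(s, 2)["c"].tolist())  # [True, False, True, False, True]
```

If runs longer than the threshold must stay untouched, a check based on run lengths is probably the safer choice.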

Related

Keeping only True if one of two columns is true

I have two columns like below.
Column A  Column B
True      False
True      True
False     True
False     False
I want to get
Column A  Column B  Column C  Column D
True      False     True      False
True      True      False     False
False     True      False     True
False     False     False     False
I was trying to use the XOR operator, but couldn't figure out how to make it only return true if the specific column was true.
XOR is the wrong operation here. You want A AND NOT B for column C, and NOT A AND B for column D.
df = pd.DataFrame({
'A': [True, True, False, False],
'B': [False, True, True, False]})
df['C'] = df['A'] & ~df['B']
df['D'] = ~df['A'] & df['B']
df
A B C D
0 True False True False
1 True True False False
2 False True False True
3 False False False False
If it helps, these operations are called non-implication and converse non-implication, and Wikipedia has a table here: Template:Logical connectives
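To make the relationship to XOR concrete, here is a small sketch on the same data. The variable names `only_a`, `only_b` and `xor` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [True, True, False, False],
                   "B": [False, True, True, False]})

only_a = df["A"] & ~df["B"]   # A AND NOT B
only_b = ~df["A"] & df["B"]   # NOT A AND B
xor = df["A"] ^ df["B"]       # True when exactly one column is True

# XOR cannot tell which column was True; it is the union of the two results.
print((xor == (only_a | only_b)).all())  # True
```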

Best solution for selecting the columns that contain at least one True value in a pandas DataFrame

In [4]: df = pd.DataFrame({'a': [True, False, True], 'b': [False, False, False],
...: 'c': [False, False, False], 'd': [False, True, False],
...: 'e': [False, False, False]})
In [5]: df
Out[5]:
a b c d e
0 True False False False False
1 False False False True False
2 True False False False False
In [6]: df[df.any()[df.any()].index]
Out[6]:
a d
0 True False
1 False True
2 True False
The code under [6] does what I want. My question, however, is: is there a better solution? That is, more concise and/or more elegant.
One direct method is using df.loc with the mask generated by df.any() as input:
df.loc[:, df.any()]
a d
0 True False
1 False True
2 True False
Another option is to index df.columns,
df[df.columns[df.any()]]
Or, df.keys():
df[df.keys()[df.any()]]
a d
0 True False
1 False True
2 True False
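A quick, self-contained check that the variants select the same columns, using the question's data. The names `has_true`, `by_loc` and `by_columns` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"a": [True, False, True], "b": [False, False, False],
                   "c": [False, False, False], "d": [False, True, False],
                   "e": [False, False, False]})

# df.any() reduces each column to a single boolean: does it hold any True?
has_true = df.any()

by_loc = df.loc[:, has_true]
by_columns = df[df.columns[has_true]]

print(list(by_loc.columns))        # ['a', 'd']
print(by_loc.equals(by_columns))   # True
```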

How to check if column A row 1 and column B in row 1 has value true in pandas DataFrame?

How can I imitate the following Excel formula in a pandas DataFrame?
=IF(AND(A1=TRUE,B1=TRUE),TRUE,FALSE)
A B C
TRUE FALSE FALSE
TRUE TRUE TRUE
FALSE FALSE FALSE
FALSE TRUE FALSE
I tried this:
def check(sig1,sig2):
    if sig1 == True and sig2 == True:
        return True
    else:
        return False
df['chk'] = df.apply(check,df['up_signal1',df['up_signal2']],axis=1)
You can do this:
# DataFrame that checks all possible combinations
df = pd.DataFrame({
'up_signal1': [False, False, True, True],
'up_signal2': [False, True, False, True]
})
df['chk'] = df.up_signal1 & df.up_signal2
df
up_signal1 up_signal2 chk
0 False False False
1 False True False
2 True False False
3 True True True
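If more than two signal columns are involved, the same row-wise AND can be sketched with `DataFrame.all(axis=1)`. The list name `signal_cols` is an assumption for illustration.

```python
import pandas as pd

df = pd.DataFrame({"up_signal1": [False, False, True, True],
                   "up_signal2": [False, True, False, True]})

# Row-wise AND over any number of boolean columns.
signal_cols = ["up_signal1", "up_signal2"]
df["chk"] = df[signal_cols].all(axis=1)
print(df["chk"].tolist())  # [False, False, False, True]
```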

How to test all possible combinations with True/False Statement in python?

I have two DataFrames where each column contain True/False statements. I am looking for a way to test all possible combinations and find out where "True" for each row in df1 also is "True" in the corresponding row in df2.
In reference to the data below, the logic would be something like this:
For each row, starting with column "Main1", test whether the value is True and whether the value in column "Sub1" is also True. Next, test whether the value in "Main1" is True and the values in columns "Sub1" and "Sub2" are both True. In this case, if all values are True, the output would be True. Then repeat for all columns and all possible combinations.
df1:
Main1 Main2 Main3
0 True False True
1 False False False
2 False True True
3 False False True
4 False True True
5 True True True
6 True False False
df2:
Sub1 Sub2 Sub3
0 False False True
1 False True False
2 True False True
3 False False False
4 True True False
5 False False False
6 True True True
The output would be similar to something like this.
Of course, I could do this manually, but it would be time-consuming and there would be room for error.
Main1Sub1 Main1Sub1Sub2 ... Main3Sub2Sub3 Main3Sub3
0 False False ... False True
1 False False ... False False
2 False False ... False True
3 False False ... False False
4 False False ... False False
5 False False ... False False
6 True True ... False False
[7 rows x 18 columns]
Any help on how to tackle this problem is appreciated!
You can use the combinations() function in itertools to generate all the possible combinations of the columns of the two data frames, and then use the DataFrame.product() method in pandas (row-wise, with axis=1) to identify the rows where all the columns in the considered combination are equal to True. I included an example below, which considers all combinations of either 2 or 3 columns.
import pandas as pd
from itertools import combinations
df1 = pd.DataFrame({"Main1": [True, False, False, False, False, True, True],
"Main2": [False, False, True, False, True, True, False],
"Main3": [True, False, True, True, True, True, False]})
df2 = pd.DataFrame({"Sub1": [False, False, True, False, True, False, True],
"Sub2": [False, True, False, False, True, False, True],
"Sub3": [True, False, True, False, False, False, True]})
df3 = df1.join(df2)
all_combinations = list(combinations(df3.columns, 2)) + \
    list(combinations(df3.columns, 3))
for combination in all_combinations:
    df3["".join(list(combination))] = df3[list(combination)].product(axis=1).astype(bool)
df3.drop(labels=["Main1", "Main2", "Main3", "Sub1", "Sub2", "Sub3"], axis=1, inplace=True)
df3
Main1Main2 Main1Main3 ... Main3Sub2Sub3 Sub1Sub2Sub3
0 False True ... False False
1 False False ... False False
2 False False ... False False
3 False False ... False False
4 False False ... False False
5 True True ... False False
6 False False ... False True
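A possible variation, sketched under the same setup: `DataFrame.all(axis=1)` expresses the "every column is True" test directly, and building the result in one go avoids adding and then dropping columns. The name `combined` is illustrative.

```python
from itertools import combinations
import pandas as pd

df1 = pd.DataFrame({"Main1": [True, False, False, False, False, True, True],
                    "Main2": [False, False, True, False, True, True, False],
                    "Main3": [True, False, True, True, True, True, False]})
df2 = pd.DataFrame({"Sub1": [False, False, True, False, True, False, True],
                    "Sub2": [False, True, False, False, True, False, True],
                    "Sub3": [True, False, True, False, False, False, True]})
df3 = df1.join(df2)

# One output column per combination of 2 or 3 input columns;
# a row is True only if every column in the combination is True.
combined = pd.DataFrame({
    "".join(cols): df3[list(cols)].all(axis=1)
    for size in (2, 3)
    for cols in combinations(df3.columns, size)
})
print(combined.shape)                   # (7, 35)
print(combined["Main1Sub1"].tolist())
```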

How to use python apply/lambda/shift function to get the previous row value of that particular column based on the value of 2 columns?

I have 2 columns (FN1 and FN2), and based on these I have to create one more column (Final).
FN1 FN2 Final
False False 1
True True 1
False False 1
True False 2
True True 2
False False 1
True True 1
True True 1
If FN1 is False, Final will be 1.
If FN2 is True, it will be the previous value of Final.
But if FN2 is False, I need to update it with the previous value of Final + 1 (i.e. increment by 1).
I tried doing it using shift(), but that does not help in this scenario.
Use np.select:
import numpy as np
# Note: this relies on df already containing a 'Final' column to shift.
df1 = df.shift()
cond1 = df['FN1'] == False
cond2 = (df['FN1']==True) & (df['FN2'] ==True)
cond3 = (df['FN1']==True) & (df['FN2'] == False)
df['Final'] = np.select([cond1,cond2,cond3], [1, df1['Final'], df1['Final']+1])
print(df)
Using a lambda:
import pandas as pd
df = pd.DataFrame({'FN1': [False, True, False, True, True, False, True, True],
                   'FN2': [False, True, False, False, True, False, True, True]
})
def f(fn1, fn2):
    global previousfinal
    previousfinal = 1 if not fn1 else previousfinal + 1 if not fn2 else previousfinal
    return previousfinal
previousfinal = 1
df['Final'] = df[['FN1', 'FN2']].apply(lambda x: f(*x), axis=1)
print(df)
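For comparison, the same rules can also be sketched without a global variable, using a grouped cumulative sum. This is an alternative technique, not the approach of either answer above; the names `block` and `step` are illustrative, and it assumes the value "before" the first row counts as 1, like `previousfinal = 1` does.

```python
import pandas as pd

df = pd.DataFrame({"FN1": [False, True, False, True, True, False, True, True],
                   "FN2": [False, True, False, False, True, False, True, True]})

# Every row where FN1 is False starts a new block in which Final restarts at 1.
block = (~df["FN1"]).cumsum()

# Within a block, Final grows by 1 on each row with FN1 True and FN2 False.
step = (df["FN1"] & ~df["FN2"]).astype(int)

df["Final"] = 1 + step.groupby(block).cumsum()
print(df["Final"].tolist())  # [1, 1, 1, 2, 2, 1, 1, 1]
```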
