Pandas incremental values when boolean changes - python

I have a dataframe with a column in which boolean values alternate. I want to create an incremental value series based on those boolean changes: the counter should increment only when the boolean value differs from the previous value, and I want to do this without a loop.
For example, here's the dataframe:
column
0 True
1 True
2 False
3 False
4 False
5 True
I want to get this:
column inc
0 True 1
1 True 1
2 False 2
3 False 2
4 False 2
5 True 3

Compare the column with its shifted values using ne (not equal), then take the cumulative sum:
df['inc'] = df['column'].ne(df['column'].shift()).cumsum()
print (df)
column inc
0 True 1
1 True 1
2 False 2
3 False 2
4 False 2
5 True 3
Detail:
print (df['column'].ne(df['column'].shift()))
0 True
1 False
2 True
3 False
4 False
5 True
Name: column, dtype: bool
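
For reference, a minimal self-contained version of the example above (DataFrame and column names taken from the question):
import pandas as pd

#reproduce the sample column from the question
df = pd.DataFrame({'column': [True, True, False, False, False, True]})

#a new group starts wherever the value differs from the previous row,
#and cumsum turns those change points into an increasing counter
df['inc'] = df['column'].ne(df['column'].shift()).cumsum()
print (df)
column inc
0 True 1
1 True 1
2 False 2
3 False 2
4 False 2
5 True 3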

Related

Fill rows in df.column between True rows if the number of rows between them is less than something

I have a df where I want to fill rows in column values with True if the number of rows between True values in column values is less than two.
counter  values
1        True
2        False
3        False
4        True
5        False
6        True
7        True
8        False
9        True
10       False
11       False
The result I want is the df below:
counter  values
1        True
2        False
3        False
4        True
5        True
6        True
7        True
8        True
9        True
10       False
11       False
You can form groups that each start with a True; if a group has 2 items or fewer, mark it True, then compute the boolean OR with the original column:
N = 2
fill = df['values'].groupby(df['values'].cumsum()).transform(lambda g: len(g) <= N)
df['values'] = df['values'] | fill  # or df['values'] |= fill
output (shown here as a new column values2 for clarity):
counter values values2
0 1 True True
1 2 False False
2 3 False False
3 4 True True
4 5 False True
5 6 True True
6 7 True True
7 8 False True
8 9 True True
9 10 False False
10 11 False False
Another option, which only works in the particular case of N=2, is to check whether both the row before and the row after are True:
df['values'] = df['values']|(df['values'].shift()&df['values'].shift(-1))
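
A self-contained sketch of the group-based approach, using the data from the question (the values2 column mirrors the output table above):
import pandas as pd

df = pd.DataFrame({'counter': range(1, 12),
                   'values': [True, False, False, True, False, True,
                              True, False, True, False, False]})

N = 2
#each cumsum value labels a run that starts at a True;
#runs of length <= N (one True plus at most N-1 Falses) are filled
fill = df['values'].groupby(df['values'].cumsum()).transform(lambda g: len(g) <= N)
df['values2'] = df['values'] | fill
print (df)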

Change all values in column if a condition is met within a group in Pandas dataframe

I have a dataframe that contains many rows, and a condition that is checked for each row and saved as a boolean in a column named condition. If this condition is False for any row within a group, I want to create a new column that is set to False for the whole group, and to True if the condition for every row within the group is set to True.
The final dataframe should look like this:
group condition final_condition
0 1 False False
1 1 False False
2 1 True False
3 2 True True
4 2 True True
5 3 True False
6 3 False False
I have tried many different things but can't find a solution, so any help is appreciated.
Use groupby() with transform('all'):
df['final_condition'] = df.groupby('group')['condition'].transform('all')
output of df:
group condition final_condition
0 1 False False
1 1 False False
2 1 True False
3 2 True True
4 2 True True
5 3 True False
6 3 False False
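
A self-contained sketch with the sample data (column names as in the question):
import pandas as pd

df = pd.DataFrame({'group': [1, 1, 1, 2, 2, 3, 3],
                   'condition': [False, False, True, True, True, True, False]})

#'all' is True only if every row of the group is True;
#transform broadcasts that result back to every row of the group
df['final_condition'] = df.groupby('group')['condition'].transform('all')
print (df)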

How to get the number of times the same boolean occurs in two columns in Python

I have a dataframe:
A B C D
0 True 5 True True
1 True 6 False False
2 False 5 True True
3 False 8 True False
4 True 2 True True
When column D is True, it should print the count of how many times column A and column C are also True.
Expected Output
A : 2
C : 3
You can filter rows by the boolean column D using boolean indexing with DataFrame.loc, which also lets you select columns by name, and then count the True values with sum:
s = df.loc[df.D, ['A','C']].sum()
print (s)
A 2
C 3
dtype: int64
Details:
print (df.loc[df.D, ['A','C']])
A C
0 True True
2 False True
4 True True
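
A self-contained sketch with the question's data:
import pandas as pd

df = pd.DataFrame({'A': [True, True, False, False, True],
                   'B': [5, 6, 5, 8, 2],
                   'C': [True, False, True, True, True],
                   'D': [True, False, True, False, True]})

#keep rows where D is True, select columns A and C,
#then sum counts the True values per column
s = df.loc[df.D, ['A', 'C']].sum()
print (s)
A 2
C 3
dtype: int64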

List columns in a data frame that have missing values coded as '?'

List the names of the columns of a data frame along with the count of missing values, where missing values are coded as '?', using pandas and numpy.
import numpy as np
import pandas as pd
bridgeall = pd.read_excel('bridge.xlsx', sheet_name='Sheet1')
#print(bridgeall)
bridge_sep = bridgeall.iloc[:, 0].str.split(',', expand=True)
bridge_sep.columns = ['IDENTIF','RIVER', 'LOCATION', 'ERECTED', 'PURPOSE', 'LENGTH', 'LANES','CLEAR-G', 'T-OR-D',
'MATERIAL', 'SPAN', 'REL-L', 'TYPE']
print(bridge_sep)
Data: I am posting a snippet; it's actually [107 rows x 13 columns].
IDENTIF RIVER LOCATION ERECTED ... MATERIAL SPAN REL-L TYPE
0 E2 A ? CRAFTS ... WOOD SHORT ? WOOD
1 E3 A 39 CRAFTS ... WOOD ? S WOOD
2 E5 A ? CRAFTS ... WOOD SHORT S WOOD
Output required:
LOCATION 2
SPAN 1
REL-L 1
Compare all values with eq (==) and count the occurrences with sum (True values are counted as 1), then remove the zero counts (columns with no '?') with boolean indexing:
s = df.eq('?').sum()
s = s[s != 0]
print (s)
LOCATION 2
SPAN 1
REL-L 1
dtype: int64
Finally, to get a DataFrame, add reset_index:
df1 = s.reset_index()
df1.columns = ['names','count']
print (df1)
names count
0 LOCATION 2
1 SPAN 1
2 REL-L 1
EDIT:
np.random.seed(100)
df = pd.DataFrame(np.random.randint(10, size=(5,5)))
print (df)
0 1 2 3 4
0 8 8 3 7 7
1 0 4 2 5 2
2 2 2 1 0 8
3 4 0 9 6 2
4 4 1 5 3 4
#compare with a Series of the same length
#whose index values match the index/columns of the DataFrame
s = pd.Series(np.arange(5))
print (s)
0 0
1 1
2 2
3 3
4 4
dtype: int32
#compare columns
print (df.eq(s, axis=0))
0 1 2 3 4
0 False False False False False
1 False False False False False
2 True True False False False
3 False False False False False
4 True False False False True
#compare rows
print (df.eq(s, axis=1))
0 1 2 3 4
0 False False False False False
1 True False True False False
2 False False False False False
3 False False False False False
4 False True False True True
If your DataFrame is named df, try (df == '?').sum()
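
A small reproducible sketch of the counting step, built on a toy frame with the same '?' placeholders as the snippet above (only a few of the 13 columns are included):
import pandas as pd

df = pd.DataFrame({'IDENTIF': ['E2', 'E3', 'E5'],
                   'LOCATION': ['?', '39', '?'],
                   'SPAN': ['SHORT', '?', 'SHORT'],
                   'REL-L': ['?', 'S', 'S']})

#True where a cell equals '?', sum counts the Trues per column
s = df.eq('?').sum()
s = s[s != 0]   #drop columns without any '?'
print (s)
LOCATION 2
SPAN 1
REL-L 1
dtype: int64

#and as a two-column DataFrame
df1 = s.reset_index()
df1.columns = ['names', 'count']
print (df1)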

Python - Pandas - DataFrame - Explode single column into multiple boolean columns based on conditions

Good morning chaps,
Any pythonic way to explode a dataframe column into multiple columns with boolean flags, based on some condition (str.contains in this case)?
Let's say I have this:
Position Letter
1 a
2 b
3 c
4 b
5 b
And I'd like to achieve this:
Position Letter is_a is_b is_c
1 a TRUE FALSE FALSE
2 b FALSE TRUE FALSE
3 c FALSE FALSE TRUE
4 b FALSE TRUE FALSE
5 b FALSE TRUE FALSE
I can do it with a loop through 'abc', explicitly creating new df columns, but I'm wondering if some built-in method already exists in pandas. The number of possible values, and hence the number of new columns, is variable.
Thanks and regards.
use Series.str.get_dummies():
In [31]: df.join(df.Letter.str.get_dummies())
Out[31]:
Position Letter a b c
0 1 a 1 0 0
1 2 b 0 1 0
2 3 c 0 0 1
3 4 b 0 1 0
4 5 b 0 1 0
or
In [32]: df.join(df.Letter.str.get_dummies().astype(bool))
Out[32]:
Position Letter a b c
0 1 a True False False
1 2 b False True False
2 3 c False False True
3 4 b False True False
4 5 b False True False
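
If you want column names in the is_* form from the question, you can also add a prefix (add_prefix is a regular DataFrame method, and the bool cast matches the second variant above):
import pandas as pd

df = pd.DataFrame({'Position': [1, 2, 3, 4, 5],
                   'Letter': ['a', 'b', 'c', 'b', 'b']})

#one boolean column per distinct letter, prefixed with 'is_'
flags = df.Letter.str.get_dummies().astype(bool).add_prefix('is_')
print (df.join(flags))
Position Letter is_a is_b is_c
0 1 a True False False
1 2 b False True False
2 3 c False False True
3 4 b False True False
4 5 b False True False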
