I want to break a dataframe into blocks, each running from one True value to the next True value:
data                     flag
MODS start 12/12/2020    True
Some data                False
Some data                False
MODS start 30/12/2020    True
Some data                False
Some data                False
To:
data                     flag
MODS start 12/12/2020    True
Some data                False
Some data                False

data                     flag
MODS start 30/12/2020    True
Some data                False
Some data                False
Please help
You can use cumsum to create groups, then query the dataframe for each group:
df = pd.DataFrame({'data':['MODS start 12/12/202','Some data', 'Some data', 'MODS starts 30/12/2020', 'Some data', 'Some data'],
'flag':[True, False, False, True, False, False]})
df['grp'] = df['flag'].cumsum()
print(df)
Output:
data flag grp
0 MODS start 12/12/202 True 1
1 Some data False 1
2 Some data False 1
3 MODS starts 30/12/2020 True 2
4 Some data False 2
5 Some data False 2
Then use:
df.query('grp == 1')
data flag grp
0 MODS start 12/12/202 True 1
1 Some data False 1
2 Some data False 1
and
df.query('grp == 2')
data flag grp
3 MODS starts 30/12/2020 True 2
4 Some data False 2
5 Some data False 2
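If all blocks are needed at once rather than one query per group, the same cumsum labels can be passed to groupby. This is a minimal sketch on the question's sample data; the names block_id and blocks are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "data": ["MODS start 12/12/2020", "Some data", "Some data",
             "MODS start 30/12/2020", "Some data", "Some data"],
    "flag": [True, False, False, True, False, False],
})

# Each True starts a new block, so the running count of True values
# labels every row with the block it belongs to.
block_id = df["flag"].cumsum()

# groupby yields (label, sub-frame) pairs; keep only the sub-frames.
blocks = [block for _, block in df.groupby(block_id)]

for block in blocks:
    print(block, end="\n\n")
```

Note that any rows appearing before the first True would get the label 0 and form a block of their own.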
You can use numpy.split:
np.split(df, df.index[df.flag])[1:]
Here, I used [1:] because numpy.split also returns the group before the first index, even if it is empty.
That said, you can also use a simple list comprehension:
idx = df.index[df.flag].tolist() + [df.shape[0]]
[df.iloc[idx[i]:idx[i+1]] for i in range(len(idx)-1)]
Output (both approaches):
data flag
0 MODS start 12/12/2020 True
1 Some data False
2 Some data False
data flag
3 MODS start 30/12/2020 True
4 Some data False
5 Some data False
Get a list of indices of rows with flag = True
true_idx = df[df['flag']==True].index
n = len(true_idx)
Loop over true_idx and create a list of dataframes from each true index to next
new_dfs_list = [df.iloc[ true_idx[i]:true_idx[i+1], :] for i in range(n-1)]
append last df from last true index to the tail of df
new_dfs_list.append(df.iloc[ true_idx[n-1]:, :])
access any of your new_dfs by index
print(new_dfs_list[-1])
Related
How can I search for duplicate columns in a dataframe and then create a new column with the same name? The new column should be the result of applying the 'OR' operator to these columns, and the old duplicated columns should then be dropped.
Example:
I tried to create a unique column 'job' that is the result of applying the 'OR' operator to the two 'job' columns in the table below.
My table looks like this:
name   job    maried  children  job
John   True   True    True      True
Peter  True   False   True      True
Karl   False  True    True      True
jack   False  False   False     False
The result that I want is:
name   job    maried  children
John   True   True    True
Peter  True   False   True
Karl   True   True    True
jack   False  False   False
I tried to do this (df1 is my table):
df_join = pd.DataFrame()
df1_dulp = pd.DataFrame()
df_tmp = pd.DataFrame()
for column in df1.columns:
    df1_dulp = df1.filter(like=str(column))
    if df1_dulp.shape[1] >= 2:
        for i in range(0, df1_dulp.shape[1]):
            df_tmp += df1_dulp.iloc[:,i]
        if column in df1_dulp.columns:
            df1_dulp.drop(column, axis=1, inplace=True)
        df_join = df_join.join(df1_dulp, how = 'left', lsuffix='left', rsuffix='right')
The result is an empty table (df_join).
You can select the boolean columns with select_dtypes, then aggregate as OR with groupby.any on columns:
out = (df
       .select_dtypes(exclude='bool')
       .join(df.select_dtypes('bool')
               .groupby(level=0, axis=1, sort=False).any()
            )
      )
output:
name job maried children
0 John True True True
1 Peter True False True
2 Karl True True True
3 jack False False False
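Recent pandas releases warn that groupby with axis=1 is deprecated. A possible alternative, sketched here on the question's table, is to transpose the boolean columns, group on the (now row) labels, and transpose back. The variable names bools and merged are illustrative only.

```python
import pandas as pd

# The question's table, including the duplicated 'job' column.
df = pd.DataFrame(
    [["John", True, True, True, True],
     ["Peter", True, False, True, True],
     ["Karl", False, True, True, True],
     ["jack", False, False, False, False]],
    columns=["name", "job", "maried", "children", "job"],
)

bools = df.select_dtypes("bool")

# After transposing, the column names become index labels, so duplicates
# can be grouped on level 0 and OR-ed together with any().
merged = bools.T.groupby(level=0, sort=False).any().T

out = df.select_dtypes(exclude="bool").join(merged)
print(out)
```

The transpose round trip is cheap for a handful of columns, though it may be slower on very wide frames.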
For each row, I want to find the most common value across the columns and store it in a new column.
Can anyone help me get this result using pandas in Python?
text A B C
index
0 Cool False False True
1 Drunk True False False
2 Study False True False
Output:
Text Result
index
0 Cool False
1 Drunk False
2 Study False
If the sum of each row is more than half the number of boolean columns, True is the more common value. Count only the boolean columns when computing the threshold, not the text column.
Try:
flags = df.drop("text", axis=1)
df["Result"] = flags.sum(axis=1) > flags.shape[1] / 2
output = df[["text", "Result"]]
>>> output
text Result
0 Cool False
1 Drunk False
2 Study False
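The sample rows all come out False, so the rule is easier to check with a row where True is in the majority. The sketch below adds one hypothetical row ("Party", not in the original data) and uses the row mean, which for booleans is the fraction of True values.

```python
import pandas as pd

df = pd.DataFrame({
    "text": ["Cool", "Drunk", "Study", "Party"],  # "Party" is a made-up extra row
    "A": [False, True, False, True],
    "B": [False, False, True, True],
    "C": [True, False, False, False],
})

# Only the boolean columns take part in the vote.
flags = df.drop(columns="text")

# The mean of booleans is the fraction of True values in the row;
# above 0.5 means True is the more common value.
df["Result"] = flags.mean(axis=1) > 0.5

print(df[["text", "Result"]])
```

With an even number of boolean columns a tie is possible; the strict comparison used here resolves ties to False.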
I am checking a pandas dataframe for duplicate rows using the duplicated function, which works well. But how do I print out the row contents of only the items that are True?
For example, if I run:
duplicateCheck = dataSet.duplicated(subset=['Name', 'Date',], keep=False)
print(duplicateCheck)
it outputs:
0 False
1 False
2 False
3 False
4 True
5 True
6 False
7 False
8 False
9 False
I'm looking for something like:
for row in duplicateCheck.keys():
    if row == True:
        print(row, duplicateCheck[row])
Which prints the items from the dataframe that are duplicates.
Why not
duplicateCheck = dataSet.duplicated(subset=['Name', 'Date',], keep=False)
print(dataSet[duplicateCheck])
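For a self-contained illustration of that boolean-mask lookup, here is a small sketch. The column names Name and Date come from the question; the Score column and all values are invented for the example.

```python
import pandas as pd

# Hypothetical data: rows 0 and 2 share the same Name and Date.
dataSet = pd.DataFrame({
    "Name": ["Ann", "Bob", "Ann", "Cid"],
    "Date": ["2021-01-01", "2021-01-02", "2021-01-01", "2021-01-03"],
    "Score": [1, 2, 3, 4],
})

# keep=False marks every member of a duplicate group, not just the repeats.
duplicateCheck = dataSet.duplicated(subset=["Name", "Date"], keep=False)

# Indexing with the boolean Series keeps only the rows marked True.
duplicates = dataSet[duplicateCheck]
print(duplicates)
```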
I want to print out the row where the value is "True" for more than one column.
For example if data frame is the following:
Remove Ignore Repair
0 True False False
1 False True True
2 False True False
I want it to print:
1
Is there an elegant way to do this instead of bunch of if statements?
You can use sum and pass axis=1 to add up the True values across the columns of each row.
import pandas as pd
df = pd.DataFrame({'a':[False, True, False],'b':[False, True, False], 'c':[True, False, False,]})
print(df)
print("Ans: ",df[(df.sum(axis=1)>1)].index.tolist())
output:
a b c
0 False False True
1 True True False
2 False False False
Ans: [1]
To get the first row that meets the criteria:
df.index[df.sum(axis=1).gt(1)][0]
Output:
Out[14]: 1
Since you can get multiple matches, you can drop the [0] to get all the rows that meet your criteria.
You can just sum booleans as they will be interpreted as True=1, False=0:
df.sum(axis=1) > 1
So to filter to rows where this evaluates as True:
df.loc[df.sum(axis=1) > 1]
Or the same thing but being more explicit about converting the booleans to integers:
df.loc[df.astype(int).sum(axis=1) > 1]
I am relatively new to Python/Pandas and am struggling with extracting the correct data from a pd.Dataframe. What I actually have is a Dataframe with 3 columns:
data =
Position Letter Value
1 a TRUE
2 f FALSE
3 c TRUE
4 d TRUE
5 k FALSE
What I want to do is put all of the TRUE rows into a new Dataframe so that the answer would be:
answer =
Position Letter Value
1 a TRUE
3 c TRUE
4 d TRUE
I know that you can access a particular column using
data['Value']
but how do I extract all of the TRUE rows?
Thanks for any help and advice,
Alex
You can test which Values are True:
In [11]: data['Value'] == True
Out[11]:
0 True
1 False
2 True
3 True
4 False
Name: Value, dtype: bool
and then use boolean indexing to pull out those rows:
In [12]: data[data['Value'] == True]
Out[12]:
Position Letter Value
0 1 a True
2 3 c True
3 4 d True
*Note: if the values are actually the strings 'TRUE' and 'FALSE' (they probably shouldn't be!) then use:
data['Value'] == 'TRUE'
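If the column does hold strings, one option is to convert it to real booleans once and filter on it directly afterwards. A minimal sketch, assuming the only values present are the strings 'TRUE' and 'FALSE':

```python
import pandas as pd

data = pd.DataFrame({
    "Position": [1, 2, 3, 4, 5],
    "Letter": ["a", "f", "c", "d", "k"],
    "Value": ["TRUE", "FALSE", "TRUE", "TRUE", "FALSE"],  # strings, not booleans
})

# Comparing against 'TRUE' gives a genuine boolean column
# (anything other than 'TRUE' becomes False).
data["Value"] = data["Value"] == "TRUE"

# A boolean column can be used as the mask itself.
answer = data[data["Value"]]
print(answer)
```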
You can wrap your value/values in a list and do the following:
new_df = df.loc[df['yourColumnName'].isin(['your', 'list', 'items'])]
This will return a new dataframe consisting of the rows of df whose value in that column is one of your list items.
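As a concrete sketch of the isin approach, here it is applied to the Letter column of the question's data. The list name wanted is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Position": [1, 2, 3, 4, 5],
    "Letter": ["a", "f", "c", "d", "k"],
})

# Values to keep; a row is selected when its Letter is in this list.
wanted = ["a", "c", "d"]

new_df = df.loc[df["Letter"].isin(wanted)]
print(new_df)
```

For a boolean column, passing [True] to isin would select the same rows as the equality test shown earlier.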