I would like to find, using Python, blocks of x or more equal values in a row in a column.
E.g. given a dataset, I would like to find blocks of three or more consecutive True values in the column Value and create a new column with the result:
ID Value New column
1 True False
2 True False
3 False False
4 False False
5 False False
6 True True
7 True True
8 True True
9 False False
10 False False
11 True False
12 True False
13 False False
14 True True
. True True
. True True
.. True True
One way to solve your problem would be to use two loops. Note that this is quick-and-dirty code and not optimized for efficiency.
First loop - just populate the data for all three columns, with a default value of False for the third column.
Second loop - now go through each row. If the value of the third column is not already set to True, check the value of the second column for that row and the next two rows, i.e. row(i), row(i+1), row(i+2). If the values are all True, then update the value of the third column to True for that row and the next two rows.
For example, if you're on row 6 where the third column is False, you check the values of the second column for rows 6, 7 and 8; they are all True, so you update the value of the third column for rows 6, 7 and 8 to True.
Since you are only checking rows where the third column is False, you will then skip rows 7 and 8 (so they don't get overwritten back to False) and land on row 9, where again the third column is False.
for i in range(len(rows)):
    try:
        if not rows[i][2]:  # ignore any row where the third column is already set to True
            if all([rows[i][1], rows[i+1][1], rows[i+2][1]]):
                rows[i][2] = True
                rows[i+1][2] = True
                rows[i+2][2] = True
    except IndexError:
        continue  # row(i+1) or row(i+2) doesn't exist near the end of the list
The try/except block takes care of the case where you get towards the end of the rows and row(i+1) or row(i+2) doesn't exist.
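If the data is in a pandas DataFrame (as the column names in the question suggest), a vectorized sketch of the same idea is to label consecutive runs of equal values and check each run's length. This is a minimal sketch, assuming the column names 'Value' and 'New column' from the question; the sample data is just the first 13 rows of the example:

import pandas as pd

df = pd.DataFrame({'ID': range(1, 14),
                   'Value': [True, True, False, False, False,
                             True, True, True, False, False,
                             True, True, False]})

# A new run starts whenever the value differs from the previous row.
run_id = (df['Value'] != df['Value'].shift()).cumsum()
# Size of the run each row belongs to.
run_len = df.groupby(run_id)['Value'].transform('size')
# True only for rows that are True and belong to a run of 3 or more.
df['New column'] = df['Value'] & (run_len >= 3)
print(df)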
Can anyone help me, using pandas in Python, to get the following result?
text A B C
index
0 Cool False False True
1 Drunk True False False
2 Study False True False
Output:
Text Result
index
0 Cool False
1 Drunk False
2 Study False
If the sum of a row is more than half the number of boolean columns, True is the more common value in that row.
Try:
df["Result"] = df.drop("text", axis=1).sum(axis=1)>=len(df.columns)//2+1
output = df[["text", "Result"]]
>>> output
text Result
0 Cool False
1 Drunk False
2 Study False
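An equivalent way to express "more than half are True" is to take the row-wise mean of the boolean columns; here the column names A, B and C are taken from the question's example:

# mean of booleans is the fraction of True values; above 0.5 means True is the majority
df["Result"] = df[["A", "B", "C"]].mean(axis=1) > 0.5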
I have a dataframe 'df' from which I want to select the subset where 3 specific columns are not null.
So far I have tried to apply boolean filtering:
mask_df = df[['Empty', 'Peak', 'Full']].notnull()
which gives me the following result
Empty Peak Full
0 True False False
1 False False False
2 True True True
3 False False False
4 False False False
... ... ... ...
2775244 True True True
2775245 True True True
2775246 False False False
2775247 False False False
2775248 False False False
Now I want to select ONLY the rows where the mask for those 3 columns is True (i.e., rows where those 3 columns all have non-null values). If I filter the original dataframe 'df' with this mask, I get the original dataframe full of null values, except where mask_df is True.
I could probably do this by applying a lambda function row-wise, but I would prefer to avoid that computation if there is a simpler way to do it.
Thanks in advance!
use pandas.DataFrame.all:
df[mask_df.all(axis = 1)]
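If the goal is simply to keep the rows where those three columns are all non-null, pandas.DataFrame.dropna with a column subset does the same thing. A minimal sketch with invented sample data (the column names are taken from the question):

import numpy as np
import pandas as pd

df = pd.DataFrame({'Empty': [1.0, np.nan, 3.0],
                   'Peak':  [np.nan, np.nan, 6.0],
                   'Full':  [7.0, np.nan, 9.0]})

# Drops every row that has a null in any of the three columns,
# i.e. keeps exactly the rows where mask_df.all(axis=1) is True.
subset_df = df.dropna(subset=['Empty', 'Peak', 'Full'])
print(subset_df)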
I am checking a pandas dataframe for duplicate rows using the duplicated function, which works well. But how do I print out the row contents of only the items that are True?
For example, if I run:
duplicateCheck = dataSet.duplicated(subset=['Name', 'Date',], keep=False)
print(duplicateCheck)
it outputs:
0 False
1 False
2 False
3 False
4 True
5 True
6 False
7 False
8 False
9 False
I'm looking for something like:
for row in duplicateCheck.keys():
    if duplicateCheck[row] == True:
        print(row, dataSet.loc[row])
Which prints the items from the dataframe that are duplicates.
Why not
duplicateCheck = dataSet.duplicated(subset=['Name', 'Date',], keep=False)
print(dataSet[duplicateCheck])
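As a self-contained sketch (the Name/Date values below are invented for illustration), the boolean Series returned by duplicated can be used directly to index the original frame:

import pandas as pd

dataSet = pd.DataFrame({'Name': ['a', 'b', 'c', 'b'],
                        'Date': ['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-02'],
                        'Qty':  [1, 2, 3, 4]})

duplicateCheck = dataSet.duplicated(subset=['Name', 'Date'], keep=False)
# Boolean indexing keeps only the rows whose mask value is True.
print(dataSet[duplicateCheck])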
I want to print out the row where the value is "True" for more than one column.
For example, if the data frame is the following:
Remove Ignore Repair
0 True False False
1 False True True
2 False True False
I want it to print:
1
Is there an elegant way to do this instead of bunch of if statements?
You can use sum and pass axis=1 to sum across the columns of each row.
import pandas as pd
df = pd.DataFrame({'a':[False, True, False],'b':[False, True, False], 'c':[True, False, False,]})
print(df)
print("Ans: ",df[(df.sum(axis=1)>1)].index.tolist())
output:
a b c
0 False False True
1 True True False
2 False False False
Ans: [1]
To get the first row that meets the criteria:
df.index[df.sum(axis=1).gt(1)][0]
Output:
Out[14]: 1
Since you can get multiple matches, you can omit the [0] to get all the rows that meet your criteria.
You can just sum booleans as they will be interpreted as True=1, False=0:
df.sum(axis=1) > 1
So to filter to rows where this evaluates as True:
df.loc[df.sum(axis=1) > 1]
Or the same thing but being more explicit about converting the booleans to integers:
df.loc[df.astype(int).sum(axis=1) > 1]
I have a pandas dataframe with the column "Values" that has comma separated values:
Row|Values
1|1,2,3,8
2|1,4
I want to create columns based on those comma-separated values and assign a boolean indicating whether the row contains that value, as follows:
Row|1,2,3,4,8
1|true,true,true,false,true
2|true,false,false,true,false
How can I accomplish that?
Thanks in advance
Just use str.get_dummies (see the pandas documentation), and astype(bool) to change 1 to True and 0 to False:
df.set_index('Row')['Values'].str.get_dummies(',').astype(bool)
Out[318]:
1 2 3 4 8
Row
1 True True True False True
2 True False False True False
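For completeness, a self-contained version of the same approach that reconstructs the sample frame from the question; note that the resulting column labels are the strings '1', '2', '3', '4', '8':

import pandas as pd

df = pd.DataFrame({'Row': [1, 2], 'Values': ['1,2,3,8', '1,4']})

# Split each cell on ',' and build one indicator column per distinct value.
dummies = df.set_index('Row')['Values'].str.get_dummies(',').astype(bool)
print(dummies)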