I am checking a pandas DataFrame for duplicate rows using the duplicated function, which works well. But how do I print the contents of only the rows that are flagged True?
For example, if I run:
duplicateCheck = dataSet.duplicated(subset=['Name', 'Date',], keep=False)
print(duplicateCheck)
it outputs:
0 False
1 False
2 False
3 False
4 True
5 True
6 False
7 False
8 False
9 False
I'm looking for something like:
for row in duplicateCheck.keys():
    if row == True:
        print(row, duplicateCheck[row])
Which prints the items from the dataframe that are duplicates.
Why not index the DataFrame with the boolean Series directly:
duplicateCheck = dataSet.duplicated(subset=['Name', 'Date',], keep=False)
print(dataSet[duplicateCheck])
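A minimal, self-contained sketch of the same idea. The sample rows and the Score column are made up for illustration; only the Name and Date column names come from the question.

```python
import pandas as pd

# Hypothetical sample data; only 'Name' and 'Date' are taken from the question.
dataSet = pd.DataFrame({
    "Name": ["Ann", "Bob", "Ann", "Cid"],
    "Date": ["2020-01-01", "2020-01-02", "2020-01-01", "2020-01-03"],
    "Score": [1, 2, 3, 4],
})

# keep=False marks every member of a duplicate group, not only the repeats.
duplicateCheck = dataSet.duplicated(subset=["Name", "Date"], keep=False)

# Indexing with the boolean Series keeps only the rows where it is True.
duplicates = dataSet[duplicateCheck]
print(duplicates)
```

Rows 0 and 2 share the same Name and Date, so those are the two rows that get printed.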
Related
Annotating the most common value in each row and making a new column with the result.
Can anyone help me get this result using pandas in Python?
text A B C
index
0 Cool False False True
1 Drunk True False False
2 Study False True False
Output:
Text Result
index
0 Cool False
1 Drunk False
2 Study False
If more than half of the boolean columns in a row are True, then True is the more common value for that row.
Try:
df["Result"] = df.drop("text", axis=1).mean(axis=1) > 0.5
output = df[["text", "Result"]]
>>> output
text Result
0 Cool False
1 Drunk False
2 Study False
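A self-contained sketch of the row-wise majority check, counting only the boolean columns so that the text column does not affect the threshold. The first three rows are from the question; the fourth row ("Party") is a hypothetical addition to show a True result.

```python
import pandas as pd

df = pd.DataFrame({
    "text": ["Cool", "Drunk", "Study", "Party"],  # "Party" is a made-up extra row
    "A": [False, True, False, True],
    "B": [False, False, True, True],
    "C": [True, False, False, False],
})

# Work only on the boolean columns.
flags = df.drop(columns="text")

# True counts as 1 when summed, so this is "more than half of the flags are True".
df["Result"] = flags.sum(axis=1) > flags.shape[1] / 2

output = df[["text", "Result"]]
print(output)
```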
How do I use the output of pandas str.contains to keep only the rows that are True?
Code
df_2clean["p2_conf"].astype(str).str.contains(r'[^0-9+-:.\s]')
Output
0 False
1 False
2 False
3 False
4 True
Try this:
df_2subset = df_2clean[df_2clean["p2_conf"].astype(str).str.contains(r'[^0-9+-:.\s]')]
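A small runnable sketch of this filter. The sample values are invented; only the column name p2_conf and the pattern come from the question. Storing the mask in a variable (here called mask, a name chosen for this example) makes it easy to inspect before subsetting.

```python
import pandas as pd

# Hypothetical values; only the column name and the regex come from the question.
df_2clean = pd.DataFrame({"p2_conf": [0.95, "0.80", "n/a", "1.2e-3"]})

# True where the value, as a string, contains a character outside the allowed set.
mask = df_2clean["p2_conf"].astype(str).str.contains(r'[^0-9+-:.\s]')

# The mask is already boolean, so no comparison with True is needed.
df_2subset = df_2clean[mask]
print(df_2subset)
```

Note that inside the character class `+-:` is read as a range from `+` to `:`, so characters such as `,` and `/` are also allowed by this pattern.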
I have a pandas dataframe with the column "Values" that has comma separated values:
Row|Values
1|1,2,3,8
2|1,4
I want to create columns based on the CSV, and assign a boolean indicating if the row has that value, as follows:
Row|1,2,3,4,8
1|true,true,true,false,true
2|true,false,false,true,false
How can I accomplish that?
Thanks in advance
Just use str.get_dummies, then astype(bool) to change 1 to True and 0 to False:
df.set_index('Row')['Values'].str.get_dummies(',').astype(bool)
Out[318]:
1 2 3 4 8
Row
1 True True True False True
2 True False False True False
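For reference, a self-contained version of this answer built from the two rows in the question. The variable name flags is chosen for this sketch.

```python
import pandas as pd

df = pd.DataFrame({"Row": [1, 2], "Values": ["1,2,3,8", "1,4"]})

# Split each string on ',' and build one indicator column per distinct value,
# then convert the 0/1 indicators to booleans.
flags = df.set_index("Row")["Values"].str.get_dummies(",").astype(bool)
print(flags)
```

The resulting column labels are the strings '1', '2', '3', '4' and '8', not integers, so select a column with flags['4'] rather than flags[4].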
So I have a pytest testing the results of a query that returns pandas dataframe.
I want to assert that every value in a particular column col contains a given input as a substring.
The expression below gives me the rows (as a DataFrame) whose col value contains the input. How can I assert that this holds for all rows?
assert result_df[result_df['col'].astype(str).str.contains(category)].bool == True
doesn't work
Try this:
assert result_df['col'].astype(str).str.contains(category).all() == True
Please refer to the pandas docs for more info: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.all.html
Your code doesn't work because it tests whether the DataFrame object (via its bool attribute) is truthy, not whether all of the values in it are True.
I believe you need Series.all to check whether all values of the boolean Series are True:
assert result_df['col'].astype(str).str.contains(category).all()
Sample:
result_df = pd.DataFrame({
'col':list('aaabbb')
})
print (result_df)
col
0 a
1 a
2 a
3 b
4 b
5 b
category = 'b'
assert result_df['col'].astype(str).str.contains(category).all()
AssertionError
Detail:
print (result_df['col'].astype(str).str.contains(category))
0 False
1 False
2 False
3 True
4 True
5 True
Name: col, dtype: bool
print (result_df['col'].astype(str).str.contains(category).all())
False
category = 'a|b'
assert result_df['col'].astype(str).str.contains(category).all()
print (result_df['col'].astype(str).str.contains(category))
0 True
1 True
2 True
3 True
4 True
5 True
Name: col, dtype: bool
print (result_df['col'].astype(str).str.contains(category).all())
True
Found it. assert result_df['col'].astype(str).str.contains(category).all() works (thanks to @jezrael for suggesting all).
Note that the parentheses matter: asserting .bool or .all without calling it always passes, because the method object itself is truthy.
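One thing worth checking with this kind of assertion is whether the method is actually called. The sketch below reuses the sample data from the answer above; the names mask and all_match are chosen for this example. It suggests that an assert on the uncalled method can never fail, while the called version reflects the data.

```python
import pandas as pd

result_df = pd.DataFrame({"col": list("aaabbb")})
category = "b"

mask = result_df["col"].astype(str).str.contains(category)

# Not called: a bound method object is always truthy, so this passes
# even though three rows do not contain 'b'.
assert mask.all

# Called: this evaluates the data and is False for this sample.
all_match = bool(mask.all())
print(all_match)
```

In a pytest test, `assert mask.all()` is therefore the form that can actually catch a non-matching row.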
I am relatively new to Python/pandas and am struggling to extract the correct data from a pd.DataFrame. What I have is a DataFrame with 3 columns:
data =
Position Letter Value
1 a TRUE
2 f FALSE
3 c TRUE
4 d TRUE
5 k FALSE
What I want to do is put all of the TRUE rows into a new Dataframe so that the answer would be:
answer =
Position Letter Value
1 a TRUE
3 c TRUE
4 d TRUE
I know that you can access a particular column using
data['Value']
but how do I extract all of the TRUE rows?
Thanks for any help and advice,
Alex
You can test which Values are True:
In [11]: data['Value'] == True
Out[11]:
0 True
1 False
2 True
3 True
4 False
Name: Value, dtype: bool
and then use boolean indexing to pull out those rows:
In [12]: data[data['Value'] == True]
Out[12]:
Position Letter Value
0 1 a True
2 3 c True
3 4 d True
*Note: if the values are actually the strings 'TRUE' and 'FALSE' (they probably shouldn't be!) then use:
data['Value'] == 'TRUE'
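A short sketch covering the string case, assuming the Value column really holds the strings 'TRUE' and 'FALSE' (the question does not say which it is). Converting the column to real booleans once means later filters can use the column directly as a mask.

```python
import pandas as pd

# Data from the question, with 'Value' stored as strings (an assumption).
data = pd.DataFrame({
    "Position": [1, 2, 3, 4, 5],
    "Letter": list("afcdk"),
    "Value": ["TRUE", "FALSE", "TRUE", "TRUE", "FALSE"],
})

# The comparison yields a boolean Series; store it back as the column.
data["Value"] = data["Value"] == "TRUE"

# A boolean column can be used as the mask itself, without '== True'.
answer = data[data["Value"]]
print(answer)
```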
You can wrap your value/values in a list and do the following:
new_df = df.loc[df['yourColumnName'].isin(['your', 'list', 'items'])]
This will return a new DataFrame consisting of the rows whose value in that column is one of the items in the list.
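As an illustration of isin, here is a minimal sketch using the Letter column from the earlier question. The list wanted is a hypothetical set of values to keep.

```python
import pandas as pd

df = pd.DataFrame({
    "Position": [1, 2, 3, 4, 5],
    "Letter": list("afcdk"),
})

# Hypothetical list of values to keep.
wanted = ["a", "c", "d"]

# isin returns True for each row whose 'Letter' is one of the wanted values.
new_df = df.loc[df["Letter"].isin(wanted)]
print(new_df)
```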