How to select rows based on column value string checking? [duplicate] - python

This question already has answers here:
Filter pandas DataFrame by substring criteria
(17 answers)
Closed 1 year ago.
df.loc[df['name'] == 'Mary']
The above gets the rows where 'name' is exactly 'Mary'. What if I want the rows where 'name' contains 'Mary', rather than being exactly equal to it?

You can use the pd.Series.str.contains() method for this.
df[df['name'].str.contains('Mary')]
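A minimal sketch of the same filter on made-up data follows. The sample names are hypothetical (the question does not show its DataFrame), and passing na=False and regex=False is an assumption about what is wanted: treat missing names as non-matches and match 'Mary' as a literal substring.

```python
import pandas as pd

# Hypothetical sample data; the original question does not show its DataFrame.
df = pd.DataFrame({'name': ['Mary', 'Mary Ann', 'Rosemary', 'John', None]})

# na=False treats missing names as non-matches instead of leaving
# missing values in the boolean mask; regex=False matches 'Mary' literally.
mask = df['name'].str.contains('Mary', na=False, regex=False)
matches = df[mask]

print(matches['name'].tolist())  # ['Mary', 'Mary Ann']
```

The match is case-sensitive by default, which is why 'Rosemary' is not selected here; pass case=False if it should be.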

Related

Python: Create separate dataframe from a dataframe with one category of the category variable column [duplicate]

This question already has answers here:
How do I select rows from a DataFrame based on column values?
(16 answers)
Closed 2 years ago.
I have a dataframe like the one below with three status values: 'complete', 'start' and 'fail'. I want to create another dataframe from it that keeps only the "fail" entries with their corresponding level numbers.
Let's do this:
fail_df = df[df['status']=='fail']
or this with str.contains:
fail_df = df[df['status'].str.contains(r'fail',case=False)]
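Both ways give a new dataframe with the 'fail' rows. Note that the str.contains version is a looser match: with case=False it also picks up variants such as 'Fail', and it matches any value that merely contains 'fail' (for example 'failed'). It does not correct misspellings, so use it only if that looser match is what you want.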
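To see how the two filters differ, here is a small sketch on hypothetical data (the column names 'level' and 'status' follow the question; the values are invented). The third filter, which normalizes whitespace and case before an exact comparison, is an extra option not in the original answer.

```python
import pandas as pd

# Hypothetical data standing in for the dataframe described in the question.
df = pd.DataFrame({
    'level': [1, 2, 3, 4, 5],
    'status': ['complete', 'start', 'fail', 'Fail', 'failed'],
})

# Exact, case-sensitive comparison.
exact = df[df['status'] == 'fail']

# Case-insensitive substring match: also keeps 'Fail' and 'failed'.
loose = df[df['status'].str.contains('fail', case=False)]

# Exact comparison after trimming whitespace and lowercasing.
normalized = df[df['status'].str.strip().str.lower() == 'fail']

print(exact['level'].tolist())       # [3]
print(loose['level'].tolist())       # [3, 4, 5]
print(normalized['level'].tolist())  # [3, 4]
```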

Using Python, Get Partial Sum of DF column based on a condition from another sorted column [duplicate]

This question already has answers here:
How do I Pandas group-by to get sum?
(11 answers)
Closed 2 years ago.
DataFrame in question
I need to find the total invoice value for each supplier and create a new dataframe with one row per unique supplier name, as follows.
Final Output desired
Try this:
sum_by_supplier = (df.groupby('Supplier Name')['Invoice Value'].sum()).reset_index()
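As a runnable illustration, the sketch below applies the same group-and-sum to invented invoices (the question's data is only shown as an image, so the supplier names and amounts are hypothetical). It uses as_index=False, which should give the same result as calling reset_index() afterwards.

```python
import pandas as pd

# Hypothetical invoices; the question's data is not available as text.
df = pd.DataFrame({
    'Supplier Name': ['Acme', 'Bolt', 'Acme', 'Corex', 'Bolt'],
    'Invoice Value': [100.0, 250.0, 50.0, 75.0, 25.0],
})

# as_index=False keeps 'Supplier Name' as a regular column,
# so no reset_index() call is needed afterwards.
sum_by_supplier = df.groupby('Supplier Name', as_index=False)['Invoice Value'].sum()

print(sum_by_supplier)
```

Each supplier appears once in the result, with its invoice values added together.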

Find elements from a list in a dataframe column [duplicate]

This question already has answers here:
Filter dataframe rows if value in column is in a set list of values [duplicate]
(7 answers)
How to filter Pandas dataframe using 'in' and 'not in' like in SQL
(11 answers)
Closed 4 years ago.
I come today with a rather simple problem. I have a dataframe containing values for individuals.
import pandas as pd
df = pd.DataFrame({'Names': ['John', 'Edward', 'Sean'],
                   'Age': ['21', '44', '35']})
What I want to do is remove the rows for individuals who appear in a list.
names_to_drop = ['Edward', 'Martin', 'Paul']
So the desired output is the dataframe without Edward's row.
I have tried a few things, but I only get an array of True and False values:
print(df.index.isin(names_to_drop))
Thank you in advance
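One way to get the desired output, sketched under the assumption that the match should be made on the 'Names' column: the attempt above tests the index labels (0, 1, 2) against the list, and it only prints the mask instead of using it to select rows. The variable names names_to_drop and filtered are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Names': ['John', 'Edward', 'Sean'],
                   'Age': ['21', '44', '35']})

# Named names_to_drop rather than list, to avoid shadowing the built-in.
names_to_drop = ['Edward', 'Martin', 'Paul']

# isin() on the 'Names' column (not on df.index) yields a boolean mask;
# ~ inverts it so that only rows NOT in the list are kept.
filtered = df[~df['Names'].isin(names_to_drop)]

print(filtered['Names'].tolist())  # ['John', 'Sean']
```

Names in the list that do not occur in the dataframe ('Martin', 'Paul') are simply ignored.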

How to add a column to a python dataframe, that is a string manipulation of another column? [duplicate]

This question already has answers here:
Get first letter of a string from column
(2 answers)
Closed 4 years ago.
There is a column named 'country', and I want a new column 'abc' that holds the first two characters of 'country'.
In pseudocode:
df['abc'] = df['country'][0:2]
Of course this does not work: it slices the first two rows of the column, not the first two characters of each value.
You want:
df['abc'] = df['country'].str[:2]
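A short sketch of this on hypothetical country values (the question does not show its data), including a missing value to show that it stays missing:

```python
import pandas as pd

# Hypothetical country values; the question does not show its data.
df = pd.DataFrame({'country': ['Germany', 'France', 'Japan', None]})

# .str[:2] slices each string element-wise; missing values stay missing.
df['abc'] = df['country'].str[:2]

print(df)
```

df['country'].str.slice(0, 2) is an equivalent spelling of the same operation.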

Get specific element from Groups after applying groupby - PANDAS [duplicate]

This question already has answers here:
How to access pandas groupby dataframe by key
(6 answers)
Closed 9 years ago.
I have grouped the data in a dataframe with the following:
groups = df.groupby(['name'])
Now I can get the head of the groups with groups.head(2), which gives the first two rows of each group.
But how do I get a single group by its name? For example, if I want the group named 'ruby', I can't just do groups['ruby'].
How about:
groups.get_group('ruby')
For more elaboration, see this related question
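A minimal sketch of get_group on invented data follows (the 'value' column and the rows are hypothetical). It groups by the plain string 'name' rather than the one-element list used in the question; as far as I know, recent pandas versions may expect a one-element tuple such as ('ruby',) when the grouping was done with a one-element list, so the scalar form is the safer assumption here.

```python
import pandas as pd

# Hypothetical data; the question does not show its DataFrame.
df = pd.DataFrame({
    'name': ['ruby', 'pearl', 'ruby', 'opal'],
    'value': [1, 2, 3, 4],
})

# Grouping by a single column label (a string, not a one-element list)
# keeps the group keys as plain scalars.
groups = df.groupby('name')

# get_group takes the key of the group, i.e. a value from the 'name' column.
ruby = groups.get_group('ruby')

print(ruby['value'].tolist())  # [1, 3]
```

If only one group is ever needed, a plain boolean filter such as df[df['name'] == 'ruby'] selects the same rows without building the groupby object.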
