How to put a condition while using a GroupBy in Pandas? [duplicate] - python

This question already has answers here:
How do I select rows from a DataFrame based on column values?
(16 answers)
Closed 2 years ago.
I have used the following code to make a distplot.
data_agg = data.groupby('HourOfDay')['travel_time'].aggregate(np.median).reset_index()
plt.figure(figsize=(12,3))
sns.pointplot(data.HourOfDay.values, data.travel_time.values)
plt.show()
However, I want to include only hours 8 and above, not 0-7. How do I proceed with that?

What about filtering first?
data_filtered = data[data['HourOfDay'] > 7]
# the comparison depends on the dtype of the HourOfDay column (assumed numeric here)
data_agg = data_filtered.groupby('HourOfDay')['travel_time'].aggregate(np.median).reset_index()
plt.figure(figsize=(12,3))
sns.pointplot(x=data_filtered.HourOfDay.values, y=data_filtered.travel_time.values)
plt.show()
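To check the filter-then-group step without the plot, here is a minimal sketch. The sample values are made up and only the column names HourOfDay and travel_time come from the question, so treat it as an illustration rather than the asker's actual data.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's DataFrame.
data = pd.DataFrame({
    "HourOfDay": [6, 7, 8, 8, 9, 9, 9],
    "travel_time": [10.0, 12.0, 20.0, 30.0, 15.0, 25.0, 35.0],
})

# Keep only rows for hour 8 and later, then take the median per hour.
data_filtered = data[data["HourOfDay"] >= 8]
data_agg = (
    data_filtered.groupby("HourOfDay")["travel_time"]
    .median()
    .reset_index()
)

print(data_agg)
```

Filtering before the groupby means hours 0-7 never reach the aggregation, so the same filtered frame can be passed to the plotting call.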

Related

Histogram only in a specific range [duplicate]

This question already has answers here:
How to select rows in a DataFrame between two values, in Python Pandas?
(7 answers)
Closed 1 year ago.
Example df
How do I create a histogram that in this case only uses the range of 2–5 points, instead of the entire points data range of 1–6?
I'm trying to only display the average data spread, not the extreme areas. Is there maybe a function to zoom in to the significant ranges? And is there a smart way to describe those?
For your specific data, you can first filter your DataFrame, then call .hist(). Note that Series.between(left, right) includes both the left and right values:
df[df['points'].between(2, 5)].hist()
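A small sketch of the inclusive behaviour, assuming a made-up points column with the values 1 to 6. The strict variant uses plain comparison operators, which is one way to exclude the endpoints.

```python
import pandas as pd

# Hypothetical data: one row per points value from 1 to 6.
df = pd.DataFrame({"points": [1, 2, 3, 4, 5, 6]})

# between() keeps both endpoints by default: 2, 3, 4, 5.
in_range = df[df["points"].between(2, 5)]

# To exclude the endpoints, combine two strict comparisons: 3, 4.
strict = df[(df["points"] > 2) & (df["points"] < 5)]

print(in_range["points"].tolist())
print(strict["points"].tolist())
```

Either filtered frame can then be passed to .hist() as in the line above.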

How to filter pandas dataframe based on length of a list in a column? [duplicate]

This question already has answers here:
How to filter a pandas dataframe based on the length of a entry
(2 answers)
Closed 1 year ago.
I have a pandas DataFrame like this:
id  subjects
1   [math, history]
2   [English, Dutch, Physics]
3   [Music]
How to filter this dataframe based on the length of the column subjects?
So for example, if I only want to have rows where len(subjects) >= 2?
I tried using
df[len(df["subjects"]) >= 2]
But this gives
KeyError: True
Also, using loc does not help, that gives me the same error.
Thanks in advance!
Use the string accessor to work with lists:
df[df['subjects'].str.len() >= 2]
Output:
   id  subjects
0   1  [math, history]
1   2  [English, Dutch, Physics]
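For context on the KeyError: len(df["subjects"]) is the number of rows in the column (3 here), so the original expression reduces to df[True], which pandas treats as a lookup for a column named True. Below is a runnable sketch that rebuilds the example frame from the question. The map(len) variant is an alternative I am adding, not part of the original answer.

```python
import pandas as pd

# Rebuild the frame shown in the question.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "subjects": [
        ["math", "history"],
        ["English", "Dutch", "Physics"],
        ["Music"],
    ],
})

# .str.len() returns the length of each list, one value per row.
result = df[df["subjects"].str.len() >= 2]

# Alternative: apply the built-in len to every element.
alt = df[df["subjects"].map(len) >= 2]

print(result)
```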

Delete multiple rows by multiple conditions in python [duplicate]

This question already has answers here:
delete rows based on a condition in pandas
(2 answers)
Closed 1 year ago.
I have a simple dataset:
I want to delete the rows where count>1 when animal is cat or dog. So the output should look like:
Can I get the result in an efficient way? Thank you
count_mask = dataset['count'] > 1
animal_mask = dataset['animal'].isin(['cat', 'dog'])
dataset = dataset[~(count_mask & animal_mask)]
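Since the example data from the question is not shown, here is a sketch with a made-up dataset to illustrate how the two masks combine. Only the column names animal and count come from the question, the rows are assumptions.

```python
import pandas as pd

# Hypothetical dataset with the two columns named in the question.
dataset = pd.DataFrame({
    "animal": ["cat", "cat", "dog", "dog", "bird", "bird"],
    "count": [1, 3, 1, 2, 1, 4],
})

count_mask = dataset["count"] > 1
animal_mask = dataset["animal"].isin(["cat", "dog"])

# A row is removed only when both conditions hold; ~ inverts the combined mask.
filtered = dataset[~(count_mask & animal_mask)]

print(filtered)
```

Note that the bird row with count 4 is kept, because the count condition applies only to cats and dogs.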

How to drop all rows where output is 0 and speed is greater than 5 in pandas dataframe [duplicate]

This question already has answers here:
How do I select rows from a DataFrame based on column values?
(16 answers)
Closed 2 years ago.
I have two columns in the table, speed and power. I would like to exclude any 0 for power where speed is greater than 5.
So far I have
df[(df.sum(axis=1) != 0)]
which excludes all rows with 0 values, but how do I amend it so that zeros are excluded only where speed is greater than 5, and kept where speed is 5 or below?
You can try this -
drop_idx = df[(df.power == 0) & (df.speed > 5)].index
df = df.drop(index=drop_idx)
Your df will then contain only the required rows.
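A quick sketch with made-up speed and power values, showing that dropping by index and selecting with the inverted mask give the same rows. The data is hypothetical, only the column names come from the question.

```python
import pandas as pd

# Hypothetical readings: (speed, power) pairs.
df = pd.DataFrame({
    "speed": [3, 6, 7, 4, 8],
    "power": [0, 0, 5, 2, 0],
})

# Rows to remove: power is 0 while speed is above 5.
to_drop = (df["power"] == 0) & (df["speed"] > 5)

# Option 1: drop by index label, as in the answer above.
dropped = df.drop(index=df[to_drop].index)

# Option 2: keep everything that does not match the mask.
kept = df[~to_drop]

print(kept)
```

The row with speed 3 and power 0 survives, because the zero is only excluded when speed is greater than 5.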

Value is being returned as None [duplicate]

This question already has answers here:
How do I select rows from a DataFrame based on column values?
(16 answers)
Closed 3 years ago.
I am trying to return all correct data when a condition is met. I would like to return all the relevant records when there has been X amount of goals scored by the home team.
data = pd.read_csv("epl_data_v2.csv")
def highest_home_score():
    data.loc[data['HG']==1]
The console is returning the value None. I'm not sure why this happens. I know the column name 'HG' is correct.
def highest_home_score():
    print(data.loc[data['HG']==1])

highest_home_score()
The code above produces what I was expecting - a small set of results that feature 1 as the HG value.
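The reason for the None is that a Python function without a return statement returns None: the first version evaluates the selection and then discards it. If the rows are needed by the caller rather than just printed, returning them is one option. The sketch below uses a small made-up frame instead of epl_data_v2.csv, and the HomeTeam column and the function name are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the CSV; only the 'HG' column name is from the question.
data = pd.DataFrame({
    "HomeTeam": ["A", "B", "C", "D"],
    "HG": [1, 0, 2, 1],
})

def home_games_with_goals(df, goals):
    """Return the rows where the home team scored exactly `goals`."""
    return df.loc[df["HG"] == goals]

result = home_games_with_goals(data, 1)
print(result)
```

Passing the frame and the goal count as arguments also avoids relying on a global variable and a hard-coded value.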
