Delete multiple rows by multiple conditions in python [duplicate]

This question already has answers here:
delete rows based on a condition in pandas
(2 answers)
Closed 1 year ago.
I have a simple dataset:
I want to delete the rows where count>1 when animal is cat or dog. So the output should look like:
Can I get the result in an efficient way? Thank you

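# Rows to drop: count > 1 AND animal is cat or dog
count_mask = dataset['count'] > 1
animal_mask = dataset['animal'].isin(['cat', 'dog'])
# Keep every row that does not satisfy both conditions at once
dataset = dataset[~(count_mask & animal_mask)]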
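The question's table is not shown, so here is a self-contained run of the same masking idea on a small hypothetical dataset. It also checks the equivalent form you get from De Morgan's law: "not (A and B)" is the same as "(not A) or (not B)".

```python
import pandas as pd

# Hypothetical sample data; the question's actual table is not reproduced
dataset = pd.DataFrame({
    'animal': ['cat', 'cat', 'dog', 'dog', 'bird', 'bird'],
    'count': [1, 3, 1, 2, 1, 5],
})

count_mask = dataset['count'] > 1
animal_mask = dataset['animal'].isin(['cat', 'dog'])

# Drop rows matching both conditions; bird rows survive whatever their count
result = dataset[~(count_mask & animal_mask)]

# Same filter written as (count <= 1) OR (animal is not cat/dog)
result_alt = dataset[(dataset['count'] <= 1) | ~animal_mask]

print(result)
```

Both masks are vectorized boolean Series, so this scales to large frames without a Python-level loop.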

Related

How to filter pandas dataframe based on length of a list in a column? [duplicate]

This question already has answers here:
How to filter a pandas dataframe based on the length of a entry
(2 answers)
Closed 1 year ago.
I have a pandas DataFrame like this:
id subjects
1 [math, history]
2 [English, Dutch, Physics]
3 [Music]
How to filter this dataframe based on the length of the column subjects?
So for example, if I only want to have rows where len(subjects) >= 2?
I tried using
df[len(df["subjects"]) >= 2]
But this gives
KeyError: True
Also, using loc does not help, that gives me the same error.
Thanks in advance!
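Use the string accessor, which also works on lists: .str.len() returns the length of each entry. len(df["subjects"]) instead returns the number of rows, so the comparison collapses to a single True, and df[True] looks for a column named True, hence the KeyError: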
df[df['subjects'].str.len() >= 2]
Output:
id subjects
0 1 [math, history]
1 2 [English, Dutch, Physics]
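A self-contained version of the example above, rebuilding the question's frame. It contrasts len() on the whole column with per-row lengths, and shows that mapping the built-in len over the column gives the same filter without the string accessor.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3],
    'subjects': [['math', 'history'], ['English', 'Dutch', 'Physics'], ['Music']],
})

# len() on the column is the number of rows (3), not a per-row length
n_rows = len(df['subjects'])

# .str.len() calls len() on each list and returns one value per row
lengths = df['subjects'].str.len()
filtered = df[lengths >= 2]

# Equivalent: apply the built-in len to each cell
filtered_alt = df[df['subjects'].map(len) >= 2]

print(filtered)
```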

How to put a condition while using a GroupBy in Pandas? [duplicate]

This question already has answers here:
How do I select rows from a DataFrame based on column values?
(16 answers)
Closed 2 years ago.
I have used the following code to make a point plot.
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

data_agg = data.groupby('HourOfDay')['travel_time'].median().reset_index()
plt.figure(figsize=(12, 3))
sns.pointplot(x=data.HourOfDay.values, y=data.travel_time.values)
plt.show()
However, I want to keep only hours 8 and above, not 0-7. How do I proceed with that?
What about filtering first?
data_filtered = data[data['HourOfDay'] > 7]  # keep hours 8 and later
# This comparison assumes HourOfDay is numeric; if it is stored as strings,
# convert it first, e.g. with .astype(int)
data_agg = data_filtered.groupby('HourOfDay')['travel_time'].median().reset_index()
plt.figure(figsize=(12, 3))
sns.pointplot(x=data_filtered.HourOfDay.values, y=data_filtered.travel_time.values)
plt.show()
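To see that filtering before grouping removes the early hours from the aggregation, here is a sketch on hypothetical trip data (the column names follow the question; the values are made up). Plotting is left out so the result can be checked directly.

```python
import pandas as pd

# Hypothetical trip data standing in for the question's `data`
data = pd.DataFrame({
    'HourOfDay': [6, 7, 8, 8, 9, 9, 10],
    'travel_time': [30, 32, 40, 44, 50, 46, 38],
})

# Filter first, so hours 0-7 never reach the groupby
data_filtered = data[data['HourOfDay'] >= 8]

# Median travel time per remaining hour
data_agg = data_filtered.groupby('HourOfDay')['travel_time'].median().reset_index()
print(data_agg)
```

The aggregated frame can then be plotted directly, e.g. sns.pointplot(data=data_agg, x='HourOfDay', y='travel_time').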

Value is being returned as None [duplicate]

This question already has answers here:
How do I select rows from a DataFrame based on column values?
(16 answers)
Closed 3 years ago.
I am trying to return all matching data when a condition is met. I would like to return all the relevant records where the home team scored X goals.
data = pd.read_csv("epl_data_v2.csv")
def highest_home_score():
data.loc[data['HG']==1]
The console is returning the value None. I'm not sure why this happens. I know the column name 'HG' is correct.
def highest_home_score():
print(data.loc[data['HG']==1])
highest_home_score()
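The code above prints what I was expecting: a small set of results where HG is 1. The first version showed None because the function had no return statement; a Python function that ends without return gives back None.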
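Printing inside the function works for inspection, but returning the filtered frame lets the caller reuse it, and a parameter covers the "X goals" case. A minimal sketch; the rows below are hypothetical stand-ins for epl_data_v2.csv, and matches_with_home_goals is an illustrative name, not from the question.

```python
import pandas as pd

# Hypothetical rows; only the HG column matters for this filter
data = pd.DataFrame({
    'HomeTeam': ['Arsenal', 'Chelsea', 'Everton', 'Leeds'],
    'HG': [1, 3, 1, 0],
})

def matches_with_home_goals(df, goals):
    """Return the rows where the home team scored exactly `goals` goals."""
    # Without `return`, the filtered frame would be discarded and
    # the caller would receive None
    return df.loc[df['HG'] == goals]

result = matches_with_home_goals(data, 1)
print(result)
```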

Columns in Pandas Dataframe [duplicate]

This question already has answers here:
Binning a column with pandas
(4 answers)
Closed 3 years ago.
I have a dataframe of cars with a price column, and I want to create a new column carsrange with values like 'high', 'low', etc. according to the car price. For example:
if the price is between 0 and 9000, carsrange should be 'low' for those cars; similarly, if the price is between 9,000 and 30,000, carsrange should be 'medium', and so on. I tried doing it, but my code keeps overwriting one value with another. Any help please?
I ran a for loop over the price column and used if-else branches to set the column values.
for i in cars_data['price']:
    if (i>0 and i<9000): cars_data['carsrange']='Low'
    elif (i<9000 and i<18000): cars_data['carsrange']='Medium-Low'
    elif (i<18000 and i>27000): cars_data['carsrange']='Medium'
    elif(i>27000 and i<36000): cars_data['carsrange']='High-Medium'
    else : cars_data['carsrange']='High'
Now, when I call unique() on carsrange, it shows only 'High'.
cars_data['carsrange'].unique()
This is the Output:
In[74]:cars_data['carsrange'].unique()
Out[74]: array(['High'], dtype=object)
I believe I have applied the wrong concept here. Any ideas as to what I should do now?
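Each iteration assigns a single string to the entire carsrange column, so only the value from the last iteration survives. Instead, collect one label per row in a list and assign the list once: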
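resultList = []
for i in cars_data['price']:
    # one label per row, using the bins from the question
    if 0 < i < 9000:
        resultList.append("Low")
    elif 9000 <= i < 18000:
        resultList.append("Medium-Low")
    elif 18000 <= i < 27000:
        resultList.append("Medium")
    elif 27000 <= i < 36000:
        resultList.append("High-Medium")
    else:
        resultList.append("High")
# assign the whole list at once, so each row keeps its own label
cars_data["carsrange"] = resultList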
Then check the unique values with cars_data["carsrange"].unique().
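The linked duplicate is about binning, and pandas can do this without a loop using pd.cut. This is a swapped-in alternative rather than the answer's method, sketched on hypothetical prices with the bin edges from the question. One difference: here a price of exactly 0 is labeled 'Low', while the loop sends it to the else branch ('High'); negative prices become NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical prices; the real cars_data frame is not shown in the question
cars_data = pd.DataFrame({'price': [5000, 12000, 20000, 30000, 50000]})

# right=False makes each bin include its lower edge and exclude its upper
# edge, e.g. [9000, 18000); np.inf catches everything from 36000 upward
bins = [0, 9000, 18000, 27000, 36000, np.inf]
labels = ['Low', 'Medium-Low', 'Medium', 'High-Medium', 'High']
cars_data['carsrange'] = pd.cut(cars_data['price'], bins=bins, labels=labels, right=False)

print(cars_data['carsrange'].unique())
```

pd.cut returns a categorical column, which keeps the labels in their natural low-to-high order.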

Adding rows and duplicating values in a Pandas based on a list of duplicates [duplicate]

This question already has answers here:
Pandas Merging 101
(8 answers)
Closed 3 years ago.
So here's my daily challenge:
I have an Excel file containing a list of streets, and some of those streets will be doubled (or tripled) based on their road type. For instance:
In another Excel file, I have the street names (without duplicates) and their mean distances between features, such as this:
Both Excel files have been converted to pandas dataframes as follows:
duplicates_df = pd.DataFrame()
duplicates_df['Street_names'] = street_names
dist_df=pd.DataFrame()
dist_df['Street_names'] = names_dist_values
dist_df['Mean_Dist'] = dist_values
dist_df['STD'] = std_values
I would like to find a way to append the mean distance and STD values to duplicates_df once for every occurrence of a street, but I am struggling with the proper syntax. This is probably an easy fix, but I've never done this before.
The desired output would be:
Any help would be greatly appreciated!
Thanks again!
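Merge the two frames on Street_names. Every row of duplicates_df is matched to its street in dist_df, so a street that appears several times gets its Mean_Dist and STD repeated on each of its rows:
pd.merge(duplicates_df, dist_df, on="Street_names")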
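A self-contained sketch of the merge on hypothetical street data (the question's tables are not shown). It uses how="left", which keeps every row of duplicates_df even if a street is missing from dist_df (those get NaN), whereas the default inner join drops such streets. validate="m:1" makes pandas raise a MergeError if dist_df unexpectedly contains a street name more than once.

```python
import pandas as pd

# Hypothetical data: streets repeated by road type, plus one stats row per street
duplicates_df = pd.DataFrame({
    'Street_names': ['Main St', 'Main St', 'Oak Ave', 'Elm Rd', 'Elm Rd', 'Elm Rd'],
})
dist_df = pd.DataFrame({
    'Street_names': ['Main St', 'Oak Ave', 'Elm Rd'],
    'Mean_Dist': [12.5, 8.0, 20.0],
    'STD': [1.5, 0.5, 3.0],
})

# Many-to-one merge: each duplicated street picks up the same statistics
result = pd.merge(duplicates_df, dist_df, on='Street_names', how='left', validate='m:1')
print(result)
```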
