This question already has answers here:
GroupBy pandas DataFrame and select most common value
(13 answers)
Closed 4 months ago.
I can compute the mean and median with groupby using these lines:
newdf.groupby('dropoff_site')['load_weight'].mean()
newdf.groupby('dropoff_site')['load_weight'].median()
But when I use it for the mode like this:
newdf.groupby('dropoff_site')['load_weight'].mode()
An error popped up, saying:
'SeriesGroupBy' object has no attribute 'mode'
What should I do?
Update:
From GroupBy pandas DataFrame and select most common value I used
source2.groupby(['Country','City'])['Short name'].agg(pd.Series.mode)
as
newdf.groupby(['dropoff_site'])['load_weight'].agg(pd.Series.mode)
because my data is multimodal, but now the error is:
Must produce aggregated value
Try this. With multimodal groups, agg(pd.Series.mode) can raise "Must produce aggregated value" because some groups return a Series instead of a scalar; wrapping mode() so each group yields a single object avoids that:
newdf.groupby('dropoff_site')['load_weight'].agg(lambda x: x.mode().tolist())
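A minimal sketch with toy data (the column names come from the question; the values are made up) showing the lambda-based aggregation handling a multimodal group:

```python
import pandas as pd

# Toy stand-in for newdf; group 'B' is deliberately multimodal.
newdf = pd.DataFrame({
    'dropoff_site': ['A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'load_weight':  [10, 10, 20, 5, 5, 7, 7],
})

# Each group returns a single object (a list), so the aggregation
# succeeds even when a group has more than one mode.
modes = newdf.groupby('dropoff_site')['load_weight'].agg(
    lambda x: x.mode().tolist()
)
print(modes)
```

Here group A has the single mode [10], while group B yields both of its modes, [5, 7].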
This question already has answers here:
Python renaming Pandas DataFrame Columns
(4 answers)
Multiple aggregations of the same column using pandas GroupBy.agg()
(4 answers)
Closed 11 months ago.
I'm new to Python. I used this code...
StoreGrouper= DonHenSaless.groupby('Store')
StoreGrouperT= StoreGrouper["SalesDollars"].agg(np.sum)
StoreGrouperT.rename(columns={SalesDollars:TotalSalesDollars})
to group by store, sum SalesDollars, and then rename SalesDollars to TotalSalesDollars. It produced the following error...
NameError: name 'SalesDollars' is not defined
I also tried using quotes:
StoreGrouper= DonHenSaless.groupby('Store')
StoreGrouperT= StoreGrouper["SalesDollars"].agg(np.sum)
StoreGrouperT= StoreGrouperT.rename(columns={'SalesDollars':'TotalSalesDollars'})
This produced the error: rename() got an unexpected keyword argument 'columns'
Here is my df
In order to rename a column you need quotes so it would be:
StoreGrouperT.rename(columns={'SalesDollars':'TotalSalesDollars'})
Also, I usually assign it to a variable:
StoreGrouperT = StoreGrouperT.rename(columns={'SalesDollars':'TotalSalesDollars'})
Use the pandas rename method to change the column name. You can also pass inplace=True if you want the change reflected in the dataframe rather than assigning the result back to the df variable:
df.rename(columns={'old_name':'new_name'}, inplace=True)
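One point the answers above don't address: the "unexpected keyword argument 'columns'" error arises because selecting a single column after groupby returns a Series, which has no columns to rename. A sketch with made-up data (DonHenSaless and the column names are taken from the question) that renames the Series itself:

```python
import pandas as pd

# Toy stand-in for DonHenSaless.
DonHenSaless = pd.DataFrame({
    'Store': ['S1', 'S1', 'S2'],
    'SalesDollars': [100.0, 50.0, 75.0],
})

# groupby(...)['SalesDollars'] yields a SeriesGroupBy, so the
# aggregated result is a Series; rename its name, not its columns.
StoreGrouperT = (
    DonHenSaless.groupby('Store')['SalesDollars']
    .agg('sum')                      # np.sum from the question also works
    .rename('TotalSalesDollars')     # Series.rename with a scalar sets the name
)
print(StoreGrouperT)
```

Calling .to_frame() afterwards gives a one-column DataFrame named TotalSalesDollars, if that's the shape you need.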
This question already has answers here:
How to get value counts for multiple columns at once in Pandas DataFrame?
(14 answers)
Closed 2 years ago.
Hello, I have the following dataset:
I want to count the frequency of each value occurring across the entire dataset. I am aware of value_counts(), which works only on individual columns, not on the entire DataFrame.
I used the following code:
df.value_counts()
But it results in an error:
AttributeError: 'DataFrame' object has no attribute 'value_counts'
Please could you help me count the frequency of values across the whole dataset?
You can use the stack function to stack all values in one column, and then use value_counts:
df.stack().value_counts()
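For example, with a small made-up frame:

```python
import pandas as pd

# Toy frame standing in for the dataset in the question.
df = pd.DataFrame({
    'c1': ['x', 'y', 'x'],
    'c2': ['y', 'y', 'z'],
})

# stack() reshapes all values into one long Series;
# value_counts() then tallies every value across the whole frame.
counts = df.stack().value_counts()
print(counts)
```

Here 'y' appears three times across both columns, 'x' twice, and 'z' once.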
This question already has answers here:
Python TypeError: cannot convert the series to <class 'int'> when trying to do math on dataframe
(3 answers)
Closed 4 years ago.
I am working on a project where there is a column called 'Date of Last Recharge', but it is an object-type column and I need to convert it to a date format:
Date of Last Recharge
20-10-2018
23-10-2018
04-08-2018
12-09-2018
20-08-2018
My approach was to split each date into its year (y), month (m), and day (d) using custom functions, and then recombine the series into a new column using:
date(y,m,d)
But I end up with this error:
cannot convert the series to (class 'int')
I'm not even sure if this approach is correct, if you know a better way to do this, please let me know.
Try this; it will work:
df['Date_of_Last_Recharge'] = pd.to_datetime(df['Date_of_Last_Recharge'])
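One caveat: the sample dates above are day-first, and pd.to_datetime without a format may parse ambiguous strings like 04-08-2018 month-first. A small sketch passing an explicit format (the column name is assumed from the question):

```python
import pandas as pd

# Sample values copied from the question; these are day-first dates.
df = pd.DataFrame({'Date_of_Last_Recharge': ['20-10-2018', '04-08-2018']})

# An explicit format (or dayfirst=True) avoids month/day ambiguity.
df['Date_of_Last_Recharge'] = pd.to_datetime(
    df['Date_of_Last_Recharge'], format='%d-%m-%Y'
)
print(df['Date_of_Last_Recharge'].dt.month.tolist())
```

With the format given, 04-08-2018 is correctly read as 4 August rather than 8 April.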
This question already has an answer here:
Pandas select rows and columns based on boolean condition
(1 answer)
Closed 4 years ago.
What is the pandas equivalent of
SELECT Column2
FROM DF
WHERE column3 ="value"
when we are using a dataframe? Thank you.
You can use .loc and a conditional statement on a Dataframe to select out relevant information, similar to a where clause. You can pull out your desired column(s) using a second argument to loc, e.g.
df.loc[df['column3'] == 'value', ['column2']]
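A runnable sketch with a toy DF (the column names are taken from the SQL above; the data is made up):

```python
import pandas as pd

# Toy frame mirroring the SQL example's names.
DF = pd.DataFrame({
    'column2': [1, 2, 3],
    'column3': ['value', 'other', 'value'],
})

# Equivalent of: SELECT column2 FROM DF WHERE column3 = 'value'
result = DF.loc[DF['column3'] == 'value', ['column2']]
print(result)
```

The boolean mask plays the role of the WHERE clause, and the column list in the second argument plays the role of the SELECT list.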
This question already has answers here:
Pandas column creation
(3 answers)
Accessing Pandas column using squared brackets vs using a dot (like an attribute)
(5 answers)
pandas dataframe where clause with dot versus brackets column selection
(1 answer)
Closed 5 years ago.
I just thought I added a column to a pandas dataframe with
df.NewColumn = np.log1p(df.ExistingColumn)
but when I looked, it wasn't there! No error was raised either. I executed it many times, not believing what I was seeing, but it wasn't there. So then I did this:
df['NewColumn'] = np.log1p(df.ExistingColumn)
and now the new column was there.
Does anyone know the reason for this confusing behaviour? I thought those two ways of operating on a column were equivalent.
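A small demonstration of the behaviour being asked about, with a made-up frame: attribute assignment sets an ordinary Python attribute on the DataFrame object instead of creating a column (recent pandas versions emit a UserWarning about this), which is why the bracket syntax is required for new columns:

```python
import numpy as np
import pandas as pd

# Made-up frame to reproduce the behaviour described above.
df = pd.DataFrame({'ExistingColumn': [1.0, 2.0, 3.0]})

# Attribute assignment just sets a plain Python attribute on the
# object; pandas does not create a column from it.
df.NewColumn = np.log1p(df.ExistingColumn)
created_via_attribute = 'NewColumn' in df.columns   # stays False

# Bracket assignment goes through __setitem__ and does create it.
df['NewColumn'] = np.log1p(df.ExistingColumn)
created_via_brackets = 'NewColumn' in df.columns    # now True
```

Reading with a dot (df.ExistingColumn) works only because pandas falls back to column lookup when attribute lookup fails; writing with a dot never triggers that fallback.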