Pandas - Add columns for mean and std after groupby statement [duplicate] - python

This question already has answers here:
Multiple aggregations of the same column using pandas GroupBy.agg()
(4 answers)
Closed 1 year ago.
I have the following dataframe:
import pandas as pd

d = {'City': ['Paris', 'London', 'NYC', 'Paris', 'NYC'], 'ppl': [3000, 4646, 33543, 85687568, 34545]}
df = pd.DataFrame(data=d)
df_mean = df.groupby('City').mean()
df_mean = df.groupby('City').mean()
Now, instead of just calculating the mean (and .std()) of the ppl column, I want to have the city, mean, and std together in my dataframe (with the cities grouped, of course). If that is not possible, it would be fine to at least add the .std() column to my resulting dataframe.

You can use GroupBy.agg(), as follows:
df.groupby('City').agg({'ppl': ['mean', 'std']})
If you don't want the column City to be the index, you can do:
df.groupby('City').agg({'ppl': ['mean', 'std']}).reset_index()
or
df.groupby('City')['ppl'].agg(['mean','std']).reset_index()
Result:
City mean std
0 London 4646 NaN
1 NYC 34044 7.085210e+02
2 Paris 42845284 6.058814e+07
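The dict-of-lists form of .agg() above produces MultiIndex columns ('ppl', 'mean') and ('ppl', 'std'). Named aggregation is an alternative that yields flat column names directly; a minimal runnable sketch on the same data (the names mean_ppl/std_ppl are just illustrative):

```python
import pandas as pd

d = {'City': ['Paris', 'London', 'NYC', 'Paris', 'NYC'],
     'ppl': [3000, 4646, 33543, 85687568, 34545]}
df = pd.DataFrame(data=d)

# Named aggregation gives flat (non-MultiIndex) column names directly
out = df.groupby('City').agg(mean_ppl=('ppl', 'mean'),
                             std_ppl=('ppl', 'std')).reset_index()
print(out)
```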


How do I filter a dataframe based on complicated conditions? [duplicate]

This question already has answers here:
Pandas Merging 101
(8 answers)
Closed 3 months ago.
Right now my dataframes look like this (I simplified it because the original has hundreds of rows):
import pandas as pd
Winner=[[1938,"Italy"],[1950,"Uruguay"],[2014,"Germany"]]
df=pd.DataFrame(Winner, columns=['Year', 'Winner'])
print(df)
MatchB = [[1938, "Germany", 1.0], [1938, "Germany", 2.0], [1938, "Brazil", 1.0],
          [1950, "Italy", 2.0], [1950, "Spain", 2.0], [1950, "Spain", 1.0],
          [1950, "Spain", 1.0], [1950, "Brazil", 1.0],
          [2014, "Italy", 2.0], [2014, "Spain", 3.0], [2014, "Germany", 1.0]]
df2B=pd.DataFrame(MatchB, columns=['Year', 'Away Team Name','Away Team Goals'])
df2B
I would like to filter df2B so that I keep only the rows where the "Year" and "Away Team Name" match a row of df:
Filtered List (Simplified)
I checked Google but couldn't find anything useful.
You can use pd.merge():
df = pd.merge(left=df, right=df2B, left_on=["Year", "Winner"], right_on=["Year", "Away Team Name"])
print(df)
Output:
Year Winner Away Team Name Away Team Goals
0 2014 Germany Germany 1.0
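pd.merge() does an inner join by default, so only the matching rows survive, but it also brings df's columns along. If you want df2B's rows back unchanged, one alternative is a membership test on the two key columns; a sketch using the same sample data:

```python
import pandas as pd

Winner = [[1938, "Italy"], [1950, "Uruguay"], [2014, "Germany"]]
df = pd.DataFrame(Winner, columns=['Year', 'Winner'])

MatchB = [[1938, "Germany", 1.0], [1938, "Germany", 2.0], [1938, "Brazil", 1.0],
          [1950, "Italy", 2.0], [1950, "Spain", 2.0], [1950, "Spain", 1.0],
          [1950, "Spain", 1.0], [1950, "Brazil", 1.0],
          [2014, "Italy", 2.0], [2014, "Spain", 3.0], [2014, "Germany", 1.0]]
df2B = pd.DataFrame(MatchB, columns=['Year', 'Away Team Name', 'Away Team Goals'])

# Keep only df2B rows whose (Year, Away Team Name) pair appears in df
mask = pd.MultiIndex.from_frame(df2B[['Year', 'Away Team Name']]).isin(
    pd.MultiIndex.from_frame(df[['Year', 'Winner']]))
filtered = df2B[mask]
print(filtered)
```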

How to transform a series attributeNames to headers with corresponding attributeValues [duplicate]

This question already has answers here:
How can I pivot a dataframe?
(5 answers)
Closed 1 year ago.
Situation
I have a dataframe attributes that holds some attribute information about cars in three columns:
attributes = {'brand': ['Honda Civic','Honda Civic','Honda Civic','Toyota Corolla','Toyota Corolla','Audi A4'],
'attributeName': ['wheels','doors','fuelType','wheels','color','wheels'],
'attributeValue': ['4','2','hybrid','4','red','4']
}
Expected result
result = {'brand': ['Honda Civic','Toyota Corolla','Audi A4'],
'wheels': ['4','4','4'],
'doors': ['2','',''],
'fuelType':['hybrid','',''],
'color': ['','red','']
}
How can I achieve this?
I want to transform the values from attributeName into columns that hold the corresponding attributeValue for each brand/car in one row.
With get_dummies I get this kind of transformation, but only with true/false values, not with the original values.
This is a simple pivot (assuming your dict has been wrapped in a DataFrame first, i.e. attributes = pd.DataFrame(attributes)):
attributes.pivot(index='brand',
                 columns='attributeName',
                 values='attributeValue').fillna('')
or, shorter, as your columns are in the right order:
attributes.pivot(*attributes).fillna('')
(Note that this short form passes the column names positionally; newer pandas versions make pivot's arguments keyword-only, so it may no longer work there.)
To format it exactly like your provided output (except for column order; please give details on that), you can use:
(attributes.pivot(index='brand', columns='attributeName', values='attributeValue')
.fillna('').rename_axis(None, axis=1)
.reset_index()
)
output:
brand color doors fuelType wheels
0 Audi A4 4
1 Honda Civic 2 hybrid 4
2 Toyota Corolla red 4
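One caveat: DataFrame.pivot raises a ValueError if the same brand/attributeName pair occurs more than once. If your real data may contain such duplicates, pivot_table with aggfunc='first' is a more forgiving variant; a sketch on the sample data ('first' simply keeps the first value seen per pair):

```python
import pandas as pd

attributes = pd.DataFrame({
    'brand': ['Honda Civic', 'Honda Civic', 'Honda Civic',
              'Toyota Corolla', 'Toyota Corolla', 'Audi A4'],
    'attributeName': ['wheels', 'doors', 'fuelType', 'wheels', 'color', 'wheels'],
    'attributeValue': ['4', '2', 'hybrid', '4', 'red', '4'],
})

# pivot_table tolerates duplicate brand/attributeName pairs;
# 'first' keeps the first value instead of raising like DataFrame.pivot would
result = (attributes.pivot_table(index='brand', columns='attributeName',
                                 values='attributeValue', aggfunc='first')
          .fillna('').rename_axis(None, axis=1).reset_index())
print(result)
```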

How can I pivot a dataframe in pandas where the values are dates? I get "DataError: No numeric types to aggregate" [duplicate]

This question already has answers here:
How can I pivot a dataframe?
(5 answers)
Closed 2 years ago.
I am trying to pivot a Dataframe in Pandas but I get DataError: No numeric types to aggregate.
I have data that looks like :
Year,Country,medal,date
1896,Afghanistan,Gold,1/1/2012
1896,Afghanistan,Silver,1/1/2012
1896,Afghanistan,Bronze,2/3/2012
1896,Algeria,Gold,3/4/2012
1896,Algeria,Silver,4/3/2012
1896,Algeria,Bronze,5/4/2012
What I want is:
Year,Country,Gold,Silver,Bronze
1896,Afghanistan,1/1/2012,1/1/2012,2/3/2012
1896,Algeria,3/4/2012,4/3/2012,5/4/2012
I tried
medals = df.pivot_table('date', ['Year', 'Country'], 'medal').reset_index()
I get DataError: No numeric types to aggregate. Any help would be appreciated.
You have to specify aggfunc in this case, because the default aggregation (mean) tries to treat the values as numeric:
df.pivot_table(index=['Year', 'Country'],
columns='medal',
values='date',
aggfunc=lambda x: x).reset_index()
medal Year Country Bronze Gold Silver
0 1896 Afghanistan 2/3/2012 1/1/2012 1/1/2012
1 1896 Algeria 5/4/2012 3/4/2012 4/3/2012
.pivot_table is good for aggregating data, but the values need to be numeric for the default aggfunc. In your case, you could rather use .pivot as follows:
df.pivot(index=['Year', 'Country'], columns='medal', values='date').reset_index()
To your code, just add "aggfunc=np.sum"
df.pivot_table('date', ['Year', 'Country'], 'medal',aggfunc=np.sum).reset_index()
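A note on the np.sum suggestion: since the date column holds strings, "summing" concatenates them, which only looks right while each Year/Country/medal group has a single row. A more explicit choice is aggfunc='first'; a runnable sketch with the question's data:

```python
import pandas as pd

df = pd.DataFrame({
    'Year': [1896] * 6,
    'Country': ['Afghanistan'] * 3 + ['Algeria'] * 3,
    'medal': ['Gold', 'Silver', 'Bronze'] * 2,
    'date': ['1/1/2012', '1/1/2012', '2/3/2012', '3/4/2012', '4/3/2012', '5/4/2012'],
})

# 'first' returns the date per group without requiring numeric data
medals = df.pivot_table(index=['Year', 'Country'], columns='medal',
                        values='date', aggfunc='first').reset_index()
print(medals)
```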

How to check if a cell has a specific character in Pandas [duplicate]

This question already has answers here:
Filter pandas DataFrame by substring criteria
(17 answers)
Closed 3 years ago.
So I have a dataframe that looks like this:
import pandas as pd
df = pd.DataFrame(index=['2014', '2015'],
                  columns=['Latitude', 'Longitude'],
                  data=[['14.0N', '33.0W'], ['22.0S', '12.0E']])
I want to go through and check the cells in the longitude column if they have an N or an S.
I would use endswith:
df.Longitude.str.endswith(('N','S'))
Out[77]:
2014 False
2015 False
Name: Longitude, dtype: bool
Use the following:
# Example data
df = pd.DataFrame(index=['2014', '2015'],
columns = ['Latitude', 'Longitude'],
data = [['14.0N', '33.0W'],['22.0S', '12.0E']])
print(df)
Latitude Longitude
2014 14.0N 33.0W
2015 22.0S 12.0E
Check if 'N' or 'S' is in each row of Longitude:
df.Longitude.str.contains('|'.join(['N', 'S']))
2014 False
2015 False
Name: Longitude, dtype: bool
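Either .str.endswith or .str.contains returns a boolean Series, which you can use directly as a filter mask. A small sketch on the same example frame (here the Longitude mask is all False, since those values end in W/E):

```python
import pandas as pd

df = pd.DataFrame(index=['2014', '2015'],
                  columns=['Latitude', 'Longitude'],
                  data=[['14.0N', '33.0W'], ['22.0S', '12.0E']])

# Boolean masks: True where the cell contains an 'N' or an 'S'
lon_mask = df['Longitude'].str.contains('N|S')
lat_mask = df['Latitude'].str.contains('N|S')

print(df[lon_mask])  # empty: the longitudes end in W/E
print(df[lat_mask])  # both rows: the latitudes end in N/S
```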

Count of unique value in column pandas [duplicate]

This question already has answers here:
How can I compute a histogram (frequency table) for a single Series?
(4 answers)
Closed 6 years ago.
I have a dataframe and I am looking at one column within the dataframe called names
array(['Katherine', 'Robert', 'Anne', nan, 'Susan', 'other'], dtype=object)
I am trying to make a call that tells me how many times each of these unique names shows up in the column, for example whether there are 223 instances of Katherine, etc.
How do I do this? I know value_counts just shows 1 for each of these here because they are the separate unique values.
If I understand you correctly, you can use pandas.Series.value_counts.
Example:
import pandas as pd
import numpy as np
s = pd.Series(['Katherine', 'Robert', 'Anne', np.nan, 'Susan', 'other'])
s.value_counts()
Katherine 1
Robert 1
other 1
Anne 1
Susan 1
dtype: int64
The data you provided only has one of each name - so here is an example with multiple 'Katherine' entries:
s = pd.Series(['Katherine','Katherine','Katherine','Katherine', 'Robert', 'Anne', np.nan, 'Susan', 'other'])
s.value_counts()
Katherine 4
Robert 1
other 1
Anne 1
Susan 1
dtype: int64
When applied to your DataFrame, you would call this as follows:
df['names'].value_counts()
You could also use groupby to achieve that:
df.groupby('col1')['col1'].count()
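One detail worth knowing with either approach: value_counts skips NaN by default, and groupby-based counts drop it as a key. If you also want the missing values counted, Series.value_counts accepts dropna=False; a small sketch:

```python
import pandas as pd
import numpy as np

s = pd.Series(['Katherine', 'Katherine', 'Robert', np.nan, 'Susan'])

# dropna=False makes NaN show up as its own row in the counts
counts = s.value_counts(dropna=False)
print(counts)
```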
