I have this function written in Python. I want it to show the difference between consecutive rows of the production column.
Here's the code:
def print_df():
    mycursor.execute("SELECT * FROM productions")
    myresult = mycursor.fetchall()
    myresult.sort(key=lambda x: x[0])
    df = pd.DataFrame(myresult, columns=['Year', 'Production (Ton)'])
    df['Dif'] = abs(df['Production (Ton)'].diff())
    print(abs(df))
And of course the output is this
Year Production (Ton) Dif
0 2010 339491 NaN
1 2011 366999 27508.0
2 2012 361986 5013.0
3 2013 329461 32525.0
4 2014 355464 26003.0
5 2015 344998 10466.0
6 2016 274317 70681.0
7 2017 200916 73401.0
8 2018 217246 16330.0
9 2019 119830 97416.0
10 2020 66640 53190.0
But I want the output like this
Year Production (Ton) Dif
0 2010 339491 27508.0
1 2011 366999 5013.0
2 2012 361986 32525.0
3 2013 329461 26003.0
4 2014 355464 10466.0
5 2015 344998 70681.0
6 2016 274317 73401.0
7 2017 200916 16330.0
8 2018 217246 97416.0
9 2019 119830 53190.0
10 2020 66640 66640.0
What should I change or add to my code?
You can pass a negative periods argument to diff to get the differences the way you want, and then use fillna to fill the last value with the value from the Production column:
df['Dif'] = df['Production (Ton)'].diff(-1).fillna(df['Production (Ton)']).abs()
Output:
Year Production (Ton) Dif
0 2010 339491 27508.0
1 2011 366999 5013.0
2 2012 361986 32525.0
3 2013 329461 26003.0
4 2014 355464 10466.0
5 2015 344998 70681.0
6 2016 274317 73401.0
7 2017 200916 16330.0
8 2018 217246 97416.0
9 2019 119830 53190.0
10 2020 66640 66640.0
Use shift(-1) to shift all rows one position up.
df['Dif'] = (df['Production (Ton)'] - df['Production (Ton)'].shift(-1).fillna(0)).abs()
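Note that fillna(0) removes the trailing NaN produced by the shift, so the last row's difference equals its own Production value.
You can also use diff, although this variant leaves 0.0 in the last row instead of the Production value: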
df['Dif'] = df['Production (Ton)'].diff().shift(-1).fillna(0).abs()
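To compare the variants from both answers side by side, here is a minimal sketch on the last three rows of the question's data. The variable names forward, via_shift and diff_then_shift are made up for illustration.

```python
import pandas as pd

# Last three rows of the table from the question
df = pd.DataFrame({
    'Year': [2018, 2019, 2020],
    'Production (Ton)': [217246, 119830, 66640],
})
prod = df['Production (Ton)']

# diff(-1): each row minus the NEXT row; the last row is NaN and gets
# filled with its own Production value
forward = prod.diff(-1).fillna(prod).abs()

# shift(-1): pull the next row up, treat the missing last value as 0
via_shift = (prod - prod.shift(-1).fillna(0)).abs()

# diff() then shift(-1): same differences, but the last row becomes 0
diff_then_shift = prod.diff().shift(-1).fillna(0).abs()

print(forward.tolist())          # [97416.0, 53190.0, 66640.0]
print(via_shift.tolist())        # [97416.0, 53190.0, 66640.0]
print(diff_then_shift.tolist())  # [97416.0, 53190.0, 0.0]
```

Only the first two reproduce the 66640.0 in the last row of the desired output.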
I have a dataframe called LCI. In this dataframe, the index corresponds to the year minus 1, so index 0 is year 1 and so on. Only some years contain values, and I made a list of those years. The list looks like this: [1,5,10,15,...,60]
years  year_reality  CO2    CH4    NO2    CO
1      2021          7.016  6.180  1.222
2      2022          2      0      0
3      2023          0      0      0
What I now want to do is multiply the value for each year by a column of values from another dataframe called DynCFs. DynCFs looks like this:
years  year_reality  CO2  CH4  NO2  CO
1      2021          3    6    2
2      2022          4    2    7
3      2023          3    7    6
So for example: LCI.loc[(0),'CO2']*DynCFs['CO2'] = [3*7.016, 4*7.016, 3*7.016]
and call this new dataframe/column tempDLCA (with a different name for each new column).
I then want to make a new dataframe equal to the sum of the tempDLCA columns, where only values belonging to the same year are added together.
So for example:
year_reality  CO2
2021          7.016*3
2022          7.016*4
2023          7.016*3
and
year_reality  CO2
2022          2*3
2023          2*4
2024          2*3
should give this (what I will call dynLCA in the code):
year_reality  CO2
2021          7.016*3
2022          7.016x4 + 2x3
2023          7.016x3 + 2x4
2024          2*3
ps.: i used x because * was not recognised by stackoverflow for some reason
I tried the following, but the output only covers the last i in listedValues, i.e. 60.
for i in listedValues:
    tempDLCA = pd.DataFrame()
    tempDLCA['Year_reality'] = np.arange(2021+(i-1), 4021+(i-1), 1)
    tempDLCA['CO2'] = LCI.loc[(i-1),'CO2']*DynCFs['CO2']
    tempDLCA['CO'] = LCI.loc[(i-1),'CO']*DynCFs['CO']
    tempDLCA['NO2'] = LCI.loc[(i-1),'NO2']*DynCFs['NO2']
    tempDLCA['CH4'] = LCI.loc[(i-1),'CH4']*DynCFs['CH4']
    dynLCA = pd.concat([DLCA,tempDLCA], ignore_index=True).groupby(['Year_reality'], as_index = False).sum()
dynLCA
What am I doing wrong?
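One possible way to get the per-year sums described above is to collect every tempDLCA in a list and concatenate once after the loop. In the loop as posted, dynLCA is reassigned on every pass from DLCA and the current tempDLCA, so if DLCA is never updated only the last tempDLCA survives. The following is only a sketch on a three-row toy version of the CO2 column. The names frames and start_year, the shortened listedValues, and the assumption that DynCFs holds one factor per elapsed year are mine, not from the original code.

```python
import numpy as np
import pandas as pd

# Toy inputs (assumed shapes): LCI is indexed by year - 1,
# DynCFs holds one factor per elapsed year
LCI = pd.DataFrame({'CO2': [7.016, 2.0, 0.0]})
DynCFs = pd.DataFrame({'CO2': [3, 4, 3]})
listedValues = [1, 2]   # years that contain values (shortened for the example)
start_year = 2021       # hypothetical name for the first calendar year

frames = []
for i in listedValues:
    offset = i - 1
    temp = pd.DataFrame({
        # each emission starts in its own calendar year
        'Year_reality': np.arange(start_year + offset,
                                  start_year + offset + len(DynCFs)),
        'CO2': LCI.loc[offset, 'CO2'] * DynCFs['CO2'],
    })
    frames.append(temp)  # keep every intermediate frame

# Concatenate once, then add up rows that share the same calendar year
dynLCA = (pd.concat(frames, ignore_index=True)
            .groupby('Year_reality', as_index=False)
            .sum())
print(dynLCA)
```

On the toy data this yields one row per year from 2021 to 2024, with 2022 and 2023 holding the sum of both contributions, as in the expected dynLCA table.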
I want to filter data by the total count after a groupby.
The data looks like this:
Rating Num Year
0 6 1001508 2009
1 6 1001508 2009
2 6 1001508 2009
3 7 0100802 1990
4 7 0100802 1990
I group the data and count it:
data.groupby(['Year'])["Rating"].count()
and the output is:
2017 225
2018 215
2019 397
2020 82
2021 39
However, I couldn't filter after that. For example, I want to keep only the years with a count of more than 50.
I tried
data[data.groupby(['Year'])["Rating"].count()<10]
and some variations, but couldn't work it out. In the end, I'm using the mean of these years.
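In your case, switch to transform, which returns the group count aligned with the original rows so it can be used as a boolean mask: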
out = data[data.groupby(['Year'])["Rating"].transform('count')<10]
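As a small illustration of why transform works here, the sketch below uses the five sample rows from the question and a threshold of 2 (chosen only so the toy data shows a difference). The groupby(...).filter(...) line is an alternative I am adding for comparison. It is not part of the answer above.

```python
import pandas as pd

# The five sample rows from the question
data = pd.DataFrame({
    'Rating': [6, 6, 6, 7, 7],
    'Num': ['1001508', '1001508', '1001508', '0100802', '0100802'],
    'Year': [2009, 2009, 2009, 1990, 1990],
})

# transform('count') repeats each group's count on every row of that group,
# so the result has the same index as `data`
counts = data.groupby('Year')['Rating'].transform('count')
print(counts.tolist())  # [3, 3, 3, 2, 2]

# Keep only rows whose year occurs more than twice
kept = data[counts > 2]

# Alternative: let groupby drop whole groups that fail the condition
filtered = data.groupby('Year').filter(lambda g: len(g) > 2)
```

Plain count() returns one value per year, indexed by Year, which is why it cannot be used directly as a row mask on data.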
Hi everyone, I want to calculate the sum of the Violent_type counts per year. For example, the total count for year 2013 is 18728+121662+1035. But I don't know how to select the data when the index is a MultiIndex. Any advice will be appreciated. Thanks.
The level argument in pandas.DataFrame.groupby() is what you are looking for.
level : int, level name, or sequence of such, default None
    If the axis is a MultiIndex (hierarchical), group by a particular level or levels.
To answer your question, you only need:
df.groupby(level=[0, 1]).sum()
# or
df.groupby(level=['district', 'year']).sum()
To see the effect
import pandas as pd
iterables = [['001', 'SST'], [2013, 2014], ['Dangerous', 'Non-Violent', 'Violent']]
index = pd.MultiIndex.from_product(iterables, names=['district', 'year', 'Violent_type'])
df = pd.DataFrame(list(range(0, len(index))), index=index, columns=['count'])
'''
print(df)
                            count
district year Violent_type
001      2013 Dangerous         0
              Non-Violent       1
              Violent           2
         2014 Dangerous         3
              Non-Violent       4
              Violent           5
SST      2013 Dangerous         6
              Non-Violent       7
              Violent           8
         2014 Dangerous         9
              Non-Violent      10
              Violent          11
'''
print(df.groupby(level=[0, 1]).sum())
'''
               count
district year
001      2013      3
         2014     12
SST      2013     21
         2014     30
'''
print(df.groupby(level=['district', 'year']).sum())
'''
               count
district year
001      2013      3
         2014     12
SST      2013     21
         2014     30
'''
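If the total is wanted per year across all districts, rather than per district and year, the same level argument can take just the year level. This is a sketch on the toy frame built above. Using xs to pull out a single year is my own addition and is not part of the answer.

```python
import pandas as pd

# Same toy frame as in the answer
iterables = [['001', 'SST'], [2013, 2014], ['Dangerous', 'Non-Violent', 'Violent']]
index = pd.MultiIndex.from_product(iterables, names=['district', 'year', 'Violent_type'])
df = pd.DataFrame(list(range(len(index))), index=index, columns=['count'])

# Group on the 'year' level only: districts and violent types are summed together
per_year = df.groupby(level='year').sum()
print(per_year['count'].tolist())  # [24, 42]

# Select a single year from the MultiIndex and sum its counts
total_2013 = df.xs(2013, level='year')['count'].sum()
print(total_2013)  # 24
```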
I want to create a graph that will display the cumulative average revenue for each 'Year Onboarded' (first customer transaction) over a period of time. But I am making mistakes when grouping the information I need.
Toy Data:
dataset = {'ClientId': [1,2,3,1,2,3,1,2,3,1,2,3,4,4,4,4,4,4,4],
'Year Onboarded': [2018,2019,2020,2018,2019,2020,2018,2019,2020,2018,2019,2020,2016,2016,2016,2016,2016,2016,2016],
'Year': [2019,2019,2020,2019,2019,2020,2018,2020,2020,2020,2019,2020,2016,2017,2018,2019,2020,2017,2018],
'Revenue': [100,50,25,30,40,50,60,100,20,40,100,20,5,5,8,4,10,20,8]}
df = pd.DataFrame(data=dataset)
Explanation: Customers have a designated 'Year Onboarded' and they make a transaction every 'Year' mentioned.
Then I calculate the years that have elapsed since the clients onboarded in order to make my graph visually more appealing.
df['Yearsdiff'] = df['Year']-df['Year Onboarded']
To calculate the Cumulative Average Revenue I tried the following methods:
First try:
df = df.join(df.groupby(['Year']).expanding().agg({ 'Revenue': 'mean'})
.reset_index(level=0, drop=True)
.add_suffix('_roll'))
df.groupby(['Year Onboarded', 'Year']).last().drop(columns=['Revenue'])
The output starts to be cumulative but the last row isn't cumulative anymore (not sure why).
Second Try:
df.groupby(['Year Onboarded','Year']).agg('mean') \
.groupby(level=[1]) \
.agg({'Revenue':np.cumsum})
But it doesn't work properly, I tried other ways as well but didn't achieve good results.
To visualize the cumulative average revenue I simply use sns.lineplot
My goal is to get a graph similar as the one below but for that I first need to group my data correctly.
Expected output plot
The Years that we can see on the graph represent the 'Year Onboarded' not the 'Year'.
Can someone help me calculate a Cumulative Average Revenue that works in order to plot a graph similar to the one above? Thank you
Also the data provided in the toy dataset will surely not give something similar to the example plot but the idea should be there.
This is how I would do it. Since the toy data differs from your real data, some changes will probably be needed, but the overall idea is:
import seaborn as sns
df1 = df.copy()
df1['Yearsdiff'] = df1['Year']-df1['Year Onboarded']
# Find the average revenue per Year Onboarded
df1['Revenue'] = df.groupby(['Year Onboarded'])['Revenue'].transform('mean')
# Calculate the cumulative sum of Revenue (which is now the average per Year Onboarded) per Yearsdiff (because this will be our x-axis in the plot)
df1['Revenue'] = df1.groupby(['Yearsdiff'])['Revenue'].transform('cumsum')
# Finally plot the data, using the column 'Year' as hue to account for the different years
sns.lineplot(x=df1['Yearsdiff'],y=df1['Revenue'],hue=df1['Year'])
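You can create a rolling mean like this (transform is used rather than apply so that the result keeps df's index and can be assigned as a column on recent pandas versions):
df['rolling_mean'] = df.groupby(['Year Onboarded'])['Revenue'].transform(lambda x: x.rolling(10, 1).mean())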
df
# ClientId Year Onboarded Year Revenue rolling_mean
# 0 1 2018 2019 100 100.000000
# 1 2 2019 2019 50 50.000000
# 2 3 2020 2020 25 25.000000
# 3 1 2018 2019 30 65.000000
# 4 2 2019 2019 40 45.000000
# 5 3 2020 2020 50 37.500000
# 6 1 2018 2018 60 63.333333
# 7 2 2019 2020 100 63.333333
# 8 3 2020 2020 20 31.666667
# 9 1 2018 2020 40 57.500000
# 10 2 2019 2019 100 72.500000
# 11 3 2020 2020 20 28.750000
# 12 4 2016 2016 5 5.000000
# 13 4 2016 2017 5 5.000000
# 14 4 2016 2018 8 6.000000
# 15 4 2016 2019 4 5.500000
# 16 4 2016 2020 10 6.400000
# 17 4 2016 2017 20 8.666667
# 18 4 2016 2018 8 8.571429
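Because no 'Year Onboarded' group in the toy data has more than 10 rows, rolling(10, 1) behaves here like a running average over all earlier rows of the group. A sketch of the same idea with an explicit expanding window, on a shortened version of the toy data (the column name cum_avg is made up for illustration):

```python
import pandas as pd

# First rows of the toy data, restricted to the two columns that matter here
df = pd.DataFrame({
    'Year Onboarded': [2018, 2019, 2018, 2019, 2018],
    'Revenue': [100, 50, 30, 40, 60],
})

# expanding().mean() averages every row seen so far within each cohort;
# transform keeps the result aligned with df's original index
df['cum_avg'] = (
    df.groupby('Year Onboarded')['Revenue']
      .transform(lambda s: s.expanding().mean())
)
print(df)
```

For the 2018 cohort this gives 100, 65 and about 63.33, matching the first values of rolling_mean in the output above.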