How to narrow data in a list - python

I am having an issue with a Python script that tries to pick one of the 22 top trending topics returned by pytrends (https://github.com/GeneralMills/pytrends/). I am trying to generate a random number from 1 to 22 and then use it to choose one of the 22 results printed on lines 176-198 of the output in the Python shell.
import pytrends
import random
pytrend = TrendReq()
random = random.randint(1,22)
random = random + 99
itemList = list(pytrend.trending_searches())
Data = itemList.index(random) # This is one of the issue lines, as I cannot figure out how to index the output as needed.
Data = str(Data)
Data = Data[1:21] # An attempt at indexing output
print (Data)
This is my output on the Shell:
<bound method NDFrame.head of date exploreUrl \
0 20180504 /trends/explore?q=Free+Comic+Book+Day&date=now...
1 20180504 /trends/explore?q=Brad+Marchand&date=now+7-d&g...
2 20180504 /trends/explore?q=jrue+holiday&date=now+7-d&ge...
3 20180504 /trends/explore?q=Kentucky+Derby&date=now+7-d&...
4 20180504 /trends/explore?q=Cinco+de+Mayo&date=now+7-d&g...
5 20180504 /trends/explore?q=Warriors&date=now+7-d&geo=US
6 20180504 /trends/explore?q=Bruins&date=now+7-d&geo=US
7 20180504 /trends/explore?q=Rockets&date=now+7-d&geo=US
8 20180504 /trends/explore?q=Matt+Harvey&date=now+7-d&geo=US
9 20180504 /trends/explore?q=DJ+Khaled&date=now+7-d&geo=US
10 20180504 /trends/explore?q=Matthew+Lawrence&date=now+7-...
11 20180504 /trends/explore?q=junot+diaz&date=now+7-d&geo=US
12 20180504 /trends/explore?q=nashville+predators&date=now...
13 20180504 /trends/explore?q=albert+pujols&date=now+7-d&g...
14 20180504 /trends/explore?q=indians+vs+yankees&date=now+...
15 20180504 /trends/explore?q=zoe+saldana&date=now+7-d&geo=US
16 20180504 /trends/explore?q=Rihanna&date=now+7-d&geo=US
17 20180504 /trends/explore?q=Becky+Hammon&date=now+7-d&ge...
18 20180504 /trends/explore?q=dte+outage+map&date=now+7-d&...
19 20180504 /trends/explore?q=hawaii+news+now&date=now+7-d...
20 20180504 /trends/explore?q=Colton+Haynes&date=now+7-d&g...
21 20180504 /trends/explore?q=Audrey+Hepburn&date=now+7-d&...
22 20180504 /trends/explore?q=Carol+Burnett&date=now+7-d&g...
formattedTraffic hotnessColor hotnessLevel \
0 20,000+ #f0a049 2.0
1 20,000+ #f0a049 2.0
2 20,000+ #f0a049 2.0
3 2,000,000+ #d04108 5.0
4 1,000,000+ #db601e 4.0
5 500,000+ #db601e 4.0
6 200,000+ #e68033 3.0
7 200,000+ #e68033 3.0
8 200,000+ #e68033 3.0
9 200,000+ #e68033 3.0
10 100,000+ #e68033 3.0
11 100,000+ #e68033 3.0
12 100,000+ #e68033 3.0
13 100,000+ #e68033 3.0
14 100,000+ #e68033 3.0
15 50,000+ #f0a049 2.0
16 50,000+ #f0a049 2.0
17 50,000+ #f0a049 2.0
18 50,000+ #f0a049 2.0
19 50,000+ #f0a049 2.0
20 50,000+ #f0a049 2.0
21 50,000+ #f0a049 2.0
22 50,000+ #f0a049 2.0
imgLinkUrl imgSource \
0 https://wtop.com/entertainment/2018/05/grab-fr... WTOP
1 http://www.espn.com/nhl/story/_/id/23414142/nh... ESPN
2 https://www.slamonline.com/nba/jrue-holiday-an... SLAM Online
3 https://www.nbcnews.com/business/business-news... NBCNews.com
4 https://www.nytimes.com/2018/05/05/business/ci... New York Times
5 https://www.goldenstateofmind.com/2018/5/5/173... Golden State of Mind
6 https://www.bostonglobe.com/sports/bruins/2018... The Boston Globe
7 http://www.espn.com/nba/story/_/id/23409022/ho... ESPN
8 https://www.forbes.com/sites/tomvanriper/2018/... Forbes
9 http://people.com/music/dj-khaled-2015-video-w... PEOPLE.com
10 https://www.goodhousekeeping.com/life/a2015507... GoodHousekeeping.com
11 https://www.washingtonpost.com/news/arts-and-e... Washington Post
12 https://www.tennessean.com/story/sports/nhl/pr... The Tennessean
13 https://www.cbssports.com/mlb/news/leaderboard... CBSSports.com
14 https://www.mlb.com/news/miguel-andujar-yankee... MLB.com
15 http://people.com/movies/mila-kunis-gets-emoti... PEOPLE.com
16 http://www.bbc.com/news/newsbeat-44000486 BBC News
17 http://www.espn.com/nba/story/_/id/23407719/be... ESPN
18 https://www.lansingstatejournal.com/story/news... Lansing State Journal
19 http://www.hawaiinewsnow.com/story/38110613/li... Hawaii News Now
20 http://people.com/tv/colton-haynes-denies-rumo... PEOPLE.com
21 http://people.com/movies/see-audrey-hepburn-in... PEOPLE.com
22 https://www.vanityfair.com/hollywood/2018/05/c... Vanity Fair
imgUrl \
0 //t1.gstatic.com/images?q=tbn:ANd9GcRgX9VkY3X0...
1 //t2.gstatic.com/images?q=tbn:ANd9GcQNtvvQkzuu...
2 //t1.gstatic.com/images?q=tbn:ANd9GcSWuoUKvQM1...
3 //t0.gstatic.com/images?q=tbn:ANd9GcSvx53B96Jy...
4 //t3.gstatic.com/images?q=tbn:ANd9GcS7m8935VXh...
5 //t1.gstatic.com/images?q=tbn:ANd9GcQw4FYzAfaN...
6 //t1.gstatic.com/images?q=tbn:ANd9GcQKEOxhee7r...
7 //t1.gstatic.com/images?q=tbn:ANd9GcTMGOQfUc7u...
8 //t3.gstatic.com/images?q=tbn:ANd9GcQrbRgWqQM-...
9 //t3.gstatic.com/images?q=tbn:ANd9GcTH2gEcxXtQ...
10 //t2.gstatic.com/images?q=tbn:ANd9GcQuOq7biu30...
11 //t1.gstatic.com/images?q=tbn:ANd9GcQroHePQnEr...
12 //t0.gstatic.com/images?q=tbn:ANd9GcSgdsziSLo-...
13 //t1.gstatic.com/images?q=tbn:ANd9GcT8Z0CYLzOL...
14 //t0.gstatic.com/images?q=tbn:ANd9GcQJUrmvZbvz...
15 //t3.gstatic.com/images?q=tbn:ANd9GcSBQuX6A0c3...
16 //t0.gstatic.com/images?q=tbn:ANd9GcQU6AztveLs...
17 //t2.gstatic.com/images?q=tbn:ANd9GcQX6uw7bDSG...
18 //t2.gstatic.com/images?q=tbn:ANd9GcTKzcn18NOd...
19 //t1.gstatic.com/images?q=tbn:ANd9GcRSizKTqReb...
20 //t1.gstatic.com/images?q=tbn:ANd9GcTjJAoEQ0A2...
21 //t0.gstatic.com/images?q=tbn:ANd9GcRWzAeeA3c3...
22 //t0.gstatic.com/images?q=tbn:ANd9GcTCjUox_o9U...
newsArticlesList \
0 [{'title': 'Grab a freebie on <b>Free Comic Bo...
1 [{'title': 'NHL to give <b>Brad Marchand</b> e...
2 [{'title': 'Pelicans' <b>Jrue Holiday</b>:...
3 [{'title': '<b>Kentucky Derby</b> Field: No. 5...
4 [{'title': 'What Is <b>Cinco de Mayo</b>?', 'l...
5 [{'title': '<b>Warriors</b> deservedly get the...
6 [{'title': 'Dan Girardi lifts Lightning over <...
7 [{'title': '<b>Rockets</b> take 2-1 lead by bl...
8 [{'title': '<b>Matt Harvey</b> And Mets Just C...
9 [{'title': '<b>DJ Khaled</b> Faces Critics Aft...
10 [{'title': '<b>Matthew Lawrence</b> Proposed t...
11 [{'title': 'Pulitzer Prize-winning author <b>J...
12 [{'title': '<b>Predators</b> coach Peter Lavio...
13 [{'title': 'Leaderboarding: The astounding Hal...
14 [{'title': 'Andujar walks off Yanks to 13th wi...
15 [{'title': 'Mila Kunis Gets Emotional at BFF <...
16 [{'title': '<b>Rihanna</b> opens up about her ...
17 [{'title': 'Sources: Spurs assistant <b>Becky ...
18 [{'title': 'Crews restoring power quickly afte...
19 [{'title': 'LIST: Lava threat forces evacuatio...
20 [{'title': '<b>Colton Haynes</b> Shuts Down Ru...
21 [{'title': 'See <b>Audrey Hepburn</b> in Gorge...
22 [{'title': '<b>Carol Burnett</b> Wants to Be L...
relatedSearchesList safe \
0 [] 1.0
1 [] 1.0
2 [] 1.0
3 [{'query': 'Kentucky Derby 2018 horses', 'safe... 1.0
4 [{'query': 'Cinco De Mayo 2018 Events', 'safe'... 1.0
5 [{'query': 'Warriors Vs Pelicans', 'safe': Tru... 1.0
6 [{'query': 'Boston Bruins', 'safe': True}, {'q... 1.0
7 [{'query': 'Rockets Vs Jazz', 'safe': True}] 1.0
8 [] 1.0
9 [{'query': 'Dj Khaled Wife', 'safe': True}] 1.0
10 [{'query': 'Cheryl Burke', 'safe': True}] 1.0
11 [] 1.0
12 [{'query': 'Predators', 'safe': True}] 1.0
13 [] 1.0
14 [] 1.0
15 [] 1.0
16 [] 1.0
17 [] 1.0
18 [] 1.0
19 [] 1.0
20 [] 1.0
21 [] 1.0
22 [] 1.0
shareUrl startTime \
0 https://www.google.com/trends/hottrends?stt=Fr... 1.525540e+09
1 https://www.google.com/trends/hottrends?stt=Br... 1.525543e+09
2 https://www.google.com/trends/hottrends?stt=Jr... 1.525532e+09
3 https://www.google.com/trends/hottrends?stt=Ke... 1.525460e+09
4 https://www.google.com/trends/hottrends?stt=Ci... 1.525453e+09
5 https://www.google.com/trends/hottrends?stt=Wa... 1.525482e+09
6 https://www.google.com/trends/hottrends?stt=Br... 1.525486e+09
7 https://www.google.com/trends/hottrends?stt=Ro... 1.525493e+09
8 https://www.google.com/trends/hottrends?stt=Ma... 1.525468e+09
9 https://www.google.com/trends/hottrends?stt=DJ... 1.525475e+09
10 https://www.google.com/trends/hottrends?stt=Ma... 1.525439e+09
11 https://www.google.com/trends/hottrends?stt=Ju... 1.525457e+09
12 https://www.google.com/trends/hottrends?stt=Na... 1.525435e+09
13 https://www.google.com/trends/hottrends?stt=Al... 1.525439e+09
14 https://www.google.com/trends/hottrends?stt=In... 1.525486e+09
15 https://www.google.com/trends/hottrends?stt=Zo... 1.525446e+09
16 https://www.google.com/trends/hottrends?stt=Ri... 1.525432e+09
17 https://www.google.com/trends/hottrends?stt=Be... 1.525493e+09
18 https://www.google.com/trends/hottrends?stt=Dt... 1.525468e+09
19 https://www.google.com/trends/hottrends?stt=Ha... 1.525453e+09
20 https://www.google.com/trends/hottrends?stt=Co... 1.525489e+09
21 https://www.google.com/trends/hottrends?stt=Au... 1.525475e+09
22 https://www.google.com/trends/hottrends?stt=Ca... 1.525478e+09
title titleLinkUrl \
0 Free Comic Book Day //www.google.com/search?q=Free+Comic+Book+Day
1 Brad Marchand //www.google.com/search?q=Brad+Marchand
2 Jrue Holiday //www.google.com/search?q=Jrue+Holiday
3 Kentucky Derby //www.google.com/search?q=Kentucky+Derby
4 Cinco de Mayo //www.google.com/search?q=Cinco+de+Mayo
5 Warriors //www.google.com/search?q=Warriors
6 Bruins //www.google.com/search?q=Bruins
7 Rockets //www.google.com/search?q=Rockets
8 Matt Harvey //www.google.com/search?q=Matt+Harvey
9 DJ Khaled //www.google.com/search?q=DJ+Khaled
10 Matthew Lawrence //www.google.com/search?q=Matthew+Lawrence
11 Junot Diaz //www.google.com/search?q=Junot+Diaz
12 Nashville Predators //www.google.com/search?q=Nashville+Predators
13 Albert Pujols //www.google.com/search?q=Albert+Pujols
14 Indians Vs Yankees //www.google.com/search?q=Indians+Vs+Yankees
15 Zoe Saldana //www.google.com/search?q=Zoe+Saldana
16 Rihanna //www.google.com/search?q=Rihanna
17 Becky Hammon //www.google.com/search?q=Becky+Hammon
18 Dte Outage Map //www.google.com/search?q=Dte+Outage+Map
19 Hawaii News Now //www.google.com/search?q=Hawaii+News+Now
20 Colton Haynes //www.google.com/search?q=Colton+Haynes
21 Audrey Hepburn //www.google.com/search?q=Audrey+Hepburn
22 Carol Burnett //www.google.com/search?q=Carol+Burnett
trafficBucketLowerBound
0 20000.0
1 20000.0
2 20000.0
3 2000000.0
4 1000000.0
5 500000.0
6 200000.0
7 200000.0
8 200000.0
9 200000.0
10 100000.0
11 100000.0
12 100000.0
13 100000.0
14 100000.0
15 50000.0
16 50000.0
17 50000.0
18 50000.0
19 50000.0
20 50000.0
21 50000.0
22 50000.0 >

pytrends returns a pandas DataFrame. DataFrames have their own methods for subsetting and indexing, so calling list() and str() on one instead of using those methods gives strange results (list() on a DataFrame, for example, yields only its column labels).
To take a random sample of n rows from a DataFrame, you can use the sample method:
data.sample(n)
So, to get one randomly chosen row from the request:
from pytrends.request import TrendReq
pytrend = TrendReq()
mydata = pytrend.trending_searches()
print(mydata.sample(1)) #or assign it, or get the required rows etc
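If only the topic name is needed rather than the whole row, the sampled row can be narrowed to a single column. The sketch below uses a small stand-in DataFrame instead of a live pytrends request, so the variable name `trending` and the two columns shown are assumptions based on the output printed in the question; the real result may have a different layout depending on the pytrends version.

```python
import pandas as pd

# Stand-in for the DataFrame returned by pytrend.trending_searches();
# column names are copied from the output shown in the question.
trending = pd.DataFrame({
    "title": ["Free Comic Book Day", "Brad Marchand", "Jrue Holiday", "Kentucky Derby"],
    "formattedTraffic": ["20,000+", "20,000+", "20,000+", "2,000,000+"],
})

# sample(1) returns a one-row DataFrame chosen at random
row = trending.sample(1)

# pull the scalar value of the 'title' column out of that single row
picked_title = row["title"].iloc[0]
print(picked_title)
```

Passing `random_state=` to `sample` makes the choice repeatable, which can help while debugging.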

Related

Pull specific values from one dataframe based on values in another

I have two dataframes
df1:
Country  value  Average  Week Rank
UK          42       42          1
US           9      9.5          2
DE          10      9.5          3
NL          15     15.5          4
ESP         16     15.5          5
POL         17       18          6
CY          18       18          7
IT          20       18          8
AU          17       18          9
FI          18       18         10
SW          20       18         11
df2:
Country  value  Average  Year Rank
US          42       42          1
UK           9      9.5          2
ESP         10      9.5          3
SW          15     15.5          4
IT          16     15.5          5
POL         17       18          6
NO          18       18          7
SL          20       18          8
PO          17       18          9
FI          18       18         10
NL          20       18         11
DE          17       18         12
AU          18       18         13
CY          20       18         14
I'm looking to create a column in df1 that shows the 'Year Rank' of each country in df1, so that I have the following:
Country  value  Average  Week Rank  Year Rank
UK          42       42          1          2
US           9      9.5          2          1
DE          10      9.5          3          9
NL          15     15.5          4          8
ESP         16     15.5          5          3
POL         17       18          6          6
CY          18       18          7          7
IT          20       18          8          5
AU          17       18          9         13
FI          18       18         10         10
SW          20       18         11          4
How would I loop through the countries in df1 and find the corresponding rank in df2?
Edit: I am only looking for the yearly rank of the countries that appear in df1.
Thanks!
Use:
df1['Year Rank'] = df1.merge(df2, on='Country')['Year Rank']
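The merge above uses the default inner join, so it lines up row by row only when every country in df1 also appears in df2. A lookup keyed on Country avoids that dependency. The sketch below is one way to do it with `Series.map`, using a cut-down version of the two frames; the country code "XX" is a hypothetical entry added only to show what happens when there is no match.

```python
import pandas as pd

# Reduced versions of df1 and df2; "XX" is a made-up country with no yearly rank
df1 = pd.DataFrame({"Country": ["UK", "US", "DE", "XX"], "Week Rank": [1, 2, 3, 4]})
df2 = pd.DataFrame({"Country": ["US", "UK", "ESP", "DE"], "Year Rank": [1, 2, 3, 12]})

# Build a Country -> Year Rank lookup, then map each df1 country through it
year_rank = df2.set_index("Country")["Year Rank"]
df1["Year Rank"] = df1["Country"].map(year_rank)

print(df1)
```

Countries missing from df2 end up with NaN in the new column instead of shifting the other ranks out of position.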

Sum values of a column for each value based on another column and divide it by total

Today I'm struggling once again with Python and data-analytics.
I got a dataframe which looks like this:
name totdmgdealt
0 Warwick 96980.0
1 Nami 25995.0
2 Draven 171568.0
3 Fiora 113721.0
4 Viktor 185302.0
5 Skarner 148791.0
6 Galio 130692.0
7 Ahri 145731.0
8 Jinx 182680.0
9 VelKoz 85785.0
10 Ziggs 46790.0
11 Cassiopeia 62444.0
12 Yasuo 117896.0
13 Warwick 129156.0
14 Evelynn 179252.0
15 Caitlyn 163342.0
16 Wukong 122919.0
17 Syndra 146754.0
18 Karma 35766.0
19 Warwick 117790.0
20 Draven 74879.0
21 Janna 11242.0
22 Lux 66424.0
23 Amumu 87826.0
24 Vayne 76085.0
25 Ahri 93334.0
..
..
..
This dataframe holds the total damage a champion dealt in one game.
Now I want to group this information so I can see which champion has dealt the most damage overall.
I tried groupby('name'), but it didn't work at all.
I have already gone through some threads about groupby and summing values, but they didn't solve my specific problem.
The damage dealt by each champion should also be shown as a percentage of the total.
I'm looking for output like this:
name totdmgdealt percentage
0 Warwick 2378798098 2.1 %
1 Nami 2837491074 2.3 %
2 Draven 1231451224 ..
3 Fiora 1287301724 ..
4 Viktor 1239808504 ..
5 Skarner 1487911234 ..
6 Galio 1306921234 ..
We can group by name and sum, then divide each sum by the overall total with .div, multiply by 100 with .mul, and round to one decimal place with .round:
total = df['totdmgdealt'].sum()
summed = df.groupby('name', sort=False)['totdmgdealt'].sum().reset_index()
# summed already holds one row per champion, so no second groupby is needed
summed['percentage'] = summed['totdmgdealt'].div(total).mul(100).round(1)
name totdmgdealt percentage
0 Warwick 343926.0 12.2
1 Nami 25995.0 0.9
2 Draven 246447.0 8.7
3 Fiora 113721.0 4.0
4 Viktor 185302.0 6.6
5 Skarner 148791.0 5.3
6 Galio 130692.0 4.6
7 Ahri 239065.0 8.5
8 Jinx 182680.0 6.5
9 VelKoz 85785.0 3.0
10 Ziggs 46790.0 1.7
11 Cassiopeia 62444.0 2.2
12 Yasuo 117896.0 4.2
13 Evelynn 179252.0 6.4
14 Caitlyn 163342.0 5.8
15 Wukong 122919.0 4.4
16 Syndra 146754.0 5.2
17 Karma 35766.0 1.3
18 Janna 11242.0 0.4
19 Lux 66424.0 2.4
20 Amumu 87826.0 3.1
21 Vayne 76085.0 2.7
You can use sum() to get the total damage and apply to calculate each row's percentage of it, like this:
import pandas as pd
from io import StringIO
df = pd.read_csv(StringIO("""
name totdmgdealt
0 Warwick 96980.0
1 Nami 25995.0
2 Draven 171568.0
3 Fiora 113721.0
4 Viktor 185302.0
5 Skarner 148791.0
6 Galio 130692.0
7 Ahri 145731.0
8 Jinx 182680.0
9 VelKoz 85785.0
10 Ziggs 46790.0
11 Cassiopeia 62444.0
12 Yasuo 117896.0
13 Warwick 129156.0
14 Evelynn 179252.0
15 Caitlyn 163342.0
16 Wukong 122919.0
17 Syndra 146754.0
18 Karma 35766.0
19 Warwick 117790.0
20 Draven 74879.0
21 Janna 11242.0
22 Lux 66424.0
23 Amumu 87826.0
24 Vayne 76085.0
25 Ahri 93334.0"""), sep=r"\s+")
summed_df = df.groupby('name')['totdmgdealt'].agg(['sum']).rename(columns={"sum": "totdmgdealt"}).reset_index()
summed_df['percentage'] = summed_df.apply(
lambda x: "{:.2f}%".format(x['totdmgdealt'] / summed_df['totdmgdealt'].sum() * 100), axis=1)
print(summed_df)
Output:
name totdmgdealt percentage
0 Ahri 239065.0 8.48%
1 Amumu 87826.0 3.12%
2 Caitlyn 163342.0 5.79%
3 Cassiopeia 62444.0 2.21%
4 Draven 246447.0 8.74%
5 Evelynn 179252.0 6.36%
6 Fiora 113721.0 4.03%
7 Galio 130692.0 4.64%
8 Janna 11242.0 0.40%
9 Jinx 182680.0 6.48%
10 Karma 35766.0 1.27%
11 Lux 66424.0 2.36%
12 Nami 25995.0 0.92%
13 Skarner 148791.0 5.28%
14 Syndra 146754.0 5.21%
15 Vayne 76085.0 2.70%
16 VelKoz 85785.0 3.04%
17 Viktor 185302.0 6.57%
18 Warwick 343926.0 12.20%
19 Wukong 122919.0 4.36%
20 Yasuo 117896.0 4.18%
21 Ziggs 46790.0 1.66%
You can try the following.
I reproduced the problem with my own sample data; run the code below in a Jupyter Notebook:
import pandas as pd
name=['abhit','mawa','vaibhav','dharam','sid','abhit','vaibhav','sid','mawa','lakshya']
totdmgdealt=[24,45,80,22,89,55,89,51,93,85]
name=pd.Series(name,name='name') #converting into series
totdmgdealt=pd.Series(totdmgdealt,name='totdmgdealt') #converting into series
data=pd.concat([name,totdmgdealt],axis=1)
data=pd.DataFrame(data) #converting into Dataframe
final=data.pivot_table(values="totdmgdealt",columns="name",aggfunc="sum").transpose() #actual aggregating method
total=data['totdmgdealt'].sum() #calculating total for calculating percentage
def calPer(row,total): #actual Function for Percentage
    return ((row/total)*100).round(2)
total=final['totdmgdealt'].sum()
final['Percentage']=calPer(final['totdmgdealt'],total) #assigning the function to the column
final
Sample Data :
name totdmgdealt
0 abhit 24
1 mawa 45
2 vaibhav 80
3 dharam 22
4 sid 89
5 abhit 55
6 vaibhav 89
7 sid 51
8 mawa 93
9 lakshya 85
Output:
totdmgdealt Percentage
name
abhit 79 12.48
dharam 22 3.48
lakshya 85 13.43
mawa 138 21.80
sid 140 22.12
vaibhav 169 26.70
Read through the code, run it, and then swap in your own dataset. I hope this helps.
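Since the question asks which champion has dealt the most damage overall, it can also help to sort the grouped totals before adding the percentage. This is a minimal sketch on five rows taken from the question's data; the variable names `totals` and `summary` are illustrative only.

```python
import pandas as pd

# A few rows from the question's dataframe
df = pd.DataFrame({
    "name": ["Warwick", "Nami", "Draven", "Warwick", "Draven"],
    "totdmgdealt": [96980.0, 25995.0, 171568.0, 129156.0, 74879.0],
})

# Sum damage per champion and put the highest total first
totals = df.groupby("name")["totdmgdealt"].sum().sort_values(ascending=False)
summary = totals.reset_index()

# Each champion's share of all damage dealt, in percent (unrounded)
summary["percentage"] = summary["totdmgdealt"] / summary["totdmgdealt"].sum() * 100

# Round only for display so the underlying values still add up to 100
print(summary.round({"percentage": 1}))
```

Keeping the stored percentages unrounded means rounding errors do not accumulate if the column is used in later calculations.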

Multi column filtering in python data frame

I have created a pandas dataframe. I want to keep only the rows that contain all of the values 9, 12, 18 and 24.
df:
index no1 no2 no3 no4 no5 no6 no7
1 9 11 12 14 18 24 30
2 9 12 13 18 19 24 31
3 9 12 13 42 20 19 24
4 10 9 13 42 18 24 12
5 13 12 13 44 18 24 30
6 2 9 12 18 24 31 44
7 10 12 14 42 18 24 30
8 10 12 14 42 18 24 31
Code:
a = df['no1'].isin([9,12,18 ,24])
b = df['no2'].isin([9,12,18,24])
c = df['no3'].isin([9,12 , 18, 24])
d = df['no4'].isin([9,12 , 18, 24])
e = df['no5'].isin([9,12,18,24])
f = df['no6'].isin([9,12 , 18, 24])
g = df['no7'].isin([9,12 , 18, 24])
df [a & b & c & d & e & f & g]
Desired output:
index no1 no2 no3 no4 no5 no6 no7
1 9 11 12 14 18 24 30
2 9 12 13 18 19 24 31
4 10 9 13 42 18 24 12
6 2 9 12 18 24 31 44
Try:
df[df.isin([9,12,18,24])]
This should give you the exact result:
df=pd.DataFrame({'no1':[9,9,9,10,13,2,10,10],
'no2':[11,12,12,9,12,9,12,12],
'no3':[12,13,13,13,13,12,14,14],
'no4':[14,18,42,42,44,18,42,42],
'no5':[18,19,20,18,18,24,18,18],
'no6':[24,24,19,24,24,31,24,24],
'no7':[30,31,24,12,30,44,30,31]}) # Creating the data frame
df_new=df[df.isin([9,12,18,24])]
df_new=df_new.dropna(thresh=4)
df_new=df_new.fillna(df)
The result would be:
no1 no2 no3 no4 no5 no6 no7
0 9.0 11.0 12.0 14.0 18.0 24.0 30.0
1 9.0 12.0 13.0 18.0 19.0 24.0 31.0
3 10.0 9.0 13.0 42.0 18.0 24.0 12.0
5 2.0 9.0 12.0 18.0 24.0 31.0 44.0
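The desired output keeps the rows in which each of 9, 12, 18 and 24 appears somewhere across the columns. That condition can also be written as a boolean mask, which leaves the integer dtype untouched. The following is a sketch of that idea, not the method used above; `wanted` and `mask` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'no1': [9, 9, 9, 10, 13, 2, 10, 10],
                   'no2': [11, 12, 12, 9, 12, 9, 12, 12],
                   'no3': [12, 13, 13, 13, 13, 12, 14, 14],
                   'no4': [14, 18, 42, 42, 44, 18, 42, 42],
                   'no5': [18, 19, 20, 18, 18, 24, 18, 18],
                   'no6': [24, 24, 19, 24, 24, 31, 24, 24],
                   'no7': [30, 31, 24, 12, 30, 44, 30, 31]})

wanted = [9, 12, 18, 24]

# One boolean column per wanted value: does the value occur anywhere in the row?
found = pd.concat([df.eq(v).any(axis=1) for v in wanted], axis=1)

# Keep a row only when every wanted value was found in it
mask = found.all(axis=1)
result = df[mask]
print(result)
```

Unlike counting matches per row, this check is not affected by a wanted value occurring twice in the same row.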

Average of last 13 months for each record in pandas

I am trying to calculate the average of the last 13 months for each month for P1 and P2. Here is a sample of the data:
P1 P2
Month
May-16 4 24
Jun-16 2 9
Jul-16 4 20
Aug-16 2 12
Sep-16 7 8
Oct-16 7 11
Nov-16 0 4
Dec-16 3 18
Jan-17 4 9
Feb-17 9 16
Mar-17 2 13
Apr-17 9 9
May-17 5 13
Jun-17 9 16
Jul-17 5 11
Aug-17 6 11
Sep-17 8 13
Oct-17 6 12
Nov-17 9 21
Dec-17 4 12
Jan-18 2 12
Feb-18 7 17
Mar-18 5 15
Apr-18 3 13
May-18 7 25
Jun-18 5 23
I am trying to create this table:
P1 P2 AVGP1 AVGP2
Month
Jun-17 9 16 4.85 11.23
Jul-17 5 11 5.08 11.38
Aug-17 6 11 5.23 11.54
Sep-17 8 13 5.69 11.54
Oct-17 6 12 5.62 11.85
Nov-17 9 21 5.77 12.46
Dec-17 4 12 6.08 13.08
Jan-18 2 12 6.00 12.62
Feb-18 7 17 6.23 13.23
Mar-18 5 15 5.92 13.23
Apr-18 3 13 6.00 13.23
May-18 7 25 5.85 14.46
Jun-18 5 23 5.85 15.23
The goal is to create a dataframe with the above table. I can't figure out how to make a function that will calculate only the last 13 months of data. Any help would be great!
You can use pd.DataFrame.rolling followed by dropna:
res = df.join(df.rolling(13).mean().add_prefix('AVG')).dropna(how='any')
print(res)
P1 P2 AVGP1 AVGP2
Month
May-17 5 13 4.461538 12.769231
Jun-17 9 16 4.846154 12.153846
Jul-17 5 11 5.076923 12.307692
Aug-17 6 11 5.230769 11.615385
Sep-17 8 13 5.692308 11.692308
Oct-17 6 12 5.615385 12.000000
Nov-17 9 21 5.769231 12.769231
Dec-17 4 12 6.076923 13.384615
Jan-18 2 12 6.000000 12.923077
Feb-18 7 17 6.230769 13.538462
Mar-18 5 15 5.923077 13.461538
Apr-18 3 13 6.000000 13.461538
May-18 7 25 5.846154 14.692308
Jun-18 5 23 5.846154 15.461538
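For reference, here is a self-contained version of the same rolling calculation on the first 14 months of the sample data, so it can be run without the original dataframe. A window of 13 includes the current month plus the 12 before it, which is why the first complete average appears at May-17.

```python
import pandas as pd

months = ["May-16", "Jun-16", "Jul-16", "Aug-16", "Sep-16", "Oct-16", "Nov-16",
          "Dec-16", "Jan-17", "Feb-17", "Mar-17", "Apr-17", "May-17", "Jun-17"]
df = pd.DataFrame(
    {"P1": [4, 2, 4, 2, 7, 7, 0, 3, 4, 9, 2, 9, 5, 9],
     "P2": [24, 9, 20, 12, 8, 11, 4, 18, 9, 16, 13, 9, 13, 16]},
    index=pd.Index(months, name="Month"),
)

# Mean over the current row and the 12 rows before it; earlier rows give NaN
avg = df.rolling(13).mean().add_prefix("AVG")

# Attach the averages and drop the months that lack a full 13-month window
res = df.join(avg).dropna(how="any")
print(res)
```

To average only the 13 months before the current one, the rolling result could be shifted by one row with `.shift(1)` before joining.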

Counting repeated blocks in pandas

I have the following dataframe and I am trying to label each block with a number based on how many blocks of the same class have been seen up to that point in the Class column. Consecutive rows with the same class value get the same number. If a block of the same class appears again later, the number is incremented. If a block of a new class appears, its number starts at 1.
import numpy as np
from pandas import DataFrame
df = DataFrame(list(zip(range(10,30), range(20))), columns = ['a','b'])
df['Class'] = [np.nan, np.nan, np.nan, np.nan, 'a', 'a', 'a', 'a', np.nan, np.nan,'a', 'a', 'a', 'a', 'a', np.nan, np.nan, 'b', 'b','b']
a b Class
0 10 0 NaN
1 11 1 NaN
2 12 2 NaN
3 13 3 NaN
4 14 4 a
5 15 5 a
6 16 6 a
7 17 7 a
8 18 8 NaN
9 19 9 NaN
10 20 10 a
11 21 11 a
12 22 12 a
13 23 13 a
14 24 14 a
15 25 15 NaN
16 26 16 NaN
17 27 17 b
18 28 18 b
19 29 19 b
Sample output looks like this:
a b Class block_encounter_no
0 10 0 NaN NaN
1 11 1 NaN NaN
2 12 2 NaN NaN
3 13 3 NaN NaN
4 14 4 a 1
5 15 5 a 1
6 16 6 a 1
7 17 7 a 1
8 18 8 NaN NaN
9 19 9 NaN NaN
10 20 10 a 2
11 21 11 a 2
12 22 12 a 2
13 23 13 a 2
14 24 14 a 2
15 25 15 NaN NaN
16 26 16 NaN NaN
17 27 17 b 1
18 28 18 b 1
19 29 19 b 1
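Solution with mask:
# flag the first row of each block (as float so it can hold NaN), blank out the NaN rows, then count the flags per class
df['block_encounter_no'] = ((df.Class != df.Class.shift()).astype(float).mask(df.Class.isnull())
                            .groupby(df.Class).cumsum())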
print (df)
a b Class block_encounter_no
0 10 0 NaN NaN
1 11 1 NaN NaN
2 12 2 NaN NaN
3 13 3 NaN NaN
4 14 4 a 1.0
5 15 5 a 1.0
6 16 6 a 1.0
7 17 7 a 1.0
8 18 8 NaN NaN
9 19 9 NaN NaN
10 20 10 a 2.0
11 21 11 a 2.0
12 22 12 a 2.0
13 23 13 a 2.0
14 24 14 a 2.0
15 25 15 NaN NaN
16 26 16 NaN NaN
17 27 17 b 1.0
18 28 18 b 1.0
19 29 19 b 1.0
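Alternatively, with np.where (note that this numbers the non-null blocks consecutively across all classes, so the 'b' block gets 3 rather than 1):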
df['block_encounter_no'] = \
np.where(df.Class.notnull(),
(df.Class.notnull() & (df.Class != df.Class.shift())).cumsum(),
np.nan)
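The per-class counting can be checked on a smaller frame in which the classes interleave. This is a sketch under the assumption that a block starts wherever a non-null Class value differs from the row above it; `is_start` and `counts` are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Class": [np.nan, "a", "a", np.nan, "a", "b", "b", "a"]})

# True on the first row of every non-null block
is_start = df["Class"].notna() & df["Class"].ne(df["Class"].shift())

# Running count of block starts, kept separately for each class
counts = is_start.astype(int).groupby(df["Class"]).cumsum()

# Rows without a class get NaN instead of a block number
df["block_encounter_no"] = counts.where(df["Class"].notna())
print(df)
```

Here the last row is the third 'a' block even though a 'b' block sits directly before it, because the running count is kept per class.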
