How to draw plots on specific pandas columns - python

So I have the df.head() output displayed below. I want to show how salaries progressed over time. As you can see, the teams repeat across the years, and the idea is to
display how their salaries changed over time. So for teamID='ATL' I will have a graph that starts in 1985 and goes all the way to the present.
I think I need to select teams by their team ID and have the x axis display time (yearID) and the y axis display salary (payroll_total). I don't know how to do that in pandas for each team in my data frame.
teamID yearID lgID payroll_total franchID Rank W G win_percentage
0 ATL 1985 NL 14807000.0 ATL 5 66 162 40.740741
1 BAL 1985 AL 11560712.0 BAL 4 83 161 51.552795
2 BOS 1985 AL 10897560.0 BOS 5 81 163 49.693252
3 CAL 1985 AL 14427894.0 ANA 2 90 162 55.555556
4 CHA 1985 AL 9846178.0 CHW 3 85 163 52.147239
5 ATL 1986 NL 17800000.0 ATL 4 55 181 41.000000

You can use seaborn for this:
import seaborn as sns
sns.lineplot(data=df, x='yearID', y='payroll_total', hue='teamID')
To get a separate plot for each team:
for team, d in df.groupby('teamID'):
    d.plot(x='yearID', y='payroll_total', label=team)
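If one figure with a line per team is enough, a pandas-only alternative is to pivot the table to wide form before plotting. This is a minimal sketch, assuming the column names from the question; the sample rows are a hand-typed subset of the df.head() output, and the Agg backend line is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption); drop this in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hand-typed subset of the rows shown in the question (illustrative only)
df = pd.DataFrame({
    "teamID": ["ATL", "BAL", "BOS", "ATL"],
    "yearID": [1985, 1985, 1985, 1986],
    "payroll_total": [14807000.0, 11560712.0, 10897560.0, 17800000.0],
})

# One row per year, one column per team; missing team/year pairs become NaN
wide = df.pivot(index="yearID", columns="teamID", values="payroll_total")

# DataFrame.plot draws one line per column, so each team gets its own line
ax = wide.plot(marker="o")
ax.set_xlabel("yearID")
ax.set_ylabel("payroll_total")
```

Call plt.show() afterwards when running interactively. With many teams the legend gets crowded, in which case separate plots per team are probably easier to read.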

import pandas as pd
import matplotlib.pyplot as plt
# Display the line plots in 3 separate rows and 1 column
fig, axes = plt.subplots(nrows=3, ncols=1)
# Generate a plot for each team
df[df['teamID'] == 'ATL'].plot(ax=axes[0], x='yearID', y='payroll_total')
df[df['teamID'] == 'BAL'].plot(ax=axes[1], x='yearID', y='payroll_total')
df[df['teamID'] == 'BOS'].plot(ax=axes[2], x='yearID', y='payroll_total')
# Display the plot
plt.show()
Depending on how many teams you want to show, adjust nrows in the call
fig, axes = plt.subplots(nrows=3, ncols=1)
Finally, you could write a loop that creates the visualization for every team.
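The loop mentioned above could look roughly like the following. It is a sketch under assumptions: the sample rows are a hand-typed subset of the question's data, the figure size is arbitrary, and squeeze=False is used so that axes stays a 2D array even when there is only one team.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption); drop this in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hand-typed subset of the rows shown in the question (illustrative only)
df = pd.DataFrame({
    "teamID": ["ATL", "BAL", "BOS", "ATL"],
    "yearID": [1985, 1985, 1985, 1986],
    "payroll_total": [14807000.0, 11560712.0, 10897560.0, 17800000.0],
})

teams = sorted(df["teamID"].unique())

# One row of the grid per team, so nrows follows the data instead of being hard-coded
fig, axes = plt.subplots(nrows=len(teams), ncols=1, sharex=True,
                         figsize=(6, 2.5 * len(teams)), squeeze=False)

for ax, team in zip(axes[:, 0], teams):
    # Filter the rows for this team and draw them on the team's own axes
    df[df["teamID"] == team].plot(ax=ax, x="yearID", y="payroll_total",
                                  marker="o", legend=False, title=team)

fig.tight_layout()
```

Call plt.show() at the end when running interactively.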

Related

Display all values on a matplotlib barplot

I have a data frame with 20 values, and I am trying to draw a bar plot of it using matplotlib. When I do, I see only 10 bars instead of 20. I have 5 NaN values in it and 4 of them.
Here is a sample of dataframe:
Name Bonus
Jack Carpenter 890
John Clegg 653
Mike Holiday 367
Rene Moukad 900
........... ...
My code is standard:
fig,ax = plt.subplots(figsize=(16,6))
plt.bar(df.Name, df.Bonus)
fig.autofmt_xdate(rotation=45)

Is it possible to plot a bar chart with upper and lower limits of the bins with pandas, seaborn or Matplotlib

I would like to know how to plot a bar chart with the upper and lower limits of the bins represented by the values in the age_classes column of the dataframe shown below, using pandas, seaborn or matplotlib. A sample of the dataframe looks like this:
age_classes total_cases male_cases female_cases
0 0-9 693 381 307
1 10-19 931 475 454
2 20-29 4530 1919 2531
3 30-39 7466 3505 3885
4 40-49 13701 6480 7130
5 50-59 20975 11149 9706
6 60-69 18089 11761 6254
7 70-79 19238 12281 6868
8 80-89 16252 8553 7644
9 >90 4356 1374 2973
10 Unknown 168 84 81
If you want a bar chart with one bar per age class, you can make it with sns.barplot, setting age_classes as x and one column (in my case total_cases) as y, as in this code:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv('data.csv')
fig, ax = plt.subplots()
sns.barplot(ax=ax,
            data=df,
            x='age_classes',
            y='total_cases')
plt.show()
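To compare male_cases and female_cases side by side within each age class, one option is pandas' own bar plot, which draws one group of bars per index label. This is a sketch, assuming the column names from the question and using only the first three sample rows; the y-axis label is a placeholder.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption); drop this in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# First three rows of the sample shown in the question
df = pd.DataFrame({
    "age_classes": ["0-9", "10-19", "20-29"],
    "total_cases": [693, 931, 4530],
    "male_cases": [381, 475, 1919],
    "female_cases": [307, 454, 2531],
})

# Using age_classes as the index keeps the bin labels ("0-9", ...) on the x axis;
# each selected column becomes one bar per age class
ax = df.set_index("age_classes")[["male_cases", "female_cases"]].plot.bar(rot=0)
ax.set_xlabel("age_classes")
ax.set_ylabel("cases")
```

Because the age classes are treated as categorical labels, entries such as ">90" and "Unknown" need no special handling.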

Plot number of observations for categorical groups

I have a data frame that looks like -
id age_bucket state gender duration category1 is_active
1 (40, 70] Jammu and Kashmir m 123 ABB 1
2 (17, 24] West Bengal m 72 ABB 0
3 (40, 70] Bihar f 109 CA 0
4 (17, 24] Bihar f 52 CA 1
5 (24, 30] MP m 23 ACC 1
6 (24, 30] AP m 103 ACC 1
7 (30, 40] West Bengal f 182 GF 0
I want to create a bar plot showing how many people are active for each age_bucket and state (top 10). For gender and category1 I want to create a pie chart with the proportion of active people. The top of each bar should display the total count of active and inactive members, and similarly the percentage should be displayed on the pie chart based on is_active.
How can I do this in Python using seaborn or matplotlib?
What I have done so far:
import seaborn as sns
%matplotlib inline
sns.barplot(x='age_bucket',y='is_active',data=df)
sns.barplot(x='category1',y='is_active',data=df)
It sounds like you want to count the observations rather than plot a value from a column along the y axis. In seaborn, the function for this is countplot():
sns.countplot(x='age_bucket', hue='is_active', data=df)
Since the returned object is a matplotlib Axes, you could assign it to a variable (e.g. ax) and then use ax.annotate to place text in the figure manually:
ax = sns.countplot(x='age_bucket', hue='is_active', data=df)
ax.annotate('1 1', (0, 1), ha='center', va='bottom', fontsize=12)
Seaborn has no way of creating pie charts, so you would need to use matplotlib directly. However, it is often easier to tell counts and proportions from bar charts so I would generally recommend that you stick to those unless you have a specific constraint that forces you to use a pie chart.
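As a rough sketch of both pieces using only pandas and matplotlib (not the seaborn calls above): pd.crosstab produces the counts, Axes.bar_label writes them on top of the bars, and Axes.pie draws the proportions. The data is the sample from the question with only the columns needed here; bar_label assumes matplotlib 3.4 or newer, and the pie title is a placeholder.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption); drop this in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Sample rows from the question, restricted to the columns used below
df = pd.DataFrame({
    "age_bucket": ["(40, 70]", "(17, 24]", "(40, 70]", "(17, 24]",
                   "(24, 30]", "(24, 30]", "(30, 40]"],
    "gender": ["m", "m", "f", "f", "m", "m", "f"],
    "is_active": [1, 0, 0, 1, 1, 1, 0],
})

# Rows: age buckets, columns: is_active (0/1), values: number of observations
counts = pd.crosstab(df["age_bucket"], df["is_active"])

fig, (ax_bar, ax_pie) = plt.subplots(ncols=2, figsize=(10, 4))

# Grouped bars (inactive vs. active) with the count written above each bar
counts.plot.bar(ax=ax_bar, rot=0)
for container in ax_bar.containers:
    ax_bar.bar_label(container)  # needs matplotlib >= 3.4

# Share of active members by gender, with percentages printed on the wedges
active_by_gender = df.loc[df["is_active"] == 1, "gender"].value_counts()
ax_pie.pie(active_by_gender, labels=active_by_gender.index, autopct="%1.1f%%")
ax_pie.set_title("active members by gender")

fig.tight_layout()
```

The same pattern should carry over to state or category1 by swapping the column name; for a top-10 view, the counts would first need to be sorted and truncated.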

subplot by group in python pandas

I want to make subplots for the following data, which I grouped and averaged.
I want one subplot per country, with resource on the x axis and average on the y axis.
country resource average
india water 76
india soil 45
india tree 60
US water 45
US soil 70
US tree 85
Germany water 76
Germany soil 65
Germany water 56
Grouped = df.groupby(['country','resource'])['TTR in minutes'].agg(average='mean').reset_index()
I tried but couldn't plot it in subplots.
g = df.groupby('country')
fig, axes = plt.subplots(g.ngroups, sharex=True, figsize=(8, 6))
for i, (country, d) in enumerate(g):
    ax = d.plot.bar(x='resource', y='average', ax=axes[i], title=country)
    ax.legend().remove()
fig.tight_layout()

Cannot remove trend and seasonal components

I am trying to build a model for predicting energy production using an ARMA model.

The data I can use for training is as follows:
(https://github.com/soma11soma11/EnergyDataSimulationChallenge/blob/master/challenge1/data/training_dataset_500.csv)
ID Label House Year Month Temperature Daylight EnergyProduction
0 0 1 2011 7 26.2 178.9 740
1 1 1 2011 8 25.8 169.7 731
2 2 1 2011 9 22.8 170.2 694
3 3 1 2011 10 16.4 169.1 688
4 4 1 2011 11 11.4 169.1 650
5 5 1 2011 12 4.2 199.5 763
...............
11995 19 500 2013 2 4.2 201.8 638
11996 20 500 2013 3 11.2 234 778
11997 21 500 2013 4 13.6 237.1 758
11998 22 500 2013 5 19.2 258.4 838
11999 23 500 2013 6 22.7 122.9 586
As shown above, I can use data from July 2011 to May 2013 for training.
Using the training data, I want to predict energy production in June 2013 for each of the 500 houses.
The problem is that the time series is not stationary and has trend and seasonal components (I checked this as follows).
import csv
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
data_train = pd.read_csv('../../data/training_dataset_500.csv')
rng=pd.date_range('7/1/2011', '6/1/2013', freq='M')
house1 = data_train[data_train.House==1][['EnergyProduction','Daylight','Temperature']].set_index(rng)
fig, axes = plt.subplots(nrows=1, ncols=3)
for i, column in enumerate(house1.columns):
    house1[column].plot(ax=axes[i], figsize=(14,3), title=column)
plt.show()
With this data, I cannot fit an ARMA model that gives good predictions. So I want to remove the trend and seasonal components and make the time series stationary. I tried, but I could not remove these components or make the series stationary.
I would recommend the Hodrick-Prescott (HP) filter, which is widely used in macroeconometrics to separate the long-term trend component from short-term fluctuations. It is implemented in statsmodels as statsmodels.api.tsa.filters.hpfilter.
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
df = pd.read_csv('/home/Jian/Downloads/data.csv', index_col=[0])
# get part of the data
x = df.loc[df.House==1, 'Daylight']
# HP filter with lamb=129600, the value suggested for monthly data
# hpfilter returns (cycle, trend), so x_smoothed holds x with the trend removed
x_smoothed, x_trend = sm.tsa.filters.hpfilter(x, lamb=129600)
fig, axes = plt.subplots(figsize=(12,4), ncols=3)
axes[0].plot(x)
axes[0].set_title('raw x')
axes[1].plot(x_trend)
axes[1].set_title('trend')
axes[2].plot(x_smoothed)
axes[2].set_title('smoothed x')
