Set Xticks frequency to dataframe index - python

I currently have a dataframe that has as an index the years from 1990 to 2014 (25 rows). I want my plot to have the X axis with all the years showing. I'm using add_subplot as I plan to have 4 plots in this figure (all of them with the same X axis).
To create the dataframe:
import pandas as pd
import numpy as np
index = np.arange(1990,2015,1)
columns = ['Total Population','Urban Population']
pop_plot = pd.DataFrame(index=index, columns=columns)
pop_plot = pop_plot.fillna(0)
pop_plot['Total Population'] = np.arange(150,175,1)
pop_plot['Urban Population'] = np.arange(50,125,3)
Total Population Urban Population
1990 150 50
1991 151 53
1992 152 56
1993 153 59
1994 154 62
1995 155 65
1996 156 68
1997 157 71
1998 158 74
1999 159 77
2000 160 80
2001 161 83
2002 162 86
2003 163 89
2004 164 92
2005 165 95
2006 166 98
2007 167 101
2008 168 104
2009 169 107
2010 170 110
2011 171 113
2012 172 116
2013 173 119
2014 174 122
The code that I currently have:
fig = plt.figure(figsize=(10,5))
ax1 = fig.add_subplot(2,2,1, xticklabels=pop_plot.index)
plt.subplot(2, 2, 1)
plt.plot(pop_plot)
legend = plt.legend(pop_plot, bbox_to_anchor=(0.1, 1, 0.8, .45), loc=3, ncol=1, mode='expand')
legend.get_frame().set_alpha(0)
ax1.set_xticks(range(len(pop_plot.index)))
This is the plot that I get:
When I comment out the set_xticks call, I get the following plot:
#ax1.set_xticks(range(len(pop_plot.index)))
I've tried a couple of answers that I found here, but I didn't have much success.

It's not clear what ax1.set_xticks(range(len(pop_plot.index))) should be used for. It will set the ticks to the numbers 0,1,2,3 etc. while your plot should range from 1990 to 2014.
Instead, you want to set the ticks to the numbers of your data:
ax1.set_xticks(pop_plot.index)
Complete corrected example:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
index = np.arange(1990,2015,1)
columns = ['Total Population','Urban Population']
pop_plot = pd.DataFrame(index=index, columns=columns)
pop_plot['Total Population'] = np.arange(150,175,1)
pop_plot['Urban Population'] = np.arange(50,125,3)
fig = plt.figure(figsize=(10,5))
ax1 = fig.add_subplot(2,2,1)
ax1.plot(pop_plot)
legend = ax1.legend(pop_plot, bbox_to_anchor=(0.1, 1, 0.8, .45), loc=3, ncol=1, mode='expand')
legend.get_frame().set_alpha(0)
ax1.set_xticks(pop_plot.index)
plt.show()
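Since the question mentions four plots sharing one X axis, here is a minimal sketch of that layout. It assumes the same data is drawn in every panel (the question does not say what the other three plots contain), and the Agg backend and label rotation are illustrative choices, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, assumed here so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Same dataframe as in the question
index = np.arange(1990, 2015)
pop_plot = pd.DataFrame(
    {"Total Population": np.arange(150, 175),
     "Urban Population": np.arange(50, 125, 3)},
    index=index,
)

# sharex=True links the x axis of all four panels
fig, axes = plt.subplots(2, 2, figsize=(12, 6), sharex=True)
for ax in axes.ravel():
    # Placeholder: every panel shows the same two columns
    ax.plot(pop_plot.index, pop_plot["Total Population"], label="Total Population")
    ax.plot(pop_plot.index, pop_plot["Urban Population"], label="Urban Population")
    ax.set_xticks(pop_plot.index)               # one tick per year
    ax.tick_params(axis="x", labelrotation=90)  # rotate so 25 labels fit
axes[0, 0].legend(frameon=False)
fig.tight_layout()
```

With a shared axis, setting the ticks on one panel is enough in principle; setting them in the loop keeps the sketch valid if sharex is later removed.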

The easiest option is to use the xticks parameter of pandas.DataFrame.plot.
Pass the dataframe index to it: xticks=pop_plot.index
# given the dataframe in the OP
ax = pop_plot.plot(xticks=pop_plot.index, figsize=(15, 5))
# move the legend
ax.legend(bbox_to_anchor=(0.1, 1, 0.8, .45), loc=3, ncol=1, mode='expand', frameon=False)

Related

Barchart using pandas Dataframe in plotly

I am plotting a bar chart using the code below as an example:
fig = make_subplots(rows=1, cols=2, shared_yaxes=True, horizontal_spacing= 0)
y = ['10', '20', '30', '40', '50','60']
width=2.9
fig.add_trace(go.Bar(x=[34, 64, 20,24,12,89], y=y,orientation='h',name = '1',marker_color='gold', width=width),row=1, col=1)
fig.add_trace(go.Bar(x=[14, 24, 50,34,9,104], y=y,orientation='h',name = '2',marker_color='darkorange',width=width),row=1, col=1)
fig['layout']['xaxis']['autorange'] = "reversed"
fig.add_trace(go.Bar(x=[17,46,68,22,12,93], y=y,orientation='h',name = '3',marker_color='deepskyblue',width=width),row=1, col=2)
fig.add_trace(go.Bar(x=[57,45,14,44,8,100], y=y,orientation='h',name = '4',marker_color='royalblue',width=width),row=1, col=2)
fig.update_layout(title_text="Data Chart",title_x=0.45, bargap=0.4)
fig.show()
But when I try to plot the same chart from the pandas dataframe I have, I get an error and the chart does not look like the one above.
The dataframe I have looks like this:
1 2 3 4
100 23 34 56 67
110 46 78 94 56
120 71 88 17 85
130 92 99 72 35
140 39 35 64 72
150 81 50 120 12
Is there an easy fix so that I can get exactly the same bar chart as in the image above using a pandas dataframe?
The sample script works unchanged if you pass the corresponding dataframe column as x for each trace and the dataframe index as y:
import pandas as pd
import numpy as np
import io
data = '''
1 2 3 4
100 23 34 56 67
110 46 78 94 56
120 71 88 17 85
130 92 99 72 35
140 39 35 64 72
150 81 50 120 12
'''
df = pd.read_csv(io.StringIO(data), sep=r'\s+')  # delim_whitespace is deprecated in recent pandas
import plotly.graph_objects as go
from plotly.subplots import make_subplots
fig = make_subplots(rows=1, cols=2, shared_yaxes=True, horizontal_spacing= 0)
y = df.index
width=2.9
fig.add_trace(go.Bar(x=df['1'], y=y,orientation='h',name='1',marker_color='gold', width=width),row=1, col=1)
fig.add_trace(go.Bar(x=df['2'], y=y,orientation='h',name='2',marker_color='darkorange',width=width),row=1, col=1)
fig['layout']['xaxis']['autorange'] = "reversed"
fig.add_trace(go.Bar(x=df['3'], y=y,orientation='h',name='3',marker_color='deepskyblue',width=width),row=1, col=2)
fig.add_trace(go.Bar(x=df['4'], y=y,orientation='h',name='4',marker_color='royalblue',width=width),row=1, col=2)
fig.update_layout(title_text="Data Chart",title_x=0.45, bargap=0.4)
fig.show()

Matplotlib: scatter.legend_elements() not as 1,2,3

This is DataFrame:
age weight score height name
0 12 100 12 501 aa
1 23 120 12 502 bb
2 34 121 13 499 bb
3 32 134 10 499 cc
4 23 133 11 498 cc
5 12 112 19 503 aa
I need to draw four scatter plots, one for each of the columns 'age', 'weight', 'score' and 'height'. This is my code:
fig,axes = plt.subplots(2,2,figsize=(12,8))
property = ['age','weight','score','height']
indexes = df.index.tolist()
for counter in range(0,4):
    i = counter % 2
    j = math.floor(counter / 2)
    scatter = axes[i,j].scatter(indexes,df[property[counter]],c=y)
    axes[i,j].set_title(property[counter])
    legend = axes[i,j].legend(*scatter.legend_elements())
    axes[i,j].add_artist(legend)
As a result, I get the legend labels '1', '2', '3'.
How can I get the labels 'aa', 'bb', 'cc', each with a different color?
Seaborn could create the legends automatically:
from matplotlib import pyplot as plt
import seaborn as sns
import pandas as pd
from io import StringIO
data_str = ''' age weight score height name
0 12 100 12 501 aa
1 23 120 12 502 bb
2 34 121 13 499 bb
3 32 134 10 499 cc
4 23 133 11 498 cc
5 12 112 19 503 aa'''
df = pd.read_csv(StringIO(data_str), sep=r'\s+')  # delim_whitespace is deprecated in recent pandas
fig, axes = plt.subplots(2, 2, figsize=(12, 8))
property = ['age', 'weight', 'score', 'height']
indexes = df.index.tolist()
for ax, prop in zip(axes.ravel(), property):
    scatter = sns.scatterplot(x=indexes, y=prop, hue='name', data=df, ax=ax)
    ax.set_title(prop)
    ax.set_ylabel('') # remove default y label
plt.tight_layout()
plt.show()
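If you would rather stay with plain Matplotlib, one possible approach is to color the points by integer category codes and pass your own labels to the legend. The sketch below assumes the 'name' column is what should drive the colors (the original code uses an undefined variable y for this); the Agg backend is only there so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, assumed for this sketch
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "age": [12, 23, 34, 32, 23, 12],
    "weight": [100, 120, 121, 134, 133, 112],
    "score": [12, 12, 13, 10, 11, 19],
    "height": [501, 502, 499, 499, 498, 503],
    "name": ["aa", "bb", "bb", "cc", "cc", "aa"],
})

# Map each name to an integer code; categories are sorted, so code 0 -> 'aa', etc.
names = df["name"].astype("category")
codes = names.cat.codes.to_numpy()
labels = list(names.cat.categories)

fig, axes = plt.subplots(2, 2, figsize=(12, 8))
for ax, prop in zip(axes.ravel(), ["age", "weight", "score", "height"]):
    scatter = ax.scatter(df.index, df[prop], c=codes)
    # legend_elements() returns one handle per distinct code, in ascending order;
    # discard its numeric labels and supply the category names instead
    handles, _ = scatter.legend_elements()
    ax.legend(handles, labels, title="name")
    ax.set_title(prop)
fig.tight_layout()
```

The handles come back in ascending code order, which matches the order of names.cat.categories, so the two lists line up.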

Is it possible to plot a barchart with upper and lower limits of the bins with Pandas,seaborn or Matplotlib

I would like to know how to plot a bar chart in which the upper and lower limits of the bins are given by the values in the age_classes column of the dataframe shown below, using pandas, seaborn or matplotlib. A sample of the dataframe looks like this:
age_classes total_cases male_cases female_cases
0 0-9 693 381 307
1 10-19 931 475 454
2 20-29 4530 1919 2531
3 30-39 7466 3505 3885
4 40-49 13701 6480 7130
5 50-59 20975 11149 9706
6 60-69 18089 11761 6254
7 70-79 19238 12281 6868
8 80-89 16252 8553 7644
9 >90 4356 1374 2973
10 Unknown 168 84 81
If you want a chart like this:
then you can make it with sns.barplot, setting age_classes as x and one of the columns (in my case total_cases) as y, as in this code:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv('data.csv')
fig, ax = plt.subplots()
sns.barplot(ax = ax,
            data = df,
            x = 'age_classes',
            y = 'total_cases')
plt.show()
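The barplot above treats age_classes as plain category labels. If the bars should instead span the numeric lower and upper limits, one possible sketch is to parse the "lower-upper" strings and draw the bars with Matplotlib. This is an assumption about what the question wants, not part of the answer above: rows such as '>90' and 'Unknown' have no two-sided range and are dropped here, and the bounds are treated as inclusive integer ages. Only a few rows of the sample are used.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, assumed for this sketch
import matplotlib.pyplot as plt
import pandas as pd

# A few rows from the question's sample
df = pd.DataFrame({
    "age_classes": ["0-9", "10-19", "20-29", ">90", "Unknown"],
    "total_cases": [693, 931, 4530, 4356, 168],
})

# Extract the two numbers from labels of the form "lower-upper";
# labels that do not match (e.g. '>90', 'Unknown') become NaN
bounds = df["age_classes"].str.extract(r"^(\d+)-(\d+)$").astype(float)
bounds.columns = ["lower", "upper"]
binned = df.join(bounds).dropna(subset=["lower", "upper"])

# Inclusive integer bounds: the class 0-9 covers ten years, so width = upper - lower + 1
widths = (binned["upper"] - binned["lower"] + 1).to_numpy()

fig, ax = plt.subplots()
ax.bar(binned["lower"].to_numpy(), binned["total_cases"].to_numpy(),
       width=widths, align="edge", edgecolor="black")
ax.set_xlabel("age")
ax.set_ylabel("total_cases")
```

With align="edge" each bar starts at its lower limit, so adjacent classes touch, as in a histogram.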

How to plot Numerical Values in matplotlib

I have a dataframe like this:
Time Type Profit
2 82 s/l -51.3
5 9 t/p 164.32
8 38 s/l -53.19
11 82 s/l -54.4
14 107 s/l -54.53
.. ... ... ...
730 111 s/l -70.72
731 111 s/l -70.72
732 111 s/l -70.72
733 113 s/l -65.13
734 113 s/l -65.13
[239 rows x 3 columns]
I want to plot a chart with time on the X axis (already converted to hour of the week) and profit on the Y axis (which can be positive or negative). For each hour on X, I would like two bars: one for profits and one for losses. Losses should be drawn as positive values too, but in a separate bar.
For example, -65 and 70 would show as 65 and 70 on the chart, with the loss bar in a different color.
This is my code so far:
#reading the csv file
data = pd.read_csv(filename)
df = pd.DataFrame(data, columns = ['Time','Type','Profit']).astype(str)
#turns time column into hours of week
df['Time'] = df['Time'].apply(lambda x: findHourOfWeek(x))
#Takes in winning trades (t/p) and losing trades(s/l)
df = df[(df['Type'] == 't/p') | (df['Type'] == 's/l')]
#Plots the chart
ax = df.plot(title='Profits and Losses (Hour Of Week)',kind='bar')
#ax.legend(['Losses', 'Winners'])
plt.xlabel('Hour of Week')
plt.ylabel('Amount Of Profit/Loss')
plt.show()
You can groupby, unstack and plot:
(df.groupby(['Time','Type']).Profit.sum().abs()
   .unstack('Type')
   .plot.bar()
)
For your sample data above, the output is:
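As a self-contained sketch of the same steps, the snippet below rebuilds the five visible sample rows by hand. It assumes Profit is numeric: the question's code casts every column with astype(str), in which case the column would need converting back (for example with pd.to_numeric) before summing.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, assumed for this sketch
import pandas as pd

# The five visible rows from the question, with Profit kept numeric
df = pd.DataFrame({
    "Time": [82, 9, 38, 82, 107],
    "Type": ["s/l", "t/p", "s/l", "s/l", "s/l"],
    "Profit": [-51.3, 164.32, -53.19, -54.4, -54.53],
})

# Sum profit per (hour, type), take the magnitude, then pivot Type into columns
totals = df.groupby(["Time", "Type"])["Profit"].sum().abs().unstack("Type")

# One group of bars per hour, one bar per trade type
ax = totals.plot.bar(title="Profits and Losses (Hour Of Week)")
ax.set_xlabel("Hour of Week")
ax.set_ylabel("Amount Of Profit/Loss")
```

Hours that have only one trade type get NaN in the other column after unstack, and that bar is simply not drawn.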

Matplotlib is printing the line plot twice/multiple times

What could be the problem if Matplotlib draws a line plot twice or multiple times, like this one:
Here is my code:
import pandas as pd
import numpy as np
import scipy
import matplotlib.pyplot as plt
from scipy import integrate
def compute_integrated_spectral_response_ikonos(file, sheet):
    df = pd.read_excel(file, sheet_name=sheet, header=2)
    blue = integrate.cumtrapz(df['Blue'], df['Wavelength'])
    green = integrate.cumtrapz(df['Green'], df['Wavelength'])
    red = integrate.cumtrapz(df['Red'], df['Wavelength'])
    nir = integrate.cumtrapz(df['NIR'], df['Wavelength'])
    pan = integrate.cumtrapz(df['Pan'], df['Wavelength'])
    plt.figure(num=None, figsize=(6, 4), dpi=80, facecolor='w', edgecolor='k')
    plt.plot(df[1:], blue, label='Blue', color='darkblue')
    plt.plot(df[1:], green, label='Green', color='b')
    plt.plot(df[1:], red, label='Red', color='g')
    plt.plot(df[1:], nir, label='NIR', color='r')
    plt.plot(df[1:], pan, label='Pan', color='darkred')
    plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', borderaxespad=0.)
    plt.xlabel('Wavelength (nm)')
    plt.ylabel('Spectral Response (%)')
    plt.title(f'Integrated Spectral Response of {sheet} Bands')
    plt.show()
compute_integrated_spectral_response_ikonos('Sorted Wavelengths.xlsx', 'IKONOS')
Here is my dataset.
This happens because df[1:] passes the entire dataframe (minus its first row) as the x values, so every column is used as an x series and the same curve is drawn once per column.
>>> df[1:]
Wavelength Blue Green Red NIR Pan
1 355 0.001463 0.000800 0.000504 0.000532 0.000619
2 360 0.000866 0.000729 0.000391 0.000674 0.000361
3 365 0.000731 0.000806 0.000597 0.000847 0.000244
4 370 0.000717 0.000577 0.000328 0.000729 0.000435
5 375 0.001251 0.000842 0.000847 0.000906 0.000914
.. ... ... ... ... ... ...
133 1015 0.002601 0.002100 0.001752 0.002007 0.149330
134 1020 0.001602 0.002040 0.002341 0.001793 0.136372
135 1025 0.001946 0.002218 0.001260 0.002754 0.118682
136 1030 0.002417 0.001376 0.000898 0.000000 0.103634
137 1035 0.001300 0.001602 0.000000 0.000000 0.089097
[137 rows x 6 columns]
The slice [1:] just gives the dataframe without the first row. Altering each instance of df[1:] to df['Wavelength'][1:] gives us what I presume is the expected output:
>>> df['Wavelength'][1:]
1 355
2 360
3 365
4 370
5 375
133 1015
134 1020
135 1025
136 1030
137 1035
Name: Wavelength, Length: 137, dtype: int64
Output:
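The reason for the [1:] slice is that the cumulative trapezoid integral returns one sample fewer than its input. A minimal sketch with made-up numbers (not the IKONOS data) shows this. It uses scipy.integrate.cumulative_trapezoid, the newer name for cumtrapz, which is assumed to be available in your SciPy version.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, assumed for this sketch
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy import integrate

# Made-up stand-in for the spreadsheet: 6 wavelengths, 5 nm apart
df = pd.DataFrame({
    "Wavelength": np.arange(350, 380, 5),
    "Blue": [0.0, 0.2, 0.4, 0.4, 0.2, 0.0],
})
x = df["Wavelength"].to_numpy()
y = df["Blue"].to_numpy()

# Without `initial`, the result has len(x) - 1 samples,
# so it has to be plotted against x[1:], a single 1-D column
blue = integrate.cumulative_trapezoid(y, x)
fig, ax = plt.subplots()
(line,) = ax.plot(x[1:], blue, label="Blue", color="darkblue")

# Alternative: initial=0 prepends a zero so the result lines up with x itself
blue_full = integrate.cumulative_trapezoid(y, x, initial=0)
```

Either form gives exactly one line per band, because the x argument is a single column rather than the whole dataframe.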
