I have been working on a pie chart to display data year by year. After trying for quite a while, I managed to get it working for a slice of the rows:
df = pd.DataFrame(dict(Year=dates[:3],
                       robbery=robbery[:3],
                       fraud=fraud[:3],
                       sexual=sexual[:3]
                       ))
fig, axes = plt.subplots(1, 3, figsize=(12, 8))
for ax, idx in zip(axes, df.index):
    ax.pie(df.loc[idx], explode=explode, shadow=True, labels=df.columns, autopct='%.2f%%')
    ax.set(ylabel='', title=idx, aspect='equal')
axes[0].legend(bbox_to_anchor=(0, 0.5))
plt.show()
I have also checked this link on displaying pie charts, but it works with NumPy arrays.
In my case, I am stuck on displaying all the data in pie charts, year by year, at once. Here is my code:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
df = pd.DataFrame(dict(
robbery = robbery,
fraud = fraud,
assualt = sexual
), index=dates)
print(df)
plt.style.use('ggplot')
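# 'axes.color_cycle' no longer exists in current Matplotlib; read the colors from 'axes.prop_cycle' instead
colors = plt.rcParams['axes.prop_cycle'].by_key()['color']
fig, axes = plt.subplots(nrows=2, ncols=2)
for ax, col in zip(axes.flat, df.columns):
    ax.pie(df[col], labels=df.index, autopct='%.2f', colors=colors)
    ax.set(ylabel='', title=col, aspect='equal')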
axes[0, 0].legend(bbox_to_anchor=(0, 0.5))
fig.savefig('your_file.png') # Or whichever format you'd like
plt.show()
DataFrame:
assualt fraud robbery
1997-1998 2988 11897 1212
1998-1999 6033 27660 2482
1999-2000 5924 28421 2418
2000-2001 5631 29539 2298
2001-2002 5875 30295 2481
2002-2003 7434 27141 1940
2003-2004 5673 27986 2053
2004-2005 5695 30070 1879
2005-2006 6099 26031 1903
2006-2007 7038 25845 1889
2007-2008 6671 21009 1736
2008-2009 6046 17768 1791
2009-2010 5496 18974 1934
2010-2011 5666 18458 1726
2011-2012 4933 14157 1748
2012-2013 4972 16849 1962
2013-2014 5328 18819 1762
2014-2015 5909 21915 1341
2015-2016 6067 21891 1354
2016-2017 6448 27390 1608
2017-2018 6355 25438 1822
1997-1998 2988 11897 1212
1998-1999 6033 27660 2482
1999-2000 5924 28421 2418
2000-2001 5631 29539 2298
2001-2002 5875 30295 2481
2002-2003 7434 27141 1940
2003-2004 5673 27986 2053
2004-2005 5695 30070 1879
2005-2006 6099 26031 1903
2006-2007 7038 25845 1889
2007-2008 6671 21009 1736
2008-2009 6046 17768 1791
2009-2010 5496 18974 1934
2010-2011 5666 18458 1726
2011-2012 4933 14157 1748
2012-2013 4972 16849 1962
2013-2014 5328 18819 1762
2014-2015 5909 21915 1341
2015-2016 6067 21891 1354
2016-2017 6448 27390 1608
2017-2018 6355 25438 1822
pie chart looks like this:
I generated an example 9 x 3 dataframe and a 3 x 3 grid of subplots, then populated the pie charts one row at a time.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(low=0, high=10, size=(9, 3)),
                  columns=['a', 'b', 'c'])
fig, axes = plt.subplots(3, 3, figsize=(12, 8))
for i in range(int(len(df.index)/3)):
    for j in range(int(len(df.index)/3)):
        idx = i * 3 + j
        ax = axes[i][j]
        ax.pie(df.loc[idx], shadow=True, labels=df.columns, autopct='%.2f%%')
        ax.set(ylabel='', title=idx, aspect='equal')
axes[0][0].legend(bbox_to_anchor=(0, 0.5))
plt.show()
plt.show()
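Applied to the year-indexed data from the question, the same idea could look roughly like the sketch below. It uses only the first six years of the posted table, and the three-column grid, the `Agg` backend line and the figure size are my own assumptions, not something from the original posts.

```python
import math

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# First six years of the DataFrame posted in the question (column names kept as posted).
df = pd.DataFrame(
    {
        "assualt": [2988, 6033, 5924, 5631, 5875, 7434],
        "fraud": [11897, 27660, 28421, 29539, 30295, 27141],
        "robbery": [1212, 2482, 2418, 2298, 2481, 1940],
    },
    index=["1997-1998", "1998-1999", "1999-2000",
           "2000-2001", "2001-2002", "2002-2003"],
)

ncols = 3  # assumed layout: three pies per row
nrows = math.ceil(len(df) / ncols)
fig, axes = plt.subplots(nrows, ncols, figsize=(12, 4 * nrows), squeeze=False)
axes = axes.ravel()  # flatten the grid so it can be zipped with the rows

# One pie per year: each DataFrame row becomes the wedges of one subplot.
for ax, (year, row) in zip(axes, df.iterrows()):
    ax.pie(row, labels=row.index, autopct="%.1f%%")
    ax.set(title=year, aspect="equal")

# Hide grid cells that are left over when len(df) is not a multiple of ncols.
for ax in axes[len(df):]:
    ax.set_visible(False)

fig.tight_layout()
```

Flattening the axes and zipping them with `df.iterrows()` avoids the manual `i * 3 + j` index arithmetic, and the grid size follows from the number of years instead of being hard-coded.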
I want to make a GDP vs. life expectancy plot for Ireland over the course of a few years. I want to plot the first point on a scatter plot, then wait a few seconds and have the next point plotted.
ax = plt.figure()
for i in year:
    plt.scatter(ie_gdp[i], ie_life[i])
    plt.draw()
    plt.show()
    plt.pause(1)
So far this is all I can come up with. However, in JupyterLab this draws a separate plot for each point. I've tried looking at animations online, but they all use live data. I already have the datasets cleaned and ready in ie_gdp and ie_life.
%matplotlib inline
fig = plt.figure(figsize = (15,15))
ax = fig.add_subplot(1,1,1)
def animate(i):
    xs = []
    ys = []
    for y in year:
        xs.append(ie_gdp[y])
        ys.append(ie_life[y])
    ax.cla()
    ax.scatter(xs,ys)
ani = animation.FuncAnimation(fig, animate, interval = 10000)
plt.show()
Above is my attempt at using animations, but it also doesn't work. I get this error: AttributeError: 'list' object has no attribute 'shape'
Any help would be appreciated.
I'm not sure I understand your intended animation, but I animated a scatter plot with the year on the x-axis, life expectancy on the y-axis, and the marker size set by the GDP value. The sample data is the gapminder dataset provided by Plotly, so please replace it with your own data.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
#from IPython.display import HTML
#from matplotlib.animation import PillowWriter
# for sample data
import plotly.express as px
df = px.data.gapminder()
ie_df = df[df['country'] == 'Ireland']
ie_df
country continent year lifeExp pop gdpPercap iso_alpha iso_num
744 Ireland Europe 1952 66.910 2952156 5210.280328 IRL 372
745 Ireland Europe 1957 68.900 2878220 5599.077872 IRL 372
746 Ireland Europe 1962 70.290 2830000 6631.597314 IRL 372
747 Ireland Europe 1967 71.080 2900100 7655.568963 IRL 372
748 Ireland Europe 1972 71.280 3024400 9530.772896 IRL 372
749 Ireland Europe 1977 72.030 3271900 11150.981130 IRL 372
750 Ireland Europe 1982 73.100 3480000 12618.321410 IRL 372
751 Ireland Europe 1987 74.360 3539900 13872.866520 IRL 372
752 Ireland Europe 1992 75.467 3557761 17558.815550 IRL 372
753 Ireland Europe 1997 76.122 3667233 24521.947130 IRL 372
754 Ireland Europe 2002 77.783 3879155 34077.049390 IRL 372
755 Ireland Europe 2007 78.885 4109086 40675.996350 IRL 372
fig = plt.figure(figsize=(10,10))
ax = plt.axes(xlim=(1952,2012), ylim=(0, 45000))
scat = ax.scatter([], [], [], cmap='jet')
def animate(i):
    # rows 0..i, so the last frame (i = 11) shows all 12 points
    tmp = ie_df.iloc[:i+1,:]
    ax.clear()
    scat = ax.scatter(tmp['year'], tmp['lifeExp'], s=tmp['gdpPercap'], c=tmp['gdpPercap'], ec='k')
    ax.set_xlabel('Year', fontsize=15)
    ax.set_ylabel('lifeExp', fontsize=15)
    ax.set_title('Ireland(1952-2007)')
    return scat,
anim = FuncAnimation(fig, animate, frames=12, interval=1000, repeat=False)
#anim.save('gdp_life.gif', writer='pillow')
plt.show()
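If the goal is literally GDP on the x-axis and life expectancy on the y-axis, with one more point appearing per frame, a minimal sketch could update a single scatter artist instead of clearing the axes. The values are the Ireland rows from the table above (GDP rounded to two decimals); the axis limits, the `update` name and the `Agg` backend line are assumptions of mine.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; omit in JupyterLab
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.animation import FuncAnimation

# Ireland rows from the gapminder sample shown above (GDP rounded to two decimals).
years = np.arange(1952, 2008, 5)  # 1952, 1957, ..., 2007
gdp = np.array([5210.28, 5599.08, 6631.60, 7655.57, 9530.77, 11150.98,
                12618.32, 13872.87, 17558.82, 24521.95, 34077.05, 40675.99])
life = np.array([66.910, 68.900, 70.290, 71.080, 71.280, 72.030,
                 73.100, 74.360, 75.467, 76.122, 77.783, 78.885])

fig, ax = plt.subplots(figsize=(6, 6))
# Fixed limits (assumed values) so the view does not jump between frames.
ax.set(xlim=(0, 45000), ylim=(65, 80),
       xlabel="GDP per capita", ylabel="Life expectancy")
scat = ax.scatter([], [])  # empty artist that each frame fills in

def update(i):
    # Show every point up to and including frame i.
    scat.set_offsets(np.column_stack([gdp[:i + 1], life[:i + 1]]))
    ax.set_title(f"Ireland, {years[i]}")
    return (scat,)

anim = FuncAnimation(fig, update, frames=len(years), interval=1000, repeat=False)
```

With `%matplotlib inline`, JupyterLab only shows a static figure. One option there is to display the result of `anim.to_jshtml()` through `IPython.display.HTML`, which is what the commented-out `HTML` import above hints at.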
I'm still having trouble doing this.
Here is what my data looks like:
date positive negative neutral
0 2015-09 23 6 18
1 2016-04 709 288 704
2 2016-08 1478 692 1750
3 2016-09 1881 926 2234
4 2016-10 3196 1594 3956
In my CSV file I don't have the 0-4 index, only the four columns from 'date' to 'neutral'.
I don't know how to fix my code to make it look like this:
Seaborn code
sns.set(style='darkgrid', context='talk', palette='Dark2')
fig, ax = plt.subplots(figsize=(8, 8))
sns.barplot(x=df['positive'], y=df['negative'], ax=ax)
ax.set_xticklabels(['Negative', 'Neutral', 'Positive'])
ax.set_ylabel("Percentage")
plt.show()
To do this in seaborn you'll need to transform your data into long format. You can easily do this via melt:
plotting_df = df.melt(id_vars="date", var_name="sign", value_name="percentage")
print(plotting_df.head())
date sign percentage
0 2015-09 positive 23
1 2016-04 positive 709
2 2016-08 positive 1478
3 2016-09 positive 1881
4 2016-10 positive 3196
Then you can plot this long-format dataframe with seaborn in a straightforward manner:
sns.set(style='darkgrid', context='talk', palette='Dark2')
fig, ax = plt.subplots(figsize=(8, 8))
sns.barplot(x="date", y="percentage", ax=ax, hue="sign", data=plotting_df)
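One caveat: the melted column is named "percentage", but the posted values look like raw counts. If real percentages per date are wanted (the question labels the axis "Percentage"), a rough sketch is to normalise each row before melting. The data is the five rows from the question; the `signs` and `shares` names are mine, and treating the values as counts is an assumption.

```python
import pandas as pd

# The five rows posted in the question.
df = pd.DataFrame({
    "date": ["2015-09", "2016-04", "2016-08", "2016-09", "2016-10"],
    "positive": [23, 709, 1478, 1881, 3196],
    "negative": [6, 288, 692, 926, 1594],
    "neutral": [18, 704, 1750, 2234, 3956],
})

signs = ["positive", "negative", "neutral"]

# Divide each row by its own total so the three values per date sum to 100.
shares = df[signs].div(df[signs].sum(axis=1), axis=0) * 100
shares.insert(0, "date", df["date"])

# Long format: one row per (date, sign) pair, ready for hue="sign" in seaborn.
plotting_df = shares.melt(id_vars="date", var_name="sign", value_name="percentage")
print(plotting_df.head())
```

The resulting `plotting_df` has the same three columns as in the answer, so the `sns.barplot(x="date", y="percentage", hue="sign", ...)` call can be used unchanged.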
Based on the data you posted, you can also plot the wide-format dataframe directly with pandas:
sns.set(style='darkgrid', context='talk', palette='Dark2')
# fig, ax = plt.subplots(figsize=(8, 8))
df.plot(x="date",y=["positive","neutral","negative"],kind="bar")
plt.xticks(rotation=-360)
# ax.set_xticklabels(['Negative', 'Neutral', 'Positive'])
# ax.set_ylabel("Percentage")
plt.show()
I am trying to plot two pandas series
Series A
Private 11210
Self-emp-not-inc 1321
Local-gov 1043
? 963
State-gov 683
Self-emp-inc 579
Federal-gov 472
Without-pay 7
Never-worked 3
Name: workclass, dtype: int64
Series B
Self-emp-not-inc 1321
Local-gov 1043
State-gov 683
Self-emp-inc 579
Federal-gov 472
Without-pay 7
Never-worked 3
Name: workclass, dtype: int64
g = sns.barplot(x=A.index, y=A.values, color='green', ax=faxes[ax_id]) # some subplot
g.set_xticklabels(g.get_xticklabels(), rotation=30)
sns.barplot(x=B.index, y=B.values, color='red', ax=faxes[ax_id])
The first plot draws as expected:
However, once I draw the second, something goes wrong (a couple of bars disappear, labels are incorrect, etc.).
Partially related: how can I use a log scale for the y-axis? (11K vs. 3 hides the low numbers completely.)
You can concatenate A and B joining the index. Rows that appear in one but not in the other will be filled in with NaN or NA and will not be shown in the bar plot.
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
A = pd.Series({'Private': 11210,
'Self-emp-not-inc': 1321,
'Local-gov': 1043,
'?': 963,
'State-gov': 683,
'Self-emp-inc': 579,
'Federal-gov': 472,
'Without-pay': 7,
'Never-worked': 3}, name='workclass')
B = pd.Series({'Self-emp-not-inc': 1321,
'Local-gov': 1043,
'State-gov': 683,
'Self-emp-inc': 579,
'Federal-gov': 472,
'Without-pay': 7,
'Never-worked': 3}, name='workclass')
df = pd.concat([A.rename('workclass A'), B.rename('workclass B')], axis=1)
ax = df.plot.bar(rot=30, color=['darkgreen', 'crimson'])
plt.tight_layout()
plt.show()
The concatenated dataframe looks like:
workclass A workclass B
Private 11210 NaN
Self-emp-not-inc 1321 1321.0
Local-gov 1043 1043.0
? 963 NaN
State-gov 683 683.0
Self-emp-inc 579 579.0
Federal-gov 472 472.0
Without-pay 7 7.0
Never-worked 3 3.0
Note that an integer can't be NaN, so B is automatically converted to a float type.
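The dtype change is easy to check on a cut-down version of the two series. This is only a sketch with four of the categories; the `Int64` conversion at the end is an optional extra I am adding, for the case where you would rather keep integers and show missing values as `<NA>`.

```python
import pandas as pd

# Cut-down versions of series A and B from the question.
A = pd.Series({"Private": 11210, "Self-emp-not-inc": 1321, "?": 963, "Never-worked": 3})
B = pd.Series({"Self-emp-not-inc": 1321, "Never-worked": 3})

# Outer join on the index: categories missing from B become NaN.
df = pd.concat([A.rename("workclass A"), B.rename("workclass B")], axis=1)
print(df.dtypes)  # column B is float64 because NaN needs a float dtype

# Optional: pandas' nullable integer dtype keeps whole numbers and uses <NA> for gaps.
df_int = df.astype({"workclass B": "Int64"})
print(df_int)
```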
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
A = {'Private':11210,
'Self-emp-not-inc':1321,
'Local-gov':1043,
'?':963,
'State-gov':683,
'Self-emp-inc':579,
'Federal-gov':472,
'Without-pay':7,
'Never-worked':3}
B = {'Self-emp-not-inc':1321,
'Local-gov':1043,
'State-gov':683,
'Self-emp-inc':579,
'Federal-gov':472,
'Without-pay':7,
'Never-worked':3}
df = pd.concat([pd.Series(A, name='A'), pd.Series(B, name='B')], axis=1)
sns.barplot(y=df.A.values, x=df.index, color='b', alpha=0.4, label='A')
sns.barplot(y=df.B.values, x=df.index, color='r', alpha=0.4, label='B', bottom=df.A.values)
plt.yscale('log')
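For the log-axis part of the question, the pandas plotting route from the first answer can also take the scale directly. The sketch below assumes grouped (side-by-side) bars are acceptable; the `logy=True` argument and the axis label are my additions, and the `Agg` line is only there so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

A = pd.Series({"Private": 11210, "Self-emp-not-inc": 1321, "Local-gov": 1043,
               "?": 963, "State-gov": 683, "Self-emp-inc": 579,
               "Federal-gov": 472, "Without-pay": 7, "Never-worked": 3})
B = pd.Series({"Self-emp-not-inc": 1321, "Local-gov": 1043, "State-gov": 683,
               "Self-emp-inc": 579, "Federal-gov": 472, "Without-pay": 7,
               "Never-worked": 3})

df = pd.concat([A.rename("A"), B.rename("B")], axis=1)

# logy=True puts the y-axis on a log scale, so 3 and 11210 are both visible.
ax = df.plot.bar(rot=30, logy=True)
ax.set_ylabel("count (log scale)")
plt.tight_layout()
```

Calling `ax.set_yscale('log')` on an existing axes has the same effect as `plt.yscale('log')`, but targets one specific subplot, which matters when the bars live in `faxes[ax_id]` as in the question.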
What could be the problem if Matplotlib draws a line plot two or more times, like this one:
Here is my code:
import pandas as pd
import numpy as np
import scipy
import matplotlib.pyplot as plt
from scipy import integrate
def compute_integrated_spectral_response_ikonos(file, sheet):
    df = pd.read_excel(file, sheet_name=sheet, header=2)
    blue = integrate.cumtrapz(df['Blue'], df['Wavelength'])
    green = integrate.cumtrapz(df['Green'], df['Wavelength'])
    red = integrate.cumtrapz(df['Red'], df['Wavelength'])
    nir = integrate.cumtrapz(df['NIR'], df['Wavelength'])
    pan = integrate.cumtrapz(df['Pan'], df['Wavelength'])
    plt.figure(num=None, figsize=(6, 4), dpi=80, facecolor='w', edgecolor='k')
    plt.plot(df[1:], blue, label='Blue', color='darkblue')
    plt.plot(df[1:], green, label='Green', color='b')
    plt.plot(df[1:], red, label='Red', color='g')
    plt.plot(df[1:], nir, label='NIR', color='r')
    plt.plot(df[1:], pan, label='Pan', color='darkred')
    plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', borderaxespad=0.)
    plt.xlabel('Wavelength (nm)')
    plt.ylabel('Spectral Response (%)')
    plt.title(f'Integrated Spectral Response of {sheet} Bands')
    plt.show()
compute_integrated_spectral_response_ikonos('Sorted Wavelengths.xlsx', 'IKONOS')
Here is my dataset.
This is because plotting df[1:] is plotting the entire dataframe as the x-axis.
>>> df[1:]
Wavelength Blue Green Red NIR Pan
1 355 0.001463 0.000800 0.000504 0.000532 0.000619
2 360 0.000866 0.000729 0.000391 0.000674 0.000361
3 365 0.000731 0.000806 0.000597 0.000847 0.000244
4 370 0.000717 0.000577 0.000328 0.000729 0.000435
5 375 0.001251 0.000842 0.000847 0.000906 0.000914
.. ... ... ... ... ... ...
133 1015 0.002601 0.002100 0.001752 0.002007 0.149330
134 1020 0.001602 0.002040 0.002341 0.001793 0.136372
135 1025 0.001946 0.002218 0.001260 0.002754 0.118682
136 1030 0.002417 0.001376 0.000898 0.000000 0.103634
137 1035 0.001300 0.001602 0.000000 0.000000 0.089097
[137 rows x 6 columns]
The slice [1:] just gives the dataframe without the first row. Altering each instance of df[1:] to df['Wavelength'][1:] gives us what I presume is the expected output:
>>> df['Wavelength'][1:]
1 355
2 360
3 365
4 370
5 375
133 1015
134 1020
135 1025
136 1030
137 1035
Name: Wavelength, Length: 137, dtype: int64
Output:
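The reason the x values need the `[1:]` slice at all is that a cumulative trapezoid integral has one element fewer than its input. Here is a toy sketch of that, written with plain NumPy so it does not depend on the SciPy version (in recent SciPy releases the function is called `cumulative_trapezoid` rather than `cumtrapz`). The five-point constant "band" is made up purely for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Made-up toy data: five wavelengths 5 nm apart with a constant response of 2.
x = np.arange(350, 375, 5)  # 350, 355, 360, 365, 370
y = np.full(x.shape, 2.0)

# Cumulative trapezoid rule: running sum of the area of each interval.
cum = np.cumsum(0.5 * (y[1:] + y[:-1]) * np.diff(x))
print(len(x), len(cum))  # the integral has one element fewer than x

fig, ax = plt.subplots()
# Pass a single 1-D x column, sliced to match the integral's length.
(line,) = ax.plot(x[1:], cum, label="toy band")
ax.set(xlabel="Wavelength (nm)", ylabel="Integrated response")
ax.legend()
```

Passing the whole dataframe as x instead makes Matplotlib pair the integral with every column in turn, which is where the repeated lines in the question come from.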
After a count operation in Pandas, I have the following dataframe:
Cancer No Yes
AgeGroups Factor
0-5 w-statin 108 0
wo-statin 6575 223
11-15 w-statin 5 1
wo-statin 3669 143
16-20 w-statin 28 1
wo-statin 6174 395
21-25 w-statin 80 2
wo-statin 8173 624
26-30 w-statin 110 2
wo-statin 9143 968
30-35 w-statin 171 5
wo-statin 9046 1225
35-40 w-statin 338 21
wo-statin 8883 1475
41-45 w-statin 782 65
wo-statin 11155 2533
I am having a problem with my barchart. With the code:
ax = counts.plot(kind='bar',stacked=True,colormap='Paired',rot = 45)
for p in ax.patches:
    ax.annotate(np.round(p.get_height(),decimals=0).astype(np.int64), (p.get_x()+p.get_width()/2., p.get_y()), ha='center', va='center', xytext=(2, 10), textcoords='offset points', fontsize=10)
I get:
My target is to achieve two different subplots with two different factors (w-statin/wo-statin) with agegroups as my x-axis. It should approximately look like this:
I would appreciate any help provided. Thank you so much.
by_factor = counts.groupby(level='Factor')
k = by_factor.ngroups
fig, axes = plt.subplots(1, k, sharex=True, sharey=False, figsize=(15, 8))
for i, (gname, grp) in enumerate(by_factor):
    grp.xs(gname, level='Factor').plot.bar(
        stacked=True, colormap='Paired', rot=45, ax=axes[i], title=gname)
fig.tight_layout()
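To try the snippet above without the original data, here is a self-contained sketch that rebuilds the first three age groups of the posted table and draws one stacked subplot per factor. The way `counts` is constructed is my guess at its structure (a two-level index named AgeGroups/Factor and a column index named Cancer), and the `Agg` backend line is only there so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# First three age groups from the posted table, rebuilt by hand (assumed structure).
index = pd.MultiIndex.from_product(
    [["0-5", "11-15", "16-20"], ["w-statin", "wo-statin"]],
    names=["AgeGroups", "Factor"],
)
counts = pd.DataFrame(
    [[108, 0], [6575, 223], [5, 1], [3669, 143], [28, 1], [6174, 395]],
    index=index,
    columns=pd.Index(["No", "Yes"], name="Cancer"),
)

factors = counts.index.get_level_values("Factor").unique()
# Independent y-axes: the w-statin counts are far smaller than the wo-statin ones.
fig, axes = plt.subplots(1, len(factors), sharey=False, figsize=(12, 5))

for ax, factor in zip(axes, factors):
    # xs drops the Factor level, leaving AgeGroups as the x-axis categories.
    sub = counts.xs(factor, level="Factor")
    sub.plot.bar(stacked=True, colormap="Paired", rot=45, ax=ax, title=factor)
    ax.set_ylabel("count")

fig.tight_layout()
```

Selecting with `counts.xs(factor, level="Factor")` directly gives the same per-factor frames as the `groupby` loop; which one to use is mostly a matter of taste.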