I am trying to plot two pandas series
Series A
Private 11210
Self-emp-not-inc 1321
Local-gov 1043
? 963
State-gov 683
Self-emp-inc 579
Federal-gov 472
Without-pay 7
Never-worked 3
Name: workclass, dtype: int64
Series B
Self-emp-not-inc 1321
Local-gov 1043
State-gov 683
Self-emp-inc 579
Federal-gov 472
Without-pay 7
Never-worked 3
Name: workclass, dtype: int64
g = sns.barplot(x=A.index, y=A.values, color='green', ax=faxes[ax_id]) # some subplot
g.set_xticklabels(g.get_xticklabels(), rotation=30)
sns.barplot(x=B.index, y=B.values, color='red', ax=faxes[ax_id])
The first plot draws as expected:
However, once I draw the second one, something goes wrong (a couple of bars disappear, the labels are incorrect, etc.).
Partially related: how can I use a log scale for the y-axis? (11K vs. 3 hides the low numbers completely.)
You can concatenate A and B joining the index. Rows that appear in one but not in the other will be filled in with NaN or NA and will not be shown in the bar plot.
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
A = pd.Series({'Private': 11210,
'Self-emp-not-inc': 1321,
'Local-gov': 1043,
'?': 963,
'State-gov': 683,
'Self-emp-inc': 579,
'Federal-gov': 472,
'Without-pay': 7,
'Never-worked': 3}, name='workclass')
B = pd.Series({'Self-emp-not-inc': 1321,
'Local-gov': 1043,
'State-gov': 683,
'Self-emp-inc': 579,
'Federal-gov': 472,
'Without-pay': 7,
'Never-worked': 3}, name='workclass')
df = pd.concat([A.rename('workclass A'), B.rename('workclass B')], axis=1)
ax = df.plot.bar(rot=30, color=['darkgreen', 'crimson'])
plt.tight_layout()
plt.show()
The concatenated dataframe looks like:
workclass A workclass B
Private 11210 NaN
Self-emp-not-inc 1321 1321.0
Local-gov 1043 1043.0
? 963 NaN
State-gov 683 683.0
Self-emp-inc 579 579.0
Federal-gov 472 472.0
Without-pay 7 7.0
Never-worked 3 3.0
Note that an integer can't be NaN, so B is automatically converted to a float type.
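The question also asked about a log scale for the y-axis. A minimal sketch, assuming the same concatenation approach and a shortened copy of the two series: pandas' bar plot accepts `logy=True`, which keeps the small counts visible next to the large ones. The `Agg` backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Shortened copies of the two series from the question
A = pd.Series({'Private': 11210, 'Self-emp-not-inc': 1321,
               'Without-pay': 7, 'Never-worked': 3}, name='workclass A')
B = pd.Series({'Self-emp-not-inc': 1321,
               'Without-pay': 7, 'Never-worked': 3}, name='workclass B')

# Outer join on the index: 'Private' is missing from B, so it becomes NaN there
df = pd.concat([A, B], axis=1)

# logy=True switches the y-axis to a logarithmic scale
ax = df.plot.bar(rot=30, color=['darkgreen', 'crimson'], logy=True)
plt.tight_layout()
# call plt.show() or ax.figure.savefig(...) to view the result
```

The NaN entry is simply not drawn, and column B ends up as float64 for the reason given above.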
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
A = {'Private':11210,
'Self-emp-not-inc':1321,
'Local-gov':1043,
'?':963,
'State-gov':683,
'Self-emp-inc':579,
'Federal-gov':472,
'Without-pay':7,
'Never-worked':3}
B = {'Self-emp-not-inc':1321,
'Local-gov':1043,
'State-gov':683,
'Self-emp-inc':579,
'Federal-gov':472,
'Without-pay':7,
'Never-worked':3}
df = pd.concat([pd.Series(A, name='A'), pd.Series(B, name='B')], axis=1)
sns.barplot(y=df.A.values, x=df.index, color='b', alpha=0.4, label='A')
sns.barplot(y=df.B.values, x=df.index, color='r', alpha=0.4, label='B', bottom=df.A.values)
plt.yscale('log')
Related
I'm still having trouble doing this.
Here is what my data looks like:
date positive negative neutral
0 2015-09 23 6 18
1 2016-04 709 288 704
2 2016-08 1478 692 1750
3 2016-09 1881 926 2234
4 2016-10 3196 1594 3956
In my CSV file I don't have those 0-4 indexes, only the 4 columns from 'date' to 'neutral'.
I don't know how to fix my code to make it look like this:
Seaborn code
sns.set(style='darkgrid', context='talk', palette='Dark2')
fig, ax = plt.subplots(figsize=(8, 8))
sns.barplot(x=df['positive'], y=df['negative'], ax=ax)
ax.set_xticklabels(['Negative', 'Neutral', 'Positive'])
ax.set_ylabel("Percentage")
plt.show()
To do this in seaborn you'll need to transform your data into long format. You can easily do this via melt:
plotting_df = df.melt(id_vars="date", var_name="sign", value_name="percentage")
print(plotting_df.head())
date sign percentage
0 2015-09 positive 23
1 2016-04 positive 709
2 2016-08 positive 1478
3 2016-09 positive 1881
4 2016-10 positive 3196
Then you can plot this long-format dataframe with seaborn in a straightforward manner:
sns.set(style='darkgrid', context='talk', palette='Dark2')
fig, ax = plt.subplots(figsize=(8, 8))
sns.barplot(x="date", y="percentage", ax=ax, hue="sign", data=plotting_df)
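The posted numbers look like counts, while the axis in the question is labelled "Percentage". If percentages per date are what is wanted (an assumption on my part, the question does not say so explicitly), a rough sketch is to normalise each row before melting:

```python
import pandas as pd

# Data copied from the question
df = pd.DataFrame({
    "date": ["2015-09", "2016-04", "2016-08", "2016-09", "2016-10"],
    "positive": [23, 709, 1478, 1881, 3196],
    "negative": [6, 288, 692, 926, 1594],
    "neutral": [18, 704, 1750, 2234, 3956],
})

# Convert each row to shares of that row's total, in percent
shares = df.set_index("date")
shares = shares.div(shares.sum(axis=1), axis=0) * 100

# Long format: one row per (date, sign) pair, 5 dates x 3 signs = 15 rows
plotting_df = shares.reset_index().melt(
    id_vars="date", var_name="sign", value_name="percentage")
print(plotting_df.head())
```

The resulting `plotting_df` has the same column names as above, so the same `sns.barplot` call can be used on it.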
Based on the data you posted
sns.set(style='darkgrid', context='talk', palette='Dark2')
# fig, ax = plt.subplots(figsize=(8, 8))
df.plot(x="date",y=["positive","neutral","negative"],kind="bar")
plt.xticks(rotation=0)  # keep the date labels horizontal (same effect as -360)
# ax.set_xticklabels(['Negative', 'Neutral', 'Positive'])
# ax.set_ylabel("Percentage")
plt.show()
I would like to know how to plot a bar chart, with pandas, seaborn or matplotlib, in which the upper and lower limits of the bins are given by the values in the age_classes column of the dataframe shown below. A sample of the dataframe looks like this:
age_classes total_cases male_cases female_cases
0 0-9 693 381 307
1 10-19 931 475 454
2 20-29 4530 1919 2531
3 30-39 7466 3505 3885
4 40-49 13701 6480 7130
5 50-59 20975 11149 9706
6 60-69 18089 11761 6254
7 70-79 19238 12281 6868
8 80-89 16252 8553 7644
9 >90 4356 1374 2973
10 Unknown 168 84 81
If you want a chart like this:
then you can make it with sns.barplot, setting age_classes as x and one of the columns (in my case total_cases) as y, as in this code:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv('data.csv')
fig, ax = plt.subplots()
sns.barplot(ax=ax,
            data=df,
            x='age_classes',
            y='total_cases')
plt.show()
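If the bars should instead sit on a numeric age axis, with each bar spanning its bin, one possible sketch is to parse the lower and upper limits out of the labels and pass them to matplotlib's `ax.bar`. The column names `lower` and `upper` are made up for this example, and open-ended labels such as `>90` and `Unknown` have no two limits to parse, so they are dropped here.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# A few rows copied from the question
df = pd.DataFrame({
    "age_classes": ["0-9", "10-19", "20-29", ">90", "Unknown"],
    "total_cases": [693, 931, 4530, 4356, 168],
})

# Extract "lower-upper" pairs; labels that do not match become NaN
bounds = df["age_classes"].str.extract(r"^(\d+)-(\d+)$").astype(float)
bounds.columns = ["lower", "upper"]  # hypothetical helper columns

# Keep only the rows with a parsable lower and upper limit
binned = df.join(bounds).dropna(subset=["lower", "upper"])
widths = binned["upper"] - binned["lower"] + 1  # "0-9" covers ten ages

fig, ax = plt.subplots()
# align="edge" makes each bar start at its lower limit
ax.bar(binned["lower"], binned["total_cases"], width=widths,
       align="edge", edgecolor="white")
ax.set_xlabel("age")
ax.set_ylabel("total_cases")
```

The categorical `sns.barplot` version above is simpler when all classes, including `>90` and `Unknown`, need to appear.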
What could be the problem if Matplotlib draws a line plot two or more times, like this one:
Here is my code:
import pandas as pd
import numpy as np
import scipy
import matplotlib.pyplot as plt
from scipy import integrate
def compute_integrated_spectral_response_ikonos(file, sheet):
    df = pd.read_excel(file, sheet_name=sheet, header=2)
    blue = integrate.cumtrapz(df['Blue'], df['Wavelength'])
    green = integrate.cumtrapz(df['Green'], df['Wavelength'])
    red = integrate.cumtrapz(df['Red'], df['Wavelength'])
    nir = integrate.cumtrapz(df['NIR'], df['Wavelength'])
    pan = integrate.cumtrapz(df['Pan'], df['Wavelength'])
    plt.figure(num=None, figsize=(6, 4), dpi=80, facecolor='w', edgecolor='k')
    plt.plot(df[1:], blue, label='Blue', color='darkblue');
    plt.plot(df[1:], green, label='Green', color='b');
    plt.plot(df[1:], red, label='Red', color='g');
    plt.plot(df[1:], nir, label='NIR', color='r');
    plt.plot(df[1:], pan, label='Pan', color='darkred')
    plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', borderaxespad=0.)
    plt.xlabel('Wavelength (nm)')
    plt.ylabel('Spectral Response (%)')
    plt.title(f'Integrated Spectral Response of {sheet} Bands')
    plt.show()
compute_integrated_spectral_response_ikonos('Sorted Wavelengths.xlsx', 'IKONOS')
Here is my dataset.
This is because plt.plot(df[1:], ...) uses the entire dataframe as the x values, so each call draws one line per column of the dataframe instead of a single line.
>>> df[1:]
Wavelength Blue Green Red NIR Pan
1 355 0.001463 0.000800 0.000504 0.000532 0.000619
2 360 0.000866 0.000729 0.000391 0.000674 0.000361
3 365 0.000731 0.000806 0.000597 0.000847 0.000244
4 370 0.000717 0.000577 0.000328 0.000729 0.000435
5 375 0.001251 0.000842 0.000847 0.000906 0.000914
.. ... ... ... ... ... ...
133 1015 0.002601 0.002100 0.001752 0.002007 0.149330
134 1020 0.001602 0.002040 0.002341 0.001793 0.136372
135 1025 0.001946 0.002218 0.001260 0.002754 0.118682
136 1030 0.002417 0.001376 0.000898 0.000000 0.103634
137 1035 0.001300 0.001602 0.000000 0.000000 0.089097
[137 rows x 6 columns]
The slice [1:] just gives the dataframe without the first row. Altering each instance of df[1:] to df['Wavelength'][1:] gives us what I presume is the expected output:
>>> df['Wavelength'][1:]
1 355
2 360
3 365
4 370
5 375
133 1015
134 1020
135 1025
136 1030
137 1035
Name: Wavelength, Length: 137, dtype: int64
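A minimal sketch of the corrected call, using a small synthetic table in place of the spreadsheet (the values are made up for illustration) and only one band. It uses `integrate.cumulative_trapezoid`, the newer SciPy name for `cumtrapz`; the result has one element fewer than its input, which is why the x values are sliced with `[1:]`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy import integrate

# Synthetic stand-in for the 'IKONOS' sheet (illustrative values only)
wavelength = np.arange(350, 400, 5)
df = pd.DataFrame({
    "Wavelength": wavelength,
    "Blue": np.linspace(0.001, 0.002, len(wavelength)),
})

# Cumulative integral of the band over wavelength (length n - 1)
blue = integrate.cumulative_trapezoid(df["Blue"], df["Wavelength"])

# Use only the Wavelength column as x, skipping the first row to match lengths
x = df["Wavelength"][1:]

fig, ax = plt.subplots(figsize=(6, 4))
lines = ax.plot(x, blue, label="Blue", color="darkblue")  # exactly one line
ax.set_xlabel("Wavelength (nm)")
ax.legend()
```

With `df[1:]` as x, the same call would have returned one line per dataframe column.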
Output:
I have been working on pie charts to display data year by year. After trying for quite a while, I managed to get it working for a slice of the rows:
df = pd.DataFrame(dict( Year = dates[:3],
robbery = robbery[:3],
fraud = fraud[:3],
sexual = sexual[:3]
))
fig, axes = plt.subplots(1,3, figsize=(12,8))
for ax, idx in zip(axes, df.index):
    ax.pie(df.loc[idx],explode=explode,shadow=True, labels=df.columns, autopct='%.2f%%')
    ax.set(ylabel='', title=idx, aspect='equal')
axes[0].legend(bbox_to_anchor=(0, 0.5))
plt.show()
I have also checked this link on displaying pie charts, but it builds the charts from a numpy array.
In my scenario, I am stuck on displaying all of the data as year-wise pie charts at once. Here is my code:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
df = pd.DataFrame(dict(
robbery = robbery,
fraud = fraud,
assualt = sexual
), index=dates)
print(df)
plt.style.use('ggplot')
colors = plt.rcParams['axes.color_cycle']
fig, axes = plt.subplots(nrows=2, ncols=2)
for ax, col in zip(axes.flat, df.columns):
    ax.pie(df[col], labels=df.index, autopct='%.2f', colors=colors)
    ax.set(ylabel='', title=col, aspect='equal')
axes[0, 0].legend(bbox_to_anchor=(0, 0.5))
fig.savefig('your_file.png') # Or whichever format you'd like
plt.show()
DataFrame:
assualt fraud robbery
1997-1998 2988 11897 1212
1998-1999 6033 27660 2482
1999-2000 5924 28421 2418
2000-2001 5631 29539 2298
2001-2002 5875 30295 2481
2002-2003 7434 27141 1940
2003-2004 5673 27986 2053
2004-2005 5695 30070 1879
2005-2006 6099 26031 1903
2006-2007 7038 25845 1889
2007-2008 6671 21009 1736
2008-2009 6046 17768 1791
2009-2010 5496 18974 1934
2010-2011 5666 18458 1726
2011-2012 4933 14157 1748
2012-2013 4972 16849 1962
2013-2014 5328 18819 1762
2014-2015 5909 21915 1341
2015-2016 6067 21891 1354
2016-2017 6448 27390 1608
2017-2018 6355 25438 1822
pie chart looks like this:
I generated an example 9 x 3 dataframe and a 3 x 3 grid of subplots, then populated the pie charts one row at a time.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(low=0, high=10, size=(9, 3)),
columns=['a', 'b', 'c'])
fig, axes = plt.subplots(3,3, figsize=(12,8))
for i in range(int(len(df.index)/3)):
    for j in range(int(len(df.index)/3)):
        idx = i * 3 + j
        ax = axes[i][j]
        ax.pie(df.loc[idx],shadow=True, labels=df.columns, autopct='%.2f%%')
        ax.set(ylabel='', title=idx, aspect='equal')
axes[0][0].legend(bbox_to_anchor=(0, 0.5))
plt.show()
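The nested loop above relies on the number of rows being a perfect 3 x 3 fit. A rough sketch of one way to handle an arbitrary number of years, using the first four rows of the dataframe from the question: compute the grid size from the row count, draw one pie per row, and switch off any leftover axes. The choice of three columns per row is arbitrary.

```python
import math

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# First four years copied from the question's dataframe
df = pd.DataFrame(
    {"assault": [2988, 6033, 5924, 5631],
     "fraud": [11897, 27660, 28421, 29539],
     "robbery": [1212, 2482, 2418, 2298]},
    index=["1997-1998", "1998-1999", "1999-2000", "2000-2001"],
)

ncols = 3                            # arbitrary layout choice
nrows = math.ceil(len(df) / ncols)   # enough rows to fit every year
fig, axes = plt.subplots(nrows, ncols, figsize=(12, 4 * nrows), squeeze=False)
flat_axes = axes.ravel()

# One pie per year: the slices are the crime categories of that row
for ax, (year, row) in zip(flat_axes, df.iterrows()):
    ax.pie(row, labels=row.index, autopct='%.2f%%')
    ax.set(title=year, aspect='equal')

# Hide the axes that did not receive a pie
for ax in flat_axes[len(df):]:
    ax.axis('off')
```

For the full 21-year dataframe this gives a 7 x 3 grid, so a larger `figsize` or more columns may be needed for readability.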
I tried to plot a pandas dataframe with a sequential colormap. This is my data, and I want to draw it as a heatmap.
A G C T -
A - 5823 1997 1248 962
G 9577 - 2683 2492 788
C 2404 2574 - 9569 722
T 1272 1822 5931 - 767
- 795 583 599 559 -
df = pd.DataFrame(index= ["A", "G", "C", "T", "-"], columns=["A", "G", "C", "T", "-"])
import matplotlib.pyplot as plt
import numpy as np
column_labels = list("AGCT-")
row_labels = list("AGCT-")
data = df
fig, ax = plt.subplots()
heatmap = ax.pcolor(data, cmap=plt.cm.Blues)
ax.set_xticks(np.arange(data.shape[0])+0.5, minor=False)
ax.set_yticks(np.arange(data.shape[1])+0.5, minor=False)
ax.invert_yaxis()
ax.xaxis.tick_top()
ax.set_xticklabels(row_labels, minor=False)
ax.set_yticklabels(column_labels, minor=False)
plt.show()
But it keeps giving an error.
File "//anaconda/lib/python2.7/site-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 685, in runfile
execfile(filename, namespace)
File "//anaconda/lib/python2.7/site-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 78, in execfile
builtins.execfile(filename, *where)
File "/Users/macbookpro/Desktop/mutations/first.py", line 115, in <module>
heatmap = ax.pcolor(data, cmap=plt.cm.Blues)
File "//anaconda/lib/python2.7/site-packages/matplotlib/axes/_axes.py", line 4967, in pcolor
collection.autoscale_None()
File "//anaconda/lib/python2.7/site-packages/matplotlib/cm.py", line 335, in autoscale_None
self.norm.autoscale_None(self._A)
File "//anaconda/lib/python2.7/site-packages/matplotlib/colors.py", line 956, in autoscale_None
self.vmax = ma.max(A)
File "//anaconda/lib/python2.7/site-packages/numpy/ma/core.py", line 6036, in max
return asanyarray(obj).max(axis=axis, fill_value=fill_value, out=out)
File "//anaconda/lib/python2.7/site-packages/numpy/ma/core.py", line 5280, in max
result = self.filled(fill_value).max(axis=axis, out=out).view(type(self))
AttributeError: 'str' object has no attribute 'view'
The problem is that the '-' characters in your dataframe are causing the values to be stored as strings, rather than integers.
You can convert your data frame to integers like this (the first part replaces '-' with 0, and the second part changes the data type to int):
df = df.where(df != '-', 0).astype(int)
df
A G C T -
A 0 5823 1997 1248 962
G 9577 0 2683 2492 788
C 2404 2574 0 9569 722
T 1272 1822 5931 0 767
- 795 583 599 559 0
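As an alternative sketch (not the conversion shown above), the '-' cells can be turned into NaN instead of 0, so that the diagonal is left blank rather than coloured as the lowest value. This assumes the dataframe holds the table from the question, with '-' stored as strings.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

labels = list("AGCT-")
# Table copied from the question; '-' marks the missing diagonal
raw = [
    ["-", 5823, 1997, 1248, 962],
    [9577, "-", 2683, 2492, 788],
    [2404, 2574, "-", 9569, 722],
    [1272, 1822, 5931, "-", 767],
    [795, 583, 599, 559, "-"],
]
df = pd.DataFrame(raw, index=labels, columns=labels)

# errors="coerce" turns anything non-numeric ('-') into NaN
numeric = df.apply(pd.to_numeric, errors="coerce")

fig, ax = plt.subplots()
# Mask the NaN cells so pcolor leaves them uncoloured
heatmap = ax.pcolor(np.ma.masked_invalid(numeric.to_numpy()), cmap="Blues")
fig.colorbar(heatmap, ax=ax)

# Centre the tick labels on the cells and put the first row at the top
ax.set_xticks(np.arange(len(labels)) + 0.5)
ax.set_yticks(np.arange(len(labels)) + 0.5)
ax.set_xticklabels(labels)
ax.set_yticklabels(labels)
ax.invert_yaxis()
ax.xaxis.tick_top()
```

Whether 0 or a blank cell is the better representation depends on what '-' means in the data.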