[Python3] How to use Seaborn/Matplotlib to graph a pandas DataFrame

I'm still having trouble doing this.
Here is what my data looks like:
date positive negative neutral
0 2015-09 23 6 18
1 2016-04 709 288 704
2 2016-08 1478 692 1750
3 2016-09 1881 926 2234
4 2016-10 3196 1594 3956
My CSV file doesn't contain the 0-4 index shown above, only the four columns from 'date' to 'neutral'.
I don't know how to fix my code to make the plot look like this:
Seaborn code:
sns.set(style='darkgrid', context='talk', palette='Dark2')
fig, ax = plt.subplots(figsize=(8, 8))
sns.barplot(x=df['positive'], y=df['negative'], ax=ax)
ax.set_xticklabels(['Negative', 'Neutral', 'Positive'])
ax.set_ylabel("Percentage")
plt.show()

To do this in seaborn you'll need to transform your data into long format. You can easily do this via melt:
plotting_df = df.melt(id_vars="date", var_name="sign", value_name="percentage")
print(plotting_df.head())
date sign percentage
0 2015-09 positive 23
1 2016-04 positive 709
2 2016-08 positive 1478
3 2016-09 positive 1881
4 2016-10 positive 3196
Then you can plot this long-format dataframe with seaborn in a straightforward manner:
sns.set(style='darkgrid', context='talk', palette='Dark2')
fig, ax = plt.subplots(figsize=(8, 8))
sns.barplot(x="date", y="percentage", ax=ax, hue="sign", data=plotting_df)
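The posted values look like counts rather than percentages, while the y axis is meant to read "Percentage". As a minimal sketch, assuming the goal is each sentiment's share per date, the counts can be normalized row by row before melting. The names `value_cols`, `pct`, and `long_pct` are illustrative and not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "date": ["2015-09", "2016-04", "2016-08", "2016-09", "2016-10"],
    "positive": [23, 709, 1478, 1881, 3196],
    "negative": [6, 288, 692, 926, 1594],
    "neutral": [18, 704, 1750, 2234, 3956],
})

# Wide -> long, exactly as in the answer: one row per (date, sign) pair
plotting_df = df.melt(id_vars="date", var_name="sign", value_name="percentage")

# Assumption: percentages are wanted, so divide each row by its own total
value_cols = ["positive", "negative", "neutral"]
pct = df[value_cols].div(df[value_cols].sum(axis=1), axis=0) * 100
pct.insert(0, "date", df["date"])

# Long format of the normalized table, ready for sns.barplot(..., hue="sign")
long_pct = pct.melt(id_vars="date", var_name="sign", value_name="percentage")
print(long_pct.head())
```

Passing `long_pct` instead of `plotting_df` as `data` to `sns.barplot` should then give bars that sum to 100 for each date.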

Based on the data you posted, you can also plot the wide dataframe directly with pandas:
sns.set(style='darkgrid', context='talk', palette='Dark2')
# fig, ax = plt.subplots(figsize=(8, 8))
df.plot(x="date",y=["positive","neutral","negative"],kind="bar")
plt.xticks(rotation=0)  # keep the date labels horizontal
# ax.set_xticklabels(['Negative', 'Neutral', 'Positive'])
# ax.set_ylabel("Percentage")
plt.show()

Related

Plot curves in Python matplotlib without ignoring the first and last NaN values

I have a CSV file like the table below:
depth x1 x2 x3
1000 15 Nan Nan
1001 10 Nan Nan
1002 5 Nan Nan
1003 8 10 Nan
1004 12 11.11111111 Nan
1010 13 17.77777778 14.16666667
1011 14 18.88888889 15
1012 15 20 15.71428571
1013 16 20.55555556 16.42857143
1014 17 21.11111111 17.14285714
1017 20 22.77777778 19.28571429
1018 21 23.33333333 20
1019 22 23.88888889 20.83333333
1024 27 17.5 25
1025 28 15 25
1026 25 Nan Nan
1027 26 Nan Nan
1028 7 Nan Nan
I want to plot the x1, x2, and x3 columns against the depth column, but these columns sometimes contain NaN values at the start and end. I want every curve to span the full depth range, including the rows where the leading and trailing values are NaN.
The code below is my attempt, but each curve always starts and ends at its first and last valid values, so the depth range covered by the leading and trailing NaN rows is dropped.
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
df = pd.read_csv("result.csv")
fig = plt.figure(figsize=(15, 12), dpi=100, tight_layout=True)
gs = gridspec.GridSpec(nrows=1, ncols=5, wspace=0)
fig.add_subplot(gs[0, 1])
plt.plot(df['x1'],df["depth"], linewidth=2, color='black', marker="o", markersize=3)
plt.gca().invert_yaxis()
fig.add_subplot(gs[0,2 ])
plt.plot(df["x2"],df["depth"], linewidth =2, color='black', marker="o", markersize=3)
plt.gca().invert_yaxis()
fig.add_subplot(gs[0,3])
plt.plot(df["x3"],df["depth"], linewidth =2, color='black', marker="o", markersize=3)
plt.gca().invert_yaxis()
plt.show()
The current result:
The desired result is shown in the image below, where the y axis of every curve starts from the same depth:
You need to share the y axis of the first subplot with the other subplots:
fig, axs = plt.subplots(1, 3, figsize=(15, 12), dpi=100, tight_layout=True, gridspec_kw={'wspace': 0})
axs[0].plot(df.x1, df.depth, '-ok', lw=2, ms=3)
axs[1].plot(df.x2, df.depth, '-ok', lw=2, ms=3)
axs[1].sharey(axs[0])
axs[2].plot(df.x3, df.depth, '-ok', lw=2, ms=3)
axs[2].sharey(axs[0])
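As a rough, self-contained check of the idea above, the sketch below uses a tiny made-up dataframe (not the asker's file) and the `sharey=True` argument of `plt.subplots`, which is an alternative spelling of calling `sharey` on each axes. The `Agg` backend line is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to open a window
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical miniature of the question's data: x2 has NaN at both ends
df = pd.DataFrame({
    "depth": [1000, 1001, 1002, 1003, 1004, 1005],
    "x1": [15, 10, 5, 8, 12, 13],
    "x2": [np.nan, np.nan, 10, 11, np.nan, np.nan],
})

# sharey=True links the y limits of both panels
fig, axs = plt.subplots(1, 2, sharey=True)
axs[0].plot(df["x1"], df["depth"], "-ok", lw=2, ms=3)
axs[1].plot(df["x2"], df["depth"], "-ok", lw=2, ms=3)
axs[0].invert_yaxis()  # the axis is shared, so both panels flip together

ylim_left = axs[0].get_ylim()
ylim_right = axs[1].get_ylim()
print(ylim_left, ylim_right)
```

Both panels should report the same depth range even though `x2` only has valid points in the middle rows.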

Is it possible to plot a barchart with upper and lower limits of the bins with Pandas,seaborn or Matplotlib

I would like to know how to plot a bar chart whose bins have the upper and lower limits given by the values in the age_classes column of the dataframe shown below, using pandas, seaborn or matplotlib. A sample of the dataframe looks like this:
age_classes total_cases male_cases female_cases
0 0-9 693 381 307
1 10-19 931 475 454
2 20-29 4530 1919 2531
3 30-39 7466 3505 3885
4 40-49 13701 6480 7130
5 50-59 20975 11149 9706
6 60-69 18089 11761 6254
7 70-79 19238 12281 6868
8 80-89 16252 8553 7644
9 >90 4356 1374 2973
10 Unknown 168 84 81
If you want a chart like this:
then you can make it with sns.barplot, setting age_classes as x and one of the columns (in my case total_cases) as y, as in this code:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv('data.csv')
fig, ax = plt.subplots()
sns.barplot(ax = ax,
            data = df,
            x = 'age_classes',
            y = 'total_cases')
plt.show()
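The answer above treats each age class as a plain category. If the bars should instead span the numeric lower and upper limits of each bin, one possible approach (an assumption about what the question is after, not part of the original answer) is to parse the "lower-upper" labels and draw the bars with matplotlib's `ax.bar` using `align="edge"`. Labels that do not match the pattern, such as ">90" and "Unknown", are simply left out of this sketch. The names `bounds` and `binned` are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to open a window
import pandas as pd
import matplotlib.pyplot as plt

# A few rows from the question's sample
df = pd.DataFrame({
    "age_classes": ["0-9", "10-19", "20-29", ">90", "Unknown"],
    "total_cases": [693, 931, 4530, 4356, 168],
})

# Extract "lower-upper" from labels like "10-19"; non-matching labels become NaN
bounds = df["age_classes"].str.extract(r"^(\d+)-(\d+)$").astype(float)
bounds.columns = ["lower", "upper"]
binned = df.join(bounds).dropna(subset=["lower", "upper"])

# Each bar starts at the lower limit and covers the whole class (e.g. 10 to 20)
widths = (binned["upper"] - binned["lower"] + 1).to_numpy()
fig, ax = plt.subplots()
ax.bar(binned["lower"].to_numpy(), binned["total_cases"].to_numpy(),
       width=widths, align="edge", edgecolor="white")
ax.set_xticks(sorted(set(binned["lower"]) | set(binned["upper"] + 1)))
ax.set_xlabel("Age")
ax.set_ylabel("Total cases")
```

The open-ended and unknown classes would need their own handling, for example a separate categorical bar.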

Expand x axis when x is a string (make xlim wider)

I have the following pandas data frame:
print(so)
Time Minions Crime_rate
0 2018-01 1907 0.147352
1 2018-02 2094 0.165234
2 2018-03 2227 0.148181
3 2018-04 2101 0.135174
4 2018-05 2321 0.132271
5 2018-06 2208 0.128623
6 2018-07 2593 0.140378
7 2018-08 2660 0.145865
8 2018-09 2488 0.149920
9 2018-10 2640 0.152273
10 2018-11 2501 0.138345
11 2018-12 2379 0.134931
I want to plot Time on the x axis, Minions on the y axis and Crime_rate on a secondary y axis. The problem is that the x axis is cropped and I want to expand it. I tried the following code:
so.plot(x="Time", y="Minions", kind="bar", color="orange", legend=False)
plt.ylabel("Number of Minions")
so["Crime_rate"].plot(secondary_y=True, rot=90)
plt.ylabel("Minion crime rate")
plt.ylim(0, 1)
# plt.xlim(min, max)
plt.show()
The code returns the following plot:
I had done this before using plt.xlim(), but so["Time"] is a string, so I cannot subtract or add to the limits. How can I expand the x axis limits to show the first and last bars?
I couldn't find a solution that involves keeping the x axis as a string. To solve this, I had to avoid setting the x axis and then overwriting its values using set_xticklabels().
fig, ax1 = plt.subplots()
ax1 = so["Minions"].plot(ax=ax1, kind="bar", color="orange", legend=False)
ax2 = ax1.twinx()
so["Crime_rate"].plot(ax=ax2, legend=False)
ax1.set_ylabel("Minions")
ax1.set_xlabel("Time")
ax2.set_ylabel("Minion crime rate")
ax2.set_xlim(-0.5, len(so) - 0.5) # extend the x axis by 0.5 to the left and 0.5 to the right
ax2.set_ylim(0, 1)
ax2.set_xticklabels(so["Time"])
plt.show()
This works because I never set the x axis in ax1, so it defaulted to the integer positions [0, 1, 2, ..., 10, 11]. That let me set the x axis range from -0.5 to 11.5.
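To make the integer-position argument concrete, here is a small sketch on the first four rows of the question's data. It reads the bar centers back from the axes and confirms that both twin axes end up with the widened limits. The variable `bar_positions` is introduced here purely for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to open a window
import pandas as pd
import matplotlib.pyplot as plt

# First four rows of the question's dataframe
so = pd.DataFrame({
    "Time": ["2018-01", "2018-02", "2018-03", "2018-04"],
    "Minions": [1907, 2094, 2227, 2101],
    "Crime_rate": [0.147352, 0.165234, 0.148181, 0.135174],
})

fig, ax1 = plt.subplots()
so["Minions"].plot(ax=ax1, kind="bar", color="orange")
ax2 = ax1.twinx()
so["Crime_rate"].plot(ax=ax2)

# Bars drawn from a default integer index are centered on 0, 1, 2, ...
bar_positions = [p.get_x() + p.get_width() / 2 for p in ax1.patches]

# Half a unit of padding on each side keeps the outer bars fully visible
ax2.set_xlim(-0.5, len(so) - 0.5)
ax2.set_xticks(range(len(so)))
ax2.set_xticklabels(so["Time"])
print(bar_positions, ax1.get_xlim())
```

Because `twinx` shares the x axis, setting the limits on `ax2` also applies to `ax1`.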

How to make a bar graph of 2 variables from the same DataFrame and choose 2 to 5 items to show

I have a DataFrame:
wilayah branch Income Januari 2018 Income Januari 2019 Income Febuari 2018 Income Febuari 2019 Income Jan-Feb 2018 Income Jan-Feb 2019
1 sunarto 1000 1500 2000 3000 3333 4431
1 pemabuk 500 700 3000 3000 4333 5431
1 pemalas 2000 2200 4000 3000 5333 6431
1 hasuntato 9000 1200 6000 3000 2222 2121
1 sibodoh 1000 1500 3434 3000 2233 2121
...
I expect to create a bar graph where the x axis is every name in branch (e.g. sunarto, pemabuk, pemalas, etc.) and the y axis is income.
Let's say I compare sunarto's Income Januari 2018 and Income Januari 2019, pemabuk's Income Januari 2018 and Income Januari 2019, and so on (one name on the x axis, with two values compared side by side). Then I want to sort the bars from high to low by Income Jan-Feb 2019.
I tried:
import matplotlib.pyplot as plt
import pandas as pd
fig, ax = plt.subplots()
ax = df1[["Sunarto","Income Januari 2018", "Income Januari 2019"]].plot(x='branch', kind='bar', color=["g","b"],rot=45)
plt.show()
Consider a groupby aggregation followed by DataFrame.plot. The code below lines up all branches on the x axis, with the different income columns as color-coded keys in the legend.
agg_df = df.groupby('branch').sum()
fig, ax = plt.subplots(figsize=(15,5))
agg_df.plot(kind='bar', edgecolor='w', ax=ax, rot=22, width=0.5, fontsize = 15)
# ADD TITLES AND LABELS
plt.title('Income by Branches, Jan/Feb 2018-2019', weight='bold', size=24)
plt.xlabel('Branch', weight='bold', size=24)
plt.ylabel('Income', weight='bold', size=20)
plt.tight_layout()
plt.show()
plt.clf()
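The aggregation above plots every income column. For the narrower case described in the question (two columns side by side, sorted by Income Jan-Feb 2019, limited to a handful of branches), a minimal sketch might look like the following. The names `compare_cols`, `top_n`, and `ranked` are hypothetical, and only some of the question's columns are reproduced.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to open a window
import pandas as pd
import matplotlib.pyplot as plt

# Subset of the question's sample data
df = pd.DataFrame({
    "wilayah": [1, 1, 1, 1, 1],
    "branch": ["sunarto", "pemabuk", "pemalas", "hasuntato", "sibodoh"],
    "Income Januari 2018": [1000, 500, 2000, 9000, 1000],
    "Income Januari 2019": [1500, 700, 2200, 1200, 1500],
    "Income Jan-Feb 2019": [4431, 5431, 6431, 2121, 2121],
})

compare_cols = ["Income Januari 2018", "Income Januari 2019"]  # the two bars per branch
top_n = 3  # hypothetical: how many branches to keep (2 to 5 in the question)

# Aggregate per branch, sort high to low by the Jan-Feb 2019 total, keep the top rows
ranked = (df.groupby("branch").sum(numeric_only=True)
            .sort_values("Income Jan-Feb 2019", ascending=False)
            .head(top_n))

fig, ax = plt.subplots(figsize=(8, 4))
ranked[compare_cols].plot(kind="bar", ax=ax, rot=45)
ax.set_xlabel("Branch")
ax.set_ylabel("Income")
fig.tight_layout()
print(ranked.index.tolist())
```

Selecting `compare_cols` only at plot time keeps the sort column available for ranking without drawing it.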
Should you want a separate plot for each branch, iterate over the groupby object:
dfs = df.groupby('branch')
for i, g in dfs:
    # order the income columns by value for this branch
    ord_cols = (pd.melt(g.drop(columns="wilayah"), id_vars="branch")
                  .sort_values("value")["variable"].values
               )
    fig, ax = plt.subplots(figsize=(8,4))
    (g.reindex(columns=ord_cols)
      .plot(kind='bar', edgecolor='w', ax=ax, rot=0, width=0.5, fontsize=15)
    )
    # ADD TITLES AND LABELS
    plt.title('Income by {} Branch, Jan/Feb 2018-2019'.format(i),
              weight='bold', size=16)
    plt.xlabel('Branch', weight='bold', size=16)
    plt.ylabel('Income', weight='bold', size=14)
    plt.tight_layout()
    plt.show()

X-Axis scales not matching with 2 data sets on same plot

I have 2 datasets that I'm trying to plot on the same figure. They share a common column that I'm using for the X-axis, however one of my sets of data is collected annually and the other monthly so the number of data points in each set is significantly different.
When I plot both sets on the same graph, pyplot does not place the X values of each set where I would expect.
When I plot just my annually collected data set I get:
When I plot just my monthly collected data set I get:
But when I plot the two sets overlaid (code below) I get:
tframe:
10003 Date
0 257 201401
1 216 201402
2 417 201403
3 568 201404
4 768 201405
5 836 201406
6 798 201407
7 809 201408
8 839 201409
9 796 201410
tax_for_zip_data:
TAX BRACKET $1 under $25,000 ... Date
2 5740 ... 201301
0 5380 ... 201401
1 5320 ... 201501
3 5030 ... 201601
So I did as wwii suggested in the comments and converted my Date columns to datetime objects:
tframe:
10003 Date
0 257 2014-01-31
1 216 2014-02-28
2 417 2014-03-31
3 568 2014-04-30
4 768 2014-05-31
5 836 2014-06-30
6 798 2014-07-31
7 809 2014-08-31
8 839 2014-09-30
9 796 2014-10-31
tax_for_zip_data:
TAX BRACKET $1 under $25,000 ... Date
2 5740 ... 2013-01-31
0 5380 ... 2014-01-31
1 5320 ... 2015-01-31
3 5030 ... 2016-01-31
But the dates are still plotted with an offset.
None of my data goes back to 2012; January 2013 is the earliest. The tax_for_zip_data points are all offset by a year. If I plot just that set alone, it plots properly.
fig, ax1 = plt.subplots(sharex = True)
color = "tab:red"
ax1.set_xlabel('Date')
ax1.set_ylabel('Trips', color = color)
tframe.plot(kind = 'line',x = 'Date', y = "10003", ax = ax1, color = color)
ax1.tick_params(axis = 'y', labelcolor = color)
ax2 = ax1.twinx()
color = "tab:blue"
ax2.set_ylabel('Num Returns', color = color)
tax_for_zip_data.plot(kind = 'line', x = 'Date', y = tax_for_zip_data.columns[:-1], ax = ax2)
ax2.tick_params(axis = 'y', labelcolor = color)
plt.show()
If you can make the DataFrame index a datetime index, plotting is easier.
import io
import pandas as pd
# columns in the sample text are separated by two or more spaces
s = '''10003  Date
257  201401
216  201402
417  201403
568  201404
768  201405
836  201406
798  201407
809  201408
839  201409
796  201410
'''
df1 = pd.read_csv(io.StringIO(s), delimiter=r'\s{2,}', engine='python')
df1.index = pd.to_datetime(df1['Date'], format='%Y%m')
s = '''TAX BRACKET  $1 under $25,000  Date
2  5740  201301
0  5380  201401
1  5320  201501
3  5030  201601
'''
df2 = pd.read_csv(io.StringIO(s), delimiter=r'\s{2,}', engine='python')
df2.index = pd.to_datetime(df2['Date'], format='%Y%m')
You don't need to specify an argument for plot's x parameter, because the datetime index is used for the x axis by default.
fig, ax1 = plt.subplots(sharex = True)
color = "tab:red"
ax1.set_xlabel('Date')
ax1.set_ylabel('Trips', color = color)
df1.plot(kind = 'line',y="10003", ax = ax1, color = color)
ax1.tick_params(axis = 'y', labelcolor = color)
ax2 = ax1.twinx()
color = "tab:blue"
ax2.set_ylabel('Num Returns', color = color)
df2.plot(kind = 'line', y='$1 under $25,000', ax = ax2, color = color)
ax2.tick_params(axis = 'y', labelcolor = color)
plt.show()
plt.close()
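As a compact variant of the same idea, the sketch below converts YYYYMM integers to a datetime index and then draws both series with matplotlib's own `ax.plot` on twin axes, so that both lines are placed by their actual dates. This is an alternative to the pandas `.plot` calls above rather than a description of them, and the column names `trips` and `returns` are made up for the example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to open a window
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical miniatures of the monthly and annual tables
monthly = pd.DataFrame({"Date": [201401, 201402, 201403], "trips": [257, 216, 417]})
annual = pd.DataFrame({"Date": [201301, 201401, 201501], "returns": [5740, 5380, 5320]})

# Turn YYYYMM integers into timestamps and use them as the index
for frame in (monthly, annual):
    frame.index = pd.to_datetime(frame["Date"].astype(str), format="%Y%m")

fig, ax1 = plt.subplots()
ax1.plot(monthly.index, monthly["trips"], color="tab:red")
ax1.set_ylabel("Trips", color="tab:red")

ax2 = ax1.twinx()  # second y axis on the same date axis
ax2.plot(annual.index, annual["returns"], color="tab:blue")
ax2.set_ylabel("Num Returns", color="tab:blue")
fig.autofmt_xdate()
print(monthly.index[0], annual.index.min())
```

With a shared date axis, the January 2014 points of both series should line up at the same x position.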
