How can I plot graph in different position in pandas? - python

I have four datasets like df1 below, and I want to draw them as scatter plots in a 2x2 grid.
df1
Height time_of_day resolution clusters
11 3.146094 0.458333 0.594089 0
90 0.191690 0.541667 0.594089 0
99 1.300386 1.666667 0.594089 1
121 3.054903 2.083333 0.594089 0
df2
Height time_of_day resolution clusters
10 3.146094 0.458333 0.594089 0
60 3.191690 0.541667 0.594089 0
87 1.300386 1.666667 0.594089 1
121 3.054903 1.083333 0.594089 0
df3
Height time_of_day resolution clusters
13 3.146094 0.458333 0.594089 0
61 3.191690 0.541667 0.594089 0
86 1.300386 1.666667 0.594089 1
113 4.054903 1.083333 0.594089 0
df4
Height time_of_day resolution clusters
10 3.146094 0.458333 0.594089 0
20 3.191690 0.541667 0.594089 0
37 1.300386 1.666667 0.594089 1
121 3.054903 1.083333 0.594089 0
I have tried several methods, and none of them worked.
dics = [df1,df2,df3,df4]
rows = range(4)
fig, ax = plt.subplots(2,2,squeeze=False,figsize = (20,10))
for x in rows:
    for i,dic in enumerate(dics):
        sns.lmplot(x="time_of_day", y="Height",fit_reg=False,hue="clusters", data=dic[x], height=6, aspect=1.5)
plt.show()
And this is the code for a single scatter plot:
sns.lmplot(x="time_of_day", y="Height",fit_reg=False,hue="clusters", data=summer_spike_df, height=6, aspect=1.5)
What should I change in order to draw the four different scatter plots in a 2x2 grid?
Thank you

If you're not plotting the regression line, then why not just use seaborn.scatterplot?
You can use zip together with the axes array's ravel method to draw each DataFrame on its own subplot:
fig, axes = plt.subplots(2,2,squeeze=False,figsize = (20,10))
for df, ax in zip(dics, axes.ravel()):
    sns.scatterplot(x="time_of_day", y="Height",hue="clusters", data=df, ax=ax)
plt.show()
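For context on why the loop in the question opens separate figures: sns.lmplot is a figure-level function that creates its own figure and has no ax parameter, so it cannot draw into the axes returned by plt.subplots. If lmplot is still preferred, one possible sketch is to stack the four frames with a label column and let lmplot lay out the grid itself via col and col_wrap. The column name dataset and the labels df1 to df4 are assumptions made for illustration; the values are copied from the question and the original row index is dropped.

```python
import pandas as pd
import seaborn as sns

cols = ["Height", "time_of_day", "resolution", "clusters"]

# Data copied from the question (original row labels are not needed for plotting).
frames = {
    "df1": pd.DataFrame([[3.146094, 0.458333, 0.594089, 0],
                         [0.191690, 0.541667, 0.594089, 0],
                         [1.300386, 1.666667, 0.594089, 1],
                         [3.054903, 2.083333, 0.594089, 0]], columns=cols),
    "df2": pd.DataFrame([[3.146094, 0.458333, 0.594089, 0],
                         [3.191690, 0.541667, 0.594089, 0],
                         [1.300386, 1.666667, 0.594089, 1],
                         [3.054903, 1.083333, 0.594089, 0]], columns=cols),
    "df3": pd.DataFrame([[3.146094, 0.458333, 0.594089, 0],
                         [3.191690, 0.541667, 0.594089, 0],
                         [1.300386, 1.666667, 0.594089, 1],
                         [4.054903, 1.083333, 0.594089, 0]], columns=cols),
    "df4": pd.DataFrame([[3.146094, 0.458333, 0.594089, 0],
                         [3.191690, 0.541667, 0.594089, 0],
                         [1.300386, 1.666667, 0.594089, 1],
                         [3.054903, 1.083333, 0.594089, 0]], columns=cols),
}

# Stack the frames and remember which frame each row came from
# ("dataset" is a hypothetical column name chosen for this sketch).
combined = pd.concat(
    [df.assign(dataset=name) for name, df in frames.items()],
    ignore_index=True,
)

# lmplot builds the figure itself: one facet per dataset, wrapped after 2 columns.
g = sns.lmplot(data=combined, x="time_of_day", y="Height", hue="clusters",
               col="dataset", col_wrap=2, fit_reg=False, height=4, aspect=1.5)
```

In an interactive session, calling matplotlib.pyplot.show() afterwards displays the grid. The scatterplot loop above remains the simpler choice when no regression line is needed.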

Related

How to add a box plot and a vertical line in a histogram diagram in python Plotly Express graph objects subplots

Below is the data that is used to create the histogram subplot charts with Plotly graph objects.
The code below is used to create the histogram subplot charts with Plotly graph objects.
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
specs = [[{'type':'histogram'}, {'type':'histogram'},{'type':'histogram'}]]
fig = make_subplots(rows=1, cols=3, specs=specs, subplot_titles=['<b> Millenials </b>',
'<b> Generation X </b>',
'<b> Boomers </b>'])
fig.add_trace(go.Histogram(
x=df[df['Generation']=='Millenials']['NumCompaniesWorked'],
opacity = 0.5,
marker_color = ['#455f66'] * 15
),1,1)
fig.add_trace(go.Histogram(
x=df[df['Generation']=='Generation X']['NumCompaniesWorked'],
opacity = 0.5,
marker_color = ['#455f66'] * 15
),1,2)
fig.add_trace(go.Histogram(
x=df[df['Generation']=='Boomers']['NumCompaniesWorked'],
opacity = 0.5,
marker_color = ['#455f66'] * 15
),1,3)
fig.update_layout(
showlegend=False,
title=dict(text="<b> Histogram - <br> <span style='color: #f55142'> How to add the box plot and mean vertical line on each diagram </span></b> ",
font=dict(
family="Arial",
size=20,
color='#283747')
))
fig.show()
Below is the output I get from the above code.
How can I add a vertical line marking the mean (average) to each histogram? The mean values are:
Millenials = 2.2
Generation X = 3.4
Boomers = 4.1
I would also like a box plot above each of the three histograms.
The result should look like the diagram shown below for all three histograms.
import pandas as pd
import numpy as np
import plotly.express as px
#original df
df = pd.DataFrame({'NumCompaniesWorked':list(range(10)),
'Millenials':[139,407,54,57,55,32,35,28,17,24],
'Generation X':[53,108,83,90,70,27,32,40,26,24],
'Boomers':[5,6,9,12,14,4,3,6,6,4]})
#reorganizing df
dfs = []
for col in ['Millenials', 'Generation X', 'Boomers']:
    dfs.append(df[['NumCompaniesWorked', col]].rename(columns={col:'count'}).assign(Generation=col))
df = pd.concat(dfs)
#output
NumCompaniesWorked count Generation
0 0 139 Millenials
1 1 407 Millenials
2 2 54 Millenials
3 3 57 Millenials
4 4 55 Millenials
5 5 32 Millenials
6 6 35 Millenials
7 7 28 Millenials
8 8 17 Millenials
9 9 24 Millenials
0 0 53 Generation X
1 1 108 Generation X
2 2 83 Generation X
3 3 90 Generation X
4 4 70 Generation X
5 5 27 Generation X
6 6 32 Generation X
7 7 40 Generation X
8 8 26 Generation X
9 9 24 Generation X
0 0 5 Boomers
1 1 6 Boomers
2 2 9 Boomers
3 3 12 Boomers
4 4 14 Boomers
5 5 4 Boomers
6 6 3 Boomers
7 7 6 Boomers
8 8 6 Boomers
9 9 4 Boomers
fig = px.histogram(df,
x='NumCompaniesWorked',
y='count',
marginal='box',
facet_col='Generation')
fig.add_vline(x=2.2, line_width=1, line_dash='dash', line_color='gray', col=1)
fig.add_vline(x=3.4, line_width=1, line_dash='dash', line_color='gray', col=2)
fig.add_vline(x=4.1, line_width=1, line_dash='dash', line_color='gray', col=3)
fig.show()
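The three x positions above are typed in by hand. As a sketch under the assumption that the reconstructed counts in this answer are the real data, the means could instead be computed as count-weighted averages of NumCompaniesWorked. The names wide, long_df, sums and means are illustrative only.

```python
import pandas as pd

# Same counts as in the answer above.
wide = pd.DataFrame({
    "NumCompaniesWorked": list(range(10)),
    "Millenials": [139, 407, 54, 57, 55, 32, 35, 28, 17, 24],
    "Generation X": [53, 108, 83, 90, 70, 27, 32, 40, 26, 24],
    "Boomers": [5, 6, 9, 12, 14, 4, 3, 6, 6, 4],
})

# Long format: one row per (NumCompaniesWorked, Generation) pair.
long_df = wide.melt(id_vars="NumCompaniesWorked",
                    var_name="Generation", value_name="count")

# Weighted mean = sum(value * count) / sum(count), per generation.
# sort=False keeps the order in which the generations appear in the data.
sums = (long_df
        .assign(weighted=long_df["NumCompaniesWorked"] * long_df["count"])
        .groupby("Generation", sort=False)[["weighted", "count"]]
        .sum())
means = sums["weighted"] / sums["count"]

print(means.round(1))
```

Rounded to one decimal, these agree with the 2.2, 3.4 and 4.1 quoted in the question. Each value could then be passed as x to fig.add_vline together with the matching col number, assuming the facets appear in the same order as the generations in the data.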

Different binning for histplot as JointGrid (x,y) marginal plot

I have a pandas dataframe like this:
     Date        Weight  Year  Month  Day  Week  DayOfWeek
0    2017-11-13  76.1    2017  11     13   46    0
1    2017-11-14  76.2    2017  11     14   46    1
2    2017-11-15  76.6    2017  11     15   46    2
3    2017-11-16  77.1    2017  11     16   46    3
4    2017-11-17  76.7    2017  11     17   46    4
...  ...         ...     ...   ...    ...  ...   ...
I created a JointGrid with:
g = sns.JointGrid(data=df,
x="Date",
y="Weight",
marginal_ticks=True,
height=6,
ratio=2,
space=.05)
Then I defined the joint and marginal plots:
g.plot_joint(sns.scatterplot,
hue=df["Year"],
alpha=.4,
legend=True)
g.plot_marginals(sns.histplot,
multiple="stack",
bins=20,
hue=df["Year"])
Result is this.
Now the question is: "is it possible to specify different binning for the two histplot resulting in the x and y marginal plot?"
I don't think there is a built-in way to do that, but you can plot directly on the marginal axes using the plotting function of your choice, like so:
import seaborn as sns
penguins = sns.load_dataset('penguins')
data = penguins
x_col = "bill_length_mm"
y_col = "bill_depth_mm"
hue_col = "species"
g = sns.JointGrid(data=data, x=x_col, y=y_col, hue=hue_col)
g.plot_joint(sns.scatterplot)
# top marginal
sns.histplot(data=data, x=x_col, hue=hue_col, bins=5, ax=g.ax_marg_x, legend=False, multiple='stack')
# right marginal
sns.histplot(data=data, y=y_col, hue=hue_col, bins=40, ax=g.ax_marg_y, legend=False, multiple='stack')

How to plot Numerical Values in matplotlib

I have a DataFrame like this:
Time Type Profit
2 82 s/l -51.3
5 9 t/p 164.32
8 38 s/l -53.19
11 82 s/l -54.4
14 107 s/l -54.53
.. ... ... ...
730 111 s/l -70.72
731 111 s/l -70.72
732 111 s/l -70.72
733 113 s/l -65.13
734 113 s/l -65.13
[239 rows x 3 columns]
I want to plot a chart with time on the X axis (already converted to hour of the week) and profit on the Y axis (which can be positive or negative). For each hour on the X axis, I would like two bars: one for the profit and one for the loss. The loss should be drawn as a positive height, but in its own bar.
For example, with -65 and 70, the bars would show as 65 and 70 on the chart, and the loss bar would have a different color.
This is my code so far:
#reading the csv file
data = pd.read_csv(filename)
df = pd.DataFrame(data, columns = ['Time','Type','Profit']).astype(str)
#turns time column into hours of week
df['Time'] = df['Time'].apply(lambda x: findHourOfWeek(x))
#Takes in winning trades (t/p) and losing trades(s/l)
df = df[(df['Type'] == 't/p') | (df['Type'] == 's/l')]
#Plots the chart
ax = df.plot(title='Profits and Losses (Hour Of Week)',kind='bar')
#ax.legend(['Losses', 'Winners'])
plt.xlabel('Hour of Week')
plt.ylabel('Amount Of Profit/Loss')
plt.show()
You can groupby, unstack and plot:
(df.groupby(['Time','Type']).Profit.sum().abs()
.unstack('Type')
.plot.bar()
)
For your sample data above, the output is:
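To see what the grouped table looks like before it is plotted, here is a minimal sketch on the five sample rows visible in the question. It assumes Profit arrives as strings, because the question's code applies .astype(str) to the whole frame, so the column is converted back to numbers first; without that step the sum and abs calls would not operate on numeric values.

```python
import pandas as pd

# First five rows from the question; Profit is a string column here
# to mimic the effect of .astype(str) in the question's code.
df = pd.DataFrame({
    "Time": [82, 9, 38, 82, 107],
    "Type": ["s/l", "t/p", "s/l", "s/l", "s/l"],
    "Profit": ["-51.3", "164.32", "-53.19", "-54.4", "-54.53"],
})

# Convert back to numbers so the values can be summed.
df["Profit"] = pd.to_numeric(df["Profit"])

# One row per hour, one column per trade type, losses shown as positive values.
table = (df.groupby(["Time", "Type"]).Profit.sum().abs()
           .unstack("Type"))

print(table)
```

Calling table.plot.bar() on this result draws one group of bars per hour, with one bar per column (s/l and t/p). Hours that have only one trade type hold NaN in the other column and therefore show a single bar.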

X-Axis scales not matching with 2 data sets on same plot

I have 2 datasets that I'm trying to plot on the same figure. They share a common column that I'm using for the X-axis, however one of my sets of data is collected annually and the other monthly so the number of data points in each set is significantly different.
When I plot both sets on the same graph, pyplot does not place the X values of each set where I would expect.
When I plot just my annually collected data set I get:
When I plot just my monthly collected data set I get:
But when I plot the two sets overlayed (code below) I get:
tframe:
10003 Date
0 257 201401
1 216 201402
2 417 201403
3 568 201404
4 768 201405
5 836 201406
6 798 201407
7 809 201408
8 839 201409
9 796 201410
tax_for_zip_data:
TAX BRACKET $1 under $25,000 ... Date
2 5740 ... 201301
0 5380 ... 201401
1 5320 ... 201501
3 5030 ... 201601
So I did as wwii suggested in the comments and converted my Date columns to datetime objects:
tframe:
10003 Date
0 257 2014-01-31
1 216 2014-02-28
2 417 2014-03-31
3 568 2014-04-30
4 768 2014-05-31
5 836 2014-06-30
6 798 2014-07-31
7 809 2014-08-31
8 839 2014-09-30
9 796 2014-10-31
tax_for_zip_data:
TAX BRACKET $1 under $25,000 ... Date
2 5740 ... 2013-01-31
0 5380 ... 2014-01-31
1 5320 ... 2015-01-31
3 5030 ... 2016-01-31
But the dates are still plotted with an offset.
None of my data goes back to 2012; January 2013 is the earliest. The tax_for_zip_data points are all offset by a year. If I plot just that set alone, it plots properly.
fig, ax1 = plt.subplots(sharex = True)
color = "tab:red"
ax1.set_xlabel('Date')
ax1.set_ylabel('Trips', color = color)
tframe.plot(kind = 'line',x = 'Date', y = "10003", ax = ax1, color = color)
ax1.tick_params(axis = 'y', labelcolor = color)
ax2 = ax1.twinx()
color = "tab:blue"
ax2.set_ylabel('Num Returns', color = color)
tax_for_zip_data.plot(kind = 'line', x = 'Date', y = tax_for_zip_data.columns[:-1], ax = ax2)
ax2.tick_params(axis = 'y', labelcolor = color)
plt.show()
If you can make the DataFrame index a datetime index, plotting is easier.
import io
import pandas as pd
import matplotlib.pyplot as plt
# columns are separated by two or more spaces, because one header contains single spaces
s = '''10003  Date
257  201401
216  201402
417  201403
568  201404
768  201405
836  201406
798  201407
809  201408
839  201409
796  201410
'''
df1 = pd.read_csv(io.StringIO(s), delimiter=r'\s{2,}',engine='python')
df1.index = pd.to_datetime(df1['Date'],format='%Y%m')
s = '''TAX BRACKET  $1 under $25,000  Date
2  5740  201301
0  5380  201401
1  5320  201501
3  5030  201601
'''
df2 = pd.read_csv(io.StringIO(s), delimiter=r'\s{2,}',engine='python')
df2.index = pd.to_datetime(df2['Date'],format='%Y%m')
You don't need to specify an argument for plot's x parameter.
fig, ax1 = plt.subplots(sharex = True)
color = "tab:red"
ax1.set_xlabel('Date')
ax1.set_ylabel('Trips', color = color)
df1.plot(kind = 'line',y="10003", ax = ax1, color = color)
ax1.tick_params(axis = 'y', labelcolor = color)
ax2 = ax1.twinx()
color = "tab:blue"
ax2.set_ylabel('Num Returns', color = color)
df2.plot(kind = 'line', y='$1 under $25,000', ax = ax2)
ax2.tick_params(axis = 'y', labelcolor = color)
plt.show()
plt.close()
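One detail worth noting: the format '%Y%m' used above produces first-of-month timestamps, whereas the converted tables in the question show month-end dates. The following is a small sketch of both variants on three values from tframe. Which convention is appropriate is an assumption left open here, and the names dates, month_start and month_end are illustrative.

```python
import pandas as pd

# A few YYYYMM values from tframe in the question.
dates = pd.Series([201401, 201402, 201403])

# Parsing year and month only yields the first day of each month.
month_start = pd.to_datetime(dates.astype(str), format="%Y%m")

# MonthEnd(0) rolls each date forward to the end of its own month.
month_end = month_start + pd.offsets.MonthEnd(0)

print(month_start.tolist())
print(month_end.tolist())
```

Whichever convention is chosen, using the same one for both DataFrames keeps the annual and monthly points aligned on the shared x-axis.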

Python scatter plot different colors depending on value

I have a dataframe which I want to make a scatter plot of.
The dataframe looks like:
year length Animation
0 1971 121 1
1 1939 71 1
2 1941 7 0
3 1996 70 1
4 1975 71 0
I want the points in my scatter plot to have a different color depending on the value in the Animation column, for example:
Animation = 1: yellow
Animation = 0: black
or something similar.
I tried doing the following:
dfScat = df[['year','length', 'Animation']]
dfScat = dfScat.loc[dfScat.length < 200]
axScat = dfScat.plot(kind='scatter', x=0, y=1, alpha=1/15, c=2)
This results in a continuous, slider-like color bar, which makes it hard to tell the two values apart.
You can also assign discrete colors to the points by passing an array to c, like this:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
d = {"year" : (1971, 1939, 1941, 1996, 1975),
"length" : ( 121, 71, 7, 70, 71),
"Animation" : ( 1, 1, 0, 1, 0)}
df = pd.DataFrame(d)
print(df)
colors = np.where(df["Animation"]==1,'y','k')
df.plot.scatter(x="year",y="length",c=colors)
plt.show()
This gives:
Animation length year
0 1 121 1971
1 1 71 1939
2 0 7 1941
3 1 70 1996
4 0 71 1975
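np.where covers the two-value case neatly. If there were more than two categories, one possible alternative is to map each value to a color with a dictionary. This is a sketch only; the palette dictionary and its color names are assumptions chosen to match the yellow and black requested in the question.

```python
import pandas as pd

df = pd.DataFrame({"year": [1971, 1939, 1941, 1996, 1975],
                   "length": [121, 71, 7, 70, 71],
                   "Animation": [1, 1, 0, 1, 0]})

# Hypothetical value-to-color mapping; extend it with more keys as needed.
palette = {1: "yellow", 0: "black"}

# One color name per row, in the same order as the rows of df.
colors = df["Animation"].map(palette)

print(colors.tolist())
# → ['yellow', 'yellow', 'black', 'yellow', 'black']
```

The resulting list of colors can then be passed to the c argument of df.plot.scatter in the same way as the np.where array above.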
Use the c parameter in scatter:
df.plot.scatter('year', 'length', c='Animation', colormap='jet')
