I have a pandas dataframe like this:
         Date  Weight  Year  Month  Day  Week  DayOfWeek
0  2017-11-13    76.1  2017     11   13    46          0
1  2017-11-14    76.2  2017     11   14    46          1
2  2017-11-15    76.6  2017     11   15    46          2
3  2017-11-16    77.1  2017     11   16    46          3
4  2017-11-17    76.7  2017     11   17    46          4
..        ...     ...   ...    ...  ...   ...        ...
I created a JointGrid with:
g = sns.JointGrid(data=df,
                  x="Date",
                  y="Weight",
                  marginal_ticks=True,
                  height=6,
                  ratio=2,
                  space=.05)
Then I defined the joint and marginal plots:
g.plot_joint(sns.scatterplot,
             hue=df["Year"],
             alpha=.4,
             legend=True)
g.plot_marginals(sns.histplot,
                 multiple="stack",
                 bins=20,
                 hue=df["Year"])
Result is this.
Now the question is: is it possible to specify different binning for the two histplots that form the x and y marginal plots?
I don't think there is a built-in way to do that, but you can plot directly on the marginal axes using the plotting function of your choice, like so:
import seaborn as sns
penguins = sns.load_dataset('penguins')
data = penguins
x_col = "bill_length_mm"
y_col = "bill_depth_mm"
hue_col = "species"
g = sns.JointGrid(data=data, x=x_col, y=y_col, hue=hue_col)
g.plot_joint(sns.scatterplot)
# top marginal
sns.histplot(data=data, x=x_col, hue=hue_col, bins=5, ax=g.ax_marg_x, legend=False, multiple='stack')
# right marginal
sns.histplot(data=data, y=y_col, hue=hue_col, bins=40, ax=g.ax_marg_y, legend=False, multiple='stack')
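If explicit control over the bin edges is preferred, the `bins` parameter of `sns.histplot` also accepts a sequence of bin edges, so each marginal can be given its own precomputed edges. A minimal sketch, assuming synthetic data in place of the real columns (the arrays `x_vals` and `y_vals` are made up for illustration):

```python
import numpy as np

# Synthetic stand-ins for the two plotted columns (illustrative only)
rng = np.random.default_rng(0)
x_vals = rng.normal(44.0, 5.0, 300)
y_vals = rng.normal(17.0, 2.0, 300)

# Compute independent bin edges for each marginal:
# 5 equal-width bins for x, 40 equal-width bins for y
x_edges = np.histogram_bin_edges(x_vals, bins=5)
y_edges = np.histogram_bin_edges(y_vals, bins=40)

# N bins are described by N + 1 edges
print(len(x_edges) - 1, len(y_edges) - 1)
```

The resulting arrays could then be passed as `bins=x_edges` and `bins=y_edges` in the two `sns.histplot` calls above.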
I have a dataframe which looks like this:
MM Initial Energy MM Initial Angle QM Energy QM Angle
0 13.029277 120.0 18.048 120.0
1 11.173115 125.0 15.250 125.0
2 9.411475 130.0 12.668 130.0
3 7.762888 135.0 10.309 135.0
4 6.239025 140.0 8.180 140.0
5 4.853004 145.0 6.286 145.0
6 3.617394 150.0 4.633 150.0
7 2.544760 155.0 3.226 155.0
8 1.646335 160.0 2.070 160.0
9 0.934298 165.0 1.166 165.0
10 0.419003 170.0 0.519 170.0
11 0.105913 175.0 0.130 175.0
12 0.000000 -180.0 0.000 -180.0
13 0.105988 -175.0 0.130 -175.0
14 0.420029 -170.0 0.519 -170.0
15 0.937312 -165.0 1.166 -165.0
16 1.650080 -160.0 2.070 -160.0
17 2.548463 -155.0 3.227 -155.0
18 3.621227 -150.0 4.633 -150.0
19 4.856266 -145.0 6.286 -145.0
20 6.236939 -140.0 8.180 -140.0
21 7.760035 -135.0 10.309 -135.0
22 9.409117 -130.0 12.669 -130.0
23 11.170671 -125.0 15.251 -125.0
24 13.033293 -120.0 18.048 -120.0
I want to plot the data with the angles on the x-axis and the energy on the y-axis. This sounds fairly simple, but pandas or matplotlib sorts the x-axis values in such a way that my plot looks split. This is what it looks like:
However, this is how I want it:
My code is as follows:
df=pd.read_fwf('scan_c1c2c3h31_orig.txt', header=None, prefix='X')
df.rename(columns={'X0':'MM Initial Energy',
                   'X1':'MM Initial Angle',
                   'X2':'QM Energy', 'X3':'QM Angle'},
          inplace=True)
df=df.sort_values(by=['MM Initial Angle'], axis=0, ascending=True)
df=df.reset_index(drop=False)
df2=pd.read_fwf('scan_c1c2c3h31.txt', header=None, prefix='X')
df2.rename(columns={'X0':'MM Energy',
                    'X1':'MM Angle',
                    'X2':'QM Energy', 'X3':'QM Angle'},
           inplace=True)
df2=df2.sort_values(by=['MM Angle'], axis=0, ascending=True)
df2=df2.reset_index(drop=False)
df
df2
ax = plt.axes()
df.plot(y="MM Initial Energy", x="MM Initial Angle", color='red', linestyle='dashed',linewidth=2.0, ax=ax, fontsize=20, legend=True)
df2.plot(y="MM Energy", x="MM Angle", color='red', ax=ax, linewidth=2.0, fontsize=20, legend=True)
df2.plot(y="QM Energy", x="QM Angle", color='blue', ax=ax, linewidth=2.0, fontsize=20, legend=True)
plt.ylim(-0.05, 6)
ax.xaxis.set_major_locator(MultipleLocator(20))
ax.xaxis.set_minor_locator(MultipleLocator(10))
ax.yaxis.set_minor_locator(MultipleLocator(0.5))
plt.xlabel('Angles (Degrees)', fontsize=25)
plt.ylabel('Energy (kcal/mol)', fontsize=25)
What I am doing is sorting the dataframe by 'MM Angle'/'MM Initial Angle' to avoid the plot "scrambling" caused by repeated values on the y-axis. The angles vary from -180 to 180, and I want -180 and +180 to sit next to each other.
I have tried sorting the negative values in ascending order and the positive values in descending order, as suggested in this post, but I still get the same plot, where the x-axis ranges from -180 to +180.
I have also tried using matplotlib axis spines to recenter the plot, and inverting the x-axis as suggested in this post, but I still get the same plot. Additionally, I have tried the suggestion in yet another post.
Any help will be appreciated.
If you don't need to rescale the plot, I would plot against the positive angles 0-360 and manually re-label the ticks:
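import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator
fig, ax = plt.subplots()
# Map the angles from [-180, 180) onto [0, 360) so the negative half follows the positive half,
# then sort by the wrapped angle so the line is drawn in one continuous pass
(df.assign(Angle=df['MM Initial Angle'] % 360)
   .sort_values('Angle')
   .plot(x='Angle', y=['QM Energy', 'MM Initial Energy'], ax=ax)
)
ax.xaxis.set_major_locator(MultipleLocator(20))
# Relabel the ticks above 180 with their negative equivalents
x_ticks = ax.get_xticks()
x_ticks = [t - 360 if t > 180 else t for t in x_ticks]
ax.set_xticklabels(x_ticks)
plt.show()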
Output:
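To see why this works, it helps to look at the wrap and the relabeling on a handful of values. A small sketch, assuming a few sample angles taken from the table in the question (the helper `tick_label` is a hypothetical name used only here):

```python
import numpy as np

# A few angles in the order they appear in the question's table
angles = np.array([120.0, 175.0, -180.0, -175.0, -120.0])

# Python/NumPy modulo always returns a non-negative result for a positive divisor,
# so negative angles land in the range [180, 360)
wrapped = angles % 360

def tick_label(t):
    """Convert a wrapped position back to the signed angle shown on the axis."""
    return t - 360 if t > 180 else t

labels = [tick_label(t) for t in wrapped]
print(wrapped.tolist())
print(labels)
```

The wrapped positions increase monotonically, which is what keeps the curve in one piece, while the labels still show the signed angles. Note that -180 is displayed as 180, since the two describe the same angle.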
I'd like to make this type of plot, with multiple columns separated by a small whitespace, where each column is a different category holding 3-5 (5 in this example) observations with varying values on the y axis:
Actually, I can draw this plot using ggplot2. For example:
head(mtcars)
# mpg cyl disp hp drat wt qsec vs am gear carb
# Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
# Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
# Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
# Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
# Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
# Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
library(dplyr)
library(ggplot2)
mtcars %>% reshape2::melt() %>%
  ggplot(aes(x = variable, y = value)) +
  geom_point() + facet_grid(~ variable) +
  theme(axis.text.x = element_blank())
Set a categorical variable in your dataset, then use facet_grid(~ variable). This function splits your plot into multiple panels, one per level of the categorical variable.
Here is an approach to draw a similar plot using Python's matplotlib. The plot has a grey background, with white major and minor gridlines to delimit the zones. Getting the dots in the center of each little cell is somewhat tricky: divide each column into n equal sub-cells and shift the positions by half a sub-cell, 1/(2n). A secondary x-axis can be used to set the labels. A zorder has to be set to have the dots on top of the gridlines.
import numpy as np
from matplotlib import pyplot as plt
from matplotlib import ticker
n = 5
cols = 7
values = [np.random.uniform(1, 10, n) for c in range(cols)]
fig, ax = plt.subplots()
ax.set_facecolor('lightgrey')
ax.xaxis.set_major_locator(ticker.MultipleLocator(1))
ax.xaxis.set_minor_locator(ticker.MultipleLocator(1 / (n)))
ax.yaxis.set_major_locator(ticker.MultipleLocator(1))
ax.grid(True, which='both', axis='both', color='white')
ax.set_xticklabels([])
ax.tick_params(axis='x', which='both', length=0)
ax.grid(which='major', axis='both', lw=3)
ax.set_xlim(1, cols + 1)
for i in range(1, cols + 1):
    ax.scatter(np.linspace(i, i + 1, n, endpoint=False) + 1 / (2 * n), values[i-1], c='crimson', zorder=2)
ax2 = ax.twiny()
ax2.set_xlim(0.5, cols + 0.5)
ticks = range(1, cols + 1)
ax2.set_xticks(ticks)
ax2.set_xticklabels([f'Cat_{t:02d}' for t in ticks])
bbox = dict(boxstyle="round", ec="limegreen", fc="limegreen", alpha=0.5)
plt.setp(ax2.get_xticklabels(), bbox=bbox)
ax2.tick_params(axis='x', length=0)
plt.show()
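The centering step can be checked in isolation. A short sketch, assuming the same `n = 5` as above and looking only at the first column (`i = 1`):

```python
import numpy as np

n = 5  # observations per category, as in the code above
i = 1  # left edge of the first column

# n left edges of the sub-cells inside [i, i + 1), each 1/n wide ...
left_edges = np.linspace(i, i + 1, n, endpoint=False)
# ... shifted by half a sub-cell so each dot sits in the middle of its cell
xs = left_edges + 1 / (2 * n)
print(xs)
```

With the minor gridlines drawn every 1/n, each of these x positions falls halfway between two neighbouring gridlines.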
I have four datasets like df1 below, and I want to draw them as scatter plots in a 2*2 grid.
df1
Height time_of_day resolution clusters
11 3.146094 0.458333 0.594089 0
90 0.191690 0.541667 0.594089 0
99 1.300386 1.666667 0.594089 1
121 3.054903 2.083333 0.594089 0
df2
Height time_of_day resolution clusters
10 3.146094 0.458333 0.594089 0
60 3.191690 0.541667 0.594089 0
87 1.300386 1.666667 0.594089 1
121 3.054903 1.083333 0.594089 0
df3
Height time_of_day resolution clusters
13 3.146094 0.458333 0.594089 0
61 3.191690 0.541667 0.594089 0
86 1.300386 1.666667 0.594089 1
113 4.054903 1.083333 0.594089 0
df4
Height time_of_day resolution clusters
10 3.146094 0.458333 0.594089 0
20 3.191690 0.541667 0.594089 0
37 1.300386 1.666667 0.594089 1
121 3.054903 1.083333 0.594089 0
I have tried several methods, and none of them worked.
dics = [df1,df2,df3,df4]
rows = range(4)
fig, ax = plt.subplots(2,2,squeeze=False,figsize = (20,10))
for x in rows:
    for i,dic in enumerate(dics):
        sns.lmplot(x="time_of_day", y="Height",fit_reg=False,hue="clusters", data=dic[x], height=6, aspect=1.5)
plt.show()
And this is the code for a single scatter plot:
sns.lmplot(x="time_of_day", y="Height",fit_reg=False,hue="clusters", data=summer_spike_df, height=6, aspect=1.5)
What should I change in the code in order to draw a 2*2 grid with a different scatter plot in each panel?
Thank you
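If you're not plotting the regression line, then why not just use seaborn.scatterplot? Unlike sns.lmplot, which is a figure-level function that creates its own figure, scatterplot accepts an ax argument and can draw into existing subplots.
You can use the zip function and array.ravel to pair each dataframe with one subplot:
fig, axes = plt.subplots(2,2,squeeze=False,figsize = (20,10))
for df, ax in zip(dics, axes.ravel()):
    sns.scatterplot(x="time_of_day", y="Height",hue="clusters", data=df, ax=ax)
plt.show()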
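The zip/ravel pairing does not depend on seaborn. A minimal sketch of the same pattern with plain matplotlib, assuming four made-up (x, y) datasets loosely based on the values in the question and a non-interactive backend so that it runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs headless
import matplotlib.pyplot as plt

# Four small illustrative datasets as (time_of_day, Height) pairs (made-up values)
datasets = [
    ([0.46, 0.54, 1.67, 2.08], [3.15, 0.19, 1.30, 3.05]),
    ([0.46, 0.54, 1.67, 1.08], [3.15, 3.19, 1.30, 3.05]),
    ([0.46, 0.54, 1.67, 1.08], [3.15, 3.19, 1.30, 4.05]),
    ([0.46, 0.54, 1.67, 1.08], [3.15, 3.19, 1.30, 3.05]),
]

fig, axes = plt.subplots(2, 2, figsize=(8, 6))

# axes is a 2x2 array; ravel() flattens it row by row,
# so zip pairs dataset k with the k-th panel in reading order
for idx, ((x, y), ax) in enumerate(zip(datasets, axes.ravel())):
    ax.scatter(x, y)
    ax.set_title(f"dataset {idx + 1}")

fig.tight_layout()
```

Because the flattening is row-major, the third dataset ends up in the bottom-left panel.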
I have 2 datasets that I'm trying to plot on the same figure. They share a common column that I'm using for the X-axis, however one of my sets of data is collected annually and the other monthly so the number of data points in each set is significantly different.
Pyplot does not place the X values of each set where I would expect when I plot both sets on the same graph.
When I plot just my annually collected data set I get:
When I plot just my monthly collected data set I get:
But when I plot the two sets overlaid (code below) I get:
tframe:
10003 Date
0 257 201401
1 216 201402
2 417 201403
3 568 201404
4 768 201405
5 836 201406
6 798 201407
7 809 201408
8 839 201409
9 796 201410
tax_for_zip_data:
TAX BRACKET $1 under $25,000 ... Date
2 5740 ... 201301
0 5380 ... 201401
1 5320 ... 201501
3 5030 ... 201601
So I did as wwii suggested in the comments and converted my Date columns to datetime objects:
tframe:
10003 Date
0 257 2014-01-31
1 216 2014-02-28
2 417 2014-03-31
3 568 2014-04-30
4 768 2014-05-31
5 836 2014-06-30
6 798 2014-07-31
7 809 2014-08-31
8 839 2014-09-30
9 796 2014-10-31
tax_for_zip_data:
TAX BRACKET $1 under $25,000 ... Date
2 5740 ... 2013-01-31
0 5380 ... 2014-01-31
1 5320 ... 2015-01-31
3 5030 ... 2016-01-31
But the dates are still plotted with an offset.
None of my data goes back to 2012; January 2013 is the earliest. The tax_for_zip_data points are all offset by a year. If I plot just that set alone, it plots properly.
fig, ax1 = plt.subplots(sharex = True)
color = "tab:red"
ax1.set_xlabel('Date')
ax1.set_ylabel('Trips', color = color)
tframe.plot(kind = 'line',x = 'Date', y = "10003", ax = ax1, color = color)
ax1.tick_params(axis = 'y', labelcolor = color)
ax2 = ax1.twinx()
color = "tab:blue"
ax2.set_ylabel('Num Returns', color = color)
tax_for_zip_data.plot(kind = 'line', x = 'Date', y = tax_for_zip_data.columns[:-1], ax = ax2)
ax2.tick_params(axis = 'y', labelcolor = color)
plt.show()
If you can make the DataFrame index a datetime index, plotting is easier.
import io
import pandas as pd
# Columns are separated by two or more spaces
s = '''10003  Date
257  201401
216  201402
417  201403
568  201404
768  201405
836  201406
798  201407
809  201408
839  201409
796  201410
'''
df1 = pd.read_csv(io.StringIO(s), delimiter=r'\s{2,}', engine='python')
df1.index = pd.to_datetime(df1['Date'], format='%Y%m')
s = '''TAX BRACKET  $1 under $25,000  Date
2  5740  201301
0  5380  201401
1  5320  201501
3  5030  201601
'''
df2 = pd.read_csv(io.StringIO(s), delimiter=r'\s{2,}', engine='python')
df2.index = pd.to_datetime(df2['Date'], format='%Y%m')
You don't need to specify an argument for plot's x parameter.
fig, ax1 = plt.subplots(sharex = True)
color = "tab:red"
ax1.set_xlabel('Date')
ax1.set_ylabel('Trips', color = color)
df1.plot(kind = 'line',y="10003", ax = ax1, color = color)
ax1.tick_params(axis = 'y', labelcolor = color)
ax2 = ax1.twinx()
color = "tab:blue"
ax2.set_ylabel('Num Returns', color = color)
df2.plot(kind = 'line', y='$1 under $25,000', ax = ax2)
ax2.tick_params(axis = 'y', labelcolor = color)
plt.show()
plt.close()
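The reason a shared datetime index helps is that both series are then positioned by real dates rather than by row position. A small sketch, assuming a shortened version of the two datasets (the names `monthly` and `annual` are illustrative):

```python
import pandas as pd

# Monthly series: three of the months from the question
monthly = pd.Series(
    [257, 216, 417],
    index=pd.to_datetime(["201401", "201402", "201403"], format="%Y%m"),
)

# Annual series: two of the yearly values from the question
annual = pd.Series(
    [5740, 5380],
    index=pd.to_datetime(["201301", "201401"], format="%Y%m"),
)

# Both indexes now hold real timestamps, so a date means the same thing in both
shared = monthly.index.intersection(annual.index)
print(annual.index.min(), monthly.index.min(), list(shared))
```

With the `%Y%m` format each value is parsed as the first day of that month, so January 2014 lines up exactly in both series and the 2013 point falls one year to its left.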
I need to compare different sets of daily data between 4 shifts (categorical / groupby), using bar graphs and line graphs. I have looked everywhere and have not found a working solution that doesn't involve generating new pivots and such.
I've used both matplotlib and seaborn, and while I can do one or the other (different colored bars/lines for each shift), once I incorporate the other, either one disappears or other anomalies happen, such as only one plot point showing. I have looked all over, and there are solutions for representing a single series of data on both chart types, but none that covers multiple categories or groups for both.
Data Example:
report_date wh_id shift Head_Count UTL_R
3/17/19 55 A 72 25%
3/18/19 55 A 71 10%
3/19/19 55 A 76 20%
3/20/19 55 A 59 33%
3/21/19 55 A 65 10%
3/22/19 55 A 54 20%
3/23/19 55 A 66 14%
3/17/19 55 1 11 10%
3/17/19 55 2 27 13%
3/17/19 55 3 18 25%
3/18/19 55 1 23 100%
3/18/19 55 2 16 25%
3/18/19 55 3 12 50%
3/19/19 55 1 28 10%
3/19/19 55 2 23 50%
3/19/19 55 3 14 33%
3/20/19 55 1 29 25%
3/20/19 55 2 29 25%
3/20/19 55 3 10 50%
3/21/19 55 1 17 20%
3/21/19 55 2 29 14%
3/21/19 55 3 30 17%
3/22/19 55 1 12 14%
3/22/19 55 2 10 100%
3/22/19 55 3 17 14%
3/23/19 55 1 16 10%
3/23/19 55 2 11 100%
3/23/19 55 3 13 10%
tm_daily_df = pd.read_csv('fg_TM_Daily.csv')
tm_daily_df = tm_daily_df.set_index('report_date')
fig2, ax2 = plt.subplots(figsize=(12,8))
ax3 = ax2.twinx()
group_obj = tm_daily_df.groupby('shift')
g = group_obj['Head_Count'].plot(kind='bar', x='report_date', y='Head_Count',ax=ax2,stacked=False,alpha = .2)
g = group_obj['UTL_R'].plot(kind='line',x='report_date', y='UTL_R', ax=ax3,marker='d', markersize=12)
plt.legend(tm_daily_df['shift'].unique())
This code has gotten me the closest I've been able to get. Notice that even with stacked=False, the bars are still stacked. I changed the setting to True, and nothing changed.
All I need is for the bars to be next to each other, with the same color scheme representing each shift.
The graph:
Here are two solutions (stacked and unstacked). Based on your question, we will:
- plot Head_Count on the left y axis and UTL_R on the right y axis
- use report_date as our x axis
- use shift as the hue of our graph
The stacked version uses pandas' default plotting feature, and the unstacked version uses seaborn.
EDIT
Following your request, I added a 100% stacked graph. It is not exactly what you asked for in the comment, but the graph type you asked for may be confusing to read (are the values based on the upper edge of the stack or on the height of the stack?). A 100% stacked graph may be an alternative solution.
Stacked
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
dfg = df.set_index(['report_date', 'shift']).sort_index(level=[0,1])
fig, ax = plt.subplots(figsize=(12,6))
ax2 = ax.twinx()
dfg['Head_Count'].unstack().plot.bar(stacked=True, ax=ax, alpha=0.6)
dfg['UTL_R'].unstack().plot(kind='line', ax=ax2, marker='o', legend=None)
ax.set_title('My Graph')
plt.show()
Stacked 100%
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
dfg = df.set_index(['report_date', 'shift']).sort_index(level=[0,1])
# Create `Head_Count_Pct` column
for date in dfg.index.get_level_values('report_date').unique():
    for shift in dfg.loc[date, :].index.get_level_values('shift').unique():
        dfg.loc[(date, shift), 'Head_Count_Pct'] = dfg.loc[(date, shift), 'Head_Count'].sum() / dfg.loc[(date, 'A'), 'Head_Count'].sum()
fig, ax = plt.subplots(figsize=(12,6))
ax2 = ax.twinx()
pal = sns.color_palette("Set1")
dfg[dfg.index.get_level_values('shift').isin(['1','2','3'])]['Head_Count_Pct'].unstack().plot.bar(stacked=True, ax=ax, alpha=0.5, color=pal)
dfg['UTL_R'].unstack().plot(kind='line', ax=ax2, marker='o', legend=None, color=pal)
ax.set_title('My Graph')
plt.show()
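The nested loop in the 100% stacked version divides each shift's head count by the 'A' row of the same date. A loop-free sketch of the same ratio, assuming a flat DataFrame with the question's column names and only two dates of its data (the name `totals` is illustrative):

```python
import pandas as pd

# Two dates from the question's data, shift stored as strings
df = pd.DataFrame({
    "report_date": ["3/17/19"] * 4 + ["3/18/19"] * 4,
    "shift": ["A", "1", "2", "3"] * 2,
    "Head_Count": [72, 11, 27, 18, 71, 23, 16, 12],
})

# Head count of shift 'A' per date, indexed by report_date
totals = df[df["shift"] == "A"].set_index("report_date")["Head_Count"]

# Look up each row's date in totals and divide
df["Head_Count_Pct"] = df["Head_Count"] / df["report_date"].map(totals)
print(df)
```

In this data the head counts of shifts 1, 2 and 3 do not add up to the 'A' value, so the stacked shares will not necessarily reach 100%.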
Unstacked
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
dfg = df.set_index(['report_date', 'shift']).sort_index(level=[0,1])
fig, ax = plt.subplots(figsize=(15,6))
ax2 = ax.twinx()
sns.barplot(x=dfg.index.get_level_values('report_date'),
            y=dfg.Head_Count,
            hue=dfg.index.get_level_values('shift'), ax=ax, alpha=0.7)
sns.lineplot(x=dfg.index.get_level_values('report_date'),
             y=dfg.UTL_R,
             hue=dfg.index.get_level_values('shift'), ax=ax2, marker='o', legend=None)
ax.set_title('My Graph')
plt.show()
EDIT #2
Here is the graph you asked for in your second request (stacked, but stack n+1 does not start where stack n ends).
It is slightly more involved, as we have to do several things:
- we need to manually assign a color to each shift in our df
- once the colors are assigned, we iterate through each date and 1) sort the Head_Count values in descending order (so that the largest stack is at the back when we plot the graph), and 2) plot the data and assign the color to each stack
- then we can create our second y axis and plot our UTL_R values
- then we need to assign the correct color to our legend labels
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
def assignColor(shift):
    # Single-letter matplotlib color codes must be lowercase
    if shift == 'A':
        return 'r'
    if shift == '1':
        return 'b'
    if shift == '2':
        return 'g'
    if shift == '3':
        return 'y'
# map a color to a shift
df['color'] = df['shift'].apply(assignColor)
fig, ax = plt.subplots(figsize=(15,6))
# plot our Head_Count values
for date in df.report_date.unique():
    d = df[df.report_date == date].sort_values(by='Head_Count', ascending=False)
    y = d.Head_Count.values
    x = date
    color = d.color
    b = plt.bar(x,y, color=color)
# Plot our UTL_R values
ax2 = ax.twinx()
sns.lineplot(x=df.report_date, y=df.UTL_R, hue=df['shift'], marker='o', legend=None)
# Assign the color label color to our legend
leg = ax.legend(labels=df['shift'].unique(), loc=1)
legend_maping = dict()
for shift in df['shift'].unique():
    legend_maping[shift] = df[df['shift'] == shift].color.unique()[0]
i = 0
for leg_lab in leg.texts:
    leg.legendHandles[i].set_color(legend_maping[leg_lab.get_text()])
    i += 1
How about this?
tm_daily_df['UTL_R'] = tm_daily_df['UTL_R'].str.replace('%', '').astype('float') / 100
pivoted = tm_daily_df.pivot_table(values=['Head_Count', 'UTL_R'],
                                  index='report_date',
                                  columns='shift')
pivoted
# Head_Count UTL_R
# shift 1 2 3 A 1 2 3 A
# report_date
# 3/17/19 11 27 18 72 0.10 0.13 0.25 0.25
# 3/18/19 23 16 12 71 1.00 0.25 0.50 0.10
# 3/19/19 28 23 14 76 0.10 0.50 0.33 0.20
# 3/20/19 29 29 10 59 0.25 0.25 0.50 0.33
# 3/21/19 17 29 30 65 0.20 0.14 0.17 0.10
# 3/22/19 12 10 17 54 0.14 1.00 0.14 0.20
# 3/23/19 16 11 13 66 0.10 1.00 0.10 0.14
fig, ax = plt.subplots()
pivoted['Head_Count'].plot.bar(ax=ax)
pivoted['UTL_R'].plot.line(ax=ax, legend=False, secondary_y=True, marker='D')
ax.legend(loc='upper left', title='shift')
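The data preparation in this answer can be tried without the CSV file. A minimal sketch, assuming a four-row excerpt of the question's data built by hand, with shift stored as strings:

```python
import pandas as pd

# Four rows from the question's data (shifts 'A' and '1' on two dates)
df = pd.DataFrame({
    "report_date": ["3/17/19", "3/17/19", "3/18/19", "3/18/19"],
    "shift": ["A", "1", "A", "1"],
    "Head_Count": [72, 11, 71, 23],
    "UTL_R": ["25%", "10%", "10%", "100%"],
})

# Turn percentage strings such as '25%' into fractions such as 0.25
df["UTL_R"] = df["UTL_R"].str.replace("%", "", regex=False).astype(float) / 100

# One row per date, one column per (measure, shift) pair
pivoted = df.pivot_table(values=["Head_Count", "UTL_R"],
                         index="report_date",
                         columns="shift")
print(pivoted)
```

Selecting `pivoted['Head_Count']` or `pivoted['UTL_R']` then gives a frame with one column per shift, which is the shape that `plot.bar` and `plot.line` draw as one bar group or one line per shift.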