Different box plots on the same oX position - python

I am trying to combine box plots with a scatter plot for an algorithm scoring visualization. My data is organized as follows:
oX - information about the time period (1 year, 2 years, etc.)
oY - information about the score
2 algorithms for each period with different simulation results (plotted as boxplots)
2 heuristics with a single value (plotted as a point)
I'm trying to easily compare method efficiency for each period of time.
Small sample data:
        1 year                 2 years
A1   A2   H1   H2      A1   A2   H1   H2
124  168  155  167     130  130  150  164
102  155               100  172
103  153               117  145
102  132               145  143
145  170               133  179
136  125               115  153
116  150               136  131
146  192               106  148
124  122               127  158
128  123               149  200
141  158               137  156
I'm trying to get something that looks like this:
So far I've cleaned up my data so that the observations for each algorithm (RS, EA) and each period (52, 104, 156, etc.) are stored separately, but I can't figure out how to group them per period while drawing 2 different boxplots at the same X tick. I assume that once I've sorted out the boxplot dataframe and plot, I can just plot the scatter on top.

Managed to solve this meanwhile, in case it helps anyone else out:
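import matplotlib.lines as mlines
import matplotlib.patches as mpatches
import matplotlib.pyplot as plt
import seaborn as sns
ax1 = sns.boxplot(data = meta, x = 'Time', y = 'PRS', color = '#880BDD', linewidth=0.8)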
ax1 = sns.boxplot(data = meta, x = 'Time', y = 'EA', color = '#0BC9DD', linewidth=0.8)
ax1 = sns.boxplot(data = meta, x = 'Time', y = 'ERS', color = '#9BD19D', linewidth=0.8)
ax1 = sns.pointplot(data = simple, x = 'Time', y = 'Greedy Average', color='#FFC48C', markers ='s', join=False)
ax1 = sns.pointplot(data = simple, x = 'Time', y = 'Greedy Total', color='#FF9F80', markers='o', join=False)
ax1 = sns.pointplot(data = simple, x = 'Time', y = 'Greedy Weeks', color='#F56991', markers='*', join=False)
ax1.set(xlabel = "Planning Horizon (weeks)")
ax1.set(ylabel = "Hypervolume")
EA = mpatches.Patch(color='#0BC9DD', label = 'EA')
PRS = mpatches.Patch(color='#880BDD', label = 'PRS')
ERS = mpatches.Patch(color='#9BD19D', label = 'ERS')
GA = mlines.Line2D([], [], color='#FFC48C', marker = 's', label = 'Greedy Average')
GT = mlines.Line2D([], [],color='#FF9F80', label = 'Greedy Total', marker = 'o')
GW = mlines.Line2D([], [],color='#F56991', label = 'Greedy Weeks', marker = '*')
plt.legend(handles = [EA, ERS, PRS, GA, GT, GW], loc = 'lower left', title = "Algorithm")
ax1.set_title("Algorithm Comparison")
Results in this:
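
If side-by-side boxes per period are preferred, the calls above can be replaced by plain matplotlib boxplots with shifted positions. The following is only a sketch under assumptions: it uses the sample data from the question (with the columns assigned as A1, A2 for each period and one H1, H2 value per period), and the offsets, box width and marker choices are arbitrary illustration values.

```python
import numpy as np
import matplotlib.pyplot as plt

# Sample data from the question: simulation results per period and algorithm
results = {
    "1 year": {"A1": [124, 102, 103, 102, 145, 136, 116, 146, 124, 128, 141],
               "A2": [168, 155, 153, 132, 170, 125, 150, 192, 122, 123, 158]},
    "2 years": {"A1": [130, 100, 117, 145, 133, 115, 136, 106, 127, 149, 137],
                "A2": [130, 172, 145, 143, 179, 153, 131, 148, 158, 200, 156]},
}
# Heuristics have a single value per period
heuristics = {"1 year": {"H1": 155, "H2": 167},
              "2 years": {"H1": 150, "H2": 164}}

periods = list(results)
centers = np.arange(len(periods))       # one tick per period
offsets = {"A1": -0.2, "A2": 0.2}       # shift each algorithm around the tick
colors = {"A1": "#880BDD", "A2": "#0BC9DD"}

fig, ax = plt.subplots()
handles, labels = [], []
for alg, offset in offsets.items():
    bp = ax.boxplot(
        [results[p][alg] for p in periods],
        positions=centers + offset,
        widths=0.3,
        patch_artist=True,      # boxes become patches that can be filled
        manage_ticks=False,     # keep ticks under our own control
    )
    for box in bp["boxes"]:
        box.set_facecolor(colors[alg])
    handles.append(bp["boxes"][0])
    labels.append(alg)

# Heuristic values are drawn as single points centred on each tick
for name, marker in [("H1", "s"), ("H2", "o")]:
    points = ax.scatter(centers, [heuristics[p][name] for p in periods],
                        marker=marker, zorder=3)
    handles.append(points)
    labels.append(name)

ax.set_xticks(centers)
ax.set_xticklabels(periods)
ax.set_ylabel("Score")
ax.legend(handles, labels, title="Method")
```

Call plt.show() afterwards to display the figure. The same idea extends to more algorithms by adding entries to offsets and narrowing the boxes.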

How to plot more than 1 graph in 1 figure with 3D Mesh Plotly?

I have a problem with my plotting.
I want to plot multiple meshes in one graph, and each mesh is marked by label.
This is what the data looks like:
I could only plot one mesh. Please help.
This is my code (just one mesh):
import numpy as np
import pandas as pd
import plotly.graph_objects as go
geob_data = pd.read_csv("Geobody.csv")
x = list(geob_data["X"])
y = list(geob_data["Y"])
z = list(geob_data["Z"])
label = list(geob_data["LABEL"])
fig = go.Figure(data=[go.Mesh3d(x=x, y=y, z=z, color='green',
opacity=1, alphahull=0)])
fig.show()
I wrote this code on the understanding that you want to draw two meshes on one 3D graph. The key is to extract the rows for each label and add a separate trace for each.
import pandas as pd
import io
data = '''
X Y Z LABEL
500 500 -200 1
500 500 -180 1
505 505 -190 1
495 495 -190 1
495 505 -190 1
505 495 -190 1
400 400 -150 2
400 400 -130 2
405 405 -140 2
395 395 -140 2
395 405 -140 2
405 395 -140 2
'''
geob_data = pd.read_csv(io.StringIO(data), sep=r'\s+')
import plotly.graph_objects as go
#geob_data = pd.read_csv("Geobody.csv")
x = list(geob_data["X"])
y = list(geob_data["Y"])
z = list(geob_data["Z"])
label = list(geob_data["LABEL"])
fig = go.Figure()
for lbl in geob_data['LABEL'].unique():
    # select the rows belonging to this label
    df = geob_data.query('LABEL == @lbl')
    colors = 'green' if lbl == 1 else 'red'
    fig.add_trace(go.Mesh3d(x=df['X'].tolist(),
                            y=df['Y'].tolist(),
                            z=df['Z'].tolist(),
                            color=colors,
                            opacity=1,
                            alphahull=0
                            ))
fig.update_layout(
autosize=False,
height=600,
width=600,
)
fig.show()

Number of observations shown on violinplot is wrong

I used this tip https://python-graph-gallery.com/58-show-number-of-observation-on-violinplot/ to add the number of observations to a violin plot.
Here is my code:
# Calculate number of obs per group & median to position labels
medians = dataset.groupby([x_attrib])[y_attrib].median().values
nobs = dataset[x_attrib].value_counts().values
nobs = [str(x) for x in nobs.tolist()]
#nobs = ["Nb: " + i for i in nobs]
nobs = [i for i in nobs]
# Add it to the plot
pos = range(len(nobs))
for tick,label in zip(pos,ax.get_xticklabels()):
    ax.text(pos[tick], medians[tick] + 0.03, nobs[tick], horizontalalignment='center', size='x-large', color='black', weight='semibold')
I plot variable with these value counts:
0 355
1 174
2 36
-1 19
3 15
4 5
...
As you can see on the plot, for the value -1 the real count is 19, but the plot shows 355 (the count for the value 0).
How can I modify the code to get a correct plot, please?
Thanks a lot.
Theo
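
A likely cause of the mismatch: value_counts() returns counts sorted by frequency (most common first), while groupby() returns medians sorted by group value, so the two arrays are in different orders. A minimal sketch of one way to keep them aligned is to compute both from the same groupby. The helper names group_stats and annotate_counts and the small dataset are hypothetical, and the sketch assumes the violins are drawn in sorted group order.

```python
import pandas as pd


def group_stats(dataset, x_attrib, y_attrib):
    """Median and observation count per group, both sorted by group value."""
    return dataset.groupby(x_attrib)[y_attrib].agg(["median", "count"])


def annotate_counts(ax, stats, offset=0.03):
    """Write each group's count just above its median, one label per x position."""
    for pos, (median, count) in enumerate(zip(stats["median"], stats["count"])):
        ax.text(pos, median + offset, str(count),
                horizontalalignment="center", size="x-large",
                color="black", weight="semibold")


# Small hypothetical dataset: group 0 is the most frequent, -1 the rarest
dataset = pd.DataFrame({
    "group": [0, 0, 0, 1, 1, -1],
    "value": [1.0, 2.0, 3.0, 4.0, 6.0, 5.0],
})

stats = group_stats(dataset, "group", "value")

# For comparison: value_counts() orders by frequency, not by group value
by_frequency = dataset["group"].value_counts()
```

With the axes returned by the violin plot call, annotate_counts(ax, stats) then places each count over the violin it belongs to.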

X-Axis scales not matching with 2 data sets on same plot

I have 2 datasets that I'm trying to plot on the same figure. They share a common column that I'm using for the X-axis, however one of my sets of data is collected annually and the other monthly so the number of data points in each set is significantly different.
Pyplot is not plotting the X values for each set where I would expect when I plot both sets on the same graph.
When I plot just my annually collected data set I get:
When I plot just my monthly collected data set I get:
But when I plot the two sets overlaid (code below) I get:
tframe:
10003 Date
0 257 201401
1 216 201402
2 417 201403
3 568 201404
4 768 201405
5 836 201406
6 798 201407
7 809 201408
8 839 201409
9 796 201410
tax_for_zip_data:
TAX BRACKET $1 under $25,000 ... Date
2 5740 ... 201301
0 5380 ... 201401
1 5320 ... 201501
3 5030 ... 201601
So I did as wwii suggested in the comments and converted my Date columns to datetime objects:
tframe:
10003 Date
0 257 2014-01-31
1 216 2014-02-28
2 417 2014-03-31
3 568 2014-04-30
4 768 2014-05-31
5 836 2014-06-30
6 798 2014-07-31
7 809 2014-08-31
8 839 2014-09-30
9 796 2014-10-31
tax_for_zip_data:
TAX BRACKET $1 under $25,000 ... Date
2 5740 ... 2013-01-31
0 5380 ... 2014-01-31
1 5320 ... 2015-01-31
3 5030 ... 2016-01-31
But the dates are still plotting offset.
None of my data goes back to 2012; Jan 2013 is the earliest. The tax_for_zip_data values are all offset by a year. If I plot just that set alone, it plots properly.
fig, ax1 = plt.subplots(sharex = True)
color = "tab:red"
ax1.set_xlabel('Date')
ax1.set_ylabel('Trips', color = color)
tframe.plot(kind = 'line',x = 'Date', y = "10003", ax = ax1, color = color)
ax1.tick_params(axis = 'y', labelcolor = color)
ax2 = ax1.twinx()
color = "tab:blue"
ax2.set_ylabel('Num Returns', color = color)
tax_for_zip_data.plot(kind = 'line', x = 'Date', y = tax_for_zip_data.columns[:-1], ax = ax2)
ax2.tick_params(axis = 'y', labelcolor = color)
plt.show()
If you can make the DataFrame index a datetime index plotting is easier.
import io
import pandas as pd
import matplotlib.pyplot as plt
# columns are separated by two or more spaces, so names containing single spaces stay intact
s = '''10003  Date
257  201401
216  201402
417  201403
568  201404
768  201405
836  201406
798  201407
809  201408
839  201409
796  201410
'''
df1 = pd.read_csv(io.StringIO(s), delimiter=r'\s{2,}', engine='python')
df1.index = pd.to_datetime(df1['Date'], format='%Y%m')
s = '''TAX BRACKET  $1 under $25,000  Date
2  5740  201301
0  5380  201401
1  5320  201501
3  5030  201601
'''
df2 = pd.read_csv(io.StringIO(s), delimiter=r'\s{2,}', engine='python')
df2.index = pd.to_datetime(df2['Date'], format='%Y%m')
With the dates in the index, you don't need to specify an argument for plot's x parameter.
fig, ax1 = plt.subplots(sharex = True)
color = "tab:red"
ax1.set_xlabel('Date')
ax1.set_ylabel('Trips', color = color)
df1.plot(kind = 'line',y="10003", ax = ax1, color = color)
ax1.tick_params(axis = 'y', labelcolor = color)
ax2 = ax1.twinx()
color = "tab:blue"
ax2.set_ylabel('Num Returns', color = color)
df2.plot(kind = 'line', y='$1 under $25,000', ax = ax2)
ax2.tick_params(axis = 'y', labelcolor = color)
plt.show()
plt.close()
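
If the two series still appear shifted relative to each other, one thing worth trying is to hand the datetimes to matplotlib directly instead of going through DataFrame.plot. This is a sketch under an assumption: that the offset comes from pandas choosing its own time-axis handling per frame, which may differ between a monthly and an annual frame on twinned axes. The variable names monthly and annual are illustrative, and the values are the sample data from the question.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Monthly trips (column "10003" in the question)
monthly = pd.Series(
    [257, 216, 417, 568, 768, 836, 798, 809, 839, 796],
    index=pd.date_range("2014-01-01", periods=10, freq="MS"),
)
# Annual returns (column "$1 under $25,000" in the question)
annual = pd.Series(
    [5740, 5380, 5320, 5030],
    index=pd.to_datetime(["201301", "201401", "201501", "201601"], format="%Y%m"),
)

fig, ax1 = plt.subplots()
# ax.plot receives real datetimes, so both series use matplotlib's date units
ax1.plot(monthly.index, monthly.values, color="tab:red")
ax1.set_xlabel("Date")
ax1.set_ylabel("Trips", color="tab:red")
ax1.tick_params(axis="y", labelcolor="tab:red")

ax2 = ax1.twinx()  # second y-axis sharing the same x-axis
ax2.plot(annual.index, annual.values, color="tab:blue", marker="o")
ax2.set_ylabel("Num Returns", color="tab:blue")
ax2.tick_params(axis="y", labelcolor="tab:blue")

fig.autofmt_xdate()  # slant the date labels so they do not overlap
```

Call plt.show() afterwards to display the figure; the x-axis should then span January 2013 to January 2016 for both series.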

Set Xticks frequency to dataframe index

I currently have a dataframe that has as an index the years from 1990 to 2014 (25 rows). I want my plot to have the X axis with all the years showing. I'm using add_subplot as I plan to have 4 plots in this figure (all of them with the same X axis).
To create the dataframe:
import pandas as pd
import numpy as np
index = np.arange(1990,2015,1)
columns = ['Total Population','Urban Population']
pop_plot = pd.DataFrame(index=index, columns=columns)
pop_plot = pop_plot.fillna(0)
pop_plot['Total Population'] = np.arange(150,175,1)
pop_plot['Urban Population'] = np.arange(50,125,3)
Total Population Urban Population
1990 150 50
1991 151 53
1992 152 56
1993 153 59
1994 154 62
1995 155 65
1996 156 68
1997 157 71
1998 158 74
1999 159 77
2000 160 80
2001 161 83
2002 162 86
2003 163 89
2004 164 92
2005 165 95
2006 166 98
2007 167 101
2008 168 104
2009 169 107
2010 170 110
2011 171 113
2012 172 116
2013 173 119
2014 174 122
The code that I currently have:
fig = plt.figure(figsize=(10,5))
ax1 = fig.add_subplot(2,2,1, xticklabels=pop_plot.index)
plt.subplot(2, 2, 1)
plt.plot(pop_plot)
legend = plt.legend(pop_plot, bbox_to_anchor=(0.1, 1, 0.8, .45), loc=3, ncol=1, mode='expand')
legend.get_frame().set_alpha(0)
ax1.set_xticks(range(len(pop_plot.index)))
This is the plot that I get:
When I comment the set_xticks I get the following plot:
#ax1.set_xticks(range(len(pop_plot.index)))
I've tried a couple of answers that I found here, but I didn't have much success.
It's not clear what ax1.set_xticks(range(len(pop_plot.index))) should be used for. It will set the ticks to the numbers 0,1,2,3 etc. while your plot should range from 1990 to 2014.
Instead, you want to set the ticks to the numbers of your data:
ax1.set_xticks(pop_plot.index)
Complete corrected example:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
index = np.arange(1990,2015,1)
columns = ['Total Population','Urban Population']
pop_plot = pd.DataFrame(index=index, columns=columns)
pop_plot['Total Population'] = np.arange(150,175,1)
pop_plot['Urban Population'] = np.arange(50,125,3)
fig = plt.figure(figsize=(10,5))
ax1 = fig.add_subplot(2,2,1)
ax1.plot(pop_plot)
legend = ax1.legend(pop_plot, bbox_to_anchor=(0.1, 1, 0.8, .45), loc=3, ncol=1, mode='expand')
legend.get_frame().set_alpha(0)
ax1.set_xticks(pop_plot.index)
plt.show()
The easiest option is to use the xticks parameter of pandas.DataFrame.plot.
Pass the dataframe index to xticks: xticks=pop_plot.index
# given the dataframe in the OP
ax = pop_plot.plot(xticks=pop_plot.index, figsize=(15, 5))
# move the legend
ax.legend(bbox_to_anchor=(0.1, 1, 0.8, .45), loc=3, ncol=1, mode='expand', frameon=False)
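
Since the question mentions four plots sharing the same X axis, here is a sketch of how the same set_xticks call might be applied across a 2x2 grid. It assumes every panel shows the same dataframe, which is only a placeholder for the four real plots; the figure size and the label rotation are arbitrary choices.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

index = np.arange(1990, 2015)
pop_plot = pd.DataFrame(
    {"Total Population": np.arange(150, 175),
     "Urban Population": np.arange(50, 125, 3)},
    index=index,
)

# sharex=True links the x-axes of all four panels
fig, axes = plt.subplots(2, 2, figsize=(12, 6), sharex=True)
for ax in axes.flat:
    for column in pop_plot.columns:
        ax.plot(pop_plot.index, pop_plot[column], label=column)
    ax.set_xticks(pop_plot.index)                # one tick per year
    ax.tick_params(axis="x", labelrotation=90)   # keep 25 labels readable

axes[0, 0].legend(frameon=False)
fig.tight_layout()
```

Call plt.show() afterwards to display the figure. With 25 years on a narrow panel the labels are crowded, so rotating them or widening the figure is usually needed.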

"Inputs x and y must be 1D or 2D" error in matplotlib

I am trying to plot some data from a big file.
The data has the following form:
0.025876 139 0
0.030881 140 0
0.030982 141 0
0.035602 142 0
0.035521 143 0
0.038479 144 0
0.040668 145 0
0.040121 146 0
0.037953 147 0
0.039027 148 0
0.038338 149 0
0.047557 139 1
0.045105 140 1
0.044943 141 1
0.042370 142 1
0.042025 143 1
0.038946 144 1
0.037953 145 1
0.033373 146 1
0.030070 147 1
0.029118 148 1
0.025552 149 1
In principle, each line corresponds to a three dimensional point and I would "simply" like to plot a 3d surface generated from these points akin to what I could do with the splot function in gnuplot for those of you that know about it.
Searching the net for an answer to my problem, I tried the following with the matplotlib contour function:
#!/usr/bin/python
from numpy import *
import pylab as p
import sys
import mpl_toolkits.mplot3d.axes3d as p3
s = str(sys.argv[1])
f = open(s)
z,y,x = loadtxt(f, unpack = True)
f.close
#x = [1,2,3]
#y = [1,2,3]
#z = [1,8,16]
data = zip(x,y,z)
#map data on the plane
X, Y = meshgrid(arange(0, 89, 1), arange(0, 300, 1))
Z = zeros((len(X),len(Y)),'Float32')
for x_,y_,z_ in data:
    Z[x_, y_] = z_ #this should work, but only because x and y are integers
                   #and arange was done with a step of 1, starting from 0
fig=p.figure()
ax = p3.Axes3D(fig)
ax.contourf(X,Y,Z)
ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_zlabel('Z')
p.show()
This piece of code actually worked fine with the vectors x, y and z that are commented out with a hash sign in the code above.
But now that I am trying it with the data given above, I get the "Inputs x and y must be 1D or 2D" error in matplotlib.
I have read that this could be related to the fact that Z does not have the same shape as X or Y...but I am not sure how to deal with this problem.
By the way, as you probably realized, I am a super newbie in Python and I apologize if the code appears very ugly to some of you.
In any case, any help will be very much welcome.
Thanks !
Fabien
Using scipy.interpolate.griddata:
import io
import sys
import numpy as np
import matplotlib.pyplot as plt
import mpl_toolkits.mplot3d.axes3d as axes3d
import scipy.interpolate as interpolate
content = '''0.025876 139 0
0.030881 140 0
0.030982 141 0
0.035602 142 0
0.035521 143 0
0.038479 144 0
0.040668 145 0
0.040121 146 0
0.037953 147 0
0.039027 148 0
0.038338 149 0
0.047557 139 1
0.045105 140 1
0.044943 141 1
0.042370 142 1
0.042025 143 1
0.038946 144 1
0.037953 145 1
0.033373 146 1
0.030070 147 1
0.029118 148 1
0.025552 149 1'''
data = np.genfromtxt(io.StringIO(content), dtype=None, names='x, y, z')
# Or, to read from a file:
# data = np.genfromtxt(filename, dtype=None, names='x, y, z')
x, y, z = data['x'], data['y'], data['z']
N = 20
xi = np.linspace(x.min(), x.max(), N)
yi = np.linspace(y.min(), y.max(), N)
X, Y = np.meshgrid(xi, yi)
Z = interpolate.griddata((x, y), z, (X, Y), method='nearest')
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1, projection='3d')
ax.scatter(data['x'], data['y'], data['z'])
ax.plot_wireframe(X, Y, Z, rstride=1, cstride=1)
# ax.plot_surface(X, Y, Z)
plt.show()
yields
Relevant links:
scipy.interpolate.griddata
np.genfromtxt
Axes3D.plot_wireframe
Axes3D.plot_surface
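
On the shape question raised in the post: meshgrid(arange(0, 89, 1), arange(0, 300, 1)) returns arrays of shape (300, 89), so len(X) and len(Y) are both 300 and Z becomes 300 x 300, which does not match X and Y. Because the sample points already lie on a regular grid, interpolation is not strictly required; the columns can be sorted and reshaped so that X, Y and Z share one shape. The sketch below follows the question's naming (first column is the height, third column is x), which is an assumption about what the columns mean.

```python
import numpy as np

# Columns of the sample file: value, y (139..149), x (0 or 1)
values = np.array([
    0.025876, 0.030881, 0.030982, 0.035602, 0.035521, 0.038479,
    0.040668, 0.040121, 0.037953, 0.039027, 0.038338,
    0.047557, 0.045105, 0.044943, 0.042370, 0.042025, 0.038946,
    0.037953, 0.033373, 0.030070, 0.029118, 0.025552,
])
y = np.tile(np.arange(139, 150), 2)
x = np.repeat([0, 1], 11)

# Sort by x first, then by y, so the rows can be reshaped into a grid
order = np.lexsort((y, x))
nx, ny = len(np.unique(x)), len(np.unique(y))

X = x[order].reshape(nx, ny)
Y = y[order].reshape(nx, ny)
Z = values[order].reshape(nx, ny)
```

All three arrays now have shape (2, 11) and can be passed together to ax.plot_surface(X, Y, Z) or ax.plot_wireframe(X, Y, Z). This only works when every (x, y) combination occurs exactly once; for scattered points, griddata as used above is the more general route.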
