Python boxplot fails at automatic plot boundaries/limits - python

I am manually putting a bunch of boxplots in a plot.
The code I am using is this (I am computing mean_, iqr, CL, etc. elsewhere):
A = np.random.random(2)
D = plt.boxplot(A, positions=np.atleast_1d(dist_val), widths=np.min(unique_dists_vals) / 10.) # a simple case with just one variable to boxplot
D['medians'][0].set_ydata(median_)
D['boxes'][0]._xy[[0,1,4], 1] = iqr[0]
D['boxes'][0]._xy[[2,3],1] = iqr[1]
D['whiskers'][0].set_ydata(np.array([iqr[0], CL[0]]))
D['whiskers'][1].set_ydata(np.array([iqr[1], CL[1]]))
D['caps'][0].set_ydata(np.array([CL[0], CL[0]]))
D['caps'][1].set_ydata(np.array([CL[1], CL[1]]))
I do this in a loop, putting one box plot per some location x.
I am not making any changes to the axis limits. The resulting figure looks like this:
What is going on with the single x-tick?
The limits are off on both x and y.
This appears to be a bug?
And no, I cannot just manually set the limits etc. since this has to be a completely general code.
What I have tried so far is:
1. During the loop in which I compute the box plots, keeping track of the largest x and y values seen so far and then manually setting the bounds to these at the end. Other issues come up here, however, such as boxes extending beyond the plot, so I then have to manually adjust the limits to extend beyond the box width, etc.
2. Calling both ax.axis('auto') and ax.set_autoscale_on(True) after plotting, right before plt.show(). Neither works.
While the first item in the list above does technically work (not ideal) I would like to know if there is a generic way to simply say: "done plotting, fix limits" (should automatically be done while plotting I guess?).
Thank you.
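One possible way to get a generic "done plotting, fix limits" step is sketched below. It assumes a Matplotlib version (3.1 or later) in which boxplot accepts manage_ticks=False, so that each call leaves the ticks and limits alone; the axes are then asked to recompute the data limits from the modified artists with ax.relim() and ax.autoscale_view(). The positions, widths, and the median value 5.0 are made up for illustration and are not the original poster's data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
positions = [0.5, 2.0, 7.5]  # hypothetical x locations
box_width = 0.3              # hypothetical box width

fig, ax = plt.subplots()
for x in positions:
    # manage_ticks=False (assumed available, Matplotlib >= 3.1) keeps
    # boxplot from resetting the x ticks and x limits on every call.
    parts = ax.boxplot(rng.random(20), positions=[x], widths=box_width,
                       manage_ticks=False)
    # Move an artist by hand, as in the question.
    parts["medians"][0].set_ydata([5.0, 5.0])

# "Done plotting, fix limits": recompute data limits from the artists
# as they are now, then rescale the view to those limits.
ax.relim()
ax.autoscale_view()
ax.set_xticks(positions)  # one tick per box
```

Because relim() looks at the current state of each line, the manually moved median at y = 5.0 is included in the final limits.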

Related

matplotlib arrow is creating a weird vertical line at the arrow head

I am trying to place an axis arrow.
For some reason, when I place an arrow at my plot it also creates a huge vertical line orders of magnitude bigger.
I am instantiating the arrow like this:
#examples of what would be found within x_length, set, y_length, and ax on the anomalous case
x_length=[30000000000.0]
y_length=[[7.7e-09, 1.613e-08]]
set=0
ax=plt.subplot(1,2,1)
#The problematic statement by itself
arrow=ax.arrow(x_length[set], 0, 0.04*x_length[set], 0, shape='full',head_width=max(y_length[set])*0.04,head_length=0.04*x_length[set],length_includes_head=True,color='black', zorder=2)
It works properly when the y values are large (say, y_values > 1). When the y values are small (say, y_values < 1e-6), however, this problem emerges.
The figures below show one case with the expected behavior and another with the anomalous behavior:
Based on this figure, I think the line is always drawn but is only noticeable when the y values are small.
With large values it works as expected.
Note: Using the zoom feature, it is possible to verify that the arrow is placed as expected, although this weird line is also placed at the arrow's head.
I have already tried modifying every single parameter and using constant values instead of variables, but nothing worked. Moreover, even if an inclined arrow is placed, the unwanted line is always vertical.
I solved the problem.
This weird line was the arrow tail, drawn with its default width. Passing the width argument to the arrow method solved the problem. Since the default width is 1e-3, the issue only appeared for plots whose y scale is orders of magnitude smaller than this default width.
ax.arrow(x_max[set], 0, 0.04*x_length[set], 0,shape='full',head_width=y_length[set][i]*0.04,head_length=0.04*x_length[set],length_includes_head=True,color='black', zorder=2,width=max(y_length[set])*0.04)
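The effect of the default width can be checked numerically. The sketch below reuses the magnitudes from the question (the variable names are hypothetical) and compares the vertical half-extent of the arrow polygon with and without an explicit width.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x_len = 3e10                 # x scale from the question
y_max = 1.613e-08            # largest y value from the question
head_w = 0.04 * y_max        # head width scaled to the y data

fig, ax = plt.subplots()
common = dict(shape="full", head_width=head_w, head_length=0.04 * x_len,
              length_includes_head=True, color="black", zorder=2)

# Tail width left at the default (0.001), far larger than the y data.
default_arrow = ax.arrow(x_len, 0, 0.04 * x_len, 0, **common)
# Tail width scaled to the y data, as in the fix above.
scaled_arrow = ax.arrow(x_len, 0, 0.04 * x_len, 0, width=head_w, **common)

def half_height(arrow):
    """Largest |y| among the vertices of the arrow polygon."""
    return np.abs(arrow.get_xy()[:, 1]).max()

print("default width:", half_height(default_arrow))
print("scaled width: ", half_height(scaled_arrow))
```

With the default width the tail reaches many orders of magnitude beyond the data in y, which is what shows up as the vertical line.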

How to start graph lines at 0 in the Y axis with Bokeh (Python)

I'm using Bokeh for showing line graphs at my Django/Python web.
By default, the graphs start at the minimum value provided, but I want them to start always at 0 in the Y-axis.
For example, in the following example it starts at 167 in the Y-axis (the minimum value in that data set), but I wanted to start at 0.
y_range seems to work fine if I want to define a minimum and a maximum, but I only want to define the minimum (0) and let the data "decide" the maximum.
I've tried using y_range=(0, None), min_border=0, start=0 and a bunch of other things, without success. ChatGPT keeps recommending me alternatives that don't really work or even exist.
This is my current code:
y = WHATEVER
x = WHATEVER
plot = figure(title='TITLE',
              x_axis_type='datetime',
              sizing_mode="stretch_width",
              max_width=600,
              height=400)
plot.line(x, y, line_width=4)
script, div = components(plot)
Regarding "ChatGPT keeps recommending me alternatives that don't really work or even exist": this should not be surprising, since ChatGPT is not a serious or reliable source of accurate information.
In any case, the only thing you need to do is set:
plot.y_range.start = 0
with the default range (i.e. don't pass a range value to figure). That will keep auto-ranging for the upper y-axis but pin the start to 0.
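Put together with the question's code, this might look like the minimal sketch below. The dates and values are invented for illustration, and the height argument assumes a Bokeh version that accepts it, as the question's own code does.

```python
from datetime import datetime, timedelta

from bokeh.models import DataRange1d
from bokeh.plotting import figure

# Hypothetical data whose minimum is 167, as in the question.
x = [datetime(2024, 1, 1) + timedelta(days=i) for i in range(5)]
y = [167, 180, 175, 210, 198]

# No y_range is passed, so the figure keeps its default auto-ranging range.
plot = figure(title="TITLE", x_axis_type="datetime", height=400)
plot.line(x, y, line_width=4)

# Pin only the lower bound; the upper bound is still computed from the data.
plot.y_range.start = 0
```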

Why am I unable to make a plot containing subplots in plotly using a px.scatter plot?

I have been trying to make a figure using plotly that combines multiple figures together. In order to do this, I have been trying to use the make_subplots function, but I have found it very difficult to have the plots added in such a way that they are properly formatted. I can currently make singular plots (as seen directly below):
However, whenever I try to combine these singular plots using make_subplots, I end up with this:
This figure has the subplots set up completely wrong, since I need each of the four subplots to contain data pertaining to the four methods (A, B, C, and D). In other words, I would like to have four subplots that look like my singular plot example above.
I have set up the code in the following way:
for sequence in sequences:
    #process for making sequence profile is done here
    sequence_df = pd.DataFrame(sequence_profile)
    row_number=1
    grand_figure = make_subplots(rows=4, cols=1)
    #there are four groups per sequence, so the grand figure should have four subplots in total
    for group in sequence_df["group"].unique():
        figure_df_group = sequence_df[(sequence_df["group"]==group)]
        figure_df_group.sort_values("sample", ascending=True, inplace=True)
        figure = px.line(figure_df_group, x = figure_df_group["sample"], y = figure_df_group["intensity"], color= figure_df_group["method"])
        figure.update_xaxes(title= "sample")
        figure.update_traces(mode='markers+lines')
        #note: the next line fails, since data must be extracted from the figure, hence why it is commented out
        #grand_figure.append_trace(figure, row = row_number, col=1)
        figure.update_layout(title_text="{} Profile Plot".format(sequence))
        grand_figure.append_trace(figure.data[0], row = row_number, col=1)
        row_number+=1
        figure.write_image(os.path.join(output_directory+"{}_profile_plot_subplots_in_{}.jpg".format(sequence, group)))
    grand_figure.write_image(os.path.join(output_directory+"grand_figure_{}_profile_plot_subplots.jpg".format(sequence)))
I have tried following directions (like for example, here: ValueError: Invalid element(s) received for the 'data' property) but I was unable to get my figures added as is as subplots. At first it seemed like I needed to use the graph object (go) module in plotly (https://plotly.com/python/subplots/), but I would really like to keep the formatting/design of my current singular plot. I just want the plots to be conglomerated in groups of four. However, when I try to add the subplots like I currently do, I need to use the data property of the figure, which causes the design of my scatter plot to be completely messed up. Any help for how I can ameliorate this problem would be great.
Ok, so I found a solution here. Rather than using the make_subplots function, I just instead exported all the figures onto an .html file (Plotly saving multiple plots into a single html) and then converted it into an image (HTML to IMAGE using Python). This isn't exactly the approach I would have preferred to have, but it does work.
UPDATE
I have found that plotly express offers another solution: px.line accepts a facet_col parameter that sets up multiple subplots within a single plot. My code is set up like this, and differs from the code above in that the dataframe does not need to be iterated over in a for loop based on its groups:
sequence_df = pd.DataFrame(sequence_profile)
figure = px.line(sequence_df, x = sequence_df["sample"], y = sequence_df["intensity"], color= sequence_df["method"], facet_col= sequence_df["group"])
Although it still needs more formatting, my plot now looks like this, which works much better for my purposes:

Modifying bar-width and bar-position in matplotlib bar-plot (looping over containers)

I'm trying to adapt the following strategy (taken from here) to adjust the sizes of bars in matplotlib barplots
# Iterate over bars
for container in ax.containers:
    # Each bar has a Rectangle element as child
    for i,child in enumerate(container.get_children()):
        # Reset the lower left point of each bar so that bar is centered
        child.set_y(child.get_y()- 0.125 + 0.5-hs[i]/2)
        # Attribute height to each Rectangle according to country's size
        plt.setp(child, height=hs[i])
but have encountered a strange behaviour when using this on a plot based on a two-columns DataFrame. The relevant part of the code is almost identical:
for container in axes.containers:
    for size, child in zip(sizes, container.get_children()):
        child.set_x(child.get_x()- 0.50 + 0.5-size/2)
        plt.setp(child, width=size)
The effect I get is that the width of the bars (I'm using a bar chart, not an hbar) is changed as intended, but the re-centering is only applied to the bars that correspond to the second column of the DataFrame (I've swapped them to check), which corresponds to the lighter blue in the figure below.
I don't quite see how this could happen, since both changes seem to be applied as part of the same loop. I also find it difficult to trouble-shoot since in my case the outer-loop goes through two containers, and the inner-loop goes through as many children as there are bars (and this for each container).
How could I start troubleshooting this? And how could I find out what I'm actually looping through? (I know each child is a rectangle-object, but this doesn't yet tell me the difference between the rectangles in both containers)
Apparently the following approach works better when modifying vertical bar-plots:
for container in axes.containers:
    for i, child in enumerate(container.get_children()):
        child.set_x(df.index[i] - sizes[i]/2)
        plt.setp(child, width=sizes[i])
So the main difference with the original approach I was adapting is that I do not get the current x_position of the container, but re-use the index of the DataFrame to set the x_position of the container at the index minus half of its new width.
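A variation that does not depend on the DataFrame index is to read each bar's current centre before resizing it and then re-centre on that value. The sketch below is an alternative to the loops above, not the same method; it uses plain ax.bar calls with invented data to imitate a two-column bar plot.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(3)
sizes = [0.1, 0.2, 0.3]  # hypothetical new widths, one per bar position

fig, ax = plt.subplots()
ax.bar(x - 0.2, [3, 5, 2], width=0.4, label="first column")
ax.bar(x + 0.2, [1, 4, 6], width=0.4, label="second column")

# Record the centres so the result can be compared afterwards.
centers_before = [[r.get_x() + r.get_width() / 2 for r in c.patches]
                  for c in ax.containers]

for container in ax.containers:
    # container.patches holds the Rectangle of each bar in this series.
    for rect, size in zip(container.patches, sizes):
        center = rect.get_x() + rect.get_width() / 2  # centre before resizing
        rect.set_width(size)
        rect.set_x(center - size / 2)                 # keep the same centre
```

Printing rect.get_x() and rect.get_width() for each container inside the loop is also a quick way to see what each container actually holds.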

Matplotlib adding overlay labels to an axis

In matplotlib I wish to know the cleanest and most robust means of overlaying labels onto an axis. This is probably best demonstrated with an example:
While normal axis labels/ticks are placed every 5.00 units, additional labels without ticks have been overlaid onto the axis (this can be seen at 1113.75, which partially covers 1114.00, and at 1105.00, which is covered entirely). The labels also have the same font and size as their normal, ticked counterparts, with the background (if any) going right up to the axis (as a tick mark would).
What is the simplest way of obtaining this effect in matplotlib?
Edit
Following on from @Ken's suggestion, I have managed to obtain the effect for an existing tick/label by using ax.yaxis.get_ticklines and ax.yaxis.get_ticklabels to both remove the tick marker and change the background/font/zorder of a label. However, I am unsure how best to add a new tick/label to an axis.
In other words I am looking for a function add_tick(ax.yaxis, loc) that adds a tick at location loc and returns the tickline and ticklabel objects for me to operate on.
I haven't ever tried to do that, but I think that the Artist tutorial might be helpful for you. In particular, the last section has the following code:
for line in ax1.yaxis.get_ticklines():
    # line is a Line2D instance
    line.set_color('green')
    line.set_markersize(25)
    line.set_markeredgewidth(3)
I think that using something like line.set_markersize(0) might make the markers have size zero. The difficult part might be finding the ones that need that done. It is possible that the line.xdata or line.ydata arrays might contain enough information to isolate the ones you need. Of course, if you are manually adding the tick marks, it is possible that as you do that the instance gets returned, so you can just modify them as you create them.
The best solution I have been able to devise:
import numpy as np
# main: axis; olocs: locations list; ocols: location colours
def overlay_labels(main, olocs, ocols):
    # Append the overlay labels as ticks
    main.yaxis.set_ticks(np.append(main.yaxis.get_ticklocs(), olocs))
    # Perform generic formatting to /all/ ticks
    # [...]
    labels = reversed(main.yaxis.get_ticklabels())
    markers = reversed(main.yaxis.get_ticklines()[1::2]) # RHS ticks only
    glines = reversed(main.yaxis.get_gridlines())
    rocols = reversed(ocols)
    # Suitably format each overlay tick (colours and lines);
    # the built-in zip replaces Python 2's itertools.izip
    for label,marker,grid,colour in zip(labels, markers, glines, rocols):
        label.set_color('white')
        label.set_backgroundcolor(colour)
        marker.set_visible(False)
        grid.set_visible(False)
It is not particularly elegant but does appear to work.
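A different technique that may be worth trying is to put the overlay labels on the minor ticks, which leaves the regular (major) ticks untouched. The sketch below is not the approach above; it assumes a Matplotlib version (3.1 or later) that exposes Axis.remove_overlapping_locs, and the data, locations, and colours are invented.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.ticker import FixedLocator, FormatStrFormatter

olocs = [1105.0, 1113.75]  # hypothetical overlay locations
ocols = ["red", "blue"]    # hypothetical background colours

fig, ax = plt.subplots()
ax.plot([0, 1], [1100, 1120])

# Place the overlay labels as minor ticks at fixed locations.
ax.yaxis.set_minor_locator(FixedLocator(olocs))
ax.yaxis.set_minor_formatter(FormatStrFormatter("%.2f"))
# Keep minor ticks that coincide with a major tick (e.g. 1105.00);
# assumed available in Matplotlib >= 3.1.
ax.yaxis.remove_overlapping_locs = False
# Hide the minor tick marks so that only the labels remain.
ax.tick_params(axis="y", which="minor", length=0)

fig.canvas.draw()  # make sure the tick labels have been created

overlay_labels = ax.yaxis.get_minorticklabels()
for label, colour in zip(overlay_labels, ocols):
    label.set_color("white")
    label.set_backgroundcolor(colour)
```

Grid lines are not touched here; if a minor grid is enabled it would need to be hidden separately.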
