Accurate line ends with matplotlib - python

The picture shows some graphs plotted on top of each other.
The thin ones have 4 data points using the style:
plot(xdata,ydata,'|-',lw=1.5,markersize=10)
and the thick, shorter ones spanning only a subset of the data points using:
plot(xdata[-2:-1],ydata[-2:-1],'-',lw=4.5)
The thick lines are however overshooting at their ends. How can I make them end right at the data points and coincide with the markers?

I think this is because the default cap style on lines is "projecting", while you need it to be "butt". If so, something like this should help:
overlapped = plot(xdata[-2:-1],ydata[-2:-1],'-',lw=4.5)
for item in overlapped:
    item.set_solid_capstyle('butt')
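The cap style can also be passed as a keyword when plotting, which avoids the loop. A minimal sketch, assuming made-up data (the values of xdata and ydata are invented here) and an off-screen backend; the slice [-2:] is my choice so that the overlay has two points:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen, no display needed
import matplotlib.pyplot as plt

# Hypothetical data standing in for the question's xdata / ydata
xdata = [0, 1, 2, 3]
ydata = [1.0, 1.5, 1.2, 1.8]

fig, ax = plt.subplots()
# Thin line through all points, with '|' markers
ax.plot(xdata, ydata, "|-", lw=1.5, markersize=10)

# Thick overlay on the last two points; 'butt' ends the stroke at the data points
(thick,) = ax.plot(xdata[-2:], ydata[-2:], "-", lw=4.5, solid_capstyle="butt")

fig.canvas.draw()
```

If the thick line were dashed rather than solid, the matching keyword would be dash_capstyle.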

Related

matplotlib arrow is creating a weird vertical line at the arrow head

I am trying to place an axis arrow.
For some reason, when I place an arrow at my plot it also creates a huge vertical line orders of magnitude bigger.
I am instantiating the arrow like this:
#examples of what would be found within x_length, set, y_length, and ax on the anomalous case
x_length=[30000000000.0]
y_length=[[7.7e-09, 1.613e-08]]
set=0
ax=plt.subplot(1,2,1)
#The problematic statement by itself
arrow=ax.arrow(x_length[set], 0, 0.04*x_length[set], 0, shape='full',head_width=max(y_length[set])*0.04,head_length=0.04*x_length[set],length_includes_head=True,color='black', zorder=2)
It works properly when the y values are large (say, "y_values>1"). When the y values are small (say, "y_values<1e-6"), however, the problem emerges.
The figures below show one case that behaves as expected and another with the anomalous behavior:
Based on this figure, I think the line is always drawn but is only noticeable when the y values are small
With large values it works as expected
Note: Using the zoom feature, it is possible to verify that the arrow is placed as expected, although this weird line is also placed at the arrow's head.
I have already tried modifying every single parameter, and also using constant values instead of variables, but nothing worked. Moreover, even if an inclined arrow is placed, the unwanted line is always vertical.
I solved the problem.
This weird line was the arrow's tail, whose width I had left at the default. Passing the width argument to the arrow method, scaled to the data, solved the problem. Since the default width is 1e-3, the issue only appeared for plots whose y range is orders of magnitude smaller than this default.
ax.arrow(x_max[set], 0, 0.04*x_length[set], 0,shape='full',head_width=y_length[set][i]*0.04,head_length=0.04*x_length[set],length_includes_head=True,color='black', zorder=2,width=max(y_length[set])*0.04)
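To see the effect in isolation, here is a small sketch using the numbers from the anomalous case. The axis limits and the factor 0.01 for the tail width are my own assumptions, not values from the original code; the point is only that the tail width is given in data units and should be scaled like head_width.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen
import matplotlib.pyplot as plt

# Values taken from the question's anomalous case
x_start = 3.0e10
y_max = 1.613e-08

fig, ax = plt.subplots()
ax.set_xlim(0, 1.1 * x_start)
ax.set_ylim(-y_max, y_max)

head_width = 0.04 * y_max
arrow = ax.arrow(
    x_start, 0, 0.04 * x_start, 0,
    shape="full",
    width=0.01 * y_max,          # tail width in data units (assumed factor)
    head_width=head_width,
    head_length=0.04 * x_start,
    length_includes_head=True,
    color="black", zorder=2,
)

# Vertical extent of the arrow polygon, in data units
ys = arrow.get_xy()[:, 1]
extent = ys.max() - ys.min()
```

With the width scaled this way, the tallest part of the arrow is its head; without it, the tail alone would span 1e-3 in y, far more than the whole axis here.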

Grouped bar chart of multiindex

first of all: I'm completely new to python.
I'm trying to visualize some measured data. Each entry has a quadrant, number and sector. The original data lies in a .xlsx file. I've managed to use a .pivot_table to sort the data according to its sector. Due to overlapping, number and quadrant also have to be indexed. Now I want to plot it as a bar chart, where the bars are grouped by sector and the colors represent the quadrant.
But because number also has to be indexed, it shows up in the bar chart as a separate group. There should only be three groups, 0, i and a.
MWE:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
d = {'quadrant': ["0","0","0","0","0","0","I","I","I","I","I","I","I","I","I","I","I","I","II","II","II","II","II","II","II","II","II","II","II","II","III","III","III","III","III","III","III","III","III","III","III","III","IV","IV","IV","IV","IV","IV","IV","IV","IV","IV","IV","IV"], 'sector': [0,"0","0","0","0","0","a","a","a","a","a","a","i","i","i","i","i","i","a","a","a","a","a","a","i","i","i","i","i","i","a","a","a","a","a","a","i","i","i","i","i","i","a","a","a","a","a","a","i","i","i","i","i","i"], 'number': [1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6], 'Rz_m': [67.90,44.17,44.30,63.43,49.87,39.33,61.17,69.37,66.20,44.20,64.77,39.93,44.33,50.97,55.90,51.33,58.23,44.53,50.03,47.40,58.67,71.57,57.60,70.77,63.93,47.37,46.90,34.73,41.27,48.23,58.30,47.07,50.53,51.20,32.67,50.37,37.50,55.50,41.20,48.07,56.80,49.77,40.87,44.43,44.00,60.03,63.73,72.80,51.60,45.53,60.27,71.00,59.63,48.70]}
df = pd.DataFrame(data=d)
B = df.pivot_table(index=['sector','number', 'quadrant'])
B.unstack().plot.bar(y='Rz_m')
The data viz ecosystem in Python is pretty diverse, and there are multiple libraries you can use to produce the same chart. Matplotlib is very powerful but also quite low-level, so you often have to do a lot of preparatory work before getting to the chart. For static visualisations, especially ones with a scientific element, people often use seaborn, which has built-in support for things like error bars.
Seaborn is built on top of matplotlib and, out of the box, offers many chart types that support exploratory data analysis. For your example, if I understood it right, it would be as simple as:
import seaborn as sns
sns.catplot(x="sector", y="Rz_m", hue="quadrant", data=df, ci=None,
            height=6, kind="bar", palette="muted")
And the output would look like this:
Note that in your example, you missed the quotes around one of the zeroes, so 0 and "0" are plotted as separate columns. If you're using seaborn, you don't need to pivot the data; just feed it the df as you've defined it.
For interactive visualisations (with tooltips, zoom, pan, etc.), you can also check out bokeh.
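If you would rather stay with pandas, one possible route (not the seaborn approach above) is to average over number with groupby and then unstack only quadrant, so that sector is the sole index level left. A sketch on a shortened copy of the data, with the stray 0 written as "0"; the name means is mine:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen
import matplotlib.pyplot as plt
import pandas as pd

# Shortened sample of the question's data (first two rows of each group)
df = pd.DataFrame({
    "quadrant": ["0", "0", "I", "I", "I", "I", "II", "II", "II", "II"],
    "sector":   ["0", "0", "a", "a", "i", "i", "a", "a", "i", "i"],
    "number":   [1, 2, 1, 2, 1, 2, 1, 2, 1, 2],
    "Rz_m":     [67.90, 44.17, 61.17, 69.37, 44.33, 50.97,
                 50.03, 47.40, 63.93, 47.37],
})

# Mean over 'number' for each (sector, quadrant) pair, quadrants as columns
means = df.groupby(["sector", "quadrant"])["Rz_m"].mean().unstack("quadrant")

# One group of bars per sector, one colour per quadrant
ax = means.plot.bar(rot=0)
ax.set_ylabel("mean Rz_m")
```

Combinations that do not occur in the data (for example sector "0" with quadrant "I") come out as NaN and are simply not drawn.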
There is an interesting wrinkle to this: how to center the nested bars on the label. By default the bars are drawn with center alignment, which works fine for an odd number of columns. For an even number, however, you'd want them to be centered on the right edge. You can make a small alteration in the seaborn source file categorical.py, starting at line 1642, like so:
# Draw the bars
offpos = barpos + self.hue_offsets[j]
barfunc(offpos, self.statistic[:, j], -self.nested_width,
        color=self.colors[j], align="edge",
        label=hue_level, **kws)
You can then save the .png and change the source back, but this is not ideal. It is probably worth flagging to the library maintainers.

Venn3: How to reposition circles and labels?

I have made a three way venn diagram. I have three issues with it that I can't seem to solve.
What is the code to move the circle labels (i.e."Set1","Set2","Set3") because right now one is too far away from the circle.
What is the code to make the circles be three equal sizes/change the circle size?
What is the code to move the circles around the plot. Right now, set2 is within set3 (but coloured differently), I would like the diagram to look more like the "standard" way of showing a venn diagram (i.e. 3 separate circles with some overlap in the middle).
On another note, I found it difficult to find out what commands such as "set_x" and "set_alpha" should be. If anyone knows of a manual that would answer my questions above, I would appreciate it; I couldn't find one place with all the information I needed.
import sys
import numpy
import scipy
from matplotlib_venn import venn3,venn3_circles
from matplotlib import pyplot as plt
#Build three lists to make 3 way venn diagram with
list_line = lambda x: set([line.strip() for line in open(sys.argv[x])])
set1,set2,set3 = list_line(1),list_line(2),list_line(3)
#Make venn diagram
vd = venn3([set1,set2,set3],set_labels=("Set1","Set2","Set3"))
#Colours: get the HTML codes from the net
vd.get_patch_by_id("100").set_color("#FF8000")
vd.get_patch_by_id("001").set_color("#5858FA")
vd.get_patch_by_id("011").set_color("#01DF3A")
#Move the numbers in the circles
vd.get_label_by_id("100").set_x(-0.55)
vd.get_label_by_id("011").set_x(0.1)
#Strength of color, 2.0 is very strong.
vd.get_patch_by_id("100").set_alpha(0.8)
vd.get_patch_by_id("001").set_alpha(0.6)
vd.get_patch_by_id("011").set_alpha(0.8)
plt.title("Venn Diagram",fontsize=14)
plt.savefig("output",format="pdf")
What is the code to move the circle labels (i.e."Set1","Set2","Set3") because right now one is too far away from the circle.
Something like this:
lbl = vd.get_label_by_id("A")
x, y = lbl.get_position()
lbl.set_position((x+0.1, y-0.2)) # Or whatever
The "A", "B", and "C" are predefined identifiers, denoting the three sets.
What is the code to make the circles be three equal sizes/change the circle size?
If you do not want the circle/region sizes to correspond to your data (not necessarily a good idea), you can get an unweighted ("classical") Venn diagram using the function venn3_unweighted:
from matplotlib_venn import venn3_unweighted
venn3_unweighted(...same parameters you used in venn3...)
You can further cheat and tune the result by providing a subset_areas parameter to venn3_unweighted - this is a seven-element vector specifying the desired relative size of each region. In this case the diagram will be drawn as if the region areas were subset_areas, yet the numbers will be shown from the actual subsets. Try, for example:
venn3_unweighted(...., subset_areas=(10,1,1,1,1,1,1))
What is the code to move the circles around the plot.
The need to "move the circles around" is somewhat unusual. Normally you would either want the circles to be positioned so that their intersection sizes correspond to your data, or use the "default" positioning. The functions venn3 and venn3_unweighted cater to those two requirements. Moving circles around arbitrarily is possible, but it would require some lower-level coding and I'd advise against it.
I found it difficult to find what the commands such as "set_x", "set_alpha" should be
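The object you get when you call vd.get_label_by_id is a Matplotlib Text object; its methods and properties are listed in the Matplotlib documentation for matplotlib.text.Text. The object returned by vd.get_patch_by_id is a PathPatch; see the documentation for matplotlib.patches.PathPatch and its base class matplotlib.patches.Patch for reference.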
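Because these are ordinary Matplotlib artists, the usual getters and setters apply. The sketch below does not use matplotlib_venn at all: it creates a plain Text and a plain Circle patch as stand-ins (my assumption, purely for illustration) and calls the same kind of methods you would call on the venn labels and patches.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen
import matplotlib.pyplot as plt
from matplotlib.patches import Circle

fig, ax = plt.subplots()

# Stand-ins for what get_label_by_id(...) and get_patch_by_id(...) return
label = ax.text(0.5, 0.5, "Set1")
patch = ax.add_patch(Circle((0.5, 0.5), 0.3))

# Text methods: read the position, shift it, change the font size
x, y = label.get_position()
label.set_position((x + 0.1, y - 0.2))
label.set_fontsize(14)

# Patch methods: colour and transparency
patch.set_color("#FF8000")
patch.set_alpha(0.8)

new_x, new_y = label.get_position()
```

Calling dir(label) or help(label) in an interactive session lists every set_* method available on the object.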

Python boxplot fails at automatic plot boundaries/limits

I am manually putting a bunch of boxplots in a plot.
The code I am using is this (I am computing mean_, iqr, CL, etc. elsewhere):
A = np.random.random(2)
D = plt.boxplot(A, positions=np.atleast_1d(dist_val), widths=np.min(unique_dists_vals) / 10.) # a simple case with just one variable to boxplot
D['medians'][0].set_ydata(median_)
D['boxes'][0]._xy[[0,1,4], 1] = iqr[0]
D['boxes'][0]._xy[[2,3],1] = iqr[1]
D['whiskers'][0].set_ydata(np.array([iqr[0], CL[0]]))
D['whiskers'][1].set_ydata(np.array([iqr[1], CL[1]]))
D['caps'][0].set_ydata(np.array([CL[0], CL[0]]))
D['caps'][1].set_ydata(np.array([CL[1], CL[1]]))
I do this in a loop, putting one box plot per some location x.
I am not making any changes to the axis limits. The resulting figure looks like this:
What is going on with the single x-tick?
The limits are off on both x and y.
This appears to be a bug?
And no, I cannot just manually set the limits etc. since this has to be a completely general code.
What I have tried so far is:
During the loop when I compute the box plots, try keeping track of the largest y value seen so far and the largest x value etc. and then at the end manually set the bound to this. Other issues come up here, however, such as boxes extending beyond the plot etc. and then I manually have to adjust the limits to extend beyond the box width etc.
I have tried both "ax.axis('auto')" and "ax.set_autoscale_on(True)" after plotting, right before plt.show(); neither works.
While the first item in the list above does technically work (not ideal) I would like to know if there is a generic way to simply say: "done plotting, fix limits" (should automatically be done while plotting I guess?).
Thank you.
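One direction that might be worth trying, offered as a sketch rather than a confirmed fix: Matplotlib can draw boxes from precomputed statistics with Axes.bxp, so the artists do not have to be edited after the fact, and manage_ticks=False (available in Matplotlib 3.1 and later, as far as I know) leaves ticks and limits to the normal autoscaling. All numbers below are invented; the dictionary keys are the ones bxp expects.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen
import matplotlib.pyplot as plt

# Hypothetical precomputed statistics, one dict per box
stats = [
    {"med": 3.0, "q1": 2.0, "q3": 4.0, "whislo": 1.0, "whishi": 5.0, "fliers": []},
    {"med": 6.0, "q1": 5.0, "q3": 7.5, "whislo": 4.0, "whishi": 9.0, "fliers": []},
    {"med": 2.0, "q1": 1.5, "q3": 2.5, "whislo": 0.5, "whishi": 3.0, "fliers": []},
]
positions = [1.0, 2.5, 7.5]  # x locations of the boxes (invented)

fig, ax = plt.subplots()
# Draw all boxes in one call; do not let bxp pin the ticks or x-limits
artists = ax.bxp(stats, positions=positions, widths=0.3,
                 showfliers=False, manage_ticks=False)

xlim = ax.get_xlim()
ylim = ax.get_ylim()
```

Here q1/q3 would correspond to iqr, whislo/whishi to CL, and med to median_ in the code above.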

Matplotlib adding overlay labels to an axis

In matplotlib I wish to know the cleanest and most robust means of overlaying labels onto an axis. This is probably best demonstrated with an example:
Normal axis labels/ticks are placed every 5.00 units, while additional labels without ticks have been overlaid onto the axis (this can be seen at 1113.75, which partially covers 1114.00, and at 1105.00, which is covered entirely). The overlaid labels have the same font and size as their normal, ticked counterparts, with the background (if any) going right up to the axis (as a tick mark would).
What is the simplest way of obtaining this effect in matplotlib?
Edit
Following on from @Ken's suggestion I have managed to obtain the effect for an existing tick/label by using ax.yaxis.get_ticklines and ax.yaxis.get_ticklabels to both remove the tick marker and change the background/font/zorder of a label. However, I am unsure how best to add a new tick/label to an axis.
In other words I am looking for a function add_tick(ax.yaxis, loc) that adds a tick at location loc and returns the tickline and ticklabel objects for me to operate on.
I haven't ever tried to do that, but I think that the Artist tutorial might be helpful for you. In particular, the last section has the following code:
for line in ax1.yaxis.get_ticklines():
    # line is a Line2D instance
    line.set_color('green')
    line.set_markersize(25)
    line.set_markeredgewidth(3)
I think that using something like line.set_markersize(0) might make the markers have size zero. The difficult part might be finding the ones that need that done. It is possible that the arrays returned by line.get_xdata() or line.get_ydata() contain enough information to isolate the ones you need. Of course, if you are adding the tick marks manually, the instance may be returned as you create it, so you can modify each one right then.
The best solution I have been able to devise:
import numpy as np

# main: axis; olocs: locations list; ocols: location colours
def overlay_labels(main, olocs, ocols):
    # Append the overlay labels as ticks
    main.yaxis.set_ticks(np.append(main.yaxis.get_ticklocs(), olocs))
    # Perform generic formatting to /all/ ticks
    # [...]
    # Iterate from the end, where the overlay ticks were appended
    labels = reversed(main.yaxis.get_ticklabels())
    markers = reversed(main.yaxis.get_ticklines()[1::2]) # RHS ticks only
    glines = reversed(main.yaxis.get_gridlines())
    rocols = reversed(ocols)
    # Suitably format each overlay tick (colours and lines)
    # zip (izip in Python 2) stops after len(ocols) items
    for label, marker, grid, colour in zip(labels, markers, glines, rocols):
        label.set_color('white')
        label.set_backgroundcolor(colour)
        marker.set_visible(False)
        grid.set_visible(False)
It is not particularly elegant but does appear to work.
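A different route, sketched here as an alternative and not what the function above does, is to put the overlay labels on the minor ticks, so the regular major ticks are left alone. The locations and colours are invented. Note that, by default, a minor tick that coincides exactly with a major tick may be dropped, so this sketch uses locations that do not coincide.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
ax.set_ylim(1100, 1120)
ax.set_yticks(np.arange(1100, 1121, 5))  # regular ticked labels every 5 units

# Hypothetical overlay locations and background colours
olocs = [1106.50, 1113.75]
ocols = ["tab:red", "tab:blue"]

# Overlay labels live on the minor ticks
ax.set_yticks(olocs, minor=True)
labels = ax.set_yticklabels([f"{v:.2f}" for v in olocs], minor=True)
ax.tick_params(axis="y", which="minor", length=0)  # no tick mark for overlays

# Style each overlay label: white text on a coloured background, drawn on top
for label, colour in zip(labels, ocols):
    label.set_color("white")
    label.set_backgroundcolor(colour)
    label.set_zorder(10)
```

set_yticklabels returns the Text objects, which is close to the add_tick(ax.yaxis, loc) helper asked for in the edit above.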
