Show only the n'th ticklabel in a pandas boxplot - python

I am new to pandas and matplotlib, but not to Python. I have two questions; a primary and a secondary one.
Primary:
I have a pandas boxplot with FICO score on the x-axis and interest rate on the y-axis.
My x-axis is all messed up because the FICO score labels overlap each other.
I'd like to show only every 4th or 5th ticklabel on the x-axis for a couple of reasons:
in general it's less chart-junky
in this case it will allow the labels to actually be read.
My code snippet is as follows:
import pandas as pd
import matplotlib.pyplot as plt
plt.figure()
loansmin = pd.read_csv('../datasets/loanf.csv')
p = loansmin.boxplot('Interest.Rate', 'FICO.Score')
I saved the return value in p because I thought I might need to manipulate the plot further, which I now do.
Secondary:
How do I access the plot, subplot and axes objects from a pandas boxplot?
p above is a matplotlib.axes.AxesSubplot object.
help(matplotlib.axes.AxesSubplot) fails with:
AttributeError: 'module' object has no attribute 'AxesSubplot'
dir(matplotlib.axes) lists Axes, Subplot and SubplotBase in that namespace, but no AxesSubplot. How can I understand this returned object better?

As I explored further, I found that I could inspect the returned object p with dir().
Doing this, I found a long list of useful methods, among which was set_xticklabels.
Running help(p.set_xticklabels) gave some cryptic but still useful help, essentially suggesting passing in a list of strings for the ticklabels.
I then tried the following: appending set_xticklabels to the end of the last line of the code above, effectively chaining the calls.
plt.figure()
loansmin = pd.read_csv('../datasets/loanf.csv')
p=loansmin.boxplot('Interest.Rate','FICO.Score').set_xticklabels(['650','','','','','700'])
This gave the desired result. I suspect there is a better, matplotlib-native way that lets you show every n'th label. But for immediate use this works, and it also lets you set labels that are not periodic, if you need that.
As usual, writing out the question explicitly helped me find the answer. Getting at the underlying matplotlib objects is still an open question, if anyone can help.

AxesSubplot (I think) is just another way to get at the Axes in matplotlib. set_xticklabels() is part of matplotlib's object-oriented interface (on Axes). So, if you were using something like pylab, you might call xticks(ticks, labels); here you have to split that into two calls, ax.set_xticks(ticks) and ax.set_xticklabels(labels), where ax is an Axes object.
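To see what the returned object actually is, you can inspect it directly. A minimal sketch, assuming a reasonably recent pandas and matplotlib; the tiny data frame is hypothetical and only the column names come from the question. In older matplotlib releases type(p) is an AxesSubplot class built at runtime from Axes plus SubplotBase (which is why it is not an attribute of matplotlib.axes), while recent releases may simply report Axes. Either way, isinstance(p, matplotlib.axes.Axes) holds, and the figure is one attribute away.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.axes
import pandas as pd

# Tiny hypothetical frame using the question's column names
df = pd.DataFrame({"FICO.Score": [700, 700, 720, 720],
                   "Interest.Rate": [10.0, 11.0, 8.0, 9.0]})
p = df.boxplot(column="Interest.Rate", by="FICO.Score")

# The concrete class name depends on the matplotlib version, but it is an Axes
print(type(p))
print(isinstance(p, matplotlib.axes.Axes))   # True
# Every class p inherits from, most specific first
print([cls.__name__ for cls in type(p).__mro__])
# The enclosing Figure is reachable from the Axes
fig = p.get_figure()
print(fig is p.figure, p in fig.axes)        # True True
```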
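Let's say you only want ticks under the 650 and 700 boxes. Note that boxplot draws the boxes at x = 1, 2, ..., N (one per FICO score, in sorted order) rather than at the FICO values themselves, so the ticks have to go at the box positions:
import pandas as pd
import matplotlib.pyplot as plt
labels = [650, 700]
plt.figure()
loansmin = pd.read_csv('../datasets/loanf.csv')
p = loansmin.boxplot('Interest.Rate', 'FICO.Score')
scores = sorted(loansmin['FICO.Score'].unique())
ticks = [scores.index(s) + 1 for s in labels]  # box positions start at 1
p.set_xticks(ticks)
p.set_xticklabels(labels)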
Similarly, you can use set_xlim and set_ylim to do the equivalent of xlim() and ylim() in plt.
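For the original goal of keeping every n'th label, the same set_xticks/set_xticklabels idea works if you take every n'th box position together with the matching FICO value. A sketch under stated assumptions: show_every_nth_label is a hypothetical helper, and the synthetic data frame stands in for loanf.csv (only the column names come from the question). It relies on pandas drawing the grouped boxes in sorted key order at x = 1..N.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd


def show_every_nth_label(ax, categories, n):
    """Keep only every n-th category tick on a boxplot's x-axis.

    A boxplot draws box i (0-based) at x = i + 1, so the tick positions come
    from the box index and the labels from the matching category.
    """
    positions = np.arange(1, len(categories) + 1)[::n]
    ax.set_xticks(positions)
    ax.set_xticklabels([str(c) for c in categories[::n]])
    return ax


# Hypothetical stand-in for loanf.csv, using the question's column names
rng = np.random.default_rng(0)
fico = rng.choice(np.arange(640, 800, 5), size=500)
df = pd.DataFrame({
    "FICO.Score": fico,
    "Interest.Rate": 20 - (fico - 640) / 15 + rng.normal(0, 1, size=500),
})

ax = df.boxplot(column="Interest.Rate", by="FICO.Score")
# pandas groups (and therefore draws) the boxes in sorted key order
scores = sorted(df["FICO.Score"].unique())
show_every_nth_label(ax, scores, 5)
```

Computing the labels from the data, rather than reading back the existing tick labels, keeps the sketch independent of when matplotlib fills in label text.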

Related

Why am I unable to make a plot containing subplots in plotly using a px.scatter plot?

I have been trying to make a figure using plotly that combines multiple figures together. To do this I have been trying to use the make_subplots function, but I have found it very difficult to add the plots so that they are properly formatted. I can currently make individual plots (as seen directly below):
However, whenever I try to combine these individual plots using make_subplots, I end up with this:
This figure has the subplots set up completely wrong: I need each of the four subplots to contain data for the four methods (A, B, C, and D). In other words, I would like four subplots that each look like my individual plot example above.
I have set up the code in the following way:
import os
import pandas as pd
import plotly.express as px
from plotly.subplots import make_subplots

for sequence in sequences:
    # process for making sequence profile is done here
    sequence_df = pd.DataFrame(sequence_profile)
    row_number = 1
    grand_figure = make_subplots(rows=4, cols=1)
    # there are four groups per sequence, so the grand figure should have four subplots in total
    for group in sequence_df["group"].unique():
        figure_df_group = sequence_df[sequence_df["group"] == group].sort_values("sample", ascending=True)
        figure = px.line(figure_df_group, x="sample", y="intensity", color="method")
        figure.update_xaxes(title="sample")
        figure.update_traces(mode='markers+lines')
        # note: the next line fails, since data must be extracted from the figure, hence why it is commented out
        # grand_figure.append_trace(figure, row=row_number, col=1)
        figure.update_layout(title_text="{} Profile Plot".format(sequence))
        grand_figure.append_trace(figure.data[0], row=row_number, col=1)
        row_number += 1
        figure.write_image(os.path.join(output_directory, "{}_profile_plot_subplots_in_{}.jpg".format(sequence, group)))
    grand_figure.write_image(os.path.join(output_directory, "grand_figure_{}_profile_plot_subplots.jpg".format(sequence)))
I have tried following directions (for example, here: ValueError: Invalid element(s) received for the 'data' property), but I was unable to add my figures as-is as subplots. At first it seemed that I needed to use plotly's graph_objects (go) module (https://plotly.com/python/subplots/), but I would really like to keep the formatting/design of my current individual plot; I just want the plots combined in groups of four. However, when I add the subplots as I currently do, I have to use the data property of the figure, which completely messes up the design of my scatter plot. Any help on how to fix this would be great.
Ok, so I found a solution here. Rather than using the make_subplots function, I exported all the figures into a single .html file (Plotly saving multiple plots into a single html) and then converted it into an image (HTML to IMAGE using Python). This isn't exactly the approach I would have preferred, but it does work.
UPDATE
I have found that plotly express offers another solution: px.line has facet_col and facet_row parameters that let you lay out multiple subplots within one plot. My code is set up like this; unlike the code above, the dataframe does not need to be iterated over its groups in a for loop:
import pandas as pd
import plotly.express as px
sequence_df = pd.DataFrame(sequence_profile)
figure = px.line(sequence_df, x=sequence_df["sample"], y=sequence_df["intensity"], color=sequence_df["method"], facet_col=sequence_df["group"])
Although it still needs more formatting, my plot now looks like this, which works much better for my purposes:

Change order of axes drawn in matplotlib figure

I have a figure in matplotlib where two axes overlap one another. The axes are drawn in the order in which they were created in the code, as shown when I print fig.axes, which returns [<matplotlib.axes._subplots.AxesSubplot object at 0x0000021C7417B320>, <matplotlib.axes._subplots.AxesSubplot object at 0x0000021C72371630>]. I would like to change the order so that the second axes created is drawn first. The property fig.axes is not writeable, so unfortunately I can't create a new list in the order I need and assign it. I've also tried ax.set_zorder(), specifying ax1.set_zorder(0) and ax2.set_zorder(1), but this did not work, and neither did the reverse or larger integer values. I can't find anything in the documentation that would let me change the order in which axes are drawn; does anyone know of a way?
Purpose
As you can see, the grey year labels along the x-axis are covered by the black date call-outs. The year labels are part of ax1 and the date labels are part of ax2. I'd like to switch the order so that ax1 is drawn above ax2 and the call-outs no longer cover the year text.
Thanks for any help and suggestions!
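No answer is recorded here, so the following is only a sketch under an assumption: that the two axes overlap like a twinx pair (twinx is used below just to build such a pair). A figure draws its axes sorted by zorder, so raising ax1's zorder above ax2's should put it on top. However, every axes also paints an opaque background patch, so once ax1 is on top its patch hides ax2 completely unless you make it invisible, which may be why set_zorder alone appeared not to work.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt

fig, ax1 = plt.subplots()
# Hypothetical overlapping pair: ax2 is created after ax1, so by default
# it is drawn on top of ax1
ax2 = ax1.twinx()
ax1.plot([2018, 2019, 2020], [1, 2, 3], color="grey")
ax2.plot([2018, 2019, 2020], [3, 1, 2], color="black")

# The figure draws its axes sorted by zorder, so a higher value puts ax1 on top
ax1.set_zorder(ax2.get_zorder() + 1)
# ax1's opaque background would now hide ax2 entirely; make it transparent
ax1.patch.set_visible(False)

fig.canvas.draw()
```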

Change single axis in plotly subplots

I generated a plot in plotly using plotly.express. I used a pandas.DataFrame and passed one column as facet_row to split the data into two subplots.
My data now operate on very different scales, but plotly assigns the same range to both y-axes.
I've tried to assign a 'range' attribute (a list) to the .layout.yaxis1 dictionary, but this changes the y-axis of both the upper and the lower plot.
Minimal working example:
import pandas as pd
import plotly.express as px
px.bar(pd.DataFrame({'x': [50, 60, 45, .80], 'y': [800, 900, 1, 2], "dif": ['a', 'a', 'b', 'b']}), x='x', y='y', facet_row='dif')
How can I change the first axis alone?
If you add .update_yaxes(matches=None) that will break the linkage between the Y-axis ranges.
This works because px.bar() (and any other px.*() function) returns a Figure object, which is updatable via various .update_*() methods. These methods mutate and return the figure. px.bar(facet_row="whatever") by default sets the yaxis*.matches attribute to "y", so clearing it via .update_yaxes() gets the behaviour you want.
After that you can set the range independently however you like.
More information here: https://plot.ly/python/creating-and-updating-figures/

pyplot.scatter(dataframe) vs. dataframe.plot(kind='scatter')

I have several pandas dataframes. I want to plot several columns against one another in separate scatter plots, and combine them as subplots in a figure. I want to label each subplot accordingly. I had a lot of trouble getting subplot labels to work, until I discovered that, as far as I know, there are two ways of plotting directly from dataframes; see SO and pandasdoc:
ax0 = plt.scatter(df.column0, df.column5)
type(ax0): matplotlib.collections.PathCollection
and
ax1 = df.plot(0,5,kind='scatter')
type(ax1): matplotlib.axes._subplots.AxesSubplot
ax.set_title('title') works on ax1 but not on ax0, which raises
AttributeError: 'PathCollection' object has no attribute 'set_title'
I don't understand why the two separate ways exist. What is the purpose of the first method using PathCollections? The second one was added in 0.17.0; is the first one obsolete, or does it serve a different purpose?
As you have found, the pandas function returns an Axes object. plt.scatter returns the PathCollection it drew, but you can still get the Axes that collection lives on with the "get current axes" function, plt.gca(). For instance:
plot = plt.scatter(df.column0, df.column5)
ax0 = plt.gca()
type(ax0)
<matplotlib.axes._subplots.AxesSubplot at 0x10d2cde10>
A more standard way you might see this is the following:
fig = plt.figure()
ax0 = fig.add_subplot(111)
ax0.scatter(df.column0, df.column5)
At this point you can call "set" methods such as set_title on ax0.
Hope this helps.
The difference between the two is that they come from different libraries: the first is from matplotlib, the second from pandas. Both create a matplotlib scatter plot, but the matplotlib version returns the collection of points it drew, whereas the pandas version returns the matplotlib subplot (Axes) it drew into. This makes the matplotlib version a bit more versatile, as you keep a handle on the points themselves (for example, to change their colours or sizes afterwards).
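To get labelled subplots with either method, one option is to create the Axes first and draw into them explicitly. A minimal sketch with a hypothetical two-column frame: ax.scatter returns the PathCollection, while the pandas plot call accepts ax= and returns that same Axes, so set_title is called on an Axes in both cases.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
from matplotlib.collections import PathCollection
import pandas as pd

df = pd.DataFrame({"column0": [1, 2, 3, 4], "column5": [2, 4, 1, 3]})  # hypothetical data

fig, (ax_left, ax_right) = plt.subplots(1, 2)

# matplotlib: call scatter on a specific Axes; it returns the PathCollection
points = ax_left.scatter(df["column0"], df["column5"])
ax_left.set_title("ax.scatter")

# pandas: pass ax= so the plot lands in that subplot; it returns the Axes
returned = df.plot(kind="scatter", x="column0", y="column5", ax=ax_right)
returned.set_title("df.plot(kind='scatter')")
```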

Matplotlib adding overlay labels to an axis

In matplotlib I wish to know the cleanest and most robust means of overlaying labels onto an axis. This is probably best demonstrated with an example:
While the normal axis labels/ticks are placed every 5.00 units, additional labels without ticks have been overlaid onto the axis (this can be seen at 1113.75, which partially covers 1114.00, and at 1105.00, which is covered entirely). The labels also have the same font and size as their normal, ticked counterparts, with the background (if any) going right up to the axis (as a tick mark would).
What is the simplest way of obtaining this effect in matplotlib?
Edit
Following on from @Ken's suggestion, I have managed to obtain the effect for an existing tick/label by using ax.yaxis.get_ticklines and ax.yaxis.get_ticklabels to both remove the tick marker and change the background/font/zorder of a label. However, I am unsure how best to add a new tick/label to an axis.
In other words I am looking for a function add_tick(ax.yaxis, loc) that adds a tick at location loc and returns the tickline and ticklabel objects for me to operate on.
I haven't ever tried to do that, but I think that the Artist tutorial might be helpful for you. In particular, the last section has the following code:
import matplotlib.pyplot as plt
fig, ax1 = plt.subplots()
for line in ax1.yaxis.get_ticklines():
    # line is a Line2D instance
    line.set_color('green')
    line.set_markersize(25)
    line.set_markeredgewidth(3)
I think that using something like line.set_markersize(0) would give the markers zero size. The difficult part might be finding the ones that need it. The arrays returned by line.get_xdata() or line.get_ydata() might contain enough information to isolate the ones you need. Of course, if you add the tick marks manually, the call may return the new instances, so you can modify them as you create them.
The best solution I have been able to devise:
import numpy as np

# main: axis; olocs: locations list; ocols: location colours
def overlay_labels(main, olocs, ocols):
    # Append the overlay labels as ticks
    main.yaxis.set_ticks(np.append(main.yaxis.get_ticklocs(), olocs))
    # Perform generic formatting to /all/ ticks
    # [...]
    labels = reversed(main.yaxis.get_ticklabels())
    markers = reversed(main.yaxis.get_ticklines()[1::2])  # RHS ticks only
    glines = reversed(main.yaxis.get_gridlines())
    rocols = reversed(ocols)
    # Suitably format each overlay tick (colours and lines); zip stops after
    # len(ocols) items, i.e. the overlay ticks that were appended last
    for label, marker, grid, colour in zip(labels, markers, glines, rocols):
        label.set_color('white')
        label.set_backgroundcolor(colour)
        marker.set_visible(False)
        grid.set_visible(False)
It is not particularly elegant but does appear to work.
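A different technique, not the one above: instead of adding real ticks, place ordinary text artists with a blended transform (x in axes coordinates, y in data coordinates), so each label sits against the right spine at its data value without any tick or gridline to hide. A sketch only; overlay_text_labels is a hypothetical helper, and the values and colours are illustrative. It reuses the tick-label font size from rcParams so the overlays match the normal labels.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt


def overlay_text_labels(ax, olocs, ocols, fmt="{:.2f}"):
    """Place tick-less labels at data values olocs along the right edge of ax."""
    # Blended transform: x in axes coordinates (1.0 = right spine),
    # y in data coordinates, so each label stays at its data value
    trans = ax.get_yaxis_transform(which="grid")
    texts = []
    for loc, colour in zip(olocs, ocols):
        texts.append(ax.text(
            1.0, loc, fmt.format(loc),
            transform=trans,
            ha="left", va="center",
            color="white",
            fontsize=plt.rcParams["ytick.labelsize"],  # same size as tick labels
            bbox=dict(facecolor=colour, edgecolor="none", pad=2),
            clip_on=False,  # the labels sit outside the axes area
            zorder=5,       # draw above the normal tick labels
        ))
    return texts


fig, ax = plt.subplots()
ax.plot([1100, 1105, 1110, 1115])
ax.yaxis.tick_right()  # normal ticks/labels on the right, as in the solution above
labels = overlay_text_labels(ax, [1105.0, 1113.75], ["tab:red", "tab:blue"])
fig.canvas.draw()
```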
