python matplotlib not plotting y axis in order

I am very new to Python - I have a time series that I want to model, but I get stuck early on with simply plotting the time series. The plot seems to be ordering the y-axis in order of the numbers appearing:
plt.plot(model_data2['month'], model_data2['opening_position'], color='blue', linewidth=2)
plt.ylabel('Opening Position ($)')
plt.show()
I would greatly appreciate advice on how to correct this.

You're passing strings here, so yes, it assumes you gave them in the order you wanted them. It doesn't know how to plot the value of a string. Convert these to floats, and you will get the results you expect.
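A minimal sketch of that conversion, assuming the column was read in as text. The sample values below are made up for illustration; only the names model_data2, month and opening_position come from the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: the y values arrive as strings, as in the question.
model_data2 = pd.DataFrame({
    "month": [1, 2, 3, 4],
    "opening_position": ["1200.50", "950.00", "10300.25", "870.10"],
})

# Convert the string column to floats so matplotlib uses a numeric y-axis.
model_data2["opening_position"] = pd.to_numeric(model_data2["opening_position"])

plt.plot(model_data2["month"], model_data2["opening_position"],
         color="blue", linewidth=2)
plt.ylabel("Opening Position ($)")
```

If the real column contains currency symbols or thousands separators, those would need to be stripped before the conversion.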

Related

python matplotlib.pyplot doesn't render timeseries plot

ax.plot() doesn't render the time series plot, while pandas.plot() and plt.scatter() work. But because I need to reform my axis, I need to use ax.plot().
My data:
When I try to plot it:
ax=df_cuba['ORD'].plot()
It works perfectly:
But when I try to use :
plt.plot(x=df_cuba.index,y=df_cuba['ORD'])
It shows nothing:
It also works for scatterplot.
I couldn't find any posts about this. I suspect plt.plot is trying a different way to plot time series data.
plot does not have named x and y arguments. So suppose you have xdata and ydata you want to plot, you cannot use plt.plot(x=xdata, y=ydata), but instead need
plt.plot(xdata, ydata)
For scatter this is different, here you can use both
plt.scatter(xdata, ydata)
plt.scatter(x=xdata, y=ydata)
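As a rough sketch of the positional call, assuming a small made-up frame with a daily DatetimeIndex (only the names df_cuba and ORD come from the question), the data can be handed to ax.plot like this:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's data: ten daily values.
dates = pd.date_range("2016-01-01", periods=10, freq="D")
df_cuba = pd.DataFrame({"ORD": np.arange(10.0)}, index=dates)

fig, ax = plt.subplots()
# x and y are passed positionally, not as x=... and y=... keywords.
lines = ax.plot(df_cuba.index, df_cuba["ORD"])
ax.set_ylabel("ORD")
```

Because this goes through ax.plot, the returned Axes can be customised afterwards with the usual ax.set_* calls.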
What worked in my case: plt.plot did not handle my date data, so to plot the time series data in np.datetime64 or pandas datetime format I used the following command:
plt.plot_date(date, y)

python matplotlib - retrieve xaxis_major_locator as time value

I am plotting a time series with matplotlib (the time series looks like the following):
Part of the code that I use sets a major locator for each day at 0 AM:
fig, ax = plt.subplots(figsize=(20, 3))
mpf.candlestick_ohlc(ax,quotes, width=0.01)
ax.xaxis_date()
ax.xaxis.set_major_locator(mpl.dates.DayLocator(interval=1) )
I would like to shade the background darker for each day between 4 pm (16:00) and 8 am, and I plan to use axvspan for that. Since axvspan takes its arguments as axvspan(xmin, xmax), I was wondering whether it is possible to retrieve the major locator positions as x values, so that I can pass them on as axvspan(xmin=major_locator-3600s, xmax=major_locator+3600s).
Edit: I found that function in the docs: http://matplotlib.org/2.0.0rc2/api/ticker_api.html#matplotlib.ticker.Locator
If anyone knows how to return a list of tick locations from the x-axis major locator with it, let me know. Thanks.
Edit2: if I use print(ax.xaxis.get_major_locator()) I get back <matplotlib.dates.DayLocator object at 0x7f70f3b34090>. How do I extract a list of tick locations from that?
ok found it...
majors=ax.xaxis.get_majorticklocs()
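A possible sketch of how those tick locations could feed axvspan, assuming hourly made-up data in place of the candlestick quotes. Matplotlib date values are measured in days, so the offsets here are written as fractions of a day (8/24 for eight hours) rather than in seconds.

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

# Hypothetical hourly series covering three days.
start = dt.datetime(2017, 1, 2)
times = [start + dt.timedelta(hours=h) for h in range(72)]
values = [h % 24 for h in range(72)]

fig, ax = plt.subplots(figsize=(10, 3))
ax.plot(times, values)
ax.xaxis.set_major_locator(mdates.DayLocator(interval=1))

# Tick positions as floats (days); DayLocator places them at midnight.
majors = ax.xaxis.get_majorticklocs()

xlim = ax.get_xlim()  # remember the view so the spans cannot stretch it
for midnight in majors:
    # Shade from 16:00 the previous day to 08:00 the same day.
    ax.axvspan(midnight - 8 / 24, midnight + 8 / 24, color="0.85", zorder=0)
ax.set_xlim(xlim)
```

The view limits are saved and restored because the locator may return ticks slightly outside the visible range.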

Why do sns.lmplot and FacetGrid+plt.scatter create different scatter points from the same data?

I'm quite new to Python, pandas DataFrames and Seaborn. When I was trying to understand Seaborn better, particularly sns.lmplot, I came across a difference between two figures made of the same data, that I thought were supposed to look alike, and I wonder why that is.
Data: My data is a pandas DataFrame that has 454 rows and 19 columns. The data relevant to this question includes 4 columns and looks something like this:
Columns: Av_density; pred2; LOC; Year;
Variable type: Continuous variable; Continuous variable; Categorical variable 1...4;Categorical 2012...2014
There are no missing data points.
My aim is to draw a 2x2 figure panel describing the relationship between Av_density and pred2 separately for each LOC(=location) with years marked with different colours. I call seaborn with:
import seaborn as sns
sns.set(style="whitegrid")
np.random.seed(sum(map(ord, "linear_categorical")))
(Side point: for some reason calling "linear_quantitative" does not work, i.e. I get a "File "stdin", line 2
sns.lmplot("Av_density", "pred2", Data, col="LOC", hue="YEAR", col_wrap=2);
^
SyntaxError: invalid syntax")
Figure method 1, FacetGrid + scatter:
sur=sns.FacetGrid(Data,col="LOC", col_wrap=2,hue="YEAR")
sur.map(plt.scatter, "Av_density", "pred2" );
plt.legend()
This produces a nice scatter of the data accurately. You can see the picture here: https://drive.google.com/file/d/0B7h2wsx9mUBScEdUbGRlRk5PV1E/view?usp=sharing
Figure method 2, sns.lmplot:
sns.lmplot("Av_density", "pred2", Data, col="LOC", hue="YEAR", col_wrap=2);
This produces the figure panel divided by LOC accurately, with Years in different colours, but the scatter of the data points does not look right. Instead, it looks like lmplot has linearised the data points, and lost the original scatter points that it is supposed to be drawing in addition to the regression lines.
You can see the figure here: https://drive.google.com/file/d/0B7h2wsx9mUBSRkN5ZXhBeW9ob1E/view?usp=sharing
My data produces only three points per location per year, and I was first wondering if this is what makes the "mistake" in lmplot datapoint. Optimally I would have a shorter line describing the trend between years instead of a proper regression, but I have not figured out the code to this yet.
But before tackling that issue, I would really like to know if there is something I am doing wrong that I can fix, or if this is an issue of lmplot trying to handle my data?
Any help, comments and ideas on this are warmly welcome!
-TA-
Ps. I'm running Python 2.7.8 with Spyder 2.3.4
EDIT: I get shorter "trend lines" with the first method by adding:
sur.map(plt.plot,"Av_density", "pred2" );
Still would like to know what is messing the figure with lmplot.
The issue is probably only that the added regression line is messing up the y-axis, so that the variability in the data cannot be seen.
Try resetting the y-axis based on the variability in your original plot to see if they show the same thing, in your case e.g.
fig1 = sns.lmplot(x="Av_density", y="pred2", data=Data, col="LOC", hue="YEAR", col_wrap=2)
fig1.set(ylim=(-0.03, 0.05))
plt.show()

Why is set_xlim() not setting the x-limits in my figure?

I'm plotting some data with matplotlib. I want the plot to focus on a specific range of x-values, so I'm using set_xlim().
Roughly, my code looks like this:
fig=plt.figure()
ax=fig.add_subplot(111)
for ydata in ydatalist:
    ax.plot(x_data, ydata[0], label=ydata[1])
ax.set_xlim(left=0.0,right=1000)
plt.savefig(filename)
When I look at the plot, the x range ends up being from 0 to 12000. This occurs whether set_xlim() occurs before or after plot(). Why is set_xlim() not working in this situation?
Out of curiosity, what about switching in the old xmin and xmax?
fig=plt.figure()
ax=fig.add_subplot(111)
ax.plot(x_data,y_data)
ax.set_xlim(xmin=0.0, xmax=1000)
plt.savefig(filename)
The text of this answer was taken from an answer that was deleted almost immediately after it was posted.
set_xlim() limits the data that is displayed on the plot.
In order to change the bounds of the axis, use set_xbound().
fig=plt.figure()
ax=fig.add_subplot(111)
ax.plot(x_data,y_data)
ax.set_xbound(lower=0.0, upper=1000)
plt.savefig(filename)
In my case the following solutions alone did not work:
ax.set_xlim([0, 5.00])
ax.set_xbound(lower=0.0, upper=5.00)
However, setting the aspect using set_aspect worked, i.e:
ax.set_aspect('auto')
ax.set_xlim([0, 5.00])
ax.set_xbound(lower=0.0, upper=5.00)
I have struggled a lot with the ax.set_xlim() and couldn't get it to work properly and I found out why exactly. After setting the xlim I was setting the xticks and xticklabels (those are the vertical lines on the x-axis and their labels) and this somehow elongated the axis to the needed extent. So if the last tick was at 300 and my xlim was set at 100, it again widened the axis to the 300 just to place the tick there.
So the solution was to call set_xlim just after the troublesome code:
ax.set_xlabel(label)
ax.set_xticks(xticks)
ax.set_xticklabels(xticks, rotation=60)
ax.set_xlim(xmin=0.0, xmax=100.0)
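The ordering effect described above can be reproduced with a small sketch. The data and tick values are made up, and the limits are passed positionally because the xmin/xmax keyword names are not accepted by every matplotlib version.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot(range(300), range(300))

ax.set_xlim(0, 100)
# Setting ticks beyond the current limits may widen the view to fit them.
ax.set_xticks([0, 100, 200, 300])
widened = ax.get_xlim()

# Calling set_xlim last restores the range that was actually wanted.
ax.set_xlim(0, 100)
final = ax.get_xlim()
print(widened, final)
```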
The same thing occurred to me today. My issue was that the data was not in the right format, i.e. not floats. The limits I set (themselves floats) became meaningless when compared to e.g. strings. After putting float() around the data, everything worked as expected.

Show only the n'th ticklabel in a pandas boxplot

I am new to pandas and matplotlib, but not to Python. I have two questions; a primary and a secondary one.
Primary:
I have a pandas boxplot with FICO score on the x-axis and interest rate on the y-axis.
My x-axis is all messed up since the FICO scores are overwriting each other.
I'd like to show only every 4th or 5th ticklabel on the x-axis for a couple of reasons:
in general it's less chart-junky
in this case it will allow the labels to actually be read.
My code snippet is as follows:
plt.figure()
loansmin = pd.read_csv('../datasets/loanf.csv')
p = loansmin.boxplot('Interest.Rate','FICO.Score')
I saved the return value in p as I thought I might need to manipulate the plot further which I do now.
Secondary:
How do I access the plot, subplot and axes objects from a pandas boxplot?
p above is a matplotlib.axes.AxesSubplot object.
help(matplotlib.axes.AxesSubplot) gives a message saying:
'AttributeError: 'module' object has no attribute 'AxesSubplot'
dir(matplotlib.axes) lists Axes, Subplot and Subplotbase as in that namespace but no AxesSubplot. How do I understand this returned object better?
As I explored further I found that one could explore the returned object p via dir().
Doing this I found a long list of useful methods, amongst which was set_xticklabels.
Doing help(p.set_xticklabels) gave some cryptic, but still useful, help - essentially suggesting passing in a list of strings for ticklabels.
I then tried doing the following - adding set_xticklabels to the end of the last line in the above code effectively chaining the invocations.
plt.figure()
loansmin = pd.read_csv('../datasets/loanf.csv')
p=loansmin.boxplot('Interest.Rate','FICO.Score').set_xticklabels(['650','','','','','700'])
This gave the desired result. I suspect there's a better way as in the way matplotlib does it which allows you to show every n'th label. But for immediate use this works, and also allows setting labels where they are not periodic for whatever reason, if you need that.
As usual, writing out the question explicitly helped me find the answer. And if anyone can help me get to the underlying matplotlib object that is still an open question.
AxesSubplot (I think) is just another way to get at the Axes in matplotlib. set_xticklabels() is part of the matplotlib object oriented interface (on axes). So, if you were using something like pylab, you might use xticks(ticks, labels), but instead here you have to separate it into different calls ax.set_xticks(ticks), ax.set_xticklabels(labels). (where ax is an Axes object).
Let's say you only want to set ticks at 650 and 700. You could do the following:
ticks = labels = [650, 700]
plt.figure()
loansmin = pd.read_csv('../datasets/loanf.csv')
p=loansmin.boxplot('Interest.Rate','FICO.Score')
p.set_xticks(ticks)
p.set_xticklabels(labels)
Similarly, you can use set_xlim and set_ylim to do the equivalent of xlim() and ylim() in plt.
