I have a time series with many gaps. Default plotting gives:
Obviously it's impossible to discern any detail here, and I can't just make the plot meters wide. So I need to get rid of the gaps.
I can of course just plot it without gaps as done here, but just plotting everything on a continuous axis will make the plot very unintuitive:
The only proper solution would be a broken X axis that displays the datetime information properly without the gaps, as shown here.
How do I get that? I would first have to identify the gaps and then place the broken-axis markers. But I can't believe that there is no easier way. Does anyone know of a simpler, cleaner way? I would have guessed that this is a standard pandas/matplotlib feature...
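The "identify the gaps, then break the axis" idea could be sketched roughly as follows. This is only an illustration under assumptions: the helper name `split_at_gaps`, the 7-day threshold, and the sample data are all made up, and sizing each panel by its sample count is just one possible choice.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def split_at_gaps(series, max_gap):
    """Split a datetime-indexed Series wherever two consecutive
    timestamps are more than `max_gap` apart (hypothetical helper)."""
    # True on the first row after each gap (the leading NaT compares as False)
    starts_chunk = series.index.to_series().diff() > max_gap
    # Running count of gaps seen so far = chunk number of each row
    chunk_id = starts_chunk.cumsum().to_numpy()
    return [chunk for _, chunk in series.groupby(chunk_id)]


# Made-up data: three bursts of daily samples separated by long gaps
idx = (
    pd.date_range("2021-01-01", periods=10, freq="D")
    .append(pd.date_range("2021-06-01", periods=20, freq="D"))
    .append(pd.date_range("2022-03-01", periods=15, freq="D"))
)
series = pd.Series(np.arange(len(idx), dtype=float), index=idx)

chunks = split_at_gaps(series, max_gap=pd.Timedelta(days=7))

# One subplot per chunk, sharing the y axis; panel width follows sample count
fig, axes = plt.subplots(
    1, len(chunks), sharey=True, squeeze=False,
    gridspec_kw={"width_ratios": [len(c) for c in chunks], "wspace": 0.05},
)
for ax, chunk in zip(axes[0], chunks):
    ax.plot(chunk.index, chunk.to_numpy())
    ax.tick_params(axis="x", labelrotation=45)

# Hide the spines between neighbouring panels to suggest the axis break
for ax in axes[0][:-1]:
    ax.spines["right"].set_visible(False)
for ax in axes[0][1:]:
    ax.spines["left"].set_visible(False)
    ax.tick_params(axis="y", left=False)
```

Each panel keeps a real datetime axis, so the tick labels stay meaningful while the empty stretches between chunks are not drawn.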
I'm trying to analyze a set of costs using python.
The columns in the data frame are,
'TotalCharges', 'TotalPayments', 'TotalDirectVariableCost', 'TotalDirectFixedCost', 'TotalIndirectVariableCost', 'TotalIndirectFixedCost'.
When I tried to plot them using whisker plots, this is how they were displayed:
I need to properly analyze these data and understand their behavior.
The following are my questions.
Is there any way that I can use whisker plots more clearly?
I believe that since these are costs, we cannot ignore them as outliers. So, keeping the data as it is, what else can I use to represent the data more clearly?
Thanks
There are a couple of things you could do:
larger print area
rotate the axis
plot one axis on a log scale
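As a rough sketch of the first and third suggestions, assuming matplotlib's `boxplot` and strictly positive values (the data below is made up and only three of the column names are used):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
columns = ["TotalCharges", "TotalPayments", "TotalDirectVariableCost"]
# Made-up, heavily right-skewed positive values standing in for the real costs
data = [rng.lognormal(mean=m, sigma=1.5, size=500) for m in (8, 7, 5)]

fig, ax = plt.subplots(figsize=(10, 6))   # larger print area
ax.boxplot(data)                          # one box per column, at x = 1, 2, 3
ax.set_xticklabels(columns, rotation=30)
ax.set_yscale("log")                      # value axis on a log scale
```

A log axis cannot show zero or negative values, so this only helps if every cost is positive.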
That said, I think you should examine once again your understanding of what a box and whisker plot is for.
Additionally, you might consider posting this on the Math or Cross Validated site as this doesn't have much to do with code.
This is not a duplicate, because the existing answers to similar questions don't describe exactly what I need.
Matplotlib has great formatters inside and I love to use them:
ax.xaxis.set_major_locator(matplotlib.dates.MonthLocator())
ax.xaxis.set_major_formatter(matplotlib.dates.DateFormatter('%b%y'))
They let me plot such stock market charts:
This is what I need, but it has 1 issue: weekends. They are present on x axis and make my chart a little ugly.
Other questions about this issue advise creating a custom formatter and show examples of such formatters. But none of them produces pretty formatting like matplotlib does:
May19, Jun19, Jul19...
I mean this line of code:
ax.xaxis.set_major_formatter(matplotlib.dates.DateFormatter('%b%y'))
My question is: please help me format the x axis like matplotlib does (May19, Jun19, Jul19...) without showing the weekends, when the stock market is closed.
What you could almost always do is something similar to what Nic Wanavit suggested.
Manually set your labels, depending on what you need on your axis.
In this case especially, the plot looks a bit ugly because your data contains timespans with no actual data (the weekends), so pyplot simply connects the surrounding points across the corresponding length of the x-axis.
What you can do instead is plot your data equally spaced, which is correct if the data is daily; otherwise consider interpolating it, e.g. with pandas' built-in interpolation.
To keep pyplot from automatically detecting the index, I had to do this:
df['plotidx'] = list(range(len(df['close'])))
Here all the closing values for the stock are stored in a column named 'close', obviously.
You plot this correspondingly.
Then you can obtain all the ticks created via
labels = [item.get_text() for item in ax.get_xticklabels()]
Adjust them as desired with
labels[i] = string_for_the_label_no_i
Then get them back on the graph using
ax.xaxis.set_ticklabels(labels)
You then need to "update" the plot somehow. Also keep in mind that, as the documentation also says, resizing a lot could leave the labels in strange locations.
It is a kind of workaround, but it worked fine for me because it feels natural to plot the data points equally spaced next to each other rather than making up data for the weekends.
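Putting the pieces together, one way this workaround might look is sketched below. It swaps the manual label editing for a `FuncFormatter` that maps each row number back to its date; the sample data, the `label` helper, and the choice of one tick per month are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.ticker import FixedLocator, FuncFormatter

# Made-up closing prices on business days only (no weekend rows)
dates = pd.bdate_range("2019-05-01", "2019-08-30")
df = pd.DataFrame({"close": np.linspace(100.0, 120.0, len(dates))}, index=dates)

# Plot against the row number so every trading day gets the same width
positions = np.arange(len(df))
fig, ax = plt.subplots()
ax.plot(positions, df["close"].to_numpy())

# One major tick at the first trading day of each month
months = df.index.to_period("M")
first_of_month = [i for i in positions if i == 0 or months[i] != months[i - 1]]
ax.xaxis.set_major_locator(FixedLocator(first_of_month))


def label(pos, _tick_index):
    """Map a row number back to its date, formatted like DateFormatter('%b%y')."""
    i = int(round(pos))
    return df.index[i].strftime("%b%y") if 0 <= i < len(df) else ""


ax.xaxis.set_major_formatter(FuncFormatter(label))
```

Because the formatter is called on every redraw, the labels stay correct after zooming or resizing, which avoids the manual "update" step.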
Greets
To set the x ticks,
assuming that you have the dates in the dataframe column df['dates']:
ax.xaxis.set_ticks(df['dates'])
I don't think the title is precise enough. If anyone can improve it, please do.
I used to use numpy and matplotlib to draw a distribution diagram. As far as I know, np.histogram can only set the range with a bottom and a top value. But I'd like to make it three values, which are bottom, top and infinite.
For example
MW=[121,131,...,976,1400] # hundreds of out-of-order items
b,bins = np.histogram(MW,bins=10,range=(0,1000))
ax.bar(bins[:-1]+50,b,align='center',facecolor='grey',alpha=0.5,width=100,)
With this code, I can draw a distribution diagram with ten bins (0-100, 100-200, ..., 900-1000). But there are a few numbers higher than 1000. I want to put them in "(1000 - +∞)". So it seems the range parameter would have to become (0, 1000, infinity or a big enough number), but that is not available.
An awful way to do it is using a trick such as:
MW=[x if x <1000 else 1001 for x in MW]
b,bins = np.histogram(MW,bins=11,range=(0,1100))
And change the xlabel of the plot.
Is there any better way to implement it?
If a trick is the only way, is it possible to quickly change the xlabel?
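One possible approach, sketched here with made-up sample values, is to pass explicit bin edges to np.histogram instead of a range, with np.inf as the last edge, and then draw the bars at evenly spaced slots with hand-written labels. The slot layout and the "1000+" label are illustrative choices, not the only option.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

MW = [50, 150, 150, 999, 1000, 1400, 2500]  # made-up sample values

# Ten 100-wide bins plus one open-ended bin for everything from 1000 upward
edges = np.append(np.arange(0, 1001, 100), np.inf)
counts, _ = np.histogram(MW, bins=edges)

# Draw the bars at evenly spaced slots and label each slot with its range
labels = [f"{lo}-{lo + 100}" for lo in range(0, 1000, 100)] + ["1000+"]
slots = np.arange(len(counts))
fig, ax = plt.subplots()
ax.bar(slots, counts, align="center", facecolor="grey", alpha=0.5, width=1.0)
ax.set_xticks(slots)
ax.set_xticklabels(labels, rotation=45)
```

This leaves the original data untouched, and the tick labels are set in one call rather than edited afterwards.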
I am using python to plot points. The plot shows relationship between area and the # of points of interest (POIs) in this area. I have 3000 area values and 3000 # of POI values.
Now the plot looks like this:
The problem is that, at lower left side, points are severely overlapping each other so it is hard to get enough information. Most areas are not that big and they don't have many POIs.
I want to make a plot with little overlapping. I am wondering whether I can use unevenly distributed axis or use histogram to make a beautiful plot. Can anyone help me?
I would suggest using a logarithmic scale for the y axis. You can either use pyplot.semilogy(...) or pyplot.yscale('log') (http://matplotlib.org/api/pyplot_api.html).
Note that points where area <= 0 will not be rendered.
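A minimal sketch of the log-scale suggestion, using made-up data of the same size as in the question. Putting both axes on a log scale, the small marker size, and the transparency go slightly beyond the suggestion above and are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
# Made-up stand-ins: 3000 mostly small areas, POI counts growing with area
area = rng.lognormal(mean=2.0, sigma=1.2, size=3000)
pois = rng.poisson(lam=0.5 * area) + 1  # +1 keeps every count positive

fig, ax = plt.subplots()
ax.scatter(area, pois, s=5, alpha=0.3)  # small, translucent markers
ax.set_xscale("log")
ax.set_yscale("log")
ax.set_xlabel("area")
ax.set_ylabel("# of POIs")
```

On a log scale the crowded lower-left corner is stretched out, while the few large values are compressed.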
I think we have two major choices here. First adjusting this plot, and second choosing to display your data in another type of plot.
For the first option, I would suggest clipping the boundaries. You have plenty of empty space around the borders. If you limit the plot to the boundaries of the data, it would scale better. On top of that, you may choose to plot the points with smaller dots, so that they overlap less.
The second option would be to display the data in a different view, such as histograms. This might give better insight into the distribution of your data among different bins. But this would be a completely different type of view compared to the former plot.
I would suggest first adjusting the plot by limiting its boundaries to the data points, so that the plot area has enough space to scale the data, and trying histograms later. But as I mentioned, these are two different things and would give different insights into your data.
For adjusting you might try this:
x1,x2,y1,y2 = plt.axis()
plt.axis((x1,x2,y1,y2))
You would probably need to make minor adjustments to the axis variables. Note that there are definitely better options than this, but it was the first thing that came to my mind.
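As an illustration of that adjustment, the limits could be taken straight from the data instead of being tweaked by hand. The data here is made up, and using the exact minimum and maximum with no padding is an assumption.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(2)
x = rng.lognormal(mean=2.0, sigma=1.0, size=3000)  # made-up areas
y = rng.lognormal(mean=1.0, sigma=1.0, size=3000)  # made-up POI counts

fig, ax = plt.subplots()
ax.scatter(x, y, s=4)  # smaller dots overlap less
# Clip the view to the data range: (xmin, xmax, ymin, ymax)
limits = ax.axis((x.min(), x.max(), y.min(), y.max()))
```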
I've done some searching around, and cannot easily find a solution to this problem. Effectively, I want to have multiple tick locators on a single axis such that I can do something like in the plot below.
Note how the x-axis starts off logarithmic, but becomes linear once 500 is reached. I figured one possible solution was to simply divide the data into two portions, plot it on two graphs, each with their own locators, and then put the graphs right next to each other so they're seamless, but that seems very unpythonic. Anyone have a better solution?
I suspect the following URL might be of use:
http://matplotlib.org/examples/axes_grid/parasite_simple2.html (click on the plot to have the python code)
If you need some specialized graphs, it's always a good idea to have a look at the Matplotlib gallery:
http://matplotlib.org/gallery.html
EDIT: It is possible to make custom ticks on the X-axis:
http://matplotlib.org/examples/ticks_and_spines/ticklabels_demo_rotation.html
You may find an implementation of this scale by Jesús Torrado here.
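Independently of that implementation, a hand-rolled version might be sketched with matplotlib's built-in "function" scale (available since matplotlib 3.1), which takes a forward and an inverse mapping. The names BREAK, SLOPE, forward, and inverse, the slope choice, and the tick positions are assumptions made for this example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

BREAK = 500.0                        # switch point taken from the question
SLOPE = 1.0 / (BREAK * np.log(10))   # matches the log10 slope at the break


def forward(x):
    """log10 below BREAK, linear above it; continuous at BREAK."""
    x = np.asarray(x, dtype=float)
    safe = np.clip(x, 1e-12, None)   # non-positive values are pinned, not NaN
    return np.log10(np.minimum(safe, BREAK)) + SLOPE * np.maximum(x - BREAK, 0.0)


def inverse(y):
    """Exact inverse of forward() for positive x."""
    y = np.asarray(y, dtype=float)
    y_break = np.log10(BREAK)
    return 10.0 ** np.minimum(y, y_break) + np.maximum(y - y_break, 0.0) / SLOPE


x = np.linspace(1, 1000, 400)
fig, ax = plt.subplots()
ax.set_xscale("function", functions=(forward, inverse))
ax.plot(x, np.sqrt(x))
ax.set_xlim(1, 1000)
ax.set_xticks([1, 10, 100, 500, 600, 700, 800, 900, 1000])
```

Matching the slope at the break keeps the spacing smooth across the transition; a larger SLOPE would give the linear part more room on the axis.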