What is wrong with this matplotlib code?

I am trying to plot datetime on y axis and time on x-axis using a bar graph. I need to specify the heights in terms of datetime of y-axis and I am not sure how to do that.
import matplotlib.pyplot as plt
import matplotlib as mpl
import numpy as np
import datetime as dt
# Make a series of events 1 day apart
y = mpl.dates.drange(dt.datetime(2009,10,1),
dt.datetime(2010,1,15),
dt.timedelta(days=1))
# Vary the datetimes so that they occur at random times
# Remember, 1.0 is equivalent to 1 day in this case...
y += np.random.random(y.size)
# We can extract the time by using a modulo 1, and adding an arbitrary base date
times = y % 1 + int(y[0]) # (The int is so the y-axis starts at midnight...)
# I'm just plotting points here, but you could just as easily use a bar.
fig = plt.figure()
ax = fig.add_subplot(111)
ax.bar(times, y, width = 10)
ax.yaxis_date()
fig.autofmt_ydate()
plt.show()

I think it could be your y range.
When I run your code I get a plot with the y axis ranging from about 11/Dec/2011 to 21/Dec/2011.
However, you've generated dates ranging from 1/Oct/2009 to 15/Jan/2010.
When I adjusted the y limits to take this into account it worked fine. I added this after the ax.bar call:
plt.ylim( (min(y)-1,max(y)+1) )
Another reason the output is confusing is that, with a width of 10, the bars are too wide and actually obscure each other.
Try using ax.plot(times,y,'ro') to see what I mean.
I produced a plot using ax.bar(times,y,width=.1,alpha=.2) together with ax.plot(times,y,'ro') to show what I mean about the bars overlapping each other.
Even with a bar width of .1 they overlap, so with a width of 10 they completely obscure each other.
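The two fixes above (explicit y limits and much narrower bars) can be sketched together as follows. This is only an illustration under assumptions that are not in the original post: the random seed, the bar width of 0.01, the one-day padding and the headless "Agg" backend are all arbitrary choices.

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for repeatability only

# One event per day, as Matplotlib date numbers (1.0 is one day)
y = mdates.drange(dt.datetime(2009, 10, 1),
                  dt.datetime(2010, 1, 15),
                  dt.timedelta(days=1))
y = y + rng.random(y.size)   # move each event to a random time of day
times = y % 1 + int(y[0])    # time of day, attached to an arbitrary base date

fig, ax = plt.subplots()
ax.bar(times, y, width=0.01, alpha=0.2)  # narrow bars so they do not hide each other
ax.plot(times, y, "ro", markersize=3)    # mark the bar tops
ax.xaxis_date()
ax.yaxis_date()
ax.set_ylim(y.min() - 1, y.max() + 1)    # pad the generated date range by one day
# plt.show()  # uncomment in an interactive session
```

Because bars are drawn from zero by default, the explicit y limits are what keep the date range of interest in view.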

Related

How to set y-scale when making a boxplot with dataframe

I have a column of data with a very large distribution and thus I log2-transform it before plotting and visualizing it. This works fine but I cannot seem to figure out how to set the y-scale to the exponential values of 2 (instead I have just the exponents themselves).
df['num_ratings_log2'] = df['num_ratings'] + 1
df['num_ratings_log2'] = np.log2(df['num_ratings_log2'])
df.boxplot(column = 'num_ratings_log2', figsize=(10,10))
As the scale, I would like to have 1 (2^0), 32 (2^5), 1024 (2^10) ... instead of 0, 5, 10 ...
I want everything else about the plot to stay the same. How can I achieve this?
Instead of taking the log of the data, you can create a normal boxplot and then set a log scale on the y-axis (ax.set_yscale('log'), or symlog to also represent zero). To get the ticks at powers of 2 (instead of powers of 10), use a LogLocator with base 2. A ScalarFormatter shows the values as regular numbers (instead of as powers such as 2^10). A NullLocator for the minor ticks suppresses undesired extra ticks.
import matplotlib.pyplot as plt
from matplotlib.ticker import ScalarFormatter, LogLocator, NullLocator
import pandas as pd
import numpy as np
np.random.seed(123)
df = pd.DataFrame({'num_ratings': (np.random.pareto(10, 10000) * 800).astype(int)})
ax = df.boxplot(column='num_ratings', figsize=(10, 10))
ax.set_yscale('symlog') # symlog also allows zero
# ax.yaxis.set_major_formatter(ScalarFormatter()) # show tick labels as regular numbers
ax.yaxis.set_major_formatter(lambda x, p: f'{int(x):,}')
ax.yaxis.set_minor_locator(NullLocator()) # remove minor ticks
plt.show()
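The base-2 LogLocator described in this answer could be wired up roughly as below. Treat it as a sketch under assumptions: the data is a synthetic stand-in for num_ratings, a plain Matplotlib boxplot replaces df.boxplot, the values are shifted by 1 so zeros fit on a log axis, and the base keyword of set_yscale assumes a reasonably recent Matplotlib (3.3 or later).

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import LogLocator, NullLocator, ScalarFormatter

rng = np.random.default_rng(123)
# Synthetic heavy-tailed stand-in for the ratings column (assumption)
num_ratings = (rng.pareto(1.0, 10_000) * 8).astype(int)

fig, ax = plt.subplots(figsize=(6, 6))
ax.boxplot(num_ratings + 1)                      # +1 so that zero counts fit on a log axis
ax.set_yscale("log", base=2)                     # set the scale first: it resets locators
ax.yaxis.set_major_locator(LogLocator(base=2))   # major ticks at powers of 2
ax.yaxis.set_major_formatter(ScalarFormatter())  # labels as plain numbers
ax.yaxis.set_minor_locator(NullLocator())        # no minor ticks
# plt.show()  # uncomment in an interactive session
```

The locator and formatters are set after set_yscale because changing the scale installs its own defaults.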
I hope this is what you are looking for:
ax = df.boxplot(column='num_ratings_log2', figsize=(20,10))
ymin = 0
ymax = 20
ax.set_ylim(2**ymin, 2**ymax)

Adding extra space along the x-axis in matplotlib bar graph

I'm using matplotlib to draw a bar chart with 3 bars. I want to add some extra space along the x-axis (so that the x-axis line is drawn longer).
Below is what I have:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
dt = [1,3,2]
plt.figure()
xvals = range(len(dt))
plt.bar(xvals, dt, width=0.5)
plt.tick_params(bottom=False)
plt.xticks(xvals, ['a','b','c'])
plt.yticks(range(0,4), [0,1,2,3])
plt.gca().spines['top'].set_visible(False)
plt.gca().spines['right'].set_visible(False)
plt.show()
This code produces:
I simply want (note the elongated x-axis):
Change the limits of the x-axis using xlim(), for example:
plt.xlim(-0.5,3.5) # adjust as necessary
Just add the following limits. You can use None as the left-hand limit to keep the default value chosen by the plot. Since the x-values are 0, 1, 2, setting the right-hand limit to 3 extends the axis. Replace 3 with whatever value you want.
plt.xlim(None, 3)
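For reference, here is the same idea through the object-oriented interface, as a small sketch. The variable names and the "Agg" backend are assumptions made for illustration; only the right-hand limit is set, so the left-hand limit stays at whatever autoscaling chose.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

heights = [1, 3, 2]
xvals = list(range(len(heights)))

fig, ax = plt.subplots()
ax.bar(xvals, heights, width=0.5)
ax.set_xticks(xvals)
ax.set_xticklabels(["a", "b", "c"])
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)

ax.set_xlim(right=3)           # extend the axis on the right only
left, right = ax.get_xlim()    # left is still the autoscaled value
# plt.show()  # uncomment in an interactive session
```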

Set margins of a time series plotted with pandas

I have the following code for generating a time series plot
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
fig = plt.figure()
ax = fig.add_subplot(111)
series = pd.Series([np.sin(ii*np.pi) for ii in range(30)],
                   index=pd.date_range(start='2019-01-01', end='2019-12-31',
                                       periods=30))
series.plot(ax=ax)
I want to set automatic limits for x and y. I tried using ax.margins(), but it does not seem to work:
ax.margins(y=0.1, x=0.05)
# even with
# ax.margins(y=0.1, x=5)
What I am looking for is an automatic method like padding=0.1 (10% of whitespace around the graph)
Pandas and matplotlib often seem to get confused when working together on axes that hold dates. For some reason, in this case ax.margins doesn't work as expected on the x-axis.
Here is a workaround which does seem to do the job, explicitly moving the xlims:
xmargins = 0.05
ymargins = 0.1
ax.margins(y=ymargins)
x0, x1 = plt.xlim()
plt.xlim(x0-xmargins*(x1-x0), x1+xmargins*(x1-x0))
Alternatively, you could work directly with matplotlib's plot, which does work as expected applying the margins to the date axis.
ax.plot(series.index, series)
ax.margins(y=0.1, x=0.05)
PS: This post talks about setting use_sticky_edges to False and calling autoscale_view after setting the margins, but that doesn't seem to work here either.
ax.use_sticky_edges = False
ax.autoscale_view(scaley=True, scalex=True)
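To check that the matplotlib-only alternative really pads the date axis, a sketch like the following can help. The sine data, the variable names and the "Agg" backend are assumptions for illustration; the margin values are the ones from the question.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

index = pd.date_range(start="2019-01-01", end="2019-12-31", periods=30)
series = pd.Series(np.sin(np.arange(30) * 0.5), index=index)  # illustrative data

fig, ax = plt.subplots()
ax.plot(series.index, series.to_numpy())  # plain Matplotlib plot, not series.plot
ax.margins(x=0.05, y=0.1)                 # 5% padding in x, 10% in y

# Compare the resulting limits with the data range, both in Matplotlib date numbers
x0, x1 = ax.get_xlim()
d0 = mdates.date2num(index[0].to_pydatetime())
d1 = mdates.date2num(index[-1].to_pydatetime())
left_pad = (d0 - x0) / (d1 - d0)
right_pad = (x1 - d1) / (d1 - d0)
```

Here left_pad and right_pad express the padding as a fraction of the data range, so they can be compared directly with the x margin that was requested.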
You can use ax.set_xlim and ax.set_ylim to set the x and y limits of your plot respectively.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
fig = plt.figure()
ax = fig.add_subplot(111)
series = pd.Series([np.sin(ii*np.pi) for ii in range(30)],
                   index=pd.date_range(start='2019-01-01', end='2019-12-31',
                                       periods=30))
# set xlim to be between certain dates
ax.set_xlim((pd.to_datetime('2019-01-01'), pd.to_datetime('2019-01-31')))
# set ylim to be between certain values
ax.set_ylim((-0.5, 0.5))
series.plot(ax=ax)

Same scale of Y axis on different figures

I am trying to plot different data sets with similar representations but slightly different behaviours and different origins on several figures. So the min and max of the Y axis differ between figures, and so does the scale.
E.g. here are some extracts of my batch plotting:
Is there a simple way with matplotlib to force the same Y step on those different figures, for easy visual comparison, while keeping an automatically determined Y min and Y max?
In other words, I'd like to have the same metric spacing between Y-ticks.
You could use a MultipleLocator from the ticker module on both axes to define the tick spacings:
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
fig=plt.figure()
ax1=fig.add_subplot(211)
ax2=fig.add_subplot(212)
ax1.set_ylim(0,100)
ax2.set_ylim(40,70)
# set ticks every 10
tickspacing = 10
ax1.yaxis.set_major_locator(ticker.MultipleLocator(base=tickspacing))
ax2.yaxis.set_major_locator(ticker.MultipleLocator(base=tickspacing))
plt.show()
EDIT:
It seems like your desired behaviour was different to how I interpreted your question. Here is a function that will change the limits of the y axes to make sure ymax-ymin is the same for both subplots, using the larger of the two ylim ranges to change the smaller one.
import matplotlib.pyplot as plt
import numpy as np
fig=plt.figure()
ax1=fig.add_subplot(211)
ax2=fig.add_subplot(212)
ax1.set_ylim(40,50)
ax2.set_ylim(40,70)
def adjust_axes_limits(ax1,ax2):
    yrange1 = np.ptp(ax1.get_ylim())
    yrange2 = np.ptp(ax2.get_ylim())
    def change_limits(ax,yr):
        new_ymin = ax.get_ylim()[0] - yr/2.
        new_ymax = ax.get_ylim()[1] + yr/2.
        ax.set_ylim(new_ymin,new_ymax)
    if yrange1 > yrange2:
        change_limits(ax2,yrange1-yrange2)
    elif yrange2 > yrange1:
        change_limits(ax1,yrange2-yrange1)
    else:
        pass
adjust_axes_limits(ax1,ax2)
plt.show()
Note that the first subplot here has expanded from (40, 50) to (30, 60), to match the y range of the second subplot
Tom's answer works well, but I decided to use a simpler solution.
I define an arbitrary yrange for all my plots, e.g.
yrang = 0.003
and for each plot, I do:
ymin, ymax = ax.get_ylim()
ymid = np.mean([ymin,ymax])
ax.set_ylim([ymid - yrang/2 , ymid + yrang/2])
and possibly:
ax.yaxis.set_major_locator(ticker.MultipleLocator(base=0.005))
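The recentring step above can be wrapped in a small helper and applied to every figure in a batch. The sketch below is only illustrative: center_ylim is a hypothetical name, and the two test curves and the "Agg" backend are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def center_ylim(ax, span):
    """Keep the autoscaled midpoint of the y-axis but force its total height to span."""
    ymin, ymax = ax.get_ylim()
    ymid = (ymin + ymax) / 2
    ax.set_ylim(ymid - span / 2, ymid + span / 2)


x = np.linspace(0, 1, 50)
fig1, ax_a = plt.subplots()
ax_a.plot(x, 0.10 + 0.001 * np.sin(6 * x))  # illustrative curve around 0.10
fig2, ax_b = plt.subplots()
ax_b.plot(x, 0.25 + 0.0004 * x)             # illustrative curve around 0.25

mids_before = [np.mean(ax.get_ylim()) for ax in (ax_a, ax_b)]
for ax in (ax_a, ax_b):
    center_ylim(ax, span=0.003)             # same y extent on every figure
```

Each figure keeps its own centre, but one unit of y now covers the same distance on all of them. Data that spans more than the chosen range will be clipped.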

Pyplot: using percentage on x axis

I have a line chart based on a simple list of numbers. By default the x-axis is just an increment of 1 for each value plotted. I would like it to be a percentage instead but can't figure out how. So instead of having an x-axis from 0 to 5, it would go from 0% to 100% (but keeping reasonably spaced tick marks). Code below. Thanks!
from matplotlib import pyplot as plt
from mpl_toolkits.axes_grid.axislines import Subplot
data=[8,12,15,17,18,18.5]
fig=plt.figure(1,(7,4))
ax=Subplot(fig,111)
fig.add_subplot(ax)
plt.plot(data)
The code below will give you a simplified x-axis which is percentage based. It assumes that your values are spaced equally between 0% and 100%.
It creates a perc array which holds evenly spaced percentages to plot against. It then adjusts the formatting of the x-axis so it includes a percentage sign, using matplotlib.ticker.FormatStrFormatter. Unfortunately this uses old-style string formatting rather than the new style; the old-style docs can be found here.
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.ticker as mtick
data = [8,12,15,17,18,18.5]
perc = np.linspace(0,100,len(data))
fig = plt.figure(1, (7,4))
ax = fig.add_subplot(1,1,1)
ax.plot(perc, data)
fmt = '%.0f%%' # Format you want the ticks, e.g. '40%'
xticks = mtick.FormatStrFormatter(fmt)
ax.xaxis.set_major_formatter(xticks)
plt.show()
This is a few months late, but I have created PR#6251 with matplotlib to add a new PercentFormatter class. With this class you can do as follows to set the axis:
import matplotlib.ticker as mtick
# Actual plotting code omitted
ax.xaxis.set_major_formatter(mtick.PercentFormatter(5.0))
This will display values from 0 to 5 on a scale of 0% to 100%. The formatter is similar in concept to what @Ffisegydd suggests doing, except that it can take any arbitrary existing ticks into account.
PercentFormatter() accepts three arguments: xmax, decimals, and symbol. xmax allows you to set the value that corresponds to 100% on the axis (in your example, 5).
The other two parameters allow you to set the number of digits after the decimal point and the symbol. They default to None and '%', respectively. decimals=None will automatically set the number of decimal places based on how much of the axes you are showing.
Note that this formatter will use whatever ticks would normally be generated if you just plotted your data. It does not modify anything besides the strings that are output to the tick marks.
Update
PercentFormatter was accepted into Matplotlib in version 2.1.0.
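Putting the pieces together for the data in the question might look like the sketch below. It assumes Matplotlib 2.1 or later; fixing decimals=0 and using the "Agg" backend are choices made here for illustration, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick

data = [8, 12, 15, 17, 18, 18.5]

fig, ax = plt.subplots(figsize=(7, 4))
ax.plot(data)  # x values are 0..5, one per data point

# The last x value (5) is mapped to 100%; decimals=0 gives whole-number labels
formatter = mtick.PercentFormatter(xmax=len(data) - 1, decimals=0)
ax.xaxis.set_major_formatter(formatter)

# format_pct converts one axis value to its label text; the second argument
# (the display range) only matters when decimals is None
label_mid = formatter.format_pct(2.5, 5)
label_end = formatter.format_pct(5, 5)
# plt.show()  # uncomment in an interactive session
```

The tick positions themselves are left untouched; only the label strings change.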
Totally late in the day, but I wrote this and thought it could be of use:
import pandas as pd
def transformColToPercents(x, rnd, navalue):
    # Returns a pandas Series that can be put in a new dataframe column, where all values are scaled from 0-100%
    # rnd = number of decimal places to round to
    # navalue = value used to replace NaN
    hv = x.max(axis=0)
    lv = x.min(axis=0)
    pp = pd.Series(((x-lv)*100)/(hv-lv)).round(rnd)
    return pp.fillna(navalue)
df['new column'] = transformColToPercents(df['a'], 2, 0)
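A small worked example of this min-max scaling shows what happens with missing values and with a constant column. The helper name to_percent_range and the sample data are hypothetical, introduced only for this sketch.

```python
import numpy as np
import pandas as pd


def to_percent_range(col, decimals, na_value):
    """Scale a Series linearly so its minimum maps to 0 and its maximum to 100."""
    lo, hi = col.min(), col.max()              # NaN entries are skipped by default
    scaled = (col - lo) * 100 / (hi - lo)      # a constant column gives 0/0, i.e. NaN
    return scaled.round(decimals).fillna(na_value)


df = pd.DataFrame({
    "a": [2.0, 4.0, 6.0, np.nan],     # contains a missing value
    "flat": [3.0, 3.0, 3.0, 3.0],     # constant column
})
df["a_pct"] = to_percent_range(df["a"], 2, 0)
df["flat_pct"] = to_percent_range(df["flat"], 2, 0)
```

Note that filling NaN with 0 makes a missing value look the same as the column minimum, so a different fill value may be preferable when that distinction matters.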
