Lines not showing up on Matplotlib graph

Lines not showing up on Matplotlib graph - python

I am trying to plot three lines on the same plot in Matplotlib. They are InvoicesThisYear, DisputesThisYear, and PercentThisYear (Which is Disputes/Invoices)
The original input is two columns of dates -- one for the date of a logged dispute and one for the date of a logged invoice.
I use the dates to count up the number of disputes and invoices per month during a certain year.
Then I try to graph it, but it comes up empty. I started with just trying to print PercentThisYear and InvoicesThisYear.
PercentThisYear = (DisputesFYThisYear/InvoicesFYThisYear).fillna(0.0)
#Percent_ThisYear.plot(kind = 'line')
#InvoicesFYThisYear.plot(kind = 'line')
plt.plot(PercentThisYear)
plt.xlabel('Date')
plt.ylabel('Percent')
plt.title('Customer Disputes')
# Remove the plot frame lines. They are unnecessary chartjunk.
ax = plt.subplot(111)
ax.spines["top"].set_visible(False)
ax.spines["bottom"].set_visible(False)
ax.spines["right"].set_visible(False)
ax.spines["left"].set_visible(False)
ax2 = ax.twinx()
ax2.plot(InvoicesFYThisYear)
# Ensure that the axis ticks only show up on the bottom and left of the plot.
# Ticks on the right and top of the plot are generally unnecessary chartjunk.
ax.get_xaxis().tick_bottom()
#ax.get_yaxis().tick_left()
# Limit the range of the plot to only where the data is.
# Avoid unnecessary whitespace.
datenow = datetime.datetime.now()
dstart = datetime.datetime(2015,4,1)
print datenow
#plt.ylim(0, .14)
plt.xlim(dstart, datenow)
firsts=[]
for i in range(dstart.month, datenow.month+1):
firsts.append(datetime.datetime(2015,i,1))
plt.xticks(firsts)
plt.show()
This is the output... The date is all messed up and nothing prints. But the scaled on the axes look right. What am I doing wrong?
Here is the set up leading up to the graph if that is helpful
The Input looks like this:
InvoicesThisYear
Out[82]:
7 7529
5 5511
6 4934
8 3552
dtype: int64
DisputesThisYear
Out[83]:
2 211
1 98
7 54
4 43
3 32
6 29
5 21
8 8
dtype: int64
PercentThisYear
Out[84]:
1 0.000000
2 0.000000
3 0.000000
4 0.000000
5 0.003810
6 0.005877
7 0.007172
8 0.002252
dtype: float64

Matplotlib has no way of knowing which dates are associated with which data points. When you call plot with only one argument y, Matplotlib automatically assumes that the x-values are range(len(y)). You need to supply the dates as the first argument to plot. Assuming that InvoicesThisYear is a count of the number of invoices each month, starting at 1 and ending at 8, you could do something like
import datetime
import matplotlib.pyplot as plt
import pandas as pd
InvoicesFYThisYear = pd.DataFrame([0, 0, 0, 0, 5511, 4934, 7529, 3552])
Disputes = pd.DataFrame([98, 211, 32, 43, 21, 29, 54, 8])
PercentThisYear = (Disputes / InvoicesFYThisYear)
datenow = datetime.date.today()
ax = plt.subplot(111)
dates = [datetime.date(2015,i,1) for i in xrange(1, 9, 1)]
plt.plot(dates, PercentThisYear)
ax2 = ax.twinx()
ax2.plot(dates, InvoicesFYThisYear)
dstart = datetime.datetime(2015,4,1)
plt.xlim(dstart, datenow)
plt.xticks(dates, dates)
plt.show()
If your data is in a Pandas series and the index is an integer representing the month, all you have to do is change the index to datetime objects instead. The plot method for pandas.Series will handle things automatically from there. Here's how you might do that:
Invoices = pd.Series((211, 98, 54, 43, 32, 29, 21, 8), index = (2, 1, 7, 4, 3, 6, 5, 8))
dates = [datetime.date(2015, month, 1) for month in Invoices.index]
Invoices.index = dates
Invoices.plot()

Related

Highlight time interval in multivariate time-series plot using matplotlib and seaborn

I want to annotate a plot of multivariate time-series with time intervals (in colour for each type of annotation).
data overview
An example dataset looks like this:
metrik_0 metrik_1 metrik_2 geospatial_id topology_id \
2020-01-01 -0.848009 1.305906 0.924208 12 4
2020-01-01 -0.516120 0.617011 0.623065 8 3
2020-01-01 0.762399 -0.359898 -0.905238 19 3
2020-01-01 0.708512 -1.502019 -2.677056 8 4
2020-01-01 0.249475 0.590983 -0.677694 11 3
cohort_id device_id
2020-01-01 1 1
2020-01-01 1 9
2020-01-01 2 13
2020-01-01 2 8
2020-01-01 1 12
The labels look like this:
cohort_id marker_type start end
0 1 a 2020-01-02 00:00:00 NaT
1 1 b 2020-01-04 05:00:00 2020-01-05 16:00:00
2 1 a 2020-01-06 00:00:00 NaT
desired result
multivariate plot of all the time-series of a cohort_id
highlighting for the markers (different color for each type)
notice the markers might overlay / transparency is useful
there will be attenuation around the marker type a (configured by the number of hours)
I thought about using seaborn/matplotlib for this task.
So far I have come around:
%pylab inline
import seaborn as sns; sns.set()
import matplotlib.dates as mdates
aut_locator = mdates.AutoDateLocator(minticks=3, maxticks=7)
aut_formatter = mdates.ConciseDateFormatter(aut_locator)
g = df[df['cohort_id'] == 1].plot(figsize=(8,8))
g.xaxis.set_major_locator(aut_locator)
g.xaxis.set_major_formatter(aut_formatter)
plt.show()
which is rather chaotic.
I fear, it will not be possible to fit the metrics (multivariate data) into a single plot.
It should be facetted by each column.
However, this again would require to reshape the dataframe for seaborn FacetGrid to work, which also doesn`t quite feel right - especially if the number of elements (time-series) in a cohort_id gets larger.
If FacetGrid is the right way, then something along the lines of: https://seaborn.pydata.org/examples/timeseries_facets.html would be the first part, but the labels would still be missing.
How could the labels be added?
How should the first part be accomplished?
An example of the desired result:
https://imgur.com/9J1EcmI, i.e. one of
for each metric value
code for the example data
The datasets are generated from the code snippet below:
import pandas as pd
import numpy as np
import random
random_seed = 47
np.random.seed(random_seed)
random.seed(random_seed)
def generate_df_for_device(n_observations, n_metrics, device_id, geo_id, topology_id, cohort_id):
df = pd.DataFrame(np.random.randn(n_observations,n_metrics), index=pd.date_range('2020', freq='H', periods=n_observations))
df.columns = [f'metrik_{c}' for c in df.columns]
df['geospatial_id'] = geo_id
df['topology_id'] = topology_id
df['cohort_id'] = cohort_id
df['device_id'] = device_id
return df
def generate_multi_device(n_observations, n_metrics, n_devices, cohort_levels, topo_levels):
results = []
for i in range(1, n_devices +1):
#print(i)
r = random.randrange(1, n_devices)
cohort = random.randrange(1, cohort_levels)
topo = random.randrange(1, topo_levels)
df_single_dvice = generate_df_for_device(n_observations, n_metrics, i, r, topo, cohort)
results.append(df_single_dvice)
#print(r)
return pd.concat(results)
# hourly data, 1 week of data
n_observations = 7 * 24
n_metrics = 3
n_devices = 20
cohort_levels = 3
topo_levels = 5
df = generate_multi_device(n_observations, n_metrics, n_devices, cohort_levels, topo_levels)
df = df.sort_index()
df.head()
marker_labels = pd.DataFrame({'cohort_id':[1,1, 1], 'marker_type':['a', 'b', 'a'], 'start':['2020-01-2', '2020-01-04 05', '2020-01-06'], 'end':[np.nan, '2020-01-05 16', np.nan]})
marker_labels['start'] = pd.to_datetime(marker_labels['start'])
marker_labels['end'] = pd.to_datetime(marker_labels['end'])

In general, you can use either plt.fill_between for horizontal and plt.fill_betweenx for vertical bands. For "bands-within-bands" you can just call the method twice.
A basic example using your data would look like this. I've used fixed values for the position of the bands, but you can put them on the main dataframe and reference them dynamically inside the loop.
import matplotlib.pyplot as plt
fig, ax = plt.subplots(3 ,figsize=(20, 9), sharex=True)
plt.subplots_adjust(hspace=0.2)
metriks = ["metrik_0", "metrik_1", "metrik_2"]
colors = ['#66c2a5', '#fc8d62', '#8da0cb'] #Set2 palette hexes
for i, metric in enumerate(metriks):
df[[metric]].plot(ax=ax[i], color=colors[i], legend=None)
ax[i].set_ylabel(metric)
ax[i].fill_betweenx(y=[-3, 3], x1="2020-01-04 05:00:00",
x2="2020-01-05 16:00:00", color='gray', alpha=0.2)
ax[i].fill_betweenx(y=[-3, 3], x1="2020-01-04 15:00:00",
x2="2020-01-05 00:00:00", color='gray', alpha=0.4)

How to make a date-based color bar based on df.idxmax series?

Python beginner/first poster here.
I'm running into trouble adding color bars to scatter plots. I have two types of plot: one that shows all the data color-coded by date, and one that shows just the maximum values of my data color-coded by date. In the first case, I can use the df.index (which is datetime) to make my color bar, but in the second case, I am using df2['col'].idxmax to generate the colors because my df2 is a df.groupby object which I'm using to generate the daily maximums in my data, and it does not have an accessible index.
For the first type of plot, I have succeeded in generating a date-based color bar with the code below, cobbled together from online examples:
fig, ax = plt.subplots(1,1, figsize=(20,20))
smap=plt.scatter(df.col1, df.col2, s=140,
c=[date2num(i.date()) for i in df.index],
marker='.')
cb = fig.colorbar(smap, orientation='vertical',
format=DateFormatter('%d %b %y'))
However for the second type of plot, where I am trying to use df2['col'].idxmax to create the date series instead of df.index, the following does not work:
for n in cols1:
for m in cols2:
fig, ax = plt.subplots(1,1, figsize=(15,15))
maxTimes=df2[n].idxmax()
PlottableTimes=maxTimes.dropna() #some NaNs in the
#.idxmax series were giving date2num trouble
smap2=plt.scatter(df2[n].max(), df2[m].max(),
s=160, c=[date2num(i.date()) for i in PlottableTimes],
marker='.')
cb2 = fig.colorbar(smap2, orientation='vertical',
format=DateFormatter('%d %b %y'))
plt.show()
The error is: 'length of rgba sequence should be either 3 or 4'
Because the error was complaining of the color argument, I separately checked the output of the color (that is, c=) arguments in the respective plotting commands, and both look similar to me, so I can't figure out why one color argument works and the other doesn't:
one that works:
[736809.0,
736809.0,
736809.0,
736809.0,
736809.0,
736809.0,
736809.0,
736809.0,
736809.0,
736809.0,
...]
one that doesn't work:
[736845.0,
736846.0,
736847.0,
736848.0,
736849.0,
736850.0,
736851.0,
736852.0,
736853.0,
736854.0,
...]
Any suggestions or explanations? I'm running python 3.5.2. Thank you in advance for helping me understand this.
Edit 1: I made the following example for others to explore, and in the process realized the crux of the issue is different than my first question. The code below works the way I want it to:
df=pd.DataFrame(np.random.randint(low=0, high=10, size=(169, 8)),
columns=['a', 'b', 'c', 'd', 'e','f','g','h']) #make sample data
date_rng = pd.date_range(start='1/1/2018', end='1/8/2018', freq='H')
df['i']=date_rng
df = df.set_index('i') #get a datetime index
df['ts']=date_rng #get a datetime column to group by
from pandas import Grouper
df2=df.groupby(Grouper(key='ts', freq='D'))
for n in ['a','b','c','d']: #now make some plots
for m in ['e','f','g','h']:
print(m)
print(n)
fig, ax = plt.subplots(1,1, figsize=(5,5))
maxTimes=df2[n].idxmax()
PlottableTimes=maxTimes.dropna()
smap=plt.scatter(df2[n].max(), df2[m].max(), s=160,
c=[date2num(i.date()) for i in PlottableTimes],
marker='.')
cb = fig.colorbar(smap, orientation='vertical',
format=DateFormatter('%d %b %y'))
plt.show()
The only difference between my real data and this example is that my real data has many NaNs scattered throughout. So, I think what is going wrong is that the 'c=' argument isn't long enough for the plotting command to interpret it as covering the whole date range...? For example, if I manually put in the output of the c= command, I get the following code which also works:
for n in ['a','b','c','d']:
for m in ['e','f','g','h']:
print(m)
print(n)
fig, ax = plt.subplots(1,1, figsize=(5,5))
maxTimes=df2[n].idxmax()
PlottableTimes=maxTimes.dropna()
smap=plt.scatter(df2[n].max(), df2[m].max(), s=160,
c=[736809.0, 736810.0, 736811.0, 736812.0, 736813.0, 736814.0, 736815.0, 736816.0],
marker='.')
cb = fig.colorbar(smap, orientation='vertical',
format=DateFormatter('%d %b %y'))
plt.show()
But, if I shorten the c= array by some amount, to emulate what is happening in my code when NaNs are being dropped from idxmax, it gives the same error I am seeing:
for n in ['a','b','c','d']:
for m in ['e','f','g','h']:
print(m)
print(n)
fig, ax = plt.subplots(1,1, figsize=(5,5))
maxTimes=df2[n].idxmax()
PlottableTimes=maxTimes.dropna()
smap=plt.scatter(df2[n].max(), df2[m].max(), s=160,
c=[736809.0, 736810.0, 736811.0, 736812.0, 736813.0, 736814.0],
marker='.')
cb = fig.colorbar(smap, orientation='vertical',
format=DateFormatter('%d %b %y'))
plt.show()
So this means the real question is: how can I grab the grouper column after grouping from the groupby object, when none of the columns appear to be grab-able with df2.col? I would like to be able to grab 'ts' from the following and use it to be the color data, instead of using idxmax:
df2['a'].max()
ts
2018-01-01 9
2018-01-02 9
2018-01-03 9
2018-01-04 9
2018-01-05 9
2018-01-06 9
2018-01-07 9
2018-01-08 8
Freq: D, Name: a, dtype: int64

Essentially, your Grouper call is similar to indexing on your date time column and callingpandas.DataFrame.resample specifying the aggregate function:
df.set_index('ts').resample('D').max()
# a b c d e f g h
# ts
# 2018-01-01 9 9 8 9 9 9 9 9
# 2018-01-02 9 9 9 9 9 9 9 9
# 2018-01-03 9 9 9 9 9 9 9 9
# 2018-01-04 9 9 9 9 9 9 9 9
# 2018-01-05 9 9 9 9 9 9 9 9
# 2018-01-06 9 9 9 8 9 9 9 9
# 2018-01-07 9 9 9 9 9 9 9 9
# 2018-01-08 2 8 6 3 1 3 2 7
Therefore, the return of df2['a'].max() is a Pandas Resampler object, very similar to a Pandas Series and hence carries the index property which you can use for color bar specification:
df['a'].max().index
# DatetimeIndex(['2018-01-01', '2018-01-02', '2018-01-03', '2018-01-04',
# '2018-01-05', '2018-01-06', '2018-01-07', '2018-01-08'],
# dtype='datetime64[ns]', name='ts', freq='D')
From there you can pass into date2num without list comprehension:
date2num(df2['a'].max().index)
# array([736695., 736696., 736697., 736698., 736699., 736700., 736701., 736702.])
Altogether, simply use above in loop without needing maxTimes or PlottableTimes:
fig, ax = plt.subplots(1, 1, figsize = (5,5))
smap = plt.scatter(df2[n].max(), df2[m].max(), s = 160,
c = date2num(df2[n].max().index),
marker = '.')
cb = fig.colorbar(smap, orientation = 'vertical',
format = DateFormatter('%d %b %y'))

Seaborn distplot only whole numbers

How can I make a distplot with seaborn to only have whole numbers?
My data is an array of numbers between 0 and ~18. I would like to plot the distribution of the numbers.
Impressions
0 210
1 1084
2 2559
3 4378
4 5500
5 5436
6 4525
7 3329
8 2078
9 1166
10 586
11 244
12 105
13 51
14 18
15 5
16 3
dtype: int64
Code I'm using:
sns.distplot(Impressions,
# bins=np.arange(Impressions.min(), Impressions.max() + 1),
# kde=False,
axlabel=False,
hist_kws={'edgecolor':'black', 'rwidth': 1})
plt.xticks = range(current.Impressions.min(), current.Impressions.max() + 1, 1)
Plot looks like this:
What I'm expecting:
The xlabels should be whole numbers
Bars should touch each other
The kde line should simply connect the top of the bars. By the looks of it, the current one assumes to have 0s between (x, x + 1), hence why the downward spike (This isn't required, I can turn off kde)
Am I using the correct tool for the job or distplot shouldn't be used for whole numbers?

For your problem can be solved bellow code,
import seaborn as sns # for data visualization
import numpy as np # for numeric computing
import matplotlib.pyplot as plt # for data visualization
arr = np.array([1,2,3,4,5,6,7,8,9])
sns.distplot(arr, bins = arr, kde = False)
plt.xticks(arr)
plt.show()
enter image description here
In this way, you can plot histogram using seaborn sns.distplot() function.
Note: Whatever data you will pass to bins and plt.xticks(). It should be an ascending order.

Pandas dataframe plotting - issue when switching from two subplots to single plot w/ secondary axis

I have two sets of data I want to plot together on a single figure. I have a set of flow data at 15 minute intervals I want to plot as a line plot, and a set of precipitation data at hourly intervals, which I am resampling to a daily time step and plotting as a bar plot. Here is what the format of the data looks like:
2016-06-01 00:00:00 56.8
2016-06-01 00:15:00 52.1
2016-06-01 00:30:00 44.0
2016-06-01 00:45:00 43.6
2016-06-01 01:00:00 34.3
At first I set this up as two subplots, with precipitation and flow rate on different axis. This works totally fine. Here's my code:
import matplotlib.pyplot as plt
import pandas as pd
from datetime import datetime
filename = 'manhole_B.csv'
plotname = 'SSMH-2A B'
plt.style.use('bmh')
# Read csv with precipitation data, change index to datetime object
pdf = pd.read_csv('precip.csv', delimiter=',', header=None, index_col=0)
pdf.columns = ['Precipitation[in]']
pdf.index.name = ''
pdf.index = pd.to_datetime(pdf.index)
pdf = pdf.resample('D').sum()
print(pdf.head())
# Read csv with flow data, change index to datetime object
qdf = pd.read_csv(filename, delimiter=',', header=None, index_col=0)
qdf.columns = ['Flow rate [gpm]']
qdf.index.name = ''
qdf.index = pd.to_datetime(qdf.index)
# Plot
f, ax = plt.subplots(2)
qdf.plot(ax=ax[1], rot=30)
pdf.plot(ax=ax[0], kind='bar', color='r', rot=30, width=1)
ax[0].get_xaxis().set_ticks([])
ax[1].set_ylabel('Flow Rate [gpm]')
ax[0].set_ylabel('Precipitation [in]')
ax[0].set_title(plotname)
f.set_facecolor('white')
f.tight_layout()
plt.show()
2 Axis Plot
However, I decided I want to show everything on a single axis, so I modified my code to put precipitation on a secondary axis. Now my flow data data has disppeared from the plot, and even when I set the axis ticks to an empty set, I get these 00:15 00:30 and 00:45 tick marks along the x-axis.
Secondary-y axis plots
Any ideas why this might be occuring?
Here is my code for the single axis plot:
f, ax = plt.subplots()
qdf.plot(ax=ax, rot=30)
pdf.plot(ax=ax, kind='bar', color='r', rot=30, secondary_y=True)
ax.get_xaxis().set_ticks([])

Here is an example:
Setup
In [1]: from matplotlib import pyplot as plt
import pandas as pd
import numpy as np
%matplotlib inline
df = pd.DataFrame({'x' : np.arange(10),
'y1' : np.random.rand(10,),
'y2' : np.square(np.arange(10))})
df
Out[1]: x y1 y2
0 0 0.451314 0
1 1 0.321124 1
2 2 0.050852 4
3 3 0.731084 9
4 4 0.689950 16
5 5 0.581768 25
6 6 0.962147 36
7 7 0.743512 49
8 8 0.993304 64
9 9 0.666703 81
Plot
In [2]: fig, ax1 = plt.subplots()
ax1.plot(df['x'], df['y1'], 'b-')
ax1.set_xlabel('Series')
ax1.set_ylabel('Random', color='b')
for tl in ax1.get_yticklabels():
tl.set_color('b')
ax2 = ax1.twinx() # Note twinx, not twiny. I was wrong when I commented on your question.
ax2.plot(df['x'], df['y2'], 'ro')
ax2.set_ylabel('Square', color='r')
for tl in ax2.get_yticklabels():
tl.set_color('r')
Out[2]:

Matplotlib showing x-tick labels overlapping

Have a look at the graph below:
It's a subplot of this larger figure:
I see two problems with it. First, the x-axis labels overlap with one another (this is my major issue). Second. the location of the x-axis minor gridlines seems a bit wonky. On the left of the graph, they look properly spaced. But on the right, they seem to be crowding the major gridlines...as if the major gridline locations aren't proper multiples of the minor tick locations.
My setup is that I have a DataFrame called df which has a DatetimeIndex on the rows and a column called value which contains floats. I can provide an example of the df contents in a gist if necessary. A dozen or so lines of df are at the bottom of this post for reference.
Here's the code that produces the figure:
now = dt.datetime.now()
fig, axes = plt.subplots(2, 2, figsize=(15, 8), dpi=200)
for i, d in enumerate([360, 30, 7, 1]):
ax = axes.flatten()[i]
earlycut = now - relativedelta(days=d)
data = df.loc[df.index>=earlycut, :]
ax.plot(data.index, data['value'])
ax.xaxis_date()
ax.get_xaxis().set_minor_locator(mpl.ticker.AutoMinorLocator())
ax.get_yaxis().set_minor_locator(mpl.ticker.AutoMinorLocator())
ax.grid(b=True, which='major', color='w', linewidth=1.5)
ax.grid(b=True, which='minor', color='w', linewidth=0.75)
What is my best option here to get the x-axis labels to stop overlapping each other (in each of the four subplots)? Also, separately (but less urgently), what's up with the minor tick issue in the top-left subplot?
I am on Pandas 0.13.1, numpy 1.8.0, and matplotlib 1.4.x.
Here's a small snippet of df for reference:
id scale tempseries_id value
timestamp
2014-11-02 14:45:10.302204+00:00 7564 F 1 68.0000
2014-11-02 14:25:13.532391+00:00 7563 F 1 68.5616
2014-11-02 14:15:12.102229+00:00 7562 F 1 68.9000
2014-11-02 14:05:13.252371+00:00 7561 F 1 69.0116
2014-11-02 13:55:11.792191+00:00 7560 F 1 68.7866
2014-11-02 13:45:10.782227+00:00 7559 F 1 68.6750
2014-11-02 13:35:10.972248+00:00 7558 F 1 68.4500
2014-11-02 13:25:10.362213+00:00 7557 F 1 68.1116
2014-11-02 13:15:10.822247+00:00 7556 F 1 68.2250
2014-11-02 13:05:10.102200+00:00 7555 F 1 68.5616
2014-11-02 12:55:10.292217+00:00 7554 F 1 69.0116
2014-11-02 12:45:10.382226+00:00 7553 F 1 69.3500
2014-11-02 12:35:10.642245+00:00 7552 F 1 69.2366
2014-11-02 12:25:12.642255+00:00 7551 F 1 69.1250
2014-11-02 12:15:11.122382+00:00 7550 F 1 68.7866
2014-11-02 12:05:11.332224+00:00 7549 F 1 68.5616
2014-11-02 11:55:11.662311+00:00 7548 F 1 68.2250
2014-11-02 11:45:11.122193+00:00 7547 F 1 68.4500
2014-11-02 11:35:11.162271+00:00 7546 F 1 68.7866
2014-11-02 11:25:12.102211+00:00 7545 F 1 69.2366
2014-11-02 11:15:10.422226+00:00 7544 F 1 69.4616
2014-11-02 11:05:11.412216+00:00 7543 F 1 69.3500
2014-11-02 10:55:10.772212+00:00 7542 F 1 69.1250
2014-11-02 10:45:11.332220+00:00 7541 F 1 68.7866
2014-11-02 10:35:11.332232+00:00 7540 F 1 68.5616
2014-11-02 10:25:11.202411+00:00 7539 F 1 68.2250
2014-11-02 10:15:11.932326+00:00 7538 F 1 68.5616
2014-11-02 10:05:10.922229+00:00 7537 F 1 68.9000
2014-11-02 09:55:11.602357+00:00 7536 F 1 69.3500
Edit: Trying fig.autofmt_xdate():
I don't think this going to do the trick. This seems to use the same x-tick labels for both graphs on the left and also for both graphs on the right. Which is not correct given my data. Please see the problematic output below:

Ok, finally got it working. The trick was to use plt.setp to manually rotate the tick labels. Using fig.autofmt_xdate() did not work as it does some unexpected things when you have multiple subplots in your figure. Here's the working code with its output:
for i, d in enumerate([360, 30, 7, 1]):
ax = axes.flatten()[i]
earlycut = now - relativedelta(days=d)
data = df.loc[df.index>=earlycut, :]
ax.plot(data.index, data['value'])
ax.get_xaxis().set_minor_locator(mpl.ticker.AutoMinorLocator())
ax.get_yaxis().set_minor_locator(mpl.ticker.AutoMinorLocator())
ax.grid(b=True, which='major', color='w', linewidth=1.5)
ax.grid(b=True, which='minor', color='w', linewidth=0.75)
plt.setp(ax.get_xticklabels(), rotation=30, horizontalalignment='right')
fig.tight_layout()
By the way, the comment earlier about some matplotlib things taking forever is very interesting here. I'm using a raspberry pi to act as a weather station at a remote location. It's collecting the data and serving the results via the web. And boy oh boy, it's really wheezing trying to put out these graphics.

Due to the way text rendering is handled in matplotlib, auto-detecting overlapping text really slows things down. (The space that text takes up can't be accurately calculated until after it's been drawn.) For that reason, matplotlib doesn't try to do this automatically.
Therefore, it's best to rotate long tick labels. Because dates most commonly have this problem, there's a figure method fig.autofmt_xdate() that will (among other things) rotate the tick labels to make them a bit more readable. (Note: If you're using a pandas plot method, it returns an axes object, so you'll need to use ax.figure.autofmt_xdate().)
As a quick example:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
time = pd.date_range('01/01/2014', '4/01/2014', freq='H')
values = np.random.normal(0, 1, time.size).cumsum()
fig, ax = plt.subplots()
ax.plot_date(time, values, marker='', linestyle='-')
fig.autofmt_xdate()
plt.show()
If we were to leave fig.autofmt_xdate() out:
And if we use fig.autofmt_xdate():

For the problems which don't have date values in x axis, rather a string, you can insert \n character in x axis values so they don't overlap. Here is an example -
The data frame is
somecol value
category 1 of column 16
category 2 of column 13
category 3 of column 21
category 4 of column 20
category 5 of column 11
category 6 of column 22
category 7 of column 19
category 8 of column 14
category 9 of column 18
category 10 of column 23
category 11 of column 10
category 12 of column 24
category 13 of column 17
category 14 of column 15
category 15 of column 12
I need to plot value on y axis and somecol on x axis, which will normally be plotted like this -
As you can see, there is a lot of overlap. Now introduce \n character in somecol column.
somecol = df['somecol'].values.tolist()
for i in range(len(somecol)):
x = somecol[i].split(' ')
# insert \n before 'of'
x.insert(x.index('of'),'\n')
somecol[i] = ' '.join(x)
Now if you plot, it will look like this -
plt.plot(somecol, df['val'])
This method works well if you don't want to rotate your labels.
The only con so far I found in this method is that you need to tweak your labels 3-4 times i.e., try with multiple formats to display the plot in best format.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Lines not showing up on Matplotlib graph - python

Related

Highlight time interval in multivariate time-series plot using matplotlib and seaborn

How to make a date-based color bar based on df.idxmax series?

Seaborn distplot only whole numbers

Pandas dataframe plotting - issue when switching from two subplots to single plot w/ secondary axis

Matplotlib showing x-tick labels overlapping

Categories

Resources