Matplotlib: custom ticker for pandas MultiIndex DataFrame - python

I have a large pandas MultiIndex DataFrame that I would like to plot. A minimal example would look like:
import pandas as pd
years = range(2015, 2018)
fields = range(4)
days = range(4)
bands = ['R', 'G', 'B']
index = pd.MultiIndex.from_product(
[years, fields], names=['year', 'field'])
columns = pd.MultiIndex.from_product(
[days, bands], names=['day', 'band'])
df = pd.DataFrame(0, index=index, columns=columns)
df.loc[(2015,), (0,)] = 1
df.loc[(2016,), (1,)] = 1
df.loc[(2017,), (2,)] = 1
If I plot this using plt.spy, I get:
However, the tick locations and labels are less than desirable. I would like the ticks to completely ignore the second level of the MultiIndex. Using IndexLocator and IndexFormatter, I'm able to do the following:
from matplotlib.ticker import IndexFormatter, IndexLocator
import matplotlib.pyplot as plt
ax = plt.gca()
plt.spy(df)
xbase = len(bands)
xoffset = xbase / 2
xlabels = df.columns.get_level_values('day')
ax.xaxis.set_major_locator(IndexLocator(base=xbase, offset=xoffset))
ax.xaxis.set_major_formatter(IndexFormatter(xlabels))
plt.xlabel('Day')
ax.xaxis.tick_bottom()
ybase = len(fields)
yoffset = ybase / 2
ylabels = df.index.get_level_values('year')
ax.yaxis.set_major_locator(IndexLocator(base=ybase, offset=yoffset))
ax.yaxis.set_major_formatter(IndexFormatter(ylabels))
plt.ylabel('Year')
plt.show()
This gives me exactly what I want:
But here's the problem. My actual DataFrame has 15 years, 4,000 fields, 365 days, and 7 bands. If I actually label every single day, the labels would be illegible. I could place a tick every 50 days, but I would like the ticks to be dynamic so that when I zoom in, the ticks become more fine-grained. Basically what I'm looking for is a custom MultiIndexLocator that combines the placement of IndexLocator with the dynamism of MaxNLocator.
Bonus: My data is really nice in the sense that there are always the same number of fields for every year and the same number of bands for every day. But what if this was not the case? I would love to contribute a generic MultiIndexLocator and MultiIndexFormatter to matplotlib that works for any MultiIndex DataFrame.

Matplotlib does not know about dataframes or MultiIndex. It simply plots the data you supply. I.e. you get the same as if you were plotting the numpy array of data, spy(df.values).
So I would suggest first setting the extent of the image correctly, so that you can use numeric tickers. A MaxNLocator should then work fine, as long as you do not zoom in too much.
import numpy as np
import pandas as pd
from matplotlib.ticker import MaxNLocator
import matplotlib.pyplot as plt
plt.rcParams['axes.formatter.useoffset'] = False
years = range(2000, 2018)
fields = range(9) #17
days = range(120) #365
bands = ['R', 'G', 'B', 'A']
index = pd.MultiIndex.from_product(
[years, fields], names=['year', 'field'])
columns = pd.MultiIndex.from_product(
[days, bands], names=['day', 'band'])
data = np.random.rand(len(years)*len(fields),len(days)*len(bands))
x,y = np.meshgrid(np.arange(data.shape[1]),np.arange(data.shape[0]))
data += 2*((y//len(fields)+x//len(bands)) % 2)
df = pd.DataFrame(data, index=index, columns=columns)
############
# Plotting
############
xbase = len(bands)
xlabels = df.columns.get_level_values('day')
ybase = len(fields)
ylabels = df.index.get_level_values('year')
# imshow draws row 0 at the top, so pass extent=[left, right, bottom, top]
# with the last year at the bottom and the first year at the top.
dx = np.diff(np.unique(xlabels))[0]/2.
dy = np.diff(np.unique(ylabels))[0]/2.
extent = [xlabels.min()-dx, xlabels.max()+dx,
          ylabels.max()+dy, ylabels.min()-dy]
fig, ax = plt.subplots()
ax.imshow(df.values, extent=extent, aspect="auto")
ax.set_ylabel('Year')
ax.set_xlabel('Day')
ax.xaxis.set_major_locator(MaxNLocator(integer=True,min_n_ticks=1))
ax.yaxis.set_major_locator(MaxNLocator(integer=True,min_n_ticks=1))
plt.show()
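For the "bonus" case, where the outer level is not numeric or the groups have unequal sizes, the extent trick no longer applies. One possible workaround, sketched below, keeps the integer cell positions and lets a FuncFormatter look up the outer-level label for whatever positions MaxNLocator picks. The helper name level_formatter and the small example frame are made up for illustration and are not part of Matplotlib or pandas. The ticks land on individual cells rather than on the centre of each group.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter, MaxNLocator


def level_formatter(index, level):
    """Label integer cell positions with one level of a MultiIndex.

    Hypothetical helper for illustration only.
    """
    labels = index.get_level_values(level)

    def fmt(x, pos=None):
        i = int(round(x))
        # No label for positions between cells or outside the data.
        if abs(x - i) > 1e-9 or not 0 <= i < len(labels):
            return ''
        return str(labels[i])

    return FuncFormatter(fmt)


# Example frame with string labels and unequal group sizes on the rows.
index = pd.MultiIndex.from_tuples(
    [('a', 0), ('a', 1), ('a', 2), ('b', 0), ('c', 0), ('c', 1)],
    names=['group', 'field'])
columns = pd.MultiIndex.from_product(
    [range(10), ['R', 'G', 'B']], names=['day', 'band'])
df = pd.DataFrame(np.random.rand(len(index), len(columns)),
                  index=index, columns=columns)

fig, ax = plt.subplots()
ax.imshow(df.values, aspect='auto')  # cell i is centred on integer position i
for axis, idx, level in [(ax.xaxis, df.columns, 'day'),
                         (ax.yaxis, df.index, 'group')]:
    # MaxNLocator re-computes the ticks on zoom; the formatter supplies labels.
    axis.set_major_locator(MaxNLocator(integer=True))
    axis.set_major_formatter(level_formatter(idx, level))
```

Call plt.show() to display the figure and zoom interactively.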

Related

How to annotate points in a scatterplot based on a pandas column

I want 'Age' on the x-axis, 'Pos' on the y-axis, and the 'Player' names as labels, but for some reason I am not able to label the points.
Code:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import adjustText as at
data = pd.read_excel("path to the file")
fig, ax = plt.subplots()
fig.set_size_inches(7,3)
df = pd.DataFrame(data, columns = ['Player', 'Pos', 'Age'])
df.plot.scatter(x='Age',
                y='Pos',
                c='DarkBlue', xticks=([15,20,25,30,35,40]))
y = df.Player
texts = []
for i, txt in enumerate(y):
    plt.text()
at.adjust_text(texts, arrowprops=dict(arrowstyle="simple, head_width=0.25, tail_width=0.05", color='black', lw=0.5, alpha=0.5))
plt.show()
Summary of the data :
df.head()
             Player Pos  Age
0  Thibaut Courtois  GK   28
1     Karim Benzema  FW   32
2      Sergio Ramos  DF   34
3    Raphael Varane  DF   27
4       Luka Modric  MF   35
Error :
ConversionError: Failed to convert value(s) to axis units: 'GK'
This is the plot so far; not able to label these points:
EDIT:
This is what I wanted but of all points:
Also, Could anyone help me in re-ordering the labels on the yaxis.
Like, I wanted FW,MF,DF,GK as my order but the plot is in MF,DF,FW,GK.
Thanks.
A similar solution was described here. Essentially, you want to annotate the points in your scatter plot.
I have stripped down your code. Note that you need to plot the data with matplotlib (and not with pandas): ax.scatter(x=df['Age'], y=df['Pos'], c='DarkBlue'). In this way, you can use the ax.annotate() method.
import matplotlib.pyplot as plt
import pandas as pd
# build data
data = [
['Thibaut Courtois', 'GK', 28],
['Karim Benzema', 'FW', 32],
['Sergio Ramos','DF', 34],
['Raphael Varane', 'DF', 27],
['Luka Modric', 'MF', 35],
]
# create pandas DataFrame
df = pd.DataFrame(data, columns = ['Player', 'Pos', 'Age'])
# open figure + axis
fig, ax = plt.subplots()
# plot
ax.scatter(x=df['Age'],y=df['Pos'],c='DarkBlue')
# set labels
ax.set_xlabel('Age')
ax.set_ylabel('Pos')
# annotate points in axis
for idx, row in df.iterrows():
    ax.annotate(row['Player'], (row['Age'], row['Pos']) )
# force matplotlib to draw the graph
plt.show()
This is what you'll get as output:
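The question also asks for the y-axis order FW, MF, DF, GK, which the code above does not address. One way to get it, sketched here under the assumption that every position in the data appears in the chosen order list, is to map each category to an integer position, plot against those integers, and then relabel the ticks. The names order and ypos are introduced only for this example.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame(
    [['Thibaut Courtois', 'GK', 28],
     ['Karim Benzema', 'FW', 32],
     ['Sergio Ramos', 'DF', 34],
     ['Raphael Varane', 'DF', 27],
     ['Luka Modric', 'MF', 35]],
    columns=['Player', 'Pos', 'Age'])

# Desired category order, listed from top to bottom.
order = ['FW', 'MF', 'DF', 'GK']
ypos = df['Pos'].map({pos: i for i, pos in enumerate(order)})

fig, ax = plt.subplots()
ax.scatter(df['Age'], ypos, c='DarkBlue')
ax.set_yticks(range(len(order)))
ax.set_yticklabels(order)
ax.invert_yaxis()  # position 0 ('FW') ends up at the top
ax.set_xlabel('Age')
ax.set_ylabel('Pos')

# Annotate each point with the player name, slightly offset from the marker.
for name, age, y in zip(df['Player'], df['Age'], ypos):
    ax.annotate(name, (age, y), xytext=(5, 5), textcoords='offset points')
```

Call plt.show() to display the figure.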

hourly heatmap from multi years timeseries python

I need to create an hourly mean multi-plot heatmap of temperature, as in:
for several years. The data to plot are read from an Excel sheet, which is formatted as "year", "month", "day", "hour", "Temp".
I created a monthly mean heatmap with the seaborn library, using this code:
df = pd.read_excel('D:\\Users\\CO2_heatmap.xlsx')
co2=df.pivot_table(index="month",columns="year",values='CO2',aggfunc="mean")
ax = sns.heatmap(co2, cmap='bwr', vmin=370, vmax=430, cbar_kws={'label': r'$\mathregular{CO_2}$ [ppm]', 'orientation': 'vertical'})
Obtaining this graph:
How can I generate a
co2=df.pivot_table(index="hour",columns="day",values='CO2',aggfunc="mean")
for each month and for each year?
As far as I can tell, the seaborn heatmap cannot draw several panels with different axes in one call, so I built the figure by drawing one heatmap into each subplot of a grid. It is not as customizable as the reference graph, sorry.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# two years of hourly dummy data
date_rng = pd.date_range('2018-01-01', '2019-12-31', freq='h')
temp = np.random.randint(-30, 40, size=len(date_rng))
df = pd.DataFrame({'CO2': temp}, index=date_rng)
df.insert(1, 'year', df.index.year)
df.insert(2, 'month', df.index.month)
df.insert(3, 'day', df.index.day)
df.insert(4, 'hour', df.index.hour)
yyyy = df['year'].unique()
# one row of 12 monthly panels per year
fig, axes = plt.subplots(figsize=(20,10), nrows=2, ncols=12)
for m, ax in zip(range(1,25), axes.flat):
    if m <= 12:
        y = yyyy[0]
    else:
        y = yyyy[1]
        m -= 12
    df1 = df[(df['year'] == y) & (df['month'] == m)]
    # hour-by-day mean for this month
    df1 = df1.pivot_table(index="hour", columns="day", values='CO2', aggfunc="mean")
    sns.heatmap(df1, cmap='RdBu', cbar=False, ax=ax)
plt.show()
This might help- /hourly-heatmap-graph-using-python-s-ggplot2-implementation-plotnine
There's also a guide to producing this exact plot (for two years of data) on the
Python graph gallery-heatmap-for-timeseries-matplotlib
I'm afraid I don't know any Python, so didn't want to copy/paste in case I missed anything. I did, however, create the original plot in R :) The main trick was to use facet_grid to split the data by year and month, and reverse the y axis labels.
It looks like
fig, axes = plt.subplots(2, 12, figsize=(14, 10), sharey=True)
for i, year in enumerate([2004, 2005]):
    for j, month in enumerate(range(1, 13)):
        single_plot(data, month, year, axes[i, j])
does the work of splitting by year and month.
I hope this helps you get further forward
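The loop quoted above calls single_plot without showing its body. The version below is only a guess at what such a helper could look like, not the gallery's implementation. It assumes a DataFrame with 'year', 'month', 'day', 'hour' and 'CO2' columns, as in the question, with complete months and all 24 hours present. The dummy data are random.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def single_plot(data, month, year, ax):
    """Draw the hour-by-day mean of one month into ax (hypothetical helper)."""
    subset = data[(data['year'] == year) & (data['month'] == month)]
    grid = subset.pivot_table(index='hour', columns='day',
                              values='CO2', aggfunc='mean')
    ndays = grid.shape[1]
    # Row 0 (hour 0) is drawn at the top; the extent centres cells on day/hour.
    im = ax.imshow(grid.values, aspect='auto', cmap='RdBu',
                   extent=[0.5, ndays + 0.5, 23.5, -0.5])
    ax.set_title(f'{month}/{year}', fontsize=8)
    return im


# Two years of hourly dummy data.
rng = pd.date_range('2004-01-01', '2005-12-31 23:00', freq='h')
data = pd.DataFrame({'CO2': np.random.uniform(370, 430, len(rng)),
                     'year': rng.year, 'month': rng.month,
                     'day': rng.day, 'hour': rng.hour})

fig, axes = plt.subplots(2, 12, figsize=(14, 10), sharey=True)
images = {}
for i, year in enumerate([2004, 2005]):
    for j, month in enumerate(range(1, 13)):
        images[(year, month)] = single_plot(data, month, year, axes[i, j])
```

Call plt.show() to display the grid. The y limits are set explicitly through the extent rather than with invert_yaxis(), because with sharey=True repeated inversions would cancel each other out.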

How to set a time range on the X axis and date range in the Y axis with colormap

I have created a code, which shows a heatmap of the data in the CSV file.
The code is as follows:
import pandas as pd
import matplotlib.pyplot as plt
data= pd.read_csv("data.csv" , sep=';', header=0,
index_col='Date')
fig=plt.imshow(data, cmap='YlOrBr', interpolation='nearest')
plt.colorbar()
plt.xlabel("Time (UTC)")
plt.ylabel("Date")
plt.show()
The dataset is as follows:
The time range varies from 00:00 till 23:50 with steps of 10 minutes.
I want the x axis to show the time from 00:00 till 23:50 in steps per hour.
The index is set as date. The date range is from 29-Oct-2017 till 24-Mar-2018.
I want the Y axis to show the date range in steps of months.
You can stack columns, then groupby month and hour and then unstack it back (I'm taking mean values here when aggregating, but you can change to sum or whatever aggregation should be done there):
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame(np.nan,
                  columns=pd.date_range('00:00', '23:50', freq='10min'),
                  index=pd.date_range('2017-10-29', '2018-03-24'))
df[df.columns] = np.random.randint(0, 100, df.shape)
fig, ax = plt.subplots(2, figsize=(10,6))
ax[0].imshow(df, cmap='YlOrBr')
ix = df.stack().index
l1 = ix.get_level_values(0).month
l2 = ix.get_level_values(1).hour
df2 = df.stack().groupby([l1,l2], sort=False).mean().unstack(1)
ax[1].imshow(df2, cmap='YlOrBr')
Output (original DataFrame above, processed below):
Update:
If the goal is just to put monthly and hourly labels on the same plot, please see below:
df = pd.DataFrame(np.nan,
columns=pd.date_range('00:00', '23:50', freq='10min').astype(str),
index=pd.date_range('2017-10-29', '2018-03-24').astype(str))
df[df.columns] = np.random.randn(*(df.shape))
fig, ax = plt.subplots(1, figsize=(10,6))
l1 = pd.to_datetime(df.index).month
l2 = pd.to_datetime(df.columns).hour
x = pd.Series(l2).drop_duplicates()
y = pd.Series(l1).drop_duplicates()
ax.imshow(df, cmap='YlOrBr')
ax.set_xticks(x.index)
ax.set_xticklabels(x)
ax.set_yticks(y.index)
ax.set_yticklabels(y)
Output:
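An alternative to placing tick positions by hand is to give imshow an extent in real units (hours on x, Matplotlib date numbers on y) and let date and numeric locators choose the ticks, so they stay sensible when zooming. This is a sketch under the assumption of one row per day and 144 ten-minute columns, as described in the question. The data are random, and the names first, last and steps_per_day exist only in this example.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from matplotlib.ticker import MultipleLocator

dates = pd.date_range('2017-10-29', '2018-03-24')  # one row per day
steps_per_day = 24 * 6                              # 10-minute steps
data = np.random.rand(len(dates), steps_per_day)

# Matplotlib date numbers for the top edge of the first row
# and the bottom edge of the last row.
first = mdates.date2num(dates[0].to_pydatetime())
last = mdates.date2num(dates[-1].to_pydatetime()) + 1

fig, ax = plt.subplots(figsize=(10, 6))
# extent=[left, right, bottom, top]; row 0 (the first date) is at the top.
im = ax.imshow(data, cmap='YlOrBr', aspect='auto', interpolation='nearest',
               extent=[0, 24, last, first])
ax.xaxis.set_major_locator(MultipleLocator(1))              # a tick every hour
ax.yaxis.set_major_locator(mdates.MonthLocator())           # a tick every month
ax.yaxis.set_major_formatter(mdates.DateFormatter('%b %Y'))
ax.set_xlabel('Time (UTC), hours')
ax.set_ylabel('Date')
fig.colorbar(im, ax=ax)
```

Call plt.show() to display the figure.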

Pandas and Matplotlib plotting df as subplots with 2 y-axes

I'm trying to plot a dataframe to a few subplots using pandas and matplotlib.pyplot. But I want to have the two columns use different y axes and have those shared between all subplots.
Currently my code is:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'Area':['A', 'A', 'A', 'B', 'B', 'C','C','C','D','D','D','D'],
'Rank':[1,2,3,1,2,1,2,3,1,2,3,4],
'Count':[156,65,152,70,114,110,195,92,44,179,129,76],
'Value':[630,426,312,191,374,109,194,708,236,806,168,812]}
)
df = df.set_index(['Area', 'Rank'])
fig = plt.figure(figsize=(6,4))
for i, l in enumerate(['A','B','C','D']):
    if i == 0:
        sub1 = fig.add_subplot(141+i)
    else:
        sub1 = fig.add_subplot(141+i, sharey=sub1)
    df.loc[l].plot(kind='bar', ax=sub1)
This produces:
This plots the 4 graphs side by side, which is what I want, but both columns use the same y-axis. I'd like the 'Count' column to use a common y-axis on the left and the 'Value' column to use a common secondary y-axis on the right.
Can anybody suggest a way to do this? My attempts thus far have led to each graph having its own independent y-axis.
To create a secondary y axis, you can use twinax = ax.twinx(). One can then join those twin axes via the join method of an axes Grouper, twinax.get_shared_y_axes().join(twinax1, twinax2). That method was deprecated in Matplotlib 3.6; on newer versions, twinax2.sharey(twinax1) does the same job. See this question for more details.
The next problem is then to get the two different barplots next to each other. Since I don't think there is a way to do this using the pandas plotting wrappers, one can use a matplotlib bar plot, which allows to specify the bar position quantitatively. The positions of the left bars would then be shifted by the bar width.
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'Area':['A', 'A', 'A', 'B', 'B', 'C','C','C','D','D','D','D'],
'Rank':[1,2,3,1,2,1,2,3,1,2,3,4],
'Count':[156,65,152,70,114,110,195,92,44,179,129,76],
'Value':[630,426,312,191,374,109,194,708,236,806,168,812]}
)
df = df.set_index(['Area', 'Rank'])
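fig, axes = plt.subplots(ncols=len(df.index.levels[0]), figsize=(6,4), sharey=True)
twinaxes = []
for i, l in enumerate(df.index.levels[0]):
    # left bars: 'Count' on the primary (shared) y axis
    axes[i].bar(df["Count"].loc[l].index.values-0.4, df["Count"].loc[l], width=0.4, align="edge")
    # right bars: 'Value' on a secondary y axis
    ax2 = axes[i].twinx()
    twinaxes.append(ax2)
    ax2.bar(df["Value"].loc[l].index.values, df["Value"].loc[l], width=0.4, align="edge", color="C3")
    # set ticks and label on the primary axes; twinx() hides the twin's x axis
    axes[i].set_xticks(df["Value"].loc[l].index.values)
    axes[i].set_xlabel("Rank")
# share the secondary y axes; Axes.sharey replaces get_shared_y_axes().join(),
# which was deprecated in Matplotlib 3.6
for ax in twinaxes[1:]:
    ax.sharey(twinaxes[0])
# keep the right-hand tick labels on the last subplot only
for ax in twinaxes[:-1]:
    ax.tick_params(labelright=False)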
axes[0].set_ylabel("Count")
axes[0].yaxis.label.set_color('C0')
axes[0].tick_params(axis='y', colors='C0')
twinaxes[-1].set_ylabel("Value")
twinaxes[-1].yaxis.label.set_color('C3')
twinaxes[-1].tick_params(axis='y', colors='C3')
twinaxes[0].relim()
twinaxes[0].autoscale_view()
plt.show()

convert Panda Line graph to Bar graph with month name

I am trying to convert a line graph to a bar graph using Python pandas.
Here is my code, which gives a perfect line graph as per my requirement.
conn = sqlite3.connect('Demo.db')
collection = ['ABC','PQR']
df = pd.read_sql("SELECT * FROM Table where ...", conn)
df['DateTime'] = df['Timestamp'].apply(lambda x: dt.datetime.fromtimestamp(x))
df.groupby('Type').plot(x='DateTime', y='Value',linewidth=2)
plt.legend(collection)
plt.show()
Here is my DataFrame df
http://postimg.org/image/75uy0dntf/
Here is my Line graph output from above code.
http://postimg.org/image/vc5lbi9xv/
I want to draw a bar graph instead of a line graph. I want the month name on the x axis and the value on the y axis. I want a colorful bar graph.
Attempt made
df.plot(x='DateTime', y='Value',linewidth=2, kind='bar')
plt.show()
It gives an improper bar graph with date and time (instead of month and year) on the x axis. Thank you for your help.
Here is a code that might do what you want.
In this code, I first sort your database by time. This step is important, because I use the indices of the sorted database as abscissa of your plots, instead of the timestamp. Then, I group your data frame by type and I plot manually each group at the right position (using the sorted index). Finally, I re-define the ticks and the tick labels to display the date in a given format (in this case, I chose MM/YYYY but that can be changed).
import datetime
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
types = ['ABC','BCD','PQR']*3
vals = [126,1587,141,10546,1733,173,107,780,88]
ts = [1414814371, 1414814371, 1406865621, 1422766793, 1422766793, 1425574861, 1396324799, 1396324799, 1401595199]
aset = list(zip(types, vals, ts))
df = pd.DataFrame(data=aset, columns=['Type', 'Value', 'Timestamp'])
# sort by time, then renumber the rows so the index follows the sorted order
df = df.sort_values(['Timestamp', 'Type']).reset_index(drop=True)
df['Date'] = df['Timestamp'].apply(lambda x: datetime.datetime.fromtimestamp(x).strftime('%m/%Y'))
groups = df.groupby('Type')
ngroups = len(groups)
colors = ['r', 'g', 'b']
fig = plt.figure()
ax = fig.add_subplot(111, position=[0.15, 0.15, 0.8, 0.8])
offset = 0.1
width = 1-2*offset
# draw each group's bars at the positions given by the sorted index
for j, group in enumerate(groups):
    x = group[1].index+offset
    y = group[1].Value
    ax.bar(x, y, width=width, align='edge', color=colors[j], label=group[0])
xmin, xmax = min(df.index), max(df.index)+1
ax.set_xlim([xmin, xmax])
ax.tick_params(axis='x', which='both', top=False, bottom=False)
plt.xticks(np.arange(xmin, xmax)+0.5, list(df['Date']), rotation=90)
ax.legend()
plt.show()
I hope this works for you. This is the output that I get, given my subset of your database.
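If one bar per type and month is enough, a shorter route may be to aggregate first and let pandas draw grouped bars. The sketch below reuses the sample values from the code above. It assumes that summing the values within a month is the desired aggregation, and it reads the timestamps as UTC, whereas datetime.fromtimestamp uses local time. The names table and labelled are introduced only for this example.

```python
import pandas as pd
import matplotlib.pyplot as plt

types = ['ABC', 'BCD', 'PQR'] * 3
vals = [126, 1587, 141, 10546, 1733, 173, 107, 780, 88]
ts = [1414814371, 1414814371, 1406865621, 1422766793, 1422766793,
      1425574861, 1396324799, 1396324799, 1401595199]
df = pd.DataFrame({'Type': types, 'Value': vals, 'Timestamp': ts})

# Convert the Unix timestamps (seconds, read as UTC) to calendar months.
df['Month'] = pd.to_datetime(df['Timestamp'], unit='s').dt.to_period('M')

# One row per month, one column per type; months come out in sorted order.
table = df.pivot_table(index='Month', columns='Type',
                       values='Value', aggfunc='sum')

# Use month names as row labels, then draw one coloured bar per type.
labelled = table.copy()
labelled.index = table.index.strftime('%b %Y')
ax = labelled.plot(kind='bar', rot=45)
ax.set_ylabel('Value')
```

Call plt.show() to display the chart.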
