I am using Matplotlib's PdfPages to plot various figures and tables from queried data and generate a PDF. I want to group plots into sections such as "Stage 1", "Stage 2", and "Stage 3" by creating section headers. For example, in a Jupyter notebook I can make a cell Markdown and create bold headers. However, I am not sure how to do something similar with PdfPages. One idea I had was to generate a one-cell table containing the section title. But instead of a single cell, the table ends up with one cell per character of the title.
import matplotlib.pyplot as plt
fig = plt.figure(figsize=(12, 2))
ax = plt.subplot(111)
ax.axis('off')
tab = ax.table(cellText=['Stage 1'], bbox=[0, 0, 1, 1])
tab.auto_set_font_size(False)
tab.set_fontsize(24)
This results in the following output:
If anyone has suggestions for how to create section headers or at least fix the cell issue in the table I created, I would appreciate your input. Thanks!
You need to use colLabels to name the columns and use the cellText with a corresponding shape
import numpy as np
import matplotlib.pyplot as plt
fig = plt.figure(figsize=(12, 2))
ax = plt.subplot(111)
ax.axis('off')
length = 7
colLabels = ['Stage %s' %i for i in range(1,length+1)] # <--- 1 row, 7 columns
cellText = np.random.randint(0, 10, (1,length))
tab = ax.table(cellText=cellText, colLabels=colLabels, bbox=[0, 0, 1, 1], cellLoc = 'center')
tab.auto_set_font_size(False)
tab.set_fontsize(14)
Table with multiple rows
cellText = np.random.randint(0, 10, (3,length)) # <--- 3 rows, 7 columns
tab = ax.table(cellText=cellText, colLabels=colLabels, bbox=[0, 0, 1, 1], cellLoc = 'center')
To get a single row with multiple columns starting from 2 rows, 7 columns
tab = ax.table(cellText=[['']*length], colLabels=colLabels, bbox=[0, 0, 1, 1], cellLoc = 'center')
cells=tab.get_celld()
for i in range(length):
    cells[(1,i)].set_height(0)
Getting a single column
Using
length = 1
in the above code produces a table with a single column.
A table expects a two-dimensional cellText, i.e. the mth column of the nth row has the content cellText[n][m]. If cellText=['Stage 1'], cellText[0][0] evaluates to "S", because there is one row and the string inside it is indexed as the columns. Instead you probably want to use
ax.table(cellText=[['Stage 1']])
i.e. the whole text as the first column of the first row.
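To make the difference concrete, here is a minimal sketch that builds both tables on the same axes and counts their cells. The Agg backend and the variable names n_bad and n_good are my own choices for illustration, not part of the original question.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(12, 2))
ax.axis("off")

# A flat list holding one string is read as one row whose
# columns are the individual characters of that string.
bad = ax.table(cellText=["Stage 1"], bbox=[0, 0, 1, 1])
n_bad = len(bad.get_celld())

# A nested list gives one row that contains exactly one column.
good = ax.table(cellText=[["Stage 1"]], bbox=[0, 0, 1, 1], cellLoc="center")
n_good = len(good.get_celld())

print(n_bad, n_good)
```

The first table gets one cell per character, the second a single cell, which is the behaviour described above.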
Now, the underlying question seems to be how to add a section title, and a table may not be the best approach for that. A similar result can be achieved with ordinary text:
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(12, 2))
ax.tick_params(labelleft=False, left=False, labelbottom=False, bottom=False)
ax.annotate('Stage 1', (.5,.5), ha="center", va="center", fontsize=24)
plt.show()
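Since the question is about PdfPages, the same text idea can be turned into a separate title page that is saved before each group of plots. This is only a sketch under my own assumptions: the helper name add_section_page, the output file name report.pdf, and the placeholder line plot are all hypothetical.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages


def add_section_page(pdf, title, figsize=(12, 2)):
    """Hypothetical helper: save a page holding only a centered, bold title."""
    fig = plt.figure(figsize=figsize)
    fig.text(0.5, 0.5, title, ha="center", va="center",
             fontsize=24, fontweight="bold")
    pdf.savefig(fig)
    plt.close(fig)


pdf_path = os.path.join(tempfile.mkdtemp(), "report.pdf")
with PdfPages(pdf_path) as pdf:
    for stage in ("Stage 1", "Stage 2", "Stage 3"):
        add_section_page(pdf, stage)          # section header page
        fig, ax = plt.subplots()
        ax.plot([0, 1, 2], [0, 1, 4])         # placeholder data for the section
        ax.set_title(f"{stage}: example plot")
        pdf.savefig(fig)
        plt.close(fig)
    page_count = pdf.get_pagecount()          # header + plot page per stage
```

Each section then starts with its own header page, followed by however many figures belong to it.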
I may be misunderstanding your question, but if your ultimate goal is to group multiple plots together in PDF, one solution is to make each of your plots a subplot of the same figure. For example:
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages
import random
# Declare the PDF file and the single figure
pp = PdfPages('test.pdf')
thefig = plt.figure()
thefig.suptitle("Group 1")
# Generate 4 subplots for the same figure, arranged in a 2x2 grid
subplots = [ ["Plot One", 221], ["Plot Two", 222],
             ["Plot Three", 223], ["Plot Four", 224] ]
for [subplot_title, grid_position] in subplots:
    plt.subplot(grid_position)
    plt.title(subplot_title)
    # Make a random bar graph:
    plt.bar(range(1,11), [ random.random() for i in range(10) ])
# Add some spacing, so that the writing doesn't overlap
plt.subplots_adjust(hspace=0.35, wspace=0.35)
# Finish
pp.savefig()
pp.close()
When I do this, I get something like the following:
I have this data (df) and I get their percentages (data=rel) and plotted a stacked bar graph.
Now I want to add values (non percentage values) to the centers of each bar but from my first dataframe.
My code for now:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from csv import reader
import seaborn as sns
df = pd.DataFrame({'IL':['Balıkesir', 'Bursa', 'Çanakkale', 'Edirne', 'İstanbul', 'Kırklareli', 'Kocaeli', 'Sakarya','Tekirdağ','Yalova'],'ENGELLIUYGUN':[7,13,3,1,142,1,14,1,2,2],'ENGELLIUYGUNDEGIL':[1,5,0,0,55,0,3,0,1,0]})
iller=df.iloc[:,[0]]
df_total = df["ENGELLIUYGUN"] + df["ENGELLIUYGUNDEGIL"]
df_rel = df[df.columns[1:]].div(df_total, 0)*100
rel=[]
rel=pd.DataFrame(df_rel)
rel['İller'] = iller
d=df.iloc[:,[1]] #I want to add these values to the center of blue bars.
f=df.iloc[:,[2]] #I want to add these values to the center of green bars.
sns.set_theme (style='whitegrid')
ax=rel.plot(x='İller',kind='bar', stacked=True, color=["#3a88e2","#5c9e1e"], label=("Uygun","Uygun Değil"))
plt.legend(["Evet","Hayır"],fontsize=8, bbox_to_anchor=(1, 0.5))
plt.xlabel('...........',fontsize=12)
plt.ylabel('..........',fontsize=12)
plt.title('.............',loc='center',fontsize=14)
plt.ylim(0,100)
ax.yaxis.grid(color='gray', linestyle='dashed')
plt.show()
I have this for now:
I want the exact same style of this photo:
I am using Anaconda-Jupyter Notebook.
Answering: I want to add values (non percentage values) to the centers of each bar but from my first dataframe.
The correct way to annotate bars is with .bar_label, as explained in this answer.
The values from df can be sent to the label= parameter instead of the percentages.
This answer shows how to succinctly calculate the percentages, but plots the counts and annotates with percentage and value, whereas this OP wants to plot the percentage on the y-axis and annotate with counts.
This answer shows how to place the legend at the bottom of the plot.
This answer shows how to format the axis tick labels as percent.
See pandas.DataFrame.plot for an explanation of the available parameters.
I am using Anaconda-Jupyter Notebook. Everything from the comment, # plot percent; ..., should be in the same notebook cell.
Tested in python 3.11, pandas 1.5.2, matplotlib 3.6.2
import pandas as pd
import matplotlib.ticker as tkr
# sample data
df = pd.DataFrame({'IL': ['Balıkesir', 'Bursa', 'Çanakkale', 'Edirne', 'İstanbul', 'Kırklareli', 'Kocaeli', 'Sakarya','Tekirdağ','Yalova'],
'ENGELLIUYGUN': [7, 13, 3, 1, 142, 1, 14, 1, 2, 2],
'ENGELLIUYGUNDEGIL': [1, 5, 0, 0, 55, 0, 3, 0, 1, 0]})
# set IL as the index
df = df.set_index('IL')
# calculate the percent
per = df.div(df.sum(axis=1), axis=0).mul(100)
# plot percent; adjust rot= for the rotation of the xtick labels
ax = per.plot(kind='bar', stacked=True, figsize=(10, 8), rot=0,
color=['#3a88e2', '#5c9e1e'], yticks=range(0, 101, 10),
title='my title', ylabel='', xlabel='')
# move the legend
ax.legend(loc='upper center', bbox_to_anchor=(0.5, -0.05), ncol=2, frameon=False)
# format the y-axis tick labels
ax.yaxis.set_major_formatter(tkr.PercentFormatter())
# iterate through the containers
for c in ax.containers:
    # get the current segment label (a string); corresponds to column / legend
    col = c.get_label()
    # use label to get the appropriate count values from df
    # customize the label to account for cases when there might not be a bar section
    labels = [v if v > 0 else '' for v in df[col]]
    # the following will also work
    # labels = df[col].replace(0, '')
    # add the annotation
    ax.bar_label(c, labels=labels, label_type='center', fontweight='bold')
Alternate Annotation Implementation
Since the column names in df and per are the same, they can be extracted directly from per.
# iterate through the containers and per column names
for c, col in zip(ax.containers, per):
    # add the annotations with custom labels from df
    ax.bar_label(c, labels=df[col].replace(0, ''), label_type='center', fontweight='bold')
I don't think a more subtle method exists, so you have to print those values yourself by adding the text explicitly, which is not that hard to do. For example, if you add this just after your plot
for i in range(len(d)):
    ax.text(i, df_rel.iloc[i,0]/2, d.iloc[i,0], ha='center', fontweight='bold', color='#ffff00', fontsize='small')
    ax.text(i, 50+df_rel.iloc[i,0]/2, f.iloc[i,0], ha='center', fontweight='bold', color='#400040', fontsize='small')
you get this result
You can of course change the color, size, position, etc. (I am well known for my total lack of bon goût in such matters). You can also decide on some arbitrary rule, such as not printing '0'. That is the advantage of doing things explicitly: your code, your rules; you don't have to fight an existing API to convince it to do it your way.
I have a pandas dataframe which I used the pandas.plot function to plot a bar chart. Within the function I set the table function to on. How can I format the values in this accompanying table with comma separators?
I am able to do these to the axis values, just not the accompanying table
I have already tried converting the values to float, but pandas plot only plots integers and therefore gives an error saying 'Empty Dataframe': no numeric data to plot.
ax1 = mydf.plot(kind='bar', title='chart with table', fontsize=8, width=0.75, legend=True, table=True)
ax1.legend(loc=5, bbox_to_anchor=(1.25,0.5), fontsize='x-small')
ax1.axes.get_xaxis().set_visible(False)
ax1.get_yaxis().get_major_formatter().set_scientific(False)
ax1.get_yaxis().set_major_formatter(ticker.StrMethodFormatter('${x:,.0f}'))
ax1.set_ylim(-10000000,10000000)
ax1.set_ylabel("P&L",fontsize=9)
ax1.axhline(0,0,1, color='k', linewidth=0.5)
table_ax1 = ax1.tables[0]
table_ax1.auto_set_font_size(False)
table_ax1.set_fontsize('8')
table_ax1.scale(1,2)
plt.tight_layout()
I don't know of a great way to force that formatting on the table ahead of time without explicitly building the matplotlib table yourself. However, if you need to use the pandas implementation, you can iterate through the contents of the table and convert them afterwards. I also added some code from this related question demonstrating how the table can be manipulated.
import pandas as pd
import matplotlib.pyplot as plt
fig, ax = plt.subplots(1,1, figsize = (5,5))
df= pd.DataFrame({'City': ['LA', 'SF', 'Dallas'],
'Lakes': [10000, 90000, 600000], # lets set some that can be comma formatted
'Rivers': [1, 0, 0],
'State': ['CA', 'CA', 'TX'],
'Waterfalls': [200500, 450000, 50000]})
myplot = df.plot(x=['City','State'],kind='bar',stacked=True,table=True, ax =ax)
### you can also scale and change the table
### see https://stackoverflow.com/questions/39668665/format-a-table-that-was-added-to-a-plot-using-pandas-dataframe-plot
myplot.axes.get_xaxis().set_visible(False)
# Getting the table created by pandas and matplotlib
table = myplot.tables[0]
# Setting the font size
table.set_fontsize(12)
# Rescaling the rows to be more readable
table.scale(1,2)
## to format the table values I will retrieve them and change them when they aren't text labels
xtls = ax.get_xticklabels()
xtls = [i.get_text() for i in xtls]
ax.set_xticklabels([])
t = ax.tables[0]
for c in t.get_children():
    tobj = c.get_text()
    text = tobj.get_text()
    if text not in xtls:
        try: # some texts will be strings that are labels, we can't convert them
            s = '{:,d}'.format(int(text))
            tobj.set_text(s)
        except ValueError:
            pass
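For comparison, the other route mentioned above (building the matplotlib table yourself) could look roughly like the sketch below. The numbers are formatted as strings first and then handed to ax.table; the reduced sample data, the transposed layout, and the name formatted are my own assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"Lakes": [10000, 90000, 600000],
                   "Waterfalls": [200500, 450000, 50000]},
                  index=["LA", "SF", "Dallas"])

fig, ax = plt.subplots(figsize=(6, 5))
df.plot(kind="bar", stacked=True, ax=ax)   # note: no table=True here
ax.get_xaxis().set_visible(False)

# Format every value with thousands separators before the table is built.
# The frame is transposed so that each bar gets its own table column.
formatted = df.T.apply(lambda col: col.map(lambda v: f"{int(v):,}"))

table = ax.table(cellText=formatted.values.tolist(),
                 rowLabels=list(formatted.index),
                 colLabels=list(formatted.columns),
                 cellLoc="center", loc="bottom")
fig.subplots_adjust(bottom=0.2)            # leave room for the table

# Row 0 is the header row, so (1, 0) is the "Lakes" value for "LA".
first_cell = table.get_celld()[(1, 0)].get_text().get_text()
```

The plotted data stays numeric, and only the strings shown in the table carry the comma formatting.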
If I draw the plot using the following code, it works and I can see all the subplots in a single row. I can specifically break the number of cols into three or two and show them. But I have 30 columns and I wanted to use a loop mechanism so that they are plotted in a grid of say 4x4 sub-plots
regressionCols = ['col_a', 'col_b', 'col_c', 'col_d', 'col_e']
sns.pairplot(numerical_df, x_vars=regressionCols, y_vars='price',height=4, aspect=1, kind='scatter')
plt.show()
The code using loop is below. However, I don't see anything rendered.
nr_rows = 4
nr_cols = 4
li_cat_cols = list(regressionCols)
fig, axs = plt.subplots(nr_rows, nr_cols, figsize=(nr_cols*4,nr_rows*4), squeeze=False)
for r in range(0, nr_rows):
    for c in range(0,nr_cols):
        i = r*nr_cols+c
        if i < len(li_cat_cols):
            sns.set(style="darkgrid")
            bp=sns.pairplot(numerical_df, x_vars=li_cat_cols[i], y_vars='price',height=4, aspect=1, kind='scatter')
            bp.set(xlabel=li_cat_cols[i], ylabel='Price')
plt.tight_layout()
plt.show()
Not sure what I am missing.
I think you didn't connect the subplot axes of your grid to the scatter plots generated in the loop.
A solution that uses the pandas plotting interface might work for you.
For example:
1. Define an empty pandas dataframe.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
numerical_df = pd.DataFrame([])
2. Create some random features and price depending on them:
numerical_df['A'] = np.random.randn(100)
numerical_df['B'] = np.random.randn(100)*10
numerical_df['C'] = np.random.randn(100)*-10
numerical_df['D'] = np.random.randn(100)*2
numerical_df['E'] = 20*(np.random.randn(100)**2)
numerical_df['F'] = np.random.randn(100)
numerical_df['price'] = 2*numerical_df['A'] +0.5*numerical_df['B'] - 9*numerical_df['C'] + numerical_df['E'] + numerical_df['D']
3. Define number of rows and columns. Create a subplots space with nr_rows and nr_cols.
nr_rows = 2
nr_cols = 4
fig, axes = plt.subplots(nrows=nr_rows, ncols=nr_cols, figsize=(15, 8))
4. Enumerate each feature in dataframe and plot a scatterplot with price:
for idx, feature in enumerate(numerical_df.columns[:-1]):
    numerical_df.plot(feature, "price", subplots=True,kind="scatter",ax=axes[idx // 4,idx % 4])
where axes[idx // 4, idx % 4] defines the location of each scatterplot in a matrix you create in (3.)
This gives a matrix of scatterplots.
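The loop in the question most likely draws nothing into the grid because seaborn's pairplot is a figure-level function: it builds its own figure on every call instead of using the axes from plt.subplots. A slightly more general variant of the approach above, sketched with plain Axes.scatter, derives the number of rows from the number of features and hides unused grid cells. The random demo data and the names demo_df and features are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for numerical_df
rng = np.random.default_rng(0)
demo_df = pd.DataFrame(rng.normal(size=(100, 6)), columns=list("ABCDEF"))
demo_df["price"] = 2 * demo_df["A"] - 9 * demo_df["C"] + demo_df["E"]

features = [c for c in demo_df.columns if c != "price"]
nr_cols = 4
nr_rows = -(-len(features) // nr_cols)     # ceiling division

fig, axes = plt.subplots(nr_rows, nr_cols,
                         figsize=(4 * nr_cols, 4 * nr_rows), squeeze=False)
flat_axes = axes.ravel()

# Draw one scatter plot per feature directly on its own axes
for ax, feature in zip(flat_axes, features):
    ax.scatter(demo_df[feature], demo_df["price"], s=10)
    ax.set_xlabel(feature)
    ax.set_ylabel("price")

# Hide the grid cells that have no feature to show
for ax in flat_axes[len(features):]:
    ax.set_visible(False)

fig.tight_layout()
n_visible = sum(ax.get_visible() for ax in flat_axes)
```

With 30 columns the same loop fills an 8x4 grid without any hard-coded index arithmetic.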
I have a code in python 3.x which uses matplotlib.
colLabels = ["Name", "Number"]
data = [["Peter", 17], ["Sara", 21], ["John", 33]]
the_table = ax.table(cellText=data,
colLabels=colLabels,
loc='center')
plt.pause(0.1)
The above code is in a loop, now I want to search for the row with "Peter" in first column (it's unique) and edit it so that in every iteration the entry in second column changes. I could clear whole ax and add new table but it's inefficient (I would be redrawing table with multiple rows every 0.1s)
Is there a way to edit this in matplotlib (and how) or should I use some other library (which)?
The text in a matplotlib table can be updated by choosing the cell and setting the text of the cell's Text object, which get_text() returns. E.g.
the_table.get_celld()[(2, 1)].get_text().set_text("new text")
will update the cell in the third row and second column.
An animated example:
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
fig, ax = plt.subplots(figsize=(4,2))
colLabels = ["Name", "Number"]
data = [["Peter", 1], ["Sara", 1], ["John", 1]]
the_table = ax.table(cellText=data,
colLabels=colLabels,
loc='center')
def update(i):
    the_table.get_celld()[(1, 1)].get_text().set_text(str(i))
    the_table.get_celld()[(2, 1)].get_text().set_text(str(i*2))
    the_table.get_celld()[(3, 1)].get_text().set_text(str(i*3))
ani = FuncAnimation(fig, update, frames=20, interval=400)
plt.show()
Finding out which cell needs to be updated is probably best done using the data instead of reading it from the table.
inx = list(zip(*data))[0].index("Peter")
gives you the index 0, such that the cell can be accessed via
the_table.get_celld()[(inx+1, 1)] (note the +1, which is there because of the table headline).
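Putting the lookup and the cell update together could look like the following sketch. The helper name set_number is hypothetical, and it assumes, as in the question, that the names in the first column are unique.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(4, 2))
ax.axis("off")
colLabels = ["Name", "Number"]
data = [["Peter", 17], ["Sara", 21], ["John", 33]]
the_table = ax.table(cellText=data, colLabels=colLabels, loc="center")


def set_number(table, rows, name, value):
    """Hypothetical helper: update the second column of the row named `name`."""
    inx = [row[0] for row in rows].index(name)   # raises ValueError if absent
    rows[inx][1] = value                         # keep the source data in sync
    # +1 because table row 0 holds the column labels
    table.get_celld()[(inx + 1, 1)].get_text().set_text(str(value))


set_number(the_table, data, "Sara", 99)
updated = the_table.get_celld()[(2, 1)].get_text().get_text()
```

Inside a loop or a FuncAnimation callback, only this one Text object changes, so the table does not need to be rebuilt.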
I'm working with a dictionary of values which have a string (date) and float for time in milliseconds. I want to present the data in a bar graph and also with a table below. I have the bar graph working but the table gets messed up. I want the dates as columns and time as a single row.
The dictionary is something like:
time_and_dates_for_plot = {'04-26': 488.1063166666667, '04-27': 289.7289333333333, '04-28': 597.2343999999999, '04-29': 0, '04-30': 0, '05-01': 1061.958075}
plot.bar(range(len(time_and_dates_for_plot)), time_and_dates_for_plot.values(), align='center')
plot.xticks(range(len(time_and_dates_for_plot)), list(time_and_dates_for_plot.keys()))
plot.xlabel('Date (s)')
plot.ylabel('milliseconds')
plot.grid(True)
plot.gca().set_position((.1, .3, .8, .6))
col_labels = list(time_and_dates_for_plot.keys())
print(col_labels)
row_labels = ['ms']
cell_text = []
val = []
for key in time_and_dates_for_plot.keys():
    val.append((time_and_dates_for_plot.get(key)))
    cell_text.append(val)
    val = []
print(cell_text)
plot.table(cellText=cell_text, colLabels=col_labels)
plot.show()
As you can see from the picture, I get all entries under one column, whereas I want one cell of data under each column (just a tabulation of the plot data).
Also, how do I add some padding between the table and graph?
First time I'm using matplotlib and pretty sure I'm missing something. Any help is really appreciated.
In the table function you need an extra pair of brackets []. ...cellText=[cell_text]...
Also, you can use subplots for a better arrangement of the plots. Here, my solution uses subplots with 2 rows, height_ratios of 8 to 1, and an hspace of 0.3.
import matplotlib as mpl
import matplotlib.pyplot as plt
time_and_dates_for_plot = {'04-26': 488.1063166666667,
'04-27': 289.7289333333333,
'04-28': 597.2343999999999,
'04-29': 0,
'04-30': 0,
'05-01': 1061.958075}
fig,axs = plt.subplots(figsize=(8,5),ncols=1,nrows=2,
gridspec_kw={'height_ratios':[8,1],'hspace':0.3})
ax = axs[0]
ax.bar(range(len(time_and_dates_for_plot)),
time_and_dates_for_plot.values(), align='center')
ax.set_xticks(range(len(time_and_dates_for_plot)),
list(time_and_dates_for_plot.keys()))
ax.set_xlabel('Date (s)')
ax.set_ylabel('milliseconds')
ax.grid(True)
col_labels = list(time_and_dates_for_plot.keys())
row_labels = ['ms']
cell_text = []
for key in time_and_dates_for_plot.keys():
    cell_text += [time_and_dates_for_plot[key]]
ax = axs[1]
ax.set_frame_on(False) # turn off frame for the table subplot
ax.set_xticks([]) # turn off x ticks for the table subplot
ax.set_yticks([]) # turn off y ticks for the table subplot
ax.table(cellText=[cell_text], colLabels=col_labels, loc='upper center')
plt.show()
The output looks like:
** UPDATE **
Using only one subplot, no xticklabels, sorted dates, nicer numbers with %g, and larger table cells using bbox :
import matplotlib as mpl
import matplotlib.pyplot as plt
time_and_dates_for_plot = {'04-26': 488.1063166666667,
'04-27': 289.7289333333333,
'04-28': 597.2343999999999,
'04-29': 0,
'04-30': 0,
'05-01': 1061.958075}
N = len(time_and_dates_for_plot)
colLabels = sorted(time_and_dates_for_plot.keys())
fig,ax = plt.subplots()
aa = ax.bar(range(N),[time_and_dates_for_plot[x] for x in colLabels],
align='center')
ax.set_xlabel('Date')
ax.set_ylabel('milliseconds')
ax.set_xticklabels([]) # turn off x ticks
ax.grid(True)
fig.subplots_adjust(bottom=0.25) # making some room for the table
cell_text = []
for key in colLabels:
    cell_text += ["%g"%time_and_dates_for_plot[key]]
ax.table(cellText=[cell_text], colLabels=colLabels,
rowLabels=['ms'],cellLoc='center',
bbox=[0, -0.27, 1, 0.15])
ax.set_xlim(-0.5,N-0.5) # Helps having bars aligned with table columns
ax.set_title("milliseconds vs Date")
fig.savefig("Bar_graph.png")
plt.show()
Output:
** Update: Making room for the table using subplots_adjust **