Using `matplotlib` to plot - <Figure size 432x288 with 0 Axes> - python

I have a huge data set with 158 columns and 3.1 million rows, and I'm trying to plot univariate distributions for it. The code is below.
import pandas as pd
import matplotlib.pyplot as plt

dtf = pd.read_csv('hackathon_train_data1.csv')
dtf.head()
dtf.columns
Output was:
Index(['visit_id', 'cod_order_nbr', 'cod_orig_ord_nbr', 'src_bu_id',
'int_ref_nbr', 'cod_orig_bu_id', 'cod_src_bu_id', 'onln_flg',
'sohf_ord_dt', 'cod_init',
...
'csat_guid_v42', 'visit_num', 'chat_drawer_rightrail_open',
'chat_unavailable', 'chat_portal', 'ishmximpressions', 'pagination_c40',
'chat_intent_flag', 'coupon_code_stp_v96', 'isbreadcrumbhit_flg'],
dtype='object', length=157)
Then I assigned one of the column names to y and plotted the graph. Column cod_flg has only two values, 0 and 1.
y = "cod_flg"
ax = dtf[y].value_counts().sort_values().plot(kind="barh")
Output was:
Then I tried to refine it as follows:
totals = []
for i in ax.patches:
    totals.append(i.get_width())
total = sum(totals)
for i in ax.patches:
    ax.text(i.get_width() + .3, i.get_y() + .20,
            str(round((i.get_width() / total) * 100, 2)) + '%',
            fontsize=10, color='black')
ax.grid(axis="x")
plt.suptitle(y, fontsize=20)
plt.show()
Instead of the annotated chart, it printed:
<Figure size 432x288 with 0 Axes>
Do I need to modify this line?
ax.text(i.get_width()+.3, i.get_y()+.20, str(round((i.get_width()/total)*100, 2))+'%', fontsize=10, color='black')

Try it without plt.show() if you are using Google Colab.
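A likely cause, offered as an assumption since the notebook layout isn't shown: Jupyter and Colab display and close the current figure at the end of each cell. If the annotation code runs in a later cell than the bar plot, plt.suptitle() finds no open figure, creates a new empty one, and that empty figure is what appears as <Figure size 432x288 with 0 Axes>. A sketch that avoids this keeps the plot and its annotations in one cell and addresses the figure explicitly; the small DataFrame is a hypothetical stand-in for dtf:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for dtf: a single 0/1 column like cod_flg
dtf = pd.DataFrame({"cod_flg": [0, 1, 1, 0, 1, 1, 1, 0]})
y = "cod_flg"

# Build the chart AND its annotations in the same cell, on an explicit figure
fig, ax = plt.subplots()
dtf[y].value_counts().sort_values().plot(kind="barh", ax=ax)

total = sum(p.get_width() for p in ax.patches)
for p in ax.patches:
    pct = round(p.get_width() / total * 100, 2)
    ax.text(p.get_width() + 0.3, p.get_y() + 0.20, f"{pct}%", fontsize=10, color="black")

ax.grid(axis="x")
fig.suptitle(y, fontsize=20)  # titles the figure that owns ax, never a fresh empty one
# In a notebook the figure is displayed when the cell finishes; in a script, call plt.show()
```

Using fig.suptitle() instead of plt.suptitle() ties the title to the figure that owns ax, so it cannot silently start a new figure.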


python - matplotlib subplot grid: where to insert row/column arguments

I'm trying to display the topic extraction results of an LDA text analysis across several data sets in the form of a matplotlib subplot.
Here's where I'm at:
I think my issue is my unfamiliarity with matplotlib. I have done all my number crunching ahead of time so that I can focus on how to plot the data:
top_words_master = []
top_weights_master = []
for i in range(len(tf_list)):
    tf = tf_vectorizer.fit_transform(tf_list[i])
    lda.fit(tf)
    n_top_words = 20
    tf_feature_names = tf_vectorizer.get_feature_names_out()
    top_features_ind = lda.components_[0].argsort()[: -n_top_words - 1 : -1]
    top_features = [tf_feature_names[i] for i in top_features_ind]
    weights = lda.components_[0][top_features_ind]
    top_words_master.append(top_features)
    top_weights_master.append(weights)
This gives me my words and my weights (the x-axis values) for the subplot matrix of horizontal bar charts.
My attempt to construct this with matplotlib:
fig, axes = plt.subplots(2, 5, figsize=(30, 15), sharex=True)
plt.subplots_adjust(hspace=0.5)
fig.suptitle("Topics in LDA Model", fontsize=18, y=0.95)
axes = axes.flatten()
for i in range(len(tf_list)):
    ax = axes[i]
    ax.barh(top_words_master[i], top_weights_master[i], height=0.7)
    ax.set_title(topic_map[f"Topic {i +1}"], fontdict={"fontsize": 30})
    ax.invert_yaxis()
    ax.tick_params(axis="both", which="major", labelsize=20)
    for j in "top right left".split():
        ax.spines[j].set_visible(False)
fig.suptitle("Topics in LDA Model", fontsize=40)
plt.subplots_adjust(top=0.90, bottom=0.05, wspace=0.90, hspace=0.3)
plt.show()
However, it only showed the first plot. For the remaining 6 data sets it just printed:
<Figure size 432x288 with 0 Axes> <Figure size 432x288 with 0 Axes> <Figure size 432x288 with 0 Axes> <Figure size 432x288 with 0 Axes> <Figure size 432x288 with 0 Axes>
Question
I've been at this for days. I feel I'm close, but this result really puzzles me. Does anyone have a solution, or can you point me in the right direction?
As far as I understood from your question, your problem is to get the right indices for your subplots.
In your case, you have an array range(len(tf_list)) to index your data, some data (e.g. top_words_master[i]) to plot, and a figure with 10 subplots (rows=2,cols=5). For example, if you want to plot the 7th item (i=6) of your data, the indices of ax would be axes[1,1].
To get the correct (row, column) indices for the subplot axes, you can use numpy.unravel_index. In that case, don't flatten axes; if you do flatten it, plain axes[i] refers to the same subplot.
import matplotlib.pyplot as plt
import numpy as np

# dummy function
my_func = lambda x: np.random.random(x)
x_max = 100

# fig properties
rows = 2
cols = 5
fig, axes = plt.subplots(rows, cols, figsize=(30, 15), sharex=True)
for i in range(rows * cols):
    ax_i = np.unravel_index(i, (rows, cols))
    axes[ax_i[0], ax_i[1]].barh(np.arange(x_max), my_func(x_max), height=0.7)
plt.show()
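To check that the two indexing styles are interchangeable, this short sketch (grid size as in the question, titles purely illustrative) walks a 2 x 5 grid, converts each flat index with numpy.unravel_index, and confirms that axes.flat[i] is the same Axes object:

```python
import matplotlib.pyplot as plt
import numpy as np

rows, cols = 2, 5
fig, axes = plt.subplots(rows, cols, figsize=(12, 5))

for i in range(rows * cols):
    r, c = np.unravel_index(i, (rows, cols))  # row-major, same as divmod(i, cols)
    ax = axes[r, c]
    # axes.flat[i] (what axes.flatten()[i] returns) is the very same Axes object
    assert ax is axes.flat[i]
    ax.set_title(f"item {i} -> axes[{r}, {c}]")
```

So for the 7th item (i=6) both routes land on axes[1, 1]; which one to use is a matter of taste.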
Create the figure (with one row of axes per data set) once, before the loop, and pass each row of axes to the plotting function:
import matplotlib.pyplot as plt


def top_word_comparison(axes, model, feature_names, n_top_words):
    # draw one horizontal bar chart per topic into the given row of axes
    for topic_idx, topic in enumerate(model.components_):
        top_features_ind = topic.argsort()[: -n_top_words - 1 : -1]
        top_features = [feature_names[i] for i in top_features_ind]
        weights = topic[top_features_ind]

        ax = axes[topic_idx]
        ax.barh(top_features, weights, height=0.7)
        ax.set_title(topic_map[f"Topic {topic_idx +1}"], fontdict={"fontsize": 30})
        ax.invert_yaxis()
        ax.tick_params(axis="both", which="major", labelsize=20)
        for side in "top right left".split():
            ax.spines[side].set_visible(False)


tf_list = [cm_array, xb_array]
# one row of 5 axes per data set; squeeze=False keeps axes 2-D even for a single data set
fig, axes = plt.subplots(len(tf_list), 5, figsize=(30, 15), sharex=True, squeeze=False)
fig.suptitle("Topics in LDA model", fontsize=40)
for i, data in enumerate(tf_list):
    tf = tf_vectorizer.fit_transform(data)
    # the number of topics comes from LatentDirichletAllocation(n_components=...);
    # it must be at most 5 so that every topic gets a column in this row
    lda.fit(tf)
    n_top_words = 20
    tf_feature_names = tf_vectorizer.get_feature_names_out()
    top_word_comparison(axes[i], lda, tf_feature_names, n_top_words)
plt.subplots_adjust(top=0.90, bottom=0.05, wspace=0.90, hspace=0.3)
plt.show()
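For reference, here is a self-contained sketch of the same "one figure, one row per data set" layout. Random numbers stand in (hypothetically) for the LDA weights and the words are placeholders, so it runs without the vectorizer, the model, or topic_map:

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-ins for the LDA output: 2 data sets x 5 topics x 20 top words
rng = np.random.default_rng(0)
n_sets, n_topics, n_top_words = 2, 5, 20
words = [f"word{k}" for k in range(n_top_words)]
weights = rng.random((n_sets, n_topics, n_top_words))

# Create ONE figure up front: rows are data sets, columns are topics
fig, axes = plt.subplots(n_sets, n_topics, figsize=(30, 15), sharex=True, squeeze=False)
fig.suptitle("Topics in LDA model", fontsize=40)

for i in range(n_sets):          # each data set fills one row...
    for t in range(n_topics):    # ...and each of its topics fills one column
        ax = axes[i, t]
        ax.barh(words, weights[i, t], height=0.7)
        ax.set_title(f"Set {i + 1}, topic {t + 1}")
        ax.invert_yaxis()

fig.subplots_adjust(top=0.90, bottom=0.05, wspace=0.90, hspace=0.3)
```

Because plt.subplots is called only once, every data set draws into the same figure rather than opening a new, empty one.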

How to add a legend with the IDs as labels to my code

I have the electricity consumption of 25 houses, and I'm doing K-Means clustering on the dataset that holds those houses. After importing the dataset, pre-processing it, and applying K-Means with K=2, I plotted the data, but when I add the legend I get this:
No handles with labels found to put in legend.
There is no error and the code runs, but I want it to generate a legend automatically that holds the ID of each house, from 0 to 24.
Here is the code where I plot the data:
import itertools

plt.figure(figsize=(13, 13))
marker = itertools.cycle(('+', 'o', '*', 'X', 's', '8', '>', '1', '<'))
for cluster_index in [0, 1]:
    plt.subplot(2, 1, cluster_index + 1)
    for index, row in data1.iterrows():
        if row.iloc[-1] == cluster_index:
            plt.plot(row.iloc[1:-1], marker=next(marker), alpha=1)
    plt.legend(loc="right")
    plt.plot(kmeans.cluster_centers_[cluster_index], color='k', marker='o', alpha=1)
    ax = plt.gca()
    ax.tick_params(axis='x', which='major', labelsize=10)
    plt.xticks(rotation="vertical")
    plt.ylabel('Monthly Mean Consumption 2018-2019', fontsize=10)
    plt.title(f'Cluster {cluster_index}', fontsize=15)
plt.tight_layout()
plt.show()
plt.close()
I just want the output figure to have a legend with the ID of each house. Any help is appreciated.
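As I do not have your data, I cannot test this in a plot right now, but I assume the problem is that you never pass a label argument to plt.plot. plt.legend() only collects artists that have a label (labels starting with an underscore are skipped), which is why it finds no handles. Pass the row index as the label, i.e.: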
for index, row in data1.iterrows():
    if row.iloc[-1] == cluster_index:
        plt.plot(row.iloc[1:-1], marker=next(marker), alpha=1, label=index)
plt.legend(loc="right")
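A runnable sketch of that fix on made-up data, assuming (as the question's iloc[1:-1] suggests) that the first column is an ID, the middle columns are monthly values, and the last column is the cluster label. The house_id and cluster columns and the random values are hypothetical:

```python
import itertools

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for data1: ID column, 12 monthly means, cluster label last
rng = np.random.default_rng(0)
months = [f"m{k:02d}" for k in range(1, 13)]
data1 = pd.DataFrame(rng.random((6, 12)), columns=months)
data1.insert(0, "house_id", range(6))
data1["cluster"] = [0, 1, 0, 1, 1, 0]

marker = itertools.cycle(("+", "o", "*", "X", "s", "8", ">", "1", "<"))
fig, axes = plt.subplots(2, 1, figsize=(8, 8))
for cluster_index, ax in enumerate(axes):
    for index, row in data1.iterrows():
        if row.iloc[-1] == cluster_index:
            # label= is what legend() looks for; lines without one are skipped
            ax.plot(row.iloc[1:-1], marker=next(marker), label=str(index))
    ax.legend(loc="right")  # call it after the labeled lines exist
    ax.set_title(f"Cluster {cluster_index}")
```

Each subplot's legend then lists only the houses that belong to that cluster.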

pandas dataframe recession highlighting plot

I have a pandas DataFrame, shown below, whose index is a yyyy-mm date and whose columns are the US recession indicator (USREC) and the time-series variable M1:
Date     USREC  M1
2000-12  0      1088.4
2001-01  0      1095.08
2001-02  0      1100.58
2001-03  0      1108.1
2001-04  1      1116.36
2001-05  1      1117.8
2001-06  1      1125.45
2001-07  1      1137.46
2001-08  1      1147.7
2001-09  1      1207.6
2001-10  1      1166.64
2001-11  1      1169.7
2001-12  0      1182.46
2002-01  0      1190.82
2002-02  0      1190.43
2002-03  0      1194.85
2002-04  0      1186.82
2002-05  0      1186.9
2002-06  0      1194.55
2002-07  0      1199.26
2002-08  0      1183.7
2002-09  0      1197.1
2002-10  0      1203.47
I want to plot a chart in Python that looks like the attached chart, which was created in Excel.
I have searched for various examples online, but none of them produce a chart like the one below. Can you please help? Thank you.
I would also appreciate a suggestion for an easier plotting library that needs only a few inputs and covers most of the plot types Excel provides.
EDIT:
I checked out the example on the page https://matplotlib.org/examples/pylab_examples/axhspan_demo.html. This is the code I used:
fig, axes = plt.subplots()
df['M1'].plot(ax=axes)
ax.axvspan(['USREC'],color='grey',alpha=0.5)
I didn't find any example on the matplotlib.org pages where another column is used as the axvspan range. With the code above I get the error
TypeError: axvspan() missing 1 required positional argument: 'xmax'
I figured it out. I created a secondary y-axis for USREC and hid its axis labels as I wanted, but that also hid USREC from the legend. That is a minor issue, though.
def plot_var(y1):
    fig0, ax0 = plt.subplots()
    ax1 = ax0.twinx()
    y1.plot(kind='line', stacked=False, ax=ax0, color='blue')
    df['USREC'].plot(kind='area', secondary_y=True, ax=ax1, alpha=.2, color='grey')
    ax0.legend(loc='upper left')
    ax1.legend(loc='upper left')
    plt.ylim(ymax=0.8)
    plt.axis('off')
    plt.xlabel('Date')
    plt.show()
    plt.close()


plot_var(df['M1'])
There is a problem with Zenvega's answer: the edges of the recession shading are not vertical, as they should be. I am not entirely sure what goes wrong, but most likely the area plot joins consecutive monthly values with straight lines, so each switch between 0 and 1 is drawn as a slope. Below I show how to get vertical edges.
My answer uses the syntax ax.fill_between(date_index, y1=ymin, y2=ymax, where=True/False), where I compute y1 and y2 manually from the axis object, and the where argument takes the recession data as a boolean series of True/False values.
import pandas as pd
import matplotlib.pyplot as plt
from io import StringIO

# copy-pasted data from OP (defined first so the script runs top to bottom)
string_data = StringIO("""
Date,USREC,M1
2000-12,0,1088.4
2001-01,0,1095.08
2001-02,0,1100.58
2001-03,0,1108.1
2001-04,1,1116.36
2001-05,1,1117.8
2001-06,1,1125.45
2001-07,1,1137.46
2001-08,1,1147.7
2001-09,1,1207.6
2001-10,1,1166.64
2001-11,1,1169.7
2001-12,0,1182.46
2002-01,0,1190.82
2002-02,0,1190.43
2002-03,0,1194.85
2002-04,0,1186.82
2002-05,0,1186.9
2002-06,0,1194.55
2002-07,0,1199.26
2002-08,0,1183.7
2002-09,0,1197.1
2002-10,0,1203.47""")

# get data
df = pd.read_csv(string_data, skipinitialspace=True)
df['Date'] = pd.to_datetime(df['Date'])


# convenience function
def plot_series(ax, df, index='Date', cols=['M1'], area='USREC'):
    # convert area variable to boolean
    df[area] = df[area].astype(int).astype(bool)
    # set up an index based on date
    df = df.set_index(keys=index, drop=False)
    # line plot; x_compat=True keeps the x-axis in plain matplotlib date units,
    # so the dates passed to fill_between below land in the same coordinates
    df.plot(ax=ax, x=index, y=cols, color='blue', x_compat=True)
    # extract limits
    y1, y2 = ax.get_ylim()
    ax.fill_between(df[index].index, y1=y1, y2=y2, where=df[area], facecolor='grey', alpha=0.4)
    # fill_between triggers autoscaling; restore the limits so the shading spans the full height
    ax.set_ylim(y1, y2)
    return ax


# set up figure, axis
f, ax = plt.subplots()
plot_series(ax, df)
ax.grid(True)
plt.show()
# after formatting, the data would look like this:
>>> df.head(2)
Date USREC M1
Date
2000-12-01 2000-12-01 False 1088.40
2001-01-01 2001-01-01 False 1095.08
See how the lines are vertical:
An alternative approach is to use plt.axvspan(), which spans the full height of the axes by default, so the y1 and y2 values do not have to be computed.
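A minimal sketch of the axvspan route on a slice of the OP's data. The helper recession_spans is hypothetical (not part of pandas or matplotlib): it finds the first and last month of each run of USREC == 1, and each run is shaded with one axvspan call. Plotting with ax.plot rather than df.plot keeps the x-axis in ordinary matplotlib date units, so the span dates line up; only the first span gets a legend label, which also keeps USREC in the legend:

```python
from io import StringIO

import matplotlib.pyplot as plt
import pandas as pd

# A slice of the OP's data: monthly M1 plus the 0/1 USREC recession flag
csv = StringIO("""Date,USREC,M1
2001-02,0,1100.58
2001-03,0,1108.1
2001-04,1,1116.36
2001-05,1,1117.8
2001-06,1,1125.45
2001-07,1,1137.46
2001-08,1,1147.7
2001-09,1,1207.6
2001-10,1,1166.64
2001-11,1,1169.7
2001-12,0,1182.46
2002-01,0,1190.82
2002-02,0,1190.43
""")
df = pd.read_csv(csv, parse_dates=["Date"], index_col="Date")


def recession_spans(flags):
    """Hypothetical helper: (first, last) index label of each run of truthy flags."""
    flags = flags.astype(bool)
    # the run id goes up by one every time the flag flips, so each run gets its own id
    run_id = flags.ne(flags.shift(fill_value=False)).cumsum()
    return [(run.index[0], run.index[-1])
            for _, run in flags[flags].groupby(run_id[flags])]


fig, ax = plt.subplots()
ax.plot(df.index, df["M1"], color="blue", label="M1")
for k, (start, end) in enumerate(recession_spans(df["USREC"])):
    # axvspan covers the full axes height by default (ymin=0, ymax=1 in axes units)
    ax.axvspan(start, end, color="grey", alpha=0.4,
               label="USREC" if k == 0 else "_nolegend_")
ax.legend(loc="upper left")
ax.grid(True)
```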

How to set marker color separately in matplotlib

I'm having a problem with 2D lines. I want to plot a chart with data like this:
TEST1 1 0 0
TEST2 1 1 1
TEST3 0 0 1
where each 1 should be drawn as a green bubble and each 0 as a red one.
I have tried the matplotlib example:
http://matplotlib.org/examples/lines_bars_and_markers/marker_fillstyle_reference.html
However, I could not work out how to color each bubble according to its value.
I tried making 3 different arrays and offsetting them on the x-axis, but it did not work. Any thoughts on how I can do this?
Here is the code I tried:
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.lines import Line2D

points = np.ones(2)  # Draw 3 points for each line
print(points)
text_style = dict(horizontalalignment='right', verticalalignment='center',
                  fontsize=16, fontdict={'family': 'monospace'})
marker_style_red = dict(color='red', linestyle=':', marker='o',
                        markersize=20, markerfacecoloralt='gray')
marker_style_green = dict(color='green', linestyle=':', marker='o',
                          markersize=20, markerfacecoloralt='gray')


def format_axes(ax):
    ax.margins(0.2)
    ax.set_axis_off()


def nice_repr(text):
    return repr(text).lstrip('u')


fig, ax = plt.subplots()
y_ax = 1
for y in data2:
    ax.text(-0.5, y_ax, nice_repr(y[6]), **text_style)
    if y[0] == 0:
        ax.plot(y_ax * points[1], fillstyle='full', **marker_style_red)
    elif y[0] == 1:
        ax.plot(y_ax * points[1], fillstyle='full', **marker_style_green)
    y_ax += 1
format_axes(ax)
plt.show()
But this does not work: the bubble colors do not change according to the values in the array.
Here is a picture of how I want it to look:
I also don't know how to show my array data in tabular form, but with bubbles colored according to the data. Should I create lots of 1D lines and offset them? If I use scatter, it might work for the bubbles, but how do I show all the other data next to the bubbles? Or should I use BeautifulSoup and somehow add CSS to show each cell as a bubble?
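One possible approach, sketched with scatter rather than separate Line2D objects (this is not from the thread): give every cell an (x, y) grid position, pick its color from its 0/1 value, and use the row names as y-tick labels so the chart reads like a table. The names and values are the ones from the question; sizes and styling are assumptions:

```python
import matplotlib.pyplot as plt
import numpy as np

# The 0/1 table from the question; row names become y-tick labels
names = ["TEST1", "TEST2", "TEST3"]
results = np.array([[1, 0, 0],
                    [1, 1, 1],
                    [0, 0, 1]])

# One (x, y) position per cell: x = column number, y = row number
rows, cols = np.indices(results.shape)
colors = np.where(results.ravel() == 1, "green", "red")

fig, ax = plt.subplots(figsize=(4, 3))
ax.scatter(cols.ravel(), rows.ravel(), c=colors, s=400)
ax.set_yticks(range(len(names)))
ax.set_yticklabels(names)
ax.set_xticks(range(results.shape[1]))
ax.set_xlim(-0.5, results.shape[1] - 0.5)
ax.invert_yaxis()  # first row at the top, like a table
for side in ("top", "right", "bottom", "left"):
    ax.spines[side].set_visible(False)
```

Extra columns of text can be placed next to the bubbles with ax.text at the same y positions.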

Increasing Number of Rows in Python 2D Heatmap

I am plotting a heatmap in Python. This is the code I am using:
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import cm

df = pd.read_csv('Book1.csv', index_col=0)
print(df)
# plotting
fig, ax = plt.subplots()
ax.matshow(df.mask(df.isin(df.att1) != 1), cmap=cm.Reds)
ax.matshow(df.mask(df.isin(df.att2) != 1), cmap=cm.Greens)
ax.matshow(df.mask(df.isin(df.att3) != 1), cmap=cm.Blues)
#ax.matshow(df.mask(df.isin(df.att4) != 1), cmap=cm.Reds)
#ax.matshow(df.mask(df.isin(df.att5) != 1), cmap=cm.Greens)
#ax.matshow(df.mask(df.isin(df.att6) != 1), cmap=cm.Blues)
plt.xticks(range(3), df.columns)
plt.yticks(range(864), df.index)
plt.show()
df consists of data of the form:
att1 att2 att3
fun1 1 2 1
fun2 1 0 0
.......
....
This code works fine with a few rows, like 10-12, giving the following output:
However, if I increase the number of rows to 800, the graph looks like garbage. This is the output:
Image after removing plt.yticks:
Does anyone have an idea how I can increase the number of rows in this kind of heatmap?
You can use the aspect option: matshow defaults to aspect='equal' (square cells), so 800 rows by 3 columns get squeezed into a thin strip, while aspect='auto' lets the cells stretch to fill the axes. As for the y-ticks, I'm not sure it's useful to show them all since they would be unreadable, but you can set a subset:
fig, ax = plt.subplots(figsize=(6, 10))
ax.matshow(df.mask(df.isin(df.att1)!=True), cmap=cm.Reds, aspect='auto')
ax.matshow(df.mask(df.isin(df.att2)!=True), cmap=cm.Greens, aspect='auto')
ax.matshow(df.mask(df.isin(df.att3)!=True), cmap=cm.Blues, aspect='auto')
plt.xticks(range(3), df.columns)
plt.yticks(range(0, 800, 100), df.index[::100])
Hope this helps
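Below is a self-contained sketch of the answer's aspect='auto' plus sparse y-tick idea. It uses random values 0-2 as a hypothetical stand-in for Book1.csv and a single matshow layer instead of the three masked layers, to keep it short:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for Book1.csv: 800 rows ("fun1".."fun800") x 3 attributes
rng = np.random.default_rng(0)
n_rows = 800
df = pd.DataFrame(rng.integers(0, 3, size=(n_rows, 3)),
                  index=[f"fun{k}" for k in range(1, n_rows + 1)],
                  columns=["att1", "att2", "att3"])

fig, ax = plt.subplots(figsize=(6, 10))
# aspect="auto" stretches the cells to fill the axes instead of keeping them square
ax.matshow(df.to_numpy(), cmap="Reds", aspect="auto")

ax.set_xticks(range(df.shape[1]))
ax.set_xticklabels(df.columns)
step = 100  # label only every 100th row so the tick labels stay readable
ax.set_yticks(range(0, n_rows, step))
ax.set_yticklabels(df.index[::step])
```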
