Using `matplotlib` to plot - <Figure size 432x288 with 0 Axes>

I have a huge data set with 158 columns and 3.1 million rows. I'm trying to plot univariate distributions for that data set. The code is given below.
import pandas as pd
import matplotlib.pyplot as plt

dtf = pd.read_csv('hackathon_train_data1.csv')
dtf.head()
dtf.columns
Output was:
Index(['visit_id', 'cod_order_nbr', 'cod_orig_ord_nbr', 'src_bu_id',
'int_ref_nbr', 'cod_orig_bu_id', 'cod_src_bu_id', 'onln_flg',
'sohf_ord_dt', 'cod_init',
...
'csat_guid_v42', 'visit_num', 'chat_drawer_rightrail_open',
'chat_unavailable', 'chat_portal', 'ishmximpressions', 'pagination_c40',
'chat_intent_flag', 'coupon_code_stp_v96', 'isbreadcrumbhit_flg'],
dtype='object', length=157)
Then I assigned one of the column names to y and plotted the graph. The column cod_flg has only 2 values, 0 and 1.
y = "cod_flg"
ax = dtf[y].value_counts().sort_values().plot(kind="barh")
Output was:
Then I tried to refine it as follows:
totals = []
for i in ax.patches:
    totals.append(i.get_width())
total = sum(totals)
for i in ax.patches:
    ax.text(i.get_width()+.3, i.get_y()+.20,
            str(round((i.get_width()/total)*100, 2))+'%',
            fontsize=10, color='black')
ax.grid(axis="x")
plt.suptitle(y, fontsize=20)
plt.show()
It gave me this error:
<Figure size 432x288 with 0 Axes>
Do I need to modify this line? ax.text(i.get_width()+.3, i.get_y()+.20, str(round((i.get_width()/total)*100, 2))+'%', fontsize=10, color='black')

Try it without plt.show() if you are using Google Colab.
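Some background that may help: `<Figure size 432x288 with 0 Axes>` is the text representation of an empty figure. A likely cause (an assumption, since the question does not say how the notebook cells were split) is that the bar chart was drawn in one cell and the annotation code ran in a later one. With the inline notebook backend, a figure is rendered and closed when its cell finishes, so `plt.suptitle` and `plt.show()` in a later cell act on a fresh, empty figure. A minimal sketch that keeps everything on one explicitly created figure is shown below; `plot_share_barh` and the `demo` series are hypothetical names used only for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_share_barh(series: pd.Series, title: str):
    """Draw a horizontal bar chart of value counts with percentage labels."""
    counts = series.value_counts().sort_values()
    fig, ax = plt.subplots()            # create the figure and axes explicitly
    counts.plot(kind="barh", ax=ax)     # draw onto those axes, not an implicit figure
    total = sum(p.get_width() for p in ax.patches)
    for p in ax.patches:
        share = round(p.get_width() / total * 100, 2)
        # place the label at the end of the bar, vertically centred on it
        ax.text(p.get_width(), p.get_y() + p.get_height() / 2, f" {share}%",
                va="center", fontsize=10, color="black")
    ax.grid(axis="x")
    fig.suptitle(title, fontsize=20)
    return fig, ax


# Hypothetical stand-in for dtf["cod_flg"]
demo = pd.Series([0, 0, 0, 1], name="cod_flg")
fig, ax = plot_share_barh(demo, "cod_flg")
labels = [t.get_text().strip() for t in ax.texts]
```

Run in a single notebook cell, the figure is rendered once, with the labels and title already in place.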

Related

python - matplot lib sub-plot grid: where to insert row/column arguments

I'm trying to display the topic extraction results of an LDA text analysis across several data sets in the form of a matplotlib subplot.
Here's where I'm at:
I think my issue is my unfamiliarity with matplotlib. I have done all my number crunching ahead of time so that I can focus on how to plot the data:
top_words_master = []
top_weights_master = []
for i in range(len(tf_list)):
    tf = tf_vectorizer.fit_transform(tf_list[i])
    lda.fit(tf)
    n_top_words = 20
    tf_feature_names = tf_vectorizer.get_feature_names_out()
    top_features_ind = lda.components_[0].argsort()[: -n_top_words - 1 : -1]
    top_features = [tf_feature_names[i] for i in top_features_ind]
    weights = lda.components_[0][top_features_ind]
    top_words_master.append(top_features)
    top_weights_master.append(weights)
This gives me my words and my weights (the x axis values) to make my sub-plot matrix of row/bar charts.
My attempt to construct this via matplotlib:
fig, axes = plt.subplots(2, 5, figsize=(30, 15), sharex=True)
plt.subplots_adjust(hspace=0.5)
fig.suptitle("Topics in LDA Model", fontsize=18, y=0.95)
axes = axes.flatten()
for i in range(len(tf_list)):
    ax = axes[i]
    ax.barh(top_words_master[i], top_weights_master[i], height=0.7)
    ax.set_title(topic_map[f"Topic {i +1}"], fontdict={"fontsize": 30})
    ax.invert_yaxis()
    ax.tick_params(axis="both", which="major", labelsize=20)
    for j in "top right left".split():
        ax.spines[j].set_visible(False)
fig.suptitle("Topics in LDA Model", fontsize=40)
plt.subplots_adjust(top=0.90, bottom=0.05, wspace=0.90, hspace=0.3)
plt.show()
However, it only showed one, the first one. For the remaining 6 data sets it just printed:
<Figure size 432x288 with 0 Axes> <Figure size 432x288 with 0 Axes> <Figure size 432x288 with 0 Axes> <Figure size 432x288 with 0 Axes> <Figure size 432x288 with 0 Axes>
Question
I've been at this for days. I feel I'm close, but this kind of result is really puzzling me. Does anyone have a solution, or can anyone point me in the right direction?

As far as I understood from your question, your problem is to get the right indices for your subplots.
In your case, you have an array range(len(tf_list)) to index your data, some data (e.g. top_words_master[i]) to plot, and a figure with 10 subplots (rows=2,cols=5). For example, if you want to plot the 7th item (i=6) of your data, the indices of ax would be axes[1,1].
In order to get the correct indices for the subplot axes, you can use numpy.unravel_index. And, of course, you should not flatten your axes.
import matplotlib.pyplot as plt
import numpy as np
# dummy function
my_func = lambda x: np.random.random(x)
x_max = 100
# fig properties
rows = 2
cols = 5
fig, axes = plt.subplots(rows,cols,figsize=(30, 15), sharex=True)
for i in range(rows*cols):
    ax_i = np.unravel_index(i,(rows,cols))
    axes[ax_i[0],ax_i[1]].barh(np.arange(x_max),my_func(x_max), height=0.7)
plt.show()
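To make the index mapping concrete, here is a small sketch that needs no plotting at all. It uses a plain integer array, `grid`, as a hypothetical stand-in for the array of Axes that `plt.subplots(rows, cols)` returns. By default `np.unravel_index` uses row-major order, which is the same order `flatten()` produces, so a flat position and its unravelled `(row, col)` pair address the same cell.

```python
import numpy as np

rows, cols = 2, 5
# Stand-in for the array of Axes returned by plt.subplots(rows, cols)
grid = np.arange(rows * cols).reshape(rows, cols)

for i in range(rows * cols):
    r, c = np.unravel_index(i, (rows, cols))
    # Row-major order: flat position i and (r, c) address the same cell
    assert grid[r, c] == grid.flatten()[i]

# The 7th item (i=6) lands in the second row, second column
seventh = tuple(int(v) for v in np.unravel_index(6, (rows, cols)))
print(seventh)  # (1, 1)
```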

You should create the figure first:
def top_word_comparison(axes, model, feature_names, n_top_words):
    # one bar chart per topic, drawn on the row of axes that was passed in
    for topic_idx, topic in enumerate(model.components_):
        top_features_ind = topic.argsort()[: -n_top_words - 1 : -1]
        top_features = [feature_names[i] for i in top_features_ind]
        weights = topic[top_features_ind]
        ax = axes[topic_idx]
        ax.barh(top_features, weights, height=0.7)
        ax.set_title(topic_map[f"Topic {topic_idx +1}"], fontdict={"fontsize": 30})
        ax.invert_yaxis()
        ax.tick_params(axis="both", which="major", labelsize=20)
        for i in "top right left".split():
            ax.spines[i].set_visible(False)

tf_list = [cm_array, xb_array]
fig, axes = plt.subplots(len(tf_list), 5, figsize=(30, 15), sharex=True)
fig.suptitle("Topics in LDA model", fontsize=40)
# iterate over the data sets by index; each one gets its own row of axes
for i in range(len(tf_list)):
    tf = tf_vectorizer.fit_transform(tf_list[i])
    n_components = 1  # this assignment alone does not reconfigure lda
    lda.fit(tf)
    n_top_words = 20
    tf_feature_names = tf_vectorizer.get_feature_names_out()
    top_word_comparison(axes[i], lda, tf_feature_names, n_top_words)
plt.subplots_adjust(top=0.90, bottom=0.05, wspace=0.90, hspace=0.3)
plt.show()

How to add legend with labels of IDs to my code

I have the electricity consumption of 25 houses, and I'm doing K-Means clustering on the dataset that holds those houses. After importing the dataset, pre-processing it, and applying K-Means with K=2, I plotted the data, but when I add the legend I get this:
No handles with labels found to put in legend.
There is no error in the code and it runs, but I want my code to generate automatic legends that hold the ID of each house, from 0 to 24.
Here is my code where I'm plotting the data:
plt.figure(figsize=(13,13))
import itertools
marker = itertools.cycle(('+', 'o', '*' , 'X', 's','8','>','1','<'))
for cluster_index in [0,1]:
    plt.subplot(2,1,cluster_index + 1)
    for index, row in data1.iterrows():
        if row.iloc[-1] == cluster_index:
            plt.plot(row.iloc[1:-1] ,marker = next(marker) , alpha=1)
            plt.legend(loc="right")
    plt.plot(kmeans.cluster_centers_[cluster_index], color='k' ,marker='o', alpha=1)
    ax = plt.gca()
    ax.tick_params(axis = 'x', which = 'major', labelsize = 10)
    plt.xticks(rotation="vertical")
    plt.ylabel('Monthly Mean Consumption 2018-2019', fontsize=10)
    plt.title(f'Cluster {cluster_index}', fontsize=15)
    plt.tight_layout()
plt.show()
plt.close()
I just want the legend in the output figure to show the ID of each house. Any help would be appreciated.

As I do not have your data, I cannot test it in a plot right now, but I assume the problem comes from not passing a label argument to plt.plot, i.e.:
for index, row in data1.iterrows():
    if row.iloc[-1] == cluster_index:
        plt.plot(row.iloc[1:-1] ,marker = next(marker), alpha=1, label=index)
        plt.legend(loc="right")
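A self-contained sketch of the same idea, in case it helps to try it without the original data. The `data` frame below is a hypothetical stand-in for `data1` (three houses, four readings each), and the "House N" label text is an illustrative choice. The legend is built from the `label` of each plotted line, so it only needs to be created once, after all the lines exist.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for data1: three houses, four monthly readings each
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.random((3, 4)), columns=["Jan", "Feb", "Mar", "Apr"])

fig, ax = plt.subplots()
for house_id, row in data.iterrows():
    # label= is what the legend picks up; without it the legend has no entries
    ax.plot(row.index, row.values, marker="o", label=f"House {house_id}")
ax.legend(loc="right")  # call once, after all labelled lines exist

# The handles and labels the legend was built from
handles, labels = ax.get_legend_handles_labels()
```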

pandas dataframe recession highlighting plot

I have a pandas dataframe, shown in the table below, whose index is in yyyy-mm format, with a US recession indicator (USREC) and a time series variable M1.
Date USREC M1
2000-12 1088.4
2001-01 1095.08
2001-02 1100.58
2001-03 1108.1
2001-04 1 1116.36
2001-05 1 1117.8
2001-06 1 1125.45
2001-07 1 1137.46
2001-08 1 1147.7
2001-09 1 1207.6
2001-10 1 1166.64
2001-11 1 1169.7
2001-12 1182.46
2002-01 1190.82
2002-02 1190.43
2002-03 1194.85
2002-04 1186.82
2002-05 1186.9
2002-06 1194.55
2002-07 1199.26
2002-08 1183.7
2002-09 1197.1
2002-10 1203.47
I want to plot a chart in Python that looks like the attached chart, which was created in Excel.
I have searched for various examples online, but none of them show a chart like the one below. Can you please help? Thank you.
I would also appreciate a pointer to any plotting library that is easier to use, needs few inputs, and covers most of the plots that Excel provides.
EDIT:
I checked out the example in the page https://matplotlib.org/examples/pylab_examples/axhspan_demo.html. The code I have used is below.
fig, axes = plt.subplots()
df['M1'].plot(ax=axes)
ax.axvspan(['USREC'],color='grey',alpha=0.5)
I didn't see any example on the matplotlib.org pages where I can pass another column as the axvspan range. With my code above I get the error
TypeError: axvspan() missing 1 required positional argument: 'xmax'

I figured it out. I created secondary Y axis for USREC and hid the axis label just like I wanted to, but it also hid the USREC from the legend. But that is a minor thing.
def plot_var(y1):
    fig0, ax0 = plt.subplots()
    ax1 = ax0.twinx()
    y1.plot(kind='line', stacked=False, ax=ax0, color='blue')
    df['USREC'].plot(kind='area', secondary_y=True, ax=ax1, alpha=.2, color='grey')
    ax0.legend(loc='upper left')
    ax1.legend(loc='upper left')
    plt.ylim(ymax=0.8)
    plt.axis('off')
    plt.xlabel('Date')
    plt.show()
    plt.close()

plot_var(df['M1'])

There is a problem with Zenvega's answer: The recession lines are not vertical, as they should be. What exactly goes wrong, I am not entirely sure, but I show below how to get vertical lines.
My answer uses the following syntax ax.fill_between(date_index, y1=ymin, y2=ymax, where=True/False), where I compute the y1 and y2 arguments manually from the axis object and where the where argument takes the recession data as a boolean of True or False values.
import pandas as pd
import matplotlib.pyplot as plt
from io import StringIO

# copy-pasted data from OP
string_data = StringIO("""
Date,USREC,M1
2000-12,0,1088.4
2001-01,0,1095.08
2001-02,0,1100.58
2001-03,0,1108.1
2001-04,1,1116.36
2001-05,1,1117.8
2001-06,1,1125.45
2001-07,1,1137.46
2001-08,1,1147.7
2001-09,1,1207.6
2001-10,1,1166.64
2001-11,1,1169.7
2001-12,0,1182.46
2002-01,0,1190.82
2002-02,0,1190.43
2002-03,0,1194.85
2002-04,0,1186.82
2002-05,0,1186.9
2002-06,0,1194.55
2002-07,0,1199.26
2002-08,0,1183.7
2002-09,0,1197.1
2002-10,0,1203.47""")

# get data
df = pd.read_csv(string_data, skipinitialspace=True)
df['Date'] = pd.to_datetime(df['Date'])

# convenience function
def plot_series(ax, df, index='Date', cols=['M1'], area='USREC'):
    # convert area variable to boolean
    df[area] = df[area].astype(int).astype(bool)
    # set up an index based on date
    df = df.set_index(keys=index, drop=False)
    # line plot
    df.plot(ax=ax, x=index, y=cols, color='blue')
    # extract limits
    y1, y2 = ax.get_ylim()
    ax.fill_between(df[index].index, y1=y1, y2=y2, where=df[area], facecolor='grey', alpha=0.4)
    return ax

# set up figure, axis
f, ax = plt.subplots()
plot_series(ax, df)
ax.grid(True)
plt.show()

# after formatting, the data would look like this:
>>> df.head(2)
                 Date  USREC       M1
Date
2000-12-01 2000-12-01  False  1088.40
2001-01-01 2001-01-01  False  1095.08
See how the lines are vertical:
An alternative approach would be to use plt.axvspan(), which by default spans the full height of the axes, so the y1 and y2 values do not have to be computed.
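A rough sketch of that `axvspan` alternative follows. Since `axvspan` takes a start and an end rather than a boolean column, the recession flags first have to be turned into (start, end) pairs. The helper `recession_spans` is a hypothetical name introduced here, and the small frame is an excerpt of the question's data rebuilt by hand, so treat this as one possible approach rather than the answer's own code.

```python
import matplotlib.pyplot as plt
import pandas as pd


def recession_spans(flags: pd.Series):
    """Return (start, end) index labels for each run of consecutive True values."""
    flags = flags.astype(bool)
    run_id = (flags != flags.shift()).cumsum()   # new id whenever the flag changes
    spans = []
    for _, run in flags.groupby(run_id):
        if run.iloc[0]:                          # keep only the recession runs
            spans.append((run.index[0], run.index[-1]))
    return spans


# Hypothetical excerpt of the data from the question (2001-02 to 2001-07)
dates = pd.date_range("2001-02-01", periods=6, freq="MS")
df = pd.DataFrame({"USREC": [0, 0, 1, 1, 1, 0],
                   "M1": [1100.58, 1108.1, 1116.36, 1117.8, 1125.45, 1137.46]},
                  index=dates)

fig, ax = plt.subplots()
ax.plot(df.index, df["M1"], color="blue")
spans = recession_spans(df["USREC"])
for start, end in spans:
    # axvspan covers the full axes height by default (ymin=0, ymax=1)
    ax.axvspan(start, end, color="grey", alpha=0.4)
```

Each span runs from the first flagged month to the last flagged month, so a recession that lasts a single month would produce a zero-width band. Extending `end` by one period is one way to handle that case.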

How to set marker color separately in matplotlib

I'm having a problem with 2D lines. What I want to do is plot a chart that shows information like this:
TEST1 1 0 0
TEST2 1 1 1
TEST3 0 0 1
Where there is a 1 in the results there should be a green bubble, and where there is a 0, a red one.
I have tried this matplotlib example:
http://matplotlib.org/examples/lines_bars_and_markers/marker_fillstyle_reference.html
However, I could not understand how to plot each bubble depending on its value.
I tried to make 3 different arrays and offset them on the X axis, but somehow it did not work. Any thoughts on how I can do this?
Adding some code that I tried to write:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
points = np.ones(2) # Draw 3 points for each line
print(points)
text_style = dict(horizontalalignment='right', verticalalignment='center',
                  fontsize=16, fontdict={'family': 'monospace'})
marker_style_red = dict(color='red', linestyle=':', marker='o',
                        markersize=20, markerfacecoloralt='gray')
marker_style_green = dict(color='green', linestyle=':', marker='o',
                          markersize=20, markerfacecoloralt='gray')
def format_axes(ax):
    ax.margins(0.2)
    ax.set_axis_off()
def nice_repr(text):
    return repr(text).lstrip('u')
fig, ax = plt.subplots()
y_ax = 1
for y in data2:
    ax.text(-0.5, y_ax, nice_repr(y[6]) , **text_style)
    if y[0] == 0:
        ax.plot(y_ax * points[1], fillstyle='full', **marker_style_red)
    elif y[0] == 1:
        ax.plot(y_ax * points[1], fillstyle='full', **marker_style_green)
    y_ax += 1
format_axes(ax)
plt.show()
But this does not work, as the bubbles do not change according to the values in the array.
Adding a photo of how I want it to look:
I also don't know how to show my array data in tabular form with bubbles that are colored according to the data. Should I create lots of 1D lines and offset them? If I use scatter, it might work for the bubbles, but how do I show all the other data next to the bubbles? Or should I use BeautifulSoup and somehow add CSS to show the table cells as bubbles?
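For what it's worth, the scatter idea mentioned above can be sketched roughly as follows. `scatter` accepts one color per point, so each bubble can be colored from its own value, and `ax.text` can put the row name next to the bubbles. The `results` dictionary is a hand-built copy of the three example rows from the question; the marker size and text offset are arbitrary illustrative choices.

```python
import matplotlib.pyplot as plt

# The three example rows from the question: 1 -> green bubble, 0 -> red bubble
results = {"TEST1": [1, 0, 0], "TEST2": [1, 1, 1], "TEST3": [0, 0, 1]}

fig, ax = plt.subplots()
xs, ys, colors = [], [], []
for row, (name, values) in enumerate(results.items()):
    # row label to the left of the bubbles
    ax.text(-0.5, row, name, ha="right", va="center", family="monospace")
    for col, value in enumerate(values):
        xs.append(col)
        ys.append(row)
        colors.append("green" if value == 1 else "red")

ax.scatter(xs, ys, s=400, c=colors)   # one color per marker
ax.invert_yaxis()                     # first row at the top, like a table
ax.margins(0.2)
ax.set_axis_off()
```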

Increasing Number of Rows in Python 2D Heatmap

I am plotting a heatmap in Python. The following is the code I am using.
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import cm

df = pd.read_csv('Book1.csv', index_col=0)
print (df)
# plotting
fig,ax = plt.subplots()
ax.matshow(df.mask(df.isin(df.att1)!=1), cmap=cm.Reds)
ax.matshow(df.mask(df.isin(df.att2)!=1), cmap=cm.Greens)
ax.matshow(df.mask(df.isin(df.att3)!=1), cmap=cm.Blues)
#ax.matshow(df.mask(df.isin(df.att4)!=1), cmap=cm.Reds)
#ax.matshow(df.mask(df.isin(df.att5)!=1), cmap=cm.Greens)
#ax.matshow(df.mask(df.isin(df.att6)!=1), cmap=cm.Blues)
plt.xticks(range(3), df.columns)
plt.yticks(range(864), df.index)
plt.show()
df consists of data of the form:
att1 att2 att3
fun1 1 2 1
fun2 1 0 0
.......
....
This code works fine if I use only a few rows, such as 10-12, giving the following output:
However, if I increase the number of rows to 800, the graph looks like garbage. The following is the output:
Image after removing plt.yticks:
Does anyone have any idea how I can increase the number of rows in this kind of heatmap?

You can use the aspect option. As for the y-ticks, I'm not sure it's useful to show all of them since they would be unreadable, but you can set some:
fig, ax = plt.subplots(figsize=(6, 10))
ax.matshow(df.mask(df.isin(df.att1)!=True), cmap=cm.Reds, aspect='auto')
ax.matshow(df.mask(df.isin(df.att2)!=True), cmap=cm.Greens, aspect='auto')
ax.matshow(df.mask(df.isin(df.att3)!=True), cmap=cm.Blues, aspect='auto')
plt.xticks(range(3), df.columns)
plt.yticks(range(0, 800, 100), df.index[::100])
Hope this helps
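A self-contained sketch of the same two ideas, for trying them without `Book1.csv`. The random array is a hypothetical stand-in for the 800-row frame, and the `fun…` / `att…` labels merely imitate the naming shown in the question. By default `matshow` keeps cells square (`aspect='equal'`), which squeezes an 800 x 3 matrix into a thin strip; `aspect='auto'` lets the cells stretch to fill the axes.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-in for the 800-row frame in the question
rng = np.random.default_rng(0)
data = rng.integers(0, 3, size=(800, 3))

fig, ax = plt.subplots(figsize=(6, 10))
# aspect="auto" lets cells stretch to fill the axes instead of staying square
ax.matshow(data, cmap="Reds", aspect="auto")

step = 100
tick_rows = np.arange(0, data.shape[0], step)      # label every 100th row only
ax.set_yticks(tick_rows)
ax.set_yticklabels([f"fun{r + 1}" for r in tick_rows])
ax.set_xticks(range(data.shape[1]))
ax.set_xticklabels(["att1", "att2", "att3"])
```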
