I am trying to make subplots from multiple columns of a pandas dataframe. The following code works, but I would like to improve it by moving all the legends outside the plots (to the right) and adding the est_fmc variable to each plot.
L = new_df_honeysuckle[["Avg_1h_srf_mc", "Avg_1h_prof_mc", "Avg_10h_fuel_stick", "Avg_100h_debri_mc", "Avg_Daviesia_mc",
"Avg_Euclaypt_mc", "obs_fmc_average", "obs_fmc_max", "est_fmc"]].resample("1M").mean().interpolate().plot(figsize=(10,15),
subplots=True, linewidth = 3, yticks = (0, 50, 100, 150, 200))
plt.legend(loc='center left', markerscale=6, bbox_to_anchor=(1, 0.4))
Any help highly appreciated.
Since pandas' plotting function gives only limited control over each subplot, it is easiest to create the subplots with matplotlib's plt.subplots and fill them in a loop. It was unclear whether you wanted to add the 'est_fmc' line or annotate it, so I added the line. For annotations, see this.
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import colors as mcolors
import numpy as np
import itertools

columns = ["Avg_1h_srf_mc", "Avg_1h_prof_mc", "Avg_10h_fuel_stick", "Avg_100h_debri_mc", "Avg_Daviesia_mc", "Avg_Euclaypt_mc", "obs_fmc_average", "obs_fmc_max", 'est_fmc']
# dummy monthly data standing in for the resampled dataframe
date_rng = pd.date_range('2017-01-01', '2020-02-01', freq='MS')
df = pd.DataFrame({'date': date_rng})
for col in columns:
    tmp = np.random.randint(0, 200, (len(df),))
    df = pd.concat([df, pd.Series(tmp, name=col, index=df.index)], axis=1)

# one subplot per column except est_fmc, which is drawn on every subplot
fig, axs = plt.subplots(len(columns[:-1]), 1, figsize=(10, 15), sharex=True)
fig.subplots_adjust(hspace=0.5, right=0.75)  # leave room on the right for the legends
colors = mcolors.TABLEAU_COLORS
for i, (col, cname) in enumerate(zip(columns[:-1], itertools.islice(colors.keys(), 9))):
    axs[i].plot(df['date'], df[col], label=col, color=cname)
    axs[i].plot(df['date'], df['est_fmc'], label='est_fmc', color='tab:olive')
    axs[i].set_yticks([0, 50, 100, 150, 200])
    axs[i].grid()
    # anchor each legend just outside the right edge of its own subplot
    axs[i].legend(loc='upper left', bbox_to_anchor=(1.02, 1.0))
plt.show()
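If you would rather keep pandas' own plotting, a minimal sketch is to create the axes yourself and let pandas draw each column together with est_fmc on one of them, then move every legend outside. The daily data below is a hypothetical stand-in for new_df_honeysuckle (only three columns), and "MS" (month start) is used as the resampling rule because recent pandas versions deprecate "1M" in favor of "ME".

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical daily data standing in for new_df_honeysuckle (three columns only)
idx = pd.date_range("2017-01-01", "2017-12-31", freq="D")
rng = np.random.default_rng(0)
cols = ["Avg_1h_srf_mc", "Avg_1h_prof_mc", "est_fmc"]
daily = pd.DataFrame(rng.uniform(0, 200, size=(len(idx), len(cols))), index=idx, columns=cols)

# Monthly means, as in the question
monthly = daily.resample("MS").mean().interpolate()

others = [c for c in monthly.columns if c != "est_fmc"]
fig, axs = plt.subplots(len(others), 1, figsize=(10, 3 * len(others)), sharex=True)
for ax, col in zip(axs, others):
    # draw the column and est_fmc together on the same axes
    monthly[[col, "est_fmc"]].plot(ax=ax, linewidth=3, yticks=(0, 50, 100, 150, 200))
    # one legend per subplot, placed just outside the right edge
    ax.legend(loc="center left", bbox_to_anchor=(1.01, 0.5))
fig.subplots_adjust(right=0.78)  # leave room for the legends
plt.show()
```

Passing ax= to DataFrame.plot is what makes this work: pandas draws into the axes you created instead of building its own grid, so every per-axes matplotlib call stays available.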
I need help creating subplots in matplotlib dynamically from a pandas dataframe.
The data I am using is from data.world.
I have already created the viz, but the plots were created manually.
I need it to be dynamic because I am going to apply a filter dynamically (in Power BI), and I need the graph to adjust to the filter.
This is what I have so far:
I imported the data and got it in the shape I need:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# read file from makeover monday year 2018 week 48
df = pd.read_csv(r'C:\Users\Ruth Pozuelo\Documents\python_examples\data\2018w48.csv', usecols=["city", "category","item", "cost"], index_col=False, decimal=",")
df.head()
This is the table:
I then apply the filter that will come from Power BI dynamically:
df = df[df.category=='Party night']
and then I count the number of plots based on the number of items I get after I apply the filter:
itemCount = df['item'].nunique() #number of plots
If I then plot the subplots:
fig, ax = plt.subplots( nrows=1, ncols=itemCount ,figsize=(30,10), sharey=True)
I get the skeleton:
So far so good!
But now I am stuck on how to feed the x axis to the loop to generate the subcategories. I am trying something like the code below, but nothing works.
#for i, ax in enumerate(axes.flatten()):
# ax.plot(??,cityValues, marker='o',markersize=25, lw=0, color="green") # The top-left axes
As I already have the code for the look and feel of the chart, annotations, etc., I would love to be able to use the plt.subplots method, and I would prefer not to use seaborn if possible.
Any ideas on how to get this working?
Thanks in advance!
I used the data you referenced as the basis for the code. I prepared a list of columns and a list of colors and looped through them together. axes.ravel() is slightly more efficient than axes.flatten(): ravel() returns a view of the array of Axes whenever possible, while flatten() always returns a copy. Either way you get a flat array with one Axes object per subplot, which lets you configure all subplots in a single loop.
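A quick way to check the view-versus-copy difference is sketched below with a throwaway 2×3 grid (the grid size and titles are arbitrary, for illustration only):

```python
import numpy as np
import matplotlib.pyplot as plt

fig, axes = plt.subplots(nrows=2, ncols=3)
print(axes.shape)  # (2, 3)

flat_view = axes.ravel()    # a view on the same array when possible
flat_copy = axes.flatten()  # always a new array
print(np.shares_memory(axes, flat_view), np.shares_memory(axes, flat_copy))  # True False

# either flat array yields one Axes per subplot, in row-major order
for i, ax in enumerate(axes.ravel()):
    ax.set_title(f"subplot {i}")
```

For a handful of subplots the difference is negligible; the important part is that both give you a one-dimensional sequence you can zip with column names and colors.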
import pandas as pd
import matplotlib.pyplot as plt

url='https://raw.githubusercontent.com/Curbal-Data-Labs/Matplotlib-Labs/master/2018w48.csv'
dataset = pd.read_csv(url)
dataset.drop_duplicates(['city','item'], inplace=True)
# wide table: one row per city, one column per item; the 'All' margins are only used for sorting
df = (dataset.pivot_table(index='city', columns='item', values='cost', aggfunc='sum', margins=True)
      .sort_values('All', ascending=True)
      .drop('All', axis=1)
      .sort_values('All', ascending=False, axis=1)
      .drop('All')
      .reset_index())
# comma replace: turn decimal commas into dots so the costs become floats
for c in df.columns[1:]:
    df[c] = df[c].str.replace(',', '.').astype(float)

fig, axes = plt.subplots(nrows=1, ncols=5, figsize=(30,10), sharey=True)
colors = ['green','blue','red','black','brown']
col_names = ['Dinner','Drinks at Dinner','2 Longdrinks','Club entry','Cinema entry']
for ax, col, c in zip(axes.ravel(), col_names, colors):
    ax.plot(df.loc[:,col], df['city'], marker='o', markersize=25, lw=0, color=c)
    ax.set_title(col)
    # write each cost next to its marker
    for cost, city in zip(df[col], df['city']):
        ax.annotate('$'+str(cost), xy=(cost, city), xytext=(cost-4, city), color="white", fontsize=8)
    ax.set_xticks([])
    ax.spines[['top', 'right', 'left', 'bottom']].set_visible(False)
    ax.grid(True, axis='y', linestyle='solid', linewidth=2)
    ax.grid(True, axis='x', linestyle='solid', linewidth=0.2)
    ax.xaxis.tick_top()
    ax.xaxis.set_label_position('top')
    ax.set_xlim(xmin=0, xmax=160)
    ax.xaxis.set_major_formatter('${x:1.0f}')
    ax.tick_params(labelsize=8, top=False, left=False)
plt.show()
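The example above still hard-codes five columns. To make the number of subplots follow the Power BI filter, as asked, one possible sketch derives it from the filtered data itself and uses only plt.subplots, without seaborn. The small dataframe below is hypothetical and only mimics the city/category/item/cost shape of the question.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical long-format data with the same columns as the question
df = pd.DataFrame({
    "city": ["Oslo", "Oslo", "Zurich", "Zurich", "Oslo", "Zurich"],
    "category": ["Party night"] * 4 + ["Date night"] * 2,
    "item": ["Dinner", "Club entry", "Dinner", "Club entry", "Cinema entry", "Cinema entry"],
    "cost": [80.0, 20.0, 95.0, 25.0, 15.0, 18.0],
})

filtered = df[df["category"] == "Party night"]  # this filter would come from Power BI
items = filtered["item"].unique()                # one subplot per remaining item

# squeeze=False keeps axes 2-D even when only one item survives the filter
fig, axes = plt.subplots(nrows=1, ncols=len(items), figsize=(5 * len(items), 4),
                         sharey=True, squeeze=False)
for ax, item in zip(axes.ravel(), items):
    sub = filtered[filtered["item"] == item]
    # cost on the x axis, city on the y axis, markers only
    ax.plot(sub["cost"], sub["city"], marker="o", markersize=15, lw=0, color="green")
    ax.set_title(item)
plt.show()
```

Your existing look-and-feel code (spines, grids, annotations) can go inside the same loop body.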
Working example below. I used seaborn to plot the bars, but the idea is the same: loop through the items and use a counter as the index of the subplot to draw on. enumerate() provides that counter, starting at 0.
import seaborn as sns
df['cost'] = df['cost'].astype(float)
variables = df['item'].unique()
fig, axs = plt.subplots(1, itemCount, figsize=(25, 70), sharex=False, sharey=False)
for count, var in enumerate(variables):
    # draw only the rows of this item on its own subplot
    sns.barplot(ax=axs[count], data=df[df['item'] == var], x='cost', y='city')
    axs[count].set_title(var)
I am struggling with syncing colors between seaborn.countplot and a pandas.DataFrame.plot pie plot.
I found a similar question on SO, but it does not work with pie chart as it throws an error:
TypeError: pie() got an unexpected keyword argument 'color'
I searched on the documentation sites, but all I could find is that I can set a colormap and palette, which was also not in sync in the end:
Result of using the same colormap and palette
My code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv('https://andybek.com/pandas-sat')
cat_vars = ['Borough', 'SAT Section']
for var in list(cat_vars):
    fig, ax = plt.subplots(1, 2, figsize=(10, 5))
    df[var].value_counts().plot(kind='pie', autopct=lambda v: f'{v:.2f}%', ax=ax[0])
    cplot = sns.countplot(data=df, x=var, ax=ax[1])
    for patch in cplot.patches:
        cplot.annotate(
            format(patch.get_height()),
            (
                patch.get_x() + patch.get_width() / 2,
                patch.get_height()
            )
        )
    plt.show()
Illustration of the problem
As you can see, colors are not in sync with labels.
I added the order argument to sns.countplot(). It makes seaborn draw the categories in the same order as value_counts(), which is the order the pie plot uses, so the colors of both plots will match.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv('https://andybek.com/pandas-sat')
cat_vars = ['Borough', 'SAT Section']
for var in list(cat_vars):
    fig, ax = plt.subplots(1, 2, figsize=(10, 5))
    df[var].value_counts().plot(kind='pie', autopct=lambda v: f'{v:.2f}%', ax=ax[0])
    cplot = sns.countplot(data=df, x=var, ax=ax[1],
                          order=df[var].value_counts().index)
    for patch in cplot.patches:
        cplot.annotate(
            format(patch.get_height()),
            (
                patch.get_x() + patch.get_width() / 2,
                patch.get_height()
            )
        )
    plt.show()
Explanation: colors are assigned by order. So if the bars in sns.countplot are in a different order than the wedges of the pie plot, the two plots show different colors for the same label.
Using default colors
Using the same data for the pie plot and for the seaborn plot might help. As the values are already counted for the pie plot, that same Series can be plotted directly as a bar plot. That way, the order of the values stays the same.
Note that seaborn by default makes the colors slightly less saturated. To get the same colors as in the pie plot, use saturation=1 (the default is .75). To add text above the bars, matplotlib 3.4 and later provide Axes.bar_label. Be aware that since seaborn 0.13, bar and count plots without hue are drawn in a single color; assigning the x variable to hue (with legend=False) restores one color per bar.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
df = pd.read_csv('https://andybek.com/pandas-sat')
cat_vars = ['Borough', 'SAT Section']
for var in list(cat_vars):
    fig, ax = plt.subplots(1, 2, figsize=(10, 5))
    # count once and reuse the same Series for both plots, so the order is identical
    counts_df = df[var].value_counts()
    counts_df.plot(kind='pie', autopct=lambda v: f'{v:.2f}%', ax=ax[0])
    sns.barplot(x=counts_df.index, y=counts_df.values, saturation=1, ax=ax[1])
    ax[1].bar_label(ax[1].containers[0])
    plt.show()
Customized colors
If you want to use a customized list of colors, you can use the colors= keyword in pie() and palette= in seaborn.
To make things fit better, you can replace spaces by newlines (so "Staten Island" will use two lines). plt.tight_layout() will rearrange spacings to make titles and texts fit nicely into the figure.
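A minimal sketch of the customized-colors idea follows. It looks up a fixed color per category name, so the mapping no longer depends on the plotting order. To avoid seaborn's version-specific palette/hue behavior, it swaps in plain matplotlib Axes.bar with color= instead of seaborn's palette=; the borough counts and the color choices are hypothetical stand-ins for the CSV data.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for df['Borough'] (the real data comes from the CSV above)
boroughs = pd.Series(["Brooklyn"] * 5 + ["Queens"] * 4 + ["Bronx"] * 3 + ["Staten Island"] * 2)
counts = boroughs.value_counts()

# Assumed custom palette: one fixed color per category, looked up by name
palette = {"Brooklyn": "tab:blue", "Queens": "tab:orange",
           "Bronx": "tab:green", "Staten Island": "tab:red"}
colors = [palette[name] for name in counts.index]  # same order as the counts

# Replace spaces with newlines so long labels such as "Staten Island" wrap
counts.index = counts.index.str.replace(" ", "\n")

fig, ax = plt.subplots(1, 2, figsize=(10, 5))
counts.plot(kind="pie", colors=colors, autopct=lambda v: f"{v:.2f}%", ax=ax[0])
bars = ax[1].bar(counts.index, counts.values, color=colors)
ax[1].bar_label(bars)  # count above each bar (matplotlib >= 3.4)
plt.tight_layout()
plt.show()
```

Because the colors are computed from the category names before the labels are reformatted, the pie and the bars always agree, whatever order value_counts() returns.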
I'm trying to make my plot's xticks look similar to the pandas DataFrame default format.
I've been trying to set them with the set_xticklabels function, but did not succeed.
fig, axarr = plt.subplots(len(stations), 2, figsize=(10,11))
plt.subplots_adjust(bottom=0.05)
hPc3.plot(use_index=True, subplots=True, ax=axarr[0:len(stations),0],
for i in range(0,len(axarr)):
    axarr[i,0].set_ylabel('$nT$')
axarr[len(stations)-1,0].set_xlabel('$(UT)$')
for i in range(0,len(axarr)):
    plot4 = axarr[i,1].pcolormesh(tti, wPc3_period[i], np.log10(abs(wPc3_power[i])), cmap = 'jet')
    # basey/subsy were renamed base/subs in Matplotlib 3.3 (the old names were later removed)
    axarr[i,1].set_yscale('log', base=2)
    axarr[i,1].set_xlabel('$(UT)$')
    axarr[i,1].set_ylabel('$Period$ $(s)$')
    axarr[i,1].set_ylim([np.min(wPc3_period[i]), np.max(wPc3_period[i])])
    axarr[i,1].invert_yaxis()
    axarr[i,1].plot(tti, te_coi3, 'w')
    cbar_coord = replace_at_index1(make_axes_locatable(axarr[i,1]).get_position(), [0,2], [0.92, 0.01])
    cbar_ax = fig.add_axes(cbar_coord)
    cbar = plt.colorbar(plot4, cax=cbar_ax, boundaries=np.linspace(-10, 10, 512),
                        ticks=[-10, -5, 0, 5, 10], label='$log_{2}$')
    cbar.set_clim([-10,5])
The left panel shows the default labels of a pandas DataFrame plot. The right panel shows my current formatting.
Matplotlib's dates API provides plenty of convenience functions and classes to represent and convert date and time data.
You can reproduce the pandas style using a simple combination of DateFormatter, DayLocator and HourLocator. Here's an example on a dummy dataset, since you didn't provide complete working code; it shouldn't be hard to adapt to your use case.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
# create toy dataset
index = pd.date_range("2018-05-25 00:00:00", "2018-05-26 00:00:00", freq = "1min")
series = pd.Series(np.random.random(len(index)), index=index)
x = index.to_pydatetime()
y = series
# plot
fig = plt.figure(figsize=(5,1))
ax = fig.gca()
ax.xaxis.set_minor_formatter(mdates.DateFormatter("%H:%M"))
ax.xaxis.set_minor_locator(mdates.HourLocator(interval=3))
ax.tick_params(which='minor', labelrotation=30)
ax.xaxis.set_major_formatter(mdates.DateFormatter("%d-%b"))
ax.xaxis.set_major_locator(mdates.DayLocator())
ax.tick_params(which='major', pad=10, labelrotation=30)
ax.set_xlim(x.min(), x.max())
ax.plot(x, y)
plt.show()
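Adapting this to a grid of stations mostly means applying the same locators and formatters to every panel in a loop. The sketch below is one way to do it; the station names and random data are hypothetical stand-ins for hPc3, and only the left column of the original layout is reproduced.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Hypothetical stand-in for hPc3: one column per station, 1-minute samples over a day
index = pd.date_range("2018-05-25 00:00", "2018-05-26 00:00", freq="1min")
stations = ["ST1", "ST2", "ST3"]  # assumed station names
data = pd.DataFrame(np.random.random((len(index), len(stations))), index=index, columns=stations)

x = data.index.to_pydatetime()
fig, axarr = plt.subplots(len(stations), 1, figsize=(6, 6), sharex=True)
for ax, station in zip(axarr, stations):
    ax.plot(x, data[station])
    ax.set_ylabel("$nT$")
    # same locator/formatter combination as above, applied to every panel
    ax.xaxis.set_major_locator(mdates.DayLocator())
    ax.xaxis.set_major_formatter(mdates.DateFormatter("%d-%b"))
    ax.xaxis.set_minor_locator(mdates.HourLocator(interval=3))
    ax.xaxis.set_minor_formatter(mdates.DateFormatter("%H:%M"))
    ax.tick_params(which="major", pad=10)
axarr[-1].set_xlabel("$(UT)$")
axarr[-1].set_xlim(x[0], x[-1])  # shared x axis, so this applies to every panel
plt.show()
```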
I have written the following code:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
df = pd.read_csv("data_1.csv",index_col="Group")
print(df)
fig,ax = plt.subplots(1)
heatmap = ax.pcolor(df, edgecolors='k')
cbar = plt.colorbar(heatmap)
plt.ylim([0,12])
ax.invert_yaxis()
locs_y, labels_y = plt.yticks(np.arange(0.5, len(df.index), 1), df.index)
locs_x, labels_x = plt.xticks(np.arange(0.5, len(df.columns), 1), df.columns)
ax.set_xticklabels(labels_x, rotation=10)
ax.set_yticklabels(labels_y,fontsize=10)
plt.show()
It takes input like the following and plots a heatmap with labels on two sides (left and bottom):
GP1,c1,c2,c3,c4,c5
S1,21,21,20,69,30
S2,28,20,20,39,25
S3,20,21,21,44,21
I also want to add labels on the right side, taken from an extra column in the data as shown below, and plot a heatmap with labels on three sides: left, bottom and right.
GP1,c1,c2,c3,c4,c5
S1,21,21,20,69,30,V1
S2,28,20,20,39,25,V2
S3,20,21,21,44,21,V3
What changes should I make to the code?
Please help.
You can create a second y-axis on the right side of the plot with ax.twinx(). You then need to set up this axis the same way you already set up the first one.
u = u"""GP1,c1,c2,c3,c4,c5
S1,21,21,20,69,30
S2,28,20,20,39,25
S3,20,21,21,44,21"""
import io
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df= pd.read_csv(io.StringIO(u),index_col="GP1")
fig,ax = plt.subplots(1)
heatmap = ax.pcolor(df, edgecolors='k')
cbar = plt.colorbar(heatmap, pad=0.1)
bx = ax.twinx()
ax.set_yticks(np.arange(0.5, len(df.index), 1))
ax.set_xticks(np.arange(0.5, len(df.columns), 1))
ax.set_xticklabels(df.columns, rotation=10)
ax.set_yticklabels(df.index,fontsize=10)
bx.set_yticks(np.arange(0.5, len(df.index), 1))
bx.set_yticklabels(["V1","V2","V3"],fontsize=10)
ax.set_ylim([0,12])
bx.set_ylim([0,12])
ax.invert_yaxis()
bx.invert_yaxis()
plt.show()
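The answer above hard-codes V1, V2 and V3. A sketch that reads the right-hand labels from the data instead is shown below. It assumes the extra column has a header, here the hypothetical name label (the sample in the question has no header for that column), and it matches the right axis' limits to the heatmap instead of using the fixed [0, 12].

```python
import io
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Assumed: the right-hand labels live in an extra column named "label" (hypothetical name)
u = """GP1,c1,c2,c3,c4,c5,label
S1,21,21,20,69,30,V1
S2,28,20,20,39,25,V2
S3,20,21,21,44,21,V3"""

df = pd.read_csv(io.StringIO(u), index_col="GP1")
right_labels = df.pop("label")  # remove the text column so only numbers are plotted

fig, ax = plt.subplots(1)
heatmap = ax.pcolor(df, edgecolors="k")
fig.colorbar(heatmap, ax=ax, pad=0.1)
bx = ax.twinx()  # second y axis on the right, sharing the x axis

ticks_y = np.arange(0.5, len(df.index), 1)  # one tick in the middle of each row
ax.set_xticks(np.arange(0.5, len(df.columns), 1))
ax.set_xticklabels(df.columns, rotation=10)
ax.set_yticks(ticks_y)
ax.set_yticklabels(df.index, fontsize=10)

bx.set_ylim(ax.get_ylim())  # keep both y axes aligned with the heatmap rows
bx.set_yticks(ticks_y)
bx.set_yticklabels(right_labels.tolist(), fontsize=10)

ax.invert_yaxis()
bx.invert_yaxis()
plt.show()
```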
I am trying to set a background image for a line plot I made in matplotlib. Even when importing the image and using the zorder argument, I get two separate images instead of a single combined one. Please suggest a way out. My code:
import quandl
import pandas as pd
import sys, os
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import itertools
def flip(items, ncol):
    return itertools.chain(*[items[i::ncol] for i in range(ncol)])
df = pd.read_pickle('neer.pickle')
rows = list(df.index)
countries = ['USA','CHN','JPN','DEU','GBR','FRA','IND','ITA','BRA','CAN','RUS']
x = range(len(rows))
df = df.pct_change()
fig, ax = plt.subplots(1)
for country in countries:
    ax.plot(x, df[country], label=country)
plt.xticks(x, rows, size='small', rotation=75)
#legend = ax.legend(loc='upper left', shadow=True)
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
plt.show(1)
plt.figure(2)
im = plt.imread('world.png')
ax1 = plt.imshow(im, zorder=1)
ax1 = df.iloc[:,:].plot(zorder=2)
handles, labels = ax1.get_legend_handles_labels()
plt.legend(flip(handles, 2), flip(labels, 2), loc=9, ncol=12)
plt.show()
So in figure(2) I get two separate plots instead of one.
To overlay a plot on a background image, use imshow with the extent parameter, which places the image in data coordinates.
Here is a condensed version of your code. I didn't have time to clean it up much.
First, sample data is created for the 11 countries listed in your code. It is then pickled and saved to a file (since your pickle file is not available).
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
countries = ['USA','CHN','JPN','DEU','GBR','FRA','IND','ITA','BRA','CAN','RUS']
df_sample = pd.DataFrame(np.random.randn(10, 11), columns=list(countries))
df_sample.to_pickle('c:\\temp\\neer.pickle')
Next, the pickle file is read and a stacked bar plot is created directly from pandas:
df = pd.read_pickle('c:\\temp\\neer.pickle')
my_plot = df.plot(kind='bar',stacked=True,title="Plot Over Image")
my_plot.set_xlabel("countries")
my_plot.set_ylabel("some_number")
Next, the image is read with plt.imread (scipy.misc.imread has been removed from SciPy) and drawn behind the bars with imshow:
img = plt.imread("c:\\temp\\world.png")
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
# zorder=0 puts the image behind the bars; aspect='auto' stretches it to the axes instead of forcing square pixels
plt.imshow(img, zorder=0, extent=[0.1, 10.0, -10.0, 10.0], aspect='auto')
plt.show()
Here is an output plot with image as background.
As stated, this is crude and can be improved further.
You're creating two separate figures in your code: the first with fig, ax = plt.subplots(1) and the second with plt.figure(2).
If you delete the second figure and draw both the image and the lines on the same axes (pass ax=ax to df.plot), you should get much closer to your goal.
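A minimal single-figure version of that suggestion is sketched below. The random DataFrame stands in for neer.pickle and a random RGB array stands in for plt.imread('world.png'); both are hypothetical. The image is stretched over the current data limits with extent and aspect='auto', and the zorder values keep the lines on top.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

countries = ['USA', 'CHN', 'JPN', 'DEU', 'GBR', 'FRA', 'IND', 'ITA', 'BRA', 'CAN', 'RUS']
# Hypothetical stand-in for neer.pickle: 20 periods of random data per country
df = pd.DataFrame(np.random.randn(20, len(countries)), columns=countries)
# Hypothetical stand-in for plt.imread('world.png'): a small random RGB image
im = np.random.rand(30, 50, 3)

fig, ax = plt.subplots(1)   # a single figure with a single axes
df.plot(ax=ax, zorder=2)    # lines drawn on this axes, above the image
xmin, xmax = ax.get_xlim()
ymin, ymax = ax.get_ylim()
# stretch the image over the current data limits without changing the axes shape
ax.imshow(im, zorder=1, extent=[xmin, xmax, ymin, ymax], aspect='auto')
ax.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
plt.show()
```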