Pandas dataframe plotting: yscale log and xy label and legend issues

Intro
I am new to Python, matplotlib and pandas. I spent a lot of time reviewing material to come up with the following, and I am stuck.
Question:
I am trying to plot using pandas. I have three y-axes, one of which is log scale.
I cannot figure out why the log scaling (1) and the axis label (2) don't work for my secondary axis ax2 in the code. They work everywhere else.
All the legends are separate (3). Is there a simpler way to handle this than doing it manually?
When I plot the secondary-axis part separately, it comes out fine. I also ran the plot with the third axis removed, and the problem persists. I have included the code with all three axes because I need the proposed solution to work with them together.
Methods for solving (3) on its own exist, but I am specifically looking for DataFrame-based plotting. Other, manual techniques are given on the same site, and I do not want to use them!
Code and explanation
# Importing the basic libraries
import matplotlib.pyplot as plt
from pandas import DataFrame
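# test3 = DataFrame with 5 columns, selected from my full DataFrame df
# (.loc replaces the old .ix indexer, which was removed in pandas 1.0)
test3 = df.loc[:, ['tau', 'E_tilde', 'Max_error_red', 'time_snnls', 'z_t_gandb']]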
# Setting up plot with 3 'y' axis
fig, ax = plt.subplots()
ax2, ax3 = ax.twinx(), ax.twinx()
rspine = ax3.spines['right']
rspine.set_position(('axes', 1.25))
ax3.set_frame_on(True)
ax3.patch.set_visible(False)
fig.subplots_adjust(right=0.75)
# Setting the color and labels
ax.set_xlabel('tau(nounit)')
ax.set_ylabel('Time(s)', color = 'b')
ax2.set_ylabel('Max_error_red', color = 'r')
ax3.set_ylabel('E_tilde', color = 'g')
# Setting the logscaling
ax.set_xscale('log') # Works
ax2.set_yscale('log')  # Doesn't work
# Plotting the dataframe
test3.plot(x = 'tau', y = 'time_snnls', ax=ax, style='b-')
test3.plot(x = 'tau', y = 'Max_error_red', ax=ax2, style='r-', secondary_y=True)
test3.plot(x = 'tau', y = 'z_t_gandb', ax=ax, style='b-.')
test3.plot(x = 'tau', y = 'E_tilde', ax=ax3, style='g-')

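The issue is the secondary_y=True option. Remove that, and it works fine. The problem is that you have already set up your twin axes: with secondary_y=True, pandas creates another twin of the axes you pass in (here ax2) and draws the line on that new axes. The log scale and label you set on ax2 therefore never apply to the line, and because nothing is drawn on ax2 itself, pandas also hides its y-axis.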
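To see this for yourself, here is a minimal sketch, assuming pandas 1.0 or later and a recent matplotlib; the two-column frame is made-up stand-in data, and the Agg backend is only there so it runs without a display. It passes an already-configured twin to df.plot with secondary_y=True and then checks where the line actually went.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in data using two of the question's column names
df = pd.DataFrame({"tau": np.logspace(-3, 0, 50),
                   "Max_error_red": np.logspace(-2, 1, 50)})

fig, ax = plt.subplots()
ax2 = ax.twinx()
ax2.set_yscale("log")  # configure the twin we intend to draw on

# With secondary_y=True, pandas makes a further twin of ax2 and plots there
drawn_on = df.plot(x="tau", y="Max_error_red", ax=ax2,
                   secondary_y=True, legend=False)

print(drawn_on is ax2)         # is the returned axes our ax2?
print(len(ax2.get_lines()))    # how many lines ended up on ax2 itself
print(drawn_on.get_yscale())   # y-scale of the axes that holds the line
print(len(fig.axes))           # ax, ax2, plus any axes pandas added
```

Dropping secondary_y=True (and keeping ax=ax2) puts the line on ax2, so the log scale and label apply to it.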
As for the legend: set legend=False in each of your test3.plot commands, then, after making the plot, gather the legend handles and labels from each axes with ax.get_legend_handles_labels(). You can then combine them all into a single legend.
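The gathering step can be wrapped in a small helper. This is a sketch: combined_legend is a hypothetical name, not a pandas or matplotlib function, and the data is made up. It relies on pandas still labelling each line with its column name when legend=False, so get_legend_handles_labels() finds them.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def combined_legend(target_ax, axes, **legend_kwargs):
    """Hypothetical helper: merge handles/labels from several axes into one legend."""
    handles, labels = [], []
    for a in axes:
        h, l = a.get_legend_handles_labels()
        handles.extend(h)
        labels.extend(l)
    return target_ax.legend(handles, labels, **legend_kwargs)


# Made-up data with the question's column names
test3 = pd.DataFrame({
    "tau": np.logspace(-3, 0, 20),
    "time_snnls": np.linspace(5, 0, 20),
    "Max_error_red": np.logspace(-2, 1, 20),
    "E_tilde": np.linspace(100, 0, 20),
})

fig, ax = plt.subplots()
ax2, ax3 = ax.twinx(), ax.twinx()
# legend=False: no per-axes legend, but each line keeps its column name as label
test3.plot(x="tau", y="time_snnls", ax=ax, style="b-", legend=False)
test3.plot(x="tau", y="Max_error_red", ax=ax2, style="r-", legend=False)
test3.plot(x="tau", y="E_tilde", ax=ax3, style="g-", legend=False)

leg = combined_legend(ax, [ax, ax2, ax3], loc="right")
print([t.get_text() for t in leg.get_texts()])
```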
Finally, to make sure the axis labels are set correctly, set them after you have plotted your data: the pandas DataFrame plotting methods overwrite labels set beforehand (for instance, the x label becomes the name of the x column). Setting them afterwards ensures that your labels are the ones that stick.
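A quick check of the overwriting, using the x label as the example. The frame is stand-in data, and this assumes a recent pandas, where plotting with x= sets the axes' x label to that column's name.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

test3 = pd.DataFrame({"tau": np.logspace(-3, 0, 20),
                      "time_snnls": np.linspace(5, 0, 20)})

fig, ax = plt.subplots()
ax.set_xlabel("tau(nounit)")        # set BEFORE plotting ...
test3.plot(x="tau", y="time_snnls", ax=ax, legend=False)
label_after_plot = ax.get_xlabel()  # ... pandas has replaced it during plotting

ax.set_xlabel("tau(nounit)")        # set AFTER plotting: this one sticks
print(label_after_plot, "->", ax.get_xlabel())
```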
Here's a working script (with dummy data):
import matplotlib.pyplot as plt
from pandas import DataFrame
import numpy as np
# Fake up some data
test3 = DataFrame({
'tau':np.logspace(-3,0,100),
'E_tilde':np.linspace(100,0,100),
'Max_error_red':np.logspace(-2,1,100),
'time_snnls':np.linspace(5,0,100),
'z_t_gandb':np.linspace(16,15,100)
})
# Setting up plot with 3 'y' axis
fig, ax = plt.subplots()
ax2, ax3 = ax.twinx(), ax.twinx()
rspine = ax3.spines['right']
rspine.set_position(('axes', 1.25))
ax3.set_frame_on(True)
ax3.patch.set_visible(False)
fig.subplots_adjust(right=0.75)
# Setting the logscaling
ax.set_xscale('log')
ax2.set_yscale('log')  # Works now that secondary_y=True is not used
# Plotting the dataframe
test3.plot(x = 'tau', y = 'time_snnls', ax=ax, style='b-',legend=False)
test3.plot(x = 'tau', y = 'Max_error_red', ax=ax2, style='r-',legend=False)
test3.plot(x = 'tau', y = 'z_t_gandb', ax=ax, style='b-.',legend=False)
test3.plot(x = 'tau', y = 'E_tilde', ax=ax3, style='g-',legend=False)
# Setting the color and labels
ax.set_xlabel('tau(nounit)')
ax.set_ylabel('Time(s)', color = 'b')
ax2.set_ylabel('Max_error_red', color = 'r')
ax3.set_ylabel('E_tilde', color = 'g')
# Gather all the legend handles and labels to plot in one legend
l1 = ax.get_legend_handles_labels()
l2 = ax2.get_legend_handles_labels()
l3 = ax3.get_legend_handles_labels()
handles = l1[0]+l2[0]+l3[0]
labels = l1[1]+l2[1]+l3[1]
ax.legend(handles, labels, loc='right')  # loc=5 is the numeric code for 'right'
plt.show()

Related

Heatmap with multi-color y-axis and corresponding colorbar

I want to create a heatmap with seaborn, similar to this (with the following code):
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
# Create data
df = pd.DataFrame(np.random.random((5,5)), columns=["a","b","c","d","e"])
# Default heatmap
ax = sns.heatmap(df)
plt.show()
I'd also like to add a new variable (let's say new_var = pd.DataFrame(np.random.random((5,1)), columns=["new variable"])) such that the y-axis values (and possibly the spine and ticks as well) are colored according to the new variable, with a second colorbar in the same plot to represent the colors of the y-axis values. How can I do that?

This uses the new values to color the y-ticks and the y-tick labels and adds the associated colorbar.
import matplotlib.pyplot as plt
import matplotlib
import seaborn as sns
import pandas as pd
import numpy as np
# Create data
df = pd.DataFrame(np.random.random((5,5)), columns=["a","b","c","d","e"])
# Default heatmap
ax = sns.heatmap(df)
new_var = pd.DataFrame(np.random.random((5,1)), columns=["new variable"])
values = new_var["new variable"]
# Create the colorbar for y-ticks and labels
norm = plt.Normalize(values.min(), values.max())
# matplotlib.colormaps replaces cm.get_cmap, which was removed in Matplotlib 3.9
cmap = matplotlib.colormaps['turbo']
yticks_locations = ax.get_yticks()
yticks_labels = df.index.values
# Hide original ticks
ax.tick_params(axis='y', left=False)
ax.set_yticklabels([])
# Draw each label as a colored annotation, with a short line running under the heatmap
for var, ytick_loc, ytick_label in zip(values, yticks_locations, yticks_labels):
    color = cmap(norm(var))
    ax.annotate(ytick_label, xy=(1, ytick_loc), xycoords='data', xytext=(-0.4, ytick_loc),
                arrowprops=dict(arrowstyle="-", color=color, lw=1), zorder=0, rotation=90, color=color)
# Add colorbar for y-tick colors (ax= tells the figure where to take the space from)
sm = plt.cm.ScalarMappable(cmap=cmap, norm=norm)
cb = ax.figure.colorbar(sm, ax=ax)
# Match the seaborn style
cb.outline.set_visible(False)
plt.show()

I found your problem interesting and, inspired by the unanswered comment above:
How do you change the second colorbar position? For example, one on top the other on bottom sides. - Py-ser
I decided to spend a while doing some tests. After a little digging, I found that cbar_kws={"orientation": "horizontal"} is the sns.heatmap argument that makes its colorbar horizontal.
Borrowing the code from the solution and making some changes, you can format your plot the way you want as in:
import matplotlib.pyplot as plt
import matplotlib
import seaborn as sns
import pandas as pd
import numpy as np
# Create data
df = pd.DataFrame(np.random.random((5,5)), columns=["a","b","c","d","e"])
# Heatmap with a horizontal colorbar and the cell values written in
ax = sns.heatmap(df, cbar_kws={"orientation": "horizontal"}, square=False, annot=True)
new_var = pd.DataFrame(np.random.random((5,1)), columns=["new variable"])
values = new_var["new variable"]
# Create the colorbar for y-ticks and labels
norm = plt.Normalize(values.min(), values.max())
# matplotlib.colormaps replaces cm.get_cmap, which was removed in Matplotlib 3.9
cmap = matplotlib.colormaps['turbo']
yticks_locations = ax.get_yticks()
yticks_labels = df.index.values
# Hide original ticks
ax.tick_params(axis='y', left=False)
ax.set_yticklabels([])
# Draw each label as a colored annotation, with a short line running under the heatmap
for var, ytick_loc, ytick_label in zip(values, yticks_locations, yticks_labels):
    color = cmap(norm(var))
    ax.annotate(ytick_label, xy=(1, ytick_loc), xycoords='data', xytext=(-0.4, ytick_loc),
                arrowprops=dict(arrowstyle="-", color=color, lw=1), zorder=0, rotation=90, color=color)
# Add colorbar for y-tick colors (ax= tells the figure where to take the space from)
sm = plt.cm.ScalarMappable(cmap=cmap, norm=norm)
cb = ax.figure.colorbar(sm, ax=ax)
# Match the seaborn style
cb.outline.set_visible(False)
plt.show()
You will also notice that I annotated each cell of the heatmap with its value (annot=True); that is only there to make it easier to check that everything works as expected.
I'm still not very happy with the shape and size of the horizontal colorbar, but I'll keep testing and update any progress by editing this answer!
==========================================
EDIT
Just to keep track of the updates: first I tried changing some parameters of seaborn's heatmap function, although I wouldn't consider this a major improvement. Using
ax = sns.heatmap(df, cbar_kws = dict(use_gridspec=True, location="top", shrink =0.6), square = True, annot = True)
I end up with:
I did manage to separate the colorbar using matplotlib's subplots routine, and honestly I believe this is the right approach, given the control over parameters it offers:
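import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
df = pd.DataFrame(np.random.random((5,5)), columns=["a","b","c","d","e"])
# Define two rows for subplots: a thin one for the colorbar, a tall one for the heatmap
fig, (cax, ax) = plt.subplots(nrows=2, figsize=(5,5.025), gridspec_kw={"height_ratios":[0.025, 1]})
# Heatmap without its own colorbar, drawn explicitly on the lower axes
sns.heatmap(df, ax=ax, cbar=False, annot=True)
# Colorbar in the upper axes; ax.collections[0] is the heatmap's QuadMesh
fig.colorbar(ax.collections[0], cax=cax, orientation="horizontal")
plt.show()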
I obtained:
It is still not the prettiest graph I've ever made, but now the position and size of the colorbar and the heatmap can be adjusted normally through plt.subplots (figsize and gridspec_kw), which gives full control over these parameters.
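Extending the same subplots idea to the comment's request (one colorbar on top, the other at the bottom), here is a sketch under a few assumptions: a recent Matplotlib (for matplotlib.colormaps) and seaborn, a fixed random seed, and new_var as a plain Series. It also swaps the annotate-based labels from the answer above for a simpler technique, coloring the existing y tick labels with Text.set_color, so the tick lines themselves are not colored. The height ratios are arbitrary and may need tuning so labels do not collide.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove for an interactive window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)  # arbitrary seed, illustrative data only
df = pd.DataFrame(rng.random((5, 5)), columns=["a", "b", "c", "d", "e"])
new_var = pd.Series(rng.random(5), name="new variable")

# Three rows: top colorbar, heatmap, bottom colorbar
fig, (cax_top, ax, cax_bottom) = plt.subplots(
    nrows=3, figsize=(5, 6), gridspec_kw={"height_ratios": [0.05, 1, 0.05]}
)

# Heatmap without seaborn's own colorbar; its QuadMesh feeds the top colorbar
sns.heatmap(df, ax=ax, cbar=False, annot=True)
fig.colorbar(ax.collections[0], cax=cax_top, orientation="horizontal")

# Color each existing y tick label by the matching new_var value
norm = plt.Normalize(new_var.min(), new_var.max())
cmap = matplotlib.colormaps["turbo"]
tick_colors = [cmap(norm(v)) for v in new_var]
for label, color in zip(ax.get_yticklabels(), tick_colors):
    label.set_color(color)

# Bottom colorbar explains the tick-label colors
sm = plt.cm.ScalarMappable(cmap=cmap, norm=norm)
fig.colorbar(sm, cax=cax_bottom, orientation="horizontal", label="new variable")
plt.show()
```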

How do I overlay multiple sns distplots or change the colour based on a secondary variable using a pandas df

I have a pandas DataFrame with a 'frequency_mhz' variable and a 'type' variable. I want to create a distplot using seaborn that overlays all of the frequencies but changes the colour based on 'type'.
small_df = df[df.small_airport.isin(['Y'])]
medium_df = df[df.medium_airport.isin(['Y'])]
large_df = df[df.large_airport.isin(['Y'])]
plt.figure()
sns.distplot(small_df['frequency_mhz'], color='red')
plt.figure()
sns.distplot(medium_df['frequency_mhz'], color='green')
plt.figure()
sns.distplot(large_df['frequency_mhz'])
Is there a way I can overlay the three into one plot? Or a way I've missed to change the colour of the bars based on another variable, as you can with hue= in other plots?

You can pass the same ax as a keyword argument to superimpose your plots:
import matplotlib.pyplot as plt
import seaborn as sns
small_df = df[df.small_airport.isin(['Y'])]
medium_df = df[df.medium_airport.isin(['Y'])]
large_df = df[df.large_airport.isin(['Y'])]
# distplot returns the axes it drew on; reuse it for the other two
ax = sns.distplot(small_df['frequency_mhz'], color='red')
sns.distplot(medium_df['frequency_mhz'], color='green', ax=ax)
sns.distplot(large_df['frequency_mhz'], ax=ax)
plt.show()

Matplotlib pie charts as scatter plot

I have an interesting problem where I am trying to use multiple matplotlib pie charts as a scatter plot. I have read this post regarding this matplotlib tutorial and was able to get those working. However, I found that I was able to achieve the same results using the built-in pie function and plotting many pie charts on the same axis.
When using this alternative method, I found that after plotting the pie charts the axes lose their tick labels. Also, when I pan, the original scatter data stays clipped to the axes bounds, but the pie charts are only clipped to the figure canvas.
The following code replicates the issue that I'm having.
import matplotlib.pyplot as plt
import pandas as pd
import random
def rand():  # simulate some random data
    return [random.randint(0,100) for _ in range(10)]
def plot_pie(x, ax):
    ax.pie(x[['a','b','c']], center=(x['lat'],x['lon']), radius=1, colors=['r', 'b', 'g'])
#my data is stored in a similar styled dataframe that I read from a csv and the data is static
sim_data = pd.DataFrame({'a':rand(),'b':rand(),'c':rand(), 'lat':rand(),'lon':rand()})
fig, ax = plt.subplots()
plt.scatter(x=sim_data['lat'], y=sim_data['lon'], s=1000, facecolor='none',edgecolors='r')
y_init = ax.get_ylim()
x_init = ax.get_xlim()
sim_data.apply(lambda x : plot_pie(x,ax), axis=1)
ax.set_ylim(y_init)
ax.set_xlim(x_init)
plt.show()
I reset the x and y limits because I assume the pie function sets the axes bounds around the last pie chart it draws; this was my workaround.
UPDATE
After reading the docs again, I found that the wedges of a matplotlib pie chart are, by default, not clipped to the extents of the axes. Setting that parameter (clip_on) solved it for me; the code below is the solution to my problem. I also found that plotting each pie chart removed my axes ticks; to fix that I had to pass frame=True to the pie charts.
def plot_pie(x, ax):
    ax.pie(x[['a','b','c']], center=(x['lat'],x['lon']), radius=1, colors=['r', 'b', 'g'], wedgeprops={'clip_on':True}, frame=True)
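A self-contained sketch of that fix, assuming Matplotlib 3.x: every pie is drawn with frame=True (so pie keeps the ticks instead of stripping them and resetting the limits) and wedgeprops={'clip_on': True} (so the wedges are clipped to the axes when panning). It loops with iterrows instead of apply, which is only a stylistic choice; the seed, radius=3 and the fixed limits are arbitrary.

```python
import random

import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for an interactive window
import matplotlib.pyplot as plt
import pandas as pd

random.seed(0)  # arbitrary seed so the example is reproducible


def rand():  # simulate some random data, as in the question
    return [random.randint(0, 100) for _ in range(10)]


sim_data = pd.DataFrame({"a": rand(), "b": rand(), "c": rand(),
                         "lat": rand(), "lon": rand()})

fig, ax = plt.subplots()
for _, row in sim_data.iterrows():
    ax.pie(row[["a", "b", "c"]],
           center=(row["lat"], row["lon"]),
           radius=3,
           colors=["r", "b", "g"],
           wedgeprops={"clip_on": True},  # pie wedges default to clip_on=False
           frame=True)                    # keep ticks; do not reset the limits
ax.set_xlim(-5, 105)
ax.set_ylim(-5, 105)
plt.show()
```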

Data generated as in the original post. I added a frame to each plot for clarity.
def plot_pie(x, ax, r=1):
    # radius for pieplot size on a scatterplot
    ax.pie(x[['a','b','c']], center=(x['lat'],x['lon']), radius=r, colors=['r', 'b', 'g'])
fig, axs = plt.subplots(1, 3, figsize=(15, 5))
fig.patch.set_facecolor('white')
# original plot
ax = axs[0]
ax.scatter(x=sim_data['lat'], y=sim_data['lon'], s=1000, facecolor='none', edgecolors='r')
y_init = ax.get_ylim()
x_init = ax.get_xlim()
sim_data.apply(lambda x : plot_pie(x,ax), axis=1)
ax.set_ylim(y_init)
ax.set_xlim(x_init)
ax.set_title('Original')
ax.set_frame_on(True)
# r-beginner's solution
ax = axs[1]
ax.scatter(x=sim_data['lat'], y=sim_data['lon'], s=1000, facecolor='none', edgecolors='r')
y_init = ax.get_ylim()
x_init = ax.get_xlim()
sim_data.apply(lambda x : plot_pie(x,ax), axis=1)
ax.set_ylim([0, y_init[1]*1.1])
ax.set_xlim([0, x_init[1]*1.1])
ax.set_title('r-beginners')
ax.set_frame_on(True)
# my solution
ax = axs[2]
# do not use `s=` for size, it will not work properly when you are scattering pieplots
# because pieplots will be plotted above them
ax.scatter(x=sim_data['lat'], y=sim_data['lon'], s=0)
# get min/max values for the axes
y_init = ax.get_ylim()
x_init = ax.get_xlim()
sim_data.apply(lambda x : plot_pie(x, ax, r=7), axis=1)
# from zero to xlim/ylim with step 10
_ = ax.yaxis.set_ticks(range(0, round(y_init[1])+10, 10))
_ = ax.xaxis.set_ticks(range(0, round(x_init[1])+10, 10))
_ = ax.set_title('My')
ax.set_frame_on(True)

How to add label to interval group in y-axis in matplotlib pyplot?

With reference to the Stack Overflow thread "Specifying values on x-axis", the following figure is generated.
I want to add interval names to the figure above, like this:
How can I add such an interval group name to every interval group on the y-axis?

This is one way of doing it: create a twin axis and modify its tick labels and positions. The trick here is to find the middle positions loc_new between the existing ticks, where the strings Interval i are placed. You just need to play around a bit to get exactly the figure you want.
import matplotlib.pyplot as plt
import numpy as np
fig, ax = plt.subplots()
x = np.array([0,1,2,3])
y = np.array([0.650, 0.660, 0.675, 0.685])
my_xticks = ['a', 'b', 'c', 'd']
plt.xticks(x, my_xticks)
plt.yticks(np.arange(y.min(), y.max(), 0.005))
plt.plot(x, y)
plt.grid(axis='y', linestyle='-')
ax2 = ax.twinx()
ax2.set_ylim(ax.get_ylim())
loc = ax2.get_yticks()
loc_new = ((loc[1:]+loc[:-1])/2)[1:-1]
ax2.set_yticks(loc_new)
labels = ['Interval %s' %(i+1) for i in range(len(loc_new))]
ax2.set_yticklabels(labels)
ax2.tick_params(right=False) # This hides the ticks on the right hand y-axis
plt.show()
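If you would rather not depend on the twin axis's automatic tick locations (the [1:-1] trim above assumes its locator returns one extra tick beyond each end), a variant is to fix the interval edges yourself and put the labels at their midpoints. A sketch, with the 0.005 spacing taken from the question's data and the rest assumed:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

x = np.array([0, 1, 2, 3])
y = np.array([0.650, 0.660, 0.675, 0.685])

fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_xticks(x)
ax.set_xticklabels(["a", "b", "c", "d"])

# Interval edges every 0.005 from 0.650 to 0.685 (linspace avoids float drift)
edges = np.linspace(0.650, 0.685, 8)
ax.set_yticks(edges)
ax.set_ylim(edges[0], edges[-1])
ax.grid(axis="y", linestyle="-")

# Twin axis with the same limits, one label at the middle of each interval
ax2 = ax.twinx()
ax2.set_ylim(ax.get_ylim())
mids = (edges[1:] + edges[:-1]) / 2
ax2.set_yticks(mids)
ax2.set_yticklabels([f"Interval {i + 1}" for i in range(len(mids))])
ax2.tick_params(right=False)  # hide the tick marks, keep the labels
plt.show()
```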

Two Y axis Bar plot: custom xticks

I am trying to add custom xticks to a relatively complicated bar graph plot and I am stuck.
I am plotting from two data frames, merged_90 and merged_15:
merged_15
                 Volume     y_err_x      Area_2D     y_err_y
TripDate
2015-09-22  1663.016032  199.507503  1581.591701  163.473202
merged_90
                 Volume     y_err_x      Area_2D     y_err_y
TripDate
1990-06-10  1096.530711  197.377497  1531.651913  205.197493
I want to create a bar graph with two axes (i.e. Area_2D and Volume) where the Area_2D and Volume bars are grouped based on their respective data frame. An example script would look like:
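import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Rebuild the two one-row frames printed above
merged_15 = pd.DataFrame({'Volume': [1663.016032], 'y_err_x': [199.507503], 'Area_2D': [1581.591701], 'y_err_y': [163.473202]},
    index=pd.Index([pd.Timestamp('2015-09-22')], name='TripDate'))
merged_90 = pd.DataFrame({'Volume': [1096.530711], 'y_err_x': [197.377497], 'Area_2D': [1531.651913], 'y_err_y': [205.197493]},
    index=pd.Index([pd.Timestamp('1990-06-10')], name='TripDate'))
fig = plt.figure()
ax1 = fig.add_subplot(111)
merged_90.Volume.plot(ax=ax1, color='orange', kind='bar', position=2.5, yerr=merged_90['y_err_x'], use_index=False, width=0.1)
merged_15.Volume.plot(ax=ax1, color='red', kind='bar', position=0.9, yerr=merged_15['y_err_x'], use_index=False, width=0.1)
ax2 = ax1.twinx()
merged_90.Area_2D.plot(ax=ax2, color='green', kind='bar', position=3.5, yerr=merged_90['y_err_y'], use_index=False, width=0.1)
merged_15.Area_2D.plot(ax=ax2, color='blue', kind='bar', position=0, yerr=merged_15['y_err_y'], use_index=False, width=0.1)
ax1.set_xlim(-0.5, 0.2)
# np.arange: scipy.arange was just NumPy's arange re-exported, and recent SciPy versions no longer provide it
x = np.arange(1)
ax2.set_xticks(x)
ax2.set_xticklabels(['2015'])
plt.tight_layout()
plt.show()
The resulting plot is:
One would think I could change:
x = np.arange(1)
ax2.set_xticks(x)
ax2.set_xticklabels(['2015'])
to
x = np.arange(2)
ax2.set_xticks(x)
ax2.set_xticklabels(['1990','2015'])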
but that results in:
I would like to see the ticks ordered in chronological order (i.e. 1990,2015)
Thanks!

Have you considered dropping the second axis and plotting them as follows:
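import numpy as np
import matplotlib.pyplot as plt
ind = np.array([0, 0.3])
width = 0.1
fig, ax = plt.subplots()
# One scalar per bar: .iloc[0] pulls the single value out of each one-row frame
rects1 = ax.bar(ind, [merged_90.Volume.iloc[0], merged_15.Volume.iloc[0]], color=['orange', 'red'], width=width)
rects2 = ax.bar(ind + width, [merged_90.Area_2D.iloc[0], merged_15.Area_2D.iloc[0]], color=['green', 'blue'], width=width)
# Bars are centered on their x positions, so each pair is centered at ind + width/2
ax.set_xticks(ind + width / 2)
ax.set_xticklabels(('1990', '2015'))
plt.show()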
This produces:
I omitted the error bars, but you can easily add them (for example with the yerr argument of ax.bar). Given your test data, that produces a readable graph. As you mentioned in the comments, you would still rather have two axes, presumably because the two quantities need their own scales. To do this you could do:
fig = plt.figure()
ax1 = fig.add_subplot(111)
merged_90.Volume.plot(ax=ax1, color='orange', kind='bar', position=2.5, use_index=False, width=0.1)
merged_15.Volume.plot(ax=ax1, color='red', kind='bar', position=1.0, use_index=False, width=0.1)
# Area_2D goes on the twin (right-hand) axis
ax2 = ax1.twinx()
merged_90.Area_2D.plot(ax=ax2, color='green', kind='bar', position=3.5, use_index=False, width=0.1)
merged_15.Area_2D.plot(ax=ax2, color='blue', kind='bar', position=0, use_index=False, width=0.1)
ax1.set_xlim([-.45, .2])
ax2.set_xlim([-.45, .2])
# One tick under each group of bars, inside the visible x-range
ax1.set_xticks([-.35, 0])
ax1.set_xticklabels([1990, 2015])
plt.show()
This produces:
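Your problem was the tick positions: with x-limits of (-0.5, 0.2), a tick at x=1 lies outside the visible range, so only the tick at x=0 is drawn, and it receives the first label, '1990'. Putting the ticks inside the limits, under each group of bars, fixes the ordering. Note that twinx() makes the two axes share the x-axis, so limits and ticks set on one also apply to the other; the second set_xlim call above is harmless but redundant.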
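For comparison, a sketch that skips pandas' position= offsets and places the bars with plain matplotlib: Volume on the left axis, Area_2D on the right, and one tick at the center of each year's pair. The frames are rebuilt from the values printed in the question; one_row is a hypothetical helper, and the bar width, colors and the dict that fixes the year order are illustrative choices.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def one_row(date, volume, vol_err, area, area_err):
    """Hypothetical helper: rebuild a one-row frame like merged_90 / merged_15."""
    return pd.DataFrame(
        {"Volume": [volume], "y_err_x": [vol_err],
         "Area_2D": [area], "y_err_y": [area_err]},
        index=pd.Index([pd.Timestamp(date)], name="TripDate"),
    )


merged_90 = one_row("1990-06-10", 1096.530711, 197.377497, 1531.651913, 205.197493)
merged_15 = one_row("2015-09-22", 1663.016032, 199.507503, 1581.591701, 163.473202)

# Dict insertion order fixes the left-to-right (chronological) order
frames = {"1990": merged_90, "2015": merged_15}
centers = np.arange(len(frames))  # one group per year: 0, 1
width = 0.35

volume = [f["Volume"].iloc[0] for f in frames.values()]
volume_err = [f["y_err_x"].iloc[0] for f in frames.values()]
area = [f["Area_2D"].iloc[0] for f in frames.values()]
area_err = [f["y_err_y"].iloc[0] for f in frames.values()]

fig, ax1 = plt.subplots()
ax2 = ax1.twinx()  # second y-axis for Area_2D
ax1.bar(centers - width / 2, volume, width, yerr=volume_err, color="orange", label="Volume")
ax2.bar(centers + width / 2, area, width, yerr=area_err, color="green", label="Area_2D")
ax1.set_ylabel("Volume")
ax2.set_ylabel("Area_2D")
ax1.set_xticks(centers)
ax1.set_xticklabels(list(frames))
plt.show()
```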
