This question already has answers here:
Annotated heatmap with multiple color schemes
(2 answers)
How to create a heat-map with multiple colormaps?
(2 answers)
Heatmap with multiple colormaps by column
(2 answers)
Closed last month.
I am trying to build a heatmap with seaborn in which each row of the table gets a different color range. Example:
A: yellow
B: blue
C: green
D: red
The rows must stay in the same table so that I can compare each row across the year and different rows within the same month. Is my question clear? Thanks so much!
So far I could only plot a heatmap that uses a single color scale for the whole table:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

d = {'jan': [44, 2, 3, 4],
     'feb': [33, 4, 6, 8],
     'mar': [50, 10, 15, 20],
     'abr': [11, 12, 13, 14],
     'mai': [3, 40, 6, 60],
     'jun': [40, 8, 12, 16]}
idx = ['A', 'B', 'C', 'D']
df = pd.DataFrame(d, index=idx)
sns.heatmap(df, annot=True, cmap='Blues')
plt.yticks(rotation=0)
plt.show()
This is what I am trying to plot (made with Google Sheets):
I suggest making one plot per row and stacking them as matplotlib subplots:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
d = {'jan': [44, 2, 3, 4], 'feb': [33, 4, 6, 8], 'mar': [50, 10, 15, 20], 'abr': [11, 12, 13, 14], 'mai': [3, 40, 6, 60], 'jun': [40, 8, 12, 16]}
idx = ['A', 'B', 'C', 'D']
df = pd.DataFrame(d, index=idx)
cm = ['Blues', 'Reds', 'Greens', 'Purples']  # one colormap per row
f, axs = plt.subplots(4, 1, gridspec_kw={'hspace': 0})
for counter, (index, row) in enumerate(df.iterrows()):
    # each row becomes its own 1 x 6 heatmap, so it gets its own color range
    sns.heatmap(np.array([row.values]), yticklabels=[idx[counter]], xticklabels=df.columns,
                annot=True, fmt='.2f', ax=axs[counter], cmap=cm[counter], cbar=False)
plt.show()
Output:
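Another common approach, sketched here assuming a single Axes is preferred over stacked subplots, is to call sns.heatmap once per row on the same Axes, each time with a mask that hides every other row. Seaborn computes the color limits from the unmasked cells only, so each row is scaled independently. The colormap names chosen for A to D are illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

d = {'jan': [44, 2, 3, 4], 'feb': [33, 4, 6, 8], 'mar': [50, 10, 15, 20],
     'abr': [11, 12, 13, 14], 'mai': [3, 40, 6, 60], 'jun': [40, 8, 12, 16]}
df = pd.DataFrame(d, index=['A', 'B', 'C', 'D'])
cmaps = ['YlOrBr', 'Blues', 'Greens', 'Reds']  # A: yellow, B: blue, C: green, D: red

fig, ax = plt.subplots(figsize=(8, 3))
for i, cmap in enumerate(cmaps):
    # True means "hide this cell": only row i stays visible in this pass
    mask = np.ones(df.shape, dtype=bool)
    mask[i, :] = False
    # vmin/vmax come from the visible cells, so row i gets its own color range
    sns.heatmap(df, mask=mask, cmap=cmap, annot=True, cbar=False, ax=ax)
ax.tick_params(axis='y', labelrotation=0)
plt.show()
```

All rows share one grid, so comparing months within a row and rows within a month works as in the Google Sheets example; the trade-off is that no single colorbar applies to the whole table.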
Say I have a dictionary named DictionaryOfRoutes whose values are lists, as in the following example:
DictionaryOfRoutes = {'RouteOfVehicle_1': [0, 6, 1, 5, 0],
                      'RouteOfVehicle_2': [0, 4, 3, 0],
                      'RouteOfVehicle_3': [0, 2, 0]}
Each list is a route, and the integers inside it are indices of points.
For the points I have the following data, stored as lists:
allIds = [0, 1, 2, 3, 4, 5, 6]
allxs = [50, 18, 33, 98, 84, 13, 50]
allys = [50, 73, 16, 58, 49, 63, 56]
For example, the first point has an id of 0, an x coordinate of 50 and a y coordinate of 50, and so on.
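Because the point ids here coincide with list positions, a route can be turned into coordinates with NumPy fancy indexing, the same trick the single-route plot below uses with allxs[final_route]. A small sketch; route_coordinates is a hypothetical helper name:

```python
import numpy as np

allxs = np.array([50, 18, 33, 98, 84, 13, 50])
allys = np.array([50, 73, 16, 58, 49, 63, 56])

def route_coordinates(route, xs, ys):
    """Return the x and y coordinates visited by `route`, in visiting order.

    Assumes each entry of `route` is a position in `xs`/`ys` (ids == indices).
    """
    idx = np.asarray(route)
    return xs[idx], ys[idx]

route_x, route_y = route_coordinates([0, 6, 1, 5, 0], allxs, allys)
print(route_x.tolist())  # [50, 50, 18, 13, 50]
print(route_y.tolist())  # [50, 56, 73, 63, 50]
```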
I want to plot these routes as in the following image:
Example of Routes Visualized and Colorized
So far I have only managed to visualize a single route (list), not all of them, using the following code:
import matplotlib.pyplot as plt
import numpy as np

def SolutionPlot(xx, yy, all_ids, matrix, final_route):
    '''
    xx = [50, 18, 33, 98, 84, 13, 50]
    yy = [50, 73, 16, 58, 49, 63, 56]
    all_ids = [0, 1, 2, 3, 4, 5, 6]
    matrix = a numpy array of arrays (a distance matrix)
    final_route = [0, 5, 4, 3, 2, 1]
    '''
    fig, ax = plt.subplots()
    fig.set_size_inches(6, 6)
    allxs = np.array(xx)
    allys = np.array(yy)
    final_route = np.array(final_route)
    ax.plot(allxs[final_route], allys[final_route], ls="-", marker="o", markersize=6)
    plt.xlim([0, 100])
    plt.ylim([0, 100])
    plt.title("Travelling Salesman (Nearest Neighbor Algorithm)")
    for xi, yi, pidi in zip(xx, yy, all_ids):
        ax.annotate(str(pidi), xy=(xi, yi))
    plt.show()
which returns the following plot:
Plot I made so far
I modified your code as below:
import matplotlib.pyplot as plt
import numpy as np

def SolutionPlot(xx, yy, all_ids, routes_dictionary, color):
    plt.style.use('dark_background')
    fig, ax = plt.subplots()
    for route in routes_dictionary.keys():
        x = [xx[i] for i in routes_dictionary[route]]
        y = [yy[i] for i in routes_dictionary[route]]
        ind = [all_ids[i] for i in routes_dictionary[route]]
        # calculate position and direction vectors:
        x0 = np.array(x[:-1])
        x1 = np.array(x[1:])
        y0 = np.array(y[:-1])
        y1 = np.array(y[1:])
        xpos = (x0 + x1) / 2  # midpoint of each segment
        ypos = (y0 + y1) / 2
        xdir = x1 - x0        # direction of each segment
        ydir = y1 - y0
        ax.plot(x, y, color=color[route], lw=3)
        ax.scatter(x, y, c='white', edgecolors=color[route], s=200)
        # plot arrow on each line:
        for label, x_text, y_text, X, Y, dX, dY in zip(ind, x, y, xpos, ypos, xdir, ydir):
            ax.annotate("", xytext=(X, Y), xy=(X + 0.001 * dX, Y + 0.001 * dY),
                        arrowprops=dict(arrowstyle="->", color=color[route]), size=20)
            ax.annotate(str(label), (x_text - 1, y_text - 1), color='black', size=10)
    plt.grid(color='white', linestyle='-', linewidth=0.7)
    plt.show()

DictionaryOfRoutes = {'RouteOfVehicle_1': [0, 6, 1, 5, 0],
                      'RouteOfVehicle_2': [0, 4, 3, 0],
                      'RouteOfVehicle_3': [0, 2, 0]}
allIds = [0, 1, 2, 3, 4, 5, 6]
allxs = [50, 18, 33, 98, 84, 13, 50]
allys = [50, 73, 16, 58, 49, 63, 56]
colors = {'RouteOfVehicle_1': 'red',
          'RouteOfVehicle_2': 'green',
          'RouteOfVehicle_3': 'blue'}
SolutionPlot(allxs, allys, allIds, DictionaryOfRoutes, colors)
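The arrow placement above relies on each segment's midpoint (xpos, ypos) and direction (xdir, ydir), computed from x0/x1 and y0/y1. The same quantities can be obtained with np.diff; segment_midpoints is a hypothetical helper shown only to make that step explicit:

```python
import numpy as np

def segment_midpoints(x, y):
    """Midpoint and direction vector of every segment of a polyline."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = np.diff(x), np.diff(y)            # segment k goes from point k to point k + 1
    mx, my = x[:-1] + dx / 2, y[:-1] + dy / 2  # halfway along segment k
    return mx, my, dx, dy

# RouteOfVehicle_3 visits points 0 -> 2 -> 0
mx, my, dx, dy = segment_midpoints([50, 33, 50], [50, 16, 50])
print(mx.tolist(), my.tolist())  # [41.5, 41.5] [33.0, 33.0]
print(dx.tolist(), dy.tolist())  # [-17.0, 17.0] [-34.0, 34.0]
```

Each (mx[k], my[k], dx[k], dy[k]) plays the role of (X, Y, dX, dY) in the ax.annotate call: the arrow starts at the midpoint and points a tiny step along the segment, so only its head is visible.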
And this is the result:
I hope this helps: looping over the dictionary's keys() does the job.
final_routes = []
for key in DictionaryOfRoutes.keys():
    temp = []
    for index in DictionaryOfRoutes[key]:
        temp.append(allIds[index])
    final_routes.append(temp)
SolutionPlot(allxs, allys, allIds, final_routes)
and in the function (whose parameters become xx, yy, all_ids, final_routes) I added this:
    for final_route in final_routes:
        # one line per route; matplotlib picks the next color of its cycle each time
        final_route = np.array(final_route)
        ax.plot(allxs[final_route], allys[final_route], ls="-", marker="x", markersize=6)
    plt.xlim([0, 100])
    plt.ylim([0, 100])
    plt.title("Travelling Salesman (Nearest Neighbor Algorithm)")
    for xi, yi, pidi in zip(xx, yy, all_ids):
        ax.annotate(str(pidi), xy=(xi, yi))
    plt.show()
I am not so sure what the matrix variable is for.
I started this expecting to have to use plt.hold(True) but this wasn't the case, and I'm not sure why.
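On the plt.hold question: in current matplotlib, every plot call on an existing Axes adds a new line without clearing the previous ones (plt.hold was deprecated in 2.0 and removed in 3.0), so looping over the routes is enough. A sketch that plots all three routes on one Axes, iterating with dict.items() and letting the default color cycle pick the colors:

```python
import matplotlib.pyplot as plt

DictionaryOfRoutes = {'RouteOfVehicle_1': [0, 6, 1, 5, 0],
                      'RouteOfVehicle_2': [0, 4, 3, 0],
                      'RouteOfVehicle_3': [0, 2, 0]}
allxs = [50, 18, 33, 98, 84, 13, 50]
allys = [50, 73, 16, 58, 49, 63, 56]

fig, ax = plt.subplots(figsize=(6, 6))
for name, route in DictionaryOfRoutes.items():
    # Each call adds one more Line2D to the same Axes; earlier routes are kept,
    # and the default property cycle assigns the next color automatically.
    ax.plot([allxs[i] for i in route], [allys[i] for i in route],
            ls="-", marker="o", label=name)
ax.set_xlim(0, 100)
ax.set_ylim(0, 100)
ax.legend()
plt.show()
```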
I have a function that displays the circle of correlations of my PCA.
The problem with this function is that the labels of my variables (the column names) prevent me from reading my results correctly.
To overcome this problem, I have to insert a line of code before calling my function to associate a number with each variable label (the columns of the df used for my PCA):
n_labels = [value for value in range(1, (len(df.columns) + 1))]
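The comprehension can be shortened, and the number-to-name pairing can be kept in a dictionary for printing or for a legend. A sketch with a made-up three-column frame:

```python
import pandas as pd

df = pd.DataFrame({'column_1': [1, 2], 'column_2': [3, 4], 'column_3': [5, 6]})

# Same result as the list comprehension: 1, 2, ..., number of columns
n_labels = list(range(1, len(df.columns) + 1))

# Pair each number with its column name, counting from 1
label_map = dict(enumerate(df.columns, start=1))
print(n_labels)   # [1, 2, 3]
print(label_map)  # {1: 'column_1', 2: 'column_2', 3: 'column_3'}
```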
I tried unsuccessfully to insert this line in my function:
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection
import numpy as np
import pandas as pd

def display_circles(pcs,
                    n_comp,
                    pca,
                    axis_ranks,
                    labels=None,
                    label_rotation=0,
                    lims=None):
    for d1, d2 in axis_ranks:  # We display the first 3 factorial planes, so the first 6 components
        if d2 < n_comp:
            # figure initialization
            fig, ax = plt.subplots(figsize=(10, 8))
            # determination of graph limits
            if lims is not None:
                xmin, xmax, ymin, ymax = lims
            elif pcs.shape[1] < 30:
                xmin, xmax, ymin, ymax = -1, 1, -1, 1
            else:
                xmin, xmax, ymin, ymax = min(pcs[d1, :]), max(pcs[d1, :]), min(
                    pcs[d2, :]), max(pcs[d2, :])
            # arrow display
            # if there are more than 30 arrows, the triangle is not displayed at their end
            if pcs.shape[1] < 30:
                plt.quiver(np.zeros(pcs.shape[1]),
                           np.zeros(pcs.shape[1]),
                           pcs[d1, :],
                           pcs[d2, :],
                           angles='xy',
                           scale_units='xy',
                           scale=1,
                           color="grey")
            else:
                lines = [[[0, 0], [x, y]] for x, y in pcs[[d1, d2]].T]
                ax.add_collection(
                    LineCollection(lines, axes=ax, alpha=.1, color='black'))
            # display of variable names
            if labels is not None:
                for i, (x, y) in enumerate(pcs[[d1, d2]].T):
                    if x >= xmin and x <= xmax and y >= ymin and y <= ymax:
                        plt.text(x,
                                 y,
                                 labels[i],
                                 fontsize='22',
                                 ha='center',
                                 va='bottom',
                                 rotation=label_rotation,
                                 color="red",
                                 alpha=0.7)
            # circle display
            circle = plt.Circle((0, 0), 1, facecolor='none', edgecolor='b')
            plt.gca().add_artist(circle)
            # setting graph limits
            plt.xlim(xmin, xmax)
            plt.ylim(ymin, ymax)
            # display of horizontal and vertical lines
            plt.plot([-1, 1], [0, 0], color='grey', ls='--')
            plt.plot([0, 0], [-1, 1], color='grey', ls='--')
            # names of the axes, with the percentage of inertia explained
            plt.xlabel('F{} ({}%)'.format(
                d1 + 1, round(100 * pca.explained_variance_ratio_[d1], 1)),
                fontsize=14)
            plt.ylabel('F{} ({}%)'.format(
                d2 + 1, round(100 * pca.explained_variance_ratio_[d2], 1)),
                fontsize=14)
            plt.title("Circle of correlations (F{} and F{})".format(
                d1 + 1, d2 + 1),
                size=24)
            plt.show()
This is how I call my function:
import pandas as pd
from sklearn import decomposition, preprocessing
# Here a dataset for the example:
column_1 = [1, 2, 3, 4, 5, 6, 7 ,8]
column_2 = [4, 2, 9, 23, 3, 52, 41, 4]
column_3 = [9, 8, 7, 6, 6, 9, 24, 11]
column_4 = [45, 36, 74, 35, 29, 45, 29, 39]
column_5 = [35, 84, 3, 54, 68, 78, 65, 97]
column_6 = [24, 96, 7, 54, 67, 69, 88, 95]
column_7 = [5, 39, 72, 42, 22, 41, 24, 41]
column_8 = [30, 98, 8, 67, 68, 41, 27, 87]
df = pd.DataFrame({'column_1': column_1,
'column_2': column_2,
'column_3': column_3,
'column_4': column_4,
'column_5': column_5,
'column_6': column_6,
'column_7': column_7,
'column_8': column_8})
pca_data = preprocessing.scale(df)
pca = decomposition.PCA(n_components = 8)
pca.fit(pca_data)
# We set the number of components
n_comp = 2
# Recovery of the components of the pca object
pcs = pca.components_
# We label a number to each column name
n_labels = [value for value in range(1, (len(df.columns) + 1))]
display_circles(pcs, n_comp, pca, [(0, 1), (0, 2)], labels=n_labels)
for element in zip(n_labels, df.columns):
    print(element)
Here is the result I get:
Edit 1: what I would like (update: obtained with the answer from @Stef. Thank you very much, and congratulations on this solution!)
It's almost perfect, but there is a problem when I use this function:
n_comp = 3
pcs = pca.components_
# I always have to write this line to get a label number
n_labels=[value for value in range(1,(len(list_candidates.columns)+1))]
display_circles(pcs, n_comp, pca, [(0, 1), (0, 2)], labels=n_labels)
On my real dataframe, this gives me two problems:
First, I still have to include the line
n_labels=[value for value in range(1,(len(list_candidates.columns)+1))]
to obtain a label number instead of the name of my variables.
Second, I get the error message "NameError: name 'df' is not defined" when running
display_circles(pcs, n_comp, pca, [(0, 1), (0, 2)], labels=n_labels)
So I'm looking to define my display_circles() function so that, when I pass the labels="name_of_the_df" argument, it gives me the same result as
n_labels=[value for value in range(1,(len("name_of_the_df".columns)+1))]
plus a plt.legend() like the one made by @Stef (thanks).
To get this (desired) result:
I also have to change "name_of_the_df" inside the function definition:
#legend
plt.legend(n_labels,
candidate_list.columns,
handler_map={int: IntHandler()},
bbox_to_anchor=(1, 1))
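The NameError suggests the call still refers to df while the real DataFrame has another name (list_candidates or candidate_list above). One way around hard-coding a name, sketched here as an assumption rather than as the accepted answer, is to pass the DataFrame object itself (not its name as a string) and derive both the numbers and the legend names from its columns. numbered_labels and the sample columns are hypothetical.

```python
import pandas as pd

def numbered_labels(data):
    """Hypothetical helper: return (numbers, names) for the columns of `data`.

    `data` must be the DataFrame object itself; a string such as
    "name_of_the_df" has no .columns attribute.
    """
    names = list(data.columns)
    numbers = list(range(1, len(names) + 1))
    return numbers, names

# Illustrative stand-in for the real dataframe
list_candidates = pd.DataFrame({'age': [30, 41], 'income': [1.2, 3.4], 'score': [7, 9]})
n_labels, names = numbered_labels(list_candidates)
print(n_labels)  # [1, 2, 3]
print(names)     # ['age', 'income', 'score']
```

Inside display_circles, a DataFrame passed as labels could be run through such a helper, using the numbers for plt.text and the names for plt.legend.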
You can define your own legend handler for integers:
from matplotlib.text import Text

class IntHandler:
    def legend_artist(self, legend, orig_handle, fontsize, handlebox):
        x0, y0 = handlebox.xdescent, handlebox.ydescent
        text = Text(x0, y0, str(orig_handle), color='red')
        handlebox.add_artist(text)
        return text
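Matplotlib picks a legend handler by first looking up the handle itself in handler_map and then each class in the handle's method resolution order, so {int: IntHandler()} matches plain Python ints such as those produced by range. NumPy integers (for example the elements of np.arange) are not subclasses of int, so if the numbers come from NumPy, a sketch like the following also registers np.integer. The column names are placeholders.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.text import Text

class IntHandler:
    """Legend handler that draws the integer handle itself as red text."""
    def legend_artist(self, legend, orig_handle, fontsize, handlebox):
        x0, y0 = handlebox.xdescent, handlebox.ydescent
        text = Text(x0, y0, str(orig_handle), color='red')
        handlebox.add_artist(text)
        return text

fig, ax = plt.subplots()
numbers = np.arange(1, 4)  # numpy int64 values, not Python ints
names = ['column_1', 'column_2', 'column_3']
# np.integer appears in the MRO of every numpy integer type, so this entry
# catches numpy handles; the int entry still covers plain Python ints.
leg = ax.legend(list(numbers), names,
                handler_map={int: IntHandler(), np.integer: IntHandler()},
                bbox_to_anchor=(1, 1))
plt.show()
```

Without the np.integer entry, matplotlib warns that it cannot create a legend handle for int64 instances and leaves those entries out.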
and then call
plt.legend(n_labels, df.columns, handler_map={int: IntHandler()}, bbox_to_anchor=(1,1))
before plt.show() in display_circles:
Full example as per comment below and edited question:
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection
from matplotlib.text import Text
import numpy as np
import pandas as pd
from sklearn import decomposition, preprocessing

class IntHandler:
    def legend_artist(self, legend, orig_handle, fontsize, handlebox):
        x0, y0 = handlebox.xdescent, handlebox.ydescent
        text = Text(x0, y0, str(orig_handle), color='red')
        handlebox.add_artist(text)
        return text

def display_circles(pcs,
                    n_comp,
                    pca,
                    axis_ranks,
                    labels=None,
                    label_rotation=0,
                    lims=None):
    # default labels 1, 2, ..., one per feature; pca.feature_names_in_ only
    # exists when the PCA was fitted on a DataFrame with string column names
    if labels is None:
        labels = range(1, len(pca.feature_names_in_) + 1)
    for d1, d2 in axis_ranks:  # We display the first 3 factorial planes, so the first 6 components
        if d2 < n_comp:
            # figure initialization
            fig, ax = plt.subplots(figsize=(10, 8))
            # determination of graph limits
            if lims is not None:
                xmin, xmax, ymin, ymax = lims
            elif pcs.shape[1] < 30:
                xmin, xmax, ymin, ymax = -1, 1, -1, 1
            else:
                xmin, xmax, ymin, ymax = min(pcs[d1, :]), max(pcs[d1, :]), min(
                    pcs[d2, :]), max(pcs[d2, :])
            # arrow display
            # if there are more than 30 arrows, the triangle is not displayed at their end
            if pcs.shape[1] < 30:
                plt.quiver(np.zeros(pcs.shape[1]),
                           np.zeros(pcs.shape[1]),
                           pcs[d1, :],
                           pcs[d2, :],
                           angles='xy',
                           scale_units='xy',
                           scale=1,
                           color="grey")
            else:
                lines = [[[0, 0], [x, y]] for x, y in pcs[[d1, d2]].T]
                ax.add_collection(
                    LineCollection(lines, axes=ax, alpha=.1, color='black'))
            # display of variable names
            if labels is not None:
                for i, (x, y) in enumerate(pcs[[d1, d2]].T):
                    if x >= xmin and x <= xmax and y >= ymin and y <= ymax:
                        plt.text(x,
                                 y,
                                 labels[i],
                                 fontsize='22',
                                 ha='center',
                                 va='bottom',
                                 rotation=label_rotation,
                                 color="red",
                                 alpha=0.7,
                                 )
            # circle display
            circle = plt.Circle((0, 0), 1, facecolor='none', edgecolor='b')
            plt.gca().add_artist(circle)
            # setting graph limits
            plt.xlim(xmin, xmax)
            plt.ylim(ymin, ymax)
            # display of horizontal and vertical lines
            plt.plot([-1, 1], [0, 0], color='grey', ls='--')
            plt.plot([0, 0], [-1, 1], color='grey', ls='--')
            # names of the axes, with the percentage of inertia explained
            plt.xlabel('F{} ({}%)'.format(
                d1 + 1, round(100 * pca.explained_variance_ratio_[d1], 1)),
                fontsize=14)
            plt.ylabel('F{} ({}%)'.format(
                d2 + 1, round(100 * pca.explained_variance_ratio_[d2], 1)),
                fontsize=14)
            plt.title("Circle of correlations (F{} and F{})".format(
                d1 + 1, d2 + 1),
                size=24)
            plt.legend(labels,
                       pca.feature_names_in_,
                       handler_map={int: IntHandler()},
                       bbox_to_anchor=(1, 1))
            plt.show()

# Here a dataset for the example:
column_1 = [1, 2, 3, 4, 5, 6, 7, 8]
column_2 = [4, 2, 9, 23, 3, 52, 41, 4]
column_3 = [9, 8, 7, 6, 6, 9, 24, 11]
column_4 = [45, 36, 74, 35, 29, 45, 29, 39]
column_5 = [35, 84, 3, 54, 68, 78, 65, 97]
column_6 = [24, 96, 7, 54, 67, 69, 88, 95]
column_7 = [5, 39, 72, 42, 22, 41, 24, 41]
column_8 = [30, 98, 8, 67, 68, 41, 27, 87]
df = pd.DataFrame({'column_1': column_1,
                   'column_2': column_2,
                   'column_3': column_3,
                   'column_4': column_4,
                   'column_5': column_5,
                   'column_6': column_6,
                   'column_7': column_7,
                   'column_8': column_8})
pca_data = preprocessing.scale(df)
pca = decomposition.PCA(n_components=8)
# fit on a DataFrame so that pca.feature_names_in_ holds the column names
pca.fit(pd.DataFrame(pca_data, columns=df.columns))
# We set the number of components
n_comp = 2
# Recovery of the components of the pca object
pcs = pca.components_
display_circles(pcs, n_comp, pca, [(0, 1), (0, 2)])
Currently, I am trying to produce image outputs for my data.
My source data looks like this:
total_dict = {"A" : {"A1" : [1, 2, 3, 4, 5], "A2" : [2, 3, 4, 5, 6]}, "B" : {"B1" : [11, 22, 13, 34, 5], "B2" : [12, 31, 42, 52, 16]},"C" : {"C1" : [12, 22, 33, 4, 5], "C2" : [42, 33, 42, 15, 6]}, "D" : {"D1" : [1, 23, 35, 4, 5], "D2" : [21, 23, 34, 5, 6]}}
Now I am trying to create 4 heat maps, one for each sub-dictionary: A, B, C and D.
My program is:
import pandas as pd
import seaborn as sns

for sub in total_dict:
    df = pd.DataFrame(total_dict[sub])
    image = sns.heatmap(df, cmap="YlGnBu", linewidths=0.1, vmax=100)
    print(image)
However, it does not produce 4 separate images as I expected. The final result is:
heatmap_result
Could you please suggest a method to get the 4 separate outputs I expect and also save them to 4 separate files?
Thanks for your help.
You can define a new figure, plot the heat-map, and save the figure.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

for index, sub in enumerate(total_dict):
    df = pd.DataFrame(total_dict[sub])
    # create a new figure for each sub-dictionary
    fig = plt.figure()
    image = sns.heatmap(df, cmap="YlGnBu", linewidths=0.1, vmax=100)
    fig_name = 'test_%s.png' % (index)
    fig.savefig(fig_name)
    plt.close(fig)  # release the figure once it has been saved
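A variation on the same idea, sketched under the assumption that naming the files after the dictionary keys is acceptable: create the Axes explicitly with plt.subplots, hand it to sns.heatmap through ax=, and close each figure after saving so figures do not pile up in longer loops. save_heatmaps and the heatmap_<key>.png naming scheme are hypothetical.

```python
import os
import tempfile

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

total_dict = {"A": {"A1": [1, 2, 3, 4, 5], "A2": [2, 3, 4, 5, 6]},
              "B": {"B1": [11, 22, 13, 34, 5], "B2": [12, 31, 42, 52, 16]},
              "C": {"C1": [12, 22, 33, 4, 5], "C2": [42, 33, 42, 15, 6]},
              "D": {"D1": [1, 23, 35, 4, 5], "D2": [21, 23, 34, 5, 6]}}

def save_heatmaps(nested, out_dir, vmax=100):
    """Draw one heatmap per top-level key and save each to its own PNG file."""
    paths = []
    for name, columns in nested.items():
        df = pd.DataFrame(columns)          # 5 rows x 2 columns, e.g. A1 and A2
        fig, ax = plt.subplots()            # a fresh figure and Axes per key
        sns.heatmap(df, cmap="YlGnBu", linewidths=0.1, vmax=vmax, ax=ax)
        ax.set_title(name)
        path = os.path.join(out_dir, f"heatmap_{name}.png")
        fig.savefig(path)
        plt.close(fig)                      # free memory once the file is written
        paths.append(path)
    return paths

out_dir = tempfile.mkdtemp()  # demo location; any existing folder works
paths = save_heatmaps(total_dict, out_dir)
print(paths)
```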