Legend for matplotlib plot not showing up - python

I am trying to add a legend to the plot in the attached image, but the code below does not produce one. Could someone please help me resolve this?
#Predicted Labels on PCA
pcadf = pd.DataFrame(preprocessed_data)
pcadf["kmeans"] = pipe["clusterer"]["kmeans"].labels_
pcadf.columns = ['component_1', 'component_2', 'kmeans']
x = pcadf['component_1'].values
y = pcadf['component_2'].values
Cluster = pcadf["kmeans"].values
fig = plt.figure(figsize=(10,5))
ax = fig.add_subplot(111)
scatter = ax.scatter(x,y,c=Cluster,s=50)
ax.legend()
fig.savefig('KMeans_Cluster.png', bbox_inches='tight', dpi=1200)

See the matplotlib help page on scatter plots with a legend for the two options. You either loop through the different labels and plot each group separately, or use the legend_elements() method of the PathCollection returned by scatter(). The example below uses the second option:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
iris = sns.load_dataset('iris')
df = StandardScaler().fit_transform(iris.iloc[:,:4])
pcadf = PCA(n_components=2).fit_transform(df)
pcadf = pd.DataFrame(pcadf,columns = ['component_1','component_2'])
pcadf["kmeans"] = KMeans(n_clusters=2).fit_predict(df)
#Cluster = pcadf["kmeans"].values
fig = plt.figure(figsize=(10,5))
ax = fig.add_subplot(111)
# Keep the PathCollection returned by scatter(); legend_elements() is called on it
scatter = ax.scatter(pcadf['component_1'],pcadf['component_2'],c=pcadf['kmeans'],s=50)
legend1 = ax.legend(*scatter.legend_elements(),
                    loc="lower left", title="Clusters")
ax.add_artist(legend1)
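The first option mentioned above, looping through the labels, could look roughly like the sketch below. The original code shows no legend because ax.legend() only picks up artists that carry a label, and the single scatter call has none. The data here is synthetic (random points and random cluster ids standing in for the PCA components and KMeans labels), and the "Cluster N" label text is an assumption for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for the two PCA components and the cluster labels (assumed data)
rng = np.random.default_rng(0)
x = rng.normal(size=60)
y = rng.normal(size=60)
cluster = rng.integers(0, 3, size=60)

fig, ax = plt.subplots(figsize=(10, 5))
# One scatter call per cluster, each carrying its own label for the legend
for label in np.unique(cluster):
    mask = cluster == label
    ax.scatter(x[mask], y[mask], s=50, label=f"Cluster {label}")
legend = ax.legend(loc="lower left", title="Clusters")
```

Each group gets the next color from the default color cycle, so no c= argument is needed in this variant.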

Related

How to plot colors for two variables in scatterplot in python?

I have a dataset with two different variables and I want to give each one a different color. Can anyone help, please? Link to my dataset: "https://github.com/mayuripandey/Data-Analysis/blob/main/word.csv"
import matplotlib.pyplot as plt
import pandas as pd
fig, ax = plt.subplots(figsize=(10, 6))
ax.scatter(x = df['Friends Network-metrics'], y = df['Number of Followers'],cmap = "magma")
plt.xlabel("Friends Network-metrics")
plt.ylabel("Number of Followers")
plt.show()
It is not very clear what you want to do here, but here is a solution that may help a bit.
You could use seaborn to color the points by a variable. Otherwise, you would need to iterate through the points to set each color, or create a new column that conditionally assigns a color to each value.
I don't know which variable you want to color by, but whichever it is, pass it as the hue parameter:
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
df = pd.read_csv('https://raw.githubusercontent.com/mayuripandey/Data-Analysis/main/word.csv')
# Use the 'hue' argument to provide a factor variable
sns.lmplot(x='Friends Network-metrics',
           y='Number of Followers',
           height=8,
           aspect=.8,
           data=df,
           fit_reg=False,
           hue='Sentiment',
           legend=True)
plt.xlabel("Friends Network-metrics")
plt.ylabel("Number of Followers")
plt.show()
This gives you a scatter plot with the points colored by the Sentiment column.
If you were looking for a color scale for one of the variables instead, you would do the below. However, the maximum value is so large that the resulting range does not make for an effective visual:
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/mayuripandey/Data-Analysis/main/word.csv')
fig, ax = plt.subplots(figsize=(10, 6))
g = ax.scatter(x = df['Friends Network-metrics'],
               y = df['Number of Followers'],
               c = df['Friends Network-metrics'],
               cmap = "magma")
fig.colorbar(g)
plt.xlabel("Friends Network-metrics")
plt.ylabel("Number of Followers")
plt.show()
So you could adjust the scale (I'd also add edgecolors = 'black', as it's hard to see the light-colored points):
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/mayuripandey/Data-Analysis/main/word.csv')
fig, ax = plt.subplots(figsize=(10, 6))
g = ax.scatter(x = df['Friends Network-metrics'],
               y = df['Number of Followers'],
               c = df['Friends Network-metrics'],
               cmap = "magma",
               vmin=0, vmax=10000,
               edgecolors = 'black')
fig.colorbar(g)
plt.xlabel("Friends Network-metrics")
plt.ylabel("Number of Followers")
plt.show()
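The other route mentioned in this answer, a new column that conditionally assigns a color to each value, might be sketched as follows. The small DataFrame is a hypothetical stand-in for the linked CSV, and both the Sentiment values and the color mapping are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the dataset; the real column values may differ
df = pd.DataFrame({
    "Friends Network-metrics": [120, 450, 80, 900, 300],
    "Number of Followers": [200, 1500, 60, 4000, 700],
    "Sentiment": ["positive", "negative", "neutral", "positive", "negative"],
})

# Assumed mapping from category to color, stored in a new column
palette = {"positive": "tab:green", "negative": "tab:red", "neutral": "tab:gray"}
df["color"] = df["Sentiment"].map(palette)

fig, ax = plt.subplots(figsize=(10, 6))
sc = ax.scatter(df["Friends Network-metrics"], df["Number of Followers"],
                c=df["color"].tolist(), edgecolors="black")
ax.set_xlabel("Friends Network-metrics")
ax.set_ylabel("Number of Followers")
```

Any category missing from the mapping would come out as NaN in the color column, so the dictionary should cover every value that occurs in the data.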

Image as axis values

Is there a way to keep images as axis values?
Two similar questions, here and here, do not answer my question.
import seaborn as sns
import matplotlib.pyplot as plt
titanic = sns.load_dataset("titanic")
sns.catplot(x="sex", y="survived", hue="class", kind="bar", data=titanic)
I would like to replace the male and female axis values with the corresponding images at the links below. Can we map the axis values to the image links?
Male:
https://encrypted-tbn0.gstatic.com/images?q=tbn%3AANd9GcSuGDLqvyU56RbTEFQP3ohzx9d0vJv-nQOk1g&usqp=CAU
Female:
https://encrypted-tbn0.gstatic.com/images?q=tbn%3AANd9GcRnSlVrt4o9yHIHnJ7H-cPi_fhOC4bePnyOoA&usqp=CAU
The answer using an OffsetBox in the questions you linked is probably the best option:
import urllib.request
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from PIL import Image
from matplotlib.offsetbox import OffsetImage, AnchoredOffsetbox
titanic = sns.load_dataset("titanic")
images = ["https://encrypted-tbn0.gstatic.com/images?q=tbn%3AANd9GcSuGDLqvyU56RbTEFQP3ohzx9d0vJv-nQOk1g&usqp=CAU",
          "https://encrypted-tbn0.gstatic.com/images?q=tbn%3AANd9GcRnSlVrt4o9yHIHnJ7H-cPi_fhOC4bePnyOoA&usqp=CAU"]
pos = [0,1]
fig, ax = plt.subplots()
ax = sns.barplot(x="sex", y="survived", hue="class", data=titanic)
ax.set_xticklabels([])
for m,p in zip(images,pos):
    # Recent matplotlib versions no longer accept a URL in plt.imread,
    # so open the URL and decode the image with Pillow instead
    image = np.array(Image.open(urllib.request.urlopen(m)))
    im = OffsetImage(image, zoom=0.1)
    # Anchor each image below the axis: x in data coordinates, y in axes coordinates
    ab = AnchoredOffsetbox(loc='upper center', child=im,
                           bbox_to_anchor=(p,0), bbox_transform=ax.get_xaxis_transform(),
                           frameon=False)
    ax.add_artist(ab)
plt.tight_layout()
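For trying the placement out without downloading anything, a variant with AnnotationBbox and placeholder images might look like the sketch below. The solid-color arrays, the bar heights, and the zoom and offset values are all made up for illustration; real pictures would be loaded as arrays in their place.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.offsetbox import AnnotationBbox, OffsetImage

# Solid-color placeholders standing in for the downloaded pictures (assumed)
icons = {
    "male": np.full((40, 40, 3), (0.2, 0.4, 0.8)),
    "female": np.full((40, 40, 3), (0.8, 0.3, 0.5)),
}
heights = {"male": 0.2, "female": 0.7}  # made-up bar heights, not real data

fig, ax = plt.subplots()
positions = list(range(len(icons)))
ax.bar(positions, [heights[name] for name in icons])
ax.set_xticks(positions)
ax.set_xticklabels([])  # hide the text labels; the pictures replace them

boxes = []
for pos, img in zip(positions, icons.values()):
    box = AnnotationBbox(
        OffsetImage(img, zoom=0.5),
        (pos, 0),                           # x in data units, y at the bottom of the axes
        xycoords=ax.get_xaxis_transform(),
        xybox=(0, -5),                      # nudge the picture 5 points below the axis
        boxcoords="offset points",
        box_alignment=(0.5, 1.0),           # attach by the top center of the picture
        frameon=False,
    )
    ax.add_artist(box)
    boxes.append(box)
fig.subplots_adjust(bottom=0.2)  # leave room under the axis for the pictures
```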

Correlation values in pairplot()

Is there a way to show pair-correlation values with seaborn.pairplot(), as in the example below (created with ggpairs() in R)? I can make the plots using the attached code, but cannot add the correlations. Thanks
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
iris = sns.load_dataset('iris')
g = sns.pairplot(iris, kind='scatter', diag_kind='kde')
# remove upper triangle plots
for i, j in zip(*np.triu_indices_from(g.axes, 1)):
    g.axes[i, j].set_visible(False)
plt.show()
If you use PairGrid instead of pairplot, you can pass a custom function that calculates the correlation coefficient and displays it on the graph:
import seaborn as sns
import matplotlib.pyplot as plt
from scipy.stats import pearsonr
def reg_coef(x,y,label=None,color=None,**kwargs):
    # Write the Pearson correlation coefficient in the middle of the current axes
    ax = plt.gca()
    r,p = pearsonr(x,y)
    ax.annotate('r = {:.2f}'.format(r), xy=(0.5,0.5), xycoords='axes fraction', ha='center')
    ax.set_axis_off()
iris = sns.load_dataset("iris")
g = sns.PairGrid(iris)
# sns.distplot is deprecated since seaborn 0.11; histplot with kde=True is a close substitute
g.map_diag(sns.histplot, kde=True)
g.map_lower(sns.regplot)
g.map_upper(reg_coef)
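To sanity-check the numbers that end up in the upper panels, the Pearson coefficient can be recomputed by hand. The sketch below uses only numpy on synthetic data; pearson_r is a hypothetical helper name, not part of seaborn or scipy.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))

# Synthetic, linearly related columns (assumed data)
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = 0.8 * a + rng.normal(scale=0.5, size=200)

r = pearson_r(a, b)
label = "r = {:.2f}".format(r)  # same text format as the annotation in reg_coef
```

The value should agree with np.corrcoef(a, b)[0, 1] and with the first element returned by scipy.stats.pearsonr up to floating-point error.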

How to change border color of violin plot in pandas graph?

I want to change the color of the border lines of violin plots.
I can set lines.linewidth to 0, but I want to show the borders, not hide them. How can I change the color of the border?
sns.set_context("paper", rc={"lines.linewidth": 0.8})
My code is as follows:
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib import style
import pandas as pd
import numpy as np
datasets = pd.read_csv("merged.csv", index_col=0);
df = datasets
df.protocol = df.protocol.astype(str)
f, ax = plt.subplots(figsize=(18, 6))
sns.violinplot(x="time",
               y="values",
               hue="protocol",
               data=df,
               bw=.5,
               scale="count"
               )
sns.despine(left=True)
f.suptitle('Title', fontsize=22, fontweight='bold')
ax.set_xlabel("Time",size = 16,alpha=0.7)
ax.set_ylabel("Values",size = 16,alpha=0.7)
ax.set_xticklabels(df.qber, rotation=90)
ax.grid(True)
plt.legend(loc='upper right')
plt.grid(linestyle='--', alpha=0.7)
fig = ax.get_figure()
fig.savefig('time_v.pdf', bbox_inches='tight')
Thank you!
This should be very close to what you're looking for:
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib import style
import pandas as pd
import numpy as np
def patch_violinplot(palette, n):
    # Recolor the outline of every violin body (PolyCollection) on the current axes
    from matplotlib.collections import PolyCollection
    ax = plt.gca()
    violins = [art for art in ax.get_children() if isinstance(art, PolyCollection)]
    colors = sns.color_palette(palette, n_colors=n) * (len(violins)//n)
    for i in range(len(violins)):
        violins[i].set_edgecolor(colors[i])
datasets = pd.read_csv("merged.csv", index_col=0);
df = datasets
df.protocol = df.protocol.astype(str)
num_cols = df['protocol'].nunique()
f, ax = plt.subplots(figsize=(18, 6))
sns.violinplot(x="time",
               y="values",
               hue="protocol",
               data=df,
               bw=.5,
               scale="count",
               palette="deep"
               )
patch_violinplot("deep", num_cols)
sns.despine(left=True)
f.suptitle('Title', fontsize=22, fontweight='bold')
ax.set_xlabel("Time",size = 16,alpha=0.7)
ax.set_ylabel("Values",size = 16,alpha=0.7)
ax.set_xticklabels(df.qber, rotation=90)
ax.grid(True)
plt.legend(loc='upper right')
plt.grid(linestyle='--', alpha=0.7)
fig = ax.get_figure()
fig.savefig('time_v.pdf', bbox_inches='tight')
The patch_violinplot function came from here.
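Since merged.csv is not available, the same idea can be tried with matplotlib's own violinplot on synthetic data, as in the sketch below. The sample data, the color names, and the line width are assumptions for illustration; the filtering on PolyCollection is the part taken from the answer.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.collections import PolyCollection

# Three synthetic samples standing in for the real data (assumed)
rng = np.random.default_rng(0)
samples = [rng.normal(loc=m, size=100) for m in (0.0, 1.0, 2.0)]

fig, ax = plt.subplots()
ax.violinplot(samples)

# The violin bodies are PolyCollection artists; restyle only their outlines
violins = [art for art in ax.collections if isinstance(art, PolyCollection)]
for body, color in zip(violins, ["tab:blue", "tab:orange", "tab:green"]):
    body.set_edgecolor(color)
    body.set_linewidth(1.5)
```

Passing a single color such as 'black' to set_edgecolor in the loop would give every violin the same border instead of one color per violin.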

Scatter plot with colorbar and datetime axis ticks

I am getting lost in different methods used in matplotlib.
I want to create a colour-coded scatter plot with a colorbar on the side and datetime on the x axis.
But depending on how I define my ax, I get different errors.
Below is the core of my code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
import matplotlib.cm as cm
import matplotlib.dates as mdates
#.....loading files etc.
norm = mcolors.Normalize(vmin=0,vmax=1000)
timerange = pd.date_range(start='2015-01-01', end='2016-01-01', freq='30D')
### PLOTTING
fig = plt.figure(figsize=(6.,5))
ax = fig.add_subplot(111)
for Af in Afiles:
    for index, row in Af.iterrows():
        time = pd.to_datetime(row['date'], format="%Y-%m-%d")
        plt.scatter(time, row['A'], c=row['z'], norm=norm, cmap=colormap,edgecolor='k', lw=0.8, s=80)
plt.xticks(timerange, rotation=90)
ax.xaxis.set_major_formatter(mdates.DateFormatter("%d/%m/%Y"))
plt.xlabel('Time', fontsize=11, color='k')
clb = fig.colorbar(ax)
clb.ax.set_title('Value', y=-0.125, fontsize=11)
clb.ax.invert_yaxis()
fig.tight_layout()
This produces AttributeError: 'AxesSubplot' object has no attribute 'autoscale_None'.
However, if I assign the scatter plot to ax so that the colour-coding works, I then have trouble with the axis formatter.
Writing instead ax = plt.scatter generates AttributeError: 'PathCollection' object has no attribute 'xaxis'.
How can I have both the colorbar AND formatted axis ticks?
Don't call the scatter ax. (This overwrites the existing axes ax.)
The colorbar expects a ScalarMappable (e.g. the scatter) as its first argument. Since all the scatters share the same normalization, you can use any one of them from the loop:
norm = plt.Normalize(...)
for bla in blubb:
    scatter = plt.scatter(..., norm=norm)
Then,
clb = fig.colorbar(scatter)
The rest should stay the same.
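Put together, the advice above might look like the following runnable sketch. The dates, values, and the viridis colormap are synthetic stand-ins for the asker's files and colormap variable, and the per-row loop is replaced here by a single vectorized scatter call, which is a simplification rather than something the answer requires.

```python
import datetime as dt
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np

# Synthetic stand-in data: one point every 30 days (assumed, not the asker's files)
rng = np.random.default_rng(0)
dates = [dt.datetime(2015, 1, 1) + dt.timedelta(days=30 * i) for i in range(12)]
values = rng.uniform(0, 10, size=len(dates))
z = rng.uniform(0, 1000, size=len(dates))

norm = plt.Normalize(vmin=0, vmax=1000)
fig, ax = plt.subplots(figsize=(6, 5))
# Keep the axes as ax and the PathCollection as scatter; they are different objects
scatter = ax.scatter(dates, values, c=z, norm=norm, cmap="viridis",
                     edgecolors="k", linewidths=0.8, s=80)
ax.set_xticks(dates)
ax.tick_params(axis="x", labelrotation=90)
ax.xaxis.set_major_formatter(mdates.DateFormatter("%d/%m/%Y"))
ax.set_xlabel("Time", fontsize=11)
# The colorbar takes the mappable (the scatter), not the axes
clb = fig.colorbar(scatter, ax=ax)
clb.ax.set_title("Value", fontsize=11)
fig.tight_layout()
```

The clb.ax.invert_yaxis() call from the question can be added after the colorbar is created if the inverted scale is wanted.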
The basic idea is that you need to add an extra axis for the colorbar.
It's hard to know if this is an exact match, as you haven't provided a working example with data. But this may at least serve as a template.
First, some example data:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
import matplotlib.cm as cm
import matplotlib.dates as mdates
from mpl_toolkits.axes_grid1 import make_axes_locatable
vmin = 0
vmax = 1000
timerange = pd.date_range(start='2015-01-01', end='2016-01-01', freq='30D')
N = len(timerange)
data = np.random.randint(vmin, vmax, size=N)
# z contains the colorbar values for each point
cmap = plt.get_cmap('Reds')
z = [cmap((x-vmin)/(vmax-vmin))[:3] for x in data]
df = pd.DataFrame({"value":data, "datetime":timerange, "z":z})
Now plot:
fig = plt.figure(figsize=(6.,5))
ax = fig.add_subplot(111)
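# Pass the per-point RGB tuples as a plain list of colors
plt.scatter(x=df.datetime.values, y=df.value.values, c=df.z.tolist())
# Put the ticks on the dates and rotate the labels (as in the question)
plt.xticks(timerange, rotation=90)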
ax.xaxis.set_major_formatter(mdates.DateFormatter("%d/%m/%Y"))
ax.set_xlabel('Time')
Now add colorbar:
norm = mcolors.Normalize(vmin=vmin,vmax=vmax)
m = cm.ScalarMappable(cmap='Reds', norm=norm)
m.set_array(df.value.values)  # raw values; norm already maps them onto [0, 1]
divider = make_axes_locatable(ax)
cax = divider.append_axes("right", size="5%", pad=0.1)
clb = plt.colorbar(m, cax=cax)
