I have a dataframe and I'm using seaborn pairplot to plot one target column vs the rest of the columns.
The code is below:
import seaborn as sns
import matplotlib.pyplot as plt
tgt_var = 'AB'
var_lst = ['A','GH','DL','GT','MS']
pp = sns.pairplot(data=df,
y_vars=[tgt_var],
x_vars=var_lst)
pp.fig.set_figheight(6)
pp.fig.set_figwidth(20)
var_lst is not a static list; I just provided an example.
What I need is to plot tgt_var on the y axis and each item of var_lst on the x axis.
I'm able to do this with the above code, but I also want a log scale on the x axis only when the var_lst item is 'GH' or 'MS', and a normal (linear) scale for the rest. Is there any way to achieve this?
Iterate pp.axes.flat and set xscale="log" if the xlabel matches "GH" or "MS":
log_columns = ["GH", "MS"]
for ax in pp.axes.flat:
    if ax.get_xlabel() in log_columns:
        ax.set(xscale="log")
Full example with the iris dataset where the petal columns are xscale="log":
import seaborn as sns
df = sns.load_dataset("iris")
pp = sns.pairplot(df)
log_columns = ["petal_length", "petal_width"]
for ax in pp.axes.flat:
    if ax.get_xlabel() in log_columns:
        ax.set(xscale="log")
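The loop only relies on each Axes carrying its column name as its x label, so it can be wrapped in a small helper that works on any collection of matplotlib Axes. A minimal sketch, assuming plain matplotlib subplots stand in for `pp.axes.flat`; `apply_log_x` is a hypothetical helper name, and the data and column names are made up for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np


def apply_log_x(axes, log_columns):
    """Hypothetical helper: use a log x-scale on every Axes whose x label is listed."""
    for ax in axes:
        if ax.get_xlabel() in log_columns:
            ax.set_xscale("log")


# Stand-in for a one-row pairplot: the target on y, one column per panel on x.
rng = np.random.default_rng(0)
columns = ["A", "GH", "DL", "GT", "MS"]
fig, axes = plt.subplots(nrows=1, ncols=len(columns), figsize=(20, 6), sharey=True)
for ax, col in zip(axes, columns):
    ax.scatter(rng.uniform(1, 1000, 50), rng.normal(size=50), s=5)
    ax.set_xlabel(col)
axes[0].set_ylabel("AB")

apply_log_x(axes.flat, ["GH", "MS"])
print([ax.get_xscale() for ax in axes])
# ['linear', 'log', 'linear', 'linear', 'log']
```

With the pairplot from the question you would pass `pp.axes.flat` instead; because var_lst is not static, passing the list of log columns as an argument keeps the helper reusable.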
Here I am trying to separate the data by whether the passenger is male, plotting Age on the x axis and Fare on the y axis. I want the legend to show two labels, male and female, each with its respective color. Can anyone help me do this?
Code:
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv('https://sololearn.com/uploads/files/titanic.csv')
df['male']=df['Sex']=='male'
sc1= plt.scatter(df['Age'],df['Fare'],c=df['male'])
plt.legend()
plt.show()
You could use the seaborn library, which builds on top of matplotlib, to do exactly this. You can scatter-plot 'Age' against 'Fare' and colour-code it by 'Sex' just by passing the hue parameter to sns.scatterplot, as follows:
import matplotlib.pyplot as plt
import seaborn as sns
plt.figure()
# No need to call plt.legend, seaborn will generate the labels and legend
# automatically.
sns.scatterplot(data=df, x='Age', y='Fare', hue='Sex')
plt.show()
Seaborn generates nicer plots with less code and more functionality.
You can install seaborn from PyPI using pip install seaborn.
Refer to the seaborn docs.
The PathCollection.legend_elements method can be used to control how many legend entries are created and how they are labeled.
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv('https://sololearn.com/uploads/files/titanic.csv')
df['male'] = df['Sex']=='male'
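sc1 = plt.scatter(df['Age'], df['Fare'], c=df['male'])
# legend_elements() returns one handle per unique value in ascending order:
# False (female) first, then True (male)
plt.legend(handles=sc1.legend_elements()[0], labels=['female', 'male'])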
plt.show()
See the matplotlib Legend guide and the "Scatter plots with a legend" example for reference.
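Because `legend_elements` orders its handles by the sorted unique values of the color array, a safer pattern is to derive the labels from those sorted values instead of typing them by hand. A sketch, assuming a small made-up DataFrame in place of the Titanic CSV so it runs offline; `codes` and `categories` are illustrative names:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for the Titanic columns used above.
df = pd.DataFrame({
    "Age": [22, 38, 26, 35, 54, 2],
    "Fare": [7.25, 71.3, 7.9, 53.1, 51.9, 21.1],
    "Sex": ["male", "female", "female", "female", "male", "male"],
})

# sort=True numbers the categories in sorted order ('female' -> 0, 'male' -> 1),
# which matches the ascending order of the handles from legend_elements().
codes, categories = pd.factorize(df["Sex"], sort=True)

fig, ax = plt.subplots()
sc = ax.scatter(df["Age"], df["Fare"], c=codes)
handles, _ = sc.legend_elements()
ax.legend(handles=handles, labels=list(categories))
print([t.get_text() for t in ax.get_legend().get_texts()])
# ['female', 'male']
```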
This can also be achieved by splitting the data into two separate dataframes and setting a label on each scatter call.
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv('https://sololearn.com/uploads/files/titanic.csv')
subset1 = df[(df['Sex'] == 'male')]
subset2 = df[(df['Sex'] != 'male')]
plt.scatter(subset1['Age'], subset1['Fare'], label = 'Male')
plt.scatter(subset2['Age'], subset2['Fare'], label = 'Female')
plt.legend()
plt.show()
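The same idea extends to any number of categories by looping over `groupby` instead of writing one subset per value. A minimal sketch, assuming a small made-up DataFrame in place of the Titanic CSV:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for the Titanic columns used above.
df = pd.DataFrame({
    "Age": [22, 38, 26, 35, 54, 2],
    "Fare": [7.25, 71.3, 7.9, 53.1, 51.9, 21.1],
    "Sex": ["male", "female", "female", "female", "male", "male"],
})

fig, ax = plt.subplots()
# One scatter call per category; each gets its own color and legend label.
for sex, group in df.groupby("Sex"):
    ax.scatter(group["Age"], group["Fare"], label=sex.capitalize())
ax.set_xlabel("Age")
ax.set_ylabel("Fare")
ax.legend()
print([t.get_text() for t in ax.get_legend().get_texts()])
# ['Female', 'Male']
```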
I made a clustermap of thousands of genes using seaborn. Because I'm interested in only a few genes, I'd like to display just those genes of interest as y tick labels. I'm trying to figure it out using the iris dataset; my code is below. I'm not sure how to get the samples of interest at their correct indexes. Thank you in advance for any help.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
iris = sns.load_dataset('iris')
samples = ['sample_' + str(x) for x in list(iris.index)]  # sample IDs that line up with the internal index
iris.insert(0,'Sample_ID',samples)
samples_of_interest = ['sample_41','sample_34','sample_114','sample_55'] #samples to be visible on ytick
sns.clustermap(iris.iloc[:, 1:-1], yticklabels=samples_of_interest)  # not the expected result: the samples of interest are not at their correct rows
plt.show()
plt.close()
Here's why your code wasn't working.
The documentation says this about the xticklabels/yticklabels arguments:
If list-like, plot these alternate labels as the xticklabels.
So when you pass only a few tick labels, seaborn simply uses those names as tick labels, without knowing which row each one belongs to. One way around this is to build sample_labels, which has a label for every row but sets the uninteresting ones to None, and then rotate the tick labels (following this answer):
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
iris = sns.load_dataset('iris')
samples = ['sample_'+str(x) for x in list(iris.index)]
iris.insert(0,'Sample_ID',samples)
samples_of_interest = ['sample_41','sample_34','sample_114','sample_55']
sample_labels = [i if i in samples_of_interest else None
for i in iris['Sample_ID'] ]
cm=sns.clustermap(iris.iloc[:,1:-1], yticklabels=sample_labels)
plt.setp(cm.ax_heatmap.yaxis.get_majorticklabels(), rotation=0)
But this is still not ideal, because there are tick marks at every position. There is probably a way to remove them, but instead...
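One possible way to remove the leftover marks is to walk the y-axis ticks after the labels are set and hide the tick line wherever the label text is empty (seaborn's None labels end up as empty text). A sketch, assuming a plain `imshow` heatmap stands in for `cm.ax_heatmap`; `hide_unlabeled_ticks` is a hypothetical helper, not a seaborn function:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np


def hide_unlabeled_ticks(axis):
    """Hypothetical helper: hide the tick marks whose label text is empty."""
    for tick in axis.get_major_ticks():
        if not tick.label1.get_text():
            # tick1line/tick2line are the marks on the two sides of the Axes
            tick.tick1line.set_visible(False)
            tick.tick2line.set_visible(False)


rng = np.random.default_rng(0)
labels = ["", "", "row_2", "", "", "row_5", "", "", "", ""]

fig, ax = plt.subplots()
ax.imshow(rng.normal(size=(len(labels), 4)), aspect="auto")
ax.set_yticks(np.arange(len(labels)))
ax.set_yticklabels(labels)
hide_unlabeled_ticks(ax.yaxis)

visible = [t.tick1line.get_visible() for t in ax.yaxis.get_major_ticks()]
print(sum(visible))  # 2
```

With the clustermap above, the equivalent call would be `hide_unlabeled_ticks(cm.ax_heatmap.yaxis)` after the labels are set; hiding both tick lines covers the right-hand ticks the clustermap heatmap uses.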
Here's a method I like more:
Get the new order of the samples from the ClusterGrid (the object returned by clustermap), then manually set the y-tick positions and labels (with credit to this post):
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
iris = sns.load_dataset('iris')
samples_of_interest = [41, 34, 114, 55]
sample_names = ['Sample ' + str(i) for i in samples_of_interest]
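cm = sns.clustermap(iris.iloc[:, :-1])  # note the slice has changed: there is no Sample_ID column here, so only species is dropped
reorder = cm.dendrogram_row.reordered_ind
# heatmap rows span [i, i + 1], so add 0.5 to centre each tick on its row
new_positions = [reorder.index(i) + 0.5 for i in samples_of_interest]
cm.ax_heatmap.set_yticks(new_positions)
cm.ax_heatmap.set_yticklabels(sample_names, rotation=0)
plt.show()
The setters are called directly here rather than wrapped in plt.setp: plt.setp called with only an object and no property arguments prints that object's settable properties, which produces stray output without changing the plot.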
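The position lookup is plain list indexing, so it can be checked without drawing anything. A sketch with a made-up `reordered_ind`; in the answer above the real list comes from `cm.dendrogram_row.reordered_ind`. `tick_positions` is a hypothetical helper, and it builds a dict once instead of calling `list.index` per sample, which matters little for four samples but helps with thousands of genes:

```python
def tick_positions(reordered_ind, wanted):
    """Hypothetical helper: map original row indices to heatmap tick centres."""
    # reordered_ind[pos] is the original row drawn at display position pos.
    row_of = {orig: pos for pos, orig in enumerate(reordered_ind)}
    # Each heatmap row spans [pos, pos + 1], so its centre is pos + 0.5.
    return [row_of[i] + 0.5 for i in wanted]


reordered_ind = [3, 0, 4, 1, 2]  # made-up dendrogram order for 5 rows
print(tick_positions(reordered_ind, [0, 2]))  # [1.5, 4.5]
```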
I have the following dataset, code and plot:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
data = [['tom', 10,15], ['matt', 13,10]]
df3 = pd.DataFrame(data, columns = ['Name', 'Attempts','L4AverageAttempts'])
f,ax = plt.subplots(nrows=1,figsize=(16,9))
sns.barplot(x='Attempts',y='Name',data=df3)
plt.show()
How can I get a marker of some description (dot, *, shape, etc.) to show that tom has averaged 15 (so is below his average) and matt has averaged 10 (so is above average)? In other words, a marker based on the L4AverageAttempts value for each person.
I have looked into axvline, but that seems to take a single fixed value rather than a specific value for each y-axis category. Any help would be much appreciated, thanks!
You can simply plot a scatter plot on top of your bar plot, using L4AverageAttempts as the x value. seaborn.scatterplot works for this; make sure to set the zorder parameter so that the markers appear on top of the bars.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
data = [['tom', 10,15], ['matt', 13,10]]
df3 = pd.DataFrame(data, columns = ['Name', 'Attempts','L4AverageAttempts'])
f,ax = plt.subplots(nrows=1,figsize=(16,9))
sns.barplot(x='Attempts',y='Name',data=df3)
sns.scatterplot(x='L4AverageAttempts', y="Name", data=df3, zorder=10, color='k', edgecolor='k')
plt.show()
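The same overlay can be done with matplotlib alone, which also makes it easy to color each marker by whether the current attempts are at or above the average. A sketch; the green/red scheme and the diamond marker are arbitrary choices, not part of the answer above:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

data = [['tom', 10, 15], ['matt', 13, 10]]
df3 = pd.DataFrame(data, columns=['Name', 'Attempts', 'L4AverageAttempts'])

fig, ax = plt.subplots(figsize=(16, 9))
ax.barh(df3['Name'], df3['Attempts'], color='lightgray')
# Green when the current attempts are at or above the average, red otherwise.
colors = ['tab:green' if a >= avg else 'tab:red'
          for a, avg in zip(df3['Attempts'], df3['L4AverageAttempts'])]
# zorder=3 draws the markers above the bars.
sc = ax.scatter(df3['L4AverageAttempts'], df3['Name'], c=colors,
                marker='D', s=80, zorder=3)
ax.set_xlabel('Attempts')
print(np.asarray(sc.get_offsets()).tolist())
# [[15.0, 0.0], [10.0, 1.0]]
```

The string names become categorical positions on the y axis in order of first appearance, so the scatter markers land on the same rows as the bars.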
Is there a way how to add multiple seaborn boxplots to one figure sequentially?
Taking example from Time-series boxplot in pandas:
import pandas as pd
import numpy as np
import seaborn
import matplotlib.pyplot as plt
n = 480
ts = pd.Series(np.random.randn(n), index=pd.date_range(start="2014-02-01", periods=n, freq="H"))
fig, ax = plt.subplots(figsize=(12,5))
seaborn.boxplot(x=ts.index.dayofyear, y=ts, ax=ax)
This gives me one series of box plots.
Now, is there any way to plot two time series like this one on the same plot, side by side? I want to plot it in a function that has a make_new_plot boolean parameter for separating the boxplots that are plotted from the for loop.
If I try to just call it on the same axis, the plots overlap.
I know that it is possible to concatenate the dataframes and make box plots of the concatenated dataframe together, but I would not want to have this plotting function returning any dataframes.
Is there some other way to do it? Maybe it is possible to somehow manipulate the width and position of the boxes to achieve this? The fact that I need a time series of boxplots, and that matplotlib's "positions" parameter is deliberately not supported by seaborn, makes it a bit tricky for me to figure out how to do it.
Note that this is NOT the same as e.g. "Plotting multiple boxplots in seaborn?", because I want to plot sequentially without returning any dataframes from the plotting function.
You could do something like the following if you want to have hue nesting of different time-series in your boxplots.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
n = 480
ts0 = pd.Series(np.random.randn(n), index=pd.date_range(start="2014-02-01", periods=n, freq="H"))
ts1 = pd.Series(np.random.randn(n), index=pd.date_range(start="2014-02-01", periods=n, freq="H"))
ts2 = pd.Series(np.random.randn(n), index=pd.date_range(start="2014-02-01", periods=n, freq="H"))
def ts_boxplot(ax, list_of_ts):
    new_list_of_ts = []
    for i, ts in enumerate(list_of_ts):
        ts = ts.to_frame(name='ts_variable')
        ts['ts_number'] = i
        ts['doy'] = ts.index.dayofyear
        new_list_of_ts.append(ts)
    plot_data = pd.concat(new_list_of_ts)
    sns.boxplot(data=plot_data, x='doy', y='ts_variable', hue='ts_number', ax=ax)
    return ax
fig, ax = plt.subplots(figsize=(12,5))
ax = ts_boxplot(ax, [ts0, ts1, ts2])
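If the goal is to add one series per call without concatenating anything, matplotlib's own `Axes.boxplot` accepts the `positions` and `widths` that the question mentions. A sketch under the assumption that the total number of series (`n_slots`) is known up front, so each call can claim its own slot within a day; `add_ts_boxplot` is a hypothetical helper that returns only the artists matplotlib creates, never a DataFrame:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def add_ts_boxplot(ax, ts, slot, n_slots, color, total_width=0.8):
    """Hypothetical helper: draw one series' daily boxes in its own slot."""
    days, values = [], []
    for day, group in ts.groupby(ts.index.dayofyear):
        days.append(day)
        values.append(group.to_numpy())
    width = total_width / n_slots
    # Shift this series so the n_slots series sit side by side within each day.
    offset = -total_width / 2 + width * (slot + 0.5)
    centers = np.arange(len(days))
    artists = ax.boxplot(values, positions=centers + offset, widths=width * 0.9,
                         patch_artist=True, manage_ticks=False)
    for box in artists["boxes"]:
        box.set_facecolor(color)
    ax.set_xticks(centers)
    ax.set_xticklabels(days)
    ax.set_xlim(-0.5, len(days) - 0.5)
    return artists


n = 480
index = pd.date_range(start="2014-02-01", periods=n, freq="h")
series = [pd.Series(np.random.randn(n), index=index) for _ in range(3)]

fig, ax = plt.subplots(figsize=(12, 5))
results = []
for slot, (ts, color) in enumerate(zip(series, ["tab:blue", "tab:orange", "tab:green"])):
    results.append(add_ts_boxplot(ax, ts, slot, n_slots=len(series), color=color))
print([len(r["boxes"]) for r in results])  # [20, 20, 20]
```

`manage_ticks=False` stops each call from resetting the x ticks to its own box positions; the helper then labels the day centers itself.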
I have a scatter plot matrix generated using the seaborn package, and I'd like to remove all the tick mark labels, as they just mess up the graph (either that, or remove only those on the x axis). I'm not sure how to do it and have had no success with Google searches. Any suggestions?
import matplotlib.pyplot as plt
import seaborn as sns
sns.pairplot(wheat[['area_planted',
'area_harvested',
'production',
'yield']])
plt.show()
import seaborn as sns
iris = sns.load_dataset("iris")
g = sns.pairplot(iris)
g.set(xticklabels=[])
You can loop over the bottom row of axes and turn off the visibility of the x axis:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
df = pd.DataFrame(np.random.randn(1000, 2)) * 1e6
plot = sns.pairplot(df)
# Only the bottom row of a pairplot shows x-axis ticks and labels
for col in range(len(df.columns)):
    plot.axes[len(df.columns) - 1][col].xaxis.set_visible(False)
plt.show()
You could also rescale your data to something more readable:
df /= 1e6
sns.pairplot(df)
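Instead of dividing the data, you can leave it unchanged and only shorten the tick labels with a tick formatter. A sketch using matplotlib's `EngFormatter` (a swapped-in technique, not part of the answer above) on plain matplotlib Axes standing in for the pairplot's; with the grid returned by `sns.pairplot` you would loop over its `axes.flat` the same way:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import EngFormatter

rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 2)) * 1e6

fig, axes = plt.subplots(1, 2, figsize=(8, 4))
axes[0].hist(data[:, 0], bins=30)
axes[1].scatter(data[:, 0], data[:, 1], s=3)

for ax in axes.flat:
    # Engineering notation with SI prefixes, e.g. 2000000 is shown as "2 M".
    ax.xaxis.set_major_formatter(EngFormatter())
    ax.yaxis.set_major_formatter(EngFormatter())

formatter = axes[1].xaxis.get_major_formatter()
print(formatter(2e6))  # 2 M
```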
Using the following is probably more appropriate; it removes the x ticks altogether rather than only their labels:
import seaborn as sns
iris = sns.load_dataset("iris")
g = sns.pairplot(iris)
g.set(xticks=[])
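If you want to keep the tick marks but hide the numbers on both axes, calling `tick_params` on each Axes is another option. A sketch with a plain matplotlib 3x3 grid standing in for `g.axes.flat`; the data is random and made up:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(100, 3)) * 1e6

# A 3x3 grid of scatter plots standing in for a pairplot's Axes.
fig, axes = plt.subplots(3, 3, figsize=(8, 8))
for i in range(3):
    for j in range(3):
        axes[i, j].scatter(data[:, j], data[:, i], s=3)

for ax in axes.flat:
    # Keep the tick marks, hide the tick labels on the bottom and left.
    ax.tick_params(labelbottom=False, labelleft=False)

first_tick = axes[2, 0].xaxis.get_major_ticks()[0]
print(first_tick.label1.get_visible(), first_tick.tick1line.get_visible())
# False True
```

To hide only the x-axis numbers, pass `labelbottom=False` alone.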