In the screenshot below, all my x-labels are overlapping each other.
g = sns.factorplot(x='Age', y='PassengerId', hue='Survived', col='Sex', kind='strip', data=train);
I know that I can remove all the labels by calling g.set(xticks=[]), but is there a way to just show some of the Age labels, like 0, 20, 40, 60, 80?
I am not sure why there aren't sensible default ticks and values like there are on the y-axis. At any rate you can do something like the following:
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
titanic = sns.load_dataset('titanic')
# factorplot was renamed to catplot in seaborn 0.9
sns.catplot(x='age', y='fare', hue='survived', col='sex', data=titanic, kind='strip')
# gca() returns the last facet; the facets share their x-axis, so the
# locator and formatter set here apply to every facet
ax = plt.gca()
ax.xaxis.set_major_formatter(ticker.FormatStrFormatter('%d'))
ax.xaxis.set_major_locator(ticker.MultipleLocator(base=20))
plt.show()
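A caveat on the snippet above, as far as I can tell: on a categorical axis the tick positions are category indices (0, 1, 2, ...), so MultipleLocator(base=20) with '%d' labels every 20th category with its position, not necessarily with its age. A minimal sketch of one way to keep the real values is to fix the category order yourself and label only the positions you want. The data here is synthetic (a stand-in for the Titanic columns, not the real dataset), and it uses a single stripplot rather than a faceted plot to stay short.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in data (assumption): only even ages, so that a
# category's position on the axis differs from its age.
rng = np.random.default_rng(0)
age = np.tile(np.arange(0, 81, 2), 5)
df = pd.DataFrame({"age": age, "fare": rng.uniform(5, 100, size=age.size)})

# Fix the category order explicitly so positions can be mapped back to ages.
ages = sorted(df["age"].unique())
ax = sns.stripplot(data=df, x="age", y="fare", order=ages)

# Categories sit at positions 0..len(ages)-1; keep a tick only at multiples of 20.
positions = [i for i, a in enumerate(ages) if a % 20 == 0]
ax.set_xticks(positions)
ax.set_xticklabels([str(ages[i]) for i in positions])
```

For a faceted plot the same two set_xticks / set_xticklabels calls would have to be applied to each facet's axes.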
I have looked through the documentation and examples but cannot figure this out. How do I change the number of size categories (the number of size bubbles in the legend) and their boundaries in a seaborn scatterplot? The sizes parameter doesn't help here.
The legend always shows 6 of them regardless of what I try (here 8, 16, ..., 48):
import seaborn as sns
tips = sns.load_dataset("tips")
sns.scatterplot(data=tips, x="total_bill", y="tip", size="total_bill")
or
penguins = sns.load_dataset("penguins")
sns.scatterplot(data=penguins, x="bill_length_mm", y="bill_depth_mm", size="body_mass_g")
And how do I change their boundaries? I.e., what if I want 10, 20, 30, 40, 50 in the first case or 3000, 4000, 5000, 6000 in the second?
I know that going around and creating another column in the dataframe works but that is not wanted (adds unnecessary columns and even if I do it on the fly, it's just not what I am looking for).
Workaround:
def myfunc(mass):
    if mass < 3500:
        return 3000
    elif mass < 4500:
        return 4000
    elif mass < 5500:
        return 5000
    return 6000
penguins["mass"] = penguins.apply(lambda x: myfunc(x['body_mass_g']), axis=1)
sns.scatterplot(data=penguins, x="bill_length_mm", y="bill_depth_mm", size="mass")
I don't think seaborn offers fine-grained control here; it tries to come up with something that works intuitively in many situations, but not in all. The legend='full' parameter shows every value of the size column, which can be overwhelming.
The suggestion to create a new column with binned sizes has the drawback that this will also change the sizes used in the scatterplot.
An approach could be to create your own custom legend. Note that when the legend also contains other elements, this approach needs to be adapted a bit.
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
tips = sns.load_dataset("tips")
ax = sns.scatterplot(data=tips, x="total_bill", y="tip", size="total_bill", legend='full')
handles, labels = ax.get_legend_handles_labels()
labels = np.array([float(l) for l in labels])
desired_labels = [10, 20, 30, 40, 50]
desired_handles = [handles[np.argmin(np.abs(labels - d))] for d in desired_labels]
ax.legend(handles=desired_handles, labels=desired_labels, title=ax.legend_.get_title().get_text())
plt.show()
The code can be wrapped into a function, and e.g. applied to the penguins:
from matplotlib import pyplot as plt
import seaborn as sns
import numpy as np
def sizes_legend(desired_sizes, ax=None):
    ax = ax or plt.gca()
    handles, labels = ax.get_legend_handles_labels()
    labels = np.array([float(l) for l in labels])
    desired_handles = [handles[np.argmin(np.abs(labels - d))] for d in desired_sizes]
    ax.legend(handles=desired_handles, labels=desired_sizes, title=ax.legend_.get_title().get_text())
penguins = sns.load_dataset("penguins")
ax = sns.scatterplot(data=penguins, x="bill_length_mm", y="bill_depth_mm", size="body_mass_g", legend='full')
sizes_legend([3000, 4000, 5000, 6000], ax)
plt.show()
I would like to change this from a line of regression to a curve. Also to have the line reach either side of the graph. Here is my code:
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
data = {'Days': [5, 10, 15, 20],
        'Impact': [33.7561, 30.6281, 29.5748, 29.0482]}
a = pd.DataFrame(data, columns=['Days', 'Impact'])
print(a)
ax = sns.barplot(data=a, x='Days', y='Impact', color='lightblue')
# put bars in background:
for c in ax.patches:
    c.set_zorder(0)
# plot regplot with numbers 0,..,len(a) as x value
ax = sns.regplot(x=np.arange(0, len(a)), y=a['Impact'], marker="+")
sns.despine(offset=10, trim=False)
ax.set_ylabel("")
ax.set_xticklabels(['5', '10', '15', '20'])
plt.show()
Alternatively, I would prefer to do it in matplotlib as a scatter plot instead of bar chart. Here is an example in excel, but ideally to have the curve extend beyond the outside markers at least a little.
Can anyone help?
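One possible approach, sketched under assumptions rather than offered as the definitive answer: fit a low-order polynomial with NumPy and draw it over a plain matplotlib scatter plot, evaluating the fit on an x-range slightly wider than the data so the curve runs past the outer markers. The degree (2) and the padding (pad) are illustrative choices, not something the question specifies.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

days = np.array([5, 10, 15, 20])
impact = np.array([33.7561, 30.6281, 29.5748, 29.0482])

# Fit a second-degree polynomial (assumption: pick whatever model suits the data).
coeffs = np.polyfit(days, impact, deg=2)

# Evaluate the fit on a range that extends a little past the outer markers.
pad = 2.5  # hypothetical padding, in days
x_curve = np.linspace(days.min() - pad, days.max() + pad, 200)
y_curve = np.polyval(coeffs, x_curve)

fig, ax = plt.subplots()
ax.scatter(days, impact, marker="+", s=80, zorder=3)
ax.plot(x_curve, y_curve)
ax.set_xticks(days)
ax.set_xlabel("Days")
ax.set_ylabel("Impact")
fig.savefig("curve.png")
```

If you would rather stay in seaborn, regplot also accepts order=2 for a polynomial fit and truncate=False to extend the line to the axis limits; I have not checked how that combines with the bar plot overlay.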
Currently displaying some data with Seaborn / Pandas. I'm looking to overlay the mean of each category (x=ks2) - but can't figure out how to do this with Seaborn.
I can remove the inner="box" - but want to replace that with a marker for the mean of each category.
Ideally, then link each mean calculated...
Any pointers greatly received.
Cheers
Science.csv has 9k+ entries
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
sns.set(style="whitegrid", palette="pastel", color_codes=True)
# Load the dataset
# df = pd.read_csv("science.csv") << loaded from csv
df = pd.DataFrame({'ks2': [1, 1, 2,3,3,4],
'science': [40, 50, 34,20,0,44]})
# Draw a nested violinplot and split the violins for easier comparison
sns.violinplot(x="ks2", y="science", data=df, split=True,
inner="box",linewidth=2)
sns.despine(left=True)
plt.savefig('plot.png')
Try overlaying sns.pointplot with estimator=mean on the same axes:
from numpy import mean
sns.pointplot(x='ks2', y='science', data=df, estimator=mean)
Then play with linestyles (and markers) until the line linking the means looks the way you want.
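Put together, the overlay could look something like the sketch below. It uses synthetic data (an assumption, since science.csv is not available) with enough observations per category for the violins to be drawn, drops the inner box with inner=None, and passes np.mean as the estimator. The colour and linestyle are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for science.csv (assumption): 30 scores per ks2 level.
rng = np.random.default_rng(0)
ks2 = np.repeat([1, 2, 3, 4], 30)
df = pd.DataFrame({"ks2": ks2, "science": rng.normal(loc=10 * ks2, scale=5)})

fig, ax = plt.subplots()
# inner=None removes the box so the mean marker is the only summary shown.
sns.violinplot(x="ks2", y="science", data=df, inner=None, linewidth=2, ax=ax)
# pointplot uses the same categorical x positions and links the category means.
sns.pointplot(x="ks2", y="science", data=df, estimator=np.mean,
              color="k", linestyles="--", ax=ax)
sns.despine(left=True)
fig.savefig("plot.png")

# The per-category means, computed independently for reference.
means = df.groupby("ks2")["science"].mean()
```

By default pointplot also draws an error bar around each mean; whether to keep it is a matter of taste.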
I'm using seaborn to draw a heatmap. But if there are too many yticks, some of them are automatically hidden. The result looks like:
As you can see, the yticks only show 1, 3, 5, 7, ..., 31, 33.
How can I make seaborn or matplotlib show all of them, like 1, 2, 3, 4, ..., 31, 32, 33, 34?
my code is:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
y = np.random.randint(1, 100, 510)
y = y.reshape((34,15))
df = pd.DataFrame(y, columns=list('w' * 15), index=[str(i) for i in range(1, 35)])
sns.heatmap(df, annot=True)
plt.yticks(rotation=0)
plt.show()
Seaborn heatmap provides arguments
xticklabels, yticklabels : “auto”, bool, list-like, or int, optional
If True, plot the column names of the dataframe. If False, don’t plot the column names. If list-like, plot these alternate labels as the xticklabels. If an integer, use the column names but plot only every n label. If “auto”, try to densely plot non-overlapping labels.
Hence the easiest solution is to add yticklabels=1 as an argument, which plots every label.
sns.heatmap(df, annot=True, yticklabels=1)
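To illustrate the integer form of yticklabels described in the quoted documentation, here is a small sketch on random data of the same shape (34 rows, 15 columns). It draws the heatmap twice, once with every label and once with every fifth label, and collects the resulting tick texts. The variable names are just for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(1, 100, size=(34, 15)),
                  index=[str(i) for i in range(1, 35)])

fig, (ax_all, ax_fifth) = plt.subplots(1, 2, figsize=(10, 8))
# yticklabels=1 keeps every row label; yticklabels=5 keeps every fifth one.
sns.heatmap(df, yticklabels=1, ax=ax_all, cbar=False)
sns.heatmap(df, yticklabels=5, ax=ax_fifth, cbar=False)

all_labels = [t.get_text() for t in ax_all.get_yticklabels()]
fifth_labels = [t.get_text() for t in ax_fifth.get_yticklabels()]
```

With this many rows the labels may still collide visually at small figure sizes, so a taller figure (figsize) tends to help.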
You can find my data set here.
I am using seaborn to plot the heatmap. But open to other choices.
I have trouble getting the color scheme right. I would like a black and white scheme, as the current color scheme doesn't show the result clearly.
I also wish to display only these x and y tick values: 0, 25, 50, 100, 127.
How can I do this?
Below is my try:
import pandas as pd
import numpy
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
data_sorted = pd.read_csv("tors_sorted.txt", sep="\t")
ax = plt.axes()
ax.set_xlim(right=128)
minor_ticks = numpy.arange(0, 128, 50) # doesn't seem to work
data_sorted = data_sorted.pivot(index="dst", columns="src", values="pdf_num_bytes")
#sns.heatmap(data_sorted,ax=ax)
sns.heatmap(data_sorted,linecolor='black',xticklabels=True,yticklabels=True)
ax.set_title('Sample plot')
ax.set_xticks(minor_ticks, minor=True)
fig = ax.get_figure()
fig.savefig('heatmap.jpg')
This is the image that I get.
thanks.
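A possible direction, sketched on a synthetic 128x128 matrix because the linked data set is not available here: pass a greyscale colormap such as matplotlib's built-in "Greys" via cmap, switch off seaborn's automatic tick labels, and then place ticks by hand. Heatmap cell i spans [i, i + 1) on the axis, so its centre is at i + 0.5. The assumption is that the pivoted table has rows and columns 0..127 in order.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic 128x128 matrix standing in for the pivoted data (assumption).
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.random((128, 128)))

fig, ax = plt.subplots()
# "Greys" runs from white (low values) to black (high values).
sns.heatmap(data, cmap="Greys", xticklabels=False, yticklabels=False, ax=ax)

# Put ticks at the centres of the wanted cells and label them with the cell index.
wanted = [0, 25, 50, 100, 127]
centres = [i + 0.5 for i in wanted]
tick_text = [str(i) for i in wanted]
ax.set_xticks(centres)
ax.set_xticklabels(tick_text)
ax.set_yticks(centres)
ax.set_yticklabels(tick_text, rotation=0)
ax.set_title("Sample plot")
fig.savefig("heatmap.jpg")
```

Passing ax=ax to sns.heatmap keeps the title, ticks and saved figure all on the same axes.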