Seaborn: edges in distplot don't fit the plot - python

When using matplotlib and seaborn in Jupyter with Python 2.7.12, I noticed that the edges of the distplot I drew don't fit the plot correctly (cf. the 2 figures below). At first I thought it was an issue with the code, but when trying the same code on someone else's laptop with the exact same versions of Jupyter and Python, the issue did not occur. Could anyone point me in the right direction?
wrong plot:
right plot:
I would gladly share the notebook with the code and the dataset, but since I am kind of new to sharing notebooks online, I do not know what the 'standard way to go' is.
Any help would be greatly appreciated.

It looks to me like the difference between the two plots is the bandwidth of the kernel used to calculate the KDE. Maybe there are different default values on both machines.
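Try adjusting either the bw= or kernel= parameters passed through kde_kws (see the seaborn distplot documentation). Like so:
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(5, 10))
x = np.random.randn(100)
sns.distplot(x, ax=ax1)  # default bandwidth
sns.distplot(x, kde_kws={'bw': 5}, ax=ax2)  # bandwidth set explicitly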
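To see why the bandwidth changes how far the curve's edges reach, here is a minimal sketch of a Gaussian KDE written by hand in NumPy. It is not seaborn's implementation; the function name gaussian_kde_curve and the two bandwidth values are assumptions chosen for illustration.

```python
import numpy as np


def gaussian_kde_curve(samples, grid, bandwidth):
    """Evaluate a fixed-bandwidth Gaussian KDE of `samples` at each grid point."""
    # Standardised distance between every grid point and every sample.
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernel = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    # Average the kernels and rescale so the curve integrates to one.
    return kernel.mean(axis=1) / bandwidth


rng = np.random.default_rng(0)
x = rng.standard_normal(100)
grid = np.linspace(-6, 6, 241)

narrow = gaussian_kde_curve(x, grid, bandwidth=0.2)  # hugs the data
wide = gaussian_kde_curve(x, grid, bandwidth=2.0)    # flatter, longer tails
```

The wide curve has a lower peak and still carries visible density at the ends of the grid, which is the kind of difference two machines with different default bandwidths would show.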

Related

Kernel density estimate plot is not appearing in Jupyter notebook output when running seaborn.histplot()

I am trying to plot a probability density function (PDF) by adding a kernel density estimate (KDE) to smooth my histogram with seaborn.histplot().
sns.histplot(data=np.reshape(eddy_model_ds.q.isel(lev=0).values, (-1)), stat='density', kde=True)
plt.ticklabel_format(axis='y', style='sci', scilimits=(0,0))
plt.xlabel(r'$q_{1}$ [$s^{-1}$]')
plt.ylabel('probability density')
plt.title('Upper PV PDF')
plt.grid()
plt.show()
I'm expecting to get something like this where the curve is plotted with the histogram:
However this is what gets outputted when I run the above code in Jupyter notebook. The KDE curve is not actually getting plotted along with the histogram.
Would anyone be able to provide any insight as to why this is happening?
I've already tried adding %matplotlib inline to my imports, in case seaborn was sending plots outside of the Jupyter notebook. I've also tried similar functions instead of histplot(), including displot() and kdeplot(), but the curve fails to show with these as well.
It turns out I was using seaborn v0.11.0. After updating to the latest version (v0.12.1), the KDE curve now appears.
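If you want to check which version is installed before upgrading, something like the following should work on Python 3.8 or later. The helper name installed_version is just an illustration, not part of seaborn.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the installed seaborn version, or None if seaborn is not installed.
print(installed_version("seaborn"))
```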

Truncated figure with plotly

I am facing a problem with the Scatter3d from plotly: the figure is always truncated at the bottom:
I create the plot via plotly.express this way:
fig = px.scatter_3d(BFM_pcaFull, x=0, y=1, z=2, color=3)
with BFM_pcaFull being the pandas.DataFrame where the data are stored. I tried to create the plot via plotly.graph_objects instead of plotly.express but the result is the same.
I tried to tweak the layout parameter via the update_layout() method of fig:
Padding
Auto margin
Scaling
scaleratio
constrain
None of these changed the graph, which surprises me and makes me think I am doing something wrong, even if 3D plots apparently follow somewhat different rules.
An issue for the same problem is open on the Github repo of the project but has not been solved so far (https://github.com/plotly/plotly.py/issues/3785).
Has anybody faced the same problem and found a solution by any chance?
Thanks for your help
To avoid missing parts of the graph in a 3D plot, you can change the camera viewpoint (see the plotly documentation on 3D camera controls for more information). The following code reproduces the default view:
import plotly.express as px
df = px.data.iris()
fig = px.scatter_3d(df, x='sepal_length',
y='sepal_width', z='petal_width',
color='species')
fig.show()
To change the camera viewpoint, update the layout before calling fig.show():
fig.update_layout(margin=dict(l=0,r=0,t=0,b=0), scene_camera=dict(eye=dict(x=2.0, y=2.0, z=0.75)))

Error when plotting line graph using seaborn: If using all scalar values, you must pass an index

I am trying to make a more aesthetically pleasing graph for a project and was told that seaborn would make beautiful plots but I am having trouble with it as it returns the error: If using all scalar values, you must pass an index. I'm not sure why there is this error as I am able to plot a regular graph using the same dataframe.
This is the dataframe that I am using:
and I have successfully created a graph:
ax = data1.plot(xlabel='Year', ylabel='Electricity generation capacity', figsize=(15,10), marker='.')
ax.legend(title='Electricity generation capacity by Year', bbox_to_anchor=(1, 1.02), loc='upper left')
However, the graph is quite ugly as you can barely see the trend of the bottom three lines. (I do not know if seaborn will help with this issue as I am rather new to python and am unfamiliar with data visualization using python.)
Perhaps my code is wrong, but when I try to make a graph with sns.lineplot(data1), it returns the error mentioned above.
Please let me know how I can solve this issue (Or if I can create a better-looking graph without seaborn, please teach me). Thank you.
From your screenshot it seems like the Year is the dataframe index. Try this:
sns.lineplot(data=data1, x=data1.index)
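Another option, if you prefer to name every variable explicitly, is to reshape the frame to long form first. This is only a sketch: the small data1 below is a made-up stand-in for the dataframe in the screenshot, and the column names source and capacity are assumptions.

```python
import pandas as pd

# Made-up stand-in for data1: Year as the index, one column per energy source.
data1 = pd.DataFrame(
    {"Coal": [120.0, 125.0, 130.0], "Solar": [1.5, 2.5, 4.0]},
    index=pd.Index([2018, 2019, 2020], name="Year"),
)

# Move Year out of the index, then stack the source columns into rows.
long_df = data1.reset_index().melt(
    id_vars="Year", var_name="source", value_name="capacity"
)
print(long_df)
```

With that shape, a call such as sns.lineplot(data=long_df, x="Year", y="capacity", hue="source") draws one line per source.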

Customizing plots in python (countplot and boxplot)

I am working on a data science project, and as I am fairly new I need some help when it comes to customizing my plots. As a quick intro, I am working on an analysis of a dataset of Las Vegas car crashes. Here are the problems I am facing.
Countplot for crash severity
In the first image I would need to increase the size of the graph so the text on the x variable is visible.
The code for the plot:
sns.catplot(x="Crash_Seve", kind="count", data=df);
sns.set(style="darkgrid")
plt.title("Types of Crash Severity in Las Vegas car crashes")
plt.show()
Boxplots comparing speed of two drivers
Here I would also need to increase the size so the graphs are more visible. I tried the figure call you can see below, but whatever size I pass in, the graph does not get bigger. I would also like to draw these box plots with seaborn or matplotlib so they are a bit prettier. The two plots come from different columns but mean the same thing, the speed of a driver in mph, so both are numeric. Thank you for the input.
boxplot = df.boxplot(column=['V1_Driver_', 'V2_Driver_'])
plt.title("Speed of both drivers")
figure(num=None, figsize=(40, 20), dpi=160, facecolor='w', edgecolor='k')
plt.show()
In both examples, you can use the figsize option of the figure command (as you have tried), but you have to call figure before you plot anything. I would also recommend rotating the labels a bit (see "How to rotate axis labels in seaborn and matplotlib") and changing the font size (see "How to change the font size on a matplotlib plot").
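A minimal sketch of that ordering for the box plot, assuming a small made-up dataframe in place of the crash data. The figure size, rotation, and label size are arbitrary values to adjust.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up speeds standing in for the two driver columns of the real dataset.
df = pd.DataFrame({
    "V1_Driver_": [25, 30, 45, 35, 60],
    "V2_Driver_": [20, 40, 30, 55, 35],
})

# Create the figure with the desired size first, then draw onto its axes.
fig, ax = plt.subplots(figsize=(10, 6))
df.boxplot(column=["V1_Driver_", "V2_Driver_"], ax=ax)
ax.set_title("Speed of both drivers")

# Rotate and enlarge the x tick labels so they stay readable.
ax.tick_params(axis="x", labelrotation=30, labelsize=12)
```

For the count plot, sns.catplot creates its own figure, so its size is set through the height and aspect arguments of catplot rather than through figure.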

Different plots with show and savefig

I am plotting histograms with quite a large number of bins. I am using Spyder with Python 3.6.3. The problem I found is that the figure I see in the IPython console is NOT the same as the saved plot. I have seen another thread that asks a similar question, but my problem is worse: it's not just the fonts and sizes that vary, I am actually getting different counts!
E.g. the same script:
plt.clf()
fig, ax=plt.subplots()
fig.dpi=100
plt.hist(df['POS'], bins=nbins, range=(0,dict_l[c]))
plt.savefig('current_chr17.jpg', dpi=100)
will produce this plot as a saved figure:
and show this in the IPython console (interactive mode is on, so I'm not even calling plt.show):
Does anyone have any explanation as to what is going on?
