Customizing plots in python (countplot and boxplot) - python

I am working on a data science project, and as I am fairly new I need some help customizing my plots. As a quick intro, I am working on an analysis of a dataset of Las Vegas car crashes. Here are the problems I am facing.
Countplot for crash severity
In the first image I need to increase the size of the graph so that the text on the x axis is visible.
The code for the plot:
sns.catplot(x="Crash_Seve", kind="count", data=df);
sns.set(style="darkgrid")
plt.title("Types of Crash Severity in Las Vegas car crashes")
plt.show()
Boxplots comparing speed of two drivers
Here I also need to increase the size so the graphs are more visible. I tried something, which you can see below, but whatever size I type in, the graph does not get bigger. I would also like to draw these box plots with seaborn or matplotlib so they look a bit prettier. The two columns are different but have the same interpretation, the speed of a driver in mph, so both are numeric. Thank you for the input.
boxplot = df.boxplot(column=['V1_Driver_', 'V2_Driver_'])
plt.title("Speed of both drivers")
figure(num=None, figsize=(40, 20), dpi=160, facecolor='w', edgecolor='k')
plt.show()

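For the box plot, you can use the figsize option of the figure command (as you have tried), but you have to call figure before you plot anything. The same applies to axes-level seaborn functions such as sns.countplot. sns.catplot, however, creates its own figure, so there the size is set through its height and aspect parameters. I would also recommend rotating the labels a bit (see how-to-rotate-axis-labels-in-seaborn-and-matplotlib) and changing the font size (see how-to-change-the-font-size-on-a-matplotlib-plot).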


plt.scatter() plots behaving like plt.plot() plots in Matplotlib

I'm trying to compare the GDP per capita of the world's countries to each country's COVID-19 death total. Every time I try to turn it into a scatter plot, it displays the same plot that the plt.plot() command would produce. Here is my code:
import pandas as pd
from matplotlib import pyplot as plt
plt.style.use('seaborn-whitegrid')
data = pd.read_csv(r'/Users/john.smith/covid-data.csv')
gdp = data["gdp_per_capita"]
deaths = data["total_deaths"]
plt.scatter(gdp, deaths)
plt.title('GDP-per-Capita Compared to COVID-19 Death Total')
plt.xlabel('GDP-per-Capita')
plt.ylabel('Confirmed Deaths')
plt.tight_layout()
plt.show()
While running this code, the following graph is produced. This is obviously not the scatter plot I'm trying to get, and it's worth noting that the only thing that changes when I use the plt.scatter() command is that the points on the plot just get very large.
I ran a test of the whole Matplotlib module entirely on a different file. When I use normal variables without importing from a CSV file, like this:
x = [7, 3, 8, 3]
y = [1, 5, 7, 4]
plt.scatter(x, y)
Then the code works perfectly fine and produces a scatter plot. I have been digging for hours online to try and find a solution, and have tried to use other methods of importing CSVs or creating scatter plots but nothing is working. Thank you for any tips.
Answer is courtesy of G. Anderson in the comments above.
As it turns out, I just didn't have experience with the xlim() and ylim() commands, so the individual points in the scatter plot overlapped very tightly in vertical lines. This happened simply because the default view window was too wide for a dataset this large.
I did some additional research to try to put two plots onto a single figure, with one of them zoomed in. Here's the code:
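# Two stacked axes: full view on top, zoomed view below
figs, axs = plt.subplots(2)
figs.suptitle('GDP-per-Capita Compared to COVID-19 Death Total')
axs[0].scatter(gdp, deaths)
axs[1].scatter(gdp, deaths)
# Zoom the second axes: [xmin, xmax, ymin, ymax]
axs[1].axis([10000, 20000, 10000, 20000])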
This produced some nice plots I can use:
I'm going to look into ways to make the two plots much more readable.
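One possible way to make the pair of plots more readable is to set the limits per axes with set_xlim and set_ylim and to label each panel. The sketch below uses randomly generated numbers in place of the CSV columns, so the values are purely illustrative. The limits are the ones from the code above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Made-up stand-ins for data["gdp_per_capita"] and data["total_deaths"]
rng = np.random.default_rng(0)
gdp = rng.uniform(500, 100_000, size=200)
deaths = rng.uniform(0, 150_000, size=200)

fig, (ax_full, ax_zoom) = plt.subplots(2, 1, figsize=(6, 8))
fig.suptitle("GDP-per-Capita Compared to COVID-19 Death Total")

# Top panel: default view window covering every point
ax_full.scatter(gdp, deaths, s=10)
ax_full.set_title("Full range")

# Bottom panel: same data, but with an explicit view window
ax_zoom.scatter(gdp, deaths, s=10)
ax_zoom.set_xlim(10_000, 20_000)
ax_zoom.set_ylim(10_000, 20_000)
ax_zoom.set_title("Zoomed in")

for ax in (ax_full, ax_zoom):
    ax.set_xlabel("GDP-per-Capita")
    ax.set_ylabel("Confirmed Deaths")

fig.tight_layout()
```

Setting the limits on the axes object, rather than through plt.axis, makes it explicit which panel is being zoomed.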

Python/Seaborn: What does the inside horizontal distribution of the data points mean, or is it random?

It seems that the horizontal distribution of the plotted data points inside each strip is almost random every time you plot (using Seaborn). Is it for ease of readability, or does it serve some other meaningful purpose?
I am using Python 3.0 and the Seaborn-provided dataset called 'tips' for this question.
import seaborn as sns
tips = sns.load_dataset("tips")
After running the same code below twice, I see differences in how the points are distributed inside each strip. Here is the code, which you can run a couple of times:
ax = sns.stripplot(x="day", y="total_bill", data=tips, alpha=.55,
                   palette='Set1', jitter=True, linewidth=1)
Now, if you look at the plots (after running it twice, for example), you will notice that the distribution of the points is not the same between the 2 plots:
Please explain why the points are not distributed identically across 2 separate runs. Also, looking at those points on the horizontal scale, is there a reason why (for example) one red point is further left than another red point, or is it simply for readability?
Thank you in advance!
After a bit more research, I believe that the horizontal positions of the data points are random, drawn from a uniform distribution (thank you @ImportanceOfBeingErnest for pointing to the code). So, to answer my own question: there is no hidden meaning in the horizontal distribution, and the horizontal range is simply set for visibility. The positions change between runs unless a random seed is set.
I think both displays are identical along the vertical axis (i.e. both distributions are equal, since they represent the same scatter plot of a given dataset). The slight visual differences lie in the position along the horizontal (categorical, day) axis. They come from the jitter option (jitter=True), which adds a small random horizontal offset to each point around the position of its category (day). The jitter helps to distinguish points with the same total_bill value, which would otherwise be superimposed. So the difference comes from jitter being set to True, which is used for readability.
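The effect of a seed can be checked with a small experiment. The sketch below assumes that seaborn draws its jitter from NumPy's global random state (this appears to be how current releases behave, but it is an implementation detail, not a documented guarantee). The DataFrame is made up and only mimics the column names of the tips dataset, and jittered_positions is a hypothetical helper.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up data with the same column names as the 'tips' dataset
df = pd.DataFrame({
    "day": ["Thur", "Fri", "Sat", "Sun"] * 10,
    "total_bill": np.linspace(5, 50, 40),
})

def jittered_positions(seed):
    """Hypothetical helper: draw a strip plot and return the plotted x and y positions."""
    np.random.seed(seed)  # assumption: the jitter uses NumPy's global random state
    fig, ax = plt.subplots()
    sns.stripplot(x="day", y="total_bill", data=df, jitter=True, ax=ax)
    offsets = np.vstack([np.asarray(c.get_offsets()) for c in ax.collections])
    plt.close(fig)
    return offsets[:, 0], offsets[:, 1]

x_a, y_a = jittered_positions(seed=1)
x_b, y_b = jittered_positions(seed=1)  # same seed as the first run
x_c, y_c = jittered_positions(seed=2)  # different seed

print("same seed, same x positions:     ", np.allclose(x_a, x_b))
print("different seed, same x positions:", np.allclose(x_a, x_c))
print("y positions unaffected by seed:  ", np.allclose(y_a, y_c))
```

Only the horizontal positions depend on the random draw. The vertical positions are the total_bill values themselves.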

Seaborn: edges in distplot don't fit the plot

When using matplotlib and seaborn in Jupyter with Python 2.7.12, I noticed that the edges of the distplot I drew don't fit the plot correctly (cf. the 2 figures below). At first I thought it was an issue with the code, but when trying the same code on someone else's laptop with the exact same versions of Jupyter and Python, the issue did not occur. Could anyone point me in the right direction?
wrong plot:
right plot:
I would gladly share the notebook with the code and the dataset, but since I am kind of new to sharing notebooks online, I do not know what the 'standard way to go' is.
Any help would be greatly appreciated.
It looks to me like the difference between the two plots is the bandwidth of the kernel used to calculate the KDE. Maybe there are different default values on both machines.
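Try playing with either the bw or the kernel parameter of the KDE (see the documentation), like so:
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(5, 10))
x = np.random.randn(100)
sns.distplot(x, ax=ax1)  # default KDE bandwidth
sns.distplot(x, kde_kws={'bw': 5}, ax=ax2)  # much wider kernel bandwidth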

Fine control over the font size in Seaborn plots

I'm currently trying to use Seaborn to create plots for my academic papers. The plots look great and are easy to generate, but one thing I'm having trouble with is getting fine control over the font size in the plots.
The font size in my paper is 9pt, and I would like to make sure the font size in my plots is either 9pt or 10pt. But in seaborn, the font size is mainly controlled through the font scale, e.g. sns.set_context("paper", font_scale=0.9). So it's hard for me to find the right font size except through trial and error. Is there a more efficient way to do this?
I also want to make sure the font size is consistent between different seaborn plots. But not all my seaborn plots have the same dimensions, so it seems that using the same font_scale on all the plots does not necessarily produce the same font size across these different plots?
I've attached my code below. I appreciate any comments on how to format the plot for a two-column academic paper. My goal is to be able to control the size of the figure without distorting the font size or the plot. I use LaTeX to write my paper.
# Seaborn setting
sns.set(style='whitegrid', rc={"grid.linewidth": 0.1})
sns.set_context("paper", font_scale=0.9)
plt.figure(figsize=(3.1, 3)) # Two column paper. Each column is about 3.15 inch wide.
color = sns.color_palette("Set2", 6)
# Create a box plot for my data
splot = sns.boxplot(data=df, palette=color, whis=np.inf,
                    width=0.5, linewidth=0.7)
# Labels and clean up on the plot
splot.set_ylabel('Normalized WS')
plt.xticks(rotation=90)
plt.tight_layout()
splot.yaxis.grid(True, clip_on=False)
sns.despine(left=True, bottom=True)
plt.savefig('test.pdf', bbox_inches='tight')
You are right. This is a badly documented issue. But you can set the font size (as opposed to the font scale) directly after building the plot. Check the following example:
import seaborn as sns
import matplotlib.pyplot as plt
tips = sns.load_dataset("tips")
b = sns.boxplot(x=tips["total_bill"])
b.axes.set_title("Title",fontsize=50)
b.set_xlabel("X Label",fontsize=30)
b.set_ylabel("Y Label",fontsize=20)
b.tick_params(labelsize=5)
plt.show()
which results in this:
To make it consistent between plots, I think you just need to make sure the DPI is the same. By the way, it's also possible to customize the rc dictionaries a bit, since a "font.size" parameter exists, but I'm not too sure how to do that.
NOTE: I also don't really understand why they changed the names of the font size variables for axis labels and ticks. It seems a bit unintuitive.
It is anything but satisfying, isn't it? The easiest way I have found is to specify the sizes when setting the context, e.g.:
sns.set_context("paper", rc={"font.size":8,"axes.titlesize":8,"axes.labelsize":5})
This should take care of 90% of standard plotting usage. If you want ticklabels smaller than axes labels, set the 'axes.labelsize' to the smaller (ticklabel) value and specify axis labels (or other custom elements) manually, e.g.:
axs.set_ylabel('mylabel',size=6)
You could define it as a function and load it in your scripts, so you don't have to remember your standard numbers or retype the call every time.
def set_pubfig():
    sns.set_context("paper", rc={"font.size":8,"axes.titlesize":8,"axes.labelsize":5})
Of course you can use configuration files, but I guess the whole idea is to have a simple, straightforward method, which is why the above works well.
Note: If you specify these numbers, specifying font_scale in sns.set_context is ignored for all specified font elements, even if you set it.
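As a sketch of how this could look for a 9pt paper: the helper below pins every font-related entry of the context in points and then draws a column-width figure. The base_size parameter and the choice of keys are assumptions for illustration, not values taken from seaborn's documentation. The figure size is the one from the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import seaborn as sns

def set_pubfig(base_size=9):
    """Hypothetical helper: fix font sizes (in points) for publication figures."""
    sns.set_context("paper", rc={
        "font.size": base_size,
        "axes.titlesize": base_size,
        "axes.labelsize": base_size,
        "xtick.labelsize": base_size - 1,
        "ytick.labelsize": base_size - 1,
        "legend.fontsize": base_size - 1,
    })

set_pubfig(9)

# Column-width figure as in the question; the data is a placeholder
fig, ax = plt.subplots(figsize=(3.1, 3))
ax.plot([0, 1, 2], [1, 3, 2])
ax.set_xlabel("x")
ax.set_ylabel("Normalized WS")
fig.tight_layout()

print(plt.rcParams["font.size"], ax.yaxis.label.get_size())
```

Because the sizes are given in points, they do not depend on the figure dimensions. They stay at the requested size in the paper as long as the figure is included at its natural size, without rescaling in LaTeX.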
I've just spent way too long trying to find out the actual values of the "paper" sns context. I could only find them for "talk", and honestly I am raging! I'm just going to use
sns.set_context("paper", rc={"font.size":8,"axes.titlesize":8,"axes.labelsize":5})
even though these might be the values anyway!
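For what it's worth, the values behind a named context can be inspected directly: sns.plotting_context returns the parameter dictionary that set_context would apply. A minimal sketch (the exact numbers depend on the installed seaborn version, so none are quoted here):

```python
import seaborn as sns

# Parameter dictionaries for two of the built-in contexts
paper = sns.plotting_context("paper")
talk = sns.plotting_context("talk")

# Print the font-related entries side by side
for key in sorted(paper):
    if "size" in key:
        print(f"{key:22s} paper={paper[key]!s:8s} talk={talk[key]}")
```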

uneven axis when using pl.imshow

I am having trouble plotting an image using pylab's imshow. The plotting itself works, but my data is uneven (approx. 32*850), so when I plot it the y axis is very short compared to the x axis, as you can see in this example image. I just want the image to be stretched along the y axis so it is easier to see.
The code I started with (labels and so on excluded) is:
pl.figure()
pl.imshow(fom_data, interpolation='nearest')
pl.show()
And after googling it I tried changing to
pl.figure(figsize=(6,10))
Which only made the white parts around it larger. I then tried to write it with pyplot instead since it was easier to find people discussing the same thing:
fig, ax = plt.imshow(fom_data,extent=[0,850,0,32],aspect='auto')
plt.show()
I found this in the example "Imshow: extent and aspect", but then I get the following error message: 'AxesImage' object is not iterable
I am obviously no pro, but if you know where my brain is going wrong, please explain.
Using pyplot:
import matplotlib.pyplot as plt
plt.figure()
plt.imshow(my_image)
plt.gca().set_aspect("auto")  # grab the current axes to set their aspect
plt.show()
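The error in the question comes from unpacking: imshow returns a single AxesImage object, not a (figure, axes) pair, so `fig, ax = plt.imshow(...)` fails. A sketch of the same idea with an explicit figure and axes follows. The random array is only a stand-in for fom_data with the shape mentioned in the question, and the figure size is arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Made-up stand-in for fom_data: 32 rows by 850 columns
fom_data = np.random.default_rng(0).random((32, 850))

# plt.subplots returns the (figure, axes) pair; imshow returns only the image
fig, ax = plt.subplots(figsize=(6, 4))
im = ax.imshow(fom_data, extent=[0, 850, 0, 32],
               aspect="auto", interpolation="nearest")

print(type(im).__name__)  # class of the object returned by imshow
```

With aspect="auto" the image is stretched to fill the axes instead of keeping square pixels, so the 32 rows are no longer squeezed into a thin band.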
