I'm trying to compare the GDP per capita of the world's countries to each country's COVID-19 death total. Every time I try to turn it into a scatter plot, it displays the same plot that the plt.plot() command would display. Here is my code:
import pandas as pd
from matplotlib import pyplot as plt
plt.style.use('seaborn-whitegrid')
data = pd.read_csv(r'/Users/john.smith/covid-data.csv')
gdp = data["gdp_per_capita"]
deaths = data["total_deaths"]
plt.scatter(gdp, deaths)
plt.title('GDP-per-Capita Compared to COVID-19 Death Total')
plt.xlabel('GDP-per-Capita')
plt.ylabel('Confirmed Deaths')
plt.tight_layout()
plt.show()
While running this code, the following graph is produced. This is obviously not the scatter plot I'm trying to get, and it's worth noting that the only thing that changes when I use the plt.scatter() command is that the points on the plot just get very large.
I ran a test of the whole Matplotlib module entirely on a different file. When I use normal variables without importing from a CSV file, like this:
x = [7, 3, 8, 3]
y = [1, 5, 7, 4]
plt.scatter(x, y)
Then the code works perfectly fine and produces a scatter plot. I have been digging for hours online to try and find a solution, and have tried to use other methods of importing CSVs or creating scatter plots but nothing is working. Thank you for any tips.
Answer is courtesy of G. Anderson in the comments above.
As it turns out, I just didn't have experience with the xlim() and ylim() commands, so the individual points in the scatter plot overlapped very tightly in vertical lines. This happened because the original view window was too wide for a dataset this large.
I did some additional research to put two plots onto a single figure, with one of them zoomed in. Here's the code:
figs, axs = plt.subplots(2)
figs.suptitle('GDP-per-Capita Compared to COVID-19 Death Total')
axs[0].scatter(gdp, deaths)
axs[1].scatter(gdp, deaths)
# Zoom the second (current) axes: [xmin, xmax, ymin, ymax]
axs[1].axis([10000, 20000, 10000, 20000])
This produced some nice plots I can use:
I'm going to look into ways to make the two plots much more readable.
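For anyone wanting to try the zoom without the CSV, here is a minimal sketch of the same idea. The gdp and deaths arrays are made-up random numbers standing in for the real columns, and the limits are set on the second axes with set_xlim() and set_ylim(), so it does not matter which axes is current.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs as a script; drop this line in a notebook
import matplotlib.pyplot as plt

# Synthetic stand-ins for data["gdp_per_capita"] and data["total_deaths"] (assumption)
rng = np.random.default_rng(0)
gdp = rng.uniform(500, 60000, size=200)
deaths = rng.uniform(0, 150000, size=200)

fig, axs = plt.subplots(2)
fig.suptitle('GDP-per-Capita Compared to COVID-19 Death Total')

# Top panel: full view of the data
axs[0].scatter(gdp, deaths, s=10)
axs[0].set_ylabel('Confirmed Deaths')

# Bottom panel: same data, zoomed into a smaller window
axs[1].scatter(gdp, deaths, s=10)
axs[1].set_xlim(10000, 20000)
axs[1].set_ylim(10000, 20000)
axs[1].set_xlabel('GDP-per-Capita')
axs[1].set_ylabel('Confirmed Deaths')

fig.tight_layout()
```

Call plt.show() (with an interactive backend) or fig.savefig(...) to see the result. Smaller markers (the s argument) also help when many points overlap.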
Related
I am working on a data science project and, as I am fairly new, I need some help customizing my plots. As a quick intro, I am working on an analysis of a dataset of Las Vegas car crashes. Here are the problems I am facing.
Countplot for crash severity
In the first image, I need to increase the size of the graph so that the text on the x-axis is visible.
The code for the plot:
sns.catplot(x="Crash_Seve", kind="count", data=df);
sns.set(style="darkgrid")
plt.title("Types of Crash Severity in Las Vegas car crashes")
plt.show()
Boxplots comparing speed of two drivers
Here I also need to increase the size so the graphs are more visible. I tried something, which you can see below, but whatever size I type in, the graph does not get bigger. I would also like to draw these box plots with seaborn or matplotlib so they look a bit prettier. The two box plots come from different columns but have the same interpretation, the speed of a driver in mph, so both are numeric. Thank you for the input.
boxplot = df.boxplot(column=['V1_Driver_', 'V2_Driver_'])
plt.title("Speed of both drivers")
figure(num=None, figsize=(40, 20), dpi=160, facecolor='w', edgecolor='k')
plt.show()
In both examples, you can use the figsize option of the figure command (as you have tried), but you have to call figure before you plot anything. I would also recommend rotating the labels a bit (see "How to rotate axis labels in seaborn and matplotlib") and changing the font size (see "How to change the font size on a matplotlib plot").
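As a rough sketch of that ordering, the example below creates the figure with its size first and only then draws the box plots into it. The speeds are made-up random numbers standing in for the two DataFrame columns, and the figure size, rotation, and font sizes are arbitrary values chosen for illustration.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs as a script; drop this line in a notebook
import matplotlib.pyplot as plt

# Made-up speeds (mph) standing in for df['V1_Driver_'] and df['V2_Driver_'] (assumption)
rng = np.random.default_rng(0)
v1_speed = rng.normal(35, 10, size=500)
v2_speed = rng.normal(30, 12, size=500)

# Create the figure with the desired size FIRST, then plot into it
fig, ax = plt.subplots(figsize=(10, 6), dpi=100)
ax.boxplot([v1_speed, v2_speed])

# Rotate the tick labels and enlarge the fonts so the text stays readable
ax.set_xticklabels(['V1_Driver_', 'V2_Driver_'], rotation=30, fontsize=12)
ax.set_ylabel('Speed (mph)', fontsize=12)
ax.set_title('Speed of both drivers', fontsize=14)
fig.tight_layout()
```

With a DataFrame, df.boxplot(column=[...], ax=ax) should draw into the same pre-sized axes. Note that sns.catplot creates its own figure, so its size is controlled through its height and aspect arguments rather than through a figure call.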
I am trying to reproduce the attached figure step by step, and my problem was how to draw the colorbar in that figure from my data. My data is cosmological and has 7 columns in total and many rows. You can see that the figure contains three different plots overlaid on one another. First, I tried to plot the small colorful lines in the body of the figure using two columns of the data. I did this with scatter plots, and then I needed to reproduce the colorbar. That was not possible on the first attempt because the colorbar values were not part of the data. I then obtained the colorbar values through some calculations and added them to the data as additional columns. Now I could use the simple colorbar function for the colorbar part, and it worked. For the next step, I need to turn the small curved lines into dark solid lines.
How can I do plots in matplotlib?
import numpy as np
import scipy as sp
import matplotlib.pyplot as plt
data1 = np.loadtxt("bei_predic.txt", unpack=True)
B = np.log10(data1[3]/(4.*(data1[2])))
R = np.vstack((data1,B))
R = np.transpose(R)
D = R[~np.isnan(R).any(axis=1)]
A = plt.scatter(D[:,3],D[:,2], c=D[:,8])
cbar= plt.colorbar()
cbar.set_label("file", labelpad=+1)
plt.show()
If you could start off by telling us a little bit about the data that you are using that would be great. In order to plot the figure that you want, we must first load the data into some variables. Have you managed to do this?
Check out this example in which the author plots multicolored lines for some guidance.
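In case the linked example is not at hand, here is a minimal sketch of the multicolored-line technique using matplotlib's LineCollection. The x, y, and color values are synthetic (a sine curve colored by its cosine) and only stand in for the real data columns.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs as a script; drop this line in a notebook
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection

# Synthetic curve and color values (assumption, not the cosmological data)
x = np.linspace(0, 3 * np.pi, 200)
y = np.sin(x)
color_value = np.cos(x)

# Split the curve into short segments: shape (n_points - 1, 2, 2)
points = np.column_stack([x, y]).reshape(-1, 1, 2)
segments = np.concatenate([points[:-1], points[1:]], axis=1)

fig, ax = plt.subplots()
lc = LineCollection(segments, cmap="viridis")
lc.set_array(color_value[:-1])  # one color value per segment
lc.set_linewidth(2)
ax.add_collection(lc)

# add_collection does not rescale the view, so set the limits explicitly
ax.set_xlim(x.min(), x.max())
ax.set_ylim(-1.1, 1.1)

cbar = fig.colorbar(lc, ax=ax)
cbar.set_label("color value")
```

Because the colorbar is built from the LineCollection itself, the color values do not need to be plotted as a separate scatter layer.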
I am using Jupyter-notebook with python 3.6.2 and matplotlib to plot some data.
When I plot my data, I want to add a legend to the plot (basically to know which line is which)
However calling plt.legend takes a lot of time (almost as much as the plot itself, which to my understanding should just be instant).
Minimal toy problem that reproduces the issue:
import numpy as np
import matplotlib.pyplot as plt
# Toy useless data (one million x 4)
my_data = np.random.rand(1000000,4)
plt.plot(my_data)
#plt.legend(['A','C','G','T'])
plt.show()
The data here is just random and useless, but it reproduces my problem:
If I uncomment the plt.legend line, the run takes almost double the time
Why? Shouldn't the legend just look at the plot, see that 4 plots have been made, and draw a box assigning each color to the corresponding string?
Why is a simple legend taking so much time?
Am I missing something?
Replicating the answer by @bnaecker so that this question has an answer:
By default, the legend will be placed in the "best" location, which requires computing how many points from each line are inside a potential legend box. If there are many points, this can take a while. Drawing is much faster when specifying a location other than "best", e.g. plt.legend(loc=3).
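A small timing sketch to check this on your own machine. It uses a smaller random array than the question so it finishes quickly, and draw_time is a hypothetical helper written for this example. The legend is positioned when the figure is drawn, so the timer wraps the draw call.

```python
import time
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs as a script; drop this line in a notebook
import matplotlib.pyplot as plt

# Smaller than the question's data so the comparison runs quickly (assumption)
my_data = np.random.rand(50_000, 4)

def draw_time(loc):
    """Return the seconds needed to render the plot with the legend at `loc`."""
    fig, ax = plt.subplots()
    ax.plot(my_data)
    ax.legend(['A', 'C', 'G', 'T'], loc=loc)
    start = time.perf_counter()
    fig.canvas.draw()  # legend placement is computed during the draw
    elapsed = time.perf_counter() - start
    plt.close(fig)
    return elapsed

print("loc='best':      ", draw_time("best"))
print("loc='lower left':", draw_time("lower left"))  # same as loc=3
```

The exact numbers depend on the machine and the matplotlib version, so treat the output as a comparison rather than a benchmark.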
I'm using seaborn to plot some biology data.
I just want the distribution of one gene against another (expression in ~300 patients), and that has all worked fine and dandy with graph = sns.jointplot(x='Gene1',y='Gene2',data=data,kind='reg')
I like that the graph gives me a nice linear fit and a PearsonR and a P value.
All I want is to plot my data on a log scale, which is the way that such gene data is usually represented.
I've looked at a few solutions online, but they all get rid of my PearsonR value or my linear fit or they just don't look as good. I'm new to this, but it seems like graphing on a log scale shouldn't be too much trouble.
Any comments or solutions?
Thanks!
Edit: In response to comments, I've gotten closer to my answer. I now have a plot (shown below), but I need a line of fit and to do some statistics. Working on that now, but any answers/suggestions in the meantime are more than welcome.
mybins=np.logspace(0, np.log(100), 100)
g = sns.JointGrid(data1, data2, data, xlim=[.5, 1000000],
ylim=[.1, 10000000])
g.plot_marginals(sns.distplot, color='blue', bins=mybins)
g = g.plot(sns.regplot, sns.distplot)
g = g.annotate(stats.pearsonr)
ax = g.ax_joint
ax.set_xscale('log')
ax.set_yscale('log')
g.ax_marg_x.set_xscale('log')
g.ax_marg_y.set_yscale('log')
This worked just fine. In the end, I decided to just convert my table values into log(x), since that made the graph easier to scale and visualize in the short run.
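For reference, a minimal sketch of the "convert the values to log(x) first" route, using only NumPy and matplotlib. The expression values are synthetic, and the names gene1 and gene2 are placeholders for the real columns.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs as a script; drop this line in a notebook
import matplotlib.pyplot as plt

# Synthetic expression values for ~300 patients (assumption, not real data)
rng = np.random.default_rng(0)
gene1 = rng.lognormal(mean=3.0, sigma=1.0, size=300)
gene2 = gene1 ** 0.8 * rng.lognormal(mean=0.0, sigma=0.3, size=300)

# Work in log10 space, so the fit and the correlation apply to the log values
log1 = np.log10(gene1)
log2 = np.log10(gene2)

slope, intercept = np.polyfit(log1, log2, 1)  # least-squares line of fit
r = np.corrcoef(log1, log2)[0, 1]             # Pearson correlation coefficient

fig, ax = plt.subplots()
ax.scatter(log1, log2, s=10, alpha=0.6)
xs = np.linspace(log1.min(), log1.max(), 100)
ax.plot(xs, slope * xs + intercept, color='red', label=f"fit, r = {r:.2f}")
ax.set_xlabel('log10(Gene1)')
ax.set_ylabel('log10(Gene2)')
ax.legend(loc='upper left')
```

If a p-value is also needed, scipy.stats.pearsonr(log1, log2) returns both the coefficient and the p-value. Passing a DataFrame of the log-transformed columns to sns.jointplot(..., kind='reg') should then give the same kind of plot as before, just in log units.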
By default, matplotlib plot can place lines very inaccurately.
For example, see the placement of the left endpoint in the attached plot. There's at least a whole pixel of air that shouldn't be there. In fact I think the line center is 2 pixels off.
How to get matplotlib to draw accurately? I don't mind if there is some performance hit.
Inaccurately rendered line in matplotlib plot:
Inaccurately rendered line in matplotlib plot - detail magnified:
This was made with the default installations in Ubuntu 16.04 (Python 3), Jupyter notebook (similar result from command line).
Mathematica, for comparison, does subpixel-perfect rendering directly and by default:
Why can't we?
Consider the following to see what is going on
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.plot([1, 2, 3], [4, 5, 4], clip_on=False, lw=5, alpha=.5)
ax.set_xlim([1, 3])
fig.savefig('so.png', dpi=400)
You can also disable pixel snapping by passing snap=False to plot. However, once you get down to placing a line roughly a single pixel wide, you are going to have issues because the underlying rasterization is too coarse.
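A short sketch of the snap=False option, drawing the same thin line twice so the two settings can be compared side by side. The small vertical offset between the lines is arbitrary and only there to keep them apart.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs as a script; drop this line in a notebook
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# Default behavior: the line may be snapped to the pixel grid
(snapped,) = ax.plot([1, 2, 3], [4, 5, 4], lw=1, label='default snapping')

# Same shape, shifted slightly, with pixel snapping switched off
(unsnapped,) = ax.plot([1, 2, 3], [4.2, 5.2, 4.2], lw=1, snap=False,
                       label='snap=False')

ax.set_xlim([1, 3])
ax.legend(loc='lower center')
```

Saving with a high dpi, as in the snippet above, makes any remaining offset easier to inspect when zooming in.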
The problem is even more noticeable when one plots functions that are symmetric with respect to the x-axis and slowly approach zero, as for example here:
which is indeed embarrassing if you are telling the reader of a scientific paper that the two curves are symmetric!
I got around this problem by exporting to pdf instead of exporting to png:
I completely agree that this should be worked on. I find that plt.plot gives (at least more or less) unshifted lines (in Jupyter) by calling plt.figure with dpi=144 (the default is 72). The figures do become twice as big though...
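Both workarounds from the last two comments can be sketched together: a figure created with dpi=144, and a vector (pdf) export that avoids rasterization altogether. The two decaying curves are made-up data mirroring the symmetric case described above, and the in-memory buffer is only used to keep the example self-contained.

```python
import io
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs as a script; drop this line in a notebook
import matplotlib.pyplot as plt

# Two curves symmetric about the x-axis that slowly approach zero (synthetic)
x = np.linspace(0, 10, 500)
y = np.exp(-x)

# Higher figure dpi, as suggested for Jupyter
fig, ax = plt.subplots(dpi=144)
ax.plot(x, y)
ax.plot(x, -y)

# Vector export: written to an in-memory buffer here;
# pass a filename such as 'plot.pdf' in practice
buf = io.BytesIO()
fig.savefig(buf, format='pdf')
pdf_bytes = buf.getvalue()
```

In a pdf the lines are stored as vector paths, so their positions do not depend on a pixel grid until a viewer renders them.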