Cannot see the points in my scatter plot despite having R value - python

I am facing this exact same issue. I am using a CSV file with the missing rows dropped to make my scatter plot. I have also used matplotlib, yet I am getting no output despite having the R value.
mc_corr=cars2_1["City_Mileage_km_litre"].corr(cars2_1['Fuel_Tank_Capacity_litre'])
plt.scatter(cars2_1["City_Mileage_km_litre"],cars2_1['Fuel_Tank_Capacity_litre'],color='orange')
plt.title('Mileage vs Fuel Tank Capacity')
plt.xlim(5,35)
plt.ylim(3.5,10.0)
plt.xlabel("R = "+str(mc_corr))
plt.show()
The data set:

Your ylim range (3.5 to 10.0) does not contain any of the Fuel_Tank_Capacity_litre values, so every point falls outside the visible area. Change:
plt.ylim(3.5,10.0)
to
plt.ylim(24,88)
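
If hard-coded limits are not required, one option is to derive them from the data so they always enclose the points. This is only a sketch: the DataFrame below is a made-up stand-in for cars2_1 (only the column names come from the question), and the 5% padding is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for cars2_1; only the column names come from the question.
cars = pd.DataFrame({
    "City_Mileage_km_litre": [12.0, 18.5, 23.0, 9.5, 30.0],
    "Fuel_Tank_Capacity_litre": [60.0, 45.0, 35.0, 88.0, 24.0],
})

x = cars["City_Mileage_km_litre"]
y = cars["Fuel_Tank_Capacity_litre"]
mc_corr = x.corr(y)

fig, ax = plt.subplots()
ax.scatter(x, y, color="orange")
ax.set_title("Mileage vs Fuel Tank Capacity")

# Pad the observed range by 5% so no point sits exactly on the frame.
pad_x = 0.05 * (x.max() - x.min())
pad_y = 0.05 * (y.max() - y.min())
ax.set_xlim(x.min() - pad_x, x.max() + pad_x)
ax.set_ylim(y.min() - pad_y, y.max() + pad_y)

ax.set_xlabel("R = " + str(round(mc_corr, 3)))
```

Printing y.min() and y.max() before choosing limits is a quick way to confirm whether a fixed ylim can contain the data at all.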

Related

Overlapping xlabels for Seaborn Lineplot

I've come across the issue of having overlapping xlabels on a Seaborn line plot. The data set has multiple occurrences of those labels, which is understandable. But is there a way to fix the xlabels without having to change the format of the plot or the data frame?
The xlabels were converted to the Timestamp type earlier on, and the plot is shown below.
code:
plt.figure(figsize=(15,10))
sns.lineplot(data=data_no_orkney, x="Data_Month_Date", y=percentage)
plt.xticks(list(set(data_no_orkney.Data_Month_Date)))
#plt.axvline(x=pd.Timestamp(year=2020,month=3,day=23), color='r', ls='--', label="Date of first lockdown")
plt.xlabel("Year")
plt.ylabel("Percentage meeting target")
plt.show()
Also, would it be correct of me to assume that the solid, blue line in the middle is the mean out of the values shown in the lighter blue area? I've never seen such line plot before, but that's more or less my understanding, judging by the looks of it.
I tried using plt.xticks(list), where the list contained only unique Timestamp (date) values. The only result was that the code took longer to run, and the labels did not change.

Plot in python but it changed my y values automatically in the plot

When I plot my data in Python, it shows modified y-values instead of the original data.
As can be seen in the plot, it subtracts 1.455e2 (which is written beside the y axis) from my original data, and I want to show the original data in my plot.
Here's my script about the plot:
plt.xlabel("H(Oe)")
plt.ylabel("R*E-3(dBm)")
plt.plot(h_1,r_2,linestyle="-",linewidth=1,label="2.50_GHz")
plt.legend(loc='upper left')
plt.grid(color="k", linestyle=":")
plt.savefig("0_deg_2.50_GHz_R.png", dpi=300,bbox_inches = 'tight')
plt.show()
And it actually does show the original y-values for some other data files. I'm using exactly the same script, but for this one it always shows the modified values.
Does anyone know how to fix it? Thanks a lot.
This is not scientific notation but is known as an offset (hence the + before the number). The original values can be recovered by adding the offset to the tick values shown on the axis.
You can prevent an offset from being used with:
plt.ticklabel_format(useOffset=False)
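
For context, here is a small self-contained sketch of the same call through the Axes interface. The data is made up (values clustered tightly around 145.5, which is the kind of range that tends to trigger an offset), and restricting the change to the y axis is an assumption about what is wanted here.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Made-up values clustered tightly around 145.5; not the asker's data.
h = np.linspace(-100.0, 100.0, 50)
r = 145.5 + 0.001 * np.sin(h / 20.0)

fig, ax = plt.subplots()
ax.plot(h, r, linestyle="-", linewidth=1, label="2.50_GHz")
ax.set_xlabel("H(Oe)")
ax.set_ylabel("R*E-3(dBm)")
ax.legend(loc="upper left")

# Disable the offset on the y axis only; tick labels then show full values.
ax.ticklabel_format(axis="y", useOffset=False)

fig.canvas.draw()  # force the tick labels to be computed
```

This would also explain why other data files look fine with the same script: matplotlib only applies an offset when the plotted range is small compared with the magnitude of the values.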

Why are my plots being displayed separately, rather than on the same graph?

I have created two line plots with this dataset. The first lineplot shows the number of flight accidents in a given year. The second lineplot shows the number of fatalities in a given year. I want to put both line plots on the same graph. This is the code I have used:
fatalities=df[['Fatalities','Date']]
fatalities['Year of Fatality']=fatalities['Date'].dt.year
fatalities.drop('Date',inplace=True)
fatalities.set_index('Year of Fatality',inplace=True)
fatalities.sort_index(inplace=True)
plt.figure(figsize=(12,9))
plt.title("Number of Flight Accidents Since 1908",fontsize=20)
plt.ylabel("Number of Flight Accidents")
plt.xlabel("Year")
plt.xticks(year.index,rotation=90)
year.plot()
fatalities.plot()
plt.show()
What I get are two plots, with one above the other: the plot which shows the number of fatalities and the plot which shows the number of flight accidents.
What I want is one graph that shows the two line plots. Any help would be much appreciated. (Side note: how can I rotate the xticks 90 degrees? I used the rotation argument in plt.xticks() but this had no effect.)
Given the use of .plot() and variables called df, I assume you're using pandas dataframes (if that's not the case, the answer probably still applies; look up the docs for your plot function).
Pandas' plot by default puts each plot on its own axes, unless you pass one to draw on via the ax argument:
fig, ax = plt.subplots()
year.plot(ax=ax)
fatalities.plot(ax=ax)
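
A fuller, runnable version of the same idea is sketched below. The yearly counts are made up, and only the variable names mirror the question. Setting the tick rotation on the shared Axes after plotting is one possible way to handle the side note about rotation. It is not part of the answer above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up yearly counts; the names mirror the question, the numbers do not.
years = [1908, 1909, 1910, 1911]
year = pd.Series([1, 2, 2, 3], index=years, name="Accidents")
fatalities = pd.DataFrame({"Fatalities": [1, 4, 3, 9]}, index=years)

fig, ax = plt.subplots(figsize=(12, 9))
year.plot(ax=ax)        # first line, drawn on ax
fatalities.plot(ax=ax)  # second line, drawn on the same ax

ax.set_title("Flight Accidents and Fatalities Since 1908", fontsize=20)
ax.set_xlabel("Year")
ax.set_xticks(years)
ax.tick_params(axis="x", labelrotation=90)  # rotate the tick labels on this Axes
ax.legend()
```

Working on the ax object directly, rather than through plt.xticks(), makes it explicit which Axes the tick settings apply to.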

Data not plotting, but no errors

I am trying to plot some precipitation data. The code I'm using is modified slightly from this code here.
The code works fine when I plot using the data from the site used in the link, but when I use a different dataset I have, it doesn't plot. The biggest difference between this dataset and the dataset used in the link's example, is my dataset is global data. The dataset I am using is also netcdf, is not masked, and I am loading it the same way as the example.
I am familiar with the data and know for a fact I should be seeing something and the contour values used in the example are reasonable for this other set of data I am using.
My code is the same, except for some changes in the section that plots the figure (below), which I have modified so it will plot a specific area instead of CONUS like in the example (using ax.set_extent).
When I do not set the extent it appears to plot the data, but then none of the boundaries (coastlines, state lines, etc.) plot. Based on this, I'm guessing it's something with either the dataset itself, something with set_extent, or a combination of things that is causing it to go wrong. I am not getting back any kind of errors when I plot it, either way. However, there might be something else I'm missing with it.
In the end, I'm actually comparing my dataset to the dataset used in the example link, so I would like them in the same projection.
Thanks for any insight and let me know if you need more information about the data itself!
fig = plt.figure(figsize=(8, 8))
ax = fig.add_subplot(1, 1, 1, projection=proj)
ax.set_extent((x1,x0,y0,y1))
# draw coastlines, state and country boundaries, edge of map.
ax.coastlines()
ax.add_feature(cfeature.BORDERS)
ax.add_feature(cfeature.STATES)
cs1 = ax.contourf(ym, xm, data1, clevs, cmap=cmap, norm=norm)
# add colorbar.
cbar = plt.colorbar(cs1, orientation='horizontal')
#cbar.set_label(data1.units)
#ax.set_title(prcpvar.long_name + ' for period ending ' + nc.creation_time)
plt.show()
plt.savefig('ncep_model')
Results when extent is not included in code above:
Edit 1:
I'll add that I was able to successfully plot the data with this code below (from a default template I made). I tried to change the projection to stereographic, but I was having trouble getting it to plot correctly using basemap because I've never used it before. As an alternative, if you can't figure out the error with the code above and could instead help with changing the projection for the code below, I would also take that. At this point I just want my data to plot correctly in the correct projection I want!
(I also included the results for the code below to confirm that the data should be showing up in this location)
LLlat = 40.
LLlon = 263.
URlat = 44.
URlon = 270.
lat = xm
lon = ym
%matplotlib inline
plt.figure(1,figsize=(10, 8),)
plt.title('Convective Precipitation 8/28/2018 0Z (in) Valid July 2018')
map = Basemap(projection='cyl',\
llcrnrlat=LLlat,urcrnrlat=URlat,\
llcrnrlon=LLlon,urcrnrlon=URlon,\
rsphere=6371200.,resolution='i')
map.drawcoastlines(linewidth=0.5) # Draw some coastlines
map.drawstates(linewidth=0.5) # Draw state boundaries
map.drawrivers(color='#000000')
map.drawparallels(np.arange(-90.,91.,30),labels=[1,0,0,0]) # Drawing lines of latitude
map.drawmeridians(np.arange(0.,330.,60),labels=[0,0,0,1]) # Drawing lines of longitude
lons,lats = map(lon,lat) # Setting up the grid in cylindrical coords.
cs = plt.contourf(lons,lats,data1[:,:], clevs,cmap=cmap, norm=norm)
cb = plt.colorbar(cs,orientation='horizontal')
plt.show()
Edit 2:
I've added the resulting plot when I don't include the set_extent in the first chunk of code (Don't know if that will help at all, but thought I'd include it as well)
It'd be really useful to have more information on your data, like a link to a sample file, but my guess is that your data do not give coordinates in a stereographic projection, unlike the original data. When plotting with Cartopy, if you do not specify otherwise, all plot commands assume that the x,y values given are in the projection specified for the axes (for the original code this was ccrs.Stereographic). If this is not the case, such as when plotting lon/lats, you need to specify this by passing transform to the plotting command, as below, where I specify that the x,y values are lon/lats:
data_proj = ccrs.PlateCarree()
cs1 = ax.contourf(ym, xm, data1, clevs, cmap=cmap, norm=norm,
transform=data_proj)

df.corr showing all data as 1 when read from compiled csv, even though there is data that is different

I have a compiled data frame of sp500 stock data that I am trying to find correlations with using df.corr(), but it is labeling all data as having a '1' correlation when I run the program, and when I use a heat map to visualize the data it shows an entire green chart, when there should be many many different positive and negative correlations.
Using Python 3.6 and Spyder
here is the code I am using:
def visualize_data():
    df = pd.read_csv('sp500_joined_closes.csv')
    pd.options.display.float_format = '{:.5f}'.format
    #df['AAPL'].plot()
    #plt.show()
    df_corr = df.corr() #creates a correlation table of our data frame. Generates correlation values
    print(df_corr.head())
    data1 = df_corr.values #gets inner values of our data frame
    fig1 = plt.figure() #specify our figures
    ax1 = fig1.add_subplot(1,1,1) #defined axis 1 by 1 plot 1
    heatmap1 = ax1.pcolor(data1, cmap=plt.cm.RdYlGn) #sets the color parameter of heat map (negative,neutral,positive)
    fig1.colorbar(heatmap1)
    ax1.set_xticks(np.arange(data1.shape[0]) + 0.5, minor=False) #sets x ticks for heat map, arranging ticks at every 0.5(half-mark)
    ax1.set_yticks(np.arange(data1.shape[1]) + 0.5, minor=False) #sets y ticks for heat map
    ax1.invert_yaxis() #removes random gap from the top of graph
    ax1.xaxis.tick_top() #moves x axis ticks to the top (meant to look more like a table)
    column_labels = df_corr.columns
    row_labels = df_corr.index
    ax1.set_xticklabels(column_labels)
    ax1.set_yticklabels(row_labels)
    plt.xticks(rotation=90)
    heatmap1.set_clim(-1,1)
    plt.tight_layout()
    #plt.savefig("correlations.png", dpi = (300))
    plt.show()
visualize_data()
The interesting thing is that I searched all over for anyone having a similar error, and I cannot seem to find any answers. Could it be that the ticker symbols could be considered categorical and therefore something is getting skewed? I'm not quite sure here, to be honest.
Even when I tried to plot the correlations for one single company against all the data as seen by #df['AAPL'].plot() and #plt.show() the same exact thing happened where the data is only registering a correlation value of 1.0000 to all of the data.
I initially thought it was a rounding error due to significant figures, so I put in pd.options.display.float_format = '{:.5f}'.format but that didn't work and I am still receiving the skewed correlation.
Here is a screenshot of the issue and the subsequent heat map
Here is a screenshot of part of the data, confirming that it isn't all the same and that it hasn't become corrupted in some way
The issue was with sourcing the data through the Google Finance API. There seemed to have been an error downloading one of the dates for one of the S&P 500 companies, and when I compiled all of the data including those few missing dates it could only produce one line of data for some reason. This led to a correlation of '1' since all the data was exactly the same. I found the specific dates and added them in manually, and now the program runs as it should. Thank you.
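
For anyone hitting the same symptom, a quick sanity check on the compiled frame can show whether the columns really differ before looking at the plot. This is a rough sketch with made-up prices and placeholder ticker names. It only reproduces the "every column identical" case described above, not the original download problem.

```python
import numpy as np
import pandas as pd

# Made-up prices: every ticker column holds the same series, mimicking the
# corrupted compile described above. The ticker names are placeholders.
closes = pd.Series([10.0, 10.5, 10.2, 11.0, 10.8])
bad = pd.DataFrame({"AAA": closes, "BBB": closes, "CCC": closes})

bad_corr = bad.corr()
all_ones = bool(np.allclose(bad_corr.values, 1.0))

# Transposing turns columns into rows, so duplicated() flags repeated columns.
repeated_columns = int(bad.T.duplicated().sum())

# Missing values per column are also worth checking before joining the files.
missing_per_column = bad.isna().sum()

print(all_ones, repeated_columns)
```

If repeated_columns is close to the number of tickers, the problem is in how the data was compiled rather than in df.corr() or the heat map code.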
