Overlay scatter plot on map (img) - python

I'm working with a housing dataset for my own learning purposes and I'd like to be able to overlay my plots on top of a map to provide me with a better understanding of the 'hot spots'.
My code is below:
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.image import imread
housing = pd.read_csv('https://raw.githubusercontent.com/ageron/handson-ml/master/datasets/housing/housing.csv')
plt.figure()
housing.plot(x='longitude', y='latitude', kind='scatter', alpha=0.4,
s= housing['population']/100, label='population', figsize=(10,7),
c= 'median_house_value', cmap=plt.get_cmap('jet'), colorbar=True, zorder=5)
plt.legend()
plt.show()
I saved the map image as 'California.png'.
This is what I tried:
img=imread('California.png')
plt.figure()
plt.imshow(img,zorder=0)
housing.plot(x='longitude', y='latitude', kind='scatter', alpha=0.4,
s= housing['population']/100, label='population', figsize=(10,7),
c= 'median_house_value', cmap=plt.get_cmap('jet'), colorbar=True, zorder=5)
plt.legend()
plt.show()
But this just gives me two plots. I've tried switching the index around to no avail.
Is there a simple way to accomplish this? Thanks.
EDIT: Using the code below by @nbeuchat:
plt.figure(figsize=(10,7))
img=imread('California.png')
plt.imshow(img,zorder=0)
ax = plt.gca()
housing.plot(x='longitude', y='latitude', kind='scatter', alpha=0.4,
s= housing['population']/100, label='population', ax=ax,
c= 'median_house_value', cmap=plt.get_cmap('jet'), colorbar=True,
zorder=5)
plt.legend()
plt.show()
I get the following plot:

OK, The question is old, but I have a different answer that may be interesting to someone...
I've been working on exactly the same issue. The code available on GitHub (https://github.com/ageron/handson-ml.git) does what you need (see 02_end_to_end_machine_learning_project.ipynb).
However, that code uses the California map as an image and just draws the points on top of it. An alternative is to build a real map and plot the points on it, without having to read the map image. To do this, I used the code below. You will need to install cartopy, and if you also want county lines you have to draw them using the instructions from here.
In the end, the image generated was this:
And here is the code I used:
# Trying to use a real map
import numpy as np
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import cartopy.feature as cfeature
plt.figure(figsize=(10,7))
# Creates the map
ca_map = plt.axes(projection=ccrs.PlateCarree())
ca_map.add_feature(cfeature.LAND)
ca_map.add_feature(cfeature.OCEAN)
ca_map.add_feature(cfeature.COASTLINE)
ca_map.add_feature(cfeature.BORDERS, linestyle=':')
ca_map.add_feature(cfeature.LAKES, alpha=0.5)
ca_map.add_feature(cfeature.RIVERS)
ca_map.add_feature(cfeature.STATES.with_scale('10m'))
# To add county lines
import cartopy.io.shapereader as shpreader
reader = shpreader.Reader('datasets/housing/countyl010g.shp')
counties = list(reader.geometries())
COUNTIES = cfeature.ShapelyFeature(counties, ccrs.PlateCarree())
ca_map.add_feature(COUNTIES, facecolor='none', edgecolor='gray')
ca_map.xaxis.set_visible(True)
ca_map.yaxis.set_visible(True)
# Plots the data onto map
plt.scatter(housing['longitude'], housing['latitude'], alpha=0.4,
s=housing["population"]/100, label="population",
c=housing['median_house_value'],
cmap=plt.get_cmap("jet"),
transform=ccrs.PlateCarree())
# Colorbar
prices = housing["median_house_value"]
tick_values = np.linspace(prices.min(), prices.max(), 11)
cbar = plt.colorbar(ticks=tick_values)  # one tick for each label set on the next line
cbar.ax.set_yticklabels(["$%dk"%(round(v/1000)) for v in tick_values], fontsize=14)
cbar.set_label('Median House Value', fontsize=16)
# Plot labels
plt.ylabel("Latitude", fontsize=14)
plt.xlabel("Longitude", fontsize=14)
plt.legend()
plt.savefig("housing_prices_scatterplot_cartopy.png")  # save_fig() is a helper defined in the book's notebook
The advantage is that this uses a real map, and the code can easily be adapted to whatever part of the world you want to plot. Have fun!

You are creating a new figure by using the DataFrame plot function. You should pass it the axes on which you want to draw your second plot. One way is to use gca to get the current axes.
The following should work (not tested though):
plt.figure(figsize=(10,7))
img=imread('California.png')
plt.imshow(img,zorder=0,extent=[housing['longitude'].min(),housing['longitude'].max(),housing['latitude'].min(),housing['latitude'].max()])
ax = plt.gca()
housing.plot(x='longitude', y='latitude', kind='scatter', alpha=0.4,
s= housing['population']/100, label='population', ax=ax,
c= 'median_house_value', cmap=plt.get_cmap('jet'), colorbar=True,
zorder=5)
plt.legend()
plt.show()
EDIT: using the extent parameter of imshow with the minimum and maximum values of your longitude and latitude data will scale the image correctly.
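For anyone who wants to try the idea without the original files, here is a minimal sketch of the same approach. It is only an illustration: the random array stands in for imread('California.png'), the DataFrame is synthetic, and the column name value is made up. It draws the image with extent set to the data's longitude/latitude range and then passes the same axes to DataFrame.plot.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-ins (assumptions): random points and a random RGB "map".
rng = np.random.default_rng(0)
points = pd.DataFrame({
    "longitude": rng.uniform(-124.3, -114.3, 200),
    "latitude": rng.uniform(32.5, 42.0, 200),
    "value": rng.uniform(50_000, 500_000, 200),
})
fake_map = rng.random((60, 80, 3))  # placeholder for imread('California.png')

fig, ax = plt.subplots(figsize=(10, 7))

# extent is (left, right, bottom, top) in data coordinates, so the image
# is stretched over the same longitude/latitude range as the points.
extent = (points["longitude"].min(), points["longitude"].max(),
          points["latitude"].min(), points["latitude"].max())
ax.imshow(fake_map, extent=extent, aspect="auto", zorder=0)

# Passing ax=ax keeps pandas from opening a second figure.
points.plot(x="longitude", y="latitude", kind="scatter", alpha=0.4,
            c="value", cmap="jet", colorbar=True, ax=ax, zorder=5)

fig.canvas.draw()  # render once; use plt.show() or fig.savefig(...) in practice
```

Note that the image only lines up with the points if the picture really covers exactly that longitude/latitude range; otherwise the extent values need to be the geographic bounds of the image itself.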

Related

Python matplotlib title at wrong place [duplicate]

I am trying to plot two separate quantities on the same graph using twiny as follows:
fig = figure()
ax = fig.add_subplot(111)
ax.plot(T, r, 'b-', T, R, 'r-', T, r_geo, 'g-')
ax.set_yscale('log')
ax.annotate('Approx. sea level', xy=(Planet.T_day*1.3,(Planet.R)/1000), xytext=(Planet.T_day*1.3, Planet.R/1000))
ax.annotate('Geostat. orbit', xy=(Planet.T_day*1.3, r_geo[0]), xytext=(Planet.T_day*1.3, r_geo[0]))
ax.set_xlabel('Rotational period (hrs)')
ax.set_ylabel('Orbital radius (km), logarithmic')
ax.set_title('Orbital charts for ' + Planet.N, horizontalalignment='center', verticalalignment='top')
ax2 = ax.twiny()
ax2.plot(v,r,'k-')
ax2.set_xlabel('Linear speed (ms-1)')
show()
and the data is presented fine, but I am having the problem that the figure title is overlapping with the axes labels on the secondary x axis so that it's barely legible (I wanted to post a picture example here, but I don't have a high enough rep yet).
I'd like to know if there's a straightforward way to just shift the title directly up a few tens of pixels, so that the chart looks prettier.
I'm not sure whether it is a new feature in later versions of matplotlib, but at least for 1.3.1, this is simply:
plt.title(figure_title, y=1.08)
This also works for plt.suptitle(), but not (yet) for plt.xlabel(), etc.
Forget using plt.title and place the text directly with plt.text. An over-exaggerated example is given below:
import pylab as plt
fig = plt.figure(figsize=(5,10))
figure_title = "Normal title"
ax1 = plt.subplot(1,2,1)
plt.title(figure_title, fontsize = 20)
plt.plot([1,2,3],[1,4,9])
figure_title = "Raised title"
ax2 = plt.subplot(1,2,2)
plt.text(0.5, 1.08, figure_title,
horizontalalignment='center',
fontsize=20,
transform = ax2.transAxes)
plt.plot([1,2,3],[1,4,9])
plt.show()
I was having an issue with the x-label overlapping a subplot title; this worked for me:
import matplotlib.pyplot as plt
fig, ax = plt.subplots(2, 1)
ax[0].scatter(...)
ax[1].scatter(...)
plt.tight_layout()
.
.
.
plt.show()
before
after
reference:
https://matplotlib.org/users/tight_layout_guide.html
ax.set_title('My Title\n', fontsize="15", color="red")
plt.imshow(myfile, origin="upper")
If you put '\n' right after your title string, the plot is drawn just below the title. That might be a fast solution too.
You can use pad for this case:
ax.set_title("whatever", pad=20)
Just use plt.tight_layout() before plt.show(). It works well.
A temporary solution if you don't want to get into the x, y position of your title.
Following worked for me.
plt.title('Capital Expenditure\n') # Add a next line after your title
kudos.
Using plt.tight_layout() before plt.show() works well for me.
You can improve it further by adding padding to the title:
ax.set_title("title", pad=15)
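To tie the pad suggestion back to the twiny case from the question, here is a small sketch with made-up data (the numbers and the pad value of 30 points are arbitrary choices, not something from the original post). It sets the title on the first axes with extra padding and then calls tight_layout so the raised title still fits in the figure.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [1, 4, 9], "b-")
ax.set_xlabel("Rotational period (hrs)")

# Second x axis on top, sharing the y axis with ax.
ax2 = ax.twiny()
ax2.plot([10, 20, 30], [1, 4, 9], "k-")
ax2.set_xlabel("Linear speed (m/s)")

# pad is given in points and pushes the title away from the top of the axes.
title = ax.set_title("Orbital chart", pad=30)

fig.tight_layout()  # make room for the raised title
fig.canvas.draw()   # render once; use plt.show() in an interactive session
```

If the title still collides with the top label, increasing pad (or passing y=... as in the earlier answer) is the knob to turn.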

Python - How to have same maximum on multiple histograms

I have two distributions with different spreads, say
a=N.random.normal(0,0.5,500)
b=N.random.normal(1,3.,500)
I want to make a histogram of them where the maxima are at the same level.
If I make normalised histograms with the density=True parameter, the area of each histogram will be 1, but that won't change the fact that the maxima are different.
P.hist(a, histtype='step', lw=2, cumulative=True)
P.hist(b, histtype='step', color='r', lw=2, density=True)
What I want is a histogram that is "normalized" by its maximum value, so that the peaks of the blue and red histograms are at the same height and the two are easy to compare. Thanks a lot in advance for your help.
I'm not sure I would do that really, but if you want to I think the best way is to add two axes (also so that you can see how tall they really are). For example, see here: https://matplotlib.org/gallery/api/two_scales.html
import numpy as np
from matplotlib import pyplot as plt
fig, ax1 = plt.subplots()
ax1.hist(a, histtype='step', color='b', lw=2, density=True)
ax1.tick_params(axis='y', labelcolor='b')
ax2 = ax1.twinx()
ax2.hist(b, histtype='step', color='r', lw=2, density=True)
ax2.tick_params(axis='y', labelcolor='r')
This gives the following output (which, I think, looks worse than what you obtained; I also changed cumulative=True to density=True in the first plot to be in line with the plot you provided):
Also, strictly speaking this does not make sure that the maxima are really identical. If you want to do that you can force it by doing e.g.
import numpy as np
from matplotlib import pyplot as plt
fig, ax1 = plt.subplots()
n1, _, _ = ax1.hist(a, histtype='step', color='b', lw=2, density=True)
ax1.tick_params(axis='y', labelcolor='b')
ax2 = ax1.twinx()
n2, _, _ = ax2.hist(b, histtype='step', color='r', lw=2, density=True)
ax2.tick_params(axis='y', labelcolor='r')
ax1.set_ylim([0, n1.max()*1.1])
ax2.set_ylim([0, n2.max()*1.1])
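The following code rescales a so that its largest value matches that of b (note that this changes the data values themselves, not the heights of the histogram bins):
a *= b.max()/a.max()
The cumulative flag on a's histogram might break this, though, and the rescaling must be done before the histograms are generated.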
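If the goal is literally "tallest bin at the same height", another option is to bin the data yourself and divide the counts by the peak count. This is a different technique from the answers above, sketched here with a hypothetical helper named peak_normalized_hist; the data are regenerated with a seeded generator so the example is self-contained.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(0, 0.5, 500)
b = rng.normal(1, 3.0, 500)

def peak_normalized_hist(ax, data, bins=10, **kwargs):
    """Draw a step histogram whose tallest bin has height 1 (hypothetical helper)."""
    counts, edges = np.histogram(data, bins=bins)
    weights = counts / counts.max()
    # Re-plot the pre-binned result: one sample per bin (its left edge),
    # weighted by the normalized height of that bin.
    heights, _, _ = ax.hist(edges[:-1], bins=edges, weights=weights,
                            histtype="step", **kwargs)
    return heights

fig, ax = plt.subplots()
h_a = peak_normalized_hist(ax, a, color="b", lw=2)
h_b = peak_normalized_hist(ax, b, color="r", lw=2)
ax.set_ylabel("Count / peak count")
fig.canvas.draw()  # render once; use plt.show() in an interactive session
```

Both curves then share one y axis, at the cost of losing the absolute counts, so it is worth saying in the axis label that the heights are relative.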

Superimposition of histogram and density in Pandas/Matplotlib in Python

I've got a Pandas dataframe named clean which contains a column v for which I would like to draw a histogram and superimpose a density plot. I know I can plot one under the other this way:
import pandas as pd
import matplotlib.pyplot as plt
Maxv=200
plt.subplot(211)
plt.hist(clean['v'],bins=40, range=(0, Maxv), color='g')
plt.ylabel("Number")
plt.subplot(212)
ax=clean['v'].plot(kind='density')
ax.set_xlim(0, Maxv)
plt.xlabel("Orbital velocity (km/s)")
ax.get_yaxis().set_visible(False)
But when I try to superimpose them, the y scales don't match (and I lose the y-axis ticks and labels):
yhist, xhist, _hist = plt.hist(clean['v'],bins=40, range=(0, Maxv), color='g')
plt.ylabel("Number")
ax=clean['v'].plot(kind='density') #I would like to insert here a normalization to max(yhist)/max(ax)
ax.set_xlim(0, Maxv)
plt.xlabel("Orbital velocity (km/s)")
ax.get_yaxis().set_visible(False)
Some hint? (Additional question: how can I change the width of density smoothing?)
Based on your code, this should work:
ax = clean.v.plot(kind='hist', bins=40, density=True)  # normed= was removed from matplotlib; density= replaces it
clean.v.plot(kind='kde', ax=ax, secondary_y=True)
ax.set(xlim=[0, Maxv])
You might not even need the secondary_y anymore.
Now I tried this:
ax = clean.v.plot(kind='hist', bins=40, range=(0, Maxv))
clean.v.plot(kind='kde', ax=ax, secondary_y=True)
But the range part doesn't work, and there's still the left y-axis problem.
Seaborn makes this easy
import seaborn as sns
sns.histplot(df['numeric_column'], bins=25, kde=True, stat='density')  # distplot is deprecated in recent seaborn
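For the two parts of the question that the answers above leave open (keeping the count axis, and changing the smoothing width), one possible approach is to compute the density with scipy and scale it to counts yourself. This is only a sketch under assumptions: the gamma-distributed sample is a made-up stand-in for clean['v'], and bw_method=0.2 is an arbitrary bandwidth factor chosen for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
v = rng.gamma(shape=4.0, scale=15.0, size=1000)  # hypothetical stand-in for clean['v']
max_v = 200
bins = 40
bin_width = max_v / bins

fig, ax = plt.subplots()
counts, edges, _ = ax.hist(v, bins=bins, range=(0, max_v), color="g")

# bw_method controls the smoothing width: smaller values give a wigglier curve.
kde = gaussian_kde(v, bw_method=0.2)
xs = np.linspace(0, max_v, 400)

# A density integrates to 1, so multiplying by N * bin_width converts it
# to the expected number of samples per bin, i.e. the histogram's y scale.
curve = kde(xs) * len(v) * bin_width
ax.plot(xs, curve, "k-")

ax.set_xlim(0, max_v)
ax.set_xlabel("Orbital velocity (km/s)")
ax.set_ylabel("Number")
fig.canvas.draw()  # render once; use plt.show() in an interactive session
```

Because the curve is drawn on the same axes in count units, the left y axis keeps its ticks and labels and no secondary axis is needed.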

Matplotlib Pyplot logo/image in Plot

I'm struggling to achieve a simple goal in matplotlib... I want to put a small logo or indicator in the bottom right of my graph, without altering the axis or the real data that is being displayed. Here is my code now:
fig = plt.figure()
plt.rcParams.update({'font.size': 15})
img = plt.imread('./path/to/image.png')
ax1 = fig.add_subplot(111)
ax1.yaxis.tick_left()
ax1.tick_params(axis='y', colors='black', labelsize=15)
ax1.tick_params(axis='x', colors='black', labelsize=15)
plt.grid(True, which='major', color='#D3D3D3', linestyle='-')
plt.scatter([1,2,3,4,5],[5,4,3,2,1], alpha=1.0)
plt.autoscale(enable=True, axis=u'both')
fig.savefig('figure.png')
My output from this is below.
This is now laying the photo over the whole graph -- I'd like it scaled to 20% of width & height (if possible) and anchored to the bottom right. This also ruins my axis, because in this output I should be in the 0-100 range on both x & y. Any ideas to solve this, the scaling is the big issue.
Edit1: I've tried the solution below and linked questions here on SO. The problem is relying on the extent variable being passed to imshow() then doesn't work well when introducing new data. For example plotting a scatter plot coming from a data frame, could be from 0..1000 and 50..100 but using extent won't show the label or the position will be off.
Edit2: There seems to be some progress with getting the figure length with fig.get_size_inches() and passing the variable to extent. Apparently all of matplotlib graph calculations are done through inches, so this may be a promising lead.
import matplotlib.image as image
import matplotlib.pyplot as plt
im = image.imread('debian-swirl.png')
fig, ax = plt.subplots()
ax.imshow(im, aspect='auto', extent=(0.4, 0.6, .5, .7), zorder=-1)
ax.yaxis.tick_left()
ax.tick_params(axis='y', colors='black', labelsize=15)
ax.tick_params(axis='x', colors='black', labelsize=15)
ax.grid(True, which='major', color='#D3D3D3', linestyle='-')
ax.scatter([1,2,3,4,5],[5,4,3,2,1], alpha=1.0)
plt.show()
I added a png file to bottom left. Adjust the extent parameter to set the logo position.
Similar to : Scale image in matplotlib without changing the axis
The following is an adaptation of the answer by Kirubaharan J, but adapting the position of the logo to the extent of the graph (but the aspect ratio of the logo itself is not preserved)
import matplotlib.image as image
import matplotlib.pyplot as plt
im =image.imread('debian-swirl.png')
fig, ax = plt.subplots()
ax.yaxis.tick_left()
ax.tick_params(axis='y', colors='black', labelsize=15)
ax.tick_params(axis='x', colors='black', labelsize=15)
ax.grid(True, which='major', color='#D3D3D3', linestyle='-')
ax.scatter( [100,90,89,70], [55, 23,76,29], alpha=1.0)
plt.autoscale(enable=True, axis=u'both')
xrng=plt.xlim()
yrng=plt.ylim()
scale=.2 #the image takes this fraction of the graph
ax.imshow(im,aspect='auto',extent=(xrng[0],xrng[0] + scale*(xrng[1]-xrng[0]), yrng[0], yrng[0] + scale*(yrng[1]-yrng[0]) ), zorder=-1)
plt.xlim(xrng)
plt.ylim(yrng)
plt.show()
I've worked on a similar problem: printing several PDFs with a fixed logo on every page, independent of the graph size. The best solution I found was using GridSpec.
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
from matplotlib.offsetbox import OffsetImage, AnnotationBbox
fig = plt.figure(figsize = (11,8.5)) # 8.5" x 11" : letter format
G = plt.GridSpec(14,21)
In my case I've built a grid of 14 by 21 squares over an 8.5 x 11 inch template.
Then I just have to allocate a section of the grid for the logo and import it using matplotlib.image:
ax = fig.add_subplot(G[2:5,5:14])
logo = mpimg.imread("logo.png")
imagebox = OffsetImage(logo, zoom=0.08)
ab = AnnotationBbox(imagebox, (0.4, 0.6), frameon = False)
ax.add_artist(ab)
You can control the scale using the zoom argument of OffsetImage.
You can find the details at the following link:
https://www.science-emergence.com/Articles/How-to-insert-an-image-a-picture-or-a-photo-in-a-matplotlib-figure/
I think it's best to simply put the image on a new axes.
That way you have full control over where to put it, without having to bother with the existing axes.
import matplotlib.image as image
import matplotlib.pyplot as plt
# create a plot
f, ax = plt.subplots()
im = image.imread("path-to-logo.png")
# put a new axes where you want the image to appear
# (x, y, width, height)
imax = f.add_axes([0.8, 0.75, 0.1, 0.1])
# remove ticks & the box from imax
imax.set_axis_off()
# print the logo with aspect="equal" to avoid distorting the logo
imax.imshow(im, aspect="equal")
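A variation on the same idea that matches the original request (logo at 20% of the plot's width and height, anchored bottom right) is to use an inset axes positioned in axes-fraction coordinates. The sketch below is an assumption-laden illustration: the logo is a generated RGBA array rather than a real file, and the 0 to 100 limits are taken from the question's description.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
ax.scatter([10, 30, 50, 70, 90], [90, 70, 50, 30, 10])
ax.set_xlim(0, 100)
ax.set_ylim(0, 100)

# Hypothetical logo: a semi-transparent red rectangle as an RGBA array.
# In practice this would come from plt.imread('./path/to/image.png').
logo = np.zeros((40, 80, 4))
logo[..., 0] = 1.0   # red channel
logo[..., 3] = 0.6   # alpha channel

# [x0, y0, width, height] as fractions of the parent axes:
# a box covering the bottom-right 20% x 20% of the plot area.
logo_ax = ax.inset_axes([0.8, 0.0, 0.2, 0.2])
logo_ax.imshow(logo)        # default aspect keeps the logo undistorted
logo_ax.set_anchor("SE")    # keep the image in the bottom-right corner of its box
logo_ax.set_axis_off()      # no ticks or frame around the logo

fig.canvas.draw()  # render once; use plt.show() or fig.savefig(...) in practice
```

Since the logo lives on its own axes, the data limits of the main plot are left alone, which avoids the extent bookkeeping discussed in the edits to the question.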
