I am attempting to plot weather variables on a map of Oklahoma using mpl_toolkits.basemap, but am having issues figuring out how to interpolate the data to plot on top of the map.
Here is a general idea of the current code I have:
lons = [-97.9547, -97.9747, -97.4256]
lats = [35.5322, 35.864, 35.4111]
data = [2,2,2]
map = Basemap(llcrnrlon = -103.068237, llcrnrlat = 33.610045, urcrnrlon = -94.359076, urcrnrlat = 37.040928, resolution = 'i')
CS = map.contour(X, Y, data)
map.drawstates()
plt.show()
What I am attempting to accomplish is to plot the data values on the map based on the related reference index in the lons/lats lists, and then contour the values of the data variable.
Now this obviously won't work, because I need to interpolate the data. Is there a way that I could accomplish this using the griddata function? I am very confused on how I would establish the boundaries of the grid given that latitude and longitude values are not linearly spaced.
Is there an easier way to do this that I am missing?
Any help and/or hints would be greatly appreciated, this is holding me back from moving on to the next major portion of the research project!
I don't have Python installed on this machine, so I can't test this, but something like this should get you the required inputs for the contour plot:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
lons = [-97.9547, -97.9747, -97.4256]
lats = [35.5322, 35.864, 35.4111]
data = [2,2,2]
xs, ys = np.meshgrid(lons, lats)
dataMesh = np.empty_like(xs)
for i, j, d in zip(lons, lats, data):
    # meshgrid arrays are indexed [row (lat), column (lon)]
    dataMesh[lats.index(j), lons.index(i)] = d
map = Basemap(llcrnrlon = -103.068237, llcrnrlat = 33.610045, urcrnrlon = -94.359076, urcrnrlat = 37.040928, resolution = 'i')
CS = map.contour(xs, ys, dataMesh)
map.drawstates()
plt.show()
As I said, though, I haven't tested this. I don't know what happens if you try to plot uninitialised values; you might need to use a different NumPy array initialisation.
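On the griddata part of the question: one way to get around the irregular station spacing is to define your own regular lon/lat grid over the map extent and interpolate the scattered values onto it. The following is only a minimal sketch under assumptions: the station coordinates and values are made up for illustration (three stations with identical values would give nothing to contour), and the grid sizes are arbitrary.

```python
import numpy as np
from scipy.interpolate import griddata

# Hypothetical station data (made-up values, not real observations)
lons = np.array([-102.0, -95.0, -98.5, -97.0, -100.0])
lats = np.array([34.0, 34.5, 36.8, 35.4, 35.9])
data = np.array([10.0, 14.0, 8.0, 12.0, 9.0])

# Regular lon/lat grid spanning the map corners from the question
grid_lon = np.linspace(-103.068237, -94.359076, 100)
grid_lat = np.linspace(33.610045, 37.040928, 80)
X, Y = np.meshgrid(grid_lon, grid_lat)

# Linear interpolation inside the convex hull of the stations;
# grid points outside the hull are left as NaN rather than extrapolated
Z = griddata((lons, lats), data, (X, Y), method='linear')
```

With the default cylindrical projection used in the question, X, Y and Z should be usable directly in map.contour(X, Y, Z); for other projections the grid would first need converting to map coordinates.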
I want to create daily histograms from a pandas Dataframe (df) and export them to xarray to combine it with another Dataset (data). When I create the DataArray I can access it without problems, but once I combine it with the Dataset the array I added only consists of nan-entries. I think I made sure that all coordinates correctly align, by normalizing the time coordinate and making sure that the spatial coordinates are the same. Something is going wrong and I am running out of ideas. Any help would be greatly appreciated!
df=pd.read_csv(filepath+dfname)
data=xr.open_dataset(filepath+bgc_xarray)
df['date'] = pd.to_datetime(df['date'])
data['time'] = data.indexes['time'].normalize()
xedges = np.arange(lonmin,lonmax+2*spacing,spacing)
yedges = np.arange(latmin,latmax+2*spacing,spacing)
latitude = xedges[:-1]
longitude = yedges[:-1]
for i in range(2):
    df_i=df[df['date'] == data.time[i].values]
    x = df_i['cell_ll_lon']
    y = df_i['cell_ll_lat']
    weights = df_i['fishing_hours']
    hist, xedges, yedges = np.histogram2d(x, y, bins=(xedges, yedges), weights=weights)
    fishing_effort = hist.T
    Xarray_i = xr.DataArray(
        data=fishing_effort,
        dims=['longitude', 'latitude'],
        coords=dict(
            longitude=(['longitude'], longitude),
            latitude=(['latitude'], latitude),
            time = data.time[i].values),
        attrs=dict(
            description='Fishing Effort',
            units='hours',),)
    if i == 0:
        Xarray = Xarray_i
    else:
        Xarray = xr.concat([Xarray, Xarray_i], 'time')
data['fishing_effort'] = Xarray
Oh ok, the problem was with the spatial coordinates apparently. This fixed it:
longitude=(['longitude'], data.longitude.values),
latitude=(['latitude'], data.latitude.values),
I'm moving from basemap to cartopy given basemap is going to be phased out. I've previously used the basemap.interp functionality to interpolate data, e.g. say I have data at 1 degree resolution (180x360), I would run the following to interpolate to 0.5 degrees.
import numpy as np
from mpl_toolkits import basemap
Old_Lon = np.linspace(-180,180,360)
Old_Lat = np.linspace(-90,90,180)
New_Lon = np.linspace(-180,180,720)
New_Lat = np.linspace(-90,90,360)
New_Lon,New_Lat = np.meshgrid(New_Lon,New_Lat)
New_Data = basemap.interp(Old_Data,Old_Lon,Old_Lat,New_Lon,New_Lat,order=0)
The order argument lets me choose between nearest neighbour, bilinear, etc. Is there an alternative that does this in as simple a way? I've seen that SciPy has interpolation routines, but I'm not sure how to apply them. Any help would be appreciated!
I eventually decided to take the raw code from Basemap and make it into a standalone function. I'll be recommending that the cartopy developers implement it, as it's a useful feature. Posting it here as it could be useful to someone else:
def Interp(datain,xin,yin,xout,yout,interpolation='NearestNeighbour'):
    """
    Interpolates a 2D array onto a new grid (only works for linear grids),
    with the Lat/Lon inputs of the old and new grid. Can perform nearest
    neighbour interpolation or bilinear interpolation (of order 1).
    This is an extract from the basemap module (truncated)
    """
    # Mesh Coordinates so that they are both 2D arrays
    xout,yout = np.meshgrid(xout,yout)
    # compute grid coordinates of output grid.
    delx = xin[1:]-xin[0:-1]
    dely = yin[1:]-yin[0:-1]
    xcoords = (len(xin)-1)*(xout-xin[0])/(xin[-1]-xin[0])
    ycoords = (len(yin)-1)*(yout-yin[0])/(yin[-1]-yin[0])
    xcoords = np.clip(xcoords,0,len(xin)-1)
    ycoords = np.clip(ycoords,0,len(yin)-1)
    # Interpolate to output grid using nearest neighbour
    if interpolation == 'NearestNeighbour':
        xcoordsi = np.around(xcoords).astype(np.int32)
        ycoordsi = np.around(ycoords).astype(np.int32)
        dataout = datain[ycoordsi,xcoordsi]
    # Interpolate to output grid using bilinear interpolation.
    elif interpolation == 'Bilinear':
        xi = xcoords.astype(np.int32)
        yi = ycoords.astype(np.int32)
        xip1 = xi+1
        yip1 = yi+1
        xip1 = np.clip(xip1,0,len(xin)-1)
        yip1 = np.clip(yip1,0,len(yin)-1)
        delx = xcoords-xi.astype(np.float32)
        dely = ycoords-yi.astype(np.float32)
        dataout = (1.-delx)*(1.-dely)*datain[yi,xi] + \
                  delx*dely*datain[yip1,xip1] + \
                  (1.-delx)*dely*datain[yip1,xi] + \
                  delx*(1.-dely)*datain[yi,xip1]
    return dataout
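For comparison, the same fractional-index idea can probably be handed to scipy.ndimage.map_coordinates, which takes an order argument much like basemap.interp did. This is a sketch rather than a drop-in replacement: the helper name regrid is hypothetical, it assumes evenly spaced input coordinates with data shaped (len(yin), len(xin)), and the test field is a plane chosen only because bilinear interpolation reproduces it exactly.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def regrid(datain, xin, yin, xout, yout, order=1):
    """Regrid datain, shaped (len(yin), len(xin)), from an evenly spaced grid."""
    xout2d, yout2d = np.meshgrid(xout, yout)
    # Fractional index of every output point within the input grid
    xcoords = (len(xin) - 1) * (xout2d - xin[0]) / (xin[-1] - xin[0])
    ycoords = (len(yin) - 1) * (yout2d - yin[0]) / (yin[-1] - yin[0])
    # order=0 is nearest neighbour, order=1 is bilinear;
    # mode='nearest' clamps points that fall just outside the grid
    return map_coordinates(datain, [ycoords, xcoords], order=order, mode='nearest')

old_lon = np.linspace(-180, 180, 360)
old_lat = np.linspace(-90, 90, 180)
lon2d, lat2d = np.meshgrid(old_lon, old_lat)
old_data = 2.0 * lon2d + lat2d  # stand-in field (a plane)

new_lon = np.linspace(-180, 180, 720)
new_lat = np.linspace(-90, 90, 360)
new_data = regrid(old_data, old_lon, old_lat, new_lon, new_lat, order=1)
```

Passing order=0 instead should give nearest-neighbour behaviour on the same grids.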
--
The SciPy interpolation routines return a function that you can call to perform an interpolation. For nearest neighbour interpolation on a regular grid, you can use scipy.interpolate.RegularGridInterpolator:
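import numpy as np
from scipy.interpolate import RegularGridInterpolator
# Example grids (assumed here; the spline example further down uses the same ones)
old_step = 10
old_lon = np.arange(-180, 180, old_step)
old_lat = np.arange(-90, 90, old_step)
old_data = np.random.random((len(old_lon), len(old_lat)))
new_step = 5  # assumed value, consistent with the lon=175 mentioned below
new_lon = np.arange(-180, 180, new_step)
new_lat = np.arange(-90, 90, new_step)
nearest_function = RegularGridInterpolator(
    (old_lon, old_lat), old_data, method="nearest", bounds_error=False
)
new_data = np.array(
    [[nearest_function([i, j]) for j in new_lat] for i in new_lon]
).squeeze()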
That isn't perfect, though, because the values at lon=175 are all fill values. (If I hadn't set bounds_error=False, you'd get an error there.) In that case, you need to decide how you want to wrap around the dateline. A straightforward solution would be to copy the lon=-180 line to the end of the array and call it lon=180.
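The wrap-around fix could look roughly like this. It is a sketch with random stand-in data on a 10 degree grid; it copies the first longitude row (lon=-180), since that is the same meridian as lon=180, and the names wrapped_lon and wrapped_data are just illustrative.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

old_lon = np.arange(-180.0, 180.0, 10.0)
old_lat = np.arange(-90.0, 90.0, 10.0)
old_data = np.random.random((len(old_lon), len(old_lat)))  # stand-in data

# Append a copy of the first longitude row and label it lon=180
wrapped_lon = np.append(old_lon, 180.0)
wrapped_data = np.concatenate([old_data, old_data[:1, :]], axis=0)

unwrapped = RegularGridInterpolator(
    (old_lon, old_lat), old_data, method="nearest", bounds_error=False
)
wrapped = RegularGridInterpolator(
    (wrapped_lon, old_lat), wrapped_data, method="nearest", bounds_error=False
)

point = [175.0, 0.0]
before = unwrapped(point)  # beyond the last grid line: fill value (NaN)
after = wrapped(point)     # inside the padded grid: a real value
```

The latitude axis has the same edge effect past the last grid line, but there is nothing to wrap there, so those points would need a different choice (clamping, or extending the input grid).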
Should you want linear or higher order interpolation one day, which I'd recommend if your data are points rather than cells, you can use scipy.interpolate.RectBivariateSpline:
import numpy as np
from scipy.interpolate import RectBivariateSpline
old_step = 10
old_lon = np.arange(-180, 180, old_step)
old_lat = np.arange(-90, 90, old_step)
old_data = np.random.random((len(old_lon), len(old_lat)))
interp_function = RectBivariateSpline(old_lon, old_lat, old_data, kx=1, ky=1)
new_step = 5  # assumed value; the original snippet did not define it
new_lon = np.arange(-180, 180, new_step)
new_lat = np.arange(-90, 90, new_step)
new_data = interp_function(new_lon, new_lat)
I'm reasonably new to Python, and I'm trying to plot long-term mean rainfall data for the African continent. I have various NetCDF files, which have already been cut to just contain the long term mean value - I just need to plot it.
My issue is that the data is only plotting to the right of the 0 degree longitude line. I gather this is due to Basemap wanting -180 to 180 coordinates, and my data is 0 to 360. However, nothing I've tried seems to work.
Here's the code (which gives the correct plot, just cut off to the left of 0 degrees):
nc = Dataset('GISS-E2-H_MAM_plots.nc')
prcp = nc.variables['pr'][0,:,:]
pr = 86400*prcp[:]
lon=nc.variables['lon']
lat=nc.variables['lat']
[lonall, latall] = np.meshgrid(lon, lat)
fig = plt.figure()
m = Basemap(projection='cyl', llcrnrlat=-25, urcrnrlat=15, llcrnrlon=-20, urcrnrlon=60)
m.drawcoastlines()
m.drawcountries()
m.drawparallels(np.arange(-90.,90.,10.), labels = [1,0,0,0], fontsize = 10)
m.drawmeridians(np.arange(-180., 180., 10.), labels = [0,0,0,1], fontsize = 10)
levels=np.arange(2, 11.6, 0.8)
mymapf = plt.contourf(lonall, latall, pr, levels, cmap=plt.cm.gist_rainbow_r)
I've tried to shift the data by 180 using the following, and then np.roll to move it all along.
lonall= lonall-180
nlon=len(lonall)
pr=np.roll(pr, nlon/2, axis=1)
This worked for a colleague in a similar instance, but hasn't worked for me.
Any help would be greatly appreciated!
I think the problem is that you don't have [:] after you read in latitude and longitude. I.e. change the above lines to:
lon=nc.variables['lon'][:]
lat=nc.variables['lat'][:]
Also, you don't need the brackets around [lonall,latall]
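If the data still only covers 0 to 360 after that, one option is to relabel the longitudes and reorder the data columns to match, instead of rolling by a hand-computed offset. This is a minimal sketch with a made-up grid and random values standing in for the rainfall field; it assumes the data is laid out as (lat, lon), as in the question.

```python
import numpy as np

lon = np.arange(0.0, 360.0, 2.5)             # hypothetical 0..360 longitudes
lat = np.arange(-90.0, 90.1, 2.0)            # hypothetical latitudes
pr = np.random.random((lat.size, lon.size))  # stand-in for the (lat, lon) field

# Relabel longitudes above 180 as negative values
lon180 = np.where(lon > 180, lon - 360, lon)

# Sort so the longitudes increase monotonically, and reorder
# the longitude axis of the data in exactly the same way
order = np.argsort(lon180)
lon_shifted = lon180[order]
pr_shifted = pr[:, order]

lonall, latall = np.meshgrid(lon_shifted, lat)
```

lonall, latall and pr_shifted can then be passed to contourf in place of the original arrays.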
I have interpolated a function on a grid with scipy.interpolate.griddata like so:
interpolated_quantity = scipy.interpolate.griddata(old_points, old_array, (grid_x, grid_y, grid_z), method='nearest')
What I would like to have is a set of four 1-D arrays: three with the position of each cell and one with the corresponding value of the interpolated quantity in each cell.
So far I'm using a very slow and time consuming operation:
arrays = {"x": [], "y": [], "z": [], "prop": []}
base_gridx = linspace(xmin,xmax,abs(ngridx)+1)
base_gridy = linspace(ymin,ymax,abs(ngridy)+1)
base_gridz = linspace(zmin,zmax,abs(ngridz)+1)
cx = (base_gridx[1:]+base_gridx[:-1])/2.
cy = (base_gridy[1:]+base_gridy[:-1])/2.
cz = (base_gridz[1:]+base_gridz[:-1])/2.
data_len = len(cx)*len(cy)*len(cz)
for ii in arange(0,len(cx)):
    for jj in arange(0,len(cy)):
        for kk in arange(0,len(cz)):
            arrays["x"].append(cx[ii])
            arrays["y"].append(cy[jj])
            arrays["z"].append(cz[kk])
            arrays["prop"].append(interpolated_quantity[ii][jj][kk])
This works, but it just takes a huge amount of time. Do you think there might be a faster way to do this? Maybe using ravel?
It is as simple as you suggest. The four arrays are:
grid_x.ravel()
grid_y.ravel()
grid_z.ravel()
interpolated_quantity.ravel()
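A quick way to convince yourself that ravel gives the same ordering as the triple loop is to compare the two on a tiny grid. This sketch assumes the grids were built with np.meshgrid(..., indexing='ij'), so that axis 0 follows x, axis 1 follows y and axis 2 follows z; the cell centres and values are made up.

```python
import numpy as np

cx = np.array([0.5, 1.5])
cy = np.array([10.0, 20.0, 30.0])
cz = np.array([-1.0, 1.0])

# indexing='ij' keeps the axis order (x, y, z), matching values[ii][jj][kk]
grid_x, grid_y, grid_z = np.meshgrid(cx, cy, cz, indexing='ij')
values = np.arange(grid_x.size, dtype=float).reshape(grid_x.shape)  # stand-in data

# Vectorised version: flatten everything in C order (last axis varies fastest)
flat = {
    "x": grid_x.ravel(),
    "y": grid_y.ravel(),
    "z": grid_z.ravel(),
    "prop": values.ravel(),
}

# Loop version from the question, for comparison
loop = {"x": [], "y": [], "z": [], "prop": []}
for ii in range(len(cx)):
    for jj in range(len(cy)):
        for kk in range(len(cz)):
            loop["x"].append(cx[ii])
            loop["y"].append(cy[jj])
            loop["z"].append(cz[kk])
            loop["prop"].append(values[ii][jj][kk])
```

If the grids were instead built with the default indexing='xy', the first two axes are swapped and the flattened order would no longer match this loop.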
Suppose I've been driving a set route with a 3g modem and GPS on my laptop, while my computer back at home records the ping delay. I've correlated ping with GPS lat/long, and now I'd like to visualise this data.
I've got about 80,000 points of data per day, and I'd like to display several months' worth. I'm especially interested in displaying areas where ping consistently times out (i.e. ping == 1000).
Scatter plot
My first attempt was with a scatter plot, with one point per data entry. I made the size of the point 5x larger if it was a timeout, so it was obvious where these areas were. I also dropped the alpha to 0.1, for a crude way to see overlaid points.
# Colour
c = pings
# Size
s = [2 if ping < 1000 else 10 for ping in pings]
# Scatter plot
plt.scatter(longs, lats, s=s, marker='o', c=c, cmap=cm.jet, edgecolors='none', alpha=0.1)
The obvious problem with this is that it displays one marker per data point, which is a very poor way to display large amounts of data. If I've driven past the same area twice, then the first pass data is just displayed on top of the second pass.
Interpolate over an even grid
I then had a try at using numpy and scipy to interpolate over an even grid.
# Convert python list to np arrays
x = np.array(longs, dtype=float)
y = np.array(lats, dtype=float)
z = np.array(pings, dtype=float)
# Make even grid (200 rows/cols)
xi = np.linspace(min(longs), max(longs), 200)
yi = np.linspace(min(lats), max(lats), 200)
# Interpolate data points to grid
zi = griddata((x, y), z, (xi[None,:], yi[:,None]), method='linear', fill_value=0)
# Plot contour map
plt.contour(xi,yi,zi,15,linewidths=0.5,colors='k')
plt.contourf(xi,yi,zi,15,cmap=plt.cm.jet)
From this example
This looks interesting (lots of colours and shapes), but it extrapolates too far around areas I haven't explored. You can't see the routes I've travelled, just red/blue blotches.
If I've driven in a large curve, it'll interpolate for the area between (see below):
Interpolate over an uneven grid
I then had a try at using meshgrid (xi, yi = np.meshgrid(lats, longs)) instead of a fixed grid, but I'm told my array is too big.
Is there an easy way I can create a grid from my points?
My requirements:
Handle large data sets (80,000 x 60 = ~5m points)
Display duplicate data for each point either by averaging (I assume interpolation will do this), or by taking a minimum value for each point.
Don't extrapolate too far from data points
I'm happy with a scatter plot (top), but I need some way to average the data before I display it.
(Apologies for the dodgy mspaint drawings, I can't upload actual data)
Solution:
# Get sum
hsum, long_range, lat_range = np.histogram2d(longs, lats, bins=(res_long,res_lat), range=((a,b),(c,d)), weights=pings)
# Get count
hcount, ignore1, ignore2 = np.histogram2d(longs, lats, bins=(res_long,res_lat), range=((a,b),(c,d)))
# Get average (only for bins that contain at least one point, which avoids 0/0)
x, y = np.where(hcount > 0)
average = hsum[x, y] / hcount[x, y]
# Make scatter plot
scatterplot = ax.scatter(long_range[x], lat_range[y], s=3, c=average, linewidths=0, cmap="jet", vmin=0, vmax=1000)
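To make the sum/count trick concrete, here is a tiny self-contained version with five made-up points and a 2x2 grid. The values and the extent are purely illustrative, and it uses bin centres instead of bin edges for the plotting positions, which is a small deviation from the snippet above.

```python
import numpy as np

# Five made-up samples: two in the lower-left cell, three in the upper-right
longs = np.array([0.1, 0.2, 0.8, 0.9, 0.9])
lats = np.array([0.1, 0.2, 0.8, 0.9, 0.9])
pings = np.array([100.0, 300.0, 1000.0, 1000.0, 400.0])

bins = (2, 2)
extent = ((0.0, 1.0), (0.0, 1.0))

# Sum of pings per cell (weights) and number of samples per cell
hsum, long_edges, lat_edges = np.histogram2d(
    longs, lats, bins=bins, range=extent, weights=pings
)
hcount, _, _ = np.histogram2d(longs, lats, bins=bins, range=extent)

# Average only where there is data, so empty cells never divide by zero
x, y = np.nonzero(hcount)
average = hsum[x, y] / hcount[x, y]

# Cell centres are usually a better plotting position than the lower edges
long_centres = 0.5 * (long_edges[:-1] + long_edges[1:])
lat_centres = 0.5 * (lat_edges[:-1] + lat_edges[1:])
```

Here average comes out as 200 for the first cell and 800 for the second, and long_centres[x], lat_centres[y] give the positions to pass to scatter.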
To simplify your question, you have two sets of points: one for ping<1000 and one for ping>=1000.
Since the number of points is very large, you can't plot them directly with scatter(). I created some sample data with:
longs = (np.random.rand(60, 1) + np.linspace(-np.pi, np.pi, 80000)).reshape(-1)
lats = np.sin(longs) + np.random.rand(len(longs)) * 0.1
bad_index = (longs>0) & (longs<1)
bad_longs = longs[bad_index]
bad_lats = lats[bad_index]
(longs, lats) are the points with ping<1000, and (bad_longs, bad_lats) are the points with ping>=1000.
You can use numpy.histogram2d() to count the points:
ranges = [[np.min(lats), np.max(lats)], [np.min(longs), np.max(longs)]]
h, lat_range, long_range = np.histogram2d(lats, longs, bins=(400,400), range=ranges)
bad_h, lat_range2, long_range2 = np.histogram2d(bad_lats, bad_longs, bins=(400,400), range=ranges)
h and bad_h are the point counts in every little square area.
Then you can choose from many methods to visualize it. For example, you can plot it with scatter():
y, x = np.where(h)
count = h[y, x]
pl.scatter(long_range[x], lat_range[y], s=count/20, c=count, linewidths=0, cmap="Blues")
count = bad_h[y, x]
pl.scatter(long_range2[x], lat_range2[y], s=count/20, c=count, linewidths=0, cmap="Reds")
pl.show()
Here is the full code:
import numpy as np
import pylab as pl
longs = (np.random.rand(60, 1) + np.linspace(-np.pi, np.pi, 80000)).reshape(-1)
lats = np.sin(longs) + np.random.rand(len(longs)) * 0.1
bad_index = (longs>0) & (longs<1)
bad_longs = longs[bad_index]
bad_lats = lats[bad_index]
ranges = [[np.min(lats), np.max(lats)], [np.min(longs), np.max(longs)]]
h, lat_range, long_range = np.histogram2d(lats, longs, bins=(300,300), range=ranges)
bad_h, lat_range2, long_range2 = np.histogram2d(bad_lats, bad_longs, bins=(300,300), range=ranges)
y, x = np.where(h)
count = h[y, x]
pl.scatter(long_range[x], lat_range[y], s=count/20, c=count, linewidths=0, cmap="Blues")
count = bad_h[y, x]
pl.scatter(long_range2[x], lat_range2[y], s=count/20, c=count, linewidths=0, cmap="Reds")
pl.show()
The output figure is:
The GDAL libraries, including the Python API and associated utilities (particularly gdal_grid), should work for you. gdal_grid includes a number of interpolation and averaging methods and options for generating gridded data from scattered points. You should be able to manipulate the grid cell size to get a pleasing resolution.
GDAL handles a number of data formats, but you should be able to pass your coordinates and ping values as CSV and get back a PNG or JPEG without much trouble.
Keep in mind that lat/lon data is not a planar coordinate system. If you intend to combine your results with other map data, you'll have to figure out what map projection, units, etc. to use.