Matplotlib basemap reading shapefile is very slow - python

I am trying to plot a simple 'merc' map with boundaries read from a shapefile. The shapefile, ne_10m_admin_0_countries_lakes.shp, is only 8 MB. A simple MSLP surface plot from GFS data took more than 28 s, which seems far too long. After investigating, I found that reading the shapefile alone takes around 10 s. A minimal script that demonstrates the issue:
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
south = 0
north = 5
west = 70
east = 85
m = Basemap(projection='merc', llcrnrlat=south, urcrnrlat=north,
            llcrnrlon=west, urcrnrlon=east, resolution='c')
m.readshapefile('data/gis-data/world_countries/'
                'ne_10m_admin_0_countries_lakes',
                'ne_10m_admin_0_countries_lakes', linewidth=0.7)
plt.savefig('map.png')
To show the problem, I ran the script above with and without the readshapefile call. Here is the time the script took when reading the shapefile:
$ time python test.py
real 0m18.234s
user 0m17.832s
sys 0m1.020s
Here is the result without reading the shapefile:
$ time python test.py
real 0m2.506s
user 0m2.360s
sys 0m0.324s
Is there any way to read the shape file quickly? Is there any solution/trick for this issue?
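The with/without comparison above can also be made inside a single run by timing each stage separately. The following is a minimal sketch using only the standard library; the `timed` helper and the stage labels are hypothetical names, and the `time.sleep` calls merely stand in for the `Basemap(...)` and `m.readshapefile(...)` calls of the script.

```python
import time
from contextlib import contextmanager

# Collected wall-clock timings, keyed by stage label.
timings = {}


@contextmanager
def timed(label):
    """Record the wall-clock seconds spent inside the with-block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start


# Placeholders: replace the sleeps with the real calls from the script.
with timed("create map"):
    time.sleep(0.01)   # stands in for Basemap(...)
with timed("read shapefile"):
    time.sleep(0.05)   # stands in for m.readshapefile(...)

# Print the stages, slowest first.
for label, seconds in sorted(timings.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{label:>15s}: {seconds:.3f} s")
```

Timing the stages in-process excludes interpreter start-up and import time, which `time python test.py` includes in both measurements.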

This question might also be worth posting on Geographic Information Systems; people there may be able to help. If basemap is slow, have you tried cartopy? I saw it listed on the matplotlib site: http://matplotlib.org/1.4.1/mpl_toolkits/index.html
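Independently of the library used, one possible trick (not suggested in the thread, so treat it as an assumption to test) is to avoid processing shapes that cannot appear in the map window. The map in the question covers only 0 to 5°N and 70 to 85°E, while the shapefile covers the whole world. The sketch below shows the bounding-box test in plain Python; the record names and boxes are made up for illustration and are not taken from the shapefile.

```python
def bbox_intersects(a, b):
    """Return True if two (min_lon, min_lat, max_lon, max_lat) boxes overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


# Map window from the question, as (west, south, east, north).
map_bbox = (70.0, 0.0, 85.0, 5.0)

# Hypothetical (name, bbox) pairs standing in for shapefile records.
records = [
    ("shape_a", (72.0, -1.0, 74.0, 7.0)),    # overlaps the window
    ("shape_b", (-10.0, 35.0, 5.0, 45.0)),   # far outside the window
    ("shape_c", (84.0, 4.0, 90.0, 10.0)),    # overlaps one corner
]

# Keep only the shapes whose bounding box touches the map window.
kept = [name for name, bbox in records if bbox_intersects(bbox, map_bbox)]
print(kept)
```

The same filter could be applied once, offline, to write a smaller regional shapefile, so that the plotting script only ever reads the shapes it needs.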

Related

How to plot the free energy landscape of protein structure?

I understand this question may not be appropriate for this platform, but I am hoping to get some hints.
I've been trying to plot the free energy landscape of a protein structure ("Chignolin") and have completely run out of ideas. I have an MD simulation trajectory file and am using pyemma to plot the energy landscape, but I'm getting this error:
""
TypeError: plot_free_energy() takes from 2 to 20 positional arguments but 28 were given
""
Could someone figure out where the problem lies?
Here is my code
import numpy as np
import matplotlib.pyplot as plt
import mdtraj as md
from itertools import combinations
from simtk.openmm.app.topology import Topology
from simtk.openmm.app.simulation import Simulation
from simtk.openmm.app.dcdreporter import DCDReporter
from simtk.openmm.app.statedatareporter import StateDataReporter
import simtk.unit as u
import simtk.openmm as mm
import simtk.openmm.openmm as openmm
import pyemma.coordinates as coor
import pyemma
pdb = md.load('1uao_Calpha.pdb')
feat = pyemma.coordinates.data.MDFeaturizer(pdb)
feat.add_distances_ca(periodic=False)
files = pyemma.coordinates.load('traj/DESRES/CLN025-0-c-alpha/CLN025-0-c-alpha-005.dcd', features = feat)
pyemma.plots.plot_free_energy(*files.T)
plt.show()
Here is another PDB file.
I recommend you start by reading the documentation, especially the "learn PyEMMA" section, which contains Jupyter notebooks that teach the workflow for extracting properly weighted "pseudo" free-energy surfaces. Usually these surfaces are drawn in the dimensions of the two slowest dynamical processes, but any other combination is possible as well. These dimensions are defined by a TICA or VAMP projection, which are methods for extracting the slow modes from your data; for proteins these include folding and other rare events.
As a primer I suggest reading this tutorial first, as it gives a brief overview of how to load and process your data to extract the slow modes. Note that it does not yet cover Markov state modelling, so read on in the other examples to learn about that.
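To make the idea of a two-dimensional free-energy surface concrete: judging by the error message, `*files.T` passes one positional argument per feature column (28 in total), whereas such a surface is built from exactly two coordinates. The sketch below is not PyEMMA's implementation; it is a plain-Python illustration, assuming unweighted frames, of histogramming two coordinates and converting populations to free energies via F = kT ln(p_max / p). The function name `free_energy_2d` and the synthetic Gaussian samples are hypothetical stand-ins for two projected coordinates of a real trajectory.

```python
import math
import random
from collections import Counter


def free_energy_2d(x, y, bins=20, kT=1.0):
    """Histogram (x, y) samples and return {(ix, iy): F} for populated cells.

    F = kT * ln(n_max / n), so the most populated cell has F = 0.
    """
    x_min, x_max = min(x), max(x)
    y_min, y_max = min(y), max(y)

    def bin_index(value, lo, hi):
        # Map a value to 0..bins-1; the maximum falls into the last bin.
        if hi == lo:
            return 0
        return min(int((value - lo) / (hi - lo) * bins), bins - 1)

    counts = Counter(
        (bin_index(a, x_min, x_max), bin_index(b, y_min, y_max))
        for a, b in zip(x, y)
    )
    n_max = max(counts.values())
    return {cell: kT * math.log(n_max / n) for cell, n in counts.items()}


# Synthetic stand-in for a trajectory projected onto two slow coordinates.
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(5000)]
y = [rng.gauss(0.0, 1.0) for _ in range(5000)]

surface = free_energy_2d(x, y)
print("populated cells:", len(surface))
print("highest free energy (kT):", round(max(surface.values()), 2))
```

With real data, the two input lists would be two columns selected from the projected trajectory (for example the first two TICA components) rather than all feature columns at once.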

Python mapping: plotting an electoral boundary over a street map?

I would have thought this would be so simple it would be almost example 1 in any mapping documentation. But it seems not... I want to map the boundary of an electorate over a street map, using cartopy. I can download the GIS data of the electorates in either MapInfo or ShapeFile form. When I tried to do this a year ago, the only way I could find to do it was to extract the lat/long coordinates of the MapInfo polygon, and plot them with matplotlib.
I'm trying to be a bit more elegant this year. With the MapInfo file, I can isolate my particular electorate with
import geopandas as gpd
v = gpd.read_file('VicMaps/vic-july-2018-mid-mif/E_VIC18.MIF')
cg = v.loc[7].geometry
My efforts to extract a particular single boundary from a shapefile are given below.
The other issue is that when I try to run this in Jupyter, any attempt at plotting a map causes the kernel (Python 3.4) to crash.
There must be examples of this somewhere, but so far I haven't found an example which works with my data. This is what I have so far, cobbled together from various helpful answers to other people's questions:
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
from cartopy.io.shapereader import Reader
from cartopy.io.img_tiles import OSM
shp = Reader('VicMaps/E_AUGFN3_region.shp')
fig = plt.figure(figsize=(16,16))
tiler = OSM()
ax = plt.axes(projection=tiler.crs)
ax.set_extent([144.956158, 145.085398, -37.813662, -37.690999])
for r, g in zip(shp.records(), shp.geometries()):
    if r.attributes['Elect_div'] == 'Cooper':
        ax.add_geometries(g, ccrs.Geodetic())
plt.show()
But what happens is that the kernel just dies "unexpectedly".
If anybody could point me in the direction of a solution, I'd be delighted! Also: I'm not wedded to cartopy; if there's a better package I'll use it.
Thanks!

How do I plot GFS grib2 data with Python?

I would like to have a chart with the temperatures for the following days on my website, and the Global Forecasting System meets my needs the most. How do I plot the GRIB2 data in matplotlib and create a PNG image from the plot?
I've spent hours searching the internet and asking people who know how to do this (they were not helpful at all), and I don't know where to start.
GFS data can be found here: ftp://ftp.ncep.noaa.gov/pub/data/nccf/com/gfs/prod/
If possible, I'd like it to be lightweight and not use too much server space.
If you want to be lightweight in data usage and storage, consider data formats other than GRIB. GRIB files usually contain worldwide data, most of which is useless when you only want to plot a specific domain.
I strongly recommend using data from the NOAA-NCEP OPeNDAP data server. You can retrieve data from this server using netCDF4. Unfortunately, the server is known to be unstable at times, which may cause delays in refreshing runs and/or malformed datasets. Still, about 95% of the time I have access to all the data I need.
Note: This data server may be slow due to heavy traffic after a new run is released. Access to the data server can be found here: http://nomads.ncdc.noaa.gov/data.php?name=access#hires_weather_datasets
Plotting data is pretty easy with Matplotlib and Basemap toolkits. Some examples, including usage of GFS-datasets, can be found here: http://matplotlib.org/basemap/users/examples.html
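To illustrate the point about plotting only a specific domain, the sketch below computes which index range of a regular global grid covers a requested bounding box, so that only that slab needs to be read. It assumes a 1-degree grid with longitudes 0 to 359 and latitudes -90 to 90; the helper name `index_range` and the example domain are hypothetical.

```python
import math


def index_range(start, step, count, lo, hi):
    """Return (first, last) inclusive indices of grid points inside [lo, hi].

    The axis is assumed regular: start, start + step, ... with `count`
    points and step > 0.
    """
    first = max(0, math.ceil((lo - start) / step))
    last = min(count - 1, math.floor((hi - start) / step))
    if first > last:
        raise ValueError("requested interval contains no grid points")
    return first, last


# Assumed 1-degree global grid: 360 longitudes from 0, 181 latitudes from -90.
lon_idx = index_range(0.0, 1.0, 360, 70.0, 85.0)
lat_idx = index_range(-90.0, 1.0, 181, 0.0, 5.0)

n_points = (lon_idx[1] - lon_idx[0] + 1) * (lat_idx[1] - lat_idx[0] + 1)
print("lon indices:", lon_idx)
print("lat indices:", lat_idx)
print("points in domain:", n_points, "of", 360 * 181)
```

The resulting indices can be used to slice a variable, for example `var[0, lat_idx[0]:lat_idx[1] + 1, lon_idx[0]:lon_idx[1] + 1]`; as far as I know, slicing a remote netCDF4 dataset this way requests only the selected slab.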
Basically, there are 2 steps:
1. Use wgrib2 to extract selected variables from the GRIB2 data and save them into a NetCDF file. Although there are APIs such as pygrib, I found it less buggy to use the command-line tool directly. Some useful links:
install: http://www.cpc.ncep.noaa.gov/products/wesley/wgrib2/compile_questions.html
tricks: http://www.ftp.cpc.ncep.noaa.gov/wd51we/wgrib2/tricks.wgrib2
For example, extract temperature and humidity:
wgrib2 test.grb2 -s | egrep '(:RH:2 m above ground:|:TMP:2 m above ground:)'|wgrib2 -i test.grb2 -netcdf test.nc
2. Use Python libraries to process the NetCDF file. Example code may look like this:
import warnings
warnings.filterwarnings("ignore")
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# %matplotlib inline  (IPython/Jupyter magic; uncomment only in a notebook)
from netCDF4 import Dataset
from mpl_toolkits.basemap import Basemap
from pyproj import Proj
import matplotlib.cm as cm
import datetime
file = "test.nc"
rootgrp = Dataset(file, "r")
x = rootgrp['longitude'][:] # 0-359, step = 1
y = rootgrp['latitude'][:] # -90~90, step =1
tmp = rootgrp['TMP_2maboveground'][:][0] # shape(181,360)
dt = datetime.datetime(1970,1,1) + datetime.timedelta(seconds = rootgrp['time'][0])
fig = plt.figure(dpi=150)
m = Basemap(projection='mill',lat_ts=10,llcrnrlon=x.min(),
urcrnrlon=x.max(),llcrnrlat=y.min(),urcrnrlat=y.max(), resolution='c')
xx, yy = m(*np.meshgrid(x,y))
m.pcolormesh(xx,yy,tmp-273.15,shading='flat',cmap=plt.cm.jet)
m.colorbar(location='right')
m.drawcoastlines()
m.drawparallels(np.arange(-90.,120.,30.), labels=[1,0,0,0], fontsize=10)
m.drawmeridians(np.arange(0.,360.,60.), labels=[0,0,0,1], fontsize=10)
plt.title("{}, GFS, Temperature (C) ".format(dt.strftime('%Y-%m-%d %H:%M UTC')))
plt.show()
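The extraction step above (the wgrib2 inventory filtered by egrep, then fed back to `wgrib2 -i`) can also be driven from Python when the plot is generated by a scheduled script. The sketch below is one way to do that, assuming wgrib2 is on the PATH; the function names are hypothetical, and Python's `re` module takes the place of egrep. Only the pattern-building and filtering helpers run here; `extract_to_netcdf` is defined but not called.

```python
import re
import subprocess


def build_pattern(variables, level):
    """Build the alternation used in the shell example, e.g. '(:RH:<level>:|:TMP:<level>:)'."""
    return "(" + "|".join(f":{name}:{level}:" for name in variables) + ")"


def filter_inventory(inventory, pattern):
    """Keep only the inventory lines that match `pattern` (the role egrep plays)."""
    return "".join(
        line + "\n" for line in inventory.splitlines() if re.search(pattern, line)
    )


def extract_to_netcdf(grib_path, nc_path, variables, level):
    """Run `wgrib2 -s`, filter the inventory, and pipe it to `wgrib2 -i ... -netcdf`."""
    inventory = subprocess.run(
        ["wgrib2", grib_path, "-s"], check=True, capture_output=True, text=True
    ).stdout
    selected = filter_inventory(inventory, build_pattern(variables, level))
    subprocess.run(
        ["wgrib2", "-i", grib_path, "-netcdf", nc_path],
        input=selected, check=True, text=True,
    )


pattern = build_pattern(["RH", "TMP"], "2 m above ground")
print(pattern)
```

A call such as `extract_to_netcdf("test.grb2", "test.nc", ["RH", "TMP"], "2 m above ground")` would then mirror the shell pipeline shown in step 1.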

Trying to save vtk files with tvtk made from NumPy arrays

I'm trying to use tvtk (the package included with Enthought's Canopy) to turn some arrays into .vtk data that I can toss over to VisIt (mayavi complains on my OS, Mac OS X). I found what looked like the solution here (Exporting a 3D numpy to a VTK file for viewing in Paraview/Mayavi), but I'm not getting the output that the author of the answer does, and was wondering if anyone could tell me what I'm doing wrong. I enter these commands in the Canopy notebook:
import numpy as np
from enthought.tvtk.api import tvtk, write_data
data = np.random.random((10,10,10))
grid = tvtk.ImageData(spacing=(10, 5, -10), origin=(100, 350, 200),
dimensions=data.shape)
grid.point_data.scalars = np.ravel([], order='F')
grid.point_data.scalars.name = 'Test Data'
# Writes legacy ".vtk" format if filename ends with "vtk", otherwise
# this will write data using the newer xml-based format.
write_data(grid, '/Users/Epictetus/Documents/Dropbox/Work/vtktest.vtk')
which does create a vtk file, but unlike the output the author of the previous answer suggests, I just get a blank output,
# vtk DataFile Version 3.0
vtk output
ASCII
DATASET STRUCTURED_POINTS
DIMENSIONS 10 10 10
SPACING 10 5 -10
ORIGIN 100 350 200
Is it obvious what I'm doing wrong? File I/O has never been my forte...
Cheers!
-user2275987
Change the line
grid.point_data.scalars = np.ravel([], order='F')
to
grid.point_data.scalars = data.ravel(order='F')
Your grid doesn't have any data, and hence nothing is saved to the vtk file! :-)
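The difference between the two lines is easy to check with NumPy alone. In the sketch below a small array of known values (an assumption made for readability; the question uses random data) shows that `np.ravel([], order='F')` flattens an empty list, while `data.ravel(order='F')` flattens the actual array with the first axis varying fastest.

```python
import numpy as np

# Small array with known values so the ordering is easy to follow.
data = np.arange(24).reshape(2, 3, 4)

empty = np.ravel([], order='F')    # flattens an empty list: no values at all
flat_f = data.ravel(order='F')     # Fortran order: first axis varies fastest
flat_c = data.ravel(order='C')     # C order: last axis varies fastest

print(empty.size)    # 0
print(flat_f.size)   # 24
print(flat_f[:4])    # [ 0 12  4 16]
print(flat_c[:4])    # [0 1 2 3]
```

As I understand it, Fortran order is used in the answer because VTK stores structured point data with the x index varying fastest.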

Reduce the size of .eps figure made using matplotlib

Today I was doing a report for a course and I needed to include a figure of a contour plot of some field. I did this with matplotlib (ignore the chaotic header):
import numpy as np
import matplotlib
from matplotlib import rc
rc('font',**{'family':'sans-serif','sans-serif':['Helvetica']})
## for Palatino and other serif fonts use:
#rc('font',**{'family':'serif','serif':['Palatino']})
rc('text', usetex=True)
from matplotlib.mlab import griddata
import matplotlib.pyplot as plt
import numpy.ma as ma
from numpy.random import uniform
from matplotlib.colors import LogNorm
fig = plt.figure()
data = np.genfromtxt('Isocurvas.txt')
matplotlib.rcParams['xtick.direction'] = 'out'
matplotlib.rcParams['ytick.direction'] = 'out'
rc('text', usetex=True)
rc('font', family='serif')
x = data[:,0]
y = data[:,1]
z = data[:,2]
# define grid.
xi = np.linspace(0.02,1, 100)
yi = np.linspace(0.02,1.3, 100)
# grid the data.
zi = griddata(x,y,z,xi,yi)
# contour the gridded data.
CS = plt.contour(xi,yi,zi,25,linewidths=0,colors='k')
CS = plt.contourf(xi,yi,zi,25,cmap=plt.cm.jet)
plt.colorbar() # draw colorbar
# plot data points.
plt.scatter(x,y,marker='o',c='b',s=0)
plt.xlim(0.01,1)
plt.ylim(0.01,1.3)
plt.ylabel(r'$t$')
plt.xlabel(r'$x$')
plt.title(r' Contour de $\rho(x,t)$')
plt.savefig("Isocurvas.eps", format="eps")
plt.show()
where "Isocurvas.txt" is a 3-column file, which I really don't want to touch (eliminating data, or something like that, wouldn't work for me). My problem was that the figure size was 1.8 MB, which is too much for me. The figure itself was bigger than the whole rest of the report, and when I opened the PDF it wasn't very smooth.
So, my question is:
Are there any ways of reducing this size without sacrificing the quality of the figure? I'm looking for any solution, not necessarily Python-related.
This is the .png figure, with a slight variation in parameters. With .png you can see the pixels, which I don't like very much, so PDF or EPS is preferable.
Thank you.
The scatter plot is what's causing the large size. Using the EPS backend, I used your data to create the figures. Here are the file sizes I got:
Straight from your example: 1.5 MB
Without the scatter plot: 249 KB
With a raster scatter plot: 249 KB
In your particular example it's unclear why you want the scatter (not visible). But for future problems, you can use the rasterized=True keyword on the call to plt.scatter to activate a raster mode. In your example you have 12625 points in the scatter plot, and in vector mode that's going to take a bit of space.
Another trick that I use to trim down vector images from matplotlib is the following:
1. Save the figure as EPS.
2. Run epstopdf (available with a TeX distribution) on the resulting file.
This will generally give you a smaller PDF than matplotlib's default, and the quality is unchanged. For your example, using the EPS file without the scatter, it produced a 73 KB PDF, which seems quite reasonable. If you really want a vector scatter plot, running epstopdf on the original 1.5 MB EPS file produced a 198 KB PDF on my system.
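The `rasterized=True` suggestion above can be tried in isolation before touching the real figure. The sketch below saves the same scatter plot to EPS twice, once as vector and once with only the scatter layer rasterized, and prints both file sizes. The random points, the marker size, and the `dpi` value are assumptions for illustration; which variant comes out smaller depends on the number of points and the resolution, so it is worth measuring with your own data.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend, no display needed
import matplotlib.pyplot as plt
import numpy as np


def save_scatter(path, rasterized):
    """Save a scatter plot as EPS and return the file size in bytes.

    With rasterized=True only the scatter layer becomes a bitmap;
    axes, ticks, and labels stay vector.
    """
    rng = np.random.default_rng(0)
    x, y = rng.random(2000), rng.random(2000)
    fig, ax = plt.subplots()
    ax.scatter(x, y, s=4, rasterized=rasterized)
    ax.set_xlabel("x")
    ax.set_ylabel("t")
    fig.savefig(path, format="eps", dpi=150)
    plt.close(fig)
    return os.path.getsize(path)


with tempfile.TemporaryDirectory() as tmp:
    sizes = {
        "vector": save_scatter(os.path.join(tmp, "vector.eps"), rasterized=False),
        "raster": save_scatter(os.path.join(tmp, "raster.eps"), rasterized=True),
    }

for kind, size in sizes.items():
    print(f"{kind}: {size / 1024:.0f} KB")
```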
I'm not sure if it helps with size, but if you're willing to try the matplotlib 1.2 release candidate, there is a new backend for producing PGF images (designed to slot straight into LaTeX seamlessly). You can find the docs for that here: http://matplotlib.org/1.2.0/users/whats_new.html#pgf-tikz-backend
If you do decide to give it a shot and you have any questions, I'm probably not the best person to talk to, so I would recommend emailing the matplotlib-users mailing list.
HTH,
Try removing the scatter plot of your data. They do not appear to be visible in your final figure (because you made them size 0) and may be taking up space in your eps.
EDITED: to completely change the answer because I read the question wrong.
