Suppose I've been driving a set route with a 3g modem and GPS on my laptop, while my computer back at home records the ping delay. I've correlated ping with GPS lat/long, and now I'd like to visualise this data.
I've got about 80,000 points of data per day, and I'd like to display several month's worth. I'm especially interested in displaying areas where ping consistently times out (ie ping == 1000).
Scatter plot
My first attempt was with a scatter plot, with one point per data entry. I made the size of the point 5x larger if it was a timeout, so it was obvious where these areas were. I also dropped the alpha to 0.1, for a crude way to see overlaid points.
# Colour
c = pings
# Size
s = [2 if ping < 1000 else 10 for ping in pings]
# Scatter plot
plt.scatter(longs, lats, s=s, marker='o', c=c, cmap=cm.jet, edgecolors='none', alpha=0.1)
The obvious problem with this is that it displays one marker per data point, which is a very poor way to display large amounts of data. If I've drive past the same area twice, then the first pass data is just displayed on top of the second pass.
Interpolate over an even grid
I then had a try at using numpy and scipy to interpolate over an even grid.
# Convert python list to np arrays
x = np.array(longs, dtype=float)
y = np.array(lats, dtype=float)
z = np.array(pings, dtype=float)
# Make even grid (200 rows/cols)
xi = np.linspace(min(longs), max(longs), 200)
yi = np.linspace(min(lats), max(lats), 200)
# Interpolate data points to grid
zi = griddata((x, y), z, (xi[None,:], yi[:,None]), method='linear', fill_value=0)
# Plot contour map
plt.contour(xi,yi,zi,15,linewidths=0.5,colors='k')
plt.contourf(xi,yi,zi,15,cmap=plt.cm.jet)
From this example
This looks interesting (lots of colours and shapes), but it extrapolates too far around areas I haven't explored. You can't see the routes I've travelled, just red/blue blotches.
If I've driven in a large curve, it'll interpolate for the area between (see below):
Interpolate over an uneven grid
I then had a try at using meshgrid (xi, yi = np.meshgrid(lats, longs)) instead of a fixed grid, but I'm told my array is too big.
Is there an easy way I can create a grid from my points?
My requirements:
Handle large data sets (80,000 x 60 = ~5m points)
Display duplicate data for each point either by averaging (I assume interpolation will do this), or by taking a minimum value for each point.
Don't extrapolate too far from data points
I'm happy with a scatter plot (top), but I need some way to average the data before I display it.
(Apologies for the dodgy mspaint drawings, I can't upload actual data)
Solution:
# Get sum
hsum, long_range, lat_range = np.histogram2d(longs, lats, bins=(res_long,res_lat), range=((a,b),(c,d)), weights=pings)
# Get count
hcount, ignore1, ignore2 = np.histogram2d(longs, lats, bins=(res_long,res_lat), range=((a,b),(c,d)))
# Get average
h = hsum/hcount
x, y = np.where(h)
average = h[x, y]
# Make scatter plot
scatterplot = ax.scatter(long_range[x], lat_range[y], s=3, c=average, linewidths=0, cmap="jet", vmin=0, vmax=1000)
To simplify your question, you have two set of points, one for ping<1000, one for ping>=1000.
Since the count of points is very large, you can't plot them directly by scatter(). I created some sample data by:
longs = (np.random.rand(60, 1) + np.linspace(-np.pi, np.pi, 80000)).reshape(-1)
lats = np.sin(longs) + np.random.rand(len(longs)) * 0.1
bad_index = (longs>0) & (longs<1)
bad_longs = longs[bad_index]
bad_lats = lats[bad_index]
(longs, lats) is points for ping<1000, (bad_longs, bad_lats) is points for ping>1000
You can use numpy.histogram2d() to count the points:
ranges = [[np.min(lats), np.max(lats)], [np.min(longs), np.max(longs)]]
h, lat_range, long_range = np.histogram2d(lats, longs, bins=(400,400), range=ranges)
bad_h, lat_range2, long_range2 = np.histogram2d(bad_lats, bad_longs, bins=(400,400), range=ranges)
h and bad_h are the points count in every little squere area.
Then you can choose many methods to visualize it. For example, you can plot it by scatter():
y, x = np.where(h)
count = h[y, x]
pl.scatter(long_range[x], lat_range[y], s=count/20, c=count, linewidths=0, cmap="Blues")
count = bad_h[y, x]
pl.scatter(long_range2[x], lat_range2[y], s=count/20, c=count, linewidths=0, cmap="Reds")
pl.show()
Here is the full code:
import numpy as np
import pylab as pl
longs = (np.random.rand(60, 1) + np.linspace(-np.pi, np.pi, 80000)).reshape(-1)
lats = np.sin(longs) + np.random.rand(len(longs)) * 0.1
bad_index = (longs>0) & (longs<1)
bad_longs = longs[bad_index]
bad_lats = lats[bad_index]
ranges = [[np.min(lats), np.max(lats)], [np.min(longs), np.max(longs)]]
h, lat_range, long_range = np.histogram2d(lats, longs, bins=(300,300), range=ranges)
bad_h, lat_range2, long_range2 = np.histogram2d(bad_lats, bad_longs, bins=(300,300), range=ranges)
y, x = np.where(h)
count = h[y, x]
pl.scatter(long_range[x], lat_range[y], s=count/20, c=count, linewidths=0, cmap="Blues")
count = bad_h[y, x]
pl.scatter(long_range2[x], lat_range2[y], s=count/20, c=count, linewidths=0, cmap="Reds")
pl.show()
The output figure is:
The GDAL libraries including the Python API and associated utilities, particularly gdal_grid should work for you. It includes a number of interpolation and averaging methods and options for generating gridded data from scattered points. You should be able to manipulate the grid cell size to get a pleasing resolution.
GDAL handles a number of data formats, but you should be able to pass your coordinates and ping values as CSV and get back a PNG or JPEG without much trouble.
Keep in mind lat/lon data is not a planar coordinate system. If you intend to incorporate you results with other map data you'll have to figure out what map projection, units, etc. to use.
Related
I am trying to segment the time-series data as shown in the figure. I have lots of data from the sensors, any of these data can have different number of isolated peaks region. In this figure, I have 3 of those. I would like to have a function that takes the time-series as the input and returns the segmented sections of equal length.
My initial thought was to have a sliding window that calculates the relative change in the amplitude. Since the window with the peaks will have relatively higher changes, I could just define certain threshold for the relative change that would help me take the window with isolated peaks. However, this will create problem when choosing the threshold as the relative change is very sensitive to the noises in the data.
Any suggestions?
To do this you need to find signal out of noise.
get mean value of you signal and add some multiplayer that place borders on top and on bottom of noise - green dashed line
find peak values below bottom of noise -> array 2 groups of data
find peak values on top of noise -> array 2 groups of data
get min index of bottom first peak and max index of top of first peak to find first peak range
get min index of top second peak and max index of bottom of second peak to find second peak range
Some description in code. With this method you can find other peaks.
One thing that you need to input by hand is to tell program thex value between peaks for splitting data into parts.
See graphic for summary.
import numpy as np
from matplotlib import pyplot as plt
# create noise data
def function(x, noise):
y = np.sin(7*x+2) + noise
return y
def function2(x, noise):
y = np.sin(6*x+2) + noise
return y
noise = np.random.uniform(low=-0.3, high=0.3, size=(100,))
x_line0 = np.linspace(1.95,2.85,100)
y_line0 = function(x_line0, noise)
x_line = np.linspace(0, 1.95, 100)
x_line2 = np.linspace(2.85, 3.95, 100)
x_pik = np.linspace(3.95, 5, 100)
y_pik = function2(x_pik, noise)
x_line3 = np.linspace(5, 6, 100)
# concatenate noise data
x = np.linspace(0, 6, 500)
y = np.concatenate((noise, y_line0, noise, y_pik, noise), axis=0)
# plot data
noise_band = 1.1
top_noise = y.mean()+noise_band*np.amax(noise)
bottom_noise = y.mean()-noise_band*np.amax(noise)
fig, ax = plt.subplots()
ax.axhline(y=y.mean(), color='red', linestyle='--')
ax.axhline(y=top_noise, linestyle='--', color='green')
ax.axhline(y=bottom_noise, linestyle='--', color='green')
ax.plot(x, y)
# split data into 2 signals
def split(arr, cond):
return [arr[cond], arr[~cond]]
# find bottom noise data indexes
botom_data_indexes = np.argwhere(y < bottom_noise)
# split by visual x value
splitted_bottom_data = split(botom_data_indexes, botom_data_indexes < np.argmax(x > 3))
# find top noise data indexes
top_data_indexes = np.argwhere(y > top_noise)
# split by visual x value
splitted_top_data = split(top_data_indexes, top_data_indexes < np.argmax(x > 3))
# get first signal range
first_signal_start = np.amin(splitted_bottom_data[0])
first_signal_end = np.amax(splitted_top_data[0])
# get x index of first signal
x_first_signal = np.take(x, [first_signal_start, first_signal_end])
ax.axvline(x=x_first_signal[0], color='orange')
ax.axvline(x=x_first_signal[1], color='orange')
# get second signal range
second_signal_start = np.amin(splitted_top_data[1])
second_signal_end = np.amax(splitted_bottom_data[1])
# get x index of first signal
x_second_signal = np.take(x, [second_signal_start, second_signal_end])
ax.axvline(x=x_second_signal[0], color='orange')
ax.axvline(x=x_second_signal[1], color='orange')
plt.show()
Output:
red line = mean value of all data
green line - top and bottom noise borders
orange line - selected peak data
1, It depends on how you want to define a "region", but looks like you just have feeling instead of strict definition. If you have a very clear definition of what kind of piece you want to cut out, you can try some method like "matched filter"
2, You might want to detect the peak of absolute magnitude. If not working, try peak of absolute magnitude of first-order difference, even 2nd-order.
3, it is hard to work on the noisy data like this. My suggestion is to do filtering before you pick up sections (on unfiltered data). Filtering will give you smooth peaks so that the position of peaks can be detected by the change of derivative sign. For filtering, try just "low-pass filter" first. If it doesn't work, I also suggest "Hilbert–Huang transform".
*, Looks like you are using matlab. The methods mentioned are all included in matlab.
I have some noisy data that can contain 0 and n gaussian shapes, I am trying to implement an algorithm that takes the highest data points and fits a gaussian to that as per the following 'scheme':
New attempt, steps:
fit a spline through all data points
get first derivative of spline function
get both data points (left/right) where f'(x) = around 0 the data point with max intensity
fit a gaussian through the data points returned from 3
4a. Plot the gaussian (stopping at baseline) in the pdf
Calculate area under gaussian curve
Calculate area under raw data points
Calculate percentage of total area explained by gaussian area
I have implemented this concept using the following code (minimal working example):
#! /usr/bin/env python
from scipy.interpolate import InterpolatedUnivariateSpline
from scipy.optimize import curve_fit
from scipy.signal import argrelextrema
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
data = [(9.60380153195,187214),(9.62028167623,181023),(9.63676350256,174588),(9.65324602212,169389),(9.66972824591,166921),(9.68621215187,167597),(9.70269675106,170838),(9.71918105436,175816),(9.73566703995,181552),(9.75215371878,186978),(9.76864010158,191718),(9.78512816681,194473),(9.80161692526,194169),(9.81810538757,191203),(9.83459553243,186603),(9.85108637051,180273),(9.86757691233,171996),(9.88406913682,163653),(9.90056205454,156032),(9.91705467586,149928),(9.93354897998,145410),(9.95004397733,141818),(9.96653867816,139042),(9.98303506191,137546),(9.99953213889,138724)]
data2 = [(9.60476933166,163571),(9.62125990879,156662),(9.63775225872,150535),(9.65424539203,146960),(9.67073831905,146794),(9.68723301904,149326),(9.70372850238,152616),(9.72022377931,155420),(9.73672082933,156151),(9.75321866271,154633),(9.76971628954,151549),(9.78621568961,148298),(9.80271587303,146333),(9.81921584976,146734),(9.83571759987,150351),(9.85222013334,156612),(9.86872245996,164192),(9.88522656011,171199),(9.90173144362,175697),(9.91823612015,176867),(9.93474257034,175029),(9.95124980389,171762),(9.96775683032,168449),(9.98426563055,165026)]
def gaussFunction(x, *p):
""" TODO
"""
A, mu, sigma = p
return A*np.exp(-(x-mu)**2/(2.*sigma**2))
def quantify(data):
""" TODO
"""
backGround = 105000 # Normally this is dynamically determined but this value is fine for testing on the provided data
time,intensity = zip(*data)
x_data = np.array(time)
y_data = np.array(intensity)
newX = np.linspace(x_data[0], x_data[-1], 2500*(x_data[-1]-x_data[0]))
f = InterpolatedUnivariateSpline(x_data, y_data)
fPrime = f.derivative()
newY = f(newX)
newPrimeY = fPrime(newX)
maxm = argrelextrema(newPrimeY, np.greater)
minm = argrelextrema(newPrimeY, np.less)
breaks = maxm[0].tolist() + minm[0].tolist()
maxPoint = 0
for index,j in enumerate(breaks):
try:
if max(newY[breaks[index]:breaks[index+1]]) > maxPoint:
maxPoint = max(newY[breaks[index]:breaks[index+1]])
xData = newX[breaks[index]:breaks[index+1]]
yData = [x - backGround for x in newY[breaks[index]:breaks[index+1]]]
except:
pass
# Gaussian fit on main points
newGaussX = np.linspace(x_data[0], x_data[-1], 2500*(x_data[-1]-x_data[0]))
p0 = [np.max(yData), xData[np.argmax(yData)],0.1]
try:
coeff, var_matrix = curve_fit(gaussFunction, xData, yData, p0)
newGaussY = gaussFunction(newGaussX, *coeff)
newGaussY = [x + backGround for x in newGaussY]
# Generate plot for visual confirmation
fig = plt.figure()
ax = fig.add_subplot(111)
plt.plot(x_data, y_data, 'b*')
plt.plot((newX[0],newX[-1]),(backGround,backGround),'red')
plt.plot(newX,newY, color='blue',linestyle='dashed')
plt.plot(newGaussX, newGaussY, color='green',linestyle='dashed')
plt.title("Test")
plt.xlabel("rt [m]")
plt.ylabel("intensity [au]")
plt.savefig("Test.pdf",bbox_inches="tight")
plt.close(fig)
except:
pass
# Call the test
#quantify(data)
quantify(data2)
where normally the background (red line in below pictures) is dynamically determined, but for the sake of this example I have set it to a fixed number. The problem that I have is that for some data it works really well:
Corresponding f'(x):
However, for some other data it fails horrendously:
Corresponding f'(x):
Therefore, I would like to hear some suggestions or ideas on why this happens and on potential approaches to fix it. I have included the data that is shown in the picture below (in case anyone wants to try it):
The error lied in the following bit:
breaks = maxm[0].tolist() + minm[0].tolist()
for index,j in enumerate(breaks):
The breaks list now contains both the maxima and minima, but they are not sorted by time. Resulting in the list yielding the following data points for the poor fit: 9.78, 9.62 and 9.86.
The program would then examine data from 9.78 to 9.62 and 9.62 to 9.86, which meant that 9.62 to 9.86 contained the highest intensity data point yielding the fit that is shown in the second graph.
The fix was rather simple by just adding a sort on the breaks in between, as follows:
breaks = maxm[0].tolist() + minm[0].tolist()
breaks = sorted(breaks)
for index,j in enumerate(breaks):
The program then yielded a fit more closely resembling what I would expect:
I am trying to plot contour lines of pressure level. I am using a netCDF file which contain the higher resolution data (ranges from 3 km to 27 km). Due to higher resolution data set, I get lot of pressure values which are not required to be plotted (rather I don't mind omitting certain contour line of insignificant values). I have written some plotting script based on the examples given in this link http://matplotlib.org/basemap/users/examples.html.
After plotting the image looks like this
From the image I have encircled the contours which are small and not required to be plotted. Also, I would like to plot all the contour lines smoother as mentioned in the above image. Overall I would like to get the contour image like this:-
Possible solution I think of are
Find out the number of points required for plotting contour and mask/omit those lines if they are small in number.
or
Find the area of the contour (as I want to omit only circled contour) and omit/mask those are smaller.
or
Reduce the resolution (only contour) by increasing the distance to 50 km - 100 km.
I am able to successfully get the points using SO thread Python: find contour lines from matplotlib.pyplot.contour()
But I am not able to implement any of the suggested solution above using those points.
Any solution to implement the above suggested solution is really appreciated.
Edit:-
# Andras Deak
I used print 'diameter is ', diameter line just above del(level.get_paths()[kp]) line to check if the code filters out the required diameter. Here is the filterd messages when I set if diameter < 15000::
diameter is 9099.66295612
diameter is 13264.7838257
diameter is 445.574234531
diameter is 1618.74618114
diameter is 1512.58974168
However the resulting image does not have any effect. All look same as posed image above. I am pretty sure that I have saved the figure (after plotting the wind barbs).
Regarding the solution for reducing the resolution, plt.contour(x[::2,::2],y[::2,::2],mslp[::2,::2]) it works. I have to apply some filter to make the curve smooth.
Full working example code for removing lines:-
Here is the example code for your review
#!/usr/bin/env python
from netCDF4 import Dataset
import matplotlib
matplotlib.use('agg')
import matplotlib.pyplot as plt
import numpy as np
import scipy.ndimage
from mpl_toolkits.basemap import interp
from mpl_toolkits.basemap import Basemap
# Set default map
west_lon = 68
east_lon = 93
south_lat = 7
north_lat = 23
nc = Dataset('ncfile.nc')
# Get this variable for later calucation
temps = nc.variables['T2']
time = 0 # We will take only first interval for this example
# Draw basemap
m = Basemap(projection='merc', llcrnrlat=south_lat, urcrnrlat=north_lat,
llcrnrlon=west_lon, urcrnrlon=east_lon, resolution='l')
m.drawcoastlines()
m.drawcountries(linewidth=1.0)
# This sets the standard grid point structure at full resolution
x, y = m(nc.variables['XLONG'][0], nc.variables['XLAT'][0])
# Set figure margins
width = 10
height = 8
plt.figure(figsize=(width, height))
plt.rc("figure.subplot", left=.001)
plt.rc("figure.subplot", right=.999)
plt.rc("figure.subplot", bottom=.001)
plt.rc("figure.subplot", top=.999)
plt.figure(figsize=(width, height), frameon=False)
# Convert Surface Pressure to Mean Sea Level Pressure
stemps = temps[time] + 6.5 * nc.variables['HGT'][time] / 1000.
mslp = nc.variables['PSFC'][time] * np.exp(9.81 / (287.0 * stemps) * nc.variables['HGT'][time]) * 0.01 + (
6.7 * nc.variables['HGT'][time] / 1000)
# Contour only at 2 hpa interval
level = []
for i in range(mslp.min(), mslp.max(), 1):
if i % 2 == 0:
if i >= 1006 and i <= 1018:
level.append(i)
# Save mslp values to upload to SO thread
# np.savetxt('mslp.txt', mslp, fmt='%.14f', delimiter=',')
P = plt.contour(x, y, mslp, V=2, colors='b', linewidths=2, levels=level)
# Solution suggested by Andras Deak
for level in P.collections:
for kp,path in enumerate(level.get_paths()):
# include test for "smallness" of your choice here:
# I'm using a simple estimation for the diameter based on the
# x and y diameter...
verts = path.vertices # (N,2)-shape array of contour line coordinates
diameter = np.max(verts.max(axis=0) - verts.min(axis=0))
if diameter < 15000: # threshold to be refined for your actual dimensions!
#print 'diameter is ', diameter
del(level.get_paths()[kp]) # no remove() for Path objects:(
#level.remove() # This does not work. produces ValueError: list.remove(x): x not in list
plt.gcf().canvas.draw()
plt.savefig('dummy', bbox_inches='tight')
plt.close()
After the plot is saved I get the same image
You can see that the lines are not removed yet. Here is the link to mslp array which we are trying to play with http://www.mediafire.com/download/7vi0mxqoe0y6pm9/mslp.txt
If you want x and y data which are being used in the above code, I can upload for your review.
Smooth line
You code to remove the smaller circles working perfectly. However the other question I have asked in the original post (smooth line) does not seems to work. I have used your code to slice the array to get minimal values and contoured it. I have used the following code to reduce the array size:-
slice = 15
CS = plt.contour(x[::slice,::slice],y[::slice,::slice],mslp[::slice,::slice], colors='b', linewidths=1, levels=levels)
The result is below.
After searching for few hours I found this SO thread having simmilar issue:-
Regridding regular netcdf data
But none of the solution provided over there works.The questions similar to mine above does not have proper solutions. If this issue is solved then the code is perfect and complete.
General idea
Your question seems to have 2 very different halves: one about omitting small contours, and another one about smoothing the contour lines. The latter is simpler, since I can't really think of anything else other than decreasing the resolution of your contour() call, just like you said.
As for removing a few contour lines, here's a solution which is based on directly removing contour lines individually. You have to loop over the collections of the object returned by contour(), and for each element check each Path, and delete the ones you don't need. Redrawing the figure's canvas will get rid of the unnecessary lines:
# dummy example based on matplotlib.pyplot.clabel example:
import matplotlib
import numpy as np
import matplotlib.cm as cm
import matplotlib.mlab as mlab
import matplotlib.pyplot as plt
delta = 0.025
x = np.arange(-3.0, 3.0, delta)
y = np.arange(-2.0, 2.0, delta)
X, Y = np.meshgrid(x, y)
Z1 = mlab.bivariate_normal(X, Y, 1.0, 1.0, 0.0, 0.0)
Z2 = mlab.bivariate_normal(X, Y, 1.5, 0.5, 1, 1)
# difference of Gaussians
Z = 10.0 * (Z2 - Z1)
plt.figure()
CS = plt.contour(X, Y, Z)
for level in CS.collections:
for kp,path in reversed(list(enumerate(level.get_paths()))):
# go in reversed order due to deletions!
# include test for "smallness" of your choice here:
# I'm using a simple estimation for the diameter based on the
# x and y diameter...
verts = path.vertices # (N,2)-shape array of contour line coordinates
diameter = np.max(verts.max(axis=0) - verts.min(axis=0))
if diameter<1: # threshold to be refined for your actual dimensions!
del(level.get_paths()[kp]) # no remove() for Path objects:(
# this might be necessary on interactive sessions: redraw figure
plt.gcf().canvas.draw()
Here's the original(left) and the removed version(right) for a diameter threshold of 1 (note the little piece of the 0 level at the top):
Note that the top little line is removed while the huge cyan one in the middle doesn't, even though both correspond to the same collections element i.e. the same contour level. If we didn't want to allow this, we could've called CS.collections[k].remove(), which would probably be a much safer way of doing the same thing (but it wouldn't allow us to differentiate between multiple lines corresponding to the same contour level).
To show that fiddling around with the cut-off diameter works as expected, here's the result for a threshold of 2:
All in all it seems quite reasonable.
Your actual case
Since you've added your actual data, here's the application to your case. Note that you can directly generate the levels in a single line using np, which will almost give you the same result. The exact same can be achieved in 2 lines (generating an arange, then selecting those that fall between p1 and p2). Also, since you're setting levels in the call to contour, I believe the V=2 part of the function call has no effect.
import numpy as np
import matplotlib.pyplot as plt
# insert actual data here...
Z = np.loadtxt('mslp.txt',delimiter=',')
X,Y = np.meshgrid(np.linspace(0,300000,Z.shape[1]),np.linspace(0,200000,Z.shape[0]))
p1,p2 = 1006,1018
# this is almost the same as the original, although it will produce
# [p1, p1+2, ...] instead of `[Z.min()+n, Z.min()+n+2, ...]`
levels = np.arange(np.maximum(Z.min(),p1),np.minimum(Z.max(),p2),2)
#control
plt.figure()
CS = plt.contour(X, Y, Z, colors='b', linewidths=2, levels=levels)
#modified
plt.figure()
CS = plt.contour(X, Y, Z, colors='b', linewidths=2, levels=levels)
for level in CS.collections:
for kp,path in reversed(list(enumerate(level.get_paths()))):
# go in reversed order due to deletions!
# include test for "smallness" of your choice here:
# I'm using a simple estimation for the diameter based on the
# x and y diameter...
verts = path.vertices # (N,2)-shape array of contour line coordinates
diameter = np.max(verts.max(axis=0) - verts.min(axis=0))
if diameter<15000: # threshold to be refined for your actual dimensions!
del(level.get_paths()[kp]) # no remove() for Path objects:(
# this might be necessary on interactive sessions: redraw figure
plt.gcf().canvas.draw()
plt.show()
Results, original(left) vs new(right):
Smoothing by resampling
I've decided to tackle the smoothing problem as well. All I could come up with is downsampling your original data, then upsampling again using griddata (interpolation). The downsampling part could also be done with interpolation, although the small-scale variation in your input data might make this problem ill-posed. So here's the crude version:
import scipy.interpolate as interp #the new one
# assume you have X,Y,Z,levels defined as before
# start resampling stuff
dN = 10 # use every dN'th element of the gridded input data
my_slice = [slice(None,None,dN),slice(None,None,dN)]
# downsampled data
X2,Y2,Z2 = X[my_slice],Y[my_slice],Z[my_slice]
# same as X2 = X[::dN,::dN] etc.
# upsampling with griddata over original mesh
Zsmooth = interp.griddata(np.array([X2.ravel(),Y2.ravel()]).T,Z2.ravel(),(X,Y),method='cubic')
# plot
plt.figure()
CS = plt.contour(X, Y, Zsmooth, colors='b', linewidths=2, levels=levels)
You can freely play around with the grids used for interpolation, in this case I just used the original mesh, as it was at hand. You can also play around with different kinds of interpolation: the default 'linear' one will be faster, but less smooth.
Result after downsampling(left) and upsampling(right):
Of course you should still apply the small-line-removal algorithm after this resampling business, and keep in mind that this heavily distorts your input data (since if it wasn't distorted, then it wouldn't be smooth). Also, note that due to the crude method used in the downsampling step, we introduce some missing values near the top/right edges of the region under consideraton. If this is a problem, you should consider doing the downsampling based on griddata as I've noted earlier.
This is a pretty bad solution, but it's the only one that I've come up with. Use the get_contour_verts function in this solution you linked to, possibly with the matplotlib._cntr module so that nothing gets plotted initially. That gives you a list of contour lines, sections, vertices, etc. Then you have to go through that list and pop the contours you don't want. You could do this by calculating a minimum diameter, for example; if the max distance between points is less than some cutoff, throw it out.
That leaves you with a list of LineCollection objects. Now if you make a Figure and Axes instance, you can use Axes.add_collection to add all of the LineCollections in the list.
I checked this out really quick, but it seemed to work. I'll come back with a minimum working example if I get a chance. Hope it helps!
Edit: Here's an MWE of the basic idea. I wasn't familiar with plt._cntr.Cntr, so I ended up using plt.contour to get the initial contour object. As a result, you end up making two figures; you just have to close the first one. You can replace checkDiameter with whatever function works. I think you could turn the line segments into a Polygon and calculate areas, but you'd have to figure that out on your own. Let me know if you run into problems with this code, but it at least works for me.
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
def checkDiameter(seg, tol=.3):
# Function for screening line segments. NB: Not actually a proper diameter.
diam = (seg[:,0].max() - seg[:,0].min(),
seg[:,1].max() - seg[:,1].min())
return not (diam[0] < tol or diam[1] < tol)
# Create testing data
x = np.linspace(-1,1, 21)
xx, yy = np.meshgrid(x,x)
z = np.exp(-(xx**2 + .5*yy**2))
# Original plot with plt.contour
fig0, ax0 = plt.subplots()
# Make sure this contour object actually has a tiny contour to remove
cntrObj = ax0.contour(xx,yy,z, levels=[.2,.4,.6,.8,.9,.95,.99,.999])
# Primary loop: Copy contours into a new LineCollection
lineNew = list()
for lineOriginal in cntrObj.collections:
# Get properties of the original LineCollection
segments = lineOriginal.get_segments()
propDict = lineOriginal.properties()
propDict = {key: value for (key,value) in propDict.items()
if key in ['linewidth','color','linestyle']} # Whatever parameters you want to carry over
# Filter out the lines with small diameters
segments = [seg for seg in segments if checkDiameter(seg)]
# Create new LineCollection out of the OK segments
if len(segments) > 0:
lineNew.append(mpl.collections.LineCollection(segments, **propDict))
# Make new plot with only these line collections; display results
fig1, ax1 = plt.subplots()
ax1.set_xlim(ax0.get_xlim())
ax1.set_ylim(ax0.get_ylim())
for line in lineNew:
ax1.add_collection(line)
plt.show()
FYI: The bit with propDict is just to automate bringing over some of the line properties from the original plot. You can't use the whole dictionary at once, though. First, it contains the old plot's line segments, but you can just swap those for the new ones. But second, it appears to contain a number of parameters that are in conflict with each other: multiple linewidths, facecolors, etc. The {key for key in propDict if I want key} workaround is my way to bypass that, but I'm sure someone else can do it more cleanly.
I'm plotting a point cloud and coloring by residual error. I'd like the colormap to remain centered on 0, so that 0 error is white.
I see answers for matplotlib. What about Mayavi?
from mayavi import mlab
mlab.points3d(x, y, z, e, colormap='RdBu')
you can set the vmin and vmax of the colormap explicitly with mlab.points3d. So, you could just make sure that vmin = -vmax. Something like this:
mylimit = 10
mlab.points3d(x, y, z, e, colormap='RdBu',vmin=-mylimit,vmax=mylimit)
Or, you could set the limit automatically with something like:
mylimit = max(abs(e.min()),abs(e.max()))
In case anybody wishes to do this but so that the full extent of the colorbar is used, here is a solution I made (with help from here) for mayavi that stretches the colorbar so that the centre of it is on zero:
#Mayavi surface
s = mlab.surf(data)
#Get the lut table of the data
lut = s.module_manager.scalar_lut_manager.lut.table.asarray()
maxd = np.max(data)
mind = np.min(data)
#Data range
dran = maxd - mind
#Proportion of the data range at which the centred value lies
zdp = abs(mind / dran)
#The +0.5's here are because floats are rounded down when converted to ints
#index equal portion of distance along colormap
cmzi = int(zdp * 255 + 0.5)
#linspace from zero to 128, with number of points matching portion to side of zero
topi = np.linspace(0, 127, cmzi) + 0.5
#and for other side
boti = np.linspace(128, 255, 255 - cmzi) + 0.5
#convert these linspaces to ints and map the new lut from these
shift_index = np.hstack([topi.astype(int), boti.astype(int)])
s.module_manager.scalar_lut_manager.lut.table = self.lut[shift_index]
#Force update of the figure now that we have changed the LUT
mlab.draw()
Note that if you wish to do this multiple times for the same surface (ie. if you're modifying the mayavi scalars rather than redrawing the plot) you need to make a record of the initial lut table and modify that each time.
I am attempting to plot weather variables on a map of Oklahoma using mpl_toolkits.basemap, but am having issues figuring out how to interpolate the data to plot on top of the map.
Here is a general idea of the current code I have:
lons = [-97.9547, -97.9747, -97.4256]
lats = [35.5322, 35.864, 35.4111]
data = [2,2,2]
map = Basemap(llcrnrlon = -103.068237, llcrnrlat = 33.610045, urcrnrlon = -94.359076, urcrnrlat = 37.040928, resolution = 'i')
CS = map.contour(X, Y, data)
map.drawstates()
plt.show()
What I am attempting to accomplish is to plot the data values on the map based on the related reference index in the lons/lats lists, and then contour the values of the data variable.
Now this obviously won't work, because I need to interpolate the data. Is there a way that I could accomplish this using the griddata function? I am very confused on how I would establish the boundaries of the grid given that latitude and longitude values are not linearly spaced.
Is there an easier way to do this that I am missing?
Any help and/or hints would be greatly appreciated, this is holding me back from moving on to the next major portion of the research project!
I don't have python installed on this machine, so can't test this. But something like this should get you the required inputs for the countour plot...
import numpy as np
lons = [-97.9547, -97.9747, -97.4256]
lats = [35.5322, 35.864, 35.4111]
data = [2,2,2]
xs, ys = np.meshgrid(lons, lats)
dataMesh = np.empty_like(xs)
for i, j, d in zip(lons, lats, data):
dataMesh[lons.index(i), lats.index(j)] = d
map = Basemap(llcrnrlon = -103.068237, llcrnrlat = 33.610045, urcrnrlon = -94.359076, urcrnrlat = 37.040928, resolution = 'i')
CS = map.contour(xs, ys, dataMesh)
map.drawstates()
plt.show()
Like i said though, i haven't tested this. I don't know what happens if you try to plot unititialised values. you might need to use a different numpy array initialisation.