Here is Python code (ported from Richard McElreath's excellent Statistical Rethinking) that results in an unwanted white transparent 'column' in my resulting plot:
import numpy as np
import pandas as pd
import scipy.stats
import matplotlib.pyplot as plt
# import data
url = "https://raw.githubusercontent.com/pymc-devs/resources/master/Rethinking_2/Data/Howell1.csv"
df = pd.read_csv(url, delimiter = ';')
df2 = df[df.age >= 18]
# sample priors (prior predictive check)
n = 100
a = scipy.stats.norm.rvs(178, 20, n)
b1 = scipy.stats.norm.rvs(0, 10, n)
b2 = np.exp(scipy.stats.norm.rvs(0, 1, n))
xbar = df2.weight.mean()
# compare 2 priors
fig,ax = plt.subplots(1,2,sharey=True)
for i in range(100):
    ax[0].plot(df2.weight, a[i] + b1[i]*(df2.weight - xbar), color='grey', lw=.5, alpha=.2)
    ax[0].set_xlabel('weight')
    ax[0].set_ylabel('height')
    ax[0].set_title('normal prior of β')
    ax[1].plot(df2.weight, a[i] + b2[i]*(df2.weight - xbar), color='grey', lw=.5, alpha=.2)
    ax[1].set_xlabel('weight')
    ax[1].set_title('log-normal prior of β')
plt.axis([30,60,-100,400])
plt.show()
This occurs in my Jupyter notebook, in Google Colab, and in the PDF saved with plt.savefig.
My notebook versions:
numpy 1.19.4
pandas 1.1.5
scipy 1.5.4
matplotlib 3.3.3
Thanks!!
I think you mean the region where the lines are drawn thinner/lighter and not the borders.
I found out it has to do with aliasing and not the data itself.
Play around with the antialiased parameter:
ax[0].plot(..., antialiased=False)
Looks like this:
Admittedly this makes the plot look ugly, but you can increase the figure size or the dpi parameter to compensate.
fig.set_dpi(300.0)
...
plt.show();
Then you get this:
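For reference, here is a minimal sketch of how the two suggestions could be combined in your loop (assuming the same df2, a, b1, b2 and xbar as in your code):

fig, ax = plt.subplots(1, 2, sharey=True)
fig.set_dpi(300.0)  # smaller pixels make the artifact less visible
for i in range(100):
    # antialiased=False: each pixel is either fully on the line or not,
    # which reduces the banding from partially covered pixels
    ax[0].plot(df2.weight, a[i] + b1[i]*(df2.weight - xbar),
               color='grey', lw=.5, alpha=.2, antialiased=False)
    ax[1].plot(df2.weight, a[i] + b2[i]*(df2.weight - xbar),
               color='grey', lw=.5, alpha=.2, antialiased=False)
plt.axis([30, 60, -100, 400])
plt.show()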
This is a data artifact interacting with anti-aliasing in an interesting way. In the final image we have to pick a color for every pixel. Without anti-aliasing, when we draw a line we have to decide whether each pixel is "in" the line (and hence we color it) or "out" (in which case we do not color it), which can lead to stair-step looking lines (particularly with lines that are close to flat). With anti-aliasing we color each pixel based on how much of it is "in" the line. That smearing out fools our eye (in a good way) and we see a more convincing straight line. Without anti-aliasing or alpha, drawing the same line multiple times does not change the appearance (any given pixel is still in or out), but with anti-aliasing or alpha, every time you draw the line the "partial" pixels get darker.
In the original data the values in df2.weight all fall on the same line, but they are not sorted, so as we draw we go back-and-forth over the same path (see the trace in the left-center panel). Depending on exactly where the turning points are and how many times any given segment is traversed, the line will look darker in some places than others. Something in the exact structure of the data is causing that "band".
If you increase the DPI, the pixels get smaller so the effect will be less pronounced (similar to zooming in), and if you turn off anti-aliasing the effect will also be less pronounced. I suspect (but have not tested) that if you shuffle the data you will be able to move the band around!
Sorting the weights (whose order, from this context, I do not think is meaningful?) produces the nicer-looking plots in the bottom two panels.
So in short, that band is "real" in the sense that it represents something in the data rather than being a bug in the rendering process, but it is highlighting structure in the data that I do not think is meaningful.
import numpy as np
import pandas as pd
import scipy.stats
import matplotlib.pyplot as plt
# import data
url = "https://raw.githubusercontent.com/pymc-devs/resources/master/Rethinking_2/Data/Howell1.csv"
# this is a mpl 3.3 feature
fig, ad = plt.subplot_mosaic(
    [
        ["normal", "log-normal"],
        ["trace", "hist"],
        ["sorted normal", "sorted log-normal"],
    ],
    constrained_layout=True,
)
df = pd.read_csv(url, delimiter=";")
df2 = df[df.age >= 18]
# sample priors (prior predictive check)
n = 100
a = scipy.stats.norm.rvs(178, 20, n)
b1 = scipy.stats.norm.rvs(0, 10, n)
b2 = np.exp(scipy.stats.norm.rvs(0, 1, n))
def inner(weights, a, b1, b2, ax_dict):
    xbar = np.mean(weights)
    for i in range(100):
        ax_dict["normal"].plot(
            weights, a[i] + b1[i] * (weights - xbar), color="grey", lw=0.5, alpha=0.2
        )
        ax_dict["normal"].set_xlabel("weight")
        ax_dict["normal"].set_ylabel("height")
        ax_dict["normal"].set_title("normal prior of β")
        ax_dict["log-normal"].plot(
            weights, a[i] + b2[i] * (weights - xbar), color="grey", lw=0.5, alpha=0.2
        )
        ax_dict["log-normal"].set_xlabel("weight")
        ax_dict["log-normal"].set_title("log-normal prior of β")
inner(df2.weight, a, b1, b2, ad)
inner(
    np.array(sorted(df2.weight)),
    a,
    b1,
    b2,
    {"normal": ad["sorted normal"], "log-normal": ad["sorted log-normal"]},
)
ad["hist"].hist(df2.weight, bins=100, color="0.5")
ad["hist"].set_xlabel("weight")
ad["hist"].set_ylabel("#")
ad["trace"].plot(df2.weight, "-o", color="0.5", alpha=0.5)
ad["trace"].set_ylabel("weight")
ad["trace"].set_xlabel("index")
plt.show()
I set the values manually, but if you set the axis limits for each subplot individually, the white margins will disappear.
ax[0].axis([33,63,-100,400])
ax[1].axis([33,60,-100,400])
If you want to make the spacing between the graphs narrower, you can do so in the following way.
fig.subplots_adjust(wspace=0.05)
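Putting it together with your original figure, a minimal sketch (assuming the same fig, ax from plt.subplots(1, 2, sharey=True)) could end with:

fig.subplots_adjust(wspace=0.05)   # narrower gap between the two subplots
ax[0].axis([33, 63, -100, 400])    # per-axes limits instead of plt.axis(...)
ax[1].axis([33, 60, -100, 400])
plt.show()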
I am just getting started with Holoviews. My questions are on customizing histograms, but also I am sharing a complete example as it may be helpful for other newbies to look at, since the documentation for Holoviews is very thorough but can be overwhelming.
I have a number of time series in text files loaded as Pandas DataFrames where:
each file is for a specific location
at each location about 10 time series were collected, each with about 15,000 points
I am building a small interactive tool where a Selector can be used to choose the location / DataFrame, and then another Selector to pick 3 of 10 of the time series to be plotted together.
My goal is to allow linked zooms (both x and y scales). The questions and code will focus on this aspect of the tool.
I cannot share the actual data I am using, unfortunately, as it is proprietary, but I have created 3 random walks with specific data ranges that are consistent with the actual data.
## preliminaries ##
import pandas as pd
import numpy as np
import holoviews as hv
from holoviews.util.transform import dim
from holoviews.selection import link_selections
from holoviews import opts
from holoviews.operation.datashader import shade, rasterize
import hvplot.pandas
hv.extension('bokeh', width=100)
## create random walks (one location) ##
data_df = pd.DataFrame()
npoints=15000
np.random.seed(71)
x = np.arange(npoints)
y1 = 1300+2.5*np.random.randn(npoints).cumsum()
y2 = 1500+2*np.random.randn(npoints).cumsum()
y3 = 3+np.random.randn(npoints).cumsum()
data_df.loc[:,'x'] = x
data_df.loc[:,'rand1'] = y1
data_df.loc[:,'rand2'] = y2
data_df.loc[:,'rand3'] = y3
This first block just plots the data and shows how, by design, one of the random walks has a different range from the other two:
data_df.hvplot(x='x',y=['rand1','rand2','rand3'],value_label='y',width=800,height=400)
As a result, although hvplot subplots work out of the box (for linking), the ranges are different, so the scaling is not quite right:
data_df.hvplot(x='x',y=['rand1','rand2','rand3'],
value_label='y',subplots=True,width=800,height=200).cols(1)
So, my first attempt was to adapt the Python-based Points example from Linked brushing in the documentation:
colors = hv.Cycle('Category10').values
dims = ['rand1', 'rand2', 'rand3']
layout = hv.Layout([
    hv.Points(data_df, dim).opts(color=c)
    for c, dim in zip(colors, [['x', d] for d in dims])
])
link_selections(layout).opts(opts.Points(width=1200, height=300)).cols(1)
That is already an amazing result for 20 minutes of effort!
However, what I would really like is to plot a curve rather than points, and also see a histogram, so I adapted the comprehension syntax to work with Curve (after reading the documentation pages Applying customization, and Composing elements):
colors = hv.Cycle('Category10').values
dims = ['rand1', 'rand2', 'rand3']
layout = hv.Layout([
    hv.Curve(data_df, 'x', dim).opts(height=300, width=1200, color=c).hist(dim)
    for c, dim in zip(colors, [d for d in dims])
])
link_selections(layout).cols(1)
Which is almost exactly what I want. But I still struggle with the different layers of opts syntax.
Question 1: with the comprehension from the last code block, how would I make the histogram share color with the curves?
Now, suppose I want to rasterize the plots (although I do not think it is quite necessary yet with 15,000 points like in this case). I tried to adapt the first example with Points:
cmaps = ['Blues', 'Greens', 'Reds']
dims = ['rand1', 'rand2', 'rand3']
layout = hv.Layout([
    shade(rasterize(hv.Points(data_df, dims), cmap=c)).opts(width=1200, height=400).hist(dims[1])
    for c, dims in zip(cmaps, [['x', d] for d in dims])
])
link_selections(layout).cols(1)
This is a decent start, but again I struggle with the options/customization.
Question 2: in the above code block, how would I pass the colormaps (it does not work as it is now), and how do I make the histogram reflect data values as in the previous case (and also have the right colormap)?
Thank you!
Sander answered how to color the histogram; this addresses the other question, about coloring the datashaded plot. Datashader renders your data with a colormap rather than a single color, so the parameter is named cmap rather than color. You were therefore correct to use cmap in the datashaded case, but (a) cmap is actually a parameter to shade (which does the colormapping of the output of rasterize), and (b) you don't really need shade, as you can let Bokeh do the colormapping in most cases nowadays, in which case cmap is an option rather than an argument. Example:
from bokeh.palettes import Blues, Greens, Reds
cmaps = [Blues[256][200:], Greens[256][200:], Reds[256][200:]]
dims = ['rand1', 'rand2', 'rand3']
layout = hv.Layout([
    rasterize(hv.Points(data_df, ds)).opts(cmap=c, width=1200, height=400).hist(ds[1])  # histogram of the plotted dim
    for c, ds in zip(cmaps, [['x', d] for d in dims])
])
link_selections(layout).cols(1)
To answer your first question to make the histogram share the color of the curve, I've added .opts(opts.Histogram(color=c)) to your code.
When you have a layout you can specify the options of an element inside the layout like that.
colors = hv.Cycle('Category10').values
dims = ['rand1', 'rand2', 'rand3']
layout = hv.Layout(
    [hv.Curve(data_df, 'x', dim)
         .opts(height=300, width=600, color=c)
         .hist(dim)
         .opts(opts.Histogram(color=c))
     for c, dim in zip(colors, [d for d in dims])]
)
link_selections(layout).cols(1)
Using PyWavelets and Matplotlib's specgram on a signal gives more detailed plots with pywt.dwt than with pywt.cwt. How can I get a pywt.cwt spectrogram in a similar way?
With dwt:
import numpy as np
import pywt
import pywt.data
import matplotlib.pyplot as plot
from scipy import signal
from scipy.io import wavfile
bA, bD = pywt.dwt(datamean, 'db2')
powerSpectrum, freqenciesFound, time, imageAxis = plot.specgram(bA, NFFT = 387, Fs=100)
plot.xlabel('Time')
plot.ylabel('Frequency')
plot.show()
with this spectrogram plot:
https://imgur.com/a/bYb8bBS
With cwt:
widths = np.arange(1,5)
coef, freqs = pywt.cwt(datamean, widths,'morl')
powerSpectrum, freqenciesFound, time, imageAxis = plot.specgram(coef, NFFT = 129, Fs=100)
plot.xlabel('Time')
plot.ylabel('Frequency')
plot.show()
with this spectrogram plot:
https://imgur.com/a/GIINzJp
and for better results:
sig = datamean
widths = np.arange(1, 31)
cwtmatr = signal.cwt(sig, signal.ricker, widths)
plot.imshow(cwtmatr, extent=[-1, 1, 1, 5], cmap='PRGn', aspect='auto',
            vmax=abs(cwtmatr).max(), vmin=-abs(cwtmatr).max())
plot.show()
with this spectrogram plot:
https://imgur.com/a/TnXqgGR
How can I get a similar spectrogram plot and style for cwt (spectrogram plots 2 and 3) as in the first one?
It seems like the 1st spectrogram plot has much more detail compared to the 3rd.
This would be better as a comment, but since I lack the Karma to do that:
You don't want to make a spectrogram with wavelets, but a scalogram instead. What it looks like you're doing above is projecting your data into a scale subspace (which correlates with frequency), then taking those scales and finding their frequency content, which is probably not what you want.
The detail and approximation coefficients are what you would want to use directly. Unfortunately, PyWavelets doesn't have a simple plotting function to do this for you, AFAIK. MATLAB does, and their help page may be illuminating if my explanation falls short.
import numpy as np
import pywt
import matplotlib.pyplot as plt
from scipy.interpolate import griddata

def scalogram(data):
    wave = 'db4'
    coeff = pywt.wavedec(data, wave)
    levels = len(coeff)
    lengths = [len(co) for co in coeff]
    col = np.max(lengths)
    im = np.ones([levels, col])
    col = col.astype(float)
    for level in range(levels):
        #print [lengths[level], col]
        y = coeff[level]
        if lengths[level] < col:  # stretch shorter levels to the full width
            x = col / (lengths[level] + 1) * np.arange(1, len(y) + 1)
            xi = np.linspace(0, int(col), int(col))
            yi = griddata(points=x, values=y, xi=xi, method='nearest')
        else:
            yi = y
        im[level, :] = yi
    im[im == 0] = np.nan
    tiles = sum(lengths) - lengths[0]
    return im, tiles

Wxx, tiles = scalogram(data)
IM = plt.imshow(np.log10(abs(Wxx)), aspect='auto')
plt.show()
There are better ways of doing that, but it works. This produces a spectrogram-like matrix in Wxx, and tiles is simply a count of the number of time-frequency tilings, to compare with the number used in an SFFT.
I've attached a picture of what these tilings look like
Pretty much exactly what the question states, but a little context:
I'm creating a program to plot a large number of points (~10,000, but it will be more later on). This is being done using matplotlib's plt.scatter. This command is part of a loop that saves the figure, so I can later animate it.
What I want to be able to do is randomly select a small portion of these particles (say, maybe 100?) and give them a different marker than the rest, even though they're part of the same data set. This is so I can use them as placeholders to see the motion of individual particles, as well as the bulk material.
Is there a way to use a different marker for a small subset of the same data?
For reference, the particles are uniformly distributed just using the numpy random sampler, but my code for that is:
for i in range(N):  # N number of particles
    particle_position[i] = np.random.uniform(0, xmax)  # Initialize in spatial domain
    particle_velocity[i] = np.random.normal(0, 5)      # Initialize in velocity space

for i in range(maxtime):
    plt.scatter(particle_position, particle_velocity, s=1, c=norm_xvel, cmap=br_disc, lw=0)
The position and velocity change on each iteration of the main loop (there's quite a bit of code), but these are the main initialization and plotting routines.
I had an idea that perhaps I could randomly select a bunch of i values from range(N), and use an ax.scatter() command to plot them on the same axes?
Here is a possible solution to have a subset of your points identified with a different marker:
import matplotlib.pyplot as plt
import numpy as np
SIZE = 100
SAMPLE_SIZE = 10
def select_subset(seq, size):
    """selects a subset of the data using ...
    """
    return seq[:size]
points_x = np.random.uniform(-1, 1, size=SIZE)
points_y = np.random.uniform(-1, 1, size=SIZE)
plt.scatter(points_x, points_y, marker=".", color="blue")
plt.scatter(select_subset(points_x, SAMPLE_SIZE),
select_subset(points_y, SAMPLE_SIZE),
marker="o", color="red")
plt.show()
It uses plt.scatter twice; once on the full data set, the other on the sample points.
You will have to decide how you want to select the sample of points - that choice is isolated in the select_subset function.
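For example, if you want the sample to be random rather than just the first points, one sketch (not part of the original answer) is to draw a set of random indices once and reuse it for both coordinates, so the x/y pairs stay matched:

sample_idx = np.random.choice(SIZE, size=SAMPLE_SIZE, replace=False)  # random indices, no duplicates
plt.scatter(points_x, points_y, marker=".", color="blue")
plt.scatter(points_x[sample_idx], points_y[sample_idx], marker="o", color="red")
plt.show()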
You could also extract the sample points from the data set to prevent marking them twice, but numpy is rather inefficient at deleting or resizing.
Maybe a better method is to use a mask? A mask has the advantage of leaving your original data intact and in order.
Here is a way to proceed with masks:
import matplotlib.pyplot as plt
import numpy as np
import random
SIZE = 100
SAMPLE_SIZE = 10
def make_mask(data_size, sample_size):
    mask = np.array([True] * sample_size + [False] * (data_size - sample_size))
    np.random.shuffle(mask)
    return mask
points_x = np.random.uniform(-1, 1, size=SIZE)
points_y = np.random.uniform(-1, 1, size=SIZE)
mask = make_mask(SIZE, SAMPLE_SIZE)
not_mask = np.invert(mask)
plt.scatter(points_x[not_mask], points_y[not_mask], marker=".", color="blue")
plt.scatter(points_x[mask], points_y[mask], marker="o", color="red")
plt.show()
As you see, scatter is called once on a subset of the data points (the ones not selected in the sample), and a second time on the sampled subset, and draws each subset with its own marker. It is efficient & leaves the original data intact.
The code below does what you want. I have selected a random set v_sub_index of N_sub indices in the correct range (0 to N) and draw those (with _sub suffix) from the larger samples particle_position and particle_velocity. Please note that you don't have to loop to generate random samples. Numpy has great functionality for that without having to use for loops.
import numpy as np
import matplotlib.pyplot as pl
N = 100
xmax = 1.
v_sigma = 2.5 / 2. # 95% of the samples contained within 0, 5
v_mean = 2.5 # mean at 2.5
N_sub = 10
v_sub_index = np.random.randint(0, N, N_sub)
particle_position = np.random.rand (N) * xmax
particle_velocity = v_mean + v_sigma * np.random.randn(N)
particle_position_sub = np.array(particle_position[v_sub_index])
particle_velocity_sub = np.array(particle_velocity[v_sub_index])
particle_position_nosub = np.delete(particle_position, v_sub_index)
particle_velocity_nosub = np.delete(particle_velocity, v_sub_index)
pl.scatter(particle_position_nosub, particle_velocity_nosub, color='b', marker='o')
pl.scatter(particle_position_sub , particle_velocity_sub , color='r', marker='^')
pl.show()
I am trying to plot contour lines of pressure level. I am using a netCDF file which contains high-resolution data (the grid spacing ranges from 3 km to 27 km). Because of the high resolution, I get a lot of pressure values that do not need to be plotted (or rather, I don't mind omitting contour lines of insignificant values). I have written a plotting script based on the examples given in this link http://matplotlib.org/basemap/users/examples.html.
After plotting, the image looks like this
In the image I have encircled the contours which are small and not required to be plotted. Also, I would like to make all the contour lines smoother, as indicated in the image above. Overall I would like to get a contour image like this:-
Possible solutions I can think of are:
Find out the number of points required for plotting a contour and mask/omit those lines if the number is small.
or
Find the area of each contour (as I want to omit only the encircled contours) and omit/mask those that are smaller.
or
Reduce the resolution (of the contours only) by increasing the grid spacing to 50 km - 100 km.
I am able to successfully get the points using SO thread Python: find contour lines from matplotlib.pyplot.contour()
But I am not able to implement any of the suggested solutions above using those points.
Any help implementing the suggested solutions is really appreciated.
Edit:-
# Andras Deak
I used a print 'diameter is ', diameter line just above the del(level.get_paths()[kp]) line to check whether the code filters out the required diameters. Here are the filtered messages when I set if diameter < 15000:
diameter is 9099.66295612
diameter is 13264.7838257
diameter is 445.574234531
diameter is 1618.74618114
diameter is 1512.58974168
However, the filtering has no effect on the resulting image. Everything looks the same as the image posted above. I am pretty sure that I have saved the figure (after plotting the wind barbs).
Regarding the solution for reducing the resolution, plt.contour(x[::2,::2],y[::2,::2],mslp[::2,::2]) works. I still have to apply some filter to make the curves smooth.
Full working example code for removing lines:-
Here is the example code for your review
#!/usr/bin/env python
from netCDF4 import Dataset
import matplotlib
matplotlib.use('agg')
import matplotlib.pyplot as plt
import numpy as np
import scipy.ndimage
from mpl_toolkits.basemap import interp
from mpl_toolkits.basemap import Basemap
# Set default map
west_lon = 68
east_lon = 93
south_lat = 7
north_lat = 23
nc = Dataset('ncfile.nc')
# Get this variable for later calculation
temps = nc.variables['T2']
time = 0 # We will take only first interval for this example
# Draw basemap
m = Basemap(projection='merc', llcrnrlat=south_lat, urcrnrlat=north_lat,
llcrnrlon=west_lon, urcrnrlon=east_lon, resolution='l')
m.drawcoastlines()
m.drawcountries(linewidth=1.0)
# This sets the standard grid point structure at full resolution
x, y = m(nc.variables['XLONG'][0], nc.variables['XLAT'][0])
# Set figure margins
width = 10
height = 8
plt.figure(figsize=(width, height))
plt.rc("figure.subplot", left=.001)
plt.rc("figure.subplot", right=.999)
plt.rc("figure.subplot", bottom=.001)
plt.rc("figure.subplot", top=.999)
plt.figure(figsize=(width, height), frameon=False)
# Convert Surface Pressure to Mean Sea Level Pressure
stemps = temps[time] + 6.5 * nc.variables['HGT'][time] / 1000.
mslp = nc.variables['PSFC'][time] * np.exp(9.81 / (287.0 * stemps) * nc.variables['HGT'][time]) * 0.01 + (
6.7 * nc.variables['HGT'][time] / 1000)
# Contour only at 2 hPa intervals
level = []
for i in range(int(mslp.min()), int(mslp.max())):  # cast to int so range() accepts the float min/max
    if i % 2 == 0:
        if i >= 1006 and i <= 1018:
            level.append(i)
# Save mslp values to upload to SO thread
# np.savetxt('mslp.txt', mslp, fmt='%.14f', delimiter=',')
P = plt.contour(x, y, mslp, V=2, colors='b', linewidths=2, levels=level)
# Solution suggested by Andras Deak
for level in P.collections:
    for kp, path in enumerate(level.get_paths()):
        # include test for "smallness" of your choice here:
        # I'm using a simple estimation for the diameter based on the
        # x and y diameter...
        verts = path.vertices  # (N,2)-shape array of contour line coordinates
        diameter = np.max(verts.max(axis=0) - verts.min(axis=0))
        if diameter < 15000:  # threshold to be refined for your actual dimensions!
            #print 'diameter is ', diameter
            del(level.get_paths()[kp])  # no remove() for Path objects :(
            #level.remove()  # This does not work. produces ValueError: list.remove(x): x not in list
plt.gcf().canvas.draw()
plt.savefig('dummy', bbox_inches='tight')
plt.close()
After the plot is saved I get the same image
You can see that the lines are not removed yet. Here is the link to mslp array which we are trying to play with http://www.mediafire.com/download/7vi0mxqoe0y6pm9/mslp.txt
If you want x and y data which are being used in the above code, I can upload for your review.
Smooth line
Your code to remove the smaller circles works perfectly. However, the other question I asked in the original post (smoothing the lines) does not seem to work. I have used your code to slice the array down to fewer values and contoured it. I have used the following code to reduce the array size:-
slice = 15
CS = plt.contour(x[::slice,::slice],y[::slice,::slice],mslp[::slice,::slice], colors='b', linewidths=1, levels=levels)
The result is below.
After searching for a few hours I found this SO thread with a similar issue:-
Regridding regular netcdf data
But none of the solutions provided there work. The questions similar to mine do not have proper solutions either. If this issue is solved, then the code is perfect and complete.
General idea
Your question seems to have 2 very different halves: one about omitting small contours, and another one about smoothing the contour lines. The latter is simpler, since I can't really think of anything else other than decreasing the resolution of your contour() call, just like you said.
As for removing a few contour lines, here's a solution which is based on directly removing contour lines individually. You have to loop over the collections of the object returned by contour(), and for each element check each Path, and delete the ones you don't need. Redrawing the figure's canvas will get rid of the unnecessary lines:
# dummy example based on matplotlib.pyplot.clabel example:
import matplotlib
import numpy as np
import matplotlib.cm as cm
import matplotlib.mlab as mlab
import matplotlib.pyplot as plt
delta = 0.025
x = np.arange(-3.0, 3.0, delta)
y = np.arange(-2.0, 2.0, delta)
X, Y = np.meshgrid(x, y)
Z1 = mlab.bivariate_normal(X, Y, 1.0, 1.0, 0.0, 0.0)
Z2 = mlab.bivariate_normal(X, Y, 1.5, 0.5, 1, 1)
# difference of Gaussians
Z = 10.0 * (Z2 - Z1)
plt.figure()
CS = plt.contour(X, Y, Z)
for level in CS.collections:
    for kp, path in reversed(list(enumerate(level.get_paths()))):
        # go in reversed order due to deletions!
        # include test for "smallness" of your choice here:
        # I'm using a simple estimation for the diameter based on the
        # x and y diameter...
        verts = path.vertices  # (N,2)-shape array of contour line coordinates
        diameter = np.max(verts.max(axis=0) - verts.min(axis=0))
        if diameter < 1:  # threshold to be refined for your actual dimensions!
            del(level.get_paths()[kp])  # no remove() for Path objects :(
# this might be necessary on interactive sessions: redraw figure
plt.gcf().canvas.draw()
Here's the original (left) and the removed version (right) for a diameter threshold of 1 (note the little piece of the 0 level at the top):
Note that the little line at the top is removed while the huge cyan one in the middle isn't, even though both correspond to the same collections element, i.e. the same contour level. If we didn't want to allow this, we could've called CS.collections[k].remove(), which would probably be a much safer way of doing the same thing (but it wouldn't allow us to differentiate between multiple lines corresponding to the same contour level).
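For reference, a minimal sketch of that safer alternative (assuming the same CS and threshold as above), removing whole contour levels instead of individual paths:

# remove every contour level whose largest path is still "small"
for coll in list(CS.collections):
    paths = coll.get_paths()
    if paths:
        diameters = [np.max(p.vertices.max(axis=0) - p.vertices.min(axis=0)) for p in paths]
        if max(diameters) < 1:  # same kind of threshold as above
            coll.remove()       # detaches the whole level from the axes
plt.gcf().canvas.draw()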
To show that fiddling around with the cut-off diameter works as expected, here's the result for a threshold of 2:
All in all it seems quite reasonable.
Your actual case
Since you've added your actual data, here's the application to your case. Note that you can directly generate the levels in a single line using numpy, which will give you almost the same result. The exact same thing can be achieved in 2 lines (generating an arange, then selecting the values that fall between p1 and p2). Also, since you're setting levels in the call to contour, I believe the V=2 part of the function call has no effect.
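For reference, that 2-line version could be sketched as follows (using the Z, p1 and p2 defined in the code below; it keeps the even levels between p1 and p2, matching your original loop):

candidates = np.arange(np.ceil(Z.min()), Z.max())  # integer steps over the data range
levels = candidates[(candidates % 2 == 0) & (candidates >= p1) & (candidates <= p2)]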
import numpy as np
import matplotlib.pyplot as plt
# insert actual data here...
Z = np.loadtxt('mslp.txt',delimiter=',')
X,Y = np.meshgrid(np.linspace(0,300000,Z.shape[1]),np.linspace(0,200000,Z.shape[0]))
p1,p2 = 1006,1018
# this is almost the same as the original, although it will produce
# [p1, p1+2, ...] instead of `[Z.min()+n, Z.min()+n+2, ...]`
levels = np.arange(np.maximum(Z.min(),p1),np.minimum(Z.max(),p2),2)
#control
plt.figure()
CS = plt.contour(X, Y, Z, colors='b', linewidths=2, levels=levels)
#modified
plt.figure()
CS = plt.contour(X, Y, Z, colors='b', linewidths=2, levels=levels)
for level in CS.collections:
    for kp, path in reversed(list(enumerate(level.get_paths()))):
        # go in reversed order due to deletions!
        # include test for "smallness" of your choice here:
        # I'm using a simple estimation for the diameter based on the
        # x and y diameter...
        verts = path.vertices  # (N,2)-shape array of contour line coordinates
        diameter = np.max(verts.max(axis=0) - verts.min(axis=0))
        if diameter < 15000:  # threshold to be refined for your actual dimensions!
            del(level.get_paths()[kp])  # no remove() for Path objects :(
# this might be necessary on interactive sessions: redraw figure
plt.gcf().canvas.draw()
plt.show()
Results, original (left) vs new (right):
Smoothing by resampling
I've decided to tackle the smoothing problem as well. All I could come up with is downsampling your original data, then upsampling again using griddata (interpolation). The downsampling part could also be done with interpolation, although the small-scale variation in your input data might make this problem ill-posed. So here's the crude version:
import scipy.interpolate as interp #the new one
# assume you have X,Y,Z,levels defined as before
# start resampling stuff
dN = 10 # use every dN'th element of the gridded input data
my_slice = (slice(None, None, dN), slice(None, None, dN))  # tuple, so it works as a multi-dimensional index
# downsampled data
X2,Y2,Z2 = X[my_slice],Y[my_slice],Z[my_slice]
# same as X2 = X[::dN,::dN] etc.
# upsampling with griddata over original mesh
Zsmooth = interp.griddata(np.array([X2.ravel(),Y2.ravel()]).T,Z2.ravel(),(X,Y),method='cubic')
# plot
plt.figure()
CS = plt.contour(X, Y, Zsmooth, colors='b', linewidths=2, levels=levels)
You can freely play around with the grids used for interpolation, in this case I just used the original mesh, as it was at hand. You can also play around with different kinds of interpolation: the default 'linear' one will be faster, but less smooth.
Result after downsampling (left) and upsampling (right):
Of course you should still apply the small-line-removal algorithm after this resampling business, and keep in mind that this heavily distorts your input data (since if it wasn't distorted, it wouldn't be smooth). Also, note that due to the crude method used in the downsampling step, we introduce some missing values near the top/right edges of the region under consideration. If this is a problem, you should consider doing the downsampling based on griddata, as I noted earlier.
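To be explicit about that last suggestion, here's a rough sketch (the coarse grid size is a hypothetical choice you would tune) of doing the downsampling itself with griddata instead of plain slicing:

# downsample onto a coarse regular mesh with griddata, then upsample as before
nx_coarse, ny_coarse = 30, 20  # hypothetical coarse grid size
Xc, Yc = np.meshgrid(np.linspace(X.min(), X.max(), nx_coarse),
                     np.linspace(Y.min(), Y.max(), ny_coarse))
Zc = interp.griddata(np.array([X.ravel(), Y.ravel()]).T, Z.ravel(),
                     (Xc, Yc), method='linear')
# upsample back onto the original mesh, ignoring any NaNs from the coarse step
good = ~np.isnan(Zc)
Zsmooth = interp.griddata(np.array([Xc[good], Yc[good]]).T, Zc[good],
                          (X, Y), method='cubic')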
This is a pretty bad solution, but it's the only one that I've come up with. Use the get_contour_verts function in this solution you linked to, possibly with the matplotlib._cntr module so that nothing gets plotted initially. That gives you a list of contour lines, sections, vertices, etc. Then you have to go through that list and pop the contours you don't want. You could do this by calculating a minimum diameter, for example; if the max distance between points is less than some cutoff, throw it out.
That leaves you with a list of LineCollection objects. Now if you make a Figure and Axes instance, you can use Axes.add_collection to add all of the LineCollections in the list.
I checked this out really quick, but it seemed to work. I'll come back with a minimum working example if I get a chance. Hope it helps!
Edit: Here's an MWE of the basic idea. I wasn't familiar with plt._cntr.Cntr, so I ended up using plt.contour to get the initial contour object. As a result, you end up making two figures; you just have to close the first one. You can replace checkDiameter with whatever function works. I think you could turn the line segments into a Polygon and calculate areas, but you'd have to figure that out on your own. Let me know if you run into problems with this code, but it at least works for me.
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
def checkDiameter(seg, tol=.3):
    # Function for screening line segments. NB: Not actually a proper diameter.
    diam = (seg[:, 0].max() - seg[:, 0].min(),
            seg[:, 1].max() - seg[:, 1].min())
    return not (diam[0] < tol or diam[1] < tol)
# Create testing data
x = np.linspace(-1,1, 21)
xx, yy = np.meshgrid(x,x)
z = np.exp(-(xx**2 + .5*yy**2))
# Original plot with plt.contour
fig0, ax0 = plt.subplots()
# Make sure this contour object actually has a tiny contour to remove
cntrObj = ax0.contour(xx,yy,z, levels=[.2,.4,.6,.8,.9,.95,.99,.999])
# Primary loop: Copy contours into a new LineCollection
lineNew = list()
for lineOriginal in cntrObj.collections:
    # Get properties of the original LineCollection
    segments = lineOriginal.get_segments()
    propDict = lineOriginal.properties()
    propDict = {key: value for (key, value) in propDict.items()
                if key in ['linewidth', 'color', 'linestyle']}  # Whatever parameters you want to carry over
    # Filter out the lines with small diameters
    segments = [seg for seg in segments if checkDiameter(seg)]
    # Create new LineCollection out of the OK segments
    if len(segments) > 0:
        lineNew.append(mpl.collections.LineCollection(segments, **propDict))
# Make new plot with only these line collections; display results
fig1, ax1 = plt.subplots()
ax1.set_xlim(ax0.get_xlim())
ax1.set_ylim(ax0.get_ylim())
for line in lineNew:
    ax1.add_collection(line)
plt.show()
FYI: The bit with propDict is just to automate bringing over some of the line properties from the original plot. You can't use the whole dictionary at once, though. First, it contains the old plot's line segments, but you can just swap those for the new ones. But second, it appears to contain a number of parameters that are in conflict with each other: multiple linewidths, facecolors, etc. The {key for key in propDict if I want key} workaround is my way to bypass that, but I'm sure someone else can do it more cleanly.
I am trying to create four gabor patches, very similar to those below.
I don't need them to be identical to the pictures below, but similar.
Despite a bit of tinkering, I have been unable to reproduce these images...
I believe they were created in MATLAB originally. I don't have access to the original MATLAB code.
I have the following code in python (2.7.10):
import numpy as np
from scipy.misc import toimage # One can also use matplotlib*
data = gabor_fn(sigma = ???, theta = 0, Lambda = ???, psi = ???, gamma = ???)
toimage(data).show()
*graphing a numpy array with matplotlib
gabor_fn, from here, is defined below:
import numpy

def gabor_fn(sigma, theta, Lambda, psi, gamma):
    sigma_x = sigma
    sigma_y = float(sigma) / gamma
    # Bounding box
    nstds = 3
    xmax = max(abs(nstds * sigma_x * numpy.cos(theta)), abs(nstds * sigma_y * numpy.sin(theta)))
    xmax = numpy.ceil(max(1, xmax))
    ymax = max(abs(nstds * sigma_x * numpy.sin(theta)), abs(nstds * sigma_y * numpy.cos(theta)))
    ymax = numpy.ceil(max(1, ymax))
    xmin = -xmax
    ymin = -ymax
    (x, y) = numpy.meshgrid(numpy.arange(xmin, xmax + 1), numpy.arange(ymin, ymax + 1))
    (y, x) = numpy.meshgrid(numpy.arange(ymin, ymax + 1), numpy.arange(xmin, xmax + 1))
    # Rotation
    x_theta = x * numpy.cos(theta) + y * numpy.sin(theta)
    y_theta = -x * numpy.sin(theta) + y * numpy.cos(theta)
    gb = numpy.exp(-.5 * (x_theta**2 / sigma_x**2 + y_theta**2 / sigma_y**2)) * numpy.cos(2 * numpy.pi / Lambda * x_theta + psi)
    return gb
As you may be able to tell, the only difference (I believe) between the images is contrast. So gabor_fn would likely need to be altered to allow for this (unless I misunderstand one of the params)... I'm just not sure how.
UPDATE:
from math import pi
from matplotlib import pyplot as plt
data = gabor_fn(sigma=5.,theta=pi/2.,Lambda=12.5,psi=90,gamma=1.)
unit = #From left to right, unit was set to 1, 3, 7 and 9.
bound = 0.0009/unit
fig = plt.imshow(
    data,
    cmap='gray',
    interpolation='none',
    vmin=-bound,
    vmax=bound,
)
plt.axis('off')
The problem you are having is a visualization problem (although I think you are choosing too-large parameters).
By default, matplotlib and scipy's toimage use bilinear (or trilinear) interpolation, depending on your matplotlib configuration. That's why your image looks so smooth: your pixel values are being interpolated, and you are not displaying the raw kernel you have just calculated.
Try using matplotlib with no interpolation:
from matplotlib import pyplot as plt
plt.imshow(data, 'gray', interpolation='none')
plt.show()
For the following parameters:
data = gabor_fn(sigma=5.,theta=pi/2.,Lambda=25.,psi=90,gamma=1.)
You get this output:
If you reduce Lambda to 15, you get something like this:
Additionally, the sigma you choose changes the strength of the smoothing. Adding the parameters vmin=-1 and vmax=1 to imshow (similar to what @kazemakase suggested) will give you the desired contrast.
Check this guide for sensible values for (and ways to use) Gabor kernels:
http://scikit-image.org/docs/dev/auto_examples/plot_gabor.html
It seems like toimage scales the input data so that the min/max values are mapped to black/white.
I do not know what amplitudes to reasonably expect from gabor patches, but you should try something like this:
toimage(data, cmin=-1, cmax=1).show()
This tells toimage what range your data is in. You can try to play around with cmin and cmax, but make sure they are symmetric (i.e. cmin=-x, cmax=x) so that a value of 0 maps to grey.
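As a small follow-up sketch (not from the original answer, and assuming data is the array returned by gabor_fn): newer SciPy versions no longer ship scipy.misc.toimage, so the same symmetric scaling can be done with matplotlib, computing the bound from the data itself:

import numpy as np
import matplotlib.pyplot as plt

bound = np.abs(data).max()  # symmetric bound so a value of 0 maps to mid-grey
plt.imshow(data, cmap='gray', interpolation='none', vmin=-bound, vmax=bound)
plt.axis('off')
plt.show()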