Interpolating to get rid of NaNs and contour plot - python

I have these arrays, which I need to interpolate as smoothly as possible:
x = time
y = height
z = latitude
print np.shape(x)
print np.shape(y)
print np.shape(z)
Result:
(99, 25)
(99, 25)
(99, 25)
y is altitude, and it's not uniform. It has a bunch of NaNs, even though all the columns are the same size (a variable n_alt holds the number of altitudes, which is 99 for this example).
x is time, and it's uniform all the way through (all the values in one column of that array are the same).
z is latitude, and it's the actual 'z': an array with the same number of columns as the number of time points and the same number of rows as the number of altitude points.
I want to interpolate in 2D (the data set has series of NaNs in both the x and y directions) to fill the gaps in the data, since several files will cover a certain altitude range and not others.
My questions are:
1) Is there a good way to fill the gaps in both directions while making the grid uniform? (The idea is to plot that and also save the interpolated data (x, y and z) into a new file.)
2) What's a good way to contour plot data with the shape I mentioned earlier? (I tried plt.contour, but plotting the data straight up doesn't give a satisfactory result.)
Thanks y'all
Edit:
I believe this will illustrate the question better:
[Figure: X: time, Y: altitude, Z: latitude or longitude]
I essentially want to fill up the white space. (I understand the consequences of extrapolation and all, but at this point I just want an algorithm that works.) The blue dots are my grid, and the color plot is just a normal plt.contour (no interpolation done). I want to end up with blue dots all over the plot area.

Rafael! With respect to your interpolation question, I can explain the math if you want to manually come up with an interpolation function, but there is an existing resource you might want to look into: scipy.interpolate.RegularGridInterpolator
(see https://docs.scipy.org/doc/scipy-0.16.1/reference/generated/scipy.interpolate.RegularGridInterpolator.html)
If I have misunderstood your issue, another interpolation method from the module might be appropriate: see scipy.interpolate.
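RegularGridInterpolator expects values on a complete rectilinear grid, so for gaps full of NaNs one option from scipy.interpolate is griddata, which works on scattered points. A minimal sketch, assuming x, y and z are the (99, 25) time, altitude and latitude arrays from the question: drop every point where any coordinate is NaN, interpolate the rest onto a uniform grid, and fill whatever falls outside the convex hull of the data with nearest-neighbour values (that last step is extrapolation, so treat those regions with care). The function name fill_to_uniform_grid and the default grid sizes are illustrative choices.

```python
import numpy as np
from scipy.interpolate import griddata


def fill_to_uniform_grid(x, y, z, nx=100, ny=100, method="linear"):
    """Interpolate 2-D arrays x, y, z (with NaN gaps) onto a uniform ny-by-nx grid.

    Points where x, y or z is NaN are dropped before interpolating.
    """
    x, y, z = [np.asarray(a, dtype=float).ravel() for a in (x, y, z)]
    valid = np.isfinite(x) & np.isfinite(y) & np.isfinite(z)
    points = np.column_stack((x[valid], y[valid]))
    values = z[valid]

    # Uniform target grid spanning the valid data
    xi = np.linspace(points[:, 0].min(), points[:, 0].max(), nx)
    yi = np.linspace(points[:, 1].min(), points[:, 1].max(), ny)
    XI, YI = np.meshgrid(xi, yi)

    ZI = griddata(points, values, (XI, YI), method=method)

    # "linear" and "cubic" return NaN outside the convex hull of the data;
    # fill those cells with the nearest valid value (this is extrapolation)
    holes = np.isnan(ZI)
    if holes.any():
        ZI[holes] = griddata(points, values, (XI[holes], YI[holes]), method="nearest")
    return XI, YI, ZI
```

The filled grid can then be written out with, for example, np.savez("filled.npz", x=XI, y=YI, z=ZI).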
For plotting the 3d surface, https://matplotlib.org/examples/mplot3d/surface3d_demo.html might help guide you! Let me know if this helps! Just comment if you would like me to expand! Hopefully those are the resources you were looking for!
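For the plotting part, here is a minimal sketch assuming the data is already on a regular grid XI, YI, ZI (for example the output of scipy.interpolate.griddata). It draws a filled contour plot with contourf next to a 3-D surface drawn with plot_surface, following the surface3d demo. The Agg backend line only makes it run without a display; remove it and call plt.show() to view interactively. The function name plot_filled is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; remove it and call plt.show() to view
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401  (registers the "3d" projection on older Matplotlib)
import numpy as np


def plot_filled(XI, YI, ZI, levels=20):
    """Filled contour plot and 3-D surface of gridded data, side by side."""
    fig = plt.figure(figsize=(11, 4))

    ax2d = fig.add_subplot(1, 2, 1)
    cs = ax2d.contourf(XI, YI, ZI, levels=levels)
    fig.colorbar(cs, ax=ax2d, label="latitude")
    ax2d.set_xlabel("time")
    ax2d.set_ylabel("altitude")

    ax3d = fig.add_subplot(1, 2, 2, projection="3d")
    ax3d.plot_surface(XI, YI, ZI, cmap="viridis")
    ax3d.set_xlabel("time")
    ax3d.set_ylabel("altitude")
    return fig
```

Usage: fig = plot_filled(XI, YI, ZI), then fig.savefig("filled.png").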

Related

2D interpolate list of many points [duplicate]

So, I have three numpy arrays which store latitude, longitude, and some property value on a grid -- that is, I have LAT(y,x), LON(y,x), and, say temperature T(y,x), for some limits of x and y. The grid isn't necessarily regular -- in fact, it's tripolar.
I then want to interpolate these property (temperature) values onto a bunch of different lat/lon points (stored as lat1(t), lon1(t), for about 10,000 t...) which do not fall on the actual grid points. I've tried matplotlib.mlab.griddata, but that takes far too long (it's not really designed for what I'm doing, after all). I've also tried scipy.interpolate.interp2d, but I get a MemoryError (my grids are about 400x400).
Is there any sort of slick, preferably fast way of doing this? I can't help but think the answer is something obvious... Thanks!!
Try the combination of inverse-distance weighting and scipy.spatial.KDTree described in the SO question inverse-distance-weighted-idw-interpolation-with-python.
Kd-trees work nicely in 2D, 3D and higher dimensions, inverse-distance weighting is smooth and local, and the number of nearest neighbours (k=) can be varied to trade off speed against accuracy.
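A vectorised sketch of that combination, assuming the source points have already been projected to planar coordinates (raw lat/lon degrees are not distances) and stacked as an (N, 2) array, e.g. with np.column_stack on the raveled projected grids. It uses scipy.spatial.cKDTree, the compiled KD-tree with the same query interface as KDTree. The name idw_kdtree and its defaults (k=8, power=2) are illustrative choices, not taken from the linked answer.

```python
import numpy as np
from scipy.spatial import cKDTree


def idw_kdtree(src_xy, src_values, query_xy, k=8, power=2.0, eps=1e-12):
    """Inverse-distance-weighted interpolation from the k nearest source points.

    src_xy:     (N, 2) planar coordinates of the known samples
    src_values: (N,) values at those samples
    query_xy:   (M, 2) coordinates to interpolate at
    """
    src_xy = np.asarray(src_xy, dtype=float)
    query_xy = np.asarray(query_xy, dtype=float)
    src_values = np.asarray(src_values, dtype=float)
    k = min(k, len(src_xy))                  # cannot ask for more neighbours than points

    tree = cKDTree(src_xy)                   # build once; reuse it for many queries
    dist, idx = tree.query(query_xy, k=k)
    dist = dist.reshape(len(query_xy), -1)   # shape (M, k), also when k == 1
    idx = idx.reshape(len(query_xy), -1)

    neighbours = src_values[idx]                       # (M, k) neighbour values
    weights = 1.0 / np.maximum(dist, eps) ** power     # closer points weigh more
    result = np.sum(weights * neighbours, axis=1) / np.sum(weights, axis=1)

    # A query that coincides with a source point takes that point's value exactly
    exact = dist[:, 0] < eps
    result[exact] = neighbours[exact, 0]
    return result
```

Because the tree is built once and each query only touches k neighbours, 10,000 query points against a 400x400 grid stay cheap compared with methods that build a dense system over all points.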
There is a nice inverse distance example by Roger Veciana i Rovira, along with some code using GDAL to write to GeoTIFF if you're into that.
This is of course interpolation onto a regular grid, but that works if you first project the data onto a pixel grid with pyproj or something similar, all the while being careful about which projection is used for your data.
A copy of his algorithm and example script:
from math import pow
from math import sqrt
import numpy as np
import matplotlib.pyplot as plt

def pointValue(x, y, power, smoothing, xv, yv, values):
    nominator = 0
    denominator = 0
    for i in range(0, len(values)):
        dist = sqrt((x - xv[i]) * (x - xv[i]) + (y - yv[i]) * (y - yv[i]) + smoothing * smoothing)
        # If the point is really close to one of the data points, return the data point value to avoid singularities
        if dist < 0.0000000001:
            return values[i]
        nominator = nominator + (values[i] / pow(dist, power))
        denominator = denominator + (1 / pow(dist, power))
    # Return NODATA (-9999) if the denominator is zero
    if denominator > 0:
        value = nominator / denominator
    else:
        value = -9999
    return value

def invDist(xv, yv, values, xsize=100, ysize=100, power=2, smoothing=0):
    # Evaluate the interpolant at every integer pixel (x, y) of an xsize-by-ysize grid
    valuesGrid = np.zeros((ysize, xsize))
    for x in range(0, xsize):
        for y in range(0, ysize):
            valuesGrid[y][x] = pointValue(x, y, power, smoothing, xv, yv, values)
    return valuesGrid

if __name__ == "__main__":
    power = 1
    smoothing = 20
    # Creating some data, with each coordinate and the values stored in separate lists
    xv = [10, 60, 40, 70, 10, 50, 20, 70, 30, 60]
    yv = [10, 20, 30, 30, 40, 50, 60, 70, 80, 90]
    values = [1, 2, 2, 3, 4, 6, 7, 7, 8, 10]
    # Creating the output grid (100x100, in the example); invDist evaluates at the
    # integer pixels 0..99, so plot on those same coordinates
    ti = np.arange(0, 100)
    XI, YI = np.meshgrid(ti, ti)
    # Creating the interpolation function and populating the output matrix value
    ZI = invDist(xv, yv, values, 100, 100, power, smoothing)
    # Plotting the result
    plt.subplot(1, 1, 1)
    plt.pcolor(XI, YI, ZI)
    plt.scatter(xv, yv, 100, values)
    plt.title('Inv dist interpolation - power: ' + str(power) + ' smoothing: ' + str(smoothing))
    plt.xlim(0, 100)
    plt.ylim(0, 100)
    plt.colorbar()
    plt.show()
There are a bunch of options here; which one is best will depend on your data.
However, I don't know of an out-of-the-box solution for you.
You say your input data is on a tripolar grid. There are three main cases for how this data could be structured:
1. Sampled from a 3D grid in tripolar space, projected back to 2D LAT/LON data.
2. Sampled from a 2D grid in tripolar space, projected into 2D LAT/LON data.
3. Unstructured data in tripolar space, projected into 2D LAT/LON data.
The easiest of these is 2. Instead of interpolating in LAT LON space, "just" transform your point back into the source space and interpolate there.
Another option that works for 1 and 2 is to search for the cells that, once mapped from tripolar space, cover your sample point. (You can use a BSP or grid-type structure to speed up this search.) Pick one of those cells and interpolate inside it.
Finally, there's a heap of unstructured interpolation options, but they tend to be slow.
A personal favourite of mine is to use a linear interpolation of the nearest N points; finding those N points can again be done with gridding or a BSP. Another good option is to Delaunay-triangulate the unstructured points and interpolate on the resulting triangular mesh.
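The Delaunay option is available ready-made as scipy.interpolate.LinearNDInterpolator, which triangulates the scattered points with Qhull and interpolates linearly inside each triangle. A minimal sketch, assuming the points are already in a planar coordinate system; the wrapper name delaunay_linear is hypothetical. Passing a precomputed scipy.spatial.Delaunay object lets you reuse one triangulation for several fields (e.g. temperature and salinity on the same grid).

```python
import numpy as np
from scipy.interpolate import LinearNDInterpolator
from scipy.spatial import Delaunay


def delaunay_linear(src_xy, src_values, query_xy, tri=None):
    """Linear interpolation on a Delaunay triangulation of scattered 2-D points.

    Query points outside the convex hull of src_xy come back as NaN.
    Pass a precomputed `tri` to reuse one triangulation for several fields.
    """
    if tri is None:
        tri = Delaunay(np.asarray(src_xy, dtype=float))
    interp = LinearNDInterpolator(tri, np.asarray(src_values, dtype=float))
    return interp(np.asarray(query_xy, dtype=float))
```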
Personally if my mesh was case 1, I'd use an unstructured strategy as I'd be worried about having to handle searching through cells with overlapping projections. Choosing the "right" cell would be difficult.
I suggest taking a look at the interpolation features of GRASS, an open-source GIS package (http://grass.ibiblio.org/gdp/html_grass62/v.surf.bspline.html). It's not in Python, but you can reimplement it or interface with the C code.
Am I right in thinking your data grids look something like this (red is the old data, blue is the new interpolated data)?
(Image: http://www.geekops.co.uk/photos/0000-00-02%20%28Forum%20images%29/DataSeparation.png)
This might be a slightly brute-force approach, but what about rendering your existing data as a bitmap? OpenGL will do simple interpolation of colours for you with the right options configured, and you could render the data as triangles, which should be fairly fast. You could then sample pixels at the locations of the new points.
Alternatively, you could sort your first set of points spatially and then find the closest old points surrounding your new point and interpolate based on the distances to those points.
There is a FORTRAN library called BIVAR, which is very suitable for this problem. With a few modifications you can make it usable in python using f2py.
From the description:
BIVAR is a FORTRAN90 library which interpolates scattered bivariate data, by Hiroshi Akima.
BIVAR accepts a set of (X,Y) data points scattered in 2D, with associated Z data values, and is able to construct a smooth interpolation function Z(X,Y), which agrees with the given data, and can be evaluated at other points in the plane.

Interpolate a curve on itself using NumPy

I have the following curve as two arrays, of x and y positions.
Imagine if you were to draw vertical lines going through each point, and add points on the curve wherever these lines intersect the curve. This is what I want.
I tried using np.interp(x, x, y), but I ended up with the following mess:
How can I do this? Is it possible with np.interp?
This might be something that should be asked in a different question, but I would also like there to be points added where the curve crosses over itself.
According to the docs, the array of x values should be increasing (or periodic); otherwise "the results are nonsense". You can try to split your curve into sections, and then interpolate each part on the others. You can find the correct splitting places by looking at where np.diff(x) changes sign.
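A minimal sketch of that splitting approach, assuming the curve has no two consecutive points with the same x value (flat steps are not treated specially) and ignoring self-crossings. Each monotonic run is reversed if necessary so that np.interp sees increasing x, and is then evaluated at every original x that lies within its range. The helper names monotonic_segments and interp_on_itself are hypothetical.

```python
import numpy as np


def monotonic_segments(x):
    """Return inclusive (start, end) index pairs of runs where x is monotonic.

    Runs are split where np.diff(x) changes sign; neighbouring runs share
    the turning point.
    """
    d = np.sign(np.diff(x))
    breaks = np.where(d[1:] * d[:-1] < 0)[0] + 1   # indices where the direction flips
    starts = np.concatenate(([0], breaks))
    ends = np.concatenate((breaks, [len(x) - 1]))
    return [(int(s), int(e)) for s, e in zip(starts, ends)]


def interp_on_itself(x, y):
    """Add a point wherever a vertical line through any curve point meets the curve."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xs, ys = [], []
    for s, e in monotonic_segments(x):
        xseg, yseg = x[s:e + 1], y[s:e + 1]
        if xseg[0] > xseg[-1]:                      # np.interp needs increasing xp
            xseg, yseg = xseg[::-1], yseg[::-1]
        inside = (x >= xseg[0]) & (x <= xseg[-1])   # only query within this run
        xs.append(x[inside])
        ys.append(np.interp(x[inside], xseg, yseg))
    # Original points are produced more than once; keep each (x, y) pair once
    pts = np.unique(np.column_stack((np.concatenate(xs), np.concatenate(ys))), axis=0)
    return pts[:, 0], pts[:, 1]
```

The result is sorted by x, so it is best drawn as points (plt.plot(xs, ys, "o")) rather than as a connected line.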

Python - Problems contour plotting offset grid of data

My data is regularly spaced, but not quite a grid - each row of points is slightly offset from the one below.
The data is in the form of 3 1D arrays, x, y, z, with each index corresponding to a point. It is smoothly varying data - approximately Gaussian.
The point density is quite high. What is the best way to plot this data?
I tried meshgrid, but it gives me some bad contours through regions that have no data points near the contour's value.
I have tried rbf interpolation according to this post:
Python : 2d contour plot from 3 lists : x, y and rho?
but this just gives me nonsense: all the contours are on one edge, and it does not reflect the data at all.
Any other ideas for what I can try? Maybe I should be using some sort of nearest-neighbour interpolation? Here is a picture of about 1/4 of my data points: http://imgur.com/a/b00R6
I'm surprised it is causing me such difficulty - it seems like it should be fairly easy to plot.
The easiest way to plot ungridded data is probably tricontour or tricontourf (a filled tricontour plot).
Having 1D arrays of the x, y and z coordinates x, y and z, you'd simply call
plt.tricontourf(x,y,z, n, ...)
to obtain n levels of contours.
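A self-contained sketch with synthetic data shaped like the question's (offset rows, roughly Gaussian values); the grid size, the half-spacing offset and the Gaussian are made up for illustration. tricontourf triangulates the points itself, so no meshgrid or interpolation step is needed.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; use plt.show() interactively
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for the question's data: 30 rows of 30 points,
# every other row shifted right by half a spacing, roughly Gaussian values
yy, xx = np.mgrid[0:30, 0:30].astype(float)
xx[1::2] += 0.5
x, y = xx.ravel(), yy.ravel()
z = np.exp(-((x - 15) ** 2 + (y - 15) ** 2) / 50.0)

fig, ax = plt.subplots()
cs = ax.tricontourf(x, y, z, 15)   # about 15 filled levels (Matplotlib picks "nice" values)
fig.colorbar(cs, ax=ax)
ax.plot(x, y, "k.", markersize=1)  # show where the samples are
```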
The other quick method is to interpolate using matplotlib.mlab.griddata (removed in Matplotlib 3.1; scipy.interpolate.griddata does the same job) to obtain a regular grid from the irregular points.
Both methods are compared in an example on the matplotlib page:
Tricontour vs. griddata
Found the answer: needed to rescale my data.
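Rescaling matters because radial basis functions depend on Euclidean distance: if x and y span very different ranges, the distance is dominated by one coordinate, which can squeeze all the structure against one edge. A minimal sketch, assuming SciPy 1.7 or newer for scipy.interpolate.RBFInterpolator (the older scipy.interpolate.Rbf benefits from the same rescaling): standardise x and y, fit in the scaled space, and apply the same transform to the query grid. Standardising by mean and standard deviation is one choice among several; rbf_grid is a hypothetical name.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator  # SciPy >= 1.7


def rbf_grid(x, y, z, nx=100, ny=100):
    """RBF-interpolate scattered (x, y, z) onto a regular grid after rescaling x and y."""
    pts = np.column_stack((x, y)).astype(float)
    center = pts.mean(axis=0)
    scale = pts.std(axis=0)                  # put both coordinates on comparable ranges
    rbf = RBFInterpolator((pts - center) / scale, np.asarray(z, dtype=float))  # thin-plate spline by default

    xi = np.linspace(pts[:, 0].min(), pts[:, 0].max(), nx)
    yi = np.linspace(pts[:, 1].min(), pts[:, 1].max(), ny)
    XI, YI = np.meshgrid(xi, yi)
    query = np.column_stack((XI.ravel(), YI.ravel()))
    ZI = rbf((query - center) / scale).reshape(XI.shape)   # same transform for the query points
    return XI, YI, ZI
```

With a high point density, passing neighbors=50 (or similar) to RBFInterpolator limits each evaluation to nearby points and keeps memory use down.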

Trying to plot some data in matplotlib with numpy

I'm trying to simulate Conway's Game of Life in Python (here is some of the code), and now I need to handle the output. Right now, I'm just plotting points in matplotlib, but I want something like what this guy did (that script shows an error on my PC, but it generates the images anyway). I understand that the code I am looking for is:
plt.imshow(A, cmap='bone', interpolation='nearest')
plt.axis('off')
and that A is a NumPy array, like a matrix, with just True and False as entries.
By the way, I've already realized that instead of True and False I can use 1s and 0s.
I have the data of living cells as a set of points ([(x1,y1),(x2,y2),...,(xn,yn)]) in the plane (all coordinates are integers). As you can see, my script is finite (it uses a for loop up to 30), so I preset the plots' axes before the loop. For example, the minimum x coordinate of the plots is the minimum coordinate of the initial points minus 30, which ensures that all the points are visible in the last image.
To represent each configuration, I had the idea to do:
SuperArray = np.zeros((maxx + 30, maxy + 30))
for (i, j) in livecells:
    SuperArray[i, j] = 1
But that idea won't work, because the indices of SuperArray are all non-negative, and my coordinates may be negative. To solve this, I was thinking of translating ALL of the points in livecells so that their coordinates are positive, by adding |minx|+30 to the x coordinate and |miny|+30 to the y coordinate of each (x,y) in livecells. I haven't put it into practice yet, but it seems too complicated and memory-consuming. Do you have any suggestions?
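Shifting the coordinates is cheaper than it looks: the array only has to span the bounding box of the live cells plus the margin, and the shift is a single vectorised subtraction. A minimal sketch, assuming livecells is a non-empty iterable of integer (x, y) tuples; cells_to_array and margin are hypothetical names.

```python
import numpy as np


def cells_to_array(livecells, margin=30):
    """Rasterize integer (x, y) cells, possibly negative, into a 0/1 array.

    Coordinates are shifted by (min - margin) so every index is non-negative.
    Returns the array and the (x0, y0) offset that maps indices back to cells.
    """
    cells = np.array(list(livecells), dtype=int)           # shape (n, 2)
    x0, y0 = cells.min(axis=0) - margin
    x1, y1 = cells.max(axis=0) + margin
    grid = np.zeros((x1 - x0 + 1, y1 - y0 + 1), dtype=np.uint8)
    grid[cells[:, 0] - x0, cells[:, 1] - y0] = 1            # vectorised, no Python loop
    return grid, (int(x0), int(y0))
```

Since the first array index is x, plt.imshow(grid.T, origin="lower", cmap="bone", interpolation="nearest") puts x on the horizontal axis.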

Boxplot on distance Data - set Box manually to values

I have a bunch of 2D points and angles. To visualise the amount of movement, I wanted to use a boxplot and plot the difference from the mean of the points.
I successfully visualised the angle jitter using Python and matplotlib in the following boxplot:
Now I want to do the same for my position data. After computing the Euclidean distance, all the data is positive, so a naive boxplot will give wrong results. For an example, see the boxplot at the bottom: points that are exactly on the mean have a distance of zero and are now outliers.
So my question is:
How can I set the bottom end of the box and the whiskers manually to zero?
If I should take another approach, like a bar chart, please tell me (I would like to use the same style, though).
Edit:
It currently looks similar to the following plot (this is a plot of the distances of the angles from their mean).
As you can see, the boxplot doesn't cover zero. That is correct for the data, but not for the meaning behind it! Zero is perfect (since it represents a point that was exactly in the middle of the angles), but it is not included in the boxplot.
I found out it has already been asked before in this question on SO. While not an exact duplicate, the other question contains the answer!
In matplotlib 1.4 there will probably be a faster way to do it, but for now the answer in the other thread seems to be the best way to go.
Edit:
Well, it turned out that I couldn't use their approach, since I use plt.boxplot(data, patch_artist=True) to get all the other fancy stuff.
So i had to resort to the following ugly final solution:
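import numpy as np

# data: list of N distance arrays; ax: the Axes already holding plt.boxplot(data, patch_artist=True);
# color: list of N hex colour strings (all defined earlier in my script)
N = 12  # number of my plots
upperBoxPoints = []
for d in data:
    upperBoxPoints.append(np.percentile(d, 75))
w = 0.5  # I had to tune the width by hand
ind = range(0, N)  # compute the correct placement from number and width
ind = [x + 0.5 + (w / 2) for x in ind]
for i in range(N):
    rect = ax.bar(ind[i], upperBoxPoints[i], w, align='edge', color=color[i], edgecolor='gray', linewidth=2, zorder=10)
    # ind[i]             left edge of the bar (align='edge' is needed on matplotlib >= 2.0, where bars are centred by default)
    # upperBoxPoints[i]  height of the box (the 75th percentile), so the box starts at zero
    # w                  width
    # color=color[i]     as you can see I have a complex color scheme; use hex strings like '#AAAAAA' for colors, html names won't work
    # edgecolor='gray'   just like the other one
    # linewidth=2        ditto
    # zorder=10          IMPORTANT: you have to use at least 2 to draw it over the other stuff (but not too high, or it is drawn over your horizontal orientation lines)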
And the final result:
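As an alternative to hand-placed bars, recent Matplotlib versions provide matplotlib.cbook.boxplot_stats to compute the boxplot statistics and Axes.bxp to draw a boxplot from them, and bxp also accepts patch_artist=True. A minimal sketch, assuming the data is a list of non-negative distance arrays: the first quartile and lower whisker are overwritten with zero before drawing, and low outliers are dropped since they now fall inside the box. zero_anchored_boxplot is a hypothetical name.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; use plt.show() interactively
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import cbook


def zero_anchored_boxplot(ax, data, **bxp_kwargs):
    """Boxplot whose box and lower whisker start at 0, for non-negative data such as distances."""
    stats = cbook.boxplot_stats(data)   # one dict per dataset: q1, med, q3, whislo, whishi, fliers, ...
    for s in stats:
        s["q1"] = 0.0                   # bottom of the box at zero
        s["whislo"] = 0.0               # lower whisker at zero
        fliers = np.asarray(s["fliers"])
        s["fliers"] = fliers[fliers > s["whishi"]]   # low "outliers" now lie inside the box
    return ax.bxp(stats, patch_artist=True, **bxp_kwargs)
```

Usage: fig, ax = plt.subplots(); artists = zero_anchored_boxplot(ax, data). The returned dict has the same keys as plt.boxplot's ("boxes", "whiskers", ...), so existing colouring code can be reused.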
