I have WRF output on a curvilinear grid (native Lambert conformal projection), so there are 2D coordinate arrays (XLONG & XLAT) associated with it. I am able to subset the data into a rectangular grid by slicing the arrays,
e.g.
xlat = constants.variables['XLAT'][0,749:915,220:458]
xlon = constants.variables['XLONG'][0,749:915,220:458]
However, I want to subset all the grid points that are bounded by specific latitudes and longitudes to get a sort of trapezoid shape of grid points. I have attached an image to make it easy to understand. I want the grid points bounded by the red line, instead of the grid points within the blue box.
https://www.dropbox.com/s/bxnhuhyoena8a8e/WRF_StudySites.pdf?dl=0
This can be done in NCL (the NCAR Command Language) using the where() function, but I am having trouble doing the same thing in Python.
Any tips on how I could possibly do this?
Thanks!
I used the xarray package for this. See here for the explanation of its where() function. The longitude in my dataset ranges from -5 to 13, so the condition XLONG > 0 returns only part of my data (as intended).
For example:
import xarray as xr
import matplotlib.pyplot as plt
# Load the dataset with xarray (only the T2 field, at a single time step)
wrftemp = xr.open_dataset('wrfout_d01_.....').T2.isel(Time=1)
# Make a figure of the T2 temperature field
fig,(ax1,ax2) = plt.subplots(1,2)
wrftemp.plot(ax=ax1)
wrftemp.where(wrftemp.XLONG>0).plot(ax=ax2)
plt.show()
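To answer the original question more directly: you can combine several coordinate conditions into a single mask and pass that to where(), which gives you the trapezoid of grid points bounded by latitudes and longitudes rather than a rectangle of indices. A minimal sketch assuming the same file; the bounding values below are made up, so substitute the latitudes and longitudes of your red outline:
import xarray as xr

t2 = xr.open_dataset('wrfout_d01_.....').T2.isel(Time=0)

# Keep only the grid points inside the lat/lon bounds (values are placeholders);
# drop=True trims rows/columns that end up masked everywhere
mask = (t2.XLAT > 40) & (t2.XLAT < 45) & (t2.XLONG > -5) & (t2.XLONG < 5)
subset = t2.where(mask, drop=True)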
So, I have three numpy arrays which store latitude, longitude, and some property value on a grid -- that is, I have LAT(y,x), LON(y,x), and, say, temperature T(y,x), for some limits of x and y. The grid isn't necessarily regular -- in fact, it's tripolar.
I then want to interpolate these property (temperature) values onto a bunch of different lat/lon points (stored as lat1(t), lon1(t), for about 10,000 t...) which do not fall on the actual grid points. I've tried matplotlib.mlab.griddata, but that takes far too long (it's not really designed for what I'm doing, after all). I've also tried scipy.interpolate.interp2d, but I get a MemoryError (my grids are about 400x400).
Is there any sort of slick, preferably fast way of doing this? I can't help but think the answer is something obvious... Thanks!!
Try the combination of inverse-distance weighting and scipy.spatial.KDTree described in the SO answer inverse-distance-weighted-idw-interpolation-with-python. Kd-trees work nicely in 2d, 3d, ...; inverse-distance weighting is smooth and local, and k, the number of nearest neighbours, can be varied to trade off speed against accuracy.
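A minimal sketch of that combination, assuming scipy is available (the array names mirror the question, and k and the power are tuning knobs):
import numpy as np
from scipy.spatial import cKDTree

def idw_kdtree(grid_lon, grid_lat, grid_vals, lon1, lat1, k=8, power=2, eps=1e-12):
    # Build the tree once over the flattened source grid (numpy arrays assumed)
    tree = cKDTree(np.column_stack([grid_lon.ravel(), grid_lat.ravel()]))
    # Find the k nearest source points for every target point
    dist, idx = tree.query(np.column_stack([lon1, lat1]), k=k)
    # Inverse-distance weights; eps avoids division by zero on exact hits
    w = 1.0 / (dist + eps) ** power
    return (w * grid_vals.ravel()[idx]).sum(axis=1) / w.sum(axis=1)
For a 400x400 grid and ~10,000 target points, the tree build and query should take well under a second.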
There is a nice inverse-distance example by Roger Veciana i Rovira, along with some code using GDAL to write to GeoTIFF if you're into that.
It interpolates to a regular grid, of course, but it works if you first project the data to a pixel grid with pyproj or something similar, all the while being careful about which projection is used for your data.
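If you go that route, the reprojection step might look like this with pyproj (the CRS codes below are placeholders; pick whatever actually matches your data):
from pyproj import Transformer

# WGS84 lon/lat -> Web Mercator metres (placeholder CRS choice)
transformer = Transformer.from_crs("EPSG:4326", "EPSG:3857", always_xy=True)
x, y = transformer.transform(-3.7, 40.4)   # e.g. one lon/lat pair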
A copy of his algorithm and example script:
from math import pow
from math import sqrt
import numpy as np
import matplotlib.pyplot as plt

def pointValue(x, y, power, smoothing, xv, yv, values):
    nominator = 0
    denominator = 0
    for i in range(0, len(values)):
        dist = sqrt((x-xv[i])*(x-xv[i]) + (y-yv[i])*(y-yv[i]) + smoothing*smoothing)
        #If the point is really close to one of the data points,
        #return the data point value to avoid singularities
        if dist < 0.0000000001:
            return values[i]
        nominator = nominator + (values[i] / pow(dist, power))
        denominator = denominator + (1 / pow(dist, power))
    #Return NODATA if the denominator is zero
    if denominator > 0:
        value = nominator / denominator
    else:
        value = -9999
    return value

def invDist(xv, yv, values, xsize=100, ysize=100, power=2, smoothing=0):
    valuesGrid = np.zeros((ysize, xsize))
    for x in range(0, xsize):
        for y in range(0, ysize):
            valuesGrid[y][x] = pointValue(x, y, power, smoothing, xv, yv, values)
    return valuesGrid

if __name__ == "__main__":
    power = 1
    smoothing = 20
    #Creating some data, with each coordinate and the values stored in separate lists
    xv = [10, 60, 40, 70, 10, 50, 20, 70, 30, 60]
    yv = [10, 20, 30, 30, 40, 50, 60, 70, 80, 90]
    values = [1, 2, 2, 3, 4, 6, 7, 7, 8, 10]
    #Creating the output grid (100x100, in the example)
    ti = np.linspace(0, 100, 100)
    XI, YI = np.meshgrid(ti, ti)
    #Creating the interpolation function and populating the output matrix value
    ZI = invDist(xv, yv, values, 100, 100, power, smoothing)
    #Plotting the result
    plt.subplot(1, 1, 1)
    plt.pcolor(XI, YI, ZI)
    plt.scatter(xv, yv, 100, values)
    plt.title('Inv dist interpolation - power: ' + str(power) + ' smoothing: ' + str(smoothing))
    plt.xlim(0, 100)
    plt.ylim(0, 100)
    plt.colorbar()
    plt.show()
There are a bunch of options here; which one is best will depend on your data...
However, I don't know of an out-of-the-box solution for you.
You say your input data comes from tripolar data. There are three main cases for how this data could be structured:
1. Sampled from a 3d grid in tripolar space, projected back to 2d LAT, LON data.
2. Sampled from a 2d grid in tripolar space, projected into 2d LAT, LON data.
3. Unstructured data in tripolar space, projected into 2d LAT, LON data.
The easiest of these is 2. Instead of interpolating in LAT LON space, "just" transform your point back into the source space and interpolate there.
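For case 2, once you have the inverse mapping from lat/lon back to fractional grid indices, the interpolation step is a plain regular-grid lookup. A sketch with scipy; the inverse-projection step is problem-specific and only stubbed out here with random indices:
import numpy as np
from scipy.interpolate import RegularGridInterpolator

ny, nx = 400, 400
T = np.random.rand(ny, nx)   # stand-in for property values on the source grid

# Interpolator over source-space (j, i) index coordinates
interp = RegularGridInterpolator((np.arange(ny), np.arange(nx)), T)

# Fractional (j, i) positions of the target points -- in practice these come
# from inverting the tripolar projection for each (lat1[t], lon1[t]) pair
targets = np.random.uniform(0, nx - 1, size=(10000, 2))
t1 = interp(targets)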
Another option that works for cases 1 and 2 is to search for the cell that maps from tripolar space to cover your sample point (you can use a BSP or grid-type structure to speed up this search), pick one of the cells, and interpolate inside it.
Finally, there's a heap of unstructured interpolation options... but they tend to be slow.
A personal favourite of mine is to use a linear interpolation of the nearest N points; finding those N points can again be done with gridding or a BSP. Another good option is to Delaunay-triangulate the unstructured points and interpolate on the resulting triangular mesh.
Personally if my mesh was case 1, I'd use an unstructured strategy as I'd be worried about having to handle searching through cells with overlapping projections. Choosing the "right" cell would be difficult.
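For the Delaunay route, scipy wraps the triangulate-then-interpolate pattern in LinearNDInterpolator. A minimal sketch, with synthetic stand-ins for the question's LAT, LON and T arrays:
import numpy as np
from scipy.interpolate import LinearNDInterpolator

# Synthetic stand-ins for LAT(y, x), LON(y, x) and T(y, x)
LON, LAT = np.meshgrid(np.linspace(-180, 180, 100), np.linspace(-90, 90, 100))
T = np.cos(np.radians(LAT)) * np.sin(np.radians(LON))

# The Delaunay triangulation is built once, up front; every query afterwards
# is a fast simplex lookup plus barycentric interpolation
interp = LinearNDInterpolator(np.column_stack([LON.ravel(), LAT.ravel()]), T.ravel())

lon1 = np.random.uniform(-180, 180, 10000)
lat1 = np.random.uniform(-90, 90, 10000)
t1 = interp(lon1, lat1)   # NaN outside the convex hull of the source points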
I suggest taking a look at the interpolation features of GRASS (an open-source GIS package): http://grass.ibiblio.org/gdp/html_grass62/v.surf.bspline.html. It's not in Python, but you can reimplement it or interface with C code.
Am I right in thinking your data grids look something like this (red is the old data, blue is the new interpolated data)?
(image: http://www.geekops.co.uk/photos/0000-00-02%20%28Forum%20images%29/DataSeparation.png)
This might be a slightly brute-force-ish approach, but what about rendering your existing data as a bitmap? OpenGL will do simple interpolation of colours for you with the right options configured, and you could render the data as triangles, which should be fairly fast. You could then sample pixels at the locations of the new points.
Alternatively, you could sort your first set of points spatially and then find the closest old points surrounding your new point and interpolate based on the distances to those points.
There is a FORTRAN library called BIVAR which is very suitable for this problem. With a few modifications you can make it usable in Python using f2py.
From the description:
BIVAR is a FORTRAN90 library which interpolates scattered bivariate data, by Hiroshi Akima.
BIVAR accepts a set of (X,Y) data points scattered in 2D, with associated Z data values, and is able to construct a smooth interpolation function Z(X,Y), which agrees with the given data, and can be evaluated at other points in the plane.
I'm working on a project to calculate the centroid of a state/country using Python.
What I have done so far:
Take an outline of the state and run it through ImageJ to create a csv of the x,y coordinates of the border. This gives me a .csv file with data like this:
556,243
557,243
557,250
556,250
556,252
555,252
555,253
554,253
etc, etc,
For about 2500 data points.
Import this list into a Python script.
Calculate the average of the x and y coordinate arrays. This point is the centroid. (Idea similar to this)
Plot the points and the centroid using matplotlib.
Here is my code:
#####################################################
#                     Imports                       #
#####################################################
import csv
import matplotlib.pyplot as plt
import numpy as np

#####################################################
#                      Setup                        #
#####################################################
#Set empty lists for coordinates
x, y = [], []
#Importing csv data
with open("russiadata.csv", "r") as russiadataFile:
    russiadataReader = csv.reader(russiadataFile)
    #Import data
    for row in russiadataReader:
        #While the rows have data, AKA length not equal to zero.
        if len(row) != 0:
            #Append data to the lists created above
            x.append(float(row[0]))
            y.append(float(row[1]))
#The file is closed automatically at the end of the with block

#####################################################
#                  Data Analysis                    #
#####################################################
#Convert lists to arrays for computations
x = np.array(x)
y = np.array(y)
#Calculate number of data points
x_len = len(x)
y_len = len(y)
#Set sum of points equal to x_sum and y_sum
x_sum = np.sum(x)
y_sum = np.sum(y)
#Calculate centroid of points
x_centroid = x_sum / x_len
y_centroid = y_sum / y_len

#####################################################
#                     Plotting                      #
#####################################################
#Plot all points in data
plt.xkcd()
plt.plot(x, y, "-.")
#Plot centroid and label it
plt.plot(x_centroid, y_centroid, '^')
#Add axis labels
plt.xlabel("X")
plt.ylabel("Y")
plt.title("russia")
#Show the plot
plt.show()
The problem I have run into is that some sides of the state have more points than others, so the centroid is being weighted towards areas with more points. This is not what I want. I'm trying to find the centroid of the polygon that has vertices from the x,y coordinates.
This is what my plot looks like:
https://imgur.com/a/ZdukA
As you can see, the centroid is weighted more towards the section of points with more density. (As a side note, yes, that is Russia. I'm having issues with the plot coming out backwards and stretched/squashed.)
In other words, is there a more accurate way to get the centroid?
Thanks in advance for any help.
It sounds to me like you don't want your centroid to be calculated with the density of the scatter in mind.
If you just want to use surface area, then I would eliminate any point that is contained within the current outline of the scatter. A slightly more accurate way might be to pretend there is a box outlined by your outer-most points, then to check the x- and y-coordinates of all of your points and eliminate any that fall inside of the box. Any points that fall inside the current outline are not contributing to the shape, only the density.
I think the most technical and accurate approach would be very complicated, and here's what I think it would require: to get the outer-most points to connect based on least distance from each other, and furthest distance from all other points. By "connect" I mean to pretend that a line passes through, and ends at, both points. It should be defined mathematically.
Then, for each point, calculate whether or not it falls inside or outside of this outline, and eliminate all that fall inside (they are redundant as they are already inside the shape).
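For what it's worth, the outline-finding step described above is essentially a convex hull, which scipy provides out of the box. A rough sketch, with random points standing in for the ~2500 border points; note that a convex hull will cut across concave stretches of a border, so this only approximates the idea:
import numpy as np
from scipy.spatial import ConvexHull

pts = np.random.rand(2500, 2)     # stand-in for the border coordinates
hull = ConvexHull(pts)
outline = pts[hull.vertices]      # outer-most points; interior points eliminated
centroid = outline.mean(axis=0)   # average over the hull vertices only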
You can find the correct formula for a closed polygon on Wikipedia: https://en.wikipedia.org/wiki/Centroid#Centroid_of_a_polygon
Another formula is helpful to deal with Kaliningrad oblast (exclave) and islands (if you want to be really precise): https://en.wikipedia.org/wiki/Centroid#By_geometric_decomposition
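A minimal implementation of that closed-polygon formula (the shoelace-based centroid), assuming the border points are ordered around the outline:
import numpy as np

def polygon_centroid(x, y):
    #vertices must be ordered around the polygon; it is closed implicitly
    #by wrapping the last vertex back to the first
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    xs, ys = np.roll(x, -1), np.roll(y, -1)
    cross = x * ys - xs * y                  #shoelace terms
    area = cross.sum() / 2.0                 #signed polygon area
    cx = ((x + xs) * cross).sum() / (6.0 * area)
    cy = ((y + ys) * cross).sum() / (6.0 * area)
    return cx, cy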
That said, such questions probably fit better on https://math.stackexchange.com.
Some background first:
I want to plot the Mel-Frequency Cepstral Coefficients of various songs and compare them.
I calculate MFCCs throughout a song and then average them to get one array of 13 coefficients. I want this to represent one point on the graph that I plot.
I'm new to Python and very new to any form of plotting (though I've seen some recommendations to use matplotlib).
I want to be able to visualize this data. Any thoughts on how I might go about doing this?
Firstly, if you want to represent an array of 13 coefficients as a single point in your graph, then you need to break the 13 coefficients down to the number of dimensions in your graph as yan king yin pointed out in his comment.
For projecting your data into 2 dimensions, you can either create relevant indicators yourself (such as max/min/standard deviation/...) or apply dimensionality-reduction methods such as PCA.
Whether or not to do so and how to do so is another topic.
Then, plotting is easy and is done as here:
http://matplotlib.org/api/pyplot_api.html
Here is example code for this solution:
import matplotlib.pyplot as plt
import numpy as np

#fake example data
song1 = np.asarray([1, 2, 3, 4, 5, 6, 2, 35, 4, 1])
song2 = song1 * 2
song3 = song1 * 1.5

#list of arrays containing all data
data = [song1, song2, song3]

#calculate 2d indicators
def indic(data):
    #alternatively you can calculate any other indicators
    maxima = np.max(data, axis=1)
    minima = np.min(data, axis=1)
    return maxima, minima

x, y = indic(data)

plt.scatter(x, y, marker='x')
plt.show()
The result looks like this:
Yet I want to suggest another solution to your underlying problem, namely plotting multidimensional data. I recommend something like a parallel coordinate plot, which can be constructed from the same fake data:
import pandas as pd
pd.DataFrame(data).T.plot()
plt.show()
The result then shows all coefficients for each song along the x-axis and their values along the y-axis. It would look as follows:
UPDATE:
In the meantime I have discovered the Python Graph Gallery, which contains two nice examples of high-dimensional visualization with reference code:
Radar chart
Parallel plot
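As a related pointer, pandas also ships a built-in parallel-coordinates helper. A minimal sketch reusing the same fake data as above:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates

#same fake data as above
song1 = np.asarray([1, 2, 3, 4, 5, 6, 2, 35, 4, 1])
data = [song1, song1 * 2, song1 * 1.5]

df = pd.DataFrame(data)
df['song'] = ['song1', 'song2', 'song3']   #class column used to colour the lines
parallel_coordinates(df, 'song')
plt.show()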
I have a matrix that contains N users and K items. I want to plot that matrix in Python by considering each line as a vector with multiple coordinates. For example, a simple point plot requires X,Y. My vector has K coordinates, and I want to plot each one of those N vectors as a point to see their similarities. Can anyone help me with that?
UPDATE:
#Matrix M shape = (944, 1683)
plt.figure()
plt.imshow(M, interpolation='nearest', cmap=plt.cm.ocean)
plt.colorbar()
plt.show()
but this gave me this result:
What I want is something like this:
It is difficult from this question to be sure if my answer is relevant, but here's my best guess. I believe deltascience is asking how multidimensional vectors are generally plotted into two-dimensional space, as would be the case with a scatter plot. I think the best answer is that some kind of dimension reduction algorithm is generally performed. In other words, you don't do this by finding the right matplotlib code; you get your data into the right shape (one list for the X axis, and another list for the Y axis) and you then plot it using a typical matplotlib approach:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
M = np.random.rand(944, 1683)
pca = PCA(n_components=2)
reduced = pca.fit_transform(M)
# We need a 2 x 944 array, not 944 by 2 (all X coordinates in one list)
t = reduced.transpose()
plt.scatter(t[0], t[1])
plt.show()
Here are some relevant links:
https://stats.stackexchange.com/questions/63589/how-to-project-high-dimensional-space-into-a-two-dimensional-plane
http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html
https://towardsdatascience.com/the-art-of-effective-visualization-of-multi-dimensional-data-6c7202990c57
https://www.evl.uic.edu/documents/etemadpour_choosingvisualization_springer2016.pdf
July 2019 Addendum: It didn't occur to me at the time, but another way people often visualize multi-dimensional data is with network visualization. Each multi-dimensional array in this context would be a node, and the edge weight would be something like the cosine similarity of two nodes, or the Euclidean distance. NetworkX in Python has some really nice visualization options.
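A rough sketch of that network idea, assuming networkx and scikit-learn are installed (the similarity threshold is arbitrary and would need tuning for real data):
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt
from sklearn.metrics.pairwise import cosine_similarity

M = np.random.rand(30, 50)   # 30 vectors of 50 dimensions (kept small for clarity)
sim = cosine_similarity(M)

G = nx.Graph()
G.add_nodes_from(range(len(M)))
for i in range(len(M)):
    for j in range(i + 1, len(M)):
        if sim[i, j] > 0.8:   #arbitrary threshold; edges connect similar vectors
            G.add_edge(i, j, weight=sim[i, j])

nx.draw(G, nx.spring_layout(G), node_size=50)
plt.show()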
I have some data over a 2D range that I am interested in analyzing. These data were originally in lists x,y, and z where z[i] was the value for the point located at (x[i],y[i]). I then interpolated this data onto a regular grid using
x=np.array(x)
y=np.array(y)
z=np.array(z)
xi=np.linspace(minx,maxx,100)
yi=np.linspace(miny,maxy,100)
zi=griddata(x,y,z,xi,yi)
I then plotted the xi,yi,zi data using
plt.contour(xi,yi,zi)
plt.pcolormesh(xi,yi,zi,cmap=plt.get_cmap('PRGn'),norm=plt.Normalize(-10,10),vmin=-10,vmax=10)
This produced this plot:
In this plot you can see the S-like curve where the values are equal to zero (aside: the data doesn't vary as rapidly as the colorbar suggests -- that's simply a result of me normalizing the data to the range -10 to 10 when it actually extends far beyond that range; I did this to make the zero-valued region show up better -- maybe there's a better way of doing this too...).
The scattered dots are simply the points at which I have original data (yes, in this case my data was already on a regular grid). What I'm curious about is whether there is a good way for me to extract the values for which the curve is zero and obtain x,y pairs that, if plotted as a line, would trace that zero-region in the colormesh. I could interpolate to a really fine grid and then just brute force search for the values which are closest to zero. But is there a more automatic way of doing this, or a more automatic way of plotting this "zero-line"?
And a secondary question: am I using griddata correctly? I have these simple 1D arrays, although elsewhere people use various meshgrids, load text files, etc., before calling griddata.
Here is a full example:
import numpy as np
import matplotlib.pyplot as plt

y, x = np.ogrid[-1.5:1.5:200j, -1.5:1.5:200j]
f = (x**2 + y**2)**4 - (x**2 - y**2)**2

plt.figure(figsize=(9, 4))

plt.subplot(121)
extent = [np.min(x), np.max(x), np.min(y), np.max(y)]
cs = plt.contour(f, extent=extent, levels=[0, 0.1],
                 colors=["b", "r"], linestyles=["solid", "dashed"], linewidths=[2, 2])

plt.subplot(122)
# get the points on the lines
for c in cs.collections:
    data = c.get_paths()[0].vertices
    plt.plot(data[:, 0], data[:, 1],
             color=c.get_color()[0], linewidth=c.get_linewidth()[0])

plt.show()
Here is the output:
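As a side note, on recent Matplotlib releases cs.collections is deprecated; the same vertices can be pulled from cs.allsegs, which holds one list of (N, 2) vertex arrays per contour level. Continuing the example above:
#one list of segments per contour level; each segment is an (N, 2) vertex array
for level_segs in cs.allsegs:
    for seg in level_segs:
        plt.plot(seg[:, 0], seg[:, 1])
plt.show()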