How to make grid of the irregular data? - python

I have the numpy arrays of longitudes, latitudes, and the data.
I want to plot this data as a raster image using numpy, scipy, and matplotlib.
import numpy as np
from matplotlib.mlab import griddata
import matplotlib.pyplot as plt
longitudes = np.array([[139.79391479492188, 140.51760864257812, 141.19119262695312, 141.82083129882812, 142.41165161132812],
[139.79225158691406, 140.51416015625, 141.18606567382812, 141.8140869140625, 142.40338134765625],
[139.78591918945312, 140.50637817382812, 141.17694091796875, 141.80377197265625, 142.3919677734375],
[139.78387451171875, 140.50253295898438, 141.17147827148438, 141.79678344726562, 142.38360595703125],
[139.77781677246094, 140.4949951171875, 141.16250610351562, 141.78646850585938, 142.37196350097656]],dtype=float)
latitudes = np.array([[55.61929702758789, 55.621070861816406, 55.61888122558594, 55.613487243652344, 55.60547637939453],
[55.53120040893555, 55.532840728759766, 55.53053665161133, 55.525047302246094, 55.5169677734375],
[55.44305419921875, 55.444580078125, 55.44219207763672, 55.43663024902344, 55.42848587036133],
[55.35470199584961, 55.356109619140625, 55.353614807128906, 55.34796905517578, 55.33975601196289],
[55.26683807373047, 55.268131256103516, 55.26553726196289, 55.25981140136719, 55.25152587890625]],dtype=float)
data = np.array([[10, 10, 10, 10, 10],
[20, 20, 20, 20, 20],
[30, 30, 30, 30, 30],
[40, 40, 40, 40, 40],
[50, 50, 50, 50, 50]],dtype=float)
x = longitudes.ravel()
y = latitudes.ravel()
z = data.ravel()
xMin, xMax = np.min(x), np.max(x)
yMin, yMax = np.min(y), np.max(y)
xi = np.arange(xMin, xMax, 0.005)  # chosen spacing of 0.005 (linspace expects a point count, not a step)
yi = np.arange(yMin, yMax, 0.005)  # chosen spacing of 0.005
The data are not exactly on a regular grid, and I cannot work out how to proceed from here:
zi_matplotlib = griddata(x, y, z, xi, yi, interp='linear')
from scipy.interpolate import griddata ##Using scipy method
zi_scipy = griddata((x, y), z, (xi, yi), method='nearest')
plt.imshow(????)
Any ideas or solutions, please?

You can use interpolation to convert the distorted grid into a regular grid. The interpolation fits the original data points and returns a function that can be evaluated at any point of your choosing, and in this case, you would choose a regular grid of points.
Here's an example:
import numpy as np
from scipy.interpolate import interp2d
import matplotlib.pyplot as plt
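# your data here, as posted in the question (with longitudes and latitudes named lon and lat)
# note: interp2d was removed in SciPy 1.14, so this code needs an older SciPy release
f = interp2d(lon, lat, data, kind="cubic", bounds_error=False)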
dlon, dlat = 1.2, .2
xlon = np.linspace(min(lon.flat), max(lon.flat), 20)
xlat = np.linspace(min(lat.flat), max(lat.flat), 20)
# the next few lines are because there seems to be a bug in interp2d
# instead one would just want to use r = interp2d(X.flat, Y.flat) (where X,Y are as below)
# but for the version of scipy I'm using ('0.13.3'), this throws an exception.
r = np.zeros((len(xlat), len(xlon)))  # rows follow latitude, columns follow longitude
for i, rlat in enumerate(xlat):
    for j, rlon in enumerate(xlon):
        r[i,j] = f(rlon, rlat)
X, Y = np.meshgrid(xlon, xlat)
plt.imshow(r, interpolation="nearest", origin="lower", extent=[min(xlon), max(xlon), min(xlat), max(xlat)], aspect=6.)
plt.scatter(lon.flat, lat.flat, color='k')
plt.show()
Here, I left the mesh fairly coarse (20x20) and used interpolation="nearest" so that you can still see the colored squares representing each interpolated value, computed on a regular grid (created by the two linspace calls). Note also the use of origin="lower", which gives the image and the scatter plot the same orientation.
To interpret this, the main thing to notice is that the values change from left to right. The data were specified as constant along each horizontal row of points, but because the points where they were specified are warped, the interpolated values slowly change across the image. For example, the lowest scatter point on the right should have approximately the same color as the highest one towards the left. Another sign of this is that there is not much color change within any of the leftmost pairs of points, but a lot within the rightmost pairs, where the warping is largest.
Note that the interpolation could be done for any values, not only a regular grid, which is just being used for imshow as per the original question. Also note that I used bounds_error=False so I could evaluate a few points slightly outside of the original dataset, but be very careful with this as points outside of the original data will quickly become unreasonable due to the cubics being evaluated beyond the region where they were fit.
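Since interp2d is no longer available in recent SciPy releases, here is a hedged alternative sketch using scipy.interpolate.griddata, which treats the points as scattered data. The small warped grid below is a made-up stand-in for the question's arrays, and the 0.05 spacing is an arbitrary choice; this is one possible approach, not the method used above.

```python
import numpy as np
from scipy.interpolate import griddata

# Hypothetical stand-in for the question's arrays: a 5x5 grid, slightly warped.
cols, rows = np.meshgrid(np.linspace(0.0, 1.0, 5), np.linspace(0.0, 1.0, 5))
lon = 140.0 + 2.5 * cols - 0.02 * rows        # longitudes drift with the row
lat = 55.6 - 0.35 * rows - 0.02 * cols ** 2   # latitudes sag with the column
data = 10.0 + 40.0 * rows                     # constant along each row, 10..50

# Flatten to scattered (x, y, z) samples.
x, y, z = lon.ravel(), lat.ravel(), data.ravel()

# Regular target grid with a fixed spacing (assumed value).
step = 0.05
xi = np.arange(x.min(), x.max() + step / 2, step)
yi = np.arange(y.min(), y.max() + step / 2, step)
XI, YI = np.meshgrid(xi, yi)

# Linear interpolation on a triangulation of the scattered points.
# Grid nodes outside the convex hull of the samples come back as NaN.
zi = griddata((x, y), z, (XI, YI), method="linear")
```

The result zi has one row per yi value and one column per xi value, so it can be passed to plt.imshow(zi, origin="lower", extent=[xi[0], xi[-1], yi[0], yi[-1]], aspect="auto"). The NaN cells outside the data region are simply left blank.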

Assuming that longitudes and latitudes are equally spaced, you can use imshow directly as it features interpolation:
import numpy as np
import matplotlib.pyplot as plt
longitudes = np.array([[139.79391479492188, 140.51760864257812, 141.19119262695312, 141.82083129882812, 142.41165161132812],
[139.79225158691406, 140.51416015625, 141.18606567382812, 141.8140869140625, 142.40338134765625],
[139.78591918945312, 140.50637817382812, 141.17694091796875, 141.80377197265625, 142.3919677734375],
[139.78387451171875, 140.50253295898438, 141.17147827148438, 141.79678344726562, 142.38360595703125],
[139.77781677246094, 140.4949951171875, 141.16250610351562, 141.78646850585938, 142.37196350097656]],dtype=float)
latitudes = np.array([[55.61929702758789, 55.621070861816406, 55.61888122558594, 55.613487243652344, 55.60547637939453],
[55.53120040893555, 55.532840728759766, 55.53053665161133, 55.525047302246094, 55.5169677734375],
[55.44305419921875, 55.444580078125, 55.44219207763672, 55.43663024902344, 55.42848587036133],
[55.35470199584961, 55.356109619140625, 55.353614807128906, 55.34796905517578, 55.33975601196289],
[55.26683807373047, 55.268131256103516, 55.26553726196289, 55.25981140136719, 55.25152587890625]],dtype=float)
data = np.array([[10, 10, 10, 10, 10],
[20, 20, 20, 20, 20],
[30, 30, 30, 30, 30],
[40, 40, 40, 40, 40],
[50, 50, 50, 50, 50]],dtype=float)
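# extent is (left, right, bottom, top); row 0 holds the highest latitudes and imshow draws it at the top
extent = (longitudes[0,0], longitudes[0,-1], latitudes[-1,0], latitudes[0,0])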
plt.imshow(data, interpolation='bilinear', extent=extent, aspect='auto')
plt.show()
I'm aware that this does not exactly answer your question. But I think it is an easy solution to the underlying problem.
Edit
I just realized that your data is in fact not exactly a grid, but almost. You have to decide if you still want to use my solution...
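If the warping matters, one option (a sketch, not part of the answer above) is to let matplotlib draw the data at the actual point positions with pcolormesh, which accepts 2D coordinate arrays. The warped grid here is a made-up stand-in for the question's longitudes and latitudes.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical warped 5x5 grid standing in for the question's arrays.
cols, rows = np.meshgrid(np.linspace(0.0, 1.0, 5), np.linspace(0.0, 1.0, 5))
lon = 140.0 + 2.5 * cols - 0.02 * rows
lat = 55.6 - 0.35 * rows - 0.02 * cols ** 2
data = 10.0 + 40.0 * rows

fig, ax = plt.subplots()
# With coordinate arrays of the same shape as the data, Gouraud shading
# interpolates colors between the real (warped) point positions.
mesh = ax.pcolormesh(lon, lat, data, shading="gouraud")
ax.scatter(lon.ravel(), lat.ravel(), color="k", s=10)  # mark the sample points
fig.colorbar(mesh, ax=ax, label="data")
ax.set_xlabel("Longitude")
ax.set_ylabel("Latitude")
```

Call plt.show() (or fig.savefig) afterwards to see the result. Unlike imshow, this produces a vector-style mesh rather than a regular raster, so it suits display but not export to a gridded file.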

Here's an example of a scatter 3d plot using your data, breaking out each set of lat/long data in its own series with respective colored markers.
import numpy as np
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
longitudes = np.array([[139.79391479492188, 140.51760864257812, 141.19119262695312, 141.82083129882812, 142.41165161132812],
[139.79225158691406, 140.51416015625, 141.18606567382812, 141.8140869140625, 142.40338134765625],
[139.78591918945312, 140.50637817382812, 141.17694091796875, 141.80377197265625, 142.3919677734375],
[139.78387451171875, 140.50253295898438, 141.17147827148438, 141.79678344726562, 142.38360595703125],
[139.77781677246094, 140.4949951171875, 141.16250610351562, 141.78646850585938, 142.37196350097656]],dtype=float)
latitudes = np.array([[55.61929702758789, 55.621070861816406, 55.61888122558594, 55.613487243652344, 55.60547637939453],
[55.53120040893555, 55.532840728759766, 55.53053665161133, 55.525047302246094, 55.5169677734375],
[55.44305419921875, 55.444580078125, 55.44219207763672, 55.43663024902344, 55.42848587036133],
[55.35470199584961, 55.356109619140625, 55.353614807128906, 55.34796905517578, 55.33975601196289],
[55.26683807373047, 55.268131256103516, 55.26553726196289, 55.25981140136719, 55.25152587890625]],dtype=float)
data = np.array([[10, 10, 10, 10, 10],
[20, 20, 20, 20, 20],
[30, 30, 30, 30, 30],
[40, 40, 40, 40, 40],
[50, 50, 50, 50, 50]],dtype=float)
colors = ['r','g','b','k','k']
markers = ['o','o','o','o','^']
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
for i in range(5):
    ax.scatter(longitudes[i], latitudes[i], data[i], c=colors[i], marker=markers[i])
ax.set_xlabel('Longitude')
ax.set_ylabel('Latitude')
ax.set_zlabel('Data')
plt.show()

Related

How to plot histogram, when the number of values in interval is given? (python)

I know that when you usually plot a histogram you have an array of values and intervals.
But if I have intervals and the number of values that are in those intervals, how can I plot the histogram?
I have something that looks like this:
amounts = np.array([23, 7, 18, 5])
and my interval is from 0 to 4 with step 1,
so on interval [0,1] there are 23 values and so on.
You could probably try matplotlib.pyplot.stairs for this.
import matplotlib.pyplot as plt
import numpy as np
amounts = np.array([23, 7, 18, 5])
plt.stairs(amounts, range(5))
plt.show()
Please mark it as solved if this helps.
I find it easier to just simulate some data having the desired distribution, and then use plt.hist to plot the histogram.
Here is an example. Hopefully it will be helpful!
import numpy as np
import matplotlib.pyplot as plt
amounts = np.array([23, 7, 18, 5])
bin_edges = np.arange(5)
bin_centres = (bin_edges[1:] + bin_edges[:-1]) / 2
# fake some data having the desired distribution
data = [[bc] * amount for bc, amount in zip(bin_centres, amounts)]
data = np.concatenate(data)
hist = plt.hist(data, bins=bin_edges, histtype='step')[0]
plt.show()
# the plotted distribution is consistent with amounts
assert np.allclose(hist, amounts)
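A shorter variant of the same idea, sketched here as an alternative: instead of repeating each bin centre, pass one value per bin to plt.hist and supply the counts through its weights argument. Using the left bin edge as the representative value is just a convenient choice.

```python
import matplotlib.pyplot as plt
import numpy as np

amounts = np.array([23, 7, 18, 5])
bin_edges = np.arange(5)  # 0, 1, 2, 3, 4

# One representative value per bin (its left edge), weighted by that bin's count.
counts, edges, _ = plt.hist(bin_edges[:-1], bins=bin_edges, weights=amounts,
                            histtype="step")
```

The returned counts match amounts, and no fake data array has to be built.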
If you already know the values, then the histogram just becomes a bar plot.
import numpy as np
import matplotlib.pyplot as plt
amounts = np.array([23, 7, 18, 5])
interval = np.arange(5)
midvals = (interval + 0.5)[:-1] # 0.5, 1.5, 2.5, 3.5
plt.bar(midvals,
amounts)
plt.xticks(interval) # Shows the interval ranges rather than the centers of the bars
plt.show()
If the gap between the bars looks too wide, you can change the width of the bars by passing a width argument to plt.bar() (as a fraction of 1; the default is 0.8).
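As a possible refinement of the bar approach, the bars can be anchored at the bin edges instead of the midpoints, with each width taken from the edges themselves. This is a sketch; the names edges, fig and ax are introduced here for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

amounts = np.array([23, 7, 18, 5])
edges = np.arange(5)  # bin edges 0..4

fig, ax = plt.subplots()
# align="edge" puts the left side of each bar on its bin's left edge;
# np.diff(edges) gives every bar the full width of its bin.
bars = ax.bar(edges[:-1], amounts, width=np.diff(edges), align="edge",
              edgecolor="k")
ax.set_xticks(edges)
```

Because the widths come from np.diff(edges), the same code also handles bins of unequal width.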

python violin plot regular axis

I want to do a violin plot of binned data but at the same time be able to plot a model prediction and visualize how well the model describes the main part of the individual data distributions. My problem here is, I guess, that the x-axis after the violin plot does not behave like a regular axis with numbers, but more like string values that just happen to be numbers. Maybe that is not a good description, but in the example I would like to have a "normal" plot of a function, e.g. f(x) = 2*x**2, and have the violins in the background at x=1, x=5.2, x=18.3 and x=27.
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
np.random.seed(10)
collectn_1 = np.random.normal(1, 2, 200)
collectn_2 = np.random.normal(802, 30, 200)
collectn_3 = np.random.normal(90, 20, 200)
collectn_4 = np.random.normal(70, 25, 200)
ys = [collectn_1, collectn_2, collectn_3, collectn_4]
xs = [1, 5.2, 18.3, 27]
sns.violinplot(x=xs, y=ys)
xx = np.arange(0, 30, 10)
plt.plot(xx, 2*xx**2)
plt.show()
Somehow this code actually does not plot violins but only bars, this is only a problem in this example and not in the original code though. In my real code I want to have different "half-violins" on both sides, therefore I use sns.violinplot(x="..", y="..", hue="..", data=.., split=True).
I think that would be hard to do with seaborn because it does not provide an easy way to manipulate the artists that it creates, particularly if there are other things plotted on the same Axes. Matplotlib's violinplot allows setting the position of the violins, but does not provide an option for plotting only half violins. Therefore, I would suggest using statsmodels.graphics.boxplots.violinplot, which does both.
import matplotlib.pyplot as plt
import matplotlib.ticker
import numpy as np
import seaborn as sns
from statsmodels.graphics.boxplots import violinplot
df = sns.load_dataset('tips')
x_col = 'day'
y_col = 'total_bill'
hue_col = 'smoker'
xs = [1, 5.2, 18.3, 27]
xx = np.arange(0, 30, 1)
yy = 0.1*xx**2
cs = ['C0','C1']
fig, ax = plt.subplots()
ax.plot(xx,yy)
# one pass per hue level: the first is drawn as left halves, the second as right halves
for (_,gr0),side,c in zip(df.groupby(hue_col),['left','right'],cs):
    print(side)
    data = [gr1 for (_,gr1) in gr0.groupby(x_col)[y_col]]
    violinplot(ax=ax, data=data, positions=xs, side=side, show_boxplot=False, plot_opts=dict(violin_fc=c))
# violinplot above messes up which ticks are shown, the line below restores a sensible tick locator
ax.xaxis.set_major_locator(matplotlib.ticker.MaxNLocator())
plt.show()
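For comparison, here is a rough sketch of what matplotlib alone can do: its violinplot places violins at numeric positions, and a half violin can be approximated by clipping the polygon that it draws. The clipping step is a workaround assumed here, not a documented option, and the sample data are invented.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(10)
xs = [1, 5.2, 18.3, 27]
# Hypothetical samples, one distribution per x position.
ys = [rng.normal(2 * x ** 2, 20 + 5 * x, 200) for x in xs]

fig, ax = plt.subplots()
parts = ax.violinplot(ys, positions=xs, widths=3, showextrema=False)

# Workaround: keep only the left half of each violin by clipping the
# x coordinates of its outline at the violin's own position.
for body, x0 in zip(parts["bodies"], xs):
    verts = body.get_paths()[0].vertices
    verts[:, 0] = np.clip(verts[:, 0], -np.inf, x0)

xx = np.linspace(0, 30, 200)
ax.plot(xx, 2 * xx ** 2, color="k")  # model curve on the same numeric axis
```

Clipping with np.clip(verts[:, 0], x0, np.inf) instead would keep the right halves, so two calls with different data could build split violins.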

How to plot specific parts of a matrix in matplotlib?

I have a matrix that represents temperature distribution in a hollow square plate (hope the attached figure helps). The problem is with the hollow part in the plate which doesn't represent any solid material so I need to exclude this part from the plot.
The simulation returns an np.array() with the temperature results (except, of course, for the hollow part). This is the part where I define the dimensions of the grid:
import numpy as np
plate_height = 0.4 #meters
hollow_square_height = 0.2 #meters
#discretization data
delta_x = delta_y = 0.05 #meters
grid_points_n = int(round(plate_height/delta_x)) + 1 # must be an int to be used as an array shape
grid = np.zeros(shape=(grid_points_n, grid_points_n))
# the simulation assures that the hollow part will remain zero valued.
So, how do I approach this?
Instead of changing the original data, you can mask the values that you don't want to be used in calculations, plots, etc.:
import matplotlib.pyplot as plt
import numpy as np
data = [
[11, 11, 12, 13],
[9, 0, 0, 12],
[8, 0, 0, 11],
[8, 9, 10, 11]
]
#Here's what you have:
data_array = np.array(data)
#Mask every position where there is a 0:
masked_data = np.ma.masked_equal(data_array, 0)
#Plot the matrix:
fig = plt.figure()
ax = fig.gca()
ax.matshow(masked_data, cmap=plt.cm.autumn_r) #_r => reverse the standard color map
plt.show()
#plt.savefig('heatmap.png')
Replace the zeros with nan; matplotlib leaves nan cells blank when it draws the matrix. For example:
import matplotlib.pyplot as plt
from numpy import nan,matrix
M = matrix([
[20,30,25,20,50],
[22,nan,nan,nan,27],
[30,nan,nan,nan,20],
[33,nan,nan,nan,31],
[21,28,29,23,36]])
fig = plt.figure()
ax = fig.add_subplot(111)
ax.matshow(M, cmap=plt.cm.jet) # Show matrix color
plt.show()
You can replace zeros with nan in a matrix as follows:
from numpy import nan
A[A==0.0]=nan # A is your matrix; it must have a float dtype to hold nan
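Both answers identify the hollow cells by their zero value. If a real temperature of 0 could occur, a geometric mask may be safer. The sketch below builds one from the plate dimensions in the question; it assumes the hole is centered and that nodes on the hole's edge count as solid, neither of which the question states.

```python
import numpy as np

plate_height = 0.4          # meters
hollow_square_height = 0.2  # meters
delta = 0.05                # meters, same spacing in x and y

n = int(round(plate_height / delta)) + 1  # grid points per side
lo = int(round((plate_height - hollow_square_height) / 2 / delta))
hi = lo + int(round(hollow_square_height / delta))

# Assumption: nodes on the edge of the hole are solid, nodes strictly inside are not.
hollow = np.zeros((n, n), dtype=bool)
hollow[lo + 1:hi, lo + 1:hi] = True

temperatures = np.full((n, n), 20.0)  # placeholder for the simulation result
masked = np.ma.masked_where(hollow, temperatures)
```

The masked array can be passed to ax.matshow(masked) exactly as in the first answer.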

Connect different data series with the same line

Is there a way to get matplotlib to connect data from two different data sets with the same line?
Context: I need to plot some data in log scale, but some of them are negative. I use the workaround of plotting the data absolute value in different colours (red for positive and green for negative), something like:
import pylab as pl
pl.plot( x, positive_ys, 'r-' ) # positive y's
pl.plot( x, abs( negative_ys ), 'g-' ) # negative y's
pl.show()
However, as they represent the same quantity, it would be helpful to have the two data series connected by the same line. Is this possible?
I cannot use pl.plot( x, abs( ys )) because I need to be able to differentiate between the positive and originally negative values.
With numpy you can use logical indexing.
import numpy as np
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(111)
x = np.array([10000, 1000, 100, 10, 1, 5, 50, 500, 5000, 50000])
y = np.array([-10000, -1000, -100, -10, -1, 5, 50, 500, 5000, 50000])
ax.plot(x,abs(y),'+-b',label='all data')
ax.plot(abs(x[y<= 0]),abs(y[y<= 0]),'o',markerfacecolor='none',
markeredgecolor='r',
label='we are negative')
ax.set_xscale('log')
ax.set_yscale('log')
ax.legend(loc=0)
plt.show()
The key feature is first plotting all absolute y-values and then re-plotting those that were originally negative as hollow circles to single them out. This second step uses the logical indexing x[y<=0] and y[y<=0] to only pick those elements of the y-array which are negative.
The example above draws all points on a single blue line and circles the originally negative ones in red.
If you really have two different data sets, the following code will give you the same figure as above:
x1 = np.array([1, 10, 100, 1000, 10000])
x2 = np.array([5, 50, 500, 5000, 50000])
y1 = np.array([-1, -10, -100, -1000, -10000])
y2 = np.array([5, 50, 500, 5000, 50000])
x = np.concatenate((x1,x2))
y = np.concatenate((y1,y2))
sorted = np.argsort(y)
ax.plot(x[sorted],abs(y[sorted]),'+-b',label='all data')
ax.plot(abs(x[y<= 0]),abs(y[y<= 0]),'o',markerfacecolor='none',
markeredgecolor='r',
label='we are negative')
Here, you first use np.concatenate to combine both the x- and the y-arrays. Then you employ np.argsort to sort the y-array in a way that makes sure you do not get an overly zig-zaggy line when plotting. You use that index-array (sorted) when you call the first plot. As the second plot only plots symbols but no connecting line, you do not require sorted arrays here.
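A variation on this, sketched under the question's color convention (red for positive, green for negative): draw one neutral connecting line and let the marker color carry the sign. Sorting by x instead of y is a choice made here so that the line runs left to right.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.array([1, 10, 100, 1000, 10000, 5, 50, 500, 5000, 50000])
y = np.array([-1, -10, -100, -1000, -10000, 5, 50, 500, 5000, 50000])

order = np.argsort(x)  # sort by x so the line runs left to right
xs, ys = x[order], y[order]
# Green for originally negative values, red for positive ones.
colors = np.where(ys < 0, "g", "r").tolist()

fig, ax = plt.subplots()
ax.plot(xs, np.abs(ys), "-", color="0.6", zorder=1)  # one connecting line
ax.scatter(xs, np.abs(ys), c=colors, zorder=2)       # sign shown by color
ax.set_xscale("log")
ax.set_yscale("log")
```

The line connects every point regardless of sign, while the markers still show which values were negative before taking the absolute value.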

plotting/marking selected points from a 1D array

This seems like a simple question, but I have been trying to solve it for a really long time.
I have a 1D data array (named 'hightemp_unlocked'). After finding its peaks (an array of the locations where the peaks occur), I want to mark the peaks on the plot.
import matplotlib
from matplotlib import pyplot as plt
.......
plt.plot([x for x in range(len(hightemp_unlocked))],hightemp_unlocked,label='200 mk db ramp')
plt.scatter(peaks, hightemp_unlocked[x in peaks], marker='x', color='y', s=40)
For some reason, it keeps telling me that x and y must be the same size.
It shows:
  File "period.py", line 86, in <module>
    plt.scatter(peaks, hightemp_unlocked[x in peaks], marker='x', color='y', s=40)
  File "/usr/local/lib/python2.6/dist-packages/matplotlib/pyplot.py", line 2548, in scatter
    ret = ax.scatter(x, y, s, c, marker, cmap, norm, vmin, vmax, alpha, linewidths, faceted, verts, **kwargs)
  File "/usr/local/lib/python2.6/dist-packages/matplotlib/axes.py", line 5738, in scatter
    raise ValueError("x and y must be the same size")
I don't think hightemp_unlocked[x in peaks] is what you want. Here x in peaks is the conditional expression "is x in peaks?" and evaluates to True or False depending on what was last stored in x. When hightemp_unlocked[x in peaks] is evaluated, False or True is interpreted as 0 or 1, so it returns only the first or second element of hightemp_unlocked. This explains the array size error.
If peaks is an array of indexes, then simply hightemp_unlocked[peaks] will return the corresponding values.
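A minimal sketch of that indexing, using made-up temperatures and peak positions in place of hightemp_unlocked and peaks (both are assumed to be NumPy arrays here; a plain list cannot be indexed this way):

```python
import numpy as np

# Dummy data standing in for hightemp_unlocked.
temps = np.array([10, 11, 14, 12, 10, 8, 5, 7, 10, 12, 15, 13, 12, 11, 10])
# Hypothetical peak positions (indices into temps).
peaks = np.array([2, 10])

# Indexing with an integer array returns one value per index.
peak_values = temps[peaks]  # array([14, 15])
```

Since peaks and peak_values have the same length, plt.scatter(peaks, peak_values, marker='x') no longer raises the size error.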
You are almost on the right track, but hightemp_unlocked[x in peaks] is not what you are looking for. How about something like:
from matplotlib import pyplot as plt
# dummy temperatures
temps = [10, 11, 14, 12, 10, 8, 5, 7, 10, 12, 15, 13, 12, 11, 10]
# list of x-values for plotting
xvals = list(range(len(temps)))
# say our peaks are at indices 2 and 10 (temps of 14 and 15)
peak_idx = [2, 10]
# make a new list of just the peak temp values
peak_temps = [temps[i] for i in peak_idx]
# repeat for x-values
peak_xvals = [xvals[i] for i in peak_idx]
# now we can plot the temps
plt.plot(xvals, temps)
# and add the scatter points for the peak values
plt.scatter(peak_xvals, peak_temps)
