Compromise between scipy.interpolate.Rbf and scipy.interpolate.griddata? - python

I need to generate a 2D density map given some unstructured (x, y) coordinates, and a z value which acts as the weight assigned to each point.
I can interpolate the values into a grid (see code below) using either scipy.interpolate.Rbf or scipy.interpolate.griddata. The first one is very accurate but very slow, while the second one is a lot faster but noticeably less accurate.
See the image below, where Rbf is on the left and griddata on the right:
Is there some way to compromise between these two functions? Something that produces a result closer to that of Rbf, but with a speed closer to that of griddata?
import time
import numpy as np
import matplotlib.pyplot as plt
import scipy.interpolate
# Data
N = 1000
x, y, z = np.random.uniform(0.0005, 0.03, N),\
    np.random.uniform(6., 10., N), np.random.uniform(1., 10., N)
xmin, xmax, ymin, ymax = min(x), max(x), min(y), max(y)
# Set up a regular grid of interpolation points
xi, yi = np.linspace(xmin, xmax, 200), np.linspace(ymin, ymax, 200)
xi, yi = np.meshgrid(xi, yi)
# Normalize data and grid.
x_new, xi_new = (x - xmin) / (xmax - xmin), (xi - xmin) / (xmax - xmin)
y_new, yi_new = (y - ymin) / (ymax - ymin), (yi - ymin) / (ymax - ymin)
# Interpolate new data with Rbf.
# time.perf_counter() replaces time.clock(), which was removed in Python 3.8.
s = time.perf_counter()
rbf = scipy.interpolate.Rbf(x_new, y_new, z, function='linear')
zi = rbf(xi_new, yi_new)
print(time.perf_counter() - s)
# Plot density map.
ax1 = plt.subplot(121)
plt.pcolormesh(xi, yi, zi, cmap=plt.get_cmap('GnBu_r'), zorder=2)
plt.contour(xi, yi, zi, 2, colors='#551a8b', linewidths=0.5, zorder=3)
# Interpolate new data with griddata.
s = time.perf_counter()
# Stack the normalized coordinates into an (N, 2) array of points.
vals = np.array([x_new, y_new]).T
zi = scipy.interpolate.griddata(vals, z, (xi_new, yi_new), method='linear')
print(time.perf_counter() - s)
# Plot density map.
ax2 = plt.subplot(122)
plt.imshow(
    zi, extent=(xmin, xmax, ymin, ymax), origin='lower',
    cmap=plt.get_cmap('GnBu_r'))
plt.contour(xi, yi, zi, 2, colors='#551a8b', linewidths=0.5, zorder=3)
plt.show()
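One possible middle ground, offered as a hedged sketch rather than as part of the original post: SciPy 1.7 and later ship scipy.interpolate.RBFInterpolator, whose neighbors argument restricts each evaluation to the k nearest data points. That keeps the radial-basis character of Rbf while avoiding one dense solve over all N points. The helper name local_rbf_grid and the value neighbors=50 are assumptions chosen for illustration; both accuracy and speed depend on that value, so it needs tuning on real data.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

def local_rbf_grid(x, y, z, n_grid=200, neighbors=50):
    """Interpolate scattered (x, y, z) onto an n_grid x n_grid mesh
    with a local (nearest-neighbour) radial basis function."""
    xmin, xmax, ymin, ymax = x.min(), x.max(), y.min(), y.max()
    # Normalize both axes to [0, 1] so distances are comparable.
    pts = np.column_stack([(x - xmin) / (xmax - xmin),
                           (y - ymin) / (ymax - ymin)])
    gx, gy = np.meshgrid(np.linspace(0, 1, n_grid), np.linspace(0, 1, n_grid))
    grid_pts = np.column_stack([gx.ravel(), gy.ravel()])
    # neighbors=k: each grid value is computed from its k nearest data points.
    rbf = RBFInterpolator(pts, z, neighbors=neighbors, kernel='linear')
    return rbf(grid_pts).reshape(gx.shape)

rng = np.random.default_rng(0)
N = 1000
x = rng.uniform(0.0005, 0.03, N)
y = rng.uniform(6., 10., N)
z = rng.uniform(1., 10., N)
zi = local_rbf_grid(x, y, z, n_grid=50)
print(zi.shape)
```

The returned array can be passed to plt.pcolormesh or plt.imshow in the same way as zi in the question's code.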

Related

How to annotate a regression line with the proper text rotation

I have the following snippet of code to draw a best-fit line through a collection of points on a graph, and annotate it with the corresponding R² value:
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats
x = 50 * np.random.rand(20) + 50
y = 200 * np.random.rand(20)
plt.plot(x, y, 'o')
# k, n = np.polyfit(x, y, 1)
k, n, r, _, _ = scipy.stats.linregress(x, y)
line = plt.axline((0, n), slope=k, color='blue')
xy = line.get_xydata()
plt.annotate(
    f'$R^2={r**2:.3f}$',
    (xy[0] + xy[-1]) // 2,
    xycoords='axes fraction',
    ha='center', va='center_baseline',
    rotation=k, rotation_mode='anchor',
)
plt.show()
I have tried various different (x,y) pairs, different xycoords and other keyword parameters in annotate but I haven't been able to get the annotation to properly appear where I want it. How do I get the text annotation to appear above the line with proper rotation, located either at the middle point of the line, or at either end?
1. Annotation coordinates
We cannot compute the coordinates using xydata here, as axline() just returns dummy xydata (probably due to the way matplotlib internally plots infinite lines):
print(line.get_xydata())
# array([[0., 0.],
# [1., 1.]])
Instead we can compute the text coordinates based on the xlim():
xmin, xmax = plt.xlim()
xtext = (xmin + xmax) / 2  # true division: floor division would shift the text off the midpoint
ytext = k*xtext + n
Note that these are data coordinates, so they should be used with xycoords='data' instead of 'axes fraction'.
2. Annotation angle
We cannot compute the angle purely from the line points, as the angle will also depend on the axis limits and figure dimensions (e.g., imagine the required rotation angle in a 6x4 figure vs 2x8 figure).
Instead we should normalize the calculation to both scales to get the proper visual rotation:
rs = np.random.RandomState(0)
x = 50 * rs.rand(20) + 50
y = 200 * rs.rand(20)
plt.plot(x, y, 'o')
# save ax and fig scales
xmin, xmax = plt.xlim()
ymin, ymax = plt.ylim()
xfig, yfig = plt.gcf().get_size_inches()
k, n, r, _, _ = scipy.stats.linregress(x, y)
plt.axline((0, n), slope=k, color='blue')
# restore x and y limits after axline
plt.xlim(xmin, xmax)
plt.ylim(ymin, ymax)
# find text coordinates at midpoint of regression line
xtext = (xmin + xmax) / 2  # true division: floor division would shift the text off the midpoint
ytext = k*xtext + n
# find run and rise of (xtext, ytext) vs (0, n)
dx = xtext
dy = ytext - n
# normalize to ax and fig scales
xnorm = dx * xfig / (xmax - xmin)
ynorm = dy * yfig / (ymax - ymin)
# find normalized annotation angle in degrees
rotation = np.rad2deg(np.arctan2(ynorm, xnorm))
plt.annotate(
    f'$R^2={r**2:.3f}$',
    (xtext, ytext), xycoords='data',
    ha='center', va='bottom',
    rotation=rotation, rotation_mode='anchor',
)
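The normalization above can be isolated into a small helper to see how the on-screen angle changes with the figure shape. This is a minimal sketch: the function name and its arguments are hypothetical, and like the code above it uses the figure size as a stand-in for the size of the axes, so the result is approximate when the margins are uneven.

```python
import math

def visual_angle_deg(slope, xlim, ylim, size_inches):
    """Angle (degrees) at which a line of the given data-space slope
    appears on screen, given axis limits and the drawing-area size."""
    width, height = size_inches
    # Inches on screen per data unit along each axis.
    x_scale = width / (xlim[1] - xlim[0])
    y_scale = height / (ylim[1] - ylim[0])
    # A run of 1 data unit and a rise of `slope` data units, in inches.
    return math.degrees(math.atan2(slope * y_scale, x_scale))

square = visual_angle_deg(1.0, (0, 10), (0, 10), (4, 4))
wide = visual_angle_deg(1.0, (0, 10), (0, 10), (6, 4))
print(square, wide)  # the same slope looks shallower in the wider figure
```

With equal axis ranges and a square drawing area, a slope of 1 appears at about 45 degrees; stretching the width lowers the visual angle even though the data slope is unchanged.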

Draw the profile of a scatter plot

I have a scatter plot of Lc and Fc values (please, refer to plot1).
Lc= [360.66832393 388.26294316 392.9410819 ... 384.31751584 403.52581547
384.22929343]
Fc= [77.3294787 47.5926941 44.53032575 ... 50.44012265 38.99666318
50.54763385]
plot.scatter(Lc, Fc)
I would like to draw the Fc profile of this scatter plot as shown in plot2. Does anyone have an efficient way to do it?
Here is an idea: draw a Gaussian curve through each of the points and then take the maximum of these curves. You might want to experiment with the curve widths (sigma).
import matplotlib.pyplot as plt
import numpy as np
low_lim = 30
fc = np.random.rand(120) * np.random.rand(120) * 120
fc = fc[fc > low_lim]
lc = np.random.uniform(50, 250, len(fc))
x = np.linspace(0, 300, 5000)
sigma = 15
ys = np.exp(- np.power((x.reshape(-1, 1) - lc) / sigma, 2) / 2) * fc
ymax = ys.max(axis=1)
fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(15, 4))
for ax in (ax1, ax2):
    if ax == ax1:
        ax.plot(x, ymax, color='black', ls=':', lw=3)
        for l, f, y in zip(lc, fc, ys.T):
            ax.plot(x, y)
            ax.fill_between(x, 0, y, color='r', alpha=0.05)
    else:
        ax.plot(x, ymax, color='black', lw=2)
        ax.fill_between(x, 0, ymax, color='r', alpha=0.2)
    ax.scatter(lc, fc, color='darkorange')
    ax.axhline(low_lim, ls='--', color='skyblue')
    ax.set_ylim(ymin=0)
    ax.margins(x=0)
plt.tight_layout()
plt.show()
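The core of the idea, without the plotting, fits in one small function. The sketch below uses three made-up points (the centers and heights are illustrative, not the asker's data) and a hypothetical helper name, and checks that the envelope passes through each point when the points are well separated relative to sigma.

```python
import numpy as np

def gaussian_envelope(x, centers, heights, sigma):
    """Upper envelope of Gaussian bumps centred on each data point."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)  # shape (len(x), 1)
    # One column per data point: a bump of the point's height at its position.
    bumps = np.exp(-0.5 * ((x - centers) / sigma) ** 2) * heights
    return bumps.max(axis=1)

centers = np.array([100.0, 150.0, 220.0])
heights = np.array([40.0, 90.0, 60.0])
x = np.linspace(0, 300, 301)
profile = gaussian_envelope(x, centers, heights, sigma=15)
print(profile[[100, 150, 220]])  # envelope values at the three point positions
```

A larger sigma gives a smoother profile but lets tall points mask their lower neighbours, which is why the width is worth experimenting with.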
Here is an attempt to smooth out the sharp corners, which might or might not work with your data. The effect is only very local; trying to smooth out more resulted in also losing the general shape.
from scipy.special import softmax
ys = np.exp(- np.power((x.reshape(-1, 1) - lc) / sigma, 2) / 2) * fc
softmax_weights = softmax(np.power(ys, 0.8), axis=1)
ymax = np.sum(ys * softmax_weights, axis=1)
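To see why the smoothing stays local, it can help to compare the softmax-weighted curve with the hard maximum directly. The following is a NumPy-only sketch of the same weighting (the helper name soft_envelope and the sample points are assumptions for illustration). Because each output value is a weighted average of the bumps at that x, it can never exceed the hard maximum, and it differs from it mainly where two bumps have similar heights.

```python
import numpy as np

def soft_envelope(ys, power=0.8):
    """Softmax-weighted average over the bumps in each row of ys."""
    scores = np.power(ys, power)
    scores = scores - scores.max(axis=1, keepdims=True)  # avoid overflow in exp
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)         # each row sums to 1
    return np.sum(ys * weights, axis=1)

x = np.linspace(0, 300, 301).reshape(-1, 1)
centers = np.array([100.0, 150.0, 220.0])
heights = np.array([40.0, 90.0, 60.0])
ys = np.exp(-0.5 * ((x - centers) / 15) ** 2) * heights
hard = ys.max(axis=1)
soft = soft_envelope(ys)
print(np.max(hard - soft))  # largest gap between the hard and smoothed envelopes
```

Lowering the exponent flattens the weights and smooths more, at the cost of pulling the curve below the data points.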

Test for comparing multi-dimensional distributions in python

I have the following datasets:
import random
import pandas as pd
A = pd.DataFrame({'x':[random.uniform(0, 1) for i in range(0,100)], 'y':[random.uniform(0, 1) for i in range(0,100)]})
B = pd.DataFrame({'x':[random.uniform(0, 1) for i in range(0,100)], 'y':[random.uniform(0, 1) for i in range(0,100)]})
From these two datasets, I can produce the following plots
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as st
def plot_2d_kde(df):
    # Extract x and y
    x = df['x']
    y = df['y']
    # Define the borders
    deltaX = (max(x) - min(x))/10
    deltaY = (max(y) - min(y))/10
    xmin = min(x) - deltaX
    xmax = max(x) + deltaX
    ymin = min(y) - deltaY
    ymax = max(y) + deltaY
    # Create meshgrid
    xx, yy = np.mgrid[xmin:xmax:100j, ymin:ymax:100j]
    # Fit a Gaussian kernel using SciPy's gaussian_kde
    positions = np.vstack([xx.ravel(), yy.ravel()])
    values = np.vstack([x, y])
    kernel = st.gaussian_kde(values)
    f = np.reshape(kernel(positions).T, xx.shape)
    fig = plt.figure(figsize=(13, 7))
    ax = plt.axes(projection='3d')
    surf = ax.plot_surface(xx, yy, f, rstride=1, cstride=1, cmap='coolwarm', edgecolor='none')
    ax.set_xlabel('x')
    ax.set_ylabel('y')
    ax.set_zlabel('PDF')
    ax.set_title('Surface plot of Gaussian 2D KDE')
    fig.colorbar(surf, shrink=0.5, aspect=5) # add color bar indicating the PDF
    ax.view_init(60, 35)
plot_2d_kde(A)
plot_2d_kde(B)
Is there a way to test if the multidimensional PDF of A differs from the one of B in python?
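One possible approach, sketched here under stated assumptions and not taken from the original post, is a permutation test: compute a distance between the two samples, then check how often a random relabelling of the pooled points gives a distance at least as large. The statistic used below is the energy distance, which is only one of several reasonable choices; the function names and the number of permutations are illustrative.

```python
import numpy as np

def energy_statistic(dist, idx_a, idx_b):
    """Energy distance between two groups, from a precomputed distance matrix."""
    d_ab = dist[np.ix_(idx_a, idx_b)].mean()
    d_aa = dist[np.ix_(idx_a, idx_a)].mean()
    d_bb = dist[np.ix_(idx_b, idx_b)].mean()
    return 2 * d_ab - d_aa - d_bb

def permutation_test(a, b, n_perm=500, seed=0):
    """Return (statistic, p-value) for H0: a and b share one distribution."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack([a, b])
    # Pairwise Euclidean distances between all pooled points.
    dist = np.linalg.norm(pooled[:, None, :] - pooled[None, :, :], axis=-1)
    n = len(a)
    idx = np.arange(len(pooled))
    observed = energy_statistic(dist, idx[:n], idx[n:])
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(idx)  # random relabelling of the pooled points
        if energy_statistic(dist, perm[:n], perm[n:]) >= observed:
            count += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (count + 1) / (n_perm + 1)

rng = np.random.default_rng(1)
A = rng.uniform(0, 1, size=(100, 2))
B = rng.uniform(0, 1, size=(100, 2))
C = rng.uniform(0.5, 1.5, size=(100, 2))  # shifted sample, for contrast
stat_ab, p_ab = permutation_test(A, B)
stat_ac, p_ac = permutation_test(A, C)
print(p_ab, p_ac)
```

With the DataFrames from the question, the inputs would be A[['x', 'y']].to_numpy() and B[['x', 'y']].to_numpy(). A small p-value suggests the two samples come from different distributions; a large one only means the test found no evidence of a difference.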

Centering annotation on a log scale with matplotlib

I have the following, self-explanatory situation; please see the figure and working example pasted below.
I wonder how to center text in the middle of the dimension line.
import numpy as np
import matplotlib.pyplot as plt
# Dimension line
def annotation_line(ax, xmin, xmax, y, text, ytext=0, linecolor='black', linewidth=1, fontsize=12):
    ax.annotate('', xy=(xmin, y), xytext=(xmax, y), xycoords='data', textcoords='data', arrowprops={'arrowstyle': '|-|', 'color':linecolor, 'linewidth':linewidth})
    ax.annotate('', xy=(xmin, y), xytext=(xmax, y), xycoords='data', textcoords='data', arrowprops={'arrowstyle': '<->', 'color':linecolor, 'linewidth':linewidth})
    xcenter = xmin + (xmax - xmin) / 2
    if ytext==0:
        ytext = y + ( ax.get_ylim()[1] - ax.get_ylim()[0] ) / 20
    ax.annotate(text, xy=(xcenter, ytext), ha='center', va='bottom', fontsize=fontsize)
# Toy data
N = 8
y = np.zeros(N)
x1 = np.linspace(1, 1000, N, endpoint=True)
fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(x1, y, 'o')
annotation_line(ax=ax, text='TEXT 1', xmin=1, xmax=100, y=0.01, ytext=0, linewidth=1, linecolor='gray', fontsize=12)
ax.set_xscale('log')
The simple solution is to do as @JohanC suggests and calculate the midpoint in log coordinates.
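A minimal sketch of that calculation, assuming a base-10 log axis and positive endpoints (the helper name log_midpoint is hypothetical): average the logarithms of the two ends and transform back, which is the same as taking their geometric mean.

```python
import math

def log_midpoint(xmin, xmax):
    """Point that appears halfway between xmin and xmax on a log axis."""
    return 10 ** ((math.log10(xmin) + math.log10(xmax)) / 2)

center = log_midpoint(1, 100)
print(center)  # close to 10, the geometric mean of 1 and 100
```

In annotation_line from the question, this value would replace xcenter = xmin + (xmax - xmin) / 2, which gives 50.5 for the same endpoints and therefore sits far to the right of the visual centre on a log scale.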
Another solution is to use the coordinates of the arrow to find its midpoint. There are drawbacks to that method, however. First, you need to explicitly draw the figure at an intermediate step, since the coordinates will only be valid at the time of drawing. Second, you need to set the log scale before drawing the annotation. On the other hand, the code works regardless of the scaling of the axes:
import numpy as np
import matplotlib.pyplot as plt
# Dimension line
def annotation_line(ax, xmin, xmax, y, text, ytext=0, linecolor='black', linewidth=1, fontsize=12):
    an = ax.annotate('', xy=(xmin, y), xytext=(xmax, y), xycoords='data', textcoords='data', arrowprops={'arrowstyle': '|-|', 'color':linecolor, 'linewidth':linewidth})
    ax.annotate('', xy=(xmin, y), xytext=(xmax, y), xycoords='data', textcoords='data', arrowprops={'arrowstyle': '<->', 'color':linecolor, 'linewidth':linewidth})
    ax.figure.canvas.draw() # draw to get actual coordinates
    p = an.arrow_patch.get_path().transformed(ax.transAxes.inverted())
    xmin, xmax = np.min(p.vertices[:,0]),np.max(p.vertices[:,0])
    xcenter = xmin+(xmax-xmin)/2
    if ytext==0:
        ytext = y + ( ax.get_ylim()[1] - ax.get_ylim()[0] ) / 20
    ax.annotate(text, xy=(xcenter, ytext), xycoords=('axes fraction','data'), ha='center', va='bottom', fontsize=fontsize)
    return an
# Toy data
N = 8
y = np.zeros(N)
x1 = np.linspace(1, 1000, N, endpoint=True)
fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(x1, y, 'o')
ax.set_xscale('log') # must do before the call to annotation_line
an = annotation_line(ax=ax, text='TEXT 1', xmin=1, xmax=100, y=0.01, ytext=0, linewidth=1, linecolor='gray', fontsize=12)

Large matplotlib pixel figure best approach

I have a large 2D dataset in which I want to associate a color with each (X, Y) pair and plot it with matplotlib. I am talking about 1,000,000 points. What is the best approach in terms of performance (speed), and could you point to an example?
If you're dealing with a regular grid, just treat it as an image:
import numpy as np
import matplotlib.pyplot as plt
nrows, ncols = 1000, 1000
z = 500 * np.random.random(nrows * ncols).reshape((nrows, ncols))
plt.imshow(z, interpolation='nearest')
plt.colorbar()
plt.show()
If you have randomly ordered x,y,z triplets that make up a regular grid, then you'll need to grid them.
Essentially, you might have something like this:
import numpy as np
import matplotlib.pyplot as plt
# Generate some data
nrows, ncols = 1000, 1000
xmin, xmax = -32.4, 42.0
ymin, ymax = 78.9, 101.3
dx = (xmax - xmin) / (ncols - 1)
dy = (ymax - ymin) / (nrows - 1)
x = np.linspace(xmin, xmax, ncols)
y = np.linspace(ymin, ymax, nrows)
x, y = np.meshgrid(x, y)
z = np.hypot(x - x.mean(), y - y.mean())
x, y, z = [item.flatten() for item in (x,y,z)]
# Scramble the order of the points so that we can't just simply reshape z
indices = np.arange(x.size)
np.random.shuffle(indices)
x, y, z = [item[indices] for item in (x, y, z)]
# Up until now we've just been generating data...
# Now, x, y, and z probably represent something like you have.
# We need to make a regular grid out of our shuffled x, y, z values.
# To do this, we have to know the cellsize (dx & dy) that the grid is on and
# the number of rows and columns in the grid.
# First we convert our x and y positions to indices...
# (np.int and np.float were removed in NumPy 1.24; the builtins work everywhere.)
idx = np.round((x - x.min()) / dx).astype(int)
idy = np.round((y - y.min()) / dy).astype(int)
# Then we make an empty 2D grid...
grid = np.zeros((nrows, ncols), dtype=float)
# Then we fill the grid with our values:
grid[idy, idx] = z
# And now we plot it:
plt.imshow(grid, interpolation='nearest',
           extent=(x.min(), x.max(), y.max(), y.min()))
plt.colorbar()
plt.show()
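The gridding step can be checked without plotting by shuffling a small known grid and confirming that it is recovered exactly. This is a sketch under the same assumption as the code above, namely that the points lie on a regular grid; the helper name regrid is hypothetical, and it infers the cell size from the data extent, which requires the corner cells to be present.

```python
import numpy as np

def regrid(x, y, z, nrows, ncols):
    """Place shuffled (x, y, z) samples of a regular grid into a 2D array."""
    dx = (x.max() - x.min()) / (ncols - 1)
    dy = (y.max() - y.min()) / (nrows - 1)
    col = np.round((x - x.min()) / dx).astype(int)
    row = np.round((y - y.min()) / dy).astype(int)
    grid = np.full((nrows, ncols), np.nan)  # NaN marks cells with no sample
    grid[row, col] = z
    return grid

nrows, ncols = 4, 6
xx, yy = np.meshgrid(np.linspace(-32.4, 42.0, ncols),
                     np.linspace(78.9, 101.3, nrows))
zz = np.hypot(xx - xx.mean(), yy - yy.mean())
# Scramble the points, as in the example above.
order = np.random.default_rng(0).permutation(zz.size)
x, y, z = xx.ravel()[order], yy.ravel()[order], zz.ravel()[order]
grid = regrid(x, y, z, nrows, ncols)
print(np.array_equal(grid, zz))
```

Filling the grid with NaN instead of zeros is a design choice made here so that missing cells show up as gaps in imshow rather than as valid zero values.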
