Fast 3-D to 7-D interpolation on a non-uniform, non-rectangular grid - Python

I'm looking for a way to interpolate a large set of data in dimensions ranging from 3 to 7.
The data is, by nature, on a non-rectangular grid and non-uniformly spaced.
I have looked at every option I could think of (griddata, KDTree + magic, linear interpolation, reworked map_coordinates, ...): the fastest and most usable tool seems to be SciPy's LinearNDInterpolator. Linear interpolation in such a high-dimensional space is fine and should be precise enough.
However, there is one big shortcoming with this class: data with gaps or "concave regions" will produce extrapolated results when I want only interpolation.
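For reference, here is a minimal sketch of the baseline LinearNDInterpolator call that my wrapper builds on (the points and values below are made-up placeholders). Note that fill_value=np.nan is only used outside the convex hull, which is exactly why concave gaps still get interpolated:
import numpy as np
from scipy.interpolate import LinearNDInterpolator

pts = np.random.random((100, 2))    # N scattered points in 2-D (placeholder data)
vals = pts.sum(axis=1)              # one value per point (placeholder data)
f = LinearNDInterpolator(pts, vals, fill_value=np.nan)
# NaN is returned only for queries outside the convex hull; a query inside
# a concave gap of the data would still get a linearly interpolated value
print(f([[0.5, 0.5], [10.0, 10.0]]))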
This is best seen with some pictures (2-D test). In the following, I produce some randomly generated data for X and VALUE, while Y is upper-bounded by a function of X (just so that I create gaps).
After rescaling the data (mostly done using pieces of code from the LinearNDInterpolator of the master, i.e. development, branch), the Delaunay triangulation will produce a convex hull that includes the gap, and will "extrapolate" in this region. The term "extrapolate" is not really correct here in a technical sense, but I think it is appropriate given that the original data is assumed to be sufficiently well sampled, so that the big gaps mean "no data allowed" (not physical).
To start handling the problem, I "tagged" every Delaunay (hyper-)triangle whose (hyper-)volume is higher than a user-defined threshold (by default, the volume equivalent to 5% of the data extent in each dimension).
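For reference, the quantity I threshold is the usual simplex volume: for an N-dimensional simplex with vertices p_1, ..., p_{N+1},
V = |det(p_1 - p_{N+1}, p_2 - p_{N+1}, ..., p_N - p_{N+1})| / N!
which is exactly what the det() helper and the 1/N! factor in the module below compute.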
Generating random data, and evaluating the values using this technique would produce the following figure:
Black dots (with red or white rings) are the randomly generated points to be evaluated. Red rings indicate points that are rejected (i.e. value = NaN) by my custom class based on LinearNDInterpolator, and white rings show accepted points.
For clarity I've plotted triangles that have been rejected from the original Delaunay triangulation.
As you can see, there are still some white-ring points that fall in the gap, which I do not want. This is because the simplex they belong to has a volume smaller than the authorized maximum volume (some of these triangles even appear as lines on the figure, so they are hard to see).
My question is: how could I improve from here? What could be done?
I was thinking of grabbing all original data points that fall in a small ball around each evaluated point, and checking whether there are any. But this is not a good solution, since it would be resource-consuming and not precise enough (e.g. what about points very close to the bottom of the gap, yet still outside the upper envelope?).
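For what it's worth, here is a rough sketch of that ball idea, assuming a KD-tree built on the (rescaled) original points and a radius r chosen by hand; it only illustrates the shape of the check, not a working solution:
import numpy as np
from scipy.spatial import cKDTree

def reject_far_from_data(original_points, query_points, r):
    # True where a query point has no original data point within radius r
    tree = cKDTree(original_points)
    neighbours = tree.query_ball_point(query_points, r)
    return np.array([len(nb) == 0 for nb in neighbours])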
Here is the custom interpolation module I used:
#!/usr/bin/env python
"""
Custom N-D linear interpolation class, based on scipy's LinearNDInterpolator.
The main differences are:
- auto-scaling
- interpolation: inside convex hull (normal behavior), and "close enough" to original data.
This rejects points that would normally be interpolated by LinearNDInterpolator.
"""
# ================
# Python modules
# ================
import cPickle
import numpy as np
from scipy.spatial import Delaunay
from scipy.interpolate import LinearNDInterpolator
from scipy.misc import factorial
# =======================
# Convenience functions
# =======================
def _inv_log10(x):
return 10**x
def det(coords): #, n):
"""
Return the determinant of the given coordinates (not the usual determinant, but the one used to compute
the hyper-volume of an hyper-triangle)
From a Delaunay triangulation, the coordinates of one simplex (ie. hyper-triangle) is given by:
coords_i = tri.points[simplex_i]
where
tri = Delaunay(points)
simplex_i = tri.simplices[i]
In an N-dimensional space, the simplex will have N+1 points, each one of them of dimension N.
Eg. in 3D, a points i has coordinates pi = (xi, yi, zi). Therefore p1 = points 1 = (x1, y1, z1)
|x1 x2 x3 x4|
|y1 y2 y3 y4| |(x1-x4) (x2-x4) (x3-x4)|
det = |z1 z2 z3 z4| = |(y1-y4) (y2-y4) (y3-y4)|
|1 1 1 1 | |(z1-z4) (z2-z4) (z3-z4)|
"""
# assert n == len(coords[0]), 'number of dimensions of coordinates (%d) != %d' % (len(coords[0]), n)
q = coords[:-1, :] - coords[-1, None, :]
sign, logdet = np.linalg.slogdet(q)
return sign * np.exp(logdet)
# ==============================
# LinearNDInterpolator wrapper
# ==============================
class Interp(object):
"""
Simple wrapper around LinearNDInterpolator.
"""
def __init__(self, points, values, **kwargs):
"""
:param points: list of coordinates (eg. [(0, 1), (0, 3), (4, 4.5)] for 3 points in 2-D)
:param values: list of associated value(s) for each point (eg. [1, 2, 3] for 3 points of single value)
:keyword rescale: rescale data points so that the final extents is [0, 1] in every dimensions
:keyword transform: transform data points (prior to rescaling). If True, automatically transform dimension coordinates
if extents span more than 2 order of magnitudes. It can also be a list of tuples of
(transformation function, inverse function), that will be applied whenever needed.
:keyword fill_value: outside bounds interpolation values (default: np.nan)
"""
try:
points = np.asanyarray(points, dtype=np.float64)
values = np.asanyarray(values, dtype=np.float64)
except ValueError:
raise ValueError('Cannot convert input points to an array of floats')
# dimensions / number of points and values
self.ndim = points.shape[1]
self.nvalues = values.shape[1]
self.npoints = points.shape[0]
# locals
self._idims = range(self.ndim)
# extents
self.minis = np.min(points, axis=0)
self.maxis = np.max(points, axis=0)
self.ranges = self.maxis - self.minis
self.magnitudes = self.maxis / self.minis
# options
rescale = kwargs.pop('rescale', True)
transform = kwargs.pop('transform', True)
fill_value = kwargs.pop('fill_value', np.nan)
# transformation
if transform:
transforms = []
if transform is True:
# automatic transformation -> if extent >= 2 order of magnitudes: f(x) = log10(x)
for i, e in enumerate(self.magnitudes):
if e >= 100.:
transforms.append((np.log10, _inv_log10))
else:
transforms.append(None)
if not transforms:
transforms = None
else:
err_msg = 'transform: both the transformation function and its inverse must be given in a tuple'
if not isinstance(transform, (tuple, list)):
raise ValueError(err_msg)
if (self.ndim > 1) and (len(transform) != self.ndim):
raise ValueError('transform: None or transformations tuple must be given for every dimension')
for t in transform:
if not isinstance(t, (tuple, list)):
raise ValueError(err_msg)
elif t is None:
transforms.append(None)
else:
transforms.append(t)
self.transforms = transforms
else:
self.transforms = None
points = self._transform(points)
# scaling
self.offset = 0.
self.scale = 1.
self.rescale = rescale
if rescale:
self.offset = np.mean(points, axis=0)
self.scale = (points - self.offset).ptp(axis=0)
self.scale[~(self.scale > 0)] = 1.0 # avoid division by 0
points = self._rescale(points)
# triangulation
self.tri = self._triangulate(points)
# volumes
self.fact = 1. / factorial(self.ndim)
self.volume_max = np.product(self.tri.points.ptp(axis=0) * 0.05) # 5% peak-to-peak in each dimension
self.rej_idx = None
self.rej_vol = None
self.cached_rej = False
# linear interpolation
self.fill_value = fill_value
self.func = LinearNDInterpolator(self.tri, values, fill_value=fill_value)
def _triangulate(self, points, **kwargs):
"""
Delaunay triangulation
"""
return Delaunay(points, **kwargs)
def _get_volume_simplex(self, point):
"""
Compute the simplex volume of the given point
"""
i = self.tri.find_simplex(point)
idx = self.tri.simplices[i]
return np.abs(self.fact * det(self.tri.points[idx]))
def cache_rejected_triangles(self, p=None, check_min=False):
"""
Cache the indexes of rejected triangles.
OPTIONS
p -- peak-to-peak percentage in each dimension for the maximum volume calculation
Default: None (default at __init__: p = 0.05)
Type: float (0 < p <= 1)
Type: list of floats (length = # dimensions)
check_min -- check that the minimum spacing in each dimension is at least equal to p * extent
Default: False
Warning: *p* must be given
"""
self.cached_rej = True
if p is not None:
p = np.array(p)
# update the maximum hyper-triangle volume (p % of the extent in each dimension)
self.volume_max = np.product(self.tri.points.ptp(axis=0) * p)
if check_min:
assert p is not None, 'You must give *p* parameter for checking minimum volume of hyper-triangle'
ptps = self.tri.points.ptp(axis=0)
ps = np.ones(self.ndim) * p
n_up = 0
for i in self._idims:
_x = np.unique(self.tri.points[:, i])
mini = np.min(_x[1:] - _x[:-1])
if mini > (ptps[i] * ps[i]):
n_up += 1
print 'WARNING: changed max. volume axis of dim. %d from %.3g to %.3g' % (i+1, ps[i], mini)
ps[i] = mini
if n_up:
new_vol = np.product(ptps * ps)
print 'CHANGE: old volume was = %.3g, and is now = %.3g' % (self.volume_max, new_vol)
self.volume_max = new_vol
rej_idx = []
rej_vol = []
for i, simplex in enumerate(self.tri.simplices):
vol = np.abs(self.fact * det(self.tri.points[simplex]))
if vol > self.volume_max:
rej_idx.append(i)
rej_vol.append(vol)
self.rej_idx = np.array(rej_idx)
self.rej_vol = np.array(rej_vol)
def _transform(self, points, inverse=False):
"""
Transform point coordinates using functions. Set 'inverse' to True to transform back.
"""
if self.transforms is not None:
j = 1 - int(inverse)
for i in self._idims:
t = self.transforms[i]
if t is None:
continue
points[:, i] = t[j](points[:, i])
return points
def _rescale(self, points, inverse=False):
"""
Rescale point coordinates so that extents in each dimensions span [0, 1]. Set 'inverse' to True to scale back.
"""
if self.rescale:
if inverse:
points = points * self.scale + self.offset
else:
points = (points - self.offset) / self.scale
return points
def _check(self, x, res):
"""
Check that interpolation results are close enough to real data and have not been extrapolated.
"""
points = np.asanyarray(x)
if points.ndim == 1:
# only 1 point
values = np.asanyarray(res).reshape(1, self.nvalues)
else:
# more than 1 point
values = np.asanyarray(res).reshape(points.shape[0], self.nvalues)
if self.cached_rej:
idx = np.unique(np.where(np.isfinite(values))[0])
ui_tri, uii = np.unique(self.tri.find_simplex(points[idx]), return_inverse=True)
umask = np.lib.arraysetops.in1d(ui_tri, self.rej_idx, assume_unique=True)
mask = umask[uii]
values[idx[mask], :] = self.fill_value
else:
for i, v in enumerate(values):
if not np.isnan(v[0]):
vol = self._get_volume_simplex(points[i])
if vol > self.volume_max:
# reject
values[i][:] = self.fill_value
return values.reshape(res.shape)
def __call__(self, x, check=False):
"""
Interpolate. If 'check' is True, check that interpolated points are close enough to real data.
"""
_x = self._rescale(self._transform(x))
res = self.func(_x)
if check:
res = self._check(_x, res)
return res
def ev(self, x, check=False):
"""
Alias for __call__
"""
return self.__call__(x, check=check)
def get_original_points(self):
"""
Return original points
"""
return self._transform(self._rescale(self.func.points, inverse=True), inverse=True)
def get_original_values(self):
"""
Return original values
"""
return self.func.values
# ===========================
# Save / load interpolation
# ===========================
def save(filename, interp):
"""
Dump the Interp instance to a binary file with cPickle (protocol 2)
"""
with open(filename, 'wb') as f:
cPickle.dump(interp, f, protocol=2)
def load(filename):
"""
Load a previously saved (cPickled with save_interp function) Interp instance
"""
with open(filename, 'rb') as f:
interp = cPickle.load(f)
return interp
And the test script:
#!/usr/bin/env python
"""
Test the custom interpolation class (see interp.py)
"""
import sys
import numpy as np
from interp import Interp
import matplotlib.pyplot as plt
# generate random data
n = 2000 # number of generated points
x = np.random.random(n)
def f(v):
maxi = v ** (1/(v+1e-5)) * (v - 5.) ** 2 - np.exp(v-7) + 1
return np.random.random() * maxi
y = map(f, x * 10)
z = np.random.random(n)
points = np.array((x, y)).T
values = np.random.random(points.shape)
# create interpolation function
func = Interp(points, values, transform=False)
func.cache_rejected_triangles(p=0.05, check_min=True)
# generate random data + evaluate
pts = np.random.random((500, points.shape[1]))
pts *= points.ptp(0)
pts += points.min(0)
res = func(pts, check=True)
# rejected points indexes
idx_rej = np.unique(np.where(np.isnan(res))[0])
n_rej = len(idx_rej)
print '%d points (%.0f%%) have been rejected' % (n_rej, 100.*n_rej/pts.shape[0])
# plot rejected triangles
fig = plt.figure()
ax = plt.gca()
for i in func.rej_idx:
_x = [p for p in points[func.tri.simplices[i], 0]]
_x += [points[func.tri.simplices[i][0], 0]]
_y = [p for p in points[func.tri.simplices[i], 1]]
_y += [points[func.tri.simplices[i][0], 1]]
ax.plot(_x, _y, c='k', ls='-', zorder=100)
# plot original data
ax.scatter(points[:, 0], points[:, 1], c='b', linewidths=0, s=20, zorder=50)
# plot all points (both accepted and rejected): in white
ax.scatter(pts[:, 0], pts[:, 1], c='k', edgecolors='w', linewidths=1, zorder=150, s=30)
# re-plot rejected points: in red
ax.scatter(pts[idx_rej, 0], pts[idx_rej, 1], c='k', edgecolors='r', linewidths=1, zorder=200, s=30)
fig.savefig('img_tri.png', transparent=True, dpi=300)


An algorithm to sort top and bottom slices of curved surfaces

What I am trying to do:
Cut the STL file https://www.dropbox.com/s/pex20yqfgmxgt0w/wing_fish.stl?dl=0 at a given Z coordinate using PyVista (https://docs.pyvista.org/)
Extract the X, Y coordinates of the points at the given section Z
Sort the points into upper and lower groups for further manipulation
Here is my code:
import pyvista as pv
import matplotlib.pylab as plt
import numpy as np
import math
mesh = pv.read('wing_fish.stl')
z_slice = [0, 0, 1] # normal to cut at
single_slice = mesh.slice(normal=z_slice, origin=[0, 0, 200]) # slicing
a = single_slice.points # choose only points
# p = pv.Plotter() #show section
# p.add_mesh(single_slice)
# p.show()
a = a[a[:,0].astype(float).argsort()] # sort all points by X coord
# X min of all points
x0 = a[0][0]
# Y min of all points
y0 = a[0][1]
# X tail 1 of 2
xn = a[-1][0]
# Y tail 1 of 2
yn = a[-1][1]
# X tail 2 of 2
xn2 = a[-2][0]
# Y tail 2 of 2
yn2 = a[-2][1]
def line_y(x, x0, y0, xn, yn):
# return y coord at arbitrary x coord on the (x0, y0)-(xn, yn) LINE
return ((x - x0)*(yn-y0))/(xn-x0)+y0
def line_c(x0, y0, xn, yn):
# return x, y middle points of LINE
xc = (x0+xn)/2
yc = (y0+yn)/2
return xc, yc
def chord(P1, P2):
return math.sqrt((P2[1] - P1[1])**2 + (P2[0] - P1[0])**2)
xc_end, yc_end = line_c(xn, yn, xn2, yn2) # return middle point at the trailing edge
midLine = np.array([[x0,y0],[xc_end,yc_end]],dtype='float32')
c_temp_x_d = []
c_temp_y_d = []
c_temp_x_u = []
c_temp_y_u = []
isUp = None
isDown = None
for i in a:
if i[1] == line_y(i[0], x0=x0, y0=y0, xn=xc_end, yn=yc_end):
continue
elif i[1] < line_y(i[0], x0=x0, y0=y0, xn=xc_end, yn=yc_end):
c_temp_y_d.append(i[1])
c_temp_x_d.append(i[0])
isDown = True
else:
c_temp_y_u.append(i[1])
c_temp_x_u.append(i[0])
isUp = True
if len(c_temp_y_d) != 0 and len(c_temp_y_u) != 0:
print(c_temp_y_d[-1])
plt.plot(c_temp_x_d, c_temp_y_d, label='suppose to be down points')
plt.plot(c_temp_x_u, c_temp_y_u, label='suppose to be upper points')
plt.plot(midLine[:,0], midLine[:,1], label='Chord')
plt.scatter(a[:,0],a[:,1], label='raw points')
plt.legend();plt.grid();plt.show()
What I have:
What I want:
I would highly appreciate any help and advice!
Thanks in advance!
You are discarding precious connectivity information that is already there in your STL mesh and in your slice!
I couldn't think of a more idiomatic solution within PyVista, but at worst you can take the cell (line) information from the slice and start walking your shape (that is topologically equivalent to a circle) from its left side to its right, and vice versa. Here's one way:
import numpy as np
import matplotlib.pyplot as plt
import pyvista as pv
mesh = pv.read('../wing_fish.stl')
z_slice = [0, 0, 1] # normal to cut at
single_slice = mesh.slice(normal=z_slice, origin=[0, 0, 200]) # slicing
# find points with smallest and largest x coordinate
points = single_slice.points
left_ind = points[:, 0].argmin()
right_ind = points[:, 0].argmax()
# sanity check for what we're about to do:
# 1. all cells are lines
assert single_slice.n_cells == single_slice.n_points
assert (single_slice.lines[::3] == 2).all()
# 2. all points appear exactly once as segment start and end
lines = single_slice.lines.reshape(-1, 3) # each row: [2, i_from, i_to]
assert len(set(lines[:, 1])) == lines.shape[0]
# create an auxiliary dict with from -> to index mappings
conn = dict(lines[:, 1:])
# and a function that walks this connectivity graph
def walk_connectivity(connectivity, start, end):
this_ind = start
path_inds = [this_ind]
while True:
next_ind = connectivity[this_ind]
path_inds.append(next_ind)
this_ind = next_ind
if this_ind == end:
# we're done
return path_inds
# start walking at point left_ind, walk until right_ind
first_side_inds = walk_connectivity(conn, left_ind, right_ind)
# now walk forward for the other half curve
second_side_inds = walk_connectivity(conn, right_ind, left_ind)
# get the point coordinates for plotting
first_side_points = points[first_side_inds, :-1]
second_side_points = points[second_side_inds, :-1]
# plot the two sides
fig, ax = plt.subplots()
ax.plot(*first_side_points.T)
ax.plot(*second_side_points.T)
plt.show()
In order to avoid using an O(n^2) algorithm, I defined an auxiliary dict that maps line segment start indices to end indices. In order for this to work we need some sanity checks, namely that the cells are all simple line segments, and that each segment has the same orientation (i.e. each start point is unique, and each end point is unique). Once we have this it's easy to start from the left edge of your wing profile and walk each line segment until we find the right edge.
The nature of this approach implies that we can't know a priori whether the path from left to right goes on the upper or the lower path. This needs experimentation on your part; name the two paths in whatever way you see fit.
And of course there's always room for fine tuning. For instance, the above implementation creates two paths that both start and end with the left- and right-side boundary points of the mesh. If you want the top and bottom curves to share no points, you'll have to adjust the algorithm accordingly. And if the end point is never found on the path, the current implementation will give you an infinite loop with a list growing beyond all available memory. Consider adding some checks in the implementation to avoid this; a sketch of such a guard follows below.
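For example, here is a sketch of such a guard, plus one heuristic for naming the two halves (it assumes a closed slice can never have more segments than points, and reuses conn, points, left_ind and right_ind from above):
def walk_connectivity_safe(connectivity, start, end):
    # same walk as above, but bail out instead of looping forever
    max_steps = len(connectivity) + 1
    this_ind = start
    path_inds = [this_ind]
    for _ in range(max_steps):
        this_ind = connectivity[this_ind]
        path_inds.append(this_ind)
        if this_ind == end:
            return path_inds
    raise RuntimeError('End index never reached; the slice is probably not a single closed loop.')

first_side_inds = walk_connectivity_safe(conn, left_ind, right_ind)
second_side_inds = walk_connectivity_safe(conn, right_ind, left_ind)
# label the halves by comparing mean y coordinates
if points[first_side_inds, 1].mean() > points[second_side_inds, 1].mean():
    upper_inds, lower_inds = first_side_inds, second_side_inds
else:
    upper_inds, lower_inds = second_side_inds, first_side_inds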
Anyway, this is what we get from the above:

Fitting an Orthogonal Grid to Noisy Coordinates

Problem
I have a list of coordinates that are meant to form a grid. Each coordinate has a random error component and some of the coordinates are missing. The grid could be rotated (update). I want to fit an orthogonal grid to the data points and return a list of the grid's vertices. For example:
Application
The purpose is to find a grid in a scanned image. The data points come from the results of contour or edge detection in OpenCV. An example is an image containing a grid of photos.
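For context, the points come from something like the following (hypothetical) OpenCV step; the file name is made up, and the unpacking of cv2.findContours assumes OpenCV 4 (OpenCV 3 returns three values):
import cv2

img = cv2.imread('scanned_page.png', cv2.IMREAD_GRAYSCALE)   # hypothetical input image
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

pts = []
for c in contours:
    M = cv2.moments(c)
    if M['m00'] > 0:   # skip degenerate contours
        pts.append([int(M['m10'] / M['m00']), int(M['m01'] / M['m00'])])   # centroid (x, y)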
Goal
I wrote some Python code that works, but would like to find a linear algebra algorithm using SciPy, statsmodels or other modules that would be more robust and handle a small rotation of the grid (less than 10°).
Python Code Using Lists Only
# Noisy [x, y] coordinates (origin is upper-left corner)
pts = [[103,101],
[198,103],
[300, 99],
[ 97,205],
[304,202],
[102,295],
[200,303],
[104,405],
[205,394],
[298,401]]
def row_col_avgs(num_list, ratio):
# Finds the average of each row and column. Coordinates are
# assigned to a row and column by specifying an error ratio.
last_num, sum_nums, count_nums, avgs = 0, 0, 0, []
num_list.sort()
for num in num_list:
# Calculate average for last row or column and begin new row or column
if num > (1+ratio)*last_num and count_nums != 0:
avgs.append(int(round(sum_nums/count_nums,0)))
sum_nums = num
count_nums = 1
# Or continue with current row or column
else:
sum_nums += num
count_nums += 1
last_num = num
avgs.append(int(round(sum_nums/count_nums,0)))
return avgs
# Split coordinates into two lists of x's and y's
xs, ys = map(list, zip(*pts))
# Find averages of each row and column of the grid
x_avgs = row_col_avgs(xs, 0.1)
y_avgs = row_col_avgs(ys, 0.1)
# Return vertices of completed averaged grid
avg_grid = []
for y_avg in y_avgs:
avg_row = []
for x_avg in x_avgs:
avg_row.append([int(x_avg), int(y_avg)])
avg_grid.append(avg_row)
print(avg_grid)
Output
[[[102, 101], [201, 101], [301, 101]],
[[102, 204], [201, 204], [301, 204]],
[[102, 299], [201, 299], [301, 299]],
[[102, 400], [201, 400], [301, 400]]]
Parallel Slopes Ordinary Least Squares (OLS) Model:
y = mx + grp + b, where m = slope, b = y-intercept, and grp = a categorical variable.
This is an alternative algorithm that can handle a rotated grid.
The OLS model includes both the data points in the original orientation
and a 90° rotation of the same data points. This is necessary so all gridlines are parallel and have the same slope.
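As a minimal illustration of what such a model returns (toy data, unrelated to the grid code below): passing the group as a string column and dropping the intercept gives one intercept per group and a single shared slope.
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# two parallel lines, y = 2x and y = 2x + 5, with a little noise
x = np.tile(np.arange(10.0), 2)
y = 2 * x + np.repeat([0.0, 5.0], 10) + 0.1 * np.random.randn(20)
df = pd.DataFrame({'x': x, 'y': y, 'grp': np.repeat(['a', 'b'], 10)})

fit = ols("y ~ x + grp + 0", data=df).fit()
print(fit.params)   # grp[a], grp[b] (intercepts) and x (shared slope)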
Algorithm:
Find a reference gridline to compare with remaining points by choosing two neighboring points in the first or last row with a slope closest to zero.
Calculate the distances between this reference line and the remaining points.
Segment points into groups w.r.t. the calculated distances (one group per gridline).
Repeat steps 1 to 3 for the 90 degree rotated grid and combine results.
Create a parallel slopes OLS model to determine linear equations for the gridlines.
Rotate the rotated gridlines back to their original orientation.
Calculate the intersection points.
Note: this fails if the noise, rotation angle, and/or fraction of missing points are too large.
Example:
Python Code to Create Example
def create_random_example():
# Requires import of numpy and random packages
# Creates grid with random noise and missing points
# Example will fail if std_dev, rotation, pct_removed too large
# Parameters
first_row, last_row = 100, 900
first_col, last_col = 100, 600
num_rows = 6
num_cols = 4
rotation = 3 # degrees that grid is rotated
sd = 3 # percent std dev of avg x and avg y coordinates
pct_remove = 30 # percent of points to randomly remove from data
# Create grid
x = np.linspace(first_col, last_col, num_cols)
y = np.linspace(first_row, last_row, num_rows)
xx, yy = np.meshgrid(x, y)
# Add noise
x = xx.flatten() + sd * np.mean(xx) * np.random.randn(xx.size) / 100
y = yy.flatten() + sd * np.mean(yy) * np.random.randn(yy.size) / 100
# Randomly remove points
random_list = random.sample(range(0, num_cols*num_rows),
int(pct_remove*num_cols*num_rows/100))
x, y = np.delete(x, random_list), np.delete(y, random_list)
pts = np.column_stack((x, y))
# Rotate points
radians = np.radians(rotation)
rot_mat = np.array([[np.cos(radians),-np.sin(radians)],
[np.sin(radians), np.cos(radians)]])
einsum = np.einsum('ji, mni -> jmn', rot_mat, [pts])
pts = np.squeeze(einsum).T
return np.rint(pts)
Python Code to Fit Gridlines
import numpy as np
import pandas as pd
import itertools
import math
import random
from statsmodels.formula.api import ols
from scipy.spatial import KDTree
import matplotlib.pyplot as plt
def pt_line_dist(pt, ref_line):
pt1, pt2 = [ref_line[:2], ref_line[2:]]
# Distance from point to line defined by two other points
return np.linalg.norm(np.cross(pt1 - pt2, [pt[0],pt[1]])) \
/ np.linalg.norm(pt1 - pt2)
def segment_pts(amts, grp_var, grp_label):
# Segment on amounts (distances here) in last column of array
# Note: need to label groups with string for OLS model
amts = amts[amts[:, -1].argsort()]
first_amt_in_grp = amts[0][-1]
group, groups, grp = [], [], 0
for amt in amts:
if amt[-1] - first_amt_in_grp > grp_var:
groups.append(group)
first_amt_in_grp = amt[-1]
group = []; grp += 1
group.append(np.append(amt[:-1],[[grp_label + str(grp)]]))
groups.append(group)
return groups
def find_reference_line(pts):
# Find point with minimum absolute slope relative both min y and max y
y = np.hsplit(pts, 2)[1] # y column of array
m = []
for i, y_pt in enumerate([ pts[np.argmin(y)], pts[np.argmax(y)] ]):
m.append(np.zeros((pts.shape[0]-1, 5))) # dtype default is float64
m[i][:,2:4] = np.delete(pts, np.where((pts==y_pt).all(axis=1))[0], axis=0)
m[i][:,4] = abs( (m[i][:,3]-y_pt[1]) / (m[i][:,2]-y_pt[0]) )
m[i][:,:2] = y_pt
m = np.vstack((m[0], m[1]))
return m[np.argmin(m[:,4]), :4]
# Ignore division by zero (slopes of vertical lines)
np.seterr(divide='ignore')
# Create dataset and plot
pts = create_random_example()
plt.scatter(pts[:,0], pts[:,1], c='r') # plot now because pts array changes
# Average distance to the nearest neighbor of each point
tree = KDTree(pts)
nn_avg_dist = np.mean(tree.query(pts, 2)[0][:, 1])
# Find groups of points representing each gridline
groups = []
for orientation in ['o', 'r']: # original and rotated orientations
# Rotate points 90 degrees (note: this moves pts to 2nd quadrant)
if orientation == 'r':
pts[:,1] = -1 * pts[:,1]
pts[:, [1, 0]] = pts[:, [0, 1]]
# Find reference line to compare remaining points for grouping
ref_line = find_reference_line(pts) # line is defined by two points
# Distances between points and reference line
pt_dists = np.zeros((pts.shape[0], 3))
pt_dists[:,:2] = pts
pt_dists[:,2] = np.apply_along_axis(pt_line_dist, 1, pts, ref_line).T
# Segment pts into groups w.r.t. distances (one group per gridline)
# Groups have range less than nn_avg_dist.
groups += segment_pts(pt_dists, 0.7*nn_avg_dist, orientation)
# Create dataframe of groups (OLS model requires a dataframe)
df = pd.DataFrame(np.row_stack(groups), columns=['x', 'y', 'grp'])
df['x'] = pd.to_numeric(df['x'])
df['y'] = pd.to_numeric(df['y'])
# Parallel slopes OLS model
ols_model = ols("y ~ x + grp + 0", data=df).fit()
# OLS parameters
grid_lines = ols_model.params[:-1].to_frame() # panda series to dataframe
grid_lines = grid_lines.rename(columns = {0:'b'})
grid_lines['grp'] = grid_lines.index.str[4:6]
grid_lines['m'] = ols_model.params[-1] # slope
# Rotate the rotated lines back to their original orientation
grid_lines.loc[grid_lines['grp'].str[0] == 'r', 'b'] = grid_lines['b'] / grid_lines['m']
grid_lines.loc[grid_lines['grp'].str[0] == 'r', 'm'] = -1 / grid_lines['m']
# Find grid intersection points by combinations of gridlines
comb = list(itertools.combinations(grid_lines['grp'], 2))
comb = [i for i in comb if i[0][0] != 'r']
comb = [i for i in comb if i[1][0] != 'o']
df_comb = pd.DataFrame(comb, columns=['grp', 'r_grp'])
# Merge gridline parameters with grid points
grid_pts = df_comb.merge(grid_lines.drop_duplicates('grp'),how='left',on='grp')
grid_lines.rename(columns={'grp': 'r_grp'}, inplace=True)
grid_pts.rename(columns={'b':'o_b', 'm': 'o_m', 'grp':'o_grp'}, inplace=True)
grid_pts = grid_pts.merge(grid_lines.drop_duplicates('r_grp'),how='left',on='r_grp')
grid_pts.rename(columns={'b':'r_b', 'm': 'r_m'}, inplace=True)
# Calculate x, y coordinates of gridline interception points
grid_pts['x'] = (grid_pts['r_b']-grid_pts['o_b']) \
/ (grid_pts['o_m']-grid_pts['r_m'])
grid_pts['y'] = grid_pts['o_m'] * grid_pts['x'] + grid_pts['o_b']
# Results output
print(grid_lines)
print(grid_pts)
plt.scatter(grid_pts['x'], grid_pts['y'], s=8, c='b') # for setting axes
axes = plt.gca()
axes.invert_yaxis()
axes.xaxis.tick_top()
axes.set_aspect('equal')
axes.set_xlim(axes.get_xlim())
axes.set_ylim(axes.get_ylim())
x_vals = np.array(axes.get_xlim())
for idx in grid_lines.index:
y_vals = grid_lines['b'][idx] + grid_lines['m'][idx] * x_vals
plt.plot(x_vals, y_vals, c='gray')
plt.show()
A numpy implementation of your code can be found below. As the size of AvgGrid is known, I pre-allocate the required memory (rather than appending). This should have speed advantages, especially if the number of output vertices is large.
import numpy as np
# Input of [x, y] coordinates of a sparse grid with errors
xys = np.array([[103,101],
[198,103],
[300, 99],
[ 97,205],
[304,202],
[102,295],
[200,303],
[104,405],
[205,394],
[298,401]])
# Function to average
def ColAvgs(CoordinateList, CutoffRatio = 1.1):
# Length of CoordinateList
L = len(CoordinateList)
# Sort input
SortedList = np.sort(CoordinateList)
# Determine indices to average
RelativeIncrease = SortedList[-(L-1):]/SortedList[:(L-1)]
CriticalIndices = np.flatnonzero(RelativeIncrease > CutoffRatio) + 1
Indices = np.hstack((0,CriticalIndices))
if (Indices[-1] != L):
Indices = np.hstack((Indices,L))
#print(Indices) # Uncomment to show index construction
# Compute averages
Avgs = np.empty((len(Indices)-1)); Avgs[:] = np.NaN
for iter in range(len(Avgs)):
Avgs[iter] = int( round(np.mean(SortedList[Indices[iter]:Indices[(iter+1)]]) ) )
# Return output
return Avgs
# Compute x- and y-coordinates of vertices
AvgsXcoord = ColAvgs(xys[:,0])
AvgsYcoord = ColAvgs(xys[:,1])
# Return all vertices
AvgGrid = np.empty((len(AvgsXcoord)*len(AvgsYcoord),2)); AvgGrid[:] = np.NaN
iter = 0
for y in AvgsYcoord:
for x in AvgsXcoord:
AvgGrid[iter, :] = np.hstack((x,y))
iter = iter+1
print(AvgGrid)
If you project all points onto a vertical or horizontal axis, the problem turns into one of clustering with equally spaced clusters.
To perform these clusterings, you can consider the distances between successive (sorted) points. They will form two clusters: short distances corresponding to the noise, and longer ones corresponding to the grid spacing. You can solve this two-way clustering using the Otsu method; a rough sketch follows below.
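Here is a rough sketch of that idea for the x axis, assuming pts is an (n, 2) NumPy array of the coordinates and the grid is not rotated; threshold_otsu from scikit-image plays the role of the Otsu step:
import numpy as np
from skimage.filters import threshold_otsu

xs = np.sort(pts[:, 0])                  # projection on the x axis
gaps = np.diff(xs)                       # distances between successive points
t = threshold_otsu(gaps)                 # separates "noise" gaps from "grid" gaps
breaks = np.flatnonzero(gaps > t) + 1    # indices where a new column starts
columns = np.split(xs, breaks)
x_centres = [c.mean() for c in columns]  # one estimate per vertical gridline
The same applies to the y axis for the horizontal gridlines.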

NetworkX - Is there a way to scale the positions of nodes in a graph according to node weight?

I built an app that displays graphs, and under the hood I use NetworkX to store them. Each node has a size, and I want to change the node positions according to those sizes (for example, a 'big' node will have more space around it than a 'small' node).
Any ideas for an algorithm/method/library/other to help me do that?
Thanks,
Adi
By default, networkx uses the Fruchterman-Reingold (FR) algorithm to determine the node layout. The FR algorithm can be modified to take node sizes into account; however, the implementation in networkx does not do this. Below is my implementation of FR that takes node sizes into account.
#!/usr/bin/env python
import warnings
import numpy as np
import matplotlib.pyplot as plt
BASE_NODE_SIZE = 1e-2
BASE_EDGE_WIDTH = 1e-2
def get_fruchterman_reingold_layout(edge_list,
k = None,
scale = None,
origin = None,
initial_temperature = 1.,
total_iterations = 50,
node_size = None,
node_positions = None,
fixed_nodes = None,
*args, **kwargs
):
"""
Arguments:
----------
edge_list : m-long iterable of 2-tuples or equivalent (such as (m, 2) ndarray)
List of edges. Each tuple corresponds to an edge defined by (source, target).
origin : (float x, float y) tuple or None (default None -> (0, 0))
The lower left hand corner of the bounding box specifying the extent of the layout.
If None is given, the origin is placed at (0, 0).
scale : (float delta x, float delta y) or None (default None -> (1, 1))
The width and height of the bounding box specifying the extent of the layout.
If None is given, the scale is set to (1, 1).
k : float or None (default None)
Expected mean edge length. If None, initialized to the sqrt(area / total nodes).
total_iterations : int (default 50)
Number of iterations.
initial_temperature: float (default 1.)
Temperature controls the maximum node displacement on each iteration.
Temperature is decreased on each iteration to eventually force the algorithm
into a particular solution. The size of the initial temperature determines how
quickly that happens. Values should be much smaller than the values of `scale`.
node_size : scalar or (n,) or dict key : float (default 0.)
Size (radius) of nodes.
Providing the correct node size minimises the overlap of nodes in the graph,
which can otherwise occur if there are many nodes, or if the nodes differ considerably in size.
NOTE: Value is rescaled by BASE_NODE_SIZE (1e-2) to give comparable results to layout routines in igraph and networkx.
node_positions : dict key : (float, float) or None (default None)
Mapping of nodes to their (initial) x,y positions. If None are given,
nodes are initially placed randomly within the bounding box defined by `origin`
and `scale`.
fixed_nodes : list of nodes
Nodes to keep fixed at their initial positions.
Returns:
--------
node_positions : dict key : (float, float)
Mapping of nodes to (x,y) positions
"""
# This is just a wrapper around `_fruchterman_reingold` (which implements (the loop body of) the algorithm proper).
# This wrapper handles the initialization of variables to their defaults (if not explicitely provided),
# and checks inputs for self-consistency.
if origin is None:
if node_positions:
minima = np.min(list(node_positions.values()), axis=0)
origin = np.min(np.stack([minima, np.zeros_like(minima)], axis=0), axis=0)
else:
origin = np.zeros((2))
else:
# ensure that it is an array
origin = np.array(origin)
if scale is None:
if node_positions:
delta = np.array(list(node_positions.values())) - origin[np.newaxis, :]
maxima = np.max(delta, axis=0)
scale = np.max(np.stack([maxima, np.ones_like(maxima)], axis=0), axis=0)
else:
scale = np.ones((2))
else:
# ensure that it is an array
scale = np.array(scale)
assert len(origin) == len(scale), \
"Arguments `origin` (d={}) and `scale` (d={}) need to have the same number of dimensions!".format(len(origin), len(scale))
dimensionality = len(origin)
unique_nodes = _get_unique_nodes(edge_list)
total_nodes = len(unique_nodes)
if node_positions is None: # assign random starting positions to all nodes
node_positions_as_array = np.random.rand(total_nodes, dimensionality) * scale + origin
else:
# 1) check input dimensionality
dimensionality_node_positions = np.array(list(node_positions.values())).shape[1]
assert dimensionality_node_positions == dimensionality, \
"The dimensionality of values of `node_positions` (d={}) must match the dimensionality of `origin`/ `scale` (d={})!".format(dimensionality_node_positions, dimensionality)
is_valid = _is_within_bbox(list(node_positions.values()), origin=origin, scale=scale)
if not np.all(is_valid):
error_message = "Some given node positions are not within the data range specified by `origin` and `scale`!"
error_message += "\nOrigin : {}, {}".format(*origin)
error_message += "\nScale : {}, {}".format(*scale)
for ii, (node, position) in enumerate(node_positions.items()):
if not is_valid[ii]:
error_message += "\n{} : {}".format(node, position)
raise ValueError(error_message)
# 2) handle discrepancies in nodes listed in node_positions and nodes extracted from edge_list
if set(node_positions.keys()) == set(unique_nodes):
# all starting positions are given;
# no superfluous nodes in node_positions;
# nothing left to do
pass
else:
# some node positions are provided, but not all
for node in unique_nodes:
if not (node in node_positions):
warnings.warn("Position of node {} not provided. Initializing to random position within frame.".format(node))
node_positions[node] = np.random.rand(2) * scale + origin
# unconnected_nodes = []
for node in node_positions:
if not (node in unique_nodes):
# unconnected_nodes.append(node)
warnings.warn("Node {} appears to be unconnected. No position is computed for this node.".format(node))
del node_positions[node]
node_positions_as_array = np.array(list(node_positions.values()))
if node_size is None:
node_size = np.zeros((total_nodes))
elif isinstance(node_size, (int, float)):
node_size = BASE_NODE_SIZE * node_size * np.ones((total_nodes))
elif isinstance(node_size, dict):
node_size = np.array([BASE_NODE_SIZE * node_size[node] if node in node_size else 0. for node in unique_nodes])
if fixed_nodes is None:
is_mobile = np.ones((len(unique_nodes)), dtype=np.bool)
else:
is_mobile = np.array([False if node in fixed_nodes else True for node in unique_nodes], dtype=np.bool)
adjacency = _edge_list_to_adjacency_matrix(edge_list)
# Forces in FR are symmetric.
# Hence we need to ensure that the adjacency matrix is also symmetric.
adjacency = adjacency + adjacency.transpose()
if k is None:
area = np.product(scale)
k = np.sqrt(area / float(total_nodes))
temperatures = _get_temperature_decay(initial_temperature, total_iterations)
# --------------------------------------------------------------------------------
# --------------------------------------------------------------------------------
# main loop
for ii, temperature in enumerate(temperatures):
node_positions_as_array[is_mobile] = _fruchterman_reingold(adjacency, node_positions_as_array,
origin = origin,
scale = scale,
temperature = temperature,
k = k,
node_radii = node_size,
)[is_mobile]
node_positions_as_array = _rescale_to_frame(node_positions_as_array, origin, scale)
# --------------------------------------------------------------------------------
# --------------------------------------------------------------------------------
# format output
node_positions = dict(zip(unique_nodes, node_positions_as_array))
return node_positions
def _is_within_bbox(points, origin, scale):
return np.all((points >= origin) * (points <= origin + scale), axis=1)
def _get_temperature_decay(initial_temperature, total_iterations, mode='quadratic', eps=1e-9):
x = np.linspace(0., 1., total_iterations)
if mode == 'quadratic':
y = (x - 1.)**2 + eps
elif mode == 'linear':
y = (1. - x) + eps
else:
raise ValueError("Argument `mode` one of: 'linear', 'quadratic'.")
return initial_temperature * y
def _fruchterman_reingold(adjacency, node_positions, origin, scale, temperature, k, node_radii):
"""
Inner loop of Fruchterman-Reingold layout algorithm.
"""
# compute distances and unit vectors between nodes
delta = node_positions[None, :, ...] - node_positions[:, None, ...]
distance = np.linalg.norm(delta, axis=-1)
# assert np.sum(distance==0) - np.trace(distance==0) > 0, "No two node positions can be the same!"
# alternatively: (hack adapted from igraph)
if np.sum(distance==0) - np.trace(distance==0) > 0: # i.e. if off-diagonal entries in distance are zero
warnings.warn("Some nodes have the same position; repulsion between the nodes is undefined.")
rand_delta = np.random.rand(*delta.shape) * 1e-9
is_zero = distance <= 0
delta[is_zero] = rand_delta[is_zero]
distance = np.linalg.norm(delta, axis=-1)
# subtract node radii from distances to prevent nodes from overlapping
distance -= node_radii[None, :] + node_radii[:, None]
# prevent distances from becoming less than zero due to overlap of nodes
distance[distance <= 0.] = 1e-6 # 1e-13 is numerical accuracy, and we will be taking the square shortly
with np.errstate(divide='ignore', invalid='ignore'):
direction = delta / distance[..., None] # i.e. the unit vector
# calculate forces
repulsion = _get_fr_repulsion(distance, direction, k)
attraction = _get_fr_attraction(distance, direction, adjacency, k)
displacement = attraction + repulsion
# limit maximum displacement using temperature
displacement_length = np.linalg.norm(displacement, axis=-1)
displacement = displacement / displacement_length[:, None] * np.clip(displacement_length, None, temperature)[:, None]
node_positions = node_positions + displacement
return node_positions
def _get_fr_repulsion(distance, direction, k):
with np.errstate(divide='ignore', invalid='ignore'):
magnitude = k**2 / distance
vectors = direction * magnitude[..., None]
# Note that we cannot apply the usual strategy of summing the array
# along either axis and subtracting the trace,
# as the diagonal of `direction` is np.nan, and any sum or difference of
# NaNs is just another NaN.
# Also we do not want to ignore NaNs by using np.nansum, as then we would
# potentially mask the existence of off-diagonal zero distances.
vectors = _set_diagonal(vectors, 0)
return np.sum(vectors, axis=0)
def _get_fr_attraction(distance, direction, adjacency, k):
magnitude = 1./k * distance**2 * adjacency
vectors = -direction * magnitude[..., None] # NB: the minus!
vectors = _set_diagonal(vectors, 0)
return np.sum(vectors, axis=0)
def _rescale_to_frame(node_positions, origin, scale):
node_positions = node_positions.copy() # force copy, as otherwise the `fixed_nodes` argument is effectively ignored
node_positions -= np.min(node_positions, axis=0)
node_positions /= np.max(node_positions, axis=0)
node_positions *= scale[None, ...]
node_positions += origin[None, ...]
return node_positions
def _set_diagonal(square_matrix, value=0):
n = len(square_matrix)
is_diagonal = np.diag(np.ones((n), dtype=np.bool))
square_matrix[is_diagonal] = value
return square_matrix
def _flatten(nested_list):
return [item for sublist in nested_list for item in sublist]
def _get_unique_nodes(edge_list):
"""
Using numpy.unique promotes nodes to numpy.float/numpy.int/numpy.str,
and breaks for nodes that have a more complicated type such as a tuple.
"""
return list(set(_flatten(edge_list)))
def _edge_list_to_adjacency_matrix(edge_list, edge_weights=None):
sources = [s for (s, _) in edge_list]
targets = [t for (_, t) in edge_list]
if edge_weights:
weights = [edge_weights[edge] for edge in edge_list]
else:
weights = np.ones((len(edge_list)))
# map nodes to consecutive integers
nodes = sources + targets
unique = set(nodes)
indices = range(len(unique))
node_to_idx = dict(zip(unique, indices))
source_indices = [node_to_idx[source] for source in sources]
target_indices = [node_to_idx[target] for target in targets]
total_nodes = len(unique)
adjacency_matrix = np.zeros((total_nodes, total_nodes))
adjacency_matrix[source_indices, target_indices] = weights
return adjacency_matrix
if __name__ == '__main__':
import networkx as nx
# create a graph
n = 10 # number of nodes
G = nx.complete_graph(n)
edge_list = list(G.edges())
# compute a "spring" layout that takes node sizes into account
node_size = dict(zip(range(n), np.arange(0, 100, 10))) # dict : node ID -> node size
node_positions = get_fruchterman_reingold_layout(edge_list, node_size=node_size, k=0.01)
nx.draw(G, pos=node_positions, node_size=[300*node_size[node] for node in node_positions]); plt.show()
However, note that when you plot a graph using networkx, node sizes are given in display coordinates, whereas node positions are given in data coordinates. As the display size is determined at runtime, there is no (simple) way of knowing if the two coordinate systems match such that the nodes do not overlap in the plot. Battling with that problem some time ago, I created a fork of networkx's drawing utilities, which uses data coordinates throughout. You can find the package here.
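To illustrate the mismatch, here is one way to convert a radius given in data units into the points^2 units that nx.draw expects; the conversion only holds for the current figure size and axis limits, so it is approximate and has to be redone if the axes are rescaled:
import matplotlib.pyplot as plt
import networkx as nx

def data_radius_to_node_size(ax, radius):
    # length of `radius` data units in display (pixel) coordinates
    x0 = ax.transData.transform((0.0, 0.0))
    x1 = ax.transData.transform((radius, 0.0))
    radius_px = abs(x1[0] - x0[0])
    radius_pt = radius_px * 72.0 / ax.figure.dpi   # pixels -> points
    return (2.0 * radius_pt) ** 2                  # scatter sizes are (roughly) squared diameters in points

G = nx.complete_graph(10)
pos = nx.spring_layout(G)
fig, ax = plt.subplots()
ax.set_xlim(-1.1, 1.1); ax.set_ylim(-1.1, 1.1)
nx.draw(G, pos=pos, ax=ax, node_size=data_radius_to_node_size(ax, 0.05))
plt.show()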

Plot arbitrary paths with constant width given in data coordinates

General aim
I am trying to write some plotting functionality that (at its core)
plots arbitrary paths with a constant width given in data coordinates
(i.e. unlike lines in matplotlib which have widths given in display coordinates).
Previous solutions
This answer achieves
the basic goal. However, this answer converts between display and data
coordinates and then uses a matplotlib line with adjusted
coordinates. The existing functionality in my code that I would like
to replace / extend inherits from matplotlib.patches.Polygon. Since
the rest of the code base makes extensive use of
matplotlib.patches.Polygon attributes and methods, I would like to
continue to inherit from that class.
Problem
My current implementation (code below) seems to come close. However,
the patch created by simple_test seems to be subtly thicker towards
the centre than it is at the start and end point, and I have no
explanation why that may be the case.
I suspect that the problem lies in the computation of the orthogonal vector.
As supporting evidence, I would like to point to the start and end points of the patch in the figure created by complicated_test, which do not seem exactly orthogonal to the path. However, the dot product of the orthonormal vector and the tangent vector is always zero, so I am not sure what is going on here.
Output of simple_test:
Output of complicated_test:
Code
#!/usr/bin/env python
import numpy as np
import matplotlib.patches
import matplotlib.pyplot as plt
class CurvedPatch(matplotlib.patches.Polygon):
def __init__(self, path, width, *args, **kwargs):
vertices = self.get_vertices(path, width)
matplotlib.patches.Polygon.__init__(self, list(map(tuple, vertices)),
closed=True,
*args, **kwargs)
def get_vertices(self, path, width):
left = _get_parallel_path(path, -width/2)
right = _get_parallel_path(path, width/2)
full = np.concatenate([left, right[::-1]])
return full
def _get_parallel_path(path, delta):
# initialise output
offset = np.zeros_like(path)
# use the previous and the following point to
# determine the tangent at each point in the path;
for ii in range(1, len(path)-1):
offset[ii] += _get_shift(path[ii-1], path[ii+1], delta)
# handle start and end points
offset[0] = _get_shift(path[0], path[1], delta)
offset[-1] = _get_shift(path[-2], path[-1], delta)
return path + offset
def _get_shift(p1, p2, delta):
# unpack coordinates
x1, y1 = p1
x2, y2 = p2
# get orthogonal unit vector;
# adapted from https://stackoverflow.com/a/16890776/2912349
v = np.r_[x2-x1, y2-y1] # vector between points
v = v / np.linalg.norm(v) # unit vector
w = np.r_[-v[1], v[0]] # orthogonal vector
w = w / np.linalg.norm(w) # orthogonal unit vector
# check that vectors are indeed orthogonal
assert np.isclose(np.dot(v, w), 0.)
# rescale unit vector
dx, dy = delta * w
return dx, dy
def simple_test():
x = np.linspace(-1, 1, 1000)
y = np.sqrt(1. - x**2)
path = np.c_[x, y]
curve = CurvedPatch(path, 0.1, facecolor='red', alpha=0.5)
fig, ax = plt.subplots(1,1)
ax.add_artist(curve)
ax.plot(x, y) # plot path for reference
plt.show()
def complicated_test():
random_points = np.random.rand(10, 2)
# Adapted from https://stackoverflow.com/a/35007804/2912349
import scipy.interpolate as si
def scipy_bspline(cv, n=100, degree=3, periodic=False):
""" Calculate n samples on a bspline
cv : Array of control vertices
n : Number of samples to return
degree: Curve degree
periodic: True - Curve is closed
"""
cv = np.asarray(cv)
count = cv.shape[0]
# Closed curve
if periodic:
kv = np.arange(-degree,count+degree+1)
factor, fraction = divmod(count+degree+1, count)
cv = np.roll(np.concatenate((cv,) * factor + (cv[:fraction],)),-1,axis=0)
degree = np.clip(degree,1,degree)
# Opened curve
else:
degree = np.clip(degree,1,count-1)
kv = np.clip(np.arange(count+degree+1)-degree,0,count-degree)
# Return samples
max_param = count - (degree * (1-periodic))
spl = si.BSpline(kv, cv, degree)
return spl(np.linspace(0,max_param,n))
x, y = scipy_bspline(random_points, n=1000).T
path = np.c_[x, y]
curve = CurvedPatch(path, 0.1, facecolor='red', alpha=0.5)
fig, ax = plt.subplots(1,1)
ax.add_artist(curve)
ax.plot(x, y) # plot path for reference
plt.show()
if __name__ == '__main__':
plt.ion()
simple_test()
complicated_test()

Trilinear Interpolation on Voxels at specific angle

I'm currently attempting to implement this algorithm for volume rendering in Python, and am conceptually confused about their method of generating the LH histogram (see section 3.1, page 4).
I have a 3D stack of DICOM images, and have calculated its gradient magnitude and the two corresponding azimuth and elevation angles (which I found out about here), as well as the second derivative.
Now, the algorithm is asking me to iterate through a set of voxels, and "track a path by integrating the gradient field in both directions...using the second order Runge-Kutta method with an integration step of one voxel".
What I don't understand is how to use the 2 angles I calculated to integrate the gradient field in said direction. I understand that you can use trilinear interpolation to get intermediate voxel values, but I don't understand how to get the voxel coordinates I want using the angles I have.
In other words, I start at a given voxel position and want to take a one-voxel step in the direction given by the two angles calculated for that voxel (one in the x-y plane, the other toward the z direction). How would I take this step at these two angles and retrieve the new (x, y, z) voxel coordinates?
Apologies in advance, as I have a very basic background in Calc II/III, so vector fields/visualization of 3D spaces is still a little rough for me.
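Not an answer to the whole pipeline, but here is a sketch of the geometry I think is needed, assuming the azimuth is measured in the x-y plane from the x axis, the elevation is measured from that plane toward z, and everything is done in voxel index space; map_coordinates with order=1 does the trilinear interpolation, and the azimuth/elevation arguments of rk2_step are assumed to be callables that return the (interpolated) angles at a fractional position:
import numpy as np
from scipy.ndimage import map_coordinates

def unit_step(azimuth_deg, elevation_deg):
    # unit vector for a one-voxel step given the two angles
    az = np.radians(azimuth_deg)
    el = np.radians(elevation_deg)
    return np.array([np.cos(el) * np.cos(az),
                     np.cos(el) * np.sin(az),
                     np.sin(el)])

def sample(volume, pos):
    # trilinear interpolation of `volume` at a fractional position;
    # the coordinate order must match the array's axis order
    return map_coordinates(volume, pos.reshape(3, 1), order=1)[0]

def rk2_step(pos, azimuth, elevation, sign=1.0):
    # one second-order Runge-Kutta (midpoint) step of length one voxel,
    # in the +/- gradient direction depending on `sign`
    d1 = sign * unit_step(azimuth(pos), elevation(pos))
    mid = pos + 0.5 * d1
    d2 = sign * unit_step(azimuth(mid), elevation(mid))
    return pos + d2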
Creating 3D stack of DICOM images:
def collect_data(data_path):
print "collecting data"
files = [] # create an empty list
for dirName, subdirList, fileList in os.walk(data_path):
for filename in fileList:
if ".dcm" in filename:
files.append(os.path.join(dirName,filename))
# Get reference file
ref = dicom.read_file(files[0])
# Load dimensions based on the number of rows, columns, and slices (along the Z axis)
pixel_dims = (int(ref.Rows), int(ref.Columns), len(files))
# Load spacing values (in mm)
pixel_spacings = (float(ref.PixelSpacing[0]), float(ref.PixelSpacing[1]), float(ref.SliceThickness))
x = np.arange(0.0, (pixel_dims[0]+1)*pixel_spacings[0], pixel_spacings[0])
y = np.arange(0.0, (pixel_dims[1]+1)*pixel_spacings[1], pixel_spacings[1])
z = np.arange(0.0, (pixel_dims[2]+1)*pixel_spacings[2], pixel_spacings[2])
# Row and column directional cosines
orientation = ref.ImageOrientationPatient
# This will become the intensity values
dcm = np.zeros(pixel_dims, dtype=ref.pixel_array.dtype)
origins = []
# loop through all the DICOM files
for filename in files:
# read the file
ds = dicom.read_file(filename)
#get pixel spacing and origin information
origins.append(ds.ImagePositionPatient) #[0,0,0] coordinates in real 3D space (in mm)
# store the raw image data
dcm[:, :, files.index(filename)] = ds.pixel_array
return dcm, origins, pixel_spacings, orientation
Calculating gradient magnitude:
def calculate_gradient_magnitude(dcm):
print "calculating gradient magnitude"
gradient_magnitude = []
gradient_direction = []
gradx = np.zeros(dcm.shape)
sobel(dcm,0,gradx)
grady = np.zeros(dcm.shape)
sobel(dcm,1,grady)
gradz = np.zeros(dcm.shape)
sobel(dcm,2,gradz)
gradient = np.sqrt(gradx**2 + grady**2 + gradz**2)
azimuthal = np.arctan2(grady, gradx)
elevation = np.arctan2(gradz, np.hypot(gradx, grady))  # elevation above the x-y plane
azimuthal = np.degrees(azimuthal)
elevation = np.degrees(elevation)
return gradient, azimuthal, elevation
Converting to patient coordinate system to get actual voxel position:
def get_patient_position(dcm, origins, pixel_spacing, orientation):
"""
Image Space --> Anatomical (Patient) Space is an affine transformation
using the Image Orientation (Patient), Image Position (Patient), and
Pixel Spacing properties from the DICOM header
"""
print "getting patient coordinates"
world_coordinates = np.empty((dcm.shape[0], dcm.shape[1],dcm.shape[2], 3))
affine_matrix = np.zeros((4,4), dtype=np.float32)
rows = dcm.shape[0]
cols = dcm.shape[1]
num_slices = dcm.shape[2]
image_orientation_x = np.array([ orientation[0], orientation[1], orientation[2] ]).reshape(3,1)
image_orientation_y = np.array([ orientation[3], orientation[4], orientation[5] ]).reshape(3,1)
pixel_spacing_x = pixel_spacing[0]
# Construct affine matrix
# Method from:
# http://nipy.org/nibabel/dicom/dicom_orientation.html
T_1 = origins[0]
T_n = origins[num_slices-1]
affine_matrix[0,0] = image_orientation_y[0] * pixel_spacing[0]
affine_matrix[0,1] = image_orientation_x[0] * pixel_spacing[1]
affine_matrix[0,3] = T_1[0]
affine_matrix[1,0] = image_orientation_y[1] * pixel_spacing[0]
affine_matrix[1,1] = image_orientation_x[1] * pixel_spacing[1]
affine_matrix[1,3] = T_1[1]
affine_matrix[2,0] = image_orientation_y[2] * pixel_spacing[0]
affine_matrix[2,1] = image_orientation_x[2] * pixel_spacing[1]
affine_matrix[2,3] = T_1[2]
affine_matrix[3,3] = 1
k1 = (T_1[0] - T_n[0])/ (1 - num_slices)
k2 = (T_1[1] - T_n[1])/ (1 - num_slices)
k3 = (T_1[2] - T_n[2])/ (1 - num_slices)
affine_matrix[:3, 2] = np.array([k1,k2,k3])
for z in range(num_slices):
for r in range(rows):
for c in range(cols):
vector = np.array([r, c, 0, 1]).reshape((4,1))
result = np.matmul(affine_matrix, vector)
result = np.delete(result, 3, axis=0)
result = np.transpose(result)
world_coordinates[r,c,z] = result
# print "Finished slice ", str(z)
# np.save('./data/saved/world_coordinates_3d.npy', str(world_coordinates))
return world_coordinates
Now I'm at the point where I want to write this function:
def create_lh_histogram(patient_positions, dcm, magnitude, azimuthal, elevation):
print "constructing LH histogram"
# Get 2nd derivative
second_derivative = gaussian_filter(magnitude, sigma=1, order=1)
# Determine if voxels lie on boundary or not (thresholding)
# Still have to code out: let's say the thresholded voxels are in
# a numpy array called voxels
#Iterate through all thresholded voxels and integrate gradient field in
# both directions using 2nd-order Runge-Kutta
vox_it = np.nditer(voxels, flags=['multi_index'])
while not vox_it.finished:
# ???
