Nx3 column data to 2d matrix for image processing - python

I am trying to find local maxima and contours in Nx3 data in the format ('x', 'y', 'value') that I read from a text file; 'x' and 'y' form an evenly spaced grid and there is a single value for every ('x', 'y') combination. It looks like this:
3.0, -0.4, 56.94369888305664
3.0, -0.3, 56.97200012207031
3.0, -0.2, 56.77149963378906
3.0, -0.1, 56.41230010986328
3.0, 0, 55.8302001953125
3.0, 0.1, 55.81560134887695
3.0, 0.2, 55.600399017333984
3.0, 0.3, 55.51969909667969
3.0, 0.4, 55.18550109863281
3.2, -0.4, 56.26380157470703
3.2, -0.3, 56.228599548339844
...
The problem is that the image code I am trying to use (link) requires the data to be in a different 2d matrix format for image processing. This is the relevant part of the code:
import numpy as np
from skimage import measure

# Construct some test data
x, y = np.ogrid[-np.pi:np.pi:100j, -np.pi:np.pi:100j]
r = np.sin(np.exp((np.sin(x)**3 + np.cos(y)**2)))
# Find contours at a constant value of 0.8
contours = measure.find_contours(r, 0.8)
Can somebody help me transform my data to the required 'gridded' format?
EDIT: I finally went with pandas, but I find the accepted answer better in the general case. This is what I did:
from pandas import read_csv

data = read_csv(filename, names=['x', 'y', 'values']).pivot(index='x', columns='y', values='values')
After this, data.values holds the table in the 2d 'image form' I wanted:
y       -0.4     -0.3     -0.2     -0.1
x
3.0  86.9423  87.6398  87.5256  89.5779
3.2  76.9414  77.7743  78.8633  76.8955
3.4  71.4146  72.8257  71.7210  71.5232
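With the table in that 2d form, the contour call from the example code can run directly on it; a minimal sketch (the 56.0 level is just an assumption, picked inside the sample data's range):
from skimage import measure

# Find contours of the pivoted grid; choose a level inside your data's range
contours = measure.find_contours(data.values, 56.0)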

The best solution really depends on details you're not giving. By the way, you should really include your code, or at least the np.loadtxt instruction.
In the following, "data" is the array loaded from the file using:
data = np.loadtxt('file.txt', dtype=[('x', float), ('y', float), ('value', float)], delimiter=',')
1) Direct reshape:
Following on what @tom10 said:
If you know that your (x, y, value) data is stored in the specific order
[(x0,y0,v00), (x0,y1,v01), ..., (x1,y0,v10), (x1,y1,v11), ..., (xN,yM,vNM)]
and that values for all (x, y) pairs are given, then the best approach is to make a 1D numpy array from your list of values and reshape it:
x = np.unique(data['x'])
y = np.unique(data['y'])
r = data['value'].reshape((x.size,y.size))
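If the rows might not already be in that row-major order, a small sketch (assuming the grid is still complete) that sorts them first:
# Sort rows by x, then y, so the flat 'value' column matches
# the row-major layout that reshape expects
order = np.lexsort((data['y'], data['x']))
r = data['value'][order].reshape((x.size, y.size))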
2) General cases:
See Populate arrays in python (numpy)? for a similar question and another solution using dictionaries.
If you cannot guarantee anything other than having (x, y, value) tuples:
# indexing: list of x and y coordinates, and functions that map them to index
x = np.unique(data['x']).tolist()
y = np.unique(data['y']).tolist()
ix = np.vectorize(lambda i: x.index(i), otypes='i')
iy = np.vectorize(lambda j: y.index(j), otypes='i')
# create output array
r = np.zeros((len(x), len(y)), float) # default value is 0; x and y are lists here, so len, not .size
r[ix(data['x']), iy(data['y'])] = data['value']
Note: in the reference given above, another approach using dictionaries is given. I think it is more readable, but I did not test their relative speed.
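For concreteness, a sketch of that dictionary-based indexing (a guess at the referenced approach, untested for speed):
# Map each unique coordinate to its row/column index with dict lookups
ix = {v: k for k, v in enumerate(np.unique(data['x']))}
iy = {v: k for k, v in enumerate(np.unique(data['y']))}
r = np.zeros((len(ix), len(iy)), float)
for xi, yi, v in data:
    r[ix[xi], iy[yi]] = v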
3) Intermediate cases?
You might have an intermediate case, between regular grid coordinates given in a specific order and no constraint at all. The general case can be very slow, so you should design your algorithm to take advantage of any rule your data follows.
One example is if you know that the x-y indexing follows a specific rule, but the rows are not necessarily given in order. For instance, if you know that the x and y are equally spaced "grid" coordinates of the form:
coordinate = min_coordinate + i*step
then find min_coordinate and step (for both x and y), and find i by solving this equation. This way, you avoid the costly index mapping np.vectorize(... list.index(...)):
x = np.unique(data['x'])
y = np.unique(data['y'])
ix = (data['x']-x.min())/(x[1]-x[0])
iy = (data['y']-y.min())/(y[1]-y[0])
# create output array
r = np.ones((x.size,y.size), float)*np.nan # default value is NaN
r[ix.astype(int), iy.astype(int)] = data['value']
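One caveat worth adding: floating-point division can produce indices like 0.9999999, which astype(int) truncates to 0, so rounding before the cast is safer:
# Round to the nearest integer before casting, to guard against float error
r[np.rint(ix).astype(int), np.rint(iy).astype(int)] = data['value']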

For the program you're using, you just need the data to be a rectangular array of z values (in the example they give, they use x and y only to construct z, and then never use them again). It looks like you have an array that's 9 by N (where N is something you don't show). One easy way to get this is to read the data in as a flat collection of z values, skipping the x, y values, then reshape it to the shape you'd like. (I can't really write the code for this because you haven't given enough info, but it shouldn't be difficult.)
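A minimal sketch of that idea (the comma delimiter and the 9 y-values per x are assumptions read off the sample data above):
import numpy as np

# Read only the third column (the z values) and reshape into rows of 9
z = np.loadtxt('file.txt', delimiter=',', usecols=2)
r = z.reshape(-1, 9)  # 9 y-values per x block, per the sample shown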

Related

Pandas: Create a binary column randomly but with specific proportions

I am trying to create a new random binary column in my table, and it needs to have 60% of values as 1 and 40% of values as 0. I have tried to use the np.random.choice function from the numpy package like the following; however, the proportions change every time I run my code.
np.random.choice(a = [0,1], size = len(df), p = [0.4, 0.6])
I need to have these proportions fixed. Can anyone help how it can be done? Thank you!
This is how you create a numpy array of size 100 with the distribution of 1s and 0s that you wanted, and store it in the variable m:
import numpy as np
m = np.random.choice(a = [0,1], size = 100, p = [0.4, 0.6])
I don't know anything about your pandas data frame, because you didn't post your source code here. Therefore I can't tell you why len(df) is different each time.
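If the proportions must be exact rather than just expected values (an assumption about what "fixed" means here), a common sketch is to build the array with exact counts and then shuffle it:
import numpy as np

n = 100  # stand-in for len(df)
m = np.zeros(n, dtype=int)
m[:int(round(0.6 * n))] = 1  # exactly 60% ones
np.random.shuffle(m)         # randomize their positions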

How to create a two-column matrix in rpy2

I'm using rpy2 to run a method from an R library. According to the documentation:
method_name(x, range.x)
x: a two-column numeric matrix.
range.x: a list containing two vectors.
And it includes an example:
data(geyser, package="MASS")
x <- cbind(geyser$duration, geyser$waiting)
est <- method_name(x, range.x)
I checked the type of geyser$duration and geyser$waiting and both are double. I also tried replacing geyser$duration and geyser$waiting by g = c(.016, 2.15, 4.00) and h = c(.012, 2.11, 2.50) in R, and the code still works.
In my current Python code, I have:
import numpy as np
import rpy2.robjects as robjects
import rpy2.robjects.packages as rpackages
from rpy2.robjects.vectors import StrVector, FloatVector, ListVector # I tried these before too
from rpy2.robjects import numpy2ri, pandas2ri
numpy2ri.activate()
pandas2ri.activate()
base = rpackages.importr('base')
a = np.array([1.2, 2.1, 2.5]); b = np.array([5.2, 1.3, 2.15])
x = base.cbind(base.c(a), base.c(b))
ranges = base.range(x)
result = method_name(x, ranges)
As you can see, I'm trying to make my code as similar to the example as possible. However, I can't make the method work. I get the error Error in seq.default(a[2L], b[2L], length = M[2L]), which probably indicates a problem with the arguments.
There's an obvious problem with ranges, because it contains just two values, the minimum and maximum of x; however, it should contain two minimum values and two maximum values (one pair for each column of the matrix). I can achieve that by doing this:
ranges = base.cbind(base.range(a), base.range(b))
But this implies that there's a problem with the way I'm creating the matrix. Otherwise, I would get two pairs of values just by using base.range(x).
I also tried x = robjects.r.matrix(x, ncol = 2), but it didn't work. I still get just a global minimum and maximum value for the whole matrix when calling range.
What is the correct way of creating this matrix so that the method can run?
According to the documentation of the range function, it accepts vectors (one-dimensional arrays) as input. Thus, it works when applied to each column of your matrix, or when applied directly to the a and b elements as you mentioned. So your second approach should work:
# Define a,b vectors
a = np.array([1.2, 2.1, 2.5]); b = np.array([5.2, 1.3, 2.15])
# Calculate vector ranges
range_a = base.range(base.c(a))
range_b = base.range(base.c(b))
# Define the matrix
x = base.cbind(base.c(a), base.c(b))
print(x)
>>> [[1.2 5.2 ]
[2.1 1.3 ]
[2.5 2.15]]
# Define the ranges
ranges = base.cbind(range_a, range_b)
print(ranges)
>>> [[1.2 1.3]
[2.5 5.2]]
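Alternatively, a sketch (an untested assumption on my part, given that x converts to an R matrix as above) that computes both column ranges in one call using R's apply:
# Hypothetical: apply range over the columns (MARGIN = 2) of the matrix
ranges = robjects.r['apply'](x, 2, robjects.r['range'])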

Incorrect results for simple 2D transformation

I'm attempting a 2D transformation using the nudged package.
The code is really simple:
import nudged
# Domain data
x_d = [2538.87, 1294.42, 3002.49, 2591.56, 2881.37, 891.906, 1041.24, 2740.13, 1928.55, 3335.12, 3771.76, 1655.0, 696.772, 583.242, 2313.95, 2422.2]
y_d = [2501.89, 4072.37, 2732.65, 2897.21, 808.969, 1760.97, 992.531, 1647.57, 2407.18, 2868.68, 724.832, 1938.11, 1487.66, 1219.14, 672.898, 145.059]
# Range data
x_r = [3.86551776277075, 3.69693290266126, 3.929110096606081, 3.8731112887391532, 3.9115924127798536, 3.6388068074815862, 3.6590261077461577, 3.892482104449016, 3.781816183438835, 3.97464058821231, 4.033173444601999, 3.743901522907265, 3.6117470568340906, 3.5959585708147728, 3.8338853650390945, 3.8487836817639334]
y_r = [1.6816478101135388, 1.8732008327428353, 1.7089144628920678, 1.729386055302033, 1.4767657611559102, 1.5933812675900505, 1.5003232598807479, 1.5781629182153942, 1.670867507106891, 1.7248363641300841, 1.4654588884234485, 1.6143557610354264, 1.5603626129237362, 1.5278835570641824, 1.4609066190929916, 1.397111300807424]
# Random domain data
x, y = np.random.uniform(0., 4000., (2, 1000))
# Define domain and range points
dom, ran = (x_d, y_d), (x_r, y_r)
# Obtain transformation dom --> ran
trans = nudged.estimate(dom, ran)
# Apply the transformation to the (x, y) points
x_t, y_t = trans.transform((x, y))
where (x_d, y_d) and (x_r, y_r) are the 1 to 1 correlated "domain" and "range" points, and (x, y) are all the points in the (x_d, y_d) (domain) system that I want to transform to the (x_r, y_r) (range) system.
This is the result I get, where:
trans.get_matrix()
[[-0.0006459232439068067, -0.0007947429558548157, 6.534164085946009], [0.0007947429558548157, -0.0006459232439068067, 2.515279819707991], [0, 0, 1]]
trans.get_rotation()
2.2532603497070713
trans.get_scale()
0.0010241255796531702
trans.get_translation()
[6.534164085946009, 2.515279819707991]
This is the final plot of the transformed dom values with the original ran points overlaid:
This is clearly not right and I can't figure out what I'm doing wrong.
I was able to figure out your issue. It is simply that nudged has somewhat problematic notation, which is poorly documented.
The estimate function accepts a list of coordinate pairs. You effectively have to transpose dom and ran to get this to work. I suggest either switching to numpy arrays, or using list(map(list, zip(...))) to do the transpose.
The Transform.transform method is extremely restrictive, and requires that the inner pairs be of type list. Not tuple, not any other sequence, but specifically list. Your call trans.transform((x, y)) only happened to work by pure luck: transform saw that the first element is not a list and attempted to transform (x, y) as a single pair of numbers. Luckily for you, numpy operators are vectorized, so an entire array can be processed as a single unit.
Here is a working version of your code that generates the correct plots, using mostly plain Python:
import numpy as np
import matplotlib.pyplot as plt
from nudged import estimate

# Domain data
x_d = [2538.87, 1294.42, 3002.49, 2591.56, 2881.37, 891.906, 1041.24, 2740.13, 1928.55, 3335.12, 3771.76, 1655.0, 696.772, 583.242, 2313.95, 2422.2]
y_d = [2501.89, 4072.37, 2732.65, 2897.21, 808.969, 1760.97, 992.531, 1647.57, 2407.18, 2868.68, 724.832, 1938.11, 1487.66, 1219.14, 672.898, 145.059]
# Range data
x_r = [3.86551776277075, 3.69693290266126, 3.929110096606081, 3.8731112887391532, 3.9115924127798536, 3.6388068074815862, 3.6590261077461577, 3.892482104449016, 3.781816183438835, 3.97464058821231, 4.033173444601999, 3.743901522907265, 3.6117470568340906, 3.5959585708147728, 3.8338853650390945, 3.8487836817639334]
y_r = [1.6816478101135388, 1.8732008327428353, 1.7089144628920678, 1.729386055302033, 1.4767657611559102, 1.5933812675900505, 1.5003232598807479, 1.5781629182153942, 1.670867507106891, 1.7248363641300841, 1.4654588884234485, 1.6143557610354264, 1.5603626129237362, 1.5278835570641824, 1.4609066190929916, 1.397111300807424]
# Random domain data
uni = np.random.uniform(0., 4000., (2, 1000))
# Define domain and range points
dom = list(map(list, zip(x_d, y_d)))
ran = list(map(list, zip(x_r, y_r)))
# Obtain transformation dom --> ran
trans = estimate(dom, ran)
# Apply the transformation to the (x, y) points
tra = trans.transform(uni)
fig, ax = plt.subplots(2, 2)
ax[0][0].scatter(x_d, y_d)
ax[0][0].set_title('dom')
ax[0][1].scatter(x_r, y_r)
ax[0][1].set_title('ran')
ax[1][0].scatter(*uni)
ax[1][1].scatter(*tra)
I left in your hack with uni, since I did not feel like converting the array of random values to a nested list. The resulting plot looks like this:
My overall recommendation is to submit a number of bug reports to the nudged library based on these findings.
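For completeness, a sketch of the non-hack version (just one way to satisfy transform's strict list-of-lists requirement described above):
# Hypothetical: convert the (2, 1000) array into 1000 [x, y] list pairs
uni_pairs = uni.T.tolist()
tra_pairs = trans.transform(uni_pairs)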

Confused by output from Polyval

I have some data from which I have fitted a 2nd order polynomial using numpy.polynomial.polynomial.polyfit
data_fit = poly.polyfit(length_spline_a, temp_spline_b, 2)
I am examining changes in length, and have a list of 10% changes in length
len_steps = 0.0, -0.012573565669572757, -0.025147131339145513, -0.03772069700871827...
print(len(len_steps))
>>>>11
My assumption was that polyval would solve for y for each of the x values in the len_steps list
y_data = poly.polyval(data_fit, len_steps)
However this provides a list with only 3 data points rather than the 11 I expected.
print(y_data)
>>>>[-5.34112443e+21 -2.50395581e+28 -6.75169134e+28]
Have I misunderstood the purpose of polyval, or have I done something wrong?
It works if I reverse the argument order to y_data = poly.polyval(len_steps, data_fit), which is actually clear now that I re-read the documentation. I think I was using the np.polyval syntax, which expects the coefficients first.
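For anyone hitting the same mix-up, a minimal sketch (with made-up toy data) contrasting the two signatures:
import numpy as np
from numpy.polynomial import polynomial as poly

coefs = poly.polyfit([0., 1., 2.], [1., 2., 5.], 2)  # returns lowest-degree coefficient first
y_new = poly.polyval([0.5, 1.5], coefs)              # numpy.polynomial: x first, coefficients second
y_old = np.polyval(coefs[::-1], [0.5, 1.5])          # np.polyval: coefficients first, highest degree first
# y_new and y_old agree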

How to interpolate a line between two other lines in python

Note: I asked this question before, but it was closed as a duplicate. However, I, along with several others, believe it was unduly closed; I explain why in an edit to my original post. So I would like to re-ask this question here.
Does anyone know of a python library that can interpolate between two lines? For example, given the two solid lines below, I would like to produce the dashed line in the middle; in other words, I'd like to get the centreline. The input is just two numpy arrays of coordinates, of size N x 2 and M x 2 respectively.
Furthermore, I'd like to know if someone has written a function for this in some optimized python library, although optimization isn't strictly necessary.
Here is an example of two lines that I might have. You can assume they do not overlap with each other, and that an x/y can have multiple y/x coordinates.
array([[ 1233.87375018, 1230.07095987],
[ 1237.63559365, 1253.90749041],
[ 1240.87500801, 1264.43925132],
[ 1245.30875975, 1274.63795396],
[ 1256.1449357 , 1294.48254424],
[ 1264.33600095, 1304.47893299],
[ 1273.38192911, 1313.71468591],
[ 1283.12411536, 1322.35942538],
[ 1293.2559388 , 1330.55873344],
[ 1309.4817002 , 1342.53074698],
[ 1325.7074616 , 1354.50276051],
[ 1341.93322301, 1366.47477405],
[ 1358.15898441, 1378.44678759],
[ 1394.38474581, 1390.41880113]])
array([[ 1152.27115094, 1281.52899302],
[ 1155.53345506, 1295.30515742],
[ 1163.56506781, 1318.41642169],
[ 1168.03497425, 1330.03181319],
[ 1173.26135672, 1341.30559949],
[ 1184.07110925, 1356.54121651],
[ 1194.88086178, 1371.77683353],
[ 1202.58908737, 1381.41765447],
[ 1210.72465255, 1390.65097106],
[ 1227.81309742, 1403.2904646 ],
[ 1244.90154229, 1415.92995815],
[ 1261.98998716, 1428.56945169],
[ 1275.89219696, 1438.21626352],
[ 1289.79440676, 1447.86307535],
[ 1303.69661656, 1457.50988719],
[ 1323.80994319, 1470.41028655],
[ 1343.92326983, 1488.31068591],
[ 1354.31738934, 1499.33260989],
[ 1374.48879779, 1516.93734053],
[ 1394.66020624, 1534.54207116]])
Visualizing this we have:
So my attempt at this has been to use the skeletonize function in the skimage.morphology library, after first rasterizing the coordinates into a filled-in polygon. However, I get branching at the ends, like this:
First of all, pardon the overkill; I had fun with your question. If the description is too long, feel free to skip to the bottom, where I define a function that does everything I describe.
Your problem would be relatively straightforward if your arrays were the same length. In that case, all you would have to do is find the average between the corresponding x values in each array, and the corresponding y values in each array.
So what we can do is create arrays of the same length, that are more or less good estimates of your original arrays. We can do this by fitting a polynomial to the arrays you have. As noted in comments and other answers, the midline of your original arrays is not specifically defined, so a good estimate should fulfill your needs.
Note: In all of these examples, I've gone ahead and named the two arrays that you posted a1 and a2.
Step one: Create new arrays that estimate your old lines
Looking at the data you posted:
These aren't particularly complicated functions, it looks like a 3rd degree polynomial would fit them pretty well. We can create those using numpy:
import numpy as np
# Find the range of x values in a1
min_a1_x, max_a1_x = min(a1[:,0]), max(a1[:,0])
# Create an evenly spaced array that ranges from the minimum to the maximum
# I used 100 elements, but you can use more or fewer.
# This will be used as your new x coordinates
new_a1_x = np.linspace(min_a1_x, max_a1_x, 100)
# Fit a 3rd degree polynomial to your data
a1_coefs = np.polyfit(a1[:,0],a1[:,1], 3)
# Get your new y coordinates from the coefficients of the above polynomial
new_a1_y = np.polyval(a1_coefs, new_a1_x)
# Repeat for array 2:
min_a2_x, max_a2_x = min(a2[:,0]), max(a2[:,0])
new_a2_x = np.linspace(min_a2_x, max_a2_x, 100)
a2_coefs = np.polyfit(a2[:,0],a2[:,1], 3)
new_a2_y = np.polyval(a2_coefs, new_a2_x)
The result:
That's not so bad! If you have more complicated functions, you'll have to fit a higher degree polynomial, or find some other adequate function to fit to your data.
Now, you've got two sets of arrays of the same length (I chose a length of 100, you can do more or less depending on how smooth you want your midpoint line to be). These sets represent the x and y coordinates of the estimates of your original arrays. In the example above, I named these new_a1_x, new_a1_y, new_a2_x and new_a2_y.
Step two: calculate the average between each x and each y in your new arrays
Then, we want to find the average x and average y value for each of our estimate arrays. Just use np.mean:
midx = [np.mean([new_a1_x[i], new_a2_x[i]]) for i in range(100)]
midy = [np.mean([new_a1_y[i], new_a2_y[i]]) for i in range(100)]
midx and midy now represent the midpoint between our 2 estimate arrays. Now, just plot your original (not estimate) arrays, alongside your midpoint array:
plt.plot(a1[:,0], a1[:,1],c='black')
plt.plot(a2[:,0], a2[:,1],c='black')
plt.plot(midx, midy, '--', c='black')
plt.show()
And voilĂ :
This method still works with more complex, noisy data (but you have to fit the function thoughtfully):
As a function:
I've put the above code in a function, so you can use it easily. It returns an array of your estimated midpoints, in the format you had your original arrays in.
The arguments: a1 and a2 are your 2 input arrays, poly_deg is the degree polynomial you want to fit, n_points is the number of points you want in your midpoint array, and plot is a boolean, whether you want to plot it or not.
import matplotlib.pyplot as plt
import numpy as np

def interpolate(a1, a2, poly_deg=3, n_points=100, plot=True):
    min_a1_x, max_a1_x = min(a1[:,0]), max(a1[:,0])
    new_a1_x = np.linspace(min_a1_x, max_a1_x, n_points)
    a1_coefs = np.polyfit(a1[:,0], a1[:,1], poly_deg)
    new_a1_y = np.polyval(a1_coefs, new_a1_x)
    min_a2_x, max_a2_x = min(a2[:,0]), max(a2[:,0])
    new_a2_x = np.linspace(min_a2_x, max_a2_x, n_points)
    a2_coefs = np.polyfit(a2[:,0], a2[:,1], poly_deg)
    new_a2_y = np.polyval(a2_coefs, new_a2_x)
    midx = [np.mean([new_a1_x[i], new_a2_x[i]]) for i in range(n_points)]
    midy = [np.mean([new_a1_y[i], new_a2_y[i]]) for i in range(n_points)]
    if plot:
        plt.plot(a1[:,0], a1[:,1], c='black')
        plt.plot(a2[:,0], a2[:,1], c='black')
        plt.plot(midx, midy, '--', c='black')
        plt.show()
    return np.array([[x, y] for x, y in zip(midx, midy)])
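A quick usage sketch, with the two arrays from the question named a1 and a2 as above:
midline = interpolate(a1, a2, poly_deg=3, n_points=100, plot=True)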
[EDIT]:
I was thinking back on this question, and I overlooked a simpler way to do this, by "densifying" both arrays to the same number of points using np.interp. This method follows the same basic idea as the line-fitting method above, but instead of approximating lines using polyfit / polyval, it just densifies:
min_a1_x, max_a1_x = min(a1[:,0]), max(a1[:,0])
min_a2_x, max_a2_x = min(a2[:,0]), max(a2[:,0])
new_a1_x = np.linspace(min_a1_x, max_a1_x, 100)
new_a2_x = np.linspace(min_a2_x, max_a2_x, 100)
new_a1_y = np.interp(new_a1_x, a1[:,0], a1[:,1])
new_a2_y = np.interp(new_a2_x, a2[:,0], a2[:,1])
midx = [np.mean([new_a1_x[i], new_a2_x[i]]) for i in range(100)]
midy = [np.mean([new_a1_y[i], new_a2_y[i]]) for i in range(100)]
plt.plot(a1[:,0], a1[:,1],c='black')
plt.plot(a2[:,0], a2[:,1],c='black')
plt.plot(midx, midy, '--', c='black')
plt.show()
The "line between two lines" is not so well defined. You can obtain a decent though simple solution by triangulating between the two curves (you can triangulate by progressing from vertex to vertex, choosing the diagonals that produce the less skewed triangle).
Then the interpolated curve joins the middles of the sides.
I work with rivers, so this is a common problem. One of my solutions is exactly like the one you showed in your question--i.e. skeletonize the blob. You see that the boundaries have problems, so what I've done that seems to work well is to simply mirror the boundaries. For this approach to work, the blob must not intersect the corners of the image.
You can find my implementation in RivGraph; this particular algorithm is in rivers/river_utils.py called "mask_to_centerline".
Here's an example output showing how the ends of the centerline extend to the desired edge of the object:
sacuL's solution almost worked for me, but I needed to aggregate more than just two curves.
Here is my generalization of sacuL's solution:
import matplotlib.pyplot as plt
import numpy as np

def interp(*axis_list):
    min_max_xs = [(min(axis[:,0]), max(axis[:,0])) for axis in axis_list]
    new_axis_xs = [np.linspace(min_x, max_x, 100) for min_x, max_x in min_max_xs]
    new_axis_ys = [np.interp(new_x_axis, axis[:,0], axis[:,1]) for axis, new_x_axis in zip(axis_list, new_axis_xs)]
    midx = [np.mean([new_axis_xs[axis_idx][i] for axis_idx in range(len(axis_list))]) for i in range(100)]
    midy = [np.mean([new_axis_ys[axis_idx][i] for axis_idx in range(len(axis_list))]) for i in range(100)]
    for axis in axis_list:
        plt.plot(axis[:,0], axis[:,1], c='black')
    plt.plot(midx, midy, '--', c='black')
    plt.show()
If we now run an example:
a1 = np.array([[x, x**2+5*(x%4)] for x in range(10)])
a2 = np.array([[x-0.5, x**2+6*(x%3)] for x in range(10)])
a3 = np.array([[x+0.2, x**2+7*(x%2)] for x in range(10)])
interp(a1, a2, a3)
we get the plot:
