I am trying to calculate marketsheds using the find_costs method of skimage.graph.MCP_Geometric. It has been working wonderfully to calculate least-cost routes, but rather than finding the travel cost to the nearest source, I want to calculate the index of the nearest source.
Sample Code
import numpy as np
import skimage.graph as graph
import copy
img = np.array([[1,1,2,2],[2,1,1,3],[3,2,1,2],[2,2,2,1]])
mcp = graph.MCP_Geometric(img)
destinations = [[0,0],[3,3]]
costs, traceback = mcp.find_costs(destinations)
print(costs)
[[0. 1. 2.5 4.5 ]
[1.5 1.41421356 2.41421356 4. ]
[4. 2.91421356 1.41421356 1.5 ]
[5.5 3.5 1.5 0. ]]
This works as expected, and creates a nice travel cost raster. However, I want (for each cell) to know which of the destinations is the closest. The best solution I have found is to run each of the destinations separately, then combine them through min calculations. It works, but is slow, and has not been working at scale.
all_c = []
for dest in destinations:
    costs, traceback = mcp.find_costs([dest])
    all_c.append(copy.deepcopy(costs))
res = np.dstack(all_c)
res_min = np.amin(res, axis=2)
output = np.zeros([res_min.shape[0], res_min.shape[1]])
for idx in range(0, res.shape[2]):
    cur_data = res[:,:,idx]
    cur_val = (cur_data == res_min).astype(np.byte) * idx
    output = output + cur_val
output = output.astype(np.byte)
print(output)
array([[0, 0, 0, 0],
[0, 0, 1, 1],
[0, 1, 1, 1],
[1, 1, 1, 1]], dtype=int8)
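The combining step can probably be shortened with np.argmin over the stacked cost rasters. This is a minimal sketch that uses two small hand-written arrays (hypothetical stand-ins for the entries of all_c) so it runs without skimage. Unlike the loop above, which adds the indices of tied destinations together, argmin returns the first destination on a tie.

```python
import numpy as np

# Hypothetical per-destination cost rasters, standing in for the arrays
# collected in all_c by the loop in the question.
cost_to_dest0 = np.array([[0.0, 1.0], [1.5, 3.0]])
cost_to_dest1 = np.array([[3.0, 1.0], [1.5, 0.0]])

# Stack along a new last axis: shape (rows, cols, n_destinations).
res = np.stack([cost_to_dest0, cost_to_dest1], axis=-1)

# For each cell, the index of the destination with the smallest cost.
# Ties resolve to the lowest destination index.
nearest = np.argmin(res, axis=-1)
print(nearest)
```

This still needs one find_costs call per destination, so it only tidies the combination step; it does not address the scaling problem.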
I have been looking into overloading the functions of MCP_Geometric and MCP_Flexible, but I cannot find anything providing information on the index of the destination.
Hope that provides enough information to replicate and understand what I want to do, thanks!
Ok, this is a bit of a ride, but it was fun to figure out. I'm unclear just how fast it'll be but I think it should be pretty fast in the case of many destinations and comfortably-in-RAM images.
The key is the traceback return value, which kinda-sorta tells you the neighbor index to get to the nearest destination. So with a bit of pathfinding you should be able to find that destination. Can that be fast? It turns out it can, with a bit of NumPy index wrangling, scipy.sparse matrices, and connected_components from scipy.sparse.csgraph!
Let's start with your same costs array and both destinations:
import numpy as np
image = np.array(
[[1, 1, 2, 2],
[2, 1, 1, 3],
[3, 2, 1, 2],
[2, 2, 2, 1]]
)
destinations = [[0, 0], [3, 3]]
We then make the graph, and get the costs and the traceback:
from skimage import graph
mcp = graph.MCP_Geometric(image)
costs, traceback = mcp.find_costs(destinations)
print(traceback)
gives:
[[-1 4 4 4]
[ 6 7 7 1]
[ 6 6 0 1]
[ 3 3 3 -1]]
Now, I had to look up the documentation for what traceback is:
Same shape as the costs array; this array contains the offset to
any given index from its predecessor index. The offset indices
index into the offsets attribute, which is a array of n-d
offsets. In the 2-d case, if offsets[traceback[x, y]] is (-1, -1),
that means that the predecessor of [x, y] in the minimum cost path
to some start position is [x+1, y+1]. Note that if the
offset_index is -1, then the given index was not considered.
For some reason, my mcp object didn't have an offsets attribute — possibly a Cython inheritance bug? Dunno, will investigate later — but searching the source code shows me that offsets is defined with the skimage.graph._mcp.make_offsets function. So I did a bad thing and imported from that private module, so I could claim what was rightfully mine — the offsets list, which translates from numbers in traceback to offsets in the image coordinates:
from skimage.graph import _mcp
offsets = _mcp.make_offsets(2, True)
print(offsets)
which gives:
[array([-1, -1]),
array([-1, 0]),
array([-1, 1]),
array([ 0, -1]),
array([0, 1]),
array([ 1, -1]),
array([1, 0]),
array([1, 1])]
Now, there's one last thing to do with the offsets: you'll note that destinations are marked in the traceback with "-1", which doesn't correspond to the last element of the offsets array. So we append np.array([0, 0]), and then every value in traceback corresponds to a real offset. In the case of destinations, you get a self-edge, but that's fine.
offsets.append(np.array([0, 0]))
offsets_arr = np.array(offsets) # shape (9, 2)
Now, we can build a graph from offsets, pixel coordinates, and pixel ids. First, we use np.indices to get an index for every pixel in the image:
indices = np.indices(traceback.shape)
print(indices.shape)
gives:
(2, 4, 4)
To get an array that has, for each pixel, the offset to its neighbor, we use fancy array indexing:
offset_to_neighbor = offsets_arr[traceback]
print(offset_to_neighbor.shape)
which gives:
(4, 4, 2)
The axes are different between the traceback and the numpy indices, but nothing a little transposition won't fix:
neighbor_index = indices - offset_to_neighbor.transpose((2, 0, 1))
Finally, we want to deal with integer pixel ids in order to create a graph of all the pixels, rather than coordinates. For this, we use np.ravel_multi_index.
ids = np.arange(traceback.size).reshape(image.shape)
neighbor_ids = np.ravel_multi_index(
tuple(neighbor_index), traceback.shape
)
This gives me a unique ID for each pixel, and then a unique "step towards the destination" for each pixel:
print(ids)
print(neighbor_ids)
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]
[12 13 14 15]]
[[ 0 0 1 2]
[ 0 0 1 11]
[ 4 5 15 15]
[13 14 15 15]]
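As a small aside, here is what np.ravel_multi_index does on its own, using three coordinates picked by hand for illustration. It maps (row, column) pairs to flat ids in a C-ordered array of the given shape, and np.unravel_index reverses the mapping.

```python
import numpy as np

shape = (4, 4)
rows = np.array([0, 1, 3])
cols = np.array([1, 3, 3])

# Flat id = row * n_columns + column for a C-ordered array.
flat_ids = np.ravel_multi_index((rows, cols), shape)
print(flat_ids)  # [ 1  7 15]

# unravel_index recovers the original coordinates.
back_rows, back_cols = np.unravel_index(flat_ids, shape)
```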
Then we can turn this into a graph using SciPy sparse matrices. We don't care about weights for this graph so we just use the value 1 for the edges.
from scipy import sparse
g = sparse.coo_matrix((
    np.ones(traceback.size),
    (ids.flat, neighbor_ids.flat),
), shape=(ids.size, ids.size)).tocsr()
(This uses the (value, (row, column)) or (data, (i, j)) input format for sparse COOrdinate matrices.)
Finally, we use connected components to get the graphs — the groups of pixels that are nearest to each destination. The function returns the number of components and the mapping of "pixel id" to component:
n, components = sparse.csgraph.connected_components(g)
basins = components.reshape(image.shape)
print(basins)
[[0 0 0 0]
[0 0 0 1]
[0 0 1 1]
[1 1 1 1]]
(Note that this result is slightly different from yours because the cost is identical to destination 0 and 1 for the pixels in question, so it's arbitrary which to label.)
print(costs)
[[0. 1. 2.5 4.5 ]
[1.5 1.41421356 2.41421356 4. ]
[4. 2.91421356 1.41421356 1.5 ]
[5.5 3.5 1.5 0. ]]
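The component labels from connected_components are arbitrary, so they are not guaranteed to match positions in the destinations list. One possible alternative, sketched here with NumPy only, is to follow the neighbor_ids pointers until every pixel points at its destination ("pointer jumping") and then translate destination pixel ids into list positions. The neighbor_ids array is copied from the printed output above; the lookup array is a helper introduced just for this sketch.

```python
import numpy as np

# neighbor_ids as printed above: for each pixel, the flat id of the next
# pixel on its least-cost path (destinations point to themselves).
neighbor_ids = np.array(
    [[0, 0, 1, 2],
     [0, 0, 1, 11],
     [4, 5, 15, 15],
     [13, 14, 15, 15]]
)
destinations = np.array([[0, 0], [3, 3]])

# Pointer jumping: replace each pointer by its pointer's pointer until
# nothing changes, so every pixel ends up pointing at its destination.
root = neighbor_ids.ravel().copy()
while True:
    nxt = root[root]
    if np.array_equal(nxt, root):
        break
    root = nxt

# Translate destination pixel ids into positions in the destinations list.
dest_ids = np.ravel_multi_index(tuple(destinations.T), neighbor_ids.shape)
lookup = np.full(neighbor_ids.size, -1)
lookup[dest_ids] = np.arange(len(destinations))
nearest = lookup[root].reshape(neighbor_ids.shape)
print(nearest)
```

On this example the result matches the basins array. Each pass doubles the path length covered, so the loop needs roughly log2 of the longest path in iterations.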
Hope this helps!
Related
For example, I have a symmetric adjacency matrix (no self-loops), i.e.
A =
[
[0, 1, 1, 0, 0],
[1, 0, 1, 0, 1],
[1, 1, 0, 1, 0],
[0, 0, 1, 0, 1],
[0, 1, 0, 1, 0]]
Then I have an array of cluster labels associated with this graph, i.e.
cluster = [1,1,2,2,3]
It means that node 1 and node 2 are in the same group, node 3 and node 4 are in the same group, and node 5 is in a group of its own.
The question is how I can get the sum of the edges within groups and between groups.
Within groups: the edges between nodes that share the same cluster label. For example, node 1 and node 2 are in the same group, so the sum is 1 for them, and the same goes for node 3 and node 4. For node 5, it's 0.
Between groups: the edges between nodes that have different cluster labels. For example, for group 1 and group 2, sum the edges between node 1 and node 3, node 1 and node 4, node 2 and node 3, and node 2 and node 4. The answer is 2 between group 1 and group 2.
Then return a 2D symmetric array containing the result, i.e.
[[1,2,1],
[2,1,1],
[1,1,0]]
You can use matrix algebra:
cluster = np.array(cluster)
# create cluster-node adjacency matrix
aux = np.identity(cluster.max(),int)[cluster-1]
# we can now count by multiplying
out = aux.T@A@aux
# fix diagonal (which was counted twice)
np.einsum("ii->i",out)[...] //= 2
out
# array([[1, 2, 1],
# [2, 1, 1],
# [1, 1, 0]])
To speed this up we can replace the matrix product with
(1) if nodes are sorted by cluster:
boundaries = np.diff(cluster,prepend=-1).nonzero()[0]
out = np.add.reduceat(np.add.reduceat(A,boundaries,1),boundaries,0)
(2) if not:
nc = cluster.max()
out = np.zeros((nc,nc),int)
np.add.at(out,(cluster[:,None]-1,cluster-1),A)
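To check that the three variants agree, here is a self-contained sketch that runs all of them on the example from the question. The halve_diagonal helper is introduced only for this sketch; it applies the same diagonal fix as the einsum line, because each within-group edge is counted twice in a symmetric matrix.

```python
import numpy as np

A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 1],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 1, 0, 1, 0]])
cluster = np.array([1, 1, 2, 2, 3])  # already sorted by cluster here
nc = cluster.max()

def halve_diagonal(m):
    """Return a copy with the diagonal halved (within-group edges count twice)."""
    m = m.copy()
    m[np.diag_indices_from(m)] //= 2
    return m

# Variant 0: one-hot membership matrix and two matrix products.
aux = np.identity(nc, int)[cluster - 1]
out_matmul = halve_diagonal(aux.T @ A @ aux)

# Variant 1: block sums with reduceat (requires nodes sorted by cluster).
boundaries = np.diff(cluster, prepend=-1).nonzero()[0]
out_reduceat = halve_diagonal(
    np.add.reduceat(np.add.reduceat(A, boundaries, 1), boundaries, 0)
)

# Variant 2: unbuffered scatter-add (works for any node order).
out_add_at = np.zeros((nc, nc), int)
np.add.at(out_add_at, (cluster[:, None] - 1, cluster - 1), A)
out_add_at = halve_diagonal(out_add_at)

print(out_matmul)
```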
This will return an array with element [i,j] the sum of edges of corresponding clusters i and j:
n = cluster.max()
degrees = np.zeros((n,n))
idx = [np.where(cluster==i)[0] for i in np.arange(n)+1]
for i in range(n):
    degrees[i,i] = A[np.ix_(idx[i],idx[i])].sum()/2
    for j in range(i):
        degrees[i,j] = degrees[j,i] = A[np.ix_(idx[i],idx[j])].sum()
output:
[[1. 2. 1.]
[2. 1. 1.]
[1. 1. 0.]]
You can use itertools as well but I think this is probably faster.
Hi, I'm learning TensorFlow right now and I have a sparse dataset made up of three columns: date, bond, spread. I figured that storing this data in a sparse tensor, with bond as one dimension and date as another, would make operations on this tensor feel natural; do let me know if you think there is a better way.
I am trying to perform arithmetic on two slices of the tensor, where I add/subtract values on one date only if the corresponding tensor value is not empty. While I found some functions that help with that task, I can't shake off the feeling that I'm missing a really simple solution to the problem.
Using the data below:
import tensorflow as tf
tf.enable_eager_execution()
indicies = [[0, 0], [0, 1], [1, 0], [1, 2], [2, 2]]
values = [10 , 10 , 10 , 11 , 11 ]
spreads = tf.sparse.SparseTensor(indicies, values, [3, 3])
In the above example I intend to use dimension one for dates and dimension two for bonds, such that
tf.sparse.slice(spreads,[0,2],[3,1])
gives me all spreads for date 2. But apparently subtraction is not supported for SparseTensor, nor can I use tf.math.subtract, so I am no longer sure what is supported.
Specifically, what I want to accomplish in the above example is to subtract date 0 from all other dates, but only if the bond has a spread on both dates. For example, bond 0 shows up on dates 0 and 1 but not on date 2, so I want to subtract the spread on date 0 from both dates 0 and 1.
Final tensor would look like this:
indicies2 = [[0, 0], [0, 1], [1, 0], [1, 2]]
output = [0, 0, 0, 1]
tf.sparse.to_dense(tf.sparse.SparseTensor(indicies2, output, [3, 3]))
tf.Tensor: id=128, shape=(3, 3), dtype=int32, numpy=
array([[0, 0, 0],
[ 0, 0, 1],
[ 0, 0, 0]])
I guess an easy solution would be to use tf.sparse.to_dense, but that kind of defeats the whole point of using SparseTensor, so I'm not really sure whether I missed something in the API docs that makes my solution possible or whether I went completely wrong by trying to use SparseTensor.
At the end of the day I am just looking to perform some math for each value of a tensor if that value has a match in another tensor.
UPDATE:
I realized that I can apply tf.math.negative to one of the slices in order to subtract two slices. The problem is that the output assumes that a value missing from one slice can be treated as some default value (zero).
I'm not sure there is any simple trick to make that work that easily. I would either make the dense computation or write the sparse computation myself. That is a bit trickier, so it is probably only worth it if the data is really very sparse and you would save a lot of memory and computation. Here is a way to do that:
import tensorflow as tf
tf.enable_eager_execution()
bonds = [0, 0, 1, 1, 2]
dates = [0, 1, 0, 2, 2]
values = [10, 10, 10, 11, 11]
# Find date 0 data
m0 = tf.equal(dates, 0)
bonds0 = tf.boolean_mask(bonds, m0)
values0 = tf.boolean_mask(values, m0)
# Find where date 0 bonds are
match = tf.equal(tf.expand_dims(bonds, 1), bonds0)
# Compute the amount to subtract from each data point
values_sub = tf.reduce_sum(values0 * tf.dtypes.cast(match, values0.dtype), 1)
# Compute new spread values
values_new = values - values_sub
# Mask null values
m_valid = tf.not_equal(values_new, 0)
bonds_new = tf.boolean_mask(bonds, m_valid)
dates_new = tf.boolean_mask(dates, m_valid)
values_new = tf.boolean_mask(values_new, m_valid)
# Make sparse tensor
indices_new = tf.dtypes.cast(tf.stack([bonds_new, dates_new], 1), tf.int64)
spreads_new = tf.sparse.SparseTensor(indices_new, values_new, [3, 3])
tf.print(spreads_new)
# 'SparseTensor(indices=[[1 2]
# [2 2]], values=[1 11], shape=[3 3])'
For the example that you give, I get the outputs (1, 2) => 1 and (2, 2) => 11 - (2, 2) is unaffected because there was no spread for 2 in date 0. That is different from what you wrote, so let me know if that is not what you meant.
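If the intent is instead to keep every entry whose bond has a date-0 spread (zeros included) and to drop the rest, which is how I read the expected output in the question, the same matching idea could be sketched on the coordinate triplets. This sketch uses NumPy rather than TensorFlow so that it stays short and runnable, and it assumes each bond appears at most once on date 0; the has_base and base names are introduced here for illustration.

```python
import numpy as np

bonds = np.array([0, 0, 1, 1, 2])
dates = np.array([0, 1, 0, 2, 2])
values = np.array([10, 10, 10, 11, 11])

# Date-0 entries act as the baseline for each bond.
is_date0 = dates == 0
bonds0 = bonds[is_date0]
values0 = values[is_date0]

# match[i, j] is True when entry i belongs to the j-th date-0 bond.
match = bonds[:, None] == bonds0[None, :]
has_base = match.any(axis=1)          # bond has a spread on date 0
base = (match * values0).sum(axis=1)  # baseline value (0 where missing)

# Keep only entries with a baseline, and subtract it.
indices_new = np.stack([bonds[has_base], dates[has_base]], axis=1)
values_new = (values - base)[has_base]
print(indices_new)
print(values_new)
```

On the example this keeps the entries (0, 0), (0, 1), (1, 0), (1, 2) with values 0, 0, 0, 1 and drops (2, 2).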
I am trying to compute the distance between each element of an array and a starting point.
Here is an array.
Assume the element (0,1) is the starting point; it currently has the highest value.
A neighbor is an element next to a given point: the two share one axis coordinate and differ by 1 unit along the other.
In general, a neighbor can be the element above, below, to the left of, or to the right of a given point, as long as it lies inside the array.
The task is to label every element with a distance value indicating how far it is from the starting point (0,1).
ds = np.array([[1, 2, 1],
[1, 1, 0],
[0, 1, 1]])
dist = np.full_like(ds, -1)
p0 = np.where(ds == 2)
dist[p0] = 0
que = []
que.append(p0)
nghb_x = [0, 0, -1, 1]
nghb_y = [-1, 1, 0, 0]
while len(que):
    x, y = que.pop()
    d = dist[(x,y)]
    for idx0, idx1 in zip(nghb_x, nghb_y):
        tmp_x = x + idx0
        tmp_y = y + idx1
        if np.any(tmp_x >= 0) and np.any(tmp_x < ds.shape[0]) and np.any(tmp_y >= 0) and np.any(tmp_y < ds.shape[1]) and np.any(dist[(tmp_x,tmp_y)] == -1):
            dist[(tmp_x,tmp_y)] = d + 1 # distance = distance(x) + 1
            que.append((tmp_x, tmp_y))
print('dist:')
print(dist)
the output
dist:
[[1 0 1]
[2 1 2]
[3 2 3]]
is as expected, but I would like to know whether there is a more efficient way to do this.
You're calculating the Manhattan distance (the x-distance plus the y-distance) from a target point for each point.
You can use a numpy function to do it in one step, given the target coordinates and the shape of the array:
target = (0, 1)
np.fromfunction(lambda x,y: np.abs(target[0]-x) + np.abs(target[1]-y), ds.shape)
Result:
[[1. 0. 1.]
[2. 1. 2.]
[3. 2. 3.]]
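As a cross-check, the traversal from the question can be sketched as a breadth-first search with collections.deque and compared against the closed-form Manhattan distance. The bfs_distance function is a name introduced for this sketch. Popping from the left of the queue visits cells in order of increasing distance, which is what guarantees that the first label a cell receives is its shortest distance.

```python
from collections import deque

import numpy as np

def bfs_distance(shape, start):
    """Label every cell with its 4-neighbor step count from start."""
    dist = np.full(shape, -1, dtype=int)
    dist[start] = 0
    queue = deque([start])
    while queue:
        r, c = queue.popleft()  # FIFO order: nearest cells first
        for dr, dc in ((0, -1), (0, 1), (-1, 0), (1, 0)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < shape[0] and 0 <= nc < shape[1]
            if inside and dist[nr, nc] == -1:
                dist[nr, nc] = dist[r, c] + 1
                queue.append((nr, nc))
    return dist

target = (0, 1)
dist = bfs_distance((3, 3), target)

# Closed form with integer output, using index grids.
rows, cols = np.indices((3, 3))
manhattan = np.abs(rows - target[0]) + np.abs(cols - target[1])
print(dist)
```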
Demo: https://repl.it/repls/TrustyUnhappyFlashdrives
Before I implement a Neural Network, I need to preprocess some data. But I'm a noob in math and I can't find a function in Python to do what I want.
I have matrix like this:
[[0 4 ... 0 ]
[0 3 ... 6 ]
[0 3 ... 10]]
And I have a number, for example 7, which determines how many rows I must have in my new matrix after the transformation. What I want to achieve is this:
[[0 4 ... 0 ]
[0 3.66 ... 2 ]
[0 3.33 ... 4 ]
[0 3 ... 6 ]
[0 3 ... 7.33]
[0 3 ... 8.66]
[0 3 ... 10 ]]
You can see that the first column doesn't change, because the first element of every row in the original matrix is zero.
The second column decreases slowly from 4 to 3 over the first four rows and then stays constant.
And finally, the last column increases from 0 to 10, passing through 6.
A math student told me that this is an interpolation, but I can't work out from scipy's documentation how to do it correctly.
Do you have an idea how I can do that?
You can use numpy.interp. As it is only for 1D, I used a for loop.
import numpy as np
# Your input matrix:
a = np.array([[0, 4, 0], [0, 3, 6], [0, 3, 10]])
# Put the shape you need here:
old_dim, new_dim = a.shape[1], 7
# Define new matrix
b = np.zeros((7, a.shape[1]))
# Define linspace that will serve for interpolation
nls, ols = np.linspace(0, 1, new_dim), np.linspace(0, 1, old_dim)
# Interpolate on each column
for col in range(old_dim):
    b[:,col] = np.interp(nls, ols, a[:,col])
print(b)
Output:
[[ 0. 4. 0. ]
[ 0. 3.66666667 2. ]
[ 0. 3.33333333 4. ]
[ 0. 3. 6. ]
[ 0. 3. 7.33333333]
[ 0. 3. 8.66666667]
[ 0. 3. 10. ]]
It is not a 2D interpolation function, but I am not very familiar with scipy (and numpy does not have any).
Edit
Fixed an issue with non-square matrices:
import numpy as np
a = np.array([[0, 4, 0], [0, 3, 6], [0, 3, 10]])
old_dim, n_col, new_dim = a.shape[0], a.shape[1], 7
b = np.zeros((7, n_col))
nls, ols = np.linspace(0, 1, new_dim), np.linspace(0, 1, old_dim)
for col in range(n_col):
    b[:,col] = np.interp(nls, ols, a[:,col])
print(b)
My mistake, I inverted n_col and n_rows at some point.
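The corrected version can be wrapped in a small function and tried on a non-square input. This is a sketch: resample_rows is a name made up here, and the 3x4 matrix is the question's example with one extra column added for illustration.

```python
import numpy as np

def resample_rows(a, new_rows):
    """Linearly interpolate each column of a onto new_rows evenly spaced rows."""
    a = np.asarray(a, dtype=float)
    old_pos = np.linspace(0, 1, a.shape[0])  # positions of the original rows
    new_pos = np.linspace(0, 1, new_rows)    # positions of the output rows
    # np.interp is 1-D, so interpolate column by column.
    return np.column_stack([np.interp(new_pos, old_pos, col) for col in a.T])

# 3 rows, 4 columns: deliberately not square.
a = np.array([[0, 4, 0, 1],
              [0, 3, 6, 1],
              [0, 3, 10, 4]])
b = resample_rows(a, 7)
print(b.shape)  # (7, 4)
```

With 3 input rows and 7 output rows, the original rows land exactly on output rows 0, 3 and 6.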
I have a 2D array that I would like to down sample to compare it to another.
Let's say my array x is 512x512; I'd like an array y that is 128x128, where the elements of y are built by interpolating the values over 4x4 blocks of x (this interpolation could just be taking the average, but other methods, like the geometric mean, could be interesting).
So far I have looked at scipy.ndimage.interpolation.zoom, but I don't get the results I want.
>> x = np.arange(16).reshape(4,4)
>> print(x)
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]
[12 13 14 15]]
>> y = scipy.ndimage.interpolation.zoom(x, 0.5)
>> print(y)
[[ 0 3]
[12 15]]
I expected y to be
[[ 2.5 4.5]
[10.5 12.5]]
Note that simply setting dtype=np.float32 doesn't solve that ...
sklearn.feature_extraction.image.extract_patches cleverly uses np.lib.stride_tricks.as_strided to produce a windowed array that can be operated on.
The sliding_window function, found in "Efficient Overlapping Windows with Numpy", produces a windowed array with or without overlap and also lets you get a glimpse of what is happening under the hood.
>>> a = np.arange(16).reshape(4,4)
step_height,step_width determines the overlap for the windows - in your case the steps are the same as the window size, no overlap.
>>> window_height, window_width, step_height, step_width = 2, 2, 2, 2
>>> y = sliding_window(a, (window_height, window_width), (step_height,step_width))
>>> y
array([[[ 0, 1],
[ 4, 5]],
[[ 2, 3],
[ 6, 7]],
[[ 8, 9],
[12, 13]],
[[10, 11],
[14, 15]]])
Operate on the windows:
>>> y = y.mean(axis = (1,2))
>>> y
array([ 2.5, 4.5, 10.5, 12.5])
You need to determine the final shape depending on the number of windows.
>>> final_shape = (2,2)
>>> y = y.reshape(final_shape)
>>> y
array([[ 2.5, 4.5],
[ 10.5, 12.5]])
Searching SO for numpy, window, array should produce numerous other answers and possible solutions.
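Since the sliding_window helper above comes from an external post, here is a sketch of the same idea using numpy.lib.stride_tricks.sliding_window_view instead. That function is a swap-in, not the helper used above, and it needs NumPy 1.20 or newer. Taking every step-th window removes the overlap.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

a = np.arange(16).reshape(4, 4)
window = (2, 2)
step = (2, 2)

# All overlapping 2x2 windows: shape (3, 3, 2, 2).
windows = sliding_window_view(a, window)

# Keep every step-th window so the windows do not overlap: shape (2, 2, 2, 2).
blocks = windows[::step[0], ::step[1]]

# Average over the two window axes; the result already has the final shape.
y = blocks.mean(axis=(2, 3))
print(y)
```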
What you seem to be looking for is the mean over blocks of 4, which is not obtainable with zoom, since zoom uses interpolation (see its docstring)
To obtain what you show, try the following
import numpy as np
x = np.arange(16).reshape(4, 4)
xx = x.reshape(len(x) // 2, 2, x.shape[1] // 2, 2).transpose(0, 2, 1, 3).reshape(len(x) // 2, x.shape[1] // 2, -1).mean(-1)
print(xx)
This yields
[[ 2.5 4.5]
[ 10.5 12.5]]
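The reshape trick can also be written a little more compactly by averaging over two axes at once. A minimal sketch, assuming both dimensions are divisible by the block size; block_mean is a name introduced here.

```python
import numpy as np

def block_mean(x, factor):
    """Average non-overlapping factor x factor blocks of a 2-D array."""
    h, w = x.shape
    if h % factor or w % factor:
        raise ValueError("array dimensions must be divisible by factor")
    # Split each axis into (number of blocks, block size), then average
    # over the two block-size axes.
    blocks = x.reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

x = np.arange(16).reshape(4, 4)
xx = block_mean(x, 2)
print(xx)
```

For the 512x512 case in the question, block_mean(x, 4) would give a 128x128 result.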
Alternatively, this can be done using sklearn.feature_extraction.image.extract_patches
from sklearn.feature_extraction.image import extract_patches
patches = extract_patches(x, patch_shape=(2, 2), extraction_step=(2, 2))
xx = patches.mean(-1).mean(-1)
print(xx)
However, if your goal is to subsample an image in a graceful way, then taking the mean over blocks of the image is not the right way to do it: It is likely to cause aliasing effects. What you should do in this case is smooth the image ever so slightly using scipy.ndimage.gaussian_filter (e.g. sigma=0.35 * subsample_factor) and then subsample simply by indexing [::2, ::2]
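The smooth-then-subsample suggestion could look roughly like this. It is a sketch on a synthetic 8x8 ramp image, using the sigma rule of thumb quoted above; the right sigma for real images is a judgment call.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

subsample_factor = 2
x = np.arange(64, dtype=float).reshape(8, 8)

# Light low-pass filter to reduce aliasing before dropping samples.
smoothed = gaussian_filter(x, sigma=0.35 * subsample_factor)

# Subsample by plain strided indexing.
y = smoothed[::subsample_factor, ::subsample_factor]
print(y.shape)  # (4, 4)
```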