Scipy sparse matrix from edge list - python

How can I convert an edge list (data) into a SciPy sparse matrix in Python?
Dataset (where 'agn' is node category one and 'fct' is node category two):
data['agn'].tolist()
['p1', 'p1', 'p1', 'p1', 'p1', 'p2', 'p2', 'p2', 'p2', 'p3', 'p3', 'p3', 'p4', 'p4', 'p5']
data['fct'].tolist()
['f1', 'f2', 'f3', 'f4', 'f5', 'f3', 'f4', 'f5', 'f6', 'f5', 'f6', 'f7', 'f7', 'f8', 'f9']
(not working) python code:
from scipy.sparse import csr_matrix, coo_matrix
csr_matrix((data_sub['agn'].values, data['fct'].values),
           shape=(len(set(data['agn'].values)), len(set(data_sub['fct'].values))))
-> Error: "TypeError: invalid input format"
Do I really need three arrays to construct the matrix, as the examples in the scipy csr_matrix documentation suggest?
(working) R code used to construct the matrix with only two vectors:
library(Matrix)
grph_tim <- sparseMatrix(i = as.numeric(data$agn),
                         j = as.numeric(data$fct),
                         dims = c(length(levels(data$agn)),
                                  length(levels(data$fct))),
                         dimnames = list(levels(data$agn),
                                         levels(data$fct)))
EDIT:
It finally worked after I modified the code from here and added the needed array:
import numpy as np
import pandas as pd
import scipy.sparse as ss
def read_data_file_as_coo_matrix(filename='edges.txt'):
    "Read data file and return sparse matrix in coordinate format."
    # if the nodes are integers, use 'dtype = np.uint32'
    data = pd.read_csv(filename, sep = '\t', encoding = 'utf-8')
    # where 'rows' is node category one and 'cols' node category 2
    rows = data['agn'] # Not a copy, just a reference.
    cols = data['fct']
    # crucial third array in python, which can be left out in r
    ones = np.ones(len(rows), np.uint32)
    matrix = ss.coo_matrix((ones, (rows, cols)))
    return matrix
Additionally, I converted the string names of the nodes to integers. Thus data['agn'] becomes [0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4] and data['fct'] becomes [0, 1, 2, 3, 4, 2, 3, 4, 5, 4, 5, 6, 6, 7, 8].
I get this sparse matrix:
(0, 0) 1
(0, 1) 1
(0, 2) 1
(0, 3) 1
(0, 4) 1
(1, 2) 1
(1, 3) 1
(1, 4) 1
(1, 5) 1
(2, 4) 1
(2, 5) 1
(2, 6) 1
(3, 6) 1
(3, 7) 1
(4, 8) 1
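The conversion of string labels to integer codes can be done in the same step as building the matrix. The following is a minimal sketch, assuming the labels are available as plain Python lists (here copied from the question) and using np.unique to produce the codes; the variable names are illustrative and not part of the original code.

```python
import numpy as np
from scipy.sparse import coo_matrix

agn = ['p1', 'p1', 'p1', 'p1', 'p1', 'p2', 'p2', 'p2', 'p2',
       'p3', 'p3', 'p3', 'p4', 'p4', 'p5']
fct = ['f1', 'f2', 'f3', 'f4', 'f5', 'f3', 'f4', 'f5', 'f6',
       'f5', 'f6', 'f7', 'f7', 'f8', 'f9']

# np.unique returns the sorted distinct labels and, for every edge,
# the integer position of its label within that sorted array.
row_labels, rows = np.unique(agn, return_inverse=True)
col_labels, cols = np.unique(fct, return_inverse=True)

# One value of 1 per edge: the values array that the R call leaves out.
ones = np.ones(len(rows), dtype=np.uint32)
matrix = coo_matrix((ones, (rows, cols)),
                    shape=(len(row_labels), len(col_labels))).tocsr()

print(matrix.shape)
print(matrix.toarray())
```

Keeping row_labels and col_labels around makes it possible to translate matrix positions back to the original node names.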


Related

Generate image matrix from Freeman chain code

Suppose I have a 8-direction freeman chain code as follows, in a python list:
freeman_code = [3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 5]
Where directions would be defined as follows:
I need to convert this to an image matrix of variable dimensions with values of 1s and 0s, where the 1s depict the shape, for example:
image_matrix = [
[0, 0, 1, 0, 0, 1],
[0, 0, 0, 1, 0, 1],
[0, 0, 0, 0, 1, 1]
]
Of course, the above is not an exact implementation of the above freeman code. Is there any implementation in python, or in any language that achieves this?
My idea (in python):
Use a defaultdict of defaultdicts with 0 as default:
ImgMatrixDict = defaultdict(lambda: defaultdict(lambda:0))
and then start at a midpoint, say ImgMatrixDict[25][25], and then change values to 1 depending on the Freeman code values as I traverse. After this I would convert ImgMatrixDict to a list of lists.
Is this a viable idea or are there any existing libraries or suggestions to implement this? Any idea/pseudo-code would be appreciated.
PS: Performance is not important, as I won't be doing this in real time; a code would generally be around 15-20 characters in length. I assumed a 50*50 matrix would suffice for this purpose.
If I am understanding your question correctly:
import numpy as np
import matplotlib.pyplot as plt
freeman_code = [3, 3, 3, 6, 6, 4, 6, 7, 7, 0, 0, 6]
img = np.zeros((10,10))
x, y = 4, 4
img[y][x] = 1
for direction in freeman_code:
    if direction in [1,2,3]:
        y -= 1
    if direction in [5,6,7]:
        y += 1
    if direction in [3,4,5]:
        x -= 1
    if direction in [0,1,7]:
        x += 1
    img[y][x] = 1
plt.imshow(img, cmap='binary', vmin=0, vmax=1)
plt.show()
Here is a solution in Python. A dictionary is not well suited to this problem; you would do better to use a list of lists to represent the grid.
D = 10
# DY, DX
FREEMAN = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]
freeman_code = [3, 3, 3, 3, 6, 6, 6, 6, 0, 0, 0, 0]
image = [[0]*D for x in range(D)]
# integer division, so that y and x can be used as list indexes in Python 3
y = D // 2
x = D // 2
image[y][x] = 1
for i in freeman_code:
    dy, dx = FREEMAN[i]
    y += dy
    x += dx
    image[y][x] = 1
print("freeman_code")
print(freeman_code)
print("image")
for line in image:
    strline = "".join([str(x) for x in line])
    print(strline)
>0000000000
>0100000000
>0110000000
>0101000000
>0100100000
>0111110000
>0000000000
>0000000000
>0000000000
>0000000000
Note that the image creation is a condensed expression of:
image = []
for y in range(D):
    line = []
    for x in range(D):
        line.append(0)
    image.append(line)
If you ever need better performance for bigger images, there are solutions using the NumPy library, though they require a good grasp of basic Python. Here is an example:
import numpy as np
D = 10
# DY, DX
FREEMAN = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]
DX = np.array([1, 1, 0, -1, -1, -1, 0, 1])
DY = np.array([0, -1, -1, -1, 0, 1, 1, 1])
freeman_code = np.array([3, 3, 3, 3, 6, 6, 6, 6, 0, 0, 0, 0])
image = np.zeros((D, D), int)
# integer division, so that the start position is a valid array index in Python 3
y0 = D // 2
x0 = D // 2
image[y0, x0] = 1
dx = DX[freeman_code]
dy = DY[freeman_code]
xs = np.cumsum(dx)+x0
ys = np.cumsum(dy)+y0
print(xs)
print(ys)
image[ys, xs] = 1
print("freeman_code")
print(freeman_code)
print("image")
print(image)
Here, the 'for' loops of the previous solution are replaced by NumPy array operations, which run in compiled C code.
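The question asked for a matrix of variable dimensions, while both solutions above use a fixed D x D grid. One possible way around that, sketched here in plain Python, is to trace the path first and then size the image to its bounding box. The function name chain_to_image is made up for illustration, and the direction convention is assumed to be the same as in the FREEMAN table above.

```python
# (dy, dx) offsets for directions 0..7, same convention as the FREEMAN table above
FREEMAN = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]

def chain_to_image(code):
    """Return a list of lists of 0/1, cropped to the bounding box of the path."""
    y = x = 0
    points = [(y, x)]
    for direction in code:
        dy, dx = FREEMAN[direction]
        y += dy
        x += dx
        points.append((y, x))
    # Shift all points so that the smallest row and column become index 0.
    min_y = min(p[0] for p in points)
    min_x = min(p[1] for p in points)
    height = max(p[0] for p in points) - min_y + 1
    width = max(p[1] for p in points) - min_x + 1
    image = [[0] * width for _ in range(height)]
    for py, px in points:
        image[py - min_y][px - min_x] = 1
    return image

image = chain_to_image([3, 3, 3, 3, 6, 6, 6, 6, 0, 0, 0, 0])
for row in image:
    print("".join(map(str, row)))
```

Because the grid is sized after the traversal, no starting midpoint has to be guessed and the path cannot run off the edge.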

How to change an element in a list of lists if it has a specific index and condition that is met?

I want to take a list of lists (lst) and a list of indexes, and change to '0' those elements of lst that are at those indexes and also meet the condition (== '1').
If I input
lst = [['1','2','3'],[],['4','2','1']]
and
specific_indexes = [(0, 0), (0, 2), (2, 0), (2, 2)]
I get [['0', '2', '3'], [], ['4', '2', '0']]
but I would like a faster way to do this.
def change(lst, specific_indexes):
    for (x,y) in specific_indexes:
        if lst[y][x] == '1':
            lst[y][x] = '0'
    return lst
...but I would like faster way to do this.
If you are interested in performance, you can use a specialist 3rd party library such as NumPy. This does mean you have to define a regular 2d array as an input, or transform it into one as shown below.
import numpy as np
lst = [['1','2','3'],[],['4','2','1']]
idx = [(0, 0), (0, 2), (2, 0), (2, 2)]
# calculate column number and construct NumPy array
colnum = max(map(len, lst))
arr = np.array([sublst if sublst else ['0'] * colnum for sublst in lst]).astype(int)
idx = np.array(idx)
# calculate indexer and mask array conditionally
mask = np.ix_(idx[:, 1], idx[:, 0])
arr[mask] = np.where(arr[mask] == 1, 0, arr[mask])
print(arr)
# [[0 2 3]
#  [0 0 0]
#  [4 2 0]]
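Note that the NumPy version pads the empty row with zeros and converts the strings to integers. If the ragged structure and the string values need to be preserved, a plain loop is still an option. The sketch below is a variation on the original change function, not a faster replacement; the name change_safe and the bounds check are additions for illustration.

```python
def change_safe(lst, specific_indexes, old='1', new='0'):
    """Replace old with new at the given (x, y) positions.

    Positions that fall outside the (possibly ragged) rows are skipped
    instead of raising IndexError.
    """
    for x, y in specific_indexes:
        if 0 <= y < len(lst) and 0 <= x < len(lst[y]) and lst[y][x] == old:
            lst[y][x] = new
    return lst

lst = [['1', '2', '3'], [], ['4', '2', '1']]
# (1, 1) points into the empty row and is ignored.
result = change_safe(lst, [(0, 0), (0, 2), (2, 0), (2, 2), (1, 1)])
print(result)
```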

How to check that the values of Tensor is contained in other tensor?

I have a problem finding the values of one tensor in another tensor.
It's similar to the following problem: (URL: How to find a value in tensor from other tensor in Tensorflow)
The previous problem asked whether the pair (x[i], y[i]) from the input tensors is contained in the input tensors (label_x, label_y).
Here is an example of the previous problem:
Input Tensor
s_idx = (1, 3, 5, 7)
e_idx = (3, 4, 5, 8)
label_s_idx = (2, 2, 3, 6)
label_e_idx = (2, 3, 4, 8)
The problem is to give output[i] a value of 1
if s_idx[i] == label_s_idx[j] and e_idx[i] == label_e_idx[j] for some j.
Thus, in the above example, the output tensor is
output = (0, 1, 0, 0)
Because (s_idx[1] = 3, e_idx[1] = 4) is the same as (label_s_idx[2] = 3, label_e_idx[2] = 4)
(s_idx, e_idx) does not contain duplicate pairs, and neither does (label_s_idx, label_e_idx).
Therefore, it is assumed that the following input example is impossible:
s_idx = (2, 2, 3, 3)
e_idx = (2, 3, 3, 3)
Because, (s_idx[2] = 3, e_idx[2] = 3) is same as (s_idx[3] = 3, e_idx[3] = 3).
What I want to change a bit in this problem is to add another value to the input tensor:
Input Tensor
s_idx = (1, 3, 5, 7)
e_idx = (3, 4, 5, 8)
label_s_idx = (2, 2, 3, 6)
label_e_idx = (2, 3, 4, 8)
label_score = (1, 3, 2, 3)
*There are no 0 values in the label_score tensor
The task in the changed problem is defined as follows:
The problem is to give output_2[i] a value of label_score[j] if s_idx[i] == label_s_idx[j] and e_idx[i] == label_e_idx[j] for some j.
Therefore, the output_2 should be like this:
output = (0, 1, 0, 0) // It is same as previous problem
output_2 = (0, 2, 0, 0)
How do I code like this on Tensorflow in Python?
Here is a possible solution:
import tensorflow as tf
s_idx = tf.placeholder(tf.int32, [None])
e_idx = tf.placeholder(tf.int32, [None])
label_s_idx = tf.placeholder(tf.int32, [None])
label_e_idx = tf.placeholder(tf.int32, [None])
label_score = tf.placeholder(tf.int32, [None])
# Stack inputs for comparison
se_idx = tf.stack([s_idx, e_idx], axis=1)
label_se_idx = tf.stack([label_s_idx, label_e_idx], axis=1)
# Compare every pair to each other and find matches
cmp = tf.equal(se_idx[:, tf.newaxis, :], label_se_idx[tf.newaxis, :, :])
matches = tf.reduce_all(cmp, axis=2)
# Find the position of the matches
match_pos = tf.argmax(tf.cast(matches, tf.int8), axis=1)
# For those positions where a match was found take the corresponding score
output = tf.where(tf.reduce_any(matches, axis=1),
                  tf.gather(label_score, match_pos),
                  tf.zeros_like(label_score))
# Test
with tf.Session() as sess:
    print(sess.run(output, feed_dict={s_idx: [1, 3, 5, 7],
                                      e_idx: [3, 4, 5, 8],
                                      label_s_idx: [2, 2, 3, 6],
                                      label_e_idx: [2, 3, 4, 8],
                                      label_score: [1, 3, 2, 3]}))
# >>> [0 2 0 0]
It compares every pair of values to each other, so the cost is quadratic on the input size. Also, tf.argmax is used to find the index of the matching position, and if there is more than one possible index it may return any of them nondeterministically.
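For checking results on small inputs, the matching rule can also be written in plain Python with a dictionary lookup, which avoids the pairwise comparison. This is only a reference sketch and not a TensorFlow graph; the function name match_scores is made up for illustration, and it assumes the label pairs are unique, as the question states.

```python
def match_scores(s_idx, e_idx, label_s_idx, label_e_idx, label_score):
    """Return label_score[j] where (s_idx[i], e_idx[i]) equals a label pair, else 0."""
    # Map each (start, end) label pair to its score; pairs are assumed unique.
    score_by_pair = dict(zip(zip(label_s_idx, label_e_idx), label_score))
    return [score_by_pair.get(pair, 0) for pair in zip(s_idx, e_idx)]

output_2 = match_scores(s_idx=[1, 3, 5, 7],
                        e_idx=[3, 4, 5, 8],
                        label_s_idx=[2, 2, 3, 6],
                        label_e_idx=[2, 3, 4, 8],
                        label_score=[1, 3, 2, 3])
# label_score contains no zeros, so a non-zero score means "matched".
output = [1 if score else 0 for score in output_2]
print(output)    # [0, 1, 0, 0]
print(output_2)  # [0, 2, 0, 0]
```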
This perhaps works. Since this is a complex task, try more examples and see if expected results are obtained.
import tensorflow as tf
s_idx = [1, 3, 5, 7]
e_idx = [3, 4, 5, 8]
label_s_idx = [2, 2, 3, 6]
label_e_idx = [2, 3, 4, 8]
label_score = [1, 3, 2, 3]
# convert to one-hot vectors.
# make sure all have the same shape; the depth must be the largest index plus one,
# otherwise the largest index would be encoded as an all-zero row
depth = tf.reduce_max([s_idx, label_s_idx, e_idx, label_e_idx]) + 1
s_oh = tf.one_hot(s_idx, depth)
label_s_oh = tf.one_hot(label_s_idx, depth)
e_oh = tf.one_hot(e_idx, depth)
label_e_oh = tf.one_hot(label_e_idx, depth)
# make a matrix such that (i,j) element equals one if
# idx(i) = label(j)
s_mult = tf.matmul(s_oh, label_s_oh, transpose_b=True)
e_mult = tf.matmul(e_oh, label_e_oh, transpose_b=True)
# find i such that idx(i) = label(j) for s and e, with some j
# there is at most one such j by the uniqueness condition.
output = tf.reduce_max(s_mult * e_mult, axis=1)
with tf.Session() as sess:
    print(sess.run(output))
# [0. 1. 0. 0.]
# extract the label score at the corresponding j index
# and store in the index i
# then remove redundant dimension
output_2 = tf.matmul(
    s_mult * e_mult,
    tf.cast(tf.expand_dims(label_score, -1), tf.float32))
output_2 = tf.squeeze(output_2)
with tf.Session() as sess:
    print(sess.run(output_2))
# [0. 2. 0. 0.]

How to create an xarray from a sparse, denormalized table?

Say I have the following structured array:
import numpy as np
l, h, w = 6, 5, 5
dtype = [('a', int), ('b', '<U3'), ('data', (float, (h, w)))]
table = np.empty(l, dtype)
table['a'] = [1, 2, 3, 1, 2, 3]
table['b'] = ['foo', 'bar'] * 3
table['data'] = np.random.rand(l, h, w)
My data has shape (6, 5, 5), but conceptually its shape is (3, 2, 5, 5); columns a and b are just denormalized.
Is it possible to create an xarray DataArray directly from this shape (6, 5, 5) by providing columns a and b of length 6 and have xarray figure out the (3, 2, 5, 5) shape? What would coords and dims be?
In reality, table is sparse and has many dimensions, and I'm trying to see if there's any xarray creation machinery I can lean on instead of reshaping table myself.
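To make the target layout concrete, here is a manual sketch in plain NumPy. It is not xarray creation machinery, only an illustration of how the (3, 2, 5, 5) shape follows from columns a and b; the names a_vals, b_vals, a_pos, b_pos and dense are illustrative, and NaN is assumed as the fill value for missing combinations.

```python
import numpy as np

l, h, w = 6, 5, 5
dtype = [('a', int), ('b', '<U3'), ('data', (float, (h, w)))]
table = np.empty(l, dtype)
table['a'] = [1, 2, 3, 1, 2, 3]
table['b'] = ['foo', 'bar'] * 3
table['data'] = np.random.rand(l, h, w)

# Distinct coordinate values, plus each row's position along that axis.
a_vals, a_pos = np.unique(table['a'], return_inverse=True)
b_vals, b_pos = np.unique(table['b'], return_inverse=True)

# NaN marks (a, b) combinations that are absent from a sparse table.
dense = np.full((len(a_vals), len(b_vals), h, w), np.nan)
dense[a_pos, b_pos] = table['data']

print(dense.shape)
```

Here a_vals and b_vals play the role of the coordinate values for the first two dimensions, so they are the natural candidates for coords when wrapping dense in a labelled array.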

filling numpy array with random element from another array

I'm not sure if this is possible but here goes. Suppose I have an array:
array1 = [0,.1,.2,.3,.4,.5,.6,.7,.8,.9,1]
and now I would like to create a numpy 1D array consisting of 5 elements that are randomly drawn from array1 AND with the condition that the sum is equal to 1. Example is something like, a numpy array that looks like [.2,.2,.2,.1,.1].
Currently I use the random module and its choice function, like this:
range1= np.array([choice(array1),choice(array1),choice(array1),choice(array1),choice(array1)])
and then check range1 to see if it meets the criteria. I'm wondering if there is a faster way, something similar to
randomArray = np.random.random() instead.
It would be even better if I could store these arrays in some collection so that, if I generate 100 of them, there are no repeats, but this is not necessary.
You can use numpy.random.choice if you use numpy 1.7.0+:
>>> import numpy as np
>>> array1 = np.array([0,.1,.2,.3,.4,.5,.6,.7,.8,.9,1])
>>> np.random.choice(array1, 5)
array([ 0. , 0. , 0.3, 1. , 0.3])
>>> np.random.choice(array1, 5, replace=False)
array([ 0.6, 0.8, 0.1, 0. , 0.4])
To get 5 elements whose sum is equal to 1:
1. Generate 4 random numbers.
2. Subtract the sum of the 4 numbers from 1 to get x.
3. If x is in array1, use it as the final number; otherwise repeat.
>>> import numpy as np
>>>
>>> def solve(arr, total, n):
...     while True:
...         xs = np.random.choice(arr, n-1)
...         remain = total - xs.sum()
...         if remain in arr:
...             return np.append(xs, remain)
...
>>> array1 = np.array([0,.1,.2,.3,.4,.5,.6,.7,.8,.9,1])
>>> print(solve(array1, 1, 5))
[ 0.1 0.3 0.4 0.2 0. ]
Another version (assume given array is sorted):
EPS = 0.0000001
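def solve(arr, total, n):
    while True:
        xs = np.random.choice(arr, n-1)
        t = xs.sum()
        # searchsorted can return len(arr) when total - t exceeds every element,
        # so clamp the index to stay inside the array
        i = min(arr.searchsorted(total - t), len(arr) - 1)
        if abs(t + arr[i] - total) < EPS:
            return np.append(xs, arr[i])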
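The EPS tolerance in the version above is needed because sums of values like 0.1 and 0.2 are not exact in floating point. A possible way to sidestep that, sketched here with only the standard library, is to draw integer tenths (0 to 10) and divide by 10 at the end. The function name draw_tenths and its parameters are made up for illustration; the retry strategy is the same as in the answers above.

```python
import random

def draw_tenths(n=5, total=10, seed=None):
    """Draw n values from 0.0, 0.1, ..., 1.0 whose sum is total / 10."""
    rng = random.Random(seed)
    while True:
        # Pick n-1 integer tenths, then check whether the remainder is valid.
        xs = [rng.randint(0, 10) for _ in range(n - 1)]
        remain = total - sum(xs)
        if 0 <= remain <= 10:
            # Integer arithmetic made the check exact; convert only at the end.
            return [v / 10 for v in xs + [remain]]

sample = draw_tenths(seed=0)
print(sample, sum(sample))
```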
I had to do something similar a while ago.
import numpy as np

def getRandomList(n, source):
    '''
    Returns a list of n elements randomly selected from source.
    Selection is done without replacement.
    '''
    # list() is needed in Python 3, where range objects have no pop()
    indices = list(range(len(source)))
    randIndices = []
    for i in range(n):
        randIndex = indices.pop(np.random.randint(0, high=len(indices)))
        randIndices += [randIndex]
    return [source[index] for index in randIndices]
data = [1,2,3,4,5,6,7,8,9]
randomData = getRandomList(4, data)
print(randomData)
If you don't care about the order of the values in the output sequences, the number of 5-value combinations of values from your list that add up to 1 is pretty small. In the specific case you proposed, though, it's a bit complicated to calculate, since floating point values have rounding issues. You can solve the issue more easily if you use a set of integers (e.g. range(11)) and find combinations that add up to 10. Then, if you need the fractional values, just divide the values in the results by 10.
Anyway, here's a generator that yields all the possible sets that add up to a given value:
def picks(values, n, target):
    if n == 1:
        if target in values:
            yield (target,)
        return
    for i, v in enumerate(values):
        if v <= target:
            for r in picks(values[i:], n-1, target-v):
                yield (v,)+r
Here are the results for the numbers zero through ten:
>>> for r in picks(range(11), 5, 10):
...     print(r)
(0, 0, 0, 0, 10)
(0, 0, 0, 1, 9)
(0, 0, 0, 2, 8)
(0, 0, 0, 3, 7)
(0, 0, 0, 4, 6)
(0, 0, 0, 5, 5)
(0, 0, 1, 1, 8)
(0, 0, 1, 2, 7)
(0, 0, 1, 3, 6)
(0, 0, 1, 4, 5)
(0, 0, 2, 2, 6)
(0, 0, 2, 3, 5)
(0, 0, 2, 4, 4)
(0, 0, 3, 3, 4)
(0, 1, 1, 1, 7)
(0, 1, 1, 2, 6)
(0, 1, 1, 3, 5)
(0, 1, 1, 4, 4)
(0, 1, 2, 2, 5)
(0, 1, 2, 3, 4)
(0, 1, 3, 3, 3)
(0, 2, 2, 2, 4)
(0, 2, 2, 3, 3)
(1, 1, 1, 1, 6)
(1, 1, 1, 2, 5)
(1, 1, 1, 3, 4)
(1, 1, 2, 2, 4)
(1, 1, 2, 3, 3)
(1, 2, 2, 2, 3)
(2, 2, 2, 2, 2)
You can select one of them at random (with random.choice), or if you plan on using many of them and you don't want to repeat yourself, you can use random.shuffle, then iterate.
import random
results = list(picks(range(11), 5, 10))
random.shuffle(results)
for r in results:
    print(r)  # do whatever you want with r
