Convert a tree to a matrix Python 3

I'm trying to convert a Tree: (list Nat (listof Tree)) to a graph matrix but I do not know where to start. I'm not looking for code but more so ideas on how to approach this problem.
For example, a tree is
aTree = [3, [
    [1, []],
    [0, [
        [2, []],
        [5, []]
    ]],
    [4, []]
]]
Which would look like:
    3
  / | \
 1  0  4
   / \
  2   5
And the matrix would be
aM = [[0, 0, 1, 1, 0, 1],
      [0, 0, 0, 1, 0, 0],
      [1, 0, 0, 0, 0, 0],
      [1, 1, 0, 0, 1, 0],
      [0, 0, 0, 1, 0, 0],
      [1, 0, 0, 0, 0, 0]]
The function would be treetomatrix(Tree, N), where N is the number of vertices in the tree, so treetomatrix(aTree, 6) => aM.
Any suggestions would be much appreciated.

The really hard part of this question is how to change the weird list structure you're using into a dictionary. (If you don't know about dictionaries, read up on them. They do what you're trying to make your list do, but much more naturally.) I've skipped that for the moment, and will do that next.
aTree = {3: {
    1: {},
    0: {2: {},
        5: {}},
    4: {}
}}

def tree_to_matrix(tree, n):
    # Start from an n x n matrix of zeros and fill in the edges.
    return tree_to_mat(tree, [[0 for i in range(n)] for j in range(n)])

def tree_to_mat(tree, mat):
    for k, v in tree.items():
        for i in v.keys():
            # The graph is undirected, so mark the edge in both directions.
            mat[i][k] = mat[k][i] = 1
        mat = tree_to_mat(v, mat)
    return mat

print(tree_to_matrix(aTree, 6))
prints
[[0, 0, 1, 1, 0, 1], [0, 0, 0, 1, 0, 0], [1, 0, 0, 0, 0, 0], [1, 1, 0, 0, 1, 0], [0, 0, 0, 1, 0, 0], [1, 0, 0, 0, 0, 0]]
Which is our desired output. You may find it easier to read your input into a custom Tree class and then write a method for that class to generate the matrices. The trick is to use recursion and realize that you have to set both mat[i][k] and mat[k][i] at the same time.
EDIT: Here's my hacky way of turning your list into a dict.
def dictify(l):
    # l has the form [label, [child, child, ...]]
    return {l[0]: d_helper(l[1])}

def d_helper(l):
    d = {}
    for i in l:
        d.update(dictify(i))
    return d

dictify(aTree)  # aTree here is the original nested-list form
There's a better way, but this works.
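The dictionary step is optional. As a minimal sketch, assuming every node has the form [label, [children]] and the labels are the integers 0 to n-1, the same recursion can walk the nested list directly (visit is a hypothetical helper name):

```python
def tree_to_matrix(tree, n):
    """Build an n x n adjacency matrix from a [label, [children]] tree."""
    mat = [[0] * n for _ in range(n)]

    def visit(node):
        label, children = node
        for child in children:
            child_label = child[0]
            # Undirected edge: set both symmetric entries.
            mat[label][child_label] = mat[child_label][label] = 1
            visit(child)

    visit(tree)
    return mat


a_tree = [3, [[1, []], [0, [[2, []], [5, []]]], [4, []]]]
matrix = tree_to_matrix(a_tree, 6)
for row in matrix:
    print(row)
```

Each edge is visited once, so the work is proportional to the number of vertices plus the n*n cost of creating the matrix.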

Related

Boundaries when printing list of lists

I have the following code:
matrix = [[0, 0, 1, 0], [1, 1, 0, 0], [0, 0, 0, 1], [1, 0, 0, 1]]
I am able to print every line as follows using this:
for i in matrix:
    print(*i)
outputting:
0 0 1 0
1 1 0 0
0 0 0 1
1 0 0 1
I want to create custom boundaries for each line, and I am able to do so by manually adding the boundaries to the list of lists as shown below:
for k in range(0,columns):
    matrix[k].insert(0,'[')
    matrix[k].insert(columns+1,']')
giving me the output as desired:
[ 0 0 1 0 ]
[ 1 1 0 0 ]
[ 0 0 0 1 ]
[ 1 0 0 1 ]
Is there a better way to do this, particularly without having to add the boundaries into my list?

Yes, you can do it with two for loops like this:
for i in matrix:
    s = "[ "
    for j in i:
        s = s + str(j) + " "
    s = s + "]"
    print(s)
Or you can do it with a single for loop like this:
for i in matrix:
    print("[", *i, "]")

for row in matrix:
    print('[ ' + ' '.join(str(j) for j in row) + ' ]')

for row in matrix:
    print(row)
almost does what you want, but it has commas. Replace those commas by nothing:
for row in matrix:
    print(str(row).replace(',',''))
[0 0 1 0]
[1 1 0 0]
[0 0 0 1]
[1 0 0 1]
Even this isn't quite what your target is, but in mathematical type-setting it is not customary to pad the boundaries of a matrix with white space.

Another way is to cast each row to str and replace all the commas with nothing, as below:
matrix = [[0, 0, 1, 0], [1, 1, 0, 0], [0, 0, 0, 1], [1, 0, 0, 1]]
for i in matrix:
    print(str(i).replace(',',''))
DEMO: https://rextester.com/QESAJC13339
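If the formatted rows are needed as strings rather than printed directly, the same idea can be wrapped in a small helper that leaves the original list untouched. This is only a sketch; format_rows and its left/right parameters are hypothetical names, not part of any library:

```python
def format_rows(matrix, left="[", right="]"):
    """Return one string per row, wrapped in the given boundary markers."""
    return [f"{left} {' '.join(str(x) for x in row)} {right}" for row in matrix]


matrix = [[0, 0, 1, 0], [1, 1, 0, 0], [0, 0, 0, 1], [1, 0, 0, 1]]
lines = format_rows(matrix)
print("\n".join(lines))
```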

Looping through a numpy array

I have a 5 by 10 array and I want to flip a bit if a random number is greater than 0.9. However, this only works for the first row of the array; the second and subsequent rows are never changed. I replaced the bits with 3 and 4 so I can easily see whether the flipping occurs. I have been getting results that look like this.
[[3 1 1 1 4 1 3 1 0 1]
[1 1 0 0 1 0 1 1 1 0]
[1 0 1 0 1 0 1 1 1 1]
[0 0 1 0 1 1 0 1 1 1]
[0 1 1 0 0 0 0 1 1 1]]
Please help me figure out where I'm wrong.
import numpy as np
from random import random
RM = np.random.randint(0,2, size=(5,10))
print(RM)
for k in range(0, RM.shape[0]):
    for j in range(0, RM.shape[1]):
        A = random()
        if A > 0.9:
            if RM[k,j] == 0:
                np.put(RM, [k,j], [3])
                print("k1",k)
                print("j1", j)
            else:
                np.put(RM, [k,j], [4])
                print("k2", k)
        else:
            continue
print(RM)

Looking at the documentation of np.put:
numpy.put(a, ind, v, mode='raise')
Replaces specified elements of an array with given values.
Under Examples:
a = np.arange(5)
np.put(a, [0, 2], [-44, -55])
a
array([-44,   1, -55,   3,   4])
So, if you feed a list to the function, each entry is treated as an index into the flattened array and multiple values are replaced. np.put(RM, [k,j], [3]) therefore writes to flat positions k and j, which both fall in the first row of a 5 by 10 array, rather than to element (k, j).
To make your loop work, simply assign the values to the array directly:
import numpy as np
from random import random
RM = np.random.randint(0,2, size=(5,10))
print(RM)
for k in range(0, RM.shape[0]):
    for j in range(0, RM.shape[1]):
        A = random()
        if A > 0.9:
            if RM[k,j] == 0:
                RM[k,j] = 3
                print("k1",k)
                print("j1", j)
            else:
                RM[k,j] = 4
                print("k2", k)
        else:
            continue
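The flat-index behaviour of np.put can be checked on a small array. This sketch uses made-up 3 by 4 arrays named a and b purely for illustration:

```python
import numpy as np

# np.put interprets its indices as positions in the flattened array.
a = np.zeros((3, 4), dtype=int)
np.put(a, [2, 1], [3])      # writes 3 at flat positions 2 and 1
print(a)

# Plain indexing addresses row 2, column 1.
b = np.zeros((3, 4), dtype=int)
b[2, 1] = 3
print(b)
```

In the first array both writes land in row 0, which is the same effect seen in the question's output.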

Most likely you don't need the iteration. The flips are independent, so you can generate all the random draws in one go and flip only the selected entries:
import numpy as np
np.random.seed(100)
RM = np.random.randint(0,2, size=(5,10))
RM
array([[0, 0, 1, 1, 1, 1, 0, 0, 0, 0],
       [0, 1, 0, 0, 0, 0, 1, 0, 0, 1],
       [0, 1, 0, 0, 0, 1, 1, 1, 0, 0],
       [1, 0, 0, 1, 1, 1, 1, 1, 0, 0],
       [1, 1, 1, 1, 1, 1, 1, 1, 0, 1]])
alpha = np.random.uniform(0,1,(5,10))
np.round(alpha,2)
array([[0.49, 0.4 , 0.35, 0.5 , 0.45, 0.09, 0.27, 0.94, 0.03, 0.04],
       [0.28, 0.58, 0.99, 0.99, 0.99, 0.11, 0.66, 0.52, 0.17, 0.94],
       [0.24, 1.  , 0.58, 0.18, 0.39, 0.19, 0.41, 0.59, 0.72, 0.49],
       [0.31, 0.58, 0.44, 0.36, 0.32, 0.21, 0.45, 0.49, 0.9 , 0.73],
       [0.77, 0.38, 0.34, 0.66, 0.71, 0.11, 0.13, 0.46, 0.16, 0.96]])
RM[alpha>0.9] = abs(1-RM[alpha>0.9])
RM
array([[0, 0, 1, 1, 1, 1, 0, 1, 0, 0],
       [0, 1, 1, 1, 1, 0, 1, 0, 0, 0],
       [0, 0, 0, 0, 0, 1, 1, 1, 0, 0],
       [1, 0, 0, 1, 1, 1, 1, 1, 0, 0],
       [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]])
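The same vectorized idea can be written with XOR against a boolean mask. This is a sketch, assuming the array holds only 0 and 1; flip_bits and its threshold parameter are hypothetical names, and it uses NumPy's Generator API (np.random.default_rng) instead of the global seed:

```python
import numpy as np


def flip_bits(bits, threshold=0.9, rng=None):
    """Flip each 0/1 entry independently when its random draw exceeds threshold."""
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.random(bits.shape) > threshold
    # XOR with the boolean mask flips exactly the selected entries.
    return bits ^ mask, mask


rng = np.random.default_rng(0)
RM = rng.integers(0, 2, size=(5, 10))
flipped, mask = flip_bits(RM, rng=rng)
print(flipped)
```

Returning the mask alongside the result makes it easy to see which positions were flipped.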

To iterate over a NumPy array, a convenient (and recommended) tool is np.nditer.
If you want to change values of the iterated array, pass op_flags=['readwrite'].
To get access to the indices of the current element of a multi-dimensional array, pass flags=['multi_index'].
The example code below also prints the indices each time an element is flipped.
To check how it operates, I added a printout of RM both before and after the loop.
import numpy as np
np.random.seed(0)
RM = np.random.randint(0, 2, size=(5, 10))
print('Before:')
print(RM, '\n')
with np.nditer(RM, op_flags=['readwrite'], flags=['multi_index']) as it:
    for x in it:
        A = np.random.random()
        if A > 0.9:
            x[...] = 1 - x  # Flip
            print(f'Flip: <{it.multi_index}>, {A:.3f}')
print('\nAfter:')
print(RM)
To get repeatable result, I added np.random.seed(0) (remove it in the
target version).
With the above seeding, I got the following result:
Before:
[[0 1 1 0 1 1 1 1 1 1]
[1 0 0 1 0 0 0 0 0 1]
[0 1 1 0 0 1 1 1 1 0]
[1 0 1 0 1 1 0 1 1 0]
[0 1 0 1 1 1 1 1 0 1]]
Flip: <(0, 2)>, 0.945
Flip: <(1, 3)>, 0.944
Flip: <(2, 7)>, 0.988
Flip: <(4, 5)>, 0.976
Flip: <(4, 7)>, 0.977
After:
[[0 1 0 0 1 1 1 1 1 1]
[1 0 0 0 0 0 0 0 0 1]
[0 1 1 0 0 1 1 0 1 0]
[1 0 1 0 1 1 0 1 1 0]
[0 1 0 1 1 0 1 0 0 1]]
Compare the elements reported as flipped in the "Before" and "After" sections to confirm that the code does its job, and check that no other element has changed.
The slightly tricky part of the code is x[...] = 1 - x.
The 1 - x part simply reads the current value.
But if you wrote x = 1 - x, you would rebind the name x to a new value and break its link to the source array element, so the array itself would stay unchanged.
To write through to the array, the x[...] = notation is needed.
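The difference between rebinding and writing through can be seen on a tiny array. A minimal sketch, with arr and unchanged as illustrative names:

```python
import numpy as np

arr = np.zeros(3, dtype=int)

# Rebinding the loop variable: the array is not modified.
with np.nditer(arr, op_flags=['readwrite']) as it:
    for x in it:
        x = 1
unchanged = arr.copy()

# Writing through the element view: the array is modified.
with np.nditer(arr, op_flags=['readwrite']) as it:
    for x in it:
        x[...] = 1

print(unchanged, arr)
```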

Convert a matrix of positive integer numbers into a boolean matrix without loops

I'm trying to write code in Python using NumPy. I'm not sure it's possible but here's what I'm trying to do:
I have a 2D matrix a of shape (rows, cols) with positive integer numbers and I want to define a matrix b such that if a[i,j]=x then b[i,j+1]=b[i,j+2]=...=b[i,j+x]=1 (b is initialized to a matrix of zeros).
You can assume that for every j,x: j+x<=cols-1.
For example, if a is:
[0 2 0 0]
[0 2 0 0]
[3 0 0 0]
[2 0 1 0]
Then b should be:
[0 0 1 1]
[0 0 1 1]
[0 1 1 1]
[0 1 1 1]
Is it possible to do the above in Python with NumPy without using loops?
If it's not possible to do it without loops, is there an efficient way to do it? (rows and cols can be big numbers.)

You asked: "If it's not possible to do it without loops, is there an efficient way to do it? (rows and cols can be big numbers.)"
I don't know of a NumPy function that would help in your situation, but I think a regular loop with slice assignment should be quite fast:
import numpy as np
a = np.array([
    [0, 2, 0, 0],
    [0, 2, 0, 0],
    [3, 0, 0, 0],
    [2, 0, 1, 0],
])
b = np.zeros(a.shape)
for i, x in enumerate(a.flat):
    b.flat[i + 1 : i + 1 + x] = 1
print(b)
Which prints your expected result:
[[0. 0. 1. 1.]
[0. 0. 1. 1.]
[0. 1. 1. 1.]
[0. 1. 1. 1.]]

Here's a slightly optimized version of @finefoot's solution:
aa = a.ravel()
b = np.zeros_like(aa)
for i, x in enumerate(aa):
    if x != 0:
        b[i + 1 : i + 1 + x] = 1
b = b.reshape(a.shape)
And here's another solution which is slightly faster but less readable:
from itertools import chain
aa = a.ravel()
b = np.zeros_like(aa)
w = np.nonzero(aa)[0]
ranges = (range(s, e) for s, e in zip(w + 1, w + 1 + aa[w]))
for r in chain.from_iterable(ranges):
    b[r] = 1
b = b.reshape(a.shape)
Both give correct results under the assumption that j+x<=cols-1 for every j,x. Both solutions still use a for-loop, though; I don't think it's possible to do it otherwise.
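A loop-free version does appear possible with a running maximum, under the question's assumption that j+x<=cols-1. This is a sketch rather than a benchmarked solution, and spread_right is a hypothetical name. The observation is that b[i,j] should be 1 exactly when some earlier column j' < j in the same row satisfies j' + a[i,j'] >= j, and np.maximum.accumulate computes the largest such reach for every column at once:

```python
import numpy as np


def spread_right(a):
    """Set b[i, j+1 .. j+x] = 1 for every a[i, j] = x, without a Python loop."""
    a = np.asarray(a)
    j = np.arange(a.shape[1])
    # reach[i, j] = furthest column covered by any cell at or left of column j.
    reach = np.maximum.accumulate(a + j, axis=1)
    b = np.zeros(a.shape, dtype=int)
    # Column j is covered when the reach of columns 0..j-1 is at least j.
    b[:, 1:] = reach[:, :-1] >= j[1:]
    return b


a = np.array([
    [0, 2, 0, 0],
    [0, 2, 0, 0],
    [3, 0, 0, 0],
    [2, 0, 1, 0],
])
b = spread_right(a)
print(b)
```

A zero entry has reach equal to its own column, so it never marks anything to its right.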

Editing python 2-dimensional array without for-loop?

So, I have a given 2 dimensional matrix which is randomly generated:
a = np.random.randn(4,4)
which gives output:
array([[-0.11449491, -2.7777728 , -0.19784241, 1.8277976 ],
[-0.68511473, 0.40855461, 0.06003551, -0.8779363 ],
[-0.55650378, -0.16377137, 0.10348714, -0.53449633],
[ 0.48248298, -1.12199767, 0.3541335 , 0.48729845]])
I want to change all the negative values to 0 and all the positive values to 1.
How can I do this without a for loop?

You can use np.where()
import numpy as np
a = np.random.randn(4,4)
a = np.where(a<0, 0, 1)
print(a)
[[1 1 0 1]
[1 0 1 0]
[1 1 0 0]
[0 1 1 0]]

(a>0).astype(int)
This is one possible solution: convert the array to a boolean array according to your condition, then convert it from boolean to integer. For example, with a equal to
array([[ 0.63694991, -0.02785534,  0.07505496,  1.04719295],
       [-0.63054947, -0.26718763,  0.34228736,  0.16134474],
       [ 1.02107383, -0.49594998, -0.11044738,  0.64459594],
       [ 0.41280766,  0.668819  , -1.0636972 , -0.14684328]])
the result of (a>0).astype(int) is
array([[1, 0, 1, 1],
       [0, 0, 1, 1],
       [1, 0, 0, 1],
       [1, 1, 0, 0]])
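The np.where form and a boolean cast agree everywhere except at exactly zero, which the question does not specify. A small check on a hand-written array (the values are made up for illustration):

```python
import numpy as np

a = np.array([[-0.5, 2.0, 0.0],
              [1.5, -3.0, 0.25]])

by_where = np.where(a < 0, 0, 1)   # zero counts as "not negative" and becomes 1
by_cast = (a > 0).astype(int)      # zero counts as "not positive" and becomes 0

print(by_where)
print(by_cast)
```

With values drawn from np.random.randn an exact zero is very unlikely, so either form should do in practice.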

One-hot encoding of categories

I have a list similar to this:
list = ['Opinion, Journal, Editorial',
        'Opinion, Magazine, Evidence-based',
        'Evidence-based']
where the commas separate categories, e.g. Opinion and Journal are two separate categories. The real list is much larger and has more possible categories. I would like to use one-hot encoding to transform the list so that it can be used for machine learning. For example, from that list I would like to produce a sparse matrix containing data like:
list = [[1, 1, 1, 0, 0],
        [1, 0, 0, 1, 1],
        [0, 0, 0, 0, 1]]
Ideally, I would like to use scikit-learn's one hot encoder as I presume this would be the most efficient.
In response to @nbrayns's comment:
The idea is to transform the list of categories from text to a vector whereby an item is assigned 1 if it belongs to a category and 0 otherwise. For the above example, the headings would be:
headings = ['Opinion', 'Journal', 'Editorial', 'Magazine', 'Evidence-based']

If you are able to use Pandas, this functionality is essentially built-in there:
import pandas as pd
l = ['Opinion, Journal, Editorial', 'Opinion, Magazine, Evidence-based', 'Evidence-based']
pd.Series(l).str.get_dummies(', ')
   Editorial  Evidence-based  Journal  Magazine  Opinion
0          1               0        1         0        1
1          0               1        0         1        1
2          0               1        0         0        0
If you'd like to stick with the sklearn ecosystem, you are looking for MultiLabelBinarizer, not for OneHotEncoder. As the name implies, OneHotEncoder only supports one level per sample per category, while your dataset has multiple.
from sklearn.preprocessing import MultiLabelBinarizer
mlb = MultiLabelBinarizer() # pass sparse_output=True if you'd like
mlb.fit_transform(s.split(', ') for s in l)
[[1 0 1 0 1]
[0 1 0 1 1]
[0 1 0 0 0]]
To map the columns back to categorical levels, you can access mlb.classes_. For the above example, this gives ['Editorial' 'Evidence-based' 'Journal' 'Magazine' 'Opinion'].

One more way:
import numpy as np
l = ['Opinion, Journal, Editorial', 'Opinion, Magazine, Evidence-based', 'Evidence-based']
# Get list of unique classes (set order is arbitrary, so the column order may differ between runs)
classes = list(set([j for i in l for j in i.split(', ')]))
# e.g. ['Journal', 'Opinion', 'Editorial', 'Evidence-based', 'Magazine']
# Get (row, column) indices in the matrix
indices = np.array([[k, classes.index(j)] for k, i in enumerate(l) for j in i.split(', ')])
# => array([[0, 1],
#           [0, 0],
#           [0, 2],
#           [1, 1],
#           [1, 4],
#           [1, 3],
#           [2, 3]])
# Generate output
output = np.zeros((len(l), len(classes)), dtype=int)
output[indices[:, 0], indices[:, 1]] = 1
# => array([[1, 1, 1, 0, 0],
#           [0, 1, 0, 1, 1],
#           [0, 0, 0, 1, 0]])

This may not be the most efficient method, but it is probably easy to grasp.
If you don't already have a list of all possible words, you need to create one. In the code below it's called unique. The columns of the output matrix s then correspond to those unique words; the rows correspond to the items of the list.
import numpy as np
lis = ['Opinion, Journal, Editorial','Opinion, Magazine, Evidence-based','Evidence-based']
unique = list(set(", ".join(lis).split(", ")))
print(unique)
# prints e.g. ['Opinion', 'Journal', 'Magazine', 'Editorial', 'Evidence-based'] (set order may vary)
s = np.zeros((len(lis), len(unique)))
for i, item in enumerate(lis):
    for j, notion in enumerate(unique):
        if notion in item:
            s[i,j] = 1
print(s)
# for the order above, prints [[1. 1. 0. 1. 0.]
#                              [1. 0. 1. 0. 1.]
#                              [0. 0. 0. 0. 1.]]
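One caveat with the loop above: notion in item is a substring test, so a category such as 'Evidence' would also match 'Evidence-based'. A pure-Python sketch that splits each entry and matches whole categories, keeping columns in first-seen order so they line up with the question's headings (multi_hot is a hypothetical name):

```python
def multi_hot(rows, sep=", "):
    """Return (headings, matrix) with one 0/1 column per distinct category."""
    split_rows = [row.split(sep) for row in rows]
    # dict.fromkeys keeps first-seen order, so the column order is reproducible.
    headings = list(dict.fromkeys(cat for cats in split_rows for cat in cats))
    column = {cat: j for j, cat in enumerate(headings)}
    matrix = [[0] * len(headings) for _ in rows]
    for i, cats in enumerate(split_rows):
        for cat in cats:
            matrix[i][column[cat]] = 1
    return headings, matrix


data = ['Opinion, Journal, Editorial',
        'Opinion, Magazine, Evidence-based',
        'Evidence-based']
headings, matrix = multi_hot(data)
print(headings)
for row in matrix:
    print(row)
```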

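Very easy in pandas when each item has a single category:
import pandas as pd
s = pd.Series(['a','b','c'])
pd.get_dummies(s)
Output:
   a  b  c
0  1  0  0
1  0  1  0
2  0  0  1
(In pandas 2.0 and later the columns are boolean by default; pass dtype=int to get 0/1 as shown.)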
