Python code for converting two-mode matrix data into one-mode

As a Python novice, I am wondering whether there is a way to convert two-mode matrix data into one-mode data using Python code.
This is to create data for social network analysis, and I am looking for a way to use Pandas and Python to write such a script.
Suppose I have two-mode data like this:
workshop1 workshop2 workshop3 workshop4
A 1 0 1 1
B 0 1 1 0
C 1 1 1 0
D 0 0 0 1
I need to convert that into one mode matrix like this.
A B C D
A 4 1 2 1
B 1 4 2 0
C 2 2 4 0
D 1 0 0 4
A,B,C,D are names of persons registered for workshops, "1" means the specific person signed up for the workshop.
The one-mode matrix indicates how many times two people are expected to meet at workshops. For example, A and C are expected to meet twice, at workshop1 and workshop3.
Thank you in advance for any advice or help!

NumPy solution:
According to this tutorial you can simply multiply the matrix with its transpose. Using the NumPy functions dot() and transpose() (or the short form T) you end up with the following code:
import numpy as np
M = np.array([
    [1,0,1,1],
    [0,1,1,0],
    [1,1,1,0],
    [0,0,0,1]])
print(M.dot(M.T))
Output:
[[3 1 2 1]
[1 2 2 0]
[2 2 3 0]
[1 0 0 1]]
Only the diagonal differs from what you requested: instead of 4, it contains the number of workshops each person attended. If you really want 4 there, np.fill_diagonal(A, 4) overwrites the diagonal in place, with A being the one-mode matrix.
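Since the question asks about Pandas, the same product can be sketched with labelled data. The DataFrame construction and the names two_mode, co_attendance and one_mode are assumptions for illustration, with labels taken from the example tables in the question; this is one possible approach, not the only one.

```python
import numpy as np
import pandas as pd

# Two-mode (person x workshop) data from the question.
two_mode = pd.DataFrame(
    [[1, 0, 1, 1],
     [0, 1, 1, 0],
     [1, 1, 1, 0],
     [0, 0, 0, 1]],
    index=["A", "B", "C", "D"],
    columns=["workshop1", "workshop2", "workshop3", "workshop4"],
)

# Person x person matrix: entry (p, q) counts the workshops shared by p and q.
co_attendance = two_mode.dot(two_mode.T)

# Overwrite the diagonal with 4, as requested in the question.
values = co_attendance.to_numpy(copy=True)
np.fill_diagonal(values, 4)
one_mode = pd.DataFrame(values, index=co_attendance.index,
                        columns=co_attendance.columns)

print(one_mode)
```

Keeping co_attendance separate from one_mode preserves the original diagonal (workshops attended per person) in case it is needed later.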
No-NumPy solution:
In case you don't want to use NumPy, you can adapt a standard matrix multiplication to the special case of "R = M*M^T":
m = len(M)
n = len(M[0])
R = [[0 for i in range(m)] for j in range(m)]
for i in range(m):
    for j in range(m):
        for k in range(n):
            R[i][j] += M[i][k] * M[j][k]
Or the corresponding one-liner with 3 indices i, j and k:
R = [[sum(M[i][k] * M[j][k] for k in range(len(M[0]))) for i in range(len(M))] for j in range(len(M))]
Or directly iterating over rows of M:
R = [[sum(a * b for a, b in zip(A, B)) for B in M] for A in M]

Related

Creating submatrix in python

Given a matrix S and a binary matrix W, I want to create a submatrix of S corresponding to the non zero coordinates of W.
For example:
S = [[1,1],[1,2],[1,3],[1,4],[1,5]]
W = [[1,0,0],[1,1,0],[1,1,1],[0,1,1],[0,0,1]]
I want to get matrices
S_1 = [[1,1],[1,2],[1,3]]
S_2 = [[1,2],[1,3],[1,4]]
S_3 = [[1,3],[1,4],[1,5]]
I couldn't figure out a slick way to do this in python. The best I could do for each S_i is
S_1 = S[0,:]
for i in range(np.shape(W)[0]):
    if W[i, 0] == 1:
        S_1 = np.vstack((S_1, S[i, :]))
But if I want to change the dimensions of the problem and have, say, 100 S_i's, writing a for loop for each one seems a bit ugly. (Side note: S_1 should be initialized to some empty 2D array, but I couldn't get that to work, so I initialized it to S[0,:] as a placeholder.)
EDIT: To clarify what I mean:
I have a matrix S
1 1
1 2
1 3
1 4
1 5
and I have a binary matrix
1 0 0
1 1 0
1 1 1
0 1 1
0 0 1
Given the first column of the binary matrix W
1
1
1
0
0
The 1's are in the first, second, and third positions. So I want to create a corresponding submatrix of S with just the first, second and third positions of every column, so S_1 (corresponding to the 1st column of W) is
1 1
1 2
1 3
Similarly, if we look at the third column of W
0
0
1
1
1
The 1's are in the last three coordinates and so I want a submatrix of S with just the last three coordinates of every column, called S_3
1 3
1 4
1 5
So given any ith column of the binary matrix, I'm looking to generate a submatrix S_i where the columns of S_i contain the columns of S, but only the entries corresponding to the positions of the 1's in the ith column of the binary matrix.
It probably is more useful to work with the transpose of W rather than W itself, both for human-readability and to facilitate writing the code. This means that the entries that affect each S_i are grouped together in one of the inner parentheses of W, i.e. in a row of W rather than a column as you have it now.
Then, S_i = np.array([S[j,:] for j in range(np.shape(S)[0]) if W_T[i,j] == 1]), where W_T is the transpose of W. If you need/want to stick with W as is, you need to reverse the indices i and j.
As for the outer loop, you could try to nest this in another similar comprehension without an if statement--however this might be awkward since you aren't actually building one output matrix (the S_i can easily be different dimensions, unless you're somehow guaranteed to have the same number of 1s in every column of W). This in fact raises the question of what you want--a list of these arrays S_i? Otherwise if they are separate variables as you have it written, there's no good way to refer to them in a generalizable way as they don't have indices.
Numpy can do this directly.
import numpy as np
S = np.array([[1,1],[1,2],[1,3],[1,4],[1,5]])
W = np.array([[1,0,0],[1,1,0],[1,1,1],[0,1,1],[0,0,1]])
for col in range(W.shape[1]):  # one submatrix per column of W
    print(S[W[:,col]==1])
Output:
[[1 1]
[1 2]
[1 3]]
[[1 2]
[1 3]
[1 4]]
[[1 3]
[1 4]
[1 5]]
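If the submatrices need to be kept rather than printed, one option is to collect them in a list indexed by column of W. The following is a minimal sketch under that assumption; the name submatrices is made up for illustration.

```python
import numpy as np

S = np.array([[1, 1], [1, 2], [1, 3], [1, 4], [1, 5]])
W = np.array([[1, 0, 0], [1, 1, 0], [1, 1, 1], [0, 1, 1], [0, 0, 1]])

# Iterating over W.T yields the columns of W one at a time.
# Each column, cast to bool, is used as a row mask on S.
submatrices = [S[column.astype(bool)] for column in W.T]

for i, sub in enumerate(submatrices, start=1):
    print("S_%d =" % i)
    print(sub)
```

A list works even when the columns of W contain different numbers of ones, because each entry can have its own shape.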

Python 9x9 and 3x3 array validation excluding 0

I am trying to check whether a 9x9 array contains duplicate numbers in any row or column. Zeros must be excluded from the check, as they are the cells I will solve later; only the numbers 1 to 9 count. An example input array would be:
[[1 0 0 7 0 0 0 0 0]
[0 3 2 0 0 0 0 0 0]
[0 0 0 6 0 0 0 0 0]
[0 8 0 0 0 2 0 7 0]
[5 0 7 0 0 1 0 0 0]
[0 0 0 0 0 3 6 1 0]
[7 0 0 0 0 0 2 0 9]
[0 0 0 0 5 0 0 0 0]
[3 0 0 0 0 4 0 0 5]]
Here is where I am currently with my code for this:
#Checking Columns
for c in range(9):
    line = (test[:,c])
    print(np.unique(line).shape == line.shape)
#Checking Rows
for r in range(9):
    line = (test[r,:])
    print(np.unique(line).shape == line.shape)
Then I would like to do the exact same for the 3x3 sub arrays in the 9x9 array. Again I need to somehow exclude the 0 from the check. Here is the code I currently have:
for r0 in range(3,9,3):
    for c0 in range(3,9,3):
        test1 = test[:r0,:c0]
        for r in range(3):
            line = (test1[r,:])
            print(np.unique(line).shape == line.shape)
        for c in range(3):
            line = (test1[:,c])
            print(np.unique(line).shape == line.shape)
I would truly appreciate assistance in this regard.
It sure sounds like you're trying to verify the input of a Sudoku board.
You can extract a box as:
for r0 in range(0, 9, 3):
    for c0 in range(0, 9, 3):
        box = test[r0:r0+3, c0:c0+3]  # slice the full 9x9 array, not a sub-slice
        # ... test that np.unique(box) has 9 elements ...
Note that this is only about how to extract the elements of the box. You still haven't done anything about removing the zeros, here or on the rows and columns.
Given a box/row/column, you then want something like:
nonzeros = [x for x in box.flatten() if x != 0]
assert len(nonzeros) == len(set(nonzeros))
There may be a more numpy-friendly way to do this, but this should be fast enough.
Excluding zeros is fairly straightforward if you mask the array:
test = np.array(test)
non_zero_mask = (test != 0)
At this point you can either check the whole matrix for uniqueness
np.unique(test[non_zero_mask])
or you can do it for individual rows/columns
non_zero_row_0 = test[0, non_zero_mask[0]]
unique_0 = np.unique(non_zero_row_0)
You can add the logic above into a loop to get the behavior you want.
As for the 3x3 subarrays, you can loop through them as you did in your example.
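Putting the masking idea and the box slicing together, a complete check might look like the sketch below. The helper names has_no_duplicates and is_valid_board are hypothetical, and the board is the example array from the question.

```python
import numpy as np

def has_no_duplicates(values):
    """True if the non-zero entries of `values` are all distinct."""
    nonzero = values[values != 0]  # boolean mask drops zeros (and flattens)
    return np.unique(nonzero).size == nonzero.size

def is_valid_board(board):
    """Check rows, columns and 3x3 boxes of a 9x9 board, ignoring zeros."""
    board = np.asarray(board)
    rows_ok = all(has_no_duplicates(board[r, :]) for r in range(9))
    cols_ok = all(has_no_duplicates(board[:, c]) for c in range(9))
    boxes_ok = all(
        has_no_duplicates(board[r0:r0 + 3, c0:c0 + 3])
        for r0 in range(0, 9, 3)
        for c0 in range(0, 9, 3)
    )
    return rows_ok and cols_ok and boxes_ok

board = np.array([
    [1, 0, 0, 7, 0, 0, 0, 0, 0],
    [0, 3, 2, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 6, 0, 0, 0, 0, 0],
    [0, 8, 0, 0, 0, 2, 0, 7, 0],
    [5, 0, 7, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 3, 6, 1, 0],
    [7, 0, 0, 0, 0, 0, 2, 0, 9],
    [0, 0, 0, 0, 5, 0, 0, 0, 0],
    [3, 0, 0, 0, 0, 4, 0, 0, 5],
])

print(is_valid_board(board))  # True for the board from the question
```

The check only reports whether the given digits conflict; it says nothing about whether the puzzle is solvable.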
When you have a small collection of things (small being <=64 or 128, depending on architecture), you can turn it into a set using bits. So for example:
bits = ((2**board) >> 1).astype(np.uint16)
Notice that you have to use right shift after the fact rather than pre-subtracting 1 from board to cleanly handle zeros.
You can now compute three types of sets. Each set is the bitwise OR of bits in a particular arrangement. For this example, you can use sum just the same:
rows = bits.sum(axis=1)
cols = bits.sum(axis=0)
blocks = bits.reshape(3, 3, 3, 3).sum(axis=(1, 3))
Now all you have to do is compare the bit counts of each number to the number of non-zero elements. They will be equal if and only if there are no duplicates. Duplicates will cause the bit count to be smaller.
There are pretty efficient algorithms for counting bits, especially for something as small as a uint16. Here is an example: How to count the number of set bits in a 32-bit integer?. I've adapted it for the smaller size and numpy here:
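def count_bits16(arr):
    count = arr - ((arr >> 1) & 0x5555)
    count = (count & 0x3333) + ((count >> 2) & 0x3333)
    count = (count + (count >> 4)) & 0x0F0F  # fold nibbles into per-byte counts
    # Add the two byte counts; the mask also covers wider dtypes such as the
    # platform integer that sum() returns for uint16 input.
    return ((count * 0x0101) >> 8) & 0xFF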
This is the count of unique elements for each of the configurations. You need to compare it to the number of non-zero elements. The following boolean will tell you if the board is valid:
(count_bits16(rows) == np.count_nonzero(board, axis=1)).all() and \
(count_bits16(cols) == np.count_nonzero(board, axis=0)).all() and \
(count_bits16(blocks) == np.count_nonzero(board.reshape(3, 3, 3, 3), axis=(1, 3))).all()
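For reference, the bit-set approach can be assembled into one runnable sketch. Two things here are assumptions rather than part of the answer: the function name is_valid_bits, and the use of np.bitwise_or.reduce instead of sum to build the sets. The count_bits16 variant shown includes a nibble-folding step and a final mask so that it gives correct counts for any 16-bit value held in a wider integer type.

```python
import numpy as np

def count_bits16(arr):
    """Population count of each value, assuming values fit in 16 bits."""
    count = np.asarray(arr).astype(np.int64)
    count = count - ((count >> 1) & 0x5555)             # 2-bit partial counts
    count = (count & 0x3333) + ((count >> 2) & 0x3333)  # 4-bit partial counts
    count = (count + (count >> 4)) & 0x0F0F             # per-byte counts
    return ((count * 0x0101) >> 8) & 0xFF               # add the two bytes

def is_valid_bits(board):
    """True if no row, column or 3x3 block repeats a non-zero digit."""
    board = np.asarray(board)
    # Digit d (1..9) becomes bit d-1; zero becomes the empty set (0).
    bits = (1 << board) >> 1
    rows = np.bitwise_or.reduce(bits, axis=1)
    cols = np.bitwise_or.reduce(bits, axis=0)
    # Axes of the reshaped array: (block row, row in block, block col, col in block).
    grouped = bits.reshape(3, 3, 3, 3)
    blocks = np.bitwise_or.reduce(np.bitwise_or.reduce(grouped, axis=3), axis=1)
    return bool(
        np.array_equal(count_bits16(rows), np.count_nonzero(board, axis=1))
        and np.array_equal(count_bits16(cols), np.count_nonzero(board, axis=0))
        and np.array_equal(
            count_bits16(blocks),
            np.count_nonzero(board.reshape(3, 3, 3, 3), axis=(1, 3)),
        )
    )

board = np.array([
    [1, 0, 0, 7, 0, 0, 0, 0, 0],
    [0, 3, 2, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 6, 0, 0, 0, 0, 0],
    [0, 8, 0, 0, 0, 2, 0, 7, 0],
    [5, 0, 7, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 3, 6, 1, 0],
    [7, 0, 0, 0, 0, 0, 2, 0, 9],
    [0, 0, 0, 0, 5, 0, 0, 0, 0],
    [3, 0, 0, 0, 0, 4, 0, 0, 5],
])

print(is_valid_bits(board))  # True for the board from the question
```

With a bitwise OR, a repeated digit simply sets the same bit again, so the bit count falls below the number of non-zero cells whenever a duplicate is present.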

Create variable padding around 1d numpy array

arr= [1,2,3,4]
k = 4 (can be different)
The result should be a 2D array. How can I do this without using any loop, and without hard-coding k?
k and arr can vary with the input.
The solution must use numpy.pad.
[[1,2,3,4,0,0,0], #k-1 zeros
[0,1,2,3,4,0,0],
[0,0,1,2,3,4,0],
[0,0,0,1,2,3,4]]
If you really have to do it without a loop (for educational purposes)
np.pad(np.tile(arr,[k,1]), [(0,0),(0,k)]).reshape(-1)[:-k].reshape(k,-1)
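The one-liner above is easier to follow when split into steps. The sketch below wraps it in a function; the name shifted_rows is hypothetical, and it assumes k >= 1.

```python
import numpy as np

def shifted_rows(arr, k):
    """Return k rows, where row i is `arr` shifted right by i zeros (k >= 1)."""
    arr = np.asarray(arr)
    # Step 1: k identical copies of arr, shape (k, n).
    tiled = np.tile(arr, (k, 1))
    # Step 2: append k zeros to every row, shape (k, n + k).
    padded = np.pad(tiled, [(0, 0), (0, k)], mode="constant")
    # Step 3: flatten, drop the last k zeros, and re-cut into rows of
    # length n + k - 1. Each row is one element shorter than before, so
    # every new row starts one position later: the shift comes for free.
    return padded.reshape(-1)[:-k].reshape(k, -1)

print(shifted_rows([1, 2, 3, 4], 4))
```

The loop-free trick relies on the row length changing from n + k to n + k - 1 during the final reshape, which is why exactly k trailing zeros are dropped.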
Using list comprehension as a one liner :
import numpy as np
arr= np.array([1,2,3,4])
k = 4
print(np.array([np.pad(arr, (i, k - 1 - i)) for i in range(k)]))
Out :
[[1 2 3 4 0 0 0]
[0 1 2 3 4 0 0]
[0 0 1 2 3 4 0]
[0 0 0 1 2 3 4]]

How to reduce the nested for Loop complexity to a single loop in python?

for i in range(0,x):
    for j in range(0,y):
        if (i+j)%2 == 0:
Think of it as tossing two dice at the same time and checking whether their sum is even. The catch is that a normal die has 6 sides, but here the two dice can have any number of sides, and the numbers need not be equal.
Can anyone suggest how to merge it under one loop because I can't think of any?
Based on "Python combine two for loops", you can merge the two for loops into a single line by importing itertools, as below:
import itertools
for i, j in itertools.product(range(0,x), range(0,y)):
    if (i+j)%2 == 0:
You can't get rid of the nested loop (you could hide it, for example by using itertools.product, but it would still be executed somewhere, and the complexity would still be O(x * y)). You can, however, get rid of the condition, if you only need the values of j that satisfy it, by adapting the range for j.
This way you run about half as many iterations, because the useless ones are skipped.
x, y = 3, 4  # values that produce the output below
for i in range(0,x):
    for j in range(i%2,y, 2):
        print(i, j, i+j)
Output:
0 0 0
0 2 2
1 1 2
1 3 4
2 0 2
2 2 4
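To see that stepping j by 2 from i % 2 visits exactly the pairs the original condition would keep, the two versions can be compared directly. This is a small sketch with hypothetical function names.

```python
def even_sum_pairs_filtered(x, y):
    """All (i, j) with an even sum, found by testing every pair."""
    return [(i, j) for i in range(x) for j in range(y) if (i + j) % 2 == 0]

def even_sum_pairs_stepped(x, y):
    """Same pairs, generated directly: j starts at the parity of i and steps by 2."""
    return [(i, j) for i in range(x) for j in range(i % 2, y, 2)]

# The two approaches agree for a range of sizes, including unequal ones.
for x in range(6):
    for y in range(6):
        assert even_sum_pairs_filtered(x, y) == even_sum_pairs_stepped(x, y)

print(even_sum_pairs_stepped(3, 4))
```

The stepped version still takes time proportional to x * y, but it does roughly half the work because no pair is generated only to be rejected.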
For me it's much cleaner to leave it as two loops. It's much more readable and easier to understand what's happening. However, you could loop over x * y values and use divmod to calculate i and j:
x = 2
y = 3
for i in range(0,x):
    for j in range(0,y):
        print(i, j, i+j)
print("###")
for r in range(x*y):
    i, j = divmod(r, y)
    print(i, j, i + j)
OUTPUT
0 0 0
0 1 1
0 2 2
1 0 1
1 1 2
1 2 3
###
0 0 0
0 1 1
0 2 2
1 0 1
1 1 2
1 2 3

Find the difference between all the values in the array and check whether difference in present in the array

I want to write a program that calculates the difference between every pair of values in an array and checks whether that difference is also present in the array.
For example,
a = [1,2,4,5]
for i in range(len(a)):
    j = i+1
    for j in range(len(a)):
        dif = a[i] - a[j]
        if dif in a:
            print(a[i], a[j], dif)
The output here would be,
2 1 1
4 2 2
5 1 4
5 4 1
Is there a more efficient way of doing this? I don't want to use any built-in Python functions here. Is it possible to improve the algorithm without them?
Any help would be appreciated.
Thanks
You may use itertools.combinations to achieve this:
from itertools import combinations
a = [1,2,4,5]
for i, j in combinations(a, 2):
    dif = j - i # OR, dif = abs(j - i) for checking against absolute value
    if dif in a:
        print(j, i, dif)
Above code will print:
2 1 1
5 1 4
4 2 2
5 4 1
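On the efficiency question, the membership test `dif in a` scans the whole list each time. If built-ins are acceptable after all, a set makes that lookup constant time on average. The sketch below is one way to do it; the function name is hypothetical, and it sorts the input so that every difference is non-negative.

```python
from itertools import combinations

def pairs_with_difference_present(values):
    """Return (larger, smaller, difference) for each pair whose difference is in values."""
    present = set(values)  # average O(1) membership tests
    result = []
    # Sorting guarantees small <= big for every pair from combinations().
    for small, big in combinations(sorted(values), 2):
        dif = big - small
        if dif in present:
            result.append((big, small, dif))
    return result

for big, small, dif in pairs_with_difference_present([1, 2, 4, 5]):
    print(big, small, dif)
```

The number of pairs is still quadratic in the length of the list; only the per-pair lookup gets cheaper.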
