I prefer Python over R for my work. From time to time I need to use R
functions, so I have started trying Rpy2 for that purpose.
I tried but failed to find out how to replicate the following with Rpy2:
design <- model.matrix(~Subject+Treat)
I have gone as far as this:
import rpy2.robjects as robjects
fmla = robjects.Formula('~subject+treatment')
env = fmla.environment
env['subject'] = sbj_group
env['treatment'] = trt_group
from what I saw here.
But I could not find how to perform model.matrix. I tried a couple of different ways:
robjects.r.model_matrix(fmla)
robjects.r('model.matrix(%s)' %fmla.r_repr())
As you can see none of them is right.
I am new to Rpy2, and fairly inexperienced in R. Any help would be appreciated!
You could evaluate strings as R code:
import numpy as np
import rpy2.robjects as ro
import rpy2.robjects.numpy2ri
ro.numpy2ri.activate()
R = ro.r
subject = np.repeat([1,2,3], 4)
treatment = np.tile([1,2,3,4], 3)
R.assign('subject', subject)
R.assign('treatment', treatment)
R('subject <- as.factor(subject)')
R('treatment <- as.factor(treatment)')
R('design <- model.matrix(~subject+treatment)')
R('print(design)')
yields
(Intercept) subject2 subject3 treatment2 treatment3 treatment4
1 1 0 0 0 0 0
2 1 0 0 1 0 0
3 1 0 0 0 1 0
4 1 0 0 0 0 1
5 1 1 0 0 0 0
6 1 1 0 1 0 0
7 1 1 0 0 1 0
8 1 1 0 0 0 1
9 1 0 1 0 0 0
10 1 0 1 1 0 0
11 1 0 1 0 1 0
12 1 0 1 0 0 1
attr(,"assign")
[1] 0 1 1 2 2 2
attr(,"contrasts")
attr(,"contrasts")$subject
[1] "contr.treatment"
attr(,"contrasts")$treatment
[1] "contr.treatment"
R(...) returns objects which you can manipulate on the Python side.
For example,
design = R('model.matrix(~subject+treatment)')
assigns a rpy2.robjects.vectors.Matrix to design.
arr = np.array(design)
makes arr the NumPy array
[[ 1. 0. 0. 0. 0. 0.]
[ 1. 0. 0. 1. 0. 0.]
[ 1. 0. 0. 0. 1. 0.]
[ 1. 0. 0. 0. 0. 1.]
[ 1. 1. 0. 0. 0. 0.]
[ 1. 1. 0. 1. 0. 0.]
[ 1. 1. 0. 0. 1. 0.]
[ 1. 1. 0. 0. 0. 1.]
[ 1. 0. 1. 0. 0. 0.]
[ 1. 0. 1. 1. 0. 0.]
[ 1. 0. 1. 0. 1. 0.]
[ 1. 0. 1. 0. 0. 1.]]
The column names can be accessed with
np.array(design.colnames)
# array(['(Intercept)', 'subject2', 'subject3', 'treatment2', 'treatment3',
# 'treatment4'],
# dtype='|S11')
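As a cross-check on what model.matrix is producing here, the same matrix can be rebuilt in plain NumPy. This is only a sketch under the assumption that both factors use R's default treatment contrasts (first level dropped as the baseline), which is what the printed "contr.treatment" attributes indicate. The helper name treatment_dummies is made up for this illustration and is not part of rpy2 or R.

```python
import numpy as np

def treatment_dummies(values):
    """Hypothetical helper: one 0/1 column per level, dropping the first
    (baseline) level, mimicking R's default contr.treatment coding."""
    levels = np.unique(values)
    return (values[:, None] == levels[1:]).astype(float)

subject = np.repeat([1, 2, 3], 4)
treatment = np.tile([1, 2, 3, 4], 3)

# Intercept column, then the dummy columns for each factor.
design = np.hstack([
    np.ones((subject.size, 1)),
    treatment_dummies(subject),
    treatment_dummies(treatment),
])
print(design)
```

The rows should line up with the R output above, for example the second row is intercept plus treatment2.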
for (i,j) in zip(Y_test,Y_pred_test):
    if np.logical_and((Y_test[i]==1),(Y_pred_test[j]==1)):
        TP += 1
    elif np.logical_and((Y_test[i]==1),(Y_pred_test[j] == 0)):
        FN += 1
    elif np.logical_and((Y_test[i]==0),(Y_pred_test[j]==1)):
        FP += 1
    elif np.logical_and((Y_test[i]==0),(Y_pred_test[j]==0)):
        TN += 1
Python - NumPy Question:
I need help. This section of my code keeps raising an error: "IndexError: index 1 is out of bounds for axis 0 with size 1"
I'm currently working out how to calculate TP, FP, TN, FN, accuracy, precision, recall, and F1 score.
Y_test data contains:
[[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]]
Y_pred_test data contains:
[[0. 0. 0. 0. 0. 1. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 1.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 1. 0. 1. 1. 0. 0. 0. 0. 0. 1. 0. 0.
1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 1. 1. 1. 1. 1. 1. 1. 0. 1. 1.
1. 0. 1. 1. 1. 1. 1. 0. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
1. 1. 1. 1. 0. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
1. 1. 1. 1. 1.]]
Because Y_test and Y_pred_test are 2-D arrays, zip iterates over their rows, so both i and j in the for loop are 1-D arrays. Y_test[i] then uses the values in i as indices into the rows of Y_test. Row 0 exists but row 1 does not, which causes the error message.
import numpy as np
Y_test = np.array( [[ 0, 0, 0, 1, 1, 1, 1 ]] ) # Easy to see test data
Y_pred_test = np.array( [[ 0., 1., 1., 0., 0., 1. ]] )
for (i,j) in zip(Y_test,Y_pred_test):
    print(i,'\n', j)
# [0 0 0 1 1 1 1] # i in the loop
# [0. 1. 1. 0. 0. 1.] # j in the loop
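One possible way around the problem is to flatten both arrays and count with boolean masks instead of looping. The sketch below uses small made-up data of equal length (not the asker's arrays) and assumes labels are exactly 0 and 1; the names y_true and y_pred are chosen for this example.

```python
import numpy as np

# Small illustrative data, shaped (1, n) like the arrays in the question.
Y_test = np.array([[0, 0, 0, 1, 1, 1]])
Y_pred_test = np.array([[0., 1., 1., 0., 1., 1.]])

# Flatten to 1-D so each position is one sample.
y_true = Y_test.ravel().astype(int)
y_pred = Y_pred_test.ravel().astype(int)

TP = int(np.sum((y_true == 1) & (y_pred == 1)))
FN = int(np.sum((y_true == 1) & (y_pred == 0)))
FP = int(np.sum((y_true == 0) & (y_pred == 1)))
TN = int(np.sum((y_true == 0) & (y_pred == 0)))

accuracy = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)  # assumes at least one predicted positive
recall = TP / (TP + FN)     # assumes at least one actual positive
f1 = 2 * precision * recall / (precision + recall)

print(TP, FN, FP, TN, accuracy, precision, recall, f1)
```

The divisions are unguarded, so data with no predicted or no actual positives would need an extra check.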
import numpy as np
x = np.ones((5,5))
print(x)
x[1:-1,1:-1] = 0
print(x)
I am getting the output as shown below:
[[1. 1. 1. 1. 1.]
[1. 0. 0. 0. 1.]
[1. 0. 0. 0. 1.]
[1. 0. 0. 0. 1.]
[1. 1. 1. 1. 1.]]
You can do it using astype, setting it to int:
print(x.astype(int))
Result:
[[1 1 1 1 1]
[1 0 0 0 1]
[1 0 0 0 1]
[1 0 0 0 1]
[1 1 1 1 1]]
I think you are referring to the "1." in the output. A trailing dot means the number is a float.
If you don't want floats, cast the array to integers:
x.astype(int)
A few other things to try in the Python console to see the difference:
print(type(1))
print(type(1.))
print(x.dtype)
print(x.astype(int).dtype)
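If the array is meant to hold integers from the start, another option is to pass dtype when creating it, which avoids the later cast. A minimal sketch of the same border pattern:

```python
import numpy as np

# Create the array as integers directly instead of casting afterwards.
x = np.ones((5, 5), dtype=int)
x[1:-1, 1:-1] = 0  # zero the interior, keep the border of ones
print(x)
print(x.dtype)
```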
I am trying to create an n-by-m matrix of 0s and 1s with a very simple structure:
[[1 0 0 0 0 0 0 ...],
[1 1 0 0 0 0 0 ...],
[1 1 1 0 0 0 0 ...],
[1 1 1 1 0 0 0 ...],
[0 1 1 1 1 0 0 ...],
[0 1 1 1 1 1 0 ...],
...
[... 0 0 0 1 1 1 1],
[... 0 0 0 0 1 1 1],
[... 0 0 0 0 0 1 1],
[... 0 0 0 0 0 0 1]]
However, I don't want to start writing loops as this is probably achievable using something built in: A = tf.constant(???,shape(n,m))
Note that after the first 3 rows there is simply a repetition of four 1s, followed by m-3 0s, until the last 3 rows.
So I am thinking something along the lines of a repeat of repeat, but I have no idea what syntax to use.
You're looking for tf.matrix_band_part(). As per the manual, its function is to
Copy a tensor setting everything outside a central band in each innermost matrix to zero.
So in your case you'd create a matrix with ones, and then take a 4-wide band like this:
tf.matrix_band_part( tf.ones( shape = ( 1, n, m ) ), 3, 0 )
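Tested code (TensorFlow 1.x API; in TensorFlow 2.x the function is named tf.linalg.band_part and no Session is needed):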
import tensorflow as tf
x = tf.ones( shape = ( 1, 9, 6 ) )
y = tf.matrix_band_part( x, 3, 0 )
with tf.Session() as sess:
    res = sess.run( y )
    print ( res )
Output:
[[[1. 0. 0. 0. 0. 0.]
[1. 1. 0. 0. 0. 0.]
[1. 1. 1. 0. 0. 0.]
[1. 1. 1. 1. 0. 0.]
[0. 1. 1. 1. 1. 0.]
[0. 0. 1. 1. 1. 1.]
[0. 0. 0. 1. 1. 1.]
[0. 0. 0. 0. 1. 1.]
[0. 0. 0. 0. 0. 1.]]]
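The band pattern can also be checked without TensorFlow. The sketch below is a NumPy stand-in, not the TensorFlow API: it keeps the main diagonal and the three diagonals below it by combining np.tril and np.triu, which should match a band with 3 lower and 0 upper diagonals.

```python
import numpy as np

n, m = 9, 6
ones = np.ones((n, m))

# np.tril keeps entries on or below the main diagonal (nothing above it);
# np.triu with k=-3 then drops everything more than 3 diagonals below.
band = np.triu(np.tril(ones), k=-3)
print(band)
```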
Say I have two options for generating the Adjacency Matrix of a network: nx.adjacency_matrix() and my own code. I wanted to test the correctness of my code and came up with some strange inequalities.
Example: a 3x3 lattice network.
import networkx as nx
import matplotlib.pyplot as plt
N=3
G=nx.grid_2d_graph(N,N)
pos = dict( (n, n) for n in G.nodes() )
labels = dict( ((i,j), i + (N-1-j) * N ) for i, j in G.nodes() )
nx.relabel_nodes(G,labels,False)
inds=sorted(labels.keys())
vals=sorted(labels.values())
pos2=dict(zip(vals,inds))
plt.figure()
nx.draw_networkx(G, pos=pos2, with_labels=True, node_size = 200)
This is the visualization:
The adjacency matrix with nx.adjacency_matrix():
B=nx.adjacency_matrix(G)
B1=B.todense()
[[0 0 0 0 0 1 0 0 1]
[0 0 0 1 0 1 0 0 0]
[0 0 0 1 0 1 0 1 1]
[0 1 1 0 0 0 1 0 0]
[0 0 0 0 0 0 0 1 1]
[1 1 1 0 0 0 0 0 0]
[0 0 0 1 0 0 0 1 0]
[0 0 1 0 1 0 1 0 0]
[1 0 1 0 1 0 0 0 0]]
According to it, node 0 (entire 1st row and entire 1st column) is connected to nodes 5 and 8. But if you look at the image above this is wrong, as it connects to nodes 1 and 3.
Now my code (to be run in the same script as the above):
import numpy
import math
P=3
def nodes_connected(i, j):
    try:
        if i in G.neighbors(j):
            return 1
    except nx.NetworkXError:
        return False
A=numpy.zeros((P*P,P*P))
for i in range(0,P*P,1):
    for j in range(0,P*P,1):
        if i not in G.nodes():
            A[i][:]=0
            A[:][i]=0
        elif i in G.nodes():
            A[i][j]=nodes_connected(i,j)
            A[j][i]=A[i][j]
for i in range(0,P*P,1):
    for j in range(0,P*P,1):
        if math.isnan(A[i][j]):
            A[i][j]=0
print(A)
This yields:
[[ 0. 1. 0. 1. 0. 0. 0. 0. 0.]
[ 1. 0. 1. 0. 1. 0. 0. 0. 0.]
[ 0. 1. 0. 0. 0. 1. 0. 0. 0.]
[ 1. 0. 0. 0. 1. 0. 1. 0. 0.]
[ 0. 1. 0. 1. 0. 1. 0. 1. 0.]
[ 0. 0. 1. 0. 1. 0. 0. 0. 1.]
[ 0. 0. 0. 1. 0. 0. 0. 1. 0.]
[ 0. 0. 0. 0. 1. 0. 1. 0. 1.]
[ 0. 0. 0. 0. 0. 1. 0. 1. 0.]]
which says that node 0 is connected to nodes 1 and 3. Why does this difference exist? What is wrong in this situation?
Networkx doesn't know what order you want the nodes to be in.
Here is how to call it: adjacency_matrix(G, nodelist=None, weight='weight').
If you want a specific order, set nodelist to be a list in that order.
So for example adjacency_matrix(G, nodelist=range(9)) should get what you want.
Why is this? Well, because a graph can have just about anything as its nodes (anything hashable). One of your nodes could have been "parrot" or (1,2). So it stores the nodes as keys in a dict, rather than assuming they are the non-negative integers starting at 0. The order of those keys is not guaranteed to be the numeric order you expect.
A more general solution, if your nodes have some logical ordering, as is the case when you generate a graph using G=nx.grid_2d_graph(3,3) (which returns tuples from (0,0) to (2,2)) or in your example, would be to use:
adjacency_matrix(G,nodelist=sorted(G.nodes()))
This sorts the list of nodes returned by G and passes it as the nodelist.
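To see why node order alone can make two correct adjacency matrices look different, here is a small NumPy-only sketch that does not use networkx. The storage order [2, 0, 1] is a made-up example, not what networkx would necessarily produce.

```python
import numpy as np

# Path graph 0 - 1 - 2, with rows/columns in the order [0, 1, 2].
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])

# Hypothetical storage order: the same nodes listed as [2, 0, 1].
order = [2, 0, 1]
B = A[np.ix_(order, order)]  # same graph, rows/columns permuted
print(B)

# Reordering with the inverse permutation recovers the original matrix,
# which is in effect what passing an explicit nodelist achieves.
inverse = np.argsort(order)
restored = B[np.ix_(inverse, inverse)]
print(restored)
```

B and A describe the same graph; only the labelling of rows and columns differs.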
I am very new to Python programming, so this might look very easy to most of the pros out there. I have a text file in the following format. I want to import only the numbers into a matrix, meaning I do not want the spaces (there is also a space at the start of each row) or the data label.
1 1 1 1 1 1 1 data_1
1 1 1 1 1 1 2 data_2
1 1 1 1 1 2 1 data_3
1 1 1 1 1 2 2 data_4
1 1 1 1 1 3 1 data_5
1 1 1 1 1 3 2 data_6
Use numpy.loadtxt, which assumes the data are delimited by whitespace by default and takes an argument usecols specifying which fields to use in building the array:
In [1]: import numpy as np
In [2]: matrix = np.loadtxt('matrix.txt', usecols=range(7))
In [3]: print(matrix)
[[ 1. 1. 1. 1. 1. 1. 1.]
[ 1. 1. 1. 1. 1. 1. 2.]
[ 1. 1. 1. 1. 1. 2. 1.]
[ 1. 1. 1. 1. 1. 2. 2.]
[ 1. 1. 1. 1. 1. 3. 1.]
[ 1. 1. 1. 1. 1. 3. 2.]]
If you want your matrix elements to be integers, pass dtype=int to loadtxt as well.
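A self-contained sketch of the dtype=int variant follows. It reads from an in-memory string via io.StringIO instead of matrix.txt so it can be run as-is; the three sample rows are copied from the question, including the leading space.

```python
import io
import numpy as np

# Stand-in for the file contents; each row starts with a space and
# ends with a text label that we do not want in the array.
text = """ 1 1 1 1 1 1 1 data_1
 1 1 1 1 1 1 2 data_2
 1 1 1 1 1 2 1 data_3
"""

# usecols limits parsing to the first seven fields, so the label column
# is never converted; dtype=int gives integers instead of floats.
matrix = np.loadtxt(io.StringIO(text), usecols=range(7), dtype=int)
print(matrix)
```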