import text file as matrix in numpy - python

I am very new to Python programming, so this might look very easy to most of the pros out there. I have a text file in the following format, and I want to import only the numbers into a matrix. That is, I do not want the spaces (there is also a space at the start of each row) or the data label.
1 1 1 1 1 1 1 data_1
1 1 1 1 1 1 2 data_2
1 1 1 1 1 2 1 data_3
1 1 1 1 1 2 2 data_4
1 1 1 1 1 3 1 data_5
1 1 1 1 1 3 2 data_6

Use numpy.loadtxt, which assumes the data are delimited by whitespace by default and takes an argument usecols specifying which fields to use in building the array:
In [1]: import numpy as np
In [2]: matrix = np.loadtxt('matrix.txt', usecols=range(7))
In [3]: print(matrix)
[[ 1. 1. 1. 1. 1. 1. 1.]
[ 1. 1. 1. 1. 1. 1. 2.]
[ 1. 1. 1. 1. 1. 2. 1.]
[ 1. 1. 1. 1. 1. 2. 2.]
[ 1. 1. 1. 1. 1. 3. 1.]
[ 1. 1. 1. 1. 1. 3. 2.]]
If you want your matrix elements to be integers, pass dtype=int to loadtxt as well.
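As a minimal sketch of the dtype=int variant, the snippet below reads a few of the rows from the question out of an in-memory buffer. The io.StringIO object is only a stand-in for 'matrix.txt' so that the example runs without a file on disk.

```python
import io
import numpy as np

# Stand-in for 'matrix.txt': same layout as in the question, including the
# leading space on each row and the trailing text label.
sample = io.StringIO(
    " 1 1 1 1 1 1 1 data_1\n"
    " 1 1 1 1 1 1 2 data_2\n"
    " 1 1 1 1 1 2 1 data_3\n"
)

# usecols keeps only the first seven fields, so the label column is never parsed.
matrix = np.loadtxt(sample, usecols=range(7), dtype=int)
print(matrix)
print(matrix.shape)  # (3, 7)
```

Passing a file name instead of the buffer works the same way.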

Related

IndexError: index 1 is out of bounds for axis 0 with size 1 - NumPy Python

for (i,j) in zip(Y_test,Y_pred_test):
    if np.logical_and((Y_test[i]==1),(Y_pred_test[j]==1)):
        TP += 1
    elif np.logical_and((Y_test[i]==1),(Y_pred_test[j] == 0)):
        FN += 1
    elif np.logical_and((Y_test[i]==0),(Y_pred_test[j]==1)):
        FP += 1
    elif np.logical_and((Y_test[i]==0),(Y_pred_test[j]==0)):
        TN += 1
Python - NumPy Question:
I need help. My code keeps coming up with an error on this particular section. The error states "IndexError: index 1 is out of bounds for axis 0 with size 1"
I'm currently writing up how to Calculate TP, FP, TN, FN, Accuracy, Precision, Recall, and F-1 score.
Y_test data contains:
[[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]]
Y_pred_test data contains:
[[0. 0. 0. 0. 0. 1. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 1.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 1. 0. 1. 1. 0. 0. 0. 0. 0. 1. 0. 0.
1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 1. 1. 1. 1. 1. 1. 1. 0. 1. 1.
1. 0. 1. 1. 1. 1. 1. 0. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
1. 1. 1. 1. 0. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
1. 1. 1. 1. 1.]]
Because Y_test and Y_pred_test are 2-D arrays of shape (1, N), zip iterates over their rows, so both i and j in the for loop are 1-D arrays. Y_test[i] then uses the values in i as indices into the rows of Y_test. Row 0 exists but row 1 does not, which causes the error message.
import numpy as np
Y_test = np.array( [[ 0, 0, 0, 1, 1, 1, 1 ]] ) # Easy to see test data
Y_pred_test = np.array( [[ 0., 1., 1., 0., 0., 1. ]] )
for (i,j) in zip(Y_test,Y_pred_test):
    print(i,'\n', j)
# [0 0 0 1 1 1 1] # i in the loop
# [0. 1. 1. 0. 0. 1.] # j in the loop
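One possible way around the error, sketched here under the assumption that both arrays hold a single row of 0/1 labels of equal length, is to flatten them and count with boolean masks instead of looping. The small arrays below are made-up test data, not the data from the question.

```python
import numpy as np

# Made-up example data with the same (1, N) shape as in the question.
Y_test = np.array([[0, 0, 0, 1, 1, 1, 1]])
Y_pred_test = np.array([[0., 1., 1., 0., 0., 1., 1.]])

# Flatten to 1-D so each element is a single label, not a whole row.
y_true = Y_test.ravel().astype(int)
y_pred = Y_pred_test.ravel().astype(int)

# Each mask marks the positions belonging to one cell of the confusion matrix.
TP = int(np.sum((y_true == 1) & (y_pred == 1)))
FN = int(np.sum((y_true == 1) & (y_pred == 0)))
FP = int(np.sum((y_true == 0) & (y_pred == 1)))
TN = int(np.sum((y_true == 0) & (y_pred == 0)))

print(TP, FN, FP, TN)  # 2 2 2 1
```

If the loop is preferred, iterating over zip(y_true, y_pred) and comparing i and j directly (rather than indexing with them) avoids the IndexError as well.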

How to get all possible array attributions of numpy arrays?

Python: get all possible array attributions of nd arrays. Use itertools.product?
If so, how?
In Python, I have two n-dimensional NumPy arrays A and B (B is a zero array),
such that A.shape[i] <= B.shape[i] for every axis i.
I want to write a for loop in which each iteration assigns A to a different sub-block of B, so that every possible position has been occupied by the end of the loop.
For instance, with A = np.array([[1,1,1],[1,1,1]]) and B = np.zeros((3,4)), I would get these (one per iteration):
1 1 1 0    0 1 1 1    0 0 0 0    0 0 0 0
1 1 1 0    0 1 1 1    1 1 1 0    0 1 1 1
0 0 0 0    0 0 0 0    1 1 1 0    0 1 1 1
For a fixed number of dimensions it is trivial: just use nested for loops, one per dimension.
However, I want it to work for a generic number of dimensions n.
My approach was to use itertools.product to get all combinations of indexes.
In the above example, product([0,1],[0,1]) would iterate over (0,0),(0,1),(1,0),(1,1), and I would have my indexes.
However, I don't know how to pass the arguments to the product function for a generic n.
Any ideas? Are there better ways of doing this?
itertools.product should work.
import numpy as np
from itertools import product
A = np.ones((2,3))
B = np.zeros((3,4))
r_rng = range(B.shape[0]-A.shape[0]+1)
c_rng = range(B.shape[1]-A.shape[1]+1)
for i,j in product(r_rng, c_rng):
    C = B.copy()
    C[i:i+A.shape[0],j:j+A.shape[1]]=A
    print(C,'\n')
Output:
[[1. 1. 1. 0.]
[1. 1. 1. 0.]
[0. 0. 0. 0.]]
[[0. 1. 1. 1.]
[0. 1. 1. 1.]
[0. 0. 0. 0.]]
[[0. 0. 0. 0.]
[1. 1. 1. 0.]
[1. 1. 1. 0.]]
[[0. 0. 0. 0.]
[0. 1. 1. 1.]
[0. 1. 1. 1.]]
Here is an example. You can use the * operator to unpack a variable number of arguments from a list and pass them to itertools.product():
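import itertools
import numpy as np
size1 = (3,5,6)
size2 = (2,2,2)
N = len(size1)
coords = []
for i in range(N):
    # Candidate start offsets along axis i. As written, this stops one short
    # of the last valid offset; use delta + 1 to include it.
    delta = size1[i]-size2[i]
    coords.append(list(range(delta)))
print(coords)
# Unpack the per-axis lists into separate arguments for product()
it = itertools.product(*coords)
arr = np.array(list(it))
print(arr)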
Output:
[[0 0 0]
[0 0 1]
[0 0 2]
[0 0 3]
[0 1 0]
[0 1 1]
[0 1 2]
[0 1 3]
[0 2 0]
[0 2 1]
[0 2 2]
[0 2 3]]
I'm going to post the solution I arrived at:
import numpy as np
from itertools import product
A=np.ones((2,3,2))
B=np.zeros((3,4,4))
coords=[]
for i in range(len(B.shape)):
    delta = B.shape[i]-A.shape[i]+1
    coords.append(list(range(delta)))
print(coords)
for start_idx in product(*coords):
    # One slice per axis, starting at this offset and spanning A's extent
    idx=tuple(slice(start_idx[i], start_idx[i]+A.shape[i]) for i in range(len(A.shape)))
    m=np.zeros(B.shape)
    m[idx] = A
    print(m)
PS: Indexing the n-d arrays was very tricky.
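The approach above can be wrapped in a small generator that works for any number of dimensions. This is only a sketch: iter_placements is a hypothetical helper name, and it assumes A and B have the same number of axes.

```python
import numpy as np
from itertools import product

def iter_placements(A, B):
    """Yield (offset, copy of B with A written at that offset) for every valid offset.

    Hypothetical helper; assumes A.ndim == B.ndim.
    """
    if A.ndim != B.ndim:
        raise ValueError("A and B must have the same number of dimensions")
    # Valid start offsets along each axis: 0 .. B.shape[k] - A.shape[k], inclusive.
    ranges = [range(b - a + 1) for a, b in zip(A.shape, B.shape)]
    for start in product(*ranges):
        # A tuple of slices indexes an n-dimensional block in one step.
        idx = tuple(slice(s, s + a) for s, a in zip(start, A.shape))
        C = B.copy()
        C[idx] = A
        yield start, C

A = np.ones((2, 3))
B = np.zeros((3, 4))
results = list(iter_placements(A, B))
print(len(results))  # 4 placements: 2 row offsets x 2 column offsets
```

Indexing with a tuple of slice objects is what makes the n-dimensional case no harder than the 2-D one.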

Output matrix prints with dots; I want to remove the dots in the array

import numpy as np
x = np.ones((5,5))
print(x)
x[1:-1,1:-1] = 0
print(x)
I am getting the output as shown below:
[[1. 1. 1. 1. 1.]
[1. 0. 0. 0. 1.]
[1. 0. 0. 0. 1.]
[1. 0. 0. 0. 1.]
[1. 1. 1. 1. 1.]]
You can do it using astype, setting it to int:
print(x.astype(int))
Result:
[[1 1 1 1 1]
[1 0 0 0 1]
[1 0 0 0 1]
[1 0 0 0 1]
[1 1 1 1 1]]
I think you are referring to the trailing dot in "1.". A trailing dot means the number is a float.
If you don't want floats, you should cast your array to integer:
x.astype(int)
Other things you can try in the Python console to understand this a little better:
print(type(1))
print(type(1.))
print(x.dtype)
print(x.astype(int).dtype)
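An alternative to casting afterwards, shown here as a small sketch, is to request an integer dtype when the array is created, so no float array is ever built:

```python
import numpy as np

# dtype=int makes np.ones return integers instead of the default float64.
x = np.ones((5, 5), dtype=int)
x[1:-1, 1:-1] = 0  # zero out the interior, leaving a border of ones
print(x)
```

Either way the values are the same; only the dtype, and therefore the printed form, differs.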

Initializing a matrix with alternating 0s and 1s in TensorFlow

I am trying to create an n-by-m matrix of 0s and 1s with a very simple structure:
[[1 0 0 0 0 0 0 ...],
[1 1 0 0 0 0 0 ...],
[1 1 1 0 0 0 0 ...],
[1 1 1 1 0 0 0 ...],
[0 1 1 1 1 0 0 ...],
[0 1 1 1 1 1 0 ...],
...
[... 0 0 0 1 1 1 1],
[... 0 0 0 0 1 1 1],
[... 0 0 0 0 0 1 1],
[... 0 0 0 0 0 0 1]]
However, I don't want to start writing loops as this is probably achievable using something built in: A = tf.constant(???,shape(n,m))
Note that after the first 3 rows there is simply a repetition of four 1s, followed by m-3 0s, until the last 3 rows.
So I am thinking something along the lines of a repeat of repeat, but I have no idea what syntax to use.
You're looking for tf.matrix_band_part() (named tf.linalg.band_part() in TensorFlow 2.x). As per the manual, its function is to
Copy a tensor setting everything outside a central band in each innermost matrix to zero.
So in your case you'd create a matrix with ones, and then take a 4-wide band like this:
tf.matrix_band_part( tf.ones( shape = ( 1, n, m ) ), 3, 0 )
Tested code:
import tensorflow as tf
x = tf.ones( shape = ( 1, 9, 6 ) )
y = tf.matrix_band_part( x, 3, 0 )
with tf.Session() as sess:
    res = sess.run( y )
    print ( res )
Output:
[[[1. 0. 0. 0. 0. 0.]
[1. 1. 0. 0. 0. 0.]
[1. 1. 1. 0. 0. 0.]
[1. 1. 1. 1. 0. 0.]
[0. 1. 1. 1. 1. 0.]
[0. 0. 1. 1. 1. 1.]
[0. 0. 0. 1. 1. 1.]
[0. 0. 0. 0. 1. 1.]
[0. 0. 0. 0. 0. 1.]]]

use model.matrix through rpy2?

I prefer Python over R for my work. From time to time I need to use R
functions, so I have started trying rpy2 for that purpose.
I tried but failed to find out how to replicate the following with rpy2:
design <- model.matrix(~Subject+Treat)
I have gone as far as this:
import rpy2.robjects as robjects
fmla = robjects.Formula('~subject+treatment')
env = fmla.environment
env['subject'] = sbj_group
env['treatment'] = trt_group
from what I saw here.
But I could not find how to perform model.matrix. I tried a couple of different ways:
robjects.r.model_matrix(fmla)
robjects.r('model.matrix(%s)' %fmla.r_repr())
As you can see none of them is right.
I am new to Rpy2, and fairly inexperienced in R. Any help would be appreciated!
You could evaluate strings as R code:
import numpy as np
import rpy2.robjects as ro
import rpy2.robjects.numpy2ri
ro.numpy2ri.activate()
R = ro.r
subject = np.repeat([1,2,3], 4)
treatment = np.tile([1,2,3,4], 3)
R.assign('subject', subject)
R.assign('treatment', treatment)
R('subject <- as.factor(subject)')
R('treatment <- as.factor(treatment)')
R('design <- model.matrix(~subject+treatment)')
R('print(design)')
yields
   (Intercept) subject2 subject3 treatment2 treatment3 treatment4
1            1        0        0          0          0          0
2            1        0        0          1          0          0
3            1        0        0          0          1          0
4            1        0        0          0          0          1
5            1        1        0          0          0          0
6            1        1        0          1          0          0
7            1        1        0          0          1          0
8            1        1        0          0          0          1
9            1        0        1          0          0          0
10           1        0        1          1          0          0
11           1        0        1          0          1          0
12           1        0        1          0          0          1
attr(,"assign")
[1] 0 1 1 2 2 2
attr(,"contrasts")
attr(,"contrasts")$subject
[1] "contr.treatment"
attr(,"contrasts")$treatment
[1] "contr.treatment"
R(...) returns objects which you can manipulate on the Python side.
For example,
design = R('model.matrix(~subject+treatment)')
assigns a rpy2.robjects.vectors.Matrix to design.
arr = np.array(design)
makes arr the NumPy array
[[ 1. 0. 0. 0. 0. 0.]
[ 1. 0. 0. 1. 0. 0.]
[ 1. 0. 0. 0. 1. 0.]
[ 1. 0. 0. 0. 0. 1.]
[ 1. 1. 0. 0. 0. 0.]
[ 1. 1. 0. 1. 0. 0.]
[ 1. 1. 0. 0. 1. 0.]
[ 1. 1. 0. 0. 0. 1.]
[ 1. 0. 1. 0. 0. 0.]
[ 1. 0. 1. 1. 0. 0.]
[ 1. 0. 1. 0. 1. 0.]
[ 1. 0. 1. 0. 0. 1.]]
The column names can be accessed with
np.array(design.colnames)
# array(['(Intercept)', 'subject2', 'subject3', 'treatment2', 'treatment3',
# 'treatment4'],
# dtype='|S11')
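For readers unfamiliar with what model.matrix produces here, the following pure-NumPy sketch rebuilds the same array by hand: an intercept column plus one indicator column for every factor level except the first (treatment contrasts). It does not use rpy2, and treatment_dummies is a made-up helper name.

```python
import numpy as np

def treatment_dummies(values):
    """Indicator columns for every level except the first (the reference level).

    Hypothetical helper; levels are taken in sorted order.
    """
    values = np.asarray(values)
    levels = np.unique(values)
    return (values[:, None] == levels[1:][None, :]).astype(float)

subject = np.repeat([1, 2, 3], 4)
treatment = np.tile([1, 2, 3, 4], 3)

# Columns: (Intercept), subject2, subject3, treatment2, treatment3, treatment4
design = np.hstack([
    np.ones((subject.size, 1)),
    treatment_dummies(subject),
    treatment_dummies(treatment),
])
print(design.shape)  # (12, 6)
```

This covers only the simple additive two-factor case shown above; formulas with interactions or other contrast settings are better left to R.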
