How to form a matrix from one column in a file?

I have a file that contains this column of data:
1.0000000000000002
0.6593496737729044
1.0000000000000002
I can read this data from the file, and I want to form a 2*2 matrix from it. I tried a lot, but I got the wrong output.
My code:
with open("final_overlap.txt", "r") as final_over:
    for i in range(2):
        for j in range(2):
            i = final_over.readline()
            j = final_over.readline()
            S = np.array([i,j])
            print(S)
The output I want looks like this:
[[1.0000000000000002 0.6593496737729044]
[0.6593496737729044 1.0000000000000002]]
How can I form this matrix?
Note that I have another input with more data, so I want a method that can form matrices of other sizes, not only 2*2.
For example, this input:
1 1 1.0000000000000002
2 1 0.6593496737729044
2 2 1.0000000000000002
3 1 0.1192165290691592
3 2 0.0954901018165798
3 3 1.0000000000000002
4 1 0.0954901018165798
4 2 0.1192165290691592
4 3 0.6593496737729044
4 4 1.0000000000000002
which should give a 4*4 matrix.
One more question about the matrix: I got the right answer for that case, but what if I have input like this:
1 1 1 1 0.7746059439198979
2 1 1 1 0.4441350695399573
2 1 2 1 0.2970603935859659
2 2 1 1 0.5696940113278337
2 2 2 1 0.4441350695399575
2 2 2 2 0.7746059439198979
I tried this code, but I got the error "list index out of range":
for line in open('Two_Electron.txt'):
    r,c,d,e,v = line.split()
    r = int(r)-1
    c = int(c)-1
    d = int(d)-1
    e = int(e)-1
    v = float(v)
    if c == 0:
        data.append( [v] )
    else:
        data[-1].append(v)
print(data)
# Fill in the upper triangle.
for i in range(len(data)-1):
    for j in range(i+1,len(data)):
        data[i].append( data[j][i] )
for k in range(len(data)-1):
    for l in range(k+1,len(data)):
        data[k].append( data[l][k] )
V_ee = np.array(data)
The output I should get:
[[[[0.77460594 0.4441351 ]
[0.4441351 0.56969403]]
[[0.4441351 0.29706043]
[0.29706043 0.4441351 ]]]
[[[0.4441351 0.29706043]
[0.29706043 0.4441351 ]]
[[0.56969403 0.4441351 ]
[0.4441351 0.77460594]]]]

Load the data into a simple list, then build the rows from the list.
import numpy as np
with open("final_overlap.txt", "r") as final_over:
    data = [float(line) for line in final_over]
S = np.array( [data[0:2], data[1:]] )
print(S)
Output:
[[1. 0.65934967]
[0.65934967 1. ]]
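The slicing above is fixed to 2*2. A size-independent sketch, assuming the single column lists the lower triangle row by row (the same order as the third column of the 4*4 example: (1,1), (2,1), (2,2), (3,1), ...). Row k of the lower triangle holds exactly k values, so the length of the current row works as the counter that decides when to start a new one. The name `matrix_from_column` is hypothetical.

```python
import numpy as np

def matrix_from_column(values):
    """Rebuild a symmetric matrix from its lower triangle listed row by row.

    Hypothetical helper: assumes the order (1,1), (2,1), (2,2), (3,1), ...
    """
    rows = []
    for v in values:
        # Row k (1-based) of the lower triangle holds k values;
        # start a new row once the current one is full.
        if not rows or len(rows[-1]) == len(rows):
            rows.append([])
        rows[-1].append(v)
    n = len(rows)
    if n == 0 or len(rows[-1]) != n:
        raise ValueError("values do not fill a complete lower triangle")
    S = np.zeros((n, n))
    for i, row in enumerate(rows):
        for j, v in enumerate(row):
            S[i, j] = v  # lower triangle as read
            S[j, i] = v  # mirror into the upper triangle
    return S

# The three values from final_overlap.txt:
print(matrix_from_column([1.0000000000000002, 0.6593496737729044, 1.0000000000000002]))
```

To read the file itself, something like `values = [float(line) for line in open("final_overlap.txt") if line.strip()]` would produce the list; the same function then handles 10 values (4*4), 21 values (6*6), and so on.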
Followup
OK, assuming your data has row and column numbers like your second example, this will read the data, fill in the upper triangle, and convert to np.array.
import numpy as np
# Read in the data to find out the size.
data = []
for line in open('x.txt'):
    r,c,v = line.split()
    r = int(r)-1
    c = int(c)-1
    v = float(v)
    if c == 0:
        data.append( [v] )
    else:
        data[-1].append(v)
# Fill in the upper triangle.
for i in range(len(data)-1):
    for j in range(i+1,len(data)):
        data[i].append( data[j][i] )
array = np.array(data)
print(array)
Output:
[[1. 0.65934967 0.11921653 0.0954901 ]
[0.65934967 1. 0.0954901 0.11921653]
[0.11921653 0.0954901 1. 0.65934967]
[0.0954901 0.11921653 0.65934967 1. ]]
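When the row and column numbers are present, they can also be used directly as array indices, so the result no longer depends on the lines being in order. A minimal sketch with `np.loadtxt`, assuming the three-column layout of x.txt (1-based row, 1-based column, value, lower triangle only); `symmetric_from_indexed` is a hypothetical name.

```python
import io
import numpy as np

def symmetric_from_indexed(source):
    """Build a symmetric matrix from 'row col value' lines (1-based indices)."""
    arr = np.loadtxt(source, ndmin=2)   # shape (n_lines, 3)
    r = arr[:, 0].astype(int) - 1       # convert to 0-based indices
    c = arr[:, 1].astype(int) - 1
    n = max(r.max(), c.max()) + 1
    S = np.zeros((n, n))
    S[r, c] = arr[:, 2]                 # lower triangle as given
    S[c, r] = arr[:, 2]                 # mirror into the upper triangle
    return S

# In practice: symmetric_from_indexed("x.txt"); an in-memory sample is used here.
sample = io.StringIO("1 1 1.0000000000000002\n"
                     "2 1 0.6593496737729044\n"
                     "2 2 1.0000000000000002\n")
print(symmetric_from_indexed(sample))
```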
It would still be possible to do this, even if you don't have the row and column numbers, just by keeping an internal counter.
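The four-index follow-up in the question (Two_Electron.txt) is not covered above. One possible sketch, assuming the values are two-electron integrals (ij|kl) in chemists' notation with the usual eight-fold permutational symmetry of real orbitals. The question does not say so, but this assumption reproduces the expected output shown there. Only symmetry-unique entries are listed, so each value is written to all eight equivalent index permutations. `read_eri` is a hypothetical name.

```python
import numpy as np

def read_eri(lines):
    """Build a 4-index array from 'i j k l value' lines (1-based indices).

    Assumes (ij|kl) = (ji|kl) = (ij|lk) = (ji|lk) = (kl|ij) = ... (8-fold symmetry).
    """
    entries = []
    for line in lines:
        fields = line.split()
        if len(fields) != 5:
            continue  # skip blank or stray lines
        i, j, k, l = (int(x) - 1 for x in fields[:4])
        entries.append((i, j, k, l, float(fields[4])))
    n = 1 + max(max(e[:4]) for e in entries)
    V = np.zeros((n, n, n, n))
    for i, j, k, l, v in entries:
        # Write the value to every equivalent index permutation.
        for a, b, c, d in [(i, j, k, l), (j, i, k, l), (i, j, l, k), (j, i, l, k),
                           (k, l, i, j), (l, k, i, j), (k, l, j, i), (l, k, j, i)]:
            V[a, b, c, d] = v
    return V

sample = [
    "1 1 1 1 0.7746059439198979",
    "2 1 1 1 0.4441350695399573",
    "2 1 2 1 0.2970603935859659",
    "2 2 1 1 0.5696940113278337",
    "2 2 2 1 0.4441350695399575",
    "2 2 2 2 0.7746059439198979",
]
V_ee = read_eri(sample)
print(V_ee)
# With the real file: with open('Two_Electron.txt') as f: V_ee = read_eri(f)
```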

Related

How can I calculate scikit-learn rbf_kernel() with very large array?

When I use the rbf_kernel() function, the array is too large and I run into a memory problem, so I have to split the data and compute the kernel in parts.
from sklearn.metrics.pairwise import rbf_kernel
result = rbf_kernel([[1,1],[2,2],[3,3]], gamma=60) # A data:[1,1] , B data:[2,2], C data:[3,3]
And the result looks like:
A B C
A 1 2 1
B 1 1 1
C 1 1 2
However, if I pass in larger data, I run out of memory:
result = rbf_kernel([[1,1],[2,2],[3,3],[4,4],[5,5],.... ], gamma=60)
How can I extract the result without putting data all at once?

Try using:
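from sklearn.metrics.pairwise import rbf_kernel
l = [[1,1],[2,2],[3,3],[4,4],[5,5], ...]
newl = []
for i in range(0, len(l), 10):
    # Compare each chunk of 10 rows against ALL rows, so every chunk is a
    # complete horizontal slice of the full kernel matrix.
    newl.append(rbf_kernel(l[i:i + 10], l, gamma=60))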
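If even the list of slices is too big to keep, the blocks can be produced one at a time and processed or saved before the next one is computed. A sketch in plain NumPy, assuming the standard RBF formula K(x, y) = exp(-gamma * ||x - y||^2) (the formula scikit-learn documents for rbf_kernel); it does not call scikit-learn, and `rbf_kernel_blocks` and `block_size` are hypothetical names.

```python
import numpy as np

def rbf_kernel_blocks(X, gamma, block_size=1000):
    """Yield (start, block) where block == K[start:start + block_size, :]."""
    X = np.asarray(X, dtype=float)
    sq_norms = np.einsum('ij,ij->i', X, X)  # ||x_i||^2 for every row
    for start in range(0, X.shape[0], block_size):
        B = X[start:start + block_size]
        # ||b - x||^2 = ||b||^2 - 2 b.x + ||x||^2, for all pairs in the block
        d2 = sq_norms[start:start + block_size, None] - 2.0 * (B @ X.T) + sq_norms[None, :]
        np.maximum(d2, 0.0, out=d2)  # clip tiny negatives caused by rounding
        yield start, np.exp(-gamma * d2)

X = [[1, 1], [2, 2], [3, 3], [4, 4], [5, 5]]
for start, block in rbf_kernel_blocks(X, gamma=60, block_size=2):
    # Each block has shape (rows in block, len(X)); save or reduce it here.
    print(start, block.shape)
```

Peak memory is then roughly block_size * len(X) values instead of len(X) squared.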

how to find minimum element of adjacent elements of a position in a matrix

I have a 5x5 matrix. For each position, I have to find the minimum of the adjacent elements and add that minimum to the element at that position. This has to be done for all elements of the matrix except those in the first row and first column.
This is the matrix:
A= [[1 1 2 2 3],[1 1 0 1 0],[2 0 1 0 1],[3 2 1 2 1],[4 0 1 0 1]]

import numpy as np
a = [1,2,1,3,1]
b = [2,1,2,1,2]
First Matrix
def get_matrix1(a,b):
    d = []
    for x in a:
        for y in b:
            d.append(abs(y-x))
    return np.reshape(d,(5,5))
Second Matrix
def get_matrix2():
    # Matrix
    m1 = get_matrix1(a,b)
    print('First Matrix : {}'.format(m1))
    # Cumulative Addition: running sums along the first row and first column
    m1[0] = np.cumsum(m1[0])
    m1[:,0] = np.cumsum(m1[:,0])
    m2 = m1.copy()
    print('\nCumulative Addition Matrix : {}'.format(m2))
    # Second Matrix
    # (i, j) is the up-left neighbour of the cell (row, col) being updated
    i_rows,j_cols = [0,1,2,3],[0,1,2,3]
    edge_rows,edge_cols = [1,2,3,4],[1,2,3,4]
    for i,row in zip(i_rows, edge_rows):
        for j,col in zip(j_cols, edge_cols):
            # old
            old = m2[row,col]
            print('\nOld : {}'.format(old))
            # edges: c = up-left (diagonal), u = up, l = left neighbour
            c,u,l = m2[i,j],m2[i,j+1],m2[i+1,j]
            r = (c,u,l)
            print('Edges : {}'.format(r))
            # new
            new = min(r) + old
            print('New : {}'.format(new))
            # update
            m2[row,col] = new
    print('Updated Matrix :')
    print(m2)
get_matrix2()
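The index lists above hard-code a 5x5 matrix. A shape-independent sketch of the same update: after the first row and first column are turned into running sums, each remaining cell gains the minimum of its up-left, up and left neighbours, visited in the same row-by-row order. This is the accumulated-cost recurrence used in dynamic time warping; `accumulated_min_matrix` is a hypothetical name.

```python
import numpy as np

def accumulated_min_matrix(a, b):
    """Add to each cell the minimum of its up-left, up and left neighbours."""
    D = np.abs(np.subtract.outer(a, b))  # D[i, j] = |a[i] - b[j]|, as in get_matrix1
    D[0, :] = np.cumsum(D[0, :])         # running sum along the first row
    D[:, 0] = np.cumsum(D[:, 0])         # running sum along the first column
    n_rows, n_cols = D.shape
    for i in range(1, n_rows):
        for j in range(1, n_cols):
            D[i, j] += min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    return D

a = [1, 2, 1, 3, 1]
b = [2, 1, 2, 1, 2]
print(accumulated_min_matrix(a, b))
```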

Find rows where values change in array

How do I find the rows (indices) of my array where the values change?
For example, I have this array:
0 -0.638127 0.805294 1.30671
1 -0.638127 0.805294 1.30671
2 -0.085362 0.523378 0.550509
3 -0.085362 0.523378 0.550509
4 -0.323397 0.94502 0.49001
5 -0.323397 0.94502 0.49001
6 -0.323397 0.94502 0.49001
7 -0.291798 0.421398 0.962115
I want a result like:
[0 2 4 7]
I am happy to use existing libraries, and I am not limited to anything. All I want are the row numbers. How would I calculate that?
I tried
a = []
for i, row in enumerate(vecarray):
    if i > 0:
        a[i] = vecarray[i] - vecarray[i-1]
b = np.where(a != 0)
but that gives me IndexError: list assignment index out of range

arr = [
(-0.638127, 0.805294, 1.30671),
(-0.638127, 0.805294, 1.30671),
(-0.085362, 0.523378, 0.550509),
(-0.085362, 0.523378, 0.550509),
(-0.323397, 0.94502, 0.49001),
(-0.323397, 0.94502, 0.49001),
(-0.323397, 0.94502, 0.49001),
(-0.291798, 0.421398, 0.962115)
]
prev_t = None
for i, t in enumerate(arr):
    # print the index whenever this row differs from the previous one
    if t != prev_t:
        prev_t = t
        print(i)
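A vectorised NumPy alternative, assuming the data is a 2-D array without the leading index column: compare each row with the one before it and keep the positions where any element differs (row 0 always counts as a change). Exact comparison works here because the repeated rows are identical copies; values that only match approximately would need np.isclose instead. `change_rows` is a hypothetical name.

```python
import numpy as np

def change_rows(arr):
    """Indices of rows that differ from the previous row (row 0 always included)."""
    arr = np.asarray(arr)
    changed = (arr[1:] != arr[:-1]).any(axis=1)  # row k vs row k-1, for k >= 1
    return np.flatnonzero(np.concatenate(([True], changed)))

vecarray = np.array([
    [-0.638127, 0.805294, 1.30671],
    [-0.638127, 0.805294, 1.30671],
    [-0.085362, 0.523378, 0.550509],
    [-0.085362, 0.523378, 0.550509],
    [-0.323397, 0.94502, 0.49001],
    [-0.323397, 0.94502, 0.49001],
    [-0.323397, 0.94502, 0.49001],
    [-0.291798, 0.421398, 0.962115],
])
print(change_rows(vecarray))
```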

Iterate the code in a shortest way for the whole dataset

I have a very big DataFrame:
df.shape = (106, 3364)
I want to calculate the so-called Fréchet distance using this "Frechet Distance between 2 curves" implementation, and it works well. Example:
x = df['1']
x1 = df['1.1']
p = np.array([x, x1])
y = df['2']
y1 = df['2.1']
q = np.array([y, y1])
P_final = list(zip(p[0], p[1]))
Q_final = list(zip(q[0], q[1]))
from frechetdist import frdist
frdist(P_final,Q_final)
But I cannot do it row by row, like this:
`1 and 1.1` to `1 and 1.1` which is equal to 0
`1 and 1.1` to `2 and 2.1` which is equal to some number
...
`1 and 1.1` to `1682 and 1682.1` which is equal to some number
I want to create something (my first idea is a for loop, but maybe you have a better solution) to calculate this frdist(P_final,Q_final) between:
the first row and all rows (including itself)
the second row and all rows (including itself)
In the end, I should get a matrix of size (106,106) with 0 on the diagonal (because the distance from a row to itself is 0)
matrix =
0 1 2 3 4 5 ... 105
0 0
1 0
2 0
3 0
4 0
5 0
... 0
105 0
Not including my trial code because it is confusing everyone!
EDITED:
Sample data:
1 1.1 2 2.1 3 3.1 4 4.1 5 5.1
0 43.1024 6.7498 45.1027 5.7500 45.1072 3.7568 45.1076 8.7563 42.1076 8.7563
1 46.0595 1.6829 45.0595 9.6829 45.0564 4.6820 45.0533 8.6796 42.0501 3.6775
2 25.0695 5.5454 44.9727 8.6660 41.9726 2.6666 84.9566 3.8484 44.9566 1.8484
3 35.0281 7.7525 45.0322 3.7465 14.0369 3.7463 62.0386 7.7549 65.0422 7.7599
4 35.0292 7.5616 45.0292 4.5616 23.0292 3.5616 45.0292 7.5616 25.0293 7.5613

I just used my own sample data in your format (I hope):
import pandas as pd
from frechetdist import frdist
import numpy as np
# create sample data
df = pd.DataFrame([[1,2,3,4,5,6], [3,4,5,6,8,9], [2,3,4,5,2,2], [3,4,5,6,7,3]], columns=['1','1.1','2', '2.1', '3', '3.1'])
# this matrix will hold the result
res = np.zeros(shape=(df.shape[1] // 2, df.shape[1] // 2), dtype=np.float32)
for row in range(res.shape[0]):
    for col in range(row, res.shape[1]):
        # extract the two curves as lists of (x, y) points
        P = list(zip(df[f'{row+1}'], df[f'{row+1}.1']))
        Q = list(zip(df[f'{col+1}'], df[f'{col+1}.1']))
        # calculate distance
        dist = frdist(P, Q)
        # put result back (it's symmetric)
        res[row, col] = dist
        res[col, row] = dist
# output
print(res)
Output:
[[0.        2.828427  7.071068 ]
 [2.828427  0.        4.2426405]
 [7.071068  4.2426405 0.       ]]
Hope that helps
EDIT: Some general tips:
If speed matters: check whether frdist also accepts a NumPy array of shape (n_values, 2). Then you could save the rather expensive zip operation and use the arrays directly, or build the data directly in the format your library needs.
Generally, use better column names ('3' and '3.1' are not very obvious). Why not call them x3, y3, or x3 and f_x3?
I would actually put the data into two different matrices. If you look at the code, I had to do some not-so-obvious things, like iterating over the shape divided by two and building indices from string operations, because of the given table layout.
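Along the lines of the last tip, a sketch that reshapes the x/y column pairs into one array of shape (n_curves, n_points, 2) and fills the symmetric matrix from the upper triangle. To avoid depending on frdist's input handling, it includes a small discrete Fréchet distance using the Eiter-Mannila recurrence (as far as I know, the same quantity frechetdist computes). It assumes the columns alternate x, y as in '1', '1.1', '2', '2.1', ...; the function names are hypothetical.

```python
import numpy as np

def discrete_frechet(P, Q):
    """Discrete Frechet distance between polylines of shape (n, 2) and (m, 2)."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    d = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=2)  # all point-to-point distances
    n, m = d.shape
    ca = np.empty((n, m))
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                prev = 0.0
            elif i == 0:
                prev = ca[0, j - 1]
            elif j == 0:
                prev = ca[i - 1, 0]
            else:
                prev = min(ca[i - 1, j], ca[i - 1, j - 1], ca[i, j - 1])
            ca[i, j] = max(prev, d[i, j])
    return ca[-1, -1]

def frechet_matrix(values):
    """Pairwise distances for a table whose columns are x1, y1, x2, y2, ..."""
    values = np.asarray(values, dtype=float)
    n_points, n_cols = values.shape
    # (n_points, n_curves, 2) -> (n_curves, n_points, 2)
    curves = values.reshape(n_points, n_cols // 2, 2).transpose(1, 0, 2)
    k = len(curves)
    res = np.zeros((k, k))
    for a in range(k):
        for b in range(a + 1, k):
            res[a, b] = res[b, a] = discrete_frechet(curves[a], curves[b])
    return res

# Same sample as above: columns '1', '1.1', '2', '2.1', '3', '3.1'
sample = [[1, 2, 3, 4, 5, 6], [3, 4, 5, 6, 8, 9], [2, 3, 4, 5, 2, 2], [3, 4, 5, 6, 7, 3]]
print(frechet_matrix(sample))
```

For the real data, `frechet_matrix(df.to_numpy())` would give one row and column per curve (df.shape[1] // 2 of them). With pure-Python loops per pair it will be slow at that size, so treat it as a structural sketch.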

Printing in a loop

I have the following file I'm trying to manipulate.
1 2 -3 5 10 8.2
5 8 5 4 0 6
4 3 2 3 -2 15
-3 4 0 2 4 2.33
2 1 1 1 2.5 0
0 2 6 0 8 5
The file just contains numbers.
I'm trying to write a program that subtracts the rows from each other and prints the results to a file. My program is below; dtest.txt is the name of the input file, and the program is called make_distance.py.
from math import *
posnfile = open("dtest.txt","r")
posn = posnfile.readlines()
posnfile.close()
for i in range (len(posn)-1):
    for j in range (0,1):
        if (j == 0):
            Xp = float(posn[i].split()[0])
            Yp = float(posn[i].split()[1])
            Zp = float(posn[i].split()[2])
            Xc = float(posn[i+1].split()[0])
            Yc = float(posn[i+1].split()[1])
            Zc = float(posn[i+1].split()[2])
        else:
            Xp = float(posn[i].split()[3*j+1])
            Yp = float(posn[i].split()[3*j+2])
            Zp = float(posn[i].split()[3*j+3])
            Xc = float(posn[i+1].split()[3*j+1])
            Yc = float(posn[i+1].split()[3*j+2])
            Zc = float(posn[i+1].split()[3*j+3])
        Px = fabs(Xc-Xp)
        Py = fabs(Yc-Yp)
        Pz = fabs(Zc-Zp)
        print Px,Py,Pz
The program calculates the values correctly, but when I run it to write the output file,
mpipython make_distance.py > distance.dat
the output file (distance.dat) only contains 3 columns when it should contain 6. How do I tell the program which columns to print to at each step j = 0, 1, ...?
For j = 0 the program should output to the first 3 columns, for j = 1 to the second 3 columns (3, 4, 5), and so on.
Finally, the len function gives the number of rows in the input file, but what function gives the number of columns?
Thanks.

Append a , to the end of your print statement and it will not print a newline, and then when you exit the for loop add an additional print to move to the next row:
for j in range (0,1):
    ...
    print Px,Py,Pz,
print
Assuming all rows have the same number of columns, you can get the number of columns by using len(row.split()).
Also, you can shorten your code quite a bit. I'm not sure what the purpose of j is, but the following should be equivalent to what you're doing now:
for j in range (0,1):
    Xp, Yp, Zp = map(float, posn[i].split()[3*j:3*j+3])
    Xc, Yc, Zc = map(float, posn[i+1].split()[3*j:3*j+3])
    ...

You don't need to:
use numpy
read the whole file in at once
know how many columns
use awkward comma at end of print statement
use list subscripting
use math.fabs()
explicitly close your file
Try this (untested):
with open("dtest.txt", "r") as posnfile:
    previous = None
    for line in posnfile:
        current = [float(x) for x in line.split()]
        if previous is not None:
            # absolute difference of each column against the previous row
            delta = [abs(c - p) for c, p in zip(current, previous)]
            print ' '.join(str(d) for d in delta)
        previous = current
previous = current

In case your dtest.txt grows larger and you'd rather write directly to distance.dat than redirect the output, especially if you want to use NumPy, here is a version for that. Thanks to John for pointing out my mistake in the old code ;-)
import numpy as np
pos = np.genfromtxt("dtest.txt")
# absolute differences between consecutive rows (pos[j+1] - pos[j])
dis = np.abs(np.diff(pos, axis=0))
np.savetxt("distance.dat",dis)
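The code in this thread is Python 2 (`print` statements, `xrange`). For reference, a Python 3 sketch of the same idea that keeps every column of the input, so each output line has as many columns as the input rows (6 for the sample file). The file names come from the question; `row_deltas` is a hypothetical name.

```python
def row_deltas(lines):
    """Return |row[i+1] - row[i]| for consecutive non-blank rows, column by column."""
    rows = [[float(x) for x in line.split()] for line in lines if line.strip()]
    return [[abs(c - p) for c, p in zip(cur, prev)] for prev, cur in zip(rows, rows[1:])]

# Usage with the question's files:
# with open("dtest.txt") as src, open("distance.dat", "w") as dst:
#     for delta in row_deltas(src):
#         dst.write(" ".join(f"{d:g}" for d in delta) + "\n")

sample = """1 2 -3 5 10 8.2
5 8 5 4 0 6
4 3 2 3 -2 15"""
for delta in row_deltas(sample.splitlines()):
    print(" ".join(f"{d:g}" for d in delta))
```

The number of columns is simply `len(line.split())` for any row, which is what the comprehension uses implicitly.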
