Building a symmetric matrix in Python from data in a file

I have a file which, for example, looks like:
1 1 5.5
1 2 6.1
1 3 7.3
2 2 3.4
2 3 9.2
3 3 4.7
This is "half" of a symmetric 3x3 matrix. I would like to create the full symmetric matrix in Python which looks like
[[ 5.5 6.1 7.3]
[ 6.1 3.4 9.2]
[ 7.3 9.2 4.7]]
(of course my actual file is a much bigger 'half' of an NxN matrix, so I need a solution other than typing in the values one by one)
I've exhausted all my resources (books and internet) and what I have so far does not really come close. Can anyone please help me with this?
Thank you!

To read the file and load it as a Python object, here's a solution:
import numpy
m = numpy.matrix([[0, 0, 0], [0, 0, 0], [0, 0, 0]])
with open('matrix.txt', 'r') as f:
    for line in f:
        try:
            i, j, val = line.split(' ')
            i, j, val = int(i), int(j), float(val)
            m[i-1, j-1] = val
        except ValueError:
            print("couldn't load line: {}".format(line))
print(m)

Here is an alternative way to do this completely inside Numpy. Two important remarks:
you can read directly with the np.loadtxt function
you can assign the upper-half values to the correct indexes in one line: N[idxs[:,0] - 1, idxs[:,1] - 1] = vals
Here is the code:
import numpy as np
from io import StringIO  # Python 3; on Python 2 use "from StringIO import StringIO"
indata = """
1 1 5.5
1 2 6.1
1 3 7.3
2 2 3.4
2 3 9.2
3 3 4.7
"""
infile = StringIO(indata)
A = np.loadtxt(infile)
# A is
# array([[ 1. , 1. , 5.5],
# [ 1. , 2. , 6.1],
# [ 1. , 3. , 7.3],
# [ 2. , 2. , 3.4],
# [ 2. , 3. , 9.2],
# [ 3. , 3. , 4.7]])
idxs = A[:, 0:2].astype(int)
vals = A[:, 2]
## To find out the total size of the triangular matrix, note that there
## are only n * (n + 1) / 2 elements that must be specified (the strict
## upper half accounts for (n^2 - n) / 2, and the diagonal adds n to that).
## Therefore, the length of your data, A.shape[0], fixes n as a solution
## to the quadratic equation: n^2 + n - 2 * A.shape[0] = 0
possible_sizes = np.roots([1, 1, -2 * A.shape[0]])
## Take the positive solution of that equation as the size of the
## result matrix (np.roots returns floats, so cast to int)
size = int(possible_sizes[possible_sizes > 0][0])
N = np.zeros([size] * 2)
N[idxs[:,0] - 1, idxs[:,1] - 1] = vals
# N is
# array([[ 5.5, 6.1, 7.3],
# [ 0. , 3.4, 9.2],
# [ 0. , 0. , 4.7]])
## Here we could do a one-liner like
# N[idxs[:,1] - 1, idxs[:,0] - 1] = vals
## But how cool is it to add the transpose and subtract the diagonal? :)
M = N + np.transpose(N) - np.diag(np.diag(N))
# M is
# array([[ 5.5, 6.1, 7.3],
# [ 6.1, 3.4, 9.2],
# [ 7.3, 9.2, 4.7]])
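As a quick sanity check, the result is symmetric by construction:
# N + N.T double-counts the diagonal, which np.diag(np.diag(N)) removes,
# so M must equal its own transpose
assert np.allclose(M, M.T)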

If you know the size of the matrix in advance (and it sounds like you do), then the following would work (in both Python 2 and 3):
N = 3
symmetric = [[None] * N for _ in range(N)]  # pre-allocate output matrix
with open('matrix_data.txt', 'r') as f:
    for i, j, val in (line.split() for line in f if line.strip()):
        i, j, val = int(i) - 1, int(j) - 1, float(val)
        symmetric[i][j] = val
        if symmetric[j][i] is None:
            symmetric[j][i] = val
print(symmetric)  # -> [[5.5, 6.1, 7.3], [6.1, 3.4, 9.2], [7.3, 9.2, 4.7]]
If you don't know the size N ahead of time, you could preprocess the file and determine the maximum index value given.
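A minimal sketch of that two-pass approach, assuming the same whitespace-separated "i j value" lines as above (file name taken from the previous snippet):
import numpy as np

# First pass: find the largest index used, which gives the matrix size N.
with open('matrix_data.txt') as f:
    N = max(max(int(i), int(j))
            for i, j, _ in (line.split() for line in f if line.strip()))

# Second pass: fill both halves of the N x N matrix at once.
M = np.zeros((N, N))
with open('matrix_data.txt') as f:
    for line in f:
        if line.strip():
            i, j, val = line.split()
            M[int(i) - 1, int(j) - 1] = M[int(j) - 1, int(i) - 1] = float(val)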

Related

Find missing elements and insert value in place in Python

I want to find the missing elements in I[:,1] and insert 0 for these elements. For instance, the missing elements in I[:,1] are 3,6. I want to insert 0 corresponding to these elements as shown in the expected output.
import numpy as np
I=np.array([[ 0.24414794669159817 , 1. ],
[ 0.2795127725932865 , 2. ],
[ 0.2630129055948728 , 4. ],
[ 0.2518744176621288 , 5. ],
[ 0.0000000000000051625370645, 7. ]])
The expected output is:
array([[ 0.24414794669159817 , 1. ],
[ 0.2795127725932865 , 2. ],
[ 0.0 , 3. ],
[ 0.2630129055948728 , 4. ],
[ 0.2518744176621288 , 5. ],
[ 0.0 , 6. ],
[ 0.0000000000000051625370645, 7. ]])
Try this out
Using numpy
mx = int(np.max(I[:, 1])) # find max length to construct new array
I2 = np.stack([np.zeros(mx), np.arange(1, mx + 1)], axis=1) # new array
indices = I[:,1].astype(int) # take column as index
I2[indices-1] = I # assign prev values to new array
Using a for loop:
I1 = np.copy(I)
offset = 0  # rows already inserted shift all later insert positions
prev = I[0, 1]
for row, i in enumerate(I[1:, 1], start=1):
    # insert one zero row for every index missing between prev and i
    for missing in np.arange(prev + 1, i):
        I1 = np.insert(I1, row + offset, [[0.0, float(missing)]], axis=0)
        offset += 1
    prev = i
print(I1)
You do not need the loop if you use np.isin:
# Create new matrix with zeros in first column.
idx_min = min(I[:,1])
idx_max = max(I[:,1])
idxs = np.arange(idx_min, idx_max+1)
vals = np.zeros_like(idxs)
I_new = np.stack([vals, idxs], axis=1)
# replace zeros with existing data
idx_data = np.isin(I_new[:,1], I[:,1])
I_new[idx_data, 0] = I[:,0]

Finding percentage change with Numpy

I'm writing a function to find the percentage change using Numpy and function calls. So far what I got is:
def change(a, b):
    answer = (np.subtract(a[b+1], a[b])) / a[b+1] * 100
    return answer

print(change(a, 0))
"a" is the array I have made and b will be the index/numbers I am trying to calculate.
For example:
My Array is
[[1,2,3,5,7]
[1,4,5,6,7]
[5,8,9,10,32]
[3,5,6,13,11]]
How would I calculate the percentage change between 1 and 2 (=0.5), or 1 and 4 (=0.75), or 5 and 7, etc.?
Note: I know how to get the change mathematically; I'm just not sure how to do it in Python/NumPy.
If I understand correctly, that you're trying to find percent change in each row, then you can do:
>>> np.diff(a) / a[:,1:] * 100
Which gives you:
array([[ 50. , 33.33333333, 40. , 28.57142857],
[ 75. , 20. , 16.66666667, 14.28571429],
[ 37.5 , 11.11111111, 10. , 68.75 ],
[ 40. , 16.66666667, 53.84615385, -18.18181818]])
I know you have asked this question with Numpy in mind and got answers above:
import numpy as np
np.diff(a) / a[:,1:]
Here is an attempt with Pandas, for those who have the same question but want to use Pandas instead of NumPy:
import pandas as pd
data = [[1,2,3,4,5],
[1,4,5,6,7],
[5,8,9,10,32],
[3,5,6,13,11]]
df = pd.DataFrame(data)
df_change = df.pct_change(axis=1)  # change from each column to the next
print(df_change)
I suggest simply shifting the array; the computation then becomes a one-liner.
import numpy as np
arr = np.array(
[
[1, 2, 3, 5, 7],
[1, 4, 5, 6, 7],
[5, 8, 9, 10, 32],
[3, 5, 6, 13, 11],
]
)
# Percentage change from row to row
pct_chg_row = arr[1:] / arr[:-1] - 1
[[ 0. 1. 0.66666667 0.2 0. ]
[ 4. 1. 0.8 0.66666667 3.57142857]
[-0.4 -0.375 -0.33333333 0.3 -0.65625 ]]
# Percentage change from column to column
pct_chg_col = arr[:, 1::] / arr[:, 0:-1] - 1
[[ 1. 0.5 0.66666667 0.4 ]
[ 3. 0.25 0.2 0.16666667]
[ 0.6 0.125 0.11111111 2.2 ]
[ 0.66666667 0.2 1.16666667 -0.15384615]]
You could easily generalize the task, so that you are not limited to computing the change from one row/column to the next, but can compute the change across n rows/columns.
n = 2
pct_chg_row_generalized = arr[n:] / arr[:-n] - 1
[[4. 3. 2. 1. 3.57142857]
[2. 0.25 0.2 1.16666667 0.57142857]]
pct_chg_col_generalized = arr[:, n:] / arr[:, :-n] - 1
[[2. 1.5 1.33333333]
[4. 0.5 0.4 ]
[0.8 0.25 2.55555556]
[1. 1.6 0.83333333]]
If the output array must have the same shape as the input array, you need to make sure to insert the appropriate number of np.nan.
out_row = np.full_like(arr, np.nan, dtype=float)
out_row[n:] = arr[n:] / arr[:-n] - 1
[[ nan nan nan nan nan]
[ nan nan nan nan nan]
[4. 3. 2. 1. 3.57142857]
[2. 0.25 0.2 1.16666667 0.57142857]]
out_col = np.full_like(arr, np.nan, dtype=float)
out_col[:, n:] = arr[:, n:] / arr[:, :-n] - 1
[[ nan nan 2. 1.5 1.33333333]
[ nan nan 4. 0.5 0.4 ]
[ nan nan 0.8 0.25 2.55555556]
[ nan nan 1. 1.6 0.83333333]]
Finally, a small function for the general 2D case might look like this:
def np_pct_chg(arr: np.ndarray, n: int = 1, axis: int = 0) -> np.ndarray:
    out = np.full_like(arr, np.nan, dtype=float)
    if axis == 0:
        out[n:] = arr[n:] / arr[:-n] - 1
    elif axis == 1:
        out[:, n:] = arr[:, n:] / arr[:, :-n] - 1
    return out
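For example, calling it column-wise on arr reproduces the column-to-column output above (multiply by 100 if you want percentages):
pct = np_pct_chg(arr, n=1, axis=1)
print(pct * 100)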
The accepted answer is close but incorrect if you're trying to take % difference from left to right.
You should get the following percent difference:
1,2,3,5,7 --> 100%, 50%, 66.66%, 40%
check for yourself: https://www.calculatorsoup.com/calculators/algebra/percent-change-calculator.php
Going by what Josmoor98 said, you can use np.diff(a) / a[:,:-1] * 100 to get the percent difference from left to right, which will give you the correct answer.
array([[100. , 50. , 66.66666667, 40. ],
[300. , 25. , 20. , 16.66666667],
[ 60. , 12.5 , 11.11111111, 220. ],
[ 66.66666667, 20. , 116.66666667, -15.38461538]])
import numpy as np
a = np.array([[1,2,3,5,7],
[1,4,5,6,7],
[5,8,9,10,32],
[3,5,6,13,11]])
np.array([i[:-1] / i[1:] for i in a])  # ratio of each element to its right neighbour
Combine all your arrays, then make a DataFrame from them:
df = pd.DataFrame(data=your_array)
Use the pct_change() method on the DataFrame. It will calculate the % change for all rows in the DataFrame.
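A minimal sketch of that suggestion, using the example array from the question:
import numpy as np
import pandas as pd

a = np.array([[1, 2, 3, 5, 7],
              [1, 4, 5, 6, 7],
              [5, 8, 9, 10, 32],
              [3, 5, 6, 13, 11]])

df = pd.DataFrame(a)
# % change from each column to the next; the first column is NaN
print(df.pct_change(axis=1) * 100)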

Python - Generate random real number between range with a step size

I am using python-3.x, and I am trying to generate an initial population that contains random real numbers between 0 and 1 where these numbers should be one of the following:
0, 0.33333, 0.666667 or 1
That means the difference between these numbers is 0.33333 (1/3). I tried to modify this code in many ways, but with no luck:
import numpy as np
import random
from random import randint
from itertools import product
pop_size = 7
i_length = 2
i_min = 0
i_max = 1
level = 2
step = ((1/((2**level)-1))*(i_max-i_min))
def individual(length, min, max):
    return [randint(min, max) for x in range(length)]

def population(count, length, min, max):
    return [individual(length, min, max) for x in range(count)]

population = population(pop_size, i_length, i_min, i_max)
##count: the number of individuals in the population
##length: the number of values per individual
##min: the minimum possible value in an individual's list of values
##max: the maximum possible value in an individual's list of values
##this code was taken from :https://lethain.com/genetic-algorithms-cool-name-damn-simple/
I wrote these lines, which work very well for me:
population2 = np.array(list(product(np.linspace(i_min, i_max, 2**level), repeat=2)))
population3 = [j for j in product(np.arange(i_min, i_max+step, step), repeat=2)]
but the problem is that it lists all the possible values, which is not what I want. I want random numbers, with the population size given.
The result I want to see is similar to (numpy array or list):
population = [[0, 1],
[0, 0.3333],
[0.3333, 1],
[1, 0.6667],
[0.3333, 0.6667],
[0.6667, 0],
[0.3333, 0.3333]]
keep in mind that:
level = 2
where I can calculate the step value:
step = ((1/((2**level)-1))*(i_max-i_min))
for example, if I change level = 2 to level = 3, then it no longer uses 0.3333; the step becomes 0.1428 (1/7) and I get different values.
Any advice would be much appreciated
>>> np.random.choice([0, 1/3., 2/3., 1], size=(7,2), replace=True)
array([[0. , 0.33333333],
[0.33333333, 0.66666667],
[0. , 0. ],
[0.66666667, 0. ],
[0.33333333, 0.33333333],
[1. , 1. ],
[0.33333333, 0.33333333]])
>>> i_min = 0
>>> i_max = 1
>>> level = 3
>>> np.random.choice(np.linspace(i_min, i_max, 2**level), size=(7,2), replace=True)
array([[0.28571429, 0.14285714],
[0.85714286, 0.57142857],
[0.71428571, 0.42857143],
[0.71428571, 1. ],
[0.14285714, 0.85714286],
[0. , 0. ],
[1. , 0. ]])
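The same idea written with the question's own variables is one call (a sketch; replace=True is the default for np.random.choice):
import numpy as np

pop_size, i_length, i_min, i_max, level = 7, 2, 0, 1, 2
population = np.random.choice(np.linspace(i_min, i_max, 2**level),
                              size=(pop_size, i_length))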
Without numpy:
from random import randint

def get_population(num, repeats, fraction):
    return [[randint(0, fraction) / fraction for x in range(num)]
            for i in range(repeats)]

print(get_population(2, 7, 3))
Output is:
[[0.3333333333333333, 0.0],
[0.3333333333333333, 1.0],
[1.0, 0.3333333333333333],
[0.3333333333333333, 0.0],
[0.0, 0.3333333333333333],
[0.3333333333333333, 0.6666666666666666],
[1.0, 1.0]]
With fraction = 7:
print(get_population(2, 7, 7))
Output is:
[[0.8571428571428571, 0.7142857142857143],
[0.7142857142857143, 0.14285714285714285],
[0.0, 0.7142857142857143],
[0.42857142857142855, 0.5714285714285714],
[0.42857142857142855, 0.7142857142857143],
[1.0, 0.5714285714285714],
[1.0, 1.0]]

python: performance enhancement shrinking image

I have written the following code to scale an image to 50%. However, it took this algorithm 65 seconds to shrink a 3264x2448 image. Can someone who understands numpy explain why this algorithm is so inefficient and suggest more efficient changes?
def shrinkX2(im):
    X, Y = im.shape[1] // 2, im.shape[0] // 2  # integer division so range() gets ints
    new = np.zeros((Y, X, 3))
    for y in range(Y):
        for x in range(X):
            new[y, x] = im[2*y:2*y + 2, 2*x:2*x + 2].reshape(4, 3).mean(axis=0)
    return new
Going by the text of the question, it seems you are shrinking the image by 50%, and going by the code, you are doing it in blocks. We can reshape to split each of the first two axes of the input into blocks of the required sizes, then compute the mean along the axes corresponding to the block sizes, like so -
def block_mean(im, BSZ):
    m, n = im.shape[:2]
    return im.reshape(m//BSZ[0], BSZ[0], n//BSZ[1], BSZ[1], -1).mean((1, 3))
Sample run -
In [44]: np.random.seed(0)
...: im = np.random.randint(0,9,(6,8,3))
In [45]: im[:2,:2,:].mean((0,1)) # average of first block across all 3 channels
Out[45]: array([3.25, 3.75, 3.5 ])
In [46]: block_mean(im, BSZ=(2,2))
Out[46]:
array([[[3.25, 3.75, 3.5 ],
[4. , 4.5 , 3.75],
[5.75, 2.75, 5. ],
[3. , 3.5 , 3.25]],
[[4. , 5.5 , 5.25],
[6.25, 1.75, 2. ],
[4.25, 2.75, 1.75],
[2. , 4.75, 3.75]],
[[3.25, 3.5 , 5.25],
[4.25, 1.5 , 5.25],
[3.5 , 3.5 , 4.25],
[0.75, 5. , 5.5 ]]])
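To shrink by 50% as in the question, call it with 2x2 blocks (this assumes the image's height and width are even, as with 3264x2448):
small = block_mean(im, BSZ=(2, 2))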

Python Reading (floating-point) values one at a time

I'd like to fill in a numpy array with some floating-point values coming from a file. The data would be stored like this:
0 11
5 6.2 4 6
2 5 3.2 6
7 1.4 5 11
The first line gives the first and last index and on the following lines come the actual data. My current approach is to split each data line, use float on each part, and store the values in a pre-allocated array, slice by slice. Here is how I do it now:
import numpy as np

data_file = 'data.txt'
# Non-needed stuff at the beginning
skip_lines = 0
with open(data_file, 'r') as f:
    # Skip any lines if needed
    for _ in range(skip_lines):
        f.readline()
    # Get the data size and preallocate the numpy array
    first, last = map(int, f.readline().split())
    size = last - first + 1
    data = np.zeros(size)
    beg, end = (-1, 0)  # Keep track of where to fill the array
    for line in f:
        if end - 1 == last:
            break
        samples = line.split()
        beg = end
        end += len(samples)
        data[beg:end] = [float(s) for s in samples]
Is there a way in Python to read the data values one by one instead?
import numpy as np

f = open('data.txt', 'r')
first, last = map(int, f.readline().split())
arr = np.zeros(last - first + 1)
for k in range(last - first + 1):
    data = f.read()  # This does not work. Any idea?
    # In C++, it could be done this way: double data; cin >> data
    arr[k] = data
EDIT The only thing that one can be sure of is that the two first numbers are the first and last index and that the last data row has only the last numbers. There can be also other stuff after the data numbers. So one can't just read all the rows after the "first, last" row.
EDIT 2 Added (working) initial approach (split each data line, use float on each part, and store the values in a pre-allocated array, slice by slice) implementation.
Since your sample has the same number of columns in each row (except the first) we can read it as csv, for example with loadtxt:
In [1]: cat stack43307063.txt
0 11
5 6.2 4 6
2 5 3.2 6
7 1.4 5 11
In [2]: arr = np.loadtxt('stack43307063.txt', skiprows=1)
In [3]: arr
Out[3]:
array([[ 5. , 6.2, 4. , 6. ],
[ 2. , 5. , 3.2, 6. ],
[ 7. , 1.4, 5. , 11. ]])
This is easy to reshape and manipulate. If columns aren't consistent, then we need to work line by line.
In [9]: alist = []
In [10]: with open('stack43307063.txt') as f:
...: start, stop = [int(i) for i in f.readline().split()]
...: print(start, stop)
...: for line in f: # f.readline()
...: print(line.split())
...: alist.append([float(i) for i in line.split()])
...:
0 11
['5', '6.2', '4', '6']
['2', '5', '3.2', '6']
['7', '1.4', '5', '11']
In [11]: alist
Out[11]: [[5.0, 6.2, 4.0, 6.0], [2.0, 5.0, 3.2, 6.0], [7.0, 1.4, 5.0, 11.0]]
Replace the append with extend to collect the values in a flat list instead:
alist.extend([float(i) for i in line.split()])
[5.0, 6.2, 4.0, 6.0, 2.0, 5.0, 3.2, 6.0, 7.0, 1.4, 5.0, 11.0]
C++ I/O usually uses streams. Streaming is possible with Python, but text files are more often read line by line.
In [15]: lines = open('stack43307063.txt').readlines()
In [16]: lines
Out[16]: ['0 11\n', '5 6.2 4 6\n', '2 5 3.2 6\n', '7 1.4 5 11\n']
a list of lines which can then be processed as above.
fromfile could also be used, except it loses any row/column structure in the original:
In [20]: np.fromfile('stack43307063.txt',sep=' ')
Out[20]:
array([ 0. , 11. , 5. , 6.2, 4. , 6. , 2. , 5. , 3.2,
6. , 7. , 1.4, 5. , 11. ])
This load includes the first line. We could skip that with an open and readline.
In [21]: with open('stack43307063.txt') as f:
...: start, stop = [int(i) for i in f.readline().split()]
...: print(start, stop)
...: arr = np.fromfile(f, sep=' ')
0 11
In [22]: arr
Out[22]:
array([ 5. , 6.2, 4. , 6. , 2. , 5. , 3.2, 6. , 7. ,
1.4, 5. , 11. ])
fromfile takes a count parameter as well, which could be set from your start and stop. But unless you just want to read subset it isn't needed.
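A sketch of that, assuming you want exactly the last - first + 1 values and anything after them should be ignored:
with open('stack43307063.txt') as f:
    first, last = [int(i) for i in f.readline().split()]
    # parse at most the number of values announced by the header line
    arr = np.fromfile(f, sep=' ', count=last - first + 1)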
This assumes only that the first two numbers are the indices of the values required from the numbers that follow. Varying numbers of values can appear on the first or subsequent lines, and no tokens beyond last are read.
from io import StringIO
sample = StringIO('''3 11 5\n 6.2 4\n6 2 5 3.2 6 7\n1.4 5 11''')
from shlex import shlex
lexer = shlex(instream=sample, posix=False)
lexer.wordchars = r'0123456789.'
lexer.whitespace = ' \n'
lexer.whitespace_split = True
def oneToken():
    while True:
        token = lexer.get_token()
        if token:
            token = token.strip()
            if not token:
                return
        else:
            return
        token = token.replace('\n', '')
        yield token
tokens = oneToken()
first = int(next(tokens))
print (first)
last = int(next(tokens))
print (last)
all_available = [float(next(tokens)) for i in range(0, last)]
print (all_available)
data = all_available[first:last]
print (data)
Output:
3
11
[5.0, 6.2, 4.0, 6.0, 2.0, 5.0, 3.2, 6.0, 7.0, 1.4, 5.0]
[6.0, 2.0, 5.0, 3.2, 6.0, 7.0, 1.4, 5.0]
f.read() will give you the remaining numbers as a string. You'll have to split them and map them to float:
import numpy as np

f = open('data.txt', 'r')
first, last = map(int, f.readline().split())
arr = np.zeros(last - first + 1)
# take only the values needed, in case anything follows the data numbers
data = map(float, f.read().split()[:last - first + 1])
arr[:] = list(data)
Python is fast at string processing, so you can reduce this two-delimiter reading problem to a single delimiter and then load it (Python 3):
import numpy as np
from io import StringIO

data = np.loadtxt(StringIO(''.join(l.replace(' ', '\n') for l in open('data.txt'))),
                  delimiter=' ', skiprows=2)
https://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html
Data-type is float by default.
