Python: Reading (floating-point) values one at a time

I'd like to fill in a numpy array with some floating-point values coming from a file. The data would be stored like this:
0 11
5 6.2 4 6
2 5 3.2 6
7 1.4 5 11
The first line gives the first and last index and on the following lines come the actual data. My current approach is to split each data line, use float on each part, and store the values in a pre-allocated array, slice by slice. Here is how I do it now:
import numpy as np

data_file = 'data.txt'
# Non-needed stuff at the beginning
skip_lines = 0
with open(data_file, 'r') as f:
    # Skip any lines if needed
    for _ in range(skip_lines):
        f.readline()
    # Get the data size and preallocate the numpy array
    first, last = map(int, f.readline().split())
    size = last - first + 1
    data = np.zeros(size)
    end = 0  # Keep track of where to fill the array
    for line in f:
        if end >= size:  # all values read; ignore whatever follows
            break
        samples = line.split()
        beg = end
        end += len(samples)
        data[beg:end] = [float(s) for s in samples]
Is there a way in Python to read the data values one by one instead?
import numpy as np
f = open('data.txt', 'r')
first, last = map(int, f.readline().split())
arr = np.zeros(last - first + 1)
for k in range(last - first + 1):
    data = f.read()  # This does not work. Any idea?
    # In C++, it could be done this way: double data; cin >> data
    arr[k] = data
EDIT: The only things one can be sure of are that the first two numbers are the first and last index, and that the last data row contains only the final numbers. There can also be other content after the data numbers, so one can't just read all the rows after the "first, last" row.
EDIT 2: Added an implementation of the (working) initial approach (split each data line, use float on each part, and store the values in a pre-allocated array, slice by slice).

Since your sample has the same number of columns in each row (except the first), we can read it as CSV-style text, for example with loadtxt:
In [1]: cat stack43307063.txt
0 11
5 6.2 4 6
2 5 3.2 6
7 1.4 5 11
In [2]: arr = np.loadtxt('stack43307063.txt', skiprows=1)
In [3]: arr
Out[3]:
array([[ 5. , 6.2, 4. , 6. ],
[ 2. , 5. , 3.2, 6. ],
[ 7. , 1.4, 5. , 11. ]])
This is easy to reshape and manipulate. If columns aren't consistent, then we need to work line by line.
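If a flat 1-D array is what you need (as in the question), the 2-D result can simply be raveled. A minimal sketch, assuming every data row has the same number of columns and nothing follows the data; `io.StringIO` stands in for the file:

```python
import io

import numpy as np

# Stand-in for data.txt: header line "first last", then the data rows.
text = "0 11\n5 6.2 4 6\n2 5 3.2 6\n7 1.4 5 11\n"

# Skip the header row; loadtxt returns a (3, 4) float array here.
table = np.loadtxt(io.StringIO(text), skiprows=1)

# Flatten row by row into the 1-D layout the question asks for.
arr = table.ravel()
print(arr.shape)  # (12,)
```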
In [9]: alist = []
In [10]: with open('stack43307063.txt') as f:
    ...:     start, stop = [int(i) for i in f.readline().split()]
    ...:     print(start, stop)
    ...:     for line in f:  # f.readline()
    ...:         print(line.split())
    ...:         alist.append([float(i) for i in line.split()])
    ...:
0 11
['5', '6.2', '4', '6']
['2', '5', '3.2', '6']
['7', '1.4', '5', '11']
In [11]: alist
Out[11]: [[5.0, 6.2, 4.0, 6.0], [2.0, 5.0, 3.2, 6.0], [7.0, 1.4, 5.0, 11.0]]
Replace the append with extend to collect the values in a flat list instead:
alist.extend([float(i) for i in line.split()])
[5.0, 6.2, 4.0, 6.0, 2.0, 5.0, 3.2, 6.0, 7.0, 1.4, 5.0, 11.0]
C++ I/O usually uses streams. Streaming is possible in Python, but text files are more often read line by line.
In [15]: lines = open('stack43307063.txt').readlines()
In [16]: lines
Out[16]: ['0 11\n', '5 6.2 4 6\n', '2 5 3.2 6\n', '7 1.4 5 11\n']
This gives a list of lines, which can be processed as above.
fromfile could also be used, except that it loses any row/column structure of the original:
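Streaming in the C++ sense can be approximated with a generator that hands out one whitespace-separated token at a time, which is close to `cin >> data`. A minimal sketch, assuming the header holds `first last` and exactly `last - first + 1` values follow, possibly with unrelated text after them (as in the question's EDIT); `iter_tokens` is a hypothetical helper and `io.StringIO` stands in for data.txt:

```python
import io
from itertools import islice

import numpy as np


def iter_tokens(f):
    """Yield whitespace-separated tokens one at a time, like `cin >> s`."""
    for line in f:
        yield from line.split()


# Stand-in for data.txt; the trailing text must not be parsed.
f = io.StringIO("0 11\n5 6.2 4 6\n2 5 3.2 6\n7 1.4 5 11\nsome other stuff\n")

tokens = iter_tokens(f)
first, last = int(next(tokens)), int(next(tokens))
size = last - first + 1

# Take exactly `size` tokens; only those are converted to float.
arr = np.fromiter((float(t) for t in islice(tokens, size)), dtype=float, count=size)
print(arr)
```

If the file holds fewer values than expected, `np.fromiter` raises a `ValueError` rather than silently leaving zeros in the array.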
In [20]: np.fromfile('stack43307063.txt',sep=' ')
Out[20]:
array([ 0. , 11. , 5. , 6.2, 4. , 6. , 2. , 5. , 3.2,
6. , 7. , 1.4, 5. , 11. ])
This load includes the first line. We could skip that with an open and readline.
In [21]: with open('stack43307063.txt') as f:
    ...:     start, stop = [int(i) for i in f.readline().split()]
    ...:     print(start, stop)
    ...:     arr = np.fromfile(f, sep=' ')
0 11
In [22]: arr
Out[22]:
array([ 5. , 6.2, 4. , 6. , 2. , 5. , 3.2, 6. , 7. ,
1.4, 5. , 11. ])
fromfile takes a count parameter as well, which could be set from your start and stop. But unless you want to read just a subset, it isn't needed.
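For the situation in the question's EDIT, where other content may follow the data, `count` is what keeps `fromfile` from running into it. A minimal sketch, assuming a real file on disk (`fromfile` needs an actual file, not a `StringIO`); the temporary file only stands in for data.txt:

```python
import os
import tempfile

import numpy as np

# Write a stand-in for data.txt with some non-numeric text after the data.
content = "0 11\n5 6.2 4 6\n2 5 3.2 6\n7 1.4 5 11\nother stuff\n"
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(content)
    path = tmp.name

try:
    with open(path) as f:
        start, stop = map(int, f.readline().split())
        # Read exactly stop - start + 1 values; parsing stops there.
        arr = np.fromfile(f, sep=" ", count=stop - start + 1)
finally:
    os.remove(path)

print(arr)
```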

This assumes only that the first two numbers represent the indices of the required values among the numbers that follow. The number of values per line can vary, including on the first line. It won't read tokens beyond last.
from io import StringIO
sample = StringIO('''3 11 5\n 6.2 4\n6 2 5 3.2 6 7\n1.4 5 11''')
from shlex import shlex
lexer = shlex(instream=sample, posix=False)
lexer.wordchars = r'0123456789.'
lexer.whitespace = ' \n'
lexer.whitespace_split = True
def oneToken():
    while True:
        token = lexer.get_token()
        if token:
            token = token.strip()
            if not token:
                return
        else:
            return
        token = token.replace('\n', '')
        yield token
tokens = oneToken()
first = int(next(tokens))
print (first)
last = int(next(tokens))
print (last)
all_available = [float(next(tokens)) for i in range(0, last)]
print (all_available)
data = all_available[first:last]
print (data)
Output:
3
11
[5.0, 6.2, 4.0, 6.0, 2.0, 5.0, 3.2, 6.0, 7.0, 1.4, 5.0]
[6.0, 2.0, 5.0, 3.2, 6.0, 7.0, 1.4, 5.0]

f.read() will give you the remaining text as a single string. You'll have to split it and map each piece to float.
import numpy as np
with open('data.txt', 'r') as f:
    first, last = map(int, f.readline().split())
    arr = np.zeros(last - first + 1)
    # In Python 3, map() is lazy, so materialise it; [:arr.size] drops any trailing content
    arr[:] = list(map(float, f.read().split()[:arr.size]))

Python is fast at string processing, so you can turn this two-delimiter problem (spaces and newlines) into a one-delimiter problem: put every value on its own line and then read it (Python 3):
import numpy as np
from io import StringIO
data = np.loadtxt(StringIO(''.join(l.replace(' ', '\n') for l in open('data.txt'))), delimiter=' ', skiprows=2)
https://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html
Data-type is float by default.

Related

Read numpy arrays from file without importing numpy

I have a file with x dimensional vector storing numpy arrays such as
0 0 0 2.4 1.2
1.3 1.4 5.6 2.2
I tried to read it using numpy as
np.array()
Is it possible without importing the package numpy?
If I write that test to a file:
In [13]: cat stack45745158.py
0 0 0 2.4 1.2
1.3 1.4 5.6 2.2
Read all the lines, and split them into strings:
In [14]: with open('stack45745158.py','r') as f:
    ...:     lines = f.readlines()
    ...:
In [15]: lines
Out[15]: ['0 0 0 2.4 1.2\n', '1.3 1.4 5.6 2.2\n']
In [16]: alist = []
In [17]: for line in lines:
    ...:     data = line.strip().split()
    ...:     alist.append(data)
    ...:
In [18]: alist
Out[18]: [['0', '0', '0', '2.4', '1.2'], ['1.3', '1.4', '5.6', '2.2']]
Convert the strings to float:
In [19]: alist = [[float(i) for i in line] for line in alist]
In [20]: alist
Out[20]: [[0.0, 0.0, 0.0, 2.4, 1.2], [1.3, 1.4, 5.6, 2.2]]
Since one sublist has 5 numbers and the other 4, this is about as good as it gets. It can't be made into a 2d array of floats.
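If a flat, compact container of floats is enough, the standard-library `array` module needs no NumPy at all. A minimal sketch, assuming the ragged rows may be concatenated into one sequence, with the row boundaries kept in a separate, hypothetical `row_lengths` list; `io.StringIO` stands in for the file:

```python
import io
from array import array

# Stand-in for the file in the question: rows of different lengths.
f = io.StringIO("0 0 0 2.4 1.2\n1.3 1.4 5.6 2.2\n")

values = array("d")  # C doubles, stored contiguously like a 1-D float array
row_lengths = []     # remembers how many values each original row had
for line in f:
    fields = line.split()
    if not fields:
        continue  # ignore blank lines
    values.extend(float(x) for x in fields)
    row_lengths.append(len(fields))

print(values.tolist())  # [0.0, 0.0, 0.0, 2.4, 1.2, 1.3, 1.4, 5.6, 2.2]
print(row_lengths)      # [5, 4]
```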

Finding 1000 linear interpolated values between every number of a list in Python

I am a beginner in Python and am stuck on a problem. I have two lists of 60 floating-point numbers, let's call them start and end. The numbers in both lists are not in increasing or decreasing order.
start = []  # 60 floating-point numbers
end = []  # 60 floating-point numbers
I would like to find 1000 interpolated values between start[0] and end[0] and repeat the process for all 60 values of the list. How do I go about it?
You can do this with a list comprehension and using numpy.linspace
import numpy as np
[np.linspace(first, last, 1000) for first, last in zip(start, end)]
As a small example (with fewer values)
>>> start = [1, 5, 10]
>>> end = [2, 10, 20]
>>> [np.linspace(first, last, 5) for first, last in zip(start, end)]
[array([ 1. , 1.25, 1.5 , 1.75, 2. ]),
array([ 5. , 6.25, 7.5 , 8.75, 10. ]),
array([ 10. , 12.5, 15. , 17.5, 20. ])]
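Since NumPy 1.16, `np.linspace` also accepts array-like `start`/`stop`, so all pairs can be interpolated in one call that returns a single 2-D array instead of a list of arrays. A minimal sketch with the small example values; `axis=1` puts each interpolation along a row:

```python
import numpy as np

start = [1, 5, 10]
end = [2, 10, 20]

# One row per (start, end) pair; 5 points here, 1000 in the question.
grid = np.linspace(start, end, 5, axis=1)
print(grid.shape)  # (3, 5)
```

With 60 pairs and 1000 points the result would have shape (60, 1000), and `grid[k]` is the k-th interpolation.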

Python: How to convert multiple list that have multiple digits into array?

I have a text file with listed 4 x 3 binary values as such:
1 0 1
0 0 1
1 1 0
0 0 1
When I read this file in python, it is in this form:
import numpy as np
with open("test.txt") as g:
    p = g.read().splitlines()
q = []
for m in p:
    q.append(int(m))  # fails: int() cannot parse a string such as '1 0 1'
p = q
Python window:
>>> p
['1 0 1', '0 0 1', '1 1 0', '0 0 1']
How to convert it into array:
array([[ 1.0, 0.0, 1.0],
[ 0.0, 0.0, 1.0],
[ 1.0, 1.0, 0.0],
[ 0.0, 0.0, 1.0]])
The simplest solution by far is to skip all the intermediate steps of reading the file of your own and converting the lines to lists of lists and just use numpy.loadtxt(). The values will be of float type by default, so you won't have to do anything more.
import numpy as np
dat = np.loadtxt('test.txt')
You can loop over each line of p, split the string into separate numbers and, finally, convert each substring into a float:
import numpy as np
p = ['1 0 1', '0 0 1', '1 1 0', '0 0 1']
print(np.array([list(map(float, line.split())) for line in p]))
Output:
[[ 1. 0. 1.]
[ 0. 0. 1.]
[ 1. 1. 0.]
[ 0. 0. 1.]]
Assuming you're guaranteed sane enough input, you can split the strings and convert the fragments to int:
def str2ints(l):
    return [int(frag) for frag in l.split()]
This function takes one line and splits it into parts, e.g. "1 0 1" is split into ["1", "0", "1"]; a list comprehension then converts each fragment to an int.
Use another list comprehension to apply it to the whole of p:
[str2ints(l) for l in p]
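To get the float array the question asks for, the nested list can be passed straight to `np.array` with `dtype=float`. A minimal sketch reusing `str2ints` and the `p` list from the question:

```python
import numpy as np


def str2ints(l):
    return [int(frag) for frag in l.split()]


p = ['1 0 1', '0 0 1', '1 1 0', '0 0 1']
# Build the nested int lists, then let NumPy convert them to floats.
arr = np.array([str2ints(l) for l in p], dtype=float)
print(arr.shape)  # (4, 3)
```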

Wrong CSV printing in Python (enumerating numpy array)

I apologize if this question looks like a duplicate. I am trying to write a 7x2 array to a .csv file. The array I want to print is called x5:
x5
Out[47]:
array([[ 0.5, 1. ],
[ 0.7, 3. ],
[ 1.1, 5. ],
[ 1.9, 6. ],
[ 2. , 7. ],
[ 2.2, 9. ],
[ 3.1, 10. ]])
The code I use:
import time
import csv
import numpy
timestr = time.strftime("%Y%m%d-%H%M%S")
with open('mydir\\AreaIntCurve'+'_'+str(timestr)+'.csv', 'w') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['Unique value', ' Occurrences'])
    for m, val in numpy.ndenumerate(x5):
        writer.writerow([m, val])
The result I get:
Unique value, Occurrences
"(0, 0)",0.5
"(0, 1)",1.0
"(1, 0)",0.69999999999999996
"(1, 1)",3.0
"(2, 0)",1.1000000000000001
"(2, 1)",5.0
"(3, 0)",1.8999999999999999
"(3, 1)",6.0
"(4, 0)",2.0
"(4, 1)",7.0
"(5, 0)",2.2000000000000002
"(5, 1)",9.0
"(6, 0)",3.1000000000000001
"(6, 1)",10.0
The result I want:
Unique value, Occurrences
0.5, 1
0.7, 3
1.1, 5
1.9, 6
2.0, 7
2.2, 9
3.1, 10
I assume the problem is with ndenumerate(x5), which prints the coordinates of my values. I have tried different approaches like numpy.savetxt, but it does not produce what I want and also does not print the current date in the file name. How to amend the ndenumerate() call to get rid of the value coordinates, while keeping the current date in the file name? Thanks a lot!
Here's an alternative that uses numpy.savetxt instead of the csv library:
In [17]: x5
Out[17]:
array([[ 0.5, 1. ],
[ 0.7, 3. ],
[ 1.1, 5. ],
[ 1.9, 6. ],
[ 2. , 7. ],
[ 2.2, 9. ],
[ 3.1, 10. ]])
In [18]: np.savetxt('foo.csv', x5, fmt=['%4.1f', '%4i'], header='Unique value, Occurrences', delimiter=',', comments='')
In [19]: !cat foo.csv
Unique value, Occurrences
0.5, 1
0.7, 3
1.1, 5
1.9, 6
2.0, 7
2.2, 9
3.1, 10
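To keep the current date in the file name as well, the same timestamped path built in the question can be passed to `np.savetxt`. A minimal sketch, assuming a temporary directory as a stand-in for `mydir`:

```python
import os
import tempfile
import time

import numpy as np

x5 = np.array([[0.5, 1], [0.7, 3], [1.1, 5], [1.9, 6],
               [2.0, 7], [2.2, 9], [3.1, 10]])

timestr = time.strftime("%Y%m%d-%H%M%S")
mydir = tempfile.mkdtemp()  # stand-in for the real output folder
path = os.path.join(mydir, "AreaIntCurve_" + timestr + ".csv")

# One format per column: one decimal for the value, integer for the count.
np.savetxt(path, x5, fmt=["%.1f", "%d"], delimiter=", ",
           header="Unique value, Occurrences", comments="")
```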
Replace these lines:
for m, val in numpy.ndenumerate(x5):
    writer.writerow([m, val])
with:
for val in x5:
    writer.writerow(val)
You don't need ndenumerate: iterating over a 2-D array yields its rows, and each row is written as one CSV line.
Have you tried replacing your last two lines of code with
for x in x5:
    writer.writerow(x)
?
You may be surprised to see 1.8999999999999999 instead of 1.9 in your csv result; that is because 1.9 cannot be represented exactly in floating-point arithmetic (see this question).
If you want to limit the output to 3 decimal places, you can replace the last line with writer.writerow(["{0:.3f}".format(val) for val in x])
But this will also add three trailing zeros to integer values. Since you can check whether a float is an integer with is_integer(), you can avoid this with
writer.writerow([str(y) if y.is_integer() else "{0:.3f}".format(y) for y in x])
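Putting these pieces together with the `csv` module: iterate over rows instead of using `ndenumerate`, format each value, and keep the dated file name. A minimal sketch; `format_value` is a hypothetical helper (it writes integers without any decimals), and a temporary directory stands in for `mydir`. Opening the file with `newline=''` follows the `csv` module documentation and avoids blank lines between rows on Windows:

```python
import csv
import os
import tempfile
import time

import numpy as np

x5 = np.array([[0.5, 1], [0.7, 3], [1.1, 5], [1.9, 6],
               [2.0, 7], [2.2, 9], [3.1, 10]])


def format_value(y):
    """Hypothetical helper: integers without decimals, others with 3 decimals."""
    y = float(y)
    return str(int(y)) if y.is_integer() else "{0:.3f}".format(y)


timestr = time.strftime("%Y%m%d-%H%M%S")
mydir = tempfile.mkdtemp()  # stand-in for the real output folder
path = os.path.join(mydir, "AreaIntCurve_" + timestr + ".csv")

with open(path, "w", newline="") as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(["Unique value", "Occurrences"])
    for row in x5:  # each row is one [value, count] pair
        writer.writerow([format_value(y) for y in row])
```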

Building a symmetric matrix in Python from data in file

I have a file which, for example, looks like:
1 1 5.5
1 2 6.1
1 3 7.3
2 2 3.4
2 3 9.2
3 3 4.7
This is "half" of a symmetric 3x3 matrix. I would like to create the full symmetric matrix in Python which looks like
[[ 5.5 6.1 7.3]
[ 6.1 3.4 9.2]
[ 7.3 9.2 4.7]]
(of course my actual file is a much bigger 'half' of a NxN matrix so I need a solution other than typing in the values one by one)
I've exhausted all my resources (books and internet) and what I have so far does not really come close. Can anyone please help me with this?
Thank you!
To read the file and load it as a Python object, here's a solution:
import numpy

m = numpy.zeros((3, 3))  # float array; an int matrix would truncate the values
with open('matrix.txt', 'r') as f:
    for line in f:
        try:
            i, j, val = line.split()
            i, j, val = int(i), int(j), float(val)
            m[i-1, j-1] = val
            m[j-1, i-1] = val  # mirror the value to make the matrix symmetric
        except ValueError:
            print("couldn't load line: {}".format(line))
print(m)
Here is an alternative way to do this completely inside Numpy. Two important remarks:
you can read directly with the np.loadtxt function
you can assign the upper-half values to the correct indexes in one line: N[idxs[:,0] - 1, idxs[:,1] - 1] = vals
Here is the code:
import numpy as np
from io import StringIO
indata = """
1 1 5.5
1 2 6.1
1 3 7.3
2 2 3.4
2 3 9.2
3 3 4.7
"""
infile = StringIO(indata)
A = np.loadtxt(infile)
# A is
# array([[ 1. , 1. , 5.5],
# [ 1. , 2. , 6.1],
# [ 1. , 3. , 7.3],
# [ 2. , 2. , 3.4],
# [ 2. , 3. , 9.2],
# [ 3. , 3. , 4.7]])
idxs = A[:, 0:2].astype(int)
vals = A[:, 2]
## To find out the total size of the triangular matrix, note that there
## are only n * (n + 1) / 2 elements that must be specified (the strict
## upper half accounts for (n^2 - n) / 2, and the diagonal adds n to that).
## Therefore, the length of your data, A.shape[0], must be one solution
## to the quadratic equation: n^2 + n - 2 * A.shape[0] = 0
possible_sizes = np.roots([1, 1, -2 * A.shape[0]])
## Let us take only the positive solution to that equation as size of the
## result matrix (rounded to an int, since np.roots works in floating point)
size = int(round(possible_sizes[possible_sizes > 0][0]))
N = np.zeros([size] * 2)
N[idxs[:,0] - 1, idxs[:,1] - 1] = vals
# N is
# array([[ 5.5, 6.1, 7.3],
# [ 0. , 3.4, 9.2],
# [ 0. , 0. , 4.7]])
## Here we could do a one-liner like
# N[idxs[:,1] - 1, idxs[:,0] - 1] = vals
## But how cool is it to add the transpose and subtract the diagonal? :)
M = N + np.transpose(N) - np.diag(np.diag(N))
# M is
# array([[ 5.5, 6.1, 7.3],
# [ 6.1, 3.4, 9.2],
# [ 7.3, 9.2, 4.7]])
If you know the size of the matrix in advance (and it sounds like you do), then the following would work (in both Python 2 and 3):
N = 3
symmetric = [[None]*N for _ in range(N)]  # pre-allocate output matrix
with open('matrix_data.txt', 'r') as file:
    for i, j, val in (line.split() for line in file if line.strip()):
        i, j, val = int(i)-1, int(j)-1, float(val)
        symmetric[i][j] = val
        if symmetric[j][i] is None:
            symmetric[j][i] = val
print(symmetric)  # -> [[5.5, 6.1, 7.3], [6.1, 3.4, 9.2], [7.3, 9.2, 4.7]]
If you don't know the size N ahead of time, you could preprocess the file and determine the maximum index values given.
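One way to do that preprocessing is to collect all triples first and take the largest index as N. A minimal sketch, assuming 1-based indices and one `i j value` triple per line; `read_symmetric` is a hypothetical name and `io.StringIO` stands in for matrix_data.txt:

```python
import io


def read_symmetric(f):
    """Build a full symmetric matrix (list of lists) from 'i j value' lines."""
    triples = []
    for line in f:
        if line.strip():  # skip blank lines
            i, j, val = line.split()
            triples.append((int(i) - 1, int(j) - 1, float(val)))

    # All triples are known now: the largest index gives the matrix size.
    n = 1 + max(max(i, j) for i, j, _ in triples)

    symmetric = [[0.0] * n for _ in range(n)]
    for i, j, val in triples:
        symmetric[i][j] = val
        symmetric[j][i] = val  # mirror into the other triangle
    return symmetric


data = io.StringIO("1 1 5.5\n1 2 6.1\n1 3 7.3\n2 2 3.4\n2 3 9.2\n3 3 4.7\n")
print(read_symmetric(data))
# [[5.5, 6.1, 7.3], [6.1, 3.4, 9.2], [7.3, 9.2, 4.7]]
```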
