Read in lines from text file as separate arrays - python

I have a text file containing lines of strings that resemble an array format. I initially had a list of numpy arrays, and wrote them to the file like this, where each array is about 5 floats:
import numpy as np
parameters = [np.array(...), np.array(...), ...]
with open('params.txt', 'w') as f:
    for param in parameters:
        f.write(str(param)+'\n')
Now I'd like to read them back out, as a list of separate arrays. I'm having issues with this however -- below is what I'm trying to do:
parameters = []
with open('params.txt', 'r') as f:
    for line in f:
        parameters.append(np.array(line))
But now when I later try to index elements in these arrays and use list comprehension, like: [params[2] for params in parameters], I get this error: IndexError: too many indices for array.
I have also tried reading them out with line.split(','), but this didn't give me what I wanted and just messed up the formatting further. How can I accomplish this?
The format of my text file:
[242.1383, 131.087, 1590.853, 1306.09, 783.979]
[7917.102, 98.12, 21.43, 13.1383, 6541.33]
[823.74, 51.31, 9622.434, 974.11, 980.177]
...
What I want:
parameters = [np.array([242.1383, 131.087, 1590.853, 1306.09, 783.979]), np.array([7917.102, 98.12, 21.43, 13.1383, 6541.33]), np.array([823.74, 51.31, 9622.434, 974.11, 980.177]), ...]

I figured out a slightly simpler way to accomplish this without having to worry about all the string parsing, using regex:
import re
import numpy as np
parameters = []
with open('params.txt', 'r') as f:
    for line in f:
        # raw string avoids invalid-escape warnings; matches unsigned decimals only
        values = [float(value) for value in re.findall(r'\d+\.?\d*', line)]
        parameters.append(np.array(values))
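For context, np.array(line) wraps the whole string in a zero-dimensional array, which is why indexing it fails. Since each line in the file shown in the question is already a valid Python list literal, another option is ast.literal_eval. The following is only a minimal sketch under that assumption: parse_array_lines is a hypothetical helper name, io.StringIO stands in for the real file, and the result is a list of plain float lists (each one could be passed to np.array if numpy arrays are needed).

```python
import ast
import io

def parse_array_lines(lines):
    """Turn lines such as '[1.0, 2.5]' into lists of floats (hypothetical helper)."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # literal_eval only evaluates Python literals, never arbitrary code
        rows.append([float(v) for v in ast.literal_eval(line)])
    return rows

# io.StringIO stands in for open('params.txt') in this sketch
sample = io.StringIO(
    "[242.1383, 131.087, 1590.853, 1306.09, 783.979]\n"
    "[7917.102, 98.12, 21.43, 13.1383, 6541.33]\n"
)
parameters = parse_array_lines(sample)
third_values = [params[2] for params in parameters]
```

Unlike a digit-only regex, this keeps signs and exponents intact, but it does require every line to be a well-formed literal.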

Are you looking for something like this?
parameters = []
with open('params.txt', 'r') as f:
    for line in f:
        y = [value for value in line.split()]
        parameters.append(y)
It would be easier to answer if you showed the format of the text file you're trying to read from.

Related

Nested lists in python containing a single string and not single letters

I need to load text from a file which contains several lines, each line containing letters separated by commas, into a 2-dimensional list. When I run this, I get a 2-dimensional list, but the nested lists contain single strings instead of separated values, and I cannot iterate over them. How do I solve this?
def read_matrix_file(filename):
    matrix = []
    with open(filename, 'r') as matrix_letters:
        for line in matrix_letters:
            line = line.split()
            matrix.append(line)
    return matrix
result:
[['a,p,p,l,e'], ['a,g,o,d,o'], ['n,n,e,r,t'], ['g,a,T,A,C'], ['m,i,c,s,r'], ['P,o,P,o,P']]
I need each letter in the nested lists to be a single string so I can use them.
thanks in advance
split() function splits on white space by default. You can fix this by passing the string you want to split on. In this case, that would be a comma. The code below should work.
def read_matrix_file(filename):
    matrix = []
    with open(filename, 'r') as matrix_letters:
        for line in matrix_letters:
            line = line.split(',')
            matrix.append(line)
    return matrix
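To make the difference concrete, here is a small sketch comparing the three calls on one line taken from the question's data. The variable names are just for illustration.

```python
line = "a,p,p,l,e\n"

by_whitespace = line.split()              # no separator: split on runs of whitespace
by_comma = line.split(',')                # the newline stays on the last item
by_comma_clean = line.strip().split(',')  # strip the newline first
```

The first call returns the whole line as a single item because it contains no inner whitespace, which matches the result shown in the question.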
The input format you described conforms to CSV format. Python has a library just for reading CSV files. If you just want to get the job done, you can use this library to do the work for you. Here's an example:
Input(test.csv):
a,string,here
more,strings,here
Code:
>>> import csv
>>> lines = []
>>> with open('test.csv') as file:
...     reader = csv.reader(file)
...     for row in reader:
...         lines.append(row)
...
>>>
Output:
>>> lines
[['a', 'string', 'here'], ['more', 'strings', 'here']]
Using the strip() function will get rid of the new line character as well:
def read_matrix_file(filename):
    matrix = []
    with open(filename, 'r') as matrix_letters:
        for line in matrix_letters:
            line = line.split(',')
            line[-1] = line[-1].strip()
            matrix.append(line)
    return matrix

How to combine split() with float()

My Script is reading data from another file.
I require the data as float, not as string and I am searching for an elegant/pythonic way to combine float() with the last line instead of iterating over the entire list to change the data or changing it when I need it:
data = []
with open(os.path.join(path, file), "r") as f:
    searchlines = f.readlines()
for i, line in enumerate(searchlines):
    data.append(line.replace('[', ' ').replace(']', ' ').split())
So far this saves the data from the file as a list of lists of strings.
How to combine the last line with float()?
Here is an example of the data before reading it:
[[ 563.15 1673.97 3078.41]
[ 563.15 1066.4 26617.7]
[ 563.212 778.931 59356.1]
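Use map. For example (the extra strip() removes the trailing newline, which would otherwise stop strip('[]') from removing the closing bracket):
data.append(map(float, line.strip().strip('[]').split()))
In Python 3, map returns an iterator, so wrap it in list():
data.append(list(map(float, line.strip().strip('[]').split())))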
Do you have numpy installed?
Because in that case you can do:
import numpy as np
with open(os.path.join(path, file), "r") as f:
    # strip() first, so the trailing newline does not shield the closing bracket
    data = np.array([line.strip().strip('[]').split() for line in f], dtype=float)
This gives you a matrix of floats. Of course, this assumes that each line has the same number of values in it.
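One detail worth checking with strip('[]'): it only removes characters at the very ends of the string, so a trailing newline or leading space can shield a bracket from being removed. A minimal sketch of a per-line parser that accounts for this, where parse_bracket_row is a hypothetical helper name and the sample lines mimic the data shown in the question:

```python
def parse_bracket_row(line):
    """Parse a row such as '[[ 563.15 1673.97 3078.41]' into floats (hypothetical helper)."""
    # strip() with no argument removes surrounding whitespace, including '\n';
    # strip('[]') then removes any run of brackets at either end
    cleaned = line.strip().strip('[]')
    return [float(token) for token in cleaned.split()]

raw_lines = [
    "[[ 563.15 1673.97 3078.41]\n",
    " [ 563.15 1066.4 26617.7]\n",
]
data = [parse_bracket_row(line) for line in raw_lines]
```

The resulting list of lists can be handed to np.array if every row has the same length.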

Load text file python could not convert string to float

I have a text file that looks like this:
(1064.2966,1898.787,1064.2986,1898.787,1064.2986,1898.785,1064.2966,1898.785)
(1061.0567,1920.3816,1065.1361,1920.2276,1065.5847,1915.9657,1065.4726,1915.2927,1061.0985,1914.3955,1058.1824,1913.9468,1055.6028,1913.9468,1051.0044,1916.19,1051.5651,1918.8817,1056.0514,1918.9939,1058.9675,1919.6668,1060.8741,1920.4519)
etc (all rows have different lengths)
when I use
np.loadtxt(filename,dtype=float,delimiter=',')
I get
ValueError: could not convert string to float: (1031.4647
I think np.loadtxt expects numbers, so it does not know how to convert a value that starts with a '('. You have two choices here. The first is to strip the parentheses yourself:
lines = []
with open('datafile') as infile:
    for line in infile:
        line = line.rstrip('\n')[1:-1] # this removes first and last parentheses from the line
        lines.append([float(v) for v in line.split(',')])
in this way you end up with lines which is a list of lists of values (i.e. lines[0] is a list of the values on line 1).
The other way to go is modifying the data file to remove the parentheses, which you can do in many ways depending on the platform you are working on.
In most Linux systems for instance you can just do something along the lines of this answer
EDIT: as suggested by @AlexanderHuszagh in the comments section, different systems can have different ways of representing newlines, so a more robust solution would be:
lines = []
with open('datafile') as infile:
    file_lines = infile.read().splitlines()
for line in file_lines:
    lines.append([float(v) for v in line[1:-1].split(',')])
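The slicing approach relies on every line starting and ending exactly with a parenthesis. A slightly more forgiving sketch strips whitespace and parentheses by character and skips empty lines. The helper name read_paren_rows is hypothetical, and io.StringIO stands in for the data file:

```python
import io

def read_paren_rows(handle):
    """Read lines like '(1.0,2.5,3.0)' into lists of floats (hypothetical helper)."""
    rows = []
    for line in handle:
        # remove whitespace/newlines first, then the enclosing parentheses
        body = line.strip().strip('()')
        if not body:
            continue  # ignore empty lines
        rows.append([float(v) for v in body.split(',')])
    return rows

sample = io.StringIO(
    "(1064.2966,1898.787,1064.2986,1898.787)\r\n"
    "\n"
    "(1061.0567,1920.3816)\n"
)
rows = read_paren_rows(sample)
```

Because the rows have different lengths, the result is kept as a list of lists rather than a single rectangular array.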
You got the error because of the parentheses. You can remove them this way:
with open(filename) as f:
    s = f.read().replace('(', '').replace(')', '')
This returns a list of arrays:
arrays = [np.array(list(map(float, line.split(",")))) for line in s.splitlines() if line.strip()]

open a .json file with multiple dictionaries

I have a problem that I can't solve with python, it is probably very stupid but I didn't manage to find the solution by myself.
I have a .json file where the results of a simulation are stored. The result is stored as a series of dictionaries like
{"F_t_in_max": 709.1800264942982, "F_t_out_max": 3333.1574129603068, "P_elec_max": 0.87088836042046958, "beta_max": 0.38091242406098391, "r0_max": 187.55175182942901, "r1_max": 1354.8636763521174, " speed ": 8}
{"F_t_in_max": 525.61428305710433, "F_t_out_max": 2965.0538075438467, "P_elec_max": 0.80977406754203796, "beta_max": 0.59471606595464666, "r0_max": 241.25371753877008, "r1_max": 688.61786996066826, " speed ": 9}
{"F_t_in_max": 453.71124051199763, "F_t_out_max": 2630.1763649193008, "P_elec_max": 0.64268078173342935, "beta_max": 1.0352896471221695, "r0_max": 249.32706230502498, "r1_max": 709.11415981343885, " speed ": 10}
I would like to open the file and access the values, for example to plot "r0_max" as a function of "speed", but I can't open it unless there is only one dictionary.
I use
with open('./results/rigid_wing_opt.json') as data_file:
    data = json.load(data_file)
but when the file contains more than one dictionary I get the error
ValueError: Extra data: line 5 column 1 - line 6 column 1 (char 217 - 431)
If your input data is exactly as provided, then you should be able to interpret each individual dictionary using json.loads. If each dictionary is on its own line, then this should be sufficient:
import json
with open('filename', 'r') as handle:
    json_data = [json.loads(line) for line in handle]
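As a rough sketch of how the per-line records could then be turned into two lists for plotting: note that in the sample data the key is literally " speed " with surrounding spaces, so the sketch normalises keys first. The values below are shortened, illustrative stand-ins for the real file, and io.StringIO replaces the file handle.

```python
import io
import json

# io.StringIO stands in for the results file; values are shortened for illustration
sample = io.StringIO(
    '{"r0_max": 187.5, "r1_max": 1354.8, " speed ": 8}\n'
    '{"r0_max": 241.2, "r1_max": 688.6, " speed ": 9}\n'
)

records = []
for line in sample:
    if line.strip():  # skip blank lines
        record = json.loads(line)
        # normalise keys such as " speed " to "speed"
        records.append({key.strip(): value for key, value in record.items()})

speeds = [r["speed"] for r in records]
r0_max = [r["r0_max"] for r in records]
```

The two lists can then be passed to whatever plotting call is in use.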
I would recommend reading the file line-by-line and converting each line independently to a dictionary.
You can place each line into a list with the following code:
import ast
# Read all lines into a list
with open(fname) as f:
    content = f.readlines()
# Convert each list item to a dict
content = [ ast.literal_eval( line ) for line in content ]
Or an even shorter version performing the list comprehension on the same line:
import ast
# Read all lines into a list
with open(fname) as f:
    content = [ ast.literal_eval( l ) for l in f.readlines() ]
{...} {...} is not proper JSON. It is two JSON objects separated by whitespace. Unless you can change the format of the input file to correct this, I'd suggest you try something a little different. If the data is as simple as in your example, then you could do something like this:
import json
import re
with open('filename', 'r') as handle:
    text_data = handle.read()
# join adjacent objects with commas and wrap them in a list
text_data = '[' + re.sub(r'\}\s*\{', '},{', text_data) + ']'
json_data = json.loads(text_data)
This should work even if your dictionaries are not on separate lines.
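A regex substitution can misfire if a string value happens to contain "}{" or similar. An alternative that avoids rewriting the text is json.JSONDecoder.raw_decode, which parses one document from a given position and reports where it ended. The following is only a sketch of that idea, with iter_json_objects as a hypothetical helper name:

```python
import json

def iter_json_objects(text):
    """Yield each top-level JSON value found back to back in text (hypothetical helper)."""
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while pos < end:
        # raw_decode does not skip leading whitespace, so do it here
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj

objects = list(iter_json_objects('{"a": 1} {"b": 2}\n{"c": [3]}\n'))
```

This works whether the objects are separated by spaces, newlines, or nothing at all.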
That is not valid JSON. You can't have multiple objects at the top level without surrounding them with a list and inserting commas between them.

Reading file string into an array (In a pythonic way)

I'm reading lines from a file to then work with them. Each line is composed solely by float numbers.
I have pretty much everything sorted up to convert the lines into arrays.
I basically do (pseudopython code)
line=file.readlines()
line=line.split(' ') # Or whatever separator
array=np.array(line)
#And then iterate over every value casting them as floats
newarray[i]=array.float(array[i])
This works, but seems a bit counterintuitive and unpythonic. I wanted to know if there is a better way to handle the inputs from a file so that I end up with an array full of floats.
Quick answer:
arrays = []
for line in open(your_file): # no need to use readlines if you don't want to store them
    # use a list comprehension to build your array on the fly
    new_array = np.array([float(i) for i in line.split()])
    arrays.append(new_array)
If you often process this kind of data, the csv module will help.
import csv
arrays = []
# declare the format of your csv file and Python will turn lines into
# lists for you
parser = csv.reader(open(your_file), delimiter=' ')
for l in parser:
    arrays.append(np.array([float(i) for i in l]))
If you feel wild, you can even make this completely declarative:
import csv
parser = csv.reader(open(your_file), delimiter=' ')
make_array = lambda row : np.array([float(i) for i in row])
arrays = [make_array(row) for row in parser]
And if you really want your colleagues to hate you, you can make a one-liner (NOT PYTHONIC AT ALL :-):
arrays = [np.array([float(i) for i in r]) for r in csv.reader(open(your_file), delimiter=' ')]
Stripping all the boilerplate and flexibility, you can end up with a clean and quite readable one-liner. I wouldn't use it because I like the refactoring potential of using csv, but it can be good enough. It's a grey zone here, so I wouldn't say it's Pythonic, but it's definitely handy.
arrays = [np.array([float(i) for i in l.split()]) for l in open(your_file)]
If you want a numpy array and each row in the text file has the same number of values:
import numpy
a = numpy.loadtxt('data.txt')
Without numpy:
import csv
with open('data.txt') as f:
    arrays = list(csv.reader(f, delimiter=' ', quoting=csv.QUOTE_NONNUMERIC))
Or just (in Python 3, map returns an iterator, hence the list() call):
with open('data.txt') as f:
    arrays = [list(map(float, line.split())) for line in f]
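For reference, the csv.QUOTE_NONNUMERIC option tells the reader to convert every unquoted field to a float, which is what makes the csv variant above return numbers instead of strings. A small sketch with in-memory sample data (io.StringIO stands in for the file, and the numbers are made up for illustration):

```python
import csv
import io

# single spaces between fields; the data is illustrative only
sample = io.StringIO("1.0 2.5 3\n4 5 6\n")
reader = csv.reader(sample, delimiter=' ', quoting=csv.QUOTE_NONNUMERIC)
arrays = list(reader)
```

One caveat: with delimiter=' ', runs of several spaces produce empty fields, which are likely to fail the float conversion, so line.split() is the safer choice for loosely aligned columns.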
How about the following:
import numpy as np
arrays = []
for line in open('data.txt'):
    arrays.append(np.array([float(val) for val in line.rstrip('\n').split(' ') if val != '']))
One possible one-liner:
a_list = [list(map(float, line.split(' '))) for line in a_file]
Note that I used map() here instead of a nested list comprehension to aid readability. In Python 3, map() returns an iterator, hence the list() call.
If you want a numpy array:
an_array = np.array([list(map(float, line.split(' '))) for line in a_file])
I would use regular expressions:
import re
all_lines = ''.join( file.readlines() )
new_array = np.array( re.findall(r'[\d.E+-]+', all_lines), float)
new_array = np.reshape( new_array, (m,n) ) # m rows, n columns; reshape returns a new array
This first merges the lines into one long string, and then extracts only the expressions corresponding to floats ( r'[\d.E+-]+' for scientific notation, but you can also use r'[\d.]+' for only plain float expressions).
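A character class like the one above also matches fragments such as "+-" or "1.2.3", which would then fail in the float conversion. A stricter pattern is sketched below, assuming plain decimal or scientific notation. FLOAT_PATTERN, extract_floats and to_rows are hypothetical names, and the grouping into rows is done in plain Python rather than with np.reshape.

```python
import re

# optional sign, digits with optional fraction (or a bare fraction), optional exponent
FLOAT_PATTERN = re.compile(r'[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?')

def extract_floats(text):
    """Return every float-like token in text as a float."""
    return [float(token) for token in FLOAT_PATTERN.findall(text)]

def to_rows(values, n_columns):
    """Group a flat list into rows of n_columns values each."""
    if len(values) % n_columns:
        raise ValueError("value count is not a multiple of n_columns")
    return [values[i:i + n_columns] for i in range(0, len(values), n_columns)]

values = extract_floats("1.5 -2e3 .25\n4 5 6E-1\n")
rows = to_rows(values, 3)
```

The flat list can equally be passed to np.array and reshaped if a numpy array is wanted.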
