I have a text file with 3 lines of data in it:
[1, 2, 1, 1, 3, 1, 1, 2, 1, 3, 1, 1, 1, 3, 3]
[1, 1, 3, 3, 3, 1, 1, 1, 1, 2, 1, 1, 1, 3, 3]
[1, 2, 3, 1, 3, 1, 1, 3, 1, 3, 1, 1, 1, 3, 3]
I open the file and read the data like this:
with open("rafine.txt") as f:
    l = [line.strip() for line in f.readlines()]
Now I have a list in a list.
If I call print(l[0]), it shows [1, 2, 1, 1, 3, 1, 1, 2, 1, 3, 1, 1, 1, 3, 3].
But I want to get at the numbers inside it.
So when I write print(l[0][0]), I want to see 1, but it shows [ instead.
How can I fix this?
You can use literal_eval to parse the lines from the file & build the matrix:
from ast import literal_eval
with open("test.txt") as f:
    matrix = []
    for line in f:
        row = literal_eval(line)
        matrix.append(row)
print(matrix[0][0])
print(matrix[1][4])
print(matrix[2][8])
result:
1
3
1
import json
with open("rafine.txt") as f:
    for line in f.readlines():
        line = json.loads(line)
        print(line)
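The JSON snippet above only prints each parsed line. As a minimal sketch, assuming every non-blank line is valid JSON (true for bracketed lists of integers like the sample), the rows can be collected so that individual numbers are indexable. `io.StringIO` is used here only as a stand-in for the real file:

```python
import io
import json

# Stand-in for open("rafine.txt"); the contents are the question's sample data.
sample = io.StringIO(
    "[1, 2, 1, 1, 3, 1, 1, 2, 1, 3, 1, 1, 1, 3, 3]\n"
    "[1, 1, 3, 3, 3, 1, 1, 1, 1, 2, 1, 1, 1, 3, 3]\n"
    "[1, 2, 3, 1, 3, 1, 1, 3, 1, 3, 1, 1, 1, 3, 3]\n"
)

# json.loads ignores the trailing newline; blank lines are skipped.
rows = [json.loads(line) for line in sample if line.strip()]

print(rows[0][0])  # first number of the first line
print(rows[1][4])  # fifth number of the second line
```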
The best approach depends on what assumption you make about the data in your text file:
ast.literal_eval
If the data in your file is formatted the same way it would be inside Python source code, the best approach is to use literal_eval:
from ast import literal_eval
data = []  # will contain list of lists
with open("filename") as f:
    for line in f:
        row = literal_eval(line)
        data.append(row)
or, the short version:
with open(filename) as f:
    data = [literal_eval(line) for line in f]
re.findall
If you can make few assumptions about the data, using regular expressions to find all digits might be a way forward. The code below builds lists by simply extracting any runs of digits in the text file, regardless of separators or other characters in the file (note that \d+ ignores minus signs and splits decimal numbers at the point):
import re
data = []  # will contain list of lists
with open("filename") as f:
    for line in f:
        row = [int(i) for i in re.findall(r'\d+', line)]
        data.append(row)
or, in short:
with open(filename) as f:
    data = [[int(i) for i in re.findall(r'\d+', line)] for line in f]
handwritten parsing
If neither option is suitable, you can always parse by hand, tailored to the exact format:
data = []  # will contain list of lists
with open(filename) as f:
    for line in f:
        row = [int(i) for i in line.strip()[1:-1].split(", ")]
        data.append(row)
strip() removes the trailing newline, [1:-1] then removes the first and last characters (the brackets), and split(", ") splits the rest into a list of strings. for i in ... iterates over the items in this list (assigning each item to i), and int(i) converts i to an integer.
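To make the slicing and splitting steps concrete, here is a small sketch that walks through them on a single in-memory line. The sample line is an assumption modelled on the question's data; because a line read from a file normally ends with a newline, the sketch strips it before slicing off the brackets:

```python
line = "[1, 2, 1, 1, 3]\n"         # one line as it would be read from the file

stripped = line.strip()            # remove the trailing newline
inner = stripped[1:-1]             # drop the first and last character (the brackets)
parts = inner.split(", ")          # split on comma-plus-space into strings
row = [int(p) for p in parts]      # convert each string to an integer

print(parts)
print(row)
```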
Hi, I hope everyone is okay.
I am trying to find the simplest way to take data from a text file and store it in different
variables. Below is the format of the text file:
TEXT FILE:
min:1,2,3,4,5,7,8,9
avg:1,2,3,4
max:1,2,3,4,5,1,2,3,44,55,32,12
I want to take each of these lines, remove the part before the numbers start (min, avg, max and the ':'),
and store the numbers in separate variables with the appropriate names.
NOTE: the amount of numbers in each line may differ and shouldn't affect the code.
Desired result in Python:
min = [1,2,3,4,5,7,8,9]
avg = [1,2,3,4]
max = [1,2,3,4,5,1,2,3,44,55,32,12]
The code I have tried:
with open('input.txt', 'r') as input:
    input = input.read()
    input = input.strip().split(',')
After this part I am unsure which method would be best to achieve what I am trying to do.
Any help is appreciated!
There's no reasonable way to generate variables (by name) dynamically. Better to use a dictionary. Something like this:
my_dict = {}
with open('input.txt') as data:
    for line in map(str.strip, data):
        try:
            key, vals = line.split(':')
            my_dict[key.rstrip()] = list(map(int, vals.split(',')))
        except ValueError:
            pass
print(my_dict)
Output:
{'min': [1, 2, 3, 4, 5, 7, 8, 9], 'avg': [1, 2, 3, 4], 'max': [1, 2, 3, 4, 5, 1, 2, 3, 44, 55, 32, 12]}
You can use exec to evaluate a string as Python code. Only do this with trusted data, to avoid injection attacks.
with open('input.txt', 'r') as fd:
    data = fd.read()
# list of lines
lines = data.split('\n')
# python code format
code_format = '\n'.join("{} = [{}]".format(*line.partition(':')[::2]) for line in lines if line)
# execute the string as python code
exec(code_format)
print(avg)
#[1, 2, 3, 4]
Notice that this code evaluation has a further side effect, since some of the variable names shadow the built-in functions min and max. So if you try to call those built-in functions after executing the code, you will get TypeError: 'list' object is not callable.
One way to re-approach the problem would be to pickle the objects: use pickle.dump to save an object to a file and pickle.load to retrieve it, see doc.
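The pickling idea can be sketched as follows. This is only an illustration under assumptions: the dictionary contents come from the question's sample, and the file name `stats.pkl` inside a temporary directory is hypothetical. As with exec, pickled data should only be loaded from a trusted source:

```python
import os
import pickle
import tempfile

data = {
    "min": [1, 2, 3, 4, 5, 7, 8, 9],
    "avg": [1, 2, 3, 4],
    "max": [1, 2, 3, 4, 5, 1, 2, 3, 44, 55, 32, 12],
}

# Hypothetical location, chosen only so the sketch does not touch real files.
path = os.path.join(tempfile.mkdtemp(), "stats.pkl")

# pickle works on binary files, hence "wb" and "rb".
with open(path, "wb") as f:
    pickle.dump(data, f)

with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored["avg"])
```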
This is how you store it in a python dictionary:
txtdict = {}
with open('input.txt', 'r') as f:
    for line in f:
        if line.strip():
            name = line.split(':')[0]
            txtdict[name] = [int(i) for j in line.strip().split(':')[1:] for i in j.split(',')]
Output:
{'min': [1, 2, 3, 4, 5, 7, 8, 9],
'avg': [1, 2, 3, 4],
'max': [1, 2, 3, 4, 5, 1, 2, 3, 44, 55, 32, 12]}
My current code reads the 1st line, then 3rd, 5th, 7th and so on and adds it to a list.
I want it to read the 2nd, 4th, 6th lines ... and add it to another list.
with open(path) as f:
    content = f.readlines()
content = [x.strip() for x in content[::2]]
You need to add a start to your slice of 1, e.g. content[1::2]:
with open(path) as f:
    content = f.readlines()
content = [x.strip() for x in content[1::2]]
A better alternative would be to use itertools.islice() to do this, as follows:
from itertools import islice
with open(path) as f_input:
    content = [line.strip() for line in islice(f_input, 1, None, 2)]
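As a small sketch of both techniques side by side, the example below splits a file's lines into the 1st/3rd/5th and the 2nd/4th lists. The sample text and the variable names are assumptions for illustration, and `io.StringIO` stands in for the opened file:

```python
import io
from itertools import islice

text = "line1\nline2\nline3\nline4\nline5\n"

# Slicing: read everything once, then take every second line from each start.
lines = [x.strip() for x in io.StringIO(text)]
odd_lines = lines[::2]     # 1st, 3rd, 5th, ...
even_lines = lines[1::2]   # 2nd, 4th, ...

# islice: skips lines lazily without building the full list first.
lazy_even = [x.strip() for x in islice(io.StringIO(text), 1, None, 2)]

print(odd_lines)
print(even_lines)
```

The slice version is convenient when both lists are needed; islice avoids holding the whole file in memory when only one of them is.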
You need to start slicing by skipping the first item; here is an example:
>>> i = range(9)
>>> list(i)
[0, 1, 2, 3, 4, 5, 6, 7, 8]
>>> list(i[1::2])
[1, 3, 5, 7]
In your code:
content = [x.strip() for x in content[1::2]]
meaning that the slice runs from index 1 to the end of the list with a step of two.
Your code should look like this:
with open(path) as f:
    content = f.readlines()
content = [x.strip() for x in content[1::2]]
I have the following dataset in a CSV file
[1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 1, 1, 2]
Now I want to count how many times each value repeats consecutively and store the counts in an array; I don't want the overall frequency of each value. So my output should be like this:
[3, 4, 3, 2, 1]
My code is as follows:
import csv
with open("c:/Users/Niels/Desktop/test.csv", 'rb') as f:
    reader = csv.reader(f, delimiter=';')
    data = []
    for column in reader:
        data.append(column[0])
results = data
results = [int(i) for i in results]
print results
dataFiltered = []
for i in results:
    if i == (i+1):
        counter = counter + 1
        dataFiltered.append(counter)
        counter = 0
print dataFiltered
My idea was to compare neighbouring cell values. I know something is wrong in the for loop over results, but I can't figure out where my mistake is.
I won't go into the details of your loop, which is very wrong: if i == (i+1): just cannot be True, for starters.
Next, you'd be better off with itertools.groupby, taking the length of each group:
import itertools
results = [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 1, 1, 2]
freq = [len(list(v)) for _,v in itertools.groupby(results)]
print(freq)
len(list(v)) uses list to force the iteration over the grouped items so we can compute the length (maybe sum(1 for x in v) would be more performant/appropriate; I haven't benchmarked both approaches).
I get:
[3, 4, 3, 2, 1]
Aside: reading the first column of a CSV file and converting the result to integers can be achieved simply by:
results = [int(row[0]) for row in reader]
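For comparison, here is a sketch of the run-length counting done two ways: with groupby using the sum(1 for ...) variant mentioned above, and with a plain loop that compares each value to the previous one (which is roughly what the question's loop was attempting). The function name `run_lengths_manual` is made up for this illustration:

```python
from itertools import groupby

results = [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 1, 1, 2]

# groupby yields one group per run of equal neighbours; count each group's items.
run_lengths = [sum(1 for _ in group) for _, group in groupby(results)]


def run_lengths_manual(values):
    """Count runs of consecutive equal values with an explicit loop."""
    lengths = []
    previous = object()  # sentinel that compares unequal to every real value
    for value in values:
        if lengths and value == previous:
            lengths[-1] += 1   # same as the previous value: extend the current run
        else:
            lengths.append(1)  # value changed (or first item): start a new run
        previous = value
    return lengths


print(run_lengths)
print(run_lengths_manual(results))
```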
I have Python code that generates sets of numbers and stores them in a file. The file looks as follows:
set([0, 2, 3])
set([0, 1, 3])
set([0, 1, 2])
I have another piece of Python code that reads this file and needs to convert each text line back to an array.
Method to read the file:
def get_sets_from_file (self,file_name):
    file_handle = open(file_name, "r")
    all_sets_from_file = file_handle.read()
    print all_sets_from_file
Once a text line is read, I need a mechanism to convert it back to an array.
Thanks,
Bhavesh.
EDIT-1:
Based on the suggestions given below, I have changed the file format to a comma-separated file:
set([8, 6, 7]),
set([8, 5, 7]),
set([8, 4, 7]),
set([8, 3, 7]),
you can apply this to each line in your file:
>>> line = "set([0, 2, 3])" #or "set([0, 2, 3]),"
>>> import re
>>> r = r"set\(\[(.*)\]\)"
>>> m = re.search(r, line)
>>> match = m.group(1)
>>> a = [int(item.strip()) for item in match.split(',')]
>>> a
[0, 2, 3]
>>>
That could be implemented in your code as:
import re
SET_PATTERN = re.compile(r"set\(\[(.*)\]\)")
def get_sets_from_file (self,file_name):
    total = []
    with open(file_name, "r") as fhdl:
        for line in fhdl:
            m = SET_PATTERN.search(line)
            if m:  # skip lines that do not look like set([...])
                a = [int(item.strip()) for item in m.group(1).split(',')]
                total.append(a)
    return total
Edit (based on the comments from @Droogans):
This code will work with no change for the CSV version of your document as you depicted it in the new edit.
However, the problem would be greatly simplified if you have access to the code that produces the current output. If this is the case, it would be more effective to pickle your data or write it as JSON. That way you could recover your list of sets simply by loading the generated output with pickle or json.
It looks like your file contains properly formed python code. You can use this:
read each line of the file into a variable (m)
>>> m = "set([1, 3, 2])"
>>> eval(m)
set([1, 2, 3])
>>>
eval is considered very dangerous because it will execute anything you give it (like reformatting your disk or whatever). But since you know what is in the file you want to evaluate, this might be the way for you to go.
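If the file contents cannot be fully trusted, one possible alternative to eval is to peel off the set( ... ) wrapper by hand and pass only the inner list to ast.literal_eval, which accepts literals but does not run arbitrary code. This is a sketch, not a drop-in replacement: the helper name `parse_set_line` is made up here, and it assumes every line has exactly the `set([...])` shape, optionally followed by a comma:

```python
from ast import literal_eval


def parse_set_line(line):
    """Parse a line such as 'set([1, 3, 2])' or 'set([1, 3, 2]),' into a set."""
    text = line.strip().rstrip(",")
    if not (text.startswith("set(") and text.endswith(")")):
        raise ValueError("unexpected line format: %r" % line)
    inner = text[len("set("):-1]     # the "[1, 3, 2]" part
    return set(literal_eval(inner))  # literal_eval only accepts Python literals


print(parse_set_line("set([1, 3, 2])"))
print(parse_set_line("set([8, 6, 7]),\n"))
```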
If you just want to read and write simple lists of integers to/from a file:
import os
sets = [
    set([0, 2, 3]),
    set([0, 1, 3]),
    set([0, 1, 2]),
]
def write_sets(path, sets):
    with open(path, 'wb') as stream:
        for item in sets:
            item = ' '.join(str(number) for number in item)
            stream.write(item + os.linesep)
def read_sets(path, sets):
    sets = []
    with open(path, 'rb') as stream:
        for line in stream:
            sets.append(set(int(number) for number in line.split()))
    return sets
path = 'tmp/sets.txt'
write_sets(path, sets)
print read_sets(path, sets)
# [set([0, 2, 3]), set([0, 1, 3]), set([0, 1, 2])]
Why don't you serialize your data in a format that you can easily deserialize from a string?
JSON fits perfectly here.
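A minimal sketch of what the JSON route could look like. JSON has no set type, so the assumption here is that each set is written as a sorted list, one per line, and turned back into a set on reading. The text is kept in memory rather than in a file to keep the example short:

```python
import json

sets = [{0, 2, 3}, {0, 1, 3}, {0, 1, 2}]

# Serialize: one JSON list per line (sets are not JSON-serializable directly).
text = "\n".join(json.dumps(sorted(s)) for s in sets)

# Deserialize: parse each line and convert the list back into a set.
restored = [set(json.loads(line)) for line in text.splitlines()]

print(text)
print(restored == sets)
```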
I am running into some trouble with parsing the contents of a text file into a 2D array/list. I cannot use built-in libraries, so have taken a different approach. This is what my text file looks like, followed by my code
1,0,4,3,6,7,4,8,3,2,1,0
2,3,6,3,2,1,7,4,3,1,1,0
5,2,1,3,4,6,4,8,9,5,2,1
def twoDArray():
    network = [[]]
    filename = open('twoDArray.txt', 'r')
    for line in filename.readlines():
        col = line.split(line, ',')
        row = line.split(',')
    network.append(col,row)
    print "Network = "
    print network
if __name__ == "__main__":
    twoDArray()
I ran this code but got this error:
Traceback (most recent call last):
File "2dArray.py", line 22, in <module>
twoDArray()
File "2dArray.py", line 8, in twoDArray
col = line.split(line, ',')
TypeError: an integer is required
I am using the comma to separate both row and column as I am not sure how I would differentiate between the two - I am confused about why it is telling me that an integer is required when the file is made up of integers
Well, I can explain the error. You're using str.split() and its usage pattern is:
str.split(separator, maxsplit)
You're using str.split(string, separator) and that isn't a valid call to split. Here is a direct link to the Python docs for this:
http://docs.python.org/library/stdtypes.html#str.split
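A short sketch of the two valid call shapes and of the failing one, using a made-up sample line. Note the assumption that this runs on Python 3, where the error message differs slightly from the Python 2 traceback in the question, although it is still a TypeError:

```python
line = "1,0,4,3\n"

# One argument: the separator. All possible splits are made.
all_parts = line.strip().split(",")

# Two arguments: separator and maxsplit (an integer limit on the number of splits).
first_split = line.strip().split(",", 1)

# Passing a string where maxsplit is expected is what triggers the error.
try:
    line.split(line, ",")
    error = None
except TypeError as exc:
    error = exc

print(all_parts)
print(first_split)
print(type(error).__name__)
```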
To directly answer your question, there is a problem with the following line:
col = line.split(line, ',')
If you check the documentation for str.split, you'll find the description to be as follows:
str.split([sep[, maxsplit]])
Return a list of the words in the string, using sep as the delimiter string. If maxsplit is given, at most
maxsplit splits are done (thus, the list will have at most maxsplit+1 elements). If maxsplit is not specified, then there is no limit on the number of splits (all possible splits are made).
This is not what you want. You are not trying to specify the number of splits you want to make.
Consider replacing your for loop and network.append with this:
for line in filename.readlines():
    # line is a string representing the values for this row
    row = line.split(',')
    # row is the list of number strings for this row, such as ['1', '0', '4', ...]
    cols = [int(x) for x in row]
    # cols is the list of numbers for this row, such as [1, 0, 4, ...]
    network.append(cols)
    # Put this row into network, such that network is [[1, 0, 4, ...], [...], ...]
"I cannot use built-in libraries": do you really mean "cannot", as in you have tried to use the csv module and failed? If so, say so. Do you mean "may not", as in you are forbidden to use a built-in module by the terms of your homework assignment? If so, say so.
Here is an answer that works. It doesn't leave a newline attached to the end of the last item in each row. It converts the numbers to int so that you can use them for whatever purpose you have. It fixes other errors that nobody else has mentioned.
def twoDArray():
    network = []
    # filename = open('twoDArray.txt', 'r')
    # "filename" is a very weird name for a file HANDLE
    f = open('twoDArray.txt', 'r')
    # for line in filename.readlines():
    # readlines reads the whole file into memory at once.
    # That is quite unnecessary.
    for line in f: # just iterate over the file handle
        line = line.rstrip('\n') # remove the newline, if any
        # col = line.split(line, ',')
        # wrong args, as others have said.
        # In any case, only 1 split call is necessary
        row = line.split(',')
        # now convert string to integer
        irow = [int(item) for item in row]
        # network.append(col,row)
        # list.append expects only ONE arg
        # indentation was wrong; you need to do this once per line
        network.append(irow)
    print "Network = "
    print network
if __name__ == "__main__":
    twoDArray()
Omg...
network = []
filename = open('twoDArray.txt', 'r')
for line in filename.readlines():
    # convert each item to int so the result holds numbers, not strings
    network.append([int(x) for x in line.split(',')])
and you get
[
[1,0,4,3,6,7,4,8,3,2,1,0],
[2,3,6,3,2,1,7,4,3,1,1,0],
[5,2,1,3,4,6,4,8,9,5,2,1]
]
Or do you need some other structure as output? Please add what output you need.
class TwoDArray(object):
    @classmethod
    def fromFile(cls, fname, *args, **kwargs):
        splitOn = kwargs.pop('splitOn', None)
        mode = kwargs.pop('mode', 'r')
        with open(fname, mode) as inf:
            return cls([line.strip('\r\n').split(splitOn) for line in inf], *args, **kwargs)
    def __init__(self, data=[[]], *args, **kwargs):
        dataType = kwargs.pop('dataType', lambda x:x)
        super(TwoDArray,self).__init__()
        self.data = [[dataType(i) for i in line] for line in data]
    def __str__(self, fmt=str, endrow='\n', endcol='\t'):
        return endrow.join(
            endcol.join(fmt(i) for i in row) for row in self.data
        )
def main():
    network = TwoDArray.fromFile('twodarray.txt', splitOn=',', dataType=int)
    print("Network =")
    print(network)
if __name__ == "__main__":
    main()
The input format is simple, so the solution should be simple too:
network = [map(int, line.split(',')) for line in open(filename)]
print network
csv module doesn't provide an advantage in this case:
import csv
print [map(int, row) for row in csv.reader(open(filename, 'rb'))]
If you need float instead of int:
print list(csv.reader(open(filename, 'rb'), quoting=csv.QUOTE_NONNUMERIC))
If you are working with numpy arrays:
import numpy
print numpy.loadtxt(filename, dtype='i', delimiter=',')
See Why NumPy instead of Python lists?
All examples produce arrays equal to:
[[1 0 4 3 6 7 4 8 3 2 1 0]
[2 3 6 3 2 1 7 4 3 1 1 0]
[5 2 1 3 4 6 4 8 9 5 2 1]]
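The one-liners above are written for Python 2, where map returns a list. On Python 3, map returns an iterator, so a rough equivalent of the first example could use a nested list comprehension instead. This is a sketch under that assumption, with `io.StringIO` standing in for the opened file:

```python
import io

# Stand-in for open(filename); contents are the question's sample data.
source = io.StringIO(
    "1,0,4,3,6,7,4,8,3,2,1,0\n"
    "2,3,6,3,2,1,7,4,3,1,1,0\n"
    "5,2,1,3,4,6,4,8,9,5,2,1\n"
)

# int() tolerates surrounding whitespace, so the trailing newline is harmless.
network = [[int(x) for x in line.split(",")] for line in source if line.strip()]

print(network)
```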
Read the data from the file. Here's one way:
f = open('twoDArray.txt', 'r')
buffer = f.read()
f.close()
Parse the data into a table
table = [map(int, row.split(',')) for row in buffer.strip().split("\n")]
>>> print table
[[1, 0, 4, 3, 6, 7, 4, 8, 3, 2, 1, 0], [2, 3, 6, 3, 2, 1, 7, 4, 3, 1, 1, 0], [5, 2, 1, 3, 4, 6, 4, 8, 9, 5, 2, 1]]
Perhaps you want the transpose instead:
transpose = zip(*table)
>>> print transpose
[(1, 2, 5), (0, 3, 2), (4, 6, 1), (3, 3, 3), (6, 2, 4), (7, 1, 6), (4, 7, 4), (8, 4, 8), (3, 3, 9), (2, 1, 5), (1, 1, 2), (0, 0, 1)]
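On Python 3, zip returns an iterator rather than a list, so the transpose needs to be materialized explicitly. A small sketch on a shortened table (the first three columns of the sample data, chosen only to keep the example compact):

```python
table = [
    [1, 0, 4],
    [2, 3, 6],
    [5, 2, 1],
]

# zip(*table) pairs up the items at the same position in every row,
# which yields the columns; list() turns each tuple into a list.
transpose = [list(column) for column in zip(*table)]

print(transpose)
```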