I have a text file with 3 lines of data in it.
[1, 2, 1, 1, 3, 1, 1, 2, 1, 3, 1, 1, 1, 3, 3]
[1, 1, 3, 3, 3, 1, 1, 1, 1, 2, 1, 1, 1, 3, 3]
[1, 2, 3, 1, 3, 1, 1, 3, 1, 3, 1, 1, 1, 3, 3]
I try to open the file and read the data in it.
with open("rafine.txt") as f:
    l = [line.strip() for line in f.readlines()]
f.close()
Now I have a list in a list.
If I call print(l[0]) it shows me [1, 2, 1, 1, 3, 1, 1, 2, 1, 3, 1, 1, 1, 3, 3]
But I want to get the numbers in it.
So when I write print(l[0][0])
I want to see 1, but it shows me [
How can I fix this?
You can use literal_eval to parse the lines from the file & build the matrix:
from ast import literal_eval
with open("test.txt") as f:
    matrix = []
    for line in f:
        row = literal_eval(line)
        matrix.append(row)
print(matrix[0][0])
print(matrix[1][4])
print(matrix[2][8])
result:
1
3
1
import json
with open("rafine.txt") as f:
    for line in f.readlines():
        line = json.loads(line)
        print(line)
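The snippet above comes without explanation, so here is a minimal sketch of what it relies on. It assumes every line of the file is valid JSON, which holds for bracketed, comma-separated integers like the sample data. io.StringIO and the name sample_file are only stand-ins for the real file so the example runs on its own.

```python
import io
import json

# Stand-in for open("rafine.txt"); shortened version of the question's data.
sample_file = io.StringIO(
    "[1, 2, 1, 1, 3]\n"
    "[1, 1, 3, 3, 3]\n"
)

# json.loads turns each line of text into a real Python list of ints.
rows = [json.loads(line) for line in sample_file]

print(rows[0][0])  # first number of the first row
print(rows[1][4])  # fifth number of the second row
```

json.loads only accepts JSON syntax, so lines containing tuples or single-quoted strings would need literal_eval instead.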
The best approach depends on what assumption you make about the data in your text file:
ast.literal_eval
If the data in your file is formatted the same way it would be inside Python source code, the best approach is to use literal_eval:
from ast import literal_eval
data = [] # will contain list of lists
with open("filename") as f:
    for line in f:
        row = literal_eval(line)
        data.append(row)
or, the short version:
with open(filename) as f:
    data = [literal_eval(line) for line in f]
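If the file might contain blank or malformed lines, something like the following could be used. This is only a sketch: the list named lines stands in for the file object, and skipping bad lines (rather than stopping) is an assumption about what you want.

```python
from ast import literal_eval

# Stand-in for the lines of the file, including a blank and a malformed line.
lines = ["[1, 2, 3]\n", "\n", "[4, 5, 6]\n", "not a list\n"]

data = []
for line in lines:
    text = line.strip()
    if not text:
        continue  # skip blank lines
    try:
        data.append(literal_eval(text))
    except (ValueError, SyntaxError):
        # literal_eval rejects anything that is not a plain Python literal
        print("skipping malformed line:", text)

print(data)
```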
re.findall
If you can make few assumptions about the data, using regular expressions to find all digits might be a way forward. The code below builds lists by simply extracting every run of digits in the text file, regardless of separators or other characters in the file (note that \d+ also drops any minus sign):
import re
data = [] # will contain list of lists
with open("filename") as f:
    for line in f:
        row = [int(i) for i in re.findall(r'\d+', line)]
        data.append(row)
or, in short:
with open(filename) as f:
    data = [[int(i) for i in re.findall(r'\d+', line)] for line in f]
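One caveat with the pattern above: \d+ matches digits only, so a leading minus sign is lost. A small sketch of the difference, assuming the data may contain negative integers (the sample line is made up for illustration):

```python
import re

line = "[-3, 4, -15]"  # made-up sample line with negative numbers

# \d+ matches digits only, so the signs disappear.
unsigned = [int(i) for i in re.findall(r"\d+", line)]

# An optional leading minus keeps negative values intact.
signed = [int(i) for i in re.findall(r"-?\d+", line)]

print(unsigned)
print(signed)
```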
handwritten parsing
If neither option is suitable, there is always the option to parse by hand, tailored to the exact format:
data = [] # will contain list of lists
with open(filename) as f:
    for line in f:
        row = [int(i) for i in line.strip()[1:-1].split(", ")]
        data.append(row)
strip() removes the trailing newline, then [1:-1] removes the first and last character (the brackets), and split(", ") splits the rest into a list. for i in ... iterates over the items in this list (assigning each item to i) and int(i) converts i to an integer.
I have a .txt with 10 lines like this:
[-3 -4 -5 -6 -7], 0
I want to extract the numbers between [...] and put them into an array of integers. I'm reading the .txt like this:
import sys
with open(sys.argv[1], 'r') as f:
    contents = f.read()
print(contents)
Do I need to split off the part before the comma?
Is there a function to do this?
What should I do?
This should do it:
import sys
with open(sys.argv[1], 'r') as f:
    contents = f.read()
arrays = []
for line in contents.splitlines():  # splitlines() avoids an empty entry after the final newline
    array_string = line.split(',')[0]
    array = [int(i) for i in array_string[1:-1].split()]
    arrays.append(array)
This will return, based on your example:
arrays
[[-3, -4, -5, -6, -7]]
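If the value after the comma matters too, the same idea can be wrapped in a small helper. This is a sketch only: parse_line is a hypothetical name, and it assumes every line has exactly the shape "[a b c], n".

```python
def parse_line(line):
    """Split a line like '[-3 -4 -5], 0' into ([-3, -4, -5], 0)."""
    # partition splits at the first comma only.
    array_part, _, rest = line.partition(",")
    # Drop the surrounding brackets, then split on whitespace.
    numbers = [int(tok) for tok in array_part.strip()[1:-1].split()]
    # int() ignores surrounding whitespace, including the newline.
    return numbers, int(rest)

numbers, trailing = parse_line("[-3 -4 -5 -6 -7], 0\n")
print(numbers, trailing)
```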
Try something like this, although it's a bit dense:
with open('your_file.txt', 'r') as f:
    new_list = [item.split(',')[0][1:-1].split() for item in f.readlines()]
print(new_list)
My mistake, I hadn't noticed that the file is passed as an argument. In that case:
#!/bin/python3
from sys import argv
with open(argv[1], 'r') as f:
    new_list = [item.split(',')[0][1:-1].split() for item in f.readlines()]
print(new_list)
By the way, a regex solution also seems good:
import re
from sys import argv
with open(argv[1], 'r') as f:
    new_list = [re.findall(r'-?\d+', item.split(',')[0]) for item in f.readlines()]
print(new_list)
test = "[-3 -4 -5 -6 -7], 0"
# split on the ','
test = test.split(",")
# remove the '[' and ']' and split on the whitespace
res = test[0].lstrip("[").rstrip("]").split(" ")
# add the standalone value to the list
res.append(test[1])
# cast values to int
res = [int(x) for x in res]
# print out the result
print(res)
result:
[-3, -4, -5, -6, -7, 0]
You can read the data to a list with a single line, as follows.
import sys
data = [ [ int(field) for field in row.split(sep=",")[0][1:-1].split() ] for row in open(sys.argv[1])]
You will get e.g.:
[[-3, -4, -5, -6, -7]]
Read mode is the default for open, so you can omit it.
You go through the lines of the file.
For each row, you split the row at the "," separator.
The left part, with its brackets sliced off, can simply be split into a list.
This list contains string fields, which you then convert to int.
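The steps described above can be traced one at a time. In this sketch the variable names (left, inner, fields) are only illustrative, and the row is the sample line from the question:

```python
row = "[-3 -4 -5 -6 -7], 0\n"  # one line as read from the file

left = row.split(sep=",")[0]   # everything before the comma
inner = left[1:-1]             # drop the surrounding brackets
fields = inner.split()         # split on whitespace into strings
numbers = [int(field) for field in fields]

print(left)
print(fields)
print(numbers)
```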
Try this:
import sys
import re
with open(sys.argv[1], 'r') as f:
    contents = f.read()
arrays = []
for line in contents.split('\n'):
    string = line.split(',')[0]
    # raw string so the backslashes reach the regex engine unchanged
    arrays.append(re.findall(r"[+-]?\d+(?:\.\d+)?", string))
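Note that re.findall returns strings, not numbers. A possible follow-up step is sketched below; to_number is a hypothetical helper, and treating anything with a decimal point as a float is an assumption about the data.

```python
import re

NUMBER = re.compile(r"[+-]?\d+(?:\.\d+)?")

def to_number(text):
    # Hypothetical helper: float if there is a decimal point, else int.
    return float(text) if "." in text else int(text)

line = "[-3 4.5 +7], 0"  # made-up sample mixing ints and a float
values = [to_number(match) for match in NUMBER.findall(line.split(",")[0])]

print(values)
```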
I'm converting a .txt file with annotations into another annotation format in a .csv file. The annotation format is as follows: filepath,x1,y1,x2,y2,classname. For pictures that contain no instance of any class, the annotation looks like this: filepath,,,,,.
The problem is that the .writerow method of the csv.writer class doesn't write more than one comma after another.
My code is like this:
with open(annotation_file, 'r') as file:
    lines = file.readlines()
    splitted_lines = [line.split(' ') for line in lines]
with open(out_file, 'w', newline = '') as out:
    csv_writer = csv.writer(out,delimiter= ';' )
    for l in splitted_lines:
        if len(l) == 1:
            # indicate empty images
            csv_writer.writerow([l[0] + ',,,,,'])
l is a list that contains a single string, so with l[0] + ',,,,,' I want to append five commas to that string.
Thank you in advance
Set the missing values to empty strings and pad the list:
with open(annotation_file, 'r') as file:
    lines = file.readlines()
    splitted_lines = [line.split(' ') for line in lines]
with open(out_file, 'w', newline='') as out:
    csv_writer = csv.writer(out, delimiter=';')
    for l in splitted_lines:
        if len(l) == 1:
            # indicate empty images
            csv_writer.writerow(l + ['' for _ in range(5)])
        else:
            csv_writer.writerow(l)
Given sample data:
data = [
[1, 2, 3, 4, 5, 6],
[1, 2, 3, 4, 5, 6],
[1, 2, 3, 4, 5, 6],
[1],
]
it outputs:
1;2;3;4;5;6
1;2;3;4;5;6
1;2;3;4;5;6
1;;;;;
which is in line with what you want.
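The padding idea can be checked without touching the disk. In this sketch io.StringIO stands in for the output file and the file name img_001.jpg is made up; the six-column width comes from the filepath,x1,y1,x2,y2,classname format in the question.

```python
import csv
import io

out = io.StringIO()  # stand-in for open(out_file, 'w', newline='')
writer = csv.writer(out, delimiter=";")

row = ["img_001.jpg"]                  # an image with no annotations
padded = row + [""] * (6 - len(row))   # pad to the six expected columns
writer.writerow(padded)

result = out.getvalue()
print(repr(result))
```

Each empty string becomes an empty field, so the writer emits the separators itself instead of receiving them inside one string.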
I discovered my problem: l[0] is a string that contained a '\n' at the end. Because of this the writer wasn't able to write the five commas after the string the way I expected. I changed the code as shown below, which fixed the problem.
with open(annotation_file, 'r') as file:
    lines = file.readlines()
    splitted_lines = [line.split(' ') for line in lines]
with open(out_file, 'w', newline = '') as out:
    csv_writer = csv.writer(out,delimiter= ';' )
    for l in splitted_lines:
        if len(l) == 1:
            # indicate empty images
            l[0] = l[0].replace('\n', '')
            csv_writer.writerow([l[0] + ',,,,,'])
        else:
            csv_writer.writerow(['something else'])
Thanks anyway @DelphiX
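The effect of the stray newline can be reproduced in isolation. This is a sketch under assumptions: write_one is a hypothetical helper, io.StringIO replaces the output file, and the file name is made up. With the default minimal quoting, csv.writer wraps a field in quotes when it contains a line-break character, which is why the commas did not come out as expected.

```python
import csv
import io

def write_one(field):
    # Hypothetical helper: write a single-field row and return the raw text.
    out = io.StringIO()
    csv.writer(out, delimiter=";").writerow([field])
    return out.getvalue()

# Field still carrying the newline read from the annotation file.
with_newline = write_one("img_001.jpg\n" + ",,,,,")

# Same field with the newline removed first.
stripped = write_one("img_001.jpg\n".rstrip("\n") + ",,,,,")

print(repr(with_newline))
print(repr(stripped))
```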
I have a newbie question. I need help on separating a text file into columns and rows. Let's say I have a file like this:
1 2 3 4
2 3 4 5
and I want to put it into a 2D list called values = [[]]
I can get it to give me the rows OK, and this code works:
values = map(int, line.split(','))
I just don't know how I can say the same thing but for the rows, and the documentation doesn't make any sense to me.
Cheers
f = open(filename,'rt')
a = [[int(token) for token in line.split()] for line in f.readlines()[::2]]
In your sample file above, you have an empty line between each data row - I took this into account, but you can drop the ::2 subscript if you didn't mean to have this extra line in your data.
Edit: added conversion to int - you can use map as well, but mixing list comprehensions and map seems ugly to me.
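As an alternative to the ::2 subscript, blank lines can be skipped wherever they occur. A minimal Python 3 sketch, with io.StringIO standing in for the file and the sample data taken from the question:

```python
import io

# Stand-in for the file, with an empty line between the data rows.
sample_file = io.StringIO("1 2 3 4\n\n2 3 4 5\n")

values = [
    [int(token) for token in line.split()]
    for line in sample_file
    if line.strip()  # skip blank lines wherever they occur
]

print(values)
```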
import csv
import itertools
values = []
# Python 2 syntax (print statement, itertools.izip)
with open('text.file') as file_object:
    for line in csv.reader(file_object, delimiter=' '):
        values.append(map(int, line))
print "rows:", values
print "columns:"
for column in itertools.izip(*values):
    print column
Output is:
rows: [[1, 2, 3, 4], [2, 3, 4, 5]]
columns:
(1, 2)
(2, 3)
(3, 4)
(4, 5)
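The code above is Python 2. A rough Python 3 equivalent might look like this; io.StringIO stands in for the file, and the built-in zip replaces itertools.izip:

```python
import csv
import io

# Stand-in for open('text.file'); same data as in the question.
file_object = io.StringIO("1 2 3 4\n2 3 4 5\n")

# In Python 3, map returns an iterator, so build the lists explicitly.
values = [[int(cell) for cell in row]
          for row in csv.reader(file_object, delimiter=" ")]

# zip(*values) pairs up the items at the same position in every row.
columns = list(zip(*values))

print("rows:", values)
print("columns:")
for column in columns:
    print(column)
```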
Get the data into your program by some method. Here's one:
f = open(textfile, 'r')
buffer = f.read()
f.close()
Parse the buffer into a table (note: strip() is used to clear any trailing whitespace):
table = [map(int, row.split()) for row in buffer.strip().split("\n")]
>>> print table
[[1, 2, 3, 4], [2, 3, 4, 5]]
Maybe it's ordered pairs you want instead, then transpose the table:
transpose = zip(*table)
>>> print transpose
[(1, 2), (2, 3), (3, 4), (4, 5)]
You could try to use the CSV-module. You can specify custom delimiters, so it might work.
If columns are separated by blanks:
import re
A,B,C,D = [],[],[],[]
pat = re.compile(r'([^ ]+)\s+([^ ]+)\s+([^ ]+)\s+([^ ]+)')
with open('try.txt') as f:
    for line in f:
        a,b,c,d = pat.match(line.strip()).groups()
        A.append(int(a));B.append(int(b));C.append(int(c));D.append(int(d))
or with csv module
EDIT
A,B,C,D = [],[],[],[]
with open('try.txt') as f:
    for line in f:
        a,b,c,d = line.split()
        A.append(int(a));B.append(int(b));C.append(int(c));D.append(int(d))
Note that line.split() with no argument treats any run of whitespace as one separator; it is line.split(' ') that fails when there is more than one blank between elements of data.
EDIT 2
Because the regex solution has been described as extremely hard to understand, it can be simplified as follows:
import re
A,B,C,D = [],[],[],[]
pat = re.compile(r'\s+')
with open('try.txt') as f:
    for line in f:
        a,b,c,d = pat.split(line.strip())
        A.append(int(a));B.append(int(b));C.append(int(c));D.append(int(d))
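To see how the splitting variants behave when several blanks separate the values, here is a small comparison on a made-up line:

```python
import re

line = "1   2 3    4\n"  # made-up line with repeated blanks

by_default = line.split()                     # any run of whitespace is one separator
by_single_space = line.strip().split(" ")     # every single blank is a separator
by_regex = re.split(r"\s+", line.strip())     # same result as split() here

print(by_default)
print(by_single_space)
print(by_regex)
```

Only the split(" ") variant produces empty strings, which would break the a,b,c,d unpacking.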
I am running into some trouble with parsing the contents of a text file into a 2D array/list. I cannot use built-in libraries, so I have taken a different approach. This is what my text file looks like, followed by my code:
1,0,4,3,6,7,4,8,3,2,1,0
2,3,6,3,2,1,7,4,3,1,1,0
5,2,1,3,4,6,4,8,9,5,2,1
def twoDArray():
    network = [[]]
    filename = open('twoDArray.txt', 'r')
    for line in filename.readlines():
        col = line.split(line, ',')
        row = line.split(',')
    network.append(col,row)
    print "Network = "
    print network
if __name__ == "__main__":
    twoDArray()
I ran this code but got this error:
Traceback (most recent call last):
  File "2dArray.py", line 22, in <module>
    twoDArray()
  File "2dArray.py", line 8, in twoDArray
    col = line.split(line, ',')
TypeError: an integer is required
I am using the comma to separate both row and column, as I am not sure how I would differentiate between the two. I am confused about why it is telling me that an integer is required when the file is made up of integers.
Well, I can explain the error. You're using str.split() and its usage pattern is:
str.split(separator, maxsplit)
You're using str.split(string, separator) and that isn't a valid call to split. Here is a direct link to the Python docs for this:
http://docs.python.org/library/stdtypes.html#str.split
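A short illustration of the difference, written for Python 3 (where the wording of the TypeError differs from the Python 2 traceback in the question); the sample line is shortened from the question's data:

```python
line = "1,0,4,3"

# Separator only: split at every comma.
all_parts = line.split(",")

# Separator plus maxsplit: split at the first comma only.
first_split = line.split(",", 1)

# Passing a string where maxsplit is expected raises TypeError.
error = None
try:
    line.split(line, ",")
except TypeError as exc:
    error = exc

print(all_parts)
print(first_split)
print(type(error).__name__)
```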
To directly answer your question, there is a problem with the following line:
col = line.split(line, ',')
If you check the documentation for str.split, you'll find the description to be as follows:
str.split([sep[, maxsplit]])
Return a list of the words in the string, using sep as the delimiter string. If maxsplit is given, at most maxsplit splits are done (thus, the list will have at most maxsplit+1 elements). If maxsplit is not specified, then there is no limit on the number of splits (all possible splits are made).
This is not what you want. You are not trying to specify the number of splits you want to make.
Consider replacing your for loop and network.append with this:
for line in filename.readlines():
    # line is a string representing the values for this row
    row = line.split(',')
    # row is the list of number strings for this row, such as ['1', '0', '4', ...]
    cols = [int(x) for x in row]
    # cols is the list of numbers for this row, such as [1, 0, 4, ...]
    network.append(cols)
    # Put this row into network, such that network is [[1, 0, 4, ...], [...], ...]
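Put together in Python 3 and without any imports, the loop could look like the sketch below. parse_rows is a hypothetical name, and sample_lines stands in for the open file (a file object can be passed in the same way, since iterating over it yields lines).

```python
def parse_rows(lines):
    """Turn an iterable of 'a,b,c' lines into a list of lists of ints."""
    network = []
    for line in lines:
        line = line.strip()  # remove the trailing newline
        if not line:
            continue         # ignore blank lines
        network.append([int(x) for x in line.split(",")])
    return network

# Stand-in for the file; shortened rows from the question.
sample_lines = ["1,0,4,3\n", "2,3,6,3\n", "5,2,1,3\n"]
network = parse_rows(sample_lines)
print("Network =", network)
```

Starting from an empty list rather than [[]] avoids an extra empty row at the front of the result.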
"""I cannot use built-in libraries""" -- do you really mean "cannot" as in you have tried to use the csv module and failed? If so, say so. Do you mean that "may not" as in you are forbidden to use a built-in module by the terms of your homework assignment? If so, say so.
Here is an answer that works. It doesn't leave a newline attached to the end of the last item in each row. It converts the numbers to int so that you can use them for whatever purpose you have. It fixes other errors that nobody else has mentioned.
def twoDArray():
    network = []
    # filename = open('twoDArray.txt', 'r')
    # "filename" is a very weird name for a file HANDLE
    f = open('twoDArray.txt', 'r')
    # for line in filename.readlines():
    # readlines reads the whole file into memory at once.
    # That is quite unnecessary.
    for line in f: # just iterate over the file handle
        line = line.rstrip('\n') # remove the newline, if any
        # col = line.split(line, ',')
        # wrong args, as others have said.
        # In any case, only 1 split call is necessary
        row = line.split(',')
        # now convert string to integer
        irow = [int(item) for item in row]
        # network.append(col,row)
        # list.append expects only ONE arg
        # indentation was wrong; you need to do this once per line
        network.append(irow)
    print "Network = "
    print network
if __name__ == "__main__":
    twoDArray()
Omg...
network = []
filename = open('twoDArray.txt', 'r')
for line in filename.readlines():
    network.append(line.split(','))
you get
[
[1,0,4,3,6,7,4,8,3,2,1,0],
[2,3,6,3,2,1,7,4,3,1,1,0],
[5,2,1,3,4,6,4,8,9,5,2,1]
]
Or do you need some other structure as output? Please add what you need as output.
class TwoDArray(object):
    @classmethod
    def fromFile(cls, fname, *args, **kwargs):
        splitOn = kwargs.pop('splitOn', None)
        mode = kwargs.pop('mode', 'r')
        with open(fname, mode) as inf:
            return cls([line.strip('\r\n').split(splitOn) for line in inf], *args, **kwargs)
    def __init__(self, data=[[]], *args, **kwargs):
        dataType = kwargs.pop('dataType', lambda x:x)
        super(TwoDArray,self).__init__()
        self.data = [[dataType(i) for i in line] for line in data]
    def __str__(self, fmt=str, endrow='\n', endcol='\t'):
        return endrow.join(
            endcol.join(fmt(i) for i in row) for row in self.data
        )
def main():
    network = TwoDArray.fromFile('twodarray.txt', splitOn=',', dataType=int)
    print("Network =")
    print(network)
if __name__ == "__main__":
    main()
The input format is simple, so the solution should be simple too:
network = [map(int, line.split(',')) for line in open(filename)]
print network
csv module doesn't provide an advantage in this case:
import csv
print [map(int, row) for row in csv.reader(open(filename, 'rb'))]
If you need float instead of int:
print list(csv.reader(open(filename, 'rb'), quoting=csv.QUOTE_NONNUMERIC))
If you are working with numpy arrays:
import numpy
print numpy.loadtxt(filename, dtype='i', delimiter=',')
See Why NumPy instead of Python lists?
All examples produce arrays equal to:
[[1 0 4 3 6 7 4 8 3 2 1 0]
[2 3 6 3 2 1 7 4 3 1 1 0]
[5 2 1 3 4 6 4 8 9 5 2 1]]
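The one-liners above are written for Python 2. In Python 3, map returns an iterator and csv.reader expects a text-mode file (opened with newline='' rather than 'rb'). A rough sketch of the pure-Python variants, with io.StringIO standing in for the file and the data shortened:

```python
import csv
import io

text = "1,0,4\n2,3,6\n"  # shortened stand-in for the file contents

# Plain split: wrap map in list() to get real lists.
network = [list(map(int, line.split(","))) for line in io.StringIO(text)]

# csv.reader yields lists of strings, converted here to ints.
via_csv = [list(map(int, row)) for row in csv.reader(io.StringIO(text))]

# QUOTE_NONNUMERIC makes the reader convert unquoted fields to float.
as_floats = list(csv.reader(io.StringIO(text), quoting=csv.QUOTE_NONNUMERIC))

print(network)
print(via_csv)
print(as_floats)
```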
Read the data from the file. Here's one way:
f = open('twoDArray.txt', 'r')
buffer = f.read()
f.close()
Parse the data into a table
table = [map(int, row.split(',')) for row in buffer.strip().split("\n")]
>>> print table
[[1, 0, 4, 3, 6, 7, 4, 8, 3, 2, 1, 0], [2, 3, 6, 3, 2, 1, 7, 4, 3, 1, 1, 0], [5, 2, 1, 3, 4, 6, 4, 8, 9, 5, 2, 1]]
Perhaps you want the transpose instead:
transpose = zip(*table)
>>> print transpose
[(1, 2, 5), (0, 3, 2), (4, 6, 1), (3, 3, 3), (6, 2, 4), (7, 1, 6), (4, 7, 4), (8, 4, 8), (3, 3, 9), (2, 1, 5), (1, 1, 2), (0, 0, 1)]