Python - Parsing Columns and Rows - python

I am running into some trouble with parsing the contents of a text file into a 2D array/list. I cannot use built-in libraries, so have taken a different approach. This is what my text file looks like, followed by my code
1,0,4,3,6,7,4,8,3,2,1,0
2,3,6,3,2,1,7,4,3,1,1,0
5,2,1,3,4,6,4,8,9,5,2,1
def twoDArray():
network = [[]]
filename = open('twoDArray.txt', 'r')
for line in filename.readlines():
col = line.split(line, ',')
row = line.split(',')
network.append(col,row)
print "Network = "
print network
if __name__ == "__main__":
twoDArray()
I ran this code but got this error:
Traceback (most recent call last):
File "2dArray.py", line 22, in <module>
twoDArray()
File "2dArray.py", line 8, in twoDArray
col = line.split(line, ',')
TypeError: an integer is required
I am using the comma to separate both row and column as I am not sure how I would differentiate between the two - I am confused about why it is telling me that an integer is required when the file is made up of integers

Well, I can explain the error. You're using str.split() and its usage pattern is:
str.split(separator, maxsplit)
You're using str.split(string, separator) and that isn't a valid call to split. Here is a direct link to the Python docs for this:
http://docs.python.org/library/stdtypes.html#str.split

To directly answer your question, there is a problem with the following line:
col = line.split(line, ',')
If you check the documentation for str.split, you'll find the description to be as follows:
str.split([sep[, maxsplit]])
Return a list of the words in the string, using sep as the delimiter string. If maxsplit is given, at most
maxsplit splits are done (thus, the list will have at most maxsplit+1 elements). If maxsplit is not specified, then there is no limit on the number of splits (all possible splits are made).
This is not what you want. You are not trying to specify the number of splits you want to make.
Consider replacing your for loop and network.append with this:
for line in filename.readlines():
# line is a string representing the values for this row
row = line.split(',')
# row is the list of numbers strings for this row, such as ['1', '0', '4', ...]
cols = [int(x) for x in row]
# cols is the list of numbers for this row, such as [1, 0, 4, ...]
network.append(row)
# Put this row into network, such that network is [[1, 0, 4, ...], [...], ...]

"""I cannot use built-in libraries""" -- do you really mean "cannot" as in you have tried to use the csv module and failed? If so, say so. Do you mean that "may not" as in you are forbidden to use a built-in module by the terms of your homework assignment? If so, say so.
Here is an answer that works. It doesn't leave a newline attached to the end of the last item in each row. It converts the numbers to int so that you can use them for whatever purpose you have. It fixes other errors that nobody else has mentioned.
def twoDArray():
network = []
# filename = open('twoDArray.txt', 'r')
# "filename" is a very weird name for a file HANDLE
f = open('twoDArray.txt', 'r')
# for line in filename.readlines():
# readlines reads the whole file into memory at once.
# That is quite unnecessary.
for line in f: # just iterate over the file handle
line = line.rstrip('\n') # remove the newline, if any
# col = line.split(line, ',')
# wrong args, as others have said.
# In any case, only 1 split call is necessary
row = line.split(',')
# now convert string to integer
irow = [int(item) for item in row]
# network.append(col,row)
# list.append expects only ONE arg
# indentation was wrong; you need to do this once per line
network.append(irow)
print "Network = "
print network
if __name__ == "__main__":
twoDArray()

Omg...
network = []
filename = open('twoDArray.txt', 'r')
for line in filename.readlines():
network.append(line.split(','))
you take
[
[1,0,4,3,6,7,4,8,3,2,1,0],
[2,3,6,3,2,1,7,4,3,1,1,0],
[5,2,1,3,4,6,4,8,9,5,2,1]
]
or you neeed some other structure as output? Please add what do you need as output?

class TwoDArray(object):
#classmethod
def fromFile(cls, fname, *args, **kwargs):
splitOn = kwargs.pop('splitOn', None)
mode = kwargs.pop('mode', 'r')
with open(fname, mode) as inf:
return cls([line.strip('\r\n').split(splitOn) for line in inf], *args, **kwargs)
def __init__(self, data=[[]], *args, **kwargs):
dataType = kwargs.pop('dataType', lambda x:x)
super(TwoDArray,self).__init__()
self.data = [[dataType(i) for i in line] for line in data]
def __str__(self, fmt=str, endrow='\n', endcol='\t'):
return endrow.join(
endcol.join(fmt(i) for i in row) for row in self.data
)
def main():
network = TwoDArray.fromFile('twodarray.txt', splitOn=',', dataType=int)
print("Network =")
print(network)
if __name__ == "__main__":
main()

The input format is simple, so the solution should be simple too:
network = [map(int, line.split(',')) for line in open(filename)]
print network
csv module doesn't provide an advantage in this case:
import csv
print [map(int, row) for row in csv.reader(open(filename, 'rb'))]
If you need float instead of int:
print list(csv.reader(open(filename, 'rb'), quoting=csv.QUOTE_NONNUMERIC))
If you are working with numpy arrays:
import numpy
print numpy.loadtxt(filename, dtype='i', delimiter=',')
See Why NumPy instead of Python lists?
All examples produce arrays equal to:
[[1 0 4 3 6 7 4 8 3 2 1 0]
[2 3 6 3 2 1 7 4 3 1 1 0]
[5 2 1 3 4 6 4 8 9 5 2 1]]

Read the data from the file. Here's one way:
f = open('twoDArray.txt', 'r')
buffer = f.read()
f.close()
Parse the data into a table
table = [map(int, row.split(',')) for row in buffer.strip().split("\n")]
>>> print table
[[1, 0, 4, 3, 6, 7, 4, 8, 3, 2, 1, 0], [2, 3, 6, 3, 2, 1, 7, 4, 3, 1, 1, 0], [5, 2, 1, 3, 4, 6, 4, 8, 9, 5, 2, 1]]
Perhaps you want the transpose instead:
transpose = zip(*table)
>>> print transpose
[(1, 2, 5), (0, 3, 2), (4, 6, 1), (3, 3, 3), (6, 2, 4), (7, 1, 6), (4, 7, 4), (8, 4, 8), (3, 3, 9), (2, 1, 5), (1, 1, 2), (0, 0, 1)]

Related

How to store data from text files into lists on integers [duplicate]

This question already has answers here:
How do I create variable variables?
(17 answers)
Closed 10 days ago.
Hi hope everyone is okay.
I am trying to find the most simple method to take data from a text file and store it into diffrent
variables. Below is the format of a text file:
TEXT FILE:
min:1,2,3,4,5,7,8,9
avg:1,2,3,4
max:1,2,3,4,5,1,2,3,44,55,32,12
I want to take each of these lines remove the part before the number starts (min,avg,max and the ':')
and store all the number data in seperate variables in their appropriate names.
NOTE: amount of numbers in each line may differ and shouldnt effect the code
desired in python:
min = [1,2,3,4,5,7,8,9]
avg = [1,2,3,4]
max = [1,2,3,4,5,1,2,3,44,55,32,12]
The code i have tried:
with open('input.txt', 'r') as input:
input = input.read()
input = input.strip().split(',')
After this part i am unsure which method would be best to achieve what I am trying to do.
Any help is appriciated!
There's no reasonable way to generate variables (by name) dynamically. Better to use a dictionary. Something like this:
my_dict = {}
with open('input.txt') as data:
for line in map(str.strip, data):
try:
key, vals = line.split(':')
my_dict[key.rstrip()] = list(map(int, vals.split(',')))
except ValueError:
pass
print(my_dict)
Output:
{'min': [1, 2, 3, 4, 5, 7, 8, 9], 'avg': [1, 2, 3, 4], 'max': [1, 2, 3, 4, 5, 1, 2, 3, 44, 55, 32, 12]}
Using exec for a string evaluation. Do that on trusted data to avoid injection attacks.
with open('input.txt', 'r') as fd:
data = fd.read()
# list of lines
lines = data.split('\n')
# python code format
code_format = '\n'.join("{} = [{}]".format(*line.partition(':')[::2]) for line in lines if line)
# execute the string as python code
exec(code_format)
print(avg)
#[1, 2, 3, 4]
Notice that there is a further side effect in this code evaluation since some variable identifiers overload those of the built-in functions min, max. So, if after the execution of the code you try to call such build-in functions you will get TypeError: 'list' object is not callable.
One way to re-approach the problem would be by pickling the objects and use pickle.dumps to save an object to a file and pickle.loads to retrieve the object, see doc.
This is how you store it in a python dictionary:
txtdict = {}
with open('input.txt', 'r') as f:
for line in f:
if line.strip():
name = line.split(':')[0]
txtdict[name] = [int(i) for j in line.strip().split(':')[1:] for i in j.split(',')]
Output:
{'min': [1, 2, 3, 4, 5, 7, 8, 9],
'avg': [1, 2, 3, 4],
'max': [1, 2, 3, 4, 5, 1, 2, 3, 44, 55, 32, 12]}

Read List in List [duplicate]

This question already has answers here:
How to convert string representation of list to a list
(19 answers)
Closed 5 months ago.
I have a text file and there is 3 lines on data in it.
[1, 2, 1, 1, 3, 1, 1, 2, 1, 3, 1, 1, 1, 3, 3]
[1, 1, 3, 3, 3, 1, 1, 1, 1, 2, 1, 1, 1, 3, 3]
[1, 2, 3, 1, 3, 1, 1, 3, 1, 3, 1, 1, 1, 3, 3]
I try to open and get data in it.
with open("rafine.txt") as f:
l = [line.strip() for line in f.readlines()]
f.close()
now i have list in list.
if i say print(l[0]) it shows me [1, 2, 1, 1, 3, 1, 1, 2, 1, 3, 1, 1, 1, 3, 3]
But i want to get numbers in it.
So when i write print(l[0][0])
i want to see 1 but it show me [
how can i fix this ?
You can use literal_eval to parse the lines from the file & build the matrix:
from ast import literal_eval
with open("test.txt") as f:
matrix = []
for line in f:
row = literal_eval(line)
matrix.append(row)
print(matrix[0][0])
print(matrix[1][4])
print(matrix[2][8])
result:
1
3
1
import json
with open("rafine.txt") as f:
for line in f.readlines():
line = json.loads(line)
print(line)
The best approach depends on what assumption you make about the data in your text file:
ast.literal_eval
If the data in your file is formatted the same way, it would be inside python source-code, the best approach is to use literal_eval:
from ast import literal_eval
data = [] # will contain list of lists
with open("filename") as f:
for line in f:
row = literal_eval(line)
data.append(row)
or, the short version:
with open(filename) as f:
data = [literal_eval(line) for line in f]
re.findall
If you can make few assumptions about the data, using regular expressions to find all digits might be a way forward. The below builds lists by simply extracting any digits in the text file, regardless of separators or other characters in the file:
import re
data = [] # will contain list of lists
with open("filename") as f:
for line in f:
row = [int(i) for i in re.findall(r'\d+', line)]
data.append(row)
or, in short:
with open(filename) as f:
data= [ [int(i) for i in re.findall(r'\d+', line)] for line in f ]
handwritten parsing
If both options are not suitable, there is always an option to parse by hand, to tailor for the exact format:
data = [] # will contain list of lists
with open(filename) as f:
for line in f:
row = [int(i) for i in line[1:-1].split(, )]
data.append(row)
The [1,-1] will remove the first and last character (the brackets), then split(", ") will split it into a list. for i in ... will iterate over the items in this list (assigning i to each item) and int(i) will convert i to an integer.

File handling in Python

Im a python noob and I'm stuck on a problem.
filehandler = open("data.txt", "r")
alist = filehandler.readlines()
def insertionSort(alist):
for line in alist:
line = list(map(int, line.split()))
print(line)
for index in range(2, len(line)):
currentvalue = line[index]
position = index
while position>1 and line[position-1]>currentvalue:
line[position]=line[position-1]
position = position-1
line[position]=currentvalue
print(line)
insertionSort(alist)
for line in alist:
print line
Output:
[4, 19, 2, 5, 11]
[4, 2, 5, 11, 19]
[8, 1, 2, 3, 4, 5, 6, 1, 2]
[8, 1, 1, 2, 2, 3, 4, 5, 6]
4 19 2 5 11
8 1 2 3 4 5 6 1 2
I am supposed to sort lines of values from a file. The first value in the line represents the number of values to be sorted. I am supposed to display the values in the file in sorted order.
The print calls in insertionSort are just for debugging purposes.
The top four lines of output show that the insertion sort seems to be working. I can't figure out why when I print the lists after calling insertionSort the values are not sorted.
I am new to Stack Overflow and Python so please let me know if this question is misplaced.
for line in alist:
line = list(map(int, line.split()))
line starts out as eg "4 19 2 5 11". You split it and convert to int, ie [4, 19, 2, 5, 11].
You then assign this new value to list - but list is a local variable, the new value never gets stored back into alist.
Also, list is a terrible variable name because there is already a list data-type (and the variable name will keep you from being able to use the data-type).
Let's reorganize your program:
def load_file(fname):
with open(fname) as inf:
# -> list of list of int
data = [[int(i) for i in line.split()] for line in inf]
return data
def insertion_sort(row):
# `row` is a list of int
#
# your sorting code goes here
#
return row
def save_file(fname, data):
with open(fname, "w") as outf:
# list of list of int -> list of str
lines = [" ".join(str(i) for i in row) for row in data]
outf.write("\n".join(lines))
def main():
data = load_file("data.txt")
data = [insertion_sort(row) for row in data]
save_file("sorted_data.txt", data)
if __name__ == "__main__":
main()
Actually, with your data - where the first number in each row isn't actually data to sort - you would be better to do
data = [row[:1] + insertion_sort(row[1:]) for row in data]
so that the logic of insertion_sort is cleaner.
As #Barmar mentioned above, you are not modifying the input to the function. You could do the following:
def insertionSort(alist):
blist = []
for line in alist:
line = list(map(int, line.split()))
for index in range(2, len(line)):
currentvalue = line[index]
position = index
while position>1 and line[position-1]>currentvalue:
line[position]=line[position-1]
position = position-1
line[position]=currentvalue
blist.append(line)
return blist
blist = insertionSort(alist)
print(blist)
Alternatively, modify alist "in-place":
def insertionSort(alist):
for k, line in enumerate(alist):
line = list(map(int, line.split()))
for index in range(2, len(line)):
currentvalue = line[index]
position = index
while position>1 and line[position-1]>currentvalue:
line[position]=line[position-1]
position = position-1
line[position]=currentvalue
alist[k] = line
insertionSort(alist)
print(alist)

write zip array vertical in csv

is there ways to display zipped text vertically in csv ?? I tried many difference type of \n ',' but still can't get the array to be vertical
if __name__ == '__main__': #start of program
master = Tk()
newDirRH = "C:/VSMPlots"
FileName = "J123"
TypeName = "1234"
Field = [1,2,3,4,5,6,7,8,9,10]
Court = [5,4,1,2,3,4,5,1,2,3]
for field, court in zip(Field, Court):
stringText = ','.join((str(FileName), str(TypeName), str(Field), str(Court)))
newfile = newDirRH + "/Try1.csv"
text_file = open(newfile, "w")
x = stringText
text_file.write(x)
text_file.close()
print "Done"
This is the method i am looking for for your Code i can't seem to add new columns as all the column will repeat 10x
You are not writing CSV data. You are writing Python string representations of lists. You are writing the whole Field and Court lists each iteration of your loop, instead of writing field and court, and Excel sees the comma in the Python string representation:
J123,1234,[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],[5, 4, 1, 2, 3, 4, 5, 1, 2, 3]
J123,1234,[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],[5, 4, 1, 2, 3, 4, 5, 1, 2, 3]
etc.
while you wanted to write:
J123,1234,1,5
J123,1234,2,4
etc.
Use the csv module to produce CSV files:
import csv
with open(newfile, "wb") as csvfile:
writer = csv.writer(csvfile)
for field, court in zip(Field, Court):
writer.writerow([FileName, TypeName, field, court])
Note the with statement; it takes care of closing the open file object for you. The csv module also makes sure everything is converted to strings.
If you want to write something only on the first row, keep a counter with your items; enumerate() makes that easy:
with open(newfile, "wb") as csvfile:
writer = csv.writer(csvfile)
# row of headers
writer.writerow(['FileName', 'TypeName', 'field', 'court'])
for i, (field, court) in enumerate(zip(Field, Court)):
row = [[FileName, TypeName] if i == 0 else ['', '']
writer.writerow(row + [field, court])

Splitting Text File Into Columns and Rows in Python

I have a newbie question. I need help on separating a text file into columns and rows. Let's say I have a file like this:
1 2 3 4
2 3 4 5
and I want to put it into a 2d list called values = [[]]
i can get it to give me the rows ok and this code works ok:
values = map(int, line.split(','))
I just don't know how I can say the same thing but for the rows and the documentation doesn't make any sense
cheers
f = open(filename,'rt')
a = [[int(token) for token in line.split()] for line in f.readlines()[::2]]
In your sample file above, you have an empty line between each data row - I took this into account, but you can drop the ::2 subscript if you didn't mean to have this extra line in your data.
Edit: added conversion to int - you can use map as well, but mixing list comprehensions and map seems ugly to me.
import csv
import itertools
values = []
with open('text.file') as file_object:
for line in csv.reader(file_object, delimiter=' '):
values.append(map(int, line))
print "rows:", values
print "columns"
for column in itertools.izip(*values):
print column
Output is:
rows: [[1, 2, 3, 4], [2, 3, 4, 5]]
columns:
(1, 2)
(2, 3)
(3, 4)
(4, 5)
Get the data into your program by some method. Here's one:
f = open(tetxfile, 'r')
buffer = f.read()
f.close()
Parse the buffer into a table (note: strip() is used to clear any trailing whitespace):
table = [map(int, row.split()) for row in buffer.strip().split("\n")]
>>> print table
[[1, 2, 3, 4], [2, 3, 4, 5]]
Maybe it's ordered pairs you want instead, then transpose the table:
transpose = zip(*table)
>>> print transpose
[(1, 2), (2, 3), (3, 4), (4, 5)]
You could try to use the CSV-module. You can specify custom delimiters, so it might work.
If columns are separated by blanks
import re
A,B,C,D = [],[],[],[]
pat = re.compile('([^ ]+)\s+([^ ]+)\s+([^ ]+)\s+([^ ]+)')
with open('try.txt') as f:
for line in f:
a,b,c,d = pat.match(line.strip()).groups()
A.append(int(a));B.append(int(b));C.append(int(c));D.append(int(d))
or with csv module
EDIT
A,B,C,D = [],[],[],[]
with open('try.txt') as f:
for line in f:
a,b,c,d = line.split()
A.append(int(a));B.append(int(b));C.append(int(c));D.append(int(d))
But if there are more than one blank between elements of data, this code will fail
EDIT 2
Because the solution with regex has been qualified of extremely hard to understand, it can be cleared as follows:
import re
A,B,C,D = [],[],[],[]
pat = re.compile('\s+')
with open('try.txt') as f:
for line in f:
a,b,c,d = pat.split(line.strip())
A.append(int(a));B.append(int(b));C.append(int(c));D.append(int(d))

Categories