File handling in Python

I'm a Python noob and I'm stuck on a problem.
filehandler = open("data.txt", "r")
alist = filehandler.readlines()
def insertionSort(alist):
    for line in alist:
        line = list(map(int, line.split()))
        print(line)
        for index in range(2, len(line)):
            currentvalue = line[index]
            position = index
            while position>1 and line[position-1]>currentvalue:
                line[position]=line[position-1]
                position = position-1
            line[position]=currentvalue
        print(line)
insertionSort(alist)
for line in alist:
    print line
Output:
[4, 19, 2, 5, 11]
[4, 2, 5, 11, 19]
[8, 1, 2, 3, 4, 5, 6, 1, 2]
[8, 1, 1, 2, 2, 3, 4, 5, 6]
4 19 2 5 11
8 1 2 3 4 5 6 1 2
I am supposed to sort lines of values from a file. The first value in the line represents the number of values to be sorted. I am supposed to display the values in the file in sorted order.
The print calls in insertionSort are just for debugging purposes.
The top four lines of output show that the insertion sort seems to be working. I can't figure out why when I print the lists after calling insertionSort the values are not sorted.
I am new to Stack Overflow and Python so please let me know if this question is misplaced.

for line in alist:
    line = list(map(int, line.split()))
line starts out as, e.g., "4 19 2 5 11". You split it and convert each piece to int, i.e. [4, 19, 2, 5, 11].
You then assign this new value to line, but line is a local variable, so the new value never gets stored back into alist.
Also, be careful never to use list as a variable name: there is already a list data-type (which this very line calls), and a variable with that name would keep you from being able to use it.
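A small sketch of the difference, using made-up sample strings in place of the file contents (the values are assumptions, not the asker's real data). Rebinding the loop variable leaves the list untouched, while assigning through an index changes it:

```python
# Sample lines standing in for the contents of data.txt (assumed values).
alist = ["4 19 2 5 11\n", "2 7 3\n"]

# Rebinding the loop variable only changes the local name `line`.
for line in alist:
    line = [int(x) for x in line.split()]
print(alist)  # still the original strings

# Assigning through an index stores the new value back into the list.
for k, line in enumerate(alist):
    alist[k] = [int(x) for x in line.split()]
print(alist)  # now lists of ints
```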
Let's reorganize your program:
def load_file(fname):
    with open(fname) as inf:
        # -> list of list of int
        data = [[int(i) for i in line.split()] for line in inf]
    return data
def insertion_sort(row):
    # `row` is a list of int
    #
    # your sorting code goes here
    #
    return row
def save_file(fname, data):
    with open(fname, "w") as outf:
        # list of list of int -> list of str
        lines = [" ".join(str(i) for i in row) for row in data]
        outf.write("\n".join(lines))
def main():
    data = load_file("data.txt")
    data = [insertion_sort(row) for row in data]
    save_file("sorted_data.txt", data)
if __name__ == "__main__":
    main()
Actually, with your data, where the first number in each row isn't data to sort, you would be better off doing
data = [row[:1] + insertion_sort(row[1:]) for row in data]
so that the logic of insertion_sort is cleaner.
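To fill in the "your sorting code goes here" placeholder, here is one possible sketch of insertion_sort that sorts a whole row, combined with the slicing shown above. The sample rows are taken from the question's output; returning a copy rather than sorting in place is my own choice, not something the answer specifies.

```python
def insertion_sort(row):
    """Return a sorted copy of `row` (a list of int) using insertion sort."""
    result = list(row)  # copy so the caller's list is not modified
    for index in range(1, len(result)):
        current = result[index]
        position = index
        # Shift larger values one slot right until `current` fits.
        while position > 0 and result[position - 1] > current:
            result[position] = result[position - 1]
            position -= 1
        result[position] = current
    return result

# Sample rows matching the question's output; the first number is the count.
data = [[4, 19, 2, 5, 11], [8, 1, 2, 3, 4, 5, 6, 1, 2]]
data = [row[:1] + insertion_sort(row[1:]) for row in data]
print(data)
```

Because the count is sliced off before sorting, the function no longer needs the special start index of 2 or the position>1 guard from the question.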

As @Barmar mentioned above, you are not modifying the input to the function. You could do the following:
def insertionSort(alist):
    blist = []
    for line in alist:
        line = list(map(int, line.split()))
        for index in range(2, len(line)):
            currentvalue = line[index]
            position = index
            while position>1 and line[position-1]>currentvalue:
                line[position]=line[position-1]
                position = position-1
            line[position]=currentvalue
        blist.append(line)
    return blist
blist = insertionSort(alist)
print(blist)
Alternatively, modify alist "in-place":
def insertionSort(alist):
    for k, line in enumerate(alist):
        line = list(map(int, line.split()))
        for index in range(2, len(line)):
            currentvalue = line[index]
            position = index
            while position>1 and line[position-1]>currentvalue:
                line[position]=line[position-1]
                position = position-1
            line[position]=currentvalue
        alist[k] = line
insertionSort(alist)
print(alist)

Related

Appending numbers from a file into a list by its value

I have a file containing numerical values:
1
4
6
10
12
and I'm trying to place each of these values at its respective position in a list, so that I would obtain:
[None,1,None,None,4,None,6,None,None,None,10,None,12]
since 1 from the file would be at index 1 in the list, 4 from the file would be at index 4 in the list, and so on.
I begin by first reading in the file:
filename = open("numbers.txt", "r", encoding = "utf-8")
numfile = filename
lst = [None] * 12
for line in numfile:
    line = line.strip() #strip new line
    line = int(line) #making the values in integer form
    lst.append(line) #was thinking of line[val] where value is the number itself.
print(lst)
but I'm getting the output:
[None,None,None,None,None,None,None,None,None,1,4,6,10,12]
Where the numbers are appended to the far right of the array. Would appreciate some help on this.

Assuming you have your number as integers in a list called numbers (which you have no problem doing as it seems), you can do:
lst = [None if i not in numbers else i for i in range(max(numbers)+1)]
If numbers can be a big list, I would convert it to a set first to make the in checks faster.
numbers = set(numbers)
lst = [None if i not in numbers else i for i in range(max(numbers)+1)]
Example
>>> numbers = [1, 4, 6, 10, 12]
>>> [None if i not in numbers else i for i in range(max(numbers) + 1)]
[None, 1, None, None, 4, None, 6, None, None, None, 10, None, 12]

Appending to a list adds the numbers at the end of the list. You instead want to assign the value of line to the list at the index line:
filename = open("numbers.txt", "r", encoding = "utf-8")
numfile = filename
lst = [None] * 13
for line in numfile:
    line = line.strip() #strip new line
    line = int(line) #making the values in integer form
    lst[line] = line
print(lst)
# [None, 1, None, None, 4, None, 6, None, None, None, 10, None, 12]

You can compare each value with the current length of the list, and replace the value at that index rather than appending whenever the index already exists:
filename = open("numbers.txt", "r", encoding = "utf-8")
numfile = filename
lst = [None] * 12
for line in numfile:
    line = line.strip() #strip new line
    line = int(line) #making the values in integer form
    if line < len(lst): #replace in place when the index already exists
        lst[line] = line
    else: #otherwise pad with None up to the index, then append
        lst.extend([None] * (line - len(lst)))
        lst.append(line)
print(lst)

Appending to a list in a loop is expensive. I advise you to construct a list of None items and then iterate over your numbers to update elements.
Below is a demo with itertools and csv.reader:
from io import StringIO
from itertools import chain
import csv
mystr = StringIO("""1
4
6
10
12""")
# replace mystr with open('numbers.txt', 'r')
with mystr as f:
    reader = csv.reader(f)
    num_list = list(map(int, chain.from_iterable(reader)))
res = [None] * (num_list[-1]+1)
for i in num_list:
    res[i] = i
print(res)
[None, 1, None, None, 4, None, 6, None, None, None, 10, None, 12]
Benchmarking example:
def appender1(n):
    return [None]*int(n)
def appender2(n):
    lst = []
    for i in range(int(n)):
        lst.append(None)
    return lst
%timeit appender1(1e7) # 90.4 ms per loop
%timeit appender2(1e7) # 1.77 s per loop
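The %timeit lines above are IPython magic and will not run in a plain Python script. As a rough equivalent, here is a sketch using the standard timeit module. The function names and the smaller n are my own choices, and the timings will differ from the figures quoted above depending on the machine.

```python
import timeit

def preallocate(n):
    # Build the whole list in one step.
    return [None] * n

def append_in_loop(n):
    # Build the same list one element at a time.
    lst = []
    for _ in range(n):
        lst.append(None)
    return lst

n = 100_000  # smaller than the 1e7 above so the script finishes quickly
assert preallocate(n) == append_in_loop(n)  # both build the same list

for func in (preallocate, append_in_loop):
    seconds = timeit.timeit(lambda: func(n), number=20)
    print(f"{func.__name__}: {seconds:.4f} s for 20 calls")
```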

Appending elements to a list based on condition

I was trying to append a few elements to a list, list_accepted_outsidenestant. When I print the list, I get: list_accepted_outsidenestant- [([971, 977, 728, 740], set([728, 977, 971, 740]))]. The list holds a list and a set with the same elements. Can anyone point out the mistake I am making? Because of this, I am getting an error:
set_accepted_outsidenest_antlist = set(list_accepted_outsidenestant)
TypeError: unhashable type: 'list'
I have shown part of code only relevant to the current question.
def leo(tag_data):
    available_ants_outside = []
    ori = []
    for id, (x, y) in tag_data:
        available_ants_outside.append(id)
        if for_coordinates_outside_nest((x, y)) is True:
            ori.append(id)
    return ori
def virgo(tag_data):
    available_ants_inside = []
    list_insidenest_ant_id = []
    set_inside_nest_ant_id = set()
    for id, (x, y) in tag_data:
        available_ants_inside.append(id)
        if for_coordinates_inside_nest((x, y)) is True:
            list_insidenest_ant_id.append(id)
        set_inside_nest_ant_id = set(list_insidenest_ant_id)
    return list_insidenest_ant_id,set_inside_nest_ant_id
def bambino(ori,list_insidenest_ant_id):
    list_accepted_outsidenestant = []
    set_accepted_outsidenest_antlist = set()
    set_accepted_insidenest_antlist = set()
    if len(list_accepted_outsidenestant) < num_accepted:
        if (len(ori) > 0) or (len(list_insidenest_ant_id) >0):
            list_accepted_outsidenestant.extend(ori[0:min(len(ori),
                    num_accepted-len(list_accepted_outsidenestant))])
            set_accepted_outsidenest_antlist = set(list_accepted_outsidenestant)
            print "list_accepted_outsidenestant-" + str(list_accepted_outsidenestant)
            set_accepted_insidenest_antlist = set(list_insidenest_ant_id)
    return set_accepted_outsidenest_antlist,set_list_outsideant_id,set_accepted_insidenest_antlist

The problem is that you're appending a list to a list.
You can either iterate over the list you want to add:
items_to_add = ori[0:min(len(ori),
        num_accepted-len(list_accepted_outsidenestant))]
for item in items_to_add:
    list_accepted_outsidenestant.append(item)
Or add the lists:
list_accepted_outsidenestant = list_accepted_outsidenestant + ori[0:min(len(ori), num_accepted-len(list_accepted_outsidenestant))]
Or as bruno pointed out (even better), extend the list.
list_accepted_outsidenestant.extend(ori[0:min(len(ori), num_accepted-len(list_accepted_outsidenestant))])

append adds its argument to the list as a single element, so appending a list nests it inside the other list.
extend adds each element of its argument to the end of the existing list.
In [1]: a = [1,2,3,4]
In [2]: b = [10,9,8,7,6]
In [3]: a.append(b)
In [4]: a
Out[4]: [1, 2, 3, 4, [10, 9, 8, 7, 6]]
In [5]: c = [1,2,3,4]
In [6]: c.extend(b)
In [7]: c
Out[7]: [1, 2, 3, 4, 10, 9, 8, 7, 6]
Hope this helps.
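To connect this back to the TypeError in the question, here is a short sketch (the ids are copied from the question's printed output; the variable names nested and flat are made up for illustration). A list that contains another list cannot be turned into a set, because lists are not hashable:

```python
ori = [971, 977, 728, 740]  # ids taken from the question's printed output

nested = []
nested.append(ori)          # one element, which is itself a list
try:
    set(nested)
except TypeError as exc:
    print("append:", exc)   # set() fails because the element is a list

flat = []
flat.extend(ori)            # four int elements
print("extend:", set(flat) == {971, 977, 728, 740})
```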

How to read the first row of an array in Python

I am new to learning Python, here is my current code:
#!/usr/bin/python
l = []
with open('datad.dat', 'r') as f:
    for line in f:
        line = line.strip()
        if len(line) > 0:
            l.append(map(float, line.split()))
print l[:,1]
I attempted to do this but made the mistake of using FORTRAN syntax, and received the following error:
File "r1.py", line 9, in <module>
print l[:,1]
TypeError: list indices must be integers, not tuple
How would I go about getting the first row or column of an array?

To get the first row use l[0]. To get columns you will need to transpose with zip, e.g. print(list(zip(*l))[0]).
In [14]: l = [[1,2,3],[4,5,6],[7,8,9]]
In [15]: l[0] # first row
Out[15]: [1, 2, 3]
In [16]: l[1] # second row
Out[16]: [4, 5, 6]
In [17]: l[2] # third row
Out[17]: [7, 8, 9]
In [18]: t = list(zip(*l))
In [19]: t[0] # first column
Out[19]: (1, 4, 7)
In [20]: t[1] # second column
Out[20]: (2, 5, 8)
In [21]: t[2] # third column
Out[21]: (3, 6, 9)
The csv module may also be useful:
import csv
with open('datad.dat', 'r') as f:
    reader = csv.reader(f)
    # list(...) is needed in Python 3, where map returns a lazy iterator
    l = [list(map(float, row)) for row in reader]
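If only one column is needed, transposing the whole table is not required. A minimal sketch with made-up sample data, picking the same position out of every row with a list comprehension:

```python
l = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]  # sample data

first_row = l[0]
first_column = [row[0] for row in l]    # element 0 of every row
second_column = [row[1] for row in l]   # the column that l[:,1] would select in NumPy
print(first_row, first_column, second_column)
```

This assumes every row has at least two elements; a short row would raise IndexError.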

Python finding repeating sequence in list of integers?

I have a list of lists and each list has a repeating sequence. I'm trying to count the length of repeated sequence of integers in the list:
list_a = [111,0,3,1,111,0,3,1,111,0,3,1]
list_b = [67,4,67,4,67,4,67,4,2,9,0]
list_c = [1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,23,18,10]
Which would return:
list_a count = 4 (for [111,0,3,1])
list_b count = 2 (for [67,4])
list_c count = 10 (for [1,2,3,4,5,6,7,8,9,0])
Any advice or tips would be welcome. I'm trying to work it out with re.compile right now, but it's not quite right.

Guess the sequence length by iterating through guesses between 2 and half the sequence length. If no pattern is discovered, return 1 by default.
def guess_seq_len(seq):
    guess = 1
    max_len = len(seq) // 2  # integer division so range() gets an int
    for x in range(2, max_len + 1):  # + 1 so half the length is tried too
        if seq[0:x] == seq[x:2*x] :
            return x
    return guess
list_a = [111,0,3,1,111,0,3,1,111,0,3,1]
list_b = [67,4,67,4,67,4,67,4,2,9,0]
list_c = [1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,23,18,10]
print(guess_seq_len(list_a))
print(guess_seq_len(list_b))
print(guess_seq_len(list_c))
print(guess_seq_len(range(500))) # test of no repetition
This gives (as expected):
4
2
10
1
As requested, this alternative gives longest repeated sequence. Hence it will return 4 for list_b. The only change is guess = x instead of return x
def guess_seq_len(seq):
    guess = 1
    max_len = len(seq) // 2  # integer division so range() gets an int
    for x in range(2, max_len + 1):  # + 1 so half the length is tried too
        if seq[0:x] == seq[x:2*x] :
            guess = x
    return guess

I took Maria's faster and more stackoverflow-compliant answer and made it find the largest sequence first:
def guess_seq_len(seq, verbose=False):
    seq_len = 1
    initial_item = seq[0]
    butfirst_items = seq[1:]
    if initial_item in butfirst_items:
        # + 1 converts the index within butfirst_items back to an index within seq
        first_match_idx = butfirst_items.index(initial_item) + 1
        if verbose:
            print(f'"{initial_item}" was found at index 0 and index {first_match_idx}')
        max_seq_len = min(len(seq) - first_match_idx, first_match_idx)
        for seq_len in range(max_seq_len, 0, -1):
            if seq[:seq_len] == seq[first_match_idx:first_match_idx+seq_len]:
                if verbose:
                    print(f'A sequence length of {seq_len} was found at index {first_match_idx}')
                break
    return seq_len

This worked for me.
def repeated(L):
    '''Reduce the input list to a list of all repeated integers in the list.'''
    return [item for item in list(set(L)) if L.count(item) > 1]
def print_result(L, name):
    '''Print the output for one list.'''
    output = repeated(L)
    print('%s count = %i (for %s)' % (name, len(output), output))
list_a = [111, 0, 3, 1, 111, 0, 3, 1, 111, 0, 3, 1]
list_b = [67, 4, 67, 4, 67, 4, 67, 4, 2, 9, 0]
list_c = [
    1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2,
    3, 4, 5, 6, 7, 8, 9, 0, 23, 18, 10
]
print_result(list_a, 'list_a')
print_result(list_b, 'list_b')
print_result(list_c, 'list_c')
Python's set() function will transform a list to a set, a datatype that can only contain one of any given value, much like a set in algebra. I converted the input list to a set, and then back to a list, reducing the list to only its unique values. I then tested the original list for each of these values to see if it contained that value more than once. I returned a list of all of the duplicates. The rest of the code is just for demonstration purposes, to show that it works.
Edit: Syntax highlighting didn't like the apostrophe in my docstring.
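One caveat worth checking before relying on this approach: it counts distinct values that occur more than once, which matches the length of the repeating block only when the block has no duplicate values inside it. A sketch with a made-up list and a hypothetical helper named period (not from any of the answers) shows the two numbers diverging:

```python
def repeated(L):
    """Distinct values that occur more than once in L."""
    return [item for item in set(L) if L.count(item) > 1]

def period(seq):
    """Smallest x >= 1 with seq[:x] == seq[x:2*x], or None if there is none."""
    for x in range(1, len(seq) // 2 + 1):
        if seq[:x] == seq[x:2 * x]:
            return x
    return None

sample = [5, 9, 5, 5, 9, 5, 5, 9, 5]  # made-up list: the block [5, 9, 5] repeats
print(len(repeated(sample)))  # counts distinct repeated values
print(period(sample))         # length of the repeating block
```

For the three lists in the question the two counts happen to agree, since each repeating block there contains only distinct values.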

Python - Parsing Columns and Rows

I am running into some trouble with parsing the contents of a text file into a 2D array/list. I cannot use built-in libraries, so I have taken a different approach. This is what my text file looks like, followed by my code:
1,0,4,3,6,7,4,8,3,2,1,0
2,3,6,3,2,1,7,4,3,1,1,0
5,2,1,3,4,6,4,8,9,5,2,1
def twoDArray():
    network = [[]]
    filename = open('twoDArray.txt', 'r')
    for line in filename.readlines():
        col = line.split(line, ',')
        row = line.split(',')
    network.append(col,row)
    print "Network = "
    print network
if __name__ == "__main__":
    twoDArray()
I ran this code but got this error:
Traceback (most recent call last):
File "2dArray.py", line 22, in <module>
twoDArray()
File "2dArray.py", line 8, in twoDArray
col = line.split(line, ',')
TypeError: an integer is required
I am using the comma to separate both row and column, as I am not sure how I would differentiate between the two. I am confused about why it is telling me that an integer is required when the file is made up of integers.

Well, I can explain the error. You're using str.split() and its usage pattern is:
str.split(separator, maxsplit)
You're using str.split(string, separator) and that isn't a valid call to split. Here is a direct link to the Python docs for this:
http://docs.python.org/library/stdtypes.html#str.split
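A short sketch of the difference, using part of the first row of the question's file as sample input. The second positional argument of split is maxsplit, so passing "," there is what triggers the TypeError (the exact wording of the message varies between Python versions):

```python
line = "1,0,4,3,6,7"

print(line.split(","))      # split on every comma
print(line.split(",", 2))   # at most 2 splits, so 3 items

try:
    line.split(line, ",")   # the call from the question: "," lands in maxsplit
except TypeError as exc:
    print("TypeError:", exc)
```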

To directly answer your question, there is a problem with the following line:
col = line.split(line, ',')
If you check the documentation for str.split, you'll find the description to be as follows:
str.split([sep[, maxsplit]])
Return a list of the words in the string, using sep as the delimiter string. If maxsplit is given, at most maxsplit splits are done (thus, the list will have at most maxsplit+1 elements). If maxsplit is not specified, then there is no limit on the number of splits (all possible splits are made).
This is not what you want. You are not trying to specify the number of splits you want to make.
Consider replacing your for loop and network.append with this:
for line in filename.readlines():
    # line is a string representing the values for this row
    row = line.split(',')
    # row is the list of number strings for this row, such as ['1', '0', '4', ...]
    cols = [int(x) for x in row]
    # cols is the list of numbers for this row, such as [1, 0, 4, ...]
    network.append(cols)
    # Put this row into network, such that network is [[1, 0, 4, ...], [...], ...]

"""I cannot use built-in libraries""" -- do you really mean "cannot" as in you have tried to use the csv module and failed? If so, say so. Do you mean that "may not" as in you are forbidden to use a built-in module by the terms of your homework assignment? If so, say so.
Here is an answer that works. It doesn't leave a newline attached to the end of the last item in each row. It converts the numbers to int so that you can use them for whatever purpose you have. It fixes other errors that nobody else has mentioned.
def twoDArray():
    network = []
    # filename = open('twoDArray.txt', 'r')
    # "filename" is a very weird name for a file HANDLE
    f = open('twoDArray.txt', 'r')
    # for line in filename.readlines():
    # readlines reads the whole file into memory at once.
    # That is quite unnecessary.
    for line in f: # just iterate over the file handle
        line = line.rstrip('\n') # remove the newline, if any
        # col = line.split(line, ',')
        # wrong args, as others have said.
        # In any case, only 1 split call is necessary
        row = line.split(',')
        # now convert string to integer
        irow = [int(item) for item in row]
        # network.append(col,row)
        # list.append expects only ONE arg
        # indentation was wrong; you need to do this once per line
        network.append(irow)
    print("Network = ")
    print(network)
if __name__ == "__main__":
    twoDArray()

This can be much shorter:
network = []
filename = open('twoDArray.txt', 'r')
for line in filename.readlines():
    network.append([int(x) for x in line.split(',')])
This gives you
[
[1,0,4,3,6,7,4,8,3,2,1,0],
[2,3,6,3,2,1,7,4,3,1,1,0],
[5,2,1,3,4,6,4,8,9,5,2,1]
]
Or do you need some other structure as output? If so, please add what you need to the question.

class TwoDArray(object):
    @classmethod
    def fromFile(cls, fname, *args, **kwargs):
        splitOn = kwargs.pop('splitOn', None)
        mode = kwargs.pop('mode', 'r')
        with open(fname, mode) as inf:
            return cls([line.strip('\r\n').split(splitOn) for line in inf], *args, **kwargs)
    def __init__(self, data=[[]], *args, **kwargs):
        dataType = kwargs.pop('dataType', lambda x:x)
        super(TwoDArray,self).__init__()
        self.data = [[dataType(i) for i in line] for line in data]
    def __str__(self, fmt=str, endrow='\n', endcol='\t'):
        return endrow.join(
            endcol.join(fmt(i) for i in row) for row in self.data
        )
def main():
    network = TwoDArray.fromFile('twoDArray.txt', splitOn=',', dataType=int)
    print("Network =")
    print(network)
if __name__ == "__main__":
    main()

The input format is simple, so the solution should be simple too:
network = [list(map(int, line.split(','))) for line in open(filename)]
print(network)
The csv module doesn't provide an advantage in this case:
import csv
print([list(map(int, row)) for row in csv.reader(open(filename, newline=''))])
If you need float instead of int:
print(list(csv.reader(open(filename, newline=''), quoting=csv.QUOTE_NONNUMERIC)))
If you are working with numpy arrays:
import numpy
print(numpy.loadtxt(filename, dtype='i', delimiter=','))
See Why NumPy instead of Python lists?
All examples produce arrays equal to:
[[1 0 4 3 6 7 4 8 3 2 1 0]
[2 3 6 3 2 1 7 4 3 1 1 0]
[5 2 1 3 4 6 4 8 9 5 2 1]]

Read the data from the file. Here's one way:
with open('twoDArray.txt', 'r') as f:
    buffer = f.read()
Parse the data into a table:
table = [list(map(int, row.split(','))) for row in buffer.strip().split("\n")]
>>> print(table)
[[1, 0, 4, 3, 6, 7, 4, 8, 3, 2, 1, 0], [2, 3, 6, 3, 2, 1, 7, 4, 3, 1, 1, 0], [5, 2, 1, 3, 4, 6, 4, 8, 9, 5, 2, 1]]
Perhaps you want the transpose instead:
transpose = list(zip(*table))
>>> print(transpose)
[(1, 2, 5), (0, 3, 2), (4, 6, 1), (3, 3, 3), (6, 2, 4), (7, 1, 6), (4, 7, 4), (8, 4, 8), (3, 3, 9), (2, 1, 5), (1, 1, 2), (0, 0, 1)]
