Iterate through a file reading first N values - python

I am reading 3 lines at a time from a file which has numbers 1,2,3...100
I want the output to look something like this
1
2
3
2
3
4
3
4
5
However with the following code, it is printing continuous numbers
with open("/home/osboxes/num", "r+") as f:
    for line in f:
        print(line)
        line2 = f.__next__()
        print(line2)
        line3 = f.__next__()
        print(line3)
Is there a way to step back in the iteration so that each group starts one line after the previous one, and display the output as shown above?

Let's assume that instead of your file object we have an iterator like iter(range(100)). To produce the expected result using next, you can copy the iterator with itertools.tee as many times as you need, advance each copy by a different number of items, and zip the copies together:
In [3]: r = iter(range(100))
In [4]: from itertools import tee
In [5]: r, n, m = tee(r, 3) # copy the iterator 3 times
In [6]: next(n) # consume the first item of n
Out[6]: 0
In [7]: next(m);next(m) # consume the first 2 items of m
Out[7]: 1
In [8]: list(zip(r, n, m))
#Out[8]:
#[(0, 1, 2),
# (1, 2, 3),
# (2, 3, 4),
# (3, 4, 5),
# (4, 5, 6),
# (5, 6, 7),
# ...
Now you can do the same thing with file object:
from itertools import tee
with open("/home/osboxes/num", "r+") as f:
    f, n, m = tee(f, 3)  # three independent iterators over the file's lines
    next(n); next(m); next(m)  # advance n by one line and m by two
    for i, j, k in zip(f, n, m):
        print(i, j, k) # or do something else with i,j,k
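The same tee-and-shift idea can be generalized to windows of any size. The following is only a sketch: the helper name `windows` is hypothetical, and a `range` stands in for the file so the example is self-contained.

```python
from itertools import islice, tee

def windows(iterable, n):
    """Yield overlapping tuples of n consecutive items (hypothetical helper)."""
    copies = tee(iterable, n)
    # Skip i items on the i-th copy so the copies are staggered by one each.
    shifted = [islice(copy, i, None) for i, copy in enumerate(copies)]
    # zip stops as soon as the most advanced copy runs out.
    return zip(*shifted)

result = list(windows(range(1, 6), 3))
print(result)
```

Passing an open file object instead of the `range` should yield tuples of consecutive lines in the same way.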

If it's a smaller file, as you mentioned, then you can use the following code; if it's much bigger, prefer the seek() method:
with open("abc.txt", "r+") as f:
    data = f.readlines()
    for i in range(2, len(data)):
        print("%s %s %s" % (data[i-2].rstrip(), data[i-1].rstrip(), data[i].rstrip()), end = " ")
Output:
1 2 3 2 3 4 3 4 5

If storing the whole file in a variable isn't a problem, an easy solution would be:
with open("num", "r+") as f:
    lines = f.read().splitlines()
    for i in range(len(lines) - 2):
        print(lines[i])
        print(lines[i + 1])
        print(lines[i + 2])
For a more efficient solution, see @Kasramvd's solution using iterators.
As an alternative without iterators, you can store the last 2 values:
with open("num", "r+") as f:
    prev1, prev2 = None, None
    for line in f:
        if prev1 is not None and prev2 is not None:
            print(prev1)
            print(prev2)
            print(line)
        prev1, prev2 = prev2, line
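Keeping the last two values by hand extends naturally to any window size with `collections.deque(maxlen=n)`, which drops the oldest item automatically. This is a sketch under assumptions: `sliding_lines` is a hypothetical helper, and `io.StringIO` stands in for the real file.

```python
import io
from collections import deque

def sliding_lines(lines, n):
    """Yield lists of n consecutive lines, newline stripped (hypothetical helper)."""
    window = deque(maxlen=n)  # the oldest line is discarded once n are held
    for line in lines:
        window.append(line.rstrip("\n"))
        if len(window) == n:
            yield list(window)

sample = io.StringIO("1\n2\n3\n4\n5\n")  # stand-in for the numbers file
groups = list(sliding_lines(sample, 3))
print(groups)
```

Only n lines are held in memory at a time, so this also suits large files.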

Related

File handling in Python

I'm a Python noob and I'm stuck on a problem.
filehandler = open("data.txt", "r")
alist = filehandler.readlines()
def insertionSort(alist):
    for line in alist:
        line = list(map(int, line.split()))
        print(line)
        for index in range(2, len(line)):
            currentvalue = line[index]
            position = index
            while position>1 and line[position-1]>currentvalue:
                line[position]=line[position-1]
                position = position-1
            line[position]=currentvalue
        print(line)
insertionSort(alist)
for line in alist:
    print line
Output:
[4, 19, 2, 5, 11]
[4, 2, 5, 11, 19]
[8, 1, 2, 3, 4, 5, 6, 1, 2]
[8, 1, 1, 2, 2, 3, 4, 5, 6]
4 19 2 5 11
8 1 2 3 4 5 6 1 2
I am supposed to sort lines of values from a file. The first value in the line represents the number of values to be sorted. I am supposed to display the values in the file in sorted order.
The print calls in insertionSort are just for debugging purposes.
The top four lines of output show that the insertion sort seems to be working. I can't figure out why when I print the lists after calling insertionSort the values are not sorted.
I am new to Stack Overflow and Python so please let me know if this question is misplaced.
for line in alist:
    line = list(map(int, line.split()))
line starts out as e.g. "4 19 2 5 11". You split it and convert to int, i.e. [4, 19, 2, 5, 11].
You then assign this new value to line, but line is a local variable; the new value never gets stored back into alist.
Also, list is a terrible variable name because there is already a list data type (and using the name for a variable will keep you from being able to use the data type).
Let's reorganize your program:
def load_file(fname):
    with open(fname) as inf:
        # -> list of list of int
        data = [[int(i) for i in line.split()] for line in inf]
    return data
def insertion_sort(row):
    # `row` is a list of int
    #
    # your sorting code goes here
    #
    return row
def save_file(fname, data):
    with open(fname, "w") as outf:
        # list of list of int -> list of str
        lines = [" ".join(str(i) for i in row) for row in data]
        outf.write("\n".join(lines))
def main():
    data = load_file("data.txt")
    data = [insertion_sort(row) for row in data]
    save_file("sorted_data.txt", data)
if __name__ == "__main__":
    main()
Actually, with your data - where the first number in each row isn't actually data to sort - you would be better to do
data = [row[:1] + insertion_sort(row[1:]) for row in data]
so that the logic of insertion_sort is cleaner.
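For illustration, one way the body of insertion_sort could be filled in is sketched below. It assumes the function returns a new sorted list rather than sorting in place, which is a choice made for this example and not something the answer specifies.

```python
def insertion_sort(row):
    """Return a sorted copy of row (a list of int) using insertion sort."""
    result = list(row)  # work on a copy so the caller's list is untouched
    for index in range(1, len(result)):
        current = result[index]
        position = index
        # Shift larger items one slot to the right to make room for current.
        while position > 0 and result[position - 1] > current:
            result[position] = result[position - 1]
            position -= 1
        result[position] = current
    return result

row = [4, 19, 2, 5, 11]  # the first value is the count, not data
sorted_row = row[:1] + insertion_sort(row[1:])
print(sorted_row)
```

Because the count is sliced off before the call, the loop can start at index 1 and compare against position 0 without the special-casing in the question's code.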
As @Barmar mentioned above, you are not modifying the input to the function. You could do the following:
def insertionSort(alist):
    blist = []
    for line in alist:
        line = list(map(int, line.split()))
        for index in range(2, len(line)):
            currentvalue = line[index]
            position = index
            while position>1 and line[position-1]>currentvalue:
                line[position]=line[position-1]
                position = position-1
            line[position]=currentvalue
        blist.append(line)
    return blist
blist = insertionSort(alist)
print(blist)
Alternatively, modify alist "in-place":
def insertionSort(alist):
    for k, line in enumerate(alist):
        line = list(map(int, line.split()))
        for index in range(2, len(line)):
            currentvalue = line[index]
            position = index
            while position>1 and line[position-1]>currentvalue:
                line[position]=line[position-1]
                position = position-1
            line[position]=currentvalue
        alist[k] = line
insertionSort(alist)
print(alist)

Zip two file contents having related timestamp column to create a list in python

I have two files containing timestamp column with 1000+ rows. Row in file f1 is related to the row in file f2. I wanted a Python script to do [f1 nth row,f2 nth row] for all corresponding rows in the best way possible. Thanks!
f1:
05:43:44
05:59:32
f2:
05:43:51
05:59:39
e.g. [05:43:44,05:43:51], [05:59:32,05:59:39] ....
You may use zip() function. https://docs.python.org/3/library/functions.html#zip
>>> x = [1, 2, 3]
>>> y = [4, 5, 6]
>>> zipped = zip(x, y)
>>> list(zipped)
[(1, 4), (2, 5), (3, 6)]
You can do something like the following:
f1_as_list = open(f1).readlines() # get each line as a list element
f2_as_list = open(f2).readlines()
zipped_files = zip(f1_as_list, f2_as_list) # zip the two lists together
Something like this is probably the most intuitive approach.
#!/usr/bin/python3
with open("f1.txt") as f1:
    with open("f2.txt") as f2:
        for row1, row2 in zip(f1, f2):  # pair the nth line of f1 with the nth line of f2
            print("%s %s" % (row1.strip(), row2.strip()))
Some might prefer a list comprehension, but non-pythonistas may not consider it intuitive.
with open("f1.txt") as f1:
    with open("f2.txt") as f2:
        print("\n".join([
            "%s %s" % (row1.strip(), row2.strip())
            for row1, row2 in zip(f1, f2)
        ]))
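To build the list of [f1 nth row, f2 nth row] pairs the question asks for, a minimal sketch with zip() might look like this. The `io.StringIO` objects are stand-ins for the two real files, using the sample timestamps from the question.

```python
import io

# Stand-ins for the two files; real code would use open("f1.txt") etc.
f1 = io.StringIO("05:43:44\n05:59:32\n")
f2 = io.StringIO("05:43:51\n05:59:39\n")

# zip() walks both files in step, one line from each per iteration.
pairs = [[a.strip(), b.strip()] for a, b in zip(f1, f2)]
print(pairs)
```

Note that zip() stops at the end of the shorter input; itertools.zip_longest is an option if unmatched trailing rows must be kept.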

Python: reading N number from file, M at time

My file is this one:
14
3
21
37
48
12
4
6
22
4
How can I read M numbers at a time, for example 4 at a time? Is it necessary to use two for loops?
My goal is to create (N/M)+1 lists with M numbers in each list, except the final list (which holds the remainder of the division N/M).
You can use Python's list slicing to fetch the required lines: readlines() returns a list in which each element is one line of the file.
with open("filename") as myfile:
    firstNtoMlines = myfile.readlines()[N:N+M] # the interval you want to read
print(firstNtoMlines)
Use itertools.islice:
import itertools
import math
filename = 'test.dat'
N = 9
M = 4
num_rest_lines = N
nrof_lists = int(math.ceil(N*1.0/M))
with open(filename, 'r') as f:
    for i in range(nrof_lists):
        num_lines = min(num_rest_lines, M)
        lines_gen = itertools.islice(f, num_lines)
        l = [int(line.rstrip()) for line in lines_gen]
        num_rest_lines = num_rest_lines - M
        print(l)
# Output
[14, 3, 21, 37]
[48, 12, 4, 6]
[22]
Previous answer: Iterate over a file (N lines) in chunks (every M lines), forming a list of N/M+1 lists.
import itertools
def grouper(iterable, n, fillvalue=None):
    """iterate in chunks"""
    args = [iter(iterable)] * n
    return itertools.zip_longest(*args, fillvalue=fillvalue)  # itertools.izip_longest on Python 2
# Test
filename = 'test.dat'
m = 4
fillvalue = '0'
with open(filename, 'r') as f:
    lists = [[int(item.rstrip()) for item in chunk] for chunk in grouper(f, m, fillvalue=fillvalue)]
print(lists)
# Output
[[14, 3, 21, 37], [48, 12, 4, 6], [22, 4, 0, 0]]
Now my code is this one:
N = 4
M = 0
while (M < 633):
    with open("/Users/Lorenzo/Desktop/X","r") as myFile:
        res = myFile.readlines()[M:N]
    print(res)
    M+=4
    N+=4
So it should work. My file has 633 numbers.
This has been asked before.
from itertools import izip_longest
izip_longest(*(iter(yourlist),) * yourgroupsize)
For the case of grouping lines in a file into lists of size 4:
with open("file.txt", "r") as f:
    res = izip_longest(*(iter(f),) * 4)
    print list(res)  # consume the iterator while the file is still open
Alternative way to split a list into groups of n
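When N is not known in advance and no fill value is wanted, the islice approach can be wrapped in a generator that simply stops when the file runs out. This is a Python 3 sketch; `chunks` is a hypothetical helper and `io.StringIO` stands in for the file from the question.

```python
import io
from itertools import islice

def chunks(lines, m):
    """Yield lists of up to m ints read from an iterable of lines (hypothetical helper)."""
    it = iter(lines)
    while True:
        chunk = [int(line) for line in islice(it, m)]
        if not chunk:  # nothing left to read
            return
        yield chunk

sample = io.StringIO("14\n3\n21\n37\n48\n12\n4\n6\n22\n4\n")
result = list(chunks(sample, 4))
print(result)
```

The last list holds whatever is left over, so no padding value has to be chosen and the file is read only once.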

Splitting Text File Into Columns and Rows in Python

I have a newbie question. I need help on separating a text file into columns and rows. Let's say I have a file like this:
1 2 3 4
2 3 4 5
and I want to put it into a 2d list called values = [[]]
I can get it to give me the rows OK, and this code works OK:
values = map(int, line.split(','))
I just don't know how I can say the same thing but for the rows and the documentation doesn't make any sense
cheers
f = open(filename,'rt')
a = [[int(token) for token in line.split()] for line in f.readlines()[::2]]
In your sample file above, you have an empty line between each data row - I took this into account, but you can drop the ::2 subscript if you didn't mean to have this extra line in your data.
Edit: added conversion to int - you can use map as well, but mixing list comprehensions and map seems ugly to me.
import csv
import itertools
values = []
with open('text.file') as file_object:
    for line in csv.reader(file_object, delimiter=' '):
        values.append(map(int, line))
print "rows:", values
print "columns:"
for column in itertools.izip(*values):
    print column
Output is:
rows: [[1, 2, 3, 4], [2, 3, 4, 5]]
columns:
(1, 2)
(2, 3)
(3, 4)
(4, 5)
Get the data into your program by some method. Here's one:
f = open(textfile, 'r')
buffer = f.read()
f.close()
Parse the buffer into a table (note: strip() is used to clear any trailing whitespace):
table = [map(int, row.split()) for row in buffer.strip().split("\n")]
>>> print table
[[1, 2, 3, 4], [2, 3, 4, 5]]
Maybe it's ordered pairs you want instead, then transpose the table:
transpose = zip(*table)
>>> print transpose
[(1, 2), (2, 3), (3, 4), (4, 5)]
You could try to use the CSV-module. You can specify custom delimiters, so it might work.
If columns are separated by blanks
import re
A,B,C,D = [],[],[],[]
pat = re.compile(r'([^ ]+)\s+([^ ]+)\s+([^ ]+)\s+([^ ]+)')
with open('try.txt') as f:
    for line in f:
        a,b,c,d = pat.match(line.strip()).groups()
        A.append(int(a));B.append(int(b));C.append(int(c));D.append(int(d))
or with csv module
EDIT
A,B,C,D = [],[],[],[]
with open('try.txt') as f:
    for line in f:
        a,b,c,d = line.split()
        A.append(int(a));B.append(int(b));C.append(int(c));D.append(int(d))
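Note that line.split() with no argument treats any run of whitespace as a single separator, so repeated blanks are fine here; line.split(' ') would fail if there is more than one blank between elements of data.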
EDIT 2
Because the regex solution has been described as extremely hard to understand, it can be simplified as follows:
import re
A,B,C,D = [],[],[],[]
pat = re.compile(r'\s+')
with open('try.txt') as f:
    for line in f:
        a,b,c,d = pat.split(line.strip())
        A.append(int(a));B.append(int(b));C.append(int(c));D.append(int(d))
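To see how the splitting variants treat repeated blanks, here is a small sketch. The sample line is made up for the demonstration and is not taken from the question's file.

```python
import re

line = "1  2 3   4"  # hypothetical row with repeated blanks between values

by_whitespace = line.split()                   # runs of whitespace count as one separator
by_single_blank = line.split(" ")              # every single blank is a separator
by_regex = re.split(r"\s+", line.strip())      # same result as split() for this input

print(by_whitespace)
print(by_single_blank)
print(by_regex)
```

Only the single-blank variant produces empty strings, which would then break both the four-way unpacking and the int() conversion.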

Python - Parsing Columns and Rows

I am running into some trouble with parsing the contents of a text file into a 2D array/list. I cannot use built-in libraries, so have taken a different approach. This is what my text file looks like, followed by my code
1,0,4,3,6,7,4,8,3,2,1,0
2,3,6,3,2,1,7,4,3,1,1,0
5,2,1,3,4,6,4,8,9,5,2,1
def twoDArray():
    network = [[]]
    filename = open('twoDArray.txt', 'r')
    for line in filename.readlines():
        col = line.split(line, ',')
        row = line.split(',')
    network.append(col,row)
    print "Network = "
    print network
if __name__ == "__main__":
    twoDArray()
I ran this code but got this error:
Traceback (most recent call last):
File "2dArray.py", line 22, in <module>
twoDArray()
File "2dArray.py", line 8, in twoDArray
col = line.split(line, ',')
TypeError: an integer is required
I am using the comma to separate both row and column as I am not sure how I would differentiate between the two - I am confused about why it is telling me that an integer is required when the file is made up of integers
Well, I can explain the error. You're using str.split() and its usage pattern is:
str.split(separator, maxsplit)
You're using str.split(string, separator) and that isn't a valid call to split. Here is a direct link to the Python docs for this:
http://docs.python.org/library/stdtypes.html#str.split
To directly answer your question, there is a problem with the following line:
col = line.split(line, ',')
If you check the documentation for str.split, you'll find the description to be as follows:
str.split([sep[, maxsplit]])
Return a list of the words in the string, using sep as the delimiter string. If maxsplit is given, at most
maxsplit splits are done (thus, the list will have at most maxsplit+1 elements). If maxsplit is not specified, then there is no limit on the number of splits (all possible splits are made).
This is not what you want. You are not trying to specify the number of splits you want to make.
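A short sketch of the difference, using the first few values of the sample row (run under Python 3, where the exact error text differs from the Python 2 traceback in the question):

```python
line = "1,0,4,3"

all_parts = line.split(",")      # no maxsplit: split at every comma
one_split = line.split(",", 1)   # maxsplit=1: split only at the first comma

# Passing a string as the second argument is rejected, because
# that position is reserved for the integer maxsplit.
raised = False
try:
    line.split(line, ",")
except TypeError:
    raised = True

print(all_parts, one_split, raised)
```

The separator is always the first argument and the string being split is the object the method is called on, so a single line.split(',') is all that is needed.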
Consider replacing your for loop and network.append with this:
for line in filename.readlines():
    # line is a string representing the values for this row
    row = line.split(',')
    # row is the list of number strings for this row, such as ['1', '0', '4', ...]
    cols = [int(x) for x in row]
    # cols is the list of numbers for this row, such as [1, 0, 4, ...]
    network.append(cols)
    # Put this row into network, such that network is [[1, 0, 4, ...], [...], ...]
"""I cannot use built-in libraries""" -- do you really mean "cannot" as in you have tried to use the csv module and failed? If so, say so. Do you mean that "may not" as in you are forbidden to use a built-in module by the terms of your homework assignment? If so, say so.
Here is an answer that works. It doesn't leave a newline attached to the end of the last item in each row. It converts the numbers to int so that you can use them for whatever purpose you have. It fixes other errors that nobody else has mentioned.
def twoDArray():
    network = []
    # filename = open('twoDArray.txt', 'r')
    # "filename" is a very weird name for a file HANDLE
    f = open('twoDArray.txt', 'r')
    # for line in filename.readlines():
    # readlines reads the whole file into memory at once.
    # That is quite unnecessary.
    for line in f: # just iterate over the file handle
        line = line.rstrip('\n') # remove the newline, if any
        # col = line.split(line, ',')
        # wrong args, as others have said.
        # In any case, only 1 split call is necessary
        row = line.split(',')
        # now convert string to integer
        irow = [int(item) for item in row]
        # network.append(col,row)
        # list.append expects only ONE arg
        # indentation was wrong; you need to do this once per line
        network.append(irow)
    print "Network = "
    print network
if __name__ == "__main__":
    twoDArray()
Omg...
network = []
filename = open('twoDArray.txt', 'r')
for line in filename.readlines():
    network.append([int(x) for x in line.split(',')])  # convert each value to int
This gives you
[
[1,0,4,3,6,7,4,8,3,2,1,0],
[2,3,6,3,2,1,7,4,3,1,1,0],
[5,2,1,3,4,6,4,8,9,5,2,1]
]
Or do you need some other structure as output? Please add what you need as output.
class TwoDArray(object):
    @classmethod
    def fromFile(cls, fname, *args, **kwargs):
        splitOn = kwargs.pop('splitOn', None)
        mode = kwargs.pop('mode', 'r')
        with open(fname, mode) as inf:
            return cls([line.strip('\r\n').split(splitOn) for line in inf], *args, **kwargs)
    def __init__(self, data=[[]], *args, **kwargs):
        dataType = kwargs.pop('dataType', lambda x:x)
        super(TwoDArray,self).__init__()
        self.data = [[dataType(i) for i in line] for line in data]
    def __str__(self, fmt=str, endrow='\n', endcol='\t'):
        return endrow.join(
            endcol.join(fmt(i) for i in row) for row in self.data
        )
def main():
    network = TwoDArray.fromFile('twodarray.txt', splitOn=',', dataType=int)
    print("Network =")
    print(network)
if __name__ == "__main__":
    main()
The input format is simple, so the solution should be simple too:
network = [map(int, line.split(',')) for line in open(filename)]
print network
csv module doesn't provide an advantage in this case:
import csv
print [map(int, row) for row in csv.reader(open(filename, 'rb'))]
If you need float instead of int:
print list(csv.reader(open(filename, 'rb'), quoting=csv.QUOTE_NONNUMERIC))
If you are working with numpy arrays:
import numpy
print numpy.loadtxt(filename, dtype='i', delimiter=',')
See Why NumPy instead of Python lists?
All examples produce arrays equal to:
[[1 0 4 3 6 7 4 8 3 2 1 0]
[2 3 6 3 2 1 7 4 3 1 1 0]
[5 2 1 3 4 6 4 8 9 5 2 1]]
Read the data from the file. Here's one way:
f = open('twoDArray.txt', 'r')
buffer = f.read()
f.close()
Parse the data into a table
table = [map(int, row.split(',')) for row in buffer.strip().split("\n")]
>>> print table
[[1, 0, 4, 3, 6, 7, 4, 8, 3, 2, 1, 0], [2, 3, 6, 3, 2, 1, 7, 4, 3, 1, 1, 0], [5, 2, 1, 3, 4, 6, 4, 8, 9, 5, 2, 1]]
Perhaps you want the transpose instead:
transpose = zip(*table)
>>> print transpose
[(1, 2, 5), (0, 3, 2), (4, 6, 1), (3, 3, 3), (6, 2, 4), (7, 1, 6), (4, 7, 4), (8, 4, 8), (3, 3, 9), (2, 1, 5), (1, 1, 2), (0, 0, 1)]
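The snippets above are written for Python 2, where map and zip return lists. On Python 3 both return lazy iterators, so a rough equivalent would build the lists explicitly. The sketch below uses a shortened, made-up sample instead of reading the real file.

```python
# Shortened stand-in for the file contents (first three values of each row).
text = "1,0,4\n2,3,6\n5,2,1\n"

# A list comprehension yields real lists on both Python 2 and 3.
table = [[int(x) for x in row.split(",")] for row in text.strip().split("\n")]

# zip(*table) pairs up the i-th items of every row; list() materializes it.
transpose = list(zip(*table))

print(table)
print(transpose)
```

Each tuple in the transpose is one column of the original table.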
