How to read the rest of the lines?

I have a file and it has some header lines, e.g.
header1 lines: somehting something
more headers then
somehting something
----
this is where the data starts
yes data... lots of foo barring bar fooing data.
...
...
I've skipped the header lines by looping and running file.readlines(). Other than looping and concatenating the rest of the lines, how else can I read the rest of the lines?
x = """header1 lines: somehting something
more headers then
somehting something
----
this is where the data starts
yes data... lots of foo barring bar fooing data.
...
..."""
with open('test.txt','w') as fout:
    print(x, file=fout)
fin = open('test.txt','r')
for _ in range(5): fin.readline();
rest = "\n".join([i for i in fin.readline()])

.readlines() reads all the data in the file in one go. There are no more lines to read after the first call.
You probably wanted to use .readline() (no s, singular) instead:
with open('test.txt','r') as fin:
    for _ in range(5): fin.readline()
    # each line already ends with '\n', so join with '' rather than '\n'
    rest = "".join(fin.readlines())
Note that because .readlines() returns a list already, you don't need to loop over the items. You could also just use .read() to read in the remainder of the file:
with open('test.txt','r') as fin:
    for _ in range(5): fin.readline()
    rest = fin.read()
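To see why the original attempt found nothing left to read, here is a small sketch that uses io.StringIO as a stand-in for test.txt (an assumption made only so the example runs without a file). It also shows why joining readlines() output with "\n" doubles the line breaks: each line already ends with its own newline.

```python
import io

# io.StringIO stands in for test.txt; it supports the same read methods.
fin = io.StringIO("header\n----\ndata 1\ndata 2\n")

all_lines = fin.readlines()   # one call consumes the whole file
after = fin.readline()        # nothing left, so this is an empty string

# Each element returned by readlines() keeps its trailing newline, so joining
# with "\n" inserts extra blank lines, while "" reproduces the original text.
doubled = "\n".join(["a\n", "b\n"])   # 'a\n\nb\n'
restored = "".join(["a\n", "b\n"])    # 'a\nb\n'
```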
Alternatively, treat the file object as an iterable and use itertools.islice() to slice the iterable, skipping the first five lines:
from itertools import islice

with open('test.txt','r') as fin:
    all_but_the_first_five = list(islice(fin, 5, None))
This produces a list of lines rather than one large string, but if you are processing the input file line by line, that is usually preferable anyway. You can also loop directly over the slice and handle each line:
with open('test.txt','r') as fin:
    for line in islice(fin, 5, None):
        pass  # process line; the first 5 will have been skipped
In Python 2, don't mix using a file object as an iterable with .readline(): the iteration protocol implemented by Python 2 file objects uses an internal read-ahead buffer for efficiency, and .readline() doesn't know about that buffer, so calling .readline() after iteration is liable to return data from further on in the file than you expect. Python 3 file objects do not have this problem.
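If the number of header lines can vary, one option is to skip until the separator line instead of counting lines. The sketch below assumes, as in the sample file, that the header always ends with a line consisting of exactly ----; lines_after_separator is a hypothetical helper name, and io.StringIO stands in for the real file.

```python
import io


def lines_after_separator(f, separator="----"):
    """Yield the lines that follow the first line equal to `separator`.

    Assumes (hypothetically) that the header always ends with that exact line.
    If the separator never appears, the file is exhausted and nothing is yielded.
    """
    it = iter(f)
    for line in it:
        if line.rstrip("\n") == separator:
            break
    # `it` is now positioned just after the separator line
    yield from it


sample = io.StringIO(
    "header1 lines: something\n"
    "more headers then\n"
    "----\n"
    "this is where the data starts\n"
    "yes data...\n"
)
rest = "".join(lines_after_separator(sample))
# 'this is where the data starts\nyes data...\n'
```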

Skip the first 5 lines:
from itertools import islice

with open('yourfile') as fin:
    data = list(islice(fin, 5, None))

# or loop line by line instead (reopen: the list() call above exhausted fin)
with open('yourfile') as fin:
    for line in islice(fin, 5, None):
        print(line, end='')

Related

Reading a string from an opened text file

I have seen these two ways to process a file:
file = open("file.txt")
for line in file:
    pass  # do something

file = open("file.txt")
contents = file.read()
for line in contents:
    pass  # do something
I know that in the first case, the file will act like a list, so the for loop iterates over the file as if it were a list. What exactly happens in the second case, where we read the file and then iterate over the contents? What are the consequences of taking each approach, and how should I choose between them?
In the first one you are iterating over the file, line by line. In this scenario, the entire file data is not read into the memory at once; instead, only the current line is read into memory. This is useful for handling very large files, and good for robustness if you don't know if the file is going to be large or not.
In the second one, file.read() returns the complete file data as a string. When you are iterating over it, you are actually iterating over the file's data character by character. This reads the complete file data into memory.
Here's an example to show this behavior.
a.txt file contains
Hello
Bye
Code:
>>> f = open('a.txt','r')
>>> for l in f:
...     print(l)
...
Hello

Bye
>>> f = open('a.txt','r')
>>> r = f.read()
>>> print(repr(r))
'Hello\nBye'
>>> for c in r:
...     print(c)
...
H
e
l
l
o


B
y
e
The second case reads in the contents of the file into one big string. If you iterate over a string, you get each character in turn. If you want to get each line in turn, you can do this:
for line in contents.split('\n'):
    pass  # do something
Or you can read in the contents as a list of lines using readlines() instead of read().
with open('file.txt','r') as fin:
    lines = fin.readlines()
for line in lines:
    pass  # do something
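The ways of getting lines out of the text differ in how they treat newlines. A short sketch, assuming the file ends with a trailing newline (io.StringIO stands in for the open file):

```python
import io

text = "Hello\nBye\n"

by_split = text.split("\n")                   # ['Hello', 'Bye', '']  trailing empty string
by_splitlines = text.splitlines()             # ['Hello', 'Bye']
by_readlines = io.StringIO(text).readlines()  # ['Hello\n', 'Bye\n']  newlines kept
```

splitlines() also recognizes \r\n and other line boundaries and does not produce the trailing empty string, so it is usually the safer choice when splitting the result of read().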

How to use read next() starting from any line in python?

I'm trying to start reading some file from line 3, but I can't.
I've tried to use readlines() plus the index number of the line, as seen below:
x = 2
f = open('urls.txt', "r+").readlines( )[x]
line = next(f)
print(line)
but I get this result:
Traceback (most recent call last):
File "test.py", line 441, in <module>
line = next(f)
TypeError: 'str' object is not an iterator
I would like to be able to set the starting line with a variable, so that from there, each time I call next() it goes to the next line.
IMPORTANT: as this is a new feature and all my code already uses next(f), the solution needs to be able to work with it.
Try this (uses itertools.islice):
from itertools import islice

f = open('urls.txt', 'r+')
start_at = 3
file_iterator = islice(f, start_at - 1, None)

# to demonstrate
while True:
    try:
        print(next(file_iterator), end='')
    except StopIteration:
        print('End of file!')
        break

f.close()
urls.txt:
1
2
3
4
5
Output:
3
4
5
End of file!
This solution is better than readlines because it doesn't load the entire file into memory; it only reads lines as they are needed. Note that islice still has to read past the skipped lines (it cannot seek), but it does so in C rather than in a Python-level loop, which may make it somewhat faster than @MadPhysicist's answer.
Also, consider using the with syntax to guarantee the file gets closed:
with open('urls.txt', 'r+') as f:
    pass  # do whatever
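Because islice wraps the file iterator rather than seeking, it reads and discards the skipped lines, advancing the underlying file as it goes. The sketch below uses io.StringIO in place of urls.txt (an assumption, so it runs anywhere) to show that, and that rebinding f to the slice keeps existing next(f) calls working.

```python
import io
from itertools import islice

start_at = 3  # 1-based line number to start from

# Rebinding f to the slice lets existing next(f) calls keep working.
f = io.StringIO("1\n2\n3\n4\n5\n")
f = islice(f, start_at - 1, None)
first = next(f)    # '3\n'
second = next(f)   # '4\n'

# islice pulls the skipped lines from the underlying iterator,
# so the original object is advanced past them as well.
source = io.StringIO("1\n2\n3\n4\n5\n")
sliced = islice(source, start_at - 1, None)
from_slice = next(sliced)    # '3\n' ('1' and '2' were read and discarded)
from_source = next(source)   # '4\n'
```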
The readlines method returns a list of strings for the lines. So when you take readlines()[2] you're getting the third line, as a string. Calling next on that string then makes no sense, so you get an error.
The easiest way to do this is to slice the list: readlines()[x:] gives a list of everything from index x onwards (the third line onwards when x = 2). Then you can use that list however you like.
If you have your heart set on an iterator, you can turn a list (or pretty much anything) into an iterator with the iter builtin function. Then you can next it to your heart's content.
The following code will allow you to use an iterator to print the first line:
In [1]: path = '<path to text file>'
In [2]: f = open(path, "r+")
In [3]: line = next(f)
In [4]: print(line)
This code will allow you to print the lines starting from index x (the third line when x = 2):
In [1]: path = '<path to text file>'
In [2]: x = 2
In [3]: f = iter(open(path, "r+").readlines()[x:])
In [4]: line = next(f)
In [5]: print(line)
Edit: Edited the solution based on @Tomothy32's observation.
The line you posted evaluates to a string:
open('urls.txt', "r+").readlines()[x]
open returns a file object. Its readlines method returns a list of strings. Indexing with [x] returns the third line in the file as a single string.
The first problem is that you open the file without closing it. The second is that your index doesn't specify a range of lines until the end. Here's an incremental improvement:
with open('urls.txt', 'r+') as f:
    lines = f.readlines()[x:]
Now lines is a list of all the lines you want. But you first read the whole file into memory, then discarded the first two lines. Also, a list is an iterable, not an iterator, so to use next on it effectively, you'd need to take an extra step:
lines = iter(lines)
If you want to harness the fact that the file is already a rather efficient iterator, apply next to it as many times as you need to discard unwanted lines:
with open('urls.txt', 'r+') as f:
    for _ in range(x):
        next(f)
    # now use the file
    print(next(f))
After the for loop, any read operation you do on the file will start from the third line, whether it be next(f), f.readline(), etc.
There are a few other ways to skip the first lines. In all cases, including the example above, next(f) can be replaced with f.readline():
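for n, _ in enumerate(f, 1):
    if n == x:
        break
or
for _ in zip(range(x), f): pass  # range first, so no extra line is consumed
After you run either of these loops (the first one assumes x >= 1), next(f) will return the line at index x, i.e. the third line when x = 2.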
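Why the order of the arguments to zip matters: zip asks its arguments for items from left to right, so if the file comes first, the final round reads one more line before discovering that range is exhausted, and that line is lost. A small sketch with io.StringIO standing in for the file; both helper names are hypothetical.

```python
import io


def skip_with_zip(f, x):
    # range(x) comes first: zip stops as soon as range is exhausted,
    # before it asks f for another line.
    for _ in zip(range(x), f):
        pass
    return f


def skip_with_zip_wrong_order(f, x):
    # f comes first: on the final round zip pulls one more line from f,
    # then finds range exhausted, so that extra line is silently dropped.
    for _ in zip(f, range(x)):
        pass
    return f


text = "line0\nline1\nline2\nline3\n"
good = next(skip_with_zip(io.StringIO(text), 2))              # 'line2\n'
bad = next(skip_with_zip_wrong_order(io.StringIO(text), 2))  # 'line3\n'
```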
Just call next(f) as many times as you need to. (There's no need to overcomplicate this with itertools, nor to slurp the entire file with readlines.)
lines_to_skip = 3

with open('urls.txt') as f:
    for _ in range(lines_to_skip):
        next(f)
    for line in f:
        print(line.strip())
Output:
% cat urls.txt
url1
url2
url3
url4
url5
% python3 test.py
url4
url5
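A variant of the same idea that does not raise StopIteration when the file has fewer lines than you want to skip, adapted from the "consume" recipe in the itertools documentation. skip_lines is a hypothetical helper name, and io.StringIO replaces urls.txt for the example.

```python
import io
from itertools import islice


def skip_lines(f, n):
    """Advance f past n lines; stops quietly if f runs out first."""
    # islice(f, n, n) consumes n items and then yields nothing;
    # next(..., None) drives it without raising StopIteration.
    next(islice(f, n, n), None)
    return f


f = skip_lines(io.StringIO("url1\nurl2\nurl3\nurl4\nurl5\n"), 3)
remaining = [line.strip() for line in f]   # ['url4', 'url5']

short = skip_lines(io.StringIO("only\n"), 3)
leftover = list(short)                     # []
```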

generating list by reading from file

I want to generate a list of server addresses and credentials by reading from a file, as a single list, splitting on the newlines in the file.
The file is in this format:
login:username
pass:password
destPath:/directory/subdir/
ip:10.95.64.211
ip:10.95.64.215
ip:10.95.64.212
ip:10.95.64.219
ip:10.95.64.213
The output I want is:
[['login:username', 'pass:password', 'destPath:/directory/subdirectory', 'ip:10.95.64.211;ip:10.95.64.215;ip:10.95.64.212;ip:10.95.64.219;ip:10.95.64.213']]
I tried this:
with open('file') as f:
    credentials = [x.strip().split('\n') for x in f.readlines()]
and this returns a list of lists:
[['login:username'], ['pass:password'], ['destPath:/directory/subdir/'], ['ip:10.95.64.211'], ['ip:10.95.64.215'], ['ip:10.95.64.212'], ['ip:10.95.64.219'], ['ip:10.95.64.213']]
I am new to Python. How can I split by the newline character and create a single list? Thank you in advance.
You could do it like this
with open('servers.dat') as f:
    L = [[line.strip() for line in f]]
print(L)
Output
[['login:username', 'pass:password', 'destPath:/directory/subdir/', 'ip:10.95.64.211', 'ip:10.95.64.215', 'ip:10.95.64.212', 'ip:10.95.64.219', 'ip:10.95.64.213']]
Just use a list comprehension to read the lines. You don't need to split on \n, as the regular file iterator reads line by line. The double list is a bit unconventional; just remove the outer [] if you decide you don't want it.
I just noticed you wanted the list of IP addresses joined into one string. It wasn't clear, as it's off the screen in the question and you make no attempt to do it in your own code.
To do that read the first three lines individually using next then just join up the remaining lines using ; as your delimiter.
def reader(f):
    yield next(f)
    yield next(f)
    yield next(f)
    yield ';'.join(ip.strip() for ip in f)

with open('servers.dat') as f:
    L2 = [[line.strip() for line in reader(f)]]
For which the output is
[['login:username', 'pass:password', 'destPath:/directory/subdir/', 'ip:10.95.64.211;ip:10.95.64.215;ip:10.95.64.212;ip:10.95.64.219;ip:10.95.64.213']]
It does not match your expected output exactly, because of a typo in the question: 'destPath:/directory/subdirectory' instead of 'destPath:/directory/subdir/' as in the data.
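If the ip: lines might not always come after exactly three other lines, a sketch that groups by prefix instead of position may be more robust. It assumes every address line starts with ip: and that blank lines can be ignored; read_config is a hypothetical name, and io.StringIO stands in for servers.dat.

```python
import io


def read_config(f):
    """Keep non-ip lines as-is and join all 'ip:' lines with ';'.

    Assumes every address line starts with 'ip:'; unlike reading three
    lines with next(), this does not depend on the order of the lines.
    """
    others, ips = [], []
    for raw in f:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith("ip:"):
            ips.append(line)
        else:
            others.append(line)
    return [others + [";".join(ips)]]


sample = io.StringIO(
    "login:username\npass:password\ndestPath:/directory/subdir/\n"
    "ip:10.95.64.211\nip:10.95.64.215\n"
)
result = read_config(sample)
# [['login:username', 'pass:password', 'destPath:/directory/subdir/',
#   'ip:10.95.64.211;ip:10.95.64.215']]
```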
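This should work (the code needs to be inside a function for the return statement to be valid):
def read_lines(path):
    arr = []
    with open(path) as f:
        for line in f:
            arr.append(line.rstrip('\n'))
    return [arr]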
You could just treat the file as a list and iterate through it with a for loop:
arr = []
with open('file', 'r') as f:
    for line in f:
        arr.append(line.strip('\n'))

Python how to read last three lines of a .txt file, and put those items into a list?

I am trying to have python read the last three lines of a .txt file. I am also trying to add each line as an element in a list.
So for instance:
**list.txt**
line1
line2
line3
**python_program.py**
(read list.txt, insert items into line_list)
line_list = [line1, line2, line3]
However I am a bit confused on this process.
Any help would be greatly appreciated!
What if you are dealing with a very big file? Reading all the lines in memory is going to be quite wasteful. An alternative approach may be:
from collections import deque

d = deque([], maxlen=3)
with open("file.txt") as f:
    for l in f:
        d.append(l)
This keeps in memory at a given time only the last three rows read (the deque discards the oldest elements in excess at each append).
As @user2357112 points out, this also works and is more concise:
from collections import deque

with open("file.txt") as f:
    d = deque(f, maxlen=3)
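To turn the deque into the list the question asks for, with the newlines removed, something like the following should work. io.StringIO stands in for list.txt, and a file with fewer than three lines simply yields all of its lines.

```python
import io
from collections import deque

# io.StringIO stands in for list.txt.
f = io.StringIO("line0\nline1\nline2\nline3\n")
last_three = deque(f, maxlen=3)   # older lines are discarded as new ones arrive
line_list = [line.rstrip("\n") for line in last_three]
# ['line1', 'line2', 'line3']

short = deque(io.StringIO("only\n"), maxlen=3)
short_list = [line.rstrip("\n") for line in short]   # ['only']
```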
with open('list.txt') as f:
    lines = f.readlines()
line_list = lines[-3:]
Try these:
#!/usr/local/cpython-3.3/bin/python

import pprint

def get_last_3_variant_1(file_):
    # This is simple, but it also reads the entire file into memory
    lines = file_.readlines()
    return lines[-3:]

def get_last_3_variant_2(file_):
    # This is more complex, but it only keeps three lines in memory at any given time
    three_lines = []
    for index, line in zip(range(3), file_):
        three_lines.append(line)

    for line in file_:
        three_lines.append(line)
        del three_lines[0]

    return three_lines

get_last_3 = get_last_3_variant_2

def main():
    # /etc/services is a long file
    # /etc/adjtime is exactly 3 lines long on my system
    # /etc/libao.conf is exactly 2 lines long on my system
    for filename in ['/etc/services', '/etc/adjtime', '/etc/libao.conf']:
        with open(filename, 'r') as file_:
            result = get_last_3(file_)
        pprint.pprint(result)

main()

Best method for reading newline delimited files and discarding the newlines?

I am trying to determine the best way to handle getting rid of newlines when reading in newline delimited files in Python.
What I've come up with is the following code, including throwaway code to test it.
import os

def getfile(filename, results):
    f = open(filename)
    filecontents = f.readlines()
    for line in filecontents:
        foo = line.strip('\n')
        results.append(foo)
    return results

blahblah = []
getfile('/tmp/foo', blahblah)
for x in blahblah:
    print(x)
lines = open(filename).read().splitlines()
Here's a generator that does what you requested. In this case, using rstrip is sufficient and slightly faster than strip.
lines = (line.rstrip('\n') for line in open(filename))
However, you'll most likely want to use this to get rid of trailing whitespace too:
lines = (line.rstrip() for line in open(filename))
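The variants differ in exactly what they remove. Here is a short sketch on a single hypothetical line ending in a Windows-style \r\n. In Python 3, files opened in text mode with the default newline=None already translate \r\n to \n, so the leftover \r mainly shows up when that translation is turned off.

```python
raw = "  value \r\n"

only_lf = raw.rstrip("\n")     # '  value \r'  the '\r' is left behind
crlf = raw.rstrip("\r\n")      # '  value '    both newline characters removed
all_ws = raw.rstrip()          # '  value'     trailing spaces removed too
both_ends = raw.strip()        # 'value'       leading spaces removed as well
```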
What do you think about this approach?
with open(filename) as data:
    datalines = (line.rstrip('\r\n') for line in data)
    for line in datalines:
        pass  # ...do something awesome...
The generator expression avoids loading the whole file into memory, and with ensures that the file gets closed.
for line in open('/tmp/foo'):
    print(line.strip('\n'))
Just use generator expressions:
blahblah = (l.rstrip() for l in open(filename))
for x in blahblah:
    print(x)
Also, I'd advise against reading the whole file into memory: looping over generators is much more memory-efficient on big datasets.
I use this
def cleaned( aFile ):
    for line in aFile:
        yield line.strip()
Then I can do things like this.
lines = list( cleaned( open("file","r") ) )
Or, I can extend cleaned with extra functions to, for example, drop blank lines or skip comment lines or whatever.
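One way to extend cleaned, as a sketch: chain a second generator that drops blank lines and comment lines. Treating # as the comment marker is an assumption, without_blanks_or_comments is a hypothetical name, and io.StringIO stands in for a real file.

```python
import io


def cleaned(a_file):
    for line in a_file:
        yield line.strip()


def without_blanks_or_comments(lines, comment_prefix="#"):
    # Hypothetical filter stage; '#' as the comment marker is an assumption.
    for line in lines:
        if line and not line.startswith(comment_prefix):
            yield line


sample = io.StringIO("first\n\n# a comment\n  second  \n")
lines = list(without_blanks_or_comments(cleaned(sample)))
# ['first', 'second']
```

Because each stage is a generator, lines still flow through one at a time, so the whole file is never held in memory unless you call list() at the end.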
I'd do it like this:
f = open('test.txt')
# drop blank lines and discard the trailing newline of the rest
l = [line.rstrip('\n') for line in f if line.strip()]
f.close()
print(l)
