How to use read next() starting from any line in python? - python

I'm trying to start reading some file from line 3, but I can't.
I've tried to use readlines() + the index number of the line, as seen bellow:
x = 2
f = open('urls.txt', "r+").readlines( )[x]
line = next(f)
print(line)
but I get this result:
Traceback (most recent call last):
File "test.py", line 441, in <module>
line = next(f)
TypeError: 'str' object is not an iterator
I would like to be able to set any line, as a variable, and from there, all the time that I use next() it goes to the next line.
IMPORTANT: as this is a new feature and all my code already uses next(f), the solution needs to be able to work with it.

Try this (uses itertools.islice):
from itertools import islice
f = open('urls.txt', 'r+')
start_at = 3
file_iterator = islice(f, start_at - 1, None)
# to demonstrate
while True:
try:
print(next(file_iterator), end='')
except StopIteration:
print('End of file!')
break
f.close()
urls.txt:
1
2
3
4
5
Output:
3
4
5
End of file!
This solution is better than readlines because it doesn't load the entire file into memory and only loads parts of it when needed. It also doesn't waste time iterating previous lines when islice can do that, making it much faster than #MadPhysicist's answer.
Also, consider using the with syntax to guarantee the file gets closed:
with open('urls.txt', 'r+') as f:
# do whatever

The readlines method returns a list of strings for the lines. So when you take readlines()[2] you're getting the third line, as a string. Calling next on that string then makes no sense, so you get an error.
The easiest way to do this is to slice the list: readlines()[x:] gives a list of everything from line x onwards. Then you can use that list however you like.
If you have your heart set on an iterator, you can turn a list (or pretty much anything) into an iterator with the iter builtin function. Then you can next it to your heart's content.

The following code will allow you to use an iterator to print the first line:
In [1]: path = '<path to text file>'
In [2]: f = open(path, "r+")
In [3]: line = next(f)
In [4]: print(line)
This code will allow you to print the lines starting from the xth line:
In [1]: path = '<path to text file>'
In [2]: x = 2
In [3]: f = iter(open(path, "r+").readlines()[x:])
In [4]: f = iter(f)
In [5]: line = next(f)
In [6]: print(line)
Edit: Edited the solution based on #Tomothy32's observation.

The line you printed returns a string:
open('urls.txt', "r+").readlines()[x]
open returns a file object. Its readlines method returns a list of strings. Indexing with [x] returns the third line in the file as a single string.
The first problem is that you open the file without closing it. The second is that your index doesn't specify a range of lines until the end. Here's an incremental improvement:
with open('urls.txt', 'r+') as f:
lines = f.readlines()[x:]
Now lines is a list of all the lines you want. But you first read the whole file into memory, then discarded the first two lines. Also, a list is an iterable, not an iterator, so to use next on it effectively, you'd need to take an extra step:
lines = iter(lines)
If you want to harness the fact that the file is already a rather efficient iterator, apply next to it as many times as you need to discard unwanted lines:
with open('urls.txt', 'r+') as f:
for _ in range(x):
next(f)
# now use the file
print(next(f))
After the for loop, any read operation you do on the file will start from the third line, whether it be next(f), f.readline(), etc.
There are a few other ways to strip the first lines. In all cases, including the example above, next(f) can be replaced with f.readline():
for n, _ in enumerate(f):
if n == x:
break
or
for _ in zip(f, range(x)): pass
After you run either of these loops, next(f) will return the xth line.

Just call next(f) as many times as you need to. (There's no need to overcomplicate this with itertools, nor to slurp the entire file with readlines.)
lines_to_skip = 3
with open('urls.txt') as f:
for _ in range(lines_to_skip):
next(f)
for line in f:
print(line.strip())
Output:
% cat urls.txt
url1
url2
url3
url4
url5
% python3 test.py
url4
url5

Related

How to loop through two generators of the same opened file

I have a medium-size file (25MB, 1000000 rows), and I want to read every row except every third row.
FIRST QUESTION: Is it faster to load the whole file into memory and then read the rows (method .read()), or load and read one row at the time (method .readline())?
Since I'm not an experienced coder I tried the second option with islice method from itertools module.
import intertools
with open(input_file) as inp:
inp_atomtype = itertools.islice(inp, 0, 40, 3)
inp_atomdata = itertools.islice(inp, 1, 40, 3)
for atomtype, atomdata in itertools.zip_longest(inp_atomtype, inp_atomdata):
print(atomtype + atomdata)
Although looping through single generator (inp_atomtype or inp_atomdata) prints correct data, looping through both of them simultaneously (as in this code) prints wrong data.
SECOND QUESTION: How can I reach desired rows using generators?
You don't need to slice the iterator, a simple line counter should be enough:
with open(input_file) as f:
current_line = 0
for line in f:
current_line += 1
if current_line % 3: # ignore every third line
print(line) # NOTE: print() will add an additional new line by default
As for turning it into a generator, just yield the line instead of printing.
When it comes to speed, given that you'll be reading your lines anyway the I/O part will probably take the same but you might benefit a bit (in total processing time) from fast list slicing instead of counting lines if you have enough working memory to keep the file contents and if loading the whole file upfront instead of streaming is acceptable.
yield is perfect for this.
This functions yields pairs from an iterable and skip every third item:
def two_thirds(seq):
_iter = iter(seq)
while True:
yield (next(_iter), next(_iter))
next(_iter)
You will lose half pairs, which means that two_thirds(range(2)) will stop iterating immediately.
https://repl.it/repls/DullNecessaryCron
You can also use the grouper recipe from itertools doc and ignore the third item in each tuple generated:
for atomtype, atomdata, _ in grouper(lines, 3):
pass
FIRST QUESTION: I am pretty sure that .readline() is faster than .read(). Plus, the fastest way based my test is to do lopping like:
with open(file, 'r') as f:
for line in f:
...
SECOND QUESTION: I am not quite sure abut this. you may consider to use yield.
There is a code snippet you may refer:
def myreadlines(f, newline):
buf = ""
while True:
while newline in buf:
pos = buf.index(newline)
yield buf[:pos]
buf = buf[pos + len(newline):]
chunk = f.read(4096)
if not chunk:
# the end of file
yield buf
break
buf += chunk
with open("input.txt") as f:
for line in myreadlines(f, "{|}"):
print (line)
q2: here's my generator:
def yield_from_file(input_file):
with open(input_file) as file:
yield from file
def read_two_skip_one(gen):
while True:
try:
val1 = next(gen)
val2 = next(gen)
yield val1, val2
_ = next(gen)
except StopIteration:
break
if __name__ == '__main__':
for atomtype, atomdata in read_two_skip_one(yield_from_file('sample.txt')):
print(atomtype + atomdata)
sample.txt was generated with a bash shell (it's just lines counting to 100)
for i in {001..100}; do echo $i; done > sample.txt
regarding q1: if you're reading the file multiple times, you'd be better off to have it in memory. otherwise you're fine reading it line by line.
Regarding the problem you're having with the wrong results:
both itertools.islice(inp, 0, 40, 3) statements will use inp as generator. Both will call next(inp), to provide you with a value.
Each time you call next() on an iterator, it will change its state, so that's where your problems come from.
You can use a generator expression:
with open(input_file, 'r') as f:
generator = (line for e, line in enumerate(f, start=1) if e % 3)
enumerate adds line numbers to each line, and the if clause ignores line numbers divisible by 3 (default numbering starts at 0, so you have to specify start=1 to get the desired pattern).
Keep in mind that you can only use the generator while the file is still open.

generating list by reading from file

i want to generate a list of server addresses and credentials reading from a file, as a single list splitting from newline in file.
file is in this format
login:username
pass:password
destPath:/directory/subdir/
ip:10.95.64.211
ip:10.95.64.215
ip:10.95.64.212
ip:10.95.64.219
ip:10.95.64.213
output i want is in this manner
[['login:username', 'pass:password', 'destPath:/directory/subdirectory', 'ip:10.95.64.211;ip:10.95.64.215;ip:10.95.64.212;ip:10.95.64.219;ip:10.95.64.213']]
i tried this
with open('file') as f:
credentials = [x.strip().split('\n') for x in f.readlines()]
and this returns lists within list
[['login:username'], ['pass:password'], ['destPath:/directory/subdir/'], ['ip:10.95.64.211'], ['ip:10.95.64.215'], ['ip:10.95.64.212'], ['ip:10.95.64.219'], ['ip:10.95.64.213']]
am new to python, how can i split by newline character and create single list. thank you in advance
You could do it like this
with open('servers.dat') as f:
L = [[line.strip() for line in f]]
print(L)
Output
[['login:username', 'pass:password', 'destPath:/directory/subdir/', 'ip:10.95.64.211', 'ip:10.95.64.215', 'ip:10.95.64.212', 'ip:10.95.64.219', 'ip:10.95.64.213']]
Just use a list comprehension to read the lines. You don't need to split on \n as the regular file iterator reads line by line. The double list is a bit unconventional, just remove the outer [] if you decide you don't want it.
I just noticed you wanted the list of ip addresses joined in one string. It's not clear as its off the screen in the question and you make no attempt to do it in your own code.
To do that read the first three lines individually using next then just join up the remaining lines using ; as your delimiter.
def reader(f):
yield next(f)
yield next(f)
yield next(f)
yield ';'.join(ip.strip() for ip in f)
with open('servers.dat') as f:
L2 = [[line.strip() for line in reader(f)]]
For which the output is
[['login:username', 'pass:password', 'destPath:/directory/subdir/', 'ip:10.95.64.211;ip:10.95.64.215;ip:10.95.64.212;ip:10.95.64.219;ip:10.95.64.213']]
It does not match your expected output exactly as there is a typo 'destPath:/directory/subdirectory' instead of 'destPath:/directory/subdir' from the data.
This should work
arr = []
with open('file') as f:
for line in f:
arr.append(line)
return [arr]
You could just treat the file as a list and iterate through it with a for loop:
arr = []
with open('file', 'r') as f:
for line in f:
arr.append(line.strip('\n'))

How to use the 'any()' function to search for multiple substrings?

I have the following code where I am trying to open a textfile, read through it line by line, and if a line has a certain country code (US,BR), add it to a list myNames:
f = urllib2.urlopen(url)
countries = ['US', 'BR']
myNames = []
for line in f:
line = f.readline()
if any(x in line for x in countries):
myNames.append(line)
Unfortunately I think my use of any() must be incorrect because it is yielding only a small number from 1 country and none from the second even though I can verify that there are more of each type. How can I fix this?
It's hard to say without knowing what x is and what's in the file, but this snippet:
for line in f:
line = f.readline()
is reading two lines at a time -- for line in f is already iterating over the file line by line, by reading twice you're skipping every other line. That would explain why you're getting too few results.
I think you should do:
for line in f.readlines():
if any(x in line for x in countries):
myNames.append(line)
otherwise you will skip a good number of lines.

How to get 2nd thing out of every line using python and file parsing

i'm trying to parse through a file with structure:
0 rs41362547 MT 10044
1 rs28358280 MT 10550
...
and so forth, where i want the second thing in each line to be put into an array. I know it should be pretty easy, but after a lot of searching, I'm still lost. I'm really new to python, what would be the script to do this?
THanks!
You can split the lines using str.split:
with open('file.txt') as infile:
result = []
for line in infile: #loop through the lines
data = line.split(None, 2)[1] #split, get the second column
result.append(data) #append it to our results
print data #Just confirming
This will work:
with open('/path/to/file') as myfile: # Open the file
data = [] # Make a list to hold the data
for line in myfile: # Loop through the lines in the file
data.append(line.split(None, 2)[1]) # Get the data and add it to the list
print (data) # Print the finished list
The important parts here are:
str.split, which breaks up the lines based on whitespace.
The with-statement, which auto-closes the file for you when done.
Note that you could also use a list comprehension:
with open('/path/to/file') as myfile:
data = [line.split(None, 2)[1] for line in myfile]
print (data)

How to read the rest of the lines? - python

I have a file and it has some header lines, e.g.
header1 lines: somehting something
more headers then
somehting something
----
this is where the data starts
yes data... lots of foo barring bar fooing data.
...
...
I've skipped the header lines by looping and running file.readlines(), other than looping and concating the rest of the lines, how else can i read the rest of the lines?
x = """header1 lines: somehting something
more headers then
somehting something
----
this is where the data starts
yes data... lots of foo barring bar fooing data.
...
..."""
with open('test.txt','w') as fout:
print>>fout, x
fin = open('test.txt','r')
for _ in range(5): fin.readline();
rest = "\n".join([i for i in fin.readline()])
.readlines() reads the all data in the file, in one go. There are no more lines to read after the first call.
You probably wanted to use .readline() (no s, singular) instead:
with open('test.txt','r') as fin:
for _ in range(5): fin.readline()
rest = "\n".join(fin.readlines())
Note that because .readlines() returns a list already, you don't need to loop over the items. You could also just use .read() to read in the remainder of the file:
with open('test.txt','r') as fin:
for _ in range(5): fin.readline()
rest = fin.read()
Alternatively, treat the file object as an iterable, and using itertools.islice() slice the iterable to skip the first five lines:
from itertools import islice
with open('test.txt','r') as fin:
all_but_the_first_five = list(islice(fin, 5, None))
This does produce lines, not one large string, but if you are processing the input file line by line, that usually is preferable anyway. You can loop directly over the slice and handle lines:
with open('test.txt','r') as fin:
for line in list(islice(fin, 5, None)):
# process line, first 5 will have been skipped
Don't mix using a file object as an iterable and .readline(); the iteration protocol as implemented by file objects uses an internal buffer to ensure efficiency that .readline() doesn't know about; using .readline() after iteration is liable to return data further on in the file than you expect.
Skip the first 5 lines:
from itertools import islice
with open('yourfile') as fin:
data = list(islice(fin, 5, None))
# or loop line by line still
for line in islice(fin, 5, None):
print line

Categories