I want to set the current position in a textfile one line back.
Example:
I search in a textfile for a word "x".
Textfile:
Line: qwe qwe
Line: x
Line: qwer
Line: qwefgdg
If I find that word, the current position of the file object (fobj) should be set back one line.
(In the example I find the word in the 2nd line, so the position should be set to the beginning of the 1st line.)
I tried to use fseek, but I wasn't very successful.
This is not how you do it in Python. You should just iterate over the file, test the current line and never worry about file pointers. If you need to retrieve the content of the previous line, just store it.
>>> with open('test.txt') as f: print(f.read())
a
b
c
d
e
f
>>> needle = 'c\n'
>>> with open('test.txt') as myfile:
...     previous = None
...     previous_position = None  # where the previous line starts
...     position = 0              # where the current line starts
...     for line in myfile:
...         if line == needle:
...             print("Previous line is {}".format(repr(previous)))
...             break
...         previous, previous_position = line, position
...         position += len(line)
...
Previous line is 'b\n'
>>> previous_position
2
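If you really need the byte position of the previous line, be aware that the tell/seek methods don't blend well with iteration (in Python 3, calling tell() on a text file while a for loop is iterating over it raises OSError), so reopen the file to be safe. Also note that len(line) counts characters, which equals the byte count only for single-byte encodings and '\n' line endings.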
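If you do need real byte offsets, one way to follow the "reopen to be safe" advice is sketched below, assuming the file can be scanned once from the start. Opening in binary mode ('rb') makes len(line) a byte count (and the needle a bytes object), so the remembered offset can be passed straight to seek() on a fresh file object. The function name offset_of_line_before and the demo file contents are hypothetical.

```python
import os
import tempfile


def offset_of_line_before(path, needle):
    """Byte offset where the line before the first line equal to `needle` starts.

    Returns None if `needle` is not found or is on the very first line.
    """
    position = 0               # byte offset of the current line
    previous_position = None   # byte offset of the line before it
    with open(path, "rb") as f:  # binary mode, so len() counts bytes
        for line in f:
            if line == needle:
                return previous_position
            previous_position = position
            position += len(line)
    return None


# Self-contained demo with a throwaway file (hypothetical contents).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"a\nb\nc\nd\ne\nf\n")
    path = tmp.name

offset = offset_of_line_before(path, b"c\n")
# Reopen the file and jump straight to the remembered offset.
with open(path, "rb") as f:
    f.seek(offset)
    found = f.readline()
os.remove(path)
print(offset, found)  # 2 b'b\n'
```

Because the scan and the seek use separate file objects, no tell() call is ever mixed with iteration.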
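f = open('filename').readlines()
i = 0
while True:
    if i > len(f) - 1:  # past the last line: stop
        break
    if 'x' in f[i] and i > 0:
        print(f[i - 1])  # the line before the one containing 'x'
    i += 1
Be careful: a while True loop only ends through an explicit exit condition (here the bounds check on i), so make sure there is one, or the loop will run forever.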
Related
I'm trying to read xyz coordinates from a long file using python.
Within the file there is a block which indicates that the xyz coordinates are in the next lines.
CARTESIAN COORDINATES (ANGSTROEM)
---------------------------------
C -0.283576 -0.776740 -0.312605
H -0.177080 -0.046256 -1.140653
Cl -0.166557 0.025928 1.189976
----------------------------
I'm using the following code to find the line which mentions "CARTESIAN COORDINATES (ANGSTROEM)" and then try to iterate until I find an empty line to read the coordinates. However, f.tell() says I'm at position 0! Therefore, I can't use either next(f) or f.readline() to go through the next lines (it just goes to line 1 from line 0). I don't know how this can be done in Python.
def read_xyz_out(self,out):
    atoms = []
    x = []
    y = []
    z = []
    f = open(out, "r")
    for line in open(out):
        if re.match(r'{}'.format(r'CARTESIAN COORDINATES \(ANGSTROEM\)'), line):
            print(f.tell())
            # data = line.split()
            # atoms.append(data[0])
            # x.append(float(data[1]))
            # y.append(float(data[2]))
            # z.append(float(data[3]))
Suppose you read your file into this string:
My dog has fleas.
CARTESIAN COORDINATES (ANGSTROEM)
---------------------------------
C -0.283576 -0.776740 -0.312605
H -0.177080 -0.046256 -1.140653
Cl -0.166557 0.025928 1.189976
----------------------------
My cat too.
You can then extract lines 4, 5 and 6 with the regular expression
/CARTESIAN COORDINATES \(ANGSTROEM\)\r?\n---------------------------------\r?\n(.+?)(?=\r?\n\r?\n)/s
This expression reads: "match the string 'CARTESIAN...---\r?\n', followed by 1+ characters, matched lazily, in capture group 1, followed by an empty line, with the flag /s enabling '.' to match line breaks".
The desired information can then be extracted with the regular expression
/ *([A-Z][a-z]*) +(-?\d+\.\d{6}) +(-?\d+\.\d{6}) +(-?\d+\.\d{6})\r?\n/
The first step can be skipped if it is sufficient to look for a line that looks like this:
C -0.283576 -0.776740 -0.312605
without having to confirm it is preceded by "CARTESIAN...---".
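In Python, the two steps might be combined as in the sketch below. It assumes, as the first expression does, that an empty line follows the coordinate block (the sample text is hypothetical), uses re.DOTALL in place of the /s flag, and matches the dashed line with -+ instead of an exact number of dashes. The helper name read_xyz is hypothetical.

```python
import re

# Hypothetical sample output; an empty line follows the coordinate block.
text = """My dog has fleas.
CARTESIAN COORDINATES (ANGSTROEM)
---------------------------------
  C     -0.283576   -0.776740   -0.312605
  H     -0.177080   -0.046256   -1.140653
  Cl    -0.166557    0.025928    1.189976

----------------------------
My cat too.
"""

# Step 1: capture everything between the dashed line and the first empty line.
block_re = re.compile(
    r"CARTESIAN COORDINATES \(ANGSTROEM\)\r?\n-+\r?\n(.+?)(?=\r?\n\r?\n)",
    re.DOTALL,
)
# Step 2: element symbol plus three coordinates with six decimals each.
atom_re = re.compile(r" *([A-Z][a-z]*) +(-?\d+\.\d{6}) +(-?\d+\.\d{6}) +(-?\d+\.\d{6})")


def read_xyz(text):
    """Return a list of (symbol, x, y, z) tuples, or [] if there is no block."""
    match = block_re.search(text)
    if match is None:
        return []
    return [
        (symbol, float(x), float(y), float(z))
        for symbol, x, y, z in atom_re.findall(match.group(1))
    ]


atoms = read_xyz(text)
print(atoms[0])  # ('C', -0.283576, -0.77674, -0.312605)
```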
You've opened out twice: once for the f variable and a second time for the for line in open(out): loop. Each file object has its own position, and you've only been reading from the second one (which hasn't been assigned to a variable so you can't get the position). The position of f is still at the beginning, since you never read from it.
You should use
for line in f:
and not call open(out) a second time. You can then call f.readline() inside the loop to read more lines of the file.
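To see that each open() call has its own position, here is a small self-contained sketch; the temporary file and its contents are hypothetical. It also checks the last point: inside a for loop over a single file object, readline() (in Python 3) returns the line after the one the loop has just produced.

```python
import os
import tempfile

# Throwaway file with hypothetical contents so the demo is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first\nsecond\nthird\n")
    path = tmp.name

f = open(path)
g = open(path)          # a second, independent file object on the same file
g.readline()            # move g past two lines...
g.readline()
from_f = f.readline()   # ...but f is still at the start: 'first\n'
from_g = g.readline()   # g carries on where it stopped: 'third\n'
g.close()

# With a single file object, readline() inside a for loop continues
# from the line the loop has just produced.
f.seek(0)
for line in f:
    if line == "first\n":
        skipped = f.readline()  # consumes 'second\n'
        break
rest = f.readline()             # the next line after that: 'third\n'
f.close()
os.remove(path)
```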
How about this (note: untested so there's bound to be bugs - think of this as a sketch of a solution):
import re

def read_xyz_out(self, out):
    atoms = []
    x = []
    y = []
    z = []
    with open(out, "r") as f:
        # Read until you get to the data
        for line in f:
            if re.match(r'CARTESIAN COORDINATES \(ANGSTROEM\)', line):
                # skip the next line (the dashes) too
                f.readline()
                break
        # Now you're into the data - the loop here picks up where the previous
        # one left off
        for line in f:
            data = line.split()
            if not data:
                # an empty line ends the coordinate block
                break
            atoms.append(data[0])
            x.append(float(data[1]))
            y.append(float(data[2]))
            z.append(float(data[3]))
    return atoms, x, y, z
Let's say I have a file with the following content (every even line is blank):
Line 1

Line 2

Line 3
...
I tried to read the file in 2 ways:
count = 0
for line in open("myfile.txt"):
    if line == '': #or if len(line) == 0
        count += 1
and
count = 0
file = open('myfile.txt')
lines = file.readlines()
for line in lines:
    if line == '': #or if len(line) == 0
        count += 1
But count always remains 0. How can I count the number of blank lines?
A simpler, more Pythonic way:
with open(filename) as fd:
    count = sum(1 for line in fd if len(line.strip()) == 0)
This keeps the time complexity linear and the memory use constant.
And, most of all, it gets rid of count as a manually incremented variable.
When you use the readlines() function (or iterate over the file directly), the EOL characters are not removed for you. So you either compare against the end of line, something like:
if line == os.linesep:
    count += 1
(you have to import the os module, of course), or you strip the line (as suggested by @khelwood's comment on your question) and compare against '' as you are doing.
Notice that os.linesep might not work as you would expect. It is the line separator of the OS your program runs on ('\r\n' on Windows), whereas a file opened in text mode, as here, has its line endings translated to '\n' when it is read, and a file written on a different OS may use a different ending to begin with. If you open the file with newline='' to turn that translation off, lines keep their original endings, so to check for all cases you have to do something like:
if line == '\n' or line == '\r' or line == '\r\n':
    count += 1
Hope this helps.
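A quick way to check what actually arrives in line is to write a file with mixed endings and read it both ways. This sketch uses hypothetical file contents; it shows that the default text mode turns every ending into '\n', that newline='' keeps the original endings, and that strip() copes with either.

```python
import os
import tempfile

# Hypothetical file mixing Windows (\r\n), Unix (\n) and old Mac (\r) endings;
# three of its lines are blank.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"Line 1\r\n\r\nLine 2\n\nLine 3\r\r")
    path = tmp.name

# Default text mode (newline=None): every ending is translated to '\n'.
with open(path) as f:
    translated = f.readlines()

# newline='': lines are still split on any ending, but it is kept as-is.
with open(path, newline="") as f:
    untranslated = f.readlines()

os.remove(path)

print(translated)  # ['Line 1\n', '\n', 'Line 2\n', '\n', 'Line 3\n', '\n']
blank_translated = sum(1 for line in translated if line == "\n")
blank_untranslated = sum(1 for line in untranslated if line in ("\n", "\r", "\r\n"))
blank_stripped = sum(1 for line in untranslated if not line.strip())
print(blank_translated, blank_untranslated, blank_stripped)  # 3 3 3
```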
Every line you read ends with a newline character '\n' (except possibly the last line of the file). Note that it is only one character.
An easy workaround is to check whether the line equals '\n', or whether its length is 1, not 0.
You can use count from itertools, which returns an iterator. Furthermore, I used just strip instead of checking the length.
from itertools import count
counter = count()
with open('myfile.txt', 'r') as f:
    for line in f:
        if not line.strip():
            next(counter)
# count() starts at 0, so this is the number of blank lines seen
print(next(counter))
The test.txt would be
1
2
3
start
4
5
6
end
7
8
9
I would like the result to be
start
4
5
6
end
This is my code
file = open('test.txt','r')
line = file.readline()
start_keyword = 'start'
end_keyword = 'end'
lines = []
while line:
    line = file.readlines()
    for words_in_line in line:
        if start_keyword in words_in_line:
            lines.append(words_in_line)
file.close()
print(lines)
It returns
['start\n']
I have no idea what to add to the above code to achieve the result I want to get. I have been searching and changing the code around but I don't know how to get this to work as I want it to.
Use a flag. Try this:
file = open('test.txt','r')
start_keyword = 'start'
end_keyword = 'end'
in_range = False
entities = []
lines = file.readlines()
for line in lines:
    line = line.strip()
    if line == start_keyword:
        in_range = True
    elif line == end_keyword:
        in_range = False
    elif in_range:
        entities.append(line)
file.close()
# If you want to include the start/end tags
#entities = [start_keyword] + entities + [end_keyword]
print(entities)
About your code, notice that readlines already reads all lines in a file, so calling readline doesn't seem to make much sense, unless you are ignoring the first line. Also use strip to remove EOL characters from the strings. Notice how your code doesn't do what you expect it to:
# Reads ALL (remaining) lines in the file as a list
line = file.readlines()
# You are not iterating words in a line, but rather all lines one by one
for words_in_line in line:
    # If a given line contains 'start', append it. This is why you only get ['start\n'], it's the only line you are adding as no other line contains that string
    if start_keyword in words_in_line:
        lines.append(words_in_line)
You need a state variable to decide whether you are storing the lines or not. Here is a simplistic example that will always store the line, and then will change its mind and discard it for the cases you don't want:
start_keyword = 'start'
end_keyword = 'end'
lines = []
reading = False
with open('test.txt', 'r') as f:
    for line in f:
        lines.append(line)
        if start_keyword in line:
            reading = True
        elif end_keyword in line:
            reading = False
        elif not reading:
            lines.pop()
print(''.join(lines))
If the file isn't too big (relative to how much RAM your computer has):
start = 'start'
end = 'end'
with open('test.txt','r') as f:
    content = f.read()
# add len(end) so the slice includes the end marker itself
result = content[content.index(start):content.index(end) + len(end)]
You can then print it with print(result), create a list by using result.split(), and so on.
If there are multiple start/stop points, and/or the file is very large:
start = 'start'
end = 'end'
running = False
result = []
with open('test.txt','r') as f:
    for line in f:
        if start in line:
            running = True
            result.append(line)
        elif end in line:
            running = False
            result.append(line)
        elif running:
            result.append(line)
This leaves you with a list, which you can join(), print(), write to a file, and so on.
You can use some kind of flag that gets set to True when you encounter the start_keyword; while that flag is set, you add the lines to the lines list, and it gets unset when end_keyword is encountered (but only after end_keyword has been written into the lines list).
Also use .strip() on words_in_line to remove the \n (and other leading and trailing whitespace) if you do not want them in the list lines; if you do want them, don't strip it.
Example -
flag = False
for words_in_line in line:
    if start_keyword in words_in_line:
        flag = True
    if flag:
        lines.append(words_in_line.strip())
    if end_keyword in words_in_line:
        flag = False
Please note, this would add multiple start to end blocks into the lines list, I am guessing that is what you want.
A file object is its own iterator, so you don't need a while loop to read a file line by line; you can iterate over the file object itself. To catch the sections, just start an inner loop when you encounter a line with start, and break out of the inner loop when you hit end:
start, end = "start", "end"
with open("in.txt") as f:
    out = []
    for line in f:
        if start in line:
            out.append(line)
            # the inner loop consumes lines from the same file iterator
            for _line in f:
                out.append(_line)
                if end in _line:
                    break
Output:
['start\n', '4\n', '5\n', '6\n', 'end\n']
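The same nested-loop idea can be packaged as a generator that yields each block separately instead of one flat list. This is a sketch: the name iter_blocks and the second block in the sample data are hypothetical, and a final start with no matching end is still yielded, running to the end of the input.

```python
import io


def iter_blocks(lines, start, end):
    """Yield each start..end block (markers included) as a list of lines.

    `lines` can be an open file or any iterator of lines; the inner loop
    pulls from the same iterator as the outer one, like the nested loops over f.
    """
    it = iter(lines)
    for line in it:
        if start in line:
            block = [line]
            for inner in it:
                block.append(inner)
                if end in inner:
                    break
            yield block


# io.StringIO stands in for a real file; the second block is hypothetical.
sample = io.StringIO("1\n2\n3\nstart\n4\n5\n6\nend\n7\nstart\n8\nend\n9\n")
blocks = list(iter_blocks(sample, "start", "end"))
print(blocks[0])    # ['start\n', '4\n', '5\n', '6\n', 'end\n']
print(len(blocks))  # 2
```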
I have a Python script that is reading from a file.
The first part counts the lines. The second part should print the second line, but it is not working.
lv_file = open("filename.txt", "rw+")
# count the number of lines =================================
lv_cnt = 0
for row in lv_file.xreadlines():
    lv_cnt = lv_cnt + 1
# print the second line =====================================
la_lines = lv_file.readlines()
print la_lines[2]
lv_file.close()
When I write it like this it works but I don't see why I would have to close the file and reopen it to get it to work. Is there some kind of functionality that I am misusing?
lv_file = open("filename.txt", "rw+")
# count the number of lines =================================
lv_cnt = 0
for row in lv_file.xreadlines():
    lv_cnt = lv_cnt + 1
lv_file.close()
lv_file = open("filename.txt", "rw+")
# print the second line =====================================
la_lines = lv_file.readlines()
print la_lines[2]
lv_file.close()
A file object is an iterator. Once you've gone through all the lines, the iterator is exhausted, and further reads will do nothing.
To avoid closing and reopening the file, you can use seek to rewind to the start:
lv_file.seek(0)
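A self-contained check of the exhaust-then-rewind behaviour, using a temporary file with hypothetical contents and plain iteration (xreadlines() no longer exists in Python 3):

```python
import os
import tempfile

# Throwaway file with hypothetical contents.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("line one\nline two\nline three\n")
    path = tmp.name

with open(path) as lv_file:
    line_count = sum(1 for _ in lv_file)  # iterating exhausts the file...
    after_count = lv_file.readlines()     # ...so this returns []
    lv_file.seek(0)                       # rewind to the start
    la_lines = lv_file.readlines()        # now all lines come back

os.remove(path)
print(line_count, after_count, la_lines[1])  # index 1 is the second line
```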
What you are after is file.seek():
Example: (based on your code)
# "rw+" is rejected by Python 3; "r+" opens the file for reading and writing
lv_file = open("filename.txt", "r+")
# count the number of lines =================================
lv_cnt = 0
for row in lv_file:  # iterate directly; xreadlines() is gone in Python 3
    lv_cnt = lv_cnt + 1
lv_file.seek(0) # reset file pointer
# print the second line =====================================
la_lines = lv_file.readlines()
print(la_lines[1])  # index 1 is the second line
lv_file.close()
This will reset the file pointer back to its starting position.
pydoc file.seek:
seek(offset, whence=SEEK_SET)
Change the stream position to the given byte offset. offset is interpreted relative to the position indicated by whence. Values for whence are:
SEEK_SET or 0 – start of the stream (the default); offset should be zero or positive
SEEK_CUR or 1 – current stream position; offset may be negative
SEEK_END or 2 – end of the stream; offset is usually negative
Return the new absolute position.
New in version 2.7: The SEEK_* constants
Update: A better way of counting the no. of lines in a file iteratively and only caring about the 2nd line:
def nth_line_and_count(filename, n):
    """Return the no. of lines in a file and its nth line (zero index; None if missing)"""
    count = 0
    value = None  # stays None if the file has fewer than n + 1 lines
    with open(filename, "r") as f:
        for i, line in enumerate(f):
            count += 1
            if i == n:
                value = line
    return count, value
nlines, line = nth_line_and_count("filename.txt", 1)
Since xreadlines() (like iterating over the file directly) advances the file's position as it hands you each line, when you then do
la_lines = lv_file.readlines()
reading continues from where the iteration stopped, which is the end of the file, so you get an empty list.
When you close the file and open it again, you get a new file object whose position is back at the start (line 0).
I want to grab a chunk of data from a file. I know the start line and the end line. I wrote the code, but it's incomplete and I don't know how to proceed.
file = open(filename,'r')
end_line='### Leave a comment!'
star_line = 'Kill the master'
for line in file:
    if star_line in line:
        ??
startmarker = "ohai"
endmarker = "meheer?"
marking = False
result = []
with open("somefile") as f:
    for line in f:
        if line.startswith(startmarker): marking = True
        elif line.startswith(endmarker): marking = False
        if marking: result.append(line)
if len(result) > 1:
    print("".join(result[1:]))
Explanation: The with block is a nice way to use files -- it makes sure you don't forget to close() it later. The for walks each line and:
starts outputting when it sees a line that starts with 'ohai' (including that line)
stops outputting when it sees a line that starts with 'meheer?' (without outputting that line).
After the loop, result contains the part of the file that is needed, plus that initial marker. Rather than making the loop more complicated to ignore the marker, I just throw it out using a slice: result[1:] returns all elements in result starting at index 1; in other words, it excludes the first element (index 0).
Update to handle partial-line matches:
startmarker = "ohai"
endmarker = "meheer?"
marking = False
result = []
with open("somefile") as f:
    for line in f:
        if not marking:
            index = line.find(startmarker)
            if index != -1:
                marking = True
                result.append(line[index:])
        else:
            index = line.rfind(endmarker)
            if index != -1:
                marking = False
                result.append(line[:index + len(endmarker)])
            else:
                result.append(line)
print("".join(result))
Yet more explanation: marking still tells us whether we should be outputting whole lines, but I've changed the if statements for the start and end markers as follows:
if we're not (yet) marking, and we see the startmarker, then output the current line starting at the marker. The find method returns the position of the first occurrence of startmarker in this case. The line[index:] notation means 'the content of line starting at position index'.
while marking, just output the current line entirely unless it contains endmarker. Here, we use rfind to find the rightmost occurrence of endmarker, and the line[...] notation means 'the content of line up to position index (the start of the match) plus the marker itself.' Also: stop marking now :)
If reading the whole file is not a problem, I would use file.readlines() to read all the lines into a list of strings.
Then you can use list_of_lines.index(value) to find the indices of the first and last line, and then select all the lines between these two indices. Note that index() looks for an exact match, so either strip the newline from each line or include '\n' in value.
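A sketch of that approach, assuming the markers are whole lines (not substrings as in the question's `star_line in line`). The function name lines_between and the sample contents are hypothetical; the marker strings are the ones from the question.

```python
import io


def lines_between(lines, start_line, end_line):
    """Return the lines from start_line through end_line, both included.

    Lines are compared with their trailing newline stripped, because
    list.index() needs an exact match. Raises ValueError if a marker is missing.
    """
    stripped = [line.rstrip("\n") for line in lines]
    first = stripped.index(start_line)
    last = stripped.index(end_line, first + 1)  # search only after the start
    return stripped[first:last + 1]


# io.StringIO stands in for open(filename); the contents are hypothetical.
sample = io.StringIO("intro\nKill the master\ndata 1\ndata 2\n### Leave a comment!\noutro\n")
print(lines_between(sample.readlines(), "Kill the master", "### Leave a comment!"))
# ['Kill the master', 'data 1', 'data 2', '### Leave a comment!']
```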
First, a test file (assuming Bash shell):
for i in {0..100}; do echo "line $i"; done > test_file.txt
That generates a 101-line file with the lines line 0\nline 1\n ... line 100\n
This Python script captures the lines from mark1 (included) up to mark2 (not included):
#!/usr/bin/env python
mark1 = "line 22"
mark2 = "line 26"
record=False
error=False
buf = []
with open("test_file.txt") as f:
    for line in f:
        if mark1==line.rstrip():
            if error==False and record==False:
                record=True
        if mark2==line.rstrip():
            if record==False:
                error=True
            else:
                record=False
        if record==True and error==False:
            buf.append(line)
if len(buf) > 1 and error==False:
    print("".join(buf))
else:
    print("There was an error in there...")
Prints:
line 22
line 23
line 24
line 25
in this case. If both marks are not found in the correct sequence, it will print an error.
If the size of the file between the marks is excessive, you may need some additional logic. You can also use a regex for each line instead of an exact match if that fits your use case.
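A sketch of the regex variant, using re.search on each line in place of the exact comparison. The function name capture_between is hypothetical, and unlike the script above it simply ignores an end match that appears before the start, returning None only when the start is never found or no end follows it.

```python
import re


def capture_between(lines, start_pattern, end_pattern):
    """Collect lines from the first line matching start_pattern up to, but not
    including, the next line matching end_pattern (both tested with re.search).

    Returns None if the start is never found or no end follows it.
    """
    start_re = re.compile(start_pattern)
    end_re = re.compile(end_pattern)
    buf = []
    recording = False
    for line in lines:
        if not recording:
            if start_re.search(line):
                recording = True
                buf.append(line)
        elif end_re.search(line):
            return buf
        else:
            buf.append(line)
    return None


# The same 101 lines that the Bash loop writes, built in memory.
test_lines = ["line {}\n".format(i) for i in range(101)]
print("".join(capture_between(test_lines, r"^line 22$", r"^line 26$")))
```

The `$` anchor matches just before the trailing newline, so r"^line 22$" does not also match a line such as "line 220".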