Making tail function: Reversing lines in a file - python

I'm trying to define a function that outputs the last n lines in a file. The function below seems to mostly work, aside from the fact that the first two lines in fReverse are being joined and I can't figure out why...
example: (I tried putting these in blockquotes instead of code, but it ruins the line formatting)
f =
Darkly I gaze into the days ahead,
And see her might and granite wonders there,
Beneath the touch of Time’s unerring hand,
Like priceless treasures sinking in the sand.
fReverse =
Like priceless treasures sinking in the sand.Beneath the touch of Time’s unerring hand,
And see her might and granite wonders there,
Darkly I gaze into the days ahead,
Code:
def tail(filename, nlines):
    '''Returns a list containing the last n lines of the file.'''
    f = open(filename, 'r')
    fReverse = open('output.txt', 'w')
    fReverse.writelines(reversed(f.readlines()))
    fReverse.close()
    f.close()
    fReverse = open('output.txt', 'r')
    listFile = []
    for i in range(1,nlines+1):
        listFile.append(fReverse.readline(),)
    fReverse.close()
    return listFile
fname = raw_input('What is the name of the file? ')
lines = int(raw_input('Number of lines to display? '))
print "The last %d lines of the file are: \n%s" % (lines, ''.join(tail(fname, lines)))

Easier to use a deque here:
To reverse the whole file:
from collections import deque
with open('file') as fin:
    reversed_lines = deque()
    reversed_lines.extendleft(fin)
To display the last n (but iterating through all lines first):
with open('file') as fin:
    last4 = deque(fin, 4)
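As a rough sketch, the second snippet could be wrapped into the shape of the original tail function. The name tail_deque and the throwaway sample file are invented for illustration and are not part of the answer above.

```python
from collections import deque
import os
import tempfile

def tail_deque(filename, nlines):
    """Return the last nlines lines of the file as a list (hypothetical helper)."""
    with open(filename) as fin:
        # A deque with maxlen keeps only the most recently added nlines items,
        # so earlier lines are discarded as the file is read.
        return list(deque(fin, maxlen=nlines))

# Demonstration on a temporary file whose last line has no trailing newline.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('one\ntwo\nthree\nfour')
last_two = tail_deque(tmp.name, 2)
os.remove(tmp.name)
print(last_two)
```

The lines come back in file order, not reversed, and the file is still read from start to end; only the memory use is bounded.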

This function can be simplified down quite a bit:
def tail(filename, number_lines):
    with open(filename, 'r') as file:
        with open('output.txt', 'w') as output:
            reversed_lines = file.readlines()[::-1]
            output.write('\n'.join([line.strip() for line in reversed_lines]))
    # the slice end is exclusive, so this returns exactly number_lines items
    return reversed_lines[:number_lines]

The issue here is that the last line of your file does not end with a newline character. So f.readlines() will be something like the following (note that the final entry does not have the \n):
['Darkly I gaze into the days ahead,\n',
'And see her might and granite wonders there,\n',
'Beneath the touch of Time’s unerring hand,\n',
'Like priceless treasures sinking in the sand.']
So when you reverse this list and write it to the file, the first "line" written has no trailing \n, and fReverse.writelines() doesn't add line endings automatically. To fix this, check whether the last line from f.readlines() ends with \n and add it if necessary:
def tail(filename, nlines):
    '''Returns a list containing the last n lines of the file.'''
    f = open(filename, 'r')
    fReverse = open('output.txt', 'w')
    lines = f.readlines()
    if not lines[-1].endswith('\n'):
        lines[-1] += '\n'
    fReverse.writelines(reversed(lines))
    fReverse.close()
    f.close()
    fReverse = open('output.txt', 'r')
    listFile = []
    for i in range(1,nlines+1):
        listFile.append(fReverse.readline(),)
    fReverse.close()
    return listFile
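The effect is easy to reproduce without touching the disk. The sketch below uses io.StringIO objects as stand-ins for the real files (an assumption made purely for the demonstration) and compares the output with and without the newline fix.

```python
import io

# In-memory stand-in for a file whose last line lacks a trailing newline.
source = io.StringIO('first\nsecond\nthird')
lines = source.readlines()

# writelines() writes the strings back to back and adds no separators.
broken = io.StringIO()
broken.writelines(reversed(lines))

# Apply the fix: make sure the last line ends with a newline before reversing.
if lines and not lines[-1].endswith('\n'):
    lines[-1] += '\n'
fixed = io.StringIO()
fixed.writelines(reversed(lines))

print(repr(broken.getvalue()))
print(repr(fixed.getvalue()))
```

The extra `lines and` check guards against an empty file, where `lines[-1]` would raise an IndexError.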

That's because the last line has no \n at the end ;P
You can try:
# rstrip('\n') removes only the line ending, so any indentation in a line is kept
lines = reversed([l.rstrip('\n')+'\n' for l in f])
fReverse.writelines(lines)

Related

delete all rows up to a specific row

How can I delete the lines in a text document up to a certain line?
I find the line number using this code:
#!/usr/bin/env python
lookup = '00:00:00'
filename = "test.txt"
with open(filename) as text_file:
    for num, line in enumerate(text_file, 1):
        if lookup in line:
            print(num)
print(num) outputs the number of the matching line, for example 66.
How do I delete all the lines up to line 66, i.e. everything before the line where the word was found?
As proposed here with a small modification to your case:
read all lines of the file.
iterate the lines list until you reach the keyword.
write all remaining lines
with open("yourfile.txt", "r") as f:
    lines = iter(f.readlines())
with open("yourfile.txt", "w") as f:
    for line in lines:
        if lookup in line:
            f.write(line)
            break
    for line in lines:
        f.write(line)
That's easy.
filename = "test.txt"
lookup = '00:00:00'
with open(filename,'r') as text_file:
    lines = text_file.readlines()
res=[]
for i in range(0,len(lines),1):
    if lookup in lines[i]:
        res=lines[i:]
        break
with open(filename,'w') as text_file:
    text_file.writelines(res)
Do you know what lines you want to delete?
#!/usr/bin/env python
lookup = '00:00:00'
filename = "test.txt"
with open(filename) as text_file, open('okfile.txt', 'w') as ok:
    lines = text_file.readlines()
    ok.writelines(lines[4:])
This drops the first 4 lines and stores the remaining lines in a different document, in case you want to keep the original.
The with statement closes both files for you when the block ends :)
Providing three alternate solutions. All begin with the same first part - reading:
filename = "test.txt"
lookup = '00:00:00'
with open(filename) as text_file:
    lines = text_file.readlines()
The variations for the second parts are:
Using itertools.dropwhile, which discards items from the iterator until the predicate (condition) returns False (i.e. it discards while the predicate is True). From that point on, it yields all the remaining items without re-checking the predicate:
import itertools
with open(filename, 'w') as text_file:
    text_file.writelines(itertools.dropwhile(lambda line: lookup not in line, lines))
Note that the predicate says not in, so all the lines before the one containing lookup are discarded.
Bonus: If you wanted to do the opposite (write lines until you find the lookup and then stop), replace itertools.dropwhile with itertools.takewhile.
Using a flag-value (found) to determine when to start writing the file:
with open(filename, 'w') as text_file:
    found = False
    for line in lines:
        if not found and lookup in line: # 2nd expression not checked once `found` is True
            found = True # value remains True for all remaining iterations
        if found:
            text_file.write(line)
Similar to c yj's answer, with some refinements: use enumerate instead of range, and then use the last index (idx) to write the lines from that point on, with no other intermediate variables needed:
for idx, line in enumerate(lines):
    if lookup in line:
        break
with open(filename, 'w') as text_file:
    text_file.writelines(lines[idx:])
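To see what dropwhile and takewhile keep, here is a small sketch on an in-memory list. The sample lines and the helper name before_lookup are invented for the example.

```python
import itertools

lookup = '00:00:00'
# Made-up sample data standing in for the result of readlines().
lines = ['header\n', 'noise\n', 'start 00:00:00\n', 'data 1\n', 'data 2\n']

def before_lookup(line):
    """True for every line that does not contain the lookup string."""
    return lookup not in line

# dropwhile skips the leading lines for which the predicate is True,
# then yields everything else, including the matching line.
kept = list(itertools.dropwhile(before_lookup, lines))

# takewhile yields only the leading lines and stops at the first match.
dropped = list(itertools.takewhile(before_lookup, lines))

print(kept)
print(dropped)
```

One thing to keep in mind: if no line contains the lookup string, dropwhile yields nothing, so writing its result back would leave the file empty.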

Print multiple lines between two specific lines (keywords) from a text file

I have a textfile and want to print the lines between two other lines, using Python 3.5 on Windows. I want to print the characters of a drama to another file. The textfile looks like this:
...
Characters:
Peter, the king.
Anna, court lady.
Michael, caretaker.
Andre, soldier.
Tina, baker.
First scene.
...
I want to print all the character names between the lines "Characters:" and "First scene." My first try was:
newfile = open('newfile.txt', 'w')
with open('drama.txt', 'r') as f:
    for line in f:
        if line.startswith('Characters:'):
            print(next(f), file = newfile)
But this only prints one line and I need several. Iterating with the next() function always led to a StopIteration error after printing one line.
So is there a way to say: print all lines between the lines "Characters:" and "First scene."? It is not really possible to work with indices, because I'm doing this for several dramas and they all have a different number of characters.
You can use a boolean flag to track whether a line should be printed:
newfile = open('newfile.txt', 'w')
printing = False
with open('drama.txt', 'r') as f:
    for line in f:
        if line.startswith('Characters:'):
            printing = True
            continue # go to next line
        elif line.startswith('First scene'):
            printing = False
            break # quit file reading
        if printing:
            # line already ends with a newline, so suppress the one print() adds
            print(line, file=newfile, end='')
newfile.close()
A regex solution:
import re
with open('drama.txt', 'r') as f:  # the file is closed when the block ends
    content = f.read()
x = re.findall(r'Characters:(.*?)First scene\.', content, re.DOTALL)
print("".join(x))
'''
Peter, the king.
Anna, court lady.
Michael, caretaker.
Andre, soldier.
Tina, baker.
'''
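The flag idea can also be packaged as a small generator, so the same logic works for several dramas. This is only a sketch: the function name lines_between and the sample lines are made up for illustration.

```python
def lines_between(lines, start, stop):
    """Yield the lines after the first one starting with `start`,
    up to (not including) the next one starting with `stop`."""
    inside = False
    for line in lines:
        if not inside:
            if line.startswith(start):
                inside = True      # start collecting from the next line
        elif line.startswith(stop):
            break                  # end marker reached, stop reading
        else:
            yield line

# Made-up sample standing in for the lines of drama.txt.
sample = [
    'Title\n',
    'Characters:\n',
    'Peter, the king.\n',
    'Anna, court lady.\n',
    'First scene.\n',
    'Peter enters.\n',
]
characters = list(lines_between(sample, 'Characters:', 'First scene'))
print(''.join(characters), end='')
```

Because a file object is itself an iterable of lines, an open file could be passed in place of the sample list.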

Removing word from the beginning of my text object?

I have a function that scrapes speeches from millercenter.org and returns the processed speech. However, every one of my speeches has the word "transcript" at the beginning (that's just how it's coded into the HTML). So, all of my text files look like this:
\n <--- there's really just a new line, here, not literally '\n'
transcript
fourscore and seven years ago, blah blah blah
I have these saved in my U:/ drive - how can I iterate through these files and remove 'transcript'? The files look like this, essentially:
Edit:
speech_dict = {}
for filename in glob.glob("U:/FALL 2015/ENGL 305/NLP Project/Speeches/*.txt"):
    with open(filename, 'r') as inputFile:
        filecontent = inputFile.read();
        filecontent.replace('transcript','',1)
    speech_dict[filename] = filecontent # put the speeches into a dictionary to run through the algorithm
This is not doing anything to change my speeches. 'transcript' is still there.
I also tried putting it into my text-processing function, but that doesn't work, either:
def processURL(l):
    open_url = urllib2.urlopen(l).read()
    item_soup = BeautifulSoup(open_url)
    item_div = item_soup.find('div',{'id':'transcript'},{'class':'displaytext'})
    item_str = item_div.text.lower()
    item_str_processed = punctuation.sub(' ',item_str)
    item_str_processed_final = item_str_processed.replace('—',' ').replace('transcript','',1)
    splitlink = l.split("/")
    president = splitlink[4]
    speech_num = splitlink[-1]
    filename = "{0}_{1}".format(president, speech_num)
    return filename, item_str_processed_final # giving back filename and the text itself
Here's an example url I run through processURL: http://millercenter.org/president/harding/speeches/speech-3805
You can use Python's excellent replace() for this:
data = data.replace('transcript', '', 1)
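This line replaces 'transcript' with '' (the empty string) and assigns the result back to data. Strings are immutable, so replace() returns a new string and leaves the original untouched; calling it without using the return value has no effect. The final parameter is the maximum number of replacements to make: 1 replaces only the first instance of 'transcript'; omit it to replace all instances.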
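A short sketch of the difference between calling replace() and keeping its result. The sample text is invented to resemble the files described in the question.

```python
# Made-up sample resembling one of the speech files.
text = '\ntranscript\nfourscore and seven years ago, a transcript was made'

# The return value is thrown away here, so text is exactly what it was.
text.replace('transcript', '', 1)
unchanged = text

# Keeping the return value gives the string with the first occurrence removed.
cleaned = text.replace('transcript', '', 1)

print(repr(unchanged))
print(repr(cleaned))
```

Only the first 'transcript' is removed; the later occurrence inside the speech is left alone because of the count argument.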
If you know that the data you want always starts on line x then do this:
with open('filename.txt', 'r') as fin:
    for _ in range(x): # This loop will skip x no. of lines.
        next(fin)
    for line in fin:
        # do something with the line.
        print(line)
Or let's say you want to remove any lines before transcript:
with open('filename.txt', 'r') as fin:
    # Skip lines until the *transcript* line has been read.
    # strip() is needed because each line still carries its trailing newline.
    while next(fin).strip() != 'transcript':
        pass
    # if you want to skip the empty line after *transcript*
    next(fin) # skips the next line.
    for line in fin:
        # do something with the line.
        print(line)

python: Open file, edit one line, save it as the same file

I want to open a file, search for a specific word, change the word and save the file again. Sounds really easy - but I just can't get it working... I know that I have to overwrite the whole file but only change this one word!
My Code:
f = open('./myfile', 'r')
linelist = f.readlines()
f.close
for line in linelist:
    i =0;
    if 'word' in line:
        for number in arange(0,1,0.1)):
            myNumber = 2 - number
            myNumberasString = str(myNumber)
            myChangedLine = line.replace('word', myNumberasString)
            f2 = open('./myfile', 'w')
            f2.write(line)
            f2.close
            #here I have to do some stuff with these files so there is a reason
            #why everything is in this for loop. And I know that it will
            #overwrite the file every loop and that is good so. I want that :)
If I make it like this, the 'new' myfile file contains only the changed line. But I want the whole file with the changed line... Can anyone help me?
****EDIT*****
I fixed it! I just turned the loops around and now it works perfectly like this:
f=open('myfile','r')
text = f.readlines()
f.close()
i =0;
for number in arange(0,1,0.1):
    fw=open('mynewfile', 'w')
    myNumber = 2 - number
    myNumberasString = str(myNumber)
    for line in text:
        if 'word' in line:
            line = line.replace('word', myNumberasString)
        fw.write(line)
    fw.close()
    #do my stuff here where I need all these input files
You just need to write out all the other lines as you go. As I said in my comment, I don't know what you are really trying to do with your replace, but here's a slightly simplified version in which we're just replacing all occurrences of 'word' with 'new':
f = open('./myfile', 'r')
linelist = f.readlines()
f.close()  # the parentheses are required; a bare f.close does nothing
# Re-open file here
f2 = open('./myfile', 'w')
for line in linelist:
    line = line.replace('word', 'new')
    f2.write(line)
f2.close()
Or using contexts:
with open('./myfile', 'r') as f:
    lines = f.readlines()
with open('./myfile', 'w') as f:
    for line in lines:
        line = line.replace('word', 'new')
        f.write(line)
Use fileinput passing in whatever you want to replace:
import fileinput
for line in fileinput.input("in.txt",inplace=True):
    print(line.replace("whatever","foo"),end="")
You don't seem to be doing anything special in your loop that cannot be calculated first outside the loop, so create the string you want to replace the word with and pass it to replace.
inplace=True will mean the original file is changed. If you want to verify everything looks ok then remove the inplace=True for the first run and you will actually see the replaced output instead of the lines being written to the file.
If you want to write to a temporary file, you can use a NamedTemporaryFile with shutil.move:
from tempfile import NamedTemporaryFile
from shutil import move
# mode="w" opens the temporary file in text mode (the default is binary)
with open("in.txt") as f, NamedTemporaryFile(mode="w",dir=".",delete=False) as out:
    for line in f:
        out.write(line.replace("whatever","foo"))
# once both files are closed, move the temporary file over the original
move(out.name,"in.txt")
One problem you may encounter is that replace also matches substrings. If you know the word always appears in the middle of a sentence surrounded by whitespace, you could include the spaces in the search string; if not, you will need to split each line and check every word:
from tempfile import NamedTemporaryFile
from shutil import move
from string import punctuation
with open("in.txt") as f, NamedTemporaryFile(mode="w",dir=".",delete=False) as out:
    for line in f:
        # split() drops the line ending, so add it back after joining
        out.write(" ".join(word if word.strip(punctuation) != "whatever" else "foo"
                           for word in line.split()) + "\n")
move(out.name,"in.txt")
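An alternative to splitting every line is a regular expression with word boundaries. This swaps in a different technique from the one in the answer, so treat it as a sketch; the helper name replace_word is made up, and it assumes the search word begins and ends with a letter, digit or underscore so that \b can anchor on it.

```python
import re

def replace_word(line, old, new):
    """Replace whole-word occurrences of `old` in `line` (hypothetical helper)."""
    # \b matches at a word boundary, so 'whatever' inside 'whatevers' is skipped.
    pattern = r'\b' + re.escape(old) + r'\b'
    # A function replacement is used so backslashes in `new` are taken literally.
    return re.sub(pattern, lambda match: new, line)

result = replace_word('whatever, whatevers and whatever.\n', 'whatever', 'foo')
print(result, end='')
```

Unlike the split-and-join version, this keeps the original spacing, the punctuation attached to the replaced word, and the line ending.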
There are three issues with your current code. First, create the f2 file handle before starting the loop, otherwise you'll overwrite the file in each iteration. Second, you are writing an unmodified line in f2.write(line). I guess you meant f2.write(myChangedLine)? Third, you should add an else statement that writes unmodified lines to the file. So:
from numpy import arange  # assumption: the question's arange is NumPy's
f = open('./myfile', 'r')
linelist = f.readlines()
f.close()
f2 = open('./myfile', 'w')
for line in linelist:
    i =0;
    if 'word' in line:
        for number in arange(0,1,0.1):
            myNumber = 2 - number
            myNumberasString = str(myNumber)
            myChangedLine = line.replace('word', myNumberasString)
            f2.write(myChangedLine)
    else:
        f2.write(line)
f2.close()

Copy the last three lines of a text file in python?

I'm new to python and the way it handles variables and arrays of variables in lists is quite alien to me. I would normally read a text file into a vector and then copy the last three into a new array/vector by determining the size of the vector and then looping with a for loop a copy function for the last size-three into a new array.
I don't understand how for loops work in python so I can't do that.
so far I have:
#read text file into line list
numberOfLinesInChat = 3
text_file = open("Output.txt", "r")
lines = text_file.readlines()
text_file.close()
writeLines = []
if len(lines) > numberOfLinesInChat:
    i = 0
    while ((numberOfLinesInChat-i) >= 0):
        writeLine[i] = lines[(len(lines)-(numberOfLinesInChat-i))]
        i+= 1
#write what people say to text file
text_file = open("Output.txt", "w")
text_file.write(writeLines)
text_file.close()
To get the last three lines of a file efficiently, use deque:
from collections import deque
with open('somefile') as fin:
    last3 = deque(fin, 3)
This saves reading the whole file into memory to slice off what you didn't actually want.
To reflect your comment - your complete code would be:
from collections import deque
with open('somefile') as fin, open('outputfile', 'w') as fout:
    fout.writelines(deque(fin, 3))
As long as you're ok to hold all of the file lines in memory, you can slice the list of lines to get the last x items. See http://docs.python.org/2/tutorial/introduction.html and search for 'slice notation'.
def get_chat_lines(file_path, num_chat_lines):
    with open(file_path) as src:
        lines = src.readlines()
        return lines[-num_chat_lines:]
>>> lines = get_chat_lines('Output.txt', 3)
>>> print(lines)
['line n-3\n', 'line n-2\n', 'line n-1']
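A few edge cases of negative slicing are worth checking before relying on it. The sketch below uses a made-up list in place of the file's lines.

```python
# Made-up list standing in for the result of readlines().
lines = ['a\n', 'b\n', 'c\n', 'd\n']

last_three = lines[-3:]        # the last three items
short_list = ['a\n'][-3:]      # fewer than three items: returns what exists
empty_slice = lines[-3:0]      # a stop of 0 lies before the start: always empty
whole_list = lines[-0:]        # -0 is 0, so this is the entire list

print(last_three)
print(short_list)
print(empty_slice)
print(whole_list)
```

The last case means a function that slices with `lines[-n:]` returns every line, not none, when n is 0, so a caller that may pass 0 needs to handle it separately.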
First, to answer your question: my guess is that you got an index error. You should replace the assignment to writeLine[i] with writeLines.append( ). After that, you also need a loop to write the output:
text_file = open("Output.txt", "w")
for row in writeLines :
    text_file.write(row)
text_file.close()
May I suggest a more Pythonic way to write this? It would be as follows:
with open("Input.txt") as f_in, open("Output.txt", "w") as f_out :
    for row in f_in.readlines()[-3:] :
        f_out.write(row)
A possible solution:
lines = [ l for l in open("Output.txt")]
file = open('Output.txt', 'w')
# lines[-3:] is the last three lines; writelines accepts a list of strings
file.writelines(lines[-3:])
file.close()
This might be a little clearer if you do not know Python syntax.
lst_lines = text.splitlines()
This creates a list containing all the lines in the text file, assuming text holds the whole file content as one string (for example, from text_file.read()).
Then for the last line you can do:
last = lst_lines[-1]
secondLast = lst_lines[-2]
and so on. List and string indexes can be counted from the end with '-'.
Or you can loop through them and pick specific ones using range, where start is the first index, stop is where to end (not included), and step is what to increment by:
for i in range(start, stop, step):
    string = lst_lines[i]
Then just write them to a file.
