My question is about Python's built-in open function with a+ mode. First, from this thread, I know that in a+ mode, subsequent writes to the file will always end up at the current end of the file. I wrote the following simple program to confirm this fact:
file = 'test0.txt' # any non-empty text file
with open(file, 'a+', encoding='utf-8') as f:
    f.seek(0)
    f.write('added')
    # print(f.tell())
    cc = f.readlines() # strange!
    print(cc)
Yes, 'added' is appended at the end even though seek() forces the stream to the beginning of the file.
I think cc should be [], because the stream is positioned at the end of the file. However, that is wrong! The result is:
cc shows all the text that was in the file before 'added' was appended. Moreover, either commenting out f.seek(0) or uncommenting print(f.tell()) makes things normal: cc turns out to be [] as expected. To me, this means tell() does change something in this case, rather than just reporting the position. I would be very grateful if anyone could explain the logic behind this.
Such examples are very implementation-dependent, and if you want an exact answer, you have to dive deep into the source code.
You are getting an undesired outcome by calling the methods in an unintended order. I know I am not explaining why this happens, but here are some questions that were asked before, and they still don't have the detailed explanation you seek.
All in all, be my guest: go ahead and read the source code, understand what happens under the hood, and explain to us what the issue is. If you think this would take a lot of time (I know from experience that it does), call the methods in the order they are meant to be used and avoid ambiguities and undesired behavior. Like @Blckknght says, don't play with fire unless you are willing to put it out.
At the end of the day, this might turn out to be a security problem if inspected closely, though I doubt it, since you are not overflowing anything or writing to unexpected places, and there are various checks and controls for such operations.
Weird file seeking behaviour
Python 3 tell() gets out of sync with file pointer in append+read mode
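As a minimal sketch of "using the methods in order", the snippet below writes first and then seeks explicitly before reading. The scratch file and its contents are hypothetical stand-ins for test0.txt, and this only illustrates an ordering that avoids the ambiguity; it does not explain the buffering behaviour asked about.

```python
import os
import tempfile

# Hypothetical scratch file standing in for test0.txt
path = os.path.join(tempfile.mkdtemp(), 'test0.txt')
with open(path, 'w', encoding='utf-8') as f:
    f.write('line1\nline2\n')

with open(path, 'a+', encoding='utf-8') as f:
    f.write('added')       # append mode: this goes to the end of the file
    f.seek(0)              # explicit seek between the write and the read
    lines = f.readlines()  # now reads the whole file, including 'added'

print(lines)
```

Putting the seek between the write and the read means the read never depends on where the stream happened to be after the write.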
Am I correct in thinking that Python doesn't have a direct equivalent for Perl's __END__?
print "Perl...\n";
__END__
End of code. I can put anything I want here.
One thought that occurred to me was to use a triple-quoted string. Is there a better way to achieve this in Python?
print "Python..."
"""
End of code. I can put anything I want here.
"""
The __END__ block in Perl dates from a time when programmers had to work with data from the outside world and liked to keep examples of it in the program itself.
Hard to imagine, I know.
It was useful, for example, if you had a moving target like a hardware log file whose messages mutated with firmware updates, and you wanted to compare old and new versions of a line, keep notes not strictly related to the program's operation ("Code seems slow on day x of every month"), or, as mentioned above, keep a reference set of data to run the program against. Telcos are an example of an industry where this was a frequent requirement.
Lastly, Python's cult-like restrictiveness seems to have a real and tiresome effect on the mindset of its advocates. If your only response to a question is "Why would you want to do that when you could do X?" when X is not as useful, please keep quiet++.
The triple-quote form you suggested will still create a Python string, whereas Perl's parser simply ignores anything after __END__. Code after the closing quotes is still executed, so you can't write:
"""
I can put anything in here...
Anything!
"""
import os
os.system("rm -rf /")
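To illustrate that the triple-quoted block really is parsed as a string rather than ignored, here is a small sketch using the standard ast module. It assumes Python 3.8 or later (where string literals appear as ast.Constant), and the source text is just the example from the question rewritten with Python 3's print().

```python
import ast

# The question's example, with print() as a function call
source = (
    'print("Python...")\n'
    '"""\n'
    'End of code. I can put anything I want here.\n'
    '"""\n'
)

tree = ast.parse(source)
last = tree.body[-1]

# The trailing block is an expression statement holding a string constant
print(type(last).__name__)   # kind of statement
print(repr(last.value.value))  # the string object the parser built
```

Because the block is an ordinary string literal, it has to be valid string syntax, and anything after its closing quotes is parsed as normal code.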
Comments are more suitable in my opinion.
#__END__
#Whatever I write here will be ignored
#Woohoo !
What you're asking for does not exist.
Proof: http://www.mail-archive.com/python-list@python.org/msg156396.html
A simple solution is to escape any " as \" and use a normal multi-line string; see the official docs: http://docs.python.org/tutorial/introduction.html#strings
(Also, atexit doesn't work: http://www.mail-archive.com/python-list@python.org/msg156364.html)
Hm, what about sys.exit(0)? (Assuming you import sys above it, of course.)
As to why it would be useful: sometimes I sit down to do a substantial rewrite of something and want to mark my "good up to this point" place.
By using sys.exit(0) in a temporary manner, I know nothing below that point will get executed, therefore if there's a problem (e.g., server error) I know it had to be above that point.
I like it slightly better than commenting out the rest of the file, just because there are more chances to make a mistake and uncomment something (stray key press at beginning of line), and also because it seems better to insert 1 line (which will later be removed), than to modify X-many lines which will then have to be un-modified later.
But yeah, this is splitting hairs; commenting works great too... assuming your editor supports easily commenting out a region, of course; if not, sys.exit(0) all the way!
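A minimal sketch of the sys.exit(0) marker, run in a child interpreter so the effect can be observed. The script text and its messages are made up for illustration. Note that, unlike Perl's __END__, everything below the marker must still be valid Python, because the whole file is compiled before any of it runs.

```python
import subprocess
import sys

# Hypothetical script: the line after sys.exit(0) is never executed
script = (
    'import sys\n'
    'print("above the marker")\n'
    'sys.exit(0)  # temporary "good up to this point" marker\n'
    'print("below the marker")\n'
)

result = subprocess.run(
    [sys.executable, "-c", script],
    capture_output=True,
    text=True,
)
print(result.stdout, end="")
print("exit code:", result.returncode)
```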
I use __END__ all the time, for several of the reasons given. I've been doing it for so long now that I put it in by force of habit (usually preceded by an exit('0');), along with BEGIN {} / END {} routines. It is a shame that Python doesn't have an equivalent, so I just comment out the lines at the bottom: extraneous, but that's about what you get with one-way-to-rule-them-all languages.
Python does not have a direct equivalent to this.
Why do you want it? It doesn't sound like a great thing to have when there are more consistent ways, like putting the text at the end as comments. That's how we include arbitrary text in Python source files; triple-quoted strings are for making multi-line strings, not for non-code-related text.
Your editor should be able to make using many lines of comments easy for you.
While working on an image file I have, I tried reading it into a string and printing it on my IDLE 3.6. The string is roughly 160K bytes long and I already saved it into a txt file on my machine. That took about a second, so I figured printing it would take about the same...
Never have I been so wrong...
Now, I checked this, and the first answer suggests that the print itself is problematic. In their case, the format was non-standard, so I'm not sure if my case is the same. Second, if the print is the problem, why does IDLE still seem slow after the print is done?
This is how I run it:
with open(location_of_160KB_png_file, "rb") as imageFile:
    f = imageFile.read()
    b = bytearray(f)
b = ''.join([str(bb) for bb in b])
b[:10] # this prints easily (on IDLE I don't have to use the print function, I can just type the variable name)
b # this, however...
The issue, as explained in the answers to the link you gave, is that the tk Text widget is optimized for handling short lines. I have loaded IDLE's Shell with over 500000 lines of maybe 40 chars. That is 20 million chars, way larger than any file a person would write. It is well suited for the intended use.
In the referenced link, a 10000-char line is built 1 char at a time. Tk Text bogs down somewhere in the low 1000s. You apparently threw 160000 chars at it all at once; 10000 all at once is enough to cause the slowdown.
PS: Echoing expressions without a print statement is standard Python interactive interpreter behavior. I am fairly sure that this was copied from predecessors.
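Since the widget copes well with many short lines, one possible workaround (an assumption on my part, not something the answer prescribes) is to break the long string into fixed-width pieces before displaying it. The helper name chunked, the width of 80, and the stand-in data below are all hypothetical.

```python
def chunked(text, width=80):
    """Split text into consecutive pieces of at most `width` characters."""
    return [text[i:i + width] for i in range(0, len(text), width)]

# Stand-in for the PNG bytes read from disk in the question
data = bytes(range(256)) * 4
s = ''.join(str(byte) for byte in data)

lines = chunked(s)
# Show only the first few short lines instead of one huge line
print('\n'.join(lines[:3]))
```

Joining the pieces back together gives the original string, so nothing is lost by displaying it this way.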
I've been trying to remove the numberings from the following lines using a Python script.
jokes.txt:
1. It’s hard to explain puns to kleptomaniacs because they always take things literally.
2. I used to think the brain was the most important organ. Then I thought, look what’s telling me that.
When I run this Python script:
import re
with open('jokes.txt', 'r+') as original_file:
    modfile = original_file.read()
    modfile = re.sub("\d+\. ", "", modfile)
    original_file.write(modfile)
The numbers are still there and it gets appended like this:
1. It’s hard to explain puns to kleptomaniacs because they always take things literally.
2. I used to think the brain was the most important organ. Then I thought, look what’s telling me that.1. It’s hard to explain puns to kleptomaniacs because they always take things literally.ഀഀ2. I used to think the brain was the most important organ. Then I thought, look what’s telling me that.
I guess the regular expression re.sub("\d+\. ", "", modfile) finds all the digits from 0-9 and replaces them with an empty string.
As a novice, I'm not sure where I messed up. I'd like to know why this happens and how to fix it.
You've opened the file for reading and writing, but after you've read the file in you just start writing without specifying where to write to. That causes it to start writing where you left off reading - at the end of the file.
Other than closing the file and re-opening it just for writing, here's a way to write to the file:
import re
with open('jokes.txt', 'r+') as original_file:
    modfile = original_file.read()
    modfile = re.sub(r"\d+\. ", "", modfile)  # raw string keeps the backslashes literal
    original_file.seek(0) # Return to start of file
    original_file.truncate() # Clear out the old contents
    original_file.write(modfile)
I don't know why the numbers were still there in the part that you appended, as this worked just fine for me. You might want to add a caret (^) to the start of your regex (resulting in r"^\d+\. ") and pass flags=re.MULTILINE to re.sub. With that flag, a caret matches the start of every line (without it, only the start of the whole string), so if one of your jokes happens to use something like 1. in the joke itself, the number at the beginning of the line will be removed but not the number inside the joke.
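A short sketch of the anchored pattern, using made-up sample text rather than the real jokes.txt. It assumes each joke starts on its own line, and uses re.MULTILINE so that the caret matches at the start of every line.

```python
import re

# Hypothetical sample: the second line contains "1. " inside the text
text = "1. It's hard to explain puns.\n2. Step 1. of the joke stays.\n"

# ^ anchors to line starts only because re.MULTILINE is set
cleaned = re.sub(r"^\d+\. ", "", text, flags=re.MULTILINE)
print(cleaned)
```

Only the leading "1. " and "2. " are removed; the "1. " in the middle of the second line is left alone.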
I used the following line in a program.
text = open("C:\Python27\Scripts\wordlist.txt", "r").read().split()
This creates a list called text. My question is, is there still an open file that needs closing? Or is it not necessary in this case, or perhaps not really possible... I tried looking through locals() and globals() for any 'file' type object, but there was none. Would such a line of code be considered bad practice for some reason? On the surface, for my purposes, it seemed rather handy. (the wordlist.txt is a lengthy tome)
Yes, I would consider this bad practice. The best practice when handling files is to use the with statement. The code would then look something like this:
with open(r"C:\Python27\Scripts\wordlist.txt", "r") as textfile:
    result = textfile.read().split()
Using the with statement means that the file will be closed automatically upon leaving the with block, so you don't have to worry about it. Read the above link for more information on how it works!
It should be closed, but just note that a with statement is much more pythonic and better to use, because it is guaranteed to close the file, even if an error is raised:
with open(r"C:\Python27\Scripts\wordlist.txt") as myfile: # Second argument not needed. 'r' is by default
    text = myfile.read().split()
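To see the guarantee in action, here is a small sketch that checks the file object's closed attribute after the with block. The temporary file and its contents are hypothetical stand-ins for wordlist.txt.

```python
import os
import tempfile

# Hypothetical stand-in for wordlist.txt
path = os.path.join(tempfile.mkdtemp(), "wordlist.txt")
with open(path, "w") as f:
    f.write("alpha beta\ngamma\n")

with open(path) as myfile:
    text = myfile.read().split()

print(text)
print(myfile.closed)  # the file object still exists, but it is closed
```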
In python-mode, when I use forward-word, the cursor jumps from H to d (Hello_World). But in other modes (shell-mode or c-mode), the cursor jumps from H to _.
I want the result I get in python-mode in the other modes as well. What should I do?
PS: I saw a similar question before and searched for it, but I couldn't find it.
I think you're looking for this:
(modify-syntax-entry ?_ "w")
Underscores will be treated as part of a word. This command changes the syntax table of the mode you're currently in. AFAIK there's no way to change syntax globally. However, you could try modifying the standard syntax table, since most major modes inherit from it. Note that standard-syntax-table is a function, so it has to be called:
(modify-syntax-entry ?_ "w" (standard-syntax-table))
If that doesn't work, I guess you have to add mode-hooks for all modes you're using and modify their syntax tables individually.
Things got a little simpler with Emacs 24.4: there's now M-x superword-mode, which has the desired effect.