Why python2 shows \r (Raw escaped) and python3 does not?

Why python2 shows \r (Raw escaped) and python3 does not? - python

I have been having a path error: No file or directory found for hours. After hours of debugging, I realised that python2 added an invisible '\r' at the end of each line.
The input: (trainval.txt)
Images/K0KKI1.jpg Labels/K0KKI1.xml
Images/2KVW51.jpg Labels/2KVW51.xml
Images/MMCPZY.jpg Labels/MMCPZY.xml
Images/LCW6RB.jpg Labels/LCW6RB.xml
The code I used to debug the error
with open('trainval.txt', "r") as lf:
for line in lf.readlines():
print ((line),repr(line))
img_file, anno = line.strip("\n").split(" ")
print(repr(img_file), repr(anno))
Python2 output:
("'Images/K0KKI1.jpg'", "'Labels/K0KKI1.xml\\r'")
('Images/2KVW51.jpg Labels/2KVW51.xml\r\n', "'Images/2KVW51.jpg Labels/2KVW51.xml\\r\\n'")
("'Images/2KVW51.jpg'", "'Labels/2KVW51.xml\\r'")
('Images/MMCPZY.jpg Labels/MMCPZY.xml\r\n', "'Images/MMCPZY.jpg Labels/MMCPZY.xml\\r\\n'")
("'Images/MMCPZY.jpg'", "'Labels/MMCPZY.xml\\r'")
('Images/LCW6RB.jpg Labels/LCW6RB.xml\r\n', "'Images/LCW6RB.jpg Labels/LCW6RB.xml\\r\\n'")
("'Images/LCW6RB.jpg'", "'Labels/LCW6RB.xml\\r'")
Python3 output:
Images/K0KKI1.jpg Labels/K0KKI1.xml
'Images/K0KKI1.jpg Labels/K0KKI1.xml\n'
'Images/K0KKI1.jpg' 'Labels/K0KKI1.xml'
Images/2KVW51.jpg Labels/2KVW51.xml
'Images/2KVW51.jpg Labels/2KVW51.xml\n'
'Images/2KVW51.jpg' 'Labels/2KVW51.xml'
Images/MMCPZY.jpg Labels/MMCPZY.xml
'Images/MMCPZY.jpg Labels/MMCPZY.xml\n'
'Images/MMCPZY.jpg' 'Labels/MMCPZY.xml'
Images/LCW6RB.jpg Labels/LCW6RB.xml
'Images/LCW6RB.jpg Labels/LCW6RB.xml\n'
'Images/LCW6RB.jpg' 'Labels/LCW6RB.xml'
As annoying as it was, it was that small '\r' who caused the path error. I could not see it in my console until I write the script above. My question is: Why is this '\r' even there? I did not create it. Something somewhere added it there. It would be helpful if someone could tell me what is the use of this small 'r' , why did it appear in python2 and not in python3 and how to avoid getting bugs due to it.

there's probably a subtle difference of processing between Windows text file in python 2 & 3 versions.
The issue here is that your file has a Windows text format, and contains one or several carriage return chars before the linefeed. A quick & generic fix would be to change:
img_file, anno = line.strip("\n").split(" ")
by just:
img_file, anno = line.split()
Without arguments str.split is very smart:
it splits according to any kind of whitespace (linefeed, space, carriage return, tab)
it removes empty fields (no need for strip after all)
So use that cross-platform/python version agnostic form unless you need really specific split operation, and your problems will be history.
As an aside, don't do for line in lf.readlines(): but just for line in lf:, it will read & yield the lines one by one, handy when the file is big so you don't consume too much memory.

Related

Why does the python IDLE interpreter put a newline at the end of each compiled file?

I was wondering why that occurs... Is it something to do with reading the file by the interpreter, or a unintentional thing... Maybe it's just for EOU... (Ease Of Use). Who knows...
I have tried to pre-newline the file to check what happens and it stays unchanged. Also, putting 2 or more newlines at the end of the document does alter the output, so the amount of newlines stays the same.

Python not inserting an EOF character after file close?

I'm having a strange problem where some Python code that prints to a file is not inserting an EOF character. Basically, the Python script generates runscripts to later be submitted as jobs on a cluster. I essentially wrote the entire runscript between """'s, allowing for variables to be plugged in (to vary some parameters in my simulation). I write the runscripts using the
with open(file_name, 'w') as runscrpt:
runscrpt.write("""ENTIRE_FILE_CONTENTS_HERE""")
syntax. I can give the actual code if necessary but it's not much more than above. Despite the script running fine and generating all of my runsripts, whenever I submitted them nothing happened. It took me a long time to figure out why, but it's because they're missing an EOF character. I can fix it by, for example, opening one, adding some trailing whitespace or a blank line somewhere in vim, and resaving the file.
Why isn't Python inserting the EOF character, and is there a better way to fix this than manually making trivial edits to all the files with vim?

Sounds like you mean there is no EOL (not EOF!) at the end, because that's what diff will typically tell you. Just add a newline at the end of the write (make sure there is a newline before the final """ terminator, or write a separate newline explicitly).
with open(file_name, 'w') as runscript:
runscript.write("""ENTIRE_FILE_CONTENTS_HERE\n""")
(As a bonus, I added the missing vowel.)

Specifying filename in os.system call from python

I am creating a simple file in python to reorganize some text data I grabbed from a website. I put the data in a .txt file and then want to use the "tail" command to get rid of the first 5 lines. I'm able to make this work for a simple filename shown below, but when I try to change the filename (to what I'd actually like it to be) I get an error. My code:
start = 2010
end = 2010
for i in range(start,end+1)
year = str(i)
...write data to a file called file...
teamname=open(file).readline() # want to use this in the new filename
teamfname=teamname.replace(" ","") #getting rid of spaces
file2 = "gotdata2_"+year+".txt"
os.system("tail -n +5 gotdata_"+year+".txt > "+file2)
The above code works as intended, creating file, then creating file2 that excludes the first 5 lines of file. However, when I change the name of file2 to be:
file2 = teamfname+"_"+year+".txt"
I get the error:
sh: line 1: _2010.txt: command not found
It's as if the end of my file2 statement is getting chopped off and the .txt part isn't being recognized. In this case, my code outputs a file but is missing the _2010.txt at the end. I've double checked that both year and teamfname are strings. I've also tried it with and without spaces in the teamfname string. I get the same error when I try to include a os.system mv statement that would rename the file to what I want it to be, so there must be something wrong with my understanding of how to specify the string here.
Does anyone have any ideas about what causes this? I haven't been able to find a solution, but I've found this problem difficult to search for.

Without knowing what your actual strings are, it's impossible to be sure what the problem is. However, it's almost certainly something to do with failing to properly quote and/or escape arguments for the command line.
My first guess would be that you have a newline in the middle of your filename, and the shell is truncating the command at the newline. But I wouldn't bet too heavily on that. If you actually printed out the repr of the pathname, I could tell you for sure. But why go through all this headache?
The solution to almost any problem with os.system is to not use os.system.
If you look at the docs, they even tell you this:
The subprocess module provides more powerful facilities for spawning new processes and retrieving their results; using that module is preferable to using this function. See the Replacing Older Functions with the subprocess Module section in the subprocess documentation for some helpful recipes.
If you use subprocess instead of os.system, you can avoid the shell entirely. You can also pass arguments as a list instead of trying to figure out how to quote them and escape them properly. Which would completely avoid the exact problem you're having.
For example, if you do this:
file2 = "gotdata2_"+year+".txt"
with open(file2, 'wb') as f:
subprocess.check_call(['tail', '-n', '+5', "gotdata_"+year+".txt"], stdout=f)
Then, if you change that first line to this:
file2 = teamfname+"_"+year+".txt"
It will still work even if teamfname has a space or a quote or another special character in it.
That being said, I'm not sure why you want to use tail in the first place. You can skip the first 5 lines just as easily directly in Python.

Parsing large (20GB) text file with python - reading in 2 lines as 1

I'm parsing a 20Gb file and outputting lines that meet a certain condition to another file, however occasionally python will read in 2 lines at once and concatenate them.
inputFileHandle = open(inputFileName, 'r')
row = 0
for line in inputFileHandle:
row = row + 1
if line_meets_condition:
outputFileHandle.write(line)
else:
lstIgnoredRows.append(row)
I've checked the line endings in the source file and they check out as line feeds (ascii char 10). Pulling out the problem rows and parsing them in isolation works as expected. Am I hitting some python limitation here? The position in the file of the first anomaly is around the 4GB mark.

Quick google search for "python reading files larger than 4gb" yielded many many results. See here for such an example and another one which takes over from the first.
It's a bug in Python.
Now, the explanation of the bug; it's not easy to reproduce because it depends both on the internal FILE buffer size and the number of chars passed to fread().
In the Microsoft CRT source code, in open.c, there is a block starting with this encouraging comment "This is the hard part. We found a CR at end of buffer. We must peek ahead to see if next char is an LF."
Oddly, there is an almost exact copy of this function in Perl source code:
http://perl5.git.perl.org/perl.git/blob/4342f4d6df6a7dfa22a470aa21e54a5622c009f3:/win32/win32.c#l3668
The problem is in the call to SetFilePointer(), used to step back one position after the lookahead; it will fail because it is unable to return the current position in a 32bit DWORD. [The fix is easy; do you see it?]
At this point, the function thinks that the next read() will return the LF, but it won't because the file pointer was not moved back.
And the work-around:
But note that Python 3.x is not affected (raw files are always opened in binary mode and CRLF translation is done by Python); with 2.7, you may use io.open().

The 4GB mark is suspiciously near the maximum value that can be stored in a 32-bit register (2**32).
The code you've posted looks fine by itself, so I would suspect a bug in your Python build.
FWIW, the snippet would be a little cleaner if it used enumerate:
inputFileHandle = open(inputFileName, 'r')
for row, line in enumerate(inputFileHandle):
if line_meets_condition:
outputFileHandle.write(line)
else:
lstIgnoredRows.append(row)

How do I modify program files in Python?

In the actual window where I right code is there a way to insert part of the code into everyline that I already have. Like insert a comma into all lines at the first spot>?

You need a file editor, not python.
Install the appropriate VIM variant for your operating system
Open the file you want to modify using VIM
Type: :%s/^/,/
Type: :wq

If you are in UNIX environment, open up a terminal, cd to the directory your file is in and use the sed command. I think this may work:
sed "s/\n/\n,/" your_filename.py > new_filename.py
What this says is to replace all \n (newline character) to \n, (newline character + comma character) in your_filename.py and to output the result into new_filename.py.
UPDATE: This is much better:
sed "s/^/,/" your_filename.py > new_filename.py
This is very similar to the previous example, however we use the regular expression token ^ which matches the beginning of each line (and $ is the symbol for end).
There are chances this doesn't work or that it doesn't even apply to you because you didn't really provide that much information in your question (and I would have just commented on it, but I can't because I don't have enough reputation or something). Good luck.

Are you talking about the interactive shell? (a.k.a. opening up a prompt and typing python)? You can't go back and edit what those previous commands did (as they have been executed), but you can hit the up arrow to flip through those commands to edit and reexecute them.
If you're doing anything very long, the best bet is to write your program into your text editor of choice, save that file, then launch it.
Adding a comma to the start of every line with Python:
import sys
src = open(sys.argv[1])
dest = open('withcommas-' + sys.argv[1],'w')
for line in src:
dest.write(',' + line)
src.close()
dest.close()
Call like so: C:\Scripts>python commaz.py cc.py. This is a bizzare thing to do, but who am I to argue.

Code is data. You could do this like you would with any other text file. Open the file, read the line, stick a comma on the front of it, then write it back to file.
Also, most modern IDEs/text editors have the ability to define macros. You could post a question asking for specific help for your editor. For example, in Emacs I would use C-x ( to start defining a macro, then ',' to write a comma, then C-b C-n to go back a character and down a line, then C-x ) to end my macro. I could then run this macro with C-x e, pressing e to execute it an additional time.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.