Previously, I used Spyder to write my files, but recently began making the transition to Vim. When I open a .py file using Vim, all previous lines are blended into the first, but separated with ^M.
My ~/.vimrc file uses filetype plugin indent on which I thought would solve this issue.
Thanks!
A file that has only ^M (also known as <CR>, or carriage return) as line separator is using the file format of mac. That seems to be the format of the file you're opening here.
Since this file format is so unusual, Vim will not try to detect it. You can tell Vim to detect it by adding the following to your vimrc file:
set file formats+=mac
Alternatively, you can use this format while opening a single file by using:
:e ++ff=mac script.py
You might want to convert these files to the more normal unix file format. You can do so after opening a file, with:
:set ff=unix
And then saving the file, with :w or similar.
From what I know ^M is the special character used for carriage return in vim. You should just be able to replace it
:%s/^M/\r/g
I would imagine that the ^M is there as some kind of specific configuration spider uses that couldn't be translated well in vim. I am just speculating here, could be totally wrong.
Substituting using hexadecimal notation is better in my opinion:
:%s/\%x0D$//e
Explanation: If the user try to type caret M instead of CtrlvEnter he of she will think that is a mistake.
Related
This question already has answers here:
Convert UTF-8 with BOM to UTF-8 with no BOM in Python
(7 answers)
Closed last year.
I'm opening a plain text file, parsing it, and adding different lines to existing, empty string variables. I add these variables into a new variable that is a multi-line fstring. Trying to write the data to a new text file is not behaving as expected.
Reading the original file works fine. Text is properly parsed, variables populated.
The multi-line fstring variable seems fine. Prints normally. Even tried formatting it different ways which I show below.
When writing to a new file, that's where the strangeness starts. I've tried 2 ways:
Straight coding the open function with w or w+
Adding the above to a function and using that inside main()
The file is saved to disk with the correct name. Trying to double-click open in Finder produces nothing. Right-click to open produces nothing. Trying to move to trash with command+delete gives an error:
It sounds like the file goes to trash, but as the file disappears from the folder a new one is created with the same name in its place.
If I try to open in TextMate via File > Open, it opens as a blank file with no errors.
Since I can't get rid of the file, I have to delete the directory and create the directory again with the same name, or force delete in Terminal using rm. Restarting the system does not help. Relaunching Finder does nothing. Saving text files from other apps works fine. Directory is chmod 755.
If I copy an existing text file into the output directory, rename it to what the file is expected to be named, and let python overwrite the contents, it doesn't work either. The file modification date changes (and I see the file "blink" in Finder) but the contents remain the same. However, the file is not corrupted and opens normally.
If I do the same but delete the text inside of the copied file first, then run the script, python writes no data to the file, I can't open it by double-clicking on it, and I get error -43 again with the odd non-trashing behavior.
The strangest thing is this: if I add another with open() at the end of the script, and open the file that was just created and supposedly written to, and print its contents, the contents print. It's like when the script ends the file contents are being removed or its being corrupted somehow. Tried to close the file inside the script even though it's not needed, but same behavior persists.
Code:
Here's the code for writing:
FORMAT='utf-8'
OUTPUT_DIR = '/Path/To/SaveFolder'
# as a function
def write_to_file(content, fpath, name):
the_file = os.path.join(fpath, name)
with open(the_file, 'w+', encoding=FORMAT) as t:
t.write(content)
def main():
print(f" Writing File...\n")
filename = f"{pcode}_{author}_{title}_text.txt"
write_to_file(multiline_var, OUTPUT_DIR, filename)
# or hard coded in main()
def main():
print(f" Writing File...\n")
filename = f"{pcode}_{author}_{title}_text.txt"
the_file = os.path.join(OUTPUT_DIR, filename)
with open(the_file, 'w+', encoding=FORMAT) as t:
t.write(multiline_var)
I have tried using w w+ wt and wt+ and with and without encoding='utf-8'
Here is an example of multi-line fstring variable:
# using triple quotes
multiline_var = f"""
[PROJ-{pcode}] {full_title} by {author}
{description}
{URL}
{DIVIDER_1}
{TEXT_BLURB}
Some text here and then {SOME_MORE_TEXT}"
{DIVIDER_1}
{SOME_LINK}
"""
# or inside parens
multiline_var = (
f"[PROJ-{pcode}] {full_title} by {author}\n"
f"{description}\n\n"
f"{URL}\n"
f"{DIVIDER_1}\n"
f"{TEXT_BLURB}\n\n"
f"Some text here and then {SOME_MORE_TEXT}\n"
f"{DIVIDER_1}\n\n"
f"{SOME_LINK}"
)
Using exiftool on the text file shows the following, so it looks the data is there but must be corrupted:
File Size : 1797 bytes
File Modification Date/Time : 2021:12:31 15:55:39-05:00
File Access Date/Time : 2021:12:31 15:58:13-05:00
File Inode Change Date/Time : 2021:12:31 15:55:39-05:00
File Permissions : -rw-r--r--
File Type : TXT
File Type Extension : txt
MIME Type : text/plain
MIME Encoding : utf-8
Byte Order Mark : No
Newlines : Unix LF
Line Count : 55
Word Count : 181
Not sure what I'm doing wrong. VScode shows no syntax errors in the script. There are no errors in Terminal when running the script. Have I made some simple mistake in the above code? Maybe the fstring variable is causing a problem?
Thanks to #bnaecker for leading me to the solution to this problem.
It appeared that when creating/writing to a text file with a long name, Python can corrupt it. Not sure why, as I save long names for images with Python image libraries all the time. Using a short name like "MyFile.txt" it worked just fine, but that was a red herring.
I have updated this post with my journey to the final solution for using the long names that are needed for my project, though I'm not sure why the problem exists.
First Attempts:
So far creating using a short name and then renaming to a long one.... attempts have failed. I did notice that python is locking the file it creates and never unlocks it. Not sure if this is the problem. Setting chflags with os.system('chflags nouchg') command does not work, not even with sudo, and not even in the Terminal doing it manually.
Using os.rename() in Python corrupts the file
Using os.system('mv oldFile.txt newFile.txt') corrupts the file
Manually using mv command in Terminal corrupts the file
Manually changing the filename in the Finder does not (wtf?)
I kept looking for workarounds but nothing did the job.
Round 2:
Progress!
After much tinkering, I discovered a hidden character inside the file. I ran cat /path/longfilename.txt in Terminal, selected and copied the output and pasted into VScode. Here is what I saw:
Somehow a hidden character is getting into the project code number.
Pasting it into a Unicode search engine it came up as a ZERO WIDTH NO-BREAK SPACE also known in Unicode as EF BB BF. However, when pasting this symbol into TextMate it shows up as <U+FEFF> which is?...
The Byte Order Mark!
Opening a normal utf-8 text file in a hex editor also shows the files starting with EFBBBF for the BOM.
Now, the text file being read and parsed at first has no blank lines to start the file, so I added a line break, and also tried adding some spaces. This time when writing the file I could open it, however, after sending it to the trash, the same behavior occurred and the file was broken again. It seems that because other corrupted versions were in the trash, it added the symbol back to the file name for some reason.
So what appears to be happening, for whatever reason, when Python opens the text file I'm parsing that has no line break at the top, it seems to be grabbing the BOM from the file and adding that to the first variable which is grabbing the first line of the text file. Since that text is a number code that starts the file name, the BOM symbol is being added to the file name as well as the code inside the text file.
Just... wow
The Current Solution:
I have to leave a blank line at the start of the text file that I'm opening and parsing and a simple line break won't do it. I have no idea why this is. I added some spaces for good measure because randomly the BOM would be added to the variable and filename again. So far (knock on wood) as long as the first line of that initial file has some spaces and then a line break, and previous corrupted files have been deleted from the trash, a long file name can be used for all the files I'm creating and writing to without any problems.
This corruption even persists if I remove the encoding flag from both of the open functions I'm using (one to read and parse, the other to create and write).
If anyone knows why this is happening, please share. I've never seen it mentioned before. I'm not sure if it's a python 3.8 bug, a mac OS bug, the way TextMate wrote the original file, or a combination of these.
Correct Solution:
Thanks to #tripleee for the proper way to handle this, as I don't remember seeing this before, though I haven't been using python for very long.
In order to ignore the BOM, reading in the text file to be parsed with an encoding='utf-8-sig' does the job. Seems to be why it exists. :)
Problem solved.
When I copy and paste the sentence How brave they’ll all think me at home! into a blank TextEdit rtf document on the Mac, it looks fine. But if I create an an apparently identical rtf file programatically, and write the same sentence into it, on opening TextEdit it appears as How brave they’ll all think me at home! In the following code, output is OK, but the file when viewed in TextEdit has problems with the right single quotation mark (here used as an apostrophe), unicode U-2019.
header = r"""{\rtf1\ansi\ansicpg1252\cocoartf1671\cocoasubrtf400
{\fonttbl\f0\fswiss\fcharset0 Helvetica;}
{\colortbl;\red255\green255\blue255;}
{\*\expandedcolortbl;;}
\paperw11900\paperh16840\margl1440\margr1440\vieww10800\viewh8400\viewkind0
\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardirnatural\partightenfactor0
\f0\fs24 \cf0 """
sen = 'How brave they’ll all think me at home!'
with open('staging.rtf', 'w+’) as f:
f.write(header)
f.write(sen)
f.write('}')
with open('staging.rtf') as f:
output = f.read()
print(output)
I’ve discovered from https://www.i18nqa.com/debug/utf8-debug.html that this may be caused by “UTF-8 bytes being interpreted as Windows-1252”, and that makes sense as it seems that ansicpg1252 in the header indicates US Windows.
But I still can’t work out how to fix it, even having read the similar issue here: Encoding of rtf file. I’ve tried replacing ansi with mac without effect. And adding ,encoding='utf8' to the open function doesn’t seem to help either.
(The reason for using rtf by the way is to be able to export sentences with colour-coded words, allow them to be manually edited, then read back in for further processing).
OK, I've found the answer myself. I needed to use , encoding='windows-1252' both when writing to the rtf file and also when reading from it.
I'm following along with the examples in a translated version of Wes McKinney's "Python for Data Analysis" and I was blocked in first example of Chapter 2
I think my problem arose because I saved a data file in a wrong path. is that right?
I stored a file, usagov_bitly_data2012-03-16-1331923249.txt, in C:\Users\HRR
and also stored folder, pydata-book-mater, that can be downloaded from http://github.com/pydata-book in C:\Users\HRR\Anaconda2\Library\bin.
Depends.
You might change the location you save your File or eddit the path you give to your code in Line 10. Since you're yousing relativ Paths i guess your script runs in C:\Users\HRR\Anaconda2\Library\bin, which means you have to go back to C:\Users\HRR or use an absolute Path ... or move the File, but hell you don't want to move a file every time you want to open it, like moving word files into msoffice file to open it, so try to change the Path.
And allways try harder ;)
In python open() will open from the current directory down unless given a full path (in linux that starts with / and windows <C>://). In your case the command is open the folder ch02 in the directory the script is running from and then open usagov_bitly_data2012-03-16-1331923249.txt in that folder.
Since you are storing the text file in C:\Users\HRR\usagov_bitly_data2012-03-16-1331923249.txt and you did not specify the directory of the script. I recommend the following command instead open(C:\\Users\\HRR\\usagov_bitly_data2012-03-16-1331923249.txt)
Note: the double \ is to escape the characters and avoid tabs and newlines showing up in the path.
I wrote python code to write to a file like this:
with codecs.open("wrtieToThisFile.txt",'w','utf-8') as outputFile:
for k,v in list1:
outputFile.write(k + "\n")
The list1 is of type (char,int)
The problem here is that when I execute this, file doesn't get separated by "\n" as expected. Any idea what is the problem here ? I think it is because of the
with
Any help is appreciated. Thanks in advance.
(I am using Python 3.4 with "Python Tools for Visual Studio" version 2.2)
If you are on windows the \n doesn't terminate a line.
Honestly, I'm surprised you are having a problem, by default any file opened in text mode would automatically convert the \n to os.linesep. I have no idea what codecs.open() is but it must be opening the file in binary mode.
Given that is the case you need to explicitly add os.linesep:
outputFile.write(k + os.linesep)
Obviously you have to import os somewhere.
Figured it out, from here:
How would I specify a new line in Python?
I had to use "\r\n" as in Windows, "\r\n" will work.
Per codecs.open's documentation, codecs.open opens the underlying file in binary mode, without line ending conversion. Frankly, codecs.open is semi-deprecated; in Python 2.7 and onwards, io.open (which is the same thing as the builtin open function in Python 3.x) handles 99% of the cases people used to use codecs.open, but better (faster, and without stupid issues like line endings). If you're reliably running on Python 3, just use plain open; if you need to run on Python 2.7 as well, import io and use io.open.
If you are on windows, try '\r\n'. Or open it with an editor that recognizes unix style new lines.
In the actual window where I right code is there a way to insert part of the code into everyline that I already have. Like insert a comma into all lines at the first spot>?
You need a file editor, not python.
Install the appropriate VIM variant for your operating system
Open the file you want to modify using VIM
Type: :%s/^/,/
Type: :wq
If you are in UNIX environment, open up a terminal, cd to the directory your file is in and use the sed command. I think this may work:
sed "s/\n/\n,/" your_filename.py > new_filename.py
What this says is to replace all \n (newline character) to \n, (newline character + comma character) in your_filename.py and to output the result into new_filename.py.
UPDATE: This is much better:
sed "s/^/,/" your_filename.py > new_filename.py
This is very similar to the previous example, however we use the regular expression token ^ which matches the beginning of each line (and $ is the symbol for end).
There are chances this doesn't work or that it doesn't even apply to you because you didn't really provide that much information in your question (and I would have just commented on it, but I can't because I don't have enough reputation or something). Good luck.
Are you talking about the interactive shell? (a.k.a. opening up a prompt and typing python)? You can't go back and edit what those previous commands did (as they have been executed), but you can hit the up arrow to flip through those commands to edit and reexecute them.
If you're doing anything very long, the best bet is to write your program into your text editor of choice, save that file, then launch it.
Adding a comma to the start of every line with Python:
import sys
src = open(sys.argv[1])
dest = open('withcommas-' + sys.argv[1],'w')
for line in src:
dest.write(',' + line)
src.close()
dest.close()
Call like so: C:\Scripts>python commaz.py cc.py. This is a bizzare thing to do, but who am I to argue.
Code is data. You could do this like you would with any other text file. Open the file, read the line, stick a comma on the front of it, then write it back to file.
Also, most modern IDEs/text editors have the ability to define macros. You could post a question asking for specific help for your editor. For example, in Emacs I would use C-x ( to start defining a macro, then ',' to write a comma, then C-b C-n to go back a character and down a line, then C-x ) to end my macro. I could then run this macro with C-x e, pressing e to execute it an additional time.