Saving file with apostrophe in the name (Python 3.4) - python

Trying to save image files in batches. Works nicely, but the list of names for each file sometimes includes apostrophes, and everything stops.
The offending script is:
pic.save(r"C:\Python34\Scripts\{!s}.jpg".format(name))
The apostrophes in the names aren't a problem when I embed them in a URL with Selenium:
browser.get("https://website.com/{!s}".format(name))
or when I print the destination file name, e.g.
print(r"C:\Python34\Scripts\{!s}.jpg".format(name))
which comes out fine, like
C:\Python34\Scripts\['It's fine'].jpg
so I assume this kind of problem has something to do with the save function.
The traceback points at the pic.save line, goes into PIL\Image.py, and reports OSError: [Errno 22] Invalid argument for the save destination.
Using Windows 7 if that matters.
Probably a super-novice error, but I've been reading threads and can't figure this out. A workaround would be cleaning the apostrophes out of the list before using it, which would be annoying but acceptable.
Any help appreciated.
Edit: changed double quotes to single; I just mistyped them when writing this post... doh.

It's not a Python problem but a Windows one, or rather the file system's file naming rules. From MSDN:
Use any character in the current code page for a name, including Unicode characters and characters in the extended character set (128–255), except for the following:
The following reserved characters
< (less than)
> (greater than)
: (colon)
" (double quote)
/ (forward slash)
\ (backslash)
| (vertical bar or pipe)
? (question mark)
* (asterisk)
On UNIX-type systems, all except the / would be valid (although most would be a bad idea). A further "character", binary zero 0x00, is invalid on most file systems.
Rules for URLs are different again.
So you are going to have to write a sanitiser for filenames avoiding these characters. A regular expression would probably be the easiest, but you will have to choose replacement characters that don't occur naturally.
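A minimal sketch of such a sanitiser in Python 3, assuming each forbidden character is replaced with "_" (an arbitrary choice; sanitize_filename and _RESERVED are made-up names). The character class covers the reserved characters listed above, NUL, and the other control characters 1-31, which the same MSDN page also rules out. It also checks one possibility worth ruling out: if name is actually a list (the printed ['It's fine'] hints at this), str() renders an element containing an apostrophe with double quotes, and " is on the reserved list.

```python
import re

# Reserved characters from the MSDN list, plus NUL and the other ASCII
# control characters (codes 1-31), which MSDN also disallows.
_RESERVED = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def sanitize_filename(name, replacement="_"):
    """Replace every reserved character in name with replacement.

    The default "_" is an arbitrary choice; pick something that does not
    occur naturally in your names.
    """
    return _RESERVED.sub(replacement, name)


# If name is a list, str() puts double quotes around an element that
# contains an apostrophe, and double quotes are reserved on Windows.
raw_name = "{!s}".format(["It's fine"])
print(raw_name)                     # ["It's fine"]
print(sanitize_filename(raw_name))  # [_It's fine_]
```

Usage would then be something like pic.save(r"C:\Python34\Scripts\{}.jpg".format(sanitize_filename(str(name)))), although if name really is a list you probably want its element rather than its string form.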
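Edit: I was assuming that Error 22 was reporting an invalid filename, but I may have been wrong: Windows system error code 22 (ERROR_BAD_COMMAND) means "The device does not recognize the command". On the other hand, the Errno 22 in a Python OSError is the C errno value EINVAL, which is the "Invalid argument" shown in your traceback.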
See https://stackoverflow.com/questions/19870570/pil-giving-oserror-errno-22-when-opening-gif. The accepted reply is rather weird though.
I Googled "python PIL OSError Errno 22"; you might like to try the same and see whether any of the conditions apply to you. Clearly you are not alone, if that's any consolation.
Sorry I can't do more.

Related

Python FileNotFound error caused by Python inserting unwanted slashes in string

I am trying to save a figure from Matplotlib to a folder on a drive, and I am getting some unwanted behavior from the file path.
This is what I have set up, using a raw string to handle the "\" escape character:
save_path = r"\\nemesis\Network Planning\Team Members\Taylor\2020_04_23 - COVID Impact Adjustment\Test Stores\State and Region Growth - " + str(Store_ID) + ".jpg"
print(save_path)
plt.savefig(save_path)
The print statement displays the correct file path string.
However, when I run savefig, Python appears to add an extra backslash next to every existing backslash in the string, and I get a FileNotFoundError. Full error transcript below:
FileNotFoundError: [Errno 2] No such file or directory: '\\\\nemesis\\Network Planning\\Team Members\\Taylor\\2020_04_23 - COVID Impact Adjustment\\Test Stores\\State and Region Growth - 17062.jpg'
I am at a loss as to why this is occurring, and I have tried a bunch of different string methods, none of which seemed to work.
Any help is much appreciated.
To answer your question, I'll need to explain some background on raw strings. Raw strings are just an easier way to include backslashes in a string without needing to escape them. For example, to define a string that prints as a\b\c using normal string syntax, you would need to write my_string = "a\\b\\c", but with a raw string you only need r"a\b\c". The resulting strings are equal in both cases:
s = r"a\b\c"
s2 = "a\\b\\c"
s == s2 # Evaluates to True
When you print the string, print() excludes the extra backslashes required to recreate the string using normal syntax:
print(s) # -> a\b\c
To view a representation of the string suitable for recreating it, use repr(s):
print(repr(s)) # -> 'a\\b\\c'
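Now for your question. The raw string you make looks right under print() because print() omits the escaping, while the error message shows the escaped (repr) form, which is why every backslash appears doubled there. One thing to check is whether you meant to have two backslashes at the beginning of the path: that is correct for a UNC network path such as \\server\share, but not otherwise.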
save_path = r"\\nemesis\Network Planning\..."
print(save_path) # Prints the path as stored, including the two leading backslashes
print(repr(save_path)) # Escaped form: each backslash is shown doubled, so the leading two appear as four
Fixing this problem is simple: represent your file path differently. Either use normal strings and escape all the backslashes manually:
"\\nemesis\\Network Planning\\Team Members\\Taylor\\2020_04_23 - COVID Impact Adjustment\\Test Stores\\State and Region Growth - " + str(Store_ID) + ".jpg"
or just use os.path.join to join the directories with all the proper backslashes automatically (I can't test this one because I'm on Linux):
os.path.join("\\nemesis", "Network Planning", "Team Members", "Taylor", "2020_04_23 - COVID Impact Adjustment", "Test Stores", "State and Region Growth - " + str(Store_ID) + ".jpg")
If \\nemesis is a network share (a UNC path), keep both leading backslashes in either version: write "\\\\nemesis" in a normal string or r"\\nemesis" in a raw one.
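pathlib.PureWindowsPath applies Windows path rules on any OS, including Linux, so a sketch like this can be checked anywhere. It assumes \\nemesis\Network Planning is a UNC share (server nemesis, share Network Planning) and uses 17062, the Store_ID from the error message, as a placeholder. It compares the joined path with the raw-string path from the question and counts the backslashes in the real string versus its repr.

```python
from pathlib import PureWindowsPath

store_id = 17062  # placeholder: the Store_ID from the error message

# Assumption: \\nemesis\Network Planning is a UNC share (\\server\share).
save_path = PureWindowsPath(
    r"\\nemesis\Network Planning",
    "Team Members",
    "Taylor",
    "2020_04_23 - COVID Impact Adjustment",
    "Test Stores",
    f"State and Region Growth - {store_id}.jpg",
)

# The raw-string version from the question, split across lines for width.
original = (
    r"\\nemesis\Network Planning\Team Members\Taylor"
    r"\2020_04_23 - COVID Impact Adjustment\Test Stores"
    r"\State and Region Growth - " + str(store_id) + ".jpg"
)

print(str(save_path) == original)  # True: same characters
print(original.count("\\"))        # 8 real backslashes
print(repr(original).count("\\"))  # 16: repr() shows each one doubled
```

If the two strings match, the doubled backslashes in the FileNotFoundError message are just display, and the path itself (or the folder it points to) is what to check.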
Hope this helped!

Exporting multiple csv files with dynamic naming

I created about 200 csv files in Python and now need to download them all.
I created the files from a single file using:
g = df.groupby("col")
for n,g in df.groupby('col'):
    g.to_csv(n+'stars'+'.csv')
When I try to use this same statement to export to my machine I get a syntax error and I'm not sure what I'm doing wrong:
g = df.groupby("col")
for n,g in df.groupby('col'):
    g.to_csv('C:\Users\egagne\Downloads\'n+'stars'+'.csv'')
Error:
File "<ipython-input-27-43a5bfe55259>", line 3
g.to_csv('C:\Users\egagne\Downloads\'n+'stars'+'.csv'')
^
SyntaxError: invalid syntax
I'm in JupyterLab, so I could download each file individually, but I really don't want to have to do that.
You're possibly mixing up integers and strings, and the use of backslashes in literals is dangerous anyway. Consider using the following:
import os
# inside the loop:
f_name = os.path.join('C:\\', 'Users', 'egagne', 'Downloads', str(n) + 'stars.csv')
g.to_csv(f_name)
with os.path.join taking care of the backslashes for you. (Note the 'C:\\': os.path.join('C:', 'Users') gives the drive-relative path 'C:Users', with no backslash after the colon.)
g.to_csv('C:\Users\egagne\Downloads\'n+'stars'+'.csv'')
needs to be
g.to_csv('C:\\Users\\egagne\\Downloads\\'+n+'stars.csv')
There were two things wrong. First, the backslash is an escape character, so if you put a ' after it, the quote is treated as part of your string instead of as the closing quote you intended. Using \\ instead of a single \ escapes the escape character so that you can include a backslash in your string.
Second, you did not pair your quotes correctly. n is a variable name, but the syntax highlighting in your question makes it clear that it is part of the string. Similarly, you can see that stars and .csv are not highlighted as part of a string, and the closing '' should be a red flag that something has gone wrong.
Edit: I addressed what is causing the problem, but Ami Tavory's answer is the right one. Even though you know this is going to run on Windows, it is better practice to use os.path.join() with directory names instead of writing out a path in a string. str(n) is also the right way to go if you are at all unsure about the type of n.
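The same idea can be sketched with pathlib, assuming the Downloads folder from the question; group_csv_path and DOWNLOADS are hypothetical names. PureWindowsPath is used so the result is the same, and checkable, on any OS; on Windows, pathlib.Path behaves the same way, and to_csv accepts path objects. The f-string converts n to a string whatever its type, and the raw string also avoids a separate Python 3 problem: in a normal literal, \U in 'C:\Users' starts a Unicode escape and is itself a SyntaxError.

```python
from pathlib import PureWindowsPath

# Assumption: the target folder from the question.
DOWNLOADS = PureWindowsPath(r"C:\Users\egagne\Downloads")


def group_csv_path(n, folder=DOWNLOADS):
    """Return folder / '<n>stars.csv'; the f-string converts n to str."""
    return folder / f"{n}stars.csv"


print(group_csv_path(5))  # C:\Users\egagne\Downloads\5stars.csv

# Hypothetical use inside the question's loop (needs pandas and df):
# for n, g in df.groupby("col"):
#     g.to_csv(group_csv_path(n))
```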

How to open a file in its default program with python

I want to open a file in python 3.5 in its default application, specifically 'screen.txt' in Notepad.
I have searched the internet, and found os.startfile(path) on most of the answers. I tried that with the file's path os.startfile(C:\[directories n stuff]\screen.txt) but it returned an error saying 'unexpected character after line continuation character'. I tried it without the file's path, just the file's name but it still didn't work.
What does this error mean? I have never seen it before.
Please provide a solution for opening a .txt file that works.
EDIT: I am on Windows 7 on a restricted (school) computer.
It's hard to be certain from your question as it stands, but I bet your problem is backslashes.
[EDITED to add:] Or actually maybe it's something simpler. Did you put quotes around your pathname at all? If not, that will certainly not work; but once you do, you will find that you need the rest of what I've written below.
In a Windows filesystem, the backslash \ is the standard way to separate directories.
In a Python string literal, the backslash \ is used for putting things into the string that would otherwise be difficult to enter. For instance, if you are writing a single-quoted string and you want a single quote in it, you can do this: 'don\'t'. Or if you want a newline character, you can do this: 'First line.\nSecond line.'
So if you take a Windows pathname and plug it into Python like this:
os.startfile('C:\foo\bar\baz')
then the string actually passed to os.startfile will not contain those backslashes; it will contain a form-feed character (from the \f) and two backspace characters (from the two \b sequences), which is not what you want at all.
You can deal with this in three ways.
You can use forward slashes instead of backslashes. Although Windows prefers backslashes in its user interface, forward slashes work too, and they don't have special meaning in Python string literals.
You can "escape" the backslashes: two backslashes in a row mean an actual backslash. os.startfile('C:\\foo\\bar\\baz')
You can use a "raw string literal". Put an r before the opening single or double quotes. This will make backslashes not get interpreted specially. os.startfile(r'C:\foo\bar\baz')
The last is maybe the nicest, except for one annoying quirk: backslash-quote is still special in a raw string literal (r'don\'t' does not end at the second quote, although the backslash stays in the string), which means you can't end a raw string literal with a single backslash.
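A small sketch comparing the broken literal with the three fixes above, using the made-up path C:\foo\bar\baz from this answer. It never calls os.startfile (which only exists on Windows), so it runs anywhere; it just shows what each literal actually contains, plus the raw-string quirk.

```python
broken = 'C:\foo\bar\baz'      # \f is a form feed, each \b a backspace
forward = 'C:/foo/bar/baz'     # 1. forward slashes
escaped = 'C:\\foo\\bar\\baz'  # 2. doubled backslashes
raw = r'C:\foo\bar\baz'        # 3. raw string literal

print(repr(broken))    # 'C:\x0coo\x08ar\x08az'
print(escaped == raw)  # True: both hold real backslashes

# Inside a raw literal, \' does not end the string, but the backslash is kept.
print(r'don\'t')       # don\'t

# ...which is why a raw literal cannot end with a single backslash.
try:
    compile(r"r'C:\foo\'", "<example>", "eval")
    trailing_backslash_ok = True
except SyntaxError:
    trailing_backslash_ok = False
print(trailing_backslash_ok)  # False
```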
The recommended way to open a file with the default program is os.startfile. You can do something a bit more manual using os.system or subprocess though:
os.system('start ' + path_to_file)
or
subprocess.Popen('{start} {path}'.format(
start='start', path=path_to_file), shell=True)
Of course, this won't work cross-platform, but it might be enough for your use case.
For example, I created the file "test file.txt" on my drive D:, so the file path is 'D:/test file.txt'.
Now I can open it with its associated program using this script:
import os
os.startfile('d:/test file.txt')

Vexing Python syntax error

I am writing a Python script using version 2.7.3. In the script, one line is:
toolsDir = 'tools/'
When I run this in a terminal I get SyntaxError: invalid syntax on the last character in the string 'r'. I've tried renaming the string and using " as opposed to '. If I actually go into Python via bash, declare the string on one line and print it, I get no error.
I checked the encoding via file -i update.py and I get text/x-python; charset=us-ascii
I have used TextWrangler, nano and LeafPad as the text editors.
I have a feeling it may be something with the encoding of one of the editors. I have had this script run before without any errors.
Any advice would be greatly appreciated.
The string is 'tools/'. toolsDir is a variable. You're free to use different terminology, of course, but you'll end up confusing people trying to help you. The only r in that line is the last character of the variable name, so I assume that's the location of the error.
Most likely you've managed to introduce a non-breaking space (character code 0xA0) instead of an ordinary space. Try deleting SP=SP (all three characters) and retyping them.
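A small sketch for checking that guess without relying on the editor: it reports every byte at or above 0x80, with its line and column. A non-breaking space is the byte 0xA0 in Latin-1/cp1252, or the pair 0xC2 0xA0 in UTF-8. (file -i reporting us-ascii already suggests there are none, but this pinpoints any that exist.) find_non_ascii is a made-up helper; it avoids f-strings so it also runs on Python 2.7.

```python
def find_non_ascii(byte_lines):
    """Yield (line_no, col_no, byte) for each byte >= 0x80.

    byte_lines is any iterable of bytes lines, e.g. a file opened in "rb"
    mode. Line and column numbers start at 1.
    """
    for line_no, line in enumerate(byte_lines, 1):
        for col_no, byte in enumerate(bytearray(line), 1):
            if byte >= 0x80:
                yield line_no, col_no, byte


# Usage (file name taken from the question):
# with open("update.py", "rb") as f:
#     for line_no, col_no, byte in find_non_ascii(f):
#         print("line %d, col %d: 0x%02X" % (line_no, col_no, byte))
```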
Try running the code through pylint.
You probably have a syntax error on a nearby line before this one. Try commenting this line out and see if the error moves.
You might have a whitespace error; don't forget whitespace counts in Python. If you've mixed tabs and spaces anywhere in your file, it can throw the syntax checker off by several lines.
If you copied and pasted lines into this from any other source you may have copied whitespace in that doesn't fit with whichever convention you used.
The error was, of course, a silly one.
In one of my imports I used try: without a matching except or finally clause. pylint did not catch this, and the error message did not point to it.
If someone runs into this in the future, triple-check all the code that comes before the reported line for syntax errors.
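A sketch reproducing this kind of error with compile(), in Python 3 rather than the 2.7.3 from the question. The try: block is a made-up stand-in, since the real code isn't shown, and it assumes the unfinished try: sits in the same file. The error is reported on the innocent toolsDir line, not on the try: line; Python 3.10 and later at least say "expected 'except' or 'finally' block".

```python
# Assumption: a stand-in for the script; the real code is not in the question.
source = (
    "try:\n"
    "    import os\n"        # try: block with no except/finally clause
    "toolsDir = 'tools/'\n"  # the line from the question
)

try:
    compile(source, "update.py", "exec")
    reported_line = None
except SyntaxError as err:
    reported_line = err.lineno
    # The error lands on the toolsDir line, not on the try: line.
    print("line", err.lineno, "-", err.msg)
```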

PDFtotext - whitespace showing as aacute on commandline

I am extracting text with Python from a text file created from a PDF using pdftotext. It is one of 2000 files, and in this particular one, a line of keywords ends in EU. The remainder of the line is blank to the naked eye, and so is the following line.
The program normally strips off any trailing blanks at the end of a line and ignores the subsequent blank line.
In this instance, it keeps the whitespace, which shows up after "EU. " when the text is printed out to a text file, and similarly in HTML (Simile Exhibit).
I also printed to the command line, and there I see a string of á (a-acute) characters.
I thought the obvious way to deal with this was to search for and replace the a-acute characters. I've tried to do that with a compile statement, and I've played with permutations of decoding the incoming text.
Oddly though, when I print "\255" I don't get an aacute, I get an o grave.
It seems likely with this odd combination of errors that I have misunderstood something fundamental. Any tips of how to begin unravelling this?
Many thanks.
The first tip is not to print wildly to all possible output mechanisms using various unstated encodings. Find out exactly what you have got. Do this:
print repr(the_line_with_the_problem) # Python 2.x
print(ascii(the_line_with_the_problem)) # Python 3.x
and edit your question and copy/paste the result.
Second tip: When asking for help, give information about your environment:
What version of Python? What version of what operating system?
Also show locale-related info; the following example is from my computer running Python 2.7 in a Windows 7 Command Prompt window:
>>> import sys, locale
>>> sys.getdefaultencoding()
'ascii'
>>> sys.stdout.encoding
'cp850'
>>> locale.getdefaultlocale()
('en_AU', 'cp1252')
>>>
Third tip: Don't use your own jargon ... the concepts "Simile Exhibit", "printed to the command line", and "compile statement" need explanation.
What is the relevance of "\255"? Where did you get that from?
Wild guesses while waiting for some facts to emerge:
(1) The offending character is U+00A0 NO-BREAK SPACE aka NBSP which appears in your text as "\xA0" and when sent to stdout in a Western European locale on Windows using a Command Prompt window would be treated as being encoded in cp850 and thus appear as a-acute. How this could be transmogrified into o-grave is a mystery.
(2) "\255" == "\xAD" implies the offending character is U+00AD SOFT HYPHEN, but why this would be seen as o-grave is a mystery, and it's not "whitespace": it shouldn't be shown at all, and if it is shown it should be as a hyphen/minus-sign, not a space.
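Both guesses can be checked directly in Python 3 with a few lines like these; the sample keyword line is made up. Printing goes through ascii() so the output stays readable whatever the console encoding is. It checks that byte 0xA0 decoded as cp850 is á, that "\255" and "\xad" are the same character, and that rstrip() removes NO-BREAK SPACE but not SOFT HYPHEN.

```python
# Guess (1): byte 0xA0 viewed through code page 850 is a-acute.
nbsp_as_cp850 = b"\xa0".decode("cp850")
print(ascii(nbsp_as_cp850))  # '\xe1', i.e. U+00E1 a-acute

# Guess (2): the octal escape "\255" is the same character as "\xad".
print("\255" == "\xad")  # True

# In Python 3, str.rstrip() treats NO-BREAK SPACE as whitespace;
# SOFT HYPHEN is not whitespace, so it survives.
line = "keywords EU. \xa0\xa0"  # made-up sample line
print(ascii(line.rstrip()))         # 'keywords EU.'
print(ascii("EU.\xad".rstrip()))    # 'EU.\xad'
```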
