Fix undesired escape sequences in path

I have a path in a variable like this:
path = "C:\HT_Projeler\7\Kaynak\wrapped_gedizw.tif"
Which is incorrect because it contains escape sequences:
>>> path
'C:\\HT_Projeler\x07\\Kaynak\\wrapped_gedizw.tif'
How can I fix the path in this variable so it becomes equivalent to r"C:\HT_Projeler\7\Kaynak\wrapped_gedizw.tif" or "C:/HT_Projeler/7/Kaynak/wrapped_gedizw.tif"?
I know this is a common topic, and I have looked through many related questions here.
Addendum
Here is my exact script:
...
basinFile = self._gv.basinFile
basinDs = gdal.Open(basinFile, gdal.GA_ReadOnly)
basinNumberRows = basinDs.RasterYSize
basinNumberCols = basinDs.RasterXSize
...
Here self._gv.basinFile contains my path, so I cannot put an "r" at the beginning of self._gv.basinFile.

If you insert paths in Python code, just use raw strings, as others have suggested.
If instead that string is out of your control, there's not much you can do "after the fact". Escape sequence conversion is not injective, so, given a string where escape sequences have already been processed, you cannot "go back" unambiguously. In other words, if someone incorrectly writes:
path = "C:\HT_Projeler\7\Kaynak\wrapped_gedizw.tif"
as you show, you get
'C:\\HT_Projeler\x07\\Kaynak\\wrapped_gedizw.tif'
and there's no way to tell for sure "what they meant", because that \x07 may have been written as \7, or \x07, or \a. Heck, any letter may have originally been written as an escape sequence: what you see in that string as an a may actually have been \x61.
Long story short: your caller is responsible for giving you correct data. Once it's corrupted, there's no way to recover it.
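To make the "not injective" point concrete, here is a small sketch showing several different spellings that end up as the same character. The variable names are only illustrative.

```python
# Four different ways to spell the BEL character (code point 7) in source code.
bell_variants = ["\7", "\x07", "\a", chr(7)]

# After parsing they are indistinguishable: each is the same one-character string.
all_same = all(v == bell_variants[0] for v in bell_variants)

# The same applies to ordinary letters written as escapes.
letter_same = "\x61" == "a"

# A path written without a raw prefix silently picks up the BEL character.
damaged = "C:\\HT_Projeler\7\\Kaynak"
print(all_same, letter_same, "\x07" in damaged)
```

Since all spellings compare equal, nothing in the resulting string records which one the author typed.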

In the general case, there is no way to tell whether a character in a path is correct without checking it against the actual paths on your computer (and "special character" is not really well-defined; how do you know the path wasn't written with \x41, which got converted to A anyway?)
As a weak heuristic, you could look for existing path names within a particular edit distance, for example:
import os
from difflib import SequenceMatcher  # or any other similarity measure

drive, rest = os.path.splitdrive(variable)
path = drive + os.sep
for p in rest.strip(os.sep).split(os.sep):  # walk the path one component at a time
    npath = os.path.join(path, p)
    if not os.path.exists(npath):
        # rank the entries that do exist in the parent directory by similarity to p
        similar = sorted(((SequenceMatcher(None, x, p).ratio(), x) for x in os.listdir(path)), reverse=True)
        # recurse on most similar, second most similar, etc? or something
        break
    path = npath
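A self-contained variant of the same idea, using difflib.get_close_matches from the standard library. The helper name closest_entries and the directory names are made up for the demonstration, and this is only a heuristic sketch, not a reliable repair.

```python
import difflib
import os
import tempfile


def closest_entries(parent, name, n=3):
    """Return up to n entries of `parent` whose names are most similar to `name`."""
    # cutoff=0.0 keeps even weak matches so the caller can decide what is plausible.
    return difflib.get_close_matches(name, os.listdir(parent), n=n, cutoff=0.0)


# Demonstration in a throwaway directory with made-up folder names.
with tempfile.TemporaryDirectory() as tmp:
    for folder in ("7", "Kaynak", "Backup"):
        os.mkdir(os.path.join(tmp, folder))
    candidates = closest_entries(tmp, "Kaynk")  # a damaged component name

print(candidates)
```

The best candidate comes first, but a human (or an extra check) still has to confirm that it is the intended path.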

Related

str.replace backslash with forward slash

I would like to replace the backslash \ in a windows path with forward slash / using python.
Unfortunately I have been trying for hours and cannot solve this issue. I saw other questions here, but I still cannot find a solution.
Can someone help me?
This is what I'm trying:
path = "\\ftac\admin\rec\pir"
path = path.replace("\", "/")
But I get an error (SyntaxError: EOL while scanning string literal) and it does not return the path I want:
//ftac/admin/rec/pir, how can I solve it?
I also tried path = path.replace(os.sep, "/") or path = path.replace("\\", "/") but with both methods the first double backslash becomes single and the \a was deleted..

Oh boy, this is a bit more complicated than first appears.
Your problem is that you have stored your Windows paths as normal strings instead of raw strings. Converting such a string back to its raw representation is lossy and ugly.
This is because when you write a string like "\a", the interpreter sees the special character "\x07".
This means you have to know by hand which of these special characters you expect, then [lossily] hack them back when you see them (as in this example):
def str_to_raw(s):
    raw_map = {8:r'\b', 7:r'\a', 12:r'\f', 10:r'\n', 13:r'\r', 9:r'\t', 11:r'\v'}
    return r''.join(i if ord(i) > 32 else raw_map.get(ord(i), i) for i in s)
>>> str_to_raw("\\ftac\admin\rec\pir")
'\\ftac\\admin\\rec\\pir'
Now you can use the pathlib module, which can handle paths in a system-agnostic way. In your case, you know you have Windows-like paths as input, so you can use it as follows:
import pathlib
def fix_path(path):
    # get proper raw representation
    path_fixed = str_to_raw(path)
    # read in as windows path, convert to posix string
    return pathlib.PureWindowsPath(path_fixed).as_posix()
>>> fix_path("\\ftac\admin\rec\pir")
'/ftac/admin/rec/pir'
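When the literal is under your control, a raw string avoids the lossy repair step altogether. A minimal sketch, with illustrative variable names: the raw literal keeps both leading backslashes, so pathlib sees a UNC path and the leading double slash survives the conversion.

```python
import pathlib

# Raw literal: no escape processing, so "\a" and "\r" stay as two characters each.
unc_path = r"\\ftac\admin\rec\pir"

# Interpret it with Windows rules (on any OS) and render it with forward slashes.
posix_style = pathlib.PureWindowsPath(unc_path).as_posix()
print(posix_style)
```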

Pathlib 'normalizes' UNC paths with "$"

On Python3.8, I'm trying to use pathlib to concatenate a string to a UNC path that's on a remote computer's C drive.
It's weirdly inconsistent.
For example:
>>> remote = Path("\\\\remote\\", "C$\\Some\\Path")
>>> remote
WindowsPath('//remote//C$/Some/Path')
>>> remote2 = Path(remote, "More")
>>> remote2
WindowsPath('/remote/C$/Some/Path/More')
Notice how the initial // is turned into /?
Put the initial path in one line though, and everything is fine:
>>> remote = Path("\\\\remote\\C$\\Some\\Path")
>>> remote
WindowsPath('//remote/C$/Some/Path')
>>> remote2 = Path(remote, "more")
>>> remote2
WindowsPath('//remote/C$/Some/Path/more')
This works as a workaround, but I suspect I'm misunderstanding how it's supposed to work or doing it wrong.
Anyone got a clue what's happening?

tl;dr: give the entire UNC share (\\\\host\\share) as a single unit. pathlib has special-case handling of UNC paths, but it needs exactly this prefix in order to recognize a path as UNC. You can't use pathlib's facilities to manage host and share separately; doing so makes pathlib blow a gasket.
The Path constructor normalises (deduplicates) path separators (below, PPP and PWP are short for pathlib.PurePosixPath and pathlib.PureWindowsPath):
>>> PPP('///foo//bar////qux')
PurePosixPath('/foo/bar/qux')
>>> PWP('///foo//bar////qux')
PureWindowsPath('/foo/bar/qux')
PureWindowsPath has a special case for paths recognised as UNC, that is //host/share... which avoids collapsing leading separators.
However your initial concatenation puts it in a weird funk because it creates a path of the form //host//share... then the path gets converted back to a string when passed to the constructor, at which point it doesn't match a UNC anymore and all the separators get collapsed:
>>> PWP("\\\\remote\\", "C$\\Some\\Path")
PureWindowsPath('//remote//C$/Some/Path')
>>> str(PWP("\\\\remote\\", "C$\\Some\\Path"))
'\\\\remote\\\\C$\\Some\\Path'
>>> PWP(str(PWP("\\\\remote\\", "C$\\Some\\Path")))
PureWindowsPath('/remote/C$/Some/Path')
The issue seems to be specifically the presence of a trailing separator on a UNC-looking path. I don't know if it's a bug or if it's matching some other UNC-style (but not UNC) special case:
>>> PWP("//remote")
PureWindowsPath('/remote')
>>> PWP("//remote/")
PureWindowsPath('//remote//') # this one is weird, the trailing separator gets doubled which breaks everything
>>> PWP("//remote/foo")
PureWindowsPath('//remote/foo/')
>>> PWP("//remote//foo")
PureWindowsPath('/remote/foo')
These behaviours don't really seem to be documented. The pathlib documentation notes that it collapses path separators and has a few UNC examples showing that it doesn't, but I don't know exactly what's supposed to happen. Either way, it only seems to handle UNC paths properly if the first two segments are kept as a single "drive" unit; that the host and share together are considered a drive is specifically documented.
Of note: using joinpath or the / operator doesn't seem to trigger a re-normalisation. Your path remains improper (because the separator between host and share remains doubled), but it doesn't get completely collapsed.
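A short sketch of the recommended usage, with host and share passed together as one segment. It uses PureWindowsPath so that it behaves the same on any OS; the host and folder names are the ones from the question.

```python
from pathlib import PureWindowsPath

# Host and share go in as ONE segment; everything else can be joined afterwards.
share = PureWindowsPath(r"\\remote\C$", "Some", "Path")
deeper = share / "More"

# pathlib treats \\host\share as the drive of a UNC path.
print(repr(deeper.drive))
print(deeper.as_posix())
```

Because the drive is recognised up front, later joins keep the leading double separator.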

Save sentence as server filename

I'm saving the recording of a set of sentences to a corresponding set of audio files.
Sentences include:
Ich weiß es nicht!
¡No lo sé!
Ég veit ekki!
How would you recommend I convert each sentence to a human-readable filename, which will later be served from an online server? I'm not sure right now which languages I might be dealing with in the future.
UPDATE:
Please note that two sentences can't clash with each other. For example:
É bär icke dej.
E bår icke dej.
can't resolve to the same filename as these will overwrite each other. This is the problem with the slugify function mentioned here: Turn a string into a valid filename?
The best I have come up with is to use urllib.parse.quote. However I think the resulting output is harder to read than I would have hoped. Any suggestions?:
Ich%20wei%C3%9F%20es%20nicht%21
%C2%A1No%20lo%20s%C3%A9%21
%C3%89g%20veit%20ekki%21

What about unidecode?
import unidecode
a = [u'Ich weiß es nicht!', u'¡No lo sé!', u'Ég veit ekki!']
for s in a:
    print(unidecode.unidecode(s).replace(' ', '_'))
This gives pure ASCII strings that can readily be processed if they still contain unwanted characters. Keeping spaces distinct in the form of underscores helps with readability.
Ich_weiss_es_nicht!
!No_lo_se!
Eg_veit_ekki!
If uniqueness is a problem, a hash or something like that might be added to the strings.
Edit:
Some clarification seems to be required with respect to the hashing. Many hash functions are explicitly designed to give very different outputs for similar inputs. For example, the built-in hash function of Python gives:
In [1]: hash('¡No lo sé!')
Out[1]: 6428242682022633791
In [2]: hash('¡No lo se!')
Out[2]: 4215591310983444451
With that you can do something like
unidecode.unidecode(s).replace(' ', '_') + '_' + str(hash(s))[:10]
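in order to get strings that are not too long. Even with such shortened hashes, clashes are pretty unlikely. Be aware that in Python 3 the built-in hash of a str is salted per process by default (see PYTHONHASHSEED), so a value computed in one run will not match the next; a digest from hashlib is stable across runs.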
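A standard-library sketch of the same idea with a digest that stays the same between runs. The helper name stable_filename is hypothetical, and unicodedata is used here as a cruder stand-in for unidecode (for example, it drops 'ß' instead of writing 'ss').

```python
import hashlib
import unicodedata


def stable_filename(sentence, digest_chars=10):
    """Hypothetical helper: ASCII-folded base name plus a short, stable digest."""
    # Decompose accented characters, then drop everything that is not ASCII.
    folded = unicodedata.normalize("NFKD", sentence).encode("ascii", "ignore").decode("ascii")
    base = folded.replace(" ", "_")
    # Hash the ORIGINAL sentence so that inputs which fold to the same base still differ.
    digest = hashlib.sha256(sentence.encode("utf-8")).hexdigest()[:digest_chars]
    return f"{base}_{digest}"


name_a = stable_filename("É bär icke dej.")
name_b = stable_filename("E bår icke dej.")
print(name_a)
print(name_b)
```

Both names share the readable part, while the suffix is derived from the exact original text and keeps them apart.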

You should probably convert spaces into another symbol, making your string look like É-bär-icke-dej.
If you're using Python, I would do it like this.
Replace spaces with another symbol such as a hyphen (-); avoid the slash (/), since it is the path separator.
mystring = mystring.replace(' ', '-')
If the text arrives as bytes, detect its character encoding using chardet, a Python package that detects encodings.
Decode the bytes using Python's
mystring = mystring.decode(*the detected encoding*)
Check whether the file name is already in your directory using Python's os module, something like
files = os.listdir(*path to directory*)
# count how many existing file names contain this name
redundance = 0
for name in files:
    if mystring in name:
        redundance += 1
Append redundance to your string
if redundance != 0:
    mystring = mystring + str(redundance)
Use your string as a file name!
Hope this helps!

The only disallowed characters in traditional Unix / Linux file names are slash (/ U+002F) and the null character (U+0000). There is no need to convert your example human-readable strings to anything else.
If you need to make the files available to systems which do not use the same file name encoding, such as for downloading over FTP or from a web server, perhaps you want to expose them as explicitly UTF-8. On most modern U*xes, this should be the default out of the box anyway. This would correspond to the results you get from urllib quoting, where the percent-encoding is a safe and reasonably standard way of producing a machine-readable and unambiguous representation of the encoding. If you embed these in a snippet of HTML or something, you can keep the display text human-readable, and just keep the link machine-readable.
Ég veit ekki!

Print raw string from variable? (not getting the answers)

I'm trying to find a way to print a string in raw form from a variable. For instance, if I add an environment variable to Windows for a path, which might look like 'C:\\Windows\Users\alexb\', I know I can do:
print(r'C:\\Windows\Users\alexb\')
But I can't put an r in front of a variable... for instance:
test = 'C:\\Windows\Users\alexb\'
print(rtest)
Clearly would just try to print rtest.
I also know there's
test = 'C:\\Windows\Users\alexb\'
print(repr(test))
But this returns 'C:\\Windows\\Users\x07lexb'
as does
test = 'C:\\Windows\Users\alexb\'
print(test.encode('string-escape'))
So I'm wondering if there's any elegant way to make a variable holding that path print RAW, still using test? It would be nice if it was just
print(raw(test))
But it's not.

I had a similar problem and stumbled upon this question, and now know, thanks to Nick Olson-Harris' answer, that the solution lies in changing the string.
Two ways of solving it:
Get the path you want using native python functions, e.g.:
test = os.getcwd() # In case the path in question is your current directory
print(repr(test))
This makes it platform independent and it now works with .encode. If this is an option for you, it's the more elegant solution.
If your string is not a path, define it in a way compatible with python strings, in this case by escaping your backslashes:
test = 'C:\\Windows\\Users\\alexb\\'
print(repr(test))

In general, to make a raw string out of a string variable, I use this:
string = "C:\\Windows\Users\alexb"
raw_string = r"{}".format(string)
output:
'C:\\\\Windows\\Users\\alexb'

You can't turn an existing string "raw". The r prefix on literals is understood by the parser; it tells it to ignore escape sequences in the string. However, once a string literal has been parsed, there's no difference between a raw string and a "regular" one. If you have a string that contains a newline, for instance, there's no way to tell at runtime whether that newline came from the escape sequence \n, from a literal newline in a triple-quoted string (perhaps even a raw one!), from calling chr(10), by reading it from a file, or whatever else you might be able to come up with. The actual string object constructed from any of those methods looks the same.
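A small sketch that illustrates this; the variable names are only for the demonstration. It shows that a raw literal and an escaped literal build equal strings, that pushing an existing string through a raw template changes nothing, and that repr() is what makes control characters visible.

```python
# A raw literal and a regular literal with doubled backslashes build equal strings.
raw_literal = r"C:\temp\new"
escaped_literal = "C:\\temp\\new"
same_string = raw_literal == escaped_literal

# Formatting an existing string through a raw template is a no-op:
# the r prefix only affects the template literal, not the value substituted in.
value = "col1\tcol2"  # contains a real TAB character
via_percent = r"%s" % value
via_fstring = rf"{value}"

# repr() renders control characters as escape sequences; [1:-1] drops the quotes.
visible = repr(value)[1:-1]
print(visible)
```

Note that repr() can only show what is in the string now; it cannot recover what was typed in a literal whose escapes were already processed.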

I know I'm too late with this answer, but for people reading this, I found a much easier way of doing it:
myVariable = 'This string is supposed to be raw \'
print(r'%s' %myVariable)

Try this, depending on what type of output you want. Sometimes you may not need the single quotes around the printed string.
test = "qweqwe\n1212as\t121\\2asas"
print(repr(test)) # output: 'qweqwe\n1212as\t121\\2asas'
print( repr(test).strip("'")) # output: qweqwe\n1212as\t121\\2asas

Get rid of the escape characters before storing or manipulating the raw string:
You could change any backslashes of the path '\' to forward slashes '/' before storing them in a variable. The forward slashes don't need to be escaped:
>>> mypath = os.getcwd().replace('\\','/')
>>> os.path.exists(mypath)
True
>>>

Just simply use r'string'. Hope this will help you as I see you haven't got your expected answer yet:
test = 'C:\\Windows\Users\alexb\'
rawtest = r'%s' %test

I have a variable assigned to a big, complex pattern string for use with the re module. It is concatenated with a few other strings, and in the end I want to print it, then copy it and check it on regex101.com.
But when I print it in interactive mode I get a double backslash: '\\w'
As @Jimmynoarms said:
The solution for Python 3.x:
print(r'%s' % your_variable_pattern_str)

Your particular string won't work as typed because the backslash at the end escapes the closing quote, so the string literal never closes.
Maybe I'm wrong on that one, because I'm still very new to Python, so please correct me if so. But, changing it slightly to adjust for that, the repr() function will do the job of reproducing any string stored in a variable as a raw string.
You can do it two ways:
>>> print("C:\\Windows\Users\alexb\\")
C:\Windows\Users\alexb\
>>> print(r"C:\\Windows\Users\alexb\\")
C:\\Windows\Users\alexb\\
Store it in a variable:
test = "C:\\Windows\Users\alexb\\"
Use repr():
>>> print(repr(test))
'C:\\Windows\Users\alexb\\'
or string replacement with %r
print("%r" %test)
'C:\\Windows\Users\alexb\\'
The string will be reproduced with single quotes though so you would need to strip those off afterwards.

To turn a variable to raw str, just use
rf"{var}"
r is raw and f is f-str; put them together and boom it works.

Replace backslashes with forward slashes using one of the following:
import re
re.sub(r"\\", "/", x)
x.replace("\\", "/")

This does the trick
>>> repr(string)[1:-1]
Here is the proof
>>> repr("\n")[1:-1] == r"\n"
True
And it can be easily extrapolated into a function if need be
>>> raw = lambda string: repr(string)[1:-1]
>>> raw("\n")
'\\n'

i wrote a small function.. but works for me
def conv(strng):
    k=strng
    k=k.replace('\a','\\a')
    k=k.replace('\b','\\b')
    k=k.replace('\f','\\f')
    k=k.replace('\n','\\n')
    k=k.replace('\r','\\r')
    k=k.replace('\t','\\t')
    k=k.replace('\v','\\v')
    return k

Here is a straightforward solution.
address = 'C:\Windows\Users\local'
directory ="r'"+ address +"'"
print(directory)
"r'C:\\Windows\\Users\\local'"

how to turn 'C:\Music\song.mp3' into r'C:\Music\song.mp3'

I have been making an mp3 player with Tkinter and the mp3play module.
Say I had this song to play: C:\Music\song.mp3
To play that song I have to run this script:
import mp3play
music_file=r'C:\Music\song.mp3'
clip = mp3play.load(music_file)
clip.play()
Easy enough; my problem, though, is getting the "r" there.
I have tried:
import mp3play
import re
music_file="'C:\Music\song.mp3'"
music_file='r'+music_file
music_file=re.sub('"','',music_file)
print music_file
clip = mp3play.load(music_file)
clip.play()
Which gives the output: r'C:\Music\song.mp3'
But that is just a string containing those characters, so it won't read the file.

The 'r' at the front denotes a particular kind of string literal called a raw string. You can't get that by adding two strings or by substituting with re. It produces an ordinary string; the prefix only changes how escape characters in the literal are handled.
>>> s = r'something'
>>> s
'something'
>>>
When you are writing the script, use the 'r'. If you are getting the input via raw_input, Python will take care of the backslashes for you. So the question is: why are you trying to do that?

try:
music_file='C:/Music/song.mp3'

In Python, the r prefix introduces a raw string. Outside of raw strings, backslash (\) characters are considered as escape characters and have to be escaped themselves (by doubling them).
Try a simple string instead:
music_file = 'C:\\Music\\song.mp3'
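The spellings suggested in these answers all describe the same file. A quick sketch to check that, using PureWindowsPath so that it runs on any OS (the variable names are illustrative):

```python
from pathlib import PureWindowsPath

raw_form = r"C:\Music\song.mp3"       # raw literal
doubled_form = "C:\\Music\\song.mp3"  # escaped backslashes
slash_form = "C:/Music/song.mp3"      # forward slashes

# The first two are literally the same string once parsed.
same_text = raw_form == doubled_form

# The forward-slash form is a different string, but the same Windows path.
same_path = PureWindowsPath(slash_form) == PureWindowsPath(raw_form)
print(same_text, same_path)
```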

The r you are talking about has to be placed before a string definition, and tells python that the following string is "raw", meaning it will ignore backslash escapes (so it doesn't error on invalid backslashes in filenames, for example).
Why don't you just do it like in the first example? I don't see what you are trying to accomplish in the second example.

you can try music_file = r'%s' % path_to_file

As a few of the other answers have pointed out (I'm just posting this as an answer because it seemed kind of silly to make it a comment), what you've given in your first code block is exactly what the contents of your script should be. You don't need to do anything special to get the r there. In fact the 'r' is not part of the string, it's part of the code that makes the string.
