C-style escaping in python - python

How do I escape (and unescape) the C escaped characters (newlines, slashes, etc.) for a string in Python?
I guess JSON.encode(string) does this, but is there a better way?

Use str.encode('string-escape') in Python 2.7:
>>> '12\t34\n'.encode('string-escape')
'12\\t34\\n'
>>> '12\\t34\\n'.decode('string-escape')
'12\t34\n'
In Python 3, use str.encode('unicode-escape'), which returns bytes, or str.encode('unicode-escape').decode('utf-8') to get a str back:
>>> '12\t34\n'.encode('unicode-escape')
b'12\\t34\\n'
>>> b'12\\t34\\n'.decode('unicode-escape')
'12\t34\n'
>>> '12\t34\n'.encode('unicode-escape').decode('utf-8')
'12\\t34\\n'
>>> '12\\t34\\n'.encode('utf-8').decode('unicode-escape')
'12\t34\n'
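The Python 3 round trip above can be wrapped in two small helpers. This is a minimal sketch: the names c_escape and c_unescape are made up for illustration, and the escaping follows Python's string literal rules, which are close to but not identical to C's.

```python
def c_escape(text):
    """Return text with control and non-ASCII characters as backslash escapes."""
    # unicode_escape always produces ASCII-only bytes, so decoding as ASCII is safe.
    return text.encode('unicode_escape').decode('ascii')


def c_unescape(escaped):
    """Interpret backslash escapes in an ASCII-only string."""
    # encode('ascii') raises UnicodeEncodeError on non-ASCII input
    # instead of silently mis-decoding it.
    return escaped.encode('ascii').decode('unicode_escape')


print(c_escape('12\t34\n'))            # 12\t34\n
print(repr(c_unescape('12\\t34\\n')))  # '12\t34\n'
```

Because c_escape only ever emits ASCII, anything it produces can be passed back through c_unescape unchanged.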

Related

Python prevent decoding HEX to ASCII while removing backslashes from my Var

I want to strip some unwanted symbols from my variable. In this case the symbols are backslashes. I am using a HEX number, and as an example I will show some short simple code down below. But I don't want Python to convert my HEX to ASCII. How would I prevent this from happening? I have some shell codes for asm to work with later which are really long, and removing \ by hand is a long process. I know there are different ways like using echo -e "x\x\x\x" > output etc, but my whole script will be written in Python.
Thanks
>>> a = "\x31\xC0\x50\x68\x74\x76"
>>> b = a.strip("\\")
>>> print b
1�Phtv
>>> a = "\x31\x32\x33\x34\x35\x36"
>>> b = a.strip("\\")
>>> print b
123456
At the end I would like it to print my var:
>>> print b
x31x32x33x34x35x36
There are no backslashes in your variable:
>>> a = "\x31\xC0\x50\x68\x74\x76"
>>> print(a)
1ÀPhtv
Take newline for example: writing "\n" in Python gives you a string with one character (a newline) and no backslashes. See the string literals docs for the full syntax of these.
Now, if you really want to write a string with such backslashes, you can do it with the r prefix:
>>> a = r"\x31\xC0\x50\x68\x74\x76"
>>> print(a)
\x31\xC0\x50\x68\x74\x76
>>> print(a.replace('\\', ''))
x31xC0x50x68x74x76
But if you want to convert a regular string to hex-coded symbols, you can do it character by character, converting it to number ("\x31" == "1" --> 49), then to hex ("0x31"), and finally stripping the first character:
>>> a = "\x31\xC0\x50\x68\x74\x76"
>>> print(''.join([hex(ord(x))[1:] for x in a]))
x31xc0x50x68x74x76
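One caveat with hex(ord(x))[1:] is that values below 0x10 lose their leading zero ("\x05" becomes x5). A possible Python 3 variant, assuming the shellcode is held as a bytes object, formats each byte with two digits; to_x_hex is a hypothetical helper name.

```python
def to_x_hex(data):
    """Format each byte of a bytes object as 'x' plus two lowercase hex digits."""
    # Iterating over bytes in Python 3 yields integers in the range 0..255.
    return ''.join('x{:02x}'.format(byte) for byte in data)


shellcode = b"\x31\xC0\x50\x68\x74\x76"
print(to_x_hex(shellcode))  # x31xc0x50x68x74x76
```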
There are two problems in your code.
First, the simple one:
strip() only removes characters from the beginning and end of a string. So you should use replace("\\", "") instead. This replaces every backslash with "", which is the same as removing it.
The second problem is Python's handling of backslashes:
To get your example working, you need to put an 'r' in front of your string to indicate that it is a raw string: a = r"\x31\xC0\x50\x68\x74\x76". In raw strings, a backslash doesn't escape a character but just stays a backslash.
>>> r"\x31\xC0\x50\x68\x74\x76"
'\\x31\\xC0\\x50\\x68\\x74\\x76'
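The difference between strip() and replace() on a raw string can be seen in a short sketch (the variable names are only for illustration):

```python
raw = r"\x31\xC0\x50"

# strip() only trims matching characters from both ends of the string.
stripped = raw.strip("\\")
# replace() removes every occurrence, wherever it appears.
replaced = raw.replace("\\", "")

print(stripped)  # x31\xC0\x50
print(replaced)  # x31xC0x50
```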

Python Unicode Casting on Variable Bug

I've found out this weird python2 behavior related to unicode and variable:
>>> u"\u2730".encode('utf-8').encode('hex')
'e29cb0'
This is the expected result I need, but I want to dynamically control the first part (u"\u2730")
>>> type(u"\u2027")
<type 'unicode'>
Good, so the first part is treated as unicode. Now declaring a string variable and converting it to unicode:
>>> a='20'
>>> b='27'
>>> myvar='\u'+a+b.decode('utf-8')
>>> type(myvar)
<type 'unicode'>
>>> print myvar
\u2027
It seems that now I can use the variable in my original code, right?
>>> myvar.encode('utf-8').encode('hex')
'5c7532303237'
The result, as you can see, is not the original one. It seems that Python is treating 'myvar' as a string instead of unicode. Am I missing something?
Anyway, my final goal is to loop Unicode from \u0000 to \uFFFF, cast them as string and cast the string as HEX. Is there an easy way?
unichr() in Python 2 or chr() in Python 3 are the ways to construct a character from a number. \uxxxx escape codes can only be typed directly in code.
Python 2:
>>> a='20'
>>> b='27'
>>> unichr(int(a+b,16))
u'\u2027'
Python 3:
>>> a='20'
>>> b='27'
>>> chr(int(a+b,16))
'‧'
You are confusing the Unicode escape sequence with the literal characters \ and u. It's like confusing r"\n" (or "\\n") with an actual newline. You want to use codecs.raw_unicode_escape_decode, which returns a (text, length consumed) tuple, or decode the str with 'unicode_escape':
>>> import codecs
>>> a='20'
>>> b='27'
>>> myvar='\u'+a+b.decode('utf-8')
>>> myvar
u'\\u2027'
>>> codecs.raw_unicode_escape_decode(myvar)
(u'\u2027', 6)
>>> print(codecs.raw_unicode_escape_decode(myvar)[0])
‧
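For the stated goal of looping from \u0000 to \uFFFF and getting the UTF-8 bytes as hex, a Python 3 sketch might look like the following. The function names are hypothetical, and the surrogate range U+D800 to U+DFFF is skipped because those code points cannot be encoded as UTF-8.

```python
def utf8_hex(code_point):
    """Return the UTF-8 encoding of one code point as a lowercase hex string."""
    return chr(code_point).encode('utf-8').hex()


def bmp_utf8_hex():
    """Yield (code_point, hex) pairs for U+0000..U+FFFF, skipping surrogates."""
    for code_point in range(0x10000):
        if 0xD800 <= code_point <= 0xDFFF:
            continue  # lone surrogates raise UnicodeEncodeError in UTF-8
        yield code_point, utf8_hex(code_point)


print(utf8_hex(0x2730))  # e29cb0
```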

Using python 2.7 regex to replace parts of a string

I have the following line:
b = re.sub('^xMain (\S+)/y1,/y0 (\S+ )(.*)$', 'xMain \2\1\3', a)
where a is:
xMain Buchan/y1,/y0 Angus Sub1
Why does b come out as 'xMain \x02\x01\x03'?
My intention is to de-invert a name. In Regexbuddy this works OK but not in Python 2.7.
You see unprintable characters because \2\1\3 have meaning in a regular python string too, as octal escape codes:
>>> '\2'
'\x02'
>>> 'xMain \2\1\3'
'xMain \x02\x01\x03'
They never make it to the re.sub() function as written.
Use a raw string literal instead:
b = re.sub('^xMain (\S+)/y1,/y0 (\S+ )(.*)$', r'xMain \2\1\3', a)
Note the r'...' string. In a raw string literal \... escape codes are not interpreted, leaving the back-references in place for the re module to use:
>>> r'xMain \2\1\3'
'xMain \\2\\1\\3'
The alternative would be to double the backslashes, escaping the escape:
b = re.sub('^xMain (\S+)/y1,/y0 (\S+ )(.*)$', 'xMain \\2\\1\\3', a)
Either way, your replacement pattern now works as expected:
>>> import re
>>> a = 'xMain Buchan/y1,/y0 Angus Sub1'
>>> re.sub('^xMain (\S+)/y1,/y0 (\S+ )(.*)$', r'xMain \2\1\3', a)
'xMain Angus BuchanSub1'
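As a self-contained sketch for Python 3, the same substitution can be written with raw strings for both the pattern and the replacement. The name deinvert and the precompiled NAME_PATTERN are assumptions made for this example.

```python
import re

# Raw strings keep \S and the \2\1\3 back-references intact for the re module.
NAME_PATTERN = re.compile(r'^xMain (\S+)/y1,/y0 (\S+ )(.*)$')


def deinvert(line):
    """Swap the first two captured groups, keeping the rest of the line."""
    return NAME_PATTERN.sub(r'xMain \2\1\3', line)


print(deinvert('xMain Buchan/y1,/y0 Angus Sub1'))  # xMain Angus BuchanSub1
# Without the r prefix, '\2' is a single control character, not a back-reference.
print('\2' == '\x02')  # True
```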

Python strip() unicode string?

How can you use string methods like strip() on a unicode string? And can't you access characters of a unicode string like with ordinary strings? (ex: mystring[0:4])
It's working as usual, as long as they are actually unicode, not str (note: every string literal must be preceded by u, like in this example):
>>> a = u"coțofană"
>>> a
u'co\u021bofan\u0103'
>>> a[-1]
u'\u0103'
>>> a[2]
u'\u021b'
>>> a[3]
u'o'
>>> a.strip(u'ă')
u'co\u021bofan'
Maybe it's a bit late to answer to this, but if you are looking for the library function and not the instance method, you can use that as well.
Just use:
yourunicodestring = u' a unicode string with spaces all around '
unicode.strip(yourunicodestring)
In some cases it's easier to use this one, for example inside a map function like:
unicodelist=[u'a',u' a ',u' foo is just...foo ']
map (unicode.strip,unicodelist)
You can use every string operation on unicode objects. In fact, in Python 3 all str objects are Unicode.
>>> my_unicode_string = u"abcşiüğ"
>>> my_unicode_string[4]
u'i'
>>> my_unicode_string[3]
u'\u015f'
>>> print(my_unicode_string[3])
ş
>>> my_unicode_string[3:]
u'\u015fi\xfc\u011f'
>>> print(my_unicode_string[3:])
şiüğ
>>> print(my_unicode_string.strip(u"ğ"))
abcşiü
See the Python docs on Unicode strings and the following section on string methods. Unicode strings support the same methods and operations as normal ASCII strings.
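In Python 3 there is no separate unicode type, so the examples above translate to plain str. A brief sketch, reusing the sample strings from the answers:

```python
word = 'co\u021bofan\u0103'  # the same word as in the first answer, written with escapes

# Indexing, slicing and strip() all work per character, not per byte.
print(word[-1] == '\u0103')  # True
print(word.strip('\u0103') == 'co\u021bofan')  # True

# str.strip can be passed to map() as a plain function, like unicode.strip in Python 2.
unicodelist = ['a', ' a ', ' foo is just...foo ']
cleaned = list(map(str.strip, unicodelist))
print(cleaned)  # ['a', 'a', 'foo is just...foo']
```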

Python Replace \\ with \ [duplicate]

So I can't seem to figure this out... I have a string say, "a\\nb" and I want this to become "a\nb". I've tried all the following and none seem to work;
>>> a
'a\\nb'
>>> a.replace("\\","\")
File "<stdin>", line 1
a.replace("\\","\")
^
SyntaxError: EOL while scanning string literal
>>> a.replace("\\",r"\")
File "<stdin>", line 1
a.replace("\\",r"\")
^
SyntaxError: EOL while scanning string literal
>>> a.replace("\\",r"\\")
'a\\\\nb'
>>> a.replace("\\","\\")
'a\\nb'
I really don't understand why the last one works, because this works fine:
>>> a.replace("\\","%")
'a%nb'
Is there something I'm missing here?
EDIT I understand that \ is an escape character. What I'm trying to do here is turn all \\n \\t etc. into \n \t etc. and replace doesn't seem to be working the way I imagined it would.
>>> a = "a\\nb"
>>> b = "a\nb"
>>> print a
a\nb
>>> print b
a
b
>>> a.replace("\\","\\")
'a\\nb'
>>> a.replace("\\\\","\\")
'a\\nb'
I want string a to look like string b. But replace isn't replacing slashes like I thought it would.
There's no need to use replace for this.
What you have is an encoded string (using the string_escape encoding, which exists only in Python 2) and you want to decode it:
>>> s = r"Escaped\nNewline"
>>> print s
Escaped\nNewline
>>> s.decode('string_escape')
'Escaped\nNewline'
>>> print s.decode('string_escape')
Escaped
Newline
>>> "a\\nb".decode('string_escape')
'a\nb'
In Python 3:
>>> import codecs
>>> codecs.decode('\\n\\x21', 'unicode_escape')
'\n!'
You are missing that \ is the escape character.
Look here: http://docs.python.org/reference/lexical_analysis.html
at 2.4.1 "Escape Sequence"
Most importantly \n is a newline character.
And \\ is an escaped escape character :D
>>> a = 'a\\\\nb'
>>> a
'a\\\\nb'
>>> print a
a\\nb
>>> a.replace('\\\\', '\\')
'a\\nb'
>>> print a.replace('\\\\', '\\')
a\nb
r'a\\nb'.replace('\\\\', '\\')
or
'a\nb'.replace('\n', '\\n')
Your original string, a = 'a\\nb', does not actually have two '\' characters; the first one is an escape for the second. If you run print a, you'll see that you actually have only one '\' character.
>>> a = 'a\\nb'
>>> print a
a\nb
If, however, what you mean is to interpret the '\n' as a newline character, without escaping the slash, then:
>>> b = a.replace('\\n', '\n')
>>> b
'a\nb'
>>> print b
a
b
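If only a handful of escapes need to be interpreted, chained replace() calls can interfere with each other (for example, an escaped backslash followed by n). One possible alternative is a single regex pass with a lookup table. This is a sketch: the table contents and the name unescape_simple are assumptions, and unknown escapes are deliberately left untouched.

```python
import re

# Hypothetical table of the escapes to interpret; extend as needed.
_ESCAPES = {'n': '\n', 't': '\t', 'r': '\r', '\\': '\\'}


def unescape_simple(text):
    """Replace backslash + char using _ESCAPES in one left-to-right pass."""
    # Each match consumes the backslash and the character after it, so an
    # escaped backslash is never re-read as the start of another escape.
    return re.sub(r'\\(.)', lambda m: _ESCAPES.get(m.group(1), m.group(0)), text)


print(repr(unescape_simple('a\\nb')))    # 'a\nb'
print(repr(unescape_simple('a\\\\nb')))  # 'a\\nb'
```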
It's because, even in "raw" strings (=strings with an r before the starting quote(s)), an unescaped escape character cannot be the last character in the string. This should work instead:
'\\ '[0]
In Python string literals, backslash is an escape character. This is also true when the interactive prompt shows you the value of a string. It will give you the literal code representation of the string. Use the print statement to see what the string actually looks like.
This example shows the difference:
>>> '\\'
'\\'
>>> print '\\'
\
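The same distinction in Python 3, where print is a function, can be checked with len() and repr(). A small sketch:

```python
s = 'a\\nb'  # four characters: a, backslash, n, b

print(len(s))   # 4
print(repr(s))  # 'a\\nb'  (what the interactive prompt echoes)
print(s)        # a\nb     (the actual characters)
```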
In Python 3 it will be:
bytes(s, 'utf-8').decode("unicode_escape")
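One thing to watch with this approach: unicode_escape treats bytes outside the escapes as Latin-1, so non-ASCII characters that were first encoded as UTF-8 come back garbled. The sketch below shows the effect and one possible workaround using the backslashreplace error handler; the variable names are illustrative.

```python
s = 'caf\u00e9\\n'  # the characters c, a, f, é, backslash, n

# UTF-8 turns é into two bytes, which unicode_escape then reads as two Latin-1 characters.
naive = bytes(s, 'utf-8').decode('unicode_escape')

# backslashreplace first rewrites é as the ASCII escape \xe9, which decodes back correctly.
safer = s.encode('ascii', 'backslashreplace').decode('unicode_escape')

print(repr(naive))
print(repr(safer))
```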
This works on Windows with Python 3.x:
import os
str(filepath).replace(os.path.sep, '/')
Where: os.path.sep is \ on Windows and / on Linux.
Case study
Used this to prevent errors when generating a Markdown file then rendering it to pdf.
path = "C:\\Users\\Programming\\Downloads"
# str.replace returns a new string, so the result has to be assigned back.
# Replace each backslash with a backslash followed by a placeholder key
path = path.replace('\\', '\\pppyyyttthhhooonnn')
# Now replace pppyyyttthhhooonnn with a blank string
path = path.replace("pppyyyttthhhooonnn", "")
print(path)
#Output...
C:\Users\Programming\Downloads
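When the goal is forward slashes regardless of the platform the script runs on, pathlib offers an alternative to replacing os.path.sep by hand. A minimal sketch, assuming the input is a Windows-style path; to_forward_slashes is a hypothetical helper name.

```python
from pathlib import PureWindowsPath


def to_forward_slashes(windows_path):
    """Return a Windows-style path written with forward slashes."""
    # PureWindowsPath parses backslash separators on any operating system.
    return PureWindowsPath(windows_path).as_posix()


print(to_forward_slashes("C:\\Users\\Programming\\Downloads"))
# C:/Users/Programming/Downloads
```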
