Python prevent decoding HEX to ASCII while removing backslashes from my Var - python

I want to strip some unwanted symbols from my variable. In this case the symbols are backslashes. I am using a HEX number, and as an example I will show some short, simple code down below. But I don't want Python to convert my HEX to ASCII; how would I prevent this from happening? I have some long shellcode for asm to work with later, and removing \ by hand is a long process. I know there are different ways, like using echo -e "x\x\x\x" > output etc., but my whole script will be written in Python.
Thanks
>>> a = "\x31\xC0\x50\x68\x74\x76"
>>> b = a.strip("\\")
>>> print b
1�Phtv
>>> a = "\x31\x32\x33\x34\x35\x36"
>>> b = a.strip("\\")
>>> print b
123456
At the end I would like it to print my var:
>>> print b
x31x32x33x34x35x36

There are no backslashes in your variable:
>>> a = "\x31\xC0\x50\x68\x74\x76"
>>> print(a)
1ÀPhtv
Take newline for example: writing "\n" in Python will give you string with one character -- newline -- and no backslashes. See string literals docs for full syntax of these.
Now, if you really want to write string with such backslashes, you can do it with r modifier:
>>> a = r"\x31\xC0\x50\x68\x74\x76"
>>> print(a)
\x31\xC0\x50\x68\x74\x76
>>> print(a.replace('\\', ''))
x31xC0x50x68x74x76
But if you want to convert a regular string to hex-coded symbols, you can do it character by character, converting it to number ("\x31" == "1" --> 49), then to hex ("0x31"), and finally stripping the first character:
>>> a = "\x31\xC0\x50\x68\x74\x76"
>>> print(''.join([hex(ord(x))[1:] for x in a]))
x31xc0x50x68x74x76
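One caveat with hex(ord(x))[1:]: hex() does not zero-pad, so a byte such as "\x05" comes out as x5 rather than x05. A minimal Python 3 sketch that pads every byte to two digits, assuming the shellcode is held in a bytes object (the name to_x_notation is made up for illustration):

```python
def to_x_notation(data: bytes) -> str:
    """Render each byte as 'x' followed by exactly two lowercase hex digits."""
    return "".join(f"x{byte:02x}" for byte in data)


shellcode = b"\x31\xC0\x50\x68\x74\x76"
print(to_x_notation(shellcode))   # x31xc0x50x68x74x76
print(to_x_notation(b"\x05\n"))   # x05x0a
```

Iterating over a bytes object yields integers directly, so no ord() call is needed.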

There are two problems in your code.
First the simple one:
strip() only removes characters from the beginning and the end of a string, never from the middle. So you should use replace("\\", ""). This will replace every backslash with "", which is the same as removing it.
The second problem is Python's behavior with backslashes:
To get your example working you need to put an 'r' in front of your string to indicate that it is a raw string: a = r"\x31\xC0\x50\x68\x74\x76". In raw strings, a backslash doesn't escape a character but just stays a backslash.
>>> r"\x31\xC0\x50\x68\x74\x76"
'\\x31\\xC0\\x50\\x68\\x74\\x76'
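A short sketch of the difference between the two methods, using a shortened raw string as sample data: strip() only trims matching characters from both ends, while replace() removes every occurrence.

```python
raw = r"\x31\xC0\x50"  # raw literal: the backslashes are real characters

stripped = raw.strip("\\")        # only the leading backslash is removed
replaced = raw.replace("\\", "")  # every backslash is removed

print(stripped)  # x31\xC0\x50
print(replaced)  # x31xC0x50
```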

Related

How to properly unescape select sequences in python

I'm escaping certain characters in strings (e.g., \n, \\) with double backslashes, like this: text.replace("\\", "\\\\").replace("\n", "\\n")
Naïvely, I tried to unescape using: text.replace("\\n", "\n").replace("\\\\", "\\")
However, this fails on strings like:
>>> text = "\\\n\\n"
>>> print(text)
\
\n
>>> etext = text.replace("\\", "\\\\").replace("\n", "\\n")
>>> print(etext)
\\\n\\n
>>> ftext = etext.replace("\\n", "\n").replace("\\\\", "\\")
>>> print(ftext)
\
\
>>>
As you can see the original string doesn't survive the round trip.
Even changing the order of replaces around would not solve the issue.
The only way to correctly unescape is to do the replacements in one go.
Python's str has maketrans and translate to achieve a similar effect
but they only work on single characters as keys.
re.sub also does not work since the substitution would need to distinguish the case somehow. (\1 does not work since if the second character is n we want the newline character as output instead of n)
A correct (but slow) solution would be:
def unescape(text: str) -> str:
    res: list[str] = []
    in_escape = False
    for c in text:
        if in_escape:
            in_escape = False
            if c == "\\":
                res.append("\\")
                continue
            if c == "n":
                res.append("\n")
                continue
        if c == "\\":
            in_escape = True
            continue
        res.append(c)
    return "".join(res)
>>> text = "\\\n\\n"
>>> print(text)
\
\n
>>> etext = text.replace("\\", "\\\\").replace("\n", "\\n")
>>> print(etext)
\\\n\\n
>>> print(unescape(etext))
\
\n
>>>
Is there a proper/canonical/fast way of escaping (only certain sequences in) strings?
(EDIT: To answer why a subset of escapes is preferred: in my case other escapes are not needed, and it's easy to permanently corrupt your data by escaping things that don't need it. For example, off the top of my head I can think of three different escape functions in Python alone that all escape completely different subsets of characters. Even the re.escape function changes what it escapes between Python versions. Most of the time an unescape function can handle a wider set of escape sequences than its corresponding escape function, but this is not always the case. None of this even takes into account trying to load the escaped data in a different language.)
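One possible single-pass approach, offered as a sketch rather than a canonical answer: re.sub accepts a function as the replacement, so each matched escape sequence can be looked up in a table. The names escape, unescape, _UNESCAPES and _ESCAPE_RE are made up for this example, and only the two sequences from the question are handled.

```python
import re

# Maps each two-character escape sequence to the character it stands for.
_UNESCAPES = {"\\\\": "\\", "\\n": "\n"}
# Matches a backslash followed by either a backslash or the letter n.
_ESCAPE_RE = re.compile(r"\\[\\n]")


def escape(text: str) -> str:
    # Backslashes must be doubled first, otherwise the ones added for \n get doubled too.
    return text.replace("\\", "\\\\").replace("\n", "\\n")


def unescape(text: str) -> str:
    # The regex scans left to right, so every sequence is decoded exactly once.
    return _ESCAPE_RE.sub(lambda m: _UNESCAPES[m.group(0)], text)


text = "\\\n\\n"
print(unescape(escape(text)) == text)  # True
```

Unlike the loop version, this sketch leaves an unknown sequence such as \t untouched instead of dropping its backslash; whether that is preferable depends on how strictly the input should be validated.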

python dict append to list error(value with \)

I got a problem when appending a dict to list
data = []
path = "abc\cde"
data.append({"image": path})
print(data)
When I append the path to the image, the output of data is [{'image': 'abc\\cde'}].
It contains two \ instead of one.
When typing text that contains backslashes, use raw strings to avoid having some sequences be interpreted as special characters, e.g. "\n" in a Python string is a single character that represents a new line.
>>> data = []
>>> data.append({"image": r'abc\cde'})
>>> data
[{'image': 'abc\\cde'}]
>>>
>>> data.append({"image": r'abc\nasdf'})
>>> data
[{'image': 'abc\\cde'}, {'image': 'abc\\nasdf'}]
When you see two backslashes, that is because it's how Python repr-esents a string with backslashes safely; it's not the actual content.
>>> r'abc\cde'
'abc\\cde'
>>> r'abc\nasdf'
'abc\\nasdf'
In this way a text with special chars can be visualized in a compact way. If you want to see what the actual content of those strings looks like, print them:
>>> print(r'abc\cde')
abc\cde
>>> print(r'abc\nasdf')
abc\nasdf
>>> print('abc\cde')
abc\cde
>>> print('abc\nasdf')
abc
asdf
Raw strings only matter for strings you type manually; they are a way to tell Python how to interpret certain characters in the source code. If the string comes from e.g. a file or a stream, the "meaning" of its characters is already defined.
Regarding your question on how to concatenate a raw string (again, a raw string is a normal string) with a variable, there's no difference.
>>> with_slash = r'abc\cde'
>>> wout_slash = 'asdf'
>>> with_slash + wout_slash
'abc\\cdeasdf'
>>> print(with_slash + wout_slash)
abc\cdeasdf
\ is an escape character. It allows you to use special symbols, for example a new line \n or tab \t. If you want a string to contain a literal \, make sure that you put another \ before it.
In your case, Python understands that you meant "abc\\cde" even though you did not escape \. If you had abc\nde, the result would be abc<line_break>de.
>>> a = "abc\\cde"
>>> a
'abc\\cde'
>>> list(a)
['a', 'b', 'c', '\\', 'c', 'd', 'e']
As you see, even though it looks like a double backslash, it is just one \ character.
More info: https://www.w3schools.com/python/gloss_python_escape_characters.asp
The additional backslash is Python escaping the single backslash. The actual value of your path string is unchanged, as you can see when the value of data[0]['image'] is printed.
data = []
path = 'abc\cde'
data.append({"image": path})
# output: abc\cde
print(data[0]['image'])
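A small check, as a sketch, that the doubled backslash exists only in the repr and not in the stored value:

```python
data = []
path = r"abc\cde"             # raw literal: exactly one backslash
data.append({"image": path})

print(len(path))              # 7 characters: a b c \ c d e
print(path.count("\\"))       # 1
print(data)                   # printing a list uses repr: [{'image': 'abc\\cde'}]
print(data[0]["image"])       # abc\cde
```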

how to compare backslash in python

I have a set of strings that are read from a file say ['\x1\p1', '\x2\p2', '\x3\p3', ... etc.].
When I read them into variables and print them the strings displayed as ['\\x1\\p1', '\\x2\\p2', '\\x3\\p3', ... etc.]. I understand that the variable is represented as '\x1\p1', ... etc. internally, but when it is displayed it is displayed with double slash.
but now I want to search and replace the elements of this list in the sentence, i.e say if \x1\p1 is in the sentence "How are you doing \x1\p1" then replace '\x1\p1' with 'Y'. But the replace method does not work in this case! wonder why?
Let me explain further:
my text file (codes.txt) has entries \xs1\x32, \xs2\x54 delimited by new line. so when I read it using
with open('codes') as codes:
    code_list = codes.readlines()
next, I do lets say code_list_element_1 = code_list[1].rstrip()
when I print code_list_element_1, it displays as '\\xs1\\x32'
Next, let my target string be target_string = 'Hi! my name is \xs1\x32'
now I want to replace code_list_element_1 which is supposed to be \xs1\x32 in the target_string with say 'Y'
So, I tried code_list_element_1 in target_string. I get False
Next, instead of reading the codes from a text file I initialized a variable find_me = '\xs1\x32'
now, I try find_me in target_string. I get True
and hence target_string.replace(find_me,"Y") displays what I want: "Hi! my name is Y"
You are looking at a string representation that can be pasted back into Python; the backslashes are doubled to make sure the values are not interpreted as escape sequences (such as \n, meaning a newline, or \xfe, meaning the byte with value 254, hex FE).
If you are building new string values, you also need to use those doubled backslashes to prevent Python from seeing escape sequences where there are none, or use raw string literals:
>>> '\\x1\\p1'
'\\x1\\p1'
>>> r'\x1\p1'
'\\x1\\p1'
For this specific example, not handling the backslashes properly actually results in an exception:
>>> '\x1\p1'
ValueError: invalid \x escape
because Python expects to find two hex digits after a \x escape.
Raw strings (those prefixed by r) are very useful for backslash-itis.
In [9]: a=r"How are you doing \x1\p1"
In [10]: a
Out[10]: 'How are you doing \\x1\\p1'
In [11]: a.replace(r'\x1\p1', 'Y')
Out[11]: 'How are you doing Y'
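To tie this back to the file-reading case, here is a hedged sketch that uses io.StringIO as a stand-in for codes.txt (the variable fake_file is made up for illustration). Strings read from a file already contain single literal backslashes, so the lookup succeeds once the target string is written as a raw literal:

```python
import io

# Stand-in for codes.txt: each line holds the characters \ x s 1 \ x 3 2 literally.
fake_file = io.StringIO("\\xs1\\x32\n\\xs2\\x54\n")
code_list = [line.rstrip("\n") for line in fake_file]

# A raw literal keeps the backslashes, so it matches what came from the file.
target_string = r"Hi! my name is \xs1\x32"

print(code_list[0] in target_string)             # True
print(target_string.replace(code_list[0], "Y"))  # Hi! my name is Y
```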

Python issue with incorrectly formatted strings that contain \x

At some point our python script receives string like that:
In [1]: ab = 'asd\xeffe\ctive'
In [2]: print ab
asd�fe\ctve \ \\ \\\k\\\
The data is damaged: we need to escape \x so that it is interpreted as a literal \x, but \c has no special meaning in a string and thus must stay intact.
So far the closest solution I found is do something like:
In [1]: ab = 'asd\xeffe\ctve \\ \\\\ \\\\\\k\\\\\\'
In [2]: print ab.encode('string-escape').replace('\\\\', '\\').replace("\\'", "'")
asd\xeffe\ctve \ \\ \\\k\\\
Output taken from IPython. I assumed that ab is a str, not a unicode string (in the latter case we would have to do something like this):
def escape_string(s):
    if isinstance(s, str):
        s = s.encode('string-escape').replace('\\\\', '\\').replace("\\'", "'")
    elif isinstance(s, unicode):
        s = s.encode('unicode-escape').replace('\\\\', '\\').replace("\\'", "'")
    return s
\xhh is an escape sequence, and \x is seen as the start of this escape.
'\\' is the same as '\x5c'. It is just two different ways to write the backslash character as a Python string literal.
These literal strings: r'\c', '\\c', '\x5cc', '\x5c\x63' are identical str objects in memory.
'\xef' is a single byte (239 as an integer), but r'\xef' (same as '\\xef') is a 4-byte string: '\x5c\x78\x65\x66'.
If s[0] returns '\xef' then it is what s object actually contains. If it is wrong then fix the source of the data.
Note: string-escape also escapes \n and the like:
>>> print u'''\xef\c\\\N{SNOWMAN}"'\
... ☃\u2603\"\'\n\xa0'''.encode('unicode-escape')
\xef\\c\\\u2603"'\u2603\u2603"'\n\xa0
>>> print b'''\xef\c\\\N{SNOWMAN}"'\
... ☃\u2603\"\'\n\xa0'''.encode('string-escape')
\xef\\c\\\\N{SNOWMAN}"\'\xe2\x98\x83\\u2603"\'\n\xa0
backslashreplace is used only on characters that cause UnicodeEncodeError:
>>> print u'''\xef\c\\\N{SNOWMAN}"'\
... ☃\u2603\"\'\n\xa0'''
ï\c\☃"'☃☃"'
>>> print b'''\xef\c\\\N{SNOWMAN}"'\
... ☃\u2603\"\'\n\xa0'''
�\c\\N{SNOWMAN}"'☃\u2603"'
�
>>> print u'''\xef\c\\\N{SNOWMAN}"'\
... ☃\u2603\"\'\n\xa0'''.encode('ascii', 'backslashreplace')
\xef\c\\u2603"'\u2603\u2603"'
\xa0
>>> print b'''\xef\c\\\N{SNOWMAN}"'\
... ☃\u2603\"\'\n\xa0'''.decode('latin1').encode('ascii', 'backslashreplace')
\xef\c\\N{SNOWMAN}"'\xe2\x98\x83\u2603"'
\xa0
Backslashes introduce "escape sequences". \x specifically allows you to specify a byte, which is given as two hexadecimal digits after the x. ef are two hexadecimal digits, hence you get no error. Double the backslash to escape it, or use a raw string r"\xeffective".
Edit: While the Python console may show you '\\', this is precisely what you expect. You just say you expect something else because you confuse the string and its representation. It's a string containing a single backslash. If you were to output it with print, you'd see a single backslash.
But the string literal '\' is ill-formed (not closed because \' is an apostrophe, not a backslash and end-of-string-literal), so repr, which formats the results at the interactive shell, does not produce it. Instead it produces a string literal which you could paste into Python source code and get the same string object. For example, len('\\') == 1.
The \x escape sequence signifies a Unicode character in the string, and ef is being interpreted as the hex code. You can sanitize the string by adding an additional \, or else make it a raw string (r'\xeffective').
>>> r'\xeffective'[0]
'\\'
EDIT: You could convert an existing string using the following hack:
>>> a = '\xeffective'
>>> b = repr(a).strip("'")
>>> b
'\\xeffective'
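In Python 3 the string-escape codec no longer exists. A hedged sketch of a replacement that rewrites only non-printable and non-ASCII characters and leaves existing backslashes alone; escape_non_ascii is a made-up name, and because backslashes are not doubled the result cannot always be converted back unambiguously:

```python
def escape_non_ascii(text: str) -> str:
    """Replace everything outside printable ASCII with a \\x, \\u or \\U escape."""
    out = []
    for ch in text:
        code = ord(ch)
        if 32 <= code < 127:
            out.append(ch)                 # printable ASCII, backslash included, is kept
        elif code <= 0xFF:
            out.append(f"\\x{code:02x}")
        elif code <= 0xFFFF:
            out.append(f"\\u{code:04x}")
        else:
            out.append(f"\\U{code:08x}")
    return "".join(out)


damaged = "asd\xeffe\\ctive"               # \xef was parsed as one character
print(escape_non_ascii(damaged))           # asd\xeffe\ctive
```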

Escape string and split it right after

i've the following code:
import re
key = re.escape('#one #two #some #tests #are #done')
print(key)
key = key.split()
print(key)
and the following output:
\#one\ \#two\ \#some\ \#tests\ \#are\ \#done
['\\#one\\', '\\#two\\', '\\#some\\', '\\#tests\\', '\\#are\\', '\\#done']
How come the backslashes are duplicated? I just want them once in my list, because i would like to use this list in a regular expression.
Thanks in advance! John
There is only one backslash each, but when printing the repr of the strings, they are duplicated (escaped) - just as you would need to duplicate them when using a string to build a regex. So everything is fine.
For example:
>>> len("\\")
1
>>> len("\\n")
2
>>> len("\n")
1
>>> print "\\n"
\n
>>> print "\n"
>>>
The \ character is an escape character, that is a character that changes the meaning of the subsequent character[s]. For example the "n" character is simply an "n". But if you escape it like "\n" it becomes the "newline" character. So, if you need to use a \ literal, you need to escape it with... itself: \\
The backslashes are not duplicated. To realize this, try to do:
for element in key:
    print element
And you will see this output:
\#one\
\#two\
\#some\
\#tests\
\#are\
\#done
When you printed the whole list, Python used the representation in which strings are shown not as they are but as Python expressions (notice the quotes; they are not part of the strings).
To write a string literal containing a backslash, you need to double that backslash. That is it.
When you convert a list to a string (e.g. to print it), it calls repr on each object contained in the list. That's why you get the quotes and extra backslashes in your second line of output. Try this:
s = "\\a string with an escaped backslash"
print s # prints: \a string with an escaped backslash
print repr(s) # prints: '\\a string with an escaped backslash'
The repr call puts quotes around the string, and shows the backslash escapes.
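If the goal is a regular expression that matches any of the words, one option (a sketch, not the only way) is to split first and escape each word separately, so that no escaped space ends up attached to a word:

```python
import re

words = "#one #two #some #tests #are #done".split()
escaped = [re.escape(word) for word in words]  # each item holds single backslashes
pattern = re.compile("|".join(escaped))        # alternation of the escaped words

print(pattern.findall("we ran #two of the #tests"))  # ['#two', '#tests']
```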
