Escape string and split it right after

I have the following code:
import re
key = re.escape('#one #two #some #tests #are #done')
print(key)
key = key.split()
print(key)
and the following output:
\#one\ \#two\ \#some\ \#tests\ \#are\ \#done
['\\#one\\', '\\#two\\', '\\#some\\', '\\#tests\\', '\\#are\\', '\\#done']
How come the backslashes are duplicated? I just want them once in my list, because I would like to use this list in a regular expression.
Thanks in advance! John

There is only one backslash each, but when printing the repr of the strings, they are duplicated (escaped) - just as you would need to duplicate them when using a string to build a regex. So everything is fine.
For example:
>>> len("\\")
1
>>> len("\\n")
2
>>> len("\n")
1
>>> print "\\n"
\n
>>> print "\n"
>>>

The \ character is an escape character, that is a character that changes the meaning of the subsequent character[s]. For example the "n" character is simply an "n". But if you escape it like "\n" it becomes the "newline" character. So, if you need to use a \ literal, you need to escape it with... itself: \\

The backslashes are not duplicated. To see this, print each element on its own:
for element in key:
    print(element)
And you will see this output:
\#one\
\#two\
\#some\
\#tests\
\#are\
\#done
When you printed the whole list, Python used the representation (repr) of each string: the strings are not printed as they are, but as Python expressions (notice the quotes, which are not part of the strings).
To write a backslash inside an ordinary string literal, you have to double it. That is all that is happening here.

When you convert a list to a string (e.g. to print it), it calls repr on each object contained in the list. That's why you get the quotes and extra backslashes in your second line of output. Try this:
s = "\\a string with an escaped backslash"
print s # prints: \a string with an escaped backslash
print repr(s) # prints: '\\a string with an escaped backslash'
The repr call puts quotes around the string, and shows the backslash escapes.
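
One point the answers above do not spell out: re.escape also escapes the spaces, so splitting afterwards leaves a lone trailing backslash on every element but the last, and such an element is not a valid pattern on its own. A minimal sketch, assuming the goal is an alternation pattern (the names raw, tokens and pattern are made up for this example), is to split first and escape each word afterwards:

```python
import re

raw = '#one #two #some #tests #are #done'

# Escaping first also escapes the spaces, so every piece except the
# last one ends in a lone backslash after split().
escaped_then_split = re.escape(raw).split()
first = escaped_then_split[0]
print(first)        # the actual characters of the string
print(repr(first))  # the Python-literal form, with backslashes doubled
print(len(first))   # counts each backslash once

# Splitting first and escaping each word avoids the stray backslash.
tokens = [re.escape(word) for word in raw.split()]
pattern = re.compile('|'.join(tokens))
matches = pattern.findall('we are #done with #tests')
print(matches)
```

Comparing print(first) with print(repr(first)) shows the same effect the question ran into: the doubling only exists in the repr form.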

Related

Python string.rstrip() doesn't strip specified characters

string = "hi())("
string = string.rstrip("abcdefghijklmnoprstuwxyz")
print(string)
I want to remove every letter from the given string using the rstrip method, but it does not change the string in the slightest.
Output:
'hi())('
What I want:
'())('
I know that I can use regex, but I really don't understand why it doesn't work.
Note : It is a part of the Valid Parentheses challenge on code-wars

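rstrip only removes characters from the right end of the string, and it stops at the first character that is not in the given set. Your string ends with "(", so nothing is removed. The letters are at the start of the string, so you have to use lstrip instead of rstrip: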
>>> string = "hi())("
>>> string = string.lstrip("abcdefghijklmnoprstuwxyz")
>>> string
'())('
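
As a rough illustration of the difference, the sketch below uses string.ascii_lowercase as the character set (an assumption; the question typed the letters out by hand) and also shows one way to drop letters from anywhere in a string, which none of the strip methods do:

```python
import string

s = "hi())("

# rstrip stops immediately because the last character "(" is not a letter.
right = s.rstrip(string.ascii_lowercase)

# lstrip removes the leading run of letters and stops at the first "(".
left = s.lstrip(string.ascii_lowercase)

# The strip family never touches the middle of a string; filtering
# character by character removes letters wherever they appear.
no_letters = "".join(ch for ch in "a(b)c" if not ch.isalpha())

print(right)
print(left)
print(no_letters)
```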

Python prevent decoding HEX to ASCII while removing backslashes from my Var

I want to strip some unwanted symbols from my variable. In this case the symbols are backslashes. I am using a HEX number, and as an example I will show some short, simple code below. But I don't want Python to convert my HEX to ASCII. How would I prevent this from happening? I have some long shellcode for asm to work with later, and removing \ by hand is a long process. I know there are other ways, like using echo -e "x\x\x\x" > output etc., but my whole script will be written in Python.
Thanks
>>> a = "\x31\xC0\x50\x68\x74\x76"
>>> b = a.strip("\\")
>>> print b
1�Phtv
>>> a = "\x31\x32\x33\x34\x35\x36"
>>> b = a.strip("\\")
>>> print b
123456
At the end I would like it to print my var:
>>> print b
x31x32x33x34x35x36

There are no backslashes in your variable:
>>> a = "\x31\xC0\x50\x68\x74\x76"
>>> print(a)
1ÀPhtv
Take newline for example: writing "\n" in Python gives you a string with one character (a newline) and no backslashes. See the string literals docs for the full syntax of these.
Now, if you really want to write a string with such backslashes, you can do it with the r prefix:
>>> a = r"\x31\xC0\x50\x68\x74\x76"
>>> print(a)
\x31\xC0\x50\x68\x74\x76
>>> print(a.replace('\\', ''))
x31xC0x50x68x74x76
But if you want to convert a regular string to hex-coded symbols, you can do it character by character, converting it to number ("\x31" == "1" --> 49), then to hex ("0x31"), and finally stripping the first character:
>>> a = "\x31\xC0\x50\x68\x74\x76"
>>> print(''.join([hex(ord(x))[1:] for x in a]))
x31xc0x50x68x74x76
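
One caveat with the hex(ord(x))[1:] approach is that hex() does not pad to two digits, so a byte such as 0x05 would come out as x5. A minimal sketch for Python 3, assuming the shellcode is held in a bytes object (the variable names are made up for this example):

```python
shellcode = b"\x31\xC0\x50\x68\x74\x76"

# Iterating over bytes yields integers; {:02x} always pads to two hex digits.
with_x = "".join("x{:02x}".format(byte) for byte in shellcode)

# bytes.hex() gives the same digits without the "x" separators.
plain = shellcode.hex()

# Padding matters for small values: format() keeps the leading zero,
# while slicing the output of hex() drops it.
small = "x{:02x}".format(0x05)
unpadded = hex(0x05)[1:]

print(with_x)
print(plain)
print(small, unpadded)
```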

There are two problems in your code.
First the simple one:
strip() only removes characters from the start and end of the string, never from the middle. To remove every backslash, use replace("\\", ""), which replaces each backslash with the empty string.
The second problem is Python's behavior with backslashes:
To get your example working you need to put an r in front of the string literal to mark it as a raw string: a = r"\x31\xC0\x50\x68\x74\x76". In a raw string, a backslash does not start an escape sequence; it simply stays a backslash.
>>> r"\x31\xC0\x50\x68\x74\x76"
'\\x31\\xC0\\x50\\x68\\x74\\x76'

Python replace backward (\) with forward (/)

I am trying to replace \ with /. However, I'm having no success.
The following snippet shows what I am trying to achieve:
string = "//SQL-SERVER/Lacie/City of X/Linservo\171002"
print string.replace("\\","/")
Output:
//SQL-SERVER/Lacie/City of X/Linservoy002
Desired output:
//SQL-SERVER/Lacie/City of X/Linservo/171002

You need to escape "\" with an extra "\".
>>> string = "//SQL-SERVER/Lacie/City of X/Linservo\\171002"
>>> string
'//SQL-SERVER/Lacie/City of X/Linservo\\171002'
>>> print string.replace("\\","/")
//SQL-SERVER/Lacie/City of X/Linservo/171002

Use a raw string so that \171 is not interpreted as an escape sequence:
string = r"//SQL-SERVER/Lacie/City of X/Linservo\171002"
print string.replace("\\","/")
Output:
//SQL-SERVER/Lacie/City of X/Linservo/171002

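There are two things to watch: the string definition and the replace call.
In your string definition, \171 is an octal escape that produces the character with octal value 171, which is y.
In the replace call, a single backslash directly before the closing quote would escape that quote, so the backslash has to be doubled.
You should escape the backslashes in both places: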
string = "//SQL-SERVER/Lacie/City of X/Linservo\\171002"
string.replace("\\","/")

You can simply use .replace in Python, or if you prefer you can use a regex:
import re
string = r"//SQL-SERVER/Lacie/City of X/Linservo\171002"
pattern=r'[\\]'
replaced_string=re.sub(pattern,"/",string)
print(replaced_string)
Your original question shows "X/Linservo\171002". Here \171 is an octal escape sequence, so Python turns \171 into "y". You can try this in the Python interpreter:
In[2]: print("\171")
y
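
The points made in the answers above can be checked in a few lines. This is only a sketch; the variable names are chosen for illustration:

```python
# "\171" is an octal escape: 0o171 == 121, which is the code point of "y".
octal_y = "\171"
print(octal_y, ord(octal_y), 0o171)

# In a raw string the backslash stays a backslash, so replace() can find it.
path = r"//SQL-SERVER/Lacie/City of X/Linservo\171002"
fixed = path.replace("\\", "/")
print(fixed)
```

Note that the argument to replace still has to be written as "\\", because even a raw string literal cannot end in a single backslash.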

I want to replace single quotes with double quotes in a list

So I am making a program that takes a text file, breaks it into words, then writes the list to a new text file.
The issue I am having is I need the strings in the list to be with double quotes not single quotes.
For example
I get this ['dog','cat','fish'] when I want this ["dog","cat","fish"]
Here is my code
with open('input.txt') as f:
    file = f.readlines()
nonewline = []
for x in file:
    nonewline.append(x[:-1])
words = []
for x in nonewline:
    words = words + x.split()
textfile = open('output.txt','w')
textfile.write(str(words))
I am new to python and haven't found anything about this.
Anyone know how to solve this?
[Edit: I forgot to mention that i was using the output in an arduino project that required the list to have double quotes.]

You cannot change how str works for a list.
How about using the JSON format, which uses " for strings?
>>> animals = ['dog','cat','fish']
>>> print(str(animals))
['dog', 'cat', 'fish']
>>> import json
>>> print(json.dumps(animals))
["dog", "cat", "fish"]
import json
...
textfile.write(json.dumps(words))
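
One reason to prefer the JSON route over a plain textual swap of quote characters is a word that itself contains an apostrophe. The sketch below uses a made-up word list to show the difference:

```python
import json

words = ["dog", "cat", "it's"]

# str() picks the quote style per element, so a blanket replacement of
# ' with " also rewrites the apostrophe inside the word.
naive = str(words).replace("'", '"')

# json.dumps always uses double quotes and leaves the apostrophe alone.
as_json = json.dumps(words)

print(naive)
print(as_json)

# The JSON text can be parsed back into the original list.
round_trip = json.loads(as_json)
```

For word lists that never contain quote characters both approaches give the same text, so this only matters when the input is not under your control.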

Most likely you'll just want to replace the single quotes with double quotes in the output:
str(words).replace("'", '"')
You could also extend Python's str type and wrap your strings with the new type changing the __repr__() method to use double quotes instead of single. It's better to be simpler and more explicit with the code above, though.
class str2(str):
    def __repr__(self):
        # Allow str.__repr__() to do the hard work, then
        # remove the outer two characters, single quotes,
        # and replace them with double quotes.
        return ''.join(('"', super().__repr__()[1:-1], '"'))
>>> "apple"
'apple'
>>> class str2(str):
...     def __repr__(self):
...         return ''.join(('"', super().__repr__()[1:-1], '"'))
...
>>> str2("apple")
"apple"
>>> str2('apple')
"apple"

In Python, double quotes and single quotes are equivalent. There is no difference between them, so within Python there is no point in replacing one with the other:
2.4.1. String and Bytes literals
...In plain English: Both types of literals can be enclosed in matching single quotes (') or double quotes ("). They can also be enclosed in matching groups of three single or double quotes (these are generally referred to as triple-quoted strings). The backslash (\) character is used to escape characters that otherwise have a special meaning, such as newline, backslash itself, or the quote character...
"The issue I am having is I need the strings in the list to be with double quotes not single quotes." - Then you need to make your program accept single quotes, not trying to replace single quotes with double quotes.

Python issue with incorrectly formatted strings that contain \x

At some point our python script receives string like that:
In [1]: ab = 'asd\xeffe\ctive'
In [2]: print ab
asd�fe\ctve \ \\ \\\k\\\
The data is damaged: we need the \x escape to come out as a literal \x instead of being interpreted, while \c has no special meaning in a string and must stay intact.
So far the closest solution I have found is something like this:
In [1]: ab = 'asd\xeffe\ctve \\ \\\\ \\\\\\k\\\\\\'
In [2]: print ab.encode('string-escape').replace('\\\\', '\\').replace("\\'", "'")
asd\xeffe\ctve \ \\ \\\k\\\
The output is taken from IPython. I assumed that ab is a byte string, not a unicode string; to handle both cases we would have to do something like this:
def escape_string(s):
    if isinstance(s, str):
        s = s.encode('string-escape').replace('\\\\', '\\').replace("\\'", "'")
    elif isinstance(s, unicode):
        s = s.encode('unicode-escape').replace('\\\\', '\\').replace("\\'", "'")
    return s

\xhh is an escape sequence, and \x is seen as the start of this escape.

'\\' is the same as '\x5c'. It is just two different ways to write the backslash character as a Python string literal.
These literal strings: r'\c', '\\c', '\x5cc', '\x5c\x63' are identical str objects in memory.
'\xef' is a single byte (239 as an integer), but r'\xef' (same as '\\xef') is a 4-byte string: '\x5c\x78\x65\x66'.
If s[0] returns '\xef' then it is what s object actually contains. If it is wrong then fix the source of the data.
Note: string-escape also escapes \n and the like:
>>> print u'''\xef\c\\\N{SNOWMAN}"'\
... ☃\u2603\"\'\n\xa0'''.encode('unicode-escape')
\xef\\c\\\u2603"'\u2603\u2603"'\n\xa0
>>> print b'''\xef\c\\\N{SNOWMAN}"'\
... ☃\u2603\"\'\n\xa0'''.encode('string-escape')
\xef\\c\\\\N{SNOWMAN}"\'\xe2\x98\x83\\u2603"\'\n\xa0
backslashreplace is used only on characters that cause UnicodeEncodeError:
>>> print u'''\xef\c\\\N{SNOWMAN}"'\
... ☃\u2603\"\'\n\xa0'''
ï\c\☃"'☃☃"'
>>> print b'''\xef\c\\\N{SNOWMAN}"'\
... ☃\u2603\"\'\n\xa0'''
�\c\\N{SNOWMAN}"'☃\u2603"'
�
>>> print u'''\xef\c\\\N{SNOWMAN}"'\
... ☃\u2603\"\'\n\xa0'''.encode('ascii', 'backslashreplace')
\xef\c\\u2603"'\u2603\u2603"'
\xa0
>>> print b'''\xef\c\\\N{SNOWMAN}"'\
... ☃\u2603\"\'\n\xa0'''.decode('latin1').encode('ascii', 'backslashreplace')
\xef\c\\N{SNOWMAN}"'\xe2\x98\x83\u2603"'
\xa0

Backslashes introduce "escape sequences". \x specifically allows you to specify a byte, which is given as two hexadecimal digits after the x. ef are two hexadecimal digits, hence you get no error. Double the backslash to escape it, or use a raw string r"\xeffective".
Edit: While the Python console may show you '\\', this is precisely what you expect. You just say you expect something else because you confuse the string and its representation. It's a string containing a single backslash. If you were to output it with print, you'd see a single backslash.
But the string literal '\' is ill-formed (not closed because \' is an apostrophe, not a backslash and end-of-string-literal), so repr, which formats the results at the interactive shell, does not produce it. Instead it produces a string literal which you could paste into Python source code and get the same string object. For example, len('\\') == 1.

The \x escape sequence introduces a character given by its two-digit hex code, and ef is being interpreted as that hex code. You can prevent this by adding an additional \, or else by making it a raw string (r'\xeffective').
>>> r'\xeffective'[0]
'\\'
EDIT: You could convert an existing string using the following hack:
>>> a = '\xeffective'
>>> b = repr(a).strip("'")
>>> b
'\\xeffective'
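
The answers above use Python 2 codecs such as string-escape. As a rough Python 3 counterpart, and only as a sketch under the assumption that the damaged value is a str, the unicode_escape codec turns the interpreted character back into its \xef spelling. Like string-escape, it also doubles existing backslashes and escapes newlines, so it is not a general repair tool:

```python
damaged = 'asd\xeffective'    # \xef was interpreted as one character
literal = r'asd\xeffective'   # raw string: backslash, x, e, f kept as typed

print(len(damaged), len(literal))

# unicode_escape writes non-ASCII characters back as escape sequences.
recovered = damaged.encode('unicode_escape').decode('ascii')
print(recovered)
print(recovered == literal)
```

As the earlier answer says, if the data arrives already damaged, fixing the source that produced the literal is the more reliable option.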
