Retain the "\n"

>>> t = "first%s\n"
>>> t = t %("second")
>>> print t
firstsecond
Is there any way I could retain the "\n" at the end and get "firstsecond\n" as the output?

You need to escape the backslash:
>>> t = "first%s\\n"
>>> t = t %("second")
>>> print t
or use raw strings:
>>> t = r"first%s\n"
>>> t = t %("second")
>>> print t

print "firstsecond\n" displays firstsecond and then moves the cursor to the next line, so you never see a backslash followed by n. Printing a string interprets special characters such as \n.
repr() shows the escaped form instead, so print repr("firstsecond\n") displays 'firstsecond\n' (with the quotes).
So which do you want?
t being "firstsecond\n", with repr(t) displayed to verify that it contains the \n character?
Or t being "firstsecond\\n", so that print t displays firstsecond\n?
See:
t = "first%s\n"
print repr(t),len(t)
t = t %("second")
print repr(t),len(t)
print '-------------------'
t = "first%s\\n" # the same as r"first%s\n"
print repr(t),len(t)
t = t %("second")
print repr(t),len(t)
Result:
'first%s\n' 8
'firstsecond\n' 12
-------------------
'first%s\\n' 9
'firstsecond\\n' 13
But don't misread this. In a display such as
'first%s\\n'
the two backslashes \\ stand for ONE backslash in the value. They are doubled only in the repr output, which shows the backslash in escaped form. Otherwise it would be impossible to tell the two characters \ followed by n apart from the single newline character \n.
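A minimal sketch of the same distinction, written in Python 3 syntax (print as a function), which is an assumption since the examples above use Python 2. The variable names are illustrative only. It uses len() and repr() to show that the escaped version holds one more character than the newline version, and that the raw literal has the same value as the escaped one.

```python
# Python 3 syntax; the variable names are illustrative only.
with_newline = "first%s\n" % "second"      # ends with one newline character
with_backslash = "first%s\\n" % "second"   # ends with a backslash and the letter n
raw_version = r"first%s\n" % "second"      # raw literal: same value as with_backslash

print(repr(with_newline), len(with_newline))
print(repr(with_backslash), len(with_backslash))
print(with_backslash)   # shows a visible backslash followed by n
```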

Depending on what exactly you need, you may also want to look at repr().
>>> s = "firstsecond\n"
>>> print repr(s)
'firstsecond\n'

Related

Python prevent decoding HEX to ASCII while removing backslashes from my Var

I want to strip some unwanted symbols from my variable, in this case backslashes. I am working with hex values; a short, simple example is shown below. I don't want Python to convert my hex to ASCII, so how can I prevent that? I have some long shellcode for asm to work with later, and removing \ by hand takes a long time. I know there are other ways, like using echo -e "x\x\x\x" > output, but my whole script will be written in Python.
Thanks
>>> a = "\x31\xC0\x50\x68\x74\x76"
>>> b = a.strip("\\")
>>> print b
1�Phtv
>>> a = "\x31\x32\x33\x34\x35\x36"
>>> b = a.strip("\\")
>>> print b
123456
At the end I would like it to print my var:
>>> print b
x31x32x33x34x35x36

There are no backslashes in your variable:
>>> a = "\x31\xC0\x50\x68\x74\x76"
>>> print(a)
1ÀPhtv
Take newline for example: writing "\n" in Python gives you a string with one character (a newline) and no backslashes. See the string literals documentation for the full syntax.
Now, if you really want to write a string containing such backslashes, you can use the r prefix:
>>> a = r"\x31\xC0\x50\x68\x74\x76"
>>> print(a)
\x31\xC0\x50\x68\x74\x76
>>> print(a.replace('\\', ''))
x31xC0x50x68x74x76
But if you want to convert a regular string to hex-coded symbols, you can do it character by character, converting it to number ("\x31" == "1" --> 49), then to hex ("0x31"), and finally stripping the first character:
>>> a = "\x31\xC0\x50\x68\x74\x76"
>>> print(''.join([hex(ord(x))[1:] for x in a]))
x31xc0x50x68x74x76
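One caveat with hex(ord(x))[1:] is that it does not zero-pad, so a character such as "\x05" comes out as x5. Below is a hedged alternative sketch in Python 3 syntax, assuming the shellcode is held as a bytes object. The helper name to_hex_text is hypothetical and not part of any library.

```python
def to_hex_text(data):
    """Return the bytes as text like 'x31xc0', two hex digits per byte."""
    # Iterating over a bytes object yields integers in Python 3.
    return "".join("x{:02x}".format(byte) for byte in data)


shellcode = b"\x31\xC0\x50\x68\x74\x76"
print(to_hex_text(shellcode))
print(to_hex_text(b"\x05"))
```

The {:02x} format keeps every byte at two hex digits, which matters if the text is later turned back into escape sequences.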

There are two problems in your code.
First, the simple one:
strip() only removes characters from the beginning and end of a string, so you should use replace("\\", ""). This replaces every backslash with "", which is the same as removing it.
The second problem is Python's handling of backslashes:
To get your example working, you need to put an r in front of your string to mark it as a raw string: a = r"\x31\xC0\x50\x68\x74\x76". In a raw string, a backslash doesn't escape the next character; it just stays a backslash.
>>> r"\x31\xC0\x50\x68\x74\x76"
'\\x31\\xC0\\x50\\x68\\x74\\x76'
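To make the difference between strip() and replace() concrete, here is a small sketch on a shortened raw string (Python 3 syntax assumed; the variable names are illustrative). strip() only trims matching characters at the two ends, while replace() touches every occurrence.

```python
raw = r"\x31\xC0\x50"            # raw literal: the backslashes are real characters

trimmed = raw.strip("\\")        # only the leading backslash is at an end
removed = raw.replace("\\", "")  # every backslash is removed

print(trimmed)
print(removed)
```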

Unicode object to a list

I have a UTF-8 text corpus that I can read easily in Python 2.7:
sentence = codecs.open("D:\\Documents\\files\\sentence.txt", "r", encoding="utf8")
sentence = sentence.read()
> This is my sentence in the right format
However, when I pass this text corpus to a list (for example, for tokenizing):
tokens = sentence.tokenize()
and print it in the notebook, I get strange, bit-like characters, such as:
(u'\ufeff\ufeffFaux,', u'Tunisie')
(u'Tunisie', u"l'\xc9gypte,")
whereas I would like normal characters, just like in my original import.
So my question is: how can I pass unicode objects to a list without getting strange bit/ASCII characters?

It's all in how you print. Python 2 displays lists using ASCII-only characters and substituting backslash escape codes for non-ASCII characters. This is to make it easy to see hidden characters that normal printing would make invisible, like the double byte-order-mark (BOM) \ufeff you see in your strings. Printing individual string items will display them correctly.
Many examples
Original strings:
>>> s = (u'\ufeff\ufeffFaux,', u'Tunisie')
>>> t = (u'Tunisie', u"l'\xc9gypte,")
Displaying at the interactive prompt:
>>> s
(u'\ufeff\ufeffFaux,', u'Tunisie')
>>> t
(u'Tunisie', u"l'\xc9gypte,")
>>> print s
(u'\ufeff\ufeffFaux,', u'Tunisie')
>>> print t
(u'Tunisie', u"l'\xc9gypte,")
Printing individual strings from the tuples:
>>> print s[0]
Faux,
>>> print s[1]
Tunisie
>>> print t[0]
Tunisie
>>> print t[1]
l'Égypte,
>>> print ' '.join(s)
Faux, Tunisie
>>> print ' '.join(t)
Tunisie l'Égypte,
A way to print tuples without escape codes:
>>> print "('"+"', '".join(s)+"')"
('Faux,', 'Tunisie')
>>> print "('"+"', '".join(t)+"')"
('Tunisie', 'l'Égypte,')
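The doubled \ufeff above is a byte-order mark that was read in as text. If the goal is to drop it before tokenizing, one possible approach is sketched below in Python 3 syntax. The sample bytes are made up for illustration. The utf-8-sig codec removes a single leading BOM, and lstrip handles any that remain.

```python
import codecs

# Made-up sample: two UTF-8 BOMs followed by the text, as in the output above.
raw = codecs.BOM_UTF8 + codecs.BOM_UTF8 + "Faux, Tunisie".encode("utf-8")

text = raw.decode("utf-8-sig")   # strips one leading BOM while decoding
clean = text.lstrip("\ufeff")    # strips any BOM characters still at the start

print(repr(text))
print(repr(clean))
```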

Hm, codecs.open(...) returns a "wrapped version of the underlying file object", and then you overwrite that variable with the result of calling read() on that object. Brave, irritating, but OK ;-)
When you type, say, äöüß into your "notebook", does it show up like that, or do you see \uxxxx escapes instead?
The default for codecs.open(...) is errors='strict', so if all samples come from the same environment, this should work.
As I understand it, when you write "print it" you are printing the list, which is different from printing the contents of the list.
Sample (a tab typed as \t in a normal "byte" string; this is Python 2.7.11):
>>> a="\t"
>>> print a # below is an expanded tab
>>> a
'\t'
>>> [a]
['\t']
>>> print [a]
['\t']
>>> for element in [a]:
...     print element
...
>>> # above is an expanded tab

How do I use an escape sequence with a variable?

a = "test"
b = "testing"
print a\nb
Is there a way I can use an escape sequence with a variable, or is it unnecessary?

If you are trying to print string variables by separating them with the newline character ('\n'), you can do so like this:
a = "Hello"
b = "World"
print(a+"\n"+b)
Simply executing two separate print statements gives a similar result, because each print statement automatically appends a newline character.
a = "Hello"
b = "World"
print(a)
print(b)
So an explicit newline character isn't needed when each string is printed on its own line.

If you expect to have more to print or format in the future, a more maintainable solution would be:
l = [ 'Hello', 'world' ]
# Print with newlines
print('\n'.join(l))
# Print with tabs
print('\t'.join(l))
See https://docs.python.org/2/library/string.html?highlight=string%20join#string.join.
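Two more ways to put a newline between variables, sketched with the a and b from the question. This assumes Python 3 syntax; under Python 2 the sep argument needs from __future__ import print_function.

```python
a = "test"
b = "testing"

# Build the combined string first, then print it.
combined = "{}\n{}".format(a, b)
print(combined)

# Or let print insert the separator between its arguments.
print(a, b, sep="\n")
```

An escape sequence only has meaning inside a string literal, which is why the bare a\nb in the question is a syntax error.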

getting string between 2 characters in python

I need to extract certain words from a string and put them into a new format. For example, I call the function with the input:
text2function('$sin (x)$ is an function of x')
and I need to put them into a StringFunction:
StringFunction(function, independent_variables=[vari])
where I need to get just 'sin (x)' for function and 'x' for vari. So it would finally look like this:
StringFunction('sin (x)', independent_variables=['x'])
The problem is that I can't seem to obtain function and vari. I have tried:
start = string.index(start_marker) + len(start_marker)
end = string.index(end_marker, start)
return string[start:end]
and
r = re.compile('$()$')
m = r.search(string)
if m:
    lyrics = m.group(1)
and
send = re.findall('$([^"]*)$',string)
All of these seem to give me nothing. Am I doing something wrong? All help is appreciated. Thanks.

Tweeky way!
>>> char1 = '('
>>> char2 = ')'
>>> mystr = "mystring(123234sample)"
>>> print mystr[mystr.find(char1)+1 : mystr.find(char2)]
123234sample

$ is a special character in regex (it denotes the end of the string). You need to escape it:
>>> re.findall(r'\$(.*?)\$', '$sin (x)$ is an function of x')
['sin (x)']

If you want to cut out the text between two identical characters (e.g. !234567890!),
you can use split:
line_word = line.split('!')
print (line_word[1])

You need to start searching for the second character beyond start:
end = string.index(end_marker, start + 1)
because otherwise it'll find the same character at the same location again:
>>> start_marker = end_marker = '$'
>>> string = '$sin (x)$ is an function of x'
>>> start = string.index(start_marker) + len(start_marker)
>>> end = string.index(end_marker, start + 1)
>>> string[start:end]
'sin (x)'
For your regular expressions, the $ character is interpreted as an anchor, not as the literal character. Escape it to match a literal $ (and look for things that are not $ instead of not "):
send = re.findall(r'\$([^$]*)\$', string)
which gives:
>>> import re
>>> re.findall(r'\$([^$]*)\$', string)
['sin (x)']
Even with the $ characters escaped, the regular expression $()$ wouldn't match anything between the parentheses, because the group () is empty.
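Putting the regex approach into a reusable form, here is a sketch of a helper that returns the text between two markers. The name text_between is hypothetical. re.escape is used so that markers such as $ or ( are matched literally instead of being treated as regex syntax.

```python
import re


def text_between(text, start_marker, end_marker):
    """Return the first substring between the two markers, or None."""
    # Non-greedy group so the match stops at the first end marker.
    pattern = re.escape(start_marker) + r"(.*?)" + re.escape(end_marker)
    match = re.search(pattern, text)
    return match.group(1) if match else None


sentence = "$sin (x)$ is an function of x"
function = text_between(sentence, "$", "$")
variable = text_between(function, "(", ")")
print(function, variable)
```

The same helper covers both values the question asks for: the function text between the $ signs, and the variable between the parentheses.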

Python Escape Sequence and String Manipulation

I have the following two vars:
a = chr(92) + 'x11'
b = '\x11'
print 'a is: ' + a
print 'b is: ' + b
The result of these print statements:
a is: \x11
b is: <| # Here I am just showing a representation of the symbol that is printed for b
How can I make it so that variable a prints the same thing as var b using the chr(92) call? Thank you in advance.

The other answers are showing you how to make b give you what you get in a. If you want a to give you what you get in b (which is what you're asking, if I read you correctly), you need to decode the escape sequence:
>>> a
u'\\x11'
>>> a.decode('string-escape')
'\x11'
You can also use unicode-escape instead of string-escape if you want a unicode string as the result.

Check out the documentation for string literals.
Backslash is an escape character in Python strings, so to include a literal backslash in your string you need to escape it by writing two consecutive backslashes. Alternatively, you can suppress the escaping behavior of backslashes by using a raw string literal, which is done by prefixing the string with r. For example:
Escaping the backslash:
b = '\\x11'
Using a raw string literal:
b = r'\x11'
If I am misinterpreting your question and b should be '\x11' or equivalently chr(17), but you just want it to display in the escaped format, you can use repr() for that:
>>> b = '\x11'
>>> print 'b is: ' + repr(b)
b is: '\x11'
If you don't want the quotes, use the string_escape encoding:
>>> print 'b is: ' + b.encode('string_escape')
b is: \x11
Or to get a to be the same as b, you can use a.decode('string_escape').
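The string_escape codec exists only in Python 2. Under Python 3, a roughly equivalent sketch goes through bytes and the unicode_escape codec. This assumes the text is plain ASCII, since unicode_escape treats the input bytes as Latin-1.

```python
a = chr(92) + "x11"     # backslash followed by x11 (four characters)
b = "\x11"              # the single control character DC1

# Encode to bytes, then let unicode_escape interpret the \x11 sequence.
decoded = a.encode("ascii").decode("unicode_escape")

print(repr(a), repr(decoded))
```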

\x11 appears to be the hex value for a ^Q control character in ASCII:
\021 17 DC1 \x11 ^Q (Device control 1) (XON) (Default UNIX START char.)
You need to escape the \ to get the literal \x11
