How can I do a line break (line continuation) in Python? - python

Given:
e = 'a' + 'b' + 'c' + 'd'
How do I write the above in two lines?
e = 'a' + 'b' +
'c' + 'd'

What is the line? You can just have arguments on the next line without any problems:
a = dostuff(blahblah1, blahblah2, blahblah3, blahblah4, blahblah5,
            blahblah6, blahblah7)
Otherwise you can do something like this:
if (a == True and
    b == False):
or with explicit line break:
if a == True and \
   b == False:
Check the style guide for more information.
Using parentheses, your example can be written over multiple lines:
a = ('1' + '2' + '3' +
     '4' + '5')
The same effect can be obtained using explicit line break:
a = '1' + '2' + '3' + \
    '4' + '5'
Note that the style guide says that using the implicit continuation with parentheses is preferred, but in this particular case just adding parentheses around your expression is probably the wrong way to go.

From PEP 8 -- Style Guide for Python Code:
The preferred way of wrapping long lines is by using Python's implied line continuation inside parentheses, brackets and braces. Long lines can be broken over multiple lines by wrapping expressions in parentheses. These should be used in preference to using a backslash for line continuation.
Backslashes may still be appropriate at times. For example, long, multiple with-statements cannot use implicit continuation, so backslashes are acceptable:
with open('/path/to/some/file/you/want/to/read') as file_1, \
     open('/path/to/some/file/being/written', 'w') as file_2:
    file_2.write(file_1.read())
Another such case is with assert statements.
Make sure to indent the continued line appropriately. The preferred place to break around a binary operator is after the operator, not before it. Some examples:
class Rectangle(Blob):
    def __init__(self, width, height,
                 color='black', emphasis=None, highlight=0):
        if (width == 0 and height == 0 and
                color == 'red' and emphasis == 'strong' or
                highlight > 100):
            raise ValueError("sorry, you lose")
        if width == 0 and height == 0 and (color == 'red' or
                                           emphasis is None):
            raise ValueError("I don't think so -- values are %s, %s" %
                             (width, height))
        Blob.__init__(self, width, height,
                      color, emphasis, highlight)
PEP8 now recommends the opposite convention (for breaking at binary operations) used by mathematicians and their publishers to improve readability.
Donald Knuth's style of breaking before a binary operator aligns operators vertically, thus reducing the eye's workload when determining which items are added and subtracted.
From PEP8: Should a line break before or after a binary operator?:
Donald Knuth explains the traditional rule in his Computers and Typesetting series: "Although formulas within a paragraph always break after binary operations and relations, displayed formulas always break before binary operations"[3].
Following the tradition from mathematics usually results in more readable code:
# Yes: easy to match operators with operands
income = (gross_wages
          + taxable_interest
          + (dividends - qualified_dividends)
          - ira_deduction
          - student_loan_interest)
In Python code, it is permissible to break before or after a binary operator, as long as the convention is consistent locally. For new code Knuth's style is suggested.
[3]: Donald Knuth's The TeXBook, pages 195 and 196
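To see the break-before-operator layout in a form that actually runs, here is a minimal sketch. The numbers are made up for illustration and are not part of PEP 8; only the variable names come from the quoted example.

```python
# Hypothetical sample values, chosen only to make the example runnable.
gross_wages = 50000
taxable_interest = 200
dividends = 1000
qualified_dividends = 400
ira_deduction = 3000
student_loan_interest = 1500

# Each operator starts its own line, so it sits next to its operand.
income = (gross_wages
          + taxable_interest
          + (dividends - qualified_dividends)
          - ira_deduction
          - student_loan_interest)

print(income)  # 46300
```

Because the whole expression sits inside parentheses, no backslashes are needed, and a term can be added or removed by editing a single line.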

The danger in using a backslash to end a line is that if whitespace is added after the backslash (which, of course, is very hard to see), the backslash is no longer doing what you thought it was.
See Python Idioms and Anti-Idioms (for Python 2 or Python 3) for more.
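The trailing-whitespace problem is easy to reproduce without touching a real source file. The sketch below feeds two small snippets to the built-in compile(); the helper name compiles is an assumption made up for this illustration.

```python
def compiles(source):
    """Return True if `source` is syntactically valid Python."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True

# The backslash is the very last character before the newline: valid.
clean = "total = 1 + \\\n    2\n"
# Same code, but with one space after the backslash: invalid.
trailing_space = "total = 1 + \\ \n    2\n"

print(compiles(clean))           # True
print(compiles(trailing_space))  # False
```

The two snippets look identical in most editors, which is the fragility the answer above warns about.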

Put a \ at the end of your line or enclose the statement in parens ( .. ). From IBM:
b = ((i1 < 20) and
     (i2 < 30) and
     (i3 < 40))
or
b = (i1 < 20) and \
    (i2 < 30) and \
    (i3 < 40)

You can break lines between parentheses, brackets and braces. Additionally, you can append the backslash character \ to a line to explicitly continue it on the next line:
x = (tuples_first_value,
     second_value)
y = 1 + \
    2

From the horse's mouth: Explicit line joining
Two or more physical lines may be joined into logical lines using backslash characters (\), as follows: when a physical line ends in a backslash that is not part of a string literal or comment, it is joined with the following forming a single logical line, deleting the backslash and the following end-of-line character. For example:
if 1900 < year < 2100 and 1 <= month <= 12 \
   and 1 <= day <= 31 and 0 <= hour < 24 \
   and 0 <= minute < 60 and 0 <= second < 60:   # Looks like a valid date
        return 1
A line ending in a backslash cannot carry a comment. A backslash does not continue a comment. A backslash does not continue a token except for string literals (i.e., tokens other than string literals cannot be split across physical lines using a backslash). A backslash is illegal elsewhere on a line outside a string literal.
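For comparison, the same date check can be written with implicit line joining, which also permits a comment on every physical line. This is a sketch only; the function name looks_like_valid_date and its parameter list are assumptions, not part of the quoted documentation.

```python
def looks_like_valid_date(year, month, day, hour, minute, second):
    # Parentheses keep the logical line open, so each physical
    # line may carry its own comment.
    return (
        1900 < year < 2100          # plausible year
        and 1 <= month <= 12        # calendar month
        and 1 <= day <= 31          # day of month (no per-month check)
        and 0 <= hour < 24
        and 0 <= minute < 60
        and 0 <= second < 60
    )

print(looks_like_valid_date(2024, 2, 29, 23, 59, 59))  # True
print(looks_like_valid_date(2024, 13, 1, 0, 0, 0))     # False
```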

If you want to break your line because of a long literal string, you can break that string into pieces:
long_string = "a very long string"
print("a very long string")
will be replaced by
long_string = (
    "a "
    "very "
    "long "
    "string"
)
print(
    "a "
    "very "
    "long "
    "string"
)
Output for both print statements:
a very long string
Notice the parentheses in the assignment.
Notice also that breaking a literal string into pieces lets you apply a literal prefix to only some parts of the string and mix the delimiters:
s = (
    '''2+2='''
    f"{2+2}"
)
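A slightly larger sketch of the same idea, mixing a plain literal, an f-string and a raw string in one parenthesized expression. The variable names are made up for illustration. It also shows the usual pitfall: inside a list, a forgotten comma silently merges two neighbouring literals.

```python
name = "world"

greeting = (
    "Hello, "               # plain literal
    f"{name}! "             # f-string piece, interpolated at run time
    r"Raw part: \n"         # raw piece, the backslash is kept as-is
)
print(greeting)  # Hello, world! Raw part: \n

# Pitfall: the missing comma after "b" joins "b" and "c" into "bc".
letters = ["a", "b" "c"]
print(len(letters))  # 2
```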

One can also break a chain of method calls (obj.method()) across multiple lines.
Enclose the expression in parentheses "()" and span multiple lines:
res = (some_object
       .apply(args)
       .filter()
       .values)
For instance, I find it useful on chain calling Pandas/Holoviews objects methods.
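The same layout works for any object, not only Pandas or Holoviews. A minimal runnable sketch using plain str methods (the input text is an arbitrary example):

```python
words = (
    "  Hello, World  "
    .strip()             # remove surrounding whitespace
    .lower()             # normalise case
    .replace(",", "")    # drop the comma
    .split()             # split on whitespace
)

print(words)  # ['hello', 'world']
```

Each step sits on its own line, so a step can be commented out or reordered without touching the others.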

It may not be the Pythonic way, but I generally use a list with the join function for writing a long string, like SQL queries:
query = " ".join([
    'SELECT * FROM "TableName"',
    'WHERE "SomeColumn1"=VALUE',
    'ORDER BY "SomeColumn2"',
    'LIMIT 5;'
])
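Another option for long strings such as SQL, offered here as an alternative sketch rather than a replacement for the join approach, is a triple-quoted string passed through textwrap.dedent from the standard library. The query text is the same placeholder query as above.

```python
import textwrap

# The backslash after the opening quotes suppresses the first newline;
# dedent() then strips the common leading indentation.
query = textwrap.dedent("""\
    SELECT * FROM "TableName"
    WHERE "SomeColumn1"=VALUE
    ORDER BY "SomeColumn2"
    LIMIT 5;
""")

# Collapse all whitespace to get the single-line form.
one_line = " ".join(query.split())
print(one_line)
```

Unlike the join version, the dedented string keeps its newlines, which tends to make logged queries easier to read.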

Taken from The Hitchhiker's Guide to Python (Line Continuation):
When a logical line of code is longer than the accepted limit, you need to split it over multiple physical lines. The Python interpreter will join consecutive lines if the last character of the line is a backslash. This is helpful in some cases, but should usually be avoided because of its fragility: a white space added to the end of the line, after the backslash, will break the code and may have unexpected results.
A better solution is to use parentheses around your elements. Left with an unclosed parenthesis on an end-of-line the Python interpreter will join the next line until the parentheses are closed. The same behaviour holds for curly and square braces.
However, more often than not, having to split a long logical line is a sign that you are trying to do too many things at the same time, which may hinder readability.
Having that said, here's an example considering multiple imports (when exceeding line limits, defined on PEP-8), also applied to strings in general:
from app import (
    app, abort, make_response, redirect, render_template, request, session
)

Google colab cannot add comment

Usually I can add a comment with # or use """ for multi-line comments. But in the following case,
if i > 0:
    if (df.loc[i, 'data'] <= level1) and \ # Comment
       (df.loc[i - 1, 'data'] > level1) and \ # Comment
       not ideal_state:
        ideal_state_time = df.loc[i,'data']
        ideal_state = True
I got the error
File "<ipython-input-24-07959bc4f436>", line 121
if (df.loc[i, 'data'] <= level1) and \ # Comment
^
SyntaxError: unexpected character after line continuation character
What is going on? What's wrong with commenting after the slash? I put the slash there because otherwise it will return an error.
You can try replacing the backslashes (\) with parentheses, as shown below:
if ((df.loc[i, 'data'] <= level1) and  # Comment
    (df.loc[i - 1, 'data'] > level1) and  # Comment
    not ideal_state
):
    ideal_state_time = df.loc[i,'data']
    ideal_state = True
You can see in PEP8, it's recommended to use brackets
The preferred way of wrapping long lines is by using Python's implied line continuation inside parentheses, brackets and braces. Long lines can be broken over multiple lines by wrapping expressions in parentheses. These should be used in preference to using a backslash for line continuation.
Backslashes may still be appropriate at times. For example, long, multiple with-statements cannot use implicit continuation, so backslashes are acceptable:
with open('/path/to/some/file/you/want/to/read') as file_1, \
     open('/path/to/some/file/being/written', 'w') as file_2:
    file_2.write(file_1.read())
The backslash lets your if span multiple lines; it says something like "ignore the upcoming character (the newline)".
Your interpreter reads it like this:
if (df.loc[i, 'data'] <= level1) and # Comment (df.loc[i - 1, 'data'] > lev...
And then your interpreter is right: the comment sign does not belong there.
Line continuations may never carry comments.
You're allowed to comment again after not ideal_state:
The backslash "\" is the line continuation character, i.e.
print("massive super long string that doesn't fit" + \
      "on a single line")
Only the newline character is allowed after it; even trailing whitespace raises a SyntaxError.

What is the difference between '\' and '\n' escape sequence in python

I am coming from the C language.
A Python book lists the following
among the escape sequences:
\ - New line in a multi-line string
\n - Line break
I am confused and unable to differentiate between the two.
You have completely misread the book.
\ is not an escape sequence, and is not used on its own in strings. It is used in multi-line code.
\n is a newline character in strings.
The book is confusing you by mixing two entirely different concepts.
\n is an escape sequence in a string literal. Like other \single-character and \xhh or \uhhhh escape sequences these work exactly like those in C; they define a character in the string that would otherwise be difficult to spell out when writing code.
\ at the end of a physical line of code extends the logical line. That is, Python will see text on the next line as part of the current line, making it one long line of code. This applies anywhere in Python code.
You can trivially see the difference when you print the results of strings that use either technique:
escape_sequence = "This is a line.\nThis is another line"
logical_line_extended = "This is a logical line. \
This is still the same logical line."
print(escape_sequence)
print(logical_line_extended)
This outputs
This is a line.
This is another line
This is a logical line. This is still the same logical line.
Note that the line breaks have swapped! The \n escape sequence in the string value caused the output to be broken across two lines (the terminal or console or whatever is displaying the printed data, knows how to interpret a newline character), while the newline in the logical_line_extended string literal definition is gone; it was never part of the string value being defined, it was a newline in the source code only.
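The difference can also be checked without looking at printed output, by asking whether the string value contains a newline character. This is a small sketch with made-up variable names:

```python
with_escape = "first line\nsecond line"
continued = "first line \
second line"

print(repr(with_escape))    # the repr shows the \n inside the value
print("\n" in with_escape)  # True: the newline is part of the string
print("\n" in continued)    # False: the source newline was discarded
```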
Python lets you extend a line of code like this because Python defines how you delimit logical lines very differently from C. In C, you end statements with ;, and group blocks of lines with {...} curly braces. Newlines are not part of how C reads your code.
So, the following C code:
if (a) { foo = 'bar'; spam = 'ham'; }
is the same thing as
if (a) {
    foo = 'bar';
    spam = 'ham';
}
C knows where each statement starts and ends because the programmer has to use ; and {...} to delimit lines and blocks, the language doesn't care about indentation or newlines at all here. In Python however, you explicitly use newlines and indentation to define the same structure. So Python uses whitespace instead of {, } and ;.
This means you could end up with long lines of code to hold a complex expression:
# deliberately convoluted long expression to illustrate a point
expr = 18 ** (1 / 3) / (6 * (3 + sqrt(3) * I) ** (1 / 3)) + 12 ** (1 / 3) * (3 + sqrt(3) * I) ** (1 / 3) / 12
The point of \ is to allow you to break up such a long expression across multiple physical lines by extending the current logical line with \ at the end:
# deliberately convoluted long expression to illustrate a point
expr = 18 ** (1 / 3) / (6 * (3 + sqrt(3) * I) ** (1 / 3)) + \
       12 ** (1 / 3) * (3 + sqrt(3) * I) ** (1 / 3) / 12
So the \ as the last character on a line, tells Python to ignore the newline that's there and continue treating the following line as part of the same logical line.
Python also extends the logical line when it has seen an opening (, [ or { bracket, until the matching ), ] or } bracket is found to close the expression. This is the preferred method of extending lines. So the above expression could be broken up across multiple physical lines with:
expr = (18 ** (1 / 3) / (6 * (3 + sqrt(3) * I) ** (1 / 3)) +
        12 ** (1 / 3) * (3 + sqrt(3) * I) ** (1 / 3) / 12)
You can do the same with strings:
long_string = (
    "This is a longer string that does not contain any newline "
    "*characters*, but is defined in the source code with "
    "multiple strings across multiple logical lines."
)
This uses another C string literal trick Python borrowed: multiple consecutive string literals form one long string object once parsed and compiled.
See the Lexical analysis reference documentation:
2.1.5. Explicit line joining
Two or more physical lines may be joined into logical lines using backslash characters (\)[.]
[...]
2.1.6. Implicit line joining
Expressions in parentheses, square brackets or curly braces can be split over more than one physical line without using backslashes.
The same documentation lists all the permitted Python string escape sequences.
\ - New line in a multi-line string
It is used to split a string with a large number of characters across multiple lines of source code, since it is inconvenient to write it on a single line.
This only has an effect in the code.
\n - Line break
This, on the other hand, is the usual line-break character for printing something on a new line, the same one used in C and C++.
This has an effect in the output.
Here is the response to your question:
The purpose of \n is to produce a line break, as you mention.
Example:
print("Hello\n")
print("Hi")
The output of the above would be:
Hello

Hi
Inside a string literal, \ is used to escape characters that have a special meaning.
Example: to print Hello\ in the output, the code would be
print("Hello\\")
The output of the above code will be:
Hello\
So in order to print Hello\ in your output, you have to write two backslashes ("\\"); this is the purpose of the \ character inside a string (to escape special characters).
I hope this helps.
With "\" you can continue on a new line as you write your code, for example when a line of code gets long and you want to wrap it so you can see what you type.
For example:
print("This is a demonstration of backslash.")
is the same as writing:
print("This is a demonstration \
of backslash.")
On the other hand, with "\n" you break the line in what you print. For example, print("this is an \nexample") prints "this is an", starts a new line, and then prints "example".
Use \n to have your output go to the next line.
print('Hello \nworld!')
Hello
world!
Use the back slash with a character that has a meaning to Python when you want that character to appear in the printed output.
print('It\'s cold outside')
It's cold outside
I hope this helps. 😀
As excellently answered by @Jimmy, I give the following examples to make the matter clearer.
Case 1:
>>> var1 = "Adolf Hitler was a German dictator. He started the second world war."
>>> print(var1)
Adolf Hitler was a German dictator. He started the second world war.
>>>
Case 2:
>>> var2 = "Adolf Hitler \
... was a German dictator. \
... He started the \
... second world war."\
...
>>> print(var2)
Adolf Hitler was a German dictator. He started the second world war.
>>>
Case 3:
>>> var3 = "Adolf Hitler\nwas a German dictator.\nHe started the\nsecond world war."
>>> print(var3)
Adolf Hitler
was a German dictator.
He started the
second world war.
>>>
Case 4:
>>> var4 = "Adolf Hitler\
... \nwas a German dictator.\
... \nhe started the\
... \nsecond world war."\
...
>>> print(var4)
Adolf Hitler
was a German dictator.
he started the
second world war.
>>>
There is also another point which @Jimmy has not mentioned. I have illustrated it with the following two examples:
Example 1:
>>> var5 = """
... This multi-line string
... has a space at the top
... and a space at the bottom
... when it prints.
... """
>>> print(var5)

This multi-line string
has a space at the top
and a space at the bottom
when it prints.

>>>
Example 2:
>>> var6 = """\
... This multi-line string
... has no space at the
... top or the bottom
... when it prints.\
... """
>>> print(var6)
This multi-line string
has no space at the
top or the bottom
when it prints.

Need help designing a regex or pyparsing approach to modify all words enclosed within pipes

For example:
blahblah|0A 4D 5E 43|adfsdasd|92| sgagrewas|12 5E|
Must become
blahblahx0Ax4Dx5Ex43adfsdasdx92 sgagrewasx12x5E
I'm trying something along the lines of re.sub(r'\|(\w+ ?)*\|', r'x\1', a), but I'm having trouble getting it to work on more than the first match.
UPDATE: It looks like regex is not a good choice for this. Would a pyparsing solution be doable?
If not, I can write a simple iterative solution, but I would prefer something more extensible.
UPDATE2: I used a pure Python approach in the end; it works fine and can deal with escape characters too.
def strtohex(self, string):
    hexmode = False
    hexstring = ''
    i = 0
    while i < len(string):
        if string[i] == '\\':
            i += 1
            # No escape characters inside hex pipes
            hexstring += string[i]
        elif string[i] == '|':
            hexmode = not hexmode
        elif string[i] == ' ':
            hexstring += '' if hexmode else ' '
        else:
            if hexmode:
                hexstring += chr(int(string[i:i+2], 16))
                i += 1
            else:
                hexstring += string[i]
        i += 1
    return hexstring
Here is what this might look like in pyparsing:
from pyparsing import Word,hexnums,Suppress,OneOrMore
twoDigitHex = Word(hexnums,exact=2)
VERT = Suppress('|')
pattern = VERT + OneOrMore(twoDigitHex) + VERT
# attach parse action to prefix each 2-digit hex with 'x' and join all together
pattern.setParseAction(lambda t: ''.join('x'+tt for tt in t))
# take sample code, and use transformString to apply conversion
sample = "blahblah|0A 4D 5E 43|adfsdasd|92| sgagrewas|12 5E|"
print(pattern.transformString(sample))
prints
blahblahx0Ax4Dx5Ex43adfsdasdx92 sgagrewasx12x5E
I'm sure you could do it using only a regex, but why bother? It's simple to use your programming language:
Break your string at the vertical bars. Check and substitute if appropriate. Recombine.
import re

line = 'blahblah|0A 4D 5E 43|adfsdasd|92| sgagrewas|12 5E|'
parts = line.split('|')
for i, s in enumerate(parts):
    # rewrite only the parts made up entirely of two-digit hex numbers
    if re.match(r'^([\dA-F]{2} )*[\dA-F]{2}$', s):
        parts[i] = re.sub('^| ', 'x', s)
result = "".join(parts)
The check is whether the entire substring consists of two-digit hex numbers separated by spaces. I assume all hex letters are capitalized, as in your example.
I proceeded in two steps:
first, replace every hex value;
then, remove the blanks and the | characters.
It gives:
>>> s = 'blahblah|0A 4D 5E 43|adfsdasd|92| sgagrewas|12 5E|'
>>> re.sub(r'[| ]', r'', re.sub(r' ?([0-9A-F]{2})', r'x\1', s))
'blahblahx0Ax4Dx5Ex43adfsdasdx92sgagrewasx12x5E'
I don't think python is capable of balanced regex expressions. To my knowledge, .NET is the only flavor with such support (and it looks quite ugly and is nightmarish to maintain).
You may be better off splitting the string on the pipe symbol, then rejoining the string, applying the desired formatting (via regex, if so desired) on the odd numbered string array items.
EDIT: On second thought, I believe this would be possible using a lookbehind with a variable-length expression, but unfortunately python does not have support for those. (For example, something along the lines of (?<=^(?:[^|]*\|[^|]*\|)*[^|]*)\|(\w+ ?)*\|)
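For what it's worth, a single re.sub call can handle every pipe-delimited group if the replacement is a function rather than a template string. The sketch below is one possible approach, not the method of any answer above; the helper name hexify_pipes is made up, and it assumes each group holds only two-digit hex numbers separated by single spaces.

```python
import re

# A pipe, one or more two-digit hex numbers, a closing pipe.
PIPE_GROUP = re.compile(r"\|((?:[0-9A-Fa-f]{2} ?)+)\|")

def hexify_pipes(text):
    def convert(match):
        # Prefix every hex pair with 'x' and drop the separating spaces.
        return "".join("x" + pair for pair in match.group(1).split())
    return PIPE_GROUP.sub(convert, text)

sample = "blahblah|0A 4D 5E 43|adfsdasd|92| sgagrewas|12 5E|"
print(hexify_pipes(sample))
# blahblahx0Ax4Dx5Ex43adfsdasdx92 sgagrewasx12x5E
```

Because re.sub scans left to right and consumes both pipes of each match, text between two groups (such as adfsdasd) is never mistaken for a group itself.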

Show non printable characters in a string

Is it possible to visualize non-printable characters in a python string with its hex values?
e.g. If I have a string with a newline inside I would like to replace it with \x0a.
I know there is repr() which will give me ...\n, but I'm looking for the hex version.
I don't know of any built-in method, but it's fairly easy to do using a comprehension:
import string
printable = string.ascii_letters + string.digits + string.punctuation + ' '
def hex_escape(s):
    return ''.join(c if c in printable else r'\x{0:02x}'.format(ord(c)) for c in s)
I'm kind of late to the party, but if you need it for simple debugging, I found that this works:
string = "\n\t\nHELLO\n\t\n\a\17"
procd = [c for c in string]
print(procd)
# Prints ['\n', '\t', '\n', 'H', 'E', 'L', 'L', 'O', '\n', '\t', '\n', '\x07', '\x0f']
While just calling list(string) is simpler, a comprehension makes it easier to add filtering or mapping if necessary.
You'll have to make the translation manually; go through the string with a regular expression for example, and replace each occurrence with the hex equivalent.
import re
replchars = re.compile(r'[\n\r]')
def replchars_to_hex(match):
    return r'\x{0:02x}'.format(ord(match.group()))
replchars.sub(replchars_to_hex, inputtext)
The above example only matches newlines and carriage returns, but you can expand what characters are matched, including using \x escape codes and ranges.
>>> inputtext = 'Some example containing a newline.\nRight there.\n'
>>> replchars.sub(replchars_to_hex, inputtext)
'Some example containing a newline.\\x0aRight there.\\x0a'
>>> print(replchars.sub(replchars_to_hex, inputtext))
Some example containing a newline.\x0aRight there.\x0a
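As a sketch of the "expand what characters are matched" remark, the character class can be widened to all ASCII control characters with \x escapes and a range. The names CONTROL_CHARS and show_control_chars are made up for this example.

```python
import re

# All ASCII control characters: 0x00-0x1f plus DEL (0x7f).
CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f]")

def show_control_chars(text):
    # Replace each control character with its \xhh form.
    return CONTROL_CHARS.sub(
        lambda m: r"\x{:02x}".format(ord(m.group())), text
    )

print(show_control_chars("a\nb\tc\x7f"))  # a\x0ab\x09c\x7f
```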
Modifying ecatmur's solution to handle non-printable non-ASCII characters makes it less trivial and more obnoxious:
def escape(c):
    if c.isprintable():
        return c
    c = ord(c)
    if c <= 0xff:
        return r'\x{0:02x}'.format(c)
    elif c <= 0xffff:
        return r'\u{0:04x}'.format(c)
    else:
        return r'\U{0:08x}'.format(c)
def hex_escape(s):
    return ''.join(escape(c) for c in s)
Of course if str.isprintable isn't exactly the definition you want, you can write a different function. (Note that it's a very different set from what's in string.printable: besides handling non-ASCII printable and non-printable characters, it also considers \n, \r, \t, \x0b, and \x0c as non-printable.)
You can make this more compact; this is explicit just to show all the steps involved in handling Unicode strings. For example:
def escape(c):
    if c.isprintable():
        return c
    elif c <= '\xff':
        return r'\x{0:02x}'.format(ord(c))
    else:
        return c.encode('unicode_escape').decode('ascii')
Really, no matter what you do, you're going to have to handle \r, \n, and \t explicitly, because all of the built-in and stdlib functions I know of will escape them via those special sequences instead of their hex versions.
I did something similar once by deriving a str subclass with a custom __repr__() method which did what I wanted. It's not exactly what you're looking for, but may give you some ideas.
# -*- coding: iso-8859-1 -*-
# special string subclass to override the default
# representation method. main purpose is to
# prefer using double quotes and avoid hex
# representation on chars with an ord > 128
class MsgStr(str):
    def __repr__(self):
        # use double quotes unless there are more of them within the string than
        # single quotes
        if self.count("'") >= self.count('"'):
            quotechar = '"'
        else:
            quotechar = "'"
        rep = [quotechar]
        for ch in self:
            # control char?
            if ord(ch) < ord(' '):
                # remove the single quotes around the escaped representation
                rep += repr(str(ch)).strip("'")
            # embedded quote matching quotechar being used?
            elif ch == quotechar:
                rep += "\\"
                rep += ch
            # else just use others as they are
            else:
                rep += ch
        rep += quotechar
        return "".join(rep)
if __name__ == "__main__":
    s1 = '\tWürttemberg'
    s2 = MsgStr(s1)
    print "str s1:", s1
    print "MsgStr s2:", s2
    print "--only the next two should differ--"
    print "repr(s1):", repr(s1), "# uses built-in string 'repr'"
    print "repr(s2):", repr(s2), "# uses custom MsgStr 'repr'"
    print "str(s1):", str(s1)
    print "str(s2):", str(s2)
    print "repr(str(s1)):", repr(str(s1))
    print "repr(str(s2)):", repr(str(s2))
    print "MsgStr(repr(MsgStr('\tWürttemberg'))):", MsgStr(repr(MsgStr('\tWürttemberg')))
Non-printable characters can also be present in a string in the sense that they act as commands when printed, even though they are not visible. Their presence can be observed by measuring the length of the string with len, or by placing the cursor at the start of the string and counting how many times you have to press the arrow key to get from start to finish: what looks like a single character can then appear to have a length of 3, which seems perplexing. (Not sure if this was already demonstrated in prior answers.)
In the example below, I pasted a 135-bit string with a particular structure (which I had to create manually beforehand for certain bit positions and its overall length) so that the program I'm running interprets it as ASCII. The resulting printed string contains non-printable characters such as a form feed (new page), which causes an extra blank line in the printed output (see below):
Example of printing non-printable characters that appear in printed string
Input a string:100100001010000000111000101000101000111011001110001000100001100010111010010101101011100001011000111011001000101001000010011101001000000
HPQGg]+\,vE!:@
>>> len('HPQGg]+\,vE!:@')
17
>>>
In the above code excerpt, try to copy-paste the string HPQGg]+\,vE!:@ straight from this site and see what happens when you paste it into the Python IDLE.
Hint: You have to tap the arrow/cursor three times to get across the two letters from P to Q even though they appear next to each other, as there is actually a File Separator ascii command in between them.
However, even though we get the same starting value when decoding it as a byte array to hex, if we convert that hex back to bytes they look different (perhaps lack of encoding, not sure), but either way the above output of the program prints non-printable characters (I came across this by chance while trying to develop a compression method/experiment).
>>> bytes(b'HPQGg]+\,vE!:@').hex()
'48501c514767110c5d2b5c2c7645213a40'
>>> bytes.fromhex('48501c514767110c5d2b5c2c7645213a40')
b'HP\x1cQGg\x11\x0c]+\\,vE!:@'
>>> (0x48501c514767110c5d2b5c2c7645213a40 == 0b100100001010000000111000101000101000111011001110001000100001100010111010010101101011100001011000111011001000101001000010011101001000000)
True
>>>
In the above 135-bit string, the first 16 groups of 8 bits from the big-endian side encode each character (including the non-printable ones), whereas the last group of 7 bits results in the @ symbol, as seen below:
Technical breakdown of the format of the above 135-bit string
And here as text is the breakdown of the 135-bit string:
10010000 = H (72)
10100000 = P (80)
00111000 = x1c (28 for File Separator) *
10100010 = Q (81)
10001110 = G(71)
11001110 = g (103)
00100010 = x11 (17 for Device Control 1) *
00011000 = x0c (12 for NP form feed, new page) *
10111010 = ] (93 for right bracket ‘]’)
01010110 = + (43 for + sign)
10111000 = \ (92 for backslash)
01011000 = , (44 for comma, ‘,’)
11101100 = v (118)
10001010 = E (69)
01000010 = ! (33 for exclamation)
01110100 = : (58 for colon ‘:’)
1000000 = @ (64 for ‘@’ sign)
So in closing, the answer to the sub-question about showing the non-printable as hex, in byte array further above appears the letters x1c which denote the file separator command which was also noted in the hint. The byte array could be considered a string if excluding the prefix b on the left side, and again this value shows in the print string albeit it is invisible (although its presence can be observed as demonstrated above with the hint and len command).
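The hidden characters can also be located programmatically instead of with the arrow keys. This sketch starts from the hex string shown above; the variable names are assumptions made for illustration.

```python
raw = bytes.fromhex("48501c514767110c5d2b5c2c7645213a40")
text = raw.decode("ascii")

# Collect (position, escaped form) for every ASCII control character.
hidden = [
    (index, "\\x{:02x}".format(byte))
    for index, byte in enumerate(raw)
    if byte < 0x20 or byte == 0x7f
]

print(len(text))  # 17
for index, escaped in hidden:
    print(index, escaped)
```

The three reported positions correspond to the file separator, device control 1 and form feed entries in the breakdown above.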
