"""Different""", "quote", 'types' in function? - python

I am trying to figure out whether the different quote types make a functional difference. I have seen people say that choosing between "" and '' is a matter of preference, but what about """ """? I tested it in some simple code and it works. I was wondering whether """ triple quotes """ have a functional purpose for function arguments, or whether they are just another quote option that can be used interchangeably with "" and ''.
I have seen many posts about "" and '', but none about """ """ or ''' ''' being used in functions.
My question is: does the triple quote have a unique use as an argument, or is it simply interchangeable with "" and ''? The reason I think it might have a unique function is that it is a multi-line quote, and I was wondering whether it would allow a multi-line argument to be passed. I am not sure whether something like that would even be useful, but it could be.
Here is an example that prints out what you would expect using all the quote types I know of.
def myFun(var1="""different""", var2="quote", var3='types'):
    return var1, var2, var3

print(myFun('All', '''for''', 'one!'))
Result:
('All', 'for', 'one!')
EDIT:
After some more testing of the triple quote I did find some variation in how it works using return vs printing in the function.
def myFun(var1="""different""", var2="""quote""", var3='types'):
    return (var1, var2, var3)

print(myFun('This',
            '''Can
Be
Multi''',
            'line!'))
Result:
('This', 'Can\nBe\nMulti', 'line!')
Or:
def myFun(var1="""different""", var2="""quote""", var3='types'):
    print(var1, var2, var3)

myFun('This',
      '''Can
Be
Multi''',
      'line!')
Result:
This Can
Be
Multi line!

From the docs:
String literals can be enclosed in matching single quotes (') or double quotes ("). They can also be enclosed in matching groups of three single or double quotes (these are generally referred to as triple-quoted strings). [...other rules applying identically to all string literal types omitted...]
In triple-quoted strings, unescaped newlines and quotes are allowed (and are retained), except that three unescaped quotes in a row terminate the string. (A “quote” is the character used to open the string, i.e. either ' or ".)
Thus, triple-quoted string literals can span multiple lines, and can contain literal quotes without use of escape sequences, but are otherwise exactly identical to string literals expressed with other quoting types (including those using escape sequences such as \n or \' to express the same content).
Also see the Python 3 documentation: Bytes and String Literals -- which expresses an effectively identical set of rules with slightly different verbiage.
A more gentle introduction is also available in the language tutorial, which explicitly introduces triple-quotes as a way to permit strings to span multiple lines:
String literals can span multiple lines. One way is using triple-quotes: """...""" or '''...'''. End of lines are automatically included in the string, but it’s possible to prevent this by adding a \ at the end of the line. The following example:
print("""\
Usage: thingy [OPTIONS]
     -h                        Display this usage message
     -H hostname               Hostname to connect to
""")
produces the following output (note that the initial newline is not included):
Usage: thingy [OPTIONS]
     -h                        Display this usage message
     -H hostname               Hostname to connect to
To be clear, though: These are different syntax, but the string literals they create are indistinguishable from each other. That is to say, given the following code:
s1 = '''foo
'bar'
baz
'''
s2 = 'foo\n\'bar\'\nbaz\n'
there's no possible way to tell s1 and s2 apart from each other by looking at their values: s1 == s2 is true, and so is repr(s1) == repr(s2). The Python interpreter is even allowed to intern them to the same value, so it may (or may not) make id(s1) == id(s2) true depending on details (such as whether the code was run at the REPL or imported as a module).
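As a quick check of the claim above, here is a minimal sketch that spells the same string three ways. The name s3 is an addition for illustration and is not part of the original example:

```python
# Three spellings of one and the same string value.
s1 = '''foo
'bar'
baz
'''
s2 = 'foo\n\'bar\'\nbaz\n'
s3 = "foo\n'bar'\nbaz\n"  # s3 is a hypothetical extra spelling for illustration

print(s1 == s2 == s3)        # True
print(repr(s1) == repr(s2))  # True
print(type(s1) is type(s2))  # True
```

Once the literal has been compiled, nothing about the resulting str object records which quoting style produced it.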

FWIW, my understanding is that there's a convention whereby """ """ and ''' ''' are used for docstrings, which are kind of like a # comment, but are stored as an attribute (__doc__) that can be referenced later. https://www.python.org/dev/peps/pep-0257/
I'm a beginner too, but my understanding is that using triple quotes for strings is not the best idea, even if there's little functional difference with what you're doing currently (I don't know if there might be later). Sticking with conventions is helpful to others reading and using your code, and it seems to be a good rule of thumb that some conventions will bite you if you don't follow them, as in this case where a mal-formatted string with triple quotes will be interpreted as a docstring, and maybe not throw an error, and you'll need to search through a bunch of code to find the issue.
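To make the docstring convention concrete, here is a small sketch (the functions greet and shout are made up for illustration). A string literal placed as the first statement of a function is stored on the function's __doc__ attribute whichever quote style is used; triple double quotes are simply what PEP 257 recommends:

```python
def greet(name):
    """Return a short greeting for name."""
    return "Hello, " + name


def shout(name):
    'Single quotes work too, although PEP 257 recommends triple double quotes.'
    return name.upper() + "!"


print(greet.__doc__)  # Return a short greeting for name.
print(shout.__doc__)
```

A triple-quoted string anywhere else, for example as a default argument or a call argument, is just an ordinary string and is not treated as a docstring.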

Related

Why do two or more string arguments without commas separating them results in concatenation when given in argument list [duplicate]

I can create a multi-line string using this syntax:
string = str("Some chars "
"Some more chars")
This will produce the following string:
"Some chars Some more chars"
Is Python joining these two separate strings or is the editor/compiler treating them as a single string?
P.s: I just want to understand the internals. I know there are other ways to declare or create multi-line strings.
Read the reference manual, it's in there.
Specifically:
Multiple adjacent string or bytes literals (delimited by whitespace), possibly using different quoting conventions, are allowed, and their meaning is the same as their concatenation. Thus, "hello" 'world' is equivalent to "helloworld". This feature can be used to reduce the number of backslashes needed, to split long strings conveniently across long lines, or even to add comments to parts of strings,
(emphasis mine)
This is why:
string = str("Some chars "
"Some more chars")
is exactly the same as: str("Some chars Some more chars").
This is performed wherever a string literal might appear: list initializations, function calls (as is the case with str above), et cetera.
The only caveat is when the string literals are not enclosed in one of the grouping delimiters (), {} or [] and instead spread across two separate physical lines. In that case we can use the backslash character to join these lines and get the same result:
string = "Some chars " \
"Some more chars"
Of course, concatenation of strings on the same physical line does not require the backslash. (string = "Hello " "World" is just fine)
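The cases described above can be put side by side in a short sketch. All variable names here are illustrative only:

```python
# Adjacent literals are merged at compile time; no + operator is involved.
same_line = "Hello " "World"

in_parens = ("Some chars "
             "Some more chars")

with_backslash = "Some chars " \
                 "Some more chars"

# Inside a list, a missing comma silently merges two intended items.
items = ["one " 'two', "three"]

print(same_line)                    # Hello World
print(in_parens == with_backslash)  # True
print(len(items))                   # 2
```

The last case is worth remembering: because the merge happens wherever literals are adjacent, a forgotten comma in a list of strings does not raise an error.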
Is Python joining these two separate strings or is the editor/compiler treating them as a single string?
Python is. When exactly Python does this is where things get interesting.
From what I could gather (take this with a pinch of salt, I'm not a parsing expert), this happens when Python transforms the parse tree (LL(1) Parser) for a given expression to its corresponding AST (Abstract Syntax Tree).
You can get a view of the parse tree via the parser module (deprecated in Python 3.9 and removed in 3.10):
import parser
expr = """
str("Hello "
"World")
"""
pexpr = parser.expr(expr)
parser.st2list(pexpr)
This dumps a pretty big and confusing list that represents the concrete syntax tree parsed from the expression in expr:
-- rest snipped for brevity --
[322,
[323,
[3, '"Hello "'],
[3, '"World"']]]]]]]]]]]]]]]]]],
-- rest snipped for brevity --
The numbers correspond to either symbols or tokens in the parse tree and the mappings from symbol to grammar rule and token to constant are in Lib/symbol.py and Lib/token.py respectively.
As you can see in the snipped version I added, you have two different entries corresponding to the two different str literals in the expression parsed.
Next, we can view the AST produced from the same expression via the ast module provided in the Standard Library:
import ast
p = ast.parse(expr)
ast.dump(p)
# this prints out the following:
"Module(body=[Expr(value=Call(func=Name(id='str', ctx=Load()), args=[Str(s='Hello World')], keywords=[]))])"
The output is more user friendly in this case; you can see that the only argument to the function call is the single concatenated string Hello World.
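For readers on a recent interpreter, the same observation can be reproduced with only the tokenize and ast modules. This is a sketch that assumes Python 3.8 or newer, where string literals appear in the AST as ast.Constant nodes with a value attribute (older versions use Str with an s attribute):

```python
import ast
import io
import tokenize

expr = """
str("Hello "
    "World")
"""

# Token level: the two literals are still separate STRING tokens.
string_tokens = [
    tok.string
    for tok in tokenize.generate_tokens(io.StringIO(expr).readline)
    if tok.type == tokenize.STRING
]
print(string_tokens)  # ['"Hello "', '"World"']

# AST level: the call receives one argument holding the joined string.
call = ast.parse(expr).body[0].value
print(len(call.args))      # 1
print(call.args[0].value)  # Hello World
```

So the joining happens somewhere between tokenizing and producing the AST, which matches the description above.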
In addition, I also stumbled upon a cool module that generates a visualization of the tree for ast nodes. Using it, the output of the expression expr is visualized like this:
Image cropped to show only the relevant part for the expression.
As you can see, in the terminal leaf node we have a single str object, the joined string for "Hello " and "World", i.e "Hello World".
If you are feeling brave enough, dig into the source: the code for transforming expressions into a parse tree is located at Parser/pgen.c, while the code transforming the parse tree into an Abstract Syntax Tree is in Python/ast.c.
This information is for Python 3.5 and I'm pretty sure that unless you're using some really old version (< 2.5) the functionality and locations should be similar.
Additionally, if you are interested in the whole compilation step python follows, a good gentle intro is provided by one of the core contributors, Brett Cannon, in the video From Source to Code: How CPython's Compiler Works.

What is the difference between multi-line comment and multi-line string in python? [duplicate]

I am still exploring python. Today I came across multi-line strings. If I do:
a = '''
some-text
'''
The content of variable a is '\nsome-text\n'. But this leaves me confused. I always thought that if you enclose something within three single quotes ('''), you are commenting it out. So the above statement would be equivalent to something like this in C++:
a = /*
some-text
*/
What am I missing?
Technically such multi-line-comments enclosed in triple quotes are not really comments but string literals.
The reason why you can still use them to comment stuff out is that a string literal itself does not represent any kind of operation. It gets parsed, but nothing is done with it and it does not get assigned to a variable name, so it gets ignored.
You could also place any other literal into your code. As long as it is not involved in any kind of operation or assignment, it gets basically ignored like a comment. It is not a comment though, just useless code if you want to name it that way.
Here's an example of code that does... well, nothing:
# This is a real comment.
"useless normal string"
"""useless triple-quoted
multi-line
string"""
[1, "two"] # <-- useless list
42 # <-- useless number
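One way to see the difference between a real comment and a "useless" literal is to parse a snippet and look at what survives. This is a minimal sketch using the standard ast module; the variable names are illustrative:

```python
import ast

source = '''
# This is a real comment.
"useless normal string"
42  # useless number
'''

tree = ast.parse(source)
kinds = [type(node).__name__ for node in tree.body]
print(kinds)  # ['Expr', 'Expr']
```

The comment leaves no trace in the tree, while the bare string and the bare number each remain as an expression statement.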
I always thought that if you enclose something within three single quotes (''')
This is not the case, actually. Enclosing something in triple quotes, '''string''', creates a string expression that yields a string containing the characters within the quotes. The difference between this and a single-quoted string 'string' is that the former can span multiple lines. People often use this to comment out multiple lines.
However, if you don't assign the string expression to a variable, then you'll get something a lot like a comment.
'''this is
a useless piece of python
text that does nothing for
your program'''
In Python, wrapping your code in ''' turns it into a string, effectively commenting it out, unless that code already contains a ''' multi-line string literal anywhere. In that case the outer string is terminated early at the first inner '''.
print('''hello!
How are you?''')
# this will not have the intended comment effect
'''
print('''hello!
How are you?''')
'''
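The failure can be checked without breaking a real file by handing the attempted "comment" to compile() as text. This is only a sketch; the file name "<example>" is a placeholder:

```python
# The attempted "comment" from the example above, kept as data so that
# this sketch itself remains valid Python.
snippet = "'''\nprint('''hello!\nHow are you?''')\n'''\n"

try:
    compile(snippet, "<example>", "exec")
    failed = False
except SyntaxError:
    failed = True

print(failed)  # True
```

The first inner ''' closes the outer string, so the rest of the line is parsed as code and is rejected.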

Python 2.x: how to automate enforcing unicode instead of string?

How can I automate a test to enforce that a body of Python 2.x code contains no string instances (only unicode instances)?
Eg.
Can I do it from within the code?
Is there a static analysis tool that has this feature?
Edit:
I wanted this for an application in Python 2.5, but it turns out this is not really possible because:
2.5 doesn't support unicode_literals
kwargs dictionary keys can't be unicode objects, only strings
So I'm accepting the answer that says it's not possible, even though it's for different reasons :)
You can't enforce that all strings are Unicode; even with from __future__ import unicode_literals in a module, byte strings can be written as b'...', as they can in Python 3.
There was an option that could be used to get the same effect as unicode_literals globally: the command-line option -U. However it was abandoned early in the 2.x series because it basically broke every script.
What is your purpose for this? It is not desirable to abolish byte strings. They are not “bad” and Unicode strings are not universally “better”; they are two separate animals and you will need both of them. Byte strings will certainly be needed to talk to binary files and network services.
If you want to be prepared to transition to Python 3, the best tack is to write b'...' for all the strings you really mean to be bytes, and u'...' for the strings that are inherently Unicode. The default string '...' format can be used for everything else: places where you don't care whether Python 3 changes the default string type.
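As an illustration of why both kinds of string are needed, here is a small sketch. It assumes Python 3, where '...' and u'...' both produce str and b'...' produces bytes; the sample values are made up:

```python
raw = b'\x89PNG'      # binary data: bytes
text = u'caf\u00e9'   # human-readable text: str

# Crossing the boundary between the two is always an explicit step.
encoded = text.encode('utf-8')

print(type(raw).__name__, type(text).__name__)  # bytes str
print(encoded)                                  # b'caf\xc3\xa9'
print(encoded.decode('utf-8') == text)          # True
```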
It seems to me like you really need to parse the code with an honest to goodness python parser. Then you will need to dig through the AST your parser produces to see if it contains any string literals.
It looks like Python comes with a parser out of the box. From this documentation I got this code sample working:
import parser
from token import tok_name

def checkForNonUnicode(codeString):
    return checkForNonUnicodeHelper(parser.suite(codeString).tolist())

def checkForNonUnicodeHelper(lst):
    returnValue = True
    nodeType = lst[0]
    if nodeType in tok_name and tok_name[nodeType] == 'STRING':
        stringValue = lst[1]
        if stringValue[0] != "u": # Kind of hacky. Does this always work?
            print "%s is not unicode!" % stringValue
            returnValue = False
    else:
        for subNode in [lst[n] for n in range(1, len(lst))]:
            if isinstance(subNode, list):
                returnValue = returnValue and checkForNonUnicodeHelper(subNode)
    return returnValue

print checkForNonUnicode("""
def foo():
    a = 'This should blow up!'
""")
print checkForNonUnicode("""
def bar():
    b = u'although this is ok.'
""")
which prints out
'This should blow up!' is not unicode!
False
True
Now doc strings aren't unicode but should be allowed, so you might have to do something more complicated like from symbol import sym_name where you can look up which node types are for class and function definitions. Then the first sub-node that's simply a string, i.e. not part of an assignment or whatever, should be allowed to not be unicode.
Good question!
Edit
Just a follow up comment. Conveniently for your purposes, parser.suite does not actually evaluate your python code. This means that you can run this parser over your Python files without worrying about naming or import errors. For example, let's say you have myObscureUtilityFile.py that contains
from ..obscure.relative.path import whatever
You can then run:
checkForNonUnicode(open('/whoah/softlink/myObscureUtilityFile.py').read())
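The same idea can be sketched with the standard tokenize module instead of parser. This is not the code from the answer above: it assumes the checker itself runs under Python 3 and only tokenizes the target source (so Python 2 print statements in that source are not a problem), and the function name non_unicode_literals is hypothetical. Like the original, it does not special-case docstrings:

```python
import io
import tokenize


def non_unicode_literals(source):
    """Return (line, literal) pairs for string literals without a u/U prefix."""
    hits = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type != tokenize.STRING:
            continue
        # Everything before the first quote character is the literal's prefix.
        positions = (tok.string.find("'"), tok.string.find('"'))
        quote_pos = min(pos for pos in positions if pos != -1)
        prefix = tok.string[:quote_pos].lower()
        if "u" not in prefix:
            hits.append((tok.start[0], tok.string))
    return hits


sample = '''
def foo():
    a = 'This should blow up!'
    b = u'although this is ok.'
'''
print(non_unicode_literals(sample))  # [(3, "'This should blow up!'")]
```

Looking at the whole prefix rather than only the first character also covers spellings such as U'...' and ur'...'.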
Our SD Source Code Search Engine (SCSE) can provide this result directly.
The SCSE provides a way to search extremely quickly across large sets of files, using some of the language structure to enable precise queries and minimize false positives. It handles a wide array of languages, even at the same time, including Python. A GUI shows search hits and a page of actual text from the file containing a selected hit.
It uses lexical information from the source languages as the basis for queries, composed of various language keywords and pattern tokens that match varying language elements. SCSE knows the types of lexemes available in the language. One can search for a generic identifier (using query token I) or an identifier matching some regular expression. Similarly, one can search for a generic string (using query token "S" for "any kind of string literal") or for a specific type of string (for Python including "UnicodeStrings", non-unicode strings, etc., which collectively make up the set of Python things comprising "S").
So a search:
'for' ... I=ij*
finds the keyword 'for' near ("...") an identifier whose prefix is "ij" and shows you all the hits. (Language-specific whitespace, including line breaks and comments, is ignored.)
A trivial search:
S
finds all string literals. This is often a pretty big set :-}
A search
UnicodeStrings
finds all string literals that are lexically defined as Unicode Strings (u"...")
What you want are all strings that aren't UnicodeStrings. The SCSE provides a "subtract" operator that subtracts hits of one kind that overlap hits of another. So your question, "what strings aren't unicode" is expressed concisely as:
S-UnicodeStrings
All hits shown will be the strings that aren't unicode strings, your precise question.
The SCSE provides logging facilities so that you can record hits. You can run SCSE from a command line, enabling a scripted query for your answer. Putting this into a command script would provide a tool that gives your answer directly.

Single quotes vs. double quotes in Python [closed]

According to the documentation, they're pretty much interchangeable. Is there a stylistic reason to use one over the other?
I like to use double quotes around strings that are used for interpolation or that are natural language messages, and single quotes for small symbol-like strings, but will break the rules if the strings contain quotes, or if I forget. I use triple double quotes for docstrings and raw string literals for regular expressions even if they aren't needed.
For example:
import re

LIGHT_MESSAGES = {
    'English': "There are %(number_of_lights)s lights.",
    'Pirate': "Arr! Thar be %(number_of_lights)s lights."
}

def lights_message(language, number_of_lights):
    """Return a language-appropriate string reporting the light count."""
    return LIGHT_MESSAGES[language] % locals()

def is_pirate(message):
    """Return True if the given message sounds piratical."""
    return re.search(r"(?i)(arr|avast|yohoho)!", message) is not None
Quoting the official docs at https://docs.python.org/2.0/ref/strings.html:
In plain English: String literals can be enclosed in matching single quotes (') or double quotes (").
So there is no difference. Instead, people will tell you to choose whichever style that matches the context, and to be consistent. And I would agree - adding that it is pointless to try to come up with "conventions" for this sort of thing because you'll only end up confusing any newcomers.
I used to prefer ', especially for '''docstrings''', as I find """this creates some fluff""". Also, ' can be typed without the Shift key on my Swiss German keyboard.
I have since changed to using triple quotes for """docstrings""", to conform to PEP 257.
I'm with Will:
Double quotes for text
Single quotes for anything that behaves like an identifier
Double quoted raw string literals for regexps
Tripled double quotes for docstrings
I'll stick with that even if it means a lot of escaping.
I get the most value out of single quoted identifiers standing out because of the quotes. The rest of the practices are there just to give those single quoted identifiers some standing room.
If the string you have contains one, then you should use the other. For example, "You're able to do this", or 'He said "Hi!"'. Other than that, you should simply be as consistent as you can (within a module, within a package, within a project, within an organisation).
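A short sketch of that rule of thumb, with made-up variable names: picking the other quote character avoids the escapes, and the resulting strings are the same either way.

```python
# Choosing the quote character the text does not contain avoids escapes.
clean_apostrophe = "You're able to do this"
escaped_apostrophe = 'You\'re able to do this'

clean_double = 'He said "Hi!"'
escaped_double = "He said \"Hi!\""

print(clean_apostrophe == escaped_apostrophe)  # True
print(clean_double == escaped_double)          # True
```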
If your code is going to be read by people who work with C/C++ (or if you switch between those languages and Python), then using '' for single-character strings, and "" for longer strings might help ease the transition. (Likewise for following other languages where they are not interchangeable).
The Python code I've seen in the wild tends to favour " over ', but only slightly. The one exception is that """these""" are much more common than '''these''', from what I have seen.
Triple quoted comments are an interesting subtopic of this question. PEP 257 specifies triple quotes for docstrings. I did a quick check using Google Code Search and found that triple double quotes in Python are about 10x as popular as triple single quotes -- 1.3M vs 131K occurrences in the code Google indexes. So in the multi-line case your code is probably going to be more familiar to people if it uses triple double quotes.
"If you're going to use apostrophes,
       ^
you'll definitely want to use double quotes".
   ^
For that simple reason, I always use double quotes on the outside. Always.
Speaking of fluff, what good is streamlining your string literals with ' if you're going to have to use escape characters to represent apostrophes? Does it offend coders to read novels? I can't imagine how painful high school English class was for you!
Python uses quotes something like this:
mystringliteral1 = "this is a string with 'quotes'"
mystringliteral2 = 'this is a string with "quotes"'
mystringliteral3 = """this is a string with "quotes" and more 'quotes'"""
mystringliteral4 = '''this is a string with 'quotes' and more "quotes"'''
mystringliteral5 = 'this is a string with \"quotes\"'
mystringliteral6 = 'this is a string with \042quotes\042'
mystringliteral7 = 'this is a string with \047quotes\047'
print(mystringliteral1)
print(mystringliteral2)
print(mystringliteral3)
print(mystringliteral4)
print(mystringliteral5)
print(mystringliteral6)
print(mystringliteral7)
Which gives the following output:
this is a string with 'quotes'
this is a string with "quotes"
this is a string with "quotes" and more 'quotes'
this is a string with 'quotes' and more "quotes"
this is a string with "quotes"
this is a string with "quotes"
this is a string with 'quotes'
I use double quotes in general, but not for any specific reason - Probably just out of habit from Java.
I guess you're also more likely to want apostrophes in an inline literal string than you are to want double quotes.
Personally I stick with one or the other. It doesn't matter. And providing your own meaning to either quote is just to confuse other people when you collaborate.
It's probably a stylistic preference more than anything. I just checked PEP 8 and didn't see any mention of single versus double quotes.
I prefer single quotes because it's only one keystroke instead of two. That is, I don't have to mash the shift key to make a single quote.
In Perl you want to use single quotes when you have a string which doesn't need to interpolate variables or escaped characters like \n, \t, \r, etc.
PHP makes the same distinction as Perl: content in single quotes will not be interpreted (not even \n will be converted), as opposed to double quotes which can contain variables to have their value printed out.
Python does not, I'm afraid. Technically, there is no $ token (or the like) in Python to separate a variable from the surrounding text. Both features make Python more readable and less confusing, after all. Single and double quotes can be used interchangeably in Python.
I chose to use double quotes because they are easier to see.
I just use whatever strikes my fancy at the time; it's convenient to be able to switch between the two at a whim!
Of course, when quoting quote characters, switching between the two might not be so whimsical after all...
Your team's taste or your project's coding guidelines.
If you are in a multilanguage environment, you might wish to encourage the use of the same type of quotes for strings that the other language uses, for instance. Otherwise, I personally like the look of ' best.
None as far as I know. Although if you look at some code, " " is commonly used for strings of text (I guess ' is more common inside text than "), and ' ' appears in hashkeys and things like that.
I aim to minimize both pixels and surprise. I typically prefer ' in order to minimize pixels, but " instead if the string has an apostrophe, again to minimize pixels. For a docstring, however, I prefer """ over ''' because the latter is non-standard, uncommon, and therefore surprising. If now I have a bunch of strings where I used " per the above logic, but also one that can get away with a ', I may still use " in it to preserve consistency, only to minimize surprise.
Perhaps it helps to think of the pixel minimization philosophy in the following way. Would you rather that English characters looked like A B C or AA BB CC? The latter choice wastes 50% of the non-empty pixels.
I use double quotes because I have been doing so for years in most languages (C++, Java, VB…) except Bash, because I also use double quotes in normal text and because I'm using a (modified) non-English keyboard where both characters require the shift key.
' = "
/ = \ = \\
example :
f = open('c:\word.txt', 'r')
f = open("c:\word.txt", "r")
f = open("c:/word.txt", "r")
f = open("c:\\\word.txt", "r")
Results are the same
=>> no, they're not the same.
A single backslash will escape characters. You just happen to luck out in that example because \k and \w aren't valid escapes like \t or \n or \\ or \".
If you want to use single backslashes (and have them interpreted as such), then you need to use a "raw" string. You can do this by putting an 'r' in front of the string:
im_raw = r'c:\temp.txt'
non_raw = 'c:\\temp.txt'
another_way = 'c:/temp.txt'
As far as paths in Windows are concerned, forward slashes are interpreted the same way. Clearly the string itself is different though. I wouldn't guarantee that they're handled this way on an external device though.
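A small sketch of the difference, using a path whose first letter after the backslash does form a valid escape (the path and the variable names are illustrative only):

```python
plain = 'c:\temp.txt'     # \t is interpreted as a tab character
raw = r'c:\temp.txt'      # raw string: the backslash is kept literally
escaped = 'c:\\temp.txt'  # escaped backslash: same value as the raw string

print(raw == escaped)         # True
print(plain == raw)           # False
print(len(plain), len(raw))   # 10 11
print('\t' in plain)          # True
```

The quote character makes no difference here; only the r prefix and the escaping do.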
