I am trying to figure out whether the different quote types make a functional difference. I have seen people say it's a matter of preference between "" and '', but what about """ """? I tested it in a simple piece of code and it works. I was wondering whether """ triple quotes """ have a functional purpose for function arguments, or whether they are just another quote option that can be used interchangeably like "" and ''.
I have seen many posts about "" and '', but none about """ """ or ''' ''' being used in functions.
My question is: Does the triple quote have a unique use as an argument or is it simply interchangeable with "" and ''? The reason I think it might have a unique function is because it is a multi line quote and I was wondering if it would allow a multi line argument to be submitted. I am not sure if something like that would even be useful but it could be.
Here is an example that prints out what you would expect using all the quote types I know of.
def myFun(var1="""different""",var2="quote",var3='types'):
    return var1, var2, var3
print (myFun('All','''for''','one!'))
Result:
('All', 'for', 'one!')
EDIT:
After some more testing of the triple quote I did find some variation in how it works using return vs printing in the function.
def myFun(var1="""different""",var2="""quote""",var3='types'):
    return (var1, var2, var3)
print(myFun('This',
'''Can
Be
Multi''',
'line!'))
Result:
('This', 'Can\nBe\nMulti', 'line!')
Or:
def myFun(var1="""different""",var2="""quote""",var3='types'):
    print (var1, var2, var3)
myFun('This',
'''Can
Be
Multi''',
'line!')
Result:
This Can
Be
Multi line!
From the docs:
String literals can be enclosed in matching single quotes (') or double quotes ("). They can also be enclosed in matching groups of three single or double quotes (these are generally referred to as triple-quoted strings). [...other rules applying identically to all string literal types omitted...]
In triple-quoted strings, unescaped newlines and quotes are allowed (and are retained), except that three unescaped quotes in a row terminate the string. (A “quote” is the character used to open the string, i.e. either ' or ".)
Thus, triple-quoted string literals can span multiple lines, and can contain literal quotes without use of escape sequences, but are otherwise exactly identical to string literals expressed with other quoting types (including those using escape sequences such as \n or \' to express the same content).
Also see the Python 3 documentation: Bytes and String Literals -- which expresses an effectively identical set of rules with slightly different verbiage.
A more gentle introduction is also available in the language tutorial, which explicitly introduces triple-quotes as a way to permit strings to span multiple lines:
String literals can span multiple lines. One way is using triple-quotes: """...""" or '''...'''. End of lines are automatically included in the string, but it’s possible to prevent this by adding a \ at the end of the line. The following example:
print("""\
Usage: thingy [OPTIONS]
-h Display this usage message
-H hostname Hostname to connect to
""")
produces the following output (note that the initial newline is not included):
Usage: thingy [OPTIONS]
-h Display this usage message
-H hostname Hostname to connect to
To be clear, though: These are different syntax, but the string literals they create are indistinguishable from each other. That is to say, given the following code:
s1 = '''foo
'bar'
baz
'''
s2 = 'foo\n\'bar\'\nbaz\n'
there's no possible way to tell s1 and s2 apart from each other by looking at their values: s1 == s2 is true, and so is repr(s1) == repr(s2). The Python interpreter is even allowed to intern them to the same value, so it may (or may not) make id(s1) == id(s2) true depending on details (such as whether the code was run at the REPL or imported as a module).
FWIW, my understanding is that there's a convention whereby """ """ and ''' ''' are used for docstrings, which are kinda like a #comment but are stored as an attribute that can be referenced later. https://www.python.org/dev/peps/pep-0257/
I'm a beginner too, but my understanding is that using triple quotes for strings is not the best idea, even if there's little functional difference with what you're doing currently (I don't know if there might be later). Sticking with conventions is helpful to others reading and using your code, and it seems to be a good rule of thumb that some conventions will bite you if you don't follow them, as in this case where a mal-formatted string with triple quotes will be interpreted as a docstring, and maybe not throw an error, and you'll need to search through a bunch of code to find the issue.
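To illustrate the docstring point: a string literal placed as the first statement of a function is stored on the function's __doc__ attribute, whichever quote style is used. A minimal sketch (my_fun and other_fun are made-up names for illustration):

```python
def my_fun(var1="different"):
    """Return var1 unchanged."""
    return var1


def other_fun():
    'Single quotes also produce a docstring.'
    return None


# The docstring is an ordinary str stored on the function object.
doc = my_fun.__doc__
other_doc = other_fun.__doc__
print(doc)
print(other_doc)
```

Triple double quotes are only the PEP 257 convention here; the interpreter treats both literals the same way.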
How should I do real escaping in Python for SQLite3?
If I google for it (or search Stack Overflow) there are tons of questions about this, and every time the response is something like:
dbcursor.execute("SELECT * FROM `foo` WHERE `bar` like ?", ["foobar"])
This protects against SQL injection, and it would be enough if I only did comparisons with "=", but of course it doesn't strip wildcards.
So if i do
cursor.execute(u"UPDATE `cookies` set `count`=? WHERE `nickname` ilike ?", (cookies, name))
a user could supply "%" as a nickname and overwrite all of the cookie entries with one line.
I could filter it myself (ugh… I will probably forget one of those lesser-known wildcards anyway), or I could lowercase nick and nickname and replace "ilike" with "=", but what I would really like to do is something along the lines of:
foo = sqlescape(nick)+"%"
cursor.execute(u"UPDATE `cookies` set `count`=? WHERE `nickname` ilike ?", (cookies, foo))
? parameters are intended to avoid formatting problems for SQL strings (and other problematic data types like floating-point numbers and blobs).
LIKE/GLOB wildcards work on a different level; they are always part of the string itself.
SQL allows you to escape them, but there is no default escape character; you have to choose one with the ESCAPE clause:
escaped_foo = my_like_escape(foo, "\\")
c.execute("UPDATE cookies SET count = ? WHERE nickname LIKE ? ESCAPE '\\'",
          (cookies, escaped_foo))
(And you have to write your own my_like_escape function for % and _ (LIKE) or * and ? (GLOB).)
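A minimal sketch of what such a my_like_escape helper could look like for LIKE, assuming a backslash as the escape character. The helper body, the table layout, and the sample rows are illustrative assumptions, not part of any library:

```python
import sqlite3


def my_like_escape(text, escape_char="\\"):
    """Escape LIKE wildcards in text (hypothetical helper, LIKE only)."""
    # Escape the escape character first so the escapes added below
    # are not themselves doubled.
    text = text.replace(escape_char, escape_char + escape_char)
    for wildcard in ("%", "_"):
        text = text.replace(wildcard, escape_char + wildcard)
    return text


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cookies (nickname TEXT, count INTEGER)")
conn.executemany("INSERT INTO cookies VALUES (?, ?)",
                 [("alice", 0), ("100%", 0), ("bob_1", 0)])

# The user-supplied "%" is now matched literally, not as a wildcard.
pattern = my_like_escape("100%")
conn.execute(
    "UPDATE cookies SET count = ? WHERE nickname LIKE ? ESCAPE '\\'",
    (5, pattern))

rows = conn.execute(
    "SELECT nickname, count FROM cookies ORDER BY nickname").fetchall()
print(rows)
```

Only the row whose nickname is literally "100%" is updated; without the escaping, every nickname starting with "100" would match.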
You've avoided outright code injection by using parametrized queries. Now it seems you're trying to do a pattern match with user-supplied data, but you want the user-supplied portion of the data to be treated as literal data (hence no wildcards). You have several options:
Just filter the input. SQLite's LIKE only understands % and _ as wildcards, so it's pretty hard to get it wrong. Just make sure to always filter inputs. (My preferred method: Filter just before the query is constructed, not when user input is read).
In general, a "whitelist" approach is considered safer and easier than removing specific dangerous characters. That is, instead of deleting % and _ from your string (and any "lesser-known wildcards", as you say), scan your string and keep only the characters you want. E.g., if your "nicknames" can contain ASCII letters, digits, "-" and ".", it can be sanitized like this:
name = re.sub(r"[^A-Za-z\d.-]", "", name)
This solution is specific to the particular field you are matching against, and works well for key fields and other identifiers. I would definitely do it this way if I had to search with RLIKE, which accepts full regular expressions so there are a lot more characters to watch out for.
If you don't want the user to be able to supply a wildcard, why would you use LIKE in your query anyway? If the inputs to your queries come from many places in the code (or maybe you're even writing a library), you'll make your query safer if you can avoid LIKE altogether:
Here's case insensitive matching:
SELECT * FROM ... WHERE name = 'someone' COLLATE NOCASE
In your example you use prefix matching ("sqlescape(nick)+"%""). Here's how to do it with exact search:
size = len(nick)
cursor.execute(u"UPDATE `cookies` set `count`=? WHERE substr(`nickname`, 1, ?) = ?",
(cookies, size, nick))
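Both LIKE-free alternatives can be tried against an in-memory database. This is only a sketch: the table contents are invented, and a SELECT stands in for the UPDATE so the matched rows are visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cookies (nickname TEXT, count INTEGER)")
conn.executemany("INSERT INTO cookies VALUES (?, ?)",
                 [("Alice", 0), ("alicia", 0), ("%", 0)])

# Case-insensitive exact match: no wildcard is interpreted.
exact = conn.execute(
    "SELECT nickname FROM cookies WHERE nickname = ? COLLATE NOCASE",
    ("ALICE",)).fetchall()

# Prefix match via substr(): a user-supplied "%" is compared literally.
nick = "%"
prefix = conn.execute(
    "SELECT nickname FROM cookies WHERE substr(nickname, 1, ?) = ?",
    (len(nick), nick)).fetchall()

print(exact)
print(prefix)
```

Note that the substr() comparison is case-sensitive unless a collation such as NOCASE is added to it as well.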
Ummm, normally you'd just replace 'ilike' with a normal '=' comparison, which doesn't interpret '%' in any special way. Escaping (effectively blacklisting bad patterns) is error prone: even if you manage to escape all known patterns in the version of SQLite you use, any future upgrade can put you at risk.
It's not clear to me why you'd want to mass-update cookies based on a fuzzy match on user name.
If you really want to do that, my preferred approach would be to SELECT the list first and decide what to UPDATE at the application level to maintain a maximum level of control.
There are several very fun ways to do this with string format-ing.
From Python's Documentation:
The built-in str and unicode classes provide the ability to do complex variable substitutions and value formatting via the str.format() method:
s = "string"
c = "Cool"
print "This is a {0}. {1}, huh?".format(s,c)
#=> This is a string. Cool, huh?
Other nifty tricks you can do with string formatting:
"First, thou shalt count to {0}".format(3) # References first positional argument
"Bring me a {}".format("shrubbery!") # Implicitly references the first positional argument
"From {} to {}".format('Africa','Mercia') # Same as "From {0} to {1}"
"My quest is {name}" # References keyword argument 'name'
"Weight in tons {0.weight}" # 'weight' attribute of first positional arg
"Units destroyed: {players[0]}" # First element of keyword argument 'players'.
I faced an error on "bad group name".
Here is the code:
for qitem in q['display']:
    if qitem['type'] == 1:
        for keyword in keywordTags.split('|'):
            p = re.compile('^' + keyword + '$')
            newstring=''
            for word in qitem['value'].split():
                if word[-1:] == ',':
                    word = word[0:len(word)-1]
                    newstring += (p.sub('<b>'+word+'</b>', word) + ', ')
                else:
                    newstring += (p.sub('<b>'+word+'</b>', word) + ' ')
            qitem['value']=newstring
And here's the error:
error at /result/1/
bad group name
Request Method: GET
Django Version: 1.4.1
Exception Type: error
Exception Value: bad group name
Exception Location: C:\Python27\lib\re.py in _compile_repl, line 257
Python Executable: C:\Python27\python.exe
Python Version: 2.7.3
Python Path: ['D:\ExamPapers', 'C:\Windows\SYSTEM32\python27.zip',
'C:\Python27\DLLs', 'C:\Python27\lib',
'C:\Python27\lib\plat-win', 'C:\Python27\lib\lib-tk',
'C:\Python27', 'C:\Python27\lib\site-packages']
Server time: Sun,3 Mar 2013 15:31:05 +0800
Traceback
C:\Python27\lib\site-packages\django\core\handlers\base.py in get_response
response = callback(request, *callback_args, **callback_kwargs) ...
D:\ExamPapers\views.py in result
newstring += (p.sub('<b>'+word+'</b>', word) + ' ') ...
In summary, the error is at:
newstring += (p.sub('<b>'+word+'</b>', word) + ' ')
So you're trying to highlight occurrences of a set of keywords in bold. Right now this code is broken in quite a few ways. You're using the re module to match the keywords, but you're also breaking the keywords and the strings down into individual words. You don't need to do both, and the interaction between these two approaches is what is causing your issues.
You can use regular expressions to match multiple possible strings at the same time; that's what they're good for! So instead of "^keyword$" to match just "keyword" you could use "^(keyword|hello)$" to match either "keyword" or "hello". (The brackets matter: without them, "^keyword|hello$" means "starts with keyword" or "ends with hello".) You also use the ^ and $ characters, which only match the beginning or end of the entire string, but what you probably wanted originally was to match the beginning or end of words. For this you can use \b, like this: r"\b(keyword|hello)\b". Note that in the last example I added an r character before the string. This stands for "raw" and turns off Python's usual handling of backslash characters, which conflicts with regular expressions. It's good practice to always use the r before the string when the string contains a regular expression. I also used brackets to group together the words.
The regular expression sub method allows you to substitute things matched by a regular expression with another string. It also allows you to make "back references" in the replacement string that include parts of the original string that matched. The parts that it includes are called "groups" and are indicated with brackets in the original regular expression. In the example above there is only one set of brackets, and it is the first, so it is indicated by the back reference \1. The cause of the actual error message you asked about is that your replacement string contained what looked like a backref, but there weren't any groups in your regular expression.
Using that you do something like this:
keywordMatcher = re.compile(r"\b(keyword|hello)\b")
value = keywordMatcher.sub(r"<b>\1</b>", value)
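To make the difference between the ^...$ anchors and \b concrete, here is a small sketch; the sample sentence is made up for illustration:

```python
import re

text = "say hello to the keyword"

# ^ and $ anchor the whole string, so nothing matches inside a sentence.
anchored = re.compile(r"^(keyword|hello)$")
# \b anchors at word boundaries, so each keyword is found in place.
word_bounded = re.compile(r"\b(keyword|hello)\b")

anchored_match = anchored.search(text)
found = word_bounded.findall(text)
# \1 refers back to whatever the first (and only) group matched.
highlighted = word_bounded.sub(r"<b>\1</b>", text)

print(anchored_match)
print(found)
print(highlighted)
```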
Another thing that isn't directly related to what you're asking, but is incredibly important: you are taking plain text strings (I assume) and turning them into HTML. This opens the door to script injection vulnerabilities, which, if you don't take the time to understand and avoid them, will allow bad guys to hack the applications you build. (They can do this in an automated way, so even if you think your app is too small for anyone to notice, it can still get hacked and used for all sorts of bad things. Don't let this happen!) The basic rule is that it's OK to convert text to HTML, but you need to "escape" it first. This is very simple:
from django.utils import html
html_safe = html.escape(my_text)
All this does is convert characters like < to &lt;, which the browser will show as < but won't interpret as the beginning of a tag. So if a bad guy types <script> into one of your forms and it gets processed by your code, the browser will display it as <script> and not execute it as a script.
Likewise, if you use any text in a regular expression that you don't intend to have special regular expression characters, then you must escape that too! You can do this using re.escape:
import re
my_regexp = re.compile(r"\b%s\b" % (re.escape(my_word),))
Ok, so now we've got that out of the way here is a method you could use to do what you wanted:
value = "this is my super duper testing thingy"
keywords = "super|my|test"
from django.utils import html
import re
# first we must split up the keywords
keywords = keywords.split("|")
# Next we must make each keyword safe for use in a regular expression,
# this is similar to the HTML escaping we discussed above but not to
# be confused with it.
keywords = [re.escape(k) for k in keywords]
# Now we reform the keywordTags string, but this time we know each keyword is regexp-safe
keywords = "|".join(keywords)
# Finally we create a regular expression that matches *any* of the keywords
keywordMatcher = re.compile(r'\b(%s)\b' % (keywords,))
# We are going to make the value into HTML (by adding <b> tags) so must first escape it
value = html.escape(value)
# We can then apply the regular expression to the value. We use a "back reference" `\1` to say
# that each keyword found should be replaced with itself wrapped in a <b> tag
value = keywordMatcher.sub(r"<b>\1</b>", value)
print value
I urge you to take the time to understand what this does, otherwise you're just going to get yourself into a mess! It's always easier to just cut and paste and move on, but this leads to crappy broken code and, worst of all, means you yourself don't improve and don't learn. All great coders started off as beginner coders who took the time to understand things :)
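For experimenting outside Django, the same steps can be sketched as a single function. This version assumes Python 3 and swaps in the standard library's html.escape for django.utils.html.escape; highlight_keywords is a made-up name:

```python
import html
import re


def highlight_keywords(value, keyword_tags):
    """Wrap each whole-word keyword in <b> tags (illustrative sketch)."""
    # Make every keyword regexp-safe before joining them with "|".
    keywords = [re.escape(k) for k in keyword_tags.split("|") if k]
    # Escape the text first, so only the <b> tags added below are real HTML.
    escaped_value = html.escape(value)
    if not keywords:
        return escaped_value
    matcher = re.compile(r"\b(%s)\b" % "|".join(keywords))
    return matcher.sub(r"<b>\1</b>", escaped_value)


result = highlight_keywords(
    "this is my <super> duper testing thingy", "super|my|test")
print(result)
```

"testing" is left alone because \b only matches whole words, and the angle brackets from the input come out as entities rather than tags.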
How can I automate a test to enforce that a body of Python 2.x code contains no string instances (only unicode instances)?
Eg.
Can I do it from within the code?
Is there a static analysis tool that has this feature?
Edit:
I wanted this for an application in Python 2.5, but it turns out this is not really possible because:
2.5 doesn't support unicode_literals
kwargs dictionary keys can't be unicode objects, only strings
So I'm accepting the answer that says it's not possible, even though it's for different reasons :)
You can't enforce that all strings are Unicode; even with from __future__ import unicode_literals in a module, byte strings can be written as b'...', as they can in Python 3.
There was an option that could be used to get the same effect as unicode_literals globally: the command-line option -U. However it was abandoned early in the 2.x series because it basically broke every script.
What is your purpose for this? It is not desirable to abolish byte strings. They are not “bad” and Unicode strings are not universally “better”; they are two separate animals and you will need both of them. Byte strings will certainly be needed to talk to binary files and network services.
If you want to be prepared to transition to Python 3, the best tack is to write b'...' for all the strings you really mean to be bytes, and u'...' for the strings that are inherently Unicode. The default string '...' format can be used for everything else: places where you don't care whether Python 3 changes the default string type.
It seems to me like you really need to parse the code with an honest to goodness python parser. Then you will need to dig through the AST your parser produces to see if it contains any string literals.
It looks like Python comes with a parser out of the box. From this documentation I got this code sample working:
import parser
from token import tok_name

def checkForNonUnicode(codeString):
    return checkForNonUnicodeHelper(parser.suite(codeString).tolist())

def checkForNonUnicodeHelper(lst):
    returnValue = True
    nodeType = lst[0]
    if nodeType in tok_name and tok_name[nodeType] == 'STRING':
        stringValue = lst[1]
        if stringValue[0] != "u": # Kind of hacky. Does this always work?
            print "%s is not unicode!" % stringValue
            returnValue = False
    else:
        for subNode in [lst[n] for n in range(1, len(lst))]:
            if isinstance(subNode, list):
                returnValue = returnValue and checkForNonUnicodeHelper(subNode)
    return returnValue

print checkForNonUnicode("""
def foo():
    a = 'This should blow up!'
""")
print checkForNonUnicode("""
def bar():
    b = u'although this is ok.'
""")
which prints out
'This should blow up!' is not unicode!
False
True
Now doc strings aren't unicode but should be allowed, so you might have to do something more complicated like from symbol import sym_name where you can look up which node types are for class and function definitions. Then the first sub-node that's simply a string, i.e. not part of an assignment or whatever, should be allowed to not be unicode.
Good question!
Edit
Just a follow up comment. Conveniently for your purposes, parser.suite does not actually evaluate your python code. This means that you can run this parser over your Python files without worrying about naming or import errors. For example, let's say you have myObscureUtilityFile.py that contains
from ..obscure.relative.path import whatever
You can
checkForNonUnicode(open('/whoah/softlink/myObscureUtilityFile.py').read())
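The code above relies on Python 2's parser module. As a rough sketch of the same lexical check using the standard tokenize module instead (an assumption on my part: this runs under Python 3, whose tokenizer still recognizes the u prefix), something like the following could be used. find_non_u_strings is a made-up name, and like the original it also reports docstrings and b'...' literals:

```python
import io
import tokenize


def find_non_u_strings(source):
    """Return (line_number, literal) for string literals lacking a u/U prefix."""
    hits = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type != tokenize.STRING:
            continue
        # Everything before the first quote character is the literal's prefix.
        first_quote = min(
            i for i in (tok.string.find("'"), tok.string.find('"')) if i != -1)
        prefix = tok.string[:first_quote].lower()
        if "u" not in prefix:
            hits.append((tok.start[0], tok.string))
    return hits


sample = "a = 'This should be reported'\nb = u'although this is ok.'\n"
hits = find_non_u_strings(sample)
print(hits)
```

Like parser.suite, the tokenizer never executes or imports the code being checked.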
Our SD Source Code Search Engine (SCSE) can provide this result directly.
The SCSE provides a way to search extremely quickly across large sets of files, using some of the language structure to enable precise queries and minimize false positives. It handles a wide array of languages, even at the same time, including Python. A GUI shows search hits and a page of actual text from the file containing a selected hit.
It uses lexical information from the source languages as the basis for queries, which are composed of various language keywords and pattern tokens that match varying language elements. SCSE knows the types of lexemes available in the language. One can search for a generic identifier (using query token I) or an identifier matching some regular expression. Similarly, one can search for a generic string (using query token "S" for "any kind of string literal") or for a specific type of string (for Python including "UnicodeStrings", non-unicode strings, etc., which collectively make up the set of Python things comprising "S").
So a search:
'for' ... I=ij*
finds the keyword 'for' near ("...") an identifier whose prefix is "ij" and shows you all the hits. (Language-specific whitespace, including line breaks and comments, is ignored.)
A trivial search:
S
finds all string literals. This is often a pretty big set :-}
A search
UnicodeStrings
finds all string literals that are lexically defined as Unicode Strings (u"...")
What you want are all strings that aren't UnicodeStrings. The SCSE provides a "subtract" operator that subtracts hits of one kind that overlap hits of another. So your question, "what strings aren't unicode" is expressed concisely as:
S-UnicodeStrings
All hits shown will be the strings that aren't unicode strings, your precise question.
The SCSE provides logging facilities so that you can record hits. You can run SCSE from a command line, enabling a scripted query for your answer. Putting this into a command script would provide a tool that gives your answer directly.
According to the documentation, they're pretty much interchangeable. Is there a stylistic reason to use one over the other?
I like to use double quotes around strings that are used for interpolation or that are natural language messages, and single quotes for small symbol-like strings, but will break the rules if the strings contain quotes, or if I forget. I use triple double quotes for docstrings and raw string literals for regular expressions even if they aren't needed.
For example:
import re

LIGHT_MESSAGES = {
    'English': "There are %(number_of_lights)s lights.",
    'Pirate': "Arr! Thar be %(number_of_lights)s lights."
}

def lights_message(language, number_of_lights):
    """Return a language-appropriate string reporting the light count."""
    return LIGHT_MESSAGES[language] % locals()

def is_pirate(message):
    """Return True if the given message sounds piratical."""
    return re.search(r"(?i)(arr|avast|yohoho)!", message) is not None
Quoting the official docs at https://docs.python.org/2.0/ref/strings.html:
In plain English: String literals can be enclosed in matching single quotes (') or double quotes (").
So there is no difference. Instead, people will tell you to choose whichever style that matches the context, and to be consistent. And I would agree - adding that it is pointless to try to come up with "conventions" for this sort of thing because you'll only end up confusing any newcomers.
I used to prefer ', especially for '''docstrings''', as I find """this creates some fluff""". Also, ' can be typed without the Shift key on my Swiss German keyboard.
I have since changed to using triple quotes for """docstrings""", to conform to PEP 257.
I'm with Will:
Double quotes for text
Single quotes for anything that behaves like an identifier
Double quoted raw string literals for regexps
Tripled double quotes for docstrings
I'll stick with that even if it means a lot of escaping.
I get the most value out of single quoted identifiers standing out because of the quotes. The rest of the practices are there just to give those single quoted identifiers some standing room.
If the string you have contains one, then you should use the other. For example, "You're able to do this", or 'He said "Hi!"'. Other than that, you should simply be as consistent as you can (within a module, within a package, within a project, within an organisation).
If your code is going to be read by people who work with C/C++ (or if you switch between those languages and Python), then using '' for single-character strings, and "" for longer strings might help ease the transition. (Likewise for following other languages where they are not interchangeable).
The Python code I've seen in the wild tends to favour " over ', but only slightly. The one exception is that """these""" are much more common than '''these''', from what I have seen.
Triple quoted comments are an interesting subtopic of this question. PEP 257 specifies triple quotes for doc strings. I did a quick check using Google Code Search and found that triple double quotes in Python are about 10x as popular as triple single quotes -- 1.3M vs 131K occurrences in the code Google indexes. So in the multi line case your code is probably going to be more familiar to people if it uses triple double quotes.
"If you're going to use apostrophes,
       ^
you'll definitely want to use double quotes".
   ^
For that simple reason, I always use double quotes on the outside. Always.
Speaking of fluff, what good is streamlining your string literals with ' if you're going to have to use escape characters to represent apostrophes? Does it offend coders to read novels? I can't imagine how painful high school English class was for you!
Python uses quotes something like this:
mystringliteral1="this is a string with 'quotes'"
mystringliteral2='this is a string with "quotes"'
mystringliteral3="""this is a string with "quotes" and more 'quotes'"""
mystringliteral4='''this is a string with 'quotes' and more "quotes"'''
mystringliteral5='this is a string with \"quotes\"'
mystringliteral6='this is a string with \042quotes\042'
mystringliteral7='this is a string with \047quotes\047'
print mystringliteral1
print mystringliteral2
print mystringliteral3
print mystringliteral4
print mystringliteral5
print mystringliteral6
print mystringliteral7
Which gives the following output:
this is a string with 'quotes'
this is a string with "quotes"
this is a string with "quotes" and more 'quotes'
this is a string with 'quotes' and more "quotes"
this is a string with "quotes"
this is a string with "quotes"
this is a string with 'quotes'
I use double quotes in general, but not for any specific reason - Probably just out of habit from Java.
I guess you're also more likely to want apostrophes in an inline literal string than you are to want double quotes.
Personally I stick with one or the other. It doesn't matter. And providing your own meaning to either quote is just to confuse other people when you collaborate.
It's probably a stylistic preference more than anything. I just checked PEP 8 and didn't see any mention of single versus double quotes.
I prefer single quotes because it's only one keystroke instead of two. That is, I don't have to mash the shift key to make a single quote.
In Perl you want to use single quotes when you have a string which doesn't need to interpolate variables or escaped characters like \n, \t, \r, etc.
PHP makes the same distinction as Perl: content in single quotes will not be interpreted (not even \n will be converted), as opposed to double quotes which can contain variables to have their value printed out.
Python does not, I'm afraid. Technically seen, there is no $ token (or the like) to separate a name/text from a variable in Python. Both features make Python more readable, less confusing, after all. Single and double quotes can be used interchangeably in Python.
I chose to use double quotes because they are easier to see.
I just use whatever strikes my fancy at the time; it's convenient to be able to switch between the two at a whim!
Of course, when quoting quote characters, switching between the two might not be so whimsical after all...
Your team's taste or your project's coding guidelines.
If you are in a multilanguage environment, you might wish to encourage the use of the same type of quotes for strings that the other language uses, for instance. Else, I personally like best the look of '
None as far as I know. Although if you look at some code, " " is commonly used for strings of text (I guess ' is more common inside text than "), and ' ' appears in hashkeys and things like that.
I aim to minimize both pixels and surprise. I typically prefer ' in order to minimize pixels, but " instead if the string has an apostrophe, again to minimize pixels. For a docstring, however, I prefer """ over ''' because the latter is non-standard, uncommon, and therefore surprising. If now I have a bunch of strings where I used " per the above logic, but also one that can get away with a ', I may still use " in it to preserve consistency, only to minimize surprise.
Perhaps it helps to think of the pixel minimization philosophy in the following way. Would you rather that English characters looked like A B C or AA BB CC? The latter choice wastes 50% of the non-empty pixels.
I use double quotes because I have been doing so for years in most languages (C++, Java, VB…) except Bash, because I also use double quotes in normal text and because I'm using a (modified) non-English keyboard where both characters require the shift key.
' = "
/ = \ = \\
example :
f = open('c:\word.txt', 'r')
f = open("c:\word.txt", "r")
f = open("c:/word.txt", "r")
f = open("c:\\\word.txt", "r")
Results are the same
=>> no, they're not the same.
A single backslash will escape characters. You just happen to luck out in that example because \k and \w aren't valid escapes like \t or \n or \\ or \"
If you want to use single backslashes (and have them interpreted as such), then you need to use a "raw" string. You can do this by putting an 'r' in front of the string
im_raw = r'c:\temp.txt'
non_raw = 'c:\\temp.txt'
another_way = 'c:/temp.txt'
As far as paths in Windows are concerned, forward slashes are interpreted the same way. Clearly the string itself is different though. I wouldn't guarantee that they're handled this way on an external device though.
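A short sketch of why a single backslash is risky in a non-raw string. The file name is the illustrative one from above, and nothing is actually opened:

```python
# In a normal string, \t is an escape sequence: this path contains a tab.
non_raw_bug = 'c:\temp.txt'

# A raw string and a doubled backslash both give a real backslash.
im_raw = r'c:\temp.txt'
non_raw = 'c:\\temp.txt'

print(repr(non_raw_bug))
print(repr(im_raw))
print(im_raw == non_raw)
```

The first string is one character shorter than the other two, because the backslash and the t collapsed into a single tab character.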