I am writing a Python function (2.7) where one of the input parameters is a string. The string value should be one of a few values.
For example, the parameter (in this case "sort") could be any value from the list ['first author', 'last author', 'title'] (the list is actually a bit longer).
Is there a nice way of helping the user supply this input (via tab completion)? The ultimate goal is to improve the interface so that my code is easier to use and users make fewer mistakes.
Option 1: Define constants
This might be the best option but there are multiple ways of doing this. My approach is below.
I'm considering creating constants in the module in which the function resides, but this seems messy. My current solution is to include a class with the options, but this makes the input really long, unless I cheat and make the class something nonsensical and short (which is what I've done but it seems wrong and non-obvious). What should I do?
Option 2: Via interpreted documentation
NOTE: I don't think this option actually exists ...
I like the idea of placing code in the documentation which editors could use to perform tab-complete, but I'm not aware of this option existing.
Other options?
Current code (roughly):
class C():
    SORT_FIRST_AUTHOR = 'first author'
    SORT_LAST_AUTHOR = 'last author'
    SORT_TITLE = 'title'
# Ideally these might be selected by: my_module.search.sort_option.FIRST_AUTHOR instead of my_module.C.FIRST_AUTHOR but that's REALLY long
def search(query_string, sort=None):  # Changed sort = [] to sort = None per comments
UPDATE: (Trying to clarify the question)
This is somewhat similar to this question, although I think the answers diverge from what I'm looking for because of the specifics of the question.
Pythonic way to have a choice of 2-3 options as an argument to a function
Ideally, I would like the definition of search() to allow tab completion on "sort" by having specifications, something like:
def search(....):
    """
    :param sort:
    :values sort: 'first author'|'last author'|'title'
    """
But, I'm not aware of that actually being a valid documentation option.
Alternatively, I could define some constants that allow this, so that the call becomes:
search(my_a,my_b,package_name.C.SORT_FIRST_AUTHOR)
But this seems ugly and wrong.
So when you want the user to pass in one out of a set of options, how do you do it?
Relying on a user to correctly type a string literal is usually a bad idea as it can result in silent errors if the string contains a typo.
It is common to define uppercase variables at the module level, for instance the socket module has
socket.SOCK_STREAM
socket.SOCK_DGRAM
socket.SOCK_RAW
These are passed to functions as parameters, and could easily contain string values.
"Ideally these might be selected by: my_module.search.sort_option.FIRST_AUTHOR instead of my_module.C.FIRST_AUTHOR but that's REALLY long"
Yes that first option seems a bit long. mymodule.SORT_FIRST_AUTHOR seems reasonable though, and using class variables could be appropriate if the constants are constrained to that class and/or if the class is likely to be imported from the module independently.
from mymodule import Searcher
s = Searcher(..., Searcher.SORT_FIRST_AUTHOR)
Prefixing the constants with a common term (like SORT) will aid editors when tab completing entries and makes the constants more descriptive.
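As a minimal sketch of the module-level approach, the constants can also be used to reject typos early. The constant names and values mirror the question; the `_SORT_OPTIONS` set and the placeholder body of `search()` are assumptions made purely for illustration.

```python
# Hypothetical module-level constants; names and values mirror the question.
SORT_FIRST_AUTHOR = 'first author'
SORT_LAST_AUTHOR = 'last author'
SORT_TITLE = 'title'

# Every value search() will accept for `sort` (illustrative helper).
_SORT_OPTIONS = frozenset([SORT_FIRST_AUTHOR, SORT_LAST_AUTHOR, SORT_TITLE])


def search(query_string, sort=None):
    """Placeholder search: validates `sort` and echoes its arguments."""
    if sort is not None and sort not in _SORT_OPTIONS:
        # Fail loudly instead of silently ignoring a misspelled option.
        raise ValueError("sort must be one of %s, got %r"
                         % (sorted(_SORT_OPTIONS), sort))
    return query_string, sort
```

A caller would then write something like search("query", mymodule.SORT_TITLE), and a misspelled string literal raises ValueError rather than failing silently.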
You could also consider using a namedtuple
>>> from collections import namedtuple
>>> SortOpts = namedtuple('SortOpts', ('FIRST_AUTHOR', 'LAST_AUTHOR'))
>>> SORT_OPT = SortOpts(FIRST_AUTHOR=0, LAST_AUTHOR=1)
>>> SORT_OPT.FIRST_AUTHOR
0
>>> SORT_OPT.FIRST_AUTHOR = 2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: can't set attribute
This has the advantage of protecting the attributes from being changed, however I don't think this will work with tab completion in most editors.
I'm trying to learn Python, and, while I managed to stumble on the answer to my current problem, I'd like to know how I can better find answers in the future.
My goal was to take a list of strings as input, and return a string whose characters were the union of the characters in the strings, e.g.
unionStrings( ("ab", "bc"))
would return "abc".
I implemented it like this:
def unionStrings( strings ):
    # Input: A list of strings
    # Output: A string that is the (set) union of input strings
    all = set()
    for s in strings:
        all = all.union(set(s))
    return "".join(sorted(list(all)))
I felt the for loop was unnecessary, and searched for neater, more Pythonic(?) improvements.
First question: I stumbled on using the class method set.union(), instead of set1.union(set2). Should I have been able to find this in the standard Python docs? I've not been able to find it there.
So I tried using set.union() like this:
>>> set.union( [set(x) for x in ("ab","bc")] )
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: descriptor 'union' requires a 'set' object but received a 'list'
Again, I stumbled around and finally found that I should be calling it like this:
>>> set.union( *[set(x) for x in ("ab","bc")] )
set(['a', 'c', 'b'])
Second question: I think this means that set.union is (effectively) declared as
set.union( *sets)
and not
set.union( setsList )
Is that correct? (I'm still learning how to use splat '*'.)
Third question: Where could I find documentation on the signature of set.union()? I didn't see it in the set/frozenset docs, and I couldn't get the inspect module to give me anything. I'm not even sure set is a module; it seems to be a type. Is it defined in a module, or what?
Thanks for reading my complicated question. It's more "How do I navigate Python documentation?" than "How do I do this in Python code?".
Responding to jonrsharpe's comment:
Ohhhhh! I'm so used to C++ where you define separate static and instance methods. Now that you explain it I can really see what's happening.
The only thing I might do different is write it as
t = set.union( *[set(x) for x in strings] )
return "".join(sorted(t))
because it bugs me to treat strings[0] differently from the strings in strings[1:] when, functionally, they don't play different roles. If I have to call set() on one of them, I'd rather call it on all of them, since union() is going to do it anyways. But that's just style, right?
There are several questions here. Firstly, you should know that:
Class.method(instance, arg)
is equivalent to:
instance.method(arg)
for instance methods. You can call the method on the class and explicitly provide the instance, or just call it on the instance.
For historical reasons, many of the standard library and built-in types don't follow the UppercaseWords convention for class names, but they are classes. Therefore
set.union(aset, anotherset)
is the same as
aset.union(anotherset)
set methods specifically can be tricky, because of the way they're often used. set.method(arg1, arg2, ...) requires arg1 to already be a set, the instance for the method, but all the other arguments will be converted (from 2.6 on).
This isn't directly covered in the set docs, because it's true for everything; Python is pretty consistent.
In terms of needing to "splat", note that the docs say:
union(other, ...)
rather than
union(others)
i.e. each iterable is a separate argument, hence you need to unpack your list of iterables.
Your function could therefore be:
def union_strings(strings):
    if not strings:
        return ""
    return "".join(sorted(set(strings[0]).union(*strings[1:])))
or, avoiding the special-casing of strings[0]:
def union_strings(strings):
    if not strings:
        return ""
    return "".join(sorted(set.union(*map(set, strings))))
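The points in this answer can be checked with a short sketch. The variable names are chosen only for illustration; the calls themselves are plain built-in set methods.

```python
a = set("ab")
b = set("bc")

# Calling the method on the class with an explicit instance...
via_class = set.union(a, b)
# ...is equivalent to calling it on the instance.
via_instance = a.union(b)

# Only the first argument must already be a set;
# the remaining arguments may be any iterables.
mixed = set.union(set("ab"), "bc", ["d"])

# A list of sets has to be unpacked into separate arguments with *.
sets = [set(s) for s in ("ab", "bc")]
unpacked = set.union(*sets)
```

Passing the list itself (without the *) makes the list the "instance" argument, which is what produces the TypeError shown in the question.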
So I know this is a bit of a workaround and there's probably a better way to do this, but here's the deal. I've simplified the code by replacing the places where it gathers this info with hard-coded values.
curSel = nuke.selectedNodes()
knobToChange = "label"
codeIn = "[value in]"
kcPrefix = "x"
kcStart = "['"
kcEnd = "']"
changerString = kcPrefix+kcStart+knobToChange+kcEnd
for x in curSel:
    changerString.setValue(codeIn)
But I get the error I figured I would, which is that a string has no attribute "setValue".
If I just type x['label'] instead of changerString, it works, but even though changerString says the exact same thing, it's being read as a string instead of code.
Any ideas?
It looks like you're looking for something to evaluate the string into a python object based on your current namespace. One way to do that would be to use the globals dictionary:
globals()['x']['label'].setValue(...)
In other words, globals()['x']['label'] is the same thing as x['label'].
Or to spell it out explicitly for your case:
globals()[kcPrefix][knobToChange].setValue(codeIn)
Others might suggest eval:
eval('x["label"]').setValue(...) #insecure and inefficient
but globals is definitely a better idea here.
Finally, when you want to do something like this, you're usually better off using a dictionary or some other sort of data structure in the first place to keep your data more organized.
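A minimal sketch of the dictionary idea follows. Plain dicts stand in for Nuke nodes here, and the names `nodes`, `node_name` and `knob_to_change` are hypothetical; the point is only that string keys can select objects without building code from strings.

```python
# Hypothetical stand-ins: plain dicts play the role of nodes and their knobs.
nodes = {"x": {"label": ""}, "y": {"label": ""}}

node_name = "x"              # which object to change, as a string
knob_to_change = "label"     # which attribute to change, as a string
code_in = "[value in]"

# String keys pick the object directly; no globals() or eval() is needed.
nodes[node_name][knob_to_change] = code_in
```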
Righto, there's two things you're falling afoul of. Firstly, in your original code where you are trying to do the setValue() call on a string you're right in that it won't work. Ideally use one of the two calls (x.knob('name_of_the_knob') or x['name_of_the_knob'], whichever is consistent with your project/facility/personal style) to get and set the value of the knob object.
From the comments, your code would look like this (my comments added for other people who aren't quite as au fait with Nuke):
# select all the nodes
curSel = nuke.selectedNodes()
# nuke.thisNode() returns the script's context
# i.e. the node from which the script was invoked
knobToChange = nuke.thisNode()['knobname'].getValue()
codeIn = nuke.thisNode()['codeinput'].getValue()
for x in curSel:
    x.knob(knobToChange).setValue(codeIn)
Using this sample UI with the values in the two fields as shown and the button firing off the script...
...this code is going to give you an error message of 'Nothing is named "foo"' when you execute it because the .getValue() call is actually returning you the evaluated result of the knob - which is the error message as it tries to execute the TCL [value foo], and finds that there isn't any object named foo.
What you should ideally do is instead invoke .toScript() which returns the raw text.
# select all the nodes
curSel = nuke.selectedNodes()
# nuke.thisNode() returns the script's context
# i.e. the node from which the script was invoked
knobToChange = nuke.thisNode()['knobname'].toScript()
codeIn = nuke.thisNode()['codeinput'].toScript()
for x in curSel:
    x.knob(knobToChange).setValue(codeIn)
You can sidestep this problem, as you've noted, by building up a string, adding in square brackets etc. as per your original code, but yes, it's a pain and a maintenance nightmare, and it starts down the route of building objects up from strings (which @mgilson explains how to do with either globals() or eval()).
For those who haven't had the joy of working with Nuke, here's a small screencap that may (or may not..) provide more context:
There are many questions on SO about using Python's eval on insecure strings (eg.: Security of Python's eval() on untrusted strings?, Python: make eval safe). The unanimous answer is that this is a bad idea.
However, I found little information on which strings can be considered safe (if any).
Now I'm wondering if there is a definition of "safe strings" available (eg.: a string that only contains lower case ascii chars or any of the signs +-*/()). The exploits I found generally relied on either of _.,:[]'" or the like. Can such an approach be secure (for use in a graph painting web application)?
Otherwise, I guess using a parsing package as Alex Martelli suggested is the only way.
EDIT:
Unfortunately, there are neither answers that give a compelling explanation for why/ how the above strings are to be considered insecure (a tiny working exploit) nor explanations for the contrary. I am aware that using eval should be avoided, but that's not the question. Hence, I'll award a bounty to the first who comes up with either a working exploit or a really good explanation why a string mangled as described above is to be considered (in)secure.
Here you have a working "exploit" with your restrictions in place - only contains lower case ascii chars or any of the signs +-*/() .
It relies on a 2nd eval layer.
def mask_code( python_code ):
    s="+".join(["chr("+str(ord(i))+")" for i in python_code])
    return "eval("+s+")"
bad_code='''__import__("os").getcwd()'''
masked= mask_code( bad_code )
print masked
print eval(masked)  # evaluating the masked string runs the hidden code
output:
eval(chr(111)+chr(115)+chr(46)+chr(103)+chr(101)+chr(116)+chr(99)+chr(119)+chr(100)+chr(40)+chr(41))
/home/user
This is a very trivial "exploit". I'm sure there's countless others, even with further character restrictions.
It bears repeating that one should always use a parser or ast.literal_eval(). Only by parsing the tokens can one be sure the string is safe to evaluate. Anything else is betting against the house.
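As a small illustration of the ast.literal_eval() behaviour mentioned above: it accepts plain literals and refuses anything else, including function calls. The input strings are made up for the example.

```python
import ast

# Plain literals are accepted and turned into Python objects.
point = ast.literal_eval("[1, 2.5, 'label']")

# Anything that is not a literal, such as a function call, is rejected.
try:
    ast.literal_eval("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True
```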
No, there isn't, or at least, not a sensible, truly secure way. Python is a highly dynamic language, and the flipside of that is that it's very easy to subvert any attempt to lock the language down.
You either need to write your own parser for the subset you want, or use something existing, like ast.literal_eval(), for particular cases as you come across them. Use a tool designed for the job at hand, rather than trying to force an existing one to do the job you want, badly.
Edit:
An example of two strings that, while fitting your description, would execute arbitrary code if eval()ed in order (this particular example running evil.__method__()).
"from binascii import *"
"eval(unhexlify('6576696c2e5f5f6d6574686f645f5f2829'))"
An exploit similar to goncalopp's, but one that also satisfies the restriction that the string 'eval' is not a substring of the exploit:
def to_chrs(text):
    return '+'.join('chr(%d)' % ord(c) for c in text)
def _make_getattr_call(obj, attr):
    return 'getattr(*(list(%s for a in chr(1)) + list(%s for a in chr(1))))' % (obj, attr)
def make_exploit(code):
    get = to_chrs('get')
    builtins = to_chrs('__builtins__')
    eval = to_chrs('eval')
    code = to_chrs(code)
    return (_make_getattr_call(
                _make_getattr_call('globals()', '{get}') + '({builtins})',
                '{eval}') + '({code})').format(**locals())
It uses a combination of genexp and tuple unpacking to call getattr with two arguments without using the comma.
An example usage:
>>> exploit = make_exploit('__import__("os").system("echo $PWD")')
>>> print exploit
getattr(*(list(getattr(*(list(globals() for a in chr(1)) + list(chr(103)+chr(101)+chr(116) for a in chr(1))))(chr(95)+chr(95)+chr(98)+chr(117)+chr(105)+chr(108)+chr(116)+chr(105)+chr(110)+chr(115)+chr(95)+chr(95)) for a in chr(1)) + list(chr(101)+chr(118)+chr(97)+chr(108) for a in chr(1))))(chr(95)+chr(95)+chr(105)+chr(109)+chr(112)+chr(111)+chr(114)+chr(116)+chr(95)+chr(95)+chr(40)+chr(34)+chr(111)+chr(115)+chr(34)+chr(41)+chr(46)+chr(115)+chr(121)+chr(115)+chr(116)+chr(101)+chr(109)+chr(40)+chr(34)+chr(101)+chr(99)+chr(104)+chr(111)+chr(32)+chr(36)+chr(80)+chr(87)+chr(68)+chr(34)+chr(41))
>>> eval(exploit)
/home/giacomo
0
This shows that it is really hard to make code safe by restricting only its text. Even checks like 'eval' in code are not safe. Either you must remove the possibility of executing a function call at all, or you must remove all dangerous built-ins from eval's environment. My exploit also shows that getattr is as bad as eval even when you cannot use the comma, since it allows you to walk arbitrarily through the object hierarchy. For example, you can obtain the real eval function even if the environment does not provide it:
def real_eval():
    get_subclasses = _make_getattr_call(
        _make_getattr_call(
            _make_getattr_call('()',
                to_chrs('__class__')),
            to_chrs('__base__')),
        to_chrs('__subclasses__')) + '()'
    catch_warnings = 'next(c for c in %s if %s == %s)()' % (get_subclasses,
        _make_getattr_call('c',
            to_chrs('__name__')),
        to_chrs('catch_warnings'))
    return _make_getattr_call(
        _make_getattr_call(
            _make_getattr_call(catch_warnings, to_chrs('_module')),
            to_chrs('__builtins__')),
        to_chrs('get')) + '(%s)' % to_chrs('eval')
>>> no_eval = __builtins__.__dict__.copy()
>>> del no_eval['eval']
>>> eval(real_eval(), {'__builtins__': no_eval})
<built-in function eval>
However, if you remove all the built-ins, this exploit stops working:
>>> eval(real_eval(), {'__builtins__': None})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 1, in <module>
NameError: name 'getattr' is not defined
Note that setting '__builtins__' to None also removes chr, list, tuple, etc.
The combination of your character restrictions and setting '__builtins__' to None is completely safe, because the user has no way to access anything: they can't use the ., the brackets [], or any built-in function or type.
That said, what you can evaluate this way is pretty limited: you can't do much more than arithmetic on numbers.
Probably it's enough to remove eval, getattr, and chr from the built-ins to make the code safe, at least I can't think of a way to write an exploit that does not use one of them.
A "parsing" approach is probably safer and gives more flexibility. For example this recipe is pretty good and is also easily customizable to add more restrictions.
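To give a feel for what a parsing approach can look like, here is a minimal sketch. It is not the recipe referred to above. It assumes Python 3.8+ (where numeric literals parse to ast.Constant), and the function name `safe_arith` and the choice of allowed operators are illustrative. Only numbers and the operators + - * / are accepted; every other node type is rejected.

```python
import ast
import operator

# Whitelisted operators; anything not listed here is refused.
_BINARY = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_arith(text):
    """Evaluate an arithmetic expression by walking its AST (no eval)."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        # Accept int and float literals only (bool and str are excluded).
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](walk(node.operand))
        # Names, calls, attribute access, etc. all end up here.
        raise ValueError("disallowed expression: %s" % type(node).__name__)
    return walk(ast.parse(text, mode="eval"))
```

With this sketch, safe_arith("2*(3+4)") evaluates the arithmetic, while a string such as "eval(chr(1))" raises ValueError because call nodes are not on the whitelist.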
To study how to make eval safe, I suggest the RestrictedPython module (over 10 years of production usage, one fine piece of Python software):
http://pypi.python.org/pypi/RestrictedPython
RestrictedPython takes Python source code and modifies its AST (Abstract Syntax Tree) to make the evaluation safe within the sandbox, without leaking any Python internals that might allow code to escape the sandbox.
From the RestrictedPython source code you'll learn what kinds of tricks are needed to sandbox Python safely.
You probably should avoid eval, actually.
But if you're stuck with it, you could just make sure your strings are alphanumeric. That should be safe.
Assuming the named functions exist and are safe:
import re
if re.match("^(?:safe|soft|cotton|ball|[()])+$", code): eval(code)
It's not enough to create input sanitization routines. You must also ensure that sanitization is never accidentally omitted. One way to do that is taint checking.
Is it common in Python to keep testing for type values when working in a OOP fashion?
class Foo():
    def __init__(self, barObject):
        self.setBarObject(barObject)
    def setBarObject(self, barObject):
        if isinstance(barObject, Bar):
            self.bar = barObject
        else:
            # throw exception, log, etc.
            raise TypeError("barObject must be a Bar instance")
class Bar():
    pass
Or I can use a more loose approach, like:
class Foo():
    def __init__(self, barObject):
        self.bar = barObject
class Bar():
    pass
Nope, in fact it's overwhelmingly common not to test for type values, as in your second approach. The idea is that a client of your code (i.e. some other programmer who uses your class) should be able to pass any kind of object that has all the appropriate methods or properties. If it doesn't happen to be an instance of some particular class, that's fine; your code never needs to know the difference. This is called duck typing, because of the adage "If it quacks like a duck and flies like a duck, it might as well be a duck" (well, that's not the actual adage but I got the gist of it I think)
One place you'll see this a lot is in the standard library, with any functions that handle file input or output. Instead of requiring an actual file object, they'll take anything that implements the read() or readline() method (depending on the function), or write() for writing. In fact you'll often see this in the documentation, e.g. with tokenize.generate_tokens, which I just happened to be looking at earlier today:
The generate_tokens() generator requires one argument, readline, which must be a callable object which provides the same interface as the readline() method of built-in file objects (see section File Objects). Each call to the function should return one line of input as a string.
This allows you to use a StringIO object (like an in-memory file), or something wackier like a dialog box, in place of a real file.
In your own code, just access whatever properties of an object you need, and if it's the wrong kind of object, one of the properties you need won't be there and it'll throw an exception.
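A minimal sketch of this idea: the function below only ever calls readline(), so it works with an in-memory file or with any other object that provides that method. The names `count_lines` and `CannedLines` are made up for the example.

```python
import io


def count_lines(source):
    """Count lines from anything that has a readline() method."""
    count = 0
    while source.readline():   # an empty string signals end of input
        count += 1
    return count


class CannedLines(object):
    """Not a file at all, but it quacks like one as far as readline() goes."""
    def __init__(self, lines):
        self._lines = list(lines)

    def readline(self):
        return self._lines.pop(0) if self._lines else ""


from_memory = count_lines(io.StringIO(u"a\nb\n"))
from_custom = count_lines(CannedLines(["x\n"]))
```

Passing an object without readline(), such as an integer, raises AttributeError at the point of use, with no explicit type check anywhere.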
I think that it's good practice to check input for type. It's reasonable to assume that if you asked a user to give one data type they might give you another, so you should code to defend against this.
However, it seems like a waste of time (both writing and running the program) to check the type of input that the program generates independent of input. As in a strongly-typed language, checking type isn't important to defend against programmer error.
So basically, check input but nothing else so that code can run smoothly and users don't have to wonder why they got an exception rather than a result.
If your alternative to the type check is an else containing exception handling, then you should really consider duck typing one tier up: support as many objects as possible that have the methods you require from the input, and work inside a try.
You can then catch the resulting exception with except (being as specific as possible about which exception you catch).
The final result wouldn't be unlike what you have there, but it would be a lot more versatile and Pythonic.
Everything else that needed to be said about the actual question, whether it's common/good practice or not, has I think been answered excellently in David's answer.
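The try/except style described in this answer might look roughly like the following sketch. The function `total_length` is hypothetical; it accepts any iterable whose items support len() and catches only the specific exception that a wrong kind of object would cause.

```python
def total_length(items):
    """Sum len() over items; works for lists, tuples, strings, dicts, ..."""
    try:
        return sum(len(item) for item in items)
    except TypeError as exc:
        # Raised when items is not iterable or an element has no len().
        raise TypeError("items must be an iterable of sized objects: %s" % exc)


mixed_total = total_length(["ab", (1, 2, 3), {"k": "v"}])
```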
I agree with some of the above answers, in that I generally never check for type from one function to another.
However, as someone else mentioned, anything accepted from a user should be checked, and for things like this I use regular expressions. The nice thing about using regular expressions to validate user input is that not only can you verify that the data is in the correct format, but you can parse the input into a more convenient form, like a string into a dictionary.
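As a rough sketch of validating and parsing in one step, assume (purely for illustration) that the user is expected to type something like "width = 80". The pattern and the function name `parse_setting` are made up for the example.

```python
import re

# Hypothetical input format: a name, an equals sign, and a whole number.
_SETTING = re.compile(r"^\s*(?P<key>[A-Za-z_]\w*)\s*=\s*(?P<value>\d+)\s*$")


def parse_setting(text):
    """Validate user input and return it as a one-entry dictionary."""
    match = _SETTING.match(text)
    if match is None:
        raise ValueError("expected 'name = number', got %r" % text)
    # The named groups give us the parsed pieces for free.
    return {match.group("key"): int(match.group("value"))}
```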
I have programming experience with statically typed languages. Now that I'm writing code in Python, I have difficulty with its readability. Let's say I have a class Host:
class Host(object):
    def __init__(self, name, network_interface):
        self.name = name
        self.network_interface = network_interface
I can't tell from this definition what "network_interface" should be. Is it a string, like "eth0", or is it an instance of a class NetworkInterface? The only way I can think of to solve this is documenting the code with a docstring. Something like this:
class Host(object):
    ''' Attributes:
    #name: a string
    #network_interface: an instance of class NetworkInterface'''
Or maybe there are naming conventions for things like that?
Using dynamic languages will teach you something about static languages: all the help you got from the static language that you now miss in the dynamic language, it wasn't all that helpful.
To use your example, in a static language, you'd know that the parameter was a string, and in Python you don't. So in Python you write a docstring. And while you're writing it, you realize you had more to say about it than, "it's a string". You need to say what data is in the string, and what format it should have, and what the default is, and something about error conditions.
And then you realize you should have written all that down for your static language as well. Sure, Java would force you to know that it was a string, but there are all these other details that need to be specified, and you have to do that work manually in any language.
The docstring conventions are at PEP 257.
The example there follows this format for specifying arguments, you can add the types if they matter:
def complex(real=0.0, imag=0.0):
    """Form a complex number.

    Keyword arguments:
    real -- the real part (default 0.0)
    imag -- the imaginary part (default 0.0)
    """
    if imag == 0.0 and real == 0.0: return complex_zero
    ...
There was also a rejected PEP for docstrings for attributes (rather than constructor arguments).
The most pythonic solution is to document with examples. If possible, state what operations an object must support to be acceptable, rather than a specific type.
class Host(object):
    def __init__(self, name, network_interface):
        """Initialise host with given name and network_interface.

        network_interface -- must support the same operations as NetworkInterface

        >>> network_interface = NetworkInterface()
        >>> host = Host("my_host", network_interface)
        """
        ...
At this point, hook your source up to doctest to make sure your doc examples continue to work in future.
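Hooking a module up to doctest can be as small as the following sketch. The function `double` is a made-up example; doctest.testmod() is the standard-library call that runs every example found in the module's docstrings.

```python
import doctest


def double(x):
    """Return x added to itself; works for anything that supports +.

    >>> double(2)
    4
    >>> double("ab")
    'abab'
    """
    return x + x


if __name__ == "__main__":
    # Runs the docstring examples above; prints nothing if they all pass.
    doctest.testmod()
```

Running the file directly then re-checks the documented examples, so the documentation fails visibly if the behaviour drifts.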
Personally, I have found it very useful to validate my code with pylint.
If you follow pylint's suggestions, your code almost automatically becomes more readable, you improve your Python writing skills, and you respect naming conventions. You can also define your own naming conventions and so on. It's very useful, especially for a Python beginner.
I suggest you use it.
Python, though not as overtly typed as C or Java, is still typed and will throw exceptions if you're doing things with types that simply do not play nice together.
To that end, if you're concerned about your code being used correctly, maintained correctly, etc. simply use docstrings, comments, or even more explicit variable names to indicate what the type should be.
Even better yet, include code that will allow it to handle whichever type it may be passed as long as it yields a usable result.
One benefit of static typing is that types are a form of documentation. When programming in Python, you can document more flexibly and fluently. Of course in your example you want to say that network_interface should implement NetworkInterface, but in many cases the type is obvious from the context, variable name, or by convention, and in these cases by omitting the obvious you can produce more readable code. A common approach is to describe the meaning of a parameter and leave its type implicit.
For example:
def Bar(foo, count):
    """Bar the foo the given number of times."""
    ...
This describes the function tersely and precisely. What foo and bar mean will be obvious from context, and that count is a (positive) integer is implicit.
For your example, I'd just mention the type in the document string:
"""Create a named host on the given NetworkInterface."""
This is shorter, more readable, and contains more information than a listing of the types.