I've encountered a problem in a project where it may be useful to pass a large number (in the tens, not the hundreds) of arguments to a single "Write once, use many times" function in Python. The issue is that I'm not really sure what the best way is to handle a large block of arguments like that: should I just pass them all in as a single dictionary and unpack that dictionary inside the function, or is there a more efficient/Pythonic way of achieving the same effect?
Depending on exactly what you are doing, you can pass an arbitrary number of parameters to a Python function in one of two standard ways. The first is to pass them positionally: declare a parameter with the * prefix and Python will automatically collect any extra positional arguments into a tuple. This parameter is usually called args by convention. The second is to pass them as key-value pairs: call the function with arguments of the form key=value and collect them in a dict parameter declared with the ** prefix. This parameter is normally called kwargs by convention. You can of course use both of these in some combination, along with other named arguments, as desired.
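To illustrate, here is a minimal sketch (report, colour and width are just made-up names for the example):

def report(*args, **kwargs):
    # args is a tuple of all positional arguments
    for value in args:
        print("positional:", value)
    # kwargs is a dict of all keyword arguments
    for name, value in kwargs.items():
        print("keyword:", name, "=", value)

report(1, 2, colour="red", width=10)

# An existing dict can be unpacked at the call site the same way:
options = {"colour": "red", "width": 10}
report(1, 2, **options)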
Related
I want to understand when I should use varargs vs. a list-type parameter in a function in Python 2.7.
Suppose I write a function that processes a list of URLs. I could define the function in two different ways:
Option 1:
def process_urls(urls):
    if not isinstance(urls, (list, tuple)):
        raise TypeError("urls should be a list or tuple type")
Option 2:
def process_urls(*urls):
    # urls is guaranteed to be a tuple
Option 2 guarantees urls to be a tuple, but it can take an arbitrary number of positional arguments, which could be garbage, such as process_urls(['url1', 'url2'], "this is not a url").
From a programming standpoint, which option is preferred?
The first, but without the type checking. Type checks kill duck typing. What if the caller wants to pass in a generator, or a set, or other iterable? Don't limit them to just lists and tuples.
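For example, a duck-typed version that just iterates over whatever it gets handles lists, tuples, sets and generators alike (the print call is a stand-in for the real per-URL work):

def process_urls(urls):
    # works for any iterable: list, tuple, set, generator, ...
    for url in urls:
        print("processing", url)   # stand-in for the real work

process_urls(["http://example.com", "https://stackoverflow.com"])
process_urls({"http://example.com"})                     # a set works too
process_urls(u for u in ["http://example.com"])          # so does a generator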
Neither is unequivocally best. Each style has benefits in different situations.
Using a single iterable argument is going to be better most of the time, especially if the caller already has the URLs packed up into a list. If they have a list and need to use the varargs style, they'd have to call process_urls(*existing_list_of_URLs), which needlessly unpacks and then repacks the arguments. As John Kugelman suggests in his answer, you should probably not use explicit type checking to enforce the type of the argument; just assume it's an iterable and work from there.
Using a variable argument list might be nicer than requiring a list if your function is mostly going to be called with separate URLs. For instance, maybe the URLs are hard-coded like this: process_urls("http://example.com", "https://stackoverflow.com"). Or maybe they're in separate variables, but the specific variables to be used are coded directly into the call: process_urls(primary_url, backup_url).
A final option: Support both approaches! You can specify that your function accepts one or more arguments. If it gets only one, it expects an iterable containing URLs. If it gets more than one argument, it expects each to be a separate URL. Here's what that might look like:
def process_urls(*args):
    if len(args) == 1:
        args = args[0]
    # do stuff with args, which is an iterable of URLs
There's one downside to this: a single URL string passed by itself will be incorrectly treated as a sequence of URLs, each consisting of a single character from the original string. That's an awkward enough failure case that you might want to check for it explicitly. You could choose to raise an exception, or just accept a single string as an argument as if it had been passed in a container.
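One way that check might look, treating a lone string as a single URL rather than as a sequence of characters (just a sketch, with print standing in for the real work):

def process_urls(*args):
    if len(args) == 1 and not isinstance(args[0], str):
        # a single iterable of URLs was passed
        args = args[0]
    for url in args:
        print("processing", url)   # stand-in for the real work

process_urls("http://example.com")                                  # single URL
process_urls("http://example.com", "https://stackoverflow.com")     # several URLs
process_urls(["http://example.com", "https://stackoverflow.com"])   # one list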
I am trying to call a stored proc on Postgres/PLPGSQL from Django/Python. I have the stored proc defined using a VARIADIC parameter:
CREATE OR REPLACE FUNCTION udf_getmultiplecategoriescodetypes (VARIADIC NUMERIC[])
then the only place I want to use the array of parameters in the proc is in the WHERE stmt:
WHERE cct.code_category_fk_id = ANY($1)
All this works perfectly when I call the function from the DBeaver console:
SELECT * FROM udf_getmultiplecategoriescodetypes(1, 2)
However, if I use the callproc function in Django/Python, using the same type of syntax, like this:
c.callproc("udf_getmultpilecategoriescodetypes", (1, 2))
I get errors:
LINE 1: SELECT * FROM udf_getmultpilecategoriescodetypes(1,2)
HINT: No function matches the given name and argument types. You might need to add
explicit type casts.
function udf_getmultpilecategoriescodetypes(integer, integer) does not exist
Furthermore, in DBeaver, once the function has been created and appears in the functions listing, if I try to delete it, it says the function cannot be found.
(Screenshots: the function showing in the tree view, and the deletion error message.)
I've since found out that I can delete it by using DROP FUNCTION and including the VARIADIC parameter so it recognises it based on number and type of parameters. But why is it like that?
So, two questions:
What is the correct way to pass an array of integers from a Django/Python callproc function to a VARIADIC parameter in a Postgres/PLPGSQL stored proc?
Why does DBeaver not recognise a listed stored proc function as existing when an array or VARIADIC is used as the parameter? And might this somehow be related to the callproc error, since the errors of both issues seem to be related to the VARIADIC parameter?
What is the correct way to pass an array of integers from a Django/Python callproc function to a VARIADIC parameter in a Postgres/PLPGSQL stored proc?
You defined the parameter as VARIADIC NUMERIC[], so you really want to pass an array of numeric, not an array of integer.
And since it's a VARIADIC function, you can pass a list of numeric values instead of an actual array - like you do. See:
Pass multiple values in single parameter
But that's not the problem at hand: Postgres function type resolution will fall back to func(numeric[]) if there is no func(int[]). The actual problem seems to be a plain typo. Do you see the difference?
udf_getmultiplecategoriescodetypes
udf_getmultpilecategoriescodetypes
More concise names and maybe some underscores might help prevent such typos.
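With the spelling fixed, the original call should go through; alternatively you can pass the values as one explicit array with the VARIADIC keyword. A sketch, assuming c is a psycopg2-style cursor as in the question:

# same call as before, just with the name spelled correctly
c.callproc("udf_getmultiplecategoriescodetypes", (1, 2))

# or pass the values as one explicit array parameter
c.execute(
    "SELECT * FROM udf_getmultiplecategoriescodetypes(VARIADIC %s::numeric[])",
    ([1, 2],),
)
rows = c.fetchall()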
Why does DBeaver not recognise a listed stored proc function as existing when an array or VARIADIC is used as the parameter? And might this somehow be related to the callproc error, since the errors of both issues seem to be related to the VARIADIC parameter?
Postgres allows function overloading. Hence, the function signature consists of its name and its parameters, which is what makes it unambiguous. DBeaver has nothing to do with it.
On top of it, be aware that the same function can exist in multiple schemas. So make sure you operate with the right search_path and don't have (and inadvertently call) a copy in another schema.
So your attempt to drop the function public.udf_getmultiplecategoriescodetypes() fails due to the missing parameter. And it may also fail if the function was created in a different schema.
Related:
Error: No function matches the given name and argument types
Is there a way to disable function overloading in Postgres
How does the search_path influence identifier resolution and the "current schema"
What are the conventions for ordering parameters in Python? For instance,
def plot_graph(G, filename, ...)
# OR
def plot_graph(filename, G, ...)
There is no discussion of this in PEP 8 -- Style Guide for Python Code.
Excerpt from the answer of Conventions for order of parameters in a function,
If a language allows passing a hash/map/associative array as a single parameter, try to opt for passing that. This is especially useful for methods with >=3 parameters, ESPECIALLY when those same parameters will be passed to nested function calls.
Is it extreme to convert each parameter into a key-value pair, like def plot_graph(graph=None, filename=None, ...)?
There's really no convention for ordering function parameters, apart from the language's own restriction: positional parameters without defaults must come before parameters with defaults, followed by *args, then keyword-only parameters (with or without defaults), and finally **kwargs, i.e. def func(pos_1, pos_n, pos_1_w_default='default_val', pos_n_w_default='default_val', *args, kw_1, kw_n, kw_1_w_default='default_val', kw_n_w_default='default_val', **kwargs).
Usually you order the parameters logically based on their meaning for the function, e.g. if you define a function that does subtraction, it's logical that the minuend should be the first parameter and the subtrahend the second. Here the reverse order is possible, but it's not logical.
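For instance (subtract is just an illustrative name):

def subtract(minuend, subtrahend):
    return minuend - subtrahend

subtract(10, 3)   # 7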
Also, if you consider that your function might be used partially, that might affect your decision on parameter ordering.
Most things you need to know about function parameters are in the official tutorial.
P.S. Regarding your particular example with the graph function: given the function's name, it is used for displaying a graph, so a graph must be provided as an argument (otherwise there's nothing to display), and making graph=None the default doesn't make much sense.
It is not extreme to use only keyword arguments. I have seen that in many codebases. This allows you to extend functionalities (by adding new keyword arguments to your functions) without breaking your previous code. It can be slightly more tedious to use, but definitely easier to maintain and to extend.
Also have a look at PEP 3102 -- Keyword-Only Arguments, which is a way to force the use of keyword arguments in Python 3.
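A sketch of what that looks like for the plotting example (the parameter names and defaults here are just illustrative):

def plot_graph(*, graph, filename="graph.png", title=None):
    # everything after the bare * can only be passed by keyword,
    # so new options can be added later without breaking existing calls
    print("plotting", graph, "to", filename, "with title", title)

plot_graph(graph="G", filename="out.png")   # OK
# plot_graph("G", "out.png")                # TypeError: takes 0 positional arguments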
My program derives a sequence args and a mapping kwargs from user input. I want to check that input, and then forward it to a python function f (which is chosen based on user input). In this case, a function signature mismatch between f and [kw]args is an input error; I must distinguish it from possible programming errors within the implementation of f, even though they might both raise TypeError.
So I want to check the signature before attempting the function call. Is there a way to do this other than manually comparing [kw]args to the result of inspect.getargspec (or .getfullargspec or .signature in later Python versions)?
Related questions: Is there a way to check a function's signature in Python?
The method using inspect is probably the most straightforward way of doing this that exists - it's not something one would normally expect to be doing in Python.
(Typically, allowing end users to call arbitrary functions with arbitrary inputs is not what a programmer wants.)
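If you do go the inspect route, Signature.bind does most of the comparison for you: it raises TypeError when the arguments don't fit the signature, and you can catch that before ever calling f. A rough sketch (check_and_call is a made-up helper name):

import inspect

def check_and_call(f, args, kwargs):
    try:
        bound = inspect.signature(f).bind(*args, **kwargs)
    except TypeError as exc:
        # signature mismatch: report it as an input error
        raise ValueError("bad input for %s: %s" % (f.__name__, exc))
    # any TypeError from here on comes from inside f itself
    return f(*bound.args, **bound.kwargs)

def f(a, b, c=3):
    return a + b + c

check_and_call(f, (1, 2), {})        # returns 6
check_and_call(f, (1,), {"d": 4})    # raises ValueError before calling f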
Is there a good rule of thumb as to when you should prefer varargs function signatures in your API over passing an iterable to a function? ("varargs" being short for "variadic" or "variable-number-of-arguments"; i.e. *args)
For example, os.path.join has a vararg signature:
os.path.join(first_component, *rest) -> str
Whereas min allows either:
min(iterable[, key=func]) -> val
min(a, b, c, ...[, key=func]) -> val
Whereas any/all only permit an iterable:
any(iterable) -> bool
Consider using varargs when you expect your users to spell out the arguments as code at the call site, or when a single value is the common case. When you expect your users to get the arguments from somewhere else, don't use varargs. When in doubt, err on the side of not using varargs.
Using your examples, the most common use case for os.path.join is to take a path prefix and append a filename/relative path onto it, so the call usually looks like os.path.join(prefix, some_file). On the other hand, any() is usually used to process a list of data; when you already know all the elements, you don't write any([a, b, c]), you write a or b or c.
My rule of thumb is to use it when you might often switch between passing one and multiple parameters. Instead of having two functions (some GUI code for example):
def enable_tab(tab_name)
def enable_tabs(tabs_list)
or even worse, having just one function
def enable_tabs(tabs_list)
and calling it as enable_tabs(['tab1']), I tend to use just def enable_tabs(*tabs). Although something like enable_tabs('tab1') looks kind of wrong (because of the plural name), I prefer it over the alternatives.
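In code, the single combined version might look like this (the print call is a stand-in for the real GUI work):

def enable_tabs(*tabs):
    for tab in tabs:
        print("enabling", tab)   # stand-in for the real GUI call

enable_tabs('tab1')            # one tab
enable_tabs('tab1', 'tab2')    # several tabs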
You should use it when your parameter list is variable.
Yeah, I know the answer is kinda daft, but it's true. Maybe your question was a bit diffuse. :-)
Default arguments, as in min() above, are more useful when you want to allow different behaviours or when you simply don't want to force the caller to send in all the parameters.
*args is for when you have a variable number of arguments of the same type. Joining is a typical example. You can replace it with an argument that takes a list as well.
**kw is for when you have many arguments of different types, where each argument is also connected to a name. A typical example is when you want a generic function for handling form submission or similar.
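A rough sketch of such a **kw interface (the field names are invented for the example):

def handle_submission(**fields):
    # each form field arrives as name=value
    for name, value in fields.items():
        print("field", name, "=", value)

handle_submission(username="alice", email="alice@example.com", age=42)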
They are completely different interfaces.
In one case, you have one parameter, in the other you have many.
any(1, 2, 3)
TypeError: any() takes exactly one argument (3 given)
os.path.join("1", "2", "3")
'1\\2\\3'
It really depends on what you want to emphasize: any works over a list (well, sort of), while os.path.join works over a set of strings.
Therefore, in the first case you request a list; in the second, you request directly the strings.
In other words, the expressiveness of the interface should be the main guideline for choosing how parameters should be passed.