My program derives a sequence args and a mapping kwargs from user input. I want to check that input, and then forward it to a Python function f (which is chosen based on user input). In this case, a function signature mismatch between f and [kw]args is an input error; I must distinguish it from possible programming errors within the implementation of f, even though both might raise TypeError.
So I want to check the signature before attempting the function call. Is there a way to do this other than manually comparing [kw]args to the result of inspect.getargspec (or .getfullargspec or .signature in later Python versions)?
Related questions: Is there a way to check a function's signature in Python?
Using inspect is probably the most straightforward way to do this; it's not something one would normally expect to be doing in Python.
(Typically, allowing end users to call arbitrary functions with arbitrary inputs is not what a programmer wants.)
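For reference, a minimal sketch of that inspect-based check using Signature.bind() (call_checked is just an illustrative helper name, not part of the question):

import inspect

def call_checked(f, args, kwargs):
    try:
        # bind() raises TypeError if args/kwargs do not fit f's signature,
        # without calling f at all.
        inspect.signature(f).bind(*args, **kwargs)
    except TypeError as exc:
        raise ValueError(f"bad input for {f.__name__}: {exc}") from None
    # Any TypeError raised from here on comes from inside f itself.
    return f(*args, **kwargs)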
Problem
I have a function make_pipeline that accepts an arbitrary number of functions, which it then calls to perform sequential data transformation. The resulting call chain performs transformations on a pandas.DataFrame. Some, but not all, of the functions it may call need to operate on a sub-array of the DataFrame, and I have written multiple selector functions for that. However, at present each member function of the chain has to be explicitly given the user-selected selector/filter function. This is VERY error-prone, and accessibility is very important because the end code is aimed at non-specialists (possibly with no Python/programming knowledge), so it must be "batteries-included". This entire project is written in a functional style (that's what's always worked for me).
Sample Code
filter_func = simple_filter()
# The API looks like this
make_pipeline(
    load_data("somepath", header=[1, 0]),
    transform1(arg1, arg2),
    transform2(arg1, arg2, data_filter=filter_func),  # This function needs access to the user-defined filter function
    transform3(arg1, arg2, data_filter=filter_func),  # This function needs access to the user-defined filter function
    transform4(arg1, arg2),
)
Expected API
filter_func = simple_filter()
# The API looks like this
make_pipeline(
    load_data("somepath", header=[1, 0]),
    transform1(arg1, arg2),
    transform2(arg1, arg2),
    transform3(arg1, arg2),
    transform4(arg1, arg2),
)
Attempted
I thought that if the data_filter alias is available in the caller's namespace, it would also become available (something similar to a closure) to all the functions it calls. This seems to happen with some toy examples, but it won't work in this case (UnboundError).
What's a good way to make a function defined in one place available to certain interested functions in the call chain? I'm trying to avoid global.
Notes/Clarification
I've had problems with OOP and mutable state in the past, and functional programming has worked quite well, hence I've set a goal for myself to NOT use classes (to the extent that Python enables me to, anyway). So no classes.
I should probably have clarified this initially: in the pipeline, the output of every function is a DataFrame, and the input of every function (except load_data, obviously) is a DataFrame. The functions are decorated with a wrapper that calls functools.partial, because we want the user to supply the args to each function but not execute it. The actual execution is done by a for loop in make_pipeline.
Each function accepts df: pandas.DataFrame plus all arguments that are specific to that function. The statement seen above, transform1(arg1, arg2, ...), actually calls the decorated transform1, which returns functools.partial(transform, arg1, arg2, ...), which now has a signature like transform(df: pandas.DataFrame).
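A rough sketch of that decoration scheme, with invented names (pipeline_step) and a guessed parameter order (df last, so the earlier positional args bind correctly into the partial):

import functools
import pandas as pd

def pipeline_step(func):
    # Calling the decorated transform with its own args does NOT run it;
    # it returns a partial that only awaits the DataFrame.
    @functools.wraps(func)
    def collect_args(*args, **kwargs):
        return functools.partial(func, *args, **kwargs)
    return collect_args

@pipeline_step
def transform1(arg1, arg2, df: pd.DataFrame) -> pd.DataFrame:
    ...  # the actual transformation
    return df

# transform1(1, 2) now returns a partial with a signature like transform1(df)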
load_dataframe is just a convenience function to load the initial DataFrame so that all the other functions can begin operating on it. It just felt more intuitive to users to have it as part of the chain rather than a separate call.
The problem is this: I need a way for a filter function to be initialized (called) in only one place, such that every function in the call chain that needs access to the filter function gets it without it being explicitly passed as an argument to said function. If you're wondering why this is the case, it's because I feel that end users will find it unintuitive and arbitrary: some functions need it, some don't. I'm also pretty certain that they will make all kinds of errors, like passing different filters, forgetting it sometimes, etc.
(Update) I've also tried inspect.signature() in make_pipeline to check whether each function accepts a data_filter argument and pass it on. However, this raises an incorrect-function-signature error for some unclear reason (likely because of the decorators/partial calls). If signature could return the non-partial function's signature, this would solve the issue, but I couldn't find much info in the docs.
Turns out it was pretty easy. The solution is inspect.signature.
import inspect
from typing import Any, Callable, Optional

def make_pipeline(*args, data_filter: Optional[Callable[..., Any]] = None):
    d = args[0]
    for arg in args[1:]:
        # only pass the filter to functions whose signature asks for it
        if "data_filter" in inspect.signature(arg).parameters:
            d = arg(d, data_filter=data_filter)
        else:
            d = arg(d)
    return d
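Under those assumptions, the pipeline from the "Expected API" section above is then driven like this (a sketch reusing the names from the question, with the filter supplied exactly once):

filter_func = simple_filter()

result = make_pipeline(
    load_data("somepath", header=[1, 0]),
    transform1(arg1, arg2),
    transform2(arg1, arg2),   # receives data_filter automatically
    transform3(arg1, arg2),   # receives data_filter automatically
    transform4(arg1, arg2),
    data_filter=filter_func,  # supplied once, forwarded where needed
)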
Leaving this here mostly for reference, because I think this is a mini design pattern. I've also seen a function's __closure__ attribute mentioned on an unrelated subject; that may also work, but it would likely be more complicated.
I am trying to call a stored proc on Postgres/PLPGSQL from Django/Python. I have the stored proc defined using a VARIADIC parameter:
CREATE OR REPLACE FUNCTION udf_getmultiplecategoriescodetypes (VARIADIC NUMERIC[])
then the only place I want to use the array of parameters in the proc is in the WHERE stmt:
WHERE cct.code_category_fk_id = ANY($1)
All this works perfectly when I call the function from the DBeaver console:
SELECT * FROM udf_getmultiplecategoriescodetypes(1, 2)
However, if I use the callproc function in Django/Python, using the same type of syntax, like this:
c.callproc("udf_getmultpilecategoriescodetypes", (1, 2))
I get errors:
LINE 1: SELECT * FROM udf_getmultpilecategoriescodetypes(1,2)
HINT: No function matches the given name and argument types. You might need to add
explicit type casts.
function udf_getmultpilecategoriescodetypes(integer, integer) does not exist
Furthermore, in DBeaver, although the function is created and shows up in the functions listing, if I try to delete it, it says the function cannot be found.
Function Showing in TreeView
Deletion Error Msg
I've since found out that I can delete it by using DROP FUNCTION and including the VARIADIC parameter so it recognises it based on number and type of parameters. But why is it like that?
So, two questions:
What is the correct way to pass an array of integers from a Django/Python callproc function to a VARIADIC parameter in a Postgres/PLPGSQL stored proc?
Why does DBeaver not recognise a listed stored proc function as existing when an array or VARIADIC is used as the parameter? And might this somehow be related to the callproc error, since the errors of both issues seem to be related to the VARIADIC parameter?
What is the correct way to pass an array of integers from a Django/Python callproc function to a VARIADIC parameter in a Postgres/PLPGSQL stored proc?
You defined the parameter as VARIADIC NUMERIC[], so you really want to pass an array of numeric, not an array of integer.
And since it's a VARIADIC function, you can pass a list of numeric values instead of an actual array - like you do. See:
Pass multiple values in single parameter
But that's not the problem at hand. Postgres function type resolution will fall back to func(numeric[]) if there is no func(int[]). Seems to be a plain typo. Do you see the difference?
udf_getmultiplecategoriescodetypes
udf_getmultpilecategoriescodetypes
More concise names, and maybe some underscores, might help prevent such typos.
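With the spelling fixed, the original call should work as-is. A sketch, assuming a standard Django cursor:

from django.db import connection

with connection.cursor() as cursor:
    # note the corrected spelling: ...multiple..., not ...multpile...
    cursor.callproc("udf_getmultiplecategoriescodetypes", (1, 2))
    rows = cursor.fetchall()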
Why does DBeaver not recognise a listed stored proc function as existing when an array or VARIADIC is used as the parameter? And might this somehow be related to the callproc error, since the errors of both issues seem to be related to the VARIADIC parameter?
Postgres allows function overloading. Hence, the function signature consists of its name and its parameters, so as to be unambiguous. DBeaver has nothing to do with it.
On top of that, be aware that the same function name can exist in multiple schemas. So make sure you operate with the right search_path and don't have (and inadvertently call) a copy in another schema.
So your attempt to drop the function public.udf_getmultiplecategoriescodetypes() fails due to the missing parameter. And it may also fail if the function was created in a different schema.
Related:
Error: No function matches the given name and argument types
Is there a way to disable function overloading in Postgres
How does the search_path influence identifier resolution and the "current schema"
I've encountered a problem in a project where it may be useful to be able to pass a large number (in the tens, not the hundreds) of arguments to a single "write once, use many times" function in Python. The issue is, I'm not really sure what the best way is to handle a large block of arguments like that: do I just pass them all in as a single dictionary and unpack that dictionary inside the function, or is there a more efficient/Pythonic way of achieving the same effect?
Depending on exactly what you are doing, you can pass an arbitrary number of parameters to a Python function in one of two standard ways. The first is to pass them positionally: declare a parameter with the * prefix (conventionally called args) and Python will automatically collect any extra positional arguments into a tuple. The second is to pass them as key-value pairs: declare a parameter with the ** prefix (conventionally called kwargs), and arguments supplied in the form key=value are collected into a dict from which you can retrieve them by name. You can of course use both of these in some combination, along with other named parameters, as desired.
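A short sketch of both mechanisms (report is just an illustrative name):

def report(*args, **kwargs):
    # args is a tuple of the extra positional arguments,
    # kwargs is a dict of the extra keyword arguments
    for value in args:
        print("positional:", value)
    for name, value in kwargs.items():
        print(f"keyword {name} = {value!r}")

report(1, 2, 3, colour="red", width=10)

# existing sequences/dicts can also be unpacked at the call site
options = {"colour": "blue", "width": 5}
report(*[4, 5], **options)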
In my recent project I have the problem that some values are often misinterpreted. For instance, I calculate a wave as a sum of two waves (for which I need two amplitudes and two phase shifts), and then sample it at 4 points. I pass these tuples of four values to different functions, but sometimes I make the mistake of passing wave parameters instead of sample points.
These errors are hard to find, because all the calculations work without any error, but the values are totally meaningless in this context and so the results are just wrong.
What I want now is some kind of semantic type. I want to state that one function returns sample points and another function expects sample points, and that I can do nothing that conflicts with these declarations without immediately getting an error.
Is there any way to do this in python?
I would recommend implementing specific data types to be able to distinguish between different kinds of information that have the same structure.
You can simply subclass list for example and then do some type checking at runtime within your functions:
class WaveParameter(list):
    pass

class Point(list):
    pass

# you can use them just like lists
point = Point([1, 2, 3, 4])
wp = WaveParameter([5, 6])

# of course all methods from list are inherited
wp.append(7)
wp.append(8)

# let's check them
print(point)
print(wp)

# type checking examples
print(isinstance(point, Point))          # True
print(isinstance(wp, Point))             # False
print(isinstance(point, WaveParameter))  # False
print(isinstance(wp, WaveParameter))     # True
So you can include this kind of type checking in your functions, to make sure the correct kind of data was passed to it:
import logging
log = logging.getLogger(__name__)

def example_function_with_waveparameter(data):
    if not isinstance(data, WaveParameter):
        log.error("received wrong parameter type (%s instead of WaveParameter)",
                  type(data))
    # and then do the stuff
or simply assert:
def example_function_with_waveparameter(data):
    assert isinstance(data, WaveParameter)
Python's notion of a "semantic type" is called a class, but as mentioned, Python is dynamically typed, so even using custom classes instead of tuples you won't get any compile-time error; at best you'll get runtime errors, if your classes are designed in such a way that trying to use one instead of the other will fail.
Now classes are not just about data, they are about behaviour too, so if you have functions that do waveform-specific computations these functions would probably become methods of the Waveform class, and idem for the Point part, and this might be enough to avoid logical errors like passing a "waveform" tuple to a function expecting a "point" tuple.
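For illustration only, a rough sketch of what that could look like; the class and method names (Waveform, SamplePoints, sample, analyse) are invented for the example:

import math

class SamplePoints:
    """Marks a sequence of values as sample points."""
    def __init__(self, values):
        self.values = list(values)

class Waveform:
    """Sum of two sine waves; the waveform-specific behaviour lives here."""
    def __init__(self, amplitudes, phase_shifts):
        self.amplitudes = amplitudes      # e.g. (a1, a2)
        self.phase_shifts = phase_shifts  # e.g. (p1, p2)

    def sample(self, times) -> SamplePoints:
        return SamplePoints(
            sum(a * math.sin(t + p)
                for a, p in zip(self.amplitudes, self.phase_shifts))
            for t in times
        )

def analyse(points: SamplePoints):
    # A Waveform passed here by mistake fails loudly instead of silently
    # producing meaningless numbers.
    if not isinstance(points, SamplePoints):
        raise TypeError(f"expected SamplePoints, got {type(points).__name__}")
    return max(points.values)

wave = Waveform((1.0, 0.5), (0.0, math.pi / 2))
print(analyse(wave.sample([0.0, 0.5, 1.0, 1.5])))  # works
# print(analyse(wave))                              # raises TypeError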
To make a long story short: if you want a statically typed functional language, Python is not the right tool (Haskell might be a better choice). If you really want / have to use Python, try using classes and methods instead of tuples and functions. It still won't detect type errors at compile time, but chances are you'll have fewer type errors AND those type errors will be detected at runtime instead of producing wrong results.
Is there a good rule of thumb as to when you should prefer varargs function signatures in your API over passing an iterable to a function? ("varargs" being short for "variadic" or "variable-number-of-arguments"; i.e. *args)
For example, os.path.join has a vararg signature:
os.path.join(first_component, *rest) -> str
Whereas min allows either:
min(iterable[, key=func]) -> val
min(a, b, c, ...[, key=func]) -> val
Whereas any/all only permit an iterable:
any(iterable) -> bool
Consider using varargs when you expect your users to specify the list of arguments as code at the call site, or when having a single value is the common case. When you expect your users to get the arguments from somewhere else, don't use varargs. When in doubt, err on the side of not using varargs.
Using your examples: the most common use case for os.path.join is to have a path prefix and append a filename/relative path onto it, so the call usually looks like os.path.join(prefix, some_file). On the other hand, any() is usually used to process a list of data; when you know all the elements up front, you don't write any([a, b, c]), you write a or b or c.
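A tiny sketch of that distinction (the check in the list comprehension is just an illustrative expression):

import os.path

# Arguments spelled out in code at the call site: varargs reads naturally.
print(os.path.join("project", "data", "input.csv"))

# Arguments computed/collected elsewhere: an iterable-taking API reads naturally.
flags = [x > 3 for x in range(5)]
print(any(flags))

# And a sequence can still be unpacked into a varargs call when needed:
parts = ["project", "data", "input.csv"]
print(os.path.join(*parts))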
My rule of thumb is to use it when you might often switch between passing one and multiple parameters. Instead of having two functions (some GUI code for example):
def enable_tab(tab_name)
def enable_tabs(tabs_list)
or even worse, having just one function
def enable_tabs(tabs_list)
and using it as enable_tabs(['tab1']), I tend to use just def enable_tabs(*tabs). Although seeing something like enable_tabs('tab1') looks kind of wrong (because of the plural), I prefer it over the alternatives.
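For instance, a minimal sketch of that single variadic version (the GUI call is stubbed out with print):

def enable_tabs(*tabs):
    # one function covers both the single-tab and the multi-tab case
    for tab_name in tabs:
        print(f"enabling {tab_name}")  # stand-in for the real GUI call

enable_tabs("tab1")                  # one tab
enable_tabs("tab1", "tab2", "tab3")  # several tabs
enable_tabs(*["tab1", "tab2"])       # an existing list, unpacked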
You should use it when your parameter list is variable.
Yeah, I know the answer is kinda daft, but it's true. Maybe your question was a bit diffuse. :-)
Default arguments, as in min() above, are more useful when you either want different behaviours or when you simply don't want to force the caller to send in all parameters.
*args is for when you have a variable list of arguments of the same type. Joining is a typical example. You can replace it with an argument that takes a list as well.
**kw is for when you have many arguments of different types, where each argument is also connected to a name. A typical example is when you want a generic function for handling form submission or similar.
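A small sketch of both shapes (the joining helper and the form handler are invented for the example):

# *args: a variable number of values of the same type, e.g. joining
def join_path(*parts):
    return "/".join(parts)

# **kw: named values of mixed types, e.g. a generic form handler
def handle_form(**fields):
    for name, value in fields.items():
        print(f"{name} = {value!r}")

print(join_path("usr", "local", "bin"))
handle_form(username="alice", age=42, newsletter=True)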
They are completely different interfaces.
In one case, you have one parameter, in the other you have many.
>>> any(1, 2, 3)
TypeError: any() takes exactly one argument (3 given)
>>> os.path.join("1", "2", "3")
'1\\2\\3'
It really depends on what you want to emphasize: any works over a list (well, sort of), while os.path.join works over a set of strings.
Therefore, in the first case you request a list; in the second, you request the strings directly.
In other terms, the expressiveness of the interface should be the main guideline for choosing the way parameters should be passed.