How to use functions with several parameters in a groupby - python

I have a dataset for which I want to calculate several aggregation metrics.
For some I'm using the standard functions, but for others I rely on the tsfresh library, from which I'm importing the functions:
sample.groupby('id').agg(['std', benford_correlation,absolute_maximum])
It works well for functions that have only one parameter, as is the case of:
from tsfresh.feature_extraction.feature_calculators import benford_correlation #(x)
from tsfresh.feature_extraction.feature_calculators import absolute_maximum #(x)
But for others like:
from tsfresh.feature_extraction.feature_calculators import autocorrelation#(x, lag)
I get an error since it has two parameters, x and lag, but I'm only passing x implicitly in the groupby.
How can I specify the other parameters required?

See the pandas DataFrameGroupBy.aggregate docs: additional keyword arguments are passed on to the function. So you can do this:
sample.groupby('id').agg(
    ['std', benford_correlation, absolute_maximum],
    additional_arg=value,
)
But if you need to pass different arguments to each function, you could use a lambda function:
sample.groupby('id').agg(
    [
        'std',
        lambda s: autocorrelation(s, lag=1),
        absolute_maximum,
    ],
)
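As a minimal sketch of the per-function approach, the snippet below fixes the extra parameter with functools.partial or a lambda before the callable is handed to agg. The toy sample data, the value column, and mean_lagged_diff (a hypothetical stand-in with the same (x, lag) shape as tsfresh's autocorrelation) are assumptions made for illustration, not part of the original question.

```python
from functools import partial

import pandas as pd


def mean_lagged_diff(x, lag):
    """Hypothetical two-parameter metric standing in for autocorrelation(x, lag)."""
    values = list(x)
    diffs = [values[i + lag] - values[i] for i in range(len(values) - lag)]
    return sum(diffs) / len(diffs)


# Assumed toy data: one numeric column grouped by 'id'.
sample = pd.DataFrame(
    {
        "id": ["a", "a", "a", "a", "b", "b", "b", "b"],
        "value": [1, 2, 4, 7, 5, 5, 5, 5],
    }
)

# Named aggregation: each keyword becomes an output column.
# The extra 'lag' argument is fixed before pandas ever calls the function.
result = sample.groupby("id")["value"].agg(
    std="std",
    diff_lag1=partial(mean_lagged_diff, lag=1),
    diff_lag2=lambda s: mean_lagged_diff(s, lag=2),
)
print(result)
```

Named aggregation also avoids columns labelled <lambda>, which can be hard to tell apart when several lambdas are used.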

How can I get my function, which uses another function as argument to work?

For an assignment I have to make a function that calculates the forward derivative of an input function and then make sure it works by running it on sin(x).
I tried to make it like this:
import numpy as np
import matplotlib.pyplot as plt
def ForwardDer(f(x),h=0.1):
    FDer = (f(x*+h)-f(x*))/h
    return FDer
And to test this code I ran:
ExampleSin = ForwardDer(math.sin(5))
print(ExampleSin)
This gave me a syntax error so after some googling I adjusted my code to the following.
def ForwardDer(f,x*,h=0.1):
    FDer = (f(x*+h)-f(x*))/h
    return FDer
ExampleSin = ForwardDer(math.sin(),5)
print(ExampleSin)
This complains that math.sin has too few arguments but using (math.sin(5)) as an argument also doesn't work. Can anybody explain to me how I can successfully call a function like this in another function? I really don't get it.
When you pass a function, method, class or any other callable as an argument, don't call it with (); pass the name itself.
Don't use * in a variable name. It's an operator character and isn't allowed in identifiers.
It's also good practice to name functions, methods and variables in snake_case and classes in CamelCase (read: Naming Conventions).
I refactored your code a bit, check it out:
import math
from typing import Callable
def forward_der(func: Callable[[float], float], arg: float, h: float = 0.1) -> float:
    # Forward difference: (f(x + h) - f(x)) / h
    return (func(arg + h) - func(arg)) / h
example_sin = forward_der(math.sin, 5)
print(example_sin)
It prints to console:
0.33109592335406
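To illustrate the point about passing the callable itself, here is a small sketch that reuses the same forward-difference idea with several step sizes and with a lambda. The step sizes and the x ** 2 example are arbitrary choices for illustration.

```python
import math


def forward_der(func, arg, h=0.1):
    """Forward-difference approximation of func'(arg)."""
    return (func(arg + h) - func(arg)) / h


exact = math.cos(5)  # the true derivative of sin at x = 5

# The function object is passed uncalled; forward_der calls it internally.
errors = {h: abs(forward_der(math.sin, 5, h) - exact) for h in (0.1, 0.01, 0.001)}
for h, err in errors.items():
    print(f"h={h}: error={err:.6f}")

# Any callable works the same way, including a lambda.
slope = forward_der(lambda x: x ** 2, 3, h=1e-6)
print(slope)
```

The error shrinks roughly in proportion to h, which is the expected behaviour of a first-order forward difference.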

Using numba to compile dynamic functions

I'm writing a program which dynamically detects and imports Python functions and works out which input parameters each one expects and which outputs it generates.
Like so:
from inspect import getmembers, isfunction
def importFunctions(self, filename):
    moduleImport = __import__(filename)
    members = getmembers(moduleImport, isfunction)
    functions = []
    for m in members:
        function = getattr(moduleImport, m[0])
        number_of_inputs = function.__code__.co_argcount
        # co_varnames lists the arguments first, followed by the local variables
        inputs = list(function.__code__.co_varnames[:number_of_inputs])
        outputs = function.__annotations__["return"]
        functions.append([function, inputs, outputs])
    return functions
This works only when I properly annotate the function, an example function could look something like this:
from numba import jit
@jit
def subtraction(a, b) -> ["difference"]:
    a = float(a)
    b = float(b)
    difference = a - b
    return (difference,)
This works perfectly fine without the decorator, but when I add the numba "jit" decorator to a function, I get an error saying that the imported function is missing the "return" annotation.
UPDATE
Having tried to access the original function by using "func.py_func" as suggested by @Rutger Kassies, my suspicion is that either getmembers or getattr is not properly importing the numba to-be-compiled function.
It seems that getmembers finds "jit" as a separate member, and doesn't correctly associate it with the original function. The way it's written above, the 'function' named "jit" is of type function, as it should be. However, calling it returns a "<function _jit..wrapper". This has me scratching my head quite a bit, but I suppose the 'getattr' is somehow behind this.
My guess is that I will have to find another approach to dynamically importing functions that doesn't rely on "getattr".
If you're dealing with the numba.jit or numba.njit decorators, you can access the original function, in all its annotated glory, by accessing the .py_func attribute. A simple example:
import numpy as np
import numba
from typing import get_type_hints, Annotated, Any
custom_output_type = Annotated[Any, "something"]
@numba.njit
def func(x: float) -> custom_output_type:
    return x**2
# trigger compilation, not required
func(1.2)
get_type_hints(func.py_func, include_extras=True)
Which returns what you would expect from a regular Python function:
{'x': float, 'return': typing.Annotated[typing.Any, 'something']}
It would be similar when using the inspect module.
It gets more complicated when you use the other decorators like vectorize & guvectorize, unfortunately. See for example:
https://numba.discourse.group/t/using-annotations-with-numba-gu-vectorize-functions/1008
It's probably best to rely as much as possible on the inspect & typing modules over accessing the private attributes of a function.
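A rough sketch of how the importer from the question could read annotations through such a wrapper, using only inspect and typing. To stay runnable without numba, FakeDispatcher and fake_jit are hypothetical stand-ins that mimic just one property described above (the original function is kept in .py_func); describe is likewise an illustrative helper name.

```python
import inspect
from typing import get_type_hints


class FakeDispatcher:
    """Hypothetical stand-in for a compiled wrapper that keeps the original in .py_func."""

    def __init__(self, py_func):
        self.py_func = py_func

    def __call__(self, *args, **kwargs):
        return self.py_func(*args, **kwargs)


def fake_jit(func):
    """Hypothetical decorator playing the role of numba.jit in this sketch."""
    return FakeDispatcher(func)


@fake_jit
def subtraction(a: float, b: float) -> float:
    return a - b


def describe(func):
    """Return (parameter names, return annotation), unwrapping .py_func if present."""
    original = getattr(func, "py_func", func)
    params = list(inspect.signature(original).parameters)
    hints = get_type_hints(original)
    return params, hints.get("return")


print(describe(subtraction))
print(inspect.isfunction(subtraction))  # the wrapper object is not a plain function
```

Because the wrapper in this sketch is an object rather than a plain function, a filter such as getmembers(module, isfunction) skips it; filtering with callable and then unwrapping is one possible alternative.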

decorator approach to JSON serializing?

I am using JSON to send data from Python to R (note: I'm much more familiar with R than Python). For primitives, the json module works great. For many other Python objects (e.g. numpy arrays) you have to define a custom encoder, like in this stack overflow answer. However, that requires you to pass the encoder as an argument to json.dumps, which doesn't work that well for my case.
I know there are other packages like json_tricks that have much more advanced capabilities for JSON serialization, but since I don't have control over what Python distribution a user has I don't want to rely on any non-default modules for serializing objects to JSON.
I'm wondering if there is a way to use contextlib decorators to define additional ways for serializing JSON objects. Ideally, I'm looking for a way that would allow users to overload some standard function standard_wrapper that I provide to add new methods for their own classes (or types from modules that they load) without requiring them to modify standard_wrapper. Some pseudocode below:
import json
def standard_wrapper(o):
    return o
obj = [44,64,13,4,79,2,454,89,0]
json.dumps(obj)
json.dumps(standard_wrapper(obj))
import numpy as np
objnp = np.sort(obj)
json.dumps(objnp) # FAILS
@some_decorator_to_overload_standard_wrapper
# some code
json.dumps(standard_wrapper(objnp)) # HOPEFULLY WORKS
This is essentially function overloading by type---I've seen examples for overloading by arguments in Python, but I don't see how to do it by type.
EDIT I was mixing up decorators with contextlib (which I had only ever seen used as a decorator).
It's easy to use singledispatch from functools module to overload a function by type, as shown in this answer to a different post. However, a simpler solution that may fit my needs is to create a dictionary of functions where the keys correspond to the object type.
import numpy
func_dict = {}
a = [2,5,2,9,75,8,36,2,8]
an = numpy.sort(a)
func_dict[type(an)] = lambda x: x.tolist()
func_dict[type(a)] = lambda x: x
import json
json.dumps(func_dict[type(a)](a))
json.dumps(func_dict[type(an)](an))
Adding support for another type is achieved by adding another function to the dictionary.
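For comparison, the singledispatch route mentioned above might look like the sketch below. It uses only the standard library; set and date are arbitrary example types chosen so the snippet runs without numpy, and the registered converters are illustrative assumptions.

```python
import json
from datetime import date
from functools import singledispatch


@singledispatch
def standard_wrapper(obj):
    """Default case: assume json can already serialize the object."""
    return obj


@standard_wrapper.register
def _(obj: set):
    # Sets are not JSON serializable; emit a sorted list instead.
    return sorted(obj)


@standard_wrapper.register
def _(obj: date):
    # Dates become ISO 8601 strings.
    return obj.isoformat()


print(json.dumps(standard_wrapper([44, 64, 13])))
print(json.dumps(standard_wrapper({3, 1, 2})))
print(json.dumps(standard_wrapper(date(2020, 1, 2))))
```

Users can register converters for their own classes without touching standard_wrapper itself. Note that this sketch only converts the top-level object, not values nested inside containers.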

What's the correct way for importing the whole module as well as a couple of its functions in Python?

I know that from module import * will import all the functions in current namespace but it is a bad practice. I want to use two functions directly and use module.function when I have to use any other function from the module. What I am doing currently is:
import module
from module import func1, func2
# DO REST OF MY STUFF
Is this good practice? Does the order of the first two statements matter?
Is there a better way to use these two functions directly while accessing the rest of the functions as usual, with the module's name prepended to them?
Using just import module results in very long statements with a lot of repetition if I use the same function from the given module five times in a single statement. That's what I want to avoid.
The order doesn't matter, but this is not a Pythonic way to do it. Once you have imported the module, there is no need to import some of its functions separately again. If you are not sure how many of the functions you might need, just import the module and access the functions on demand through the module name.
# The only import you need
import module
# Use module.funcX when you need any of its functions
That said, if you use some of the functions (much) more often than the others, the repeated attribute access costs more than importing those functions separately, so you are better off importing them as you've done.
And still, the order doesn't matter. You can do:
import module
from module import func1, func2
For more info read the documentation https://www.python.org/dev/peps/pep-0008/#imports
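A quick way to convince yourself that both import forms refer to the same objects is sketched below. The standard math module is used here purely as a stand-in for the question's module, func1 and func2.

```python
import math
from math import sin, cos

# Both names are bound to the very same function objects,
# so the order of the two import statements makes no difference.
print(sin is math.sin)
print(cos is math.cos)

values = [0.0, 0.5, 1.0]

# Short names for the frequently used functions...
short = [sin(v) * cos(v) for v in values]
# ...and the module prefix for everything else give identical results.
qualified = [math.sin(v) * math.cos(v) for v in values]
print(short == qualified)
```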
It is not good to do (may be opinion based):
import module
from module import func1, func2 # `func1` and `func2` are already part of module
Because you already hold a reference to module.
If I were you, I would import it in the form of import module. Since your issue is that module.func1() becomes too long, I would import the module and use as to create an alias for the name. For example:
import module as mo
#      ^ for illustration purposes; the name of
#        your actual module won't be `module`.
# The alias should also be self-explanatory.
# For example:
import database_manager as db_manager
Now I may access the functions as:
mo.func1()
mo.func2()
Edit: Based on the edit in actual question
If you are calling the same function several times in the same line, there is a possibility that you are already doing something wrong. It would be great if you could share what that function does.
For example, do you want the return values of those functions to be passed as arguments to another function? As in:
test_func(mo.func1(x), mo.func1(y), mo.func1(z))
could be done as:
params_list = [x, y, z]
func_list = [mo.func1(param) for param in params_list]
test_func(*func_list)

Is it possible to get a list of all the keyword arguments of a built-in function? [duplicate]

I'm trying to figure out the arguments of a method retrieved from a module.
I found an inspect module with a handy function, getargspec.
It works for a function that I define, but won't work for functions from an imported module.
import math, inspect
def foobar(a,b=11): pass
inspect.getargspec(foobar) # this works
inspect.getargspec(math.sin) # this doesn't
I'll get an error like this:
File "C:\...\Python 2.5\Lib\inspect.py", line 743, in getargspec
raise TypeError('arg is not a Python function')
TypeError: arg is not a Python function
Is inspect.getargspec designed only for local functions or am I doing something wrong?
It is impossible to get this kind of information for a function that is implemented in C instead of Python.
The reason for this is that there is no way to find out what arguments the method accepts except by parsing the (free-form) docstring since arguments are passed in a (somewhat) getarg-like way - i.e. it's impossible to find out what arguments it accepts without actually executing the function.
You can get the doc string for such functions/methods which nearly always contains the same type of information as getargspec. (I.e. param names, no. of params, optional ones, default values).
In your example
import math
math.sin.__doc__
Gives
"sin(x)
Return the sine of x (measured in radians)"
Unfortunately there are several different standards in operation. See What is the standard Python docstring format?
You could detect which of the standards is in use, and then grab the info that way. From the above link it looks like pyment could be helpful in doing just that.
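On current Python 3 versions, inspect.signature supersedes getargspec and can read signatures for many (though not all) C-implemented callables. The sketch below tries it first and falls back to the first docstring line, in the spirit of the docstring suggestion above. describe_arguments is a hypothetical helper name, not a standard library function.

```python
import inspect
import math


def describe_arguments(func):
    """Return a signature string, falling back to the first docstring line."""
    try:
        return str(inspect.signature(func))
    except (ValueError, TypeError):
        # Some C-implemented callables expose no signature metadata.
        doc = inspect.getdoc(func) or ""
        return doc.splitlines()[0] if doc else "<unknown>"


def foobar(a, b=11):
    pass


print(describe_arguments(foobar))    # a function defined in Python
print(describe_arguments(math.sin))  # a function implemented in C
```

The fallback is only as reliable as the docstring it reads, so its result is best treated as a hint rather than a guaranteed signature.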
