Namespace issues when calling patsy within a function - python

I am attempting to write a wrapper for the statsmodels formula API (this is a simplified version, the function does more than this):
import statsmodels.formula.api as smf

def wrapper(formula, data, **kwargs):
    return smf.logit(formula, data).fit(**kwargs)
If I give this function to a user, who then attempts to define his/her own function:
def square(x):
    return x**2

model = wrapper('y ~ x + square(x)', data=df)
they will receive a NameError because the patsy module is looking in the namespace of wrapper for the function square. Is there a safe, Pythonic way to handle this situation without knowing a priori what the function names are or how many functions will be needed?
FYI: This is for Python 3.4.3.

statsmodels uses the patsy package to parse the formulas and create the design matrix. patsy allows user functions as part of formulas and obtains or evaluates the user functions in the user's namespace or environment.
As a reference, see the eval_env keyword in http://patsy.readthedocs.org/en/latest/API-reference.html
from_formula is the method of models that implements the formula interface to patsy. It uses eval_env to provide the necessary information to patsy, which by default is the calling environment of the user. This can be overridden by the user with the corresponding keyword argument.
The simplest way to define the eval_env is as an integer that indicates the stack level that patsy should use. from_formula increments it to account for the additional level inside the statsmodels methods.
According to the comments, eval_env=2 will use the next higher level from the level that creates the model, e.g. with model = smf.logit(..., eval_env=2).
This creates the model and calls patsy to build the design matrix; model.fit() will estimate it and return the results instance.
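Applied to the wrapper above, that would look roughly like the sketch below (following the description in this answer; the exact stack level may need adjusting if the wrapper gains more layers):

import statsmodels.formula.api as smf

def wrapper(formula, data, **kwargs):
    # eval_env=2 lets patsy resolve names such as square() in the frame
    # above the one that creates the model, i.e. in wrapper's caller
    return smf.logit(formula, data, eval_env=2).fit(**kwargs)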

If you are willing to use eval to do the heavy lifting of your function, you can construct a namespace from the arguments to wrapper and the local variables of the calling frame:
import sys
import statsmodels.formula.api as smf

wrapper_code = compile("smf.logit(formula, data).fit(**kwargs)",
                       "<WrapperFunction>", "eval")

def wrapper(formula, data, **kwargs):
    outer_frame = sys._getframe(1)  # the frame of wrapper's caller
    namespace = dict(outer_frame.f_locals)
    namespace.update(formula=formula, data=data, kwargs=kwargs, smf=smf)
    return eval(wrapper_code, namespace)
I don't really see this as a cheat, since it seems to be what logit is doing anyway for it to raise a NameError in the first place. As long as wrapper_code is not modified and there are no name conflicts (like the caller using something called data), this should do what you want.
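For example, with the user-defined square() from the question (a usage sketch with made-up toy data):

import pandas as pd

def square(x):
    return x ** 2

df = pd.DataFrame({'x': [1., 2., 3., 4., 5., 6.],
                   'y': [0, 1, 0, 1, 1, 1]})
model = wrapper('y ~ x + square(x)', data=df)  # square() is found in the calling frame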


Re-referencing a large number of functions in python

I have a file functional.py which defines a number of useful functions. For each function, I want to create an alias that, when called, returns a reference to the original function. Something like this:
foo/functional.py:
def fun1(a):
    return a

def fun2(a):
    return a + 1
...
foo/__init__.py:
from inspect import getmembers, isfunction
from . import functional

for (name, fun) in getmembers(functional, isfunction):
    dun = lambda f=fun: f
    globals()[name] = dun
>>> bar.fun1()(1)
1
>>> bar.fun2()(1)
2
I can get the functions from functional.py using inspect and dynamically define a new set of functions that are fit for my purpose.
But why? you might ask... I am using the configuration manager Hydra, where one can instantiate objects by specifying the fully qualified name. I want to make use of the functions in functional.py in the config and have Hydra pass a reference to the function when creating an object that uses it (more details can be found in the Hydra documentation).
There are many functions and I don't want to write them all out... people have pointed out in similar questions that modifying globals() for this purpose is bad practice. My use case is fairly constrained: documentation-wise there is a one-to-one mapping (but obviously an IDE won't be able to resolve it).
Basically, I am wondering if there is a better way to do it!
Is your question related to this feature request and in particular to this comment?
FYI: In Hydra 1.1, instantiate fully supports positional arguments so I think you should be able to call functools.partial directly without redefining it.

How to make a PyTorch Distribution on GPU

Is it possible to make the PyTorch distributions create their samples directly on the GPU?
If I do
from torch.distributions import Uniform, Normal

normal = Normal(3, 1)
sample = normal.sample()
Then sample will be on the CPU. Of course it is possible to do sample = sample.to(torch.device("cuda")) to move it to the GPU afterwards, but is there a way to have the sample go directly to the GPU without first creating it on the CPU?
PyTorch distributions inherit from object, not nn.Module, so a distribution instance has no to method that could put it on the GPU.
Any ideas?
Distributions use the reparametrization trick, so samples are built from the distribution's parameter tensors and land on the same device. Thus, giving zero-dimensional tensors that already live on the GPU to the distribution constructor works, as follows:
normal = Normal(torch.tensor(0.).to(device=torch.device("cuda")),
                torch.tensor(1.).to(device=torch.device("cuda")))
Note the float literals: Normal needs floating-point parameters in order to sample.
In my case, I'm using a normal distribution as the prior in a neural net model. I have a class, say class1, and in its __init__ I have to initialize the prior. However, calling .to('cuda') on an instance of class1 doesn't change the distribution's device and causes errors in later usage. Therefore, I could use register_buffer to manage it as follows.
class class1(nn.Module):
    def __init__(self):
        super().__init__()
        self.register_buffer("mean", torch.tensor(0.))
        self.register_buffer("var", torch.tensor(1.))

    def get_dist(self):
        # note: Normal's second argument is the standard deviation (scale), not the variance
        return torch.distributions.Normal(self.mean, self.var)
However, I have several priors, and it's not possible to register_buffer a list. So one option could be creating the distributions inside get_dist, if you don't mind the cost of re-initializing them on every call. I decided instead to define a function for initializing the distributions and a try/except in get_dist to handle the different states: if the distributions variable is not yet assigned, or is on the CPU while we expect it to be on the GPU, it jumps to the except branch, where I initialize the distributions using torch.zeros(..).to(device).
Overall, to avoid this kind of CPU/GPU device error, you need to construct a distribution from tensor parameters that are already on the appropriate device. The root cause is that torch.distributions.Distribution unfortunately has no device attribute.
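A minimal sketch of that lazy re-initialization pattern (the class, method, and attribute names here are illustrative, not from the original code; device is expected as a string such as "cpu" or "cuda"):

import torch
import torch.nn as nn

class ModelWithPriors(nn.Module):
    def __init__(self, n_priors=3):
        super().__init__()
        self.n_priors = n_priors
        self.priors = None  # built lazily, on the device requested at call time

    def _init_priors(self, device):
        return [torch.distributions.Normal(torch.zeros(1).to(device),
                                           torch.ones(1).to(device))
                for _ in range(self.n_priors)]

    def get_dist(self, i, device):
        try:
            dist = self.priors[i]
            if dist.loc.device.type != torch.device(device).type:
                raise RuntimeError("prior lives on the wrong device")
        except (TypeError, RuntimeError):
            # not initialized yet, or on the wrong device: rebuild on `device`
            self.priors = self._init_priors(device)
            dist = self.priors[i]
        return dist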
I just came across the same problem, and thanks to the other answers here for the pointers. I want to offer another option if you want a distribution inside a module, which is to override the to method in the module and manually call the to methods on the distribution's parameter tensors. I've only tested it with Uniform, but it works well here.
class MyModule(nn.Module):
    def __init__(self, ...):
        super().__init__()  # needed before assigning attributes on an nn.Module
        self.rng = Uniform(
            low=torch.zeros(3),
            high=torch.ones(3),
        )

    def to(self, *args, **kwargs):
        super().to(*args, **kwargs)
        self.rng.low = self.rng.low.to(*args, **kwargs)
        self.rng.high = self.rng.high.to(*args, **kwargs)
        return self  # nn.Module.to returns self, so keep that contract
Now you can put your model on the gpu as usual and self.rng.sample() will return a sample on the correct device.
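Usage is then the usual pattern (a small sketch; whatever arguments MyModule actually takes go in the constructor):

model = MyModule().to("cuda")
sample = model.rng.sample()  # drawn directly on the GPU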
You can solve the problem of transferring non-parameter/buffer attributes to the GPU by overriding the _apply(self, fn) method of your network, like this:
def _apply(self, fn):
    # apply fn() to your submodules
    for module in self.children():  # like 'ResNet_backbone'
        module._apply(fn)

    # apply fn() to your prior; the attributes need to be Tensors
    self.prior.attr1 = fn(self.prior.attr1)  # like 'MultivariateNormal.loc'
    self.prior.attr2 = fn(self.prior.attr2)
    # ...
    self.prior.attrN = fn(self.prior.attrN)

    # if we do not use register_buffer(Tensor), apply fn() to your
    # non-parameter/buffer attributes as well (they need to be Tensors too)
    self.attr1 = fn(self.attr1)
    self.attr2 = fn(self.attr2)
    # ...
    self.attrN = fn(self.attrN)

    # nn.Module.to() returns self._apply(fn), so keep returning self
    return self

What is the function of @tf_export before a tensorflow function definition in Python?

I'm working with Tensorflow in Python. In a custom-written function I found @tf_export() before the function definition, like below, and I don't understand what it does. Could somebody explain?

@tf_export("signal.ifftshift")
def ifftshift(x, axes=None, name=None):
As I understand, it allows Tensorflow to expose a function or class under a different name. For example, the Server class within the distribute module actually lives in the training/server_lib.py file within the repo, but, since it is exported as distribute.Server, you can use it like tf.distribute.Server().
# training/server_lib.py
@tf_export("distribute.Server", v1=["distribute.Server", "train.Server"])
@deprecation.deprecated_endpoints("train.Server")
class Server(object):
    ...
It makes it confusing to find the code, but I imagine it's a more flexible way to create these "logical" modules.
It is a convenient way to expose dot-delimited symbols directly in the tf API. Namely, a user can access ifftshift() as tf.signal.ifftshift(), without caring about the true path (here tensorflow.python.ops.signal.fft_ops.ifftshift()).
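Conceptually, an export decorator of this kind just records the public name(s) a symbol should be reachable under; the public API modules are then generated from that registry. A toy sketch of the idea (not TensorFlow's actual implementation):

# Toy sketch of an export decorator; not TensorFlow's real tf_export.
_API_REGISTRY = {}

def api_export(*names, v1=()):
    def decorator(obj):
        # register obj under each requested public name,
        # e.g. "signal.ifftshift" -> ifftshift
        for name in list(names) + list(v1):
            _API_REGISTRY[name] = obj
        return obj
    return decorator

@api_export("signal.ifftshift")
def ifftshift(x, axes=None, name=None):
    pass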

How to remove library modules or specific functions from pycallgraph

I am using pycallgraph to analyze my code performance. However, the call graph is pretty messy with many calls to system functions as well as certain functions I would not like to document. How can I stop pycallgraph from reporting these calls?
Pycallgraph provides filtering capabilities to exclude any module, class or function you like from the call graph. The following function should be defined before you start the trace and passed to pycallgraph:
Example
def filtercalls(call_stack, modul, clas, func, full):
    mod_ignore = ['shutil', 'scipy.optimize', 're', 'os', 'sys', 'json']
    func_ignore = ['CustomFunctionName', 'pdbcall']
    clas_ignore = ['pdb']
    return modul not in mod_ignore and func not in func_ignore and clas not in clas_ignore
The pycallgraph trace is then started with:
pycallgraph.start_trace(filter_func=filtercalls)
This way, any module, class or function you list in filtercalls will be removed. Please note that for large packages, providing just the top-level module name is often not enough: including numpy in mod_ignore will still leave numpy.core in the graph.
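If that matters for your graph, a variant that matches submodules by prefix could look like the sketch below (assuming, as in the example above, that modul holds the dotted module name):

def filtercalls(call_stack, modul, clas, func, full):
    mod_ignore = ['shutil', 'scipy.optimize', 're', 'os', 'sys', 'json', 'numpy']
    for m in mod_ignore:
        # match the module itself and any submodule, e.g. numpy and numpy.core
        if modul == m or (modul or '').startswith(m + '.'):
            return False
    return True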

Set defaults at runtime

I manage a fairly large python-based quantum chemistry suite, PyQuante. I'm currently struggling with how to set various defaults so that users can choose among different options at runtime.
For example, I have three different methods for computing electron repulsion integrals. Let's call them a,b,c. I used to simply pick the one I liked best (say, c), and have that hard-wired into the module that computes these integrals.
I have now modified this to use a module, Defaults.py, that contains all such hard-wires. But this is set at compile/install time. I would now like users to be able to override these options at runtime, say, using a .pyquanterc.py file.
In my integral routines, I currently have something like
from Defaults import integral_method
I know about dictionaries, and the .update() method. But I don't know how I would use this in real life. My defaults module looks like
integral_method = c
Should I modify the end of Defaults.py to look for a .pythonrc.py file and override these values? E.g.
if os.path.exists('$HOME/.pythonrc.py'): do_something
If so, what should do_something look like?
With your current setup, the user can change the default functions in his scripts quite easily:
import Defaults
Defaults.integral_method = somefunc
If the user adds this to his script, all your modules that use integral_method from Defaults will use somefunc to calculate integrals.
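If you want the override to come from a startup file instead, do_something can simply execute the rc file and copy its names into Defaults. A minimal sketch, assuming the file is ~/.pyquanterc.py (the path and the leading-underscore filter are assumptions):

import os
import Defaults

rc_path = os.path.expanduser("~/.pyquanterc.py")  # note: os.path.exists does not expand $HOME
if os.path.exists(rc_path):
    rc_namespace = {}
    with open(rc_path) as f:
        exec(f.read(), rc_namespace)
    for name, value in rc_namespace.items():
        if not name.startswith("_"):  # skip __builtins__ and other internals
            setattr(Defaults, name, value)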
I might do this via a factory class.
class IntegralSolver:
    """
    Factory class containing methods for solving integrals.

    >>> solver = IntegralSolver("method1")
    >>> solver(x)
    # solution via method1

    Can also be used directly:

    >>> IntegralSolver.method2(x)
    # solution via method2
    """
    def __init__(self, method):
        # special methods such as __call__ are looked up on the class, not the
        # instance, so store the chosen method and dispatch to it explicitly
        self._method = getattr(self, method)

    def __call__(self, x):
        return self._method(x)

    @staticmethod
    def method1(x):
        return method1_solution

    @staticmethod
    def method2(x):
        return method2_solution
It really depends on how your users run the toolset. If they tweak the Python code each time, just setting a block at the top labeled OPTIONS should be good. If they run it from the command line, use the argparse library to let them switch options there. Perhaps have it read the options out of a file with configparser: read a default file with your options and, if the user provides one, an additional file with their options.
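A sketch of that layering (the file name, section name, and option name are illustrative assumptions):

import argparse
import configparser
import os

config = configparser.ConfigParser()
config.read_dict({"integrals": {"method": "c"}})  # hard-wired defaults
config.read(os.path.expanduser("~/.pyquanterc"))  # user file, if present

parser = argparse.ArgumentParser()
parser.add_argument("--integral-method",
                    default=config["integrals"]["method"])
args = parser.parse_args()
print(args.integral_method)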
