I am using JSON to send data from Python to R (note: I'm much more familiar with R than Python). For primitives, the json module works great. For many other Python objects (e.g. numpy arrays) you have to define a custom encoder, like in this Stack Overflow answer. However, that requires you to pass the encoder as an argument to json.dumps, which doesn't work that well for my case.
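For reference, here is a minimal sketch of one standard-library variant of that approach, using the default= hook that json.dumps accepts (np_default is a made-up name for illustration):

import json
import numpy as np

def np_default(o):
    # json.dumps calls this hook only for objects it cannot serialize natively.
    if isinstance(o, np.ndarray):
        return o.tolist()
    raise TypeError("%r is not JSON serializable" % o)

json.dumps({"a": np.arange(3)}, default=np_default)  # '{"a": [0, 1, 2]}'

The drawback, as noted above, is that default= (or cls=) must be threaded through every call to json.dumps.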
I know there are other packages like json_tricks that have much more advanced capabilities for JSON serialization, but since I don't have control over which Python distribution a user has, I don't want to rely on any non-default modules for serializing objects to JSON.
I'm wondering if there is a way to use contextlib decorators to define additional ways of serializing JSON objects. Ideally, I'm looking for an approach that would allow users to overload some standard function standard_wrapper that I provide, adding new methods for their own classes (or types from modules that they load) without requiring them to modify standard_wrapper. Some pseudocode is below:
import json

def standard_wrapper(o):
    return o

obj = [44, 64, 13, 4, 79, 2, 454, 89, 0]
json.dumps(obj)
json.dumps(standard_wrapper(obj))

import numpy as np
objnp = np.sort(obj)
json.dumps(objnp)  # FAILS

@some_decorator_to_overload_standard_wrapper
# some code
json.dumps(standard_wrapper(objnp))  # HOPEFULLY WORKS
This is essentially function overloading by type. I've seen examples of overloading by arguments in Python, but I don't see how to do it by type.
EDIT I was mixing up decorators with contextlib (which I had only ever seen used as a decorator).
It's easy to use singledispatch from the functools module to overload a function by type, as shown in this answer to a different post (a short sketch follows the dictionary example below). However, a simpler solution that may fit my needs is to create a dictionary of functions whose keys correspond to the object type.
import json
import numpy

func_dict = {}
a = [2, 5, 2, 9, 75, 8, 36, 2, 8]
an = numpy.sort(a)
func_dict[type(an)] = lambda x: x.tolist()  # numpy arrays -> plain lists
func_dict[type(a)] = lambda x: x            # lists pass through unchanged

json.dumps(func_dict[type(a)](a))
json.dumps(func_dict[type(an)](an))
Adding support for another type is achieved by adding another function to the dictionary.
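For completeness, here is a minimal sketch of the singledispatch approach mentioned above; it also meets the original requirement, since users can register handlers for their own types without modifying standard_wrapper:

from functools import singledispatch
import json
import numpy as np

@singledispatch
def standard_wrapper(o):
    # Fallback: assume the object is already JSON-serializable.
    return o

@standard_wrapper.register(np.ndarray)
def _(o):
    return o.tolist()

json.dumps(standard_wrapper(np.sort([3, 1, 2])))  # '[1, 2, 3]'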
I have a file functional.py which defines a number of useful functions. For each function, I want to create an alias that, when called, returns a reference to the function. Something like this:
foo/functional.py
def fun1(a):
    return a

def fun2(a):
    return a + 1
...
foo/__init__.py
from inspect import getmembers, isfunction
from . import functional

for (name, fun) in getmembers(functional, isfunction):
    # Binding fun as a default argument makes each alias capture its own
    # function instead of the final value of the loop variable.
    dun = lambda f=fun: f
    globals()[name] = dun
>>> foo.fun1()(1)
1
>>> foo.fun2()(1)
2
I can get the functions from functional.py using inspect and dynamically define a new set of functions that are fit for my purpose.
But why, you might ask? I am using the configuration manager Hydra, where one can instantiate objects by specifying their fully qualified name. I want to make use of the functions in functional.py in the config and have Hydra pass a reference to the function when creating an object that uses it (more details can be found in the Hydra documentation).
There are many functions and I don't want to write the aliases out by hand. People have pointed out in similar questions that modifying globals() for this purpose is bad practice. My use case is fairly constrained: documentation-wise there is a one-to-one mapping (but obviously an IDE won't be able to resolve the generated names).
Basically, I am wondering if there is a better way to do it!
Is your question related to this feature request and in particular to this comment?
FYI: In Hydra 1.1, instantiate fully supports positional arguments, so I think you should be able to call functools.partial directly without redefining it.
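To illustrate that comment in plain Python (leaving the Hydra config details aside): functools.partial(f) with no bound arguments is itself a callable that forwards to f, so there is no need to generate wrapper functions by hand. Note the call signature differs from the aliases above: the partial is called once, not twice.

import functools
from foo import functional  # the module from the question

alias = functools.partial(functional.fun1)  # no arguments bound
alias(1)  # -> 1, forwards straight to fun1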
I am trying to parallelize a function that takes an object as input in Python.
When using Pathos, the map function automatically dills the object before distributing it among the processors.
However, it takes ~1 min to dill the object each time, and I need to run this function up to 100 times. All in all, it is taking nearly 2 hours just to serialize the object before even running it.
Is there a way to just serialize it once, and use it multiple times?
Thanks very much
The easiest thing to do is to do this manually.
Without an example of your code, I have to make a lot of assumptions and write something pretty vague, so let's take the simplest case.
Assume you're using dill manually, so your existing code looks like this:
obj = function_that_creates_giant_object()
for i in range(zillions):
    results.append(pool.apply(func, (dill.dumps(obj),)))
All you have to do is move the dumps out of the loop:
obj = function_that_creates_giant_object()
objpickle = dill.dumps(obj)
for i in range(zillions):
    results.append(pool.apply(func, (objpickle,)))
But depending on your actual use, it may be better to just stick a cache in front of dill:
cachedpickle = functools.lru_cache(maxsize=10)(dill.dumps)

obj = function_that_creates_giant_object()
for i in range(zillions):
    results.append(pool.apply(wrapped_func, (cachedpickle(obj),)))
Of course if you're monkeypatching multiprocessing to use dill in place of pickle, you can just as easily patch it to use this cachedpickle function.
If you're using multiprocess, which is a forked version of multiprocessing that pre-substitutes dill for pickle, it's less obvious how to patch that; you'll need to go through the source and see where it's using dill and get it to use your wrapper. But IIRC, it just does an import dill as pickle somewhere and then uses the same code as (a slightly out-of-date version of) multiprocessing, so it isn't all that different.
In fact, you can even write a module that exposes the same interface as pickle and dill:
import functools
import dill

def loads(s):
    return dill.loads(s)

@functools.lru_cache(maxsize=10)
def dumps(o):
    return dill.dumps(o)
… and just replace the import dill as pickle with import mycachingmodule as pickle.
… or even monkeypatch it after loading with multiprocess.helpers.pickle = mycachingmodule (or whatever the appropriate name is; you're still going to have to find where the relevant import happens in the source of whatever you're using).
And that's about as complicated as it's likely to get.
I am currently learning how to write Python (v3.5) extension modules using the Python C API. Some operations, like fast numerical work, are best done in C, while other operations, like string manipulation, are far easier to implement in Python. Is there an agreed-upon way to use both Python and C code to define a new type?
For example, I've written a Matrix type in C that supports basic storage and arithmetic operations. I want to define Matrix.__str__ in Python, where string manipulations are much easier and I don't need to worry about C strings.
I attempted to define the __str__ method when the module loads in __init__.py as follows:
from mymodule._mymodule import Matrix

def as_str(self):
    return "This is a matrix!"

Matrix.__str__ = as_str
When I run this code, I get a TypeError: can't set attributes of built-in/extension type 'matey.Matrix'. Is there an acceptable way to do this? If the solution is to subclass Matrix instead, what is the best way to keep my C base classes / Python subclasses organized within a module?
Personally, I wouldn't try and do object-oriented stuff in C. I'd stick to writing a module which exposes some (stateless) functions.
If I wanted the Python interface to be object oriented, I'd write a class in Python which imports that (C extension) module and consumes functions from it. Maintaining of any state would all be done in Python.
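A minimal sketch of that structure; the _mymodule function names (matrix_new, matrix_add, matrix_to_list) are hypothetical placeholders for whatever your C extension actually exports:

from mymodule import _mymodule  # hypothetical stateless C functions

class Matrix:
    def __init__(self, rows, cols):
        # All state lives on the Python object; C code only sees the handle.
        self._handle = _mymodule.matrix_new(rows, cols)

    def __add__(self, other):
        result = Matrix.__new__(Matrix)
        result._handle = _mymodule.matrix_add(self._handle, other._handle)
        return result

    def __str__(self):
        # String manipulation stays in Python, where it is easy.
        rows = _mymodule.matrix_to_list(self._handle)
        return "\n".join(" ".join(str(x) for x in row) for row in rows)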
You could instead define a _Matrix type which you then extend with a traditional OOP approach (note that this requires the C type to set Py_TPFLAGS_BASETYPE so that it can be subclassed):
from mymodule._mymodule import _Matrix

class Matrix(_Matrix):
    def __str__(self):
        return "This is a matrix!"
I have a file I read from that contains definitions of ctypes used in a separate project. From it I can obtain all the information necessary to create the ctypes class I want in Python: the name, fields, bitfields, ctypes base class (Structure, Union, Enum, etc.), and pack value.
I want to be able to create a ctype class from the information above. I also want these ctypes to be pickleable.
I currently have two solutions, both of which feel like hacks.
Solution 1
Generate Python source code in the appropriate ctypes format, by hand or with a templating tool like Jinja2, and then evaluate it.
This solution has the downside of using eval. I always try to stay away from eval, and I don't feel this is a good place to use it.
Solution 2
Create the ctypes class dynamically in a function, like so:
import ctypes

def create_ctype_class(name, base, fields, pack):
    class CtypesStruct(base):
        _pack_ = pack
        _fields_ = fields
    CtypesStruct.__name__ = name
    return CtypesStruct

ctype = create_ctype_class('ctype_struct_name', ctypes.Structure,
                           [('field1', ctypes.c_uint8)], 4)
This solution isn't so bad, but setting the name of the class is ugly and the type cannot be pickled.
Is there a better way of creating a dynamic ctype class?
Note: I am using Python 2.7
Solution 2 is probably your better option, though if you're also writing such classes statically, you may want to use a metaclass to deduplicate some of that code. If you need your objects to be pickleable, you'll need a way to reconstruct them from pickleable objects; once you've implemented such a mechanism, you can make the pickle module aware of it with a __reduce__() method.
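A minimal sketch of that mechanism, building on Solution 2 (the cache and the _reconstruct helper are additions of mine, not a standard recipe; both functions must live at module level so pickle can import _reconstruct by name):

import ctypes

_cache = {}

def create_ctype_class(name, base, fields, pack):
    # Cache so that repeated reconstruction returns the same class object.
    key = (name, base, tuple(fields), pack)
    if key not in _cache:
        class CtypesStruct(base):
            _pack_ = pack
            _fields_ = fields
            def __reduce__(self):
                # Describe how to rebuild this instance: recreate the
                # class from its description, then restore the raw bytes.
                raw = ctypes.string_at(ctypes.addressof(self),
                                       ctypes.sizeof(self))
                return (_reconstruct, (name, base, fields, pack, raw))
        CtypesStruct.__name__ = name
        _cache[key] = CtypesStruct
    return _cache[key]

def _reconstruct(name, base, fields, pack, raw):
    cls = create_ctype_class(name, base, fields, pack)
    obj = cls()
    ctypes.memmove(ctypes.addressof(obj), raw,
                   min(len(raw), ctypes.sizeof(cls)))
    return obj

This works in Python 2.7 because the class description itself is pickleable: ctypes type objects such as ctypes.c_uint8 pickle by reference.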
I would go with a variant of Solution 1. Instead of eval-ing code, create a directory with an __init__.py (i.e., a package), add it to your sys.path, and write out an entire Python module containing all of the classes. Then you can import them from a stable namespace, which will make pickle happier.
You can either take the output and add it to your app's source code or dynamically recreate it and cache it on a target machine at runtime.
pywin32 uses an approach like this for caching classes generated from ActiveX interfaces.
I have a PL/Python function which does some JSON magic. For this it obviously imports the json library.
Is the import executed on every call to the function? Are there any performance implications I have to be aware of?
The import is executed on every function call. This is the same behavior you would get if you wrote a normal Python module with the import statement inside a function body as opposed to at the module level.
Yes, this will affect performance.
You can work around this by caching your imports like this:
CREATE FUNCTION test() RETURNS text
LANGUAGE plpythonu
AS $$
if 'json' in SD:
    json = SD['json']
else:
    import json
    SD['json'] = json
return json.dumps(...)
$$;
This is admittedly not very pretty, and better ways to do this are being discussed, but they won't happen before PostgreSQL 9.4.
The body of a PL/Python function eventually becomes the body of an ordinary Python function and thus behaves as such. When a Python function imports a module for the first time, the module is cached in the sys.modules dictionary (https://docs.python.org/3/reference/import.html#the-module-cache). Subsequent imports of the same module simply bind the import name to the module object found in that dictionary. In a sense, this casts some doubt on the usefulness of the tip given in the accepted answer: it is somewhat redundant, since Python already does similar caching for you.
To sum things up, I'd say that if you import in the standard way of simply using the import or from [...] import constructs, then you need not worry about repeated imports, in functions or otherwise; Python has you covered.
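A small, self-contained demonstration of the module cache at work:

import sys
import timeit

def use_json():
    import json  # after the first call, this is just a sys.modules lookup
    return json.dumps([1, 2, 3])

use_json()
print('json' in sys.modules)                   # True: the module is cached
print(timeit.timeit(use_json, number=10000))   # repeated imports are cheap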
On the other hand, Python allows you to bypass its native import semantics and to implement your own (with the __import__() function and importlib module). If this is what you're doing, maybe you should review what's available in the toolbox (https://docs.python.org/3/reference/import.html).