Good day to you,
Today I was moving code from threading to multiprocessing. Everything seemed okay, until I got the following error:
Error
Traceback (most recent call last):
File "run.py", line 93, in <module>
main()
File "run.py", line 82, in main
emenu.executemenu(components, _path)
File "/home/s1810979/paellego/lib/execute/execute_menu.py", line 29, in executemenu
e.executeall(installed, _path)
File "/home/s1810979/paellego/lib/execute/execute.py", line 153, in executeall
pool.starmap(phase2, args)
File "/usr/lib64/python3.4/multiprocessing/pool.py", line 268, in starmap
return self._map_async(func, iterable, starmapstar, chunksize).get()
File "/usr/lib64/python3.4/multiprocessing/pool.py", line 608, in get
raise self._value
File "/usr/lib64/python3.4/multiprocessing/pool.py", line 385, in _handle_tasks
put(task)
File "/usr/lib64/python3.4/multiprocessing/connection.py", line 206, in send
self._send_bytes(ForkingPickler.dumps(obj))
File "/usr/lib64/python3.4/multiprocessing/reduction.py", line 50, in dumps
cls(buf, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <class 'module'>: attribute lookup module on builtins failed
Code
execute.py
def executeall(components, _path):
    args = []
    manager = multiprocessing.Manager()
    q = manager.Queue()
    resultloc = '/some/result.log'
    for component in components:
        for apkpath, resultpath in zip(execonfig.apkpaths, execonfig.resultpaths):
            args.append((component,apkpath,resultpath,q,)) #Args for subprocesses
    cores = askcores()
    with multiprocessing.Pool(processes=cores) as pool:
        watcher = pool.apply_async(lgr.log, (resultloc+'/results.txt', q,))
        pool.starmap(phase2, args)
component.py
class Component(object):
    def __init__(self, installmodule, runmodule, installerloc, installationloc, dependencyloc):
        self.installmodule = installmodule
        self.runmodule = runmodule
        self.installerloc = installerloc
        self.installationloc = installationloc
        self.dependencyloc = dependencyloc
        self.config = icnf.Installconfiguration(installerloc+'/conf.conf')
    #lots of functions...
installconfig.py
class State(Enum):
    BEGIN=0 #Look for units
    UNIT=1 #Look for unit keypairs
    KEYPAIR=3
class Phase(Enum):
    NONE=0
    DEPS=1
    PKGS=2
class Installconfiguration(object):
    def __init__(self, config):
        dictionary = self.reader(config) #Fill a dictionary
        #dictionary (key:Phase, value: (dictionary key: str, job))
        self.deps = dictionary[Phase.DEPS]
        self.pkgs = dictionary[Phase.PKGS]
job.py
class Job(object):
    def __init__(self, directory=None, url=None):
        self.directory = directory if directory else ''
        self.url = url if url else ''
As you can see, I pass a component as an argument to the function phase2(component, str, str, queue), where the queue comes from multiprocessing.Manager().Queue().
The second and third arguments of the Component constructor are modules imported with importlib.
What I tried
I am new to python, but not to programming. Here is what I tried:
Because the error itself did not point out what exactly the problem was, I tried removing args one by one to find out which of them can't be pickled. If I remove component, everything works fine, so this appears to be the cause of the trouble. However, I need this object passed to my processes.
I searched around the internet for hours, but found nothing beyond basic tutorials about multiprocessing and explanations of how pickle works. I did find one source saying it should work, except perhaps on Windows. However, it does not work on Unix (which I use).
My ideas
As I understood it, nothing suggests I cannot send a class instance containing two importlib modules. I do not know what exactly the problem is with the Component class, but the importlib modules stored as members are the only unusual things about it. This is why I believe the problem occurs here.
Question
Do you know why a class containing modules is unsuitable for 'pickling'? How can one get a better idea why and where Can't pickle <class 'module'> errors occur?
More code
Full source code for this can be found on https://github.com/Sebastiaan-Alvarez-Rodriguez/paellego
Questions to me
Please leave comments requesting clarifications/more code snippets/??? if you would like me to edit this question
A last request
I would like solutions to use python standard library only, python 3.3 preferably. Also, a requirement of my code is that it runs on Unix systems.
Thanks in advance
Edit
As requested, here is a minimal example which greatly simplifies the problem:
main.py (you could execute as python main.py foo)
#!/usr/bin/env python
import sys
import importlib
import multiprocessing
class clazz(object):
    def __init__(self, moduly):
        self.moduly = moduly
    def foopass(self, stringy):
        self.moduly.foo(stringy)
    def barpass(self, stringy, numbery):
        self.moduly.bar(stringy)
        print('Second argument: '+str(numbery))
def worker(clazzy, numbery):
    clazzy.barpass('wow', numbery)
def main():
    clazzy = clazz(importlib.import_module(sys.argv[1]))
    clazzy.foopass('init')
    args = [(clazzy, 2,)]
    with multiprocessing.Pool(processes=2) as pool:
        pool.starmap(worker, args)
if __name__ == "__main__":
    main()
foo.py (needs to be in same directory for above call suggestion):
#!/usr/bin/env python
globaly = 0
def foo(stringy):
    print('foo '+stringy)
    global globaly
    globaly = 5
def bar(stringy):
    print('bar '+stringy)
    print(str(globaly))
Running this gives the error: TypeError: can't pickle module objects
Now we know that pickling module objects is (sadly) not possible.
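To get a better idea of where such an error comes from, one option is to try pickling each attribute of the object separately. The following is only a minimal sketch: find_unpicklable and Holder are hypothetical names made up for illustration, and the built-in sys module stands in for the user-supplied module.

```python
import pickle
import sys


def find_unpicklable(obj):
    """Return {attribute name: error text} for attributes that fail to pickle."""
    failures = {}
    for name, value in vars(obj).items():
        try:
            pickle.dumps(value)
        except Exception as exc:  # PicklingError or TypeError, depending on version
            failures[name] = "{}: {}".format(type(exc).__name__, exc)
    return failures


class Holder(object):
    """Hypothetical stand-in for clazz / Component."""
    def __init__(self, moduly, label):
        self.moduly = moduly  # a module object, which pickle rejects
        self.label = label    # a plain string, which pickles fine


report = find_unpicklable(Holder(sys, "ok"))
print(report)  # only the 'moduly' attribute is reported
```

The exact wording of the error differs between Python versions, but the offending attribute name is reported either way.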
To get rid of the error, do not store the module itself as an attribute of clazz, however convenient that is. Store "modpath" instead: the string that importlib needs to import the module specified by the user.
It looks like this (foo.py remains exactly the same as above):
#!/usr/bin/env python
import sys
import importlib
import multiprocessing
class clazz(object):
    def __init__(self, modpathy):
        self.modpathy = modpathy
    def foopass(self, stringy):
        moduly = importlib.import_module(self.modpathy)
        moduly.foo(stringy)
    def barpass(self, stringy, numbery):
        moduly = importlib.import_module(self.modpathy)
        moduly.bar(stringy)
        print('Second argument: '+str(numbery))
def worker(clazzy, number):
    clazzy.barpass('wow', number)
def main():
    clazzy = clazz(sys.argv[1])
    clazzy.foopass('init')
    args = [(clazzy, 2,)]
    with multiprocessing.Pool(processes=2) as pool:
        pool.starmap(worker, args)
if __name__ == "__main__":
    main()
If you require that your globals, such as globaly, are guaranteed to maintain state, then you need to pass a mutable object (e.g. a list or dictionary) to hold this data. Thanks to @DavisHerring:
Module attributes are called “global variables” in Python, but they are no more persistent or accessible than any other data. Why not just use dictionaries?
The example code would look like this:
#!/usr/bin/env python
import sys
import importlib
import multiprocessing
class clazz(object):
    def __init__(self, modpathy):
        self.modpathy = modpathy
        self.dictionary = {}
    def foopass(self, stringy):
        moduly = importlib.import_module(self.modpathy)
        moduly.foo(stringy, self.dictionary)
    def barpass(self, stringy, numbery):
        moduly = importlib.import_module(self.modpathy)
        moduly.bar(stringy, self.dictionary)
        print('Second argument: '+str(numbery))
def worker(clazzy, number):
    clazzy.barpass('wow', number)
def main():
    clazzy = clazz(sys.argv[1])
    clazzy.foopass('init')
    args = [(clazzy, 2,)]
    with multiprocessing.Pool(processes=2) as pool:
        pool.starmap(worker, args)
if __name__ == "__main__":
    main()
foo.py (no more globals):
#!/usr/bin/env python
def foo(stringy, dictionary):
    print('foo '+stringy)
    globaly = 5
    dictionary['globaly'] = globaly
def bar(stringy, dictionary):
    print('bar '+stringy)
    globaly = dictionary['globaly']
    print(str(globaly))
This way you can work around the problem without annoying can't pickle ... errors while still maintaining state.
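If keeping the module as an attribute is still preferred, another possible approach (a sketch, not something tested against the original project) is to tell pickle to skip the module and re-import it by name on the receiving side, using __getstate__ and __setstate__. ModuleHolder is a hypothetical name, and the standard json module is used here only so that the example runs without foo.py.

```python
import importlib
import pickle


class ModuleHolder(object):
    """Hypothetical variant of clazz that keeps the module as an attribute."""

    def __init__(self, modname):
        self.modname = modname
        self.moduly = importlib.import_module(modname)

    def __getstate__(self):
        # Copy the instance dict and drop the module, which cannot be pickled.
        state = self.__dict__.copy()
        del state["moduly"]
        return state

    def __setstate__(self, state):
        # Restore the plain attributes, then re-import the module by name.
        self.__dict__.update(state)
        self.moduly = importlib.import_module(self.modname)


original = ModuleHolder("json")
clone = pickle.loads(pickle.dumps(original))
print(clone.moduly.dumps({"a": 1}))
```

Note that the module is imported again on the receiving side, so module-level state set in the parent is not guaranteed to carry over; data that must travel with the object should still live in its own attributes.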
I'm trying to declare a new tool in the CherryPy toolbox, following the examples from the CherryPy Tools docs.
According to the examples I have written:
import cherrypy

def myTool():
    print ("myTool")

class Root(object):
    @cherrypy.expose
    @cherrypy.tools.mytool()
    def index(self):
        return "Hello World!"

if __name__ == '__main__':
    cherrypy.tools.mytool = cherrypy.Tool('before_finalize', myTool)
    cherrypy.quickstart(Root(), '/')
This results in the following error:
Traceback (most recent call last):
  File "server.py", line 6, in <module>
    class Root(object):
  File "server.py", line 8, in Root
    @cherrypy.tools.mytool()
AttributeError: 'Toolbox' object has no attribute 'mytool'
However if I change the notation to the following it works as expected.
import cherrypy

def myTool():
    print ("myTool")

class Root(object):
    @cherrypy.expose
    def index(self):
        return "Hello World!"
    index._cp_config = {'tools.mytool.on': True}

if __name__ == '__main__':
    cherrypy.tools.mytool = cherrypy.Tool('before_finalize', myTool)
    cherrypy.quickstart(Root(), '/')
The docs say that both methods have the same effect, but that is not the case for me. If anyone knows what I'm doing wrong, I'll be very grateful.
The tool should not be enabled globally, hence the @cherrypy.tools.mytool() notation.
I'm using python 3.6.
The problem is a misunderstanding of Python's evaluation order (top-down): at the time the class is defined, the tool has not yet been defined.
You can define the tool in another file and import it at the top (before the class definition), and it should work.
The second form works because the configuration is done indirectly, using strings in the config rather than the real tool objects.
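The ordering issue can be reproduced without CherryPy. In the sketch below, Toolbox, tools and mytool are hypothetical stand-ins (not CherryPy's own classes); the point is only that a decorator expression is evaluated while the class body runs, so the attribute must already exist at that moment.

```python
class Toolbox(object):
    """Hypothetical stand-in for a tool namespace; not CherryPy's class."""


tools = Toolbox()


def mytool():
    """Hypothetical tool: calling it returns a decorator, like @tools.mytool()."""
    def decorator(func):
        func.tool_enabled = True
        return func
    return decorator


# Using the tool before it is registered fails while the class body is executed.
try:
    class TooEarly(object):
        @tools.mytool()
        def index(self):
            return "Hello World!"
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)

# Registering the tool first makes the very same decorator line work.
tools.mytool = mytool


class Root(object):
    @tools.mytool()
    def index(self):
        return "Hello World!"


print(Root.index.tool_enabled)
```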
I am following the flaskr tutorial. When I run python flaskr.py, I get this error:
Traceback (most recent call last):
  File "flaskr.py", line 26, in <module>
    @app.before_request()
  File "/Users/myname/anaconda/lib/python2.7/site-packages/flask/app.py", line 62, in wrapper_func
    return f(self, *args, **kwargs)
TypeError: before_request() takes exactly 2 arguments (1 given)
However, on step 4 it specifically says that before_request() takes no arguments. I have carefully followed all the instructions. Why am I getting this error?
import sqlite3
from flask import Flask, g
DATABASE = '/tmp/flaskr.db'
app = Flask(__name__)
app.config.from_object(__name__)
def connect_db():
    return sqlite3.connect(app.config['DATABASE'])
@app.before_request()
def before_request():
    g.db = connect_db()
@app.teardown_request()
def teardown_request(exception):
    db = getattr(g, 'db', None)
    if db is not None:
        db.close()
if __name__ == '__main__':
    app.run()
before_request is a decorator. Rather than calling it, you apply it directly to the decorated function.
@app.before_request
def my_before_request_function():
    pass
teardown_request behaves the same way, so you need to change that too or you'll get the same error.
If you go back to the tutorial and look carefully at the code, you will notice that they do not call the decorators directly.
Decorators are called with the decorated function as the first (and only) argument. Another pattern of decorators is the "decorator factory", where a function does take arguments, producing the actual decorator (which just takes the implicit decorated function argument). Since before_request is not a factory, the docs just say it takes no arguments.
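The difference between the two patterns can be illustrated in plain Python. This is only a sketch: before, before_with_label and registered are made-up names for illustration and have nothing to do with Flask's internals.

```python
registered = []


def before(func):
    """Plain decorator: receives the decorated function directly."""
    registered.append(func.__name__)
    return func


def before_with_label(label):
    """Decorator factory: takes arguments and returns the real decorator."""
    def decorator(func):
        registered.append("{}:{}".format(label, func.__name__))
        return func
    return decorator


@before                      # no parentheses: `before` gets the function
def open_db():
    pass


@before_with_label("setup")  # parentheses: the call returns the decorator
def load_config():
    pass


# Calling the plain decorator with no arguments raises a TypeError,
# which is the same kind of mistake as writing @app.before_request().
try:
    before()
except TypeError as exc:
    error_message = str(exc)

print(registered)
```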
Trying to mock out calls to pyazure library for django testing, but I can't figure out how to mock out the PyAzure class constructor so that it doesn't cause a TypeError. Is there a better way to approach mocking out an access library that generates a connection object?
Anything I've tried other than None generates a TypeError, which means I can't really even begin to test any of the PyAzure connection methods with actual return values. What is the best way to replace a working class with a fake class using mock?
Test Error:
======================================================================
ERROR: test_management_certificate_connect (azure_cloud.tests.ViewsTest)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/bschott/Source/django-nimbis/apps/azure_cloud/tests.py", line 107, in test_management_certificate_connect
self.cert1.connect()
File "/Users/bschott/Source/django-nimbis/apps/azure_cloud/models.py", line 242, in connect
subscription_id=self.subscription.subscription_id)
TypeError: __init__() should return None, not 'FakeAzure'
----------------------------------------------------------------------
tests.py:
class ViewsTest(TestCase):
    def setUp(self):
        ...
        self.cert1 = ManagementCertificate.objects.create(
            name="cert1",
            subscription=self.subscription1,
            management_cert=File(open(__file__), "cert1.pem"),
            owner=self.user1)
        ...
    class FakeAzure(object):
        """ testing class for azure """
        def list_services(self):
            return ['service1', 'service2', 'service3']
        def list_storages(self):
            return ['storage1', 'storage2', 'storage3']
    @mock.patch.object(pyazure.PyAzure, '__init__')
    def test_management_certificate_connect(self, mock_pyazure_init):
        mock_pyazure_init.return_value = self.FakeAzure()
        self.cert1.connect()
        assert mock_pyazure_init.called
models.py
class ManagementCertificate(models.Model):
    # support connection caching to azure
    _cached_connection = None
    def connect(self):
        """
        Connect to the management interface using these credentials.
        """
        if not self._cached_connection:
            self._cached_connection = pyazure.PyAzure(
                management_cert_path=self.management_cert.path,
                subscription_id=self.subscription.subscription_id)
        logging.debug(self._cached_connection)
        return self._cached_connection
You seem to have a misconception about what __init__() does. Its purpose is to initialise an instance that was already created earlier. The first argument to __init__() is self, which is the instance, so you can see it was already allocated when __init__() is called.
There is a method __new__() that is called before __init__() to create the actual instance. I think it would be much easier, though, to replace the whole class by a mock class, instead of mocking single methods.
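Replacing the whole class could look roughly like the sketch below. It uses unittest.mock from the standard library (the separate mock package offers the same patch.object call). fake_module, RealClient and connect are hypothetical stand-ins for the pyazure module, pyazure.PyAzure and ManagementCertificate.connect(), so that the example runs without Django or pyazure installed.

```python
import types
from unittest import mock

# Hypothetical stand-in for the imported pyazure module.
fake_module = types.SimpleNamespace()


class RealClient(object):
    """Stand-in for pyazure.PyAzure; pretend __init__ opens a connection."""
    def __init__(self, management_cert_path, subscription_id):
        raise RuntimeError("would connect to the network")


fake_module.PyAzure = RealClient


class FakeAzure(object):
    """Test double that accepts the same keyword arguments."""
    def __init__(self, management_cert_path=None, subscription_id=None):
        self.subscription_id = subscription_id

    def list_services(self):
        return ["service1", "service2", "service3"]


def connect():
    """Mirrors connect() in models.py: the class is looked up at call time."""
    return fake_module.PyAzure(management_cert_path="cert1.pem",
                               subscription_id="sub-1")


# Swap the class itself, not its __init__, for the duration of the block.
with mock.patch.object(fake_module, "PyAzure", FakeAzure):
    connection = connect()
    services = connection.list_services()

print(services)
```

Because the attribute is patched on the module object that the code under test reads from, the original class is restored automatically when the with block exits.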
$ py twitterDump2.py
Traceback (most recent call last):
File "twitterDump2.py", line 30, in <module>
stream=tweepy.Stream(username,password,listener)
TypeError: __init__() takes exactly 3 arguments (4 given)
My code:
username="abc"
password="abc"
listener = StreamWatcherListener()
stream=tweepy.Stream(username,password,listener)
The first argument to __init__ is usually self so it is expecting you to pass only two arguments.
Surprisingly, the code in tweepy/streaming.py suggests:
class Stream(object):
    host = 'stream.twitter.com'
    def __init__(self, auth, listener, **options):
        self.auth = auth
        self.listener = listener
The auth is created this way:
auth = tweepy.BasicAuthHandler(username, password)
Your code should be something like this
username="abc"
password="abc"
listener = StreamWatcherListener()
auth = tweepy.BasicAuthHandler(username, password)
stream=tweepy.Stream(auth,listener)
See the code at : http://github.com/joshthecoder/tweepy/blob/master/tweepy/streaming.py
pyfunc has given the reasons why this is not working.
To see which arguments it takes, type:
help(tweepy.Stream)
This will give you what arguments the Stream class requires.
This is for your reference:
def __init__(self, auth, listener, **options)
options collects any additional keyword arguments into a dictionary, via the ** operator.
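How such a signature behaves can be checked with a small stand-in class. This is a sketch only: the class below copies just the quoted signature, not tweepy's real implementation, and the option names timeout and secure are hypothetical, chosen purely for illustration.

```python
class Stream(object):
    """Simplified stand-in with the same signature as the quoted __init__."""
    def __init__(self, auth, listener, **options):
        self.auth = auth
        self.listener = listener
        self.options = options


# Extra settings must be passed by keyword; they are collected into a dict.
stream = Stream("auth", "listener", timeout=30, secure=True)
print(stream.options)

# A third positional argument has nowhere to go and raises TypeError,
# which is what happened with Stream(username, password, listener).
try:
    Stream("username", "password", "listener")
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```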