Can I tell python multiprocessing.Process not to serialize something? - python

I essentially have a class like this:
class ConnectionClass:
def __init__(self, addr:str) -> None:
self.addr = addr
def connection(self):
if self._connection is None:
self._connection = magic_create_socket(self.addr)
return self._connection
def do_something(self):
self._connection.send_message("something")
I will be passing it via something like:
def do_processing(connection):
# this will run many times:
connection.do_something()
# There will be more processes or maybe a process pool. I want to avoid repeating myself
my_connection = ConnectionClass("some address")
my_proc = multiprocessing.Process(do_processing, my_connection)
Now clearly, each process should have its own connection sockets, file descriptors and so on. So while I want to pass any props that describe the connection, like addr in this simplified example, but I want the ConnectionClass._connection be None when it is copied to the other process, so it gets lazy initialized again.
I COULD make the connection description and the actual wrapper for the socket/fd separate, but it means extra classes, extra code to pass the description from one to another and so on.
Is it possible to use some annotation to tell Pythons multiprocessing library to ignore certain values when serializing the data for the other process?

Related

Python multiprocessing.Pool.apply_async() not executing class function

In a custom class I have the following code:
class CustomClass():
triggerQueue: multiprocessing.Queue
def __init__(self):
self.triggerQueue = multiprocessing.Queue()
def poolFunc(queueString):
print(queueString)
def listenerFunc(self):
pool = multiprocessing.Pool(5)
while True:
try:
queueString = self.triggerQueue.get_nowait()
pool.apply_async(func=self.poolFunc, args=(queueString,))
except queue.Empty:
break
What I intend to do is:
add a trigger to the queue (not implemented in this snippet) -> works as intended
run an endless loop within the listenerFunc that reads all triggers from the queue (if any are found) -> works as intended
pass trigger to poolFunc which is to be executed asynchronosly -> not working
It works as soon as I source my poolFun() outside of the class like
def poolFunc(queueString):
print(queueString)
class CustomClass():
[...]
But why is that so? Do I have to pass the self argument somehow? Is it impossible to perform it this way in general?
Thank you for any hint!
There are several problems going on here.
Your instance method, poolFunc, is missing a self parameter.
You are never properly terminating the Pool. You should take advantage of the fact that a multiprocessing.Pool object is a context manager.
You're calling apply_async, but you're never waiting for the results. Read the documentation: you need to call the get method on the AsyncResult object to receive the result; if you don't do this before your program exits your poolFunc function may never run.
By making the Queue object part of your class, you won't be able to pass instance methods to workers.
We can fix all of the above like this:
import multiprocessing
import queue
triggerQueue = multiprocessing.Queue()
class CustomClass:
def poolFunc(self, queueString):
print(queueString)
def listenerFunc(self):
results = []
with multiprocessing.Pool(5) as pool:
while True:
try:
queueString = triggerQueue.get_nowait()
results.append(pool.apply_async(self.poolFunc, (queueString,)))
except queue.Empty:
break
for res in results:
print(res.get())
c = CustomClass()
for i in range(10):
triggerQueue.put(f"testval{i}")
c.listenerFunc()
You can, as you mention, also replace your instance method with a static method, in which case we can keep triggerQueue as part of the class:
import multiprocessing
import queue
class CustomClass:
def __init__(self):
self.triggerQueue = multiprocessing.Queue()
#staticmethod
def poolFunc(queueString):
print(queueString)
def listenerFunc(self):
results = []
with multiprocessing.Pool(5) as pool:
while True:
try:
queueString = self.triggerQueue.get_nowait()
results.append(pool.apply_async(self.poolFunc, (queueString,)))
except queue.Empty:
break
for r in results:
print(r.get())
c = CustomClass()
for i in range(10):
c.triggerQueue.put(f"testval{i}")
c.listenerFunc()
But we still need to reap the pool_async results.
Okay, I found an answer and a workaround:
the answer is based the anser of noxdafox to this question.
Instance methods cannot be serialized that easily. What the Pickle protocol does when serialising a function is simply turning it into a string.
For a child process would be quite hard to find the right object your instance method is referring to due to separate process address spaces.
A functioning workaround is to declare the poolFunc() as static function like
#staticmethod
def poolFunc(queueString):
print(queueString)

Non-lazy instance creation with Pyro4 and instance_mode='single'

My aim is to provide to a web framework access to a Pyro daemon that has time-consuming tasks at the first loading. So far, I have managed to keep in memory (outside of the web app) a single instance of a class that takes care of the time-consuming loading at its initialization. I can also query it with my web app. The code for the daemon is:
Pyro4.expose
#Pyro4.behavior(instance_mode='single')
class Store(object):
def __init__(self):
self._store = ... # the expensive loading
def query_store(self, query):
return ... # Useful query tool to expose to the web framework.
# Not time consuming, provided self._store is
# loaded.
with Pyro4.Daemon() as daemon:
uri = daemon.register(Thing)
with Pyro4.locateNS() as ns:
ns.register('thing', uri)
daemon.requestLoop()
The issue I am having is that although a single instance is created, it is only created at the first proxy query from the web app. This is normal behavior according to the doc, but not what I want, as the first query is still slow because of the initialization of Thing.
How can I make sure the instance is already created as soon as the daemon is started?
I was thinking of creating a proxy instance of Thing in the code of the daemon, but this is tricky because the event loop must be running.
EDIT
It turns out that daemon.register() can accept either a class or an object, which could be a solution. This is however not recommended in the doc (link above) and that feature apparently only exists for backwards compatibility.
Do whatever initialization you need outside of your Pyro code. Cache it somewhere. Use the instance_creator parameter of the #behavior decorator for maximum control over how and when an instance is created. You can even consider pre-creating server instances yourself and retrieving one from a pool if you so desire? Anyway, one possible way to do this is like so:
import Pyro4
def slow_initialization():
print("initializing stuff...")
import time
time.sleep(4)
print("stuff is initialized!")
return {"initialized stuff": 42}
cached_initialized_stuff = slow_initialization()
def instance_creator(cls):
print("(Pyro is asking for a server instance! Creating one!)")
return cls(cached_initialized_stuff)
#Pyro4.behavior(instance_mode="percall", instance_creator=instance_creator)
class Server:
def __init__(self, init_stuff):
self.init_stuff = init_stuff
#Pyro4.expose
def work(self):
print("server: init stuff is:", self.init_stuff)
return self.init_stuff
Pyro4.Daemon.serveSimple({
Server: "test.server"
})
But this complexity is not needed for your scenario, just initialize the thing (that takes a long time) and cache it somewhere. Instead of re-initializing it every time a new server object is created, just refer to the cached pre-initialized result. Something like this;
import Pyro4
def slow_initialization():
print("initializing stuff...")
import time
time.sleep(4)
print("stuff is initialized!")
return {"initialized stuff": 42}
cached_initialized_stuff = slow_initialization()
#Pyro4.behavior(instance_mode="percall")
class Server:
def __init__(self):
self.init_stuff = cached_initialized_stuff
#Pyro4.expose
def work(self):
print("server: init stuff is:", self.init_stuff)
return self.init_stuff
Pyro4.Daemon.serveSimple({
Server: "test.server"
})

Keeping context-manager object alive through function calls

I am running into a bit of an issue with keeping a context manager open through function calls. Here is what I mean:
There is a context-manager defined in a module which I use to open SSH connections to network devices. The "setup" code handles opening the SSH sessions and handling any issues, and the teardown code deals with gracefully closing the SSH session. I normally use it as follows:
from manager import manager
def do_stuff(device):
with manager(device) as conn:
output = conn.send_command("show ip route")
#process output...
return processed_output
In order to keep the SSH session open and not have to re-establish it across function calls, I would like to do add an argument to "do_stuff" which can optionally return the SSH session along with the data returned from the SSH session, as follows:
def do_stuff(device, return_handle=False):
with manager(device) as conn:
output = conn.send_command("show ip route")
#process output...
if return_handle:
return (processed_output, conn)
else:
return processed_output
I would like to be able to call this function "do_stuff" from another function, as follows, such that it signals to "do_stuff" that the SSH handle should be returned along with the output.
def do_more_stuff(device):
data, conn = do_stuff(device, return_handle=True)
output = conn.send_command("show users")
#process output...
return processed_output
However the issue that I am running into is that the SSH session is closed, due to the do_stuff function "returning" and triggering the teardown code in the context-manager (which gracefully closes the SSH session).
I have tried converting "do_stuff" into a generator, such that its state is suspended and perhaps causing the context-manager to stay open:
def do_stuff(device, return_handle=False):
with manager(device) as conn:
output = conn.send_command("show ip route")
#process output...
if return_handle:
yield (processed_output, conn)
else:
yield processed_output
And calling it as such:
def do_more_stuff(device):
gen = do_stuff(device, return_handle=True)
data, conn = next(gen)
output = conn.send_command("show users")
#process output...
return processed_output
However this approach does not seem to be working in my case, as the context-manager gets closed, and I get back a closed socket.
Is there a better way to approach this problem? Maybe my generator needs some more work...I think using a generator to hold state is the most "obvious" way that comes to mind, but overall should I be looking into another way of keeping the session open across function calls?
Thanks
I found this question because I was looking for a solution to an analogous problem where the object I wanted to keep alive was a pyvirtualdisplay.display.Display instance with selenium.webdriver.Firefox instances in it.
I also wanted any opened resources to die if an exception were raised during the display/browser instance creations.
I imagine the same could be applied to your database connection.
I recognize this probably only a partial solution and contains less-than-best practices. Help is appreciated.
This answer is the result of an ad lib spike using the following resources to patch together my solution:
https://docs.python.org/3/library/contextlib.html#contextlib.ContextDecorator
http://www.wefearchange.org/2013/05/resource-management-in-python-33-or.html
(I do not yet fully grok what is described here though I appreciate the potential. The second link above eventually proved to be the most helpful by providing analogous situations.)
from pyvirtualdisplay.display import Display
from selenium.webdriver import Firefox
from contextlib import contextmanager, ExitStack
RFBPORT = 5904
def acquire_desktop_display(rfbport=RFBPORT):
display_kwargs = {'backend': 'xvnc', 'rfbport': rfbport}
display = Display(**display_kwargs)
return display
def release_desktop_display(self):
print("Stopping the display.")
# browsers apparently die with the display so no need to call quits on them
self.display.stop()
def check_desktop_display_ok(desktop_display):
print("Some checking going on here.")
return True
class XvncDesktopManager:
max_browser_count = 1
def __init__(self, check_desktop_display_ok=None, **kwargs):
self.rfbport = kwargs.get('rfbport', RFBPORT)
self.acquire_desktop_display = acquire_desktop_display
self.release_desktop_display = release_desktop_display
self.check_desktop_display_ok = check_desktop_display_ok \
if check_desktop_display_ok is None else check_desktop_display_ok
#contextmanager
def _cleanup_on_error(self):
with ExitStack() as stack:
"""push adds a context manager’s __exit__() method
to stack's callback stack."""
stack.push(self)
yield
# The validation check passed and didn't raise an exception
# Accordingly, we want to keep the resource, and pass it
# back to our caller
stack.pop_all()
def __enter__(self):
url = 'http://stackoverflow.com/questions/30905121/'\
'keeping-context-manager-object-alive-through-function-calls'
self.display = self.acquire_desktop_display(self.rfbport)
with ExitStack() as stack:
# add XvncDesktopManager instance's exit method to callback stack
stack.push(self)
self.display.start()
self.browser_resources = [
Firefox() for x in range(self.max_browser_count)
]
for browser_resource in self.browser_resources:
for url in (url, ):
browser_resource.get(url)
"""This is the last bit of magic.
ExitStacks have a .close() method which unwinds
all the registered context managers and callbacks
and invokes their exit functionality."""
# capture the function that calls all the exits
# will be called later outside the context in which it was captured
self.close_all = stack.pop_all().close
# if something fails in this context in enter, cleanup
with self._cleanup_on_error() as stack:
if not self.check_desktop_display_ok(self):
msg = "Failed validation for {!r}"
raise RuntimeError(msg.format(self.display))
# self is assigned to variable after "as",
# manually call close_all to unwind callback stack
return self
def __exit__(self, *exc_details):
# had to comment this out, unable to add this to callback stack
# self.release_desktop_display(self)
pass
I had a semi-expected result with the following:
kwargs = {
'rfbport': 5904,
}
_desktop_manager = XvncDesktopManager(check_desktop_display_ok=check_desktop_display_ok, **kwargs)
with ExitStack() as stack:
# context entered and what is inside the __enter__ method is executed
# desktop_manager will have an attribute "close_all" that can be called explicitly to unwind the callback stack
desktop_manager = stack.enter_context(_desktop_manager)
# I was able to manipulate the browsers inside of the display
# and outside of the context
# before calling desktop_manager.close_all()
browser, = desktop_manager.browser_resources
browser.get(url)
# close everything down when finished with resource
desktop_manager.close_all() # does nothing, not in callback stack
# this functioned as expected
desktop_manager.release_desktop_display(desktop_manager)

Logging SMTP connections with Twisted

Python newbie here. I'm writing an SMTP server using Twisted and twisted.mail.smtp. I'd like to log incoming connections and possibly dump them when there are too many concurrent connections. Basically, I want ConsoleMessageDelivery.connectionMade() method to be called in the following, when a new connection is made:
class ConsoleMessageDelivery:
implements(smtp.IMessageDelivery)
def connectionMade(self):
# This never gets called
def receivedHeader(self, helo, origin, recipients):
myHostname, clientIP = helo
headerValue = "by %s from %s with ESMTP ; %s" % (myHostname, clientIP, smtp.rfc822date())
# email.Header.Header used for automatic wrapping of long lines
return "Received: %s" % Header(headerValue)
def validateFrom(self, helo, origin):
# All addresses are accepted
return origin
def validateTo(self, user):
if user.dest.local == "console":
return lambda: ConsoleMessage()
raise smtp.SMTPBadRcpt(user)
class ConsoleMessage:
implements(smtp.IMessage)
def __init__(self):
self.lines = []
def lineReceived(self, line):
self.lines.append(line)
def eomReceived(self):
return defer.succeed(None)
def connectionLost(self):
# There was an error, throw away the stored lines
self.lines = None
class ConsoleSMTPFactory(smtp.SMTPFactory):
protocol = smtp.ESMTP
def __init__(self, *a, **kw):
smtp.SMTPFactory.__init__(self, *a, **kw)
self.delivery = ConsoleMessageDelivery()
def buildProtocol(self, addr):
p = smtp.SMTPFactory.buildProtocol(self, addr)
p.delivery = self.delivery
return p
connectionMade is part of twisted.internet.interfaces.IProtocol, not part of twisted.mail.smtp.IMessageDelivery. There's no code anywhere in the mail server implementation that cares about a connectionMade method on a message delivery implementation.
A better place to put per connection logic is in the factory. And specifically, a good way to approach this is with a factory wrapper, to isolate the logic about connection limits and logging from the logic about servicing SMTP connections.
Twisted comes with a few factory wrappers. A couple in particular that might be interesting to you are twisted.protocols.policies.LimitConnectionsByPeer and twisted.protocols.policies.LimitTotalConnectionsFactory.
Unfortunately, I don't know of any documentation explaining twisted.protocols.policies. Fortunately, it's not too complicated. Most of the factories in the module wrap another arbitrary factory to add some piece of behavior. So, for example, to use LimitConnectionsByPeer, you do something like this:
from twisted.protocols.policies import LimitConnectionsByPeer
...
factory = ConsoleSMTPFactory()
wrapper = LimitConnectionsByPeer(ConsoleSMTPFactory(...))
reactor.listenTCP(465, wrapper)
This is all that's needed to get LimitConnectionsByPeer to do its job.
There's only a little bit more complexity involved in writing your own wrapper. First, subclass WrappingFactory. Then implement whichever methods you're interested in customizing. In your case, if you want to reject connections from a certain IP, that would mean overriding buildProtocol. Then, unless you also want to customize the protocol that is constructed (which you don't in this case), call the base implementation and return its result. For example:
from twisted.protocols.policies import WrappingFactory
class DenyFactory(WrappingFactory):
def buildProtocol(self, clientAddress):
if clientAddress.host == '1.3.3.7':
# Reject it
return None
# Accept everything else
return WrappingFactory.buildProtocol(self, clientAddress)
These wrappers stack, so you can combine them as well:
from twisted.protocols.policies import LimitConnectionsByPeer
...
factory = ConsoleSMTPFactory()
wrapper = LimitConnectionsByPeer(DenyFactory(ConsoleSMTPFactory(...)))
reactor.listenTCP(465, wrapper)

Python use a virtual class to apply a generic "pipe" pattern

I am trying to find out if it would be possible to take the following code, and use the magic of python to simplify code.
Right now I have a command interface that sits on top of a bunch of python sub processes. When I need to communicate with the sub process's I pipe commands to them. Basically it comes down to a string command, and a dictionary of arguments.
Here is the pattern that gets repeated (I showed 1 for simplicitys sake but in reality this is repeated 7 times for different processes)
Create the processes:
class MasterProcess(object):
def __init__(self):
self.stop = multiprocessing.Event()
(self.event_generator_remote, self.event_generator_in)
= multiprocessing.Pipe(duplex=True)
self.event_generator= Process(target=self.create_event_generator,
kwargs={'in': self.event_generator_remote}
)
self.event_generator.start()
def create_event_generator(self, **kwargs):
eg= EventGenerator()
in_pipe = kwargs['in']
while(not self.stop.is_set()):
self.stop.wait(1)
if(in_pipe.poll()):
msg = in_pipe.recv()
cmd = msg[0]
args = msg[1]
if cmd =='create_something':
in_pipe.send(eg.create(args))
else:
raise NotImplementedException(cmd)
And then on the command interface is just pumping commands to the process:
mp.MasterProcess()
pipe = mp.event_generator_remote
>>cmd: create_something args
#i process the above and then do something like the below
cmd = "create_something"
args = {
#bah
}
pipe.send([command, args])
attempt = 0
while(not pipe.poll()):
time.sleep(1)
attempt +=1
if attempt > 20:
return None
return pipe.recv()
What I want to move to is more of a remote facade type deal where the client just calls a method like it would normally, and I translate that call to the above.
For example the new command would look like:
mp.MasterProcess()
mp_proxy = MasterProcessProxy(mp.event_generator_remote)
mp_proxy.create_something(args)
So my virtual class would be MasterProcessProxy, there are really no methods behind the scenes somehow take the method name, and provided args and pipe them to the process?
Does that make sense? Would it be possible to do the same on the other side? Just assume whatever comes down the pipe will be in the form cmd , args where cmd is a local method? and just do a self.() ?
As I am typing this up I understand it is probably confusing, so please let me know what needs clarification.
Thanks.
You can use __getattr__ to create proxy methods for your stub class:
class MasterProcessProxy(object):
def __init__(self, pipe):
self.pipe = pipe
# This is called when an attribute is requested on the object.
def __getattr__(self, name):
# Create a dynamic function that sends a command through the pipe
# Keyword arguments are sent as command arguments.
def proxy(**kwargs):
self.pipe.send([name, kwargs])
return proxy
Now you can use it as you wanted:
mp.MasterProcess()
mp_proxy = MasterProcessProxy(mp.event_generator_remote)
mp_proxy.create_something(spam="eggs", bacon="baked beans")
# Will call pipe.send(["create_something", {"spam":"eggs", "bacon":"baked beans"}])
You might want to check out the twisted framework. This won't beat figuring out how to do it yourself, but will make writing this style of application a lot easier.

Categories