Persistent List Python Implementation

Persistent List Python Implementation - python

I have class that sends logs to an exposed API. Now what I want to happen is to save/persists failed logs in a list so that it can be resent to the server again.
This is what I have so far.
You may notice that I have declared a class variable, and I have read that this is not really advisable.
My concern is, is there a better way to persist lists or queues?
from collections import *
import time
import threading
import requests
import json
URL_POST = "http:/some/url/here"
class LogManager:
listQueue = deque()
def post(self, log):
headers = {"Content-type": "application/json",
"Accept": "text/plain", "User-Agent": "Test user agent"}
resp = requests.post(URL_POST, data=json.dumps(log), headers=headers)
return resp.status_code
def send_log(self, log):
try:
print ("Sending log to backend")
self.post(log)
except: # sending to server fails for some reason
print ("Sending logs to server fail, appending to Queue")
LogManager.listQueue.append(log)
def resend_log(self, log):
print ("checking if deque has values")
if LogManager.listQueue:
logToPost = LogManager.listQueue.popleft()
try:
self.post(logToPost)
print ("resending failed logs")
except: #for some reason it fails again
LogManager.listQueue.appendleft(logToPost)
print ("appending log back to deque")
def run(self,log):
t1 = threading.Thread(target=self.send_log, args=(log,))
t2 = threading.Thread(target=self.resend_log,args=(log,))
t1.start()
time.sleep(2)
t2.start()
t1.join()
t2.join()
if __name__ == "__main__":
while True:
logs = LogManager()
logs.run({"some log": "test logs"})

Since the only variable LogManager stores is a persistent one, there doesn't seem to be any need to re-instantiate the class each time. I would probably move the logs = LogManager() line outside of the while-loop and change list_queue to an instance variable, self.list_queue that is created in the class' __init__ method. This way you'll have one LogManager with one queue that just calls its run method each loop.
That said, if you're keeping it inside the loop because your class will have further functionality that requires a new instance each loop, then using a class variable to track a list between instances is exactly what class variables are for. They're not inadvisable at all; the example given in the documentation you link to is bad because they're using a class variable for data that should be instance-specific.

Related

Python multiprocessing.Pool.apply_async() not executing class function

In a custom class I have the following code:
class CustomClass():
triggerQueue: multiprocessing.Queue
def __init__(self):
self.triggerQueue = multiprocessing.Queue()
def poolFunc(queueString):
print(queueString)
def listenerFunc(self):
pool = multiprocessing.Pool(5)
while True:
try:
queueString = self.triggerQueue.get_nowait()
pool.apply_async(func=self.poolFunc, args=(queueString,))
except queue.Empty:
break
What I intend to do is:
add a trigger to the queue (not implemented in this snippet) -> works as intended
run an endless loop within the listenerFunc that reads all triggers from the queue (if any are found) -> works as intended
pass trigger to poolFunc which is to be executed asynchronosly -> not working
It works as soon as I source my poolFun() outside of the class like
def poolFunc(queueString):
print(queueString)
class CustomClass():
[...]
But why is that so? Do I have to pass the self argument somehow? Is it impossible to perform it this way in general?
Thank you for any hint!

There are several problems going on here.
Your instance method, poolFunc, is missing a self parameter.
You are never properly terminating the Pool. You should take advantage of the fact that a multiprocessing.Pool object is a context manager.
You're calling apply_async, but you're never waiting for the results. Read the documentation: you need to call the get method on the AsyncResult object to receive the result; if you don't do this before your program exits your poolFunc function may never run.
By making the Queue object part of your class, you won't be able to pass instance methods to workers.
We can fix all of the above like this:
import multiprocessing
import queue
triggerQueue = multiprocessing.Queue()
class CustomClass:
def poolFunc(self, queueString):
print(queueString)
def listenerFunc(self):
results = []
with multiprocessing.Pool(5) as pool:
while True:
try:
queueString = triggerQueue.get_nowait()
results.append(pool.apply_async(self.poolFunc, (queueString,)))
except queue.Empty:
break
for res in results:
print(res.get())
c = CustomClass()
for i in range(10):
triggerQueue.put(f"testval{i}")
c.listenerFunc()
You can, as you mention, also replace your instance method with a static method, in which case we can keep triggerQueue as part of the class:
import multiprocessing
import queue
class CustomClass:
def __init__(self):
self.triggerQueue = multiprocessing.Queue()
#staticmethod
def poolFunc(queueString):
print(queueString)
def listenerFunc(self):
results = []
with multiprocessing.Pool(5) as pool:
while True:
try:
queueString = self.triggerQueue.get_nowait()
results.append(pool.apply_async(self.poolFunc, (queueString,)))
except queue.Empty:
break
for r in results:
print(r.get())
c = CustomClass()
for i in range(10):
c.triggerQueue.put(f"testval{i}")
c.listenerFunc()
But we still need to reap the pool_async results.

Okay, I found an answer and a workaround:
the answer is based the anser of noxdafox to this question.
Instance methods cannot be serialized that easily. What the Pickle protocol does when serialising a function is simply turning it into a string.
For a child process would be quite hard to find the right object your instance method is referring to due to separate process address spaces.
A functioning workaround is to declare the poolFunc() as static function like
#staticmethod
def poolFunc(queueString):
print(queueString)

Non-lazy instance creation with Pyro4 and instance_mode='single'

My aim is to provide to a web framework access to a Pyro daemon that has time-consuming tasks at the first loading. So far, I have managed to keep in memory (outside of the web app) a single instance of a class that takes care of the time-consuming loading at its initialization. I can also query it with my web app. The code for the daemon is:
Pyro4.expose
#Pyro4.behavior(instance_mode='single')
class Store(object):
def __init__(self):
self._store = ... # the expensive loading
def query_store(self, query):
return ... # Useful query tool to expose to the web framework.
# Not time consuming, provided self._store is
# loaded.
with Pyro4.Daemon() as daemon:
uri = daemon.register(Thing)
with Pyro4.locateNS() as ns:
ns.register('thing', uri)
daemon.requestLoop()
The issue I am having is that although a single instance is created, it is only created at the first proxy query from the web app. This is normal behavior according to the doc, but not what I want, as the first query is still slow because of the initialization of Thing.
How can I make sure the instance is already created as soon as the daemon is started?
I was thinking of creating a proxy instance of Thing in the code of the daemon, but this is tricky because the event loop must be running.
EDIT
It turns out that daemon.register() can accept either a class or an object, which could be a solution. This is however not recommended in the doc (link above) and that feature apparently only exists for backwards compatibility.

Do whatever initialization you need outside of your Pyro code. Cache it somewhere. Use the instance_creator parameter of the #behavior decorator for maximum control over how and when an instance is created. You can even consider pre-creating server instances yourself and retrieving one from a pool if you so desire? Anyway, one possible way to do this is like so:
import Pyro4
def slow_initialization():
print("initializing stuff...")
import time
time.sleep(4)
print("stuff is initialized!")
return {"initialized stuff": 42}
cached_initialized_stuff = slow_initialization()
def instance_creator(cls):
print("(Pyro is asking for a server instance! Creating one!)")
return cls(cached_initialized_stuff)
#Pyro4.behavior(instance_mode="percall", instance_creator=instance_creator)
class Server:
def __init__(self, init_stuff):
self.init_stuff = init_stuff
#Pyro4.expose
def work(self):
print("server: init stuff is:", self.init_stuff)
return self.init_stuff
Pyro4.Daemon.serveSimple({
Server: "test.server"
})
But this complexity is not needed for your scenario, just initialize the thing (that takes a long time) and cache it somewhere. Instead of re-initializing it every time a new server object is created, just refer to the cached pre-initialized result. Something like this;
import Pyro4
def slow_initialization():
print("initializing stuff...")
import time
time.sleep(4)
print("stuff is initialized!")
return {"initialized stuff": 42}
cached_initialized_stuff = slow_initialization()
#Pyro4.behavior(instance_mode="percall")
class Server:
def __init__(self):
self.init_stuff = cached_initialized_stuff
#Pyro4.expose
def work(self):
print("server: init stuff is:", self.init_stuff)
return self.init_stuff
Pyro4.Daemon.serveSimple({
Server: "test.server"
})

Python Tornado Async Fetching of URLs

In the following code example I have a function do_async_thing which appears to return a Future, even though I'm not sure why?
import tornado.ioloop
import tornado.web
import tornado.httpclient
#tornado.gen.coroutine
def do_async_thing():
http = tornado.httpclient.AsyncHTTPClient()
response = yield http.fetch("http://www.google.com/")
return response.body
class MainHandler(tornado.web.RequestHandler):
def get(self):
x = do_async_thing()
print(x) # <tornado.concurrent.Future object at 0x10753a6a0>
self.set_header("Content-Type", "application/json")
self.write('{"foo":"bar"}')
self.finish()
if __name__ == "__main__":
app = tornado.web.Application([
(r"/foo/?", MainHandler),
])
app.listen(8888)
tornado.ioloop.IOLoop.current().start()
You'll see that I yield the call to fetch and in doing so I should have forced the value to be realised (and subsequently been able to access the body field of the response).
What's more interesting is how I can even access the body field on a Future and not have it error (as far as I know a Future has no such field/property/method)
So does anyone know how I can:
Resolve the Future so I get the actual value
Modify this example so the function do_async_thing makes multiple async url fetches
Now it's worth noting that because I was still getting a Future back I thought I would try adding a yield to prefix the call to do_async_thing() (e.g. x = yield do_async_thing()) but that gave me back the following error:
tornado.gen.BadYieldError: yielded unknown object <generator object get at 0x1023bc308>
I also looked at doing something like this for the second point:
def do_another_async_thing():
http = tornado.httpclient.AsyncHTTPClient()
a = http.fetch("http://www.google.com/")
b = http.fetch("http://www.github.com/")
return a, b
class MainHandler(tornado.web.RequestHandler):
def get(self):
y = do_another_async_thing()
print(y)
But again this returns:
<tornado.concurrent.Future object at 0x102b966d8>
Where as I would've expected a tuple of Futures at least? At this point I'm unable to resolve these Futures without getting an error such as:
tornado.gen.BadYieldError: yielded unknown object <generator object get at 0x1091ac360>
Update
Below is an example that works (as per answered by A. Jesse Jiryu Davis)
But I've also added another example where by I have a new function do_another_async_thing which makes two async HTTP requests (but evaluating their values are a little bit more involved as you'll see):
def do_another_async_thing():
http = tornado.httpclient.AsyncHTTPClient()
a = http.fetch("http://www.google.com/")
b = http.fetch("http://www.github.com/")
return a, b
#tornado.gen.coroutine
def do_async_thing():
http = tornado.httpclient.AsyncHTTPClient()
response = yield http.fetch("http://www.google.com/")
return response.body
class MainHandler(tornado.web.RequestHandler):
#tornado.gen.coroutine
def get(self):
x = yield do_async_thing()
print(x) # displays HTML response
fa, fb = do_another_async_thing()
fa = yield fa
fb = yield fb
print(fa.body, fb.body) # displays HTML response for each
It's worth clarifying: you might expect the two yield statements for do_another_async_thing to cause a blockage. But here is a breakdown of the steps that are happening:
do_another_async_thing returns immediately a tuple with two Futures
we yield the first tuple which causes the program to be blocked until the value is realised
the value is realised and so we move to the next line
we yield again, causing the program to block until the value is realised
but as both futures were created at the same time and run concurrently the second yield returns practically instantly

Coroutines return futures. To wait for the coroutine to complete, the caller must also be a coroutine, and must yield the future. So:
#gen.coroutine
def get(self):
x = yield do_async_thing()
For more info see Refactoring Tornado Coroutines.

Keeping context-manager object alive through function calls

I am running into a bit of an issue with keeping a context manager open through function calls. Here is what I mean:
There is a context-manager defined in a module which I use to open SSH connections to network devices. The "setup" code handles opening the SSH sessions and handling any issues, and the teardown code deals with gracefully closing the SSH session. I normally use it as follows:
from manager import manager
def do_stuff(device):
with manager(device) as conn:
output = conn.send_command("show ip route")
#process output...
return processed_output
In order to keep the SSH session open and not have to re-establish it across function calls, I would like to do add an argument to "do_stuff" which can optionally return the SSH session along with the data returned from the SSH session, as follows:
def do_stuff(device, return_handle=False):
with manager(device) as conn:
output = conn.send_command("show ip route")
#process output...
if return_handle:
return (processed_output, conn)
else:
return processed_output
I would like to be able to call this function "do_stuff" from another function, as follows, such that it signals to "do_stuff" that the SSH handle should be returned along with the output.
def do_more_stuff(device):
data, conn = do_stuff(device, return_handle=True)
output = conn.send_command("show users")
#process output...
return processed_output
However the issue that I am running into is that the SSH session is closed, due to the do_stuff function "returning" and triggering the teardown code in the context-manager (which gracefully closes the SSH session).
I have tried converting "do_stuff" into a generator, such that its state is suspended and perhaps causing the context-manager to stay open:
def do_stuff(device, return_handle=False):
with manager(device) as conn:
output = conn.send_command("show ip route")
#process output...
if return_handle:
yield (processed_output, conn)
else:
yield processed_output
And calling it as such:
def do_more_stuff(device):
gen = do_stuff(device, return_handle=True)
data, conn = next(gen)
output = conn.send_command("show users")
#process output...
return processed_output
However this approach does not seem to be working in my case, as the context-manager gets closed, and I get back a closed socket.
Is there a better way to approach this problem? Maybe my generator needs some more work...I think using a generator to hold state is the most "obvious" way that comes to mind, but overall should I be looking into another way of keeping the session open across function calls?
Thanks

I found this question because I was looking for a solution to an analogous problem where the object I wanted to keep alive was a pyvirtualdisplay.display.Display instance with selenium.webdriver.Firefox instances in it.
I also wanted any opened resources to die if an exception were raised during the display/browser instance creations.
I imagine the same could be applied to your database connection.
I recognize this probably only a partial solution and contains less-than-best practices. Help is appreciated.
This answer is the result of an ad lib spike using the following resources to patch together my solution:
https://docs.python.org/3/library/contextlib.html#contextlib.ContextDecorator
http://www.wefearchange.org/2013/05/resource-management-in-python-33-or.html
(I do not yet fully grok what is described here though I appreciate the potential. The second link above eventually proved to be the most helpful by providing analogous situations.)
from pyvirtualdisplay.display import Display
from selenium.webdriver import Firefox
from contextlib import contextmanager, ExitStack
RFBPORT = 5904
def acquire_desktop_display(rfbport=RFBPORT):
display_kwargs = {'backend': 'xvnc', 'rfbport': rfbport}
display = Display(**display_kwargs)
return display
def release_desktop_display(self):
print("Stopping the display.")
# browsers apparently die with the display so no need to call quits on them
self.display.stop()
def check_desktop_display_ok(desktop_display):
print("Some checking going on here.")
return True
class XvncDesktopManager:
max_browser_count = 1
def __init__(self, check_desktop_display_ok=None, **kwargs):
self.rfbport = kwargs.get('rfbport', RFBPORT)
self.acquire_desktop_display = acquire_desktop_display
self.release_desktop_display = release_desktop_display
self.check_desktop_display_ok = check_desktop_display_ok \
if check_desktop_display_ok is None else check_desktop_display_ok
#contextmanager
def _cleanup_on_error(self):
with ExitStack() as stack:
"""push adds a context manager’s __exit__() method
to stack's callback stack."""
stack.push(self)
yield
# The validation check passed and didn't raise an exception
# Accordingly, we want to keep the resource, and pass it
# back to our caller
stack.pop_all()
def __enter__(self):
url = 'http://stackoverflow.com/questions/30905121/'\
'keeping-context-manager-object-alive-through-function-calls'
self.display = self.acquire_desktop_display(self.rfbport)
with ExitStack() as stack:
# add XvncDesktopManager instance's exit method to callback stack
stack.push(self)
self.display.start()
self.browser_resources = [
Firefox() for x in range(self.max_browser_count)
]
for browser_resource in self.browser_resources:
for url in (url, ):
browser_resource.get(url)
"""This is the last bit of magic.
ExitStacks have a .close() method which unwinds
all the registered context managers and callbacks
and invokes their exit functionality."""
# capture the function that calls all the exits
# will be called later outside the context in which it was captured
self.close_all = stack.pop_all().close
# if something fails in this context in enter, cleanup
with self._cleanup_on_error() as stack:
if not self.check_desktop_display_ok(self):
msg = "Failed validation for {!r}"
raise RuntimeError(msg.format(self.display))
# self is assigned to variable after "as",
# manually call close_all to unwind callback stack
return self
def __exit__(self, *exc_details):
# had to comment this out, unable to add this to callback stack
# self.release_desktop_display(self)
pass
I had a semi-expected result with the following:
kwargs = {
'rfbport': 5904,
}
_desktop_manager = XvncDesktopManager(check_desktop_display_ok=check_desktop_display_ok, **kwargs)
with ExitStack() as stack:
# context entered and what is inside the __enter__ method is executed
# desktop_manager will have an attribute "close_all" that can be called explicitly to unwind the callback stack
desktop_manager = stack.enter_context(_desktop_manager)
# I was able to manipulate the browsers inside of the display
# and outside of the context
# before calling desktop_manager.close_all()
browser, = desktop_manager.browser_resources
browser.get(url)
# close everything down when finished with resource
desktop_manager.close_all() # does nothing, not in callback stack
# this functioned as expected
desktop_manager.release_desktop_display(desktop_manager)

How to make spydlay module to work like httplib/http.client?

I have to test server based on Jetty. This server can work with its own protocol, HTTP, HTTPS and lastly it started to support SPDY. I have some stress tests which are based on httplib /http.client -- each thread start with similar URL (some data in query string are variable), adds execution time to global variable and every few seconds shows some statistics. Code looks like:
t_start = time.time()
connection.request("GET", path)
resp = connection.getresponse()
t_stop = time.time()
check_response(resp)
QRY_TIMES.append(t_stop - t_start)
Client working with native protocol shares httplib API, so connection may be native, HTTPConnection or HTTPSConnection.
Now I want to add SPDY test using spdylay module. But its interface is opaque and I don't know how to change its opaqueness into something similar to httplib interface. I have made test client based on example but while 2nd argument to spdylay.urlfetch() is class name and not object I do not know how to use it with my tests. I have already add tests to on_close() method of my class which extends spdylay.BaseSPDYStreamHandler, but it is not compatibile with other tests. If it was instance I would use it outside of spdylay.urlfetch() call.
How can I use spydlay in a code that works based on httplib interfaces?

My only idea is to use global dictionary where url is a key and handler object is a value. It is not ideal because:
new queries with the same url will overwrite previous response
it is easy to forget to free handler from global dictionary
But it works!
import sys
import spdylay
CLIENT_RESULTS = {}
class MyStreamHandler(spdylay.BaseSPDYStreamHandler):
def __init__(self, url, fetcher):
super().__init__(url, fetcher)
self.headers = []
self.whole_data = []
def on_header(self, nv):
self.headers.append(nv)
def on_data(self, data):
self.whole_data.append(data)
def get_response(self, charset='UTF8'):
return (b''.join(self.whole_data)).decode(charset)
def on_close(self, status_code):
CLIENT_RESULTS[self.url] = self
def spdy_simply_get(url):
spdylay.urlfetch(url, MyStreamHandler)
data_handler = CLIENT_RESULTS[url]
result = data_handler.get_response()
del CLIENT_RESULTS[url]
return result
if __name__ == '__main__':
if '--test' in sys.argv:
spdy_response = spdy_simply_get('https://localhost:8443/test_spdy/get_ver_xml.hdb')
I hope somebody can do spdy_simply_get(url) better.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Persistent List Python Implementation - python

Related

Python multiprocessing.Pool.apply_async() not executing class function

Non-lazy instance creation with Pyro4 and instance_mode='single'

Python Tornado Async Fetching of URLs

Keeping context-manager object alive through function calls

How to make spydlay module to work like httplib/http.client?

Categories

Resources