Waiting on events in other requests in Twisted - python

I have a simple Twisted server that handles requests like this (obviously, asynchronously):
global SomeSharedMemory
if SomeSharedMemory is None:
    SomeSharedMemory = LoadSharedMemory()
return PickSomething(SomeSharedMemory)
Where SomeSharedMemory is loaded from a database.
I want to avoid loading SomeSharedMemory from the database multiple times. Specifically, when the server first starts, and we get two concurrent incoming requests, we might see something like this:
Request 1: Check for SomeSharedMemory, don't find it
Request 1: Issue database query to load SSM
Request 2: Check for SSM, don't find it
Request 2: Issue database query to load SSM
Request 1: Query returns, store SSM
Request 1: Return result
Request 2: Query returns, store SSM
Request 2: Return result
With more concurrent requests, the database gets hammered. I'd like to do something like this (see http://docs.python.org/library/threading.html#event-objects):
global SomeSharedMemory, SSMEvent
if SomeSharedMemory is None:
    if not SSMEvent.isSet():
        SSMEvent.wait()
    else:
        # assumes that the event is initialized "set"
        SSMEvent.clear()
        SomeSharedMemory = LoadSharedMemory()
        SSMEvent.set()
return PickSomething(SomeSharedMemory)
Such that if one request is loading the shared memory, other requests will wait politely until the query is complete rather than issue their own duplicate database queries.
Is this possible in Twisted?

The way your example is set up, it's hard to see how you could actually have the problem you're describing. If a second request comes in to your Twisted server before the call to LoadSharedMemory issued by the first has returned, then the second request will just wait before being processed. When it is finally handled, SomeSharedMemory will be initialized and there will be no duplication.
However, I suppose maybe it is the case that LoadSharedMemory is asynchronous and returns a Deferred, so that your code really looks more like this:
def handleRequest(request):
    if SomeSharedMemory is None:
        d = initSharedMemory()
        d.addCallback(lambda ignored: handleRequest(request))
    else:
        d = PickSomething(SomeSharedMemory)
    return d
In this case, it's entirely possible that a second request might arrive while initSharedMemory is off doing its thing. Then you would indeed end up with two tasks trying to initialize that state.
The thing to do, of course, is to notice the third state that you have. There is not only un-initialized and initialized, but also initializing. So represent that state as well. I'll hide it inside the initSharedMemory function to keep the request handler as simple as it already is:
initInProgress = None

def initSharedMemory():
    global initInProgress
    if initInProgress is None:
        initInProgress = _reallyInit()
        def initialized(result):
            global initInProgress, SomeSharedMemory
            initInProgress = None
            SomeSharedMemory = result
        initInProgress.addCallback(initialized)
    d = Deferred()
    initInProgress.chainDeferred(d)
    return d
This is a little gross because of the globals everywhere. Here's a slightly cleaner version:
from twisted.internet.defer import Deferred, succeed

class SharedResource(object):
    def __init__(self, initializer):
        self._initializer = initializer
        self._value = None
        self._state = "UNINITIALIZED"
        self._waiting = []

    def get(self):
        if self._state == "INITIALIZED":
            # Return the already computed value
            return succeed(self._value)
        # Create a Deferred for the caller to wait on
        d = Deferred()
        self._waiting.append(d)
        if self._state == "UNINITIALIZED":
            # Once, run the setup
            self._initializer().addCallback(self._initialized)
            self._state = "INITIALIZING"
        # Initializing state here
        return d

    def _initialized(self, value):
        # Save the value, transition to the new state, and tell
        # all the previous callers of get what the result is.
        self._value = value
        self._state = "INITIALIZED"
        waiting, self._waiting = self._waiting, None
        for d in waiting:
            d.callback(value)

SomeSharedMemory = SharedResource(initializeSharedMemory)

def handleRequest(request):
    return SomeSharedMemory.get().addCallback(PickSomething)
Three states, nice explicit transitions between them, no global state to update (at least if you give SomeSharedMemory some non-global scope), and handleRequest doesn't know about any of this; it just asks for a value and then uses it.
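The same three-state idea is not Twisted-specific. As a rough illustration (my own sketch, not from the answer above), a concurrent.futures.Future can stand in for the Deferred in a threaded server: every caller shares one Future, so the initializer runs exactly once no matter how many requests race on it:

```python
import threading
from concurrent.futures import Future

class SharedResource:
    """Three-state cache: no future yet, future pending, future resolved."""
    def __init__(self, initializer):
        self._initializer = initializer
        self._lock = threading.Lock()
        self._future = None  # None until the first get()

    def get(self):
        with self._lock:
            if self._future is None:
                # First caller: start the one and only initialization.
                self._future = Future()
                threading.Thread(target=self._run).start()
            # Every caller gets the same Future, hence the same value.
            return self._future

    def _run(self):
        # Error handling omitted; a real version would set_exception on failure.
        self._future.set_result(self._initializer())

calls = []
def load_shared_memory():
    calls.append(1)          # count how many times the "database" is hit
    return "shared-state"

resource = SharedResource(load_shared_memory)
values = [resource.get().result() for _ in range(5)]
print(values, len(calls))    # five identical values, a single load
```

The Future plays the role of both the "initializing" marker and the waiting list: callers that arrive mid-initialization simply block on `result()` instead of issuing their own load.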


SQLAlchemy session cleared in celery job and on_success function

I am building a tool that fetches data from a different database, transforms it, and stores it in my own database. I'm migrating from APScheduler to Celery, but I ran into the following problem:
I use a class I call JobRecords to store when a job ran, whether it was successful, and which errors it encountered. I use this to know not to look too far back for updated entries, especially since some tables have multiple millions of rows.
Since the system is the same for all jobs, I created a subclass from the celery Task object. I make sure the job is executed within the Flask app context, and I fetch the latest time this Job finished successfully. I also make sure I register a value for now to avoid timing issues between querying the database and adding the job record.
class RecordedTask(Task):
    """
    Task subclass that uses JobRecords to get the last run date
    and add new JobRecords on completion
    """
    now: datetime = None
    ignore_result = True
    _session: scoped_session = None
    success: bool = True
    info: dict = None

    @property
    def session(self) -> Session:
        """Making sure we have one global session instance"""
        if self._session is None:
            from app.extensions import db
            self._session = db.session
        return self._session

    def __call__(self, *args, **kwargs):
        from app.models import JobRecord
        kwargs['last_run'] = (
            self.session.query(func.max(JobRecord.run_at_))
            .filter(JobRecord.job_id == self.name, JobRecord.success)
            .first()
        )[0] or datetime.min
        self.now = kwargs['now'] = datetime.utcnow()
        with app.app_context():
            super(RecordedTask, self).__call__(*args, **kwargs)

    def on_failure(self, exc, task_id, args: list, kwargs: dict, einfo):
        self.session.rollback()
        self.success = False
        self.info = dict(
            args=args,
            kwargs=kwargs,
            error=exc.args,
            exc=format_exception(exc.__class__, exc, exc.__traceback__),
        )
        app.logger.error(f"Error executing job '{self.name}': {exc}")

    def on_success(self, retval, task_id, args: list, kwargs: dict):
        app.logger.info(f"Executed job '{self.name}' successfully, adding JobRecord")
        for entry in self.to_trigger:
            if len(entry) == 2:
                job, kwargs = entry
            else:
                job, = entry
                kwargs = {}
            app.logger.info(f"Scheduling job '{job}'")
            current_celery_app.signature(job, **kwargs).delay()

    def after_return(self, *args, **kwargs):
        from app.models import JobRecord
        record = JobRecord(
            job_id=self.name,
            run_at_=self.now,
            info=self.info,
            success=self.success,
        )
        self.session.add(record)
        self.session.commit()
        self.session.remove()
I added an example of a job to update a model called Location, but there are a lot of jobs just like this one.
@celery.task(bind=True, name="update_locations")
def update_locations(self, last_run: datetime = datetime.min, **_):
    """Get the locations from the external database and check for updates"""
    locations: List[ExternalLocation] = ExternalLocation.query.filter(
        ExternalLocation.updated_at_ >= last_run
    ).order_by(ExternalLocation.id).all()
    app.logger.info(f"ExternalLocation: collected {len(locations)} updated locations")
    for update_location in locations:
        existing_location: Location = Location.query.filter(
            Location.external_id == update_location.id
        ).first()
        if existing_location is None:
            self.session.add(Location.from_worker(update_location))
        else:
            existing_location.update_from_worker(update_location)
The problem is that when I run this job, the Location objects are not committed with the JobRecord, so only the latter is created. If I track it with the debugger, Location.query.count() returns the correct value inside the function, but as soon as it enters the on_success callback, it's back to 0, and self._session.new returns an empty dict.
I already tried adding the session as a property to make sure it's the same instance everywhere, but the problem still persists. Maybe it has something to do with it being a scoped_session because of Flask-SQLAlchemy?
Sorry about the large amount of code, I did try to strip as much away as possible. Any help is welcome!
I found out that the culprit was the combination of scoped_session and the Flask app context. Like any context manager, running the code inside with app.app_context() triggered the __exit__ function on leaving, which in turn caused the ScopedRegistry, where the scoped_session is stored, to be cleared. A new session was then created, the JobRecords were added to that session, and that session was committed. The locations were therefore never written to the database.
There are two possible solutions. If you don't use sessions in files other than your task, you can add a session property to the task. This way, you avoid the scoped_session altogether, and can clean up in your after_return function:
@property
def session(self):
    if self._session is None:
        from dashboard.extensions import db
        self._session = db.create_session(options={})()
    return self._session
However, I was accessing the session in my model definition files as well, through from extensions import db, so I was using two different sessions. I ended up using app.app_context().push() instead of the context manager, thus avoiding the __exit__ function:
app.app_context().push()
super(RecordedTask, self).__call__(*args, **kwargs)
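The mechanism is easy to reproduce without Flask or SQLAlchemy. In this toy sketch (Ctx and registry are made-up stand-ins for app.app_context() and SQLAlchemy's ScopedRegistry, purely for illustration), leaving a with block always runs __exit__, while calling __enter__ alone, which is effectively what .push() does, never does:

```python
registry = {}  # stand-in for SQLAlchemy's ScopedRegistry

class Ctx:
    """Toy analog of app.app_context(): clears the registry on exit."""
    def __enter__(self):
        return self
    def __exit__(self, *exc):
        registry.clear()  # analogous to the scoped session being discarded

# Context-manager form: pending work is wiped when the block exits.
with Ctx():
    registry["pending"] = ["Location(1)"]
after_with = dict(registry)      # empty: the session state is gone

# push-style: enter without ever triggering __exit__.
ctx = Ctx()
ctx.__enter__()
registry["pending"] = ["Location(1)"]
after_push = dict(registry)      # the pending objects survive

print(after_with, after_push)
```

This is why the JobRecord (added after the context exited) landed in a fresh session while the Location objects vanished with the old one.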

block read of instance variable when trying to set it

class A(object):
    def __init__(self, cookie):
        self.__cookie = cookie

    def refresh_cookie(self):
        """This method refreshes the cookie every 10 min."""
        self.__cookie = <newcookie>

    @property
    def cookie(self):
        return self.__cookie
The problem is that the cookie value changes every 10 minutes, so if some method is still holding the older cookie, its request fails. This happens when multiple threads use the same A object.
I am looking for a solution where, whenever we refresh (i.e. modify) the cookie value, no one can read the cookie until the refresh completes; in other words, a lock on the cookie value.
This is a job for a condition variable.
from threading import Condition

class A(object):
    def __init__(self, cookie):
        self.__cookie = cookie
        self.refreshing = Condition()

    def refresh_cookie(self):
        """This method refreshes the cookie every 10 min."""
        with self.refreshing:
            self.__cookie = <newcookie>
            self.refreshing.notify_all()

    @property
    def cookie(self):
        with self.refreshing:
            return self.__cookie
Only one thread can enter a with block governed by self.refreshing at a time. The first thread to try will succeed; the others will block until the first leaves its with block.
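For illustration, here is a self-contained, runnable version of that pattern (new_cookie is passed in rather than fetched, purely to keep the sketch executable):

```python
import threading

class A:
    def __init__(self, cookie):
        self.__cookie = cookie
        self.refreshing = threading.Condition()

    def refresh_cookie(self, new_cookie):
        # The writer holds the condition's lock, so no reader
        # can observe a half-refreshed cookie.
        with self.refreshing:
            self.__cookie = new_cookie
            self.refreshing.notify_all()  # wake any thread wait()ing on a refresh

    @property
    def cookie(self):
        with self.refreshing:
            return self.__cookie

a = A("old-cookie")
results = []
readers = [threading.Thread(target=lambda: results.append(a.cookie))
           for _ in range(4)]
a.refresh_cookie("new-cookie")
for t in readers:
    t.start()
for t in readers:
    t.join()
print(results)  # every reader sees the refreshed value
```

Note that this gives mutual exclusion; if readers should additionally wait for a refresh that is known to be in flight, they would call self.refreshing.wait() inside the with block.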

Keeping context-manager object alive through function calls

I am running into a bit of an issue with keeping a context manager open through function calls. Here is what I mean:
There is a context manager defined in a module which I use to open SSH connections to network devices. The setup code opens the SSH session and handles any issues, and the teardown code gracefully closes the SSH session. I normally use it as follows:
from manager import manager
def do_stuff(device):
with manager(device) as conn:
output = conn.send_command("show ip route")
#process output...
return processed_output
In order to keep the SSH session open and not have to re-establish it across function calls, I would like to add an argument to "do_stuff" which can optionally return the SSH session along with the data returned from it, as follows:
def do_stuff(device, return_handle=False):
    with manager(device) as conn:
        output = conn.send_command("show ip route")
        # process output...
        if return_handle:
            return (processed_output, conn)
        else:
            return processed_output
I would like to be able to call this function "do_stuff" from another function, as follows, such that it signals to "do_stuff" that the SSH handle should be returned along with the output.
def do_more_stuff(device):
    data, conn = do_stuff(device, return_handle=True)
    output = conn.send_command("show users")
    # process output...
    return processed_output
However the issue that I am running into is that the SSH session is closed, due to the do_stuff function "returning" and triggering the teardown code in the context-manager (which gracefully closes the SSH session).
I have tried converting "do_stuff" into a generator, such that its state is suspended and perhaps causing the context-manager to stay open:
def do_stuff(device, return_handle=False):
    with manager(device) as conn:
        output = conn.send_command("show ip route")
        # process output...
        if return_handle:
            yield (processed_output, conn)
        else:
            yield processed_output
And calling it as such:
def do_more_stuff(device):
    gen = do_stuff(device, return_handle=True)
    data, conn = next(gen)
    output = conn.send_command("show users")
    # process output...
    return processed_output
However this approach does not seem to be working in my case, as the context-manager gets closed, and I get back a closed socket.
Is there a better way to approach this problem? Maybe my generator needs some more work... I think using a generator to hold state is the most "obvious" way that comes to mind, but overall, should I be looking at another way of keeping the session open across function calls?
Thanks
I found this question because I was looking for a solution to an analogous problem where the object I wanted to keep alive was a pyvirtualdisplay.display.Display instance with selenium.webdriver.Firefox instances in it.
I also wanted any opened resources to die if an exception were raised during the display/browser instance creations.
I imagine the same could be applied to your database connection.
I recognize this is probably only a partial solution and contains less-than-best practices. Help is appreciated.
This answer is the result of an ad lib spike using the following resources to patch together my solution:
https://docs.python.org/3/library/contextlib.html#contextlib.ContextDecorator
http://www.wefearchange.org/2013/05/resource-management-in-python-33-or.html
(I do not yet fully grok what is described here though I appreciate the potential. The second link above eventually proved to be the most helpful by providing analogous situations.)
from pyvirtualdisplay.display import Display
from selenium.webdriver import Firefox
from contextlib import contextmanager, ExitStack

RFBPORT = 5904

def acquire_desktop_display(rfbport=RFBPORT):
    display_kwargs = {'backend': 'xvnc', 'rfbport': rfbport}
    display = Display(**display_kwargs)
    return display

def release_desktop_display(self):
    print("Stopping the display.")
    # browsers apparently die with the display so no need to call quits on them
    self.display.stop()

def check_desktop_display_ok(desktop_display):
    print("Some checking going on here.")
    return True

class XvncDesktopManager:
    max_browser_count = 1

    def __init__(self, check_desktop_display_ok=None, **kwargs):
        self.rfbport = kwargs.get('rfbport', RFBPORT)
        self.acquire_desktop_display = acquire_desktop_display
        self.release_desktop_display = release_desktop_display
        self.check_desktop_display_ok = check_desktop_display_ok \
            if check_desktop_display_ok is None else check_desktop_display_ok

    @contextmanager
    def _cleanup_on_error(self):
        with ExitStack() as stack:
            # push adds a context manager's __exit__() method
            # to stack's callback stack.
            stack.push(self)
            yield
            # The validation check passed and didn't raise an exception.
            # Accordingly, we want to keep the resource, and pass it
            # back to our caller.
            stack.pop_all()

    def __enter__(self):
        url = 'http://stackoverflow.com/questions/30905121/'\
              'keeping-context-manager-object-alive-through-function-calls'
        self.display = self.acquire_desktop_display(self.rfbport)
        with ExitStack() as stack:
            # add XvncDesktopManager instance's exit method to callback stack
            stack.push(self)
            self.display.start()
            self.browser_resources = [
                Firefox() for x in range(self.max_browser_count)
            ]
            for browser_resource in self.browser_resources:
                for url in (url, ):
                    browser_resource.get(url)
            # This is the last bit of magic: ExitStacks have a .close()
            # method which unwinds all the registered context managers
            # and callbacks and invokes their exit functionality.
            # Capture the function that calls all the exits; it will be
            # called later, outside the context in which it was captured.
            self.close_all = stack.pop_all().close
        # if something fails in this context in __enter__, clean up
        with self._cleanup_on_error() as stack:
            if not self.check_desktop_display_ok(self):
                msg = "Failed validation for {!r}"
                raise RuntimeError(msg.format(self.display))
        # self is assigned to the variable after "as";
        # manually call close_all to unwind the callback stack
        return self

    def __exit__(self, *exc_details):
        # had to comment this out, unable to add this to callback stack
        # self.release_desktop_display(self)
        pass
I had a semi-expected result with the following:
kwargs = {
    'rfbport': 5904,
}
_desktop_manager = XvncDesktopManager(check_desktop_display_ok=check_desktop_display_ok, **kwargs)

with ExitStack() as stack:
    # context entered and what is inside the __enter__ method is executed;
    # desktop_manager will have an attribute "close_all" that can be
    # called explicitly to unwind the callback stack
    desktop_manager = stack.enter_context(_desktop_manager)

# I was able to manipulate the browsers inside of the display
# and outside of the context, before calling desktop_manager.close_all()
browser, = desktop_manager.browser_resources
browser.get(url)

# close everything down when finished with the resource
desktop_manager.close_all()  # does nothing, not in callback stack
# this functioned as expected
desktop_manager.release_desktop_display(desktop_manager)
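The core trick this answer leans on, ExitStack.pop_all() handing responsibility for cleanup over to the caller, can be shown in a much smaller sketch applied to the original SSH question; the manager below is a fake stand-in for the real SSH context manager:

```python
from contextlib import ExitStack, contextmanager

@contextmanager
def manager(device):
    conn = {"device": device, "open": True}  # fake SSH session
    try:
        yield conn
    finally:
        conn["open"] = False                 # "teardown": close the session

def do_stuff(device):
    with ExitStack() as stack:
        conn = stack.enter_context(manager(device))
        output = "processed " + conn["device"]
        # pop_all() empties this stack, so leaving the with block runs no
        # teardown; the returned stack's close() now owns the cleanup.
        return output, conn, stack.pop_all().close

data, conn, close = do_stuff("router1")
print(conn["open"])   # True: the session survived do_stuff returning
close()
print(conn["open"])   # False: teardown ran when the caller chose
```

The caller decides when the session dies, which is exactly what the return_handle idea in the question was reaching for, without fighting the with statement.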

How to make the spdylay module work like httplib/http.client?

I have to test a server based on Jetty. This server can work with its own protocol, HTTP, HTTPS, and lately it has started to support SPDY. I have some stress tests which are based on httplib/http.client: each thread starts with a similar URL (some data in the query string varies), adds its execution time to a global variable, and every few seconds shows some statistics. The code looks like:
t_start = time.time()
connection.request("GET", path)
resp = connection.getresponse()
t_stop = time.time()
check_response(resp)
QRY_TIMES.append(t_stop - t_start)
A client working with the native protocol shares the httplib API, so the connection may be native, HTTPConnection, or HTTPSConnection.
Now I want to add a SPDY test using the spdylay module. But its interface is opaque and I don't know how to turn it into something similar to the httplib interface. I have made a test client based on the example, but since the 2nd argument to spdylay.urlfetch() is a class name and not an object, I do not know how to use it with my tests. I have already added tests to the on_close() method of my class which extends spdylay.BaseSPDYStreamHandler, but that is not compatible with the other tests. If it were an instance, I could use it outside of the spdylay.urlfetch() call.
How can I use spdylay in code that works with httplib-style interfaces?
My only idea is to use global dictionary where url is a key and handler object is a value. It is not ideal because:
new queries with the same url will overwrite previous response
it is easy to forget to free handler from global dictionary
But it works!
import sys
import spdylay

CLIENT_RESULTS = {}

class MyStreamHandler(spdylay.BaseSPDYStreamHandler):
    def __init__(self, url, fetcher):
        super().__init__(url, fetcher)
        self.headers = []
        self.whole_data = []

    def on_header(self, nv):
        self.headers.append(nv)

    def on_data(self, data):
        self.whole_data.append(data)

    def get_response(self, charset='UTF8'):
        return (b''.join(self.whole_data)).decode(charset)

    def on_close(self, status_code):
        CLIENT_RESULTS[self.url] = self

def spdy_simply_get(url):
    spdylay.urlfetch(url, MyStreamHandler)
    data_handler = CLIENT_RESULTS[url]
    result = data_handler.get_response()
    del CLIENT_RESULTS[url]
    return result

if __name__ == '__main__':
    if '--test' in sys.argv:
        spdy_response = spdy_simply_get('https://localhost:8443/test_spdy/get_ver_xml.hdb')
I hope somebody can do spdy_simply_get(url) better.
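One hedged sketch of a "better" spdy_simply_get: since spdylay.urlfetch() wants a class, you can manufacture a one-off handler class per call that closes over a local result slot, avoiding the global dictionary and both of its drawbacks. (fake_urlfetch and BaseHandler below are stand-ins so the sketch runs without spdylay; the real calls would be spdylay.urlfetch and spdylay.BaseSPDYStreamHandler.)

```python
def fake_urlfetch(url, handler_cls):
    # Stand-in for spdylay.urlfetch: instantiates the class and drives it.
    handler = handler_cls(url, fetcher=None)
    handler.on_data(b"response for " + url.encode())
    handler.on_close(200)

class BaseHandler:  # stand-in for spdylay.BaseSPDYStreamHandler
    def __init__(self, url, fetcher):
        self.url = url

def spdy_simply_get(url):
    result = {}

    class OneShotHandler(BaseHandler):
        def __init__(self, url, fetcher):
            super().__init__(url, fetcher)
            self.chunks = []
        def on_data(self, data):
            self.chunks.append(data)
        def on_close(self, status_code):
            # close over the local dict instead of a global registry
            result['body'] = b''.join(self.chunks).decode('UTF8')

    fake_urlfetch(url, OneShotHandler)
    return result['body']

print(spdy_simply_get("https://localhost:8443/test"))
```

Because each call builds its own class, concurrent fetches of the same URL no longer overwrite each other, and there is nothing to remember to delete afterwards.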

Tornado mysteriously hanging after an async operation -- how can I debug?

Our app uses a fair few network calls (it's built on top of a third-party REST API), so we're using a lot of asynchronous operations to keep the system responsive. (Using Swirl to stay sane, since the app was written before tornado.gen came about). So when the need arose to do a little geocoding, we figured it would be trivial -- throw in a couple of async calls to another external API, and we'd be golden.
Somehow, our async code is mysteriously hanging Tornado -- the process is still running, but it won't respond to requests or output anything to the logs. Worse, when we take the third-party server out of the equation entirely, it still hangs -- it seems to lock up some arbitrary period after the async request returns.
Here's some stub code that replicates the problem:
def async_geocode(lat, lon, callback, fields=('city', 'country')):
    '''Translates lat and lon into human-readable geographic info'''
    iol = IOLoop.instance()
    iol.add_timeout(time.time() + 1, lambda: callback("(unknown)"))
And here's the test that usually (but not always -- that's how it got to production in the first place) catches it:
class UtilTest(tornado.testing.AsyncTestCase):
    def get_new_ioloop(self):
        '''Ensure that any test code uses the right IOLoop, since the code
        it tests will use the singleton.'''
        return tornado.ioloop.IOLoop.instance()

    def test_async_geocode(self):
        # Yahoo gives (-122.419644, 37.777125) for SF, so we expect it to
        # reverse geocode to SF too...
        async_geocode(lat=37.777, lon=-122.419, callback=self.stop,
                      fields=('city', 'country'))
        result = self.wait(timeout=4)
        self.assertEquals(result, u"San Francisco, United States")
        # Now test if it's hanging (or has hung) the IOLoop on finding London
        async_geocode(lat=51.506, lon=-0.127, callback=self.stop,
                      fields=('city',))
        result = self.wait(timeout=5)
        self.assertEquals(result, u"London")
        # Test it fails gracefully
        async_geocode(lat=0.00, lon=0.00, callback=self.stop,
                      fields=('city',))
        result = self.wait(timeout=6)
        self.assertEquals(result, u"(unknown)")

    def test_async_geocode2(self):
        async_geocode(lat=37.777, lon=-122.419, callback=self.stop,
                      fields=('city', 'state', 'country'))
        result = self.wait(timeout=7)
        self.assertEquals(result, u"San Francisco, California, United States")
        async_geocode(lat=51.506325, lon=-0.127144, callback=self.stop,
                      fields=('city', 'state', 'country'))
        result = self.wait(timeout=8)
        self.io_loop.add_timeout(time.time() + 8, lambda: self.stop(True))
        still_running = self.wait(timeout=9)
        self.assert_(still_running)
Note that the first test almost always passes, and it's the second test (and its call to async_geocode) that usually fails.
Edited to add: Note also that we have lots of similarly asynchronous calls to our other third-party API which are working absolutely fine.
(For completeness, here's the full implementation of async_geocode and its helper class (although the stub above replicates the problem)):
def async_geocode(lat, lon, callback, fields=('city', 'country')):
    '''Use AsyncGeocoder to do the work.'''
    geo = AsyncGeocoder(lat, lon, callback, fields)
    geo.geocode()

class AsyncGeocoder(object):
    '''
    Reverse-geocode to as specific a level as possible.

    Calls Yahoo! PlaceFinder for reverse geocoding. Takes a lat, lon, and
    callback function (to call with the result string when the request
    completes), and optionally a sequence of fields to return, in decreasing
    order of specificity (e.g. street, neighborhood, city, country).

    NB: Does not do anything intelligent with the geocoded data -- just
    returns the first result found.
    '''
    url = "http://where.yahooapis.com/geocode"

    def __init__(self, lat, lon, callback, fields, ioloop=None):
        self.lat, self.lon = lat, lon
        self.callback = callback
        self.fields = fields
        self.io_loop = ioloop or IOLoop.instance()
        self._client = AsyncHTTPClient(io_loop=self.io_loop)

    def geocode(self):
        params = urllib.urlencode({
            'q': '{0}, {1}'.format(self.lat, self.lon),
            'flags': 'J', 'gflags': 'R'
        })
        tgt_url = self.url + "?" + params
        self._client.fetch(tgt_url, self.geocode_cb)

    def geocode_cb(self, response):
        geodata = json_decode(response.body)
        try:
            geodata = geodata['ResultSet']['Results'][0]
        except IndexError:
            # Response didn't contain anything
            result_string = ""
        else:
            results = []
            for f in self.fields:
                val = geodata.get(f, None)
                if val:
                    results.append(val)
            result_string = ", ".join(results)
        if result_string == '':
            # This can happen if the response was empty _or_ if
            # the requested fields weren't in it. Regardless,
            # the user needs to see *something*
            result_string = "(unknown)"
        self.io_loop.add_callback(lambda: self.callback(result_string))
Edit: So after quite a bit of tedious debugging and logging the situations in which the system fails over a few days, it turns out that, as the accepted answer points out, my test was failing for unrelated reasons. It also turns out that the reason it was hanging was nothing to do with the IOLoop, but rather that one of the coroutines in question was immediately hanging waiting for a database lock.
Sorry for the mis-targeted question, and thank you all for your patience.
Your second test appears to be failing because of this part:
self.io_loop.add_timeout(time.time() + 8, lambda: self.stop(True))
still_running = self.wait(timeout=9)
self.assert_(still_running)
When you add a timeout to the IOLoop through self.wait, that timeout is not cleared when self.stop is called, as far as I can tell. That is, your first timeout persists, and when you block the IOLoop for 8 seconds, it fires.
I doubt any of that is related to your original problem.
