I am using Werkzeug caching to cache a commonly used object in memory between requests. I have been doing a lot of refactoring and started using blueprints, but now the application hard crashes when it tries to write to the cache. I can't get any debug information because it just dies. Does anyone have an idea where to look, or a better way to approach this? The data I am reading from the database rarely changes, so I want to cache it in the web server across requests and have it time out and refresh every 10 or 20 minutes.
I apologize for providing so little information; I had little to go on and figured I would throw it out there. It turns out this was a big red herring.
The real answer is...I am an idiot.
I was caching an object that had overridden the __getattr__ method, and the override had a really bad typo:
return self.__getatribute__(name)
Notice the missing t in __getattribute__. This caused infinite recursion and made the application die silently. Thanks for the help; next time I'll give some more info.
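For anyone who wants to see why that one missing letter recurses, here is a minimal sketch. The class and function names (Broken, Fixed, lookup_outcome) are made up for illustration and are not from the original application. On a current Python 3 interpreter the loop should surface as a RecursionError rather than a silent death.

```python
class Broken:
    """Reproduces the typo: the misspelled name is itself a missing attribute."""

    def __getattr__(self, name):
        # "__getatribute__" (one t) does not exist, so looking it up calls
        # __getattr__ again, which looks it up again, and so on.
        return self.__getatribute__(name)


class Fixed:
    """Same hook with the method name spelled correctly."""

    def __getattr__(self, name):
        # __getattr__ only runs after normal lookup has failed, so this
        # raises a plain AttributeError for names that really are missing.
        return self.__getattribute__(name)


def lookup_outcome(obj, name):
    """Return the attribute value, or the name of the exception raised."""
    try:
        return getattr(obj, name)
    except (AttributeError, RecursionError) as exc:
        return type(exc).__name__
```

Calling lookup_outcome(Broken(), "anything") exercises the runaway recursion, while the same call on Fixed() fails cleanly with an AttributeError.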
I am running an application on GAE flexible with Python and Flask. I periodically dispatch Cloud Tasks with a cron job. These basically loop through all users and perform some cluster analysis. The tasks terminate without throwing any kind of error but don't perform all the work (meaning not all users were looped through). It doesn't happen at a consistent time (anywhere from 276.5s to 323.3s), nor does it ever stop at the same user. Has anybody experienced anything similar?
My guess is that I am breaching some type of resource limit or timeout somewhere. Things I have thought about or tried:
Cloud tasks should be allowed to run for up to an hour (as per this: https://cloud.google.com/tasks/docs/creating-appengine-handlers)
I increased the timeout of the gunicorn workers to 3600 seconds to reflect this.
I have several workers running.
I looked for memory spikes or CPU overload but didn't see anything suspicious.
Sorry if I am too vague or am completely missing the point, I am quite confused with this problem. Thank you for any pointers.
Thank you for all the suggestions. I played around with them and found the root cause, although only by accident while reading the Firestore documentation. I had no indication that this had anything to do with Firestore.
From here: https://googleapis.dev/python/firestore/latest/collection.html
I found out that Query.stream() (or Query.get()) has a timeout on the individual documents like so:
Note: The underlying stream of responses will time out after the
max_rpc_timeout_millis value set in the GAPIC client configuration for
the RunQuery API. Snapshots not consumed from the iterator before that
point will be lost.
So what eventually timed out was the query of all users. I came across this by chance; none of the errors I caught pointed me back towards the query. Hope this helps someone in the future!
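One possible way to avoid depending on a single long-lived stream is to read the users in bounded pages, each one fully fetched before the slow per-user work starts. This is only a sketch of the control flow: iter_in_pages and fetch_page are hypothetical names, not part of the Firestore client, and with Firestore fetch_page would have to wrap an ordered, limited query, which is not shown here.

```python
def iter_in_pages(fetch_page, page_size=500):
    """Yield every item by requesting bounded pages instead of one long stream.

    fetch_page(start_after, limit) is a hypothetical callable supplied by the
    caller. It must return a list of at most `limit` items that sort after
    `start_after` (None means "from the beginning").
    """
    cursor = None
    while True:
        page = fetch_page(cursor, page_size)
        for item in page:
            yield item
        if len(page) < page_size:
            # A short (or empty) page means there is nothing left to read.
            return
        # Resume the next request after the last item actually received.
        cursor = page[-1]
```

Because each page is a complete list before the first item is yielded, slow processing of one user cannot cause later results of that page to be lost.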
Other than using Cloud Scheduler, you can inspect the logs to make sure the tasks ran properly and that there are no deadline issues. Application logs are grouped and sent to Stackdriver only after the task itself has executed, so when a task is forcibly terminated, no log may be output. Try catching the Deadline exception so that some log is output; it may give you helpful information to start troubleshooting.
We have a small amount of data that is almost never updated but is read frequently (site config and some selection items such as state and county information). I think that if I moved it into application memory instead of a database, our I/O performance would improve a lot.
But we have a lot of web servers, and I cannot figure out a good way to notify all of them to reload this data.
You are likely looking for a cache pattern (see "Is there a Python caching library?"). You just need to ask how stale you can afford to be. If you are looking this up on every request, even a short-lived cache can massively improve performance. It's likely, though, that this information can live for minutes or hours without too much risk of being "stale".
If you can't live with a stale cache, I've implemented solutions that have a single database call, which keeps track of the last updated date for any of the cached data. This at least reduces the cache lookups to a single database call.
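A minimal sketch of that "last updated" check, under stated assumptions: VersionedCache, get_stamp and load_data are hypothetical names, and the two callables stand in for whatever cheap and expensive queries your database layer provides. It is deliberately not thread-safe.

```python
class VersionedCache:
    """Reload cached data only when a cheap "last updated" stamp changes.

    get_stamp and load_data are hypothetical callables supplied by the caller:
    get_stamp() is the single cheap query, load_data() is the expensive one.
    """

    def __init__(self, get_stamp, load_data):
        self._get_stamp = get_stamp
        self._load_data = load_data
        self._stamp = None
        self._data = None
        self._loaded = False

    def get(self):
        stamp = self._get_stamp()  # one cheap call per lookup
        if not self._loaded or stamp != self._stamp:
            # First use, or the data changed since we last loaded it.
            self._data = self._load_data()
            self._stamp = stamp
            self._loaded = True
        return self._data
```

Every server checks the stamp on its own, so nothing has to push a reload notification to the whole fleet.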
Be aware though, as soon as you are sharing updateable information, you have to deal with multi-threaded updates of shared state. Make sure you understand the implications of this. Hopefully your caching library handles this gracefully.
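If a caching library is not an option, a time-based cache with a lock is small enough to write by hand. The following is a sketch, not a recommendation of a particular library: TTLCache, loader and clock are names invented here, and the lock simply serializes readers while a refresh is in progress.

```python
import threading
import time


class TTLCache:
    """Cache one value in process memory and refresh it after ttl_seconds."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader          # hypothetical callable that hits the database
        self._ttl = ttl_seconds
        self._clock = clock            # injectable so expiry can be tested
        self._lock = threading.Lock()  # guards the shared state below
        self._value = None
        self._expires_at = None

    def get(self):
        with self._lock:
            now = self._clock()
            if self._expires_at is None or now >= self._expires_at:
                # First use or expired: reload and restart the timer.
                self._value = self._loader()
                self._expires_at = now + self._ttl
            return self._value
```

With ttl_seconds=600 each server process refreshes its own copy about every 10 minutes, so different servers can briefly disagree by up to one TTL.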
Sometimes, with requests that do a lot, Google AppEngine returns an error. I have been handling this by some trickery: memcaching intermediate processed data and just requesting the page again. This often works because the memcached data does not have to be recalculated and the request finishes in time.
However... this hack requires seeing an error, going back, and clicking again. Obviously less than ideal.
Any suggestions?
inb4: "optimize your process better", "split your page into sub-processes", and "use taskqueue".
Thanks for any thoughts.
Edit - To clarify:
Long wait for requests is ok because the function is administrative. I'm basically looking to run a data-mining function. I'm searching over my datastore and modifying a bunch of objects. I think the correct answer is that AppEngine may not be the right tool for this. I should be exporting the data to a computer where I can run functions like this on my own. It seems AppEngine is really intended for serving with lighter processing demands. Maybe the quota/pricing model should offer the option to increase processing timeouts and charge extra.
If interactive user requests are hitting the 30 second deadline, you have bigger problems: your user has almost certainly given up and left anyway.
What you can do depends on what your code is doing. There's a lot to be optimized by batching datastore operations, or reducing them by changing how you model your data; you can offload work to the Task Queue; for URLFetches, you can execute them in parallel. Tell us more about what you're doing and we may be able to provide more concrete suggestions.
I have been handling something similar by building a custom automatic retry dispatcher on the client. Whenever an ajax call to the server fails, the client will retry it.
This works very well if your page is ajaxy. If your app spits out entire HTML pages, you can use a two-pass process: first send an empty page containing only an ajax request. Then, when AppEngine receives that ajax request, it outputs the same HTML you had before. If the ajax call succeeds, it fills the DOM with the result. If it fails, it retries once.
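The dispatcher described above lives in the browser, so the real thing would be JavaScript. Purely to show the control flow, here is a sketch in Python; call_with_retry and request are hypothetical names, and treating any exception as a failed call is an assumption.

```python
def call_with_retry(request, retries=1):
    """Call request(); if it raises, try again up to `retries` more times."""
    attempt = 0
    while True:
        try:
            return request()
        except Exception:
            if attempt >= retries:
                # Out of retries: let the caller see the original failure.
                raise
            attempt += 1
```

With the default retries=1 this matches the "if it fails, it retries once" behaviour; a request that fails twice still reaches the user as an error.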
I'm serving requests from several XMLRPC clients over WAN. The thing works great for, let's say, a period of one day (sometimes two), then freezes in socket.py:
data = self._sock.recv(self._rbufsize)
_sock.timeout is -1, _sock.gettimeout is None
I do nothing special in the main thread (it just receives XMLRPC calls); another two threads talk to the DB. Both of these threads work fine and survive the block (I checked with WinPdb). Client requests are no longer than 1KB and have no special content: just nice, clean strings in a dictionary. Between two freezes I serve tens of thousands of requests without problems.
Firewall is off, no strange software on the same machine, etc...
I use Windows XP and Python 2.6.4. I've checked the differences between 2.6.4 and 2.6.5 and didn't find anything important (or am I mistaken?). Version 2.7 is not an option, as I would lose the binaries for MySqlDB.
The only thing that happens from time to time, caused by clients with poor internet connections, is that sockets break. This happens every 5-10 minutes (there are just five clients accessing the server every 2 seconds).
I've spent a great deal of time on this issue, and I'm running out of ideas. Any hint or thought would be highly appreciated.
What exactly is happening in your OS's TCP/IP stack (possibly in the Python layers on top, but that's less likely) to cause this is a mystery. As a practical workaround, I'd set a timeout longer than the delays you expect between requests (10 seconds should be plenty if you expect a request every 2 seconds) and, if one occurs, close and reopen. (Calibrate the delay needed to work around freezes without interrupting normal traffic by trial and error.) It's unpleasant to hack a fix without understanding the problem, I know, but being pragmatic about such things is a necessary survival trait in the world of writing, deploying and operating actual server systems. Be sure to comment the workaround accurately for future maintainers!
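A minimal sketch of that workaround at the socket level, written for current Python 3 rather than the 2.6 in the question. recv_with_timeout is a made-up helper name; how you close and reopen afterwards depends on your server code and is not shown.

```python
import socket


def recv_with_timeout(sock, bufsize, timeout_seconds):
    """Return received bytes, or None if nothing arrived within the timeout.

    An empty bytes object still means the peer closed the connection.
    """
    sock.settimeout(timeout_seconds)
    try:
        return sock.recv(bufsize)
    except socket.timeout:
        # No data in time: the caller can close this socket and reconnect
        # instead of blocking in recv() forever.
        return None
```

If you cannot reach the individual sockets (for example inside an XMLRPC server class), socket.setdefaulttimeout(10) applies a timeout to every socket created afterwards in the process, which is blunter but needs no changes to library code.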
Thanks so much for the fast response. Right after receiving it I increased the timeout to 10 seconds. Now it is all running without problems, but of course I need to wait another day or two for some sort of confirmation, and only after 5 days will I be sure and come back with the results. I see that 140K requests have gone through already; given how painful this one has been, I would wait for at least another 200K.
What you proposed about automatic adaptation of timeouts (without taking the system down) also sounds reasonable. Would the right way to go be to create a small class (e.g. AutoTimeoutCalibrator) and embed it directly into serial.py?
Yes, being pragmatic is the only way to avoid losing another 10 days trying to figure out the real reason behind this.
Thanks again, I'll be back with the results.
(sorry, but for some reason I was not able to post it as a reply to your post)
I'm looking for some recommendations for a Python web application. We have some memory restrictions and we try to keep it small and lean.
We thought about using WSGI (and a Python web server) and building the rest ourselves. We already have a template engine we'd like to use, but we are open to suggestions regarding the whole request handling (the controller).
The application has to run in a single process and the requests have to be processed with multiple threads.
We've looked at Django, but we are not sure if it fits into our memory budget.
Your feedback is very welcome!
Cheers,
Reto
I've been using Werkzeug because it's more a small collection of really useful components than a whole framework. It runs behind a WSGI server of your choice (and comes with a built-in one). If you want something even easier, Flask might be worth a look. Also, you might want to bookmark the rather speedy Jinja in case your template engine doesn't pan out. Those folks over at pocoo.org have been releasing some nice stuff.
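To give a feel for how little the bare WSGI layer needs, here is a sketch using only the standard library instead of Werkzeug: one process, one thread per request. The names app, ThreadingWSGIServer and build_server are my own, and wsgiref is generally meant for development, so treat this as a baseline for measuring memory rather than a production setup.

```python
from socketserver import ThreadingMixIn
from wsgiref.simple_server import WSGIServer, make_server


def app(environ, start_response):
    """Minimal WSGI application: one callable, no framework."""
    body = ("Hello from " + environ.get("PATH_INFO", "/")).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


class ThreadingWSGIServer(ThreadingMixIn, WSGIServer):
    """Single process; each request is handled in its own thread."""
    daemon_threads = True


def build_server(host="127.0.0.1", port=8000):
    """Create (but do not start) a threaded server for `app`."""
    return make_server(host, port, app, server_class=ThreadingWSGIServer)
```

Calling build_server().serve_forever() starts it. Your own template engine would be called inside app, and Werkzeug or another library can replace the hand-written request handling later without changing the server side.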
You can easily run a Django application in 20 MB of memory; a Django application will probably use less than that.
I'd also advise you to check out webpy and cherrypy,
but I'm a big fan of Django. If you have 20 MB of memory to run the application, Django will give you everything it has.
I'd go for bottle. It has all the conciseness of web.py but with some nice routing features.
You could take a look at Twisted, which has a module twisted.web. That seems to be fairly light-weight. I'm currently using it, and with a simple app it starts almost instantaneously, so it can't be all that resource intensive :)
I don't know whether Twisted uses different threads.
webpy (http://webpy.org/) has a very small memory footprint but is a highly usable framework. But it all depends on how complex your application is going to be.
Also, please take a look at WHIFF. It's tiny and very flexible; see the WHIFF documentation.