Get faster API responses (Python)

How can I get faster responses from an API?
I already use a session with keep-alive, but there still seems to be a faster method, because somebody is always faster than me. I have a 25 ms ping to the server, so there isn't much room to gain on the ping side. Any tips?
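For context, here is roughly the keep-alive setup I mean; a minimal sketch using the requests library, with a placeholder URL and helper name:

import requests

session = requests.Session()  # connection pooling and keep-alive by default

def fetch(url):
    # Later calls to the same host reuse the pooled TCP/TLS connection,
    # so only the first request pays the handshake cost.
    return session.get(url, timeout=5)

# response = fetch("https://api.example.com/prices")  # placeholder URL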

Related

Is it a bad practice to use sleep() in a web server in production?

I'm working with Django 1.8 and Python 2.7.
In a certain part of the project, I open a socket and send some data through it. Due to the way the other end works, I need to leave some time (let's say 10 milliseconds) between each piece of data I send:
import time

while True:
    send(data)        # send the next chunk to the external service
    time.sleep(0.01)  # wait 10 ms so the receiver can keep up
So my question is: is it considered bad practice to simply use sleep() to create that pause? Is there a more efficient approach?
UPDATED:
The reason I need to create that pause is that the other end of the socket is an external service that takes some time to process the chunks of data I send. I should also point out that it doesn't return anything after receiving, let alone processing, the data. Leaving that brief pause ensures that each chunk of data I send gets properly processed by the receiver.
EDIT: changed the sleep to 0.01.
Yes, this is bad practice and an anti-pattern. You will tie up the "worker" which is processing this request for an unknown period of time, which will make it unavailable to serve other requests. The classic pattern for web applications is to service a request as-fast-as-possible, as there is generally a fixed or max number of concurrent workers. While this worker is continually sleeping, it's effectively out of the pool. If multiple requests hit this endpoint, multiple workers are tied up, so the rest of your application will experience a bottleneck. Beyond that, you also have potential issues with database locks or race conditions.
The standard approach to handling your situation is to use a task queue like Celery. Your web-application would tell Celery to initiate the task and then quickly finish with the request logic. Celery would then handle communicating with the 3rd party server. Django works with Celery exceptionally well, and there are many tutorials to help you with this.
If you need to provide information to the end-user, then you can generate a unique ID for the task and poll the result backend for an update by having the client refresh the URL every so often. (I think Celery will automatically generate a guid, but I usually specify one.)
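A minimal sketch of that pattern, assuming a Redis broker and result backend; the task name, the send() call, and the view snippet are illustrative placeholders rather than anything from the question:

import time
from celery import Celery

app = Celery("tasks",
             broker="redis://localhost:6379/0",
             backend="redis://localhost:6379/0")

@app.task
def send_chunks(chunks):
    # The slow, paced sending now runs in a Celery worker instead of
    # the web worker that served the HTTP request.
    for chunk in chunks:
        send(chunk)       # placeholder for the socket write to the external service
        time.sleep(0.01)  # the 10 ms pause only ties up this Celery worker

# In the Django view:
# result = send_chunks.delay(list_of_chunks)
# return JsonResponse({"task_id": result.id})  # the ID the client can poll

The ID returned by delay() is the unique identifier mentioned above; hand it to the client and let it poll a status URL backed by the result backend.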
Like most things, short answer: it depends.
Slightly longer answer:
If you're running it in an environment where you have many (say, 50+) connections to the webserver, all of which are triggering the sleep code, you're really not going to like the behavior. I would strongly recommend looking at something like Celery/RabbitMQ so Django can hand the time-delayed part off to something else and then quickly respond with a "task started" message.
If this is production, but you're the only person hitting the webserver, it still isn't great design, but if it works, it's going to be hard to justify the extra complexity of the task queue approach mentioned above.

Jira python runs very slowly, any ideas on why?

I'm using jira-python to automate a bunch of tasks in Jira. One thing that I find weird is that jira-python takes a long time to run. It seems like it's loading or something before sending the requests. I'm new to python, so I'm a little confused as to what's actually going on. Before finding jira-python, I was sending requests to the Jira REST API using the requests library, and it was blazing fast (and still is, if I compare the two). Whenever I run the scripts that use jira-python, there's a good 15 second delay while 'loading' the library, and sometimes also a good 10-15 second delay sending each request.
Is there something I'm missing with Python that could be causing this issue? Is there any way to keep a Python script running as a service so it doesn't need to 'load' the library each time it's run?
@ThePavoIC, you seem to be correct. I notice MASSIVE changes in speed if Jira has been restarted and re-indexed recently. Scripts that used to take a couple of minutes to run now complete in seconds. Basically, you need to make sure Jira is tuned for performance and keep your indexes up to date.

Flask Blueprints and werkzeug.contrib.cache

I am using werkzeug caching to cache a commonly used object in memory between requests. I have been doing a lot of refactoring and started using blueprints, but now the application hard-crashes when it tries to write to the cache. I can't get any debug information because it just dies. Does anyone have an idea where to look, or a better way to approach this? The data I am reading from the database rarely ever changes, so I want to cache it in the webserver across requests and have it time out and refresh every 10 or 20 minutes.
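For context, a rough sketch of the caching pattern I'm describing, using werkzeug's SimpleCache; load_from_db is a placeholder for the real (rarely changing) database read:

from werkzeug.contrib.cache import SimpleCache

cache = SimpleCache()  # in-process cache, shared across requests in this worker

def get_reference_data():
    data = cache.get("reference-data")
    if data is None:
        data = load_from_db()  # placeholder for the slow database query
        cache.set("reference-data", data, timeout=10 * 60)  # refresh every 10 minutes
    return data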
I apologize for giving so little information; I had little to go on myself and figured I would throw it out there. It turns out this was a big red herring.
The real answer is... I am an idiot.
I was caching an object that overrode __getattr__, and the override had a really bad typo:
return self.__getatribute__(name)
Notice the missing 't' in getattribute. This caused infinite recursion and made the application die silently. Thanks for the help; next time I'll give more info.
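For anyone curious, a minimal reproduction of that kind of bug (the class here is just an illustration, not the real code):

class CachedThing(object):
    def __getattr__(self, name):
        # Typo: __getatribute__ instead of __getattribute__.
        # __getattr__ is only called for attributes that are missing, and this
        # misspelled name is itself missing, so the lookup re-enters __getattr__
        # and recurses until the process dies.
        return self.__getatribute__(name)

# CachedThing().anything  # blows the stack (RecursionError on modern Python)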

How to keep global variables persistent over multiple google appengine instances?

Our situation is as follows:
We are working on a school project where the idea is that multiple teams walk around a city with smartphones and play a city game while walking.
So we can have 10 active smartphones walking around the city, all posting their location and requesting data from Google App Engine.
Someone sits behind a web browser, watching all these teams walk around and sending them messages, etc.
We use the datastore Google App Engine provides to store all the data these teams send and request, to store the messages and retrieve them, etc.
However, we soon found out we were at the free limit of reads and writes, so we looked for a way to retrieve the periodic updates (which cost the most reads and writes) without using up the limited resources Google provides. And obviously, because it's a school project, we don't want to pay for more reads and writes.
Storing this information in global variables seemed an easy and quick solution, and it was... but when we started to truly test, we noticed some of our data was missing and then reappearing. That turned out to be because there were so many requests coming in that a new instance was spun up, and instances don't share these global variables.
So our question is:
Can we somehow make sure these global variables are always the same on every running instance of Google App Engine?
OR
Can we limit the number of instances ever running to one, no matter how many requests come in?
OR
Is there another way to store this data, without using the datastore and without using globals?
You should be using memcache. If you use the ndb (new database) library, you can automatically cache the results of queries. Obviously this won't improve your writes much, but it should significantly improve the numbers of reads you can do.
You need to back it with the datastore as data can be ejected from memcache at any time. If you're willing to take the (small) chance of losing updates you could just use memcache. You could do something like store just a message ID in the datastore and have the controller periodically verify that every message ID has a corresponding entry in memcache. If one is missing the controller would need to reenter it.
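A minimal sketch of that read path on App Engine; the key name and the datastore helper are placeholders:

from google.appengine.api import memcache

def get_messages():
    messages = memcache.get("messages")
    if messages is None:
        messages = fetch_messages_from_datastore()  # placeholder datastore query
        # Cache for 10 minutes; memcache may still evict the entry earlier.
        memcache.add("messages", messages, time=600)
    return messages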
Interesting question. Some bad news first: I don't think there's a better way of storing the data; no, you won't be able to stop new instances from spawning, and no, you cannot make separate instances always have the same data.
What you could do is have the instances periodically sync themselves with a master record in the datastore; by choosing the frequency of this intelligently and downloading/uploading the information in one lump, you could limit the number of reads/writes to a level that works for you. This is firmly in kludge territory, though.
Despite finding the quota for just about everything else, I can't find the limits for free reads/writes, so it's possible they're ludicrously small, but the fact that you're hitting them with a mere 10 smartphones raises a red flag for me. Are you certain the smartphones are being polled (or calling in) at a sensible frequency? It sounds like you might be hammering them unnecessarily.
Consider the Jabber (XMPP) protocol for communication between peers. The free quota for it is quite high.
First, definitely use memcache as Tim Delaney said. That alone will probably solve your problem.
If not, you should consider a push model. The advantage is that your clients won't be asking you for new data all the time, only when something has actually changed. If the update is small enough that you can deliver it in the push message, you won't need to worry about datastore reads on memcache misses, or any other duplicate work, for all those clients: you read the data once when it changes and push it out to everyone.
The first option for push is C2DM (Android) or APN (iOS). These are limited in the amount of data they can send and the frequency of updates.
If you want to get fancier you could use XMPP instead. This would let you do more frequent updates with (I believe) bigger payloads but might require more engineering. For a starting point, see Stack Overflow questions about Android and iOS.
Have fun!
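If you go the XMPP route on App Engine, the sending side could look roughly like this; team_jids and the payload format are assumptions, not something from the question:

from google.appengine.api import xmpp

def broadcast_update(team_jids, payload):
    # Read the changed data once, then push it to every connected team,
    # instead of having each phone poll the datastore.
    for jid in team_jids:
        if xmpp.get_presence(jid):        # skip teams that are offline
            xmpp.send_message(jid, payload)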

SimpleXMLRPCServer _sock.recv freezes after thousands of requests

I'm serving requests from several XML-RPC clients over a WAN. The thing works great for, let's say, a period of one day (sometimes two), then freezes in socket.py:
data = self._sock.recv(self._rbufsize)
_sock.timeout is -1, _sock.gettimeout is None
There is nothing special I do in the main thread (it just receives the XML-RPC calls); there are another two threads talking to the DB. Both of these threads work fine and survive the block (I checked with WinPdb). Clients send requests no longer than 1 KB, and there isn't any special content: just nice, clean strings in a dictionary. Between two freezes I serve tens of thousands of requests without problems.
The firewall is off, there's no strange software on the same machine, etc.
I use Windows XP and Python 2.6.4. I've checked the differences between 2.6.4 and 2.6.5 and didn't find anything important (or am I mistaken?). Version 2.7 is not an option, as I would be missing binaries for MySQLdb.
The only thing that happens from time to time, caused by clients with poor internet connections, is that sockets break. That happens every 5-10 minutes (there are just five clients, each hitting the server every 2 seconds).
I've spent a great deal of time on this issue and am starting to run out of ideas. Any hint or thought would be highly appreciated.
What exactly is happening in your OS's TCP/IP stack (possibly in the Python layers on top, but that's less likely) to cause this is a mystery. As a practical workaround, I'd set a timeout longer than the delays you expect between requests (10 seconds should be plenty if you expect a request every 2 seconds) and, if one occurs, close and reopen. (Calibrate the delay needed to work around freezes without interrupting normal traffic by trial and error.) It's unpleasant to hack a fix without understanding the problem, I know, but being pragmatic about such things is a necessary survival trait in the world of writing, deploying, and operating actual server systems. Be sure to comment the workaround accurately for future maintainers!
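A minimal sketch of that workaround, assuming a plain SimpleXMLRPCServer on Python 2.x; the port and registered function are placeholders. With a default socket timeout in place, a recv() that would otherwise hang forever raises socket.timeout after 10 seconds; the SocketServer machinery logs the error, closes that connection, and the server goes back to accepting new requests.

import socket
from SimpleXMLRPCServer import SimpleXMLRPCServer  # Python 2.x module name

socket.setdefaulttimeout(10)  # well above the ~2 s gap between client requests

server = SimpleXMLRPCServer(("0.0.0.0", 8000), logRequests=False)
server.register_function(pow)  # placeholder method
server.serve_forever()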
Thanks so much for the fast response. Right after I received it, I increased the timeout to 10 seconds. It's all running without problems now, but of course I'll need to wait another day or two for some sort of confirmation; only after 5 days will I be sure, and then I'll come back with the results. I see that 140K requests have already gone through fine, but after such a hard time with this one I'd rather wait for at least another 200K.
What you proposed about auto-adaptation of the timeout (without taking the system down) also sounds reasonable. Would the right way to go be to create a small class (e.g. AutoTimeoutCalibrator) and embed it directly into serial.py?
Yes, being pragmatic is the only way without losing another 10 days trying to figure out the real reason behind it.
Thanks again, I'll be back with the results.
(Sorry, but for some reason I was not able to post this as a reply to your post.)
