I'm using jira-python to automate a bunch of tasks in Jira. One thing that I find weird is that jira-python takes a long time to run. It seems like it's loading or something before sending the requests. I'm new to python, so I'm a little confused as to what's actually going on. Before finding jira-python, I was sending requests to the Jira REST API using the requests library, and it was blazing fast (and still is, if I compare the two). Whenever I run the scripts that use jira-python, there's a good 15 second delay while 'loading' the library, and sometimes also a good 10-15 second delay sending each request.
Is there something I'm missing with Python that could be causing this issue? Is there any way to keep a Python script running as a service so it doesn't need to 'load' the library each time it's run?
@ThePavoIC, you seem to be correct. I notice MASSIVE changes in speed if Jira has been restarted and re-indexed recently. Scripts that would take a couple minutes to run would complete in seconds. Basically, you need to make sure Jira is tuned for performance and keep your indexes up to date.
I'm using the Nest API to poll the current temperature and related temperature data from two of my Nests (simultaneously).
I was initially polling this data every minute but started getting an error:
nest.nest.APIError: blocked
I don't get the error every minute, more like intermittently every 5-10 minutes.
Reading through their documentation it seems that while pulling data once per minute is permissible, it's the maximum recommended query frequency.
So I set it to two minutes. I'm still getting the error.
I'm using this Python package, although I'm starting to wonder if there's too much going on under the hood that is making unnecessary requests.
Has anyone had any experience with this type of Nest error, or this Python package before?
Does polling two Nests with the same authenticated call result in multiple requests, as it relates to their data limiting?
Should I just scrap this package and roll my own? (this is generally my preference, but I need to learn to stop always re-writing everything the moment I hit a snag like this in order to fully control and thoroughly understand each aspect of a particular integration, right?)
Is it possible to submit multiple sequences to the Bio.Blast.NCBIWWW module at the same time? I've tried to create a function that runs my blast and have several of them run using multiprocessing, but I think the NCBI server boots me after a while and the connection stops working.
I don't know what sort of limits NCBI has on their service, but you may want to look into installing BLAST locally and running your queries that way. Biopython has support for local BLAST: http://biopython.org/DIST/docs/tutorial/Tutorial.html#sec96
Here they detail how to properly use it:
http://www.ncbi.nlm.nih.gov/BLAST/Doc/node60.html
Do not launch more than 50 threads.
Wait for the RID of a BLAST before starting the next one.
Flooding the server can lead to many problems and eventually we may be forced to block access from sites which flood the servers with no warning. We strongly suggest you limit your scripts to not send a request until you receive a RID from the server. Alternatively please introduce a "sleep" command in your script which sends requests no more often than once every three seconds.
Biopython waits for the RID for you, but if you launch several queries at once you will almost certainly be banned.
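A minimal sketch of the "sleep" advice quoted above: submit queries one at a time and keep at least three seconds between the start of each request. The names `submit_throttled`, `submit`, and `min_interval` are hypothetical. `submit` stands for whatever function sends one sequence, for example a small wrapper around Biopython's NCBIWWW.qblast; it is passed in so the throttling logic stays independent of the network call.

```python
import time


def submit_throttled(sequences, submit, min_interval=3.0):
    """Call submit(seq) for each sequence, one at a time, spaced out in time.

    `submit` is a hypothetical callable supplied by the caller, e.g.
    lambda seq: NCBIWWW.qblast("blastn", "nt", seq)
    """
    results = []
    last_start = None
    for seq in sequences:
        if last_start is not None:
            # Sleep for whatever is left of the minimum interval.
            wait = min_interval - (time.monotonic() - last_start)
            if wait > 0:
                time.sleep(wait)
        last_start = time.monotonic()
        # The next request is only sent after this one has returned.
        results.append(submit(seq))
    return results
```

Because each call blocks until the previous one has returned, this also respects the advice to wait for one BLAST before starting the next, at the cost of running the queries sequentially rather than in parallel.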
Using web.py, I'm building a website in which I display search results from two third-party websites through their public APIs. Unfortunately, each API takes about 4 seconds to send back its result. If I query the second API only after I've received the answer from the first, this obviously takes about 8 seconds, which is way too long. To bring this down I want to send the requests to both APIs simultaneously and simply continue as soon as I've received an answer from both.
My problem is now: how to do this?
I've never worked with parallel computing, but I've heard of multiprocessing and threading. I don't really know what the difference or advantages of each are. I also know that for example C++ is able to do parallel computations. It could therefore also be an option to write the part that queries the APIs in C++ (I'm a beginner in C++, but I think I'd manage). Finally, there could of course be options that I am totally overlooking. Maybe web.py has some options to do this, or maybe there are Python modules which are specifically made to do this?
Since only researching and understanding all of these options would take me quite a lot of time, I thought I'd ask you guys here for some tips.
So which one do you think I should go for? And most importantly: why? All tips are welcome!
You want an asynchronous HTTP request library. Examples of this would be gevent, or grequests.
Alternatively, you could use Python's built-in threading module to run synchronous requests in multiple threads.
Either way, no need to go to another language.
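A minimal sketch of the thread-based option, assuming Python 3. It uses concurrent.futures, a standard-library wrapper around threads, rather than the threading module directly. The functions `query_first_api` and `query_second_api` are hypothetical stand-ins that just sleep; in a real handler they would make the blocking HTTP calls.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def query_first_api(term):
    # Hypothetical stand-in for the first slow third-party API call.
    time.sleep(0.3)
    return f"first:{term}"


def query_second_api(term):
    # Hypothetical stand-in for the second slow third-party API call.
    time.sleep(0.3)
    return f"second:{term}"


def search_both(term):
    """Run both queries at the same time and return once both have finished."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        first = pool.submit(query_first_api, term)
        second = pool.submit(query_second_api, term)
        # result() blocks until the corresponding call is done.
        return first.result(), second.result()


if __name__ == "__main__":
    print(search_both("python"))
```

Since both calls spend their time waiting on the network, the total wait should be roughly that of the slower call rather than the sum of the two.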
I've done some experiments using Apache Bench to profile my code response times, and it doesn't quite generate the right kind of data for me. I hope the good people here have ideas.
Specifically, I need a tool that
Does HTTP requests over the network (it doesn't need to do anything very fancy)
Records response times as accurately as possible (at least to a few milliseconds)
Writes the response time data to a file without further processing (or provides it to my code, if a library)
I know about ab -e, which prints data to a file. The problem is that this prints only the quantile data, which is useful, but not what I need. The ab -g option would work, except that it doesn't print sub-second data, meaning I don't have the resolution I need.
I wrote a few lines of Python to do it, but the httplib is horribly inefficient and so the results were useless. In general, I need better precision than pure Python is likely to provide. If anyone has suggestions for a library usable from Python, I'm all ears.
I need something that is high performance, repeatable, and reliable.
I know that half my responses are going to be along the lines of "internet latency makes that kind of detailed measurements meaningless." In my particular use case, this is not true. I need high resolution timing details. Something that actually used my HPET hardware would be awesome.
Throwing a bounty on here because of the low number of answers and views.
I have done this in two ways.
With "loadrunner", which is a wonderful but pretty expensive product (from HP these days, I think).
With a combination of perl/php and the Curl package. I found the CURL API slightly easier to use from php. It's pretty easy to roll your own GET and PUT requests. I would also recommend manually running through some sample requests with Firefox and the LiveHttpHeaders add-on to capture the exact format of the HTTP requests you need.
JMeter is pretty handy. It has a GUI from which you can set up your requests and threadpools and it also can be run from the command line.
If you can code in Java, you can look at the combination of JUnitPerf + HttpUnit.
The downside is that you will have to do more things yourself. In exchange, you get unlimited flexibility and arguably more precision than with GUI tools, not to mention HTML parsing, JavaScript execution, etc.
There's also another project called Grinder which seems to be purposed for a similar task but I don't have any experience with it.
A good reference for open-source performance testing tools: http://www.opensourcetesting.org/performance.php
You will find descriptions and a "most popular" list.
httperf is very powerful.
I've used a script to drive 10 boxes on the same switch to generate load by "replaying" requests to 1 server. I had my web app logging response time (server only) to the granularity I needed, but I didn't care about the response time to the client. I'm not sure you care to include the trip to and from the client in your calculations, but if you did it shouldn't be too difficult to code up. I then processed my log with a script which extracted the times per URL and did scatter plot graphs, and trend graphs based on load.
This satisfied my requirements which were:
Real world distribution of calls to different urls.
Trending performance based on load.
Not influencing the web app by running other intensive ops on the same box.
I wrote the controller as a shell script that, for each server, started a background process to loop over all the URLs in a file, calling curl on each one. I wrote the log processor in Perl since I was doing more Perl at that time.
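The general shape of "loop over the URLs, record one raw timing per request, post-process later" can be sketched in Python as below. This is only an illustration of the logging format, not a claim about achievable precision: timing from inside Python includes interpreter overhead. The names `record_timings`, `fetch`, and `out_path` are hypothetical; `fetch` is whatever callable performs one request.

```python
import csv
import time


def record_timings(fetch, urls, out_path):
    """Call fetch(url) for each URL and write one unprocessed timing per row."""
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["url", "elapsed_ns"])
        for url in urls:
            # perf_counter_ns uses the highest-resolution clock available.
            start = time.perf_counter_ns()
            fetch(url)
            elapsed = time.perf_counter_ns() - start
            writer.writerow([url, elapsed])
```

The output file holds the raw per-request values in nanoseconds, so quantiles, scatter plots, or trend graphs can be computed afterwards by a separate script.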
I'm serving requests from several XMLRPC clients over WAN. The thing works great for, let's say, a period of one day (sometimes two), then freezes in socket.py:
data = self._sock.recv(self._rbufsize)
_sock.timeout is -1, _sock.gettimeout is None
There is nothing special I do in the main thread (just receiving XMLRPC calls); there are another two threads talking to the DB. Both these threads work fine and survive this block (I checked with WinPdb). Clients send requests no longer than 1 KB, and there isn't any special content: just nice and clean strings in a dictionary. Between two blockings I serve tens of thousands of requests without problems.
Firewall is off, no strange software on the same machine, etc...
I use Windows XP and Python 2.6.4. I've checked the differences between 2.6.4 and 2.6.5, and didn't find anything important (or am I mistaken?). Version 2.7 is not an option as I would miss binaries for MySqlDB.
The only thing that happens from time to time, caused by clients that have a poor internet connection, is that sockets break. This happens every 5-10 minutes (there are just five clients accessing the server every 2 seconds).
I've spent a great deal of time on this issue and I'm running out of ideas. Any hint or thought would be highly appreciated.
What exactly is happening in your OS's TCP/IP stack (possibly in the python layers on top, but that's less likely) to cause this is a mystery. As a practical workaround, I'd set a timeout longer than the delays you expect between requests (10 seconds should be plenty if you expect a request every 2 seconds) and if one occurs, close and reopen. (Calibrate the delay needed to work around freezes without interrupting normal traffic by trial and error). Unpleasant to hack a fix w/o understanding the problem, I know, but being pragmatical about such things is a necessary survival trait in the world of writing, deploying and operating actual server systems. Be sure to comment the workaround accurately for future maintainers!
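A minimal sketch of the timeout workaround, with two loud assumptions: it targets Python 3's xmlrpc.server rather than the Python 2.6 setup in the question, and it assumes the server is built on SimpleXMLRPCServer. The names `TimeoutRequestHandler`, `make_server`, and the `echo` function are hypothetical. The `timeout` class attribute itself is standard socketserver behaviour: it is applied to each accepted connection, so a recv() that stalls raises a timeout and the connection is closed instead of blocking forever.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCRequestHandler, SimpleXMLRPCServer


class TimeoutRequestHandler(SimpleXMLRPCRequestHandler):
    # Workaround for connections that stall in recv(): give up after this
    # many seconds. 10 is the value suggested above; calibrate as needed.
    timeout = 10.0


def make_server(host="127.0.0.1", port=0):
    """Build an XML-RPC server whose client connections time out."""
    server = SimpleXMLRPCServer(
        (host, port), requestHandler=TimeoutRequestHandler, logRequests=False
    )
    # Hypothetical example function so the server has something to serve.
    server.register_function(lambda text: text, "echo")
    return server


if __name__ == "__main__":
    server = make_server()
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    print(ServerProxy(f"http://{host}:{port}").echo("ping"))
    server.shutdown()
    server.server_close()
```

Clients whose connection was dropped this way simply reconnect on their next call, which matches the "close and reopen" advice.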
Thanks so much for the fast response. Right after I received it, I increased the timeout to 10 seconds. Now it is all running without problems, but of course I will need to wait another day or two for some sort of confirmation, and only after 5 days will I be sure and come back with the results. I see now that 140K requests have already gone through fine; given how hard this one has been, I would wait for at least another 200K.
What you were proposing about auto adaptation of timeouts (without putting the system down) sounds also reasonable. Would the right way to go be in creating a small class (e.g. AutoTimeoutCalibrator) and embedding it directly into serial.py?
Yes - being pragmatic is the only way to avoid losing another 10 days trying to figure out the real reason behind it.
Thanks again, I'll be back with the results.
(sorry, but for some reason I was not able to post it as a reply to your post)