Is GAE throttling me? - python

I have a Google App Engine HTTP resource that takes 20 seconds to respond. The resource does a calculation requiring very little bandwidth and no storage access. Billing is not enabled. My desktop application spawns 100 threads to POST 500 times (each thread POSTs about 5 times on average). I believe that 500 POSTs use up just a little more than the free quota for non-billing accounts, which is 6.5 CPU hours per 24-hour period. I might be about 10 POSTs over the limit, because towards the end about 10 of the 500 will fail even if I allow each request to retry twice.
In any event, the fact that I'm a little over the limit probably does not affect the problem that prompted my question. My question is this: the dashboard measurement "CPU seconds used per second" is about 17. I would like it to be 100, because, after all, I have 100 threads.
I'm not really good with Firebug or other monitoring tools, so I have not proven that there is a peak of 100 outstanding requests on the wire side of the Python standard library web methods, but I do print "hey" to the desktop console when there are 100 outstanding threads. It says "hey" fairly early, so I think the number of CPU seconds per second should be a lot closer to 100 than 17. Is my problem on the desktop, or is GAE throttling me, and how can I get 100 CPU seconds per second? How can I get somebody at Google to help with this question? I think their "support" link just goes to community-style support.

Search the groups for 1000ms. Your app will not be given as many resources if your user requests do not return in less than 1000 ms. You might also face additional issues with requests that take 20 seconds: I believe that if your requests sit in the pending queue, the wait counts against the run time, increasing the likelihood that you will get deadline/timeout errors.
You should look into breaking your code up and doing the processing in the task queue, or submitting more requests with less work per request.
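For instance, here is a minimal sketch of pushing the processing onto the task queue with the deferred library; do_chunk and the way the work is split into chunks are hypothetical placeholders for however the 20-second calculation can be broken up:

from google.appengine.ext import deferred

def do_chunk(chunk):
    # Hypothetical placeholder: perform one small slice of the long calculation.
    pass

def handle_post(chunks):
    # Instead of spending 20 seconds inside the user-facing request,
    # enqueue one short task per slice and return immediately.
    for chunk in chunks:
        deferred.defer(do_chunk, chunk)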

Related

Python Urllib UrlOpen Read

Say I am retrieving a list of URLs from a server using the urllib2 library in Python. I noticed that it takes about 5 seconds to get one page, and it would take a long time to finish all the pages I want to collect.
I am thinking that, of those 5 seconds, most of the time is consumed on the server side, and I am wondering whether I could just start using the threading library. Say 5 threads in this case; then the average time per page could drop dramatically, maybe to 1 or 2 seconds (it might make the server a bit busy). How can I optimize the number of threads so that I get a decent speed without pushing the server too hard?
Thanks!
Updated:
I increased the number of threads one by one and monitored the total time (in minutes) spent scraping 100 URLs. It turned out that the total time dropped dramatically when I changed the number of threads to 2, and kept decreasing as I increased the number of threads, but the improvement from threading became less and less obvious (the total time even bounced back when I created too many threads).
I know this is only a specific case for the web server that I harvest, but I decided to share it just to show the power of threading and hope it will be helpful for somebody one day.
There are a few things you can do. If the URLs are on different domains, then you might just fan out the work to threads, each downloading a page from a different domain.
If your URLs all point to the same server and you do not want to stress it, then you can retrieve the URLs sequentially. If the server is happy with a couple of parallel requests, then you can look into pools of workers. You could start, say, a pool of four workers and add all your URLs to a queue, from which the workers pull new URLs.
Since you tagged the question with "screen-scraping" as well, scrapy is a dedicated scraping framework, which can work in parallel.
Python 3 comes with a set of new builtin concurrency primitives under concurrent.futures.
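As a rough sketch of the worker-pool idea using concurrent.futures (the four-worker figure and the placeholder URLs are just illustrative assumptions):

import urllib.request
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    # Each worker downloads one page; real code would add error handling.
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()

urls = ["http://example.com/a", "http://example.com/b"]  # placeholder URLs
with ThreadPoolExecutor(max_workers=4) as pool:  # the pool of four workers mentioned above
    pages = list(pool.map(fetch, urls))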
Here is a caveat. I have encountered a number of servers powered by somewhat "elderly" releases of IIS. They often will not service a request if there is not a one second delay between requests.

Using Google App Engine for Website Load Testing

Azure, Amazon and other instance-based cloud providers can be used to carry out website load tests (by spinning up numerous instances running programs that send requests to a set of URLs), and I was wondering if I would be able to do this with Google App Engine.
So far, however, it seems this is not the case. The only implementation I can think of at the moment is setting up the maximum number of cron jobs, each executing at the highest frequency, with each task requesting a bunch of URLs and at the same time adding further tasks to the task queue.
According to my calculations this is only enough to fire off a maximum of 25 concurrent requests (an application can have at most 20 cron tasks, each executing no more frequently than once a minute, and the default queue has a throughput rate of 5 task invocations per second).
Any ideas if there is a way I could have more concurrent requests fetching URLs in an automated way?
The taskqueue API allows 100 task invocations per second per queue with the following max active queues quota:
Free: 10 active queues (not including the default queue)
Billing: 100 active queues (not including the default queue)
With a single UrlFetch per task, multiplying [max number of active queues] * [max number of task invocations per second] * [60 seconds], you can reach these nominal UrlFetch call rates:
Free: 11 * 100 * 60 = 66,000 UrlFetch calls/minute (11 = 10 queues plus the default queue)
Billing: 101 * 100 * 60 = 606,000 UrlFetch calls/minute (101 = 100 queues plus the default queue)
These rates are limited by the allowed UrlFetch calls per minute quota:
Free: 3,000 calls/minute
Billing: 32,000 calls/minute
As you can see, the Taskqueue + UrlFetch APIs can be used effectively to suit your load-testing needs.
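As a rough sketch of that approach (the /loadtest-worker route, the target parameter, and the queue names are assumptions, not anything App Engine prescribes):

import webapp2
from google.appengine.api import taskqueue, urlfetch

class LoadTestWorker(webapp2.RequestHandler):
    # Each task invocation issues one request against the site under test.
    def post(self):
        target = self.request.get('target')
        urlfetch.fetch(target, deadline=10)

def enqueue_hits(target, count, queue_name='load-queue-1'):
    # Fan the hits out over one of the active queues; repeat per queue
    # to approach the per-minute UrlFetch quota.
    for _ in range(count):
        taskqueue.add(url='/loadtest-worker',
                      params={'target': target},
                      queue_name=queue_name)

app = webapp2.WSGIApplication([('/loadtest-worker', LoadTestWorker)])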
Load testing against a public URL may not be as accurate as getting boxes attached directly to the same switch as your target server. There are so many uncontrollable network effects.
Depending on your exact circumstances, I would recommend borrowing a few desktop boxes for the purpose and using them. Any half-decent machine should be able to generate two to three thousand calls a minute.
That said, it really depends on the target scale you wish to achieve.

How do Google App Engine Task Queues work?

I'm confused about task execution using queues. I've read the documentation and I thought I understood bucket_size and rate, but when I send 20 tasks to a queue set to 5/h with bucket_size 5, all 20 tasks execute one after the other as quickly as possible, finishing in less than a minute.
deferred.defer(spam.cookEggs, egg_keys, _queue="tortoise")

- name: tortoise
  rate: 5/h
  bucket_size: 5
What I want is that, whether I create 10 or 100 tasks, only 5 of them run per hour, so 20 tasks would take approximately 4 hours to complete. I want their execution spread out.
UPDATE
The problem was that I assumed task execution rate rules were followed when running locally, but that is not the case. You cannot test execution rates locally. When I deployed to production, the rate and bucket size I had set behaved as I expected.
Execution rates are not honored by the dev_appserver. This issue should not occur in production.
[Answer discovered by Nick Johnson and/or question author; posting here as community wiki so we have something that can get marked accepted]
You want to set bucket_size to 1, or else you'll have "bursts" of queued activity like you saw there.
From the documentation:
bucket_size
Limits the burstiness of the queue's processing, i.e. a higher bucket size allows bigger spikes in the queue's execution rate. For example, consider a queue with a rate of 5/s and a bucket size of 10. If that queue has been inactive for some time (allowing its "token bucket" to fill up), and 20 tasks are suddenly enqueued, it will be allowed to execute 10 tasks immediately. But in the following second, only 5 more tasks will be able to be executed because the token bucket has been depleted and is refilling at the specified rate of 5/s.
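So, keeping the rate from the question, a queue definition along these lines should spread execution out instead of bursting (a sketch of the queue.yaml entry, not a tested configuration):

- name: tortoise
  rate: 5/h
  bucket_size: 1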

SimpleDB query performance improvement using boto

I am trying to use SimpleDB in the following way.
I want to keep 48 hours' worth of data in SimpleDB at any time and query it for different purposes.
Each domain holds 1 hour's worth of data, so at any time there are 48 domains present in SimpleDB.
As new data is constantly uploaded, I delete the oldest domain and create a new domain for each new hour.
Each domain is about 50 MB in size; the total size of all the domains is around 2.2 GB.
Each item in a domain has the following kinds of attributes:
identifier - around 50 characters long -- 1 per item
timestamp - timestamp value -- 1 per item
serial_n_data - 500-1000 bytes data -- 200 per item
I'm using the Python boto library to upload and query the data.
I send 1 item per second, each with around 200 attributes, into the current domain.
For one application of this data, I need to get all the data from all 48 domains.
The query looks like "SELECT * FROM domain", run against each of the domains.
I use 8 threads to query the data, with each thread taking responsibility for a few domains,
e.g. domains 1-6 for thread 1,
domains 7-12 for thread 2, and so on.
It takes close to 13 minutes to get the entire data set. I am using boto's select method for this. I need much faster performance than that. Any suggestions for speeding up the querying process? Is there another language I could use that would speed things up?
Use more threads
I would suggest inverting your threads/domain ratio from 1/6 to something closer to 30/1. Most of the time taken to pull down large chunks of data from SimpleDB is going to be spent waiting. In this situation upping the thread count will vastly improve your throughput.
One of the limits of SimpleDB is the query response size cap at 1MB. This means pulling down the 50MB in a single domain will take a minimum of 50 Selects (the original + 49 additional pages). These must occur sequentially because the NextToken from the current response is needed for the next request. If each Select takes 2+ seconds (not uncommon with large responses and high request volume) you spend 2 minutes on each domain. If every thread has to iterate through each of 6 domains in turn, that's about 12 minutes right there. One thread per domain should cut that down to about 2 minutes easily.
But you should be able to do much better than that. SimpleDB is optimized for concurrency. I would try 30 threads per domain, giving each thread a portion of the hour to query on, since it is log data after all. For example:
SELECT * FROM domain WHERE timestamp between '12:00' and '12:02'
(Obviously, you'd use real timestamp values) All 30 queries can be kicked off without waiting for any responses. In this way you still need to make at least 50 queries per domain, but instead of making them all sequentially you can get a lot more concurrency. You will have to test for yourself how many threads gives you the best throughput. I would encourage you to try up to 60 per domain, breaking the Select conditions down to one minute increments. If it works for you then you will have fully parallel queries and most likely have eliminated all follow up pages. If you get 503 ServiceUnavailable errors, scale back the threads.
The domain is the basic unit of scalability for SimpleDB, so it is good that you have a convenient way to partition your data. You just need to take advantage of the concurrency. Rather than 13 minutes, I wouldn't be surprised if you were able to get the data in 13 seconds for an app running on EC2 in the same region. But the actual time it takes will depend on a number of other factors.
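A rough sketch of that pattern with boto and plain threads follows; the domain name, the timestamp format, and the two-minute slice boundaries are assumptions you would adapt to your data:

import threading
import boto

def query_slice(domain_name, start, end, out, idx):
    # Each thread pulls one time slice of the hour; boto's select() follows
    # the NextToken pages for us as we iterate over the result set.
    sdb = boto.connect_sdb()  # one connection per thread
    domain = sdb.get_domain(domain_name)
    query = ("select * from `%s` where timestamp between '%s' and '%s'"
             % (domain_name, start, end))
    out[idx] = list(domain.select(query))

domain_name = 'logs-hour-12'  # hypothetical domain name
out = [None] * 30
threads = []
for i in range(30):  # 30 threads per domain, two-minute slices
    start, end = '12:%02d' % (2 * i), '12:%02d' % (2 * i + 2)  # boundary handling simplified
    t = threading.Thread(target=query_slice,
                         args=(domain_name, start, end, out, i))
    threads.append(t)
    t.start()
for t in threads:
    t.join()
items = [item for chunk in out for item in chunk]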
Cost Concerns
As a side note, I should mention the costs of what you are doing, even though you haven't raised the issue. CreateDomain and DeleteDomain are heavyweight operations. Normally I wouldn't advise using them so often. You are charged about 25 seconds of box usage each time, so creating and deleting one each hour adds up to about $70 per month just for domain management. You can store orders of magnitude more data in a domain than the 50 MB you mention, so you might want to let the data accumulate longer before you delete. If your queries include the timestamp (or could be made to include it), query performance may not be hurt at all by having an extra GB of old data in the domain. In any case, GetAttributes and PutAttributes never suffer a performance hit from a large domain size; it is only queries that don't make good use of a selective index that do. You'd have to test your queries to see. That is just a suggestion; I realize that the create/delete approach is cleaner conceptually.
Also, writing 200 attributes at a time is expensive, due to a quirk in the box usage formula. The box usage for writes is proportional to the number of attributes raised to the power of 3! The formula, in hours, is:
0.0000219907 + 0.0000000002 N^3
For the base charge plus the per-attribute charge, where N is the number of attributes. In your situation, if you write all 200 attributes in a single request, the box usage charges will be about $250 per million items ($470 per million if you write 256 attributes). If you break each request into 4 requests with 50 attributes each, you will quadruple your PutAttributes volume, but reduce the box usage charges by an order of magnitude to about $28 per million items. If you are able to break the requests down, then it may be worth doing. If you cannot (due to request volume, or just the nature of your app), it means that SimpleDB can end up being extremely unappealing from a cost standpoint.
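A quick sanity check of those figures, assuming about $0.154 per box-usage hour (the rate the dollar amounts above imply):

BOX_USAGE_PER_HOUR = 0.154  # assumed price implied by the figures above

def put_cost_per_million_items(attrs_per_request, requests_per_item):
    # Box usage per PutAttributes call, in hours, from the formula above.
    hours = 0.0000219907 + 0.0000000002 * attrs_per_request ** 3
    return hours * requests_per_item * 1000000 * BOX_USAGE_PER_HOUR

print(put_cost_per_million_items(200, 1))  # one 200-attribute write per item: ~ $250
print(put_cost_per_million_items(50, 4))   # four 50-attribute writes per item: ~ $29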
I have had the same issue as you, Charlie. After profiling the code, I narrowed the performance problem down to SSL. It seems like that is where it spends most of its time, and hence CPU cycles.
I have read of a problem in the httplib library (which boto uses for SSL) where the performance doesn't increase unless the packets are over a certain size, though that was for Python 2.5 and may have already been fixed.
SDB Explorer uses multithreaded BatchPutAttributes to achieve high write throughput while uploading bulk data to Amazon SimpleDB. It allows multiple parallel uploads. If you have the bandwidth, you can take full advantage of it by running a number of BatchPutAttributes processes at once in a parallel queue, which reduces the time spent in processing.
http://www.sdbexplorer.com/

Using Task Queues to schedule the fetching/parsing of a number of feeds in App Engine (Python)

Say I had over 10,000 feeds that I wanted to periodically fetch/parse.
If the period were, say, 1 hour, that would be 24 x 10,000 = 240,000 fetches.
The current 10k limit of the labs Task Queue API would preclude one from setting up one task per fetch. How then would one do this?
Update, re: fetching n URLs per task: given the 30-second timeout per request, at some point this would hit a ceiling. Is there any way to parallelize it, so that each task initiates a bunch of async parallel fetches, each of which takes less than 30 seconds to finish even though the lot together may take more than that?
Here's the asynchronous urlfetch API:
http://code.google.com/appengine/docs/python/urlfetch/asynchronousrequests.html
Set off a bunch of requests with a reasonable deadline (give yourself some headroom under your timeout, so that if one request times out you still have time to process the others), then wait on each one in turn and process them as they complete.
I haven't used this technique myself in GAE, so you're on your own finding any non-obvious gotchas. Sadly there doesn't seem to be a select() style call in the API to wait for the first of several requests to complete.
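A minimal sketch of that pattern with the asynchronous urlfetch API (the 20-second deadline, the placeholder feed URLs, and the parse_feed helper are assumptions for illustration):

from google.appengine.api import urlfetch

feed_urls = ['http://example.com/feed1.xml', 'http://example.com/feed2.xml']  # placeholders

# Kick off all fetches without waiting for any responses.
rpcs = []
for url in feed_urls:
    rpc = urlfetch.create_rpc(deadline=20)  # headroom under the 30-second request limit
    urlfetch.make_fetch_call(rpc, url)
    rpcs.append((url, rpc))

# Wait on each RPC in turn and process the results as they complete.
for url, rpc in rpcs:
    try:
        result = rpc.get_result()
        parse_feed(url, result.content)  # parse_feed is a hypothetical parser
    except urlfetch.Error:
        pass  # a real task would log and perhaps retry this feed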
2 fetches per task? 3?
Group up the fetches, so instead of queuing 1 fetch you queue up, say, a work unit that does 10 fetches.
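For example, a sketch of that batching with the deferred library (the batch size of 10 and the fetch_batch helper are assumptions):

from google.appengine.api import urlfetch
from google.appengine.ext import deferred

def fetch_batch(urls):
    # One task fetches a whole batch of feeds instead of a single one.
    for url in urls:
        try:
            urlfetch.fetch(url, deadline=10)
        except urlfetch.Error:
            pass  # a real worker would log and retry

def schedule(all_urls, batch_size=10):
    # 10,000 feeds become 1,000 tasks, well under the task count limit.
    for i in range(0, len(all_urls), batch_size):
        deferred.defer(fetch_batch, all_urls[i:i + batch_size])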
