HTTP 503 error occurs when I use a script, but not manually - python

I wrote a script in Python that sends requests to Google and does something with the results.
I used a google library that I found somewhere on the Internet :)
After many requests (approx. 50) Google blocks further requests (503 error), despite a time.sleep(2) in every iteration.
The google library mentioned above uses urllib and BeautifulSoup. Do you know what could cause these errors, or do you know of a better library?
Tomek
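
Not from the original thread, but a minimal sketch of the kind of throttling that sometimes helps here: backing off exponentially (with a little random jitter) whenever Google answers 503, instead of a fixed time.sleep(2). The search URL and the result parsing are left out; fetch_with_backoff is just an illustrative name.

    import random
    import time
    import urllib.error
    import urllib.request

    def fetch_with_backoff(url, max_retries=5):
        """Fetch a URL, backing off exponentially when the server answers 503."""
        delay = 2.0
        for attempt in range(max_retries):
            try:
                req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
                return urllib.request.urlopen(req).read()
            except urllib.error.HTTPError as e:
                if e.code != 503:
                    raise
                # Blocked: wait progressively longer, with jitter, before retrying.
                time.sleep(delay + random.uniform(0, 1))
                delay *= 2
        raise RuntimeError("still blocked after %d retries" % max_retries)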

Related

translate.google.com requests limit from Python requests module?

I have a small script making requests to http://translate.google.com/m?hl=%s&sl=%s&q=%s as my base_link. However, I seem to have exceeded the request limit:
HTTPError: HTTP Error 429: Too Many Requests
I am looking for the limit for Google Translate. The answers out there point to the Google Translate API limits, not the limit when accessing the site directly from Python's requests module.
Let me know if I can provide more information.
The request you are submitting via the URL http://translate.google.com/m?hl=%s&sl=%s&q=%s uses the free service at http://translate.google.com. The limits you have been seeing online relate to the Google Translate API, which is part of Google Cloud Platform. These two are entirely different things.
Currently, there is no public document describing the limits of googletrans. However, Google will block your IP address once it detects that you are exploiting the free translation service by submitting a huge number of requests, resulting in:
HTTPError: HTTP Error 429: Too Many Requests
You can instead use Google Cloud Platform's Translation API, which is a paid service. With it you avoid the low limits imposed on googletrans and get far more flexibility in the number of requests you can submit.
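
Not part of the original answer, but a minimal sketch of calling the paid Cloud Translation API from Python, assuming the google-cloud-translate package is installed and GOOGLE_APPLICATION_CREDENTIALS points at a service-account key:

    # pip install google-cloud-translate
    from google.cloud import translate_v2 as translate

    # Credentials are read from GOOGLE_APPLICATION_CREDENTIALS.
    client = translate.Client()

    # Translate a short string into Spanish; the result includes the
    # translated text and the detected source language.
    result = client.translate("Hello, world", target_language="es")
    print(result["translatedText"], result["detectedSourceLanguage"])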
According to this answer, the content quota on the API is 6,000,000 characters per 60 seconds, per project and user. It is therefore safe to assume that Google protects the private endpoint with a much lower limit. Google will not publish numbers for the private endpoint, because it is only meant to be used by the website itself, so they have no reason to document it. If you need a more specific answer, I suggest finding more accurate API limits and assuming the private limit is much lower than whatever that number is.
I'm not sure, but according to this source it seems to be 500,000 characters for free users.

fastest way to interact with a website in python

I know we can use Selenium with ChromeDriver as a high-level website interaction tool, and that we can speed it up with PhantomJS. Then we can use the requests module to be even faster. But that's where my knowledge stops.
What's the fastest possible way in Python to make POST and GET requests? I assume there is a lower-level library than requests? Do we use sockets and packets?
'To execute the request as fast as possible'
If Python's requests library is the fastest, are there other libraries in other programming languages, such as C++, that are worth a look?
It really depends on the task: for scraping 1,000 pages it's fine, but when you need to requests.post 1,000,000+ times it adds up. I've also looked into the multiprocessing library. It helps a lot by using all the computational resources I have, but traversing the network and waiting for the response is what takes the longest. I would have thought the best way to increase speed is to send and receive less data: say, receive only 5 input parameters, send only those 5 back as a POST, and wait for a 200 response. Any ideas how I can do this without receiving all of the source code?
Thanks!
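
Not from the original thread, but since waiting on the network is what dominates, a common approach is to overlap many requests with asyncio rather than look for a "faster" library. A minimal sketch using aiohttp (the URL and payloads are placeholders):

    # pip install aiohttp
    import asyncio
    import aiohttp

    async def post_one(session, url, payload):
        """POST one payload and return only the status code (body is ignored)."""
        async with session.post(url, data=payload) as resp:
            return resp.status

    async def main():
        url = "https://example.com/endpoint"  # placeholder
        payloads = [{"id": i} for i in range(100)]
        # One session reuses TCP connections instead of reconnecting per request.
        async with aiohttp.ClientSession() as session:
            statuses = await asyncio.gather(*(post_one(session, url, p) for p in payloads))
        print(statuses.count(200), "requests returned 200")

    asyncio.run(main())

Reusing a single session (connection pooling) and ignoring response bodies you don't need usually buys more than switching HTTP libraries.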

Dryscrape (py): 'Operation on socket not supported'

Because of the huge hassle of finding a good scraping solution for Python, I'm using Dryscrape. I can't seem to get it to work consistently through a proxy, however. Some sites cause it to throw the following:
InvalidResponseError: Error while loading URL
https://apis.google.com/js/plusone.js: Operation on socket is not
supported (error code 99)
I guess it's some kind of proxy protection, but I'm not breaking any TOS or anything. Only some sites do this, but the whole project relies on looking something up on the site daily. Does anyone have a solution?
It's really hard to tell without any code or knowing what you are trying to accomplish. But if you are trying to scrape a lot of pages at once, try throttling back the number of concurrent connections to your proxy. Does it occur on the same page(s) on each attempt?
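
Not part of the original answer, but one simple way to cap concurrent connections when fetching many pages from threads is a semaphore. A minimal sketch; fetch() stands in for whatever actually loads the page (for example a Dryscrape visit), and the URLs are placeholders:

    import threading

    MAX_CONCURRENT = 3  # tune down until the proxy stops complaining
    slots = threading.BoundedSemaphore(MAX_CONCURRENT)

    def fetch(url):
        """Placeholder for the real page load."""
        ...

    def fetch_throttled(url):
        # At most MAX_CONCURRENT threads are inside fetch() at any moment.
        with slots:
            return fetch(url)

    urls = ["https://example.com/page/%d" % i for i in range(20)]  # placeholders
    threads = [threading.Thread(target=fetch_throttled, args=(u,)) for u in urls]
    for t in threads:
        t.start()
    for t in threads:
        t.join()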

Python GData lib causes too many redirects

I'm using the Python GData library to work with Google Calendar. I'm making really simple requests (e.g. creating an event), using OAuth authorization.
Usually this works OK, but sometimes I receive lots of 302 redirects, which leads to a "Maximum redirects count reached" exception.
If I retry the same request, it usually works correctly.
I can't figure out why this is happening; it looks like a random event.
As a workaround I wrote code that retries the request a few times when this error occurs, but maybe there is an explanation for this behavior, or even a way to avoid it?
Answer from Google support forum:
This might happen due to issues on the Calendar servers and is not an error on your part. The best way to "resolve" this issue is simply to retry.
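
Not from the original answer, but a minimal sketch of the retry wrapper the asker describes; RedirectError and insert_event() are hypothetical stand-ins for the real GData call and its redirect exception:

    import time

    class RedirectError(Exception):
        """Stand-in for the 'Maximum redirects count reached' error."""

    def insert_event(calendar_client, event):
        """Hypothetical call into the GData calendar client."""
        ...

    def insert_event_with_retry(calendar_client, event, attempts=3, delay=1.0):
        """Retry a flaky calendar request a few times before giving up."""
        for attempt in range(attempts):
            try:
                return insert_event(calendar_client, event)
            except RedirectError:
                if attempt == attempts - 1:
                    raise          # out of retries, let the caller see the error
                time.sleep(delay)  # brief pause before trying again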

Google OAuth2 callbacks call me without parameters

I have been running a service for some months now using Google's OAuth2 authentication. Most of the time everything works well, but there are occasional issues with callbacks coming back empty from Google: something along the lines of 1 out of 15 callbacks arrives at my service completely without parameters in the GET request, just a naked /my-callback-url request, no parameters at all.
I'm having quite some difficulty explaining this behaviour, and I can't find many references to it when searching the net.
I'm also so far unable to re-create this phenomenon in my own development environment, so my ideas for a solution have had to be mostly speculation. My first hunch at a quick-and-dirty workaround was to re-generate the OAuth request URL and return a 302 redirect response so Google can have another go. But that risks creating an infinite redirect loop if it turns out that the problem originates in my own code. I would very much prefer to understand what's actually going on.
Do any of you have experience with 'empty' OAuth2 callbacks from Google? And in that case, what would be the most sensible way of handling them? Or is there a typical mistake when generating the authentication URLs that causes this behaviour? (I'm using Python and Requests-OAuthlib to handle my OAuth2 interaction.)
I suspect that these requests are not redirects back from Google. There are crawlers and other hackers trying to hit every endpoint that they find on the web. So these could be just abusive requests.
If you can correlate a request with empty parameters with a request that your server redirected to Google (based on the IP address or a cookie you set before redirecting), then we can try to investigate further.
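
Not from the original answer, but a minimal sketch of the correlation idea, here using Flask (Flask, the cookie name, and the routes are assumptions, not the asker's actual stack): set a random state cookie before redirecting to Google, then check for it on the callback so requests without it can be treated as crawler noise.

    # pip install flask
    import secrets
    from flask import Flask, request, redirect, make_response

    app = Flask(__name__)

    @app.route("/login")
    def login():
        # Remember a random token in a cookie before sending the user to Google.
        state = secrets.token_urlsafe(16)
        auth_url = "https://accounts.google.com/o/oauth2/v2/auth?state=" + state  # other params omitted
        resp = make_response(redirect(auth_url))
        resp.set_cookie("oauth_state", state, httponly=True)
        return resp

    @app.route("/my-callback-url")
    def callback():
        # A genuine redirect from Google carries both our cookie and the matching state parameter.
        if request.cookies.get("oauth_state") and request.cookies["oauth_state"] == request.args.get("state"):
            return "real callback, continue the token exchange"
        # No cookie / no parameters: likely a crawler or an unrelated hit, so ignore it.
        return ("", 204)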
