I am using PyCurl to call the GitHub search API and extract some information. Here is the code snippet that calls the API:
from io import BytesIO
import pycurl

url = "https://api.github.com/search/code?q=import%2Bkeras+size:1..100+language:python&page=1&per_page=100"
output = BytesIO()

request = pycurl.Curl()
# access_token is a personal access token defined elsewhere
request.setopt(pycurl.HTTPHEADER, [f'Authorization: token {access_token}'])
request.setopt(pycurl.URL, url)
request.setopt(pycurl.WRITEDATA, output)
request.perform()
request.close()
The problem is that GitHub blocks my access token after just 3-4 requests, even though the GitHub documentation mentions a limit of 5,000 requests per hour.
I am using Python 3.8 and PyCurl 7.44.1.
Do you have any idea how to resolve this problem?
"The Search API has a custom rate limit, separate from the rate limit governing the rest of the REST API"
You can check your rate limit status like this:
curl \
-H "Accept: application/vnd.github.v3+json" \
https://api.github.com/rate_limit
GitHub has a different rate limit for search requests because they are substantially more expensive than a normal API call. You can query them at the endpoint https://api.github.com/rate_limit.
However, in your case, you're seeing a secondary rate limit, which means that something you're doing looks suspicious and you're getting blocked for that reason. The only way you can find out why that is would be to contact GitHub Support.
I will point out that it's a best practice to use a unique, identifying User-Agent header so that your traffic can be distinguished from other traffic. That may or may not help here, but libcurl is a very common user-agent, and since there will be a nontrivial number of people using it for abusive purposes, it's possible that your traffic got flagged by an automated system for that reason.
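For what it's worth, here is a rough PycURL sketch combining both suggestions: query /rate_limit and send a descriptive User-Agent. The User-Agent string is just a placeholder, and access_token is the token from your own snippet.

from io import BytesIO
import json
import pycurl

buffer = BytesIO()
curl = pycurl.Curl()
curl.setopt(pycurl.URL, "https://api.github.com/rate_limit")
curl.setopt(pycurl.WRITEDATA, buffer)
# access_token: the same personal access token as in the question
curl.setopt(pycurl.HTTPHEADER, [f"Authorization: token {access_token}"])
# A unique, identifying User-Agent instead of the bare libcurl default
curl.setopt(pycurl.USERAGENT, "keras-import-survey/1.0 (contact: you@example.com)")
curl.perform()
curl.close()

# The "search" entry shows the separate limit that applies to the Search API
limits = json.loads(buffer.getvalue().decode("utf-8"))
print(limits["resources"]["search"])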
Eventually, I added a sleep of 5 seconds between API requests and the problem was resolved.
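Roughly, the pacing looked like this (the 5-second pause and the page range are just what I used, not anything GitHub prescribes):

import time
from io import BytesIO
import pycurl

for page in range(1, 11):  # illustrative page range
    url = ("https://api.github.com/search/code"
           f"?q=import%2Bkeras+size:1..100+language:python&page={page}&per_page=100")
    output = BytesIO()
    request = pycurl.Curl()
    # access_token: personal access token defined elsewhere
    request.setopt(pycurl.HTTPHEADER, [f'Authorization: token {access_token}'])
    request.setopt(pycurl.URL, url)
    request.setopt(pycurl.WRITEDATA, output)
    request.perform()
    request.close()
    # process output.getvalue() here
    time.sleep(5)  # pause between search requests to avoid the secondary rate limit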
Thanks for your contributions.
Related
I have a small script making requests to http://translate.google.com/m?hl=%s&sl=%s&q=%s as my base_link; however, I seem to have surpassed the request limit:
HTTPError: HTTP Error 429: Too Many Requests
I am looking for what the limit is for Google Translate. The answers out there point to the Google Translate API limits, but not to the limit when accessing the site directly from Python's requests module.
Let me know if I can provide some more information.
The request you are submitting to the URL http://translate.google.com/m?hl=%s&sl=%s&q=%s uses the free service at http://translate.google.com. The limits you have been seeing online relate to the Google Translate API, which is under Google Cloud Platform. These two are totally different from each other.
Currently, there is no public document discussing the limits of this free service (the same one the googletrans library relies on). However, Google will block your IP address once it detects that you have been exploiting the free translation service by submitting huge numbers of requests, resulting in:
HTTPError: HTTP Error 429: Too Many Requests
You can use Google Cloud Platform's Translation API, which is a paid service. Using it, you avoid the low limits of the free endpoint and get more flexibility in the number of requests you can submit.
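If you go that route, a rough sketch with the official client library (google-cloud-translate, Basic/v2 edition, assuming credentials are already configured via GOOGLE_APPLICATION_CREDENTIALS):

# pip install google-cloud-translate
from google.cloud import translate_v2 as translate

client = translate.Client()  # picks up GOOGLE_APPLICATION_CREDENTIALS

result = client.translate("Hello, world!", target_language="de")
print(result["translatedText"])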
According to this answer, the content quota on the API is 6,000,000 characters per 60 seconds per project and user, so it is safe to assume that Google protects the private endpoint with a much lower limit. Google will not release the numbers for the private endpoint, because it is only supposed to be used by the website itself, so they aren't willing to give out any information about it. If you need a more specific answer, I suggest you try to find the current API limits and assume the private endpoint's limit is much lower than that.
I'm not sure, but according to this source it seems to be 500,000 characters for a free user.
I am using the googlesearch module for web scraping, but I got error 429. I tried uninstalling and reinstalling the module, but it didn't help. So my next idea is to delete cookies, but I don't know how. Can you help me, please?
from googlesearch import search

query = 'site:https://stackoverflow.com urllib.error.HTTPError: HTTP Error 429: Too Many Requests'
search_query = search(query=query, stop=10)
for url in search_query:
    print(url)
429 Too Many Requests
The HTTP 429 Too Many Requests response status code indicates that the user has sent too many requests in a given amount of time ("rate limiting"). The response representations SHOULD include details explaining the condition, and MAY include a Retry-After header indicating how long to wait before making a new request.
Note that this specification does not define how the origin server identifies the user, nor how it counts requests. For example, an origin server that is limiting request rates can do so based upon counts of requests on a per-resource basis, across the entire server, or even among a set of servers. Likewise, it might identify the user by its authentication credentials, or a stateful cookie.
You're sending too many requests in a short period of time. The Custom Search API might be useful depending on your use case. If not, you might have to use proxies for your calls, or implement a wait-and-retry mechanism.
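If you end up implementing the wait-and-retry option, one possible sketch (using the requests library directly and honouring the Retry-After header when the server sends one; the function name and backoff values are just illustrative):

import time
import requests

def get_with_retry(url, max_retries=5, backoff=10):
    """Retry a GET request when the server answers 429 Too Many Requests."""
    for attempt in range(max_retries):
        response = requests.get(url)
        if response.status_code != 429:
            return response
        retry_after = response.headers.get("Retry-After")
        # Prefer the server's own hint; otherwise back off a bit longer each time
        wait = int(retry_after) if retry_after and retry_after.isdigit() else backoff * (attempt + 1)
        time.sleep(wait)
    return response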
Here is the resolution of my problem: https://support.google.com/gsa/answer/4411411#requests
How to disable cookie handling with the Python requests library?
We're building a simple script using the Tweepy Python library. We need to get some simple information on all of the accounts following our Twitter account.
Using Tweepy's built-in cursors, this is quite easy, but we very rapidly hit the 15-request limit for the window. I've read that per-app (as opposed to per-user) authentication allows for 300 requests per minute, but I can't find anything in the Tweepy API that supports this.
Any pointers?
You should be able to use tweepy.AppAuthHandler in the same way that you're using OAuthHandler.
auth = tweepy.AppAuthHandler(consumer_key, consumer_secret)
api = tweepy.API(auth)
For some reason, this isn't documented, but you can take a look at the code yourself on GitHub.
It also depends what kinds of requests you're making. There are no resources with a 15 request user limit that have a 300 request app limit. You can consult this chart to determine the limits for user and app auth for each endpoint. In any case, using them in conjunction will at least double your requests.
I'm not sure what information about your followers you are after exactly, but it might be part of the followers/list request. If so, you can set the count field to 200, which is the maximum value (the default is only 20). That will save you some requests.
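Putting the two answers together, a sketch with Tweepy 3.x and app-only auth might look like this (consumer_key / consumer_secret and the screen name are placeholders):

import tweepy

auth = tweepy.AppAuthHandler(consumer_key, consumer_secret)
api = tweepy.API(auth, wait_on_rate_limit=True)

# followers/list returns up to 200 users per request instead of the default 20
for follower in tweepy.Cursor(api.followers, screen_name="your_account", count=200).items():
    print(follower.screen_name, follower.followers_count)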
I am trying to use Python to write a client that connects to a custom HTTP server that uses digest authentication. I can connect and pull the first request without problem. Using tcpdump (I am on Mac OS X; I am both a Mac and a Python noob) I can see the first request is actually two HTTP requests, as you would expect if you are familiar with RFC 2617. The first results in the 401 UNAUTHORIZED. The header information sent back from the server is correctly used to generate headers for a second request with some custom Authorization header values, which yields a 200 OK response and the payload.
Everything is great. My HTTPDigestAuthHandler opener is working, thanks to urllib2.
In the same program I attempt to request a second, different page from the same server. I expect, per the RFC, that tcpdump will show only one request this time, using almost all the same Authorization header information (nc should increment).
Instead it starts from scratch and first gets the 401 and regenerates the information needed for a 200.
Is it possible with urllib2 to have subsequent requests with digest authentication recycle the known Authorization Header values and only do one request?
[Re-read that a couple times until it makes sense, I am not sure how to make it any more plain]
Google has yielded surprisingly little, so I guess not. I looked at the code for urllib2.py and it's really messy (comments like: "This isn't a fabulous effort"), so I wouldn't be shocked if this was a bug. I noticed that my Connection header is set to close, and even if I set it to keep-alive, it gets overwritten. That led me to keepalive.py, but that didn't work for me either.
Pycurl won't work either.
I can hand code the entire interaction, but I would like to piggy back on existing libraries where possible.
In summary, is it possible with urllib2 and digest authentication to get two pages from the same server with only three HTTP requests executed (two for the first page, one for the second)?
If you happen to have tried this before and already know it's not possible, please let me know. If you have an alternative, I am all ears.
Thanks in advance.
Although it's not available out of the box, urllib2 is flexible enough to let you add it yourself. Subclass HTTPDigestAuthHandler, hack it (the retry_http_digest_auth method, I think) to remember the authentication information, and define an http_request(self, request) method that uses it to add the Authorization header pre-emptively to all subsequent requests.
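A rough, untested sketch of that idea (Python 2 / urllib2, leaning on urllib2 internals like parse_http_list and get_authorization, so treat it as a starting point rather than a finished solution):

import urllib2

class PreemptiveDigestAuthHandler(urllib2.HTTPDigestAuthHandler):
    """Remember the last digest challenge per host and send the
    Authorization header pre-emptively on later requests."""

    def __init__(self, *args, **kwargs):
        urllib2.HTTPDigestAuthHandler.__init__(self, *args, **kwargs)
        self.challenges = {}  # host -> parsed challenge

    def retry_http_digest_auth(self, req, auth):
        # auth looks like: 'Digest realm="...", nonce="...", qop="auth", ...'
        token, challenge = auth.split(' ', 1)
        self.challenges[req.get_host()] = urllib2.parse_keqv_list(
            urllib2.parse_http_list(challenge))
        return urllib2.HTTPDigestAuthHandler.retry_http_digest_auth(self, req, auth)

    def http_request(self, req):
        chal = self.challenges.get(req.get_host())
        if chal and not req.has_header('Authorization'):
            auth_value = self.get_authorization(req, chal)  # reuses nonce, bumps nc
            if auth_value:
                req.add_unredirected_header('Authorization', 'Digest %s' % auth_value)
        return req

    https_request = http_request

handler = PreemptiveDigestAuthHandler()
handler.add_password("myrealm", "http://server.example.com/", "alice", "secret")
opener = urllib2.build_opener(handler)
opener.open("http://server.example.com/page1")  # 401 + 200, as before
opener.open("http://server.example.com/page2")  # should go out pre-authorized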
I have been running a service for some months now using Google's OAuth2 authentication. Most of the time everything works well, but there are occasional issues with callbacks coming back empty from Google to me: something along the lines of 1 out of 15 callbacks arrives at my service completely without parameters in the GET request. Just a naked /my-callback-url request, no parameters at all.
I'm having quite some difficulty explaining this behaviour, and I can't find many references to it when searching the net.
I'm also so far unable to re-create this phenomenon in my own development environment, so my solution ideas have had to be mostly speculation. My first hunch at a quick-and-dirty workaround was to re-generate the OAuth request URL and return a 302 redirect response so Google can have another go. But that risks creating an infinite redirect loop if it turns out that the problem originates in my code. I would very much prefer to understand what's actually going on.
Do any of you have experience with 'empty' OAuth2 callbacks from Google? And in that case, what would be the most sensible way of handling them? Or is there a typical error when generating the authentication URLs that causes this behaviour? (I'm using Python and Requests-OAuthlib to handle my OAuth2 interaction.)
I suspect that these requests are not redirects back from Google. There are crawlers and other hackers trying to hit every endpoint that they find on the web. So these could be just abusive requests.
If you can correlate a request with empty parameters to a request that was redirected from your server (based on the IP address or a cookie you set before redirecting to Google), then we can try to investigate further.
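One way to do that correlation, sketched here with Flask purely as an illustration (the cookie name, routes, and session handling are assumptions about your app, not things I know about it): set a short-lived marker cookie before redirecting to Google and log whether it comes back on the callback.

import uuid
from flask import Flask, request, redirect, make_response

app = Flask(__name__)

@app.route("/login")
def login():
    # authorization_url would be built by requests-oauthlib in the real flow
    authorization_url = "https://accounts.google.com/o/oauth2/v2/auth?..."
    marker = str(uuid.uuid4())
    resp = make_response(redirect(authorization_url))
    resp.set_cookie("oauth_marker", marker, max_age=600, httponly=True)
    app.logger.info("redirecting to Google, marker=%s", marker)
    return resp

@app.route("/my-callback-url")
def callback():
    marker = request.cookies.get("oauth_marker")
    if not request.args:
        # Empty callback: a missing marker strongly suggests the request did
        # not come from our own redirect (e.g. a crawler or scanner).
        app.logger.warning("empty callback, marker=%s, ip=%s",
                           marker, request.remote_addr)
        return "Missing parameters", 400
    # ... normal code/state handling with requests-oauthlib goes here ...
    return "OK"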