I have a small script making requests to http://translate.google.com/m?hl=%s&sl=%s&q=%s as my base_link, but I seem to have exceeded the request limit:
HTTPError: HTTP Error 429: Too Many Requests
I am looking for what the limit is for Google Translate. The answers I've found point to the Google Translate API limits, not the limit that applies when accessing the site directly from Python's requests module.
Let me know if I can provide some more information.
The request you are submitting to http://translate.google.com/m?hl=%s&sl=%s&q=%s uses the free service at http://translate.google.com. The limits you have been seeing online relate to the Google Translate API, which is part of Google Cloud Platform. These two are completely different services.
Currently, there is no public documentation of the limits for this free service (the one googletrans uses). However, Google will block your IP address once it detects that you have been exploiting the free translation service by submitting huge numbers of requests, resulting in:
HTTPError: HTTP Error 429: Too Many Requests
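If you stay on the free endpoint, the usual mitigation is to throttle your requests and back off whenever a 429 comes back. A minimal sketch with the requests module (the delay values are arbitrary assumptions, not documented limits):

import time
import requests

def fetch_translation(url, max_retries=5):
    """Fetch a page from the free endpoint, backing off when HTTP 429 comes back."""
    delay = 5  # seconds; an arbitrary starting point, not a documented limit
    for _ in range(max_retries):
        response = requests.get(url)
        if response.status_code != 429:
            response.raise_for_status()
            return response.text
        time.sleep(delay)
        delay *= 2  # exponential backoff before the next attempt
    raise RuntimeError("still rate-limited after several retries")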
You can use Google Cloud Platform's Translation API, which is a paid service. With it you avoid the low limits imposed on the free endpoint and get far more flexibility in the number of requests you can submit.
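For reference, a minimal sketch of what a call through the paid Cloud Translation API looks like with the google-cloud-translate client library (this assumes a Cloud project with the API enabled and credentials configured; it is only an illustration, not your exact setup):

# Minimal sketch, assuming GOOGLE_APPLICATION_CREDENTIALS points at a service
# account key for a project with the Cloud Translation API enabled.
from google.cloud import translate_v2 as translate

client = translate.Client()
result = client.translate("Hello, world", target_language="es")
print(result["translatedText"])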
According to this answer, the content quota on the (paid) API is 6,000,000 characters per 60 seconds, per project and per user. It is therefore safe to assume that Google protects the private endpoint with a limit far lower than that. Google will not release the numbers for the private endpoint, because it is only supposed to be used by the website itself. If you need a more specific answer, I suggest you find the most accurate API limits you can and assume the private endpoint's limit is much lower than that.
I'm not sure, but according to this source it seems to be 500,000 characters for free users.
I've recently been working on a project in which I need to access an ASP.NET web API in order to get some data. So far I've been gaining access to this API by manually setting the cookies in the code and then using requests to get the information I need. My task now is to automate this process. I get the cookies from the Chrome developer tools, in the Network tab. The cookies change every once in a while, so I've been trying to build something that will refresh them automatically.
I should mention that the network on which this is being done is air-gapped, and getting Python libraries onto it is rather tedious, so I am trying to avoid that. It is also the reason why providing code examples here is complicated.
The way the login process works in this web app is as follows (data from Chrome dev tools):
Upon entering the URL there are a bunch of redirects which seem to do nothing.
A request is made to /login.aspx which returns a "set-cookie: 'sessionId=xyz'" header and redirects to /LandingPage.aspx
A request is made to /LandingPage.aspx with said cookie, which returns "Set-Cookie" headers containing a bunch of cookies (ASP.NET, etc.). These are the cookies I need in order to make the Python script access the API.
What's written above is how the browser does it. When I try to imitate this with Python requests, I get the first cookie from /login.aspx, but when it redirects to /LandingPage.aspx I get a 401 Unauthorized with the following headers:
WWW-Authenticate: Negotiate
WWW-Authenticate: NTLM
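Roughly, the attempt looks like this (the host name is a placeholder; only the paths are real):

import requests

BASE = "https://server.example.com"   # placeholder host

session = requests.Session()
# The first request picks up the sessionId cookie from /login.aspx
session.get(f"{BASE}/login.aspx")
# The follow-up to /LandingPage.aspx is where the 401 with the
# WWW-Authenticate: Negotiate / NTLM headers comes back
resp = session.get(f"{BASE}/LandingPage.aspx")
print(resp.status_code, resp.headers.get("WWW-Authenticate"))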
After having done some reading, I understand that these response headers are related to the NTLM and Kerberos protocols (side question: if it responds with both headers, does that mean I need to provide both authentications, or will either one suffice?).
A quick Google search suggested that these responses should be followed by a request carrying a Kerberos/NTLM token (which I have no idea how to acquire) in order to get a 200 response. I find this pretty strange, considering the browser doesn't seem to make any of these requests and the web app just gives it the cookies without apparently transferring any NTLM or Kerberos data.
I've thought of a few ways to overcome this, and hopefully you could help me figure out whether they would work.
Trying to get the requests-kerberos or requests-ntlm libraries for Python and using those to overcome this problem. I would like your opinion on whether this would work. I am reluctant to use this method, though, because of what was mentioned above.
Somehow using PowerShell to get these tokens and then passing them to Python requests without the above-mentioned libraries. But I have no idea if this would work either.
I would very much appreciate anyone who could maybe further explain the process that's happening here in general, and of course would greatly appreciate any help with solving this.
Thank you very much!
Trying to get the requests-kerberos or requests-ntlm libraries for Python and using those to overcome this problem. I would like your opinion on whether this would work. I am reluctant to use this method, though, because of what was mentioned above.
Yes, requests-kerberos would work. HTTP Negotiate means Kerberos almost 100% of the time.
For Linux I'd slightly prefer requests-gssapi, which is based on a more maintained 'gssapi' backend, but at the moment it's limited to Unix-ish systems only – while requests-kerberos has the advantage of supporting Windows through the 'winkerberos' backend. But it doesn't really matter; both will do the job fine.
Don't use NTLM if you can avoid it. Your domain admins will appreciate being able to turn off NTLM domain-wide as soon as they can.
Somehow using PowerShell to get these tokens and then passing them to Python requests without the above-mentioned libraries. But I have no idea if this would work either.
Technically it's possible, but doing this via PowerShell (or .NET in general) is going the long way around. You can achieve exactly the same thing using Python's sspi module (part of pywin32), which talks directly to the Windows SSPI interface that handles Kerberos ticket acquisition (and NTLM, for that matter).
(The gssapi module is the Linux equivalent, and the spnego module is a cross-platform wrapper around both.)
You can see a few examples here – OP has a .NET example, the answer has Python.
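To give an idea, a rough sketch only – the SPN and host are placeholders, and it assumes pywin32 is installed:

import base64

import requests
import sspi  # part of pywin32

# Ask Windows SSPI for a Negotiate (Kerberos) token for the target service.
# "HTTP/server.example.com" stands in for the real service principal name.
auth = sspi.ClientAuth("Negotiate", targetspn="HTTP/server.example.com")
err, out_buf = auth.authorize(None)
token = base64.b64encode(out_buf[0].Buffer).decode("ascii")

resp = requests.get(
    "https://server.example.com/LandingPage.aspx",
    headers={"Authorization": "Negotiate " + token},
)
print(resp.status_code)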
But keep in mind that Kerberos tokens contain not only the service ticket but also a one-time-use authenticator (to prevent replay attacks), so you need to get a fresh token for every HTTP request.
So don't reinvent the wheel and just use requests-kerberos, which will automatically call SSPI to get a token whenever needed.
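A minimal sketch of what that looks like (the host name is a placeholder):

import requests
from requests_kerberos import HTTPKerberosAuth, OPTIONAL

session = requests.Session()
# requests-kerberos calls SSPI/GSSAPI behind the scenes whenever the server
# answers with WWW-Authenticate: Negotiate.
session.auth = HTTPKerberosAuth(mutual_authentication=OPTIONAL)

resp = session.get("https://server.example.com/LandingPage.aspx")
print(resp.status_code)   # the ASP.NET auth cookies end up in session.cookies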
It says that in order for requests-kerberos to work there has to be a TGT already cached on the PC. This program is supposed to run for weeks without being interfered with, and to my understanding those tickets expire after about 10 hours.
That's typical for all Kerberos use, not just requests-kerberos specifically.
If you run the app on Windows, from an interactive session, then Windows will automatically renew Kerberos tickets as needed (it keeps your password cached in LSA memory for that purpose). However, don't run long-term tasks in interactive sessions...
If you run the app on Windows, as a service, then it will use the "machine credentials" aka "computer account" (see details), and again LSA will keep the tickets up-to-date.
If you run the app on Linux, then you can create a keytab that stores the client credentials for the application. (This doesn't need domain admin rights, you only need to know the app account's password.)
On Linux there are at least 4 different ways to use a keytab for long-term jobs: k5start (third-party, but common); KRB5_CLIENT_KTNAME (built-in to MIT Kerberos, but only in recent versions); gss-proxy (from RedHat, might already be part of the OS); or a basic cronjob that just re-runs kinit to acquire new tickets every 4-6 hours.
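If the script itself is the only long-running thing on the box, the cronjob approach can even live inside the program. A rough sketch (the keytab path and principal are placeholders):

import subprocess
import threading
import time

KEYTAB = "/etc/myapp/myapp.keytab"   # placeholder path
PRINCIPAL = "myapp@EXAMPLE.COM"      # placeholder principal

def renew_tickets():
    """Re-run kinit from the keytab every few hours so the ticket cache never goes stale."""
    while True:
        subprocess.run(["kinit", "-k", "-t", KEYTAB, PRINCIPAL], check=True)
        time.sleep(4 * 3600)   # well inside the ~10 hour ticket lifetime

threading.Thread(target=renew_tickets, daemon=True).start()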
I find this pretty weird considering the browser doesn't make any of these requests and the web app just gives it the cookies without it seemingly transferring any NTLM or Kerberos data.
It likely does; you might just be overlooking it.
Note that some SSO systems use JavaScript to dynamically probe for whether the browser has Kerberos authentication properly set up – if the main page really doesn't send a token, then it might be an iframe or an AJAX/XHR request that does.
I was making some test HTTP requests using Python's requests library. When requesting Walmart's Canadian site (www.walmart.ca), I got this:
How do servers like Walmart's detect that my request is being made programmatically? I understand browsers send all sorts of metadata to the server. I was hoping to get a few specific examples of how this is commonly done. I've found a similar question here, albeit related to Selenium WebDriver, where it is claimed that some vendors provide this as a service, but I was hoping for something a bit more specific.
Appreciate any insights, thanks.
As mentioned in the comments, a real browser sends many different values – headers, cookies, data. It loads from the server not only HTML but also images, CSS, JS and fonts. A browser can also run JavaScript, which can gather further information about the browser – version, extensions, data in local storage, even how you move the mouse. And a real human loads/visits pages with random delays and in a fairly random order. All of these signals can be used to detect a script. Servers may use very complex systems, even machine learning, and compare your behaviour over a window of minutes or hours.
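For example, a plain requests.get() announces itself with a python-requests/x.y User-Agent, which alone is enough for many sites to treat you differently. Sending browser-like headers is trivial, though for all the reasons above it is usually not sufficient on its own (the header values here are just an example copied from a browser session):

import requests

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-CA,en;q=0.9",
}

response = requests.get("https://www.walmart.ca/en", headers=headers)
print(response.status_code)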
I am using PyCurl to call the GitHub search API and extract some information. Here is the code snippet that calls the API:
from io import BytesIO

import pycurl

# access_token is a GitHub personal access token, assumed to be defined elsewhere
url = "https://api.github.com/search/code?q=import%2Bkeras+size:1..100+language:python&page=1&per_page=100"

output = BytesIO()
request = pycurl.Curl()
request.setopt(pycurl.HTTPHEADER, [f"Authorization: token {access_token}"])
request.setopt(pycurl.URL, url)
request.setopt(pycurl.WRITEDATA, output)
request.perform()
request.close()

print(output.getvalue().decode())
The problem is that GitHub blocks my access token after just 3-4 requests, even though the GitHub documentation states a limit of 5,000 requests per hour.
I am using Python 3.8 and PyCurl 7.44.1.
Do you have any idea how to resolve this problem?
"The Search API has a custom rate limit, separate from the rate limit governing the rest of the REST API"
You can check your rate limit status like this:
curl \
-H "Accept: application/vnd.github.v3+json" \
https://api.github.com/rate_limit
GitHub has a different rate limit for search requests because they are substantially more expensive than a normal API call. You can query them at the endpoint https://api.github.com/rate_limit.
However, in your case, you're seeing a secondary rate limit, which means that something you're doing looks suspicious and you're getting blocked for that reason. The only way you can find out why that is would be to contact GitHub Support.
I will point out that it's a best practice to use a unique, identifying User-Agent header so that your traffic can be distinguished from other traffic. That may or may not help here, but libcurl is a very common user-agent, and since there will be a nontrivial number of people using it for abusive purposes, it's possible that your traffic got flagged by an automated system for that reason.
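With PyCurl, you can set that right next to the other options in the snippet above, before request.perform() (the agent string here is only an example):

# Identify the script explicitly instead of relying on the default libcurl agent string.
request.setopt(pycurl.USERAGENT, "keras-code-search/1.0 (contact: your-email@example.com)")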
Eventually, I added a sleep of 5 seconds between each API request and the problem has been resolved.
Thanks for your contributions.
We're building a simple script using the Tweepy Python library. We need to get some simple information on all of the accounts following our Twitter account.
Using Tweepy's built-in Cursor this is quite easy, but we very rapidly hit the 15-request limit for the window. I've read that per-app (as opposed to per-user) authentication allows for 300 requests per minute, but I can't find anything in the Tweepy API that supports this.
Any pointers?
You should be able to use tweepy.AppAuthHandler in the same way that you're using OauthHandler.
auth = tweepy.AppAuthHandler(consumer_key, consumer_secret)  # takes the app's API key and API secret
api = tweepy.API(auth)
For some reason, this isn't documented, but you can take a look at the code yourself on GitHub.
It also depends what kinds of requests you're making. There are no resources with a 15 request user limit that have a 300 request app limit. You can consult this chart to determine the limits for user and app auth for each endpoint. In any case, using them in conjunction will at least double your requests.
I'm not sure what information about your followers you are after exactly, but it might be available from the followers/list request. If so, you can set the count field to 200, which is the maximum value (the default is only 20). That will save you some requests, as in the sketch below.
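A rough sketch combining both points, using the older Tweepy 3.x method names (the credentials and screen name are placeholders):

import tweepy

consumer_key = "YOUR_API_KEY"         # the app's API key (placeholder)
consumer_secret = "YOUR_API_SECRET"   # the app's API secret (placeholder)

auth = tweepy.AppAuthHandler(consumer_key, consumer_secret)
api = tweepy.API(auth, wait_on_rate_limit=True)   # sleep automatically when a window is exhausted

# followers/list with count=200 returns ten times more accounts per request than the default 20.
for follower in tweepy.Cursor(api.followers, screen_name="your_account", count=200).items():
    print(follower.screen_name, follower.followers_count)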
I have been running a service for some months now using Google's OAuth2 authentication. Most of the time everything works well, but there are occasional issues with callbacks from Google arriving empty: roughly 1 out of 15 callbacks reaches my service completely without parameters in the GET request – just a bare /my-callback-url request, no parameters at all.
I'm having quite some difficulty explaining this behaviour, and I can't find many references to it when searching the net.
I'm also so far unable to re-create this phenomenon in my own development environment, so my solution ideas have had to be mostly speculation. My first hunch at a quick-and-dirty workaround was to re-generate the OAuth request URL and return a 302 redirect response so Google can have another go. But that risks creating an infinite redirect loop if it turned out that the problem originates in my own code. I would very much prefer to understand what's actually going on.
Do any of you have experience of 'empty' OAuth2 callbacks from Google? And in that case, what would be the most sensible way of handling them? Or is there a typical error when generating the authentication URLs that causes this behaviour? (I'm using Python and Requests-OAuthlib to handle my OAuth2 interaction.)
I suspect that these requests are not redirects back from Google. There are crawlers and other hackers trying to hit every endpoint that they find on the web. So these could be just abusive requests.
If you can correlate a request that arrives with empty parameters to a request that was redirected from your server (based on the IP address or a cookie you set before redirecting to Google), then we can try to investigate further.
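One cheap way to tell the two apart is to lean on the state parameter that Requests-OAuthlib generates for you: persist it when you build the authorization URL, and treat any callback that arrives without a matching state (or with no parameters at all) as noise rather than retrying the flow. A rough sketch (client details and scopes are placeholders):

from requests_oauthlib import OAuth2Session

client_id = "YOUR_CLIENT_ID"                          # placeholder
redirect_uri = "https://example.com/my-callback-url"  # placeholder
scope = ["openid", "email"]                           # placeholder

oauth = OAuth2Session(client_id, redirect_uri=redirect_uri, scope=scope)
authorization_url, state = oauth.authorization_url(
    "https://accounts.google.com/o/oauth2/v2/auth"
)
# Persist `state` (e.g. in the user's session) before redirecting the browser.
# In the callback handler: a request carrying no `code` and no `state` at all,
# or whose `state` doesn't match the stored one, is almost certainly not a
# genuine redirect from Google and can be logged and dropped instead of retried.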