How to get the redirect URL? - Python

I am using urllib.request to perform a sequence of HTTP calls in Python 3.6. I need to retrieve the value of an HTTP 302 redirect that is returned in response to a urllib.request.urlopen call, like so...
import urllib.request
... many previous http calls ...
post_data = {'foo': 'bar', 'some': 'otherdata'}
encoded = urllib.parse.urlencode(post_data).encode('utf-8')
req = urllib.request.Request('https://some-url', encoded)
redirected_url = urllib.request.urlopen(req).geturl()
I get an error like...
urllib.error.HTTPError: HTTP Error 302: Found - Redirection to url 'gibberish://login_callback?code=ABCD......' is not allowed
What I need is the URL returned in the 302, which the .geturl() method should provide, but instead I get an error.
Please no answers like "Hey, use this other library that I'm super into right now," as we've spent a long time building this script using urllib and we have very little Python knowledge.
Thanks for your help.

If you don't want to use the requests library (which is almost part of the core libs at this point), you need to write a custom HTTPRedirectHandler with urllib.request.
import urllib.parse
import urllib.request

class CustomHTTPRedirectHandler(urllib.request.HTTPRedirectHandler):
    def http_error_302(self, req, fp, code, msg, headers):
        ### DO YOUR STUFF HERE (the redirect target is in headers['Location'])
        return urllib.request.HTTPRedirectHandler.http_error_302(
            self, req, fp, code, msg, headers)

    http_error_301 = http_error_303 = http_error_307 = http_error_302

opener = urllib.request.build_opener(CustomHTTPRedirectHandler)

post_data = {'foo': 'bar', 'some': 'otherdata'}
encoded = urllib.parse.urlencode(post_data).encode('utf-8')
req = urllib.request.Request('https://some-url', encoded)
opener.open(req)
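Alternatively, since urlopen raises the HTTPError in the first place because the redirect target uses a scheme it refuses to follow (the gibberish:// callback in the question), you can read the target straight off the exception; a minimal sketch, reusing req from above:

import urllib.error
import urllib.request

try:
    urllib.request.urlopen(req)
except urllib.error.HTTPError as e:
    # the 302's Location header carries the redirect target
    redirected_url = e.headers['Location']
    print(redirected_url)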

Related

How to print the text of a POST request without making the request

If I make the request
import requests

api_key = 'asdfklhsdfkjahsdlgkjahlkdjahfsa'
url = 'https://www.website.com'
headers = {'api-key': api_key,
           'Content-Type': 'application/json'}
request_data = {'foo': 'bar', 'egg': 'spam'}
result = requests.post(url, headers=headers, data=request_data)
The server is contacted. Suppose that instead I want to do something like
request_string = requests.foobar(url, headers=headers, data=request_data)
import os
os.system('curl ' + request_string)
so that I can see what the request is doing without bothering the server (possibly to the point that I could copy-and-paste it into curl), what would foobar be? Or, in general, what is a way to inspect the contents of a request without making it?
Here's another post that implies you can use Request().prepare() to observe the request without actually sending it.
Furthermore, the official documentation reads: "In some cases you may wish to do some extra work to the body or headers (or anything else really) before sending a request. The simple recipe for this is the following", and then it illustrates Request.prepare().
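A minimal sketch of that recipe, using the question's placeholder values:

import requests

url = 'https://www.website.com'
headers = {'api-key': 'asdfklhsdfkjahsdlgkjahlkdjahfsa',
           'Content-Type': 'application/json'}
request_data = {'foo': 'bar', 'egg': 'spam'}

# build and prepare the request without sending anything
prepared = requests.Request('POST', url, headers=headers,
                            data=request_data).prepare()

# inspect exactly what would go over the wire
print(prepared.method, prepared.url)
print(prepared.headers)
print(prepared.body)

# to actually send it later:
# with requests.Session() as s:
#     response = s.send(prepared)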

HTTP Error 307: Temporary Redirect in Python3 - INTRANET [duplicate]

I'm using Python 3.7 with urllib.
Everything works fine, but it seems not to automatically redirect when it gets an HTTP redirect response (307).
This is the error I get:
ERROR 2020-06-15 10:25:06,968 HTTP Error 307: Temporary Redirect
I have to handle it with a try-except and manually send another request to the new Location; it works fine, but I don't like it.
This is the piece of code I use to perform the request:
req = urllib.request.Request(url)
req.add_header('Authorization', auth)
req.add_header('Content-Type', 'application/json; charset=utf-8')
req.data = jdati
self.logger.debug(req.headers)
self.logger.info(req.data)
resp = urllib.request.urlopen(req)
url is an HTTPS resource, and I set a header with some Authorization info and the content type.
req.data is JSON.
From the urllib documentation I understood that redirects are automatically performed by the library itself, but it doesn't work for me. It always raises an HTTP 307 error and doesn't follow the redirect URL.
I've also tried to use an opener specifying the default redirect handler, but with the same result:
opener = urllib.request.build_opener(urllib.request.HTTPRedirectHandler)
req = urllib.request.Request(url)
req.add_header('Authorization', auth)
req.add_header('Content-Type', 'application/json; charset=utf-8')
req.data = jdati
resp = opener.open(req)
What could be the problem?
The reason why the redirect isn't done automatically has been correctly identified by yours truly in the discussion in the comments section. Specifically, RFC 2616, Section 10.3.8 states that:
If the 307 status code is received in response to a request other
than GET or HEAD, the user agent MUST NOT automatically redirect the
request unless it can be confirmed by the user, since this might
change the conditions under which the request was issued.
Back to the question: given that data has been assigned, get_method automatically returns POST (as per how that method is implemented), and since the request method is POST and the response code is 307, an HTTPError is raised as per the above specification. In the context of Python's urllib, this specific section of the urllib.request module raises the exception.
For an experiment, try the following code:
import urllib.request
import urllib.parse
url = 'http://httpbin.org/status/307'
req = urllib.request.Request(url)
req.data = b'hello' # comment out to not trigger manual redirect handling
try:
resp = urllib.request.urlopen(req)
except urllib.error.HTTPError as e:
if e.status != 307:
raise # not a status code that can be handled here
redirected_url = urllib.parse.urljoin(url, e.headers['Location'])
resp = urllib.request.urlopen(redirected_url)
print('Redirected -> %s' % redirected_url) # the original redirected url
print('Response URL -> %s ' % resp.url) # the final url
Running the code as-is may produce the following:
Redirected -> http://httpbin.org/redirect/1
Response URL -> http://httpbin.org/get
Note that the subsequent redirect to /get was done automatically, since the subsequent request was a GET request. Commenting out the req.data assignment line will result in the "Redirected" output line disappearing.
Other notable points about the exception handling block: e.read() may be called to retrieve the response body the server produced as part of the HTTP 307 response (since data was POSTed, there might be a short entity in the response worth processing), and urljoin is needed because the Location header may be a relative URL (or simply lack the host) for the subsequent resource.
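For instance, inside the except block above, one might add (illustrative):

    body = e.read()  # any entity the server sent along with the 307 response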
Also, as a matter of interest (and for linkage purposes), this specific question has been asked multiple times before, and I am rather surprised that none of them ever got any answers; they follow:
How to handle 307 redirection using urllib2 from http to https
HTTP Error 307: Temporary Redirect in Python3 - INTRANET
HTTP Error 307 - Temporary redirect in python script

How to retrieve a JSON file from the Twitter Search API with curl using the urllib library

I just learnt how to access data through an application API, and I've stumbled into a problem.
I am currently trying to retrieve a JSON file from the Twitter Search API. The documentation of the API says curl is used in order to access the data through HTTP GET. Below is the curl format:
curl "https://api.twitter.com/1.1/tweets/search/:product/:label.json?query=TwitterDev%20%5C%22search%20api%5C%22&maxResults=500&fromDate=<yyyymmddhhmm>&toDate=<yyyymmddhhmm>" -H "Authorization: Bearer TOKEN"
(curl API format)
I have already tried to access it using urllib but still got an error during the run.
Below is the code I used:
import urllib.request, urllib.parse, urllib.error
import twurl
import ssl
import json

# TWITTER_URL = 'https://api.twitter.com/1.1/statuses/user_timeline.json'
TWITTER_URL = 'https://api.twitter.com/1.1/tweets/search/30day/data1.json'

# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

while True:
    # acct = input('Enter Twitter Account: ')
    # if (len(acct) < 1): break
    parameters = {
        # 'screen_name': acct,
        # 'COunT': '3',
        'query': 'inline skate',
        'fromDate': '201906010000',
        'toDate': '201906102359',
        'maxResults': '20',
    }
    url = twurl.augment(TWITTER_URL, parameters)
    print('\nRetrieving', url, '\n')
    connection = urllib.request.urlopen(url, context=ctx)
    data = connection.read().decode()
    js = json.loads(data)
    jsdmp = json.dumps(js, indent=2)
    print(jsdmp, '\n')
    # for data in js:
    #     print(data['text'])
    headers = dict(connection.getheaders())
    print('\nRemaining', headers['x-rate-limit-remaining'], '\n')
    break
Here's the error message I keep getting:
File "C:\ProgramData\Anaconda3\lib\urllib\request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
HTTPError: Unprocessable Entity
It works when I use the user_timeline retriever above:
TWITTER_URL = 'https://api.twitter.com/1.1/statuses/user_timeline.json'
But it doesn't work with the search API; I guess it is because of the curl requirement.
I've already checked that the parameters I use are the ones the docs say are necessary.
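(A way to see the API's actual complaint would be to catch the HTTPError and read its body, which for a 422 usually carries a JSON error message; a minimal sketch, reusing url and ctx from the code above:

import urllib.error
import urllib.request

try:
    connection = urllib.request.urlopen(url, context=ctx)
except urllib.error.HTTPError as e:
    # 422 means the API rejected the request parameters;
    # the response body usually says which one
    print(e.code, e.reason)
    print(e.read().decode())
)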
Also, the twurl library is used to access the token and to process the URL and parameters together to produce the final URL (as far as I understand).
Below is the twurl code:
import urllib.request, urllib.parse, urllib.error
import oauth
import hidden

# https://apps.twitter.com/
# Create App and get the four strings, put them in hidden.py

def augment(url, parameters):
    secrets = hidden.oauth()
    consumer = oauth.OAuthConsumer(secrets['consumer_key'],
                                   secrets['consumer_secret'])
    token = oauth.OAuthToken(secrets['token_key'], secrets['token_secret'])
    oauth_request = oauth.OAuthRequest.from_consumer_and_token(
        consumer, token, http_method='GET', http_url=url,
        parameters=parameters)
    # below is the main function call!!
    oauth_request.sign_request(oauth.OAuthSignatureMethod_HMAC_SHA1(),
                               consumer, token)
    return oauth_request.to_url()
Below is hidden.py, where all the keys and tokens are stored:
def oauth():
    return {"consumer_key": "2HZq407wF.................",
            "consumer_secret": "OsemLubDmCcQq5Y3q............",
            "token_key": "75230340-2SGPJWWn..............",
            "token_secret": "NZcII332Y3EI.............."}
I have looked for solutions, and most of them seem to use urllib2, which I don't think is compatible with Python 3.
Do you have any suggestions? I'm kind of stuck at this step and not going anywhere.
Thanks, y'all.

Why does Python's urllib2.urlopen() raise an HTTPError for successful status codes?

According to the urllib2 documentation,
Because the default handlers handle redirects (codes in the 300 range), and codes in the 100-299 range indicate success, you will usually only see error codes in the 400-599 range.
And yet the following code
request = urllib2.Request(url, data, headers)
response = urllib2.urlopen(request)
raises an HTTPError with code 201 (created):
ERROR 2011-08-11 20:40:17,318 __init__.py:463] HTTP Error 201: Created
So why is urllib2 throwing HTTPErrors on this successful request?
It's not too much of a pain; I can easily extend the code to:
try:
    request = urllib2.Request(url, data, headers)
    response = urllib2.urlopen(request)
except HTTPError, e:
    if e.code == 201:
        # success! :)
        pass
    else:
        # fail! :(
        pass
else:
    # when will this happen...?
    pass
But this doesn't seem like the intended behavior, based on the documentation and the fact that I can't find similar questions about this odd behavior.
Also, what should the else block be expecting? If successful status codes are all interpreted as HTTPErrors, then when does urllib2.urlopen() just return a normal file-like response object like all the urllib2 documentation refers to?
You can write a custom Handler class for use with urllib2 to prevent specific error codes from being raised as HTTPError. Here's one I've used before:
class BetterHTTPErrorProcessor(urllib2.BaseHandler):
    # a substitute/supplement to urllib2.HTTPErrorProcessor
    # that doesn't raise exceptions on status codes 201, 204, 206
    def http_error_201(self, request, response, code, msg, hdrs):
        return response
    def http_error_204(self, request, response, code, msg, hdrs):
        return response
    def http_error_206(self, request, response, code, msg, hdrs):
        return response
Then you can use it like:
opener = urllib2.build_opener(BetterHTTPErrorProcessor)
urllib2.install_opener(opener)
req = urllib2.Request(url, data, headers)
urllib2.urlopen(req)
As the actual library documentation mentions:
For 200 error codes, the response object is returned immediately.
For non-200 error codes, this simply passes the job on to the protocol_error_code handler methods, via OpenerDirector.error(). Eventually, urllib2.HTTPDefaultErrorHandler will raise an HTTPError if no other handler handles the error.
http://docs.python.org/library/urllib2.html#httperrorprocessor-objects
I personally think it was a mistake and very nonintuitive for this to be the default behavior.
It's true that non-2XX codes imply a protocol level error, but turning that into an exception is too far (in my opinion at least).
In any case, I think the most elegant way to avoid this is:
opener = urllib.request.build_opener()
for processor in opener.process_response['https']:  # or 'http', depending on what you're using
    if isinstance(processor, urllib.request.HTTPErrorProcessor):  # HTTPErrorProcessor also handles https
        opener.process_response['https'].remove(processor)
        break  # there's only one such handler by default

response = opener.open('https://www.google.com')
Now you have the response object. You can check its status code, headers, body, etc.
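For example, a quick illustrative poke at the object returned above (attribute names per http.client.HTTPResponse):

print(response.status)                   # e.g. 200, or an error code, since nothing raises anymore
print(response.headers['Content-Type'])
print(response.read()[:100])             # the first bytes of the body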

Making a POST call instead of GET using urllib2

There's a lot of stuff out there on urllib2 and POST calls, but I'm stuck on a problem.
I'm trying to do a simple POST call to a service:
import urllib
import urllib2

url = 'http://myserver/post_service'
data = urllib.urlencode({'name': 'joe',
                         'age': '10'})
content = urllib2.urlopen(url=url, data=data).read()
print content
I can see the server logs, and they say that I'm doing GET calls even though I'm sending the data argument to urlopen.
The library is raising a 404 error (Not Found), which is correct for a GET call; POST calls are processed fine (I've also tried with a POST from an HTML form).
Do it in stages, and modify the object, like this:
import urllib
import urllib2

# make a string with the request type in it:
method = "POST"
# create a handler. you can specify different handlers here (file uploads etc)
# but we go for the default
handler = urllib2.HTTPHandler()
# create an openerdirector instance
opener = urllib2.build_opener(handler)
# build a request
data = urllib.urlencode(dictionary_of_POST_fields_or_None)
request = urllib2.Request(url, data=data)
# add any other information you want
request.add_header("Content-Type", 'application/json')
# overload the get method function with a small anonymous function...
request.get_method = lambda: method
# try it; don't forget to catch the result
try:
    connection = opener.open(request)
except urllib2.HTTPError, e:
    connection = e

# check. Substitute with appropriate HTTP code.
if connection.code == 200:
    data = connection.read()
else:
    # handle the error case. connection.read() will still contain data
    # if any was returned, but it probably won't be of any use
    pass
This way allows you to extend to making PUT, DELETE, HEAD and OPTIONS requests too, simply by substituting the value of method or even wrapping it up in a function. Depending on what you're trying to do, you may also need a different HTTP handler, e.g. for multi file upload.
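For instance, a small sketch of wrapping it up in such a function (the helper name and signature here are illustrative, not part of the original recipe):

import urllib
import urllib2

def http_request(url, method, data=None, headers=None):
    # encode the form fields, if any
    encoded = urllib.urlencode(data) if data is not None else None
    request = urllib2.Request(url, data=encoded)
    for key, value in (headers or {}).items():
        request.add_header(key, value)
    # force the desired HTTP verb, as in the recipe above
    request.get_method = lambda: method
    opener = urllib2.build_opener(urllib2.HTTPHandler())
    try:
        return opener.open(request)
    except urllib2.HTTPError, e:
        return e

# e.g.: http_request('http://myserver/post_service', 'PUT',
#                    data={'name': 'joe'})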
This may have been answered before: Python URLLib / URLLib2 POST.
Your server is likely performing a 302 redirect from http://myserver/post_service to http://myserver/post_service/. When the 302 redirect is performed, the request changes from POST to GET (see Issue 1401). Try changing url to http://myserver/post_service/.
Have a read of the urllib Missing Manual. Pulled from there is the following simple example of a POST request.
import urllib
import urllib2

url = 'http://myserver/post_service'
data = urllib.urlencode({'name': 'joe', 'age': '10'})
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)
print response.read()
As suggested by @Michael Kent, do consider requests; it's great.
EDIT: This said, I do not know why passing data to urlopen() does not result in a POST request; it should. I suspect your server is redirecting, or misbehaving.
The requests module may ease your pain.
import requests

url = 'http://myserver/post_service'
data = dict(name='joe', age='10')
r = requests.post(url, data=data, allow_redirects=True)
print r.content
It should be sending a POST if you provide a data parameter (like you are doing).
From the docs:
"the HTTP request will be a POST instead of a GET when the data parameter is provided"
So, add some debug output to see what's up from the client side.
You can modify your code to this and try again:
import urllib
import urllib2

url = 'http://myserver/post_service'
opener = urllib2.build_opener(urllib2.HTTPHandler(debuglevel=1))
data = urllib.urlencode({'name': 'joe',
                         'age': '10'})
content = opener.open(url, data=data).read()
Try this instead:
url = 'http://myserver/post_service'
data = urllib.urlencode({'name': 'joe',
                         'age': '10'})
req = urllib2.Request(url=url, data=data)
content = urllib2.urlopen(req).read()
print content
url="https://myserver/post_service"
data["name"] = "joe"
data["age"] = "20"
data_encoded = urllib2.urlencode(data)
print urllib2.urlopen(url + "?" + data_encoded).read()
May be this can help
