urllib2.urlopen() returning a different result - python

I'm trying to fill out a form using a Python program. It works well for some sites, but not this particular one, and I'm not sure why.
This is the code snippet:
import urllib
import urllib2

query = {
'adults':'1',
'children':'0',
'infants':'0',
'trip':'RT',
'deptCode':'LOS',
'arrvCode':'ABV',
'searchType':'D',
'deptYear':'2011',
'deptMonth':'12',
'deptDay':'10',
'retYear':'2011',
'retMonth':'12',
'retDay':'11',
'cabin':'E',
'currency':'NGN',
'deptTime':'',
'arrvTime':'',
'airlinePref':''}
encoded = urllib.urlencode(query)
url = 'http://www.wakanow.com/ng/flights/SearchProcess.aspx?' + encoded
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
headers = { 'User-Agent' : user_agent }
req = urllib2.Request(url, encoded, headers)
response = urllib2.urlopen(req)
print 'RESPONSE:', response
print 'URL :', response.geturl()
headers = response.info()
print 'DATE :', headers['date']
print 'HEADERS :'
print '---------'
print headers
data = response.read()
print 'LENGTH :', len(data)
print 'DATA :'
print '---------'
print data
It all works fine, but the result I get is not what appears if I type the entire URL into a web browser directly; the browser gives me the right result.
I'm not sure what the problem is. Can anyone help me out?

You're likely doing a GET in your browser, but in your code you're actually doing a POST: the query string is already in the URL, and you're also sending the same query data as the POST body. You probably just want a GET. From this page,
urllib2.Request(url[, data][, headers][, origin_req_host][, unverifiable])
data may be a string specifying additional data to send to the server, or None if no such data is needed. Currently HTTP requests are the only ones that use data; the HTTP request will be a POST instead of a GET when the data parameter is provided. data should be a buffer in the standard application/x-www-form-urlencoded format. The urllib.urlencode() function takes a mapping or sequence of 2-tuples and returns a string in this format.
So, what you really want is:
req = urllib2.Request(url, headers=headers)

If the second parameter (data) to urllib2.Request is provided, then urllib2.urlopen(req) makes a POST request.
Use encoded either in the URL (GET) or as the data in urllib2.Request (POST), not both, i.e.
either GET request:
url = 'http://www.wakanow.com/ng/flights/SearchProcess.aspx?' + encoded
req = urllib2.Request(url, headers=headers) #NOTE: no `encoded`
or POST request:
url = 'http://www.wakanow.com/ng/flights/SearchProcess.aspx' #NOTE: no `encoded`
req = urllib2.Request(url, data=encoded, headers=headers)
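For anyone reading this on Python 3, urllib2 was merged into urllib.request and the same rule applies: a request with data=None is a GET, and any non-None data makes it a POST. A minimal sketch (the URL here is just a placeholder, not the site from the question):

```python
import urllib.parse
import urllib.request

query = {'adults': '1', 'trip': 'RT', 'deptCode': 'LOS', 'arrvCode': 'ABV'}
encoded = urllib.parse.urlencode(query)

# GET: parameters go in the URL, no request body
get_req = urllib.request.Request('http://example.com/search?' + encoded)
print(get_req.get_method())  # 'GET'

# POST: parameters go in the body (which must be bytes in Python 3), plain URL
post_req = urllib.request.Request('http://example.com/search',
                                  data=encoded.encode('ascii'))
print(post_req.get_method())  # 'POST'
```

get_method() lets you confirm which verb will be used before anything is sent.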

This URL hangs. Try it with a lighter search string.
You may also consider controlling this with a timeout:
import urllib,urllib2,socket
timeout = 10
socket.setdefaulttimeout(timeout)
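A per-call timeout is usually preferable to changing global socket state; urlopen accepts a timeout argument in both urllib2 and urllib.request. A Python 3 sketch of the two options (no request is actually sent here):

```python
import socket
import urllib.request

# Option 1: a global default applied to all newly created sockets
socket.setdefaulttimeout(10)
print(socket.getdefaulttimeout())  # 10.0

# Option 2 (usually better): a per-request timeout; raises URLError / timeout
# if the server does not respond in time
# response = urllib.request.urlopen(req, timeout=10)
```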

Related

Am I sending a date object correctly to the request body? Python

I am sending a POST request via the Python urllib and urllib2 libraries.
I am able to send the request, but the dates (values) are being ignored.
The documentation says I need to pass the date object in the request body. Below is the code I am using.
url = 'https://api.kenshoo.com/v2/reports/5233/runs/?ks=105'
values = {'dateRange': {'from':'2015-09-22', 'to':'2015-09-22'}}
data = urllib.urlencode(values)
req = urllib2.Request(url, data)
req.add_header('Content-Type', 'application/json; charset=utf-8')
req.add_header('Content-Length', 0)
response = urllib2.urlopen(req)
From the API documentation, this is what I know about the date format.
"The request body must contain a dates range in YYYY-MM-DD format, i.e.
{"dateRange":{"from":"2014-10-20", "to":"2014-10-22"}}
The complete documentation for the request can be found here
http://docs.api.kenshoo.com/#!/Reports/runReport
You should send a JSON-formatted document, not urlencoded data:
import json
import urllib2

url = 'https://api.kenshoo.com/v2/reports/5233/runs/?ks=105'
values = {'dateRange': {'from':'2015-09-22', 'to':'2015-09-22'}}
req = urllib2.Request(url)
req.add_header('Content-Type', 'application/json')
response = urllib2.urlopen(req, json.dumps(values))
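To see what actually goes over the wire, you can build the request without sending it; here is a Python 3 sketch (the API URL is copied from the question but never contacted):

```python
import json
import urllib.request

url = 'https://api.kenshoo.com/v2/reports/5233/runs/?ks=105'
values = {'dateRange': {'from': '2015-09-22', 'to': '2015-09-22'}}

# The body is the JSON document itself, not a form-encoded key=value string
body = json.dumps(values).encode('utf-8')
req = urllib.request.Request(url, data=body,
                             headers={'Content-Type': 'application/json'})

print(req.get_method())          # 'POST' because data was given
print(req.data.decode('utf-8'))  # the exact bytes the server will receive
```

Compare this with urllib.urlencode(values), which would produce something like dateRange=%7B...%7D and is exactly why the API ignored the dates.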

Requests - determine parameterised url prior to issuing request, for inclusion in Referer header

I am writing a Python 2.7 script using Requests to automate access to a particular website. The website has a requirement that a Referer header matching the request URL is provided, for "security reasons". The URL is built up from a number of items in a params dict, passed to requests.post().
Is there a way to determine what the URL that Requests will use is, prior to making the request, so that the Referer header can be set to this correct value? Let's assume that I have a lot of parameters:
params = {'param1': value1, 'param2': value2}  # ... etc
base_url = "http://example.com"
headers = { 'Referer' : url } # but what is 'url' to be?
requests.post(base_url, params=params, headers=headers) # fails as Referer does not match final url
I suppose one workaround is to issue the request and see what the URL is, after the fact. However, there are two problems with this: first, it adds significant overhead to the execution time of the script, as there will be a lot of such requests; and second, it's not actually a useful workaround, because the server redirects the request to another URL, so reading it afterwards doesn't give the correct Referer value.
I'd like to note that I have this script working with urllib/urllib2, and I am attempting to write it with Requests to see whether it is possible and perhaps simpler. The process the script has to follow is not complicated, but it may be slightly beyond the scope of Requests. That's fine; I'd just like to confirm that this is the case.
I think I found a solution, based on Prepared Requests. The idea is that Session.prepare_request() does everything to prepare the request except send it, which lets my script read the prepared request's url, which by then includes the parameters (whose order is determined by the dict order). It can then set the Referer header appropriately and issue the original request.
params = {'param1': value1, 'param2': value2}  # ... etc
url = "http://example.com"
session = requests.Session()
# Referer must be correct
# To determine the correct Referer URL, prepare a request without actually sending it
req = requests.Request('POST', url, params=params)
prepped = session.prepare_request(req)
#r = session.send(prepped) # don't actually send it
# add the Referer header by examining the prepared url
headers = { 'Referer': prepped.url }
# now send normally
r = session.post(url, params=params, data=data, headers=headers)
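Under the hood, Requests builds the prepared URL by url-encoding the params dict onto the base URL, which is why prepped.url is knowable before anything is sent. A stdlib sketch of that same construction (the names and values here are illustrative, not from the real site):

```python
import urllib.parse

base_url = 'http://example.com'
params = {'param1': 'value1', 'param2': 'value2'}

# Requests appends '?' plus the encoded params (in dict insertion order on
# modern Pythons); this mirrors what PreparedRequest.url would expose
final_url = base_url + '/?' + urllib.parse.urlencode(params)
print(final_url)  # http://example.com/?param1=value1&param2=value2
```

This is only a sketch of the mechanism; for the real script, reading prepped.url as above is the reliable way, since Requests also handles quoting edge cases.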
It looks like you've correctly found the prepare_request feature in Requests.
However, if you still wanted to use your initial method, I believe you could use your base_url as the Referer:
base_url = "http://example.com"
headers = { 'Referer' : base_url }
requests.post(base_url, params=params, headers=headers)
I suggest this because your POST attaches the params directly to the base_url. If, for example, you were on:
http://www.example.com/trying-to-send-upload/
adding some params to this POST, you would then use:
referer = "http://www.example.com/trying-to-send-something/"
headers = { 'Referer' : referer, 'Host' : 'example.com' }
requests.post(referer, params=params, headers=headers)
ADDED
I would check my URL visually with a simple print statement after you've created the URL string:
print(post_url)
If that looks good, print out the details of the server's reply, as it might give you some hints as to why your query was rejected:
s = requests.post(referer, params=params, headers=headers)
print(s.status_code)
print(s.text)
I'd love to hear whether this works for you as well.

Python: How to redirect from a Python CGI script to a PHP page and keep POST data

I have an upload.php page that sends some data to a Python CGI script through a form, I then process the data in the background and I want to redirect to another php page, response_page.php, which displays info based on the processed data. I cannot send the data back to PHP and make the redirect at the same time, though.
My code:
#!/usr/bin/env python
import cgi
import cgitb
cgitb.enable()
try:
    form = cgi.FieldStorage()
    fn = form.getvalue('picture_name')
    cat_id = form.getvalue('selected')
except KeyError:
    print "Content-type: text/html"
    print
    print "<html><head>"
    print "</head><body>error</body></html>"
else:
    ...
    # here I processed the form data and stored it in data_to_be_displayed
    # data to be processed and displayed in the response page
    data_to_be_displayed = [1, 2, 3]
    import httplib, json, urllib
    headers = {'Content-type': 'application/json', 'Accept': 'text/plain'}
    conn = httplib.HTTPConnection('192.168.56.101:80')
    # converting the list to a JSON stream (note: ensure_ascii expects a boolean)
    data_to_be_displayed = json.dumps(data_to_be_displayed, ensure_ascii=False)
    postData = urllib.urlencode({'matches': data_to_be_displayed})
    conn.request("POST", "/response_page.php", postData, headers)
    response = conn.getresponse()
    if response.status == 200:
        print "Location: /response_page.php"
        print  # to end the CGI response headers
    conn.close()
I found this: How to make python urllib2 follow redirect and keep post method, but I don't understand how I should use the urllib2.HTTPRedirectHandler class in my code.
Why don't you post to response_page.php using urllib2?
import json
import urllib
import urllib2

url = 'http://192.168.56.101:80/response_page.php'  # the page from your question
headers = {'Content-type': 'application/json', 'Accept': 'text/plain'}
data_to_be_displayed = json.dumps(data_to_be_displayed, ensure_ascii=False)
postData = urllib.urlencode({'matches': data_to_be_displayed})
req = urllib2.Request(url, postData, headers)
response = urllib2.urlopen(req)
the_page = response.read()
For reference, I've used the idea from Python's documentation: https://docs.python.org/2/howto/urllib2.html#headers
You might also consider using Twisted for higher-level code:
https://twistedmatrix.com/
EDIT:
After better understanding what you are asking for, I've found this post, which notes that a 307 redirect is EXACTLY what you want (if I now understand correctly):
https://softwareengineering.stackexchange.com/questions/99894/why-doesnt-http-have-post-redirect
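Concretely, a 307 Temporary Redirect tells the browser to repeat the same method (and body) at the new URL, unlike the plain Location header in the script above, which defaults to a 302 and typically becomes a GET. A minimal sketch of what the CGI script would emit instead (Python 3 print syntax; header names follow the CGI convention):

```python
def redirect_307(location):
    """Build the CGI response lines for a 307 Temporary Redirect."""
    return ["Status: 307 Temporary Redirect",
            "Location: " + location,
            ""]  # the trailing blank line ends the CGI headers

for line in redirect_307("/response_page.php"):
    print(line)
```

Note that with this approach the browser re-sends its original form data to response_page.php, so the Python script would not need to forward the processed data via its own POST.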

Retrieve only HTTP response instead of debug info

I have the request below, which works fine, but it returns the sent request and the server's full response as debug output. How can I get just the response body?
req = urllib2.Request('some url here', data = '')
opener = urllib2.build_opener(urllib2.HTTPSHandler(debuglevel = 1))
req = opener.open(req)
reply = req.read()
req.close()
print reply
That extra information is explicitly requested with debuglevel=1. You can simply remove that, or even the whole opener definition, like this:
import urllib2
print (urllib2.urlopen('https://phihag.de/').read())
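For completeness, the Python 3 spelling of the same fix: build the opener with debuglevel=0 (the default), or skip the custom opener entirely. A sketch (no request is sent here; the final line is commented out):

```python
import urllib.request

# debuglevel=0 (the default) suppresses the dump of the raw request/response
opener = urllib.request.build_opener(
    urllib.request.HTTPSHandler(debuglevel=0))
print(isinstance(opener, urllib.request.OpenerDirector))  # True

# reply = opener.open('https://phihag.de/').read()  # body only, no debug chatter
```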

Problem making a GET request and spoof User-Agent in urllib2

With this code, urllib2 make a GET request:
#!/usr/bin/python
import urllib2
req = urllib2.Request('http://www.google.fr')
req.add_header('User-Agent', '')
response = urllib2.urlopen(req)
With this one (which is almost the same), a POST request:
#!/usr/bin/python
import urllib2
headers = { 'User-Agent' : '' }
req = urllib2.Request('http://www.google.fr', '', headers)
response = urllib2.urlopen(req)
My question is: how can I make a GET request with the second code style?
The documentation (http://docs.python.org/release/2.6.5/library/urllib2.html) says that headers should be a dictionary, and will be treated as if add_header() was called with each key and value as arguments.
Yeah, except that in order to use the headers parameter you have to pass data, and when data is passed, the request becomes a POST.
Any help will be much appreciated.
Use:
req = urllib2.Request('http://www.google.fr', None, headers)
or:
req = urllib2.Request('http://www.google.fr', headers=headers)
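You can verify that passing headers alone leaves the method as GET; a quick Python 3 check (urllib2 behaves the same way when data is None, and the empty string in the second snippet counts as non-None data, which is why it became a POST):

```python
import urllib.request

headers = {'User-Agent': ''}

# headers only, data defaults to None -> stays a GET
get_req = urllib.request.Request('http://www.google.fr', headers=headers)
print(get_req.get_method())  # 'GET'

# any non-None data, even empty bytes, flips the method to POST
post_req = urllib.request.Request('http://www.google.fr', data=b'', headers=headers)
print(post_req.get_method())  # 'POST'
```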
