Problem making a GET request and spoofing the User-Agent in urllib2 - python

With this code, urllib2 makes a GET request:
#!/usr/bin/python
import urllib2
req = urllib2.Request('http://www.google.fr')
req.add_header('User-Agent', '')
response = urllib2.urlopen(req)
With this one (which is almost the same), a POST request:
#!/usr/bin/python
import urllib2
headers = { 'User-Agent' : '' }
req = urllib2.Request('http://www.google.fr', '', headers)
response = urllib2.urlopen(req)
My question is: how can I make a GET request with the second code style?
The documentation (http://docs.python.org/release/2.6.5/library/urllib2.html) says that
headers should be a dictionary, and will be treated as if add_header() was called with each key and value as arguments
Yes, except that in order to use the headers parameter, you have to pass data, and when data is passed, the request becomes a POST.
Any help would be much appreciated.

Use:
req = urllib2.Request('http://www.google.fr', None, headers)
or:
req = urllib2.Request('http://www.google.fr', headers=headers)
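The effect of the data argument can be checked without sending anything, by asking the request object which method it would use. This is a minimal sketch that assumes Python 3, where urllib2.Request is available as urllib.request.Request; the User-Agent value is a placeholder.

```python
# Python 3 sketch: urllib2.Request became urllib.request.Request.
from urllib.request import Request

headers = {'User-Agent': 'Mozilla/5.0'}  # placeholder value

# data is None: the request stays a GET.
get_req = Request('http://www.google.fr', None, headers)

# Same thing using the keyword form, skipping data entirely.
get_req_kw = Request('http://www.google.fr', headers=headers)

# Any data, even an empty byte string, turns the request into a POST.
post_req = Request('http://www.google.fr', b'', headers)

print(get_req.get_method())     # GET
print(get_req_kw.get_method())  # GET
print(post_req.get_method())    # POST
```

Nothing is sent until urlopen() is called, so this is a cheap way to confirm the method before making the request.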

Related

Glassdoor API login not working with Python, response 403 Bots not allowed

Unfortunately I get the error: HTTP Status 403 - Bots not allowed while using the following Python code.
import requests
URL = 'http://api.glassdoor.com/api/api.htm?v=1&format=json&t.p={PartnerID}&t.k={Key}&action=employers&q=pharmaceuticals&userip={IP_address}&useragent=Mozilla/%2F4.0'
response = requests.get(URL)
print(response)
The URL does work when I try it from my browser. What can I do to make it work from code?
Update: SOLVED.
Apologies for not posting the question in the right way (I am new at SO).
According to this StackOverflow answer, you need to include a header field (note that this example uses urllib2 rather than requests):
import urllib2, sys
url = "http://api.glassdoor.com/api/api.htm?t.p=yourID&t.k=yourkey&userip=8.28.178.133&useragent=Mozilla&format=json&v=1&action=employers&q="
hdr = {'User-Agent': 'Mozilla/5.0'}
req = urllib2.Request(url,headers=hdr)
response = urllib2.urlopen(req)
With the requests module, it's probably:
import requests
URL = 'http://api.glassdoor.com/api/api.htm?v=1&format=json&t.p={PartnerID}&t.k={Key}&action=employers&q=pharmaceuticals&userip={IP_address}&useragent=Mozilla/%2F4.0'
headers = {'user-agent': 'Mozilla/5.0'}
response = requests.get(URL, headers=headers)
print(response)

How to add header in requests

Is there an elegant way to add a header with requests? This call:
import requests
requests.get(url,headers={'Authorization', 'GoogleLogin auth=%s' % authorization_token})
doesn't work, while this urllib2 version does:
import urllib2
request = urllib2.Request('http://maps.google.com/maps/feeds/maps/default/full')
request.add_header('Authorization', 'GoogleLogin auth=%s' % authorization_token)
urllib2.urlopen(request).read()
You can add headers by passing a dictionary as an argument.
This should work:
requests.get(url,headers={'Authorization': 'GoogleLogin auth=%s' % authorization_token})
Why didn't your code work?
You were not passing a dictionary to the headers argument. With a comma instead of a colon, {'Authorization', '...'} is a set literal, so you were passing the values laid out the way add_header() takes its two arguments.
According to docs,
requests.get(url, params=None, headers=None, cookies=None, auth=None, timeout=None)
headers – (optional) Dictionary of HTTP Headers to send with the Request.
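The failing call and the working one differ by a single character, which is easy to miss. A small sketch makes the difference visible; the token value here is hypothetical and nothing is sent over the network.

```python
token = 'abc123'  # hypothetical token, for illustration only

# Comma between the two strings: Python builds a set, not a dict.
broken = {'Authorization', 'GoogleLogin auth=%s' % token}

# Colon between them: a dict mapping header name to header value.
fixed = {'Authorization': 'GoogleLogin auth=%s' % token}

print(type(broken).__name__)   # set
print(type(fixed).__name__)    # dict
print(fixed['Authorization'])  # GoogleLogin auth=abc123
```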
Why did request.add_header() work?
Your way of adding headers using request.add_header() worked because that is how the function is defined in the urllib2 module.
Request.add_header(key, val)
It accepts two arguments:
Header name (the key in the dict shown earlier)
Header value (the value for that key in the dict shown earlier)
You can pass a dictionary through the headers keyword. This is very elegant in Python :-)
headers = {
"header_name": "header_value",
}
requests.get(url, headers=headers)
You can add custom headers to a requests call by passing a Python dictionary, in which each header name is separated from its value by a colon (:).
r = requests.get(url, headers={'Authorization': 'GoogleLogin auth=%s' % authorization_token})
This is presented in the Requests documentation for custom headers as follows:
>>> url = 'https://api.github.com/some/endpoint'
>>> headers = {'user-agent': 'my-app/0.0.1'}
>>> r = requests.get(url, headers=headers)
Headers should be a dict, thus this should work
headers= {}
headers['Authorization']= 'GoogleLogin auth=%s' % authorization_token
requests.get(url, headers=headers)

Python urllib,urllib2 fill form

I want to fill in an HTML form with urllib2 and urllib.
import urllib
import urllib2
url = 'site.com/registration.php'
values = {'password' : 'password',
'username': 'username'
}
data = urllib.urlencode(values)
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)
the_page = response.read()
But at the end of the form there is a button (input type='submit'). If you don't click the button, the data you typed into the text inputs isn't sent.
How can I click the button with urllib and urllib2?
If you look at your form's action attribute, you can figure out which URI the form submits to. You can then make a POST request to that URI.
This can be done like this:
import urllib
import urllib2
url = 'http://www.someserver.com/cgi-bin/register.cgi'
values = {'name' : 'Michael Foord',
'location' : 'Northampton',
'language' : 'Python' }
data = urllib.urlencode(values)
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)
the_page = response.read()
https://docs.python.org/2/howto/urllib2.html
You could also use the requests library, which makes this a lot easier.
Another thing you might need to factor in is the CSRF (Cross-Site Request Forgery) token that is often embedded in forms. You will have to somehow acquire it and pass it in with your POST.
This is really more of something you would do with Selenium or similar. selenium.webdriver.common.action_chains.ActionChains.click, perhaps.
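As a rough illustration of the two points above (finding the form's action and picking up a hidden token), the sketch below parses a page with the standard library and builds the POST. It assumes Python 3; the HTML, the field name csrf_token, and the FormParser class are all made up for the example, and a real site may also expect the session cookie that came with the page.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin
from urllib.request import Request


class FormParser(HTMLParser):
    """Record the first form action seen and every hidden input on the page."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.hidden = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'form' and self.action is None:
            self.action = attrs.get('action', '')
        elif tag == 'input' and attrs.get('type') == 'hidden':
            self.hidden[attrs.get('name')] = attrs.get('value', '')


# Hypothetical page source; in practice this would come from response.read().
page = '''
<form action="/register.cgi" method="post">
  <input type="hidden" name="csrf_token" value="t0k3n">
  <input type="text" name="username">
  <input type="password" name="password">
  <input type="submit" value="Register">
</form>
'''

parser = FormParser()
parser.feed(page)

# Hidden fields first, then the values a user would have typed.
values = dict(parser.hidden, username='username', password='password')

# Resolve the (possibly relative) action against the page's own URL.
target = urljoin('http://www.someserver.com/signup.html', parser.action)

# Passing data makes this a POST; Python 3 requires the body as bytes.
req = Request(target, urlencode(values).encode('ascii'))
print(req.get_method(), req.full_url)
```

Submitting is then a matter of passing req to urlopen(); no button click is involved, because the server only sees the resulting POST.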

urllib2.urlopen() returning a different result

I'm trying to fill out a form using a Python program. It works well for some sites, but not this particular one, and I'm not sure why.
This is the code snippet:
query = {
'adults':'1',
'children':'0',
'infants':'0',
'trip':'RT',
'deptCode':'LOS',
'arrvCode':'ABV',
'searchType':'D',
'deptYear':'2011',
'deptMonth':'12',
'deptDay':'10',
'retYear':'2011',
'retMonth':'12',
'retDay':'11',
'cabin':'E',
'currency':'NGN',
'deptTime':'',
'arrvTime':'',
'airlinePref':''}
encoded = urllib.urlencode(query)
url = 'http://www.wakanow.com/ng/flights/SearchProcess.aspx?' + encoded
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
headers = { 'User-Agent' : user_agent }
req = urllib2.Request(url, encoded, headers)
response = urllib2.urlopen(req)
print 'RESPONSE:', response
print 'URL :', response.geturl()
headers = response.info()
print 'DATE :', headers['date']
print 'HEADERS :'
print '---------'
print headers
data = response.read()
print 'LENGTH :', len(data)
print 'DATA :'
print '---------'
print data
It all works fine, but the result I get is not what appears if I type the entire URL into a web browser directly; that gives me the right result.
I'm not sure what the problem is. Can anyone help me out?
You're likely doing a GET in your browser, but in your code, you're actually doing a POST to a URL with the query data AND with your query data as the POST data. You probably just want to do a GET. From this page,
urllib2.Request(url[, data][, headers][, origin_req_host][, unverifiable])
data may be a string specifying additional data to send to the server, or None if no such data is needed. Currently HTTP requests are the only ones that use data; the HTTP request will be a POST instead of a GET when the data parameter is provided. data should be a buffer in the standard application/x-www-form-urlencoded format. The urllib.urlencode() function takes a mapping or sequence of 2-tuples and returns a string in this format.
So, what you really want, is:
req = urllib2.Request(url, headers=headers)
If the second parameter (data) of urllib2.Request is provided, then urllib2.urlopen(req) makes a POST request.
Use encoded either in the URL (GET) or as data in urllib2.Request (POST), not both, i.e.
either GET request:
url = 'http://www.wakanow.com/ng/flights/SearchProcess.aspx?' + encoded
req = urllib2.Request(url, headers=headers) #NOTE: no `encoded`
or POST request:
url = 'http://www.wakanow.com/ng/flights/SearchProcess.aspx' #NOTE: no `encoded`
req = urllib2.Request(url, data=encoded, headers=headers)
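To see where the encoded query ends up in each variant, both requests can be built side by side and inspected before anything is sent. This sketch assumes Python 3 (urllib.parse.urlencode and urllib.request.Request in place of urllib.urlencode and urllib2.Request) and uses a shortened version of the query from the question.

```python
from urllib.parse import urlencode
from urllib.request import Request

base = 'http://www.wakanow.com/ng/flights/SearchProcess.aspx'
query = {'adults': '1', 'trip': 'RT', 'deptCode': 'LOS', 'arrvCode': 'ABV'}
headers = {'User-Agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'}
encoded = urlencode(query)

# GET: the parameters travel in the URL and there is no body.
get_req = Request(base + '?' + encoded, headers=headers)

# POST: the parameters travel in the body, which must be bytes in Python 3.
post_req = Request(base, data=encoded.encode('ascii'), headers=headers)

for r in (get_req, post_req):
    print(r.get_method(), r.full_url, r.data)
```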
This URL hangs. Try it with a less heavy search string.
You may also want to control this with a timeout:
import urllib,urllib2,socket
timeout = 10
socket.setdefaulttimeout(timeout)

urllib2 and json

Can anyone point me to a tutorial that shows how to do a POST request using urllib2 with the data in JSON format?
Messa's answer only works if the server isn't bothering to check the content-type header. You'll need to specify a content-type header if you want it to really work. Here's Messa's answer modified to include a content-type header:
import json
import urllib2
data = json.dumps([1, 2, 3])
req = urllib2.Request(url, data, {'Content-Type': 'application/json'})
f = urllib2.urlopen(req)
response = f.read()
f.close()
Whatever urllib is using to figure out Content-Length seems to get confused by json, so you have to calculate that yourself.
import json
import urllib2
data = json.dumps([1, 2, 3])
clen = len(data)
req = urllib2.Request(url, data, {'Content-Type': 'application/json', 'Content-Length': clen})
f = urllib2.urlopen(req)
response = f.read()
f.close()
Took me forever to figure this out, so I hope it helps someone else.
Example - sending some data encoded as JSON as POST data:
import json
import urllib2
data = json.dumps([1, 2, 3])
f = urllib2.urlopen(url, data)
response = f.read()
f.close()
To read a JSON response, use json.loads(). Here is a sample:
import json
import urllib
import urllib2
post_params = {
'foo' : 'bar'
}
params = urllib.urlencode(post_params)
response = urllib2.urlopen(url, params)
json_response = json.loads(response.read())
You will probably want to set the headers so that it looks like a proper Ajax request:
headers = {'X-Requested-With': 'XMLHttpRequest',
'Accept': 'application/json, text/javascript, */*; q=0.01',}
request = urllib2.Request(path, data, headers)
response = urllib2.urlopen(request).read()
And json.loads the POST body on the server side.
Edit: By the way, you have to urllib.urlencode(mydata_dict) before sending it. If you don't, the POST won't be what the server expects.
This is what worked for me:
import json
import requests
url = 'http://xxx.com'
payload = {'param': '1', 'data': '2', 'field': '4'}
headers = {'content-type': 'application/json'}
r = requests.post(url, data = json.dumps(payload), headers = headers)
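For readers on Python 3 without requests, the urllib2 answers translate roughly as follows. This is a sketch under that assumption, not a drop-in replacement: the URL is the placeholder from the answer above, and the actual send is left as a comment so the example runs offline.

```python
import json
from urllib.request import Request

url = 'http://xxx.com'  # placeholder, as in the answer above
payload = {'param': '1', 'data': '2', 'field': '4'}

# Python 3 requires the request body as bytes, not str.
body = json.dumps(payload).encode('utf-8')
req = Request(url, data=body, headers={'Content-Type': 'application/json'})

# To send it and decode a JSON reply (not executed here):
#     from urllib.request import urlopen
#     with urlopen(req, timeout=10) as f:
#         reply = json.loads(f.read().decode('utf-8'))

print(req.get_method())                # POST
print(req.get_header('Content-type'))  # application/json
print(json.loads(req.data.decode('utf-8')) == payload)  # True
```

Request stores header names with only the first letter capitalized, which is why the lookup uses 'Content-type'; header names are case-insensitive on the wire.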
