I have the request below, which works fine, but it returns the sent request and the server's full response. How can I get just the response body?
req = urllib2.Request('some url here', data='')
opener = urllib2.build_opener(urllib2.HTTPSHandler(debuglevel=1))
req = opener.open(req)
reply = req.read()
req.close()
print reply
That extra information is explicitly requested with debuglevel=1. You can simply remove that, or even the whole opener definition, like this:
import urllib2
print(urllib2.urlopen('https://phihag.de/').read())
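If you still want an explicit opener (say, to attach more handlers later), a minimal sketch is to keep it but leave debuglevel at its default of 0, which suppresses the dump of the request and response headers:
import urllib2

# debuglevel defaults to 0; nothing but the body reaches stdout
opener = urllib2.build_opener(urllib2.HTTPSHandler(debuglevel=0))
response = opener.open('https://phihag.de/')
print response.read()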
I'm trying to run a search on a website for the word 'Adrian'. I already understand that I first have to send a request to the website; the response will contain an XSRF token that I need to use for the second request. As I understand it, if I'm using session.get(), it keeps the cookies automatically for the second request, too.
I run the first request and get a 200 OK response; I print out the cookies and the token is there. I run the second request and get back a 400 error, but if I print out the headers of the second request, the token is there. I don't know where it is going wrong.
Why do I get a 400 for the second one?
import requests
session = requests.Session()
response = session.get('https://www.racebets.com/en/horse-racing/formguide')
print(response)
cookies = session.cookies.get_dict()
print(cookies)
XSRFtoken = cookies['XSRF-TOKEN']
print(XSRFtoken)
response = session.get('https://www.racebets.com/ajax/formguide/search?s=Adrian')
print(response)
print(response.request.headers)
I also tried skipping the session and using requests.get() for the second request, adding the token to the headers myself, but the result is the same:
import requests
session = requests.Session()
response = session.get('https://www.racebets.com/en/horse-racing/formguide')
print(response)
cookies = session.cookies.get_dict()
print(cookies)
XSRFtoken = cookies['XSRF-TOKEN']
print(XSRFtoken)
headers = {'XSRF-TOKEN': XSRFtoken}
response = session.get('https://www.racebets.com/ajax/formguide/search?s=Adrian', headers=headers)
print(response)
print(response.request.headers)
As Paul said:
The API you're trying to make an HTTP GET request to cares about two request headers: cookie and x-xsrf-token. Log your browser's network traffic to see what they're composed of.
The header needs to be named x-xsrf-token. Try this:
token = session.cookies.get('XSRF-TOKEN')
headers = {'X-XSRF-TOKEN': token}
response = session.get('https://www.racebets.com/ajax/formguide/search?s=Adrian', headers=headers)
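Putting it together, here is a minimal sketch of the whole flow with the same URLs as above (the final response.json() call assumes the endpoint returns JSON, which the thread does not confirm):
import requests

session = requests.Session()

# First request: the server sets the XSRF-TOKEN cookie on the session.
session.get('https://www.racebets.com/en/horse-racing/formguide')

# Second request: echo the cookie value back in the X-XSRF-TOKEN header.
token = session.cookies.get('XSRF-TOKEN')
response = session.get(
    'https://www.racebets.com/ajax/formguide/search?s=Adrian',
    headers={'X-XSRF-TOKEN': token},
)
print(response.status_code)
print(response.json())  # assumption: the search endpoint returns JSON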
I am writing a Python program that sends a POST request with a password; if the password is correct, the server returns a special cookie, "BDCLND".
I did this in Postman first. You can see the URL, headers, the password I used, and the response cookies in the snapshots below.
The response cookies didn't include the "BDCLND" cookie because the password 'ssss' was wrong. However, the server did send a 'BAIDUID' cookie back; now, if I sent another POST request with the 'BAIDUID' cookie and the correct password 'v0vb', the "BDCLND" cookie would show up in the response. Like this:
Then I wrote the Python program like this:
import requests

def main():
    url = "https://pan.baidu.com/share/verify?surl=pK753kf&t=1508812575130&bdstoken=null&channel=chunlei&clienttype=0&web=1&app_id=250528&logid=MTUwODgxMjU3NTEzMTAuMzM2MTI4Njk5ODczMDUxNw=="
    headers = {
        "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
        "Referer": "https://pan.baidu.com/share/init?surl=pK753kf"
    }
    password = {'pwd': 'v0vb'}
    response = requests.post(url=url, data=password, headers=headers)
    cookieJar = response.cookies
    for cookie in cookieJar:
        print(cookie.name)
    response = requests.post(url=url, data=password, headers=headers, cookies=cookieJar)
    cookieJar = response.cookies
    for cookie in cookieJar:
        print(cookie.name)

main()
When I run this, the first for loop did print "BAIDUID", so that part is good. However, the second for loop printed nothing; it turned out the second cookie jar was just empty. I am not sure what I did wrong here. Please help.
Your second response has no cookies because you set the request cookies manually via the cookies parameter, so the server won't send a 'Set-Cookie' header back.
Passing cookies across requests with the cookies parameter is not a good idea; use a Session object instead.
import requests

def main():
    ses = requests.Session()
    ses.headers['User-Agent'] = 'Mozilla/5'
    url = "https://pan.baidu.com/share/verify?surl=pK753kf&t=1508812575130&bdstoken=null&channel=chunlei&clienttype=0&web=1&app_id=250528&logid=MTUwODgxMjU3NTEzMTAuMzM2MTI4Njk5ODczMDUxNw=="
    ref = "https://pan.baidu.com/share/init?surl=pK753kf"
    headers = {"Referer": ref}
    password = {'pwd': 'v0vb'}
    response = ses.get(ref)  # the session stores the BAIDUID cookie
    cookieJar = ses.cookies
    for cookie in cookieJar:
        print(cookie.name)
    response = ses.post(url, data=password, headers=headers)  # cookie sent back automatically
    cookieJar = ses.cookies
    for cookie in cookieJar:
        print(cookie.name)

main()
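As a usage note on the snippet above: once the POST with the correct password succeeds, the session jar should hold the target cookie, and you can read it directly (BDCLND being the cookie this question is after):
# RequestsCookieJar supports dict-style lookup; returns None if the
# cookie was never set (e.g. wrong password).
print(ses.cookies.get('BDCLND'))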
I am building a home web radio server. Basically, the device talks to my home server, which redirects requests to the real web radio server. I parse the returned HTML to change the binding of the play buttons so they play on the home server instead of the device. It works very well.
I use tornado; my main handler is:
import json
import urllib.request
import tornado.web

class MainHandler(tornado.web.RequestHandler):
    def get(self, url):
        query = ''
        if self.request.query:
            query = '?' + self.request.query
        url = base_url + urllib.request.quote(url) + query  # base_url is the real server address
        raw_data = opener.open(url).read()
        data = filter_data(raw_data)  # parse HTML with Beautiful Soup
        self.write(data)

    def post(self, url):  # exclusively AJAX calls
        url = base_url + urllib.request.quote(url)
        request = urllib.request.Request(url, data=self.request.body, method='POST')
        data = opener.open(request).read().decode('utf-8')
        try:
            json.loads(data)
            self.set_header('Content-Type', 'application/json')
        except json.decoder.JSONDecodeError:
            self.set_header('Content-Type', 'text/html; charset=utf-8')
        self.write(data)
The opener is required:
from http.cookiejar import CookieJar
import urllib.request

cj = CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
I don't understand how this works: the request I construct with urllib.request has no headers, so how can the remote server reply?
The problem is that it works only in the context of a Tornado request. If I try to do the same thing outside it, it does not work:
def get_song(radioUID):
    url = base_url + 'fr/OnAir/GetCurrentSong'  # same url that works in tornado
    data = bytes(json.dumps({'radioUID': radioUID}), encoding='utf-8')
    request = urllib.request.Request(url, data=data, method='POST')
    response = opener.open(request).read().decode('utf-8')
    print(response)
It always answers 'false' instead of the expected JSON object. Can anyone help me?
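One plausible difference between the two contexts (an assumption, not something stated above) is that inside Tornado the opener has usually already fetched pages from the site, so its CookieJar holds session cookies, and the browser's AJAX call carries a JSON Content-Type header. A sketch that reproduces both conditions outside Tornado, with base_url and radioUID defined as in the question:
import json
import urllib.request
from http.cookiejar import CookieJar

cj = CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))

def get_song(radioUID):
    # Prime the cookie jar first, as the Tornado proxy implicitly does
    # when it serves the site's HTML pages through the same opener.
    opener.open(base_url)

    data = json.dumps({'radioUID': radioUID}).encode('utf-8')
    request = urllib.request.Request(
        base_url + 'fr/OnAir/GetCurrentSong',
        data=data,
        method='POST',
        headers={'Content-Type': 'application/json'},  # assumption: endpoint expects JSON
    )
    print(opener.open(request).read().decode('utf-8'))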
I would like to access a web page from a Python program.
I have to set up cookies to load the page.
I used the httplib2 library, but I couldn't find how to add my own cookie:
import httplib2

h = httplib2.Http()
resp_headers, content = h.request("http://www.theURL.com", "GET")
How can I create cookies with the right name and value, add them to the function, and then load the page?
Thanks
From http://code.google.com/p/httplib2/wiki/Examples, which I hope will help:
Cookies
When automating something, you often need to "login" to maintain some sort of session/state with the server. Sometimes this is achieved with form-based authentication and cookies. You post a form to the server, and it responds with a cookie in the incoming HTTP header. You need to pass this cookie back to the server in subsequent requests to maintain state or to keep a session alive.
Here is an example of how to deal with cookies when doing an HTTP POST.
First, let's import the modules we will use:
import urllib
import httplib2
Now, let's define the data we will need. In this case, we are doing a form post with two fields representing a username and a password.
url = 'http://www.example.com/login'
body = {'USERNAME': 'foo', 'PASSWORD': 'bar'}
headers = {'Content-type': 'application/x-www-form-urlencoded'}
Now we can send the HTTP request:
http = httplib2.Http()
response, content = http.request(url, 'POST', headers=headers, body=urllib.urlencode(body))
At this point, our "response" variable contains a dictionary of HTTP header fields that were returned by the server. If a cookie was returned, you would see a "set-cookie" field containing the cookie value. We want to take this value and put it into the outgoing HTTP header for our subsequent requests:
headers['Cookie'] = response['set-cookie']
Now we can send a request using this header and it will contain the cookie, so the server can recognize us.
So... here is the whole thing in a script. We log in to a site and then make another request using the cookie we received:
#!/usr/bin/env python
import urllib
import httplib2
http = httplib2.Http()
url = 'http://www.example.com/login'
body = {'USERNAME': 'foo', 'PASSWORD': 'bar'}
headers = {'Content-type': 'application/x-www-form-urlencoded'}
response, content = http.request(url, 'POST', headers=headers, body=urllib.urlencode(body))
headers = {'Cookie': response['set-cookie']}
url = 'http://www.example.com/home'
response, content = http.request(url, 'GET', headers=headers)
http = httplib2.Http()
# get cookie_value here
headers = {'Cookie':cookie_value}
response, content = http.request("http://www.theURL.com", 'GET', headers=headers)
You may want to add other header parameters to specify other HTTP request options, as in the sketch below.
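For example, a short sketch extending the snippet above with a User-Agent header (the value is illustrative):
headers = {
    'Cookie': cookie_value,        # obtained as above
    'User-Agent': 'Mozilla/5.0',   # illustrative value
}
response, content = http.request("http://www.theURL.com", 'GET', headers=headers)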
You can also use just the urllib2 library:
import urllib2
opener = urllib2.build_opener()
opener.addheaders.append(('Cookie', 'cookie1=value1;cookie2=value2'))
f = opener.open("http://www.example.com/")
the_page_html = f.read()
With this code, urllib2 makes a GET request:
#!/usr/bin/python
import urllib2
req = urllib2.Request('http://www.google.fr')
req.add_header('User-Agent', '')
response = urllib2.urlopen(req)
With this one (which is almost the same), a POST request:
#!/usr/bin/python
import urllib2
headers = { 'User-Agent' : '' }
req = urllib2.Request('http://www.google.fr', '', headers)
response = urllib2.urlopen(req)
My question is: how can I make a GET request with the second code style?
The documentation (http://docs.python.org/release/2.6.5/library/urllib2.html) says that headers "should be a dictionary, and will be treated as if add_header() was called with each key and value as arguments".
Yeah, except that in order to use the headers parameter, you have to pass data, and when data is passed, the request becomes a POST.
Any help will be very appreciated.
Use:
req = urllib2.Request('http://www.google.fr', None, headers)
or:
req = urllib2.Request('http://www.google.fr', headers=headers)
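Either form leaves data as None, so the request stays a GET. A quick sketch to convince yourself, using Request.get_method(), which reports the verb urllib2 will use:
import urllib2

headers = {'User-Agent': ''}
req = urllib2.Request('http://www.google.fr', headers=headers)
print req.get_method()  # 'GET', because no data was passed
response = urllib2.urlopen(req)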