How to debug urllib2 request that uses a basic authentication handler - python

I'm making a request using urllib2 and the HTTPBasicAuthHandler like so:
import urllib2
theurl = 'http://someurl.com'
username = 'username'
password = 'password'
passman = urllib2.HTTPPasswordMgrWithDefaultRealm()
passman.add_password(None, theurl, username, password)
authhandler = urllib2.HTTPBasicAuthHandler(passman)
opener = urllib2.build_opener(authhandler)
urllib2.install_opener(opener)
params = "foo=bar"
response = urllib2.urlopen('http://someurl.com/somescript.cgi', params)
print response.info()
I'm currently getting an httplib.BadStatusLine exception when running this code. How could I go about debugging it? Is there a way to see the raw response regardless of the unrecognized HTTP status line?

Have you tried setting the debug level in your own HTTP handler? Change your code to something like this:
>>> import urllib2
>>> handler=urllib2.HTTPHandler(debuglevel=1)
>>> opener = urllib2.build_opener(handler)
>>> urllib2.install_opener(opener)
>>> resp=urllib2.urlopen('http://www.google.com').read()
send: 'GET / HTTP/1.1\r\nAccept-Encoding: identity\r\nHost: www.google.com\r\nConnection: close\r\nUser-Agent: Python-urllib/2.7\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Date: Sat, 08 Oct 2011 17:25:52 GMT
header: Expires: -1
header: Cache-Control: private, max-age=0
header: Content-Type: text/html; charset=ISO-8859-1
... the remainder of the send / reply other than the data itself
So the three lines to prepend are:
handler=urllib2.HTTPHandler(debuglevel=1)
opener = urllib2.build_opener(handler)
urllib2.install_opener(opener)
... the rest of your urllib2 code...
That will show the raw HTTP send / reply cycle on stdout (httplib writes its debug trace with print).
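With the debug trace you do see the offending status line, but httplib raises before it reads any headers or body. If you want every byte the server sends back, one option is to skip the HTTP library and use a plain socket. Below is a minimal Python 3 sketch under a few assumptions: the credentials are sent preemptively (HTTPBasicAuthHandler normally waits for a 401 challenge first), the request is HTTP/1.0 so the server closes the connection when it is done, and there is no TLS, so it only covers http:// URLs. `build_post_request` and `raw_http_exchange` are hypothetical helper names; the host, path and credentials are the question's placeholders.

```python
import base64
import socket


def build_post_request(host, path, body, username, password):
    """Build a raw HTTP/1.0 POST with a preemptive Basic Authorization header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    head = (
        f"POST {path} HTTP/1.0\r\n"
        f"Host: {host}\r\n"
        f"Authorization: Basic {token}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body


def raw_http_exchange(host, port, request_bytes, timeout=10.0):
    """Send request_bytes verbatim and return every byte the server sends back.

    Nothing is parsed, so a malformed status line cannot raise BadStatusLine.
    """
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request_bytes)
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # server closed the connection (HTTP/1.0)
                break
            chunks.append(chunk)
    return b"".join(chunks)


# Example against the question's placeholder server (not executed here):
# req = build_post_request("someurl.com", "/somescript.cgi", b"foo=bar", "username", "password")
# print(raw_http_exchange("someurl.com", 80, req).decode("latin-1"))
```

Decoding with latin-1 in the usage line is a choice that never fails on arbitrary bytes, which is what you want when the response is malformed.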
Edit, from the comments:
If you also need the HTTPBasicAuthHandler, pass both handlers to build_opener:
... same code as above this line
opener=urllib2.build_opener(authhandler, urllib2.HTTPHandler(debuglevel=1))
... rest of your code
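In Python 3, urllib2 was split into urllib.request and urllib.error, but the same idea carries over. Here is a sketch of the combined opener, assuming you want tracing for both http:// and https:// URLs (hence the extra HTTPSHandler); `build_debug_opener` is a hypothetical name.

```python
import urllib.request


def build_debug_opener(url, username, password):
    """Python 3 counterpart of the urllib2 opener above: Basic auth plus a debug trace.

    With debuglevel=1, http.client prints "send:", "reply:" and "header:" lines.
    The "reply:" line is printed before the status line is validated, so a
    malformed status line still shows up in the trace.
    """
    passman = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    passman.add_password(None, url, username, password)
    return urllib.request.build_opener(
        urllib.request.HTTPBasicAuthHandler(passman),
        urllib.request.HTTPHandler(debuglevel=1),
        urllib.request.HTTPSHandler(debuglevel=1),
    )


# Usage with the question's placeholders (not executed here):
# opener = build_debug_opener("http://someurl.com", "username", "password")
# response = opener.open("http://someurl.com/somescript.cgi", data=b"foo=bar")
# print(response.info())
```

The trace is written with print, so `contextlib.redirect_stdout` can capture it if you would rather log it than see it on the console.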

Related

How to make a POST request using multi/form-data using Python?

I am very new to working with APIs.
I have to make a POST request to an API with the following "format":
content-type: multipart/form-data
Content-Disposition: form-data; name=""; filename=""
Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
Form data:
file = file.xlsx
How can I perform this API request using Python?
Using the requests library, can I do it like this?
requests.post(
'api_url',
headers = {'Content-Type':'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'},
data = {"filename.xlsx": open(filepath, "rb")}
)
Thanks
I prefer urllib3's PoolManager, since it makes timeouts, retries, etc. easy to manage:
import urllib3
from urllib3.util import Retry, Timeout
http_client = urllib3.PoolManager(retries=Retry(connect=5, read=2, redirect=5),
timeout=Timeout(connect=5.0, read=10.0),
num_pools=2)
data = {'asd': 'asd'}
request = http_client.request('POST', "http://localhost:8081", fields=data, encode_multipart=True)
This will give you:
>nc -l 127.0.0.1 8081
POST / HTTP/1.1
Host: localhost:8081
Accept-Encoding: identity
Content-Length: 125
Content-Type: multipart/form-data; boundary=6ce0c07687204c761cc1e5a6d6f6046e
User-Agent: python-urllib3/1.26.4
--6ce0c07687204c761cc1e5a6d6f6046e
Content-Disposition: form-data; name="asd"
asd
--6ce0c07687204c761cc1e5a6d6f6046e--
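For the file upload in the question: with Requests the file belongs in `files=`, not `data=` (for example `files={"file": ("file.xlsx", fh, mime)}`), and the Content-Type header should not be set by hand, because the multipart boundary has to appear in it. urllib3 accepts the same `(filename, data, mime_type)` tuple as a value in `fields`. To see what such a body looks like, or to build one with only the standard library, here is a minimal Python 3 sketch; `encode_multipart` and `XLSX_MIME` are illustrative names, and the field name "file" is assumed from the question's form data.

```python
import uuid

XLSX_MIME = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"


def encode_multipart(fields, files):
    """Return (body, content_type) for a multipart/form-data request.

    fields: {name: value} plain text form fields
    files:  {name: (filename, data_bytes, mime_type)} file parts
    """
    boundary = uuid.uuid4().hex  # random, so it will not occur inside the data
    parts = []
    for name, value in fields.items():
        parts.append(f"--{boundary}\r\n".encode("ascii"))
        parts.append(f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode("utf-8"))
        parts.append(str(value).encode("utf-8") + b"\r\n")
    for name, (filename, data, mime) in files.items():
        parts.append(f"--{boundary}\r\n".encode("ascii"))
        parts.append(
            (f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
             f"Content-Type: {mime}\r\n\r\n").encode("utf-8")
        )
        parts.append(data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("ascii"))  # closing delimiter
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


# Sending it (not executed here; "api_url" is the question's placeholder):
# import urllib.request
# with open("file.xlsx", "rb") as fh:
#     body, ctype = encode_multipart({}, {"file": ("file.xlsx", fh.read(), XLSX_MIME)})
# req = urllib.request.Request("api_url", data=body, headers={"Content-Type": ctype}, method="POST")
# urllib.request.urlopen(req)
```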

Login to website using http.client

I am trying to login to a website using http.client in Python using the following code:
import urllib.parse
import http.client
payload = urllib.parse.urlencode({"username": "USERNAME-HERE",
"password": "PASSWORD-HERE",
"redirect": "index.php",
"sid": "",
"login": "Login"})
conn = http.client.HTTPConnection("osu.ppy.sh:80")
conn.request("POST", "/forum/ucp.php?mode=login", payload)
response = conn.getresponse()
data = response.read()
# print the HTML after the request
print(bytes(str(data), "utf-8").decode("unicode_escape"))
I know that a common suggestion is to just use the Requests library, and I have tried this, but I specifically want to know how to do this without using Requests.
The behavior I am looking for can be replicated with the following code that successfully logs in to the site using Requests:
import requests
payload = {"username": "USERNAME-HERE",
"password": "PASSWORD-HERE",
"redirect": "index.php",
"sid": "",
"login": "Login"}
p = requests.Session().post('https://osu.ppy.sh/forum/ucp.php?mode=login', payload)
# print the HTML after the request
print(p.text)
It seems to me that the http.client code is not "delivering" the payload, while the Requests code is.
Any insights? Am I overlooking something?
EDIT: Adding conn.set_debuglevel(1) gives the following output:
send: b'POST /forum/ucp.php?mode=login HTTP/1.1\r\nHost: osu.ppy.sh\r\nAccept-Encoding: identity\r\nContent-Length: 70\r\n\r\n'
send: b'redirect=index.php&sid=&login=Login&username=USERNAME-HERE&password=PASSWORD-HERE'
reply: 'HTTP/1.1 200 OK'
header: Date
header: Content-Type
header: Transfer-Encoding
header: Connection
header: Set-Cookie
header: Cache-Control
header: Expires
header: Pragma
header: X-Frame-Options
header: X-Content-Type-Options
header: Server
header: CF-RAY
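Since you are URL-encoding your payload, you must also send the header Content-Type: application/x-www-form-urlencoded. Requests adds it automatically when you pass a dict as the data, but http.client does not.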
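A minimal Python 3 sketch of that fix, with the header set explicitly. Two other differences from the Requests version may also matter: it posts to https:// while the http.client code uses plain HTTP on port 80, and `requests.Session` keeps cookies between requests, which a bare http.client connection never does. `post_form` is a hypothetical helper.

```python
import http.client
import urllib.parse


def post_form(conn, path, fields):
    """POST fields as a urlencoded form over an existing http.client connection.

    Returns (status, headers, body). The explicit Content-Type header is the fix:
    http.client sends whatever body you give it but never labels it as form data.
    """
    body = urllib.parse.urlencode(fields)
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    conn.request("POST", path, body=body, headers=headers)  # Content-Length is added for us
    response = conn.getresponse()
    return response.status, response.getheaders(), response.read()


# Usage sketch (not executed here). HTTPSConnection is an assumption based on
# the Requests version of the question using an https:// URL.
# conn = http.client.HTTPSConnection("osu.ppy.sh")
# status, headers, html = post_form(conn, "/forum/ucp.php?mode=login", {
#     "username": "USERNAME-HERE", "password": "PASSWORD-HERE",
#     "redirect": "index.php", "sid": "", "login": "Login"})
```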

Username and Password Failing to load properly in website

I have created a script that takes two text files, "username.txt" and "password.txt", and tries to log in with the data they contain. username.txt holds user1#mymail.com to user10#mymail.com and password.txt holds password1 to password10. If a login succeeds it ought to give me HTTP status code 200; if it fails it should give me a 400. My code only handles the first line and doesn't process the rest. How can I fix this? Here is my code.
import urllib, urllib2
user = open ('users.txt' , 'r')
password = open ('password.txt' , 'r')
pa = ''.join(password)
for users in user:
login_data = pa + users
base_url = 'http://mymail.com'
# login action we want to post data to
response = urllib2.urlopen(base_url)
login_action = '/auth/login'
login_action = base_url + login_action
response = urllib2.urlopen(login_action, login_data)
response.read()
print response.headers
print response.getcode()
Here is my output when I run the script. Mark, I have set up some users that are supposed to fail, but I am still getting a 200.
Date: Mon, 29 Jul 2013 14:54:59 GMT
Server: Apache
X-Powered-By: PHP/5.3.3
Set-Cookie: PHPSESSID=o3jlu86jgs7uj24fod107aps26; path=/
Cache-Control: no-cache
Connection: close
Transfer-Encoding: chunked
Content-Type: text/html; charset=UTF-8
200
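All I had to do was nest the password loop inside the user loop and call password.seek(0) at the start of each outer iteration. Without the seek, the password file iterator is exhausted after the first user; with it, the file is rewound and every password is tried for each user.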
import urllib, urllib2
user = open ('users.txt' , 'r')
password = open ('password.txt' , 'r')
for users in user:
    password.seek(0)
    for pass_list in password:
        login_data = users + '\n' + pass_list
        print login_data
        base_url = 'http://my-site.com'
        #login action we want to post data to
        response = urllib2.urlopen(base_url)
        login_action = '/auth/login'
        login_action = base_url + login_action
        response = urllib2.urlopen(login_action, login_data)
        response.read()
        print response.headers
        print response.getcode()
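Two things in this loop may still cause trouble. `login_data` is built by joining raw lines with '\n', which is not a urlencoded form body, and every line read from a file keeps its trailing newline. Also, many login pages answer a failed attempt with 200 and an error page rather than a 400, so the status code alone may not tell you whether the login worked. A Python 3 sketch of generating properly encoded bodies follows; the field names `username` and `password` are assumptions (use the names from the site's real login form), and `read_lines` / `login_payloads` are hypothetical helpers.

```python
import itertools
import urllib.parse


def read_lines(path):
    """Read a file into a list of lines, dropping trailing newlines and blank lines."""
    with open(path) as fh:
        return [line.strip() for line in fh if line.strip()]


def login_payloads(users, passwords, pair_up=False):
    """Yield one urlencoded POST body per login attempt.

    pair_up=False tries every password for every user (what the nested loop does);
    pair_up=True pairs user N with password N instead.
    The form field names "username" and "password" are hypothetical.
    """
    pairs = zip(users, passwords) if pair_up else itertools.product(users, passwords)
    for user, password in pairs:
        yield urllib.parse.urlencode({"username": user, "password": password})
```

`itertools.product` reproduces the nested loop without needing `seek(0)`, because both lists are read into memory once.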

How to programmatically retrieve access_token from client-side OAuth flow using Python?

This question was posted on StackApps, but the issue may be more a programming issue than an authentication issue, hence it may deserve a better place here.
I am working on a desktop inbox notifier for StackOverflow, using the API with Python.
The script I am working on first logs the user in on StackExchange, and then requests authorisation for the application. Assuming the application has been authorised through web-browser interaction of the user, the application should be able to make requests to the API with authentication, hence it needs the access token specific to the user. This is done with the URL: https://stackexchange.com/oauth/dialog?client_id=54&scope=read_inbox&redirect_uri=https://stackexchange.com/oauth/login_success.
When requesting authorisation via the web-browser the redirect is taking place and an access code is returned after a #. However, when requesting this same URL with Python (urllib2), no hash or key is returned in the response.
Why is it my urllib2 request is handled differently from the same request made in Firefox or W3m? What should I do to programmatically simulate this request and retrieve the access_token?
Here is my script (it's experimental) and remember: it assumes the user has already authorised the application.
#!/usr/bin/env python
import urllib
import urllib2
import cookielib
from BeautifulSoup import BeautifulSoup
from getpass import getpass
# Define URLs
parameters = [ 'client_id=54',
'scope=read_inbox',
'redirect_uri=https://stackexchange.com/oauth/login_success'
]
oauth_url = 'https://stackexchange.com/oauth/dialog?' + '&'.join(parameters)
login_url = 'https://openid.stackexchange.com/account/login'
submit_url = 'https://openid.stackexchange.com/account/login/submit'
authentication_url = 'http://stackexchange.com/users/authenticate?openid_identifier='
# Set counter for requests:
counter = 0
# Build opener
jar = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(jar))
def authenticate(username='', password=''):
    '''
    Authenticates to StackExchange using user-provided username and password
    '''
    # Build up headers
    user_agent = 'Mozilla/5.0 (Ubuntu; X11; Linux i686; rv:8.0) Gecko/20100101 Firefox/8.0'
    headers = {'User-Agent' : user_agent}
    # Set Data to None
    data = None
    # 1. Build up URL request with headers and data
    request = urllib2.Request(login_url, data, headers)
    response = opener.open(request)
    # Build up POST data for authentication
    html = response.read()
    fkey = BeautifulSoup(html).findAll(attrs={'name' : 'fkey'})[0].get('value').encode()
    values = {'email' : username,
              'password' : password,
              'fkey' : fkey}
    data = urllib.urlencode(values)
    # 2. Build up URL for authentication
    request = urllib2.Request(submit_url, data, headers)
    response = opener.open(request)
    # Check if logged in
    if response.url == 'https://openid.stackexchange.com/user':
        print ' Logged in! :) '
    else:
        print ' Login failed! :( '
    # Find user ID URL
    html = response.read()
    id_url = BeautifulSoup(html).findAll('code')[0].text.split('"')[-2].encode()
    # 3. Build up URL for OpenID authentication
    data = None
    url = authentication_url + urllib.quote_plus(id_url)
    request = urllib2.Request(url, data, headers)
    response = opener.open(request)
    # 4. Build up URL request with headers and data
    request = urllib2.Request(oauth_url, data, headers)
    response = opener.open(request)
    if '#' in response.url:
        print 'Access code provided in URL.'
    else:
        print 'No access code provided in URL.'
if __name__ == '__main__':
    username = raw_input('Enter your username: ')
    password = getpass('Enter your password: ')
    authenticate(username, password)
To respond to comments below:
Tamper Data in Firefox shows the URL above (oauth_url in the code) being requested with the following headers:
Host=stackexchange.com
User-Agent=Mozilla/5.0 (Ubuntu; X11; Linux i686; rv:9.0.1) Gecko/20100101 Firefox/9.0.1
Accept=text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language=en-us,en;q=0.5
Accept-Encoding=gzip, deflate
Accept-Charset=ISO-8859-1,utf-8;q=0.7,*;q=0.7
Connection=keep-alive
Cookie=m=2; __qca=P0-556807911-1326066608353; __utma=27693923.1085914018.1326066609.1326066609.1326066609.1; __utmb=27693923.3.10.1326066609; __utmc=27693923; __utmz=27693923.1326066609.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); gauthed=1; ASP.NET_SessionId=nt25smfr2x1nwhr1ecmd4ok0; se-usr=t=z0FHKC6Am06B&s=pblSq0x3B0lC
In the urllib2 request the header provides the user-agent value only. The cookie is not passed explicitly, but the se-usr is available in the cookie jar at the time of the request.
The response headers will be first the redirect:
Status=Found - 302
Server=nginx/0.7.65
Date=Sun, 08 Jan 2012 23:51:12 GMT
Content-Type=text/html; charset=utf-8
Connection=keep-alive
Cache-Control=private
Location=https://stackexchange.com/oauth/login_success#access_token=OYn42gZ6r3WoEX677A3BoA))&expires=86400
Set-Cookie=se-usr=t=kkdavslJe0iq&s=pblSq0x3B0lC; expires=Sun, 08-Jul-2012 23:51:12 GMT; path=/; HttpOnly
Content-Length=218
Then the redirect will take place through another request with the fresh se-usr value from that header.
I don't know how to catch the 302 in urllib2; it handles the redirect by itself (which is great). It would be nice, however, to see whether the access token provided in the Location header is available.
There's nothing special in the headers of the last response; both Firefox and urllib2 return something like:
Server: nginx/0.7.65
Date: Sun, 08 Jan 2012 23:48:16 GMT
Content-Type: text/html; charset=utf-8
Connection: close
Cache-Control: private
Content-Length: 5664
I hope I didn't provide confidential info, let me know if I did :D
The token does not appear because of the way urllib2 handles the redirect. I am not familiar with the details, so I won't elaborate here.
The solution is to catch the 302 before urllib2 handles the redirect. This can be done by subclassing urllib2.HTTPRedirectHandler, which gives you the redirect's Location header with the fragment (the part after #) and the token intact. Here is a short example of subclassing the handler:
class MyHTTPRedirectHandler(urllib2.HTTPRedirectHandler):
    def http_error_302(self, req, fp, code, msg, headers):
        print "Going through 302:\n"
        print headers
        return urllib2.HTTPRedirectHandler.http_error_302(self, req, fp, code, msg, headers)
In the headers, the Location header gives the full redirect URL, including the fragment with the token:
Output extract:
...
Going through 302:
Server: nginx/0.7.65
Date: Mon, 09 Jan 2012 20:20:11 GMT
Content-Type: text/html; charset=utf-8
Connection: close
Cache-Control: private
Location: https://stackexchange.com/oauth/login_success#access_token=K4zKd*HkKw5Opx(a8t12FA))&expires=86400
Content-Length: 218
...
More on catching redirects with urllib2 on StackOverflow (of course).
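A Python 3 sketch of the same approach that also extracts the token: the handler records each 302 Location header before following it, and `urllib.parse` splits the fragment. `CaptureRedirectHandler` and `token_from_location` are illustrative names. Only 302 responses are recorded here, because the base class routes 301, 303 and 307 through its own copies of the method.

```python
import urllib.parse
import urllib.request


class CaptureRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Record every 302 Location header, then let urllib follow the redirect as usual."""

    def __init__(self):
        super().__init__()
        self.locations = []

    def http_error_302(self, req, fp, code, msg, headers):
        self.locations.append(headers.get("Location"))
        return super().http_error_302(req, fp, code, msg, headers)


def token_from_location(location):
    """Return (access_token, expires) from a URL like '...#access_token=X&expires=86400'."""
    fragment = urllib.parse.urlsplit(location).fragment
    params = urllib.parse.parse_qs(fragment)
    return params.get("access_token", [None])[0], params.get("expires", [None])[0]


# Usage sketch (not executed here), reusing the question's cookie jar and oauth_url:
# capture = CaptureRedirectHandler()
# opener = urllib.request.build_opener(capture, urllib.request.HTTPCookieProcessor(jar))
# opener.open(oauth_url)
# token, expires = token_from_location(capture.locations[-1])
```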

Making HTTP POST request

I'm trying to make a POST request to retrieve information about a book.
Here is the code; it returns HTTP status 302 Moved:
import httplib, urllib
params = urllib.urlencode({
'isbn' : '9780131185838',
'catalogId' : '10001',
'schoolStoreId' : '15828',
'search' : 'Search'
})
headers = {"Content-type": "application/x-www-form-urlencoded",
"Accept": "text/plain"}
conn = httplib.HTTPConnection("bkstr.com:80")
conn.request("POST", "/webapp/wcs/stores/servlet/BuybackSearch",
params, headers)
response = conn.getresponse()
print response.status, response.reason
data = response.read()
conn.close()
When I try from a browser, from this page: http://www.bkstr.com/webapp/wcs/stores/servlet/BuybackMaterialsView?langId=-1&catalogId=10001&storeId=10051&schoolStoreId=15828 , it works. What am I missing in my code?
EDIT:
Here's what I get when I call print response.msg
302 Moved Date: Tue, 07 Sep 2010 16:54:29 GMT
Vary: Host,Accept-Encoding,User-Agent
Location: http://www.bkstr.com/webapp/wcs/stores/servlet/BuybackSearch
X-UA-Compatible: IE=EmulateIE7
Content-Length: 0
Content-Type: text/plain; charset=utf-8
Seems that the location points to the same url I'm trying to access in the first place?
EDIT2:
I've tried using urllib2 as suggested here. Here is the code:
import urllib, urllib2
url = 'http://www.bkstr.com/webapp/wcs/stores/servlet/BuybackSearch'
values = {'isbn' : '9780131185838',
'catalogId' : '10001',
'schoolStoreId' : '15828',
'search' : 'Search' }
data = urllib.urlencode(values)
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)
print response.geturl()
print response.info()
the_page = response.read()
print the_page
And here is the output:
http://www.bkstr.com/webapp/wcs/stores/servlet/BuybackSearch
Date: Tue, 07 Sep 2010 16:58:35 GMT
Pragma: No-cache
Cache-Control: no-cache
Expires: Thu, 01 Jan 1970 00:00:00 GMT
Set-Cookie: JSESSIONID=0001REjqgX2axkzlR6SvIJlgJkt:1311s25dm; Path=/
Vary: Accept-Encoding,User-Agent
X-UA-Compatible: IE=EmulateIE7
Content-Length: 0
Connection: close
Content-Type: text/html; charset=utf-8
Content-Language: en-US
Set-Cookie: TSde3575=225ec58bcb0fdddfad7332c2816f1f152224db2f71e1b0474c866f3b; Path=/
Their server seems to want you to acquire the proper cookie. This works:
import urllib, urllib2, cookielib
cookie_jar = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie_jar))
urllib2.install_opener(opener)
# acquire cookie
url_1 = 'http://www.bkstr.com/webapp/wcs/stores/servlet/BuybackMaterialsView?langId=-1&catalogId=10001&storeId=10051&schoolStoreId=15828'
req = urllib2.Request(url_1)
rsp = urllib2.urlopen(req)
# do POST
url_2 = 'http://www.bkstr.com/webapp/wcs/stores/servlet/BuybackSearch'
values = dict(isbn='9780131185838', schoolStoreId='15828', catalogId='10001')
data = urllib.urlencode(values)
req = urllib2.Request(url_2, data)
rsp = urllib2.urlopen(req)
content = rsp.read()
# print result
import re
pat = re.compile('Title:.*')
print pat.search(content).group()
# OUTPUT: Title: Statics & Strength of Materials for Arch (w/CD)<br />
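The same two-step flow in Python 3, where cookielib became http.cookiejar. This sketch calls the opener directly instead of installing it globally, which is a design choice rather than anything the site requires; `post_with_session_cookie` is a hypothetical name.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def post_with_session_cookie(landing_url, post_url, fields):
    """GET landing_url so the server can set its session cookie, then POST fields.

    Both requests go through the same opener, so the CookieJar replays whatever
    cookies the first response set.
    """
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    # Step 1: acquire the cookie; the page body itself is not needed.
    with opener.open(landing_url) as rsp:
        rsp.read()
    # Step 2: passing data= makes urllib send a POST with a urlencoded Content-Type.
    data = urllib.parse.urlencode(fields).encode("ascii")
    with opener.open(post_url, data=data) as rsp:
        return rsp.read()


# Usage with the URLs from the answer above (not executed here):
# content = post_with_session_cookie(url_1, url_2, dict(isbn='9780131185838',
#                                    schoolStoreId='15828', catalogId='10001'))
```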
You might want to use the urllib2 module which should handle redirects better. Here's an example of POSTING with urllib2.
Perhaps that's what the browser gets, and you'll just have to follow the 302 redirect.
If all else fails, you can monitor the exchange between Firefox and the web server using Firebug, tcpdump or Wireshark, and see which HTTP headers differ. Possibly it's just the User-Agent: header.
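To test the User-Agent theory, send the browser's string along with the POST. In this Python 3 sketch the UA value is a placeholder (copy the real one from Firebug or Wireshark), and `browser_like_post` is a hypothetical helper that only builds the request; pass the result to `urllib.request.urlopen` to send it.

```python
import urllib.parse
import urllib.request

# Placeholder browser string; replace it with the exact value your browser sends.
BROWSER_UA = "Mozilla/5.0 (X11; Linux i686; rv:9.0.1) Gecko/20100101 Firefox/9.0.1"


def browser_like_post(url, fields, user_agent=BROWSER_UA):
    """Build (but do not send) a form POST that identifies itself as a browser."""
    data = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(url, data=data, headers={"User-Agent": user_agent})
```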
