Problems saving cookies when making HTTP requests using Python

I'm trying to write a web spider in Python, but I ran into a problem when I tried to log in to the website Pixiv. My code is below:
import sys
import urllib
import urllib2
import cookielib
url="https://www.secure.pixiv.net/login.php"
cookiename='123.txt'
cookie = cookielib.MozillaCookieJar(cookiename)
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie))
cookie.save()
values={'model':'login',
'return_to':'/',
'pixiv_id':'username',
'pass':'password',
'skip':'1'}
headers = { 'User-Agent' : 'User-Agent' }
data=urllib.urlencode(values)
req=urllib2.Request(url,data)
response=urllib2.urlopen(req)
the_page=response.read()
cookie.save()
To check that it works, I used cookielib to save the cookies to a text file. I ran the code and got a "cookie.txt", but when I opened the file I found that it was empty; in other words, my code didn't work.
I don't know what's wrong with it.

The problem is you're not using the opener that you created with the cookiejar attached to it in order to make the request. urllib2.urlopen has no way of knowing that you want to use that opener to start the request.
You can either use the opener's open method directly or, if you want to use this by default for the rest of your application, you can install it as the default opener for all requests made with urllib2 using urllib2.install_opener. So give that a try and see if it does the trick.
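The fix described above can be sketched as follows. This is a minimal illustration written for Python 3, where urllib2 and cookielib live in urllib.request and http.cookiejar. To stay runnable without touching Pixiv, it talks to a throwaway local server; LoginHandler, login_and_save, the cookie value, and the form fields are all assumptions made for the demo, not part of the original code.

```python
import http.cookiejar
import http.server
import os
import tempfile
import threading
import urllib.parse
import urllib.request


class LoginHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical stand-in for a login endpoint: every POST sets a cookie."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)  # consume the form body
        self.send_response(200)
        self.send_header("Set-Cookie", "session=abc123; Path=/")
        self.send_header("Content-Length", "2")
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, *args):
        pass  # keep the demo quiet


def login_and_save(cookie_path):
    server = http.server.HTTPServer(("127.0.0.1", 0), LoginHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        jar = http.cookiejar.MozillaCookieJar(cookie_path)
        opener = urllib.request.build_opener(
            urllib.request.ProxyHandler({}),  # bypass any proxy for localhost
            urllib.request.HTTPCookieProcessor(jar),
        )
        form = {"pixiv_id": "username", "pass": "password"}
        data = urllib.parse.urlencode(form).encode("ascii")
        url = "http://127.0.0.1:%d/login.php" % server.server_port
        # The request goes through the opener, so the cookie processor sees it.
        with opener.open(url, data) as response:
            response.read()
        # Session cookies are skipped by save() unless ignore_discard is set.
        jar.save(ignore_discard=True)
        return [cookie.name for cookie in jar]
    finally:
        server.shutdown()
        server.server_close()


cookie_file = os.path.join(tempfile.mkdtemp(), "cookies.txt")
saved_names = login_and_save(cookie_file)
print(saved_names)
```

Calling urllib.request.install_opener(opener) once would instead make plain urlopen calls use the same jar. Note also that save() omits session cookies by default, which can leave the file empty even when the request is made correctly.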

Related

Python CookieJar saves cookie, but doesn't send it to website

I am trying to log in to a website using urllib2 and a cookie jar. It saves the session ID, but when I try to open another link that requires authentication, the site says that I am not logged in. What am I doing wrong?
Here's the code, which fails for me:
import urllib
import urllib2
import cookielib
cookieJar = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookieJar))
# Gives response saying that I logged in successfully
response = opener.open("http://site.com/login", "username=testuser&password=" + md5encode("testpassword"))
# Gives response saying that I am not logged in
response1 = opener.open("http://site.com/check")
Your implementation looks fine and should work.
It should be sending the correct cookies, so I suspect the site is not actually logging you in.
How do you know the cookies are not being sent? It may also be that the cookies you receive are not the ones that authenticate you.
Use response.info() to inspect the response headers and see which cookies you are actually receiving.
The site may not be logging you in because:
It checks the User-Agent header, which you are not setting; some sites only serve the major browsers in order to block bots.
It expects a hidden form field that you are not sending.
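One way to run the check suggested above is to log what the server sets, what the jar stores, and what is sent back on the next request. The sketch below is for Python 3 (urllib.request and http.cookiejar replace urllib2 and cookielib) and uses a throwaway local server in place of site.com; EchoHandler, trace_cookies, the cookie value, and the User-Agent string are hypothetical names and values chosen for illustration.

```python
import http.cookiejar
import http.server
import threading
import urllib.request


class EchoHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical test site: /login sets a cookie, every page echoes the Cookie header."""

    def do_GET(self):
        body = (self.headers.get("Cookie") or "no cookie").encode()
        self.send_response(200)
        if self.path == "/login":
            self.send_header("Set-Cookie", "sessionid=xyz; Path=/")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def trace_cookies():
    server = http.server.HTTPServer(("127.0.0.1", 0), EchoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = "http://127.0.0.1:%d" % server.server_port
    try:
        jar = http.cookiejar.CookieJar()
        opener = urllib.request.build_opener(
            urllib.request.ProxyHandler({}),  # bypass any proxy for localhost
            urllib.request.HTTPCookieProcessor(jar),
        )
        # Replace the default Python User-Agent with a browser-like one.
        opener.addheaders = [("User-Agent", "Mozilla/5.0 (demo)")]
        with opener.open(base + "/login") as response:
            received = response.info().get_all("Set-Cookie")
        stored = {cookie.name: cookie.value for cookie in jar}
        with opener.open(base + "/check") as response:
            sent_back = response.read().decode()
        return received, stored, sent_back
    finally:
        server.shutdown()
        server.server_close()


received, stored, sent_back = trace_cookies()
print(received, stored, sent_back)
```

If the jar holds the cookie and the second request carries it, the client side is working and the cause is more likely on the site's side (User-Agent check, missing form field, wrong credentials).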
One piece of advice:
from urllib import urlencode
# Use urlencode to encode your data
data = urlencode(dict(username='testuser', password=md5encode("testpassword")))
response = opener.open("http://site.com/login", data)
One more thing looks strange here:
You are MD5-hashing your password before sending it.
Hashing is normally done by the server before it compares the result against the database.
Hashing on the client only makes sense if site.com computes the MD5 in JavaScript.
That is rare; very few websites do it.
Check that, because it might be the problem: you may be sending the hashed form rather than the actual password.
In that case the server would compute an MD5 of your MD5 hash.
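The double-hashing concern can be illustrated with a few lines of Python 3. This is only a sketch of the scenario described above: it assumes, hypothetically, that the server stores the MD5 of the plain password and hashes whatever it receives; md5_hex is a helper name introduced here.

```python
import hashlib


def md5_hex(text):
    """Return the hexadecimal MD5 digest of a string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


password = "testpassword"

stored_hash = md5_hex(password)            # assumed: what the server has on file
client_hash = md5_hex(password)            # what the script sends instead of the password
server_computed = md5_hex(client_hash)     # server hashes the value it received

print(server_computed == stored_hash)
```

Under that assumption the comparison fails, so the login would be rejected even though the password is correct.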
I had a similar problem with my own test server, which worked fine with a browser, but not with the urllib2.build_opener solution.
The problem seems to be in urllib2. As these answers suggest, it's easy to use the more powerful mechanize library instead of urllib2:
import cookielib
import mechanize
cookieJar = cookielib.CookieJar()
browser = mechanize.Browser()
browser.set_cookiejar(cookieJar)
opener = mechanize.build_opener(*browser.handlers)
And the opener will work as expected!

Access to the cookies of the default browser

I want to write a program that opens the browser and loads a URL with a given cookie. I don't know how to do this. Maybe I could modify the cookies in the browser's default storage location.
import urllib2
opener = urllib2.build_opener()
opener.addheaders.append(('Cookie', 'cookiename=cookievalue'))
f = opener.open("http://example.com/")
Modules to look into:
urllib2
cookielib
Cookie
In Python, you can emulate a browser with the mechanize library. There is also good documentation about mechanize and cookies.

Passing data to j_security_check with python

I have a j_security_check page on a server, and I need to pass data to it. I use Python's urllib2 module to send a POST request with j_username and j_password as parameters. The problem is that I get an HTTPError 408 in response: "The time allowed for the login process has been exceeded".
What should I do with it?
You could try GETing the login page first and storing the cookie.
This j_security_check-thingie looks like acegi security stuff.
import urllib, urllib2
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor)
urllib2.install_opener(opener)
urllib2.urlopen('http://server/login_form/')
urllib2.urlopen('http://server/j_security_check',
data=urllib.urlencode({'j_username':'scott','j_password':'wombat'}))

How to clear cookies using python 2.6.x cookielib

My previous description seems to have been unclear, so I am rewriting it.
Using Python's urllib2, I am automating a file upload task in my web app. I use cookielib to store session information, and I was able to automate the file upload successfully. The problem is that when I change the login credentials, and then either omit them or supply wrong ones to the automated Python script, the file upload still succeeds. In this case it should fail.
All I want to know is how to clear the cookies generated by cookielib.
Below is the code snippet:
cookies = cookielib.CookieJar()
cookies.clear_session_cookies()
#cookies.clear() tried this as well
opener = urllib2.build_opener(SmartRedirectHandler,HTTPCookieProcessor(cookies),MultipartPostHandler)
urllib2.install_opener(opener)
login_req = urllib2.Request(login_url, login_params)
res = urllib2.urlopen(login_req)
#after login, do fileupload
fileupload_req = urllib2.Request(fileupload_url, params)
response = urllib2.urlopen(fileupload_req)
I tried using clear() and clear_session_cookies(), but the cookies are still not cleared.
You need to install the opener that you have built; otherwise urllib2 will just keep using the default one.
Instead of relying on cookies, I am now restricting page access based on response headers. With that, I am able to stop the file upload when wrong credentials are supplied. Thanks, guys.
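For reference, the difference between the two clearing calls mentioned in the question can be sketched in Python 3 (http.cookiejar is the Python 3 name for cookielib). The cookies are built by hand so nothing touches the network; make_cookie, the cookie names, and the example.com domain are placeholders invented for this illustration.

```python
import http.cookiejar


def make_cookie(name, value, session_only):
    """Build a cookie by hand; domain and path are placeholder values."""
    return http.cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain="example.com", domain_specified=False, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False,
        expires=None if session_only else 2147483647,
        discard=session_only,  # session cookies are marked for discard
        comment=None, comment_url=None, rest={},
    )


jar = http.cookiejar.CookieJar()
jar.set_cookie(make_cookie("sessionid", "abc", session_only=True))
jar.set_cookie(make_cookie("remember_me", "1", session_only=False))

# Removes only cookies flagged as session cookies.
jar.clear_session_cookies()
after_session_clear = sorted(cookie.name for cookie in jar)

# Removes everything that is left.
jar.clear()
after_full_clear = len(jar)

print(after_session_clear, after_full_clear)
```

Both calls only affect the jar object they are called on. In the snippet from the question the jar appears to be cleared right after it is created, while it is still empty, so the later login request fills it again.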

Cookie Problem in Python

I'm working on a simple HTML scraper for Hulu in python 2.6 and am having problems with logging on to my account. Here's my code so far:
import urllib
import urllib2
from cookielib import CookieJar
#make a cookie and redirect handlers
cookies = CookieJar()
cookie_handler= urllib2.HTTPCookieProcessor(cookies)
redirect_handler= urllib2.HTTPRedirectHandler()
opener = urllib2.build_opener(redirect_handler,cookie_handler)#make opener w/ handlers
#build the url
login_info = {'username':USER,'password':PASS}#USER and PASS are defined
data = urllib.urlencode(login_info)
req = urllib2.Request("http://www.hulu.com/account/authenticate",data)#make the request
test = opener.open(req) #open the page
print test.read() #print html results
The code compiles and runs, but all that prints is:
Login.onError("Please \074a href=\"/support/login_faq#cant_login\"\076enable cookies\074/a\076 and try again.");
I assume there is some error in how I'm handling cookies, but just can't seem to spot it. I've heard Mechanize is a very useful module for this type of program, but as this seems to be the only speed bump left, I was hoping to find my bug.
What you're seeing is an AJAX response. The site is probably using JavaScript to set the cookie, which breaks your attempts to authenticate.
The error message you are getting back could be misleading. For example, the server might be looking at the User-Agent and seeing that it is not one of the supported browsers, or looking at HTTP_REFERER and expecting the request to come from the Hulu domain. My point is that there are too many variables in the request to keep guessing them one by one.
I recommend using an HTTP analyzer tool, e.g. Charles or the one in Firebug, to figure out what (header fields, cookies, parameters) the client sends to the server when you log in to Hulu via a browser. This will give you the exact request that you need to construct in your Python code.
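Once the browser's request has been captured, reproducing its headers might look like the following Python 3 sketch (urllib.request replaces urllib2). The User-Agent string and the Referer value are placeholders, not values Hulu is known to require, and build_login_request is a hypothetical helper; the real values would come from the capture.

```python
import urllib.parse
import urllib.request


def build_login_request(url, form_fields, referer):
    """Build a POST request that carries browser-like headers."""
    data = urllib.parse.urlencode(form_fields).encode("ascii")
    headers = {
        # Placeholder value; copy the real one from the captured browser request.
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64)",
        "Referer": referer,
    }
    # Supplying data makes urllib issue a POST instead of a GET.
    return urllib.request.Request(url, data=data, headers=headers)


request = build_login_request(
    "http://www.hulu.com/account/authenticate",
    {"username": "USER", "password": "PASS"},
    referer="http://www.hulu.com/",  # assumed value for illustration
)

print(request.get_method(), request.header_items())
```

The request object can then be passed to an opener built with a cookie processor, as in the question's code, so that the headers and the cookie handling are combined.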
