Are cookies kept in a Mechanize browser between opening URLs? - python

I have code similar to this:
br = mechanize.Browser()
br.open("https://mysite.com/")
br.select_form(nr=0)
#do stuff here
response = br.submit()
html = response.read()
#now that i have the login cookie i can do this...
br.open("https://mysite.com/")
html = response.read()
However, my script responds as if it is not logged in on the second request. I checked the first request, and it does log in successfully. My question is: do cookies in a Mechanize browser need to be managed manually, for example by setting up a CookieJar, or does the browser keep track of all of them for you?
The first example here talks about cookies being carried between requests, but it doesn't mention browsers.

Yes, you will have to store the cookie between open requests in mechanize. Something like the code below should work: add the cookiejar to the br object, and as long as that object exists it keeps the cookie.
import cookielib
import mechanize
cookiejar = cookielib.LWPCookieJar()
br = mechanize.Browser()
br.set_cookiejar(cookiejar)
br.open("https://mysite.com/")
br.select_form(nr=0)
#do stuff here
response = br.submit()
html = response.read()
#now that i have the login cookie i can do this...
# keep the new response; otherwise read() is called on the old one
response = br.open("https://mysite.com/")
html = response.read()
The Docs cover it in more detail.
I use Perl's Mechanize a lot but not the Python one, so I may have missed something Python-specific that this needs in order to work. If so, I apologize, but I did not want to answer with a simple yes.

Related

How can I login this page and read it?

I know there are a lot of questions about this, but I have tried most of them.
My goal is to get the article from this page and use it in GAE.
If I try to log in, it redirects to a long URL; after I log in there, it redirects back to the article.
First I tried urllib2, as mentioned in "how to login to a website with python and mechanize", and it didn't work.
Then I took the SelectLoginForm and login functions from https://github.com/cdhigh/KindleEar/blob/master/books/base.py; that didn't work either.
Selenium won't work because I am going to use this in GAE, and I guess GAE can't support Selenium.
I started looking into the mechanize module. My current code is:
# -*- coding: cp1254 -*-
import cookielib
import urllib2
import mechanize
b=mechanize.Browser()
br = mechanize.Browser()
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)
br.set_handle_equiv(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)
br.set_handle_refresh(mechanize.HTTPRefreshProcessor(),max_time=1)
b.addheaders = [("User-agent","Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.13) Gecko/20101206 Ubuntu/10.10 (maverick) Firefox/3.6.13")]
b.open('https://hurpass.com/iframe/login?appkey=52da7ef64037f9497f0acb091390051062215&secret=52da7f0c4037f9497f0acb0b1390051084754&domain=sosyal.hurriyet.com.tr&callback_url=http://sosyal.hurriyet.com.tr/Account/AutoLogin?returnUrl=http://sosyal.hurriyet.com.tr/yazar/ahmet-hakan_131/baskanlik-diktatorluk-getirir-diyenleri-girtlaklamak-istiyorum_28116073&referer=http://sosyal.hurriyet.com.tr&user_page=http://sosyal.hurriyet.com.tr/Account/AutoLogin?returnUrl=http://sosyal.hurriyet.com.tr/yazar/ahmet-hakan_131/baskanlik-diktatorluk-getirir-diyenleri-girtlaklamak-istiyorum_28116073&is_mobile=0&session_timeout=0&is_vative=0&email=')
b.select_form(name='frm_login')
b["email"]="tasklak#hotmail.com"
b["password"]="123456"
b.submit(type="submit")
url='http://sosyal.hurriyet.com.tr/yazar/ahmet-hakan_131/baskanlik-diktatorluk-getirir-diyenleri-girtlaklamak-istiyorum_28116073'
last_response = b.response()
http_header_dict = last_response.info().dict
html_string_list = last_response.readlines()
html_data = "".join(html_string_list)
page = br.open(url)
print page.read().decode("UTF-8")
ha=open("test.html",'w')
ha.write(html_data)
ha.close
Again, I can't get this working, but if I open the HTML file it created, it redirects to the logged-in article page. Could this be a mechanize redirection problem, or is it impossible to log in to this page?
edit after mihail's answer:
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
user = 'tasklak#hotmail.com'
password = '123456'
xor_password = ''.join(chr(12 ^ ord(c)) for c in password)
auth_url = 'http://auth.hurriyet.com.tr/api/loginuser/{}/?{}'.format(user, xor_password)
url='http://www.hurriyet.com.tr/anasayfa/'
sessionidd=urllib2.urlopen(auth_url).read().split(',')[1].split('\"')[3]
print sessionidd
opener.open(url+';ASPSESSIONID='+sessionidd)
print cj
edit 2:
sessionidd=urllib2.urlopen(auth_url).read().split(',')[1].split('\"')[3]
print sessionidd
opener.open(url)
k=0
for a in cj:
    if k<2:
        a.value=sessionidd
        k+=1
print cj
First of all, you should know that if there isn't a publicly available API that lets you do all this without scraping, then it's very likely that what you are doing is not welcomed by the website owners, is against their terms of service, and could even be illegal depending on where you live.
Unless mechanize can interpret JavaScript (which I doubt, although I might be wrong), it's not going to be very helpful. However, skimming through the links you provided with Chrome's DevTools, it looks like you could implement what you want with a few plain urllib2 requests.
For example, when you log in for the first time, you'll see a GET request to the URL http://auth.hurriyet.com.tr/api/loginuser/tasklak#hotmail.com/?%3D%3E%3F89%3A, which includes your username and encoded password and returns some session IDs. mechanize doesn't work here because the password is encoded by JavaScript code that is not run when you submit the form from your script.
If you look at the source code of the login form, you'll see that clicking the "Submit" button calls a loginUser() function. Once you find that function, you'll see that the password is XOR'ed with the following code:
for (i = 0; i < password.length; ++i) {
    encoded_password += String.fromCharCode(12 ^ password.charCodeAt(i));
}
which you would have to rewrite in Python. To receive the initial session IDs you'd have something like:
import urllib2
user = 'tasklak#hotmail.com'
password = '123456'
xor_password = ''.join(chr(12 ^ ord(c)) for c in password)
auth_url = 'http://auth.hurriyet.com.tr/api/loginuser/{}/?{}'.format(user, xor_password)
print(urllib2.urlopen(auth_url).read())
It looks like you're then going to need to validate the session IDs you received and retrieve session cookies which you then can use to get full articles but I will leave that to you.
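As a quick check of the XOR step described in this answer, here is a minimal sketch in Python 3 (the answer's own code is Python 2). The helper name xor_encode is made up for illustration; the key 12 and the sample password come from the thread. It shows that encoding "123456" and percent-encoding the result gives the query string seen in the GET request above, and that applying the XOR twice returns the original text.

```python
from urllib.parse import quote

XOR_KEY = 12  # the constant used by the JavaScript loop quoted in the answer


def xor_encode(text, key=XOR_KEY):
    """XOR every character code with the key, like the JavaScript loop."""
    return "".join(chr(key ^ ord(ch)) for ch in text)


encoded = xor_encode("123456")
print(encoded)                  # =>?89:
print(quote(encoded, safe=""))  # %3D%3E%3F89%3A
print(xor_encode(encoded))      # 123456 (XOR with the same key undoes itself)
```

Because the encoded password can contain characters such as ? and =, it is probably safer to percent-encode it before putting it in the URL than to format the raw string in directly.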

How to save mechanize.Browser() cookies to file?

How could I make Python's module mechanize (specifically mechanize.Browser()) to save its current cookies to a human-readable file? Also, how would I go about uploading that cookie to a web page with it?
Thanks
Deusdies, I just figured out a way, with reference to Mykola Kharechko's post:
#to save cookie
>>>cookiefile=open('cookie','w')
>>>cookiestr=''
>>>for c in br._ua_handlers['_cookies'].cookiejar:
>>> cookiestr+=c.name+'='+c.value+';'
>>>cookiefile.write(cookiestr)
#binding this cookie to another Browser
>>>while len(cookiestr)!=0:
>>> br1.set_cookie(cookiestr)
>>> cookiestr=cookiestr[cookiestr.find(';')+1:]
>>>cookiefile.close()
If you want to use the cookie for a web request such as a GET or POST (which mechanize.Browser does not support), you can use the requests library and the cookies as follows:
import mechanize, requests
br = mechanize.Browser()
br.open (url)
# assuming first form is a login form
br.select_form (nr=0)
br.form['login'] = login
br.form['password'] = password
br.submit()
# if successful we have some cookies now
cookies = br._ua_handlers['_cookies'].cookiejar
# convert cookies into a dict usable by requests
cookie_dict = {}
for c in cookies:
    cookie_dict[c.name] = c.value
# make a request
r = requests.get(anotherUrl, cookies=cookie_dict)
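The jar-to-dict conversion in this answer can be tried without mechanize or a network connection. The following is a rough sketch in Python 3, where cookielib is named http.cookiejar. The helpers make_cookie and jar_to_dict, the domain example.com and the cookie values are all made up for illustration.

```python
from http.cookiejar import Cookie, CookieJar


def make_cookie(name, value, domain="example.com"):
    """Build a minimal session cookie; every field here is a made-up example."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=False, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


def jar_to_dict(jar):
    """Flatten a cookie jar into {name: value}; domain and path are dropped."""
    return {cookie.name: cookie.value for cookie in jar}


jar = CookieJar()
jar.set_cookie(make_cookie("sessionid", "abc123"))
jar.set_cookie(make_cookie("csrftoken", "xyz789"))

cookie_dict = jar_to_dict(jar)
print(cookie_dict)
```

Note that a plain dict keeps only names and values, so two cookies with the same name on different domains or paths would overwrite each other.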
The CookieJar has several subclasses that can be used to save cookies to a file. For browser compatibility use MozillaCookieJar, for a simple human-readable format go with LWPCookieJar, just like this (an authentication via HTTP POST):
import urllib
import cookielib
import mechanize
params = {'login': 'mylogin', 'passwd': 'mypasswd'}
data = urllib.urlencode(params)
br = mechanize.Browser()
cj = mechanize.LWPCookieJar("cookies.txt")
br.set_cookiejar(cj)
response = br.open("http://example.net/login", data)
cj.save()
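To see what the LWPCookieJar file looks like and how it can be read back, here is a small offline sketch in Python 3 (http.cookiejar is the Python 3 name of cookielib). The cookie itself, the domain example.com and the temporary file location are assumptions made for the example; no login is performed.

```python
import os
import tempfile
from http.cookiejar import Cookie, LWPCookieJar

# A made-up session cookie standing in for one received after a login.
cookie = Cookie(
    version=0, name="sessionid", value="abc123",
    port=None, port_specified=False,
    domain="example.com", domain_specified=False, domain_initial_dot=False,
    path="/", path_specified=True,
    secure=False, expires=None, discard=True,
    comment=None, comment_url=None, rest={},
)

path = os.path.join(tempfile.mkdtemp(), "cookies.txt")

jar = LWPCookieJar(path)
jar.set_cookie(cookie)
# Session cookies (discard=True) are skipped unless ignore_discard is set.
jar.save(ignore_discard=True)

with open(path) as handle:
    saved_text = handle.read()
print(saved_text)  # human-readable text with one Set-Cookie3 line per cookie

# Load the file into a fresh jar, as a later run of the script might do.
restored = LWPCookieJar(path)
restored.load(ignore_discard=True)
restored_cookies = {c.name: c.value for c in restored}
print(restored_cookies)
```

A login cookie without an expiry date is a session cookie, so a plain save() with default arguments may write a file that contains no cookies at all; passing ignore_discard=True to both save() and load() avoids that.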

Python auth_handler not working for me

I've been reading about Python's urllib2's ability to open and read directories that are password protected, but even after looking at examples in the docs, and here on StackOverflow, I can't get my script to work.
import urllib2
# Create an OpenerDirector with support for Basic HTTP Authentication...
auth_handler = urllib2.HTTPBasicAuthHandler()
auth_handler.add_password(realm=None,
uri='https://webfiles.duke.edu/',
user='someUserName',
passwd='thisIsntMyRealPassword')
opener = urllib2.build_opener(auth_handler)
# ...and install it globally so it can be used with urlopen.
urllib2.install_opener(opener)
socks = urllib2.urlopen('https://webfiles.duke.edu/?path=/afs/acpub/users/a')
print socks.read()
socks.close()
When I print the contents, it prints the contents of the login screen that the url I'm trying to open will redirect you to. Anyone know why this is?
auth_handler is only for basic HTTP authentication. The site here contains an HTML form, so you'll need to submit your username/password as POST data.
I recommend using the mechanize module, which will simplify the login for you.
Quick example:
import mechanize
browser = mechanize.Browser()
browser.open('https://webfiles.duke.edu/?path=/afs/acpub/users/a')
browser.select_form(nr=0)
browser.form['user'] = 'username'
browser.form['pass'] = 'password'
req = browser.submit()
print req.read()
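The difference between the two login styles mentioned in this answer can be made concrete with a short Python 3 sketch that only builds the data and sends nothing. The credentials are the placeholders from the question, and the form field names "user" and "pass" are taken from the mechanize example; the real form may use different names.

```python
import base64
from urllib.parse import urlencode

user, password = "someUserName", "thisIsntMyRealPassword"

# HTTP Basic authentication: the credentials travel in a request header.
credentials = "{}:{}".format(user, password).encode("utf-8")
token = base64.b64encode(credentials).decode("ascii")
basic_header = {"Authorization": "Basic " + token}

# HTML form login: the credentials travel in the POST body as form fields.
form_body = urlencode({"user": user, "pass": password})

print(basic_header)
print(form_body)
```

A server that shows a login page expects the second shape, which is why an HTTPBasicAuthHandler never gets a chance to act: the server does not answer with a 401 challenge for it to respond to.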

loop in python !

Can anyone help me with a loop? I want to loop this code:
login_form_data = urllib.urlencode(login_form_seq)
opener = urllib2.build_opener()
site = opener.open(B, login_form_data).read()
The code lets me log in to the site, but the site has a problem: you can't log in on the first try.
That means I have to press submit, then press submit again when the page reloads. So I think a loop will do that, but how?
You need to handle cookies. Look at the cookielib module.
If it is a cookie handling problem, use the "HTTPCookieProcessor" in urllib2 by applying it to your opener.
cookieHandler = urllib2.HTTPCookieProcessor() # Needed for cookie handling
# Apply the handler to an opener
opener = urllib2.build_opener(cookieHandler)
It seems that you are not accepting and saving the cookie(s) required by the page you are trying to access. This is not surprising given that urllib2 does not automatically do this for you. As others have said you'll have to explicitly write code to accept cookies. Something like this:
import urllib, urllib2, cookielib
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
urllib2.install_opener(opener)
login_form_data = urllib.urlencode(login_form_seq)
site = opener.open(B, login_form_data).read()
This would be a good time to read up about cookielib and HTTP state management in Python.
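What a cookie jar does between two requests can be seen offline with the sketch below. It is written for Python 3, where cookielib is http.cookiejar and urllib2 is urllib.request. The FakeResponse class, the URLs and the cookie value are invented for the demonstration; a real opener built with HTTPCookieProcessor performs these two steps automatically on every request.

```python
from email.message import Message
from http.cookiejar import CookieJar
from urllib.request import Request


class FakeResponse:
    """Stand-in for an HTTP response; CookieJar only needs info()."""

    def __init__(self, set_cookie):
        self._headers = Message()
        self._headers["Set-Cookie"] = set_cookie

    def info(self):
        return self._headers


jar = CookieJar()

# Step 1: the "login" response sets a cookie, and the jar stores it.
login_request = Request("http://example.com/login")
login_response = FakeResponse("sessionid=abc123; Path=/")
jar.extract_cookies(login_response, login_request)

# Step 2: the jar attaches the stored cookie to the next request.
next_request = Request("http://example.com/account")
jar.add_cookie_header(next_request)
cookie_header = next_request.get_header("Cookie")
print(cookie_header)  # sessionid=abc123
```

Without a jar shared between the two requests, step 2 has nothing to send, which would explain a site that seems to need the login submitted twice.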

Using urllib2 for posting data, following redirects and maintaining cookies

I am using urllib2 in Python to post login data to a web site.
After successful login, the site redirects my request to another page. Can someone provide a simple code sample on how to do this in Python with urllib2? I guess I will need cookies also to be logged in when I get redirected to another page. Right?
Thanks a lot in advance.
First, get mechanize: http://wwwsearch.sourceforge.net/mechanize/
You could do this kind of stuff with just urllib2, but you will be writing tons of boilerplate code, and it will be buggy.
Then:
import mechanize
br = mechanize.Browser()
br.open('http://somesite.com/account/signin/')
br.select_form('loginForm')
br['username'] = 'jekyll'
br['password'] = 'bananas'
br.submit()
# At this point, you're logged in, redirected, and the
# br object has the cookies and all that.
br.geturl() # e.g. http://somesite.com/loggedin/
Then you can use the Browser object br to do whatever you have to do: click on links, etc. Check the samples on the mechanize site.
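For comparison, here is a rough idea of the standard-library route the question asks about, written for Python 3 (urllib.request and http.cookiejar are the Python 3 names of urllib2 and cookielib). The URL and field names are borrowed from the mechanize example and are only placeholders. The sketch builds the opener and the POST request but does not send anything.

```python
from http.cookiejar import CookieJar
from urllib.parse import urlencode
from urllib.request import (HTTPCookieProcessor, HTTPRedirectHandler,
                            Request, build_opener)

# One jar shared by every request made through this opener.
jar = CookieJar()
opener = build_opener(HTTPCookieProcessor(jar))

# Placeholder URL and form fields, borrowed from the mechanize example.
login_url = "http://somesite.com/account/signin/"
form_data = urlencode({"username": "jekyll", "password": "bananas"}).encode("ascii")

# Supplying data turns the request into a POST.
request = Request(login_url, data=form_data)
print(request.get_method())  # POST

# build_opener installs a redirect handler by default.
follows_redirects = any(isinstance(h, HTTPRedirectHandler) for h in opener.handlers)
print(follows_redirects)  # True

# Sending it would look like this (not run here):
#   response = opener.open(request)
#   print(response.geturl())  # the URL after any redirects
```

With this setup the opener should follow the post-login redirect and send the cookies it received along the way; what it does not do is parse forms or fill in hidden fields, which is the part mechanize handles for you.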
