How can I make Python's mechanize module (specifically mechanize.Browser()) save its current cookies to a human-readable file? Also, how would I go about sending those cookies back to a web page with it?
Thanks
Deusdies, I just figured out a way, with reference to Mykola Kharechko's post:
# save the cookies as one human-readable "name=value;" string
cookiefile = open('cookie', 'w')
cookiestr = ''
for c in br._ua_handlers['_cookies'].cookiejar:
    cookiestr += c.name + '=' + c.value + ';'
cookiefile.write(cookiestr)
cookiefile.close()
# bind these cookies to another Browser, one pair at a time
# (br1 must already be viewing a page on the site, since set_cookie
# attaches cookies to the currently viewed URL)
while len(cookiestr) != 0:
    br1.set_cookie(cookiestr[:cookiestr.find(';')])
    cookiestr = cookiestr[cookiestr.find(';') + 1:]
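To restore the cookies in a later run, read the file back and replay each pair. A minimal sketch, assuming siteurl (a hypothetical name) points at a page on the site the cookies belong to, since set_cookie needs the browser to be viewing a document:
import mechanize
br1 = mechanize.Browser()
br1.open(siteurl)  # set_cookie ties cookies to the currently viewed URL
cookiefile = open('cookie')
cookiestr = cookiefile.read()
cookiefile.close()
for pair in cookiestr.split(';'):
    if pair:  # skip the empty element after the trailing ';'
        br1.set_cookie(pair)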
If you want to reuse the cookies for a request made outside of mechanize.Browser (for example a GET or POST issued with the requests library), you can convert the cookie jar as follows:
import mechanize, requests
br = mechanize.Browser()
br.open(url)
# assuming the first form is a login form
br.select_form(nr=0)
br.form['login'] = login
br.form['password'] = password
br.submit()
# if successful we have some cookies now
cookies = br._ua_handlers['_cookies'].cookiejar
# convert the cookies into a dict usable by requests
cookie_dict = {}
for c in cookies:
    cookie_dict[c.name] = c.value
# make a request
r = requests.get(anotherUrl, cookies=cookie_dict)
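As an aside, requests ships a helper that does the same flattening; assuming the mechanize jar iterates like a standard cookielib jar (as it does above), this should be equivalent:
import requests
# flatten a cookielib-style jar into a plain {name: value} dict
cookie_dict = requests.utils.dict_from_cookiejar(cookies)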
CookieJar has several subclasses that can save cookies to a file. For browser compatibility use MozillaCookieJar; for a simple human-readable format, go with LWPCookieJar, like this (an authentication via HTTP POST):
import urllib
import mechanize
params = {'login': 'mylogin', 'passwd': 'mypasswd'}
data = urllib.urlencode(params)
br = mechanize.Browser()
cj = mechanize.LWPCookieJar("cookies.txt")
br.set_cookiejar(cj)
# passing data makes br.open() issue a POST
response = br.open("http://example.net/login", data)
cj.save()
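Note that save() drops session cookies (those with no expiry) unless you pass ignore_discard=True; if your login cookie is a session cookie, save with cj.save(ignore_discard=True). Restoring the jar in a later run is then just a load before browsing, as in this sketch (the URL is a placeholder):
import mechanize
cj = mechanize.LWPCookieJar()
cj.load("cookies.txt", ignore_discard=True)  # also load session cookies
br = mechanize.Browser()
br.set_cookiejar(cj)
response = br.open("http://example.net/members")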
I have code similar to this:
br = mechanize.Browser()
br.open("https://mysite.com/")
br.select_form(nr=0)
#do stuff here
response = br.submit()
html = response.read()
# now that I have the login cookie I can do this...
response = br.open("https://mysite.com/")
html = response.read()
However, my script responds as if it's not logged in on the second request. I checked the first request, and yes, it logs in successfully. My question is: do cookies in mechanize browsers need to be managed, i.e. do I need to set up a CookieJar or something, or does the browser keep track of them all for you?
The first example here talks about cookies being carried between requests, but it doesn't talk about browsers.
Yes, you will have to store the cookies between requests in mechanize. Something like the code below should work: you add the cookiejar to the br object, and as long as that object exists it maintains the cookies.
import mechanize
import cookielib
cookiejar = cookielib.LWPCookieJar()
br = mechanize.Browser()
br.set_cookiejar(cookiejar)
br.open("https://mysite.com/")
br.select_form(nr=0)
# do stuff here
response = br.submit()
html = response.read()
# now that I have the login cookie I can do this...
response = br.open("https://mysite.com/")
html = response.read()
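To confirm the login actually set something, you can dump the jar after the submit; a small debugging sketch:
# debugging sketch: list whatever cookies the server set during login
for cookie in cookiejar:
    print cookie.name, cookie.value, cookie.domain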
The Docs cover it in more detail.
I use Perl's Mechanize a lot, but not Python, so I may have missed something Python-specific here; if so, I apologize, but I did not want to answer with a simple yes.
I have the following details about a POST request
1) The URL: "http://kuexams.org/get_results"
2) Request body: "htno=001111505&ecode=kuBA3_Supply_Dec_2013"
I got this from analyzing the HTTP traffic. Then I found this site, where you specify these values and it returns the response: https://requestable.pieterhordijk.com/sSd7o
I need to know how to do something similar to what that site does: just POST the request body to the URL and get the data back for parsing.
P.S.: I've tried multiple methods to POST to this site: http://kuexams.org/results/3GKZ-D_QBHLWXrg7lZ2IGoKBI7lGfpSK37GNoykJ8k5UerNGYn21FN6w_R5XZ8IQVUHRb8ZYVwq-zN4BhIjusQ,,/ugresults/ug
Here is what I tried. It uses a different method, but fails miserably.
import mechanize
import cookielib
import urllib
import logging
import sys
import re
br = mechanize.Browser()
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)
br.set_handle_equiv(True)
br.set_handle_gzip(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=3)
br.addheaders = [('User-agent', 'Firefox')]
r = br.open('http://kuexams.org/results/06JZ4SYmTV97s4oROGuLYglPFH3XxJKAunIilJkDBV0gBxSU6YVJ_kXRL0UZb3cIjz9aFdnkYaE-T_S3ubaXPg,,/ugresults/ug')
scrape = r.read()
#print(scrape)
pattern = re.compile('=5&code=(.{5})\'')
img = pattern.findall(scrape)
print(img)
imgstr = str()
imgstr = img[0]
print(imgstr)
br.select_form("appForm")
br.form["htno"] ='001111441'
br.form["entered_captcha"] = imgstr
response = br.submit()
print response.read()
br.back()
I can see that the POST request there does not use any cookie or captcha details; the captcha is implemented only client-side. So there is no need to store a cookie or fetch the captcha, and this code will work for you. Use requests; it's simple and easy to use.
import requests
url = 'http://kuexams.org/get_results'
payload = {'htno': '001111441', 'ecode': 'kuBA2_Supply_Dec_2013'}
headers = {"User-Agent": "Some Cool Thing"}
r = requests.post(url, headers=headers, data=payload)
print r.content
Note that the server does not accept the default User-Agent sent by requests, so I added a custom one; without it the server won't accept the POST request and may return a Forbidden error.
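If you want to be defensive about it, a small check (just a sketch) before using the response body:
# sketch: verify the server accepted the POST before parsing the body
if r.status_code == 200:
    print r.content
else:
    print 'POST rejected:', r.status_code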
I have the following code:
import requests
import sys
import urllib2
import re
import mechanize
import cookielib
#import json
#import imp
#print(imp.find_module("requests"))
#print(requests.__file__)
EMAIL = "******"
PASSWORD = "*******"
URL = 'https://www.imleagues.com/Login.aspx'
address = "http://www.imleagues.com/School/Team/Home.aspx?Team=27d6c31187314397b00293fb0cfbc79a"
br = mechanize.Browser()
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)
br.add_password(URL, EMAIL, PASSWORD)
br.open(URL)
#br.open(URL)
#br.select_form(name="aspnetForm")
#br.form["ctl00$ContentPlaceHolder1$inUserName"] = EMAIL
#br.form["ctl00$ContentPlaceHolder1$inPassword"] = PASSWORD
#response = br.submit()
#br= mechanize.Browser()
site = br.open(address)
# Start a session so we can have persistent cookies
#session = requests.Session()
# This is the form data that the page sends when logging in
#login_data = {
# 'ctl00$ContentPlaceHolder1$inUserName': EMAIL,
# 'ctl00$ContentPlaceHolder1$inPassword': PASSWORD,
# 'aspnetFrom': 'http://www.imleagues.com/Members/Home.aspx',
#}
#URL_post = 'http://www.imleagues.com/Members/Home.aspx'
# Authenticate
#r = session.post(URL, data=login_data)
# Try accessing a page that requires you to be logged in
#r = session.get('http://www.imleagues.com/School/Team/Home.aspx?Team=27d6c31187314397b00293fb0cfbc79a')
website = site.read()
f = open('crypt.txt', 'wb')
f.write(website)
#print(website_html)
I am trying to log into this site to monitor game times and make sure they aren't changed on me (again). I've tried various approaches, most of them commented out above, but all of them redirect me back to the login page. Any ideas? Thanks.
As I can see, on the given website the login button is not a submit input; login goes through a JavaScript function:
<a ... href="javascript:__doPostBack('ctl00$ContentPlaceHolder1$btnLogin','')">...</a>
and mechanize cannot handle JavaScript. I faced a very similar problem and came up with the solution of using Spynner.
It is a headless web browser, so you can accomplish the same tasks as with mechanize, and it has JavaScript support.
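Alternatively, on ASP.NET pages __doPostBack usually just fills two hidden fields and submits the main form, which mechanize can reproduce by hand. A sketch under that assumption (the field names are taken from the commented-out code in the question; this is not guaranteed to work for this particular site):
import mechanize
br = mechanize.Browser()
br.open('https://www.imleagues.com/Login.aspx')
br.select_form(name='aspnetForm')
br.form.set_all_readonly(False)  # the hidden ASP.NET fields are read-only by default
br.form['__EVENTTARGET'] = 'ctl00$ContentPlaceHolder1$btnLogin'
br.form['__EVENTARGUMENT'] = ''
br.form['ctl00$ContentPlaceHolder1$inUserName'] = EMAIL
br.form['ctl00$ContentPlaceHolder1$inPassword'] = PASSWORD
response = br.submit()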
I need to use Python to download a large number of URLs, but they require a password to access them (similar to systems like cPanel, for example).
Is there a way I can do this, storing the cookie?
I'd like to use urllib2 if possible.
EDIT: To clarify, it's my website and I have the login details.
UPDATE:
OK, I'm using this:
import urllib
import urllib2
import cookielib
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
login_data = urllib.urlencode({'login_name': username, 'password': password})
opener.open(loginURL, login_data)
productlist = opener.open(productURL)
print productlist.read()
But it just spits out the login page again. What isn't working?
(The variables are all set; I just haven't shown their values, for security.)
You have to use the urllib2.HTTPCookieProcessor, like this:
import urllib2
from cookielib import CookieJar
cookiejar = CookieJar()
opener = urllib2.build_opener()
cookieproc = urllib2.HTTPCookieProcessor(cookiejar)
opener.add_handler(cookieproc)
Then you just use opener.open() to access URLs, and cookies will automatically be saved and reused in future requests.
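For example, a sketch reusing the field names from the question (adjust loginURL, productURL, and the form fields to your site):
import urllib
login_data = urllib.urlencode({'login_name': 'user', 'password': 'secret'})
opener.open(loginURL, login_data)      # the POST logs in; cookies land in the jar
print opener.open(productURL).read()   # cookies are sent back automatically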
I've been reading about Python urllib2's ability to open and read directories that are password-protected, but even after looking at examples in the docs and here on Stack Overflow, I can't get my script to work.
import urllib2
# Create an OpenerDirector with support for Basic HTTP Authentication...
auth_handler = urllib2.HTTPBasicAuthHandler()
auth_handler.add_password(realm=None,
                          uri='https://webfiles.duke.edu/',
                          user='someUserName',
                          passwd='thisIsntMyRealPassword')
opener = urllib2.build_opener(auth_handler)
# ...and install it globally so it can be used with urlopen.
urllib2.install_opener(opener)
socks = urllib2.urlopen('https://webfiles.duke.edu/?path=/afs/acpub/users/a')
print socks.read()
socks.close()
When I print the contents, I get the login screen that the URL I'm trying to open redirects to. Does anyone know why this is?
auth_handler is only for basic HTTP authentication. The site here uses an HTML form, so you'll need to submit your username/password as POST data.
I recommend using the mechanize module, which will simplify the login for you.
Quick example:
import mechanize
browser = mechanize.Browser()
browser.open('https://webfiles.duke.edu/?path=/afs/acpub/users/a')
browser.select_form(nr=0)
browser.form['user'] = 'username'
browser.form['pass'] = 'password'
req = browser.submit()
print req.read()
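One simple sanity check (a sketch): after the submit, the browser's current URL should no longer be the login page if authentication succeeded.
# sketch: if the login worked, this should not be the login URL any more
print browser.geturl()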