I have created an opener with urllib2.build_opener() that contains a cookielib.CookieJar(), and now I wish to manually add a cookie to the opener.
How can I achieve this?
As the second example in the cookielib documentation suggests:
import os, cookielib, urllib2
# Load the cookies stored in a Netscape/Mozilla-format cookies.txt file
cj = cookielib.MozillaCookieJar()
cj.load(os.path.join(os.path.expanduser("~"), ".netscape", "cookies.txt"))
# An opener built with HTTPCookieProcessor sends and stores cookies automatically
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
r = opener.open("http://example.com/")
Here's the link: Cookies examples (in the cookielib documentation).
The example above applies to Mozilla-format cookie files, but the general approach is the same.
If adding a cookie by hand is required, reading the documentation further, you can fill in a cookie object any way you see fit and add it to the CookieJar with
CookieJar.set_cookie(cookie)
which, as the documentation says, sets a Cookie without checking the policy to see whether or not it should be set. Note that set_cookie() expects a cookielib.Cookie instance, not an object from the separate Cookie module (http://docs.python.org/library/cookie.html#module-Cookie).
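For example, here is a minimal sketch of adding a single cookie by hand; the name, value, and domain are made-up placeholders:
import cookielib, urllib2
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
# Construct a cookielib.Cookie manually; every field is passed explicitly
c = cookielib.Cookie(version=0, name='sessionid', value='abc123',
                     port=None, port_specified=False,
                     domain='example.com', domain_specified=True, domain_initial_dot=False,
                     path='/', path_specified=True,
                     secure=False, expires=None, discard=True,
                     comment=None, comment_url=None, rest={})
cj.set_cookie(c)  # added directly, bypassing the CookiePolicy checks
r = opener.open("http://example.com/")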
I am a complete beginner with Python. I tried to crawl some product information from my www.Alibaba.com console. When I got to the visitor details page, I found that the cookie changed every time I clicked the search button, i.e. it changed with every request. Because of that I can't crawl the data the way I did on other pages, where the cookies stayed fixed for a certain period.
After comparing the cookie data, I found that only 3 key-value pairs changed. I think those 3 values are what make my crawl fail, so I want to know how to handle this situation.
For Python 3, urllib.request in the standard library can be configured to use an http.cookiejar CookieJar, which will keep track of cookies within the client automatically.
You can set this up like this:
import http.cookiejar, urllib.request
cj = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
r = opener.open("http://example.com/")
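As a quick sanity check, you can iterate over the jar after the request to see which cookies were stored (just a sketch):
for cookie in cj:
    print(cookie.name, cookie.value)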
If you're using Python 2, a similar approach works with urllib2:
import urllib2
# HTTPCookieProcessor() creates its own CookieJar when none is passed in
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor())
r = opener.open("http://example.com/")
I want to write a program that opens the browser and opens a URL with a given cookie. I don't know how to do this. Maybe I could modify the cookies in the default place.
import urllib2
opener = urllib2.build_opener()
# Send the cookie as a raw header on every request made through this opener
opener.addheaders.append(('Cookie', 'cookiename=cookievalue'))
f = opener.open("http://example.com/")
Modules to look into:
urllib2
cookielib
Cookie
In Python, you can emulate a browser with the mechanize library. There is also good documentation about mechanize and cookies.
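A minimal sketch, assuming the third-party mechanize package is installed (the URL is a placeholder):
import cookielib
import mechanize
# mechanize's Browser keeps cookies in a cookiejar, much like a real browser does
cj = cookielib.LWPCookieJar()
br = mechanize.Browser()
br.set_cookiejar(cj)
br.open("http://example.com/")  # any cookies the site sets now live in cj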
I need to use Python to download a large number of URLs, but they require a password to access them (similar to systems like cPanel, for example).
Is there a way I can do this while storing the cookie?
I'd like to use urllib2 if possible.
EDIT: To clarify, it's my website and I have the login details.
UPDATE:
OK I'm using this:
import urllib, urllib2, cookielib
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
login_data = urllib.urlencode({'login_name' : username, 'password' : password})
opener.open(loginURL, login_data)
productlist = opener.open(productURL)
print productlist.read()
But it just spits out the login page again. What am I doing wrong?
(The variables are defined; I just haven't shown their values, for security.)
You have to use the urllib2.HTTPCookieProcessor, like this:
import urllib2
from cookielib import CookieJar
cookiejar = CookieJar()
opener = urllib2.build_opener()
cookieproc = urllib2.HTTPCookieProcessor(cookiejar)
opener.add_handler(cookieproc)  # equivalently: opener = urllib2.build_opener(cookieproc)
Then you just use opener.open() to access URLs, and cookies will automatically be saved and reused in future requests.
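To see whether the login request is actually setting a session cookie before you fetch the product page, you can print the jar after logging in. This is just a sketch; the URLs, form field names, and credentials are placeholders for your own site:
import urllib, urllib2
from cookielib import CookieJar

# Placeholders -- substitute your site's real login form and credentials
loginURL   = "http://example.com/login"
productURL = "http://example.com/products"
login_data = urllib.urlencode({'login_name': 'me', 'password': 'secret'})

cookiejar = CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookiejar))
opener.open(loginURL, login_data)
for c in cookiejar:
    print c.name, c.value        # if nothing prints, the login request set no cookie
productlist = opener.open(productURL)
print productlist.read()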
Is there a way to append a cookie to an already created and used OpenerDirector object?
Try this:
import urllib2
import cookielib
# load cookies from file
saved_cookies = cookielib.MozillaCookieJar('cookie_file_name')
saved_cookies.load()
opener = urllib2.build_opener() # your opener director
# do something...
opener.add_handler(urllib2.HTTPCookieProcessor(saved_cookies))
EDIT: According to the Python cookielib documentation, old cookies are kept unless overwritten by newly loaded ones.
Can anyone help me with a loop? I want to loop this code:
import urllib, urllib2
login_form_data = urllib.urlencode(login_form_seq)
opener = urllib2.build_opener()
site = opener.open(B, login_form_data).read()
The code lets me log in to the site, but the site has a problem: you can't log in on the first attempt.
That means I have to press submit, wait for the page to reload, and press submit again... so I think a loop could do that, but how?
You need to handle cookies. Look at the cookielib module.
If it is a cookie-handling problem, use urllib2.HTTPCookieProcessor by applying it to your opener:
import urllib2
cookieHandler = urllib2.HTTPCookieProcessor()  # needed for cookie handling
# Apply the handler to an opener
opener = urllib2.build_opener(cookieHandler)
It seems that you are not accepting and saving the cookie(s) required by the page you are trying to access. This is not surprising given that urllib2 does not automatically do this for you. As others have said you'll have to explicitly write code to accept cookies. Something like this:
import urllib, urllib2, cookielib
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
urllib2.install_opener(opener)
login_form_data = urllib.urlencode(login_form_seq)
site = opener.open(B, login_form_data).read()
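If the site really does insist on a second submission even once cookies are handled, the loop you asked about is just a matter of repeating the request (continuing the snippet above):
# Submit twice: the first request collects the session cookie,
# the second request logs in with it
for attempt in range(2):
    site = opener.open(B, login_form_data).read()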
This would be a good time to read up about cookielib and HTTP state management in Python.