I want to write a program that opens the browser and opens a URL with a given cookie. I don't know how to do this. Maybe I could modify the cookies in the default place.
import urllib2
opener = urllib2.build_opener()
opener.addheaders.append(('Cookie', 'cookiename=cookievalue'))
f = opener.open("http://example.com/")
Modules to look into:
urllib2
cookielib
Cookie
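On Python 3, where urllib2 was folded into urllib.request, the same fixed-header approach might look like the sketch below. The helper name opener_with_cookie is made up for illustration, and cookiename/cookievalue are the placeholders from the snippet above. This only sends the cookie from the script's own requests; it does not touch a desktop browser's cookie store.

```python
import urllib.request


def opener_with_cookie(name, value):
    """Return an opener that sends a fixed Cookie header on every request."""
    opener = urllib.request.build_opener()
    # addheaders is a list of (header, value) tuples added to each request.
    opener.addheaders.append(("Cookie", "{}={}".format(name, value)))
    return opener


opener = opener_with_cookie("cookiename", "cookievalue")
# Fetching a page would then be: opener.open("http://example.com/").read()
```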
In Python, you can emulate a browser with the mechanize library. Also, there is good documentation about mechanize and cookies.
Related
I am a complete beginner in Python. I tried to crawl some product information from my www.Alibaba.com console. When I got to the visitor details page, I found that the cookie changed every time I clicked the search button, that is, on every request. I cannot crawl the data the way I did on other pages, where the cookies stayed fixed for a certain period.
After comparing the cookie data, I found that only 3 key-value pairs changed. I think those 3 values are what make the crawl fail, so I want to know how to handle this situation.
For Python 3, urllib.request in the standard library can be configured to use an http.cookiejar CookieJar, which will keep track of cookies within the client automatically.
You can set this up like this:
import http.cookiejar, urllib.request
cj = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
r = opener.open("http://example.com/")
If you're using Python 2, a similar approach works with urllib2:
import urllib2
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor())
r = opener.open("http://example.com/")
I'm trying to make a web spider using Python, but I've run into problems when trying to log in to the website Pixiv. My code is below:
import sys
import urllib
import urllib2
import cookielib
url="https://www.secure.pixiv.net/login.php"
cookiename='123.txt'
cookie = cookielib.MozillaCookieJar(cookiename)
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie))
cookie.save()
values={'model':'login',
'return_to':'/',
'pixiv_id':'username',
'pass':'password',
'skip':'1'}
headers = { 'User-Agent' : 'User-Agent' }
data=urllib.urlencode(values)
req=urllib2.Request(url,data)
response=urllib2.urlopen(req)
the_page=response.read()
cookie.save()
To make sure it works, I used cookielib to save the cookie as a text file. I ran the code and got a "cookie.txt", but when I opened the file I found that it was empty; in other words, my code didn't work.
I don't know what's wrong with it.
The problem is you're not using the opener that you created with the cookiejar attached to it in order to make the request. urllib2.urlopen has no way of knowing that you want to use that opener to start the request.
You can either use the opener's open method directly or, if you want to use this by default for the rest of your application, you can install it as the default opener for all requests made with urllib2 using urllib2.install_opener. So give that a try and see if it does the trick.
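A minimal sketch of the first option, written for Python 3 (urllib.request and http.cookiejar replace urllib2 and cookielib). The helper names make_session and login are hypothetical, and the form fields are whatever the site expects. The point is that the request goes through the opener that owns the cookie jar, and the jar is saved only after the response has arrived.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session(cookie_file):
    """Create a file-backed cookie jar and an opener that uses it."""
    jar = http.cookiejar.MozillaCookieJar(cookie_file)
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return jar, opener


def login(opener, jar, url, values):
    """POST the form through the cookie-aware opener, then save the jar."""
    data = urllib.parse.urlencode(values).encode("ascii")
    # opener.open (not urllib.request.urlopen) so the jar sees the response.
    with opener.open(url, data) as response:
        page = response.read()
    # ignore_discard=True also writes session cookies, which are skipped by default.
    jar.save(ignore_discard=True)
    return page
```

Calling urllib.request.install_opener(opener) once instead would make plain urlopen calls use the same jar.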
I know this sounds weird, but I have no choice: I searched Google and found nothing, so...
I'm following a video tutorial (https://www.youtube.com/watch?v=JEW50aEVi4k) on 'building a web browser in Python', and I was wondering whether cookies can be saved. Is it possible?
If yes, could you give some suggestions?
Cookies are not a problem - you can use mechanize (https://pypi.python.org/pypi/mechanize/) which saves and sends the cookies automatically.
import mechanize
browser = mechanize.Browser()
browser.set_handle_robots(False)
response = browser.open('http://www.youtube.com')
#Headers are handled automatically. You can access them:
headers = browser.request.header_items()
>>> headers
[('Host', 'www.youtube.com'), ('Cookie', 'YSC=cNcoiHG71bY; VISITOR_INFO1_LIVE=uLHsDODGalg; PREF=f1=50000000'), ('User-agent', 'Python-urllib/2.7')]
It is very hard to write a browser with JavaScript support. If you need JavaScript, I suggest using selenium with PhantomJS, which acts just like a real browser.
I have created an opener with urllib2.build_opener() that contains a cookielib.CookieJar(), and now I wish to manually add a cookie to the opener.
How can I achieve this?
As the second example in the cookielib documentation suggests:
import os, cookielib, urllib2
cj = cookielib.MozillaCookieJar()
cj.load(os.path.join(os.path.expanduser("~"), ".netscape", "cookies.txt"))
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
r = opener.open("http://example.com/")
Here's the link:
Cookies examples
The above example applies to Mozilla cookies, but the general approach is the same.
If adding by hand is required, reading the documentation further, you can use a Cookie object (http://docs.python.org/library/cookie.html#module-Cookie), which you fill in as you see fit (a CookieJar expects a cookielib.Cookie instance) and then add to a CookieJar with
CookieJar.set_cookie(cookie)
Set a Cookie, without checking with policy to see whether or not it should be set.
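As a rough Python 3 illustration of adding a cookie by hand (http.cookiejar is the renamed cookielib), the sketch below builds an http.cookiejar.Cookie and hands it to set_cookie. The helper make_cookie and the example.com values are assumptions for the example; the constructor takes many fields, and the defaults chosen here describe a plain session cookie.

```python
import http.cookiejar
import urllib.request


def make_cookie(name, value, domain, path="/"):
    """Build a simple version-0 session cookie for the given domain."""
    return http.cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True,
        domain_initial_dot=domain.startswith("."),
        path=path, path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


cj = http.cookiejar.CookieJar()
cj.set_cookie(make_cookie("cookiename", "cookievalue", "example.com"))
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))

# Check which header the jar would attach, without sending anything.
req = urllib.request.Request("http://example.com/")
cj.add_cookie_header(req)
cookie_header = req.get_header("Cookie")
```

Requests made with opener.open to a matching domain and path would carry the same header.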
Can anyone help me with a loop? I want to loop this code:
login_form_data = urllib.urlencode(login_form_seq)
opener = urllib2.build_opener()
site = opener.open(B, login_form_data).read()
The code lets me log in to the site, but the site has a problem: you can't log in on the first try.
That means I have to press submit, then press submit again when the page reloads. I think a loop would do that, but how?
You need to handle cookies. Look at the cookielib module.
If it is a cookie-handling problem, use HTTPCookieProcessor from urllib2 by applying it to your opener:
cookieHandler = urllib2.HTTPCookieProcessor() # Needed for cookie handling
# Apply the handler to an opener
opener = urllib2.build_opener(cookieHandler)
It seems that you are not accepting and saving the cookie(s) required by the page you are trying to access. This is not surprising given that urllib2 does not automatically do this for you. As others have said you'll have to explicitly write code to accept cookies. Something like this:
import urllib, urllib2, cookielib
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
urllib2.install_opener(opener)
login_form_data = urllib.urlencode(login_form_seq)
site = opener.open(B, login_form_data).read()
This would be a good time to read up about cookielib and HTTP state management in Python.
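If the site really does need the form submitted twice, a cookie-aware opener plus a short loop could be sketched as below in Python 3. The names build_cookie_opener and submit_form are hypothetical, and the default of two attempts is only a guess based on the question; the URL and form fields stand in for B and login_form_seq.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def build_cookie_opener():
    """Return an opener that remembers cookies between requests, plus its jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def submit_form(opener, url, form_fields, attempts=2):
    """POST the same form several times through one opener; return the last page."""
    data = urllib.parse.urlencode(form_fields).encode("ascii")
    page = b""
    for _ in range(attempts):
        # Cookies set by an earlier response are sent back on the next attempt.
        response = opener.open(url, data)
        try:
            page = response.read()
        finally:
            response.close()
    return page
```

With cookies kept between attempts, the second submit often is the one that succeeds, which may be all the "press submit twice" behaviour amounts to.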