Can someone tell me why this doesn't work?
import cookielib
import urllib
import urllib2
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
data = urllib.urlencode({'session[username_or_email]': 'twitter handle', 'session[password]': 'password'})
opener.open('https://twitter.com', data)
stuff = opener.open('https://twitter.com')
print stuff.read()
Why doesn't this give the html of the page after logging in?
Please consider using an OAuth library for your task. Scraping the site with something like mechanize is not recommended, because Twitter can change its HTML at any time and then your code will break.
Check this out: Python-twitter at http://code.google.com/p/python-twitter/
Simplest example to post an update:
>>> import twitter
>>> api = twitter.Api(
consumer_key='yourConsumerKey',
consumer_secret='consumerSecret',
access_token_key='accessToken',
access_token_secret='accessTokenSecret')
>>> api.PostUpdate('Blah blah blah!')
There can be many reasons why it is failing:
Twitter probably expects a User-Agent header, which you are not providing.
I didn't look at the HTML, but maybe there's some JavaScript at play before the form is actually submitted (I actually think this is the case, because I vaguely remember writing a very detailed answer on this exact thing, but I can't seem to find the link to it).
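If the missing User-Agent header is the problem, one way to supply it with the same urllib machinery is through the opener's addheaders attribute. A minimal sketch with the Python 3 names (urllib2 in Python 2 has the same attribute); the agent string is just a placeholder:

```python
import http.cookiejar
import urllib.request

cj = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))

# Replace the default "Python-urllib/x.y" agent, which many sites reject.
opener.addheaders = [('User-Agent',
                      'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36')]

# Every request made through this opener now carries the header, e.g.:
# opener.open('https://twitter.com', data)
print(opener.addheaders[0])
```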
Related
I am a complete beginner with Python. I tried to crawl some product information from my www.Alibaba.com console. When I came to the visitor details page, I found that the cookie changed every time I clicked the search button, i.e. with every request. Because of that, I cannot crawl the data the way I did on other pages, where the cookies stayed fixed for a certain period.
After comparing the cookie data, I found that only 3 key-value pairs changed between requests. I think those 3 values are what make my crawl fail, so I want to know how to handle this situation.
For Python 3, the urllib.request module in the standard library can be configured to use an http.cookiejar.CookieJar, which will keep track of cookies within the client automatically.
You can set this up like this:
import http.cookiejar, urllib.request
cj = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
r = opener.open("http://example.com/")
If you're using Python 2, a similar approach works with urllib2:
import urllib2
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor())
r = opener.open("http://example.com/")
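If you also need the cookies to survive between runs, http.cookiejar (Python 3) provides MozillaCookieJar, which can save to and load from a Netscape-format file. A small sketch; the cookie is inserted by hand here only to stand in for one a server would normally set:

```python
import http.cookiejar

jar = http.cookiejar.MozillaCookieJar("cookies.txt")

# Build a cookie by hand, standing in for one received from a server.
cookie = http.cookiejar.Cookie(
    version=0, name="sessionid", value="abc123",
    port=None, port_specified=False,
    domain="example.com", domain_specified=True, domain_initial_dot=False,
    path="/", path_specified=True,
    secure=False, expires=2147483647, discard=False,
    comment=None, comment_url=None, rest={})
jar.set_cookie(cookie)
jar.save()

# A fresh jar can reload the same cookies later, e.g. in another run.
jar2 = http.cookiejar.MozillaCookieJar("cookies.txt")
jar2.load()
print([c.name for c in jar2])
```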
I can manually see the cookies set in the browser.
How can I fetch the cookie from a Python script?
import requests

res = requests.get("https://stackoverflow.com/questions/50404771/python-get-cookiesof-a-website-saved-in-a-browser-chrome-firefox")
# res.cookies is a RequestsCookieJar holding whatever the server set
print(res.cookies.keys())
print(res.cookies["prov"])
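Under the hood, a Set-Cookie header is just text, and the standard library's http.cookies module can parse one directly; that can help when inspecting what a response actually carries. The header value below is made up for illustration:

```python
from http.cookies import SimpleCookie

# A made-up Set-Cookie header, similar in shape to Stack Overflow's
# "prov" cookie.
raw = "prov=4ad4c3b1-0b7c-4a34-a561-0e8e9a3c9f5b; Path=/; HttpOnly"

jar = SimpleCookie()
jar.load(raw)

print(jar["prov"].value)       # the bare cookie value
print(jar["prov"]["path"])     # attributes are exposed like a dict
```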
I hope I read your question right.
You may be asking "how do I read cookies already stored in my browser?", which I don't think you can do with requests alone. But Selenium would give you access to a new browser session with which you can obtain more cookies.
UPDATE
Thanks to Sraw for the pointer. I've tried this now, but it wouldn't transfer my login to the requests API. So maybe it is not possible on modern sites, or the OP could try these tools, since their question is clearer in their mind than in ours.
import requests
import browsercookie
url = "https://stackoverflow.com/questions/50404771/python-get-cookiesof-a-website-saved-in-a-browser-chrome-firefox"
res=requests.get(url)
cj = browsercookie.chrome()
res2 = requests.get(url, cookies=cj)
import re
get_title = lambda html: re.findall('<title>(.*?)</title>', html, flags=re.DOTALL)[0].strip()
get_me = lambda html: re.findall('John', html, flags=re.DOTALL)
# At this point I had deleted my answer so this got nothing
# now my answer is reinstated it will return me but not in place of the login button.
print(len(get_me(res2.text)))
I am trying to login to website using urllib2 and cookiejar. It saves the session id, but when I try to open another link, which requires authentication it says that I am not logged in. What am I doing wrong?
Here's the code, which fails for me:
import urllib
import urllib2
import cookielib
cookieJar = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookieJar))
# Gives a response saying that I logged in successfully
response = opener.open("http://site.com/login", "username=testuser&password=" + md5encode("testpassword"))
# Gives response saying that I am not logged in
response1 = opener.open("http://site.com/check")
Your implementation seems fine and should work: it should be sending the correct cookies. I see it as a case where the site is actually not logging you in.
How do you know the cookies are being sent, or that the cookies you receive are the ones that authenticate you? Use response.info() to inspect the response headers and see which cookies you are actually receiving.
The site may not be logging you in because:
It checks the User-Agent header, which you are not setting; some sites only accept the major browsers in order to disallow bot access.
The site might be looking for some special hidden form field that you are not sending.
One piece of advice:
from urllib import urlencode
# Use urlencode to encode your data
data = urlencode(dict(username='testuser', password=md5encode("testpassword")))
response = opener.open("http://site.com/login", data)
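For reference, the same idea in Python 3, where urlencode lives in urllib.parse. The question's md5encode() helper isn't shown anywhere, so the definition below is an assumption (a plain hex MD5 digest):

```python
import hashlib
from urllib.parse import urlencode

def md5encode(password):
    # Stand-in for the question's md5encode(); assumed to be a hex MD5 digest.
    return hashlib.md5(password.encode("utf-8")).hexdigest()

data = urlencode({"username": "testuser", "password": md5encode("testpassword")})
print(data)
```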
Moreover, one thing is strange here: you are MD5-hashing your password before sending it.
That is generally done by the server before comparing against the database. Sending the hash from the client only works if site.com implements MD5 in JavaScript, which is a very rare case; maybe 0.01% of websites do that.
Check that: you might be providing the hashed form rather than the actual password, in which case the server would be computing an MD5 of your MD5 hash.
:)
I had a similar problem with my own test server, which worked fine with a browser, but not with the urllib2.build_opener solution.
The problem seems to be in urllib2. As these answers suggest, it's easy to use the more powerful mechanize library instead of urllib2:
cookieJar = cookielib.CookieJar()
browser = mechanize.Browser()
browser.set_cookiejar(cookieJar)
opener = mechanize.build_opener(*browser.handlers)
And the opener will work as expected!
I am using urllib2 in Python to post login data to a web site.
After successful login, the site redirects my request to another page. Can someone provide a simple code sample on how to do this in Python with urllib2? I guess I will need cookies also to be logged in when I get redirected to another page. Right?
Thanks a lot in advance.
First, get mechanize: http://wwwsearch.sourceforge.net/mechanize/
You could do this kind of stuff with just urllib2, but you will be writing tons of boilerplate code, and it will be buggy.
Then:
import mechanize
br = mechanize.Browser()
br.open('http://somesite.com/account/signin/')
br.select_form('loginForm')
br['username'] = 'jekyll'
br['password'] = 'bananas'
br.submit()
# At this point, you're logged in, redirected, and the
# br object has the cookies and all that.
br.geturl() # e.g. http://somesite.com/loggedin/
Then you can use the Browser object br to do whatever you have to do: click on links, submit more forms, etc. Check the examples on the mechanize site.
I'm working on a simple HTML scraper for Hulu in python 2.6 and am having problems with logging on to my account. Here's my code so far:
import urllib
import urllib2
from cookielib import CookieJar
#make a cookie and redirect handlers
cookies = CookieJar()
cookie_handler= urllib2.HTTPCookieProcessor(cookies)
redirect_handler= urllib2.HTTPRedirectHandler()
opener = urllib2.build_opener(redirect_handler,cookie_handler)#make opener w/ handlers
#build the url
login_info = {'username':USER,'password':PASS}#USER and PASS are defined
data = urllib.urlencode(login_info)
req = urllib2.Request("http://www.hulu.com/account/authenticate",data)#make the request
test = opener.open(req) #open the page
print test.read() #print html results
The code compiles and runs, but all that prints is:
Login.onError("Please \074a href=\"/support/login_faq#cant_login\"\076enable cookies\074/a\076 and try again.");
I assume there is some error in how I'm handling cookies, but just can't seem to spot it. I've heard Mechanize is a very useful module for this type of program, but as this seems to be the only speed bump left, I was hoping to find my bug.
What you're seeing is an AJAX response. The page is probably using JavaScript to set the cookie, which is defeating your attempts to authenticate.
The error message you are getting back could also be misleading. For example, the server might be looking at the User-Agent and deciding it's not one of the supported browsers, or checking that HTTP_REFERER comes from the hulu domain. My point is that there are too many variables in the request to keep guessing them one by one.
I recommend using an HTTP analyzer tool, e.g. Charles or the one in Firebug, to figure out exactly what the client sends to the server (header fields, cookies, parameters) when you do a Hulu login via a browser. That will give you the exact request you need to construct in your Python code.
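Once the analyzer shows what the browser sends, you can replay those fields on the request yourself. A sketch with Python 3's urllib.request (urllib2.Request in Python 2 takes the same arguments); the header values and credentials below are placeholders, not Hulu's real ones:

```python
import urllib.request
from urllib.parse import urlencode

# Placeholder credentials and headers; fill in what the analyzer shows.
data = urlencode({"username": "USER", "password": "PASS"}).encode("ascii")
req = urllib.request.Request(
    "http://www.hulu.com/account/authenticate",
    data=data,
    headers={
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
        "Referer": "http://www.hulu.com/login",
        "Content-Type": "application/x-www-form-urlencoded",
    })

# Nothing is sent yet; the Request object just carries the fields until
# an opener actually opens it.
print(req.get_header("Referer"))
```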