I am trying to use a Python script to log in and grab the HTML from my Google Checkout account. It seems to log in, but it returns a strange page (screenshot not shown here) which doesn't have any of the order info I am trying to parse. I know Google Checkout has an API, but there is no way to get just the payout totals, which are all I care about.
Here is my code:
import urllib, urllib2, cookielib

username = 'username'
password = 'password'

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
login_data = urllib.urlencode({'Email': username, 'Passwd': password})
opener.open('https://accounts.google.com/ServiceLogin?service=sierra&passive=1200&continue=https://checkout.google.com/sell/orders&followup=https://checkout.google.com/sell/orders&ltmpl=seller&scc=1&authuser=0', login_data)
resp = opener.open('https://checkout.google.com/sell/payouts')
f = open('test.html', 'w')
f.write(resp.read())
f.close()
print "Finished"
How can I get this code to display the proper HTML of my account so I can parse it?
It depends on what sort of browser detection or JavaScript tricks Google Checkout may be using. It may be enough simply to set your User-Agent to that of a well-known desktop browser; from the screenshot, it seems Google Checkout is assuming you're on a mobile browser.
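If that is what's happening, a minimal sketch of the idea, reusing the variables from your script (the User-Agent string below is only an example of a desktop browser string):

import urllib, urllib2, cookielib

# username, password and login_url as defined in your script
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
# replace the default Python-urllib User-Agent with a desktop browser's;
# the exact string here is just an example
opener.addheaders = [('User-Agent',
                      'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.94 Safari/537.36')]
login_data = urllib.urlencode({'Email': username, 'Passwd': password})
opener.open(login_url, login_data)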
Related
I have a TP-Link router (WR841N). I want to log in to my TP-Link router and change the primary and secondary DNS using a script.
I tried to log in using the script below, but it did not succeed:
import urllib2
import urllib
import cookielib

def main():
    userName = 'admin'
    pcPassword = 'admin'
    cj = cookielib.CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
    login_data = urllib.urlencode({'userName': userName, 'pcPassword': pcPassword})
    resp = opener.open('http://192.168.0.1/userRpm/LoginRpm.htm', login_data)
    print(resp.read())

if __name__ == '__main__':
    main()
And then, how can I change the primary and secondary DNS using a script?
HTTPCookieProcessor doesn't set the POST headers, obviously.
You need to set Content-Type and Content-Length to match your login_data.
I would recommend installing the opener you built using urllib2.install_opener(), and then using a Request:
r = urllib2.Request('http://192.168.0.1/userRpm/LoginRpm.htm')
r.add_header("Content-Type", "application/x-www-form-urlencoded")
r.add_header("Content-Length", str(len(login_data)))
r.add_data(login_data)  # attaching data turns the request into a POST
u = urllib2.urlopen(r)  # uses the installed opener, so cookies are kept
print u.read()
u.close()
Then you have to continue by filling in the other forms to change what you want, as sketched below.
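For example, if the DNS settings live behind another form, the POST would look something like this. Be warned that the page name and the field names below are pure guesses; open the router's settings page in a browser and inspect the real form before using them:

# page and field names are hypothetical -- inspect the real form first
dns_data = urllib.urlencode({'dnsserver': '8.8.8.8',
                             'dnsserver2': '8.8.4.4'})
r = urllib2.Request('http://192.168.0.1/userRpm/WanDynamicIpCfgRpm.htm')
r.add_header("Content-Type", "application/x-www-form-urlencoded")
r.add_header("Content-Length", str(len(dns_data)))
r.add_data(dns_data)
u = urllib2.urlopen(r)
print u.read()
u.close()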
If the cookies aren't managed by JavaScript, you will be able to do it. If they are, you may still manage, provided you examine the code carefully and extract the cookie values manually from the JavaScript. I have done it before.
But, yeah, SSH or telnet or rlogin would be easier than HTTP. If you want to continue using HTTP, take a look at the Requests package; it can be helpful and will make your code smaller. It handles session management for you.
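For example, a rough Requests equivalent of the login above might look like this (field names taken from your snippet; untested against the actual router):

import requests

s = requests.Session()  # the Session object keeps cookies across requests
resp = s.post('http://192.168.0.1/userRpm/LoginRpm.htm',
              data={'userName': 'admin', 'pcPassword': 'admin'})
print resp.text
# later requests made through `s` send the login cookies automatically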
Adding the urlencoded type to Content-Type might not help if the login form's enctype attribute is set to something else (text/plain or multipart/form-data).
I don't think that will be the case here, but if it is, you can still do it with a bit more work.
I'm working on a screen scraper for what.cd using BeautifulSoup and Python. I came across this script while working and decided to look at it, since it seems to be similar to what I'm working on. However, every time I run the script I get a message that my credentials are wrong, even though they are not.
As far as I can tell, I'm getting this message because when the script tries to log into what.cd, what.cd is supposed to return a cookie containing the information that lets me request pages later in the script. So where the script is failing is:
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
login_data = urllib.urlencode({'username': username,
                               'password': password})
check = opener.open('http://what.cd/login.php', login_data)
soup = BeautifulSoup(check.read())
warning = soup.find('span', 'warning')
if warning:
    exit(str(warning) + '\n\nprobably means username or pw is wrong')
I've tried multiple methods of authenticating with the site, including using FileCookieJar, the script located here, and the Requests module. I've gotten the same HTML message with each one. It says, in short, that "Javascript is disabled" and "Cookies are disabled", and it also provides a login box in HTML.
I don't really want to mess around with Mechanize, but I don't see any other way to do it at the moment. If anyone can provide any help, it would be greatly appreciated.
After a few more hours of searching, I found the solution to my problem. I'm still not sure why this code works as opposed to the version above, but it does. Here is the code I'm using now:
import urllib
import urllib2
import cookielib

cj = cookielib.LWPCookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
urllib2.install_opener(opener)

# fetch the index page first (a plain GET, no POST data)
request = urllib2.Request("http://what.cd/index.php", None)
f = urllib2.urlopen(request)
f.close()

# then POST the credentials to the login page
data = urllib.urlencode({"username": "your-login", "password": "your-password"})
request = urllib2.Request("http://what.cd/login.php", data)
f = urllib2.urlopen(request)
html = f.read()
f.close()
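My guess, for what it's worth: the initial GET to index.php lets the site set a session cookie that login.php then expects to see. You can check what ends up in the jar after that first request with something like:

# after the first urlopen() above, inspect what the site set
for cookie in cj:
    print cookie.name, cookie.value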
Credit goes to carl.waldbieser from linuxquestions.org. Thanks to everyone who gave input.
I need to use Python to download a large number of URLs, but they require a password to access (similar to systems like cPanel, for example).
Is there a way I can do this, storing the cookie?
I'd like to use urllib2 if possible.
EDIT: To clarify, it's my website and I have the login details.
UPDATE:
OK I'm using this:
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
login_data = urllib.urlencode({'login_name' : username, 'password' : password})
opener.open(loginURL, login_data)
productlist = opener.open(productURL)
print productlist.read()
But it just spits out the login page again. What am I doing wrong?
(The variables are defined; I just didn't show their values, for security.)
You have to use the urllib2.HTTPCookieProcessor, like this:
import urllib2
from cookielib import CookieJar
cookiejar = CookieJar()
opener = urllib2.build_opener()
cookieproc = urllib2.HTTPCookieProcessor(cookiejar)
opener.add_handler(cookieproc)
Then you just use opener.open() to access URLs, and cookies will automatically be saved and reused in future requests.
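If you also need the cookies to survive between runs (the "storing the cookie" part of the question), one approach, sketched here, is LWPCookieJar, which can save to and load from a file:

import os
import urllib2
from cookielib import LWPCookieJar

cookiejar = LWPCookieJar('cookies.txt')  # the file name is arbitrary
if os.path.exists('cookies.txt'):
    cookiejar.load()  # reuse cookies from a previous run
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookiejar))
# ... log in and fetch pages with opener.open() as above ...
cookiejar.save()  # write the cookies back to disk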
I've been reading about the ability of Python's urllib2 to open and read directories that are password protected, but even after looking at examples in the docs and here on StackOverflow, I can't get my script to work.
import urllib2

# Create an OpenerDirector with support for Basic HTTP Authentication...
auth_handler = urllib2.HTTPBasicAuthHandler()
auth_handler.add_password(realm=None,
                          uri='https://webfiles.duke.edu/',
                          user='someUserName',
                          passwd='thisIsntMyRealPassword')
opener = urllib2.build_opener(auth_handler)
# ...and install it globally so it can be used with urlopen.
urllib2.install_opener(opener)

socks = urllib2.urlopen('https://webfiles.duke.edu/?path=/afs/acpub/users/a')
print socks.read()
socks.close()
When I print the contents, I get the contents of the login screen that the URL I'm trying to open redirects you to. Does anyone know why this is?
auth_handler is only for basic HTTP authentication. The site here uses an HTML form, so you'll need to submit your username/password as POST data.
I recommend using the mechanize module, which will simplify the login for you.
Quick example:
import mechanize
browser = mechanize.Browser()
browser.open('https://webfiles.duke.edu/?path=/afs/acpub/users/a')
browser.select_form(nr=0)
browser.form['user'] = 'username'
browser.form['pass'] = 'password'
req = browser.submit()
print req.read()
I am looking for a link checker to spider my website and log invalid links; the problem is that I have a login page at the start which is required. What I want is a link checker that I can run from the command line, have it POST the login details, and then spider the rest of the website.
Any ideas will be appreciated.
I recently solved a similar problem like this:
import urllib
import urllib2
import cookielib
login = 'user@host.com'
password = 'secret'
cookiejar = cookielib.CookieJar()
urlOpener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookiejar))
# adjust this to match the form's field names
values = {'username': login, 'password': password}
data = urllib.urlencode(values)
request = urllib2.Request('http://target.of.POST-method', data)
url = urlOpener.open(request)
# from now on, we're authenticated and we can access the rest of the site
url = urlOpener.open('http://rest.of.user.area')
You want to look at the cookielib module: http://docs.python.org/library/cookielib.html. It provides a full implementation of cookies, which will let you store login details. Once you're using a CookieJar, you just have to get the login details from the user (say, from the console) and submit a proper POST request.
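A minimal sketch of that suggestion, reading the credentials from the console (the form field names are assumptions; check the login form's HTML for the real ones):

import getpass
import urllib, urllib2, cookielib

login = raw_input('Login: ')
password = getpass.getpass('Password: ')  # typed without echo
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
# 'username' and 'password' are assumed field names -- adjust to the real form
data = urllib.urlencode({'username': login, 'password': password})
opener.open('http://target.of.POST-method', data)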