How to programmatically log into a website in Python

I have searched all over the Internet and tried every example I've found, yet none of them work for me, so please don't treat this as a duplicate: I need help with my specific case.
I'm trying to log into a website using Python (in this instance with v2.7; I'm not opposed to a more recent version, it's just that I've been able to find the most information on 2.7).
I need to fill out a short form, consisting simply of a username and password.
The form of the webpage I need to fill out and log in to is as follows (it's messy, I know):
<form method="post" action="login.aspx?ReturnUrl=..%2fwebclient%2fstorepages%2fviewshifts.aspx" id="Form1">
<div class="aspNetHidden">
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="/wEPDwUKMTU4MTgwOTM1NWRkBffWXYjjifsi875vSMg9OVkhxOQYYstGTNcN9/PFb+M=" />
</div>
<div class="aspNetHidden">
<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="/wEdAAVrmuRkG3j6RStt7rezNSLKVK7BrRAtEiqu9nGFEI+jB3Y2+Mc6SrnAqio3oCKbxYY85pbWlDO2hADfoPXD/5td+Ot37oCEEXP3EjBFcbJhKJGott7i4PNQkjYd3HFozLgRvbhbY2j+lPBkCGQJXOEe" />
</div>
<div><span></span>
<table style="BORDER-COLLAPSE: collapse" borderColor="#000000" cellSpacing="0" cellPadding="0"
width="600" align="center" border="1">
<tr>
<td>
<table cellSpacing="0" cellPadding="0" width="100%" align="center" border="0">
<tr>
<td width="76%"><span id="centercontentTitle"></span>
<H1 align="center"><br>
<span>
<IMG height="52" src="../images/logo-GMR.jpg" width="260"></span><span><br>
</span></H1>
<div id="centercontentbody">
<div align="center">
<TABLE width="350">
<TR>
<TD class="style7">Username:</TD>
<TD>
<div align="right"><input name="txtUsername" type="text" id="txtUsername" style="width:250px;" /></div>
</TD>
</TR>
<TR>
<TD class="style7">Password:</TD>
<TD>
<div align="right"><input name="txtPassword" type="password" id="txtPassword" style="width:250px;" /></div>
</TD>
</TR>
<TR>
<TD></TD>
<TD align="right"><input type="submit" name="btnSubmit" value="Submit" id="btnSubmit" /><input type="submit" name="btnCancel" value="Cancel" id="btnCancel" /></TD>
</TR>
<TR>
<TD colspan="2" align="center"></TD>
</TR>
</TABLE>
</div>
</div>
</td>
<td>
<div align="center" style='height:250px'></div>
</td>
</tr>
</table>
</td>
</tr>
</table>
<br>
<br>
<p> </p>
</form>
From searching around online, the best Python code I have found to fill out this form and log into the website is as follows:
Note: This is not my code, I got it from this question/example, where many people have said they've found it to work well.
import cookielib
import urllib
import urllib2
# Store the cookies and create an opener that will hold them
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
# Add our headers
opener.addheaders = [('User-agent', 'LoginTesting')]
# Install our opener (note that this changes the global opener to the one
# we just made, but you can also just call opener.open() if you want)
urllib2.install_opener(opener)
# The action/ target from the form
authentication_url = '<URL I am trying to log into>'
# Input parameters we are going to send
payload = {
'__EVENTVALIDATION': '/wEdAAVrmuRkG3j6RStt7rezNSLKVK7BrRAtEiqu9nGFEI+jB3Y2+Mc6SrnAqio3oCKbxYY85pbWlDO2hADfoPXD/5td+Ot37oCEEXP3EjBFcbJhKJGott7i4PNQkjYd3HFozLgRvbhbY2j+lPBkCGQJXOEe',
'txtUsername': '<USERNAME>',
'txtPassword': '<PASSWORD>',
}
# Use urllib to encode the payload
data = urllib.urlencode(payload)
# Build our Request object (supplying 'data' makes it a POST)
req = urllib2.Request(authentication_url, data)
# Make the request and read the response
resp = urllib2.urlopen(req)
contents = resp.read()
Unfortunately, this is not working for me and I'm unable to figure out why. Could someone please look over the code and tell me how to change it so that it works? It would be greatly appreciated!
Thanks in advance for all help I receive :)

__EVENTVALIDATION is probably not static. You need to load the login page in Python, read the current __EVENTVALIDATION value from it, and then send the login request.
Something like this should work:
import requests
from bs4 import BeautifulSoup
# One Session keeps the cookies from the GET and sends them with the later POST
s = requests.Session()
def get_eventvalidation():
    # Load the login page and read the current __EVENTVALIDATION value
    r = s.get("http://url.to.login.page")
    bs = BeautifulSoup(r.text, "html.parser")
    return bs.find("input", {"name": "__EVENTVALIDATION"})["value"]
authentication_url = '<URL I am trying to log into>'
payload = {
    '__EVENTVALIDATION': get_eventvalidation(),
    'txtUsername': '<USERNAME>',
    'txtPassword': '<PASSWORD>',
}
login = s.post(authentication_url, data=payload)
print(login.text)
This needs the third-party requests and beautifulsoup4 packages; you could also rewrite it to use only the standard library.
Edit:
You probably need __VIEWSTATE as a POST value.
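Combining both points, here is a minimal Python 3 sketch that uses only the standard library instead of requests and BeautifulSoup. It copies every hidden input (so __VIEWSTATE and __EVENTVALIDATION are both sent) from a fresh load of the page, then posts the form back through the same cookie jar. Assumptions: the form posts back to the same login.aspx URL it was loaded from (normal for ASP.NET WebForms and consistent with the form's action), the page is UTF-8, and sending btnSubmit=Submit is how the server recognizes the button click. The helper names are hypothetical.

```python
from html.parser import HTMLParser
import http.cookiejar
import urllib.parse
import urllib.request


class HiddenInputParser(HTMLParser):
    """Collects the name/value pairs of every <input type="hidden"> on a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attrs = dict(attrs)
        if (attrs.get("type") or "").lower() == "hidden" and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""


def build_login_payload(login_html, username, password):
    """Merge the page's current hidden fields with the visible form fields."""
    parser = HiddenInputParser()
    parser.feed(login_html)
    parser.close()
    payload = dict(parser.fields)  # e.g. __VIEWSTATE, __EVENTVALIDATION
    payload.update({
        "txtUsername": username,
        "txtPassword": password,
        # Assumption: the server detects the clicked button from its name/value.
        "btnSubmit": "Submit",
    })
    return payload


def login(login_url, username, password):
    """GET the login page, then POST the form back through the same cookie jar."""
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(http.cookiejar.CookieJar()))
    with opener.open(login_url) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    data = urllib.parse.urlencode(build_login_payload(html, username, password))
    with opener.open(login_url, data.encode("ascii")) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Call it as login(url, '<USERNAME>', '<PASSWORD>'), where url is the absolute address of login.aspx including its ReturnUrl query (the question does not show the host).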

Related

Scraping a website with python 3 that requires login

I have a question about scraping a site that requires authentication. Using BeautifulSoup:
# import requests and BeautifulSoup
import requests
from bs4 import BeautifulSoup
# fetch the login page
page = requests.get("http://localhost:8080/login?from=%2F")
# parse the returned HTML
soup = BeautifulSoup(page.content, 'html.parser')
print(soup.prettify())
Here is the part of the output that I think is important:
<table>
<tr>
<td>
User:
</td>
<td>
<input autocapitalize="off" autocorrect="off" id="j_username" name="j_username" type="text"/>
</td>
</tr>
<tr>
<td>
Password:
</td>
<td>
<input name="j_password" type="password"/>
</td>
</tr>
<tr>
<td align="right">
<input id="remember_me" name="remember_me" type="checkbox"/>
</td>
<td>
<label for="remember_me">
Remember me on this computer
</label>
</td>
</tr>
</table>
This fetches the page fine, but the content I need requires a login. Here I am trying the mechanicalsoup library:
import mechanicalsoup
browser = mechanicalsoup.StatefulBrowser()
browser.open("http://localhost:8080/login?from=%2F")
browser.get_url()
browser.get_current_page()
browser.get_current_page().find_all('form')
browser["j_username"] = "admin"
browser["j_password"] = "password"
browser.launch_browser()
However it still won't let me login.
Has anyone used a scraping tool for python 3 that lets them scrape a site that has authentication?
I see you're using requests. If the site uses HTTP Basic authentication, you can pass the credentials with the auth parameter:
import requests
page = requests.get("http://localhost:8080/login?from=%2F", auth=('username', 'password'))
Hope this helps! You can read more about authentication here: http://docs.python-requests.org/en/master/user/authentication/
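For reference, auth=('username', 'password') in requests means HTTP Basic authentication: the credentials travel base64-encoded in an Authorization header. That only helps when the server asks for Basic auth; it does not fill in an HTML login form like the j_username/j_password form above, which is what the MechanicalSoup answer that follows addresses. A minimal standard-library sketch of what that header looks like (function names are hypothetical):

```python
import base64
import urllib.request


def basic_auth_header(username, password):
    """Build the Authorization header used by HTTP Basic authentication."""
    credentials = f"{username}:{password}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(credentials).decode("ascii")}


def fetch_with_basic_auth(url, username, password):
    """GET url with Basic credentials using only the standard library."""
    request = urllib.request.Request(url, headers=basic_auth_header(username, password))
    with urllib.request.urlopen(request) as response:
        return response.read()
```

For example, basic_auth_header("username", "password") yields the header value "Basic dXNlcm5hbWU6cGFzc3dvcmQ=".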
With MechanicalSoup, you first need to specify the form you want to fill in and submit. If the page has only one form, use:
browser.select_form()
Then, after filling in the form, submit it:
browser.submit_selected()
You may want to read the (newly written) MechanicalSoup tutorial or look at examples such as logging into GitHub with MechanicalSoup.
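To illustrate what those two calls take care of, here is a rough standard-library sketch (not MechanicalSoup's actual implementation): it picks the first form on the page, collects its default field values, applies your values, and resolves the form's action against the page URL. It only looks at input elements (no select or textarea), and the helper names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FirstFormParser(HTMLParser):
    """Records the action, method and default <input> values of the first <form>."""

    def __init__(self):
        super().__init__()
        self.action = None  # stays None if the page has no <form>
        self.method = "GET"
        self.fields = {}
        self._inside = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not (self._inside or self._done):
            self._inside = True
            self.action = attrs.get("action") or ""
            self.method = (attrs.get("method") or "get").upper()
        elif tag == "input" and self._inside:
            name = attrs.get("name")
            kind = (attrs.get("type") or "text").lower()
            if not name or kind in ("submit", "image", "button", "reset"):
                return  # unnamed controls and unclicked buttons are not sent
            if kind in ("checkbox", "radio") and "checked" not in attrs:
                return  # unchecked boxes are left out of the submission
            value = attrs.get("value")
            if value is None:
                value = "on" if kind in ("checkbox", "radio") else ""
            self.fields[name] = value

    def handle_endtag(self, tag):
        if tag == "form" and self._inside:
            self._inside = False
            self._done = True


def prepare_submission(page_url, html, values):
    """Return (method, absolute_url, fields) for submitting the page's first form."""
    parser = FirstFormParser()
    parser.feed(html)
    parser.close()
    if parser.action is None:
        raise ValueError("no <form> found on the page")
    fields = dict(parser.fields)
    fields.update(values)  # like browser["j_username"] = "admin"
    # An empty action resolves to the page URL itself.
    return parser.method, urljoin(page_url, parser.action), fields
```

For example, for a hypothetical form with action="/j_login" and method="post", prepare_submission("http://localhost:8080/login?from=%2F", html, {"j_username": "admin", "j_password": "password"}) returns "POST", "http://localhost:8080/j_login", and the fields with any hidden inputs included and the unchecked remember_me box left out.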

Using Python Requests Module to Submit a Form without Input Name

I'm trying to make a POST request using the Python requests module, but the input I am most interested in only has an id and no name attribute, and all the examples I've seen rely on the name attribute. How can I make this POST request for the following form:
<form id="search" method="post">
<select id="searchOptions" onchange="javascript:keepSearch(this);">
<option value="horses" selected>Horses</option>
<option value="jockeys">Jockeys</option>
<option value="trainers">Trainers</option>
<option value="owners">Owners</option>
<option value="tracks">Tracks</option>
<option value="stakes">Gr. Stakes</option>
</select>
<input type="hidden" id="searchVal" value="horses" name="searchVal">
<input class="input" id="searchInput" type="text" placeholder="Horse Name">
<span class="glyphicon glyphicon-search"></span>
<input type="submit" value="">
<span style="clear: both;">.</span>
</form>
I'm looking specifically at the input with the id="searchInput".
Currently I'm trying this code, which only returns the original homepage with the search bar:
data = {
'searchInput': name,
'searchVal' : "horses"
}
r = requests.post(self.equibaseHomeUrl, data=data)
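Why searchInput is ignored: under the HTML form-submission rules, a control without a name attribute is left out of the submitted data, so text typed into searchInput is never sent by this form on its own. The page's JavaScript presumably builds the real request, which is why inspecting the actual network request, as the answer below does, is the way to find the field names. A small standard-library sketch that lists which controls in a form have a name (a prerequisite for being submitted); the function name is hypothetical:

```python
from html.parser import HTMLParser


class ControlNameAudit(HTMLParser):
    """Sorts form controls into named ones and ones a browser would always skip."""

    CONTROL_TAGS = {"input", "select", "textarea"}

    def __init__(self):
        super().__init__()
        self.named = []    # controls with a name attribute (eligible for submission)
        self.unnamed = []  # ids (or tag names) of controls without a name

    def handle_starttag(self, tag, attrs):
        if tag not in self.CONTROL_TAGS:
            return
        attrs = dict(attrs)
        if attrs.get("name"):
            self.named.append(attrs["name"])
        else:
            self.unnamed.append(attrs.get("id") or tag)


def audit_form(html):
    """Return (named_controls, skipped_controls) for the controls in html."""
    audit = ControlNameAudit()
    audit.feed(html)
    audit.close()
    return audit.named, audit.unnamed
```

Run on the form above, it reports searchVal as the only named control, while searchOptions, searchInput and the unnamed submit button are skipped.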
If you look at the request in Firebug or the Chrome developer tools, you can see how the POST request is actually made: it goes to /profiles/Results.cfm?type=Horse with the fields searchVal and horse_name.
Using that, we can send the same request:
import requests
p = {"searchVal": "horses",
     "horse_name": "zenyatta"}
r = requests.post("http://www.equibase.com/profiles/Results.cfm?type=Horse", data=p)
print(r.content)
If you look at the content, you can see the search results for Zenyatta:
<table class="table-hover">
<tr>
<th>Horse</th>
<th>YOB</th>
<th>Sex</th>
<th>Sire</th>
<th>Dam</th>
</tr>
<tr>
<td ><a href='/profiles/Results.cfm?type=Horse&refno=8575618&registry=Q'>#Zenyatta-QH-5154943</a></td>
<td >2009</td>
<td >Gelding</td>
<td >
<a href='/profiles/Results.cfm?type=Horse&refno=7237823&registry=Q'>#Mr Ice Te-QH</a>
</td>
<td >
<a href='/profiles/Results.cfm?type=Horse&refno=6342673&registry=Q'>#She Sings Soprano-QH</a>
</td>
</tr>
<tr>
<td ><a href='/profiles/Results.cfm?type=Horse&refno=7156465&registry=T'>Zenyatta</a></td>
<td >2004</td>
<td >Mare</td>
<td >
<a href='/profiles/Results.cfm?type=Horse&refno=4531602&registry=T'>Street Cry (IRE)</a>
</td>
<td >
<a href='/profiles/Results.cfm?type=Horse&refno=4004138&registry=T'>Vertigineux</a>
</td>
</tr>
</table>
Or, if you want to use the base URL and pass the query string separately:
import requests
data = {"searchVal": "horses",
        "horse_name": "zenyatta"}
r = requests.post("http://www.equibase.com/profiles/Results.cfm",
                  data, params={"type": "Horse"})
If you run it, you can see that the URL is constructed correctly:
In [11]: r = requests.post("http://www.equibase.com/profiles/Results.cfm",
....: data, params={"type": "Horse"})
In [12]:
In [12]: print(r.url)
http://www.equibase.com/profiles/Results.cfm?type=Horse
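Roughly speaking, requests turns data= into a form-encoded request body and params= into the URL's query string. The standard-library equivalent of those two encodings, using the same values as above, looks like this (variable names are illustrative):

```python
from urllib.parse import urlencode

BASE_URL = "http://www.equibase.com/profiles/Results.cfm"
form_data = {"searchVal": "horses", "horse_name": "zenyatta"}
query = {"type": "Horse"}

# data=... becomes the application/x-www-form-urlencoded request body
body = urlencode(form_data)
# params=... is appended to the URL as the query string
url = BASE_URL + "?" + urlencode(query)

print(body)  # searchVal=horses&horse_name=zenyatta
print(url)   # http://www.equibase.com/profiles/Results.cfm?type=Horse
```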

Login to remote website

I'm trying to log in to this site (now a dead link). I've included my username and password (the site is not important) so that you can try it yourself and test whether it really works.
There are two problems:
1. How does this page handle CSRF? The token isn't stored in any cookie, so where does the page get it from?
2. I use this code and it returns HTTP 200, but it doesn't log me in. I need to log in with my username and password and get the HTML of the next page.
>>> import requests
>>> url = 'http://dining.ut.ac.ir/login'
>>> signin = {'username' : '810192485' , 'password' : '0923122265' , '_csrf_token' : '14e993b708cbe5f8f7b356b6944bff98'}
>>> x = requests.post(url, data = signin)
>>> x
<Response [200]>
The login section of the page's HTML:
<form action="/login" method="post">
<input type="hidden" name="signin[_csrf_token]" value="14e993b708cbe5f8f7b356b6944bff98" id="signin__csrf_token" />
<table id="loginDatagrid">
<tr>
<td width="300" align="left" valign="bottom"><label style="position:relative;left:5px;bottom:5px;" for="signin_username">Username (student/staff number): </label></td>
<td width="100" align="right" valign="bottom"><div class="loginboxdiv"><input class="loginbox" type="text" name="signin[username]" id="signin_username" class="text" size="5" onclick='inputSelected("signin_username")'/></div> </td>
<td width="45"> </td>
</tr>
<tr>
<td width="300" align="left" valign="top"><label style="position:relative;left:5px;top:5px; "for="signin_password">Password (national ID code): </label></td>
<td width="100" align="right" valign="top"><div class="loginboxdiv"><input class="loginbox" type="password" name="signin[password]" id="signin_password" class="text" onclick='inputSelected("signin_password")'/> </div>
</td>
<td width="45" align="right" valign="top"> <input SRC="images/submit_form.jpg" type="image" value="" /> </td>
</tr>
</table>
</form >
You're not posting the fields the form expects. As the HTML shows, the form fields use Rails/PHP-style array names (signin[username], signin[password], signin[_csrf_token]), so you need to post exactly those names:
signin = {'signin[username]' : '810192485' , 'signin[password]' : '0923122265' , 'signin[_csrf_token]' : '14e993b708cbe5f8f7b356b6944bff98'}
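On the CSRF part of the question: the token is not in a cookie because it is embedded in the form itself as the hidden signin[_csrf_token] input. Servers typically tie that token to the session cookie issued with the page, so a token copied from your browser is unlikely to match a new requests session. A standard-library sketch that loads the page, reads the current token, and posts it back through the same cookie jar; it assumes the form posts to the same /login URL (as action="/login" suggests) and that the page is UTF-8, and the helper names are hypothetical:

```python
from html.parser import HTMLParser
import http.cookiejar
import urllib.parse
import urllib.request


class InputValueFinder(HTMLParser):
    """Remembers the value of the first <input> whose name matches."""

    def __init__(self, name):
        super().__init__()
        self.name = name
        self.value = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and self.value is None and attrs.get("name") == self.name:
            self.value = attrs.get("value") or ""


def build_login_data(html, username, password):
    """Encode the bracketed field names, using the token from this page load."""
    finder = InputValueFinder("signin[_csrf_token]")
    finder.feed(html)
    finder.close()
    if finder.value is None:
        raise ValueError("CSRF token input not found")
    form = {
        "signin[username]": username,
        "signin[password]": password,
        "signin[_csrf_token]": finder.value,
    }
    return urllib.parse.urlencode(form).encode("ascii")


def login(login_url, username, password):
    """GET the form and POST it back through one cookie jar so token and session match."""
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(http.cookiejar.CookieJar()))
    with opener.open(login_url) as response:
        html = response.read().decode("utf-8", errors="replace")
    with opener.open(login_url, build_login_data(html, username, password)) as response:
        return response.read().decode("utf-8", errors="replace")
```

The square brackets are percent-encoded in the body (signin%5Busername%5D=...), which is exactly what a browser sends for these field names.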

Grab Form Data Via Python

I'm looking to grab the form data that needs to be passed along to a specific website and submit it. Below is the html(form only) that I need to simulate. I've been working on this for a few hours, but can't seem to get anything to work. I want this to work in Google App Engine. Any help would be nice.
<form method="post" action="/member/index.bv">
<table cellspacing="0" cellpadding="0" border="0" width="100%">
<tr>
<td align="left">
<h3>member login</h3><input type="hidden" name="submit" value="login" /><br />
</td>
</tr>
<tr>
<td align="left" style="color: #8b6c46;">
email:<br />
<input type="text" name="email" style="width: 140px;" />
</td>
</tr>
<tr>
<td align="left" style="color: #8b6c46;">
password:<br />
<input type="password" name="password" style="width: 140px;" />
</td>
</tr>
<tr>
<td>
<input type="image" class="formElementImageButton" src="/resources/default/images/btnLogin.gif" style="width: 46px; height: 17px;" />
</td>
</tr>
<tr>
<td align="left">
<div style="line-height: 1.5em;">
join<br />
forgot password?<input type="hidden" name="lastplace" value="%2F"><br />
having trouble logging on, click here for help
</div>
</td>
</tr>
</table>
</form>
Currently I'm trying to use this code to access it, but it's not working. I'm pretty new to this, so maybe I'm just missing something.
import urllib2, urllib
url = 'http://blah.com/member/index.bv'
values = {'email' : 'someemail#gmail.com',
'password' : 'somepassword'}
data = urllib.urlencode(values)
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)
the_page = response.read()
Is this login page for a 3rd party site? If so, there may be more to it than simply posting the form inputs.
For example, I just tried this with the login page on one of my own sites: a simple POST request doesn't work there, and the same may be true of the login page you are accessing.
For starters, the login form may have a hidden CSRF token value that you have to send with your login request. This means you'd first have to GET the login page and parse the resulting HTML for the token value. The server may also require its session cookie in the login request.
I'm using the requests module to handle the get/post and beautifulsoup to parse the data.
import requests
from bs4 import BeautifulSoup
# a Session keeps the cookies from the first response and sends them with the POST
session = requests.Session()
# first get the login page
response = session.get('https://www.site.com')
# requests already decompresses gzip/deflate bodies, so response.text is plain HTML
html = response.text
# parse the html for the csrf token
soup = BeautifulSoup(html, 'html.parser')
csrf_token = soup.find('input', id='csrf_token')['value']
# now, submit the login data, including the csrf token (the session resends the cookies)
response = session.post('https://www.site.com/login',
                        data={'csrf_token': csrf_token,
                              'username': 'username',
                              'password': 'ckrit'})
login_result = response.text
print(login_result)
I cannot say if GAE will allow any of this or not, but at least it might be helpful in figuring out what you may require in your particular case. Also, as Carl points out, if a submit input is used to trigger the post you'd have to include it. In my particular example, this isn't required.
You're missing the hidden submit=login argument. Have you tried:
import urllib2, urllib
url = 'http://blah.com/member/index.bv'
values = {'submit':'login',
'email' : 'someemail#gmail.com',
'password' : 'somepassword'}
data = urllib.urlencode(values)
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)
the_page = response.read()
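A Python 3 version of the same idea, as a sketch: urllib2 and urllib were merged into urllib.request and urllib.parse. It also sends the form's second hidden input, lastplace, since a browser would include it too (whether the server needs it is an assumption), and it keeps cookies so that later requests stay logged in. Function names are hypothetical.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def build_login_request(url, email, password):
    """Build the POST the form would send, hidden inputs included."""
    values = {
        "submit": "login",    # hidden input the server may check
        "email": email,
        "password": password,
        "lastplace": "%2F",   # second hidden input, sent as-is by a browser
    }
    data = urllib.parse.urlencode(values).encode("ascii")
    return urllib.request.Request(url, data)  # passing data makes it a POST


def login(url, email, password):
    """Send the login POST and return a cookie-aware opener for later requests."""
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(http.cookiejar.CookieJar()))
    with opener.open(build_login_request(url, email, password)) as response:
        response.read()
    return opener
```

Reuse the returned opener (opener.open(...)) for the pages behind the login so the session cookie is sent along.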

Log onto a Website and select options using Python

I am trying to log onto a website using Python. I have written the code to connect to the target, but I need to log in, click a button on the website, and wait for a response. I have looked at Python's httplib module and was thinking of using 'HTTPConnection.putrequest', but I am not sure how to do this. The code I have so far is below:
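import httplib  # Python 2 module; it is http.client in Python 3
def testHTTPS(self):
    c = httplib.HTTPSConnection(ip)  # ip is assumed to be defined elsewhere in the test suite
    c.request("GET", "/")
    response = c.getresponse()
    self.assertEqual(response.status, 200)  # '200' is success code
    c.close()  # close the connection object c (conn was never defined)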
And the HTML for the login form on the website is:
<td align="right" id="lgn_userName"></td>
<td><input type="text" class="button" name="username" id="username" size="24" maxlength="16" accesskey="u" tabindex="1" value=""/></td>
</tr>
<tr>
<td align="right" id="lgn_userPwd"></td>
<td><input type="password" class="button" name="password" id="password" size="24" maxlength="20" accesskey="p" tabindex="2" value=""/></td>
</tr>
<tr>
<td align="right"> </td>
<td>
<input type="submit" id="lgn_button" class="button" tabindex="3" accesskey="s" />
</td>
Does anyone know how to go about this?
Thanks
Yes: use mechanize, which is a sort of "web browser" for Python. With it you can easily open web pages, find forms, fill in form values, and submit the forms from Python. I use it (via Zope's testbrowser module) for testing web applications.
Use urllib2 and create a POST request.
For more information, read:
urllib2: submitting a form and then redirecting
How to make python urllib2 follow redirect and keep post method
How do I send a HTTP POST value to a (PHP) page using Python?
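Since the question started from httplib, here is a sketch of the login POST with http.client (httplib's Python 3 name). The form's action URL is not shown in the question, so path is a placeholder you must fill in, and the helper names are hypothetical. The submit input has no name attribute, so only username and password are sent. http.client does not manage cookies, so any Set-Cookie value would have to be sent back manually in a Cookie header on later requests; that is one reason urllib2 with a cookie jar, or mechanize, is usually easier.

```python
import http.client
import urllib.parse


def build_login_body(username, password):
    """Form-encode the two named inputs; the unnamed submit button is not sent."""
    body = urllib.parse.urlencode({"username": username, "password": password})
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return body, headers


def post_login(host, path, username, password):
    """POST the login form over HTTPS with http.client (httplib in Python 2)."""
    body, headers = build_login_body(username, password)
    conn = http.client.HTTPSConnection(host)
    try:
        conn.request("POST", path, body, headers)
        response = conn.getresponse()
        # Any session cookie has to be echoed back manually in later requests.
        return response.status, response.getheader("Set-Cookie"), response.read()
    finally:
        conn.close()
```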
