I am trying to log on to a website using Python. I have written the code to connect to the target, but I need to log in, select a button on the website, and wait for a response. I have looked at the HTTP protocol support in Python and was thinking of using 'HTTPConnection.putrequest', but I am not sure how to do this. The code I have so far is below:
def testHTTPS(self):
    c = httplib.HTTPSConnection(ip)
    c.request("GET", "/")
    response = c.getresponse()
    self.assertEqual(response.status, 200)  # 200 is the success code
    c.close()  # close the connection opened above (it is named c, not conn)
And the code for the logon function on the website is:
<td align="right" id="lgn_userName"></td>
<td><input type="text" class="button" name="username" id="username" size="24" maxlength="16" accesskey="u" tabindex="1" value=""/></td>
</tr>
<tr>
<td align="right" id="lgn_userPwd"></td>
<td><input type="password" class="button" name="password" id="password" size="24" maxlength="20" accesskey="p" tabindex="2" value=""/></td>
</tr>
<tr>
<td align="right"> </td>
<td>
<input type="submit" id="lgn_button" class="button" tabindex="3" accesskey="s" />
</td>
Does anyone know how to go about this?
Thanks
Yes, you can use mechanize, which is a sort of "web browser" for Python. With it you can easily open web pages, find forms, fill in form values and submit the forms from Python. I use it (via Zope's testbrowser module) for testing web applications.
Use urllib2 and create a POST request.
For more information, read:
urllib2: submitting a form and then redirecting
How to make python urllib2 follow redirect and keep post method
How do I send a HTTP POST value to a (PHP) page using Python?.
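As a rough illustration of the POST approach, here is a minimal sketch using Python 3's urllib.request (the successor to urllib2). The field names username and password come from the form in the question. The URL, the credentials and the helper name build_login_request are placeholders, and posting to "/" is an assumption, since the question does not show the form's action attribute.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder URL: the real host and form action are not given in the question.
LOGIN_URL = "https://example.invalid/"


def build_login_request(url, username, password):
    """Build (but do not send) a POST request carrying the login form fields."""
    # Keys match the name= attributes of the <input> tags in the form.
    fields = {"username": username, "password": password}
    body = urlencode(fields).encode("ascii")
    # Supplying a body makes urllib issue a POST instead of a GET.
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


req = build_login_request(LOGIN_URL, "alice", "s3cret")
print(req.get_method(), req.data)
```

Passing req to urllib.request.urlopen would actually send it. If the site sets a session cookie on login, the request would need to go through an opener with a cookie handler so that later requests stay logged in.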
Just a question regarding some scraping authentication. Using BeautifulSoup:
#importing the requests lib
import requests
from bs4 import BeautifulSoup
#specifying the page
page = requests.get("http://localhost:8080/login?from=%2F")
#parsing through the api
soup = BeautifulSoup(page.content, 'html.parser')
print(soup.prettify())
The relevant part of the output, I think, is:
<table>
<tr>
<td>
User:
</td>
<td>
<input autocapitalize="off" autocorrect="off" id="j_username" name="j_username" type="text"/>
</td>
</tr>
<tr>
<td>
Password:
</td>
<td>
<input name="j_password" type="password"/>
</td>
</tr>
<tr>
<td align="right">
<input id="remember_me" name="remember_me" type="checkbox"/>
</td>
<td>
<label for="remember_me">
Remember me on this computer
</label>
</td>
</tr>
</table>
This scrapes the website fine, but it requires a login. Here I am using the mechanicalsoup library:
import mechanicalsoup
browser = mechanicalsoup.StatefulBrowser()
browser.open("http://localhost:8080/login?from=%2F")
browser.get_url()
browser.get_current_page()
browser.get_current_page().find_all('form')
browser["j_username"] = "admin"
browser["j_password"] = "password"
browser.launch_browser()
However it still won't let me login.
Has anyone used a scraping tool for python 3 that lets them scrape a site that has authentication?
I see you're using requests. If the site uses HTTP Basic authentication (rather than an HTML login form), you can pass the credentials like this:
import requests
page = requests.get("http://localhost:8080/login?from=%2F", auth=('username', 'password'))
Hope this helps! You can read more about authentication here: http://docs.python-requests.org/en/master/user/authentication/
With MechanicalSoup, you first need to select the form you want to fill in and submit. If the page has only one form, use:
browser.select_form()
Then, after filling in the form, you need to submit it:
browser.submit_selected()
You may read the (newly written) MechanicalSoup tutorial or look at examples such as logging in to GitHub with MechanicalSoup.
Python/Flask/Bootstrap noob here. I'm trying to build a web-app to control a speaker selector. I'm using bootstrap and Ti-Ta Toggles to beautify the app a bit, but basically it consists of 4-5 checkbox/toggles. Here's what my HTML looks like right now:
<form name="input" action="/" method="post">
<div class="row">
<div class="col-md-6">
<table class="table">
<tbody>
<tr>
<td>Living Room</td>
<td>
<div class="checkbox checkbox-slider-lg checkbox-slider--a checkbox-slider-info">
<label>
<input name="spkrs-00" type="checkbox" onclick="this.form.submit()" checked><span></span>
</label>
</div>
</td>
</tr>
<tr>
<td>Kitchen</td>
<td>
<div class="checkbox checkbox-slider-lg checkbox-slider--a checkbox-slider-info">
<label>
<input name="spkrs-01" type="checkbox" onclick="this.form.submit()"><span></span>
</label>
</div>
</td>
</tr>
<tr>
<td>Dining Room</td>
<td>
<div class="checkbox checkbox-slider-lg checkbox-slider--a checkbox-slider-info">
<label>
<input name="spkrs-02" type="checkbox" onclick="this.form.submit()"><span></span>
</label>
</div>
</td>
</tr>
<tr>
<td>Unconnected</td>
<td>
<div class="checkbox checkbox-slider-lg checkbox-slider--a checkbox-slider-info">
<label>
<input name="spkrs-03" type="checkbox" onclick="this.form.submit()" disabled><span></span>
</label>
</div>
</td>
</tr>
<tr>
<td>Protection</td>
<td>
<div class="checkbox checkbox-slider-lg checkbox-slider--a checkbox-slider-warning">
<label>
<input name="protection" type="checkbox" onclick="this.form.submit()"><span></span>
</label>
</div>
</td>
</tr>
</tbody>
</table>
</div>
So, what I'm trying to figure out is how to handle the POST data from the checkbox inputs in my Python/Flask app. I was trying to do a simple test which looks like the following:
from flask import Flask, request, render_template
import time
app = Flask(__name__)
@app.route('/', methods=['POST','GET'])
def change():
    if request.method == 'POST':
        spkr_00_state = request.args['spkrs-00']
        spkr_01_state = request.args['spkrs-01']
        spkr_02_state = request.args['spkrs-02']
        protection_state = request.args['protection']
        speaker_states = [spkr_00_state, spkr_01_state, spkr_02_state, protection_state]
        return render_template('index.html', speaker_states=speaker_states)
    else:
        return render_template('index.html')
if __name__ == '__main__':
    app.run(debug=True, host='0.0.0.0', port=80)
However, I get Bad Request messages, etc. So, I'm a bit lost on how this should work. Should I create separate forms for each toggle? Should I put "try" if statements around the request.args?
OK, just in case someone else stumbles upon this post later and is curious, I was able to figure out what my issues were. Mainly, my issue was that by default checkboxes only POST a value when checked. Therefore, if a particular box is not checked (in this case, the toggle switches I was using from Bootstrap Ti-Ta Toggles), there will be no POST information for it.
In Flask/Python, when you try to read the POST data for a particular checkbox/toggle and it doesn't exist, you get a Bad Request error. For example, the following will likely generate an error if the checkbox spkrs_02 is unchecked when the form is posted.
spkr_state[1] = request.form['spkrs_02']
The way to get around this is to use a hidden input tag after the input tag for the checkbox. This will return a value in post, even if the input tag isn't checked/toggled.
For example, a checkbox (toggle) would look something like this in your HTML file:
<input name="spkrs_02" type="checkbox" onclick="this.form.submit()"><span>Kitchen</span>
<input name="spkrs_02" type="hidden" value="off">
That last line will, as mentioned above, provide some feedback in post, when the "box" is not checked.
As a side note, I used onclick="this.form.submit()", which was helpful for taking action on a toggle/checkbox as soon as it is clicked. I'll be honest that I'm not sure whether that is the proper way to handle this, but it worked well for me.
Anyway, good luck!
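To make the checkbox behaviour concrete, here is a framework-free sketch (Python 3, standard library only) of what the browser sends and how a missing field can be given a default. The helper name checkbox_state and the sample request bodies are made up for illustration. In Flask itself, posted form fields live in request.form rather than request.args, and request.form.get('spkrs_02', 'off') gives the same "first value or default" behaviour without the hidden input.

```python
from urllib.parse import parse_qs


def checkbox_state(body, name, default="off"):
    """Return the first posted value for `name`, or `default` if it was not sent."""
    values = parse_qs(body).get(name)
    return values[0] if values else default


# A checkbox with no value attribute posts "on" when ticked and nothing otherwise.
checked_body = "spkrs_02=on&spkrs_02=off"  # ticked: checkbox value, then hidden value
unchecked_body = "spkrs_02=off"            # cleared: only the hidden input is sent
no_hidden_body = "protection=on"           # no hidden input, box cleared: field absent

print(checkbox_state(checked_body, "spkrs_02"))
print(checkbox_state(unchecked_body, "spkrs_02"))
print(checkbox_state(no_hidden_body, "spkrs_02"))
```

Reading with a default avoids the Bad Request error that indexing a missing key produces, so the hidden input becomes optional rather than required.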
I have searched all over the Internet, looking at many examples and have tried every one I've found, yet none of them are working for me, so please don't think this is a duplicate - I need help with my specific case.
I'm trying to log into a website using Python (in this instance I'm trying with v2.7 but am not opposed to using a more recent version, it's just I've been able to find the most info on 2.7).
I need to fill out a short form, consisting simply of a username and password.
The form of the webpage I need to fill out and log in to is as follows (it's messy, I know):
<form method="post" action="login.aspx?ReturnUrl=..%2fwebclient%2fstorepages%2fviewshifts.aspx" id="Form1">
<div class="aspNetHidden">
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="/wEPDwUKMTU4MTgwOTM1NWRkBffWXYjjifsi875vSMg9OVkhxOQYYstGTNcN9/PFb+M=" />
</div>
<div class="aspNetHidden">
<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="/wEdAAVrmuRkG3j6RStt7rezNSLKVK7BrRAtEiqu9nGFEI+jB3Y2+Mc6SrnAqio3oCKbxYY85pbWlDO2hADfoPXD/5td+Ot37oCEEXP3EjBFcbJhKJGott7i4PNQkjYd3HFozLgRvbhbY2j+lPBkCGQJXOEe" />
</div>
<div><span></span>
<table style="BORDER-COLLAPSE: collapse" borderColor="#000000" cellSpacing="0" cellPadding="0"
width="600" align="center" border="1">
<tr>
<td>
<table cellSpacing="0" cellPadding="0" width="100%" align="center" border="0">
<tr>
<td width="76%"><span id="centercontentTitle"></span>
<H1 align="center"><br>
<span>
<IMG height="52" src="../images/logo-GMR.jpg" width="260"></span><span><br>
</span></H1>
<div id="centercontentbody">
<div align="center">
<TABLE width="350">
<TR>
<TD class="style7">Username:</TD>
<TD>
<div align="right"><input name="txtUsername" type="text" id="txtUsername" style="width:250px;" /></div>
</TD>
</TR>
<TR>
<TD class="style7">Password:</TD>
<TD>
<div align="right"><input name="txtPassword" type="password" id="txtPassword" style="width:250px;" /></div>
</TD>
</TR>
<TR>
<TD></TD>
<TD align="right"><input type="submit" name="btnSubmit" value="Submit" id="btnSubmit" /><input type="submit" name="btnCancel" value="Cancel" id="btnCancel" /></TD>
</TR>
<TR>
<TD colspan="2" align="center"></TD>
</TR>
</TABLE>
</div>
</div>
</td>
<td>
<div align="center" style='height:250px'></div>
</td>
</tr>
</table>
</td>
</tr>
</table>
<br>
<br>
<p> </p>
</form>
From searching around online, the best Python code I have found to fill out this form and log into the website is as follows:
Note: This is not my code, I got it from this question/example, where many people have said they've found it to work well.
import cookielib
import urllib
import urllib2
# Store the cookies and create an opener that will hold them
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
# Add our headers
opener.addheaders = [('User-agent', 'LoginTesting')]
# Install our opener (note that this changes the global opener to the one
# we just made, but you can also just call opener.open() if you want)
urllib2.install_opener(opener)
# The action/ target from the form
authentication_url = '<URL I am trying to log into>'
# Input parameters we are going to send
payload = {
    '__EVENTVALIDATION': '/wEdAAVrmuRkG3j6RStt7rezNSLKVK7BrRAtEiqu9nGFEI+jB3Y2+Mc6SrnAqio3oCKbxYY85pbWlDO2hADfoPXD/5td+Ot37oCEEXP3EjBFcbJhKJGott7i4PNQkjYd3HFozLgRvbhbY2j+lPBkCGQJXOEe',
    'txtUsername': '<USERNAME>',
    'txtPassword': '<PASSWORD>',
}
# Use urllib to encode the payload
data = urllib.urlencode(payload)
# Build our Request object (supplying 'data' makes it a POST)
req = urllib2.Request(authentication_url, data)
# Make the request and read the response
resp = urllib2.urlopen(req)
contents = resp.read()
Unfortunately, this is not working for me and I'm unable to figure out why. If someone could please please please look over the code and tell me how I could improve it so as it works as it should. It would be so greatly appreciated!
Thanks in advance for all help I receive :)
__EVENTVALIDATION is probably not static, you need to load the login page in python, get the __EVENTVALIDATION field and then do the login.
Something like this should work:
import requests
from bs4 import BeautifulSoup
# One session object keeps cookies between the GET and the POST
s = requests.Session()
def get_eventvalidation():
    # Load the login page and read the current __EVENTVALIDATION value
    r = s.get("http://url.to.login.page")
    bs = BeautifulSoup(r.text, "html.parser")
    return bs.find("input", {"name": "__EVENTVALIDATION"}).attrs['value']
authentication_url = '<URL I am trying to log into>'
payload = {
    '__EVENTVALIDATION': get_eventvalidation(),
    'txtUsername': '<USERNAME>',
    'txtPassword': '<PASSWORD>',
}
login = s.post(authentication_url, data=payload)
print(login.text)
You need the requests module and beautifulsoup4. Or you can just rewrite it to not use libraries.
Edit:
You probably need __VIEWSTATE as a POST value.
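If you would rather not depend on extra libraries, the same idea can be sketched with the Python 3 standard library: collect every hidden input from the freshly loaded login page and merge it with the credentials. The class and function names, the shortened sample page, and the inclusion of btnSubmit are assumptions for illustration; the field names themselves come from the form in the question.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenInputCollector(HTMLParser):
    """Collect the name/value pair of every <input type="hidden"> in a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        is_hidden = (attrs.get("type") or "").lower() == "hidden"
        if tag == "input" and is_hidden and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""


# Shortened stand-in for the login page; the real token values are much longer.
SAMPLE_PAGE = """
<form method="post" action="login.aspx" id="Form1">
  <input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="/wEPDw+abc=" />
  <input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="/wEdAAV/xyz" />
  <input name="txtUsername" type="text" id="txtUsername" />
  <input name="txtPassword" type="password" id="txtPassword" />
  <input type="submit" name="btnSubmit" value="Submit" id="btnSubmit" />
</form>
"""


def build_payload(page_html, username, password):
    """Merge the page's hidden fields with the login credentials."""
    collector = HiddenInputCollector()
    collector.feed(page_html)
    payload = dict(collector.fields)  # fresh __VIEWSTATE, __EVENTVALIDATION, ...
    payload["txtUsername"] = username
    payload["txtPassword"] = password
    # Assumption: the server may check which submit button was pressed.
    payload["btnSubmit"] = "Submit"
    return payload


payload = build_payload(SAMPLE_PAGE, "<USERNAME>", "<PASSWORD>")
print(urlencode(payload))
```

Note that urlencode escapes the "+", "/" and "=" characters that appear in these base64-style tokens, which matters because an unescaped "+" would be decoded as a space on the server.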
FIXED! updated with working code
I have been going about trying to make this auto login thing working for me.
Do note I'm still a Python novice at this point.
The following is the html code I found when I inspected the relevant form:
<form action="/cgi-bin/netlogin.pl" method="post" name="netlogin">
<tr>
<td><div align="right">Intranet userid:</div></td>
<td><input type="text" size="20" maxlength="50" name="uid" id="uid" class="formField" /></td>
</tr>
<tr>
<td><div align="right">Wachtwoord:</div></td>
<td><input type="password" size="20" maxlength="50" name="pwd29296" class="formField" autocomplete="off"/></td>
</tr>
<tr>
<td> </td>
<td><input type="submit" name="submit" value="Login" /></td>
</tr>
</form>
I start off by fetching the portal's HTML to find the right "pwdXXXX" variable, because the name of the password field changes with every refresh. To my understanding, this part and the regex work.
Things go wrong when I try to pass the password, which makes me think I got the form's keys wrong, but I have absolutely no clue. I also tried the urllib2 approach instead of mechanize, with no result either.
Working code:
import re
import mechanize
url = "https://netlogin.kuleuven.be/cgi-bin/wayf2.pl?inst=kuleuven&lang=nl&submit=Ga+verder+%2F+Continue"
br = mechanize.Browser()
br.set_handle_robots(False)
br.open(url)
br.select_form(name = "netlogin")
form = str(br.form)
uID = "uid"
dynamic_pwID = re.findall(r"(pwd\d+)", form) #pwID changes when page is refreshed
pwID = dynamic_pwID[0]
br[uID] = "xxxx"
br[pwID]= "xxxx"
res = br.submit()
Part of your problem may well be that, because you close the socket you used to read the page with urllib, your mechanize session has a different ID and so requires a new token.
You will need to keep a single connection open for the duration of the session. So I think you will need to parse the contents of the reply, via br.read(), to find your value for pwID.
Comment From OP:
I left out the urllib part and it's working now. I used str(br.form) instead of br.read() though.
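The extraction step on its own can be sketched as follows (Python 3, standard library only). The sample markup is a trimmed copy of the form from the question, and the helper name find_password_field is made up for illustration. The important point from the answer still applies: the name must be read from the same session's page that you then submit.

```python
import re

# Stand-in for the fetched login page; the numeric suffix changes on every load.
SAMPLE_FORM = '''
<form action="/cgi-bin/netlogin.pl" method="post" name="netlogin">
<input type="text" size="20" maxlength="50" name="uid" id="uid" class="formField" />
<input type="password" size="20" maxlength="50" name="pwd29296" class="formField" autocomplete="off"/>
<input type="submit" name="submit" value="Login" />
</form>
'''


def find_password_field(html):
    """Return the name of the pwdNNNN input, or None if the page has none."""
    match = re.search(r'name="(pwd\d+)"', html)
    return match.group(1) if match else None


print(find_password_field(SAMPLE_FORM))
```

Returning None instead of indexing the first match makes it easier to notice when the page layout changes and no such field is found.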
I'm looking to grab the form data that needs to be passed along to a specific website and submit it. Below is the html(form only) that I need to simulate. I've been working on this for a few hours, but can't seem to get anything to work. I want this to work in Google App Engine. Any help would be nice.
<form method="post" action="/member/index.bv">
<table cellspacing="0" cellpadding="0" border="0" width="100%">
<tr>
<td align="left">
<h3>member login</h3><input type="hidden" name="submit" value="login" /><br />
</td>
</tr>
<tr>
<td align="left" style="color: #8b6c46;">
email:<br />
<input type="text" name="email" style="width: 140px;" />
</td>
</tr>
<tr>
<td align="left" style="color: #8b6c46;">
password:<br />
<input type="password" name="password" style="width: 140px;" />
</td>
</tr>
<tr>
<td>
<input type="image" class="formElementImageButton" src="/resources/default/images/btnLogin.gif" style="width: 46px; height: 17px;" />
</td>
</tr>
<tr>
<td align="left">
<div style="line-height: 1.5em;">
join<br />
forgot password?<input type="hidden" name="lastplace" value="%2F"><br />
having trouble logging on, click here for help
</div>
</td>
</tr>
</table>
</form>
Currently I'm trying to use this code to access it, but it's not working. I'm pretty new to this, so maybe I'm just missing something.
import urllib2, urllib
url = 'http://blah.com/member/index.bv'
values = {'email' : 'someemail@gmail.com',
          'password' : 'somepassword'}
data = urllib.urlencode(values)
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)
the_page = response.read()
Is this login page for a 3rd party site? If so, there may be more to it than simply posting the form inputs.
For example, I just tried this with the login page on one of my own sites. A simple post request won't work in my case, and this may be the same with the login page you are accessing as well.
For starters the login form may have a hidden csrf token value that you have to send when posting your login request. This means you'd have to first get the login page and parse the resulting html for the csrf token value. The server may also require its session cookie in the login request.
I'm using the requests module to handle the get/post and beautifulsoup to parse the data.
import requests
from bs4 import BeautifulSoup
# first get the login page
response = requests.get('https://www.site.com')
# requests decompresses gzip/deflate bodies itself, so response.text is ready to parse
html = response.text
# parse the html for the csrf token
soup = BeautifulSoup(html, 'html.parser')
csrf_token = soup.find(name='input', id='csrf_token')['value']
# now, submit the login data, including csrf token and the original cookie data
response = requests.post('https://www.site.com/login',
                         data={'csrf_token': csrf_token,
                               'username': 'username',
                               'password': 'ckrit'},
                         cookies=response.cookies)
login_result = response.text
print(login_result)
I cannot say if GAE will allow any of this or not, but at least it might be helpful in figuring out what you may require in your particular case. Also, as Carl points out, if a submit input is used to trigger the post you'd have to include it. In my particular example, this isn't required.
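For an environment where third-party modules are not available, the "GET the page, then POST with the same cookies" flow can be approximated with the Python 3 standard library. This is only a sketch under assumptions: make_session and login are hypothetical helper names, the csrf_token field name is taken from the example above rather than from the asker's site, and extract_token is whatever function you write to pull the token out of the HTML.

```python
from http.cookiejar import CookieJar
from urllib.parse import urlencode
from urllib.request import HTTPCookieProcessor, Request, build_opener


def make_session():
    """Return an opener that stores and resends cookies, plus its cookie jar."""
    jar = CookieJar()
    return build_opener(HTTPCookieProcessor(jar)), jar


def login(opener, login_page_url, post_url, extract_token, credentials):
    """GET the login page, pull out the token, then POST through the same opener."""
    # The first request lets the server set its session cookie in the jar.
    html = opener.open(login_page_url).read().decode("utf-8", "replace")
    # Send the credentials together with the token found in that page.
    fields = dict(credentials, csrf_token=extract_token(html))
    request = Request(post_url, data=urlencode(fields).encode("ascii"))
    # The same opener resends the session cookie along with the form data.
    return opener.open(request)
```

A call would look like login(opener, 'https://www.site.com', 'https://www.site.com/login', my_extractor, {'username': '...', 'password': '...'}), where opener comes from make_session() and my_extractor is your own parsing function.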
You're missing the hidden submit=login argument. Have you tried:
import urllib2, urllib
url = 'http://blah.com/member/index.bv'
values = {'submit':'login',
          'email' : 'someemail@gmail.com',
          'password' : 'somepassword'}
data = urllib.urlencode(values)
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)
the_page = response.read()