I'm looking to grab the form data that needs to be passed along to a specific website and submit it. Below is the HTML (form only) that I need to simulate. I've been working on this for a few hours but can't get anything to work. I want this to work in Google App Engine. Any help would be appreciated.
<form method="post" action="/member/index.bv">
<table cellspacing="0" cellpadding="0" border="0" width="100%">
<tr>
<td align="left">
<h3>member login</h3><input type="hidden" name="submit" value="login" /><br />
</td>
</tr>
<tr>
<td align="left" style="color: #8b6c46;">
email:<br />
<input type="text" name="email" style="width: 140px;" />
</td>
</tr>
<tr>
<td align="left" style="color: #8b6c46;">
password:<br />
<input type="password" name="password" style="width: 140px;" />
</td>
</tr>
<tr>
<td>
<input type="image" class="formElementImageButton" src="/resources/default/images/btnLogin.gif" style="width: 46px; height: 17px;" />
</td>
</tr>
<tr>
<td align="left">
<div style="line-height: 1.5em;">
join<br />
forgot password?<input type="hidden" name="lastplace" value="%2F"><br />
having trouble logging on, click here for help
</div>
</td>
</tr>
</table>
</form>
Currently I'm trying to use this code to access it, but it's not working. I'm pretty new to this, so maybe I'm just missing something.
import urllib2, urllib
url = 'http://blah.com/member/index.bv'
values = {'email' : 'someemail@gmail.com',
'password' : 'somepassword'}
data = urllib.urlencode(values)
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)
the_page = response.read()
Is this login page for a 3rd party site? If so, there may be more to it than simply posting the form inputs.
For example, I just tried this with the login page on one of my own sites. A simple post request won't work in my case, and this may be the same with the login page you are accessing as well.
For starters the login form may have a hidden csrf token value that you have to send when posting your login request. This means you'd have to first get the login page and parse the resulting html for the csrf token value. The server may also require its session cookie in the login request.
I'm using the requests module to handle the get/post and beautifulsoup to parse the data.
import requests
from bs4 import BeautifulSoup
# first get the login page
response = requests.get('https://www.site.com')
# requests decompresses gzip/deflate responses itself, so no manual unzip is needed
html = response.text
# parse the html for the csrf token
soup = BeautifulSoup(html, 'html.parser')
csrf_token = soup.find(name='input', id='csrf_token')['value']
# now, submit the login data, including csrf token and the original cookie data
response = requests.post('https://www.site.com/login',
                         data={'csrf_token': csrf_token,
                               'username': 'username',
                               'password': 'ckrit'},
                         cookies=response.cookies)
login_result = response.text
print(login_result)
I cannot say if GAE will allow any of this or not, but at least it might be helpful in figuring out what you may require in your particular case. Also, as Carl points out, if a submit input is used to trigger the post you'd have to include it. In my particular example, this isn't required.
You're missing the hidden submit=login argument. Have you tried:
import urllib2, urllib
url = 'http://blah.com/member/index.bv'
values = {'submit':'login',
'email' : 'someemail@gmail.com',
'password' : 'somepassword'}
data = urllib.urlencode(values)
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)
the_page = response.read()
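On Python 3, urllib2 no longer exists; its pieces live in urllib.request and urllib.parse. Below is a minimal sketch of the same request under that assumption, reusing the placeholder URL and credentials from the question. The helper name build_login_request is made up for illustration, and the request is only built here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_login_request(url, email, password):
    """Build (but do not send) the POST request for the login form."""
    fields = {
        "submit": "login",  # hidden input from the form
        "email": email,
        "password": password,
    }
    # Request bodies must be bytes on Python 3.
    body = urlencode(fields).encode("ascii")
    # Supplying data makes this a POST request.
    return Request(url, data=body)


req = build_login_request(
    "http://blah.com/member/index.bv", "someemail@gmail.com", "somepassword"
)
print(req.get_method())  # POST
print(req.data)
```

Sending it would be urllib.request.urlopen(req).read(). The form also carries a hidden lastplace field; whether the server requires it is not clear from the question.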
Related
So, I'm trying to make a program that can click a button on multiple links, which I will get from a list, but first I need to understand how to do this with only one link. They all have the same HTML structure, but I don't know how to do this.
HTML:
<div class="_55wr">
<form method="post">
<input type="hidden" name="fb_dtsg" value="AQG7lSxYN2mb:AQFMMcWJcZtZ" autocomplete="off">
<input type="hidden" name="jazoest" value="22090" autocomplete="off">
<table class="btnBar">
<tbody>
<tr>
<td>
<button type="submit" value="Bloquear" class="_54k8 _52jg _56bs _26vk _56b_ _56bu" name="confirmed" data-sigil="touchable"><span class="_55sr">Bloquear</span></button>
</td>
<td>
<button type="submit" value="Cancelar" class="_54k8 _52jg _56bs _26vk _56b_ _56bt" name="canceled" data-sigil="touchable"><span class="_55sr">Cancelar</span></button>
</td>
</tr>
</tbody>
</table>
</form>
</div>
The idea is to click the first button ('<button type="submit" value="Bloquear"...').
Current code:
import requests
auth = ('email@email.com', 'pass')
payload = {}
url = 'https://www.example.com'
s = requests.Session()
res = s.get('https://www.example.com')
cookies = res.cookies
r = requests.post(url, cookies = cookies, auth = auth, verify = False, payload = payload)
I searched for similar questions, but every one of them used some "id" ({'id':'value'}), which I don't have here. So, what value should I use in payload?
The requests library only makes HTTP requests: it does not render JavaScript and it cannot click buttons. Use your browser's developer tools (the Network tab) to see what data is sent to the server when you click the button, then make a POST request that sends the same data through the data keyword. For example:
data = {'button' : 'clicked1'}
r = requests.post('https://your_url.com', data=data)
For clicking buttons, I would personally use the selenium library, which drives a real browser and automates it.
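For the form in the question, a browser that submits through the first button would send the two hidden inputs plus the name and value of the button that was clicked (confirmed=Bloquear); the Cancelar button is left out. A minimal sketch, assuming Python 3: block_payload is a hypothetical helper, and the token values should be read from a freshly loaded page rather than hard-coded.

```python
from urllib.parse import urlencode


def block_payload(fb_dtsg, jazoest):
    """Form fields a browser would send when 'Bloquear' is clicked."""
    return {
        "fb_dtsg": fb_dtsg,       # hidden input (looks like an anti-forgery token)
        "jazoest": jazoest,       # hidden input
        "confirmed": "Bloquear",  # name/value of the clicked submit button
    }


# Values copied from the HTML in the question purely for illustration.
payload = block_payload("AQG7lSxYN2mb:AQFMMcWJcZtZ", "22090")
print(urlencode(payload))
```

With requests, this dict would be passed as data=payload on the same session that loaded the page, for example s.post(url, data=payload). Because the form has no action attribute, the POST goes to the URL of the page that contains it.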
I would use Selenium.
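The code would look like this:
from selenium import webdriver
from selenium.webdriver.common.by import By
driver = webdriver.Firefox()
driver.get('https://www.example.com')  # load the page that contains the form
button = driver.find_element(By.XPATH, "//button[@value='Bloquear']")
button.click()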
Just a question regarding some scraping authentication. Using BeautifulSoup:
#importing the requests lib
import requests
from bs4 import BeautifulSoup
#specifying the page
page = requests.get("http://localhost:8080/login?from=%2F")
#parsing through the api
soup = BeautifulSoup(page.content, 'html.parser')
print(soup.prettify())
From here the output, I think would be important:
<table>
<tr>
<td>
User:
</td>
<td>
<input autocapitalize="off" autocorrect="off" id="j_username" name="j_username" type="text"/>
</td>
</tr>
<tr>
<td>
Password:
</td>
<td>
<input name="j_password" type="password"/>
</td>
</tr>
<tr>
<td align="right">
<input id="remember_me" name="remember_me" type="checkbox"/>
</td>
<td>
<label for="remember_me">
Remember me on this computer
</label>
</td>
</tr>
</table>
This scrapes the website fine, but it requires a login. Here I am using the mechanicalsoup library:
import mechanicalsoup
browser = mechanicalsoup.StatefulBrowser()
browser.open("http://localhost:8080/login?from=%2F")
browser.get_url()
browser.get_current_page()
browser.get_current_page().find_all('form')
browser["j_username"] = "admin"
browser ["j_password"] = "password"
browser.launch_browser()
However it still won't let me login.
Has anyone used a scraping tool for python 3 that lets them scrape a site that has authentication?
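I see you're using requests. If the server accepts HTTP Basic authentication, you can pass the credentials with the auth argument (this does not submit the HTML login form itself):
import requests
page = requests.get("http://localhost:8080/login?from=%2F",
                    auth=('username', 'password'))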
Hope this helps! You can read more about authentication here: http://docs.python-requests.org/en/master/user/authentication/
With MechanicalSoup, you first need to select the form you want to fill in and submit. If you have only one form, use:
browser.select_form()
Then, after filling in the form, you need to submit it:
browser.submit_selected()
You may read the (newly written) MechanicalSoup tutorial or look at examples like logging in to GitHub with MechanicalSoup.
I have searched all over the Internet, looking at many examples and have tried every one I've found, yet none of them are working for me, so please don't think this is a duplicate - I need help with my specific case.
I'm trying to log into a website using Python (in this instance I'm trying with v2.7 but am not opposed to using a more recent version, it's just I've been able to find the most info on 2.7).
I need to fill out a short form, consisting simply of a username and password.
The form of the webpage I need to fill out and log in to is as follows (it's messy, I know):
<form method="post" action="login.aspx?ReturnUrl=..%2fwebclient%2fstorepages%2fviewshifts.aspx" id="Form1">
<div class="aspNetHidden">
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="/wEPDwUKMTU4MTgwOTM1NWRkBffWXYjjifsi875vSMg9OVkhxOQYYstGTNcN9/PFb+M=" />
</div>
<div class="aspNetHidden">
<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="/wEdAAVrmuRkG3j6RStt7rezNSLKVK7BrRAtEiqu9nGFEI+jB3Y2+Mc6SrnAqio3oCKbxYY85pbWlDO2hADfoPXD/5td+Ot37oCEEXP3EjBFcbJhKJGott7i4PNQkjYd3HFozLgRvbhbY2j+lPBkCGQJXOEe" />
</div>
<div><span></span>
<table style="BORDER-COLLAPSE: collapse" borderColor="#000000" cellSpacing="0" cellPadding="0"
width="600" align="center" border="1">
<tr>
<td>
<table cellSpacing="0" cellPadding="0" width="100%" align="center" border="0">
<tr>
<td width="76%"><span id="centercontentTitle"></span>
<H1 align="center"><br>
<span>
<IMG height="52" src="../images/logo-GMR.jpg" width="260"></span><span><br>
</span></H1>
<div id="centercontentbody">
<div align="center">
<TABLE width="350">
<TR>
<TD class="style7">Username:</TD>
<TD>
<div align="right"><input name="txtUsername" type="text" id="txtUsername" style="width:250px;" /></div>
</TD>
</TR>
<TR>
<TD class="style7">Password:</TD>
<TD>
<div align="right"><input name="txtPassword" type="password" id="txtPassword" style="width:250px;" /></div>
</TD>
</TR>
<TR>
<TD></TD>
<TD align="right"><input type="submit" name="btnSubmit" value="Submit" id="btnSubmit" /><input type="submit" name="btnCancel" value="Cancel" id="btnCancel" /></TD>
</TR>
<TR>
<TD colspan="2" align="center"></TD>
</TR>
</TABLE>
</div>
</div>
</td>
<td>
<div align="center" style='height:250px'></div>
</td>
</tr>
</table>
</td>
</tr>
</table>
<br>
<br>
<p> </p>
</form>
From searching around online, the best Python code I have found to fill out this form and log into the website is as follows:
Note: This is not my code, I got it from this question/example, where many people have said they've found it to work well.
import cookielib
import urllib
import urllib2
# Store the cookies and create an opener that will hold them
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
# Add our headers
opener.addheaders = [('User-agent', 'LoginTesting')]
# Install our opener (note that this changes the global opener to the one
# we just made, but you can also just call opener.open() if you want)
urllib2.install_opener(opener)
# The action/ target from the form
authentication_url = '<URL I am trying to log into>'
# Input parameters we are going to send
payload = {
'__EVENTVALIDATION': '/wEdAAVrmuRkG3j6RStt7rezNSLKVK7BrRAtEiqu9nGFEI+jB3Y2+Mc6SrnAqio3oCKbxYY85pbWlDO2hADfoPXD/5td+Ot37oCEEXP3EjBFcbJhKJGott7i4PNQkjYd3HFozLgRvbhbY2j+lPBkCGQJXOEe"',
'txtUsername': '<USERNAME>',
'txtPassword': '<PASSWORD>',
}
# Use urllib to encode the payload
data = urllib.urlencode(payload)
# Build our Request object (supplying 'data' makes it a POST)
req = urllib2.Request(authentication_url, data)
# Make the request and read the response
resp = urllib2.urlopen(req)
contents = resp.read()
Unfortunately, this is not working for me and I'm unable to figure out why. Could someone please look over the code and tell me how to improve it so that it works as it should? It would be greatly appreciated!
Thanks in advance for all help I receive :)
__EVENTVALIDATION is probably not static, you need to load the login page in python, get the __EVENTVALIDATION field and then do the login.
Something like this should work:
import requests
from bs4 import BeautifulSoup
s = requests.session()
def get_eventvalidation():
    r = s.get("http://url.to.login.page")
    bs = BeautifulSoup(r.text, "html.parser")
    return bs.find("input", {"name":"__EVENTVALIDATION"}).attrs['value']
authentication_url = '<URL I am trying to log into>'
payload = {
'__EVENTVALIDATION': get_eventvalidation(),
'txtUsername': '<USERNAME>',
'txtPassword': '<PASSWORD>',
}
login = s.post(authentication_url, data=payload)
print(login.text)
You need the requests module and beautifulsoup4. Or you can just rewrite it to not use libraries.
Edit:
You probably also need to send __VIEWSTATE as a POST value; it can be read from the login page in the same way.
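A sketch of collecting every hidden field in one pass, so that __VIEWSTATE and __EVENTVALIDATION are both picked up. It assumes Python 3 and uses only the standard library's html.parser; the class name HiddenFieldParser, the helper build_payload and the shortened sample HTML are illustrative, not part of the original answer.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attrs = dict(attrs)
        if (attrs.get("type") or "").lower() == "hidden" and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""


def build_payload(login_page_html, username, password):
    """Merge the page's hidden fields with the credentials and submit button."""
    parser = HiddenFieldParser()
    parser.feed(login_page_html)
    payload = dict(parser.fields)
    payload.update({
        "txtUsername": username,
        "txtPassword": password,
        "btnSubmit": "Submit",  # name/value of the submit button in the form
    })
    return payload


# Shortened stand-in for the login page; the real values are much longer.
sample = """
<form method="post" action="login.aspx" id="Form1">
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="abc" />
<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="xyz" />
<input name="txtUsername" type="text" id="txtUsername" />
</form>
"""
payload = build_payload(sample, "<USERNAME>", "<PASSWORD>")
print(sorted(payload))
```

In the answer's code, the result of build_payload(r.text, ...) for the freshly fetched login page would take the place of the hand-written payload dict before calling s.post(authentication_url, data=payload).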
FIXED! updated with working code
I have been going about trying to make this auto login thing working for me.
Do note I'm still a Python novice at this point.
The following is the html code I found when I inspected the relevant form:
<form action="/cgi-bin/netlogin.pl" method="post" name="netlogin">
<tr>
<td><div align="right">Intranet userid:</div></td>
<td><input type="text" size="20" maxlength="50" name="uid" id="uid" class="formField" /></td>
</tr>
<tr>
<td><div align="right">Wachtwoord:</div></td>
<td><input type="password" size="20" maxlength="50" name="pwd29296" class="formField" autocomplete="off"/></td>
</tr>
<tr>
<td> </td>
<td><input type="submit" name="submit" value="Login" /></td>
</tr>
</form>
I start off by fetching the portal's HTML to find the right "pwdXXXX" field name, because the name of the password field changes with every refresh. As far as I can tell, this part and the regex work.
It goes wrong when I try to pass the password, which makes me think I got the form's keys wrong, but I have absolutely no clue. I also tried the urllib2 approach instead of mechanize, with no result either.
Working code:
import re
import mechanize
url = "https://netlogin.kuleuven.be/cgi-bin/wayf2.pl?inst=kuleuven&lang=nl&submit=Ga+verder+%2F+Continue"
br = mechanize.Browser()
br.set_handle_robots(False)
br.open(url)
br.select_form(name = "netlogin")
form = str(br.form)
uID = "uid"
dynamic_pwID = re.findall(r"(pwd\d+)", form) #pwID changes when page is refreshed
pwID = dynamic_pwID[0]
br[uID] = "xxxx"
br[pwID]= "xxxx"
res = br.submit()
Part of your problem may well be that you read the page with urllib and then close that socket, so your mechanize session gets a different session ID and therefore requires a new token.
You will need to keep a single connection open for the duration of the session. So I think you will need to parse the contents of the reply from br.read() to find your value for pwID.
Comment From OP:
I left out the urllib part and it's working now. I used str(br.form) instead of br.read() though.
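The lookup of the changing field name can be tried on its own, independent of mechanize. A small sketch assuming Python 3; find_password_field is a hypothetical helper, and the HTML is the password input from the question.

```python
import re


def find_password_field(html):
    """Return the name of the pwdNNNN input, or None if it is absent."""
    match = re.search(r'name="(pwd\d+)"', html)
    return match.group(1) if match else None


html = (
    '<input type="password" size="20" maxlength="50" '
    'name="pwd29296" class="formField" autocomplete="off"/>'
)
print(find_password_field(html))  # pwd29296
```

As the answer above points out, the name has to come from the same page load that is later submitted, since it changes on every refresh.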
I am trying to log on to a website using Python. I have written the code to connect to the target, but I need to log in, select a button on the website and wait for a response. I have looked at the HTTP protocol support in Python and was thinking of using 'HTTPConnection.putrequest'. I am not sure how to do this; the code I have so far is below:
def testHTTPS(self):
    c = httplib.HTTPSConnection(ip)
    c.request("GET", "/")
    response = c.getresponse()
    self.assertEqual(response.status, 200) # '200' is success code
    c.close()
And the code for the logon function on the website is:
<td align="right" id="lgn_userName"></td>
<td><input type="text" class="button" name="username" id="username" size="24" maxlength="16" accesskey="u" tabindex="1" value=""/></td>
</tr>
<tr>
<td align="right" id="lgn_userPwd"></td>
<td><input type="password" class="button" name="password" id="password" size="24" maxlength="20" accesskey="p" tabindex="2" value=""/></td>
</tr>
<tr>
<td align="right"> </td>
<td>
<input type="submit" id="lgn_button" class="button" tabindex="3" accesskey="s" />
</td>
Does anyone know how to go about this?
Thanks
Yes, use mechanize, which is a sort of "web browser" for Python. With it you can easily open web pages, find forms, fill in form values and submit the forms from Python. I use it (via Zope's testbrowser module) for testing web applications.
Use urllib2 and create a POST request.
For more information, read:
urllib2: submitting a form and then redirecting
How to make python urllib2 follow redirect and keep post method
How do I send a HTTP POST value to a (PHP) page using Python?.
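For the HTTPConnection route mentioned in the question, a form POST is a request with a URL-encoded body and a matching Content-Type header. A sketch assuming Python 3, where httplib is called http.client. The names encode_login and post_login are hypothetical, and the path to post to is left as a parameter because the form's action is not shown in the question.

```python
import http.client
from urllib.parse import urlencode


def encode_login(username, password):
    """Return (body, headers) for a URL-encoded form POST."""
    body = urlencode({"username": username, "password": password})
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return body, headers


def post_login(host, path, username, password):
    """POST the login form and return (status, response body)."""
    body, headers = encode_login(username, password)
    conn = http.client.HTTPSConnection(host)
    try:
        conn.request("POST", path, body, headers)
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()


body, headers = encode_login("user", "secret")
print(body)  # username=user&password=secret
```

The submit input in the form shown has no name attribute, so a browser would not include it in the body; only username and password are sent here.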