Python requests module not posting to certain input fields

I'm trying to scrape data from a website behind a login screen, and I've run into a problem posting part of the login info with the post() method from Python's requests module.
I've gotten the names of each HTML input field that needs to be filled in, placed them in a dictionary along with their required values, and then passed that dictionary to post().
The HTML from the login page:
<input name="ctl00$ContentPlaceHolder1$TextBox1" type="text" value="" id="ContentPlaceHolder1_TextBox1" tabindex="1" class="form-control " placeholder="username" required="">
<input name="ctl00$ContentPlaceHolder1$TextBox2" type="password" id="ContentPlaceHolder1_TextBox2" tabindex="2" class="form-control" placeholder="password" required="" value="">
Then I use the name values to create the dictionary that's passed to post():
formData = {
"ctl00$ContentPlaceHolder1$TextBox1": "FakeUsername",
"ctl00$ContentPlaceHolder1$TextBox2": "FakePassword"
}
r = session.get(loginUrl) # get cookies necessary for login
r = session.post(loginUrl, data=formData)
This works properly for the username field, but it does not post the password in the password field. If I read the HTML from the login page after posting the data, I get:
<input name="ctl00$ContentPlaceHolder1$TextBox1" type="text" value="FakeUsername" id="ContentPlaceHolder1_TextBox1" tabindex="1" class="form-control " placeholder="username" required="" />
<input name="ctl00$ContentPlaceHolder1$TextBox2" type="password" id="ContentPlaceHolder1_TextBox2" tabindex="2" class="form-control" placeholder="password" required="" />
The "value" parameter of the password input field is no longer listed, not even as an empty parameter. Attempting a login after this of course does not work.
I have been unable to figure out why this is happening. I've made sure to fill in any hidden input fields (EVENTVALIDATION, VIEWSTATE, etc.) and have also looked at the webpage headers, but have still had no luck.
The website I'm trying to log in to is:
https://panel.forcad.org/Default.aspx
I would really appreciate help figuring out what is going wrong.

You said you looked at the headers, but you should be able to replicate the browser's behavior with the right request headers and cookies. Try copying the exact parameters and cookies from a known successful browser login and replaying them with requests; that way you can narrow down whether requests can even send the data the site already wants. If you can't re-login with valid cookies, the page may rely on JavaScript tricks or do something requests cannot, in which case it's more reverse engineering, or Selenium. pyvirtualdisplay can hide the browser, and you can use JavaScript's stop() to halt page loading.
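As a concrete starting point, here is a minimal sketch of that replay idea. All header and cookie values below are placeholders: you would copy the real ones from your browser's dev tools (Network tab) after a successful manual login, not from anything shown in this thread.

```python
# Sketch of the "replay a known-good browser login" approach. Every
# value below is a placeholder copied from a browser session, not a
# value the site is known to require.
import requests

session = requests.Session()

# Mimic the browser's request headers as closely as possible.
session.headers.update({
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Referer": "https://panel.forcad.org/Default.aspx",
})

# Paste the cookie value from a logged-in browser session.
session.cookies.set("ASP.NET_SessionId", "<value-from-browser>")

# If a request made with these cookies still looks logged out, the
# problem is not your POST body, and Selenium is the next step.
# r = session.get("https://panel.forcad.org/Default.aspx")
```

If the replayed cookies work, the gap is in how the POST body is built; if they don't, no amount of tweaking the form data will help and a real browser is needed.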

Related

(BadRequestKeyError) Don't know where the problem is

I'm trying to make a sign-up page. On the page there are two password fields, one for the password itself and the second to confirm it (and of course a username field):
HTML:
<div class="fields">
<input id="usr" type="text" name="username" placeholder="Username" required>
<input id="pass" type="password" name="password" placeholder="Password" required>
<input id="confirmpass" type="password" name="confirm_password" placeholder="Confirm password" required>
</div>
I know what the error means: it's a KeyError, meaning it can't find the key I've passed into request.form. Most cases of this error are misspellings, so I checked the spelling multiple times and even copy-pasted the same string.
My problem is that I don't know why the third field isn't in request.form. Maybe it's because I have two password-type inputs? But I haven't seen anything anywhere saying that's not allowed.
Error:
File ...
if request.form["confirm_password"]==request.form["password"]
File ...
raise exceptions.BadRequestKeyError(key)
werkzeug.exceptions.BadRequestKeyError: 400 Bad Request: The browser (or proxy) sent a request that this server could not understand.
The error gets raised before the webpage even loads up by the way, not when I submit the form.
Python:
# Accounts = JSON File
@app.route("/signup", methods=["GET", "POST"])
def signup():
    if request.method=="GET":
        if request.form["confirm_password"]==request.form["password"]: # Where the error traces back to
            if request.form["username"] not in Accounts.keys():
                Accounts[request.form["username"]]=Accounts["Default"]
                Accounts[request.form["username"]]["Password"]=request.form["password"]
                redirect(url_for("login",name=request.form["username"]))
            else:
                return render_template(Signup,valid="Username already taken",name=request.form["username"])
        else:
            return render_template(Signup,valid="Password confirmation does not match password",name=request.form["username"])
    else:
        return render_template(Signup)
My login page works perfectly; it's just this one.
Your form data arrives in a POST request, but your route reads request.form inside the request.method=="GET" branch. A plain GET (just loading the page) carries no form data, which is why the error is raised before you even submit the form.
The quick fix is to change the method check at the start of the route to:
def signup():
    if request.method == "POST":
        # ...
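To see why the POST check matters without spinning up Flask at all, the route's control flow can be sketched as a plain function. Here "method" and "form" stand in for Flask's request.method and request.form, and the function name and return strings are purely illustrative:

```python
# Framework-free sketch of the corrected signup flow: form data is only
# read on POST; a plain GET just renders the page.
def handle_signup(method, form, accounts):
    if method != "POST":            # a page load (GET) carries no form data
        return "render signup page"
    if form["confirm_password"] != form["password"]:
        return "password mismatch"
    if form["username"] in accounts:
        return "username taken"
    accounts[form["username"]] = {"Password": form["password"]}
    return "redirect to login"

accounts = {}
print(handle_signup("GET", {}, accounts))   # no KeyError: form is never read
print(handle_signup("POST", {"username": "bob",
                             "password": "x",
                             "confirm_password": "x"}, accounts))
```

The original code raised BadRequestKeyError on the very first page load because the GET branch tried to index into an empty request.form.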

Strange PHP form post

So I'm writing a web crawler to batch download PDFs from my university's website, as I don't fancy downloading them one by one.
I've got most of the code working using the requests module. The issue is that you have to be signed in to a university account to access the PDFs, so I've set requests up to use cookies to sign into my university account before downloading the PDFs. However, the HTML sign-in form on the university page is rather peculiar.
I've abstracted the HTML which can be found here:
<form action="/login" method="post">
<fieldset>
<div>
<label for="username">Username:</label>
<input id="username" name="username" type="text" value="" />
<label for="password">Password:</label>
<input id="password" name="password" type="password" value=""/>
<input type="hidden" name="lt" value="" />
<input type="hidden" name="execution" value="*very_long_encrypted_code*" />
<input type="hidden" name="_eventId" value="submit" />
<input type="submit" name="submit" value="Login" />
</div>
</fieldset>
</form>
Firstly, the action parameter in the form does not reference a PHP file, which I don't understand. Is action="/login" referencing the page itself, or http://www.blahblah/login/login? (The HTML is taken from the page http://www.blahblah/login.)
Secondly, what's with all the 'hidden' inputs? I'm not sure how this page is taking the given login data and passing it to a PHP script.
This has led to the failure of the requests sign on in my python script:
import requests
user = input("User: ")
passw = input("Password: ")
payload = {"username" : user, "password" : passw}
s = requests.Session()
s.post(loginURL, data = payload)
r = s.get(url)
I would have thought this would take the login data and sign me into the page, but r is just assigned the original logon page. I'm assuming it's to do with the strange PHP interaction in the HTML. Any ideas what I need to change?
EDIT: Thought I'd also mention there is no javascript on the page at all. Purely HTML & CSS
What you are looking at is likely a CSRF token.
The linked answer is very good, but in summary: these tokens are used to make sure you can't send malicious requests to a site from another page in your web browser. In this case it is a bit silly, because logging in has no consequences; it was likely added automatically by the framework your university's website uses.
You will have to extract this token from the login page before doing your login POST and then include it with your data.
The full steps would be the following:
Fetch the login page.
Extract the token, with e.g. BeautifulSoup or requests-html.
Send the login request:
payload = {"username" : user, "password" : passw, "execution": token}
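The extraction step can be sketched with the standard library alone. The HTML below mirrors the abstracted form from the question (the token value is made up), and the parser class name is illustrative:

```python
# Minimal sketch: pull every hidden input's name/value out of the login
# page with the stdlib HTMLParser, so CSRF-style tokens ("lt",
# "execution", "_eventId") can be merged into the login payload.
from html.parser import HTMLParser

class HiddenFieldParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and attrs.get("type") == "hidden":
            self.fields[attrs.get("name")] = attrs.get("value", "")

login_page = '''
<form action="/login" method="post">
  <input id="username" name="username" type="text" value="" />
  <input type="hidden" name="lt" value="" />
  <input type="hidden" name="execution" value="e1s1-abc123" />
  <input type="hidden" name="_eventId" value="submit" />
</form>
'''

parser = HiddenFieldParser()
parser.feed(login_page)

# Merge the hidden fields into the credentials before posting.
payload = {"username": "user", "password": "passw", **parser.fields}
print(payload["execution"])   # e1s1-abc123
```

In practice login_page would be the body of a session.get() to the real login URL, so that the token and the login POST share the same session cookies.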

web scraping a webpage which has dynamic contents loaded via ajax

Say I wish to scrape products on this page (http://shop.coles.com.au/online/national/bread-bakery/fresh/bread#pageNumber=2&currentPageSize=20).
But the products are loaded via a POST request. A lot of posts here suggest simulating the request to get dynamic content, but in my case the Form Data is unknown to me, i.e. catalogId, categoryId.
I'm wondering is it possible to get the response after the ajax call is finished?
You can get the catalogId and other parameter values needed to make the POST request from the form with id="search":
<form id="search" name="search" action="http://shop.coles.com.au/online/SearchDisplay?pageView=image&catalogId=10576&beginIndex=0&langId=-1&storeId=10601" method="get" role="search">
<input type="hidden" name="storeId" value="10601" id="WC_CachedHeaderDisplay_FormInput_storeId_In_CatalogSearchForm_1">
<input type="hidden" name="catalogId" value="10576" id="WC_CachedHeaderDisplay_FormInput_catalogId_In_CatalogSearchForm_1">
<input type="hidden" name="langId" value="-1" id="WC_CachedHeaderDisplay_FormInput_langId_In_CatalogSearchForm_1">
<input type="hidden" name="beginIndex" value="0" id="WC_CachedHeaderDisplay_FormInput_beginIndex_In_CatalogSearchForm_1">
<input type="hidden" name="browseView" value="false" id="WC_CachedHeaderDisplay_FormInput_browseView_In_CatalogSearchForm_1">
<input type="hidden" name="searchSource" value="Q" id="WC_CachedHeaderDisplay_FormInput_searchSource_In_CatalogSearchForm_1">
...
</form>
Use the FormRequest to submit this form.
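Outside Scrapy, the same idea can be sketched with the standard library: collect the hidden fields' values (copied verbatim from the form above) and encode them into the request that FormRequest would otherwise build for you. The search term and the bare action URL below are illustrative:

```python
# The hidden-field values below are copied from the form with
# id="search" shown above; "searchTerm" is a made-up user-supplied field.
from urllib.parse import urlencode

form_fields = {
    "storeId": "10601",
    "catalogId": "10576",
    "langId": "-1",
    "beginIndex": "0",
    "browseView": "false",
    "searchSource": "Q",
    "searchTerm": "bread",
}

# The form's method is GET, so the fields go into the query string.
action = "http://shop.coles.com.au/online/SearchDisplay"
url = action + "?" + urlencode(form_fields)
print(url)
```

Scrapy's FormRequest.from_response does exactly this lookup of hidden inputs for you, which is why the answer recommends it.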
I'm wondering is it possible to get the response after the ajax call is finished?
Scrapy is not a browser: it does not make the additional AJAX requests needed to load the page, and there is nothing built in to execute JavaScript. You may solve it on a higher level with a real browser; look into the selenium package. There is also the related scrapy-splash project.
See also:
selenium with scrapy for dynamic page

Login using Python in basic HTML form [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Python: How do you login to a page and view the resulting page in a browser?
I wanted to know how I can log in with Python on pages like http://www.deshabhimani.com/signin.php, which has a PHP-based login prompt. This form is used to log in to http://www.deshabhimani.com/epaper.php.
The site does not provide a HTTP API.
I want to later use Python to download all the pages of the epaper (which are individual) and combine them into a single PDF file.
The file which I want to download is http://www.deshabhimani.com/epaper.php?page=43210&ddate=27-07-2012&edition=Kochi which is only accessible by logging in
Well, first of all, check the page's source to see what method the form uses to send its data, and what the username and password fields are named:
<form action="signin.php" method="post" name="log_in" id="log_in" onsubmit="return login()">
<label for="name">User Name:</label><br>
<input type="text" maxlength="80" size="25" id="username" name="username" style="border:1px dotted #1a64a3; margin-bottom:10px">
<label for="email">Password:</label><br>
<input type="password" maxlength="80" size="25" id="password" name="password" style="border:1px dotted #1a64a3">
<input type="submit" name="submit" value="Login" style="background:url(images/submit.gif) no-repeat; width:59px; height:22px; color:#FFFFFF; padding-bottom:3px">
</form>
As you can see above, we first look at the form to find the method and the field names. Now let's handle it in Python (note this is Python 2's urllib):
import urllib
login_data = urllib.urlencode({'username': 'your username', 'password': 'your password', 'submit': 'Login'})  # replace the values with your credentials; the keys are the form's field names
op = urllib.urlopen('http://www.example.com/signin.php', login_data)
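On Python 3 the same request is split across urllib.parse and urllib.request, and the POST body must be bytes. The URL remains a placeholder, and the field names come from the form's HTML above:

```python
# Python 3 equivalent of the Python 2 urllib snippet above.
import urllib.parse
import urllib.request

login_data = urllib.parse.urlencode({
    "username": "your username",
    "password": "your password",
    "submit": "Login",
}).encode("utf-8")                       # POST bodies must be bytes in Python 3

req = urllib.request.Request("http://www.example.com/signin.php", data=login_data)
# op = urllib.request.urlopen(req)       # performs the actual POST
print(req.get_method())                  # POST (a Request with data defaults to POST)
```

To keep the resulting session cookie for the later epaper downloads, you would pair this with http.cookiejar, or simply use the requests library's Session as in the other answers on this page.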

Using Urllib instead of action in post form

I need to allow users to upload content directly to Amazon S3. This form works:
<form action="https://me.s3.amazonaws.com/" method="post" enctype='multipart/form-data' class="upload-form">{% csrf_token %}
<input type="hidden" name="key" value="videos/test.jpg">
<input type="hidden" name="AWSAccessKeyId" value="<access_key>">
<input type="hidden" name="acl" value="public-read">
<input type="hidden" name="policy" value="{{policy}}">
<input type="hidden" name="signature" value="{{signature}}">
<input type="hidden" name="Content-Type" value="image/jpeg">
<input type="submit" value="Upload" name="upload">
</form>
And in the function, I define policy and signature. However, I need to pass two variables to the form -- Content-Type and Key -- which will only be known when the user presses the upload button. Thus, I need to pass these two variables to the template after the POST request but before the redirection to Amazon.
It was suggested that I use urllib to do this. I have tried doing so the following way, but I keep getting an inscrutable HTTPError. This is what I currently have:
if request.method == 'POST':
    # define the variables
    urllib2.urlopen("https://me.amazonaws.com/",
                    urllib.urlencode([('key', 'videos/test3.jpg'),
                                      ('AWSAccessKeyId', '<access_key>'),
                                      ('acl', 'public-read'),
                                      ('policy', policy),
                                      ('signature', signature),
                                      ('Content-Type', content_type),
                                      ('file', file)]))
I have also tried hardcoding all the values instead of using variables but still get the same error. What am I doing incorrectly and what do I need to change to be able to redirect the form to Amazon, so the content can be uploaded directly to Amazon?
I recommend watching the form do its work with Firebug enabled and set to the Net tab.
After completing the POST, click its [+] icon to expand it, then study the Headers, POST, and Response tabs to see what you are missing and/or doing wrong.
Next, separate this script from Django and put it into a standalone file. Add one thing at a time and retest until it works. The lines below should increase visibility into your script:
import httplib
httplib.HTTPConnection.debuglevel = 1
I tried poking around with urllib myself, but as I don't have an AWS account I didn't get farther than a 400 Bad Request response. That seems like a good sign; probably I just need valid host and key params, etc.
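One concrete thing that comparison will show: the working browser form posts multipart/form-data (its enctype attribute), while urlencode produces application/x-www-form-urlencoded, which cannot carry the file's bytes. A sketch of a multipart POST with the requests library follows; every field value and the bucket URL are placeholders, and only the request is built here, not sent:

```python
# Sketch only: S3 browser-style POST uploads must be multipart/form-data;
# urlencode() cannot include the file. All values below are placeholders.
import requests

fields = {
    "key": "videos/test3.jpg",
    "AWSAccessKeyId": "<access_key>",
    "acl": "public-read",
    "policy": "<policy>",
    "signature": "<signature>",
    "Content-Type": "image/jpeg",
}

# Passing files= makes requests encode the whole body as multipart.
req = requests.Request(
    "POST",
    "https://me.s3.amazonaws.com/",
    data=fields,
    files={"file": ("test3.jpg", b"...jpeg bytes...", "image/jpeg")},
).prepare()

print(req.headers["Content-Type"].split(";")[0])   # multipart/form-data
```

Sending would then be session.send(req); the point is that the Content-Type header must match what the browser form produced, which the urlencode-based attempt above never did.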