I would like to submit a form on a webpage.
The page, however, has several forms:
<form method="post" action="https://mywebsite.com/pageA" id="order" class="order ajaxForm">
<input type="text" class="decimal" name="value" id="fieldA" value="0" />
</form>
<form method="post" action="https://mywebsite.com/pageB" id="previousorder" class="order ajaxForm">
<input type="text" class="decimal" name="value" id="fieldB" value="0" />
</form>
Is there an easy way to submit a specific form using Python and requests?
I'd go with more advanced tools like mechanize or MechanicalSoup. The latter is actually based on requests internally (I assume you meant the requests package by "request"). Both of these tools let you select a desired form and then submit it with the required parameters.
For instance, submitting the order form with MechanicalSoup would look something like this:
import mechanicalsoup
browser = mechanicalsoup.StatefulBrowser()
browser.open("https://yourwebsite.com")
# Fill in the order form (selected by its id via a CSS selector)
browser.select_form('#order')
browser["value"] = "100"
browser.submit_selected()
You have to look at the DevTools Network tab while posting the form.
Every form will have a different request URL and POST parameters. Generally, what you will need to do with requests is something like this:
import requests

req = requests.post('https://mywebsite.com/pageB',
                    data={'value': 'value_you_want_to_submit'})  # key is the input's name, not its id
But it's better to investigate it with DevTools first.
Try something like this (you'll probably need to make some modifications, but it will be close to what you want; this example is for a login form).
First install lxml (pip install lxml):
import requests
from lxml import html

payload = {
    "username": "<USER NAME>",
    "password": "<PASSWORD>",
}

sessionReq = requests.session()
login_url = "https://example.be/account/login.php"

# Fetch the login page first so the CSRF token can be pulled out of the form
result = sessionReq.get(login_url)
tree = html.fromstring(result.text)
authenticity_token = list(set(tree.xpath("//input[@name='csrfmiddlewaretoken']/@value")))[0]
payload["csrfmiddlewaretoken"] = authenticity_token

# Post the credentials together with the token, using the login page as referer
result = sessionReq.post(login_url, data=payload, headers=dict(referer=login_url))

# Once logged in, the same session can fetch pages that require authentication
url = 'https://bitbucket.org/dashboard/overview'
result = sessionReq.get(url, headers=dict(referer=url))
I hope this helps you :)
Problem
I'm trying to scrape a page using Python's requests library, but I'm getting errors (like Bad Request or Method Not Allowed).
The page has two forms: one that uses GET, and another one that uses POST (the one I want). I passed values to the text fields through the requests data argument.
I don't want to upload a file through the form, just fill in a text field.
The form has six buttons, and each button has a different value.
HTML code
<form enctype="multipart/form-data" action="/page1" method="GET"> ... </form>
...
<form enctype="multipart/form-data" action="/page2" method="POST">
<input type="file" name="smiles_file">
<input type="text" name="smiles_str">
...
<button name="pred_type" type="submit" value="adme"> BT1 </button>
<button name="pred_type" type="submit" value="toxicity"> BT2 </button>
</form>
Python3 code
#imports
import requests
from bs4 import BeautifulSoup as bs
# common vars
url = 'www.exampleurl.com/site'
hd = {
"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.67 Safari/537.36"
}
dt = {
'smiles_str': 'CC(=O)OC1=CC=CC=C1C(=O)O',
'pred_type': 'adme'
}
# scraping
with requests.Session() as rs:
    result = rs.get(url, data=dt, headers=hd)
    print("Code: %s\nHTML\n%s" % (result.status_code, result.text))
EDIT
Using GET:
status_code: 405 (Method Not Allowed)
Using POST:
status_code: 400 (Bad Request)
I don't see a reference to /page1 or /page2 in your example, but the rs.get call should probably use the named parameter params instead of data and point at the first form's URL, while for the second form's URL you'd need rs.post, where using data is fine.
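For illustration, a minimal sketch of what that could look like, assuming the two forms really do live at /page1 and /page2 under the same site (the exact URLs, headers and field values come from your own page):
import requests

url_get = 'http://www.exampleurl.com/site/page1'   # form with method="GET"
url_post = 'http://www.exampleurl.com/site/page2'  # form with method="POST"
hd = {"User-Agent": "Mozilla/5.0"}
dt = {
    'smiles_str': 'CC(=O)OC1=CC=CC=C1C(=O)O',
    'pred_type': 'adme'
}

with requests.Session() as rs:
    # GET form: fields belong in the query string, so use params
    r1 = rs.get(url_get, params=dt, headers=hd)
    # POST form: fields belong in the request body, so use data
    r2 = rs.post(url_post, data=dt, headers=hd)
    print(r2.status_code)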
I think I found the answer. It seems that requests does not work well on pages that rely on JavaScript running in the background. I'm using Selenium now, and I'm not having problems with it.
I have a form like the following:
url = "http://foo.com"
<table>
<form action="showtree.jsp" method="post" id="form" name="form">
<input type="hidden" id="sortAction" name="sortAction" value="">
<tr>
<td style="text-align: right;font-weight:bold;">State: </td>
<td><select name="state">
<option value="ca">CA</option>
<option value="or">OR</option>
<option value="al">AL</option>
</select></td>
</tr>
<tr>
<td style="text-align: right;font-weight:bold;">Population: </td>
<td><select id="pop" name="population" onchange="disableShowOnAll()">
<option value="100">100</option>
<option value="200">200</option>
<option value="300">300</option>
</select></td>
</tr>
<tr>
<td></td>
<td>
<button id="showbutton" class="btn btn-default" onclick="submitForm('show')">Show Tree
</button>
</td>
</tr>
</form>
So, basically, the form has two dropdowns, State and Population, and each has some options. The idea is to select the options in the form and then submit it.
On submit, the results are displayed on the same page.
So, basically, how do I submit this POST request in Python and then get the results (once submit is pressed and the page is refreshed with the results)?
Let me know if this makes sense.
Thanks
What you're trying to do is submit a POST request to http://example.com/showtree.jsp
Using the requests library (recommended)
Reference: http://docs.python-requests.org/en/master/
The requests library greatly simplifies making HTTP requests, but is an extra dependency
import requests
# Create a dictionary containing the form elements and their values
data = {"state": "ca", "population": 100}
# POST to the remote endpoint. The Requests library will encode the
# data automatically
r = requests.post("http://example.com/showtree.jsp", data=data)
# Get the raw body text back
body_data = r.text
Using the inbuilt urllib
Relevant answer here: Python - make a POST request using Python 3 urllib
from urllib import request, parse
# Create a dictionary containing the form elements and their values
data = {"state": "ca", "population": 100}
# This encodes the data to application/x-www-form-urlencoded format
# then converts it to bytes, needed before using with request.Request
encoded_data = parse.urlencode(data).encode()
# POST to the remote endpoint
req = request.Request("http://example.com/showtree.jsp", data=encoded_data)
# This will contain the response page
with request.urlopen(req) as resp:
    # Reads and decodes the body response data
    # Note: You will need to specify the correct response encoding
    # if it is not utf-8
    body_data = resp.read().decode('utf-8')
Edit: Addendum
Added based on t.m.adam's comment, below
The above examples are a simplified way of submitting a POST request to most URI endpoints, such as APIs, or basic web pages.
However, there are a few common complications:
1) There are CSRF tokens
... or other hidden fields
Hidden fields will still be shown in the source code of a <form> (e.g. <input type="hidden" name="foo" value="bar">).
If the hidden field stays the same value on every form load, then just include it in your standard data dictionary, i.e.
data = {
...
"foo": "bar",
...
}
If the hidden field changes between page loads, e.g. a CSRF token, you must load the form's page first (e.g with a GET request), parse the response to get the value of the form element, then include it in your data dictionary
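As a rough sketch of that flow (the URL and the csrf_token field name here are made up for illustration; use whatever the real form contains):
import requests
from bs4 import BeautifulSoup

with requests.Session() as s:
    # Load the page that contains the form so we can read the hidden field
    page = s.get("http://example.com/form-page")
    soup = BeautifulSoup(page.text, "html.parser")
    token = soup.find("input", {"name": "csrf_token"})["value"]

    # Include the freshly scraped token alongside the normal fields
    data = {"state": "ca", "population": 100, "csrf_token": token}
    r = s.post("http://example.com/showtree.jsp", data=data)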
2) The page needs you to be logged in
...or some other circumstance that requires cookies.
Your best approach is to make a series of requests, to go through the steps needed before you would normally use the target page (e.g. submitting a POST request to a login form)
You will require the use of a "cookie jar". At this point I really start recommending the requests library; you can read more about cookie handling here
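A minimal sketch of that pattern with requests.Session, which keeps cookies between requests automatically (the login URL and credential field names below are placeholders):
import requests

with requests.Session() as s:
    # Log in first; the session stores any cookies the server sets
    s.post("http://example.com/login",
           data={"username": "me", "password": "secret"})

    # Later requests on the same session carry those cookies
    r = s.post("http://example.com/showtree.jsp",
               data={"state": "ca", "population": 100})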
3) Javascript needs to be run on the target form
Occasionally forms require Javascript to be run before submitting them.
If you're unlucky enough to have such a form, unfortunately I recommend that you no longer use python, and switch to some kind of headless browser, like PhantomJS
(It is possible to control PhantomJS from Python, using a library like Selenium; but for simple projects it is likely easier to work directly with PhantomJS)
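For completeness, a minimal Selenium sketch for this form (the page URL is a placeholder; note that the PhantomJS driver is deprecated in newer Selenium releases, where headless Chrome or Firefox is the usual replacement):
from selenium import webdriver
from selenium.webdriver.support.ui import Select

driver = webdriver.PhantomJS()  # or a headless Chrome/Firefox driver
driver.get("http://example.com/form-page")

# Pick options in the two dropdowns, then click the submit button
Select(driver.find_element_by_name("state")).select_by_value("ca")
Select(driver.find_element_by_name("population")).select_by_value("100")
driver.find_element_by_id("showbutton").click()

body_data = driver.page_source  # the refreshed page containing the results
driver.quit()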
I'm trying to log in to a website using Python requests. My code is as follows:
url = "https://program.uffiliates.com/en/Auth/Login"
payload = {
'uerName': '',
'pasword': ''}
try:
    with requests.Session() as s:
        r = s.post(url, data=payload)
        print(r.text)
except requests.exceptions.RequestException as e:
    print(e)
    return
This doesn't work; it prints out the HTML of the login page and not what I should see when I'm logged in. I assume I'm using the wrong form field names. In the HTML code I find the following:
<form action="/en/Auth/LogInAction" method="post">
<input id="hiddenUrl" name="hiddenUrl" type="hidden" value="" />
<input id="SST_ID" name="SST_ID" type="hidden" value="2" />
<input id="serial" name="serial" type="hidden" value="" />
<input id="referer" name="referer" type="hidden" value="" />
This confuses me. Can someone tell me what names I have to use in the payload for my username and password?
And after logging in, how can I navigate through the backend? Should I just use requests.get with the specific URL since I'm already logged in, or should I somehow use requests to click on buttons/links to navigate?
Thanks a lot!
If I understand your question correctly, you need to POST your username and password via requests and be logged in.
First of all, the URL you are posting the data to should be the form's action, not the login URL.
Secondly, there are a few hidden inputs in the form, which I suspect the server checks to confirm that the request is coming from a page it recognises. You will need to include those inputs in your request too.
Try:
url = "https://program.uffiliates.com/en/Auth/LogInAction"
payload = {
    'uerName': '',
    'pasword': '',
    'hiddenUrl': '',
    'SST_ID': '2',  # matches the hidden input's value in the form above
    'serial': '',
    'referer': ''}
try:
    with requests.Session() as s:
        r = s.post(url, data=payload)
        print(r.text)
except requests.exceptions.RequestException as e:
    print(e)
    return
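If those hidden values change between page loads, a hedged variation is to scrape them from the login page first and merge them into the payload (the credential field names are still whatever the real form uses; 'uerName'/'pasword' are taken from your snippet):
import requests
from bs4 import BeautifulSoup

login_page = "https://program.uffiliates.com/en/Auth/Login"
action_url = "https://program.uffiliates.com/en/Auth/LogInAction"

with requests.Session() as s:
    # Read the current values of the hidden inputs straight from the login page
    soup = BeautifulSoup(s.get(login_page).text, "html.parser")
    payload = {inp["name"]: inp.get("value", "")
               for inp in soup.find_all("input", type="hidden")}

    # Add the visible credential fields
    payload["uerName"] = "my_user"
    payload["pasword"] = "my_password"

    r = s.post(action_url, data=payload)
    print(r.text)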
I use requests (2.2.1) to log in to the URL http://tx3.netease.com/logging.php?action=login, but the login logic of this URL is different from Django's CSRF token mechanism, that is:
When you GET this URL, there are two important values, formhash and sts, in the HTML text, both of which are used by a JS function do_encrypt (in the file http://tx3.netease.com/forumdata/cache/rsa/rsa_min.js). This part is fine; I can easily grab them via re.
The key part of the HTML text is:
<form method="post" name="login" id="loginform" class="s_clear" onsubmit="do_encrypt('ori_password','password');pwdclear = 1;" action="logging.php?action=login&loginsubmit=yes">
<input type="hidden" name="formhash" value="91e54489" />
<input type="hidden" name="referer" value="http://tx3.netease.com/" />
<input type="hidden" name="sts" id="sts" value="1409414053" />
<input type="hidden" name="password" id="password" />
...
<input type="password" id="ori_password" name="ori_password" onfocus="clearpwd()" onkeypress="detectCapsLock(event, this)" size="36" class="txt" tabindex="1" autocomplete="off" />
...
</form>
2. After entering the email and the original password ori_password, clicking the submit button calls do_encrypt, which uses formhash, sts and ori_password to set the real password field password for the POST dict. Here the problem comes out: there seems to be no way to get the password string directly. (For contrast, in the Django case you can get csrfmiddlewaretoken directly from session_client.cookies['csrftoken'].)
This is the code:
import requests
import json
import re
loginUrl = "http://tx3.netease.com/logging.php?action=login"
client = requests.session()
r = client.get(loginUrl)
r.encoding='gb18030'
stsPat = re.compile(r'<input type="hidden" name="sts" id="sts" value="(\d+?)" />')
formhashPat = re.compile(r'<input type="hidden" name="formhash" value="([\d\w]+?)" />')
sts = stsPat.search(r.text).groups()[0]
formhash = formhashPat.search(r.text).groups()[0]
loginData = {
    'username': "smaller9@163.com",
    'password': ...,  # Set by js function do_encrypt
    'referer': '/',
    'loginfield': 'username',
    'ori_password': '',  # it's `111111`, but `do_encrypt` will set it to empty.
    'loginsubmit': 'true',
    'sts': sts,
    'formhash': formhash,
}
# r = client.post(url=loginUrl,data=loginData)
Assuming you have permission to do so, try logging in with Selenium, as I think that will be more in line with what you are ultimately trying to do.
from selenium import webdriver
USERNAME = "foo@bar.com"
PASSWORD = "superelite"
# create a driver
driver = webdriver.Firefox()
# get the homepage
driver.get("http://tx3.netease.com/logging.php?action=login")
un_elm = driver.find_element_by_id("username")
pw_elm = driver.find_element_by_id("ori_password")
submit = driver.find_element_by_css_selector("[name=loginsubmit]")
un_elm.send_keys(USERNAME)
pw_elm.send_keys(PASSWORD)
# click submit
submit.click()
# get the PHPSESSID cookie as that has your login data, if you want to use
# it elsewhere
# print(driver.get_cookies())
# do something else ...
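If you then want to reuse that login outside the browser, one option (a sketch, not specific to this site) is to copy the cookies Selenium collected into a requests.Session:
import requests

session = requests.Session()
# Transfer every cookie from the Selenium browser (including PHPSESSID)
for cookie in driver.get_cookies():
    session.cookies.set(cookie["name"], cookie["value"], domain=cookie.get("domain"))

# The session is now authenticated the same way the browser is
r = session.get("http://tx3.netease.com/")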
I have a critical issue. I would like to integrate my application with another, much older application. This service is simply a web form, probably behind a framework (I think classic ASP, maybe). I have an action URL, and I have the HTML code for replicating this service.
This is a piece of the old service (the HTML page):
<FORM method="POST"
url="https://host/path1/path2/AdapterHTTP?action_name=myactionWebAction&NEW_SESSION=true"
enctype="multipart/form-data">
<INPUT type="text" name="AAAWebView-FormAAA-field1" />
<INPUT type="hidden" name="AAAWebView-FormAAA-field2" value="" />
<INPUT type="submit" name="NAV__BUTTON__press__AAAWebView-FormAAA-enter" value="enter" />
</FORM>
My application should simulate the form submission of this old application from code-behind with Python. So far, I haven't had much luck.
For now I do this:
import requests

payload = {'AAAWebView-FormAAA-field1': field1Value,
           'AAAWebView-FormAAA-field2': field2Value,
           'NAV__BUTTON__press__AAAWebView-FormAAA-enter': "enter"}
url = "https://host/path1/path2/AdapterHTTP?action_name=myactionWebAction&NEW_SESSION=true"
headers = {'content-type': 'multipart/form-data'}
r = requests.post(url, data=payload, headers=headers)
print(r.status_code)
I receive a 200 HTTP response code, but while clicking the submit button on the HTML page saves the values, my code does not achieve the same. How do I fix this problem?
The owner of the old application sent me this Java exception log. Any ideas?
org.apache.commons.fileupload.FileUploadException: the request was rejected because no multipart boundary was found
Try passing an empty dictionary as files with requests.post. I think this will properly construct a request with a multipart boundary.
r = requests.post(url, data=payload, headers=headers, files={})
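If that alone doesn't help, another approach worth trying (a sketch, not tested against this service) is to drop the hand-written Content-Type header and let requests build the multipart body itself by sending each field through files as a (None, value) tuple; requests then generates the boundary for you:
import requests

url = "https://host/path1/path2/AdapterHTTP?action_name=myactionWebAction&NEW_SESSION=true"
# (None, value) means "plain form field, no file upload"; requests will set the
# multipart/form-data Content-Type header, boundary included
files = {
    'AAAWebView-FormAAA-field1': (None, field1Value),
    'AAAWebView-FormAAA-field2': (None, field2Value),
    'NAV__BUTTON__press__AAAWebView-FormAAA-enter': (None, "enter"),
}
r = requests.post(url, files=files)
print(r.status_code)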