I want to click a button with Python. The form's fields are filled in automatically by the web page. The HTML for the submit button is:
<INPUT type="submit" value="Place a Bid">
How would I go about doing this?
Is it possible to click the button with just urllib or urllib2? Or will I need to use something like mechanize or twill?
Read the form's URL (the action attribute) and send the input values along with the request. For a GET form like this one, they go in the query string:
<form action="http://mysite.com/blah.php" method="GET">
......
......
......
<input type="text" name="in1" value="abc">
<INPUT type="submit" value="Place a Bid">
</form>
Python:
import urllib, urllib2
# Parse the page HTML containing the form to get the form's URL and the name and value of each input (except the submit and reset buttons).
# You can use xml.dom.minidom or HTMLParser for this.
# form_target gets parsed into "http://mysite.com/blah.php"
# input1_name gets parsed into "in1"
# input1_value gets parsed into "abc"
# urlencode escapes spaces and other special characters in the values
form_url = form_target + "?" + urllib.urlencode({input1_name: input1_value})
# form_url value is "http://mysite.com/blah.php?in1=abc"
# Then open the new URL, which is the same as clicking the submit button
s = urllib2.urlopen(form_url)
You can parse the HTML with HTMLParser
And don't forget to urlencode any post data with:
urllib.urlencode(query)
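As a rough sketch of the parsing step, written for Python 3 (where HTMLParser lives in html.parser and urlencode in urllib.parse): the FormParser class and build_get_url helper are made-up names for illustration, and the code assumes the page contains a single form whose URL is in its action attribute.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class FormParser(HTMLParser):
    """Collect a form's action URL and the named input values on the page."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        attrs = dict(attrs)
        if tag == "form" and self.action is None:
            self.action = attrs.get("action")
        elif tag == "input":
            # Skip buttons and unnamed inputs; they carry no data here.
            if (attrs.get("type") or "text").lower() in ("submit", "reset", "button"):
                return
            if attrs.get("name"):
                self.fields[attrs["name"]] = attrs.get("value") or ""


def build_get_url(page_html):
    """Return the URL a browser would request when the GET form is submitted."""
    parser = FormParser()
    parser.feed(page_html)
    return parser.action + "?" + urlencode(parser.fields)


SAMPLE = '''
<form action="http://mysite.com/blah.php" method="GET">
<input type="text" name="in1" value="a b">
<INPUT type="submit" value="Place a Bid">
</form>
'''

form_url = build_get_url(SAMPLE)
print(form_url)  # http://mysite.com/blah.php?in1=a+b
```

The space in the value comes out as "+", which is the kind of escaping that plain string concatenation would miss.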
You may want to take a look at IronWatin - https://github.com/rtyler/IronWatin to fill the form and "click" the button using code.
Using urllib.urlopen, you could send the values of the form as the data parameter to the page specified in the form tag. But this won't automate your browser for you, so you'd have to get the form values some other way first.
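A small sketch of that idea in Python 3, where urllib2.urlopen became urllib.request.urlopen. The URL and field name are placeholders borrowed from the sample form earlier in the thread; supplying data is what turns the request into a POST. The request is only built here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder values; read the real ones from the page's <form> and <input> tags.
form_action = "http://mysite.com/blah.php"
form_fields = {"in1": "abc"}

# The request body must be bytes, so encode the urlencoded string.
body = urlencode(form_fields).encode("ascii")
request = Request(form_action, data=body)

print(request.get_method())  # POST, because data was supplied
print(request.data)          # b'in1=abc'

# To actually submit it: urllib.request.urlopen(request).read()
```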
Related
I pass the Query string through a form like this:
<form action="stockChart" autocomplete="off" method="GET">
<input type="text" required="required" name="Ticker" maxlength="5">
</form>
and then it redirects me to the page with all the data corresponding to the input and puts my input in the url /stockChart?Ticker=AAPL
views.py
def stockChart(request):
    TICKER = request.GET['Ticker'].upper()
But if I go to another tab where I also want to use the same ticker it doesn't work, since the URL doesn't have the query string in it.
Right now I'm using TICKER = request.session['Ticker'], but that way the URL doesn't contain the query string. Is there a way to keep the query string (?Ticker=AAPL) in the URL when navigating to other pages?
Not 100% sure what "if I go to another tab" means.
Assuming you're opening the URL /stockChart in another tab and want it to show the last Ticker you entered, you could do this in your view:
If request.GET['Ticker'] has a value, save it to request.session['Ticker'] and display the page.
If request.GET['Ticker'] is missing and request.session['Ticker'] has data, redirect to '/stockChart?Ticker={}'.format(request.session['Ticker']).
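The two steps can be sketched as a small framework-independent function. Here resolve_ticker is a hypothetical helper, and the plain dicts stand in for request.GET and request.session; the "missing" case (no ticker anywhere) is an extra assumption not covered above.

```python
from urllib.parse import urlencode


def resolve_ticker(query, session, path="/stockChart"):
    """Decide what the view should do.

    Returns ("render", ticker), ("redirect", url) or ("missing", None).
    `query` and `session` are dict-like stand-ins for request.GET
    and request.session.
    """
    ticker = query.get("Ticker")
    if ticker:
        ticker = ticker.upper()
        session["Ticker"] = ticker  # remember the latest choice
        return ("render", ticker)
    if session.get("Ticker"):
        # No query string: send the browser to the URL that includes it.
        url = path + "?" + urlencode({"Ticker": session["Ticker"]})
        return ("redirect", url)
    return ("missing", None)


session = {}
first = resolve_ticker({"Ticker": "aapl"}, session)
second = resolve_ticker({}, session)
print(first)   # ('render', 'AAPL')
print(second)  # ('redirect', '/stockChart?Ticker=AAPL')
```

In a Django view, the "redirect" case would map to returning redirect(url) from django.shortcuts, so the address bar ends up showing the query string again.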
I would like to submit a form on a webpage.
However, the page has several forms:
<form method="post" action="https://mywebsite.com/pageA" id="order" class="order ajaxForm">
<input type="text" class="decimal" name="value" id="fieldA" value="0" />
</label>
</form>
<form method="post" action="https://mywebsite.com/pageB" id="previousorder" class="order ajaxForm">
<input type="text" class="decimal" name="value" id="fieldB" value="0" />
</label>
</form>
Is there an easy way to trigger a specific form using python & request ?
I'd go with a more advanced tool like mechanize or MechanicalSoup. The latter is built on requests internally (I assume you meant the requests package by "request"). Both tools let you select the desired form and then submit it with the required parameters.
For instance, submitting the order form with MechanicalSoup would look something like this:
import mechanicalsoup
browser = mechanicalsoup.StatefulBrowser()
browser.open("https://yourwebsite.com")
# Fill-in the order form
browser.select_form('#order')
browser["value"] = "100"
browser.submit_selected()
Look at the DevTools Network tab while submitting the form.
Every form has its own request URL and POST parameters. With requests, you will generally need something like this:
import requests
# form fields are keyed by the input's name attribute ("value"), not its id ("fieldB")
req = requests.post('https://mywebsite.com/pageB',
                    data={'value': 'value_you_want_to_submit'})
But it's better to investigate it with DevTools first.
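If you would rather read the URL and parameters from the HTML than from DevTools, here is a hedged stdlib-only sketch (Python 3) that picks one form by its id. FormsById is a made-up class name, the page is a trimmed copy of the markup in the question, and a form handled by JavaScript (the ajaxForm class hints at that) may send something different from what the HTML suggests.

```python
from html.parser import HTMLParser


class FormsById(HTMLParser):
    """Map each form's id to its action URL and named input values."""

    def __init__(self):
        super().__init__()
        self.forms = {}
        self._current = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {"action": attrs.get("action"), "data": {}}
            self.forms[attrs.get("id")] = self._current
        elif tag == "input" and self._current is not None and attrs.get("name"):
            # Fields are keyed by the name attribute, not the id.
            self._current["data"][attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


PAGE = '''
<form method="post" action="https://mywebsite.com/pageA" id="order">
<input type="text" name="value" id="fieldA" value="0" />
</form>
<form method="post" action="https://mywebsite.com/pageB" id="previousorder">
<input type="text" name="value" id="fieldB" value="0" />
</form>
'''

parser = FormsById()
parser.feed(PAGE)
target = parser.forms["previousorder"]
target["data"]["value"] = "100"  # override the default before posting
print(target["action"], target["data"])

# With requests installed, the submission itself would be:
# requests.post(target["action"], data=target["data"])
```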
Try something like this (you will probably need to make some modifications, but it should be close to what you want; this example is for a login form):
First install lxml (pip install lxml).
import requests
from lxml import html
payload = {
    "username": "<USER NAME>",
    "password": "<PASSWORD>",
    "csrfmiddlewaretoken": "<CSRF_TOKEN>"
}
sessionReq = requests.session()
login_url = "https://example.be/account/login.php"
result = sessionReq.get(login_url)
tree = html.fromstring(result.text)
# read the CSRF token from the hidden input on the login page
authenticity_token = list(set(tree.xpath("//input[@name='csrfmiddlewaretoken']/@value")))[0]
# send the scraped token instead of the placeholder
payload["csrfmiddlewaretoken"] = authenticity_token
result = sessionReq.post(login_url, data=payload, headers=dict(referer=login_url))
url = 'https://bitbucket.org/dashboard/overview'
I hope this helps you :)
I have been working on a web scraping script that gets past the login page but I can't find a way to get this radio button selected as the script attempts to login.
Here's the button:
<input type="radio" name="UserType" id="UserType" value="PARENTSWEB-PARENT" tabindex="4">
When it is clicked it looks like this:
<input type="radio" name="UserType" id="UserType" value="PARENTSWEB-PARENT" tabindex="4" checked="checked">
I am using a dictionary to submit the username and password but not sure how to add the button functionality into it.
Dict:
payload={
'username':USERNAME,
'password':PASSWORD,
'DistrictCode':DistCode,
'PARENTSWEB-STUDENT':'checked'
}
I am using the requests library and lxml to submit and look at the data. But if there is a better library or another one I can also use I'm open to anything.
I am trying to web-scrape some elements and their values off a page with Python; however, to get more elements, I need to simulate a click on the next button. There is a postback tied to these buttons, so I am trying to call it. Unfortunately, Python only prints the same values over and over again (meaning the postback for the next button isn't being called). I am using requests to do my POST/GET.
import re
import time
import requests
TARGET_GROUP_ID = 778092
SESSION = requests.Session()
REQUEST_HEADERS = {"Accept-Encoding": "gzip,deflate"}
GROUP_URL = "http://roblox.com/groups/group.aspx?gid=%d"%(TARGET_GROUP_ID)
POST_BUTTON_HTML = 'pagerbtns next'
EVENTVALIDATION_REGEX = re.compile(r'id="__EVENTVALIDATION" value="(.+)"').search
VIEWSTATE_REGEX = re.compile(r'id="__VIEWSTATE" value="(.+)"').search
VIEWSTATEGENERATOR_REGEX = re.compile(r'id="__VIEWSTATEGENERATOR" value="(.+)"').search
TITLE_REGEX = re.compile(r'<a id="ctl00_cphRoblox_rbxGroupRoleSetMembersPane_dlUsers_ctrl\d+_hlAvatar".*?title="(\w+)".*?ID=(\d+)"')
page = SESSION.get(GROUP_URL, headers = REQUEST_HEADERS).text
while 1:
    if POST_BUTTON_HTML in page:
        for (ids,names) in re.findall(TITLE_REGEX, page):
            print ids,names
        postData = {
            "__EVENTVALIDATION": EVENTVALIDATION_REGEX(page).group(1),
            "__VIEWSTATE": VIEWSTATE_REGEX(page).group(1),
            "__VIEWSTATEGENERATOR": VIEWSTATEGENERATOR_REGEX(page).group(1),
            "__ASYNCPOST": True,
            "ct1000_cphRoblox_rbxGroupRoleSetMembersPane_currentRoleSetID": "4725789",
            "ctl00$cphRoblox$rbxGroupRoleSetMembersPane$dlUsers_Footer$ctl02$ctl00": "",
            "ctl00$cphRoblox$rbxGroupRoleSetMembersPane$dlUsers_Footer$ctl01$HiddenInputButton": "",
            "ctl00$cphRoblox$rbxGroupRoleSetMembersPane$dlUsers_Footer$ctl01$PageTextBox": "3"
        }
        page=SESSION.post(GROUP_URL, data = postData, stream = True).text
        time.sleep(2)
How can I properly call the post back in ASP.NET from Python to fix this issue? As stated before, it's only printing out the same values each time.
This is the HTML Element of the button
<a class="pagerbtns next" href="javascript:__doPostBack('ctl00$cphRoblox$rbxGroupRoleSetMembersPane$dlUsers_Footer$ctl02$ctl00','')"> </a>
And this is the div it is in:
<div id="ctl00_cphRoblox_rbxGroupRoleSetMembersPane_dlUsers_Footer_ctl01_MembersPagerPanel" onkeypress="javascript:return WebForm_FireDefaultButton(event, 'ctl00_cphRoblox_rbxGroupRoleSetMembersPane_dlUsers_Footer_ctl01_HiddenInputButton')">
<div id="ctl00_cphRoblox_rbxGroupRoleSetMembersPane_dlUsers_Footer_ctl01_Div1" class="paging_wrapper">
Page <input name="ctl00$cphRoblox$rbxGroupRoleSetMembersPane$dlUsers_Footer$ctl01$PageTextBox" type="text" value="1" id="ctl00_cphRoblox_rbxGroupRoleSetMembersPane_dlUsers_Footer_ctl01_PageTextBox" class="paging_input"> of
<div class="paging_pagenums_container">125</div>
<input type="submit" name="ctl00$cphRoblox$rbxGroupRoleSetMembersPane$dlUsers_Footer$ctl01$HiddenInputButton" value="" onclick="loading('members');" id="ctl00_cphRoblox_rbxGroupRoleSetMembersPane_dlUsers_Footer_ctl01_HiddenInputButton" class="pagerbtns translate" style="display:none;">
</div>
</div>
I was thinking of using a JS library and executing the JS __postback method, however, I would like to first see if this can be achieved in pure Python.
Yes, it should be achievable: you just have to submit the correct values in the correct fields. But I assume the web page you are trying to parse uses ASP.NET Web Forms, so finding those values can be really time-consuming. I suggest looking into Selenium; with it you can easily trigger clicks and other events on a web page without writing much code.
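from selenium import webdriver
from selenium.webdriver.common.by import By
driver = webdriver.Firefox()
driver.get("http://site you are trying to parse")
# find_element(By.ID, ...) replaces the older find_element_by_id helper
driver.find_element(By.ID, "button").click()
# then get the data you want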
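For the pure-Python route, the piece that is usually missing is what __doPostBack(target, argument) does in the browser: it copies its two arguments into the hidden __EVENTTARGET and __EVENTARGUMENT fields and submits the form. Below is a hedged Python 3 sketch of building that POST body. PostbackPage and build_postback_data are made-up names, the sample page is a trimmed stand-in, and a real Web Forms page may require additional fields.

```python
import re
from html.parser import HTMLParser

# Matches the two arguments of javascript:__doPostBack('target','argument')
POSTBACK_RE = re.compile(r"__doPostBack\('([^']*)','([^']*)'\)")


class PostbackPage(HTMLParser):
    """Collect hidden form fields and the postback target of one link class."""

    def __init__(self, link_class):
        super().__init__()
        self.link_class = link_class
        self.hidden = {}
        self.postback = None  # (event_target, event_argument)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and (attrs.get("type") or "").lower() == "hidden":
            if attrs.get("name"):
                self.hidden[attrs["name"]] = attrs.get("value") or ""
        elif tag == "a" and attrs.get("class") == self.link_class:
            match = POSTBACK_RE.search(attrs.get("href") or "")
            if match:
                self.postback = match.groups()


def build_postback_data(page_html, link_class="pagerbtns next"):
    """Return the POST fields that mimic clicking the link, or None if absent."""
    parser = PostbackPage(link_class)
    parser.feed(page_html)
    if parser.postback is None:
        return None
    data = dict(parser.hidden)  # __VIEWSTATE, __EVENTVALIDATION, ...
    data["__EVENTTARGET"], data["__EVENTARGUMENT"] = parser.postback
    return data


SAMPLE = '''
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="abc123" />
<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="xyz" />
<a class="pagerbtns next" href="javascript:__doPostBack('ctl00$cphRoblox$rbxGroupRoleSetMembersPane$dlUsers_Footer$ctl02$ctl00','')"> </a>
'''

post_data = build_postback_data(SAMPLE)
print(post_data["__EVENTTARGET"])
```

In a scraping loop you would post this dictionary back to the same URL with your session, then rebuild it from each new response, since the hidden fields change from page to page; a None result means there is no next button left. Leaving out __ASYNCPOST should make the server answer with a full page rather than a partial update, though that may depend on the site.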
I have to test an internal web page.
The web page contains text fields, buttons, and radio buttons.
Based on a specific radio button selected another sub-form is displayed in the web page.
I'm using urllib2, and some of its modules, to successfully connect to the web server and perform some actions.
However, since I'm not able to select the radio button via a POST from the Python script, I can't proceed with the test automation.
Reading some of the online posts about selecting radio buttons, I saw that some people are using "mechanize". I'm not familiar with this. Is there a specific module in urllib2 that would allow me to send a POST request to select a specific radio button?
Roland
They are referring to this python module: http://wwwsearch.sourceforge.net/mechanize/. You can emulate the selected radio button in the form by posting the form properly with urllib2. Discussed in this post: urllib2: submitting a form and then redirecting.
Imagine you have a form with a radio input like this:
<input type="radio" value="1" name="something" />
The post body would be: something=1.
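A tiny Python 3 sketch of that rule, using urlencode to build the body. The "username" field and its value are hypothetical, added only to show the radio button sitting next to other form data.

```python
from urllib.parse import urlencode

# Other fields of the form; "username" is a hypothetical example field.
payload = {"username": "alice"}

# Selecting <input type="radio" value="1" name="something" /> means
# sending the input's name together with its value attribute.
payload["something"] = "1"

body = urlencode(payload)
print(body)  # username=alice&something=1
```

There is no separate "checked" flag to send: a radio button that is not selected is simply left out of the body.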
If we have, for example, this HTML code:
<input type="radio" name="register" value="0" checked="checked"> "Yes it's my password"
then you must put the input's value attribute (not the label text next to it) in your payload:
payload["register"]="0"