I pass the Query string through a form like this:
<form action="stockChart" autocomplete="off" method="GET">
<input type="text" required="required" name="Ticker" maxlength="5">
</form>
and then it redirects me to the page with all the data corresponding to the input and puts my input in the url /stockChart?Ticker=AAPL
views.py
def stockChart(request):
TICKER = request.GET['Ticker'].upper()
But if I go to another tab where I also want to use the same ticker it doesn't work, since the URL doesn't have the query string in it.
Right now I'm using TICKER = request.session['Ticker'] but by doing that the URL doesn't contain the query string. Is there a way to keep the string (?Ticker?AAPL) in the url, when navigating to other pages?
Not 100% sure what "if I go to another tab" means.
By assuming you're accessing URL /stockChart in another tab and you want it to show your last input Ticker, you could do this in your view:
if request.GET['Ticker'] has a value, save it to request.session['Ticker'] and display page
if is missing request.GET['Ticker'] and request.session['Ticker'] has data, redirect to '/stockChart?Ticker={}'.format(request.session['Ticker'])
Related
i want to fill in a form from a website using following code :
import mechanicalsoup
browser = mechanicalsoup.StatefulBrowser()
browser.open("Web page url")
browser.follow_link("login")
browser.get_url()
browser.select_form('div[class="p30"]')
browser.get_current_form().print_summary()
>>> <input class="form-input" id="mail" type="text"/>
>>> <input class="form-input" id="pass" type="password"/>
as you can see .print_summary() return exact fields that i want to assign values to, but there is no attribute "name" for any of them so i can't change it.
I've read Mechanicalsoup tutorial and the form in it has that attribute "name":
<input name="custname"/>
<input name="custtel" type="tel"/>
<input name="custemail" type="email"/>
and it can simply be changed using:
browser["custname"] = "Me"
browser["custtel"] = "00 00 0001"
browser["custemail"] = "nobody#example.com"
i'm new to mechincalsoup so any help is greatly appreciated.
The mechanicalsoup Q&A section has specificly answered your question:
If you believe you are using MechanicalSoup correctly, but form
submission still does not behave the way you expect, the likely
explanation is that the page uses JavaScript to dynamically generate
response content when you submit the form in a real browser. A common
symptom is when form elements are missing required attributes (e.g. if
form is missing the action attribute or an input is missing the name
attribute).
In such cases, you typically have two options:
If you know what content the server expects to receive from form
submission, then you can use MechanicalSoup to manually add that
content using, i.e., new_control(). This is unlikely to be a reliable
solution unless you are testing a website that you own.
2.Use a tool
that supports JavaScript, like Selenium.
I have a form like following:
url = "http:/foo.com"
<table>
<form action="showtree.jsp" method="post" id="form" name="form">
<input type="hidden" id="sortAction" name="sortAction" value="">
<tr>
<td style="text-align: right;font-weight:bold;">State: </td>
<td><select name="state">
<option value="ca">CA</option>
<option value="or">OR</option>
<option value="al">AL</option>
</select></td>
</tr>
<tr>
<td style="text-align: right;font-weight:bold;">Populartion: </td>
<td><select id="pop" name="population" onchange="disableShowOnAll()">
<option value="100">100</option>
<option value="200">200</option>
<option value="300">300</option>
</select></td>
</tr>
<tr>
<td></td>
<td>
<button id="showbutton" class="btn btn-default" onclick="submitForm('show')">Show Tree
</button>
</td>
</tr>
</form>
So, basically the form has two options, State and Population and each has some options.. The idea is to select the options from the form and then submit.
On submit, the results are displayed in the same page..
So, basically how do i submit this post request in python...and then get the results (when the submit is pressed.. and the page is refreshed with the results?)
Let me know if this makes sense?
Thanks
What you're trying to do is submit a POST request to http://example.com/showtree.jsp
Using the requests library (recommended)
Reference: http://docs.python-requests.org/en/master/
The requests library greatly simplifies making HTTP requests, but is an extra dependency
import requests
# Create a dictionary containing the form elements and their values
data = {"state": "ca", "population": 100}
# POST to the remote endpoint. The Requests library will encode the
# data automatically
r = requests.post("http://example.com/showtree.js", data=data)
# Get the raw body text back
body_data = r.text
Using the inbuilt urllib
Relevant answer here: Python - make a POST request using Python 3 urllib
from urllib import request, parse
# Create a dictionary containing the form elements and their values
data = {"state": "ca", "population": 100}
# This encodes the data to application/x-www-form-urlencoded format
# then converts it to bytes, needed before using with request.Request
encoded_data = parse.urlencode(data).encode()
# POST to the remote endpoint
req = request.Request("http://example.com/showtree.js", data=encoded_data)
# This will contain the response page
with request.urlopen(req) as resp:
# Reads and decodes the body response data
# Note: You will need to specify the correct response encoding
# if it is not utf-8
body_data = resp.read().decode('utf-8')
Edit: Addendum
Added based on t.m.adam's comment, below
The above examples are a simplified way of submitting a POST request to most URI endpoints, such as APIs, or basic web pages.
However, there are a few common complications:
1) There are CSRF tokens
... or other hidden fields
Hidden fields will still be shown in the source code of a <form> (e.g. <input type="hidden" name="foo" value="bar">
If the hidden field stays the same value on every form load, then just include it in your standard data dictionary, i.e.
data = {
...
"foo": "bar",
...
}
If the hidden field changes between page loads, e.g. a CSRF token, you must load the form's page first (e.g with a GET request), parse the response to get the value of the form element, then include it in your data dictionary
2) The page needs you to be logged in
...or some other circumstance that requires cookies.
Your best approach is to make a series of requests, to go through the steps needed before you would normally use the target page (e.g. submitting a POST request to a login form)
You will require the use of a "cookie jar". At this point I really start recommending the requests library; you can read more about cookie handling here
3) Javascript needs to be run on the target form
Occasionally forms require Javascript to be run before submitting them.
If you're unlucky enough to have such a form, unfortunately I recommend that you no longer use python, and switch to some kind of headless browser, like PhantomJS
(It is possible to control PhantomJS from Python, using a library like Selenium; but for simple projects it is likely easier to work directly with PhantomJS)
I am trying to web-scrape some elements and their values off a page with Python; However, to get more elements, I need to simulate a click on the next button. There is a post back tied to these buttons, so I am trying to call it. Unfortunately, Python is only printing the same values over and over again [meaning the post back for the next button isn't being called]. I am using requests to do my POST/GET.
import re
import time
import requests
TARGET_GROUP_ID = 778092
SESSION = requests.Session()
REQUEST_HEADERS = {"Accept-Encoding": "gzip,deflate"}
GROUP_URL = "http://roblox.com/groups/group.aspx?gid=%d"%(TARGET_GROUP_ID)
POST_BUTTON_HTML = 'pagerbtns next'
EVENTVALIDATION_REGEX = re.compile(r'id="__EVENTVALIDATION" value="(.+)"').search
VIEWSTATE_REGEX = re.compile(r'id="__VIEWSTATE" value="(.+)"').search
VIEWSTATEGENERATOR_REGEX = re.compile(r'id="__VIEWSTATEGENERATOR" value="(.+)"').search
TITLE_REGEX = re.compile(r'<a id="ctl00_cphRoblox_rbxGroupRoleSetMembersPane_dlUsers_ctrl\d+_hlAvatar".*?title="(\w+)".*?ID=(\d+)"')
page = SESSION.get(GROUP_URL, headers = REQUEST_HEADERS).text
while 1:
if POST_BUTTON_HTML in page:
for (ids,names) in re.findall(TITLE_REGEX, page):
print ids,names
postData = {
"__EVENTVALIDATION": EVENTVALIDATION_REGEX(page).group(1),
"__VIEWSTATE": VIEWSTATE_REGEX(page).group(1),
"__VIEWSTATEGENERATOR": VIEWSTATEGENERATOR_REGEX(page).group(1),
"__ASYNCPOST": True,
"ct1000_cphRoblox_rbxGroupRoleSetMembersPane_currentRoleSetID": "4725789",
"ctl00$cphRoblox$rbxGroupRoleSetMembersPane$dlUsers_Footer$ctl02$ctl00": "",
"ctl00$cphRoblox$rbxGroupRoleSetMembersPane$dlUsers_Footer$ctl01$HiddenInputButton": "",
"ctl00$cphRoblox$rbxGroupRoleSetMembersPane$dlUsers_Footer$ctl01$PageTextBox": "3"
}
page=SESSION.post(GROUP_URL, data = postData, stream = True).text
time.sleep(2)
How can I properly call the post back in ASP.NET from Python to fix this issue? As stated before, it's only printing out the same values each time.
This is the HTML Element of the button
<a class="pagerbtns next" href="javascript:__doPostBack('ctl00$cphRoblox$rbxGroupRoleSetMembersPane$dlUsers_Footer$ctl02$ctl00','')"> </a>
And this is the div it is in:
<div id="ctl00_cphRoblox_rbxGroupRoleSetMembersPane_dlUsers_Footer_ctl01_MembersPagerPanel" onkeypress="javascript:return WebForm_FireDefaultButton(event, 'ctl00_cphRoblox_rbxGroupRoleSetMembersPane_dlUsers_Footer_ctl01_HiddenInputButton')">
<div id="ctl00_cphRoblox_rbxGroupRoleSetMembersPane_dlUsers_Footer_ctl01_Div1" class="paging_wrapper">
Page <input name="ctl00$cphRoblox$rbxGroupRoleSetMembersPane$dlUsers_Footer$ctl01$PageTextBox" type="text" value="1" id="ctl00_cphRoblox_rbxGroupRoleSetMembersPane_dlUsers_Footer_ctl01_PageTextBox" class="paging_input"> of
<div class="paging_pagenums_container">125</div>
<input type="submit" name="ctl00$cphRoblox$rbxGroupRoleSetMembersPane$dlUsers_Footer$ctl01$HiddenInputButton" value="" onclick="loading('members');" id="ctl00_cphRoblox_rbxGroupRoleSetMembersPane_dlUsers_Footer_ctl01_HiddenInputButton" class="pagerbtns translate" style="display:none;">
</div>
</div>
I was thinking of using a JS library and executing the JS __postback method, however, I would like to first see if this can be achieved in pure Python.
Yes it should be achievable you just have to submit correct values on correct fields. But i assume web page you are trying parse uses asp.net web forms so it should be really time consuming to find values and such. I suggest you to look into selenium with that you can easily call click and events on a webpage without writing so much code.
driver = webdriver.Firefox()
driver.get("http://site you are trying to parse")
driver.find_element_by_id("button").click()
//then get the data you want
Suppose I have a form like the following with some hidden input:
<form id="myForm" method="post" action="http://www.X.Y/index.php?page=login>
<input type="hidden" name="Hidden1" value="" />
<input type="hidden" name="Hidden2" value="abcdef" />
<input type="hidden" name="Hidden3" value="1234" />
<input type="text" name="firstTextBox" value=""/>
<input type="button" name="clickButton" value="OK"/>
</form>
I would run a python POST request via:
import requests
s = requests.Session()
postRequest = {'Hidden1': '',
'Hidden2': 'abcdef',
'Hidden3': '',
'firstTextBox': 'Typed in first text box',
'clickButton': 'OK'
}
s.auth = HttpNtlmAuth(username, password)
s.post(url, data=manufacturingRequest)
My question is, did I HAVE to include the hidden inputs in the postRequest dictionary? Can you submit a POST request if you omit elements with a type attribute value of "hidden"?
What's the purpose of websites even having hidden inputs if their values are set to EMPTY string, such as the Hidden1 element in the myForm example above.
EDIT - Second Half
After doing a bit of research, I noticed that some hidden elements had different values each time I visited the page
i.e.
<input type="hidden" name="__REQUESTDIGEST" id="__REQUESTDIGEST" value="0xEB8842A77FE88CA990D2EA0D4BAA0392C13FCEF3DCF3250EBF575B90C03BFBC9AD4D61180DA81DF7B09144BBB04BBFF1565C2ADEE650CCC3D81B149034E711A4,18 Sep 2013 19:48:18 -0000" />
has a time stamp
as well as
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="SOME+VERY+LONG+RSA+STRING">
which had some RSA-like string as its value
Turns out the site gives me a 200 error code if I try to submit a POST with different values than these. Are these extra security measures for the website?
..and IF SO, how can I programmatically send POST requests, accounting for the changing element values?
did I HAVE to include the hidden inputs in the postRequest dictionary?
No, you do not have to include the hidden inputs. There is no law, treaty, or standard that requires you to include any particular input elements.
On the other hand, if you fail to include them, then you are doing something different than a browser would do, and the website might take notice of that.
Can you submit a POST request if you omit elements with a type attribute value of "hidden"?
Yes, you can. You can also omit elements with a type of text or button. How the website responds is entirely up to it.
What's the purpose of websites even having hidden inputs if their values are set to EMPTY string,
The purposes of the website developer is really up to them. You might ask the developers of the website that you are trying to submit to.
One possible purpose is to identify which form is doing the submission.
Are these extra security measures for the website?
Again, ask the owners of the website. It might be security, it might be session management, or it might carry your preferences.
..and IF SO, how can I programmatically send POST requests, accounting for the changing element values?
Fetch the page that contains the form, parse that page, and submit the form with the indicated form variables.
I want to click a button with python, the info for the form is automatically filled by the webpage. the HTML code for sending a request to the button is:
INPUT type="submit" value="Place a Bid">
How would I go about doing this?
Is it possible to click the button with just urllib or urllib2? Or will I need to use something like mechanize or twill?
Use the form target and send any input as post data like this:
<form target="http://mysite.com/blah.php" method="GET">
......
......
......
<input type="text" name="in1" value="abc">
<INPUT type="submit" value="Place a Bid">
</form>
Python:
# parse the page HTML with the form to get the form target and any input names and values... (except for a submit and reset button)
# You can use XML.dom.minidom or htmlparser
# form_target gets parsed into "http://mysite.com/blah.php"
# input1_name gets parsed into "in1"
# input1_value gets parsed into "abc"
form_url = form_target + "?" + input1_name + "=" + input1_value
# form_url value is "http://mysite.com/blah.php?in1=abc"
# Then open the new URL which is the same as clicking the submit button
s = urllib2.urlopen(form_url)
You can parse the HTML with HTMLParser
And don't forget to urlencode any post data with:
urllib.urlencode(query)
You may want to take a look at IronWatin - https://github.com/rtyler/IronWatin to fill the form and "click" the button using code.
Using urllib.urlopen, you could send the values of the form as the data parameter to the page specified in the form tag. But this won't automate your browser for you, so you'd have to get the form values some other way first.