Submit form and print page with python requests

I have been reading a lot on how to submit a form with Python and then read and scrape the resulting page. However, I have not managed to do it with the specific form I am filling in: my code just returns the HTML of the form page. Here is my code:
import requests
values = {}
values['archive'] = "1"
values['descripteur[]'] = ["mc82", "mc84"]
values['typeavis[]'] = ["10","6","7","8","9"]
values['dateparutionmin'] = "01/01/2015"
values['dateparutionmax'] = "31/12/2015"
req = requests.post('https://www.boamp.fr/avis/archives', data=values)
print(req.text)
Any suggestions appreciated.
req.text contains the HTML of the form page rather than the search results.

You may be posting the data to the wrong page. I visited the URL and submitted the form once, and found that the data is actually sent to https://www.boamp.fr/avis/liste. (A tool like Fiddler is sometimes useful for figuring out this kind of process.)
So your code should be:
req = requests.post('https://www.boamp.fr/avis/liste', data=values)
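For completeness, here is a minimal sketch of the corrected script (assuming the form fields from the question are otherwise unchanged; the endpoint and field names may have been updated since):
import requests
values = {
    "archive": "1",
    "descripteur[]": ["mc82", "mc84"],
    "typeavis[]": ["10", "6", "7", "8", "9"],
    "dateparutionmin": "01/01/2015",
    "dateparutionmax": "31/12/2015",
}
# Post to the URL the form actually targets, not the page the form lives on.
req = requests.post('https://www.boamp.fr/avis/liste', data=values)
print(req.status_code)
print(req.text)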

Related

Python post request not working

import requests
url = "https://stackoverflow.com/"
payload = {"q": "python"}
s = requests.session()
r = s.post(url, data=payload)
print(r.text)
I wish to use a POST request in order to obtain the subsequent webpage. However, the above code prints the source code of the home page and not the next page. Can someone tell me what I should do to obtain the source code of the next page? I have searched through many related questions on Stack Overflow and haven't found a solution.
Thanks in advance.
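A likely cause, as in the question above, is that the data is being posted to the wrong URL. The search box on stackoverflow.com appears to submit a GET request to /search with a q parameter rather than a POST to the home page, so the server simply returns the home page. A minimal sketch under that assumption:
import requests
s = requests.Session()
# Assumption: the search form issues GET /search?q=... rather than a POST.
r = s.get("https://stackoverflow.com/search", params={"q": "python"})
print(r.url)   # final URL after any redirects
print(r.text)  # HTML of the results page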

Using requests function in python to submit data to a website and call back a response

I am trying to use the requests library in Python to post the text content of a text file to a website, submit the text for analysis on said website, and pull the results back into Python. I have read through a number of responses here and on other websites, but have not yet figured out how to adapt the code to a new website.
I'm familiar with Beautiful Soup, so pulling in webpage content and removing HTML isn't an issue; it's submitting the data that I don't understand.
My code currently is:
import requests
fileName = "texttoAnalyze.txt"
fileHandle = open(fileName, 'rU')
url_text = fileHandle.read()
url = "http://www.webpagefx.com/tools/read-able/"
payload = {'value':url_text}
r = requests.post(url, payload)
print(r.text)
This code comes back with the HTML of the website, but it doesn't seem to recognize that I'm trying to submit a form.
Any help is appreciated. Thanks so much.
You need to send the same request the website itself sends; you can usually find it with web debugging tools (like the Chrome/Firefox developer tools).
In this case, the URL the request is being sent to is: http://www.webpagefx.com/tools/read-able/check.php
with the following params: tab=Test+by+Direct+Link&directInput=SOME_RANDOM_TEXT
So your code should look like this:
url = "http://www.webpagefx.com/tools/read-able/check.php"
payload = {'directInput':url_text, 'tab': 'Test by Direct Link'}
r = requests.post(url, data=payload)
print(r.text)
Good luck!
There are two POST parameters, tab and directInput:
import requests
url = "http://www.webpagefx.com/tools/read-able/check.php"
with open("in.txt") as f:
    data = {"tab": "Test by Direct Link",
            "directInput": f.read()}
r = requests.post(url, data=data)
print(r.content)
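Since the question mentions familiarity with Beautiful Soup, the returned HTML can then be parsed as usual; which elements hold the readability scores is an assumption to check against the page's actual markup:
from bs4 import BeautifulSoup
soup = BeautifulSoup(r.content, "html.parser")
print(soup.title)
# Inspect soup further to locate the elements that contain the scores.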

Website form login using Python urllib2

I've been trying to learn to use the urllib2 package in Python. I tried to log in as a student (the left form) on a signup page for maths students: http://reg.maths.lth.se/. I have inspected the code (using Firebug), and the left form should obviously be submitted using POST with a key called pnr whose value should be a string 10 characters long (the last part can perhaps not be seen from the HTML code, but it is basically my social security number, so I know how long it should be). Note that the action in the header for the appropriate POST method is another URL, namely http://reg.maths.lth.se/login/student.
I tried the following (with a fake pnr in the example below, but I used my real number in my own code):
import urllib
import urllib2
url = 'http://reg.maths.lth.se/'
values = dict(pnr='0000000000')
data = urllib.urlencode(values)
req = urllib2.Request(url,data)
resp = urllib2.urlopen(req)
page = resp.read()
print(page)
While this executes, what it prints is the source code of the original page http://reg.maths.lth.se/, so it doesn't seem like I logged in. Also, I can add arbitrary key/value pairs to the values dictionary without producing any error, which seems strange to me.
Also, if I go to the page http://reg.maths.lth.se/login/student, there is clearly no POST method for submitting data.
Any suggestions?
If you inspect the request sent to the server when you enter the number and submit the form, you will notice that it is a POST request with pnr and _token parameters.
You are missing the _token parameter which you need to extract from the HTML source of the page. It is a hidden input element:
<input name="_token" type="hidden" value="WRbJ5x05vvDlzMgzQydFxkUfcFSjSLDhknMHtU6m">
I suggest looking into tools like Mechanize, MechanicalSoup or RoboBrowser that would ease the form submission. You may also parse the HTML yourself with an HTML parser like BeautifulSoup, extract the token, and send it via urllib2 or requests:
import requests
from bs4 import BeautifulSoup

PNR = "0000000000"
url = "http://reg.maths.lth.se/"
login_url = "http://reg.maths.lth.se/login/student"

with requests.Session() as session:
    # extract token
    response = session.get(url)
    soup = BeautifulSoup(response.content, "html.parser")
    token = soup.find("input", {"name": "_token"})["value"]

    # submit form
    session.post(login_url, data={
        "_token": token,
        "pnr": PNR
    })

    # navigate to the main page again (should be logged in)
    response = session.get(url)
    soup = BeautifulSoup(response.content, "html.parser")
    print(soup.title)

Trouble with requests/Beautiful soup

I'm trying to learn to use some web features of Python, and thought I'd practice by writing a script to log in to a webpage at my university. Initially I wrote the code using urllib2, but user alecxe kindly provided me with code using requests/BeautifulSoup (please see: Website form login using Python urllib2).
I am trying to log in to the page http://reg.maths.lth.se/. The page features one login form for students and one for teachers (I am obviously trying to log in as a student). To log in, one should provide a "Personnummer", which is basically the equivalent of a social security number, so I don't want to post my valid number. However, I can reveal that it should be 10 digits long.
The code I was provided (with a small change to the final print statement) is given below:
import requests
from bs4 import BeautifulSoup

PNR = "0000000000"
url = "http://reg.maths.lth.se/"
login_url = "http://reg.maths.lth.se/login/student"

with requests.Session() as session:
    # extract token
    response = session.get(url)
    soup = BeautifulSoup(response.content, "html.parser")
    token = soup.find("input", {"name": "_token"})["value"]

    # submit form
    session.post(login_url, data={
        "_token": token,
        "pnr": PNR
    })

    # navigate to the main page again (should be logged in)
    # response = session.get(url)  ## This is deliberately commented out
    soup = BeautifulSoup(response.content, "html.parser")
    print(soup)
It is thus supposed to print the source code of the page obtained after POSTing the pnr.
While the code runs, it always returns the source code of the main page http://reg.maths.lth.se/, which is not correct. For example, if you manually enter a pnr of the wrong length, e.g. just 0, you are directed to an error page located at the URL http://reg.maths.lth.se/login/student, whose source code is obviously different from that of the main page.
Any suggestions?
You aren't assigning the POST result to response, and are just printing out the result of the first GET request.
So,
# submit form
session.post(login_url, data={
    "_token": token,
    "pnr": PNR
})
should be
response = session.post(login_url, data={
    "_token": token,
    "pnr": PNR
})
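With that change, the response object can also be used to sanity-check the login, for example (exactly where the server redirects is an assumption about the site):
print(response.url)          # the page the server left us on after the POST
print(response.status_code)  # typically 200 even when the login is rejected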

How do I get the url of the response page after I've submitted a form in mechanize?

Using mechanize (and python) I can go to a website, log in, find a form, fill in some answers, and submit that form. However, I don't know how I can open the "response" page - that is, the page that automatically loads once you've submitted the form.
Here's the python code:
br.select_form(name="simTrade")
br.form["symbolTextbox"] = "KO"
br.form["quantityTextbox"] = "10"
br.form["previewOrderButton"]
preview = br.submit()
print(preview.read())
With the above code, I can see what the response page holds. But I want to actually open that page and interact with it. How can I do that with mechanize? Thank you.
EDIT: So I answered my own question soon after posting this. Here's the code:
br.select_form(name="simTrade")
br.form["symbolTextbox"] = symbol
br.form["transactionTypeDropDown"] = [order_type]
br.form["quantityTextbox"] = amount
br.form["previewOrderButton"]
no_url = br.submit()
final = no_url.geturl()
x = br.open(final)
print(x.read())
To get the HTML source code of the response page (the page that loads when you submit a form), I simply had to get the URL from br.submit(), and there's a built-in method for that: geturl().
The OP's answer is a bit convoluted and resulted in an AttributeError. This worked better for me:
br.submit()
base_url = br.geturl()
print(base_url)
Getting the URL of the new page and opening it isn't necessary. Once the form has been submitted, the new page opens automatically and you can start interacting with it using the same mechanize browser object.
Using the original code from your question, if you wanted to submit the form and store all links on the new page in a list:
br.select_form(name="simTrade")
br.form["symbolTextbox"] = "KO"
br.form["quantityTextbox"] = "10"
br.form["previewOrderButton"]
br.submit()
# Here we store all links on the new page,
# but we can use br to do any necessary processing.
links = list(br.links())
# This will take us back to the original page with the "simTrade" form.
br.back()
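More generally, the browser object exposes the current response directly, so after submit() the new page's HTML can be read without re-opening its URL (both calls below are standard mechanize Browser methods):
print(br.geturl())           # URL of the page we landed on
html = br.response().read()  # HTML source of the response page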
