import requests
url = "https://stackoverflow.com/"
payload = {"q": "python"}
s = requests.Session()
r = s.post(url, data=payload)
print(r.text)
I want to use a POST request to obtain the next page. However, the code above prints the source code of the home page, not the next page. Can someone tell me how to obtain the source code of the next page? I have searched through many related questions on Stack Overflow and haven't found a solution.
Thanks in advance.
I understand there are similar questions out there, but I couldn't make this code work. Does anyone know how to log in and scrape the data from this website?
from bs4 import BeautifulSoup
import requests
# Start the session
session = requests.Session()
# Create the payload
payload = {'login': '<USERNAME>',
           'password': '<PASSWORD>'}
# Post the payload to the site to log in
s = session.post("https://www.beeradvocate.com/community/login", data=payload)
# Navigate to the next page and scrape the data
s = session.get('https://www.beeradvocate.com/place/list/?c_id=AR&s_id=0&brewery=Y')
soup = BeautifulSoup(s.text, 'html.parser')
soup.find('div', class_='titleBar')
print(soup)
The process is different for almost every site; the best way to learn it is to use your browser's request inspector (in the Firefox developer tools) and watch how the site behaves when you try to log in.
For your website, clicking the login button sends a POST request to https://www.beeradvocate.com/community/login/login; with a little trial and error you should be able to replicate it.
Make sure you match the Content-Type and request headers (and in particular the cookies, in case you need auth tokens).
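Putting those points together, a minimal sketch might look like the following. The endpoint is the one named above; the form field names ("login", "password") and the headers are assumptions to verify against what the request inspector actually shows:

```python
import requests

# Endpoint observed in the request inspector (see answer above).
LOGIN_URL = "https://www.beeradvocate.com/community/login/login"

def build_login_request(username, password):
    """Assemble the form payload and headers for the login POST.

    The field names here are assumptions; copy the real ones from
    the request the browser sends when you click the login button.
    """
    payload = {"login": username, "password": password}
    headers = {
        # Mirror what the browser sends for a form submission:
        "Content-Type": "application/x-www-form-urlencoded",
        "Referer": "https://www.beeradvocate.com/community/login",
    }
    return payload, headers

def login(username, password):
    """Return a Session holding the auth cookies after logging in."""
    session = requests.Session()  # a Session persists cookies across requests
    payload, headers = build_login_request(username, password)
    session.post(LOGIN_URL, data=payload, headers=headers)
    return session
```

Calling `login("your_username", "your_password")` then lets you issue further `session.get(...)` calls with the auth cookies already set.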
I'm trying to make a web scraper using Python. The website has a login form, though, and I've been trying to log in for a few days with no results. The code looks like this:
import requests
from lxml import html

login_url = "https://nuvola.madisoft.it/login"  # from the edit below

session_requests = requests.Session()
r = session_requests.get(login_url, headers=dict(referer=login_url))
print(r.content)
tree = html.fromstring(r.text)
# note: '@', not '#', selects attributes in XPath
authenticity_token = list(set(tree.xpath('//input[@name="_csrf_token"]/@value')))[0]
payload = {"_csrf_token": authenticity_token, "_username": "-username-", "_password": "-password-"}
r = session_requests.post(login_url, data=payload, headers=dict(referer=login_url))
print(r.content)
You can see I print out r.content both before and after posting to the login page, and in theory I should get different outputs (because the second one should be the content of the actual web page after the login), but unfortunately I get the exact same output.
Here's a screenshot of what the login page requires to log in:
Also, I know for sure that the _csrf_token is correct because I have tested it a few times, so no doubts about that part.
Another thing that might be useful: I don't think I really need to include the headers, because the output is exactly the same with or without them. Thanks in advance.
Edit: the URL is https://nuvola.madisoft.it/login
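One way to debug a login like this is to check the response's final URL and status after the POST: requests follows redirects, so `r.url` tells you where you actually ended up. A minimal diagnostic sketch along those lines (the success check is an assumption: many sites redirect away from /login on success):

```python
import requests
from lxml import html

LOGIN_URL = "https://nuvola.madisoft.it/login"

def extract_csrf_token(page_html):
    """Pull the hidden _csrf_token value out of the login page markup."""
    tree = html.fromstring(page_html)
    return tree.xpath('//input[@name="_csrf_token"]/@value')[0]

def login_and_check(username, password):
    """POST the credentials and report where the site redirected us."""
    session = requests.Session()
    r = session.get(LOGIN_URL)
    payload = {
        "_csrf_token": extract_csrf_token(r.text),
        "_username": username,
        "_password": password,
    }
    r = session.post(LOGIN_URL, data=payload)
    # r.url is the final URL after redirects; still being on /login
    # usually means the credentials or payload were rejected.
    print(r.status_code, r.url)
    return "login" not in r.url
```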
I have been reading a lot about how to submit a form with Python and then read and scrape the resulting page. However, I can't manage to do it for the specific form I am filling in: my code returns the HTML of the form page. Here is my code:
import requests
values = {}
values['archive'] = "1"
values['descripteur[]'] = ["mc82", "mc84"]
values['typeavis[]'] = ["10","6","7","8","9"]
values['dateparutionmin'] = "01/01/2015"
values['dateparutionmax'] = "31/12/2015"
req = requests.post('https://www.boamp.fr/avis/archives', data=values)
print(req.text)
Any suggestions appreciated.
req.text looks like:
You may be posting the data to the wrong page. I accessed the URL and submitted the form, and found that the post data is sent to https://www.boamp.fr/avis/liste. (Sometimes Fiddler is useful for figuring out the process.)
So your code should be:
req = requests.post('https://www.boamp.fr/avis/liste', data=values)
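Putting the corrected endpoint together with the original form values, the whole request might look like this (the field names are taken from the question; verify them against what the browser actually sends):

```python
import requests

# Form values copied from the question; requests encodes the list
# values as repeated form fields, the way a multi-select input submits.
values = {
    "archive": "1",
    "descripteur[]": ["mc82", "mc84"],
    "typeavis[]": ["10", "6", "7", "8", "9"],
    "dateparutionmin": "01/01/2015",
    "dateparutionmax": "31/12/2015",
}

def fetch_results():
    """POST the search form to the endpoint the site actually uses."""
    req = requests.post("https://www.boamp.fr/avis/liste", data=values)
    return req.text
```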
I am trying to use the requests library in Python to post the text content of a text file to a website, submit the text for analysis on that website, and pull the results back into Python. I have read through a number of responses here and on other websites, but have not yet figured out how to adapt the code to a new website.
I'm familiar with Beautiful Soup, so pulling in webpage content and removing HTML isn't an issue; it's submitting the data that I don't understand.
My code currently is:
import requests
fileName = "texttoAnalyze.txt"
fileHandle = open(fileName, 'r')
url_text = fileHandle.read()
url = "http://www.webpagefx.com/tools/read-able/"
payload = {'value': url_text}
r = requests.post(url, data=payload)
print(r.text)
This code comes back with the HTML of the website, but hasn't recognized that I'm trying to submit a form.
Any help is appreciated. Thanks so much.
You need to send the same request the website sends; usually you can capture these with web debugging tools (like the Chrome/Firefox developer tools).
In this case the url the request is being sent to is: http://www.webpagefx.com/tools/read-able/check.php
With the following params: tab=Test+by+Direct+Link&directInput=SOME_RANDOM_TEXT
So your code should look like this:
url = "http://www.webpagefx.com/tools/read-able/check.php"
payload = {'directInput':url_text, 'tab': 'Test by Direct Link'}
r = requests.post(url, data=payload)
print(r.text)
Good luck!
There are two post parameters, tab and directInput:
import requests
post = "http://www.webpagefx.com/tools/read-able/check.php"
with open("in.txt") as f:
    data = {"tab": "Test by Direct Link",
            "directInput": f.read()}
r = requests.post(post, data=data)
print(r.content)
I'm trying to learn to use some web features of Python, and thought I'd practice by writing a script to log in to a webpage at my university. Initially I wrote the code using urllib2, but user alecxe kindly provided me with code using requests/BeautifulSoup (please see: Website form login using Python urllib2).
I am trying to log in to the page http://reg.maths.lth.se/. The page features one login form for students and one for teachers (I am obviously trying to log in as a student). To log in one should provide a "Personnummer", which is basically the equivalent of a social security number, so I don't want to post my valid number. However, I can reveal that it should be 10 digits long.
The code I was provided (with a small change to the final print statement) is given below:
import requests
from bs4 import BeautifulSoup
PNR = "0000000000"  # placeholder: a real pnr is 10 digits
url = "http://reg.maths.lth.se/"
login_url = "http://reg.maths.lth.se/login/student"
with requests.Session() as session:
    # extract token
    response = session.get(url)
    soup = BeautifulSoup(response.content, "html.parser")
    token = soup.find("input", {"name": "_token"})["value"]
    # submit form
    session.post(login_url, data={
        "_token": token,
        "pnr": PNR
    })
    # navigate to the main page again (should be logged in)
    #response = session.get(url) ##This is deliberately commented out
    soup = BeautifulSoup(response.content, "html.parser")
    print(soup)
It is thus supposed to print the source code of the page obtained after POSTing the pnr.
While the code runs, it always returns the source code of the main page http://reg.maths.lth.se/, which is not correct. For example, if you manually enter a pnr of the wrong length, e.g. 0, you are directed to a page at the url http://reg.maths.lth.se/login/student whose source code is obviously different from that of the main page.
Any suggestions?
You aren't assigning the POST result to response, and are just printing out the result of the first GET request.
So,
# submit form
session.post(login_url, data={
    "_token": token,
    "pnr": PNR
})
should be
response = session.post(login_url, data={
    "_token": token,
    "pnr": PNR
})
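For completeness, here is a sketch of the corrected flow with the token extraction factored into a helper; it is unchanged in substance from the question's code, apart from keeping the POST response:

```python
import requests
from bs4 import BeautifulSoup

URL = "http://reg.maths.lth.se/"
LOGIN_URL = "http://reg.maths.lth.se/login/student"

def extract_token(page_html):
    """Read the hidden _token value out of the main page's login form."""
    soup = BeautifulSoup(page_html, "html.parser")
    return soup.find("input", {"name": "_token"})["value"]

def login(pnr):
    """Log in and return the HTML of the page the POST lands on."""
    with requests.Session() as session:
        token = extract_token(session.get(URL).content)
        # keep the POST response instead of discarding it
        response = session.post(LOGIN_URL, data={"_token": token, "pnr": pnr})
        return response.text
```

`login("0000000000")` (with a real 10-digit pnr) should then return the post-login page rather than the main page.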