I'm using the Python requests module to log in to a web page. I get the csrf_token by doing a GET request on the URL and parsing the response with BeautifulSoup, which works fine. But when I do a POST with the csrf_token from the GET, the server's csrf_token has changed, and I can't log in with an invalid csrf_token.
I understand that the CSRF token changes from one GET to the next, but it shouldn't change between a GET and the POST that follows it.
How do I get the csrf_token to not change? I do have access to the source too, but I didn't write it.
When I step into the code with pdb.set_trace() I can change the csrf_token to what I got from the GET and continue then everything works.
Here is the requests code I have:
import sys
import requests
from bs4 import BeautifulSoup
#~ URL = 'https://portal.bitcasa.com/login'
URL = 'http://0.0.0.0:12080/login'
EMAIL = 'foo#foo.com'
PASSWORD = 'abc123'
CLIENT_ID = 12345
client = requests.session(config={'verbose': sys.stderr})
# Retrieve the CSRF token first
soup = BeautifulSoup(client.get(URL).content)
csrftoken = soup.find('input', dict(name='csrf_token'))['value']
print csrftoken
# Parameters to pass
data = dict(email=EMAIL, password=PASSWORD, csrf_token=csrftoken)
headers = dict(referer=URL)
params = dict(client_id=CLIENT_ID)
r = client.post(URL, data=data, headers=headers, params=params)
print r
print r.text
I can login to other web pages with this method.
What other information can I provide to help you help me?
Thanks
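One thing worth ruling out is that the login form carries more hidden fields than just csrf_token. Below is a minimal sketch, assuming the form looks like the hypothetical SAMPLE_PAGE, that collects every hidden input so the POST payload mirrors what a browser would send. It uses the standard library's html.parser instead of BeautifulSoup only so that it runs without extra dependencies; HiddenInputCollector and hidden_fields are names made up for this example.

```python
from html.parser import HTMLParser


class HiddenInputCollector(HTMLParser):
    """Collect name/value pairs of every <input type="hidden"> in a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        is_hidden = (attr.get("type") or "").lower() == "hidden"
        if is_hidden and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def hidden_fields(html):
    """Return a dict of the hidden form fields found in the HTML text."""
    collector = HiddenInputCollector()
    collector.feed(html)
    collector.close()
    return collector.fields


# Hypothetical login page, for illustration only.
SAMPLE_PAGE = """
<form method="post" action="/login">
  <input type="hidden" name="csrf_token" value="abc123token">
  <input type="text" name="email">
  <input type="password" name="password">
</form>
"""

# Start from the hidden fields, then add the credentials on top.
payload = hidden_fields(SAMPLE_PAGE)
payload.update(email="user@example.com", password="secret")
print(sorted(payload))
```

With a real site, the HTML would come from client.get(URL).text, and the same client object would then send the POST so that the session cookie tied to the token is reused.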
I have the same use case as here. I would like to access a GitLab page (in a private repo) to get its HTML content, but it always redirects me to the sign-in page, even though I have already authenticated using the approach I refer to here.
Below is my code:
import urllib, re, sys, requests
from bs4 import BeautifulSoup
LOGIN_URL = 'https://gitlab.devtools.com//users/auth/ldapmain/callback'
session = requests.Session()
data = {'username': username,
'password': password,
'authenticity_token': token}
r = session.post(LOGIN_URL, data=data)
print r.status_code
url = "https://gitlab.devtools.com/Sandbox/testing/merge_requests/2"
html = session.get(url)
print html.url
Any idea on this? Am I missing anything?
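A status code of 200 on the login POST does not by itself show that the login worked, because requests follows redirects and a rejected login often ends on the sign-in page again. Here is a rough sketch of how to check where a response actually ended up. It relies only on the url and history attributes that requests.Response provides; the /users/sign_in path and the stand-in response objects are assumptions for illustration.

```python
from types import SimpleNamespace
from urllib.parse import urlparse


def landed_on_login(response, login_path="/users/sign_in"):
    """Return True if the final URL of the response is the sign-in page.

    login_path is an assumption; adjust it to the site's real login path.
    """
    final_path = urlparse(response.url).path.rstrip("/")
    return final_path == login_path.rstrip("/")


def redirect_chain(response):
    """List the URLs visited on the way to the final response."""
    return [hop.url for hop in response.history] + [response.url]


# Offline stand-ins shaped like requests.Response (hypothetical data).
MR_URL = "https://gitlab.devtools.com/Sandbox/testing/merge_requests/2"
SIGN_IN_URL = "https://gitlab.devtools.com/users/sign_in"

first_hop = SimpleNamespace(url=MR_URL, history=[], status_code=302)
bounced = SimpleNamespace(url=SIGN_IN_URL, history=[first_hop], status_code=200)

print(landed_on_login(bounced))
print(redirect_chain(bounced))
```

In the code above, passing html (the result of session.get(url)) to landed_on_login would show whether the session was still unauthenticated at that point.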
I'm trying to scrape the website 'https://laboral.pjud.cl/SITLAPORWEB/InicioAplicacionPortal.do', but each time I get the same page with an error. I think the problem is that I have to authenticate on this website first.
I've tried creating a session object and sending a POST request, but nothing seems to change.
import requests
from bs4 import BeautifulSoup
from requests.auth import HTTPBasicAuth
username = 'user'
password = 'pass'
scrape_url = 'https://laboral.pjud.cl/SITLAPORWEB/InicioAplicacionPortal.do'
login_url = 'https://laboral.pjud.cl/SITLAPORWEB/jsp/LoginPortal/LoginPortal.jsp'
r = requests.get(login_url, auth=HTTPBasicAuth(username, password))
print(r.text)
>>>
<form name="InicioAplicacionForm" method="POST"
action="/SITLAPORWEB/InicioAplicacionPortal.do"><INPUT
type="hidden" name="FLG_Autoconsulta" value="1"><input
type="hidden" name="D0E0F02E"
value="764C8AA111F42E621BC10BA16CD8D8B2">
</form><script>document.InicioAplicacionForm.submit();</script>
login_info = {'username': username,'password': password, "D0E0F02E":"764C8AA111F42E621BC10BA16CD8D8B2"}
session = requests.session()
session.post(url=login_url, data=login_info)
url = session.get(url=scrape_url)
soup = BeautifulSoup(url.content, 'html.parser')
print(soup)
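The response printed above is a form that the browser submits automatically through the script tag, and its hidden field (D0E0F02E here) looks like a generated name and value that may differ on every request. Hard-coding it is therefore unlikely to work. A possible sketch, not a confirmed fix, is to read the form's action and fields from whatever the server returned and replay them. FirstFormParser and parse_first_form are hypothetical names, and the standard library parser is used so the example runs offline.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FirstFormParser(HTMLParser):
    """Record the action, method and named inputs of the first <form>."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.method = "get"
        self.fields = {}
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "form" and not self._in_form and not self._done:
            self._in_form = True
            self.action = attr.get("action") or ""
            self.method = (attr.get("method") or "get").lower()
        elif tag == "input" and self._in_form and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True


def parse_first_form(html, page_url):
    """Return (absolute_action_url, method, fields) for the first form."""
    parser = FirstFormParser()
    parser.feed(html)
    parser.close()
    if parser.action is None:
        raise ValueError("no <form> found in the page")
    return urljoin(page_url, parser.action), parser.method, parser.fields


# The HTML printed in the question, used here as offline test data.
LOGIN_URL = "https://laboral.pjud.cl/SITLAPORWEB/jsp/LoginPortal/LoginPortal.jsp"
RESPONSE_HTML = """<form name="InicioAplicacionForm" method="POST"
action="/SITLAPORWEB/InicioAplicacionPortal.do"><INPUT
type="hidden" name="FLG_Autoconsulta" value="1"><input
type="hidden" name="D0E0F02E"
value="764C8AA111F42E621BC10BA16CD8D8B2">
</form><script>document.InicioAplicacionForm.submit();</script>"""

action_url, method, fields = parse_first_form(RESPONSE_HTML, LOGIN_URL)
print(action_url, method, fields)
```

With a live session, the idea would be to call session.get(login_url), parse that response this way, and then session.post(action_url, data=fields) on the same session, so that the token and the cookies belong to the same visit.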
I'm new to web scraping and I just couldn't find the solution to my problem.
I'm stuck at the login page.
import requests
POST_LOGIN_URL = 'https://ocjene.skole.hr/pocetna/prijava' # Login page
REQUEST_URL = 'https://ocjene.skole.hr/pregled/predmeti' # Goal page for scraping
with requests.Session() as session:
    session.get(POST_LOGIN_URL)  # Loading all cookies...
    login_page = session.get(POST_LOGIN_URL)  # Login page content (for comparison)
    token = session.cookies["csrf_cookie"]  # This cookie on chrome has a valid csrf token
    payload = {
        'csrf_token': token,
        'user_login': 'xxx',
        'user_password': 'xxx'
    }
    post = session.post(POST_LOGIN_URL, data=payload)  # Logging in...
    afterLogin = session.get(REQUEST_URL)  # This is where I need to get all the content, but...
    print(afterLogin.content)
    print(login_page.content)
    # These two share exact same content, except the csrf token is different
I'm not sure if logging in was successful. I double-checked everything,
the form data is correct and I also tried replacing the request headers like so:
post = session.post(POST_LOGIN_URL, data=payload, headers=headers)
What am I missing? thanks.
It looks like Chrome is posting to posalji/.
Also inspect post.content after the request; that should tell you whether the login was OK.
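Inspecting post.content by eye works, but the check can also be automated. The sketch below assumes that a failed login returns the login form again, so it simply looks for the form's input names (user_login and user_password, taken from the payload in the question) in the response body. The helper names and the two sample pages are made up for illustration.

```python
from html.parser import HTMLParser


class InputNameCollector(HTMLParser):
    """Collect the name attribute of every <input> in a page."""

    def __init__(self):
        super().__init__()
        self.names = set()

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            name = dict(attrs).get("name")
            if name:
                self.names.add(name)


def login_form_present(body, marker_fields=("user_login", "user_password")):
    """Return True if every marker field appears as an input in the body.

    Accepts str or bytes, so both response.text and response.content work.
    """
    if isinstance(body, bytes):
        body = body.decode("utf-8", errors="replace")
    collector = InputNameCollector()
    collector.feed(body)
    collector.close()
    return all(field in collector.names for field in marker_fields)


# Hypothetical response bodies, for illustration only.
LOGIN_PAGE = b"""
<form method="post">
  <input type="hidden" name="csrf_token" value="t0k3n">
  <input type="text" name="user_login">
  <input type="password" name="user_password">
</form>
"""
LOGGED_IN_PAGE = "<h1>Subjects</h1><ul><li>Mathematics</li></ul>"

print(login_form_present(LOGIN_PAGE))
print(login_form_present(LOGGED_IN_PAGE))
```

If login_form_present(post.content) is still True after the POST, the server most likely rejected the request or it went to the wrong URL.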
After some discussion of my problem in "Unable to print links using beautifulsoup while automating through selenium",
I realized that the main problem is the URL, which requests is not able to fetch. The URL of the page is actually https://society6.com/discover, but I am using Selenium to log in to my account, so the URL becomes https://society6.com/society?show=2
However, I can't use the second URL with requests, since it shows an error. How do I scrape information from a URL like this?
You need to log in first!
To do that, you can use requests to submit the login form and the bs4.BeautifulSoup library to read the CSRF token from the login page.
Here is an implementation that I have used:
import requests
from bs4 import BeautifulSoup
BASE_URL = "https://society6.com/"
def log_in_and_get_session():
    """
    Get the session object with login details
    :return: requests.Session
    """
    ss = requests.Session()
    ss.verify = False  # optional, for sites without a valid certificate
    text = ss.get(f"{BASE_URL}login").text
    csrf_token = BeautifulSoup(text, "html.parser").input["value"]
    data = {"username": "your_username", "password": "your_password", "csrfmiddlewaretoken": csrf_token}
    results = ss.post("{}login".format(BASE_URL), data=data)
    if results.ok:
        print("Login success", results.status_code)
        return ss
    else:
        print("Can't login", results.status_code)
This uses the session's post method to log in.
Hope this helps you!
Edit
Added the beginning of the function.
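One caveat with the function above: .input["value"] takes the first input on the page, which is only the CSRF token if the token happens to come first. A slightly more defensive variant, sketched below, looks the field up by name. It assumes the field is called csrfmiddlewaretoken, as in the payload above; NamedInputFinder, log_in and FakeSession are hypothetical names, and FakeSession only stands in for requests.Session so the example runs offline.

```python
from html.parser import HTMLParser
from types import SimpleNamespace


class NamedInputFinder(HTMLParser):
    """Find the value of the first <input> with a given name."""

    def __init__(self, name):
        super().__init__()
        self.name = name
        self.value = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "input" and attr.get("name") == self.name and self.value is None:
            self.value = attr.get("value") or ""


def log_in(session, login_url, username, password, token_field="csrfmiddlewaretoken"):
    """GET the login page, pick the token by field name, then POST.

    session needs get() and post() like requests.Session does.
    """
    page = session.get(login_url)
    finder = NamedInputFinder(token_field)
    finder.feed(page.text)
    finder.close()
    if finder.value is None:
        raise ValueError(f"no input named {token_field!r} on {login_url}")
    data = {"username": username, "password": password, token_field: finder.value}
    return session.post(login_url, data=data, headers={"Referer": login_url})


class FakeSession:
    """Offline stand-in for requests.Session (illustration only)."""

    def __init__(self, page_html):
        self.page_html = page_html
        self.posted = None

    def get(self, url):
        return SimpleNamespace(text=self.page_html, status_code=200)

    def post(self, url, data=None, headers=None):
        self.posted = (url, data, headers)
        return SimpleNamespace(text="", status_code=200)


# A page where the first input is a search box, not the token.
PAGE = (
    '<input type="text" name="q" value="search">'
    '<input type="hidden" name="csrfmiddlewaretoken" value="tok42">'
)
fake = FakeSession(PAGE)
log_in(fake, "https://example.com/login", "your_username", "your_password")
print(fake.posted[1])
```

Passing a real requests.Session() instead of FakeSession should work the same way, since only get and post are called on it.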
I am trying to log in to a website using requests and a POST before I scrape some data.
The login does not seem to work, as in, the data I get before and after I login does not differ. However if I manually login using my browser the data before and after login is different, for example I can see my profile on the main page. I have also put the login in a try-except format to see if it's showing any exceptions, but with no luck.
I have checked and made sure I am inputting all the 'form data' requested by login on the page.
Any suggestions would be greatly appreciated.
My code is below:
import urllib
import requests
from bs4 import BeautifulSoup as soup
POST_LOGIN_URL = 'https://wex.nz/login'
REQUEST_URL = 'https://wex.nz'
payload = {'email':'testemail#gmail.com','password':'testpassword'}
with requests.Session() as session:
    try:
        post = session.post(POST_LOGIN_URL, data=payload, headers={"Referer": "https://wex.nz"})
    except:
        print('login failed')
    r = session.get(REQUEST_URL)
    page_html = r.content
    page_soup = soup(page_html, "html.parser")
    profile_container = page_soup.findAll("div", {"class": "profile"})
    print(profile_container)