I have been trying to log in to this site with the Python requests module, but I keep seeing "The CSRF token could not be verified". I have tried doing what other answers suggested, but it doesn't seem to work.
import requests

client = requests.Session()
url = 'https://www.biopharmcatalyst.com/account/login/'
client.get(url)
token = client.cookies['CRAFT_CSRF_TOKEN']
headers = {'Cookie': token}
print(token)
login_data = {'loginName': 'login',
              'password': 'pass',
              'CRAFT_CSRF_TOKEN': token}
r1 = client.post(url, data=login_data, headers=dict(Referer=url))
print(r1.text)
I'm really not sure what I'm doing wrong here. When I look at the HTML, I see a different value for CRAFT_CSRF_TOKEN than what the cookie shows under the headers.
The page is setting the CSRF token using JavaScript and the DOM.
You will need to parse the response of the initial GET request to extract the token and then pass that along with your login.
Something like the following:
import re
import requests

client = requests.Session()
url = 'https://www.biopharmcatalyst.com/account/login/'
r = client.get(url)

# The token is embedded in the page as a JavaScript assignment.
match = re.search(r'window\.csrfTokenValue = "(.*?)";', r.text)
if match:
    csrf = match.group(1)
    login_data = {
        'loginName': 'login',
        'password': 'pass',
        'CRAFT_CSRF_TOKEN': csrf,
    }
Handling the headers and so forth will be different depending on what happens after you get the login (I don't have an account).
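If the form fields from your question are right, the final POST might look roughly like this (a sketch only, reusing the client, url and login_data from above and the Referer header your code already sends):
# Untested continuation of the snippet above; I don't have an account.
r1 = client.post(url, data=login_data, headers={'Referer': url})
print(r1.status_code)
print('could not be verified' in r1.text)  # should be False if the token was accepted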
This is what I have so far. I'm very new to this, so please point out if I'm doing anything wrong.
import requests

url = "https://9anime.to/user/watchlist"

payload = {
    "username": "blahblahblah",
    "password": "secretstringy"
    # anything else?
}

with requests.Session() as s:
    res = s.post(url, data=payload)
    print(res.status_code)
You will need to inspect the form and see where it posts to in its action attribute. In this case it posts to user/ajax/login. Instead of posting to the watchlist URL, you should post those details to the loginurl. Once you are logged in you can request your watchlist.
from lxml.html import fromstring
import requests

url = "https://9anime.to/user/watchlist"
loginurl = "https://9anime.to/user/ajax/login"

payload = {
    "username": "someemail#gmail.com",
    "password": "somepass"
}

with requests.Session() as s:
    res = s.post(loginurl, data=payload)
    print(res.content)
    # b'{"success":true,"message":"Login successful"}'
    res = s.get(url)
    tree = fromstring(res.content)
    elem = tree.cssselect("div.watchlist div.widget-body")[0]
    print(elem.text_content())
    # Your watch list is empty.
You would need to have knowledge (documentation of some form) of what that URL is expecting and how you are expected to interact with it. There is no way to know given only the information you have provided.
If you have some system that is already able to interact with that URL (e.g. you're able to log in with your browser), then you could try to reverse-engineer what your browser is doing...
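A minimal sketch of that approach, assuming you copy the request URL, form fields, and headers out of your browser's network tab (the header values below are placeholders, not what your browser actually sends):
import requests

# Everything below should be copied from the browser's dev tools (Network tab)
# after performing the login manually; these values are placeholders.
login_url = "https://9anime.to/user/ajax/login"
form_data = {"username": "blahblahblah", "password": "secretstringy"}
browser_headers = {
    "User-Agent": "Mozilla/5.0",
    "Referer": "https://9anime.to/",
    "X-Requested-With": "XMLHttpRequest",
}

with requests.Session() as s:
    res = s.post(login_url, data=form_data, headers=browser_headers)
    print(res.status_code, res.text)  # compare with what the browser received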
I'm new to web scraping and I just couldn't find the solution to my problem.
I'm stuck at the login page.
import requests

POST_LOGIN_URL = 'https://ocjene.skole.hr/pocetna/prijava'  # Login page
REQUEST_URL = 'https://ocjene.skole.hr/pregled/predmeti'    # Goal page for scraping

with requests.Session() as session:
    session.get(POST_LOGIN_URL)  # Loading all cookies...
    login_page = session.get(POST_LOGIN_URL)  # Login page content (for comparison)
    token = session.cookies["csrf_cookie"]  # This cookie in Chrome holds a valid CSRF token
    payload = {
        'csrf_token': token,
        'user_login': 'xxx',
        'user_password': 'xxx'
    }
    post = session.post(POST_LOGIN_URL, data=payload)  # Logging in...
    afterLogin = session.get(REQUEST_URL)  # This is where I need to get all the content, but...
    print(afterLogin.content)
    print(login_page.content)
    # These two share the exact same content, except the csrf token is different
I'm not sure whether logging in was successful. I double-checked everything; the form data is correct, and I also tried replacing the request headers like so:
post = session.post(POST_LOGIN_URL, data=payload, headers=headers)
What am I missing? Thanks.
It looks like Chrome is posting to posalji/.
Also, inspect post.content after the request; that should tell you whether the login was OK.
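A minimal sketch of that idea (the posalji URL below is a guess based on what the browser shows, and the field names are taken from your question):
import requests

LOGIN_PAGE = 'https://ocjene.skole.hr/pocetna/prijava'
# Guessed from the browser's network tab; verify the exact path there.
LOGIN_POST = 'https://ocjene.skole.hr/pocetna/posalji/'

with requests.Session() as session:
    session.get(LOGIN_PAGE)  # sets the csrf_cookie
    payload = {
        'csrf_token': session.cookies['csrf_cookie'],
        'user_login': 'xxx',
        'user_password': 'xxx',
    }
    post = session.post(LOGIN_POST, data=payload)
    print(post.status_code)
    print(post.content)  # should indicate whether the login succeeded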
The site URL is http://rajresults.nic.in/resbserx18.htm when sending the data, but when the response comes back, the URL changes to an ASP page. So which URL should the request be sent to, the ASP one or the HTML one?
Request:
>>> import requests
>>> # data to fetch the result
>>> para = {'roll_no': '2000000', 'B1': 'Submit'}
>>> # this is the url where the data is entered and the asp response comes back
>>> url = 'http://rajresults.nic.in/resbserx18.htm'
>>> result = requests.post(url, data=para)
>>> result.text
Response
'The page you are looking for cannot be displayed because an invalid method (HTTP verb) is being used.'
Okay, after a little bit of work, I found it's an issue with the headers.
I did some trial and error and found that the service checks that the Host header is set.
To debug this, I incrementally removed Chrome's request headers and found which one this web service was particular about.
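A rough sketch of that kind of elimination (the header set below is illustrative, not the exact list Chrome sent); the end result was the minimal request below.
import requests

url = 'http://rajresults.nic.in/resbserx18.asp'
data = {'roll_no': 2000000, 'B1': 'Submit'}

# Headers copied from Chrome's network tab; values here are placeholders.
browser_headers = {
    'Host': 'rajresults.nic.in',
    'User-Agent': 'Mozilla/5.0',
    'Accept': 'text/html',
    'Referer': 'http://rajresults.nic.in/resbserx18.htm',
}

# Drop one header at a time and re-send; the request that starts failing
# tells you which header the service insists on.
for name in list(browser_headers):
    trimmed = {k: v for k, v in browser_headers.items() if k != name}
    r = requests.post(url, headers=trimmed, data=data)
    print(name, r.status_code, 'invalid' in r.text.lower())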
import requests

headers = {
    "Host": "rajresults.nic.in"
}

r = requests.post('http://rajresults.nic.in/resbserx18.asp',
                  headers=headers,
                  data={'roll_no': 2000000, 'B1': 'Submit'})
print(r.text)
Python newbie here, so I'm sure this is a trivial challenge...
I'm using the Requests module to make a POST request to the Instagram API in order to obtain a code, which is used later in the OAuth process to get an access token. The code is normally retrieved on the client side, as it's provided at the end of the redirect URL.
I have tried using Requests' response history, like this (the client ID is altered for this post):
OAuthURL = "https://api.instagram.com/oauth/authorize/?client_id=cb0096f08a3848e67355f&redirect_uri=https://www.smashboarddashboard.com/whathappened&response_type=code"
OAuth_AccessRequest = requests.post(OAuthURL)
ResHistory = OAuth_AccessRequest.history
for resp in ResHistory:
    print resp.status_code, resp.url
print OAuth_AccessRequest.status_code, OAuth_AccessRequest.url
But the URLs this returns do not reveal the code string; instead, the redirect chain just looks like this:
302 https://api.instagram.com/oauth/authorize/?client_id=cb0096f08a3848e67355f&redirect_uri=https://www.dashboard.com/whathappened&response_type=code
200 https://instagram.com/accounts/login/?force_classic_login=&next=/oauth/authorize/%3Fclient_id%cb0096f08a3848e67355f%26redirect_uri%3Dhttps%3A//www.smashboarddashboard.com/whathappened%26response_type%3Dcode
Whereas if you do this on the client side, in a browser, code would be replaced with the actual string.
Is there a method or approach I can add to the POST request that will give me access to the actual redirect URL that appears in the browser?
It should work in a browser if you are already logged in at Instagram. If you are not logged in you are redirected to a login page:
https://instagram.com/accounts/login/?force_classic_login=&next=/oauth/authorize/%3Fclient_id%3Dcb0096f08a3848e67355f%26redirect_uri%3Dhttps%3A//www.smashboarddashboard.com/whathappened%26response_type%3Dcode
Your Python client is not logged in and so it is also redirected to Instagram's login page, as shown by the value of OAuth_AccessRequest.url:
>>> import requests
>>> OAuthURL = "https://api.instagram.com/oauth/authorize/?client_id=cb0096f08a3848e67355f&redirect_uri=https://www.smashboarddashboard.com/whathappened&response_type=code"
>>> OAuth_AccessRequest = requests.get(OAuthURL)
>>> OAuth_AccessRequest
<Response [200]>
>>> OAuth_AccessRequest.url
u'https://instagram.com/accounts/login/?force_classic_login=&next=/oauth/authorize/%3Fclient_id%3Dcb0096f08a3848e67355f%26redirect_uri%3Dhttps%3A//www.smashboarddashboard.com/whathappened%26response_type%3Dcode'
So, to get to the next step, your Python client needs to log in. This requires that the client extract and set fields to be posted back to the same URL. It also requires cookies, and the Referer header must be properly set. There is a hidden CSRF token that must be extracted from the page (you could use BeautifulSoup, for example), and the form fields username and password must be set. So you would do something like this:
import requests
from bs4 import BeautifulSoup
OAuthURL = "https://api.instagram.com/oauth/authorize/?client_id=cb0096f08a3848e67355f&redirect_uri=https://www.smashboarddashboard.com/whathappened&response_type=code"
session = requests.session() # use session to handle cookies
OAuth_AccessRequest = session.get(OAuthURL)
soup = BeautifulSoup(OAuth_AccessRequest.content)
form = soup.form
login_data = {form.input.attrs['name'] : form.input['value']}
login_data.update({'username': 'your username', 'password': 'your password'})
headers = {'Referer': OAuth_AccessRequest.url}
login_url = 'https://instagram.com{}'.format(form.attrs['action'])
r = session.post(login_url, data=login_data, headers=headers)
>>> r
<Response [400]>
>>> r.json()
{u'error_type': u'OAuthException', u'code': 400, u'error_message': u'Invalid Client ID'}
Which looks like it will work once provided a valid client ID.
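Once the login succeeds, one way to get at the code (untested, since it depends on a valid client ID and on requests being allowed to follow the redirect to your redirect_uri) would be to request the authorize URL again with the logged-in session and pull the code parameter out of the final URL:
# Sketch only: assumes the login above succeeded and that the redirect_uri
# resolves, so requests can follow the redirect chain to the end.
try:
    from urllib.parse import urlparse, parse_qs  # Python 3
except ImportError:
    from urlparse import urlparse, parse_qs      # Python 2

r = session.get(OAuthURL)
final_qs = parse_qs(urlparse(r.url).query)
code = final_qs.get('code', [None])[0]
print(code)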
As an alternative, you could look at mechanize which will handle the form submission for you, including the hidden CSRF field:
import mechanize
OAuthURL = "https://api.instagram.com/oauth/authorize/?client_id=cb0096f08a3848e67355f&redirect_uri=https://www.smashboarddashboard.com/whathappened&response_type=code"
br = mechanize.Browser()
br.open(OAuthURL)
br.select_form(nr=0)
br.form['username'] = 'your username'
br.form['password'] = 'your password'
r = br.submit()
response = r.read()
But this doesn't work because the Referer header is not being set; however, you could use this approach if you can figure out a solution to that.
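One thing worth trying (untested here) is telling mechanize to handle the Referer itself, or pinning the header on every request:
import mechanize

OAuthURL = "https://api.instagram.com/oauth/authorize/?client_id=cb0096f08a3848e67355f&redirect_uri=https://www.smashboarddashboard.com/whathappened&response_type=code"

br = mechanize.Browser()
br.set_handle_referer(True)  # let mechanize track and send the Referer header itself
# Alternatively, force headers on every request (the value here is illustrative):
# br.addheaders = [('Referer', 'https://instagram.com/accounts/login/')]
br.open(OAuthURL)
br.select_form(nr=0)
br.form['username'] = 'your username'
br.form['password'] = 'your password'
r = br.submit()
print(r.read())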
My target is to log in to this website:
http://www.six-swiss-exchange.com/indices/data_centre/login.html
and, once logged in, to access this page:
http://www.six-swiss-exchange.com/downloads/indexdata/composition/close_smic.csv
To do this, I am using requests (the password and email are of course fake here):
import requests

login_url = "http://www.six-swiss-exchange.com/indices/data_centre/login_en.html"
dl_url = "http://www.six-swiss-exchange.com/downloads/indexdata/composition/close_smic.csv"

with requests.Session() as s:
    payload = {
        'username': 'GG#gmail.com',
        'password': 'SummerTwelve'
    }
    r1 = s.post(login_url, data=payload)
    r2 = s.get(dl_url, cookies=r1.cookies)
    print 'You are not allowed' in r2.content
And the script always returns False. I am using Chrome's inspector to check the form fields to fill in; this is what I see when I log in manually:
payload = {
    'viewFrom': 'viewLOGIN',
    'cugname': 'swxindex',
    'forward': '/indices/data_centre/adjustments_en.html',
    'referer': '/ssecom//indices/data_centre/login.html',
    'hashPassword': 'xxxxxxx',
    'username': 'GG#gmail.com',
    'password': '',
    'actionSUBMIT_LOGIN': 'Submit'
}
I tried with this, with no result, where xxxxxxx is the encoded value of SummerTwelve... I clearly do not know how to figure this out! Maybe by setting the headers? Could the server be rejecting requests from scripts?
I had a similar problem today, and in my case the problem was starting the website interaction with a POST. Because of this, I did not have a valid session cookie to provide to the website, and therefore I got the error message "your browser does not support cookies".
The solution was to load the login page once using GET, then send the login data using POST:
s = requests.Session()
r = s.get(url_login)
r = s.post(url_login, data=logindata)
My logindata corresponds to your payload.
With this, the session cookie is managed by the Session object and you don't have to worry about it.
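Applied to your case, a minimal sketch might look like the following (assuming the field names you pulled out of Chrome are what the form actually posts; how hashPassword is derived is still an open question):
import requests

login_url = "http://www.six-swiss-exchange.com/indices/data_centre/login_en.html"
dl_url = "http://www.six-swiss-exchange.com/downloads/indexdata/composition/close_smic.csv"

payload = {
    'viewFrom': 'viewLOGIN',
    'cugname': 'swxindex',
    'forward': '/indices/data_centre/adjustments_en.html',
    'referer': '/ssecom//indices/data_centre/login.html',
    'hashPassword': 'xxxxxxx',  # however the site derives this from the password
    'username': 'GG#gmail.com',
    'password': '',
    'actionSUBMIT_LOGIN': 'Submit'
}

with requests.Session() as s:
    s.get(login_url)                      # GET first so the session cookie is set
    r1 = s.post(login_url, data=payload)  # then POST the login form
    r2 = s.get(dl_url)                    # the Session carries the cookies from here on
    print 'You are not allowed' in r2.content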