How to fix 'CSRF verification failed' using requests module - python

I'm using the Python requests library to log in to https://www.udemy.com/join/login-popup/. The problem is that when I use the following header:
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36'}
it returns "CSRF verification failed. Request aborted."
When I change it to:
headers = {'Referer': url}
it returns "Please verify that you are a human."
Any suggestions?
My code:
import requests

with requests.session() as s:
    url = 'https://www.udemy.com/join/login-popup/'
    headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) '
                             'Chrome/74.0.3729.131 Safari/537.36'}
    request = s.get(url, headers=headers)
    cookies = dict(cookies=request.cookies)
    csrf = request.cookies['csrftoken']
    data_login = {'csrfmiddlewaretoken': csrf, 'locale': 'en_US', 'email': 'myemail',
                  'password': 'maypassword'}
    request = s.post(url, data=data_login, headers={'Referer': url}, cookies=cookies['cookies'])
    print(request.content)

There are a couple of issues with your current code:
- The headers you are sending are missing a few things.
- The value you are passing for csrfmiddlewaretoken isn't correct: it has to come from the login form's hidden input, not from the cookie.
- Since you're using requests.session(), you shouldn't pass cookies manually (in this case); the session keeps them for you.
Try this code:
import requests
from bs4 import BeautifulSoup

with requests.session() as s:
    url = 'https://www.udemy.com/join/login-popup/'
    headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:66.0) Gecko/20100101 Firefox/66.0",
               "Referer": "https://www.udemy.com/join/login-popup/",
               "Upgrade-Insecure-Requests": "1"}
    request = s.get(url, headers=headers)
    soup = BeautifulSoup(request.text, "lxml")
    csrf = soup.find("input", {"name": "csrfmiddlewaretoken"})["value"]
    data_login = {'csrfmiddlewaretoken': csrf, 'locale': 'en_US',
                  'email': 'myemail@test.com', 'password': 'maypassword'}
    request = s.post(url, data=data_login, headers=headers)
    print(request.content)
(PS: I'm using the BeautifulSoup library to find the value of csrfmiddlewaretoken in the login form.)
Hope this helps.
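If you want a quick check that the POST was accepted, you can look for the two error messages from the question in the response. These string checks are a heuristic, not anything Udemy documents:
    # Continuing inside the `with` block above - heuristic checks only.
    if 'CSRF verification failed' in request.text:
        print('still failing the CSRF check')
    elif 'Please verify that you are a human' in request.text:
        print('blocked by the bot check')
    else:
        print('no known error text; inspect request.text to confirm the login')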

Related

requests returning HTTP 404 when logging in after POST

I am trying to scrape this course review website for my college, but to do so I need to log in. I think I'm doing everything right in the login process:
The payload is complete with all of the relevant information. I used Inspect Element and the Network tab to verify that I hadn't missed any input fields, and get_authenticity_token successfully scrapes the relevant string.
Maybe I'm doing something wrong in my headers? I just copied someone else's code for that part. I might not even need them.
import requests
from bs4 import BeautifulSoup

session = requests.Session()
session.headers = {'User-Agent': ('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) '
                                  'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36')}

payload = {'email': 'person@email.com',
           'password': 'secret',
           'utf8': '✓',
           'commit': 'Sign In'}

def get_authenticity_token(html):
    soup = BeautifulSoup(html, "html.parser")
    token = soup.find('input', attrs={'name': 'authenticity_token'})
    if not token:
        print('could not find `authenticity_token` on login form')
    return token.get('value').strip()

s = session.get("https://pomonastudents.org/login")
payload.update({'authenticity_token': get_authenticity_token(s.text)})

s = session.post("https://pomonastudents.org/login", data=payload)
print(s.text)
print(payload)
Why might this not be working? What steps can I take to investigate possible causes?
edit: fixed awkward wording and added last sentence.
Here is what I meant. Try this:
import requests
from bs4 import BeautifulSoup

payload = {
    'utf8': '✓',
    'authenticity_token': '',
    'email': 'person@email.com',
    'password': 'secret',
    'commit': 'Sign In'
}

with requests.Session() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36'
    s.headers['X-Requested-With'] = 'XMLHttpRequest'
    res = s.get("https://pomonastudents.org/login")
    soup = BeautifulSoup(res.text, "html.parser")
    payload['authenticity_token'] = soup.select_one("[name='authenticity_token']")["value"]
    s.headers['X-CSRF-Token'] = soup.select_one("[name='csrf-token']")["content"]
    resp = s.post('https://pomonastudents.org/login/credentials', data=payload)
    print(resp.status_code)
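The important differences from your attempt: the POST goes to /login/credentials rather than /login, and the CSRF token is sent twice, once in the form body (authenticity_token) and once as an X-CSRF-Token header, which Rails-style apps expect for XHR requests. To check the login actually worked, you could fetch a page that requires authentication from the same session; the path below is a placeholder assumption:
    # Continuing inside the `with` block above. '/dashboard' is a
    # hypothetical protected path - substitute any page you know
    # requires login.
    check = s.get('https://pomonastudents.org/dashboard')
    print(check.status_code)
    # If the login form comes back, the credentials were rejected.
    print('logged in' if 'authenticity_token' not in check.text else 'still on login form')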

How would I log into Instagram using BeautifulSoup4 and Requests, and how would I figure this out on my own?

I've looked at these two posts on Stack Overflow so far: "I can't login to Instagram with Requests" and "Instagram python requests log in without API". Neither solution works for me.
How would I do this now, and how would someone go about finding which requests to make where? To make that clearer: if I were to send a POST request to log in, how would I know what to send and where to send it?
I don't want to use Instagram's API or Selenium, as I want to try out Requests and (maybe) bs4.
In case you want some code:
import requests
main_url = 'https://www.instagram.com/'
login_url = main_url+'accounts/login/ajax'
user_agent = 'User-Agent: Mozilla/5.0 (iPad; CPU OS 6_0_1 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A523 Safari/8536.25'
session = requests.session()
session.headers = {"user-agent": user_agent}
session.headers.update({'Referer': main_url})
req = session.get(main_url)
session.headers.update({'set-cookie': req.cookies['csrftoken']})
print(req.status_code)
login_data = {"csrfmiddlewaretoken": req.cookies['csrftoken'], "username": "myusername", "password": "mypassword"}
login = session.post(login_url, data=login_data, allow_redirects=True)
print(login.status_code)
session.headers.update({'set-cookie': login.cookies['csrftoken']})
cookies = login.cookies
print(login.headers)
print(login.status_code)
This gives me a 405 error.
You can use this code to log in to Instagram:
import re
import requests
from datetime import datetime

link = 'https://www.instagram.com/accounts/login/'
login_url = 'https://www.instagram.com/accounts/login/ajax/'

time = int(datetime.now().timestamp())
payload = {
    'username': 'login',
    'enc_password': f'#PWD_INSTAGRAM_BROWSER:0:{time}:your_password',
    'queryParams': {},
    'optIntoOneTap': 'false'
}

with requests.Session() as s:
    r = s.get(link)
    csrf = re.findall(r"csrf_token\":\"(.*?)\"", r.text)[0]
    r = s.post(login_url, data=payload, headers={
        "User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36",
        "X-Requested-With": "XMLHttpRequest",
        "Referer": "https://www.instagram.com/accounts/login/",
        "x-csrftoken": csrf
    })
    print(r.status_code)
Hint: I needed to modify the line
r = s.get(link)
to
r = s.get(link, headers={'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'})
to get a proper reply. Without it, I got "page not found" when running in a Jupyter Notebook.
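To tell whether the login actually succeeded, inspect the JSON body rather than just the status code. The authenticated field below is an assumption based on what this endpoint has returned in the past; Instagram can change the shape at any time:
    # Continuing inside the `with` block above. Historically the ajax login
    # endpoint replied with JSON like {"authenticated": true, ...} - treat
    # that shape as an assumption, not a documented contract.
    try:
        print('authenticated:', r.json().get('authenticated'))
    except ValueError:
        # Not JSON - likely a challenge/checkpoint or a block page.
        print(r.text[:200])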

How to post a request to .aspx page with python

I am trying to scrape the following website: https://wwwapps.ncmedboard.org/Clients/NCBOM/Public/LicenseeInformationResults.aspx
In order to get each page to scrape, I first need to run a search on this .aspx page by entering a first name and last name and submitting the search.
Using resources on the internet, I put together the following HTTP POST request:
import requests
from bs4 import BeautifulSoup

url = 'https://wwwapps.ncmedboard.org/Clients/NCBOM/Public/LicenseeInformationResults.aspx'
headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.186 Safari/537.36',
    'Content-Type': 'application/x-www-form-urlencoded',
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'en-GB,en;q=0.9,en-US;q=0.8,zh-TW;q=0.7,zh;q=0.6,zh-CN;q=0.5'
}

session = requests.session()
response = session.get(url, headers={'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.186 Safari/537.36'})
soup = BeautifulSoup(response.content, 'html.parser')

form_data = {
    '__VIEWSTATE': soup.find('input', {'name': '__VIEWSTATE'}).get('value'),
    '__VIEWSTATEGENERATOR': soup.find('input', {'name': '__VIEWSTATEGENERATOR'}).get('value'),
    'waLastName': 'Smith',
    'waFirstName': 'John',
    '__EVENTTARGET': 'btnNext'
}

f = session.post(url, data=form_data, headers=headers)
soup = BeautifulSoup(f.content, 'html.parser')
for a in soup.find_all('a', href=True):
    print("Found the URL:" + a['href'])
It doesn't seem like the POST has any effect: when I look at the HTML after the POST request, it doesn't show the results page. Any pointers on why this is the case?
Thanks!
You may have to set the ASP.NET session cookie, which is generated fresh for each new session. In your website's case (https://wwwapps.ncmedboard.org) that is ASP.NET_SessionId=(the session ID value the site issues you). If the site doesn't validate CSRF strictly, this may be enough to get past it.
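For what it's worth, requests.Session already stores the cookies from the initial GET, so the ASP.NET_SessionId cookie should be sent back automatically; you can verify that directly. Separately, ASP.NET WebForms pages often emit an __EVENTVALIDATION hidden field that must be echoed back on postback, plus an (often empty) __EVENTARGUMENT. A sketch of both checks, meant to slot into the question's code just before the session.post call:
# Verify the session cookie survived the initial GET:
print(session.cookies.get('ASP.NET_SessionId'))

# If the page emits __EVENTVALIDATION, it must be echoed back on postback.
ev = soup.find('input', {'name': '__EVENTVALIDATION'})
if ev is not None:
    form_data['__EVENTVALIDATION'] = ev.get('value')

# A typical WebForms postback also carries an (often empty) __EVENTARGUMENT.
form_data.setdefault('__EVENTARGUMENT', '')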

How to scrape data from Twitter with urllib.request.Request in Python

I am trying to scrape data from Twitter. My previously working code no longer works; Twitter is now sending back strange HTML. I am using different urllib.request.Request calls but still not getting the HTML structure I see in the browser. I could not find a solution to my problem.
Alternative 1:
import urllib.parse
import urllib.request

url = 'https://twitter.com/search?l=&q=%22gsm%22%20since%3A2017-01-01%20until%3A2017-05-02&src=typd'
user_agent = 'Mozilla/5.0 (Windows NT 6.1; Win64; x64)'
values = {'name': 'Michael Foord',
          'location': 'Northampton',
          'language': 'Python'}
headers = {'User-Agent': user_agent}

data = urllib.parse.urlencode(values)
data = data.encode('ascii')
req = urllib.request.Request(url, data, headers)
with urllib.request.urlopen(req) as response:
    the_page = response.read()
Alternative 2:
headers = {}
headers['User-Agent'] = "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.27 Safari/537.17"
req = urllib.request.Request(url, headers=headers)
resp = urllib.request.urlopen(req)
respData = resp.read()
Is there any way I could solve this issue with urllib, or in some other way?
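One thing worth checking in Alternative 1: passing a data argument to urllib.request.Request makes urllib send a POST, which a search URL almost certainly does not expect (and the values dict looks copied from the urllib docs example rather than anything Twitter uses). A GET-only sketch follows; note also that Twitter builds much of its page with JavaScript, so the raw HTML will never match what the browser's inspector shows.
import urllib.request

url = ('https://twitter.com/search?l=&q=%22gsm%22%20since%3A2017-01-01'
       '%20until%3A2017-05-02&src=typd')

# No `data` argument, so urllib issues a GET instead of a POST.
req = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64)'})
with urllib.request.urlopen(req) as response:
    html = response.read().decode('utf-8', errors='replace')
print(html[:500])  # server-rendered HTML only, not the JS-built DOM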

python requests does not POST after redirect

For some reason, Python requests does not re-POST after encountering a redirect header:
import requests

proxies = {'http': 'http://127.0.0.1:8888'}
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.87 Safari/537.36',
    'content-type': 'application/x-www-form-urlencoded'}

r = requests.post(url, data, headers=headers, timeout=timeout, proxies=proxies, allow_redirects=True)
html = r.text
So it seems I can't log in to any form that sits behind a redirect. How can I solve this issue? Thank you!
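This is documented behaviour rather than a bug: like browsers, requests rewrites a POST into a GET when following a 301/302/303 redirect; only 307/308 preserve the method and body. If the target really needs the POST repeated at the new location, one option is to turn off automatic redirects and re-POST yourself. A minimal sketch, with url, data and timeout as placeholders for the question's undefined variables:
import requests
from urllib.parse import urljoin

url = 'http://example.com/login'         # placeholder endpoint
data = {'user': 'me', 'pass': 'secret'}  # placeholder form fields
timeout = 10

r = requests.post(url, data=data, timeout=timeout, allow_redirects=False)
if r.status_code in (301, 302, 303, 307, 308):
    # Resolve a possibly relative Location header, then repeat the POST.
    target = urljoin(url, r.headers['Location'])
    r = requests.post(target, data=data, timeout=timeout)
print(r.status_code)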
