How to post a request to an .aspx page with Python

I am trying to scrape the following website: https://wwwapps.ncmedboard.org/Clients/NCBOM/Public/LicenseeInformationResults.aspx
In order to get each page to scrape, I need to first conduct a search on this .aspx page by inputting a first name and last name and initiating the search.
Using resources on the internet, I put together the following HTTP POST request:
import requests
from bs4 import BeautifulSoup

url = 'https://wwwapps.ncmedboard.org/Clients/NCBOM/Public/LicenseeInformationResults.aspx'
headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.186 Safari/537.36',
    'Content-Type': 'application/x-www-form-urlencoded',
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'en-GB,en;q=0.9,en-US;q=0.8,zh-TW;q=0.7,zh;q=0.6,zh-CN;q=0.5'
}

session = requests.session()
response = session.get(url, headers={'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.186 Safari/537.36'})
soup = BeautifulSoup(response.content, 'html.parser')
form_data = {
    '__VIEWSTATE': soup.find('input', {'name': '__VIEWSTATE'}).get('value'),
    '__VIEWSTATEGENERATOR': soup.find('input', {'name': '__VIEWSTATEGENERATOR'}).get('value'),
    'waLastName': 'Smith',
    'waFirstName': 'John',
    '__EVENTTARGET': 'btnNext'
}
f = session.post(url, data=form_data, headers=headers)
soup = BeautifulSoup(f.content, 'html.parser')
for a in soup.find_all('a', href=True):
    print("Found the URL:" + a['href'])
The POST doesn't seem to have any effect: the HTML that comes back after the request is not the results page. Any pointers on why this is the case?
Thanks!

You may have to set the ASP.NET session cookie, which is generated fresh for each new session. In your website's case (https://wwwapps.ncmedboard.org) it is ASP.NET_SessionId=(the session ID value issued by the site).
If CSRF is not validated correctly, this may bypass it.
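To make that concrete, here is a minimal sketch: requests.Session() already stores the ASP.NET_SessionId cookie set by the first GET, so the main things to check are that the POST reuses the same session and that every hidden field the form emits is echoed back (ASP.NET WebForms pages often include __EVENTVALIDATION in addition to __VIEWSTATE). The wa* field names come from the question; everything else is an assumption to verify in the browser's dev tools.

import requests
from bs4 import BeautifulSoup

url = 'https://wwwapps.ncmedboard.org/Clients/NCBOM/Public/LicenseeInformationResults.aspx'

with requests.Session() as session:  # ASP.NET_SessionId cookie is kept automatically
    soup = BeautifulSoup(session.get(url).content, 'html.parser')
    # Echo back every hidden input (__VIEWSTATE, __VIEWSTATEGENERATOR and,
    # if the page emits it, __EVENTVALIDATION), not just the two named above.
    form_data = {tag['name']: tag.get('value', '')
                 for tag in soup.find_all('input', type='hidden')
                 if tag.get('name')}
    form_data.update({'waLastName': 'Smith', 'waFirstName': 'John'})
    response = session.post(url, data=form_data)
    print(session.cookies.get_dict())  # should now contain ASP.NET_SessionId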

Related

Unable to log in to a website using the requests module

I'm trying to log in to a website using the requests module. It seems I have incorporated into the script all the manual steps I see in dev tools when logging in to the site manually. However, when I run the script and check the content it receives as a response, I see this line: "There was an unexpected error."
I've created a free account there for testing purposes only. The login details are hardcoded within the parameters.
import requests
from bs4 import BeautifulSoup

link = 'https://www.apartments.com/customers/login'
login_url = 'https://auth.apartments.com/login?{}'

params = {
    'dsrv.xsrf': '',
    'sessionId': '',
    'username': 'shahin.iqbal80@gmail.com',
    'password': 'SShift1234567$'
}
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36',
}
headers_post = {
    'origin': 'https://auth.apartments.com',
    'referer': '',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-US,en;q=0.9,bn;q=0.8',
    'x-requested-with': 'XMLHttpRequest'
}

with requests.Session() as s:
    s.headers.update(headers)
    resp = s.get(link)
    soup = BeautifulSoup(resp.text, "lxml")
    res = s.get(soup.select_one("#auth-signin-iframe")['src'])
    soup = BeautifulSoup(res.text, "lxml")
    post_url = login_url.format(soup.select_one("[id='signinform']")['action'].split("/login?")[1])
    headers_post['referer'] = post_url
    s.headers.update(headers_post)
    params['dsrv.xsrf'] = soup.select_one("input[name='idsrv.xsrf']")['value']
    params['sessionId'] = soup.select_one("input[id='sessionId']")['value']
    resp = s.post(post_url, data=params)
    print(resp.status_code)
    print(resp.content)
    print(resp.url)
How can I make the login successful using the requests module?
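One hedged way to debug a case like this is to list every input the sign-in form would actually submit, so any hidden fields beyond dsrv.xsrf and sessionId become visible and can be compared against what the browser sends in dev tools. A sketch reusing the selectors from the question (the live site may have changed since):

import requests
from bs4 import BeautifulSoup

link = 'https://www.apartments.com/customers/login'

with requests.Session() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0'  # placeholder UA string
    outer = BeautifulSoup(s.get(link).text, "lxml")
    # The sign-in form lives inside an iframe, as in the question.
    iframe_src = outer.select_one("#auth-signin-iframe")['src']
    inner = BeautifulSoup(s.get(iframe_src).text, "lxml")
    # Dump every field the form would post.
    for tag in inner.select_one("[id='signinform']").find_all('input'):
        print(tag.get('name'), '=>', tag.get('value'))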

Amazon login and access to orders

Problem:
I've read through articles for days and tried to log in to my Amazon account with Python, but I'm failing each time. Since every article takes a different approach, it is very hard to find the potential error source, especially as a lot of the articles are older than 2-3 years.
From my current point of view, the most straightforward way is to use BeautifulSoup (bs4) and requests. Which parser is best is another discussion; I've seen html.parser, html5lib and lxml, and since most Amazon-login-related articles work with html.parser, that is the one currently in my code, even if I would love to use lxml or html5lib later on.
All kinds of input and feedback would help to summarize the important points and workarounds.
I'm currently trying to reach the login page via 'https://www.amazon.de/gp/css/order-history?ref_=nav_orders_first', since 'https://www.amazon.de/ap/signin' gives me an error, at least in my browser. So I'm going to a page that requires a login (my orders) in order to be forwarded to the login page, and trying to log in there. Is there a possibility of being logged out again when making a new request to another subpage, like switching pages? Also, I found an article using with requests.Session() as s: and wonder whether this is a better way to request a site compared to not using with and Session() (see the sketch below). I'm using "de" in the URL, by the way, but you can exchange that with "com" I guess.
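As a point of reference: a Session persists cookies across requests, which is exactly what keeps a login alive between page switches; without one, each call starts with an empty cookie jar. A minimal illustration, using httpbin.org as a neutral stand-in endpoint (nothing Amazon-specific):

import requests

# Without a Session: the cookie set by the first request is gone by the second.
requests.get('https://httpbin.org/cookies/set/session/abc')
print(requests.get('https://httpbin.org/cookies').json())   # {'cookies': {}}

# With a Session: cookies from the server are stored and replayed automatically,
# and the with-block closes the pooled connections when done.
with requests.Session() as s:
    s.get('https://httpbin.org/cookies/set/session/abc')
    print(s.get('https://httpbin.org/cookies').json())       # {'cookies': {'session': 'abc'}}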
Current code:
import bs4
from bs4 import BeautifulSoup
import requests

amazon_orders_url = r'https://www.amazon.de/gp/css/order-history?ref_=nav_orders_first'  # First-time visit login
amazon_login_url = r'https://www.amazon.de/ap/signin'  # Not working by browser access
credentials = {'email': "EMAILADRESS", "password": "PASSWORD"}
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.129 Safari/537.36',
           'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
           'accept-language': 'en,de-DE;q=0.9,de;q=0.8,en-US;q=0.7',
           'referer': 'https://www.amazon.de/ap/signin'}
# print(credentials['email'])  # print email address

with requests.Session() as s:
    s.headers = headers
    site = s.get(amazon_orders_url)  # , headers=headers
    # HTML parsing
    soup = BeautifulSoup(site.content, "html.parser")  # Alternatives: "html5lib", "lxml"
    # Print whole page
    # print(soup)
    # Check if Anmelden/Login exists
    for div in soup.find_all('div', class_='a-box'):
        headline = div.h1
        print(headline)
    signin_data = {s["name"]: s["value"]
                   for s in soup.select("form[name=signIn]")[0].select("input[name]")
                   if s.has_attr("value")}
    # signin_data = {}
    # signin_form = soup.find('form', {'name': 'signIn'})
    # for field in signin_form.find_all('input'):
    #     try:
    #         signin_data[field['name']] = field['value']
    #     except:
    #         pass
    signin_data[u'email'] = credentials['email']
    signin_data[u'password'] = credentials['password']
    post_response = s.post('https://www.amazon.de/ap/signin', data=signin_data)
    soup = BeautifulSoup(post_response.text, "html.parser")
    warning = soup.find('div', {'id': 'message_warning'})
    # if warning:
    #     print('Failed to login: {0}'.format(warning.text))
    print(soup)
    # print(post_response.content)
Also, I have a similar problem in my project. So far I can obtain some of the header parameters needed to enter the login process.
from pprint import pprint
from bs4 import BeautifulSoup
import os
import requests

cookie = AMAZON_COOKIE = os.getenv("AMAZON_COOKIE", "")

# https://read.amazon.com/notebook
res = requests.get('https://read.amazon.com/notebook',
                   headers={
                       'Accept-Encoding': 'gzip, deflate',
                       'Content-Type': 'application/json; charset=UTF-8',
                       'User-Agent': "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:87.0) Gecko/20100101 Firefox/87.0",
                       'Cookie': cookie
                   })
pprint(res.text)

def login():
    signin_page_res = requests.get('https://read.amazon.com/notebook',
                                   headers={
                                       'Accept-Encoding': 'gzip, deflate',
                                       'Content-Type': 'application/json; charset=UTF-8',
                                       'User-Agent': "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:87.0) Gecko/20100101 Firefox/87.0",
                                   })
    soup = BeautifulSoup(signin_page_res.text, 'html.parser')
    login_inputs = {}
    for input_tag in soup.find_all('input'):
        name = input_tag.get('name')
        value = input_tag.get('value')
        if not value and name not in ('email', 'password'):
            continue
        login_inputs[name] = value
    login_inputs['email'] = os.getenv("AMAZON_USERNAME", "email")
    login_inputs['password'] = os.getenv("AMAZON_PASSWORD", "pass")
    pprint(login_inputs)
    login_res = requests.post('https://read.amazon.com/notebook',
                              headers={
                                  'Accept-Encoding': 'gzip, deflate',
                                  'Content-Type': 'application/json; charset=UTF-8',
                                  'User-Agent': "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:87.0) Gecko/20100101 Firefox/87.0",
                              },
                              data=login_inputs)
    print(login_res.text)
    with open('index.html', 'w', encoding="utf-8") as f:
        f.write(login_res.text)

if __name__ == '__main__':
    login()
Recently I found a project called audible that uses the API to connect to an Amazon account and solves the password-encryption problem.

How to fix 'CSRF verification failed' using requests module

I'm using the Python requests library and trying to log in to https://www.udemy.com/join/login-popup/. The problem is that when I use the following header:
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36'}
it returns "CSRF verification failed. Request aborted."
When I change it to:
headers = {'Referer': url}
it returns "Please verify that you are a human."
Any suggestions?
My code:
import requests

with requests.session() as s:
    url = 'https://www.udemy.com/join/login-popup/'
    headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) '
                             'Chrome/74.0.3729.131 Safari/537.36'}
    request = s.get(url, headers=headers)
    cookies = dict(cookies=request.cookies)
    csrf = request.cookies['csrftoken']
    data_login = {'csrfmiddlewaretoken': csrf, 'locale': 'en_US', 'email': 'myemail',
                  'password': 'maypassword'}
    request = s.post(url, data=data_login, headers={'Referer': url}, cookies=cookies['cookies'])
    print(request.content)
There are a couple of issues with your current code:
The header you are using is missing a few things
The value that you are passing for csrfmiddlewaretoken isn't correct
As you're using requests.session() you shouldn't include cookies manually (in this case)
Try this code:
import requests
from bs4 import BeautifulSoup

with requests.session() as s:
    url = 'https://www.udemy.com/join/login-popup/'
    headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:66.0) Gecko/20100101 Firefox/66.0",
               "Referer": "https://www.udemy.com/join/login-popup/",
               "Upgrade-Insecure-Requests": "1"}
    request = s.get(url, headers=headers)
    soup = BeautifulSoup(request.text, "lxml")
    csrf = soup.find("input", {"name": "csrfmiddlewaretoken"})["value"]
    data_login = {'csrfmiddlewaretoken': csrf, 'locale': 'en_US',
                  'email': 'myemail@test.com', 'password': 'maypassword'}
    request = s.post(url, data=data_login, headers=headers)
    print(request.content)
(PS: I'm using the BeautifulSoup library in order to find the value of csrfmiddlewaretoken.)
Hope this helps

Logging into website and scraping data

The website I am trying to log in to is https://realitysportsonline.com/RSOLanding.aspx. I can't seem to get the login to work, since the process is a little different from a typical site with a dedicated login page. I don't get any errors, but the login doesn't take effect, so the subsequent request to the main page redirects to the homepage.
import requests

url = "https://realitysportsonline.com/RSOLanding.aspx"
main = "https://realitysportsonline.com/SetLineup_Contracts.aspx?leagueId=3000&viewingTeam=1"

data = {"username": "", "password": "",
        "vc_btn3 vc_btn3-size-md vc_btn3-shape-rounded vc_btn3-style-3d vc_btn3-color-danger": "Log In"}
header = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36',
          'Referer': 'https://realitysportsonline.com/RSOLanding.aspx',
          'Host': 'realitysportsonline.com',
          'Connection': 'keep-alive',
          'Accept-Language': 'en-US,en;q=0.5',
          'Accept-Encoding': 'gzip, deflate, br',
          'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8'}

s = requests.session()
s.get(url)
r = s.post(url, data, headers=header)
page = requests.get(main)
First of all, you create a session, but then, assuming your POST request worked, you request an authorised page without using the session you previously created.
You need to make the request with the s object you created like so:
page = s.get(main)
However, there were also a few issues with your POST request. You were making a request to the home page instead of the /Login route. You were also missing the Content-Type header.
import requests

url = "https://realitysportsonline.com/Services/AccountService.svc/Login"
main = "https://realitysportsonline.com/LeagueSetup.aspx?create=true"
payload = {"username": "", "password": ""}
headers = {
    'Content-Type': "text/json",
    'Cache-Control': "no-cache"
}

s = requests.session()
response = s.post(url, json=payload, headers=headers)
page = s.get(main)
PS: your main request URL redirects to the homepage, even with a valid session (at least for me).

Using Requests Post to login to this site not working

I know there are tons of threads and videos on how to do this, I've gone through them all and am in need of a little advanced guidance.
I am trying to log into this webpage where I have an account so I can send a request to download a report.
First I send the GET request to the login page, then send the POST request, but when I print(resp.content) I get back the HTML for the login page. I do get a 200 status code, but I can't get to the index page. No matter what page I try to get after the POST, it keeps redirecting me back to the login page.
Here are a couple of things I'm not sure I did correctly:
For the header I just put everything that was listed when I inspected the page
Not sure if I need to do something with the cookies?
Below is my code:
import requests
import urllib.parse

url = 'https://myurl.com/login.php'
next_url = 'https://myurl.com/index.php'
username = 'myuser'
password = 'mypw'
headers = {
    'Host': 'url.myurl.com',
    'Connection': 'keep-alive',
    'Content-Length': '127',
    'Cache-Control': 'max-age=0',
    'Origin': 'https://url.myurl.com',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36',
    'Content-Type': 'application/x-www-form-urlencoded',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Referer': 'https://url.myurl.com/login.php?redirect=1',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'en-US,en;q=0.8',
    'Cookie': 'PHPSESSID=3rgtou3h0tpjfts77kuho4nnm3'
}
login_payload = {
    'XXX_login_name': username,
    'XXX_login_password': password,
}
login_payload = urllib.parse.urlencode(login_payload)

r = requests.Session()
r.get(url, headers=headers)
r.post(url, headers=headers, data=login_payload)
resp = r.get(next_url, headers=headers)
print(resp.content)
You don't need to send separate requests for authorization and for the file download. You can send a single POST that specifies the credentials, and in most cases you don't need to send headers at all. In general your code should look like the following:
import requests
from requests.auth import HTTPBasicAuth

url_to_download = "http://some_site/download?id=100500"
response = requests.post(url_to_download, auth=HTTPBasicAuth('your_login', 'your_password'))
# response.content is bytes, so open the file in binary mode
with open('C:\\path\\to\\save\\file', 'wb') as my_file:
    my_file.write(response.content)
There are a few more fields in the form data to post:
import requests

data = {"redirect": "1",
        "XXX_login_name": "your_username",
        "XXX_login_password": "your_password",
        "XXX_actionSUBMITLOGIN": "Login",
        "XXX_login_php": "1"}

with requests.Session() as s:
    s.headers.update({"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.82 Safari/537.36"})
    r1 = s.get("https://eym.sicomasp.com/login.php")
    s.headers["cookie"] = r1.headers["Set-Cookie"]
    pst = s.post("https://eym.sicomasp.com/login.php", data=data)
    print(pst.history)
You may get redirected to index.php automatically after the post; you can check r1.history and r1.content to see exactly what is happening.
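A small sketch of that check, building on the snippet above (same URLs and form fields, which are assumptions carried over from the answer):

import requests

data = {"redirect": "1", "XXX_login_name": "your_username",
        "XXX_login_password": "your_password",
        "XXX_actionSUBMITLOGIN": "Login", "XXX_login_php": "1"}

with requests.Session() as s:
    pst = s.post("https://eym.sicomasp.com/login.php", data=data)
    # Each entry in .history is an intermediate redirect response.
    for hop in pst.history:
        print(hop.status_code, hop.headers.get("Location"))
    print("landed on:", pst.url)                  # final URL after redirects
    print("session cookies:", s.cookies.get_dict())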
So I figured out what my problem was, just in case anyone in the future has the same issue. I am sure different websites have different requirements, but in this case the Cookie header I was sending in the request was blocking it. What I did was grab my cookie from the headers AFTER I logged in, update my headers, and then send the request. This is what ended up working
(also, the form data needs to be URL-encoded):
import requests
import urllib.parse

headers = {
    'Host': 'eym.sicomasp.com',
    'Content-Length': '62',
    'Origin': 'https://eym.sicomasp.com',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36',
    'Referer': 'https://eym.sicomasp.com/login.php?redirect=1',
    'Cookie': 'PHPSESSID=vdn4er761ash4sb765ud7jakl0; SICOMUSER=31+147234553'
}  # Additional cookie information after logging in ^^^^
data = {
    'XXX_login_name': 'myuser',
    'XXX_login_password': 'mypw',
}
data = urllib.parse.urlencode(data)

with requests.Session() as s:
    s.headers.update(headers)
    resp = s.post('https://eym.sicomasp.com/index.php', data=data)
    print(resp.content)
