Unable to log in to a website using the requests module - python

I'm trying to log in to a website using the requests module. I believe I have replicated in the script the steps I see in dev tools when logging in to the site manually. However, when I run the script and check the response content, I see this line: "There was an unexpected error."
I've created a free account there for the purpose of testing only. The login details are hardcoded within the parameters.
import requests
from bs4 import BeautifulSoup
link = 'https://www.apartments.com/customers/login'
login_url = 'https://auth.apartments.com/login?{}'
params = {
'dsrv.xsrf': '',
'sessionId': '',
'username': 'shahin.iqbal80#gmail.com',
'password': 'SShift1234567$'
}
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36',
}
headers_post = {
'origin': 'https://auth.apartments.com',
'referer': '',
'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
'accept-encoding': 'gzip, deflate, br',
'accept-language': 'en-US,en;q=0.9,bn;q=0.8',
'x-requested-with': 'XMLHttpRequest'
}
with requests.Session() as s:
    s.headers.update(headers)
    resp = s.get(link)
    soup = BeautifulSoup(resp.text,"lxml")
    res = s.get(soup.select_one("#auth-signin-iframe")['src'])
    soup = BeautifulSoup(res.text,"lxml")
    post_url = login_url.format(soup.select_one("[id='signinform']")['action'].split("/login?")[1])
    headers_post['referer'] = post_url
    s.headers.update(headers_post)
    params['dsrv.xsrf'] = soup.select_one("input[name='idsrv.xsrf']")['value']
    params['sessionId'] = soup.select_one("input[id='sessionId']")['value']
    resp = s.post(post_url,data=params)
    print(resp.status_code)
    print(resp.content)
    print(resp.url)
How can I make the login successful using the requests module?
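One thing worth checking: the script reads the token from `input[name='idsrv.xsrf']` but posts it under the key `'dsrv.xsrf'`, so the server may never receive the field it expects. A less error-prone pattern is to copy every named input of the login form verbatim and only overlay the credentials. The sketch below does this with the standard library's `html.parser` instead of BeautifulSoup so it runs stand-alone; `FormFields` and `login_payload` are hypothetical helper names, and it assumes the form is plain HTML (no fields added later by JavaScript).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FormFields(HTMLParser):
    """Collect the action and the named <input> values of one <form>, by id."""

    def __init__(self, form_id):
        super().__init__()
        self.form_id = form_id
        self.in_form = False
        self.action = None
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and attrs.get("id") == self.form_id:
            self.in_form = True
            self.action = attrs.get("action")
        elif tag == "input" and self.in_form and attrs.get("name"):
            # Keep the field name exactly as the form declares it.
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def login_payload(html, page_url, form_id, credentials):
    """Return (absolute POST URL, payload) with the credentials merged in."""
    parser = FormFields(form_id)
    parser.feed(html)
    if parser.action is None:
        raise ValueError(f"form #{form_id} not found")
    payload = {**parser.fields, **credentials}  # credentials override blanks
    return urljoin(page_url, parser.action), payload
```

The returned URL and payload would then go to `s.post(post_url, data=payload)` inside the existing session, after fetching the iframe page as the script already does.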

Related

Failed to get a JSON response using the requests module

I'm trying to get a JSON response from this webpage by running the script below. Every time I run the script, I get status code 403. However, I can see the JSON data when I open that link manually in a browser.
import requests
url = 'https://clutch.co/developers'
link = 'https://clutch.co/directory/facets'
headers = {
'accept': '*/*',
'accept-encoding': 'gzip, deflate, br',
'accept-language': 'en-US,en;q=0.9,bn;q=0.8',
'referer': 'https://clutch.co/developers',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36',
}
params = {
'sort_by': 'Sponsorship',
'path': '/developers',
'nonce': 'MvUtFcRmUautzeQV',
'page': '1',
'mask': 'false'
}
resp = requests.get(link,params=params,headers=headers)
print(resp.status_code)
print(resp.json())
Try installing the cloudscraper module. This can bypass Cloudflare for the URL you're interested in.
python -m pip install cloudscraper
Then:
import cloudscraper
params = {
'sort_by': 'Sponsorship',
'path': '/developers',
'nonce': 'MvUtFcRmUautzeQV',
'page': '1',
'mask': 'false'
}
scraper = cloudscraper.CloudScraper()
(r := scraper.get('https://clutch.co/directory/facets', params=params)).raise_for_status()
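Before reaching for cloudscraper, it can help to confirm that the 403 actually comes from Cloudflare rather than from the application itself. The sketch below is a rough heuristic that assumes Cloudflare's usual `Server: cloudflare` and `CF-Ray` response headers; the function name and the set of status codes are my own choices, not part of requests or cloudscraper.

```python
def looks_like_cloudflare_block(status_code, headers):
    """Heuristic: does this response look like a Cloudflare challenge/block?"""
    # HTTP header names are case-insensitive, so compare them lowercased.
    lower = {k.lower(): v for k, v in headers.items()}
    served_by_cloudflare = (
        lower.get("server", "").lower() == "cloudflare" or "cf-ray" in lower
    )
    return served_by_cloudflare and status_code in (403, 429, 503)
```

With requests you would call it as `looks_like_cloudflare_block(resp.status_code, resp.headers)`; a `False` result suggests the 403 comes from the site's own logic (for example a stale `nonce`), where cloudscraper is less likely to help.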

Can't get data from site using requests in Python

I'm trying to get text from this site. It is just a simple plain site with only text. When I run the code below, the only thing it prints is a newline. I should mention that the website's content is dynamic, so it changes every few minutes. My requests module version is 2.27.1. I'm using Python 3.9 on Windows.
What could be the problem?
import requests
url='https://www.spaceweatherlive.com/includes/live-data.php?object=solar_flare&lang=EN'
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36',
}
content=requests.get(url, headers=headers)
print(content.text)
[Screenshot: an example of how the website should look]
That particular server appears to be gating responses not on the User-Agent, but on the Accept-Encoding settings. You can get a normal response with:
import requests
url = "https://www.spaceweatherlive.com/includes/live-data.php?object=solar_flare&lang=EN"
headers = {
"Accept-Encoding": "gzip, deflate, br",
}
content = requests.get(url, headers=headers)
print(content.text)
Depending on how the server responds over time, you might need to install the brotli package to allow requests to decompress content compressed with it.
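requests (via urllib3) normally undoes the Content-Encoding for you; the sketch below only illustrates what that step involves and why `br` needs an extra package. It uses the standard library's `gzip` and `zlib` and treats the third-party `brotli` module as optional; `decode_body` is a hypothetical helper, not a requests API.

```python
import gzip
import zlib


def decode_body(raw: bytes, content_encoding: str) -> bytes:
    """Decompress a response body according to its Content-Encoding value."""
    enc = content_encoding.strip().lower()
    if enc in ("", "identity"):
        return raw
    if enc == "gzip":
        return gzip.decompress(raw)
    if enc == "deflate":
        # Servers send either zlib-wrapped or raw deflate; try both.
        try:
            return zlib.decompress(raw)
        except zlib.error:
            return zlib.decompress(raw, -zlib.MAX_WBITS)
    if enc == "br":
        try:
            import brotli  # third-party: pip install brotli
        except ImportError as exc:
            raise RuntimeError("server sent br but brotli is not installed") from exc
        return brotli.decompress(raw)
    raise ValueError(f"unsupported Content-Encoding: {content_encoding!r}")
```

If a response comes back as binary garbage or empty after advertising `br`, checking `resp.headers.get("Content-Encoding")` tells you which of these branches the server chose.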
You just need to send headers like the ones below (a User-Agent together with Accept-Encoding and a few others):
import requests
url = "https://www.spaceweatherlive.com/includes/live-data.php?object=solar_flare&lang=EN"
headers = {
'User-Agent': 'PostmanRuntime/7.29.0',
'Accept': '*/*',
'Cache-Control': 'no-cache',
'Host': 'www.spaceweatherlive.com',
'Accept-Encoding': 'gzip, deflate, br',
'Connection': 'keep-alive'
}
response = requests.get(url, headers=headers)
print(response.text)
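The two answers disagree on which header matters, so one way to settle it for a given server is to drop headers one at a time and keep only those the server actually needs. A sketch, where `works` is a hypothetical callable you supply (for example one that performs the request with the given headers and checks that the body is non-empty); `minimal_headers` is also a hypothetical name.

```python
from typing import Callable, Dict


def minimal_headers(headers: Dict[str, str],
                    works: Callable[[Dict[str, str]], bool]) -> Dict[str, str]:
    """Greedily remove headers that the server does not need."""
    if not works(headers):
        raise ValueError("the full header set does not work to begin with")
    needed = dict(headers)
    for name in list(headers):
        trial = {k: v for k, v in needed.items() if k != name}
        if works(trial):
            needed = trial  # this header was not required; leave it out
    return needed
```

Keep in mind that requests adds its own defaults (for example `User-Agent: python-requests/...` and `Accept-Encoding`) for any header you leave out, so "dropped" here really means "left at requests' default", which is exactly the comparison you want in this case.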

Invalid URL when using Python Requests

I am trying to access the API that returns the program data on this page when you scroll down and new tiles are displayed. Looking in Chrome DevTools, I found the API being called and put together the following Requests script:
import cloudscraper
import requests
session = requests.session()
url = 'https://ie.api.atom.nowtv.com/adapter-atlas/v3/query/node?slug=/entertainment/collections/all-entertainment&represent=(items[take=60](items(items[select_list=iceberg])))'
session.headers = {
'Host': 'https://www.nowtv.com',
'Connection': 'keep-alive',
'Accept': 'application/json, text/javascript, */*',
'X-Requested-With': 'XMLHttpRequest',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36',
'Referer': 'https://www.nowtv.com',
'Accept-Encoding': 'gzip, deflate',
'Accept-Language': 'en-GB,en-US;q=0.9,en;q=0.8'
}
scraper = cloudscraper.create_scraper(sess=session)
r = scraper.get(url)
data = r.content
print(data)
session.close()
This is returning the following only:
b'<HTML><HEAD>\n<TITLE>Invalid URL</TITLE>\n</HEAD><BODY>\n<H1>Invalid URL</H1>\nThe requested URL "[no URL]", is invalid.<p>\nReference #9.3c0f0317.1608324989.5902cff\n</BODY></HTML>\n'
I assume the issue is the part at the end of the URL that is in brackets. However, I am not sure how to handle these in a Requests call. Can anyone provide the correct syntax?
Thanks
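The issue is the Host session header value; don't set it. Host must be a bare hostname (here ie.api.atom.nowtv.com), not a URL like https://www.nowtv.com, and requests fills it in from the request URL automatically.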
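When the header is left unset, the HTTP client derives Host from the request URL: just the hostname, plus the port only if it is not the scheme's default. A small sketch of that derivation (the function name is hypothetical; `urllib.parse.urlsplit` is the standard library) shows the value the server expected here.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def host_header_for(url: str) -> str:
    """Return the Host header value an HTTP client would derive from url."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if ":" in host:  # IPv6 literals must be bracketed in the Host header
        host = f"[{host}]"
    port = parts.port
    if port is not None and port != DEFAULT_PORTS.get(parts.scheme):
        return f"{host}:{port}"
    return host
```

For the API URL in the question this yields `ie.api.atom.nowtv.com`, which neither matches nor resembles the `https://www.nowtv.com` value that was being sent.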
That should be enough. But I've done some additional things as well:
add the X-* headers:
session.headers.update({
'X-SkyOTT-Proposition': 'NOWTV',
'X-SkyOTT-Language': 'en',
'X-SkyOTT-Platform': 'PC',
'X-SkyOTT-Territory': 'GB',
'X-SkyOTT-Device': 'COMPUTER'
})
visit the main page without the XHR header set and with a broader Accept header value:
text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
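Put together, these two extra steps might look like the sketch below. It assumes `session` is a `requests.Session` (anything with a `headers` mapping and a `get(url, headers=...)` method works) and that the page to visit first is `https://www.nowtv.com`, since the question's page link isn't shown; `prepare_session` and the constant names are hypothetical.

```python
MAIN_PAGE = "https://www.nowtv.com"  # assumption: the page to visit first

BROWSER_ACCEPT = (
    "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8"
)

SKYOTT_HEADERS = {
    "X-SkyOTT-Proposition": "NOWTV",
    "X-SkyOTT-Language": "en",
    "X-SkyOTT-Platform": "PC",
    "X-SkyOTT-Territory": "GB",
    "X-SkyOTT-Device": "COMPUTER",
}


def prepare_session(session):
    """Add the X-SkyOTT headers, then visit the main page like a browser."""
    session.headers.update(SKYOTT_HEADERS)
    # Drop the bad Host value and the XHR marker from the session defaults.
    session.headers.pop("Host", None)
    session.headers.pop("X-Requested-With", None)
    # Broad Accept, as for a normal page load; any cookies set stay on the session.
    return session.get(MAIN_PAGE, headers={"Accept": BROWSER_ACCEPT})
```

The API call afterwards adds `X-Requested-With` and the JSON-oriented `Accept` value per request rather than on the session.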
I've also passed the GET parameters via params; I don't think you have to, but it's cleaner:
In [33]: url = 'https://ie.api.atom.nowtv.com/adapter-atlas/v3/query/node'
In [34]: response = session.get(url, params={
'slug': '/entertainment/collections/all-entertainment',
'represent': '(items[take=60,skip=2340](items(items[select_list=iceberg])))'
}, headers={
'Accept': 'application/json, text/plain, */*',
'X-Requested-With':'XMLHttpRequest'
})
In [35]: response
Out[35]: <Response [200]>
In [36]: response.text
Out[36]: '{"links":{"self":"/adapter-atlas/v3/query/node/e5b0e516-2b84-11e9-b860-83982be1b6a6"},"id":"e5b0e516-2b84-11e9-b860-83982be1b6a6","type":"CATALOGUE/COLLECTION","segmentId":"","segmentName":"default","childTypes":{"next_items":{"nodeTypes":["ASSET/PROGRAMME","CATALOGUE/SERIES"],"count":68},"items":{"nodeTypes":["ASSET/PROGRAMME","CATALOGUE/SERIES"],"count":2376},"curation-config":{"nodeTypes":["CATALOGUE/CURATIONCONFIG"],"count":1}},"attributes":{"childNodeTyp
...
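On the original worry about the brackets: since the Host header was the actual problem, they appear to be harmless either way. When the query is passed via `params`, requests form-encodes it with the standard library's `urlencode`, so the brackets and parentheses are percent-encoded and the server decodes them back. The stdlib sketch below mirrors (rather than calls) that encoding and checks that it round-trips.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

params = {
    "slug": "/entertainment/collections/all-entertainment",
    "represent": "(items[take=60](items(items[select_list=iceberg])))",
}

# Percent-encode the query the same way params= does.
query = urlencode(params)
url = "https://ie.api.atom.nowtv.com/adapter-atlas/v3/query/node?" + query
print(url)

# Decoding the query string gives back the original values unchanged.
decoded = parse_qs(urlsplit(url).query)
```

Comparing the printed URL with `response.url` from the session call is a quick way to confirm what was actually sent.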

How should I fix the bad request response I am getting when sending a POST request?

I am trying to log in to a site using Python (Requests) and keep getting a 400 Bad Request error.
I have tried different header formats and even copied the headers from different browsers (Chrome, Edge, Firefox), but I always get a 400 error.
I've tried browsing around but can't find anything that would help me.
import requests
with requests.Session() as c:
    url = 'https://developer.clashofclans.com/api/login'
    e='xxx#xxx.xxx'
    p='yyyyy'
    header = {'authority': 'developer.clashofclans.com',
              'method': 'POST',
              'path': '/api/login',
              'scheme': 'https',
              'accept': '*/*',
              'accept-encoding': 'gzip, deflate, br',
              'accept-language': 'en-IN,en-US;q=0.9,en;q=0.8',
              'content-length': '57',
              'content-type': 'application/json',
              'cookie': 'cookieconsent_status=dismiss',
              'origin': 'https://developer.clashofclans.com',
              'referer': 'https://developer.clashofclans.com/',
              'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36',
              'x-requested-with': 'XMLHttpRequest'}
    login_data = dict(email=e,password=p)
    x = c.post(url,data=login_data,headers=header)
    print(x)
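Some websites expect the data in JSON format. Your header already declares content-type: application/json, but data= sends a form-encoded body, which the server most likely can't parse. In requests you can easily send JSON by using the json parameter instead, so your code will be something like this:
x = c.post(url, json=login_data, headers=header)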
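To see the difference between the two parameters, the stdlib sketch below builds roughly what `data=` and `json=` put on the wire for the same dictionary. It also drops headers that don't belong in a hand-built request: the HTTP/2 pseudo-headers (`:authority`, `:method`, `:path`, `:scheme`) that Chrome DevTools lists alongside real headers, and a copied `content-length`, which requests computes from the actual body anyway. `clean_copied_headers` is a hypothetical helper and the credentials are placeholders.

```python
import json
from urllib.parse import urlencode

# Pseudo-headers and Content-Length should not be copied from DevTools.
DROP = {"authority", "method", "path", "scheme", "content-length"}


def clean_copied_headers(headers):
    """Remove pseudo-headers and a stale Content-Length from copied headers."""
    return {k: v for k, v in headers.items()
            if k.lower().lstrip(":") not in DROP}


login_data = {"email": "xxx@xxx.xxx", "password": "yyyyy"}  # placeholders

# data=login_data sends a form-encoded body ...
form_body = urlencode(login_data)
# ... while json=login_data sends a JSON document.
json_body = json.dumps(login_data)
```

The cleaned dictionary can then be passed as `headers=` to `c.post(url, json=login_data, headers=...)`.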

Logging into website and scraping data

The website I am trying to log in to is https://realitysportsonline.com/RSOLanding.aspx. I can't get the login to work, since the process is a little different from a typical site with a dedicated login page. I don't get any errors, but the login doesn't take effect, so the request for main redirects to the homepage.
import requests
url = "https://realitysportsonline.com/RSOLanding.aspx"
main = "https://realitysportsonline.com/SetLineup_Contracts.aspx?leagueId=3000&viewingTeam=1"
data = {"username": "", "password": "", "vc_btn3 vc_btn3-size-md vc_btn3-shape-rounded vc_btn3-style-3d vc_btn3-color-danger" : "Log In"}
header = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36',
'Referer': 'https://realitysportsonline.com/RSOLanding.aspx',
'Host': 'realitysportsonline.com',
'Connection': 'keep-alive',
'Accept-Language': 'en-US,en;q=0.5',
'Accept-Encoding': 'gzip, deflate, br',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8'}
s = requests.session()
s.get(url)
r = s.post(url, data, headers=header)
page = requests.get(main)
First of all, you create a session, but even assuming your POST request worked, you then request the authorised page with plain requests.get instead of that session, so the login cookies are never sent.
You need to make the request with the s object you created like so:
page = s.get(main)
However, there were also a few issues with your POST request. You were making a request to the home page instead of the /Login route. You were also missing the Content-Type header.
import requests
url = "https://realitysportsonline.com/Services/AccountService.svc/Login"
main = "https://realitysportsonline.com/LeagueSetup.aspx?create=true"
payload = {"username":"","password":""}
headers = {
'Content-Type': "text/json",
'Cache-Control': "no-cache"
}
s = requests.session()
response = s.post(url, json=payload, headers=headers)
page = s.get(main)
PS: your main request URL redirects to the homepage, even with a valid session (at least for me).
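Both points (reusing the session and noticing a silent redirect) can be folded into one helper. A sketch, assuming a `requests.Session`-like object and treating a redirect away from the requested path as a sign that the login did not stick; `login_and_fetch` is hypothetical, and as the PS notes, a site may also redirect with a valid session, so treat the error as a hint rather than proof.

```python
from urllib.parse import urlsplit


def login_and_fetch(session, login_url, payload, page_url, headers=None):
    """Log in, then fetch page_url with the SAME session.

    Raises RuntimeError if the final URL's path differs from page_url's,
    which often means the site bounced an unauthenticated request.
    """
    login = session.post(login_url, json=payload, headers=headers)
    login.raise_for_status()
    # Reusing the session sends the cookies set by the login response.
    page = session.get(page_url)
    page.raise_for_status()
    if urlsplit(page.url).path != urlsplit(page_url).path:
        raise RuntimeError(f"redirected to {page.url}; login probably failed")
    return page
```

With the answer's values this would be called as `login_and_fetch(s, url, payload, main, headers=headers)`.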
