I'm trying to get a JSON response from this webpage by running the script below. Every time I run the script, I get status code 403. However, I can see the JSON data when I navigate to the link manually in a browser.
import requests
url = 'https://clutch.co/developers'
link = 'https://clutch.co/directory/facets'
headers = {
    'accept': '*/*',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-US,en;q=0.9,bn;q=0.8',
    'referer': 'https://clutch.co/developers',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36',
}
params = {
    'sort_by': 'Sponsorship',
    'path': '/developers',
    'nonce': 'MvUtFcRmUautzeQV',
    'page': '1',
    'mask': 'false'
}
resp = requests.get(link, params=params, headers=headers)
print(resp.status_code)
print(resp.json())
Try installing the cloudscraper module. This can bypass Cloudflare for the URL you're interested in.
python -m pip install cloudscraper
Then:
import cloudscraper
params = {
    'sort_by': 'Sponsorship',
    'path': '/developers',
    'nonce': 'MvUtFcRmUautzeQV',
    'page': '1',
    'mask': 'false'
}
scraper = cloudscraper.create_scraper()
r = scraper.get('https://clutch.co/directory/facets', params=params)
r.raise_for_status()
print(r.json())
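If the default scraper still runs into a challenge, cloudscraper can mimic a specific browser fingerprint. A minimal sketch, assuming the browser keyword in the form documented in cloudscraper's README:
import cloudscraper

# Assumption: fingerprint options follow cloudscraper's documented
# {'browser': ..., 'platform': ..., 'desktop': ...} form
scraper = cloudscraper.create_scraper(
    browser={'browser': 'chrome', 'platform': 'windows', 'desktop': True}
)
r = scraper.get('https://clutch.co/directory/facets', params=params)
print(r.status_code)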
I'm trying to log in to a website using the requests module. I believe I have replicated the manual login steps in the script, based on what I see in dev tools when logging in to the site manually. However, when I run the script and check the content it receives as a response, I see this line: There was an unexpected error.
I've created a free account there for the purpose of testing only. The login details are hardcoded within the parameters.
import requests
from bs4 import BeautifulSoup
link = 'https://www.apartments.com/customers/login'
login_url = 'https://auth.apartments.com/login?{}'
params = {
    'idsrv.xsrf': '',
    'sessionId': '',
    'username': 'shahin.iqbal80@gmail.com',
    'password': 'SShift1234567$'
}
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36',
}
headers_post = {
    'origin': 'https://auth.apartments.com',
    'referer': '',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-US,en;q=0.9,bn;q=0.8',
    'x-requested-with': 'XMLHttpRequest'
}
with requests.Session() as s:
    s.headers.update(headers)
    resp = s.get(link)
    soup = BeautifulSoup(resp.text, "lxml")
    res = s.get(soup.select_one("#auth-signin-iframe")['src'])
    soup = BeautifulSoup(res.text, "lxml")
    post_url = login_url.format(soup.select_one("[id='signinform']")['action'].split("/login?")[1])
    headers_post['referer'] = post_url
    s.headers.update(headers_post)
    params['idsrv.xsrf'] = soup.select_one("input[name='idsrv.xsrf']")['value']
    params['sessionId'] = soup.select_one("input[id='sessionId']")['value']
    resp = s.post(post_url, data=params)
    print(resp.status_code)
    print(resp.content)
    print(resp.url)
How can I make the login successful using the requests module?
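One thing worth checking first: the keys you post must match the form's input names exactly (the script reads an input named 'idsrv.xsrf', so the params key has to be 'idsrv.xsrf' as well). A minimal sketch that dumps the sign-in form's inputs, assuming the soup object from the script above:
# List every input's name and value to verify the posted keys
# match what the form actually contains
for inp in soup.select("[id='signinform'] input"):
    print(inp.get('name'), '=>', inp.get('value'))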
I'm trying to get text from this site. It is just a simple plain site with only text. When I run the code below, the only thing it prints is a newline. I should mention that the website's content/text is dynamic, so it changes every few minutes. My requests module version is 2.27.1, and I'm using Python 3.9 on Windows.
What could be the problem?
import requests
url = 'https://www.spaceweatherlive.com/includes/live-data.php?object=solar_flare&lang=EN'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36',
}
content = requests.get(url, headers=headers)
print(content.text)
This is an example of how the website should look.
That particular server appears to be gating responses not on the User-Agent, but on the Accept-Encoding settings. You can get a normal response with:
import requests
url = "https://www.spaceweatherlive.com/includes/live-data.php?object=solar_flare&lang=EN"
headers = {
    "Accept-Encoding": "gzip, deflate, br",
}
content = requests.get(url, headers=headers)
print(content.text)
Depending on how the server responds over time, you might need to install the brotli package to allow requests to decompress content compressed with it.
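A quick way to check whether brotli decoding works in your environment is to look at which Content-Encoding the server actually used. A minimal sketch, assuming the same endpoint:
import requests

url = "https://www.spaceweatherlive.com/includes/live-data.php?object=solar_flare&lang=EN"
r = requests.get(url, headers={"Accept-Encoding": "gzip, deflate, br"})
# If this prints 'br' and r.text looks like binary garbage,
# install brotli (python -m pip install brotli) and retry
print(r.headers.get("Content-Encoding"))
print(r.text[:200])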
You just need to use a different User-Agent, like below.
import requests

url = "https://www.spaceweatherlive.com/includes/live-data.php?object=solar_flare&lang=EN"
headers = {
    'User-Agent': 'PostmanRuntime/7.29.0',
    'Accept': '*/*',
    'Cache-Control': 'no-cache',
    'Host': 'www.spaceweatherlive.com',
    'Accept-Encoding': 'gzip, deflate, br',
    'Connection': 'keep-alive'
}
response = requests.get(url, headers=headers)
print(response.text)
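If you want to pin down which header the server actually checks, drop them one at a time and compare responses. A minimal sketch, assuming the endpoint above:
import requests

url = "https://www.spaceweatherlive.com/includes/live-data.php?object=solar_flare&lang=EN"
base = {
    'User-Agent': 'PostmanRuntime/7.29.0',
    'Accept': '*/*',
    'Accept-Encoding': 'gzip, deflate, br',
}
# Remove one header at a time; a change in status or body length
# points at the header the server cares about
for name in list(base):
    trimmed = {k: v for k, v in base.items() if k != name}
    r = requests.get(url, headers=trimmed)
    print(f"without {name}: {r.status_code}, {len(r.text)} bytes")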
Scraping an AJAX web page using python and requests
I used the script from the link above to get a table on the Barchart website, and it somehow stopped working recently with the error message {'error': {'message': 'The payload is invalid.', 'code': 400}}. I guess some of the field names have been changed, but I'm pretty new to web scraping and couldn't figure out how to fix it. Any suggestions?
import requests

geturl = r'https://www.barchart.com/futures/quotes/CLJ19/all-futures'
apiurl = r'https://www.barchart.com/proxies/core-api/v1/quotes/get'
getheaders = {
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-US,en;q=0.9',
    'cache-control': 'max-age=0',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36'
}
getpay = {
    'page': 'all'
}
s = requests.Session()
r = s.get(geturl, params=getpay, headers=getheaders)
headers = {
    'accept': 'application/json',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-US,en;q=0.9',
    'referer': 'https://www.barchart.com/futures/quotes/CLJ19/all-futures?page=all',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36',
    'x-xsrf-token': s.cookies.get_dict()['XSRF-TOKEN']
}
payload = {
    'fields': 'symbol,contractSymbol,lastPrice,priceChange,openPrice,highPrice,lowPrice,previousPrice,volume,openInterest,tradeTime,symbolCode,symbolType,hasOptions',
    'list': 'futures.contractInRoot',
    'root': 'CL',
    'meta': 'field.shortName,field.type,field.description',
    'hasOptions': 'true',
    'raw': '1'
}
r = s.get(apiurl, params=payload, headers=headers)
j = r.json()
print(j)
OUT: {'error': {'message': 'The payload is invalid.', 'code': 400}}
This happened to me too. The website fetches the table from an internal API, and the XSRF cookie has to be URL-decoded before it is sent back, otherwise you get this error.
Try this solution:
1- Import the unquote function at the beginning of your code:
from urllib.parse import unquote
2- Change this line:
'x-xsrf-token': s.cookies.get_dict()['XSRF-TOKEN']
to this:
'x-xsrf-token': unquote(unquote(s.cookies.get_dict()['XSRF-TOKEN']))
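Put back into the original script, the changed part looks like this, as a sketch:
from urllib.parse import unquote

# The XSRF-TOKEN cookie is percent-encoded twice, so decode it twice
# before echoing it back in the x-xsrf-token header
token = unquote(unquote(s.cookies.get_dict()['XSRF-TOKEN']))
headers['x-xsrf-token'] = token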
The task is to get a JSON response from a POST request to a particular website.
Everything works fine in the browser. You can simulate the case yourself by starting to type into the Start Location field.
Web address to check: https://www.hapag-lloyd.com/en/online-business/schedules/interactive-schedule.html
Chrome Dev Tool Screen 1 - Request URL and Header
Chrome Dev Tool Screen 2 - POST data
The JSON response should look like this:
{"rows":[{"LOCATION_COUNTRYABBREV":"GE","LOCATION_BUSINESSPOSTALCODE":"","LOCATION_BUSINESSLOCATIONNAME":"BATUMI","LOCATION_BUSINESSLOCODE":"GEBUS","STANDARDLOCATION_BUSINESSLOCODE":"GEBUS","LOCATION_PORTTYPE":"S","DISPLAYNAME":""}]}
My code is as follows:
import requests
url = 'https://www.hapag-lloyd.com/en/online-business/schedules/interactive-schedule.html?_sschedules_interactive=_raction&action=getTypeAheadService'
POST_QUERY = 'batumi'
params = {
    'query': POST_QUERY,
    'reportname': 'FRTA0101',
    'callConfiguration': "[resultLines=10,readDef1=location_businessLocationName STARTSWITH,readDef2=location_businessLocode STARTSWITH,readClause1=location_businessLocode<>'' AND location_portType='S' AND stdSubLocation_string10='STD',readClause2=location_businessLocode<>'' AND location_portType<>'S' AND stdSubLocation_string10='STD',readClause3=location_businessLocode<>'' AND location_portType='S' AND stdSubLocation_string10='SUB',readClause4=location_businessLocode<>'' AND stdSubLocation_string10='SUB',readClause5=location_businessLocode='' AND stdSubLocation_string10='SUB',sortDef1=location_businessLocationName ASC,resultAttr1=location_businessLocationName,resultAttr2=location_businessLocode,resultAttr3=location_businessPostalCode,resultAttr4=standardLocation_businessLocode,resultAttr5=location_countryAbbrev,resultAttr6=location_portType]"
}
headers = {
    "Accept": "*/*",
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'en-EN,en;q=0.9,en-US;q=0.8,en;q=0.7',
    'Cache-Control': 'no-cache',
    'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'DNT': '1',
    'Host': 'www.hapag-lloyd.com',
    'Origin': 'https://www.hapag-lloyd.com',
    'Pragma': 'no-cache',
    # 'Proxy-Connection': 'keep-alive',
    'Referer': 'https://www.hapag-lloyd.com/en/online-business/schedules/interactive-schedule.html',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36'
}
print('Testing location:', POST_QUERY)
var_cities = requests.post(url, data=params, headers=headers)
print(var_cities.content)  # prints garbled-looking bytes, not readable JSON
My question is: how do I get the right JSON response from this POST request in a Python script?
I think using BeautifulSoup is a better option.
Try this:
Python Convert HTML into JSON using Soup
print(var_cities.text)
This returns the HTML as a string. Is this what you expected to get as a response? To convert it into JSON, look at the answer linked above.
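If the endpoint really returns JSON (as the expected response in the question suggests), decoding the body as text and parsing it may be all that's needed. A minimal sketch, assuming var_cities from the question's script:
import json

text = var_cities.text  # decoded string rather than raw bytes
try:
    data = json.loads(text)  # succeeds if the body is valid JSON
    print(data.get('rows'))
except json.JSONDecodeError:
    print('Not JSON; first 200 chars:', text[:200])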
I am trying to log in to a site using Python (requests) and keep getting a 400 Bad Request error.
I have tried different header formats and even copied the headers from different browsers (Chrome, Edge, Firefox), but I always get a 400 error.
I've tried browsing around but can't find anything that helps.
import requests
with requests.Session() as c:
    url = 'https://developer.clashofclans.com/api/login'
    e = 'xxx@xxx.xxx'
    p = 'yyyyy'
    header = {
        'authority': 'developer.clashofclans.com',
        'method': 'POST',
        'path': '/api/login',
        'scheme': 'https',
        'accept': '*/*',
        'accept-encoding': 'gzip, deflate, br',
        'accept-language': 'en-IN,en-US;q=0.9,en;q=0.8',
        'content-length': '57',
        'content-type': 'application/json',
        'cookie': 'cookieconsent_status=dismiss',
        'origin': 'https://developer.clashofclans.com',
        'referer': 'https://developer.clashofclans.com/',
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36',
        'x-requested-with': 'XMLHttpRequest'
    }
    login_data = dict(email=e, password=p)
    x = c.post(url, data=login_data, headers=header)
    print(x)
Some websites expect the data in JSON format. In requests you can easily do this by using the json parameter, so your code will be something like this:
x = c.post(url, json=login_data, headers=header)
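One follow-up caveat: the hardcoded 'content-length' in the header dict can itself trigger a 400 once the body changes, because requests computes that header automatically. A minimal sketch of the adjusted call, assuming the session setup above:
# Drop the stale hardcoded length and let requests compute it
header.pop('content-length', None)
x = c.post(url, json=login_data, headers=header)
print(x.status_code, x.text[:200])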