Can't get data from site using requests in Python

Can't get data from site using requests in Python - python

I'm trying to get text from this site. It is just a simple plain site with only text. When running the code below, the only thing it prints out is a newline. I should say that websites content/text is dynamic, so it changes over a few minutes. My requests module version is 2.27.1. I'm using Python 3.9 on Windows.
What could be the problem?
import requests
url='https://www.spaceweatherlive.com/includes/live-data.php?object=solar_flare&lang=EN'
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36',
}
content=requests.get(url, headers=headers)
print(content.text)
This is the example of how the website should look.

That particular server appears to be gating responses not on the User-Agent, but on the Accept-Encoding settings. You can get a normal response with:
import requests
url = "https://www.spaceweatherlive.com/includes/live-data.php?object=solar_flare&lang=EN"
headers = {
"Accept-Encoding": "gzip, deflate, br",
}
content = requests.get(url, headers=headers)
print(content.text)
Depending on how the server responds over time, you might need to install the brotli package to allow requests to decompress content compressed with it.

You just need to add user-agent like below.
import requests
url = "https://www.spaceweatherlive.com/includes/live-data.php?object=solar_flare&lang=EN"
payload={}
headers = {
'User-Agent': 'PostmanRuntime/7.29.0',
'Accept': '*/*',
'Cache-Control': 'no-cache',
'Host': 'www.spaceweatherlive.com',
'Accept-Encoding': 'gzip, deflate, br',
'Connection': 'keep-alive'
}
response = requests.get(url, headers=headers)
print(response.text)

Related

Invalid URL when using Python Requests

I am trying to access the API returning program data at this page when you scroll down and new tiles are displayed on the screen. Looking in Chrome Tools I have found the API being called and put together the following Requests script:
import requests
session = requests.session()
url = 'https://ie.api.atom.nowtv.com/adapter-atlas/v3/query/node?slug=/entertainment/collections/all-entertainment&represent=(items[take=60](items(items[select_list=iceberg])))'
session.headers = {
'Host': 'https://www.nowtv.com',
'Connection': 'keep-alive',
'Accept': 'application/json, text/javascript, */*',
'X-Requested-With': 'XMLHttpRequest',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36',
'Referer': 'https://www.nowtv.com',
'Accept-Encoding': 'gzip, deflate',
'Accept-Language': 'en-GB,en-US;q=0.9,en;q=0.8'
}
scraper = cloudscraper.create_scraper(sess=session)
r = scraper.get(url)
data = r.content
print(data)
session.close()
This is returning the following only:
b'<HTML><HEAD>\n<TITLE>Invalid URL</TITLE>\n</HEAD><BODY>\n<H1>Invalid URL</H1>\nThe requested URL "[no URL]", is invalid.<p>\nReference #9.3c0f0317.1608324989.5902cff\n</BODY></HTML>\n'
I assume the issue is the part at the end of the URL that is in curly brackets. I am not sure however how to handle these in a Requests call. Can anyone provide the correct syntax?
Thanks

The issue is the Host session header value, don't set it.
That should be enough. But I've done some additional things as well:
add the X-* headers:
session.headers.update(**{
'X-SkyOTT-Proposition': 'NOWTV',
'X-SkyOTT-Language': 'en',
'X-SkyOTT-Platform': 'PC',
'X-SkyOTT-Territory': 'GB',
'X-SkyOTT-Device': 'COMPUTER'
})
visit the main page without XHR header set and with a broader Accept header value:
text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
I've also used params for the GET parameters - you don't have to do it, I think. It's just cleaner:
In [33]: url = 'https://ie.api.atom.nowtv.com/adapter-atlas/v3/query/node'
In [34]: response = session.get(url, params={
'slug': '/entertainment/collections/all-entertainment',
'represent': '(items[take=60,skip=2340](items(items[select_list=iceberg])))'
}, headers={
'Accept': 'application/json, text/plain, */*',
'X-Requested-With':'XMLHttpRequest'
})
In [35]: response
Out[35]: <Response [200]>
In [36]: response.text
Out[36]: '{"links":{"self":"/adapter-atlas/v3/query/node/e5b0e516-2b84-11e9-b860-83982be1b6a6"},"id":"e5b0e516-2b84-11e9-b860-83982be1b6a6","type":"CATALOGUE/COLLECTION","segmentId":"","segmentName":"default","childTypes":{"next_items":{"nodeTypes":["ASSET/PROGRAMME","CATALOGUE/SERIES"],"count":68},"items":{"nodeTypes":["ASSET/PROGRAMME","CATALOGUE/SERIES"],"count":2376},"curation-config":{"nodeTypes":["CATALOGUE/CURATIONCONFIG"],"count":1}},"attributes":{"childNodeTyp
...

How to get right JSON responce from POST request from PYTHON script

Task is to get JSON responce from POST request from particular website.
Everything works fine in browser as follows. You may simulate the case yourself tryin to start enter text into Start Location field.
webaddress to check: https://www.hapag-lloyd.com/en/online-business/schedules/interactive-schedule.html
Chrome Dev Tool Screen 1 - Request URL and Header
Chrome Dev Tool Screen 2 - POST data
JSON RESPONCE (it must be like this)
{"rows":[{"LOCATION_COUNTRYABBREV":"GE","LOCATION_BUSINESSPOSTALCODE":"","LOCATION_BUSINESSLOCATIONNAME":"BATUMI","LOCATION_BUSINESSLOCODE":"GEBUS","STANDARDLOCATION_BUSINESSLOCODE":"GEBUS","LOCATION_PORTTYPE":"S","DISPLAYNAME":""}]}
My code as follows:
import requests
url = 'https://www.hapag-lloyd.com/en/online-business/schedules/interactive-schedule.html?_sschedules_interactive=_raction&action=getTypeAheadService'
POST_QUERY = 'batumi'
params = {
'query': POST_QUERY,
'reportname': 'FRTA0101',
'callConfiguration': "[resultLines=10,readDef1=location_businessLocationName STARTSWITH,readDef2=location_businessLocode STARTSWITH,readClause1=location_businessLocode<>'' AND location_portType='S' AND stdSubLocation_string10='STD',readClause2=location_businessLocode<>'' AND location_portType<>'S' AND stdSubLocation_string10='STD',readClause3=location_businessLocode<>'' AND location_portType='S' AND stdSubLocation_string10='SUB',readClause4=location_businessLocode<>'' AND stdSubLocation_string10='SUB',readClause5=location_businessLocode='' AND stdSubLocation_string10='SUB',sortDef1=location_businessLocationName ASC,resultAttr1=location_businessLocationName,resultAttr2=location_businessLocode,resultAttr3=location_businessPostalCode,resultAttr4=standardLocation_businessLocode,resultAttr5=location_countryAbbrev,resultAttr6=location_portType]"
}
headers = {
"Accept": "*/*",
'Accept-Encoding': 'gzip, deflate',
'Accept-Language': 'en-EN,en;q=0.9,en-US;q=0.8,en;q=0.7',
'Cache-Control': 'no-cache',
'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
'DNT': '1',
'Host': 'www.hapag-lloyd.com',
'Origin': 'https://www.hapag-lloyd.com',
'Pragma': 'no-cache',
# 'Proxy-Connection': 'keep-alive',
'Referer': 'https://www.hapag-lloyd.com/en/online-business/schedules/interactive-schedule.html',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36'
}
print('Testing location: ', POST_QUERY)
var_cities = requests.post(url,data=params,headers=headers)
print(var_cities.content) #it does print some %$#%$
Python Print Content Screen
My question is "How to get right JSON responce from POST request from PYTHON script"?

I think using BeautifulSoup is a better option.
Try this
Python Convert HTML into JSON using Soup

print(var_cities.text)
This returns the html as string. Is this what you expected to get as a response? And to convert this into json, look at the answer above...

How should I fix the bad request response I am getting when sending a POST request?

I am trying to log on a site using python (Requests) and keep getting 400 Bad request error.
I have tried different header formats, even copied the headers from different browsers (Chrome, Edge, Firefox) but I am always getting 400 error.
I've tried browsing around but can't find anything that would help me.
import requests
with requests.Session() as c:
url = 'https://developer.clashofclans.com/api/login'
e='xxx#xxx.xxx'
p='yyyyy'
header = {'authority': 'developer.clashofclans.com',
'method': 'POST',
'path': '/api/login',
'scheme': 'https',
'accept': '*/*',
'accept-encoding': 'gzip, deflate, br',
'accept-language': 'en-IN,en-US;q=0.9,en;q=0.8',
'content-length': '57',
'content-type': 'application/json',
'cookie': 'cookieconsent_status=dismiss',
'origin': 'https://developer.clashofclans.com',
'referer': 'https://developer.clashofclans.com/',
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36',
'x-requested-with': 'XMLHttpRequest'}
login_data = dict(email=e,password=p)
x = c.post(url,data=login_data,headers=header)
print(x)

some website expected the data as json format. in requests you can easly do this by using json params, so your code will be something like this:
python
x = c.post(url, json=login_data, headers=header)

unable to decode Python web request

I am trying to make web request to my trading account . Python is unable to decode the web request. Web request is successful with code - 200.
Here is the code below
import requests
headers = {
'accept-encoding': 'gzip, deflate, br',
'accept-language': 'en-US,en;q=0.9',
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36',
'x-kite-version': '1.2.1',
'accept': 'application/json, text/plain, */*',
'referer': 'https://kite.zerodha.com/orders',
'authority': 'kite.zerodha.com',
'cookie': '__cfduid=db8fb54c76c53442fb672dee32ed58aeb1521962031; _ga=GA1.2.1516103745.1522000590; _gid=GA1.2.581693731.1522462921; kfsession=CfawFIZq2T6SghlCd8FZegqFjNIKCYuO; public_token=7FyfBbbxhiRRUso3425TViK2VmVszMCK; user_id=XE4670',
'x-csrftoken': '7FyfBbbxhiRRUso3425TViK2VmVszMCK',
}
response = requests.get('https://kite.zerodha.com/api/orders', headers=headers)
x=str(response.content.decode("utf-8") )
b"1X\x14\x00 \xfe\xa7\x9b\xd3\xca\xbd9-\x12\x83\xbfULS1\x1d8\x9d\x0e\xd4\xcf\xbd\xb8\xd1\xbd4\xc0\x00\x13~\x94}\xe4\x81\xa4\x90P\x1cfs\xcd\x1e\xaeG\x9b},m\xbd\t\x84L1\xde\xa8e\x8a\xf1h\x0e\x0c)\x1a\x12\xfb\x06z\xec\x18\xe4r\xa1\x1c\x11\xe8 \xbcO\xec\xe2|\xa6\x90\xa9\xdf\xf2\xe1\xfa\xf3\x1e\x04\x0e\xa2\x8d\x0e\xc4\tw\xeb\xd9\xba\n\xf1H'l\xeb>\x08\x85L\r\x0cY\xf8\x81D;\x92!o\xfd\xbd\xe3u>3\x10\xe1\x8c;\xb8\x9e\xceA\xae\x0exX\xc9\x19s\xeb\xe5r~1\x98\xed0\xb8\xdc\xb4\x17:\x14\x96xAn\xb9\xf0\xce\xf2l\\xa6G?5O\x9b\xf3\xc1\\x1f\x0f\x8fs\x1b/\x17\x1a\x0c[ySAX\x1d'\xe7\xbb\nx\xacR~\xbb\x9f\xe0\x8c?s\xc0\x8f\xe0\x97\xff\xde'\xc7#\x8f\x97\xaf\xaa%\xf2\xf9\xfaC|\xcf\t\xf3\xeb\xaa\xdcs\xcc\xf5\xa3RM\xbaOY\xf5\x9fe\xfc\x07\xff\x01"
Unable to decode this. Tried unicode- utf 8 and various codes available on the stakoverflow but its failed.

According to the response.headers (that you did not provide, but that are easily recoverable by running your code), the response is encoded using Brotli compression (Content-Encoding': 'br'). You can decompress it with brotlipy:
import brotli
brotli.decompress(response.content)
#b'{"status":"success","data":[{"placed_by":"XE4670","order_id":"180331000000385",
#"exchange_order_id":null,"parent_order_id":null,"status":"REJECTED",
#"status_message":"ADAPTER is down","order_timestamp":"2018-03-31 07:59:42",
#"exchange_update_timestamp":null,...}
Now, it's JSON, as promised ('Content-Type': 'application/json').

If the server only returns brotli compressed response, response should be decompressed then it will be ready to use.
Fortunately, since v2.26.0 update, requests library supports Brotli compression if either the brotli or brotlicffi package is installed. So, if the response encoding is br, request library will automatically handle it and decompress it.
First;
pip install brotli,
then;
import requests
r = requests.get('some_url')
r.json()

I fixed this issue by changing the request header,
headers = {
'accept-encoding': 'gzip, deflate, br',
...}
to
headers = {
'accept-encoding': 'gzip, deflate, utf-8',
...}

requests.post to hastebin.com not working

I'm trying to use the requests module from python to make a post in http://hastebin.com/
but I've been failing and doesn't know what to do anymore. is there any way I can really make a post on the site? here my current code:
import requests
payload = "s2345"
headers = {
'Host': 'hastebin.com',
'Connection': 'keep-alive',
'Content-Length': '5',
'Accept': 'application/json, text/javascript, */*; q=0.01',
'Origin': 'http://hastebin.com',
'X-Requested-With': 'XMLHttpRequest',
'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.130 Safari/537.36',
'Content-Type': 'application/json; charset=UTF-8',
'Referer': 'http://hastebin.com/',
'Accept-Encoding': 'gzip, deflate',
'Accept-Language': 'en-US,en;q=0.8'
}
req = requests.post('http://hastebin.com/',headers = headers, params=payload)
print (req.json())

Looking over the provided haste client code the server expects a raw post of the file, without a specific content type. The client also posts to the /documents path, not the root URL.
They are also not being picky about headers, just leave those all to requests to set; the following works for me and creates a new document on the site:
import requests
payload = "s2345"
response = requests.post('http://hastebin.com/documents', data=payload)
if response.status_code == 200:
print(response.json()['key'])
Note that I used data here, not the params option which sets the URL query paramaters.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Can't get data from site using requests in Python - python

Related

Invalid URL when using Python Requests

How to get right JSON responce from POST request from PYTHON script

How should I fix the bad request response I am getting when sending a POST request?

unable to decode Python web request

requests.post to hastebin.com not working

Categories

Resources