Task is to get JSON responce from POST request from particular website.
Everything works fine in browser as follows. You may simulate the case yourself tryin to start enter text into Start Location field.
webaddress to check: https://www.hapag-lloyd.com/en/online-business/schedules/interactive-schedule.html
Chrome Dev Tool Screen 1 - Request URL and Header
Chrome Dev Tool Screen 2 - POST data
JSON RESPONCE (it must be like this)
{"rows":[{"LOCATION_COUNTRYABBREV":"GE","LOCATION_BUSINESSPOSTALCODE":"","LOCATION_BUSINESSLOCATIONNAME":"BATUMI","LOCATION_BUSINESSLOCODE":"GEBUS","STANDARDLOCATION_BUSINESSLOCODE":"GEBUS","LOCATION_PORTTYPE":"S","DISPLAYNAME":""}]}
My code as follows:
import requests
url = 'https://www.hapag-lloyd.com/en/online-business/schedules/interactive-schedule.html?_sschedules_interactive=_raction&action=getTypeAheadService'
POST_QUERY = 'batumi'
params = {
'query': POST_QUERY,
'reportname': 'FRTA0101',
'callConfiguration': "[resultLines=10,readDef1=location_businessLocationName STARTSWITH,readDef2=location_businessLocode STARTSWITH,readClause1=location_businessLocode<>'' AND location_portType='S' AND stdSubLocation_string10='STD',readClause2=location_businessLocode<>'' AND location_portType<>'S' AND stdSubLocation_string10='STD',readClause3=location_businessLocode<>'' AND location_portType='S' AND stdSubLocation_string10='SUB',readClause4=location_businessLocode<>'' AND stdSubLocation_string10='SUB',readClause5=location_businessLocode='' AND stdSubLocation_string10='SUB',sortDef1=location_businessLocationName ASC,resultAttr1=location_businessLocationName,resultAttr2=location_businessLocode,resultAttr3=location_businessPostalCode,resultAttr4=standardLocation_businessLocode,resultAttr5=location_countryAbbrev,resultAttr6=location_portType]"
}
headers = {
"Accept": "*/*",
'Accept-Encoding': 'gzip, deflate',
'Accept-Language': 'en-EN,en;q=0.9,en-US;q=0.8,en;q=0.7',
'Cache-Control': 'no-cache',
'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
'DNT': '1',
'Host': 'www.hapag-lloyd.com',
'Origin': 'https://www.hapag-lloyd.com',
'Pragma': 'no-cache',
# 'Proxy-Connection': 'keep-alive',
'Referer': 'https://www.hapag-lloyd.com/en/online-business/schedules/interactive-schedule.html',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36'
}
print('Testing location: ', POST_QUERY)
var_cities = requests.post(url,data=params,headers=headers)
print(var_cities.content) #it does print some %$#%$
Python Print Content Screen
My question is "How to get right JSON responce from POST request from PYTHON script"?
I think using BeautifulSoup is a better option.
Try this
Python Convert HTML into JSON using Soup
print(var_cities.text)
This returns the html as string. Is this what you expected to get as a response? And to convert this into json, look at the answer above...
Related
I am trying to log in to the site using a post request, as I need to log in to it. I pass all the information from the Chrome request headers in the header. I am transferring everything from the form data to the Data, but as a result I get error 403. Maybe someone can tell you? Unfortunately, I can't provide a website
(Where test - I have a real site specified)
RequestHeaders scrin
FormData
url = 'test.ru'
header = {
'Accept':'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9'
'Accept-Encoding': 'gzip, deflate'
'Accept-Language': 'ru-RU,ru;q=0.9,en-US;q=0.8,en;q=0.7'
'Cache-Control': 'max-age=0'
'Connection': 'keep-alive'
'Content-Length': '82'
'Content-Type': 'application/x-www-form-urlencoded'
'Cookie': 'cookie'
'Host': 'test'
'Origin': 'test.ru'
'Referer': 'test.ru/sign'
'Upgrade-Insecure-Requests': '1'
'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.75 Safari/537.36'
}
data = {
'username':'admin'
'password':'q1'
'remember-me':'1'
'_csrf':'6a973144-4dec-42-ab-9fda-f62f6054fe68'
}
s = requests.Session()
s.post(url, data=data, headers=header)
*The screenshots are fresh, and I wrote the code yesterday, so some data may vary, but when testing the code, I took the current ones from the browser
I'm trying to get text from this site. It is just a simple plain site with only text. When running the code below, the only thing it prints out is a newline. I should say that websites content/text is dynamic, so it changes over a few minutes. My requests module version is 2.27.1. I'm using Python 3.9 on Windows.
What could be the problem?
import requests
url='https://www.spaceweatherlive.com/includes/live-data.php?object=solar_flare&lang=EN'
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36',
}
content=requests.get(url, headers=headers)
print(content.text)
This is the example of how the website should look.
That particular server appears to be gating responses not on the User-Agent, but on the Accept-Encoding settings. You can get a normal response with:
import requests
url = "https://www.spaceweatherlive.com/includes/live-data.php?object=solar_flare&lang=EN"
headers = {
"Accept-Encoding": "gzip, deflate, br",
}
content = requests.get(url, headers=headers)
print(content.text)
Depending on how the server responds over time, you might need to install the brotli package to allow requests to decompress content compressed with it.
You just need to add user-agent like below.
import requests
url = "https://www.spaceweatherlive.com/includes/live-data.php?object=solar_flare&lang=EN"
payload={}
headers = {
'User-Agent': 'PostmanRuntime/7.29.0',
'Accept': '*/*',
'Cache-Control': 'no-cache',
'Host': 'www.spaceweatherlive.com',
'Accept-Encoding': 'gzip, deflate, br',
'Connection': 'keep-alive'
}
response = requests.get(url, headers=headers)
print(response.text)
I am trying to access the API returning program data at this page when you scroll down and new tiles are displayed on the screen. Looking in Chrome Tools I have found the API being called and put together the following Requests script:
import requests
session = requests.session()
url = 'https://ie.api.atom.nowtv.com/adapter-atlas/v3/query/node?slug=/entertainment/collections/all-entertainment&represent=(items[take=60](items(items[select_list=iceberg])))'
session.headers = {
'Host': 'https://www.nowtv.com',
'Connection': 'keep-alive',
'Accept': 'application/json, text/javascript, */*',
'X-Requested-With': 'XMLHttpRequest',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36',
'Referer': 'https://www.nowtv.com',
'Accept-Encoding': 'gzip, deflate',
'Accept-Language': 'en-GB,en-US;q=0.9,en;q=0.8'
}
scraper = cloudscraper.create_scraper(sess=session)
r = scraper.get(url)
data = r.content
print(data)
session.close()
This is returning the following only:
b'<HTML><HEAD>\n<TITLE>Invalid URL</TITLE>\n</HEAD><BODY>\n<H1>Invalid URL</H1>\nThe requested URL "[no URL]", is invalid.<p>\nReference #9.3c0f0317.1608324989.5902cff\n</BODY></HTML>\n'
I assume the issue is the part at the end of the URL that is in curly brackets. I am not sure however how to handle these in a Requests call. Can anyone provide the correct syntax?
Thanks
The issue is the Host session header value, don't set it.
That should be enough. But I've done some additional things as well:
add the X-* headers:
session.headers.update(**{
'X-SkyOTT-Proposition': 'NOWTV',
'X-SkyOTT-Language': 'en',
'X-SkyOTT-Platform': 'PC',
'X-SkyOTT-Territory': 'GB',
'X-SkyOTT-Device': 'COMPUTER'
})
visit the main page without XHR header set and with a broader Accept header value:
text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
I've also used params for the GET parameters - you don't have to do it, I think. It's just cleaner:
In [33]: url = 'https://ie.api.atom.nowtv.com/adapter-atlas/v3/query/node'
In [34]: response = session.get(url, params={
'slug': '/entertainment/collections/all-entertainment',
'represent': '(items[take=60,skip=2340](items(items[select_list=iceberg])))'
}, headers={
'Accept': 'application/json, text/plain, */*',
'X-Requested-With':'XMLHttpRequest'
})
In [35]: response
Out[35]: <Response [200]>
In [36]: response.text
Out[36]: '{"links":{"self":"/adapter-atlas/v3/query/node/e5b0e516-2b84-11e9-b860-83982be1b6a6"},"id":"e5b0e516-2b84-11e9-b860-83982be1b6a6","type":"CATALOGUE/COLLECTION","segmentId":"","segmentName":"default","childTypes":{"next_items":{"nodeTypes":["ASSET/PROGRAMME","CATALOGUE/SERIES"],"count":68},"items":{"nodeTypes":["ASSET/PROGRAMME","CATALOGUE/SERIES"],"count":2376},"curation-config":{"nodeTypes":["CATALOGUE/CURATIONCONFIG"],"count":1}},"attributes":{"childNodeTyp
...
I'm trying to simulate a request
that has various headers and bracketed form data.
Form Data:
{"username": "MY_USERNAME", "pass": "MY_PASS", "AUTO": "true"}
That is the form data shown in the console of Chrome
So I tried putting it together with Python's requests library:
import requests
reqUrl = 'http://website.com/login'
postHeaders = {
'Accept': '*/*',
'Accept-Encoding': 'gzip,deflate',
'Accept-Language': 'en-US,en;q=0.8',
'Connection': 'keep-alive',
'Content-Length': '68',
'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
'Host': 'website.com',
'Origin': 'http://www.website.com',
'Referer': 'http://www.website.com/',
'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/37.0.2062.120 Chrome/37.0.2062.120 Safari/537.36'
}
payload = {"username": "MY_USERNAME",
"pass": "MY_PASS",
"AUTO": "true"
}
session = requests.Session()
response = session.post(reqUrl, data=payload, headers=postHeaders)
I'm receiving a response but it shows:
{"status":"failure","error":"Invalid request data"}
Am I going about implementing the form data wrong? I was also thinking it could have to do with modifying the Content-Length?
Yes, you are setting a content length, overriding anything requests might set. You are setting too many headers, leave most of those to the library instead:
postHeaders = {
'Accept-Language': 'en-US,en;q=0.8',
'Origin': 'http://www.website.com',
'Referer': 'http://www.website.com/',
'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/37.0.2062.120 Chrome/37.0.2062.120 Safari/537.36'
}
is plenty. All the others will be generated for you.
However, from your description of the form data, it looks like you are posting JSON instead. In that case, use the json keyword argument instead of data, which will encode your payload to JSON and set the Content-Type header to application/json:
response = session.post(reqUrl, json=payload, headers=postHeaders)
I'm trying to use the requests module from python to make a post in http://hastebin.com/
but I've been failing and doesn't know what to do anymore. is there any way I can really make a post on the site? here my current code:
import requests
payload = "s2345"
headers = {
'Host': 'hastebin.com',
'Connection': 'keep-alive',
'Content-Length': '5',
'Accept': 'application/json, text/javascript, */*; q=0.01',
'Origin': 'http://hastebin.com',
'X-Requested-With': 'XMLHttpRequest',
'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.130 Safari/537.36',
'Content-Type': 'application/json; charset=UTF-8',
'Referer': 'http://hastebin.com/',
'Accept-Encoding': 'gzip, deflate',
'Accept-Language': 'en-US,en;q=0.8'
}
req = requests.post('http://hastebin.com/',headers = headers, params=payload)
print (req.json())
Looking over the provided haste client code the server expects a raw post of the file, without a specific content type. The client also posts to the /documents path, not the root URL.
They are also not being picky about headers, just leave those all to requests to set; the following works for me and creates a new document on the site:
import requests
payload = "s2345"
response = requests.post('http://hastebin.com/documents', data=payload)
if response.status_code == 200:
print(response.json()['key'])
Note that I used data here, not the params option which sets the URL query paramaters.