How to get a specific value from a html header

How to get a specific value from a html header - python

Im using selenium to get request headers from a web page,
the problem is that it prints out all request headers sent and i want to get only one value from one of them. I dont know how to do it and i havent found anything on the internet, I used this line of code to get the request headers ```
from seleniumwire import webdriver
for req in driver.requests:
print(req.headers)
and this is ex. what the console prints out
/Users/andrew/Downloads/andrewtate/respon.py:8: DeprecationWarning: find_element_by_* commands are deprecated. Please use find_element() instead
element = driver.find_element_by_id("liveSwitch")
content-length: 537
x-ms-apiversion: 2
uaid: 4276e35e10994d41a4f8b9d772597415
sec-ch-ua-mobile: ?0
user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.79 Safari/537.36
canary: ZsWLDeZFgz9FMStm3ag92konX3u1nFMb5JrOsAmAVOXif1MhviASbSrIbTSEBeVN3swoIqb9Nxrhv29kmwRlAreLZxaTbyMXlhcBjeXQL/xXtNVRAHzRRn841XYbOsaAnTtD+m/+0eseb0AOal3OUUrXRuCNiMQTfWV94E4rG3wbhVc+wOFKr6CJdRbdnHCJJ+KKCCyD6Zi5iiyYy/WDeOpiRm4bGTJfxovswmLBI+vSh8+8gmclbitKCSQ9WNCi:2:3c
content-type: application/json
hpgid: 200639
accept: application/json
tcxt: IC/zaA2dVnLbOoFiW8OysFIi9WGFKCMw3gul+hWJLmefmKsjmQrWIqWZCo8QxPdDls8y71f/SzThuntkwVS/j97OYNFlvyZ1i9dX/9gOt+vc5YF7DPd9K2UETiLyWfr8q4UfLF7XZejtifraNmhTXKKnWHlY9wzl3qs6lMPuHxrwkRCHSqA/vhyzar0B0YV2uaC3EJJLBn7X2UYtNegmlvZsMyBBkOwEaRad0Yv+J2X2hsdKUxLu+c6pnKVgRYQetqDlFy9ZWjSFekw898Q7dWwziwXGv+CnlMvFOMq4kRz+O9KtpNpLP7ASxmBYiJHsI9TPtyLOae4aYyhFkg5trpzbobT2dS+r/uiz4hmySwHA44gRVz7JCVM9+xnxA+Q2UuizMPniv+4pz4KdNpuxO3ib2ptN0IsBJ0syUMbBriC3/UcdAs9YcueQlojUhnFOb+WEBRzEfbY2Dn3l4RhuaUXdCsNNaNGY78vtQ0X6XlduYnrwtVAf9sGDraiT3aQ7+g/2BJohKnfnH69WjkaWVf/OUQ0fVpZJDzglZIaf7sJOCOEPNm3mqgBJ/S+K0awK:2:3
uiflvr: 1001
scid: 100118
sec-ch-ua: "Chromium";v="104", " Not A;Brand";v="99", "Google Chrome";v="104"
x-ms-apitransport: xhr
sec-ch-ua-platform: "macOS"
origin: https://signup.live.com
sec-fetch-site: same-origin
sec-fetch-mode: cors
sec-fetch-dest: empty
referer: https://signup.live.com/signup?lic=1&uaid=4276e35e10994d41a4f8b9d772597415
accept-encoding: gzip, deflate, br
accept-language: pl-PL,pl;q=0.9,en-US;q=0.8,en;q=0.7
cookie: amsc=b55INNzzHR5PZiPunWJJsx3fMAbVhvWRQlJkJfRh2zNI+jGPtSvnOYjXiJqnEctddyQyWNV64rFgnX9Qf57EA+YiA/VCFcqp6ogo+iXCvLBLpDne1H1QImvS5lgln9YT1eI5JbSQYm789c/XXcVi3mCEE9vKzuLq4CovhNTdvcBk/hXxtcDePtoNIZOjrfUSfP4/Gp34hMJFNK79iAo2rhl/WK9jO894h00DrAKnMEiA4Cd57LVili0DAOXInzcFSQ7TjRXBpNr2KoO2RB1ARd9sjjaQfycR/n6r6P5Fj+o=:2:3c
sec-ch-ua: "Chromium";v="104", " Not A;Brand";v="99", "Google Chrome";v="104"
sec-ch-ua-mobile: ?0
sec-ch-ua-platform: "macOS"
upgrade-insecure-requests: 1
user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.79 Safari/537.36
accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
sec-fetch-site: same-site
sec-fetch-mode: navigate
sec-fetch-dest: iframe
referer: https://signup.live.com/
accept-encoding: gzip, deflate, br
accept-language: pl-PL,pl;q=0.9,en-US;q=0.8,en;q=0.7
cookie: amsc=b55INNzzHR5PZiPunWJJsx3fMAbVhvWRQlJkJfRh2zNI+jGPtSvnOYjXiJqnEctddyQyWNV64rFgnX9Qf57EA+YiA/VCFcqp6ogo+iXCvLBLpDne1H1QImvS5lgln9YT1eI5JbSQYm789c/XXcVi3mCEE9vKzuLq4CovhNTdvcBk/hXxtcDePtoNIZOjrfUSfP4/Gp34hMJFNK79iAo2rhl/WK9jO894h00DrAKnMEiA4Cd57LVili0DAOXInzcFSQ7TjRXBpNr2KoO2RB1ARd9sjjaQfycR/n6r6P5Fj+o=:2:3c
I only want to get canary: value and put it to request headers so when i do print(requestToWebsite.request.headers) i can see the value
Any help would be appreciated,
Thank you.

Try this:
import re
driverReq=str(driver.requests);
ans= re.search('canary:(.*)content-type:', driverReq)
print(ans.group(1))

Related

How to get response with this request in python?

Hi I have a request as so
GET https://dropezy.goadda.in/master/v1/id/products/search?q=buah&storeId=619e02f0f580166d48277fc0&sortBy=relevance HTTP/2.0
accept: application/json
enro-api-key: rhcvwcdd-7aza-zvv7qskyu-4b8jm8kwppkf
cache-control: max-age=31536000
store-id: 619e02f0f580166d48277fc0
user-agent: Mozilla/5.0 (Linux; Android 11; sdk_gphone_x86 Build/RSR1.201013.001; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/83.0.4103.106 Mobile Safari/537.36
content-type: application/json; charset=utf-8
origin: https://www.dropezy.com
x-requested-with: com.enroco.dropezy
sec-fetch-site: cross-site
sec-fetch-mode: cors
sec-fetch-dest: empty
referer: https://www.dropezy.com/id/search/
accept-encoding: gzip, deflate
accept-language: en-US,en;q=0.9
content-length: 0
When I tried using request in python it returns message not found
This is my current code
url = 'https://dropezy.goadda.in/master/v1/id/products/search?q=buah&storeId=619e02f0f580166d48277fc0&sortBy=relevance HTTP/2.0'
accept='application/json'
jsonData = requests.post(url, accept).json()
Can somebody help? How to insert the other components such as user-agent, enro-api-key into requests?

The two requests aren't the same. The first is a GET request, but the second is a POST.
Use requests.get to make a GET request.
If you need to supply headers in a request, yes, you can do that: https://docs.python-requests.org/en/latest/user/quickstart/#custom-headers.
Generally, the docs for Requests are quite good: https://docs.python-requests.org/en/latest/

How to mimic export CSV functionality via Python code (where

When I use the export CSV functionality in testrail, I see that it does a POST request to the following API : /index.php?/plans/export_csv/:plan_id.
_token: <APIToken>
section_ids:
section_include:
columns: tests:id,cases:title,tests:assignedto_id,tests:original_case_id,tests:elapsed,cases:priority_id,tests:run_name,tests:tested_by,tests:tested_on
layout: results
separator_hint: 1
format: excel
Along with the following request headers.
authority: <Authority>
:method: POST
:path: /index.php?/plans/export_csv/:plan_id
:scheme: https
accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
accept-encoding: gzip, deflate, br
accept-language: en-GB,en-US;q=0.9,en;q=0.8
cache-control: max-age=0
content-length: 320
content-type: application/x-www-form-urlencoded
cookie: notificationbar=12345; tr_session=<session_id>; tr_rememberme=<id>
origin: <Origin>
referer: <Referer>
sec-fetch-dest: document
sec-fetch-mode: navigate
sec-fetch-site: same-origin
sec-fetch-user: ?1
upgrade-insecure-requests: 1
user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36
Is there a way for me to mimic this via Python?

You should take a look at the requests package.
Requests is a library that helps you make HTTP1.1 requests.

Return 302 in Web Crawler

After I simulate to log in, when I try to post the original website, it returns 302. When I open the original website in Chrome, it returns 415.
I tried several ways:
session.post(url,headers = headers,data = data)
requests.post(url,headers = headers,data = data)
urllib.request.urlopen.read(url).decode()
import requets
import json
header = {'Host': 'sty.js118114.com:8080',
'Connection': 'keep-alive',
'Content-Length': '8188',
'Accept': '*/*',
'X-Requested-With': 'XMLHttpRequest',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36',
'Content-Type': 'text/plain;charset=UTF-8;application/xml',
'Origin': 'http://sty.js118114.com:8080',
'Referer':
'http://sty.js118114.com:8080/Report/report/movecar_list.html',
'Accept-Encoding': 'gzip, deflate',
'Accept-Language': 'zh-CN,zh;q=0.9',
'Cookie': cookie_name + '=' + cookie_value
}
data = {"calling_no":"","begin_time":"","end_time":"","called_car_no":""}
res = requests.post(target,data = json.dumps(data),headers = header)
print(res.content.decode())
I expect the content must be the json version or html version so that I can use re model or xpath to get the infomation I want.(without any redirects
Lastly, I provide the necessary infomation about the problem:
Chrome Network
General
Request URL: http://sty.js118114.com:8080/Report/movecar/list/1/10
Request Method: POST
Status Code: 200 OK
Remote Address: 127.0.0.1:8888
Referrer Policy: no-referrer-when-downgrade
Response Headers
Content-Length: 8150
Content-Type: application/json;charset=UTF-8
Date: Thu, 22 Aug 2019 00:47:51 GMT
Server: Apache-Coyote/1.1
Request Headers
Accept: */*
Accept-Encoding: gzip, deflate
Accept-Language: zh-CN,zh;q=0.9
Content-Length: 66
Content-Type: text/plain;charset=UTF-8;
Cookie: JSESSIONID=0A474B00017BFFD89A515B336F482905
Host: sty.js118114.com:8080
Origin: http://sty.js118114.com:8080
Proxy-Connection: keep-alive
Referer: http://sty.js118114.com:8080/Report/report/movecar_list.html
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36
(KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36
X-Requested-With: XMLHttpRequest
Request Payload
{calling_no: "", begin_time: "", end_time: "", called_car_no: ""}
begin_time: ""
called_car_no: ""
calling_no: ""
end_time: ""
Fiddler Inspectors Raw
POST http://sty.js118114.com:8080/Report/movecar/list/1/10 HTTP/1.1
Host: sty.js118114.com:8080
Connection: keep-alive
Content-Length: 66
Accept: */*
X-Requested-With: XMLHttpRequest
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36
(KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36
Content-Type: text/plain;charset=UTF-8;
Origin: http://sty.js118114.com:8080
Referer: http://sty.js118114.com:8080/Report/report/movecar_list.html
Accept-Encoding: gzip, deflate
Accept-Language: zh-CN,zh;q=0.9
Cookie: JSESSIONID=0A474B00017BFFD89A515B336F482905
{"calling_no":"","begin_time":"","end_time":"","called_car_no":""}
Response Raw
HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Content-Type: application/json;charset=UTF-8
Date: Thu, 22 Aug 2019 00:27:59 GMT
Content-Length: 8150

Can't login to a specific ASP.NET website using python requests

So I've been trying for the last 6 hours to make this work, but I couldn't and endless searches didn't help, So I guess I'm either doing something very fundamental wrong, or it's just a trivial bug which happens to match my logic so I need extra eyes to help me fix it.
The website url is this.
I wrote a piece of messy python code to just login and read the next page, but All I get is a nasty 500 error saying something on the server went wrong processing my request.
Here is the request made by a browser which works just fine, no problem.
HTTP Response code to this request is 302 (Redirect)
POST /appstatus/index.aspx HTTP/1.1
Host: www.wes.org
Connection: close
Content-Length: 303
Cache-Control: max-age=0
Origin: https://www.wes.org
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36
Content-Type: application/x-www-form-urlencoded
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Referer: https://www.wes.org/appstatus/index.aspx
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.8,fa;q=0.6
Cookie: ASP.NET_SessionId=bu2gemmlh3hvp4f5lqqngrbp; _ga=GA1.2.1842963052.1473348318; _gat=1
__VIEWSTATE=%2FwEPDwUKLTg3MTMwMDc1NA9kFgICAQ9kFgICAQ8PFgIeBFRleHRkZGRk9rP20Uj9SdsjOKNUBlbw55Q01zI%3D&__VIEWSTATEGENERATOR=189D346C&__EVENTVALIDATION=%2FwEWBQK6lf6LBAKf%2B9bUAgK9%2B7qcDgK8w4S2BALowqJjoU1f0Cg%2FEAGU6r2IjpIPG8BO%2BiE%3D&txtUID=Email%40Removed.com&txtPWD=PASSWORDREMOVED&Submit=Log+In&Hidden1=
and this one is the request made by my script.
POST /appstatus/index.aspx HTTP/1.1
Host: www.wes.org
Connection: close
Accept-Encoding: gzip, deflate, br
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Upgrade-Insecure-Requests: 1
Content-Type: application/x-www-form-urlencoded
Origin: https://www.wes.org
Accept-Language: en-US,en;q=0.8,fa;q=0.6
Cache-Control: max-age=0
Referer: https://www.wes.org/appstatus/indexca.aspx
Cookie: ASP.NET_SessionId=nxotmb55jjwf5x4511rwiy45
Content-Length: 303
txtPWD=PASSWORDREMOVED&Submit=Log+In&__EVENTVALIDATION=%2FwEWBQK6lf6LBAKf%2B9bUAgK9%2B7qcDgK8w4S2BALowqJjoU1f0Cg%2FEAGU6r2IjpIPG8BO%2BiE%3D&txtUID=Email%40Removed.com&__VIEWSTATE=%2FwEPDwUKLTg3MTMwMDc1NA9kFgICAQ9kFgICAQ8PFgIeBFRleHRkZGRk9rP20Uj9SdsjOKNUBlbw55Q01zI%3D&Hidden1=&__VIEWSTATEGENERATOR=189D346C
And this is the script making the request, I'm sorry if it's so messy, just need something quick.
import requests
import bs4
import urllib.parse
def main():
session = requests.Session()
headers = {"Origin": "https://www.wes.org",
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
"Cache-Control": "max-age=0", "Upgrade-Insecure-Requests": "1", "Connection": "close",
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36",
"Referer": "https://www.wes.org/appstatus/indexca.aspx", "Accept-Encoding": "gzip, deflate, br",
"Accept-Language": "en-US,en;q=0.8,fa;q=0.6", "Content-Type": "application/x-www-form-urlencoded"}
r = session.get('https://www.wes.org/appstatus/index.aspx',headers=headers)
cookies = r.cookies
soup = bs4.BeautifulSoup(r.content, "html5lib")
viewState=urllib.parse.quote(str(soup.select('#__VIEWSTATE')[0]).split('value="')[1].split('"/>')[0])
viewStateGenerator=urllib.parse.quote(str(soup.select('#__VIEWSTATEGENERATOR')[0]).split('value="')[1].split('"/>')[0])
eventValidation=urllib.parse.quote(str(soup.select('#__EVENTVALIDATION')[0]).split('value="')[1].split('"/>')[0])
paramsPost = {}
paramsPost.update({'__VIEWSTATE':viewState})
paramsPost.update({'__VIEWSTATEGENERATOR':viewStateGenerator})
paramsPost.update({'__EVENTVALIDATION':eventValidation})
paramsPost.update({"txtUID": "My#Email.Removed"})
paramsPost.update({"txtPWD": "My_So_Called_Password"})
paramsPost.update({"Submit": "Log In"})
paramsPost.update({"Hidden1": ""})
response = session.post("https://www.wes.org/appstatus/index.aspx", data=paramsPost, headers=headers,
cookies=cookies)
print("Status code:", response.status_code) #Outputs 500.
#print("Response body:", response.content)
if __name__ == '__main__':
main()
Any help would be so much appreciated.

You are doing way too much work and in doing so not passing valid data,you extract value attribute directly i.e .select_one('#__VIEWSTATEGENERATOR')["value"] and the same for all the rest, the cookies will be set in the Session object after your initial get so the logic boils down to:
with requests.Session() as session:
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36"}
r = session.get('https://www.wes.org/appstatus/index.aspx', headers=headers)
soup = bs4.BeautifulSoup(r.content, "html5lib")
viewState = soup.select_one('#__VIEWSTATE')["value"]
viewStateGenerator = soup.select_one('#__VIEWSTATEGENERATOR')["value"]
eventValidation = soup.select_one('#__EVENTVALIDATION')["value"]
paramsPost = {'__VIEWSTATE': viewState,'__VIEWSTATEGENERATOR': viewStateGenerator,
'__EVENTVALIDATION': eventValidation,"txtUID": "My#Email.Removed",
"txtPWD": "My_So_Called_Password",
"Submit": "Log In","Hidden1": ""}
response = session.post("https://www.wes.org/appstatus/index.aspx", data=paramsPost, headers=headers)
print("Status code:", response.status_code)
Python by convention uses CamelCase for class names and lowercase with underscores to separate multiple words, you might want to consider applying that to your code.

How to get request headers rather than response headers using Python Requests

How can I grab the request headers for an XHR requests using Python Requests module? Using the following code seems to return the response headers:
import requests
r = requests.get('http://www.whoscored.com/tournamentsfeed/12496/Fixtures/?d=2015W50&isAggregate=false')
headers = r.headers
print headers
This returns an object that looks like this:
{'content-length': '624', 'content-encoding': 'gzip', 'expires': '-1', 'vary': 'Accept-Encoding', 'server': 'Microsoft-IIS/8.0', 'pragma': 'no-cache', 'cache-control': 'no-cache', 'date': 'Tue, 15 Dec 2015 14:41:34 GMT', 'x-powered-by': 'ASP.NET', 'content-type': 'text/html; charset=utf-8'}
However, when I look in Chrome developer tools the request header looks like this:
Host: www.whoscored.com
Connection: keep-alive
Accept: text/plain, */*; q=0.01
Model-Last-Mode: W50hFYr7jwZWt40WUb9udPVFxmB6g9yct204X0/gmf4=
X-Requested-With: XMLHttpRequest
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.80 Safari/537.36
Referer: http://www.whoscored.com/Regions/252/Tournaments/2/England-Premier-League
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-GB,en-US;q=0.8,en;q=0.6
Cookie: __gads=ID=d09f8c0cdc1a4258:T=1449875272:S=ALNI_MbTPDtXiIlHK49F4FOqdDap__pfCA; nlsnocrvu=1; OX_plg=swf|shk|pm; _ga=GA1.3.578623339.1449875271; _gat=1; _ga=GA1.2.578623339.1449875271
Can anyone assist?
Thanks

You need to check for request headers like this
r.request.headers
That would give you something like
{'Connection': 'keep-alive', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'User-Agent': 'python-requests/2.7.0 CPython/2.7.10 Darwin/15.0.0'}
For obvious reasons it won't be the same as you see in the Chrome developer tools, because the browser adds its own headers which the requests module doesn't.
GET /tournamentsfeed/12496/Fixtures/?d=2015W50&isAggregate=false HTTP/1.1
Host: www.whoscored.com
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.80 Safari/537.36
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-US,en;q=0.8,fr;q=0.6
Cookie: _ga=GA1.2.788154924.1450195026; _gat_as25n45=1
To get these headers you need to run some js code to pull the headers.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

How to get a specific value from a html header - python

Try this: import re driverReq=str(driver.requests); ans= re.search('canary:(.*)content-type:', driverReq) print(ans.group(1))

Related

How to get response with this request in python?

How to mimic export CSV functionality via Python code (where

Return 302 in Web Crawler

Can't login to a specific ASP.NET website using python requests

How to get request headers rather than response headers using Python Requests

Categories

Resources