I use Selenium to capture the data a website receives in response to a GET request.
The API the website calls is not public, so if I request that URL directly to retrieve the data, I get {"message":"Unauthenticated."}.
All I've managed to do so far is retrieve the response headers.
I found that using driver.execute_cdp_cmd('Network.getResponseBody', {...}) might solve my problem.
Here is a sample of my code:
import json
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

capabilities = DesiredCapabilities.CHROME
capabilities["goog:loggingPrefs"] = {"performance": "ALL"}
driver = webdriver.Chrome(
    r"./chromedriver",
    desired_capabilities=capabilities,
)

def processLog(log):
    log = json.loads(log["message"])["message"]
    if ("Network.response" in log["method"] and "params" in log.keys()):
        headers = log["params"]["response"]
        body = driver.execute_cdp_cmd('Network.getResponseBody', {'requestId': log["params"]["requestId"]})
        print(json.dumps(body, indent=4, sort_keys=True))
        return log["params"]

logs = driver.get_log('performance')
responses = [processLog(log) for log in logs]
Unfortunately, driver.execute_cdp_cmd('Network.getResponseBody', {...}) raises:
unknown error: unhandled inspector error: {"code":-32000,"message":"No resource with given identifier found"}
Do you know what I am missing?
Do you have any idea how to retrieve the response body?
Thank you for your help!
To retrieve the response body, you have to listen specifically for Network.responseReceived events:
def processLog(log):
    log = json.loads(log["message"])["message"]
    # Exact match: a substring test would also catch Network.responseReceivedExtraInfo
    if log["method"] == "Network.responseReceived" and "params" in log:
        body = driver.execute_cdp_cmd('Network.getResponseBody', {'requestId': log["params"]["requestId"]})
        return body
However, I ended up using a different approach based on requests: I copied the authorization token from the browser's developer tools (Network > Headers > Request Headers > Authorization) and used it to fetch the data I wanted:
import requests

def get_data():
    url = "<your_url>"
    headers = {
        "Authorization": "Bearer <your_access_token>",
        "Content-type": "application/json"
    }
    params = {
        # key: value,
        # ...
    }
    r = requests.get(url, headers=headers, params=params)
    if r.status_code == 200:
        return r.json()
Some responses probably have no body, so Selenium raises an error saying that no resource was found for the given identifier. The error message is a bit misleading here.
Try this:
from selenium.common import exceptions

try:
    body = driver.execute_cdp_cmd('Network.getResponseBody', {'requestId': log["params"]["requestId"]})
    log['body'] = body
except exceptions.WebDriverException:
    print('response.body is null')
This way, responses without a body will not crash your script.
Related
I'm trying to download an Excel file using Python requests.
Below is the actual payload the web browser sends when I download the Excel file manually.
I'm using the Python code below to do the same via automation:
import requests
import json

def download_excel(url):
    s = requests.Session()
    response = s.get(url)
    if response.status_code != 200:
        return 'loading error!'
    # response.content is raw bytes; parse the JSON body to read collectionId
    collectionId = response.json()['collectionId']
    payload = {
        "collectionId": collectionId
    }
    response = s.post(url, json=payload)
    return response
Here I'm getting status code 400.
Can someone please help me set the payload to match the snapshot below?
I have been trying to use Python and requests-HTML to download Salesforce reports programmatically. However, I can't work out how to log in through Microsoft SAML + 2FA, or when and where I need to render JavaScript. I am trying to adapt some code I found, as I believe it should work, but I'm struggling to figure it out.
I can't use Selenium because I don't have permissions to add it to my PATH, and my company has no Azure AD API I could use, or at least not one I have access to.
Here is what I have so far:
import requests
from requests_html import HTMLSession
from requests_html import HTML
from bs4 import BeautifulSoup as bs

# Prepare for the first request - This is the ultimate target URL
url1 = "https://redacted.my.salesforce.com/redacted?export=1&enc=UTF-8&xf=csv"

class Microsoft:
    def __init__(self, username: str, password: str):
        self.sess = requests.Session()
        self.username = username
        self.password = password
        self.base = "https://login.microsoftonline.com/"
        self.tenant = "redacted"
        self.url = ""

    def get_html_name_value(self, html: str, name: str) -> str:
        return bs(html, "lxml").find("input", {"name": name}).get("value")

    # get redirected to obtain flow token with appropriate SAML
    def _get_tokens(self):
        session = HTMLSession()
        # update this
        resp = session.get(
            url1,
            allow_redirects=True
        )
        self.url = resp.url
        resp.html.render(sleep=3)
        html = resp.html.html
        data = {
            "flowToken": bs(resp.html.html, "lxml")
            .find(id="i0327")
            .get("value"),
            "ctx": self.get_html_name_value(html, "ctx"),
            "canary": self.get_html_name_value(html, "canary"),
            "hpgrequestid": self.get_html_name_value(html, "hpgrequestid"),
        }
        return session, data

    def _get_saml_tokens(self):
        sess, payload = self._get_tokens()
        payload["login"] = self.username
        payload["loginfmt"] = self.username
        payload["passwd"] = self.password
        headers = {
            "Host": "login.microsoftonline.com",
            "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:88.0) Gecko/20100101"
            + " Firefox/88.0",
            "Referer": self.url,
            "Origin": "https://login.microsoftonline.com",
        }
        login = sess.post(
            self.base + self.tenant + "/login",
            data=payload,
            headers=headers,
            allow_redirects=True,
            verify=True
        ).text
        login.html.render(wait=3) # Main issue here.
        html = login.html.html
        data = {
            "SAMLResponse": self.get_html_name_value(html, "SAMLResponse"),
            "RelayState": self.get_html_name_value(html, "RelayState"),
        }
        return sess, data

    def _login(self):
        sess, payload = self._get_saml_tokens()
        return sess.post(
            url1,
            data=payload,
        )

auth = Microsoft("redacted", "redacted")
auth._login()
The most immediate hurdle I am facing is how to render the JavaScript portion of the final sess.post(self.base + self.tenant + "/login"). When I try to render on a .post() function I get a "We received a bad request." error saying "AADSTS900561: The endpoint only accepts POST requests. Received a GET request." How do I render the JavaScript on a .post() function? Do I even have to, or is there another way?
If I don't render the post function but just save it as .html I see "Verify your identity: Sorry, we're having trouble verifying your account. Please try again.". To me, it looks like the login worked, but I'm missing SOMETHING to actually send out the 2fa and wait for the SAMLResponse.
After that, I can carry the cookies forward and make normal requests.get() calls for all of my many reports. Then I'll try to figure out asyncio so I can send all these requests concurrently and have everything downloaded in a couple of minutes, but that's for future AutomateMyJob to worry about.
I tried making an API request to Pexels with Python, following the Pexels API documentation, but I get an authentication error. Here is my code:
import requests
video_base_url = 'https://api.pexels.com/v1/search'
api_key = 'my_key'
my_obj = {'Authorization':api_key, 'query':'Stock market'}
x = requests.get(video_base_url,data = my_obj)
print(x.text)
But I get "error": "Authorization field missing". Any help is appreciated.
You are sending the authorization in the body of the request.
You need to send it in a header, and the search term as a query-string parameter:
r = requests.get(video_base_url, headers={'Authorization': api_key}, params={'query': 'Stock market'})
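To see where each field actually ends up, a prepared request can be inspected without sending anything. This sketch uses requests.Request(...).prepare(); the key is the placeholder from the question, and sending the search term with params= assumes the Pexels search endpoint reads query from the URL.

```python
import requests

video_base_url = "https://api.pexels.com/v1/search"
api_key = "my_key"  # placeholder, as in the question

# What the question does: everything is form-encoded into the request body.
wrong = requests.Request("GET", video_base_url,
                         data={"Authorization": api_key, "query": "Stock market"}).prepare()

# What the answer suggests: key in a header, search term in the query string.
right = requests.Request("GET", video_base_url,
                         headers={"Authorization": api_key},
                         params={"query": "Stock market"}).prepare()

print(wrong.body)                      # Authorization=my_key&query=Stock+market
print(right.url)                       # https://api.pexels.com/v1/search?query=Stock+market
print(right.headers["Authorization"])  # my_key
```

The first request has no Authorization header at all, which matches the "Authorization field missing" error.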
I'm accessing the Discourse API from Python using urlfetch. The "Get a single user by username" endpoint requires a GET request such as /users/{username}.json.
From a browser, this request returns a JSON response as expected; however, from an API call like:
from google.appengine.api import urlfetch
result = urlfetch.fetch('{}/users/{}.json'.format(domain, username))
it returns an HTML page. I've even tried setting the content type to application/json:
headers = {'Content-Type': 'application/json'}
result = urlfetch.fetch('{}/users/{}.json'.format(domain, username), headers=headers)
What am I doing wrong?
Resolved:
I needed to add api_key and api_username to the GET request:
result = urlfetch.fetch('{}/users/{}.json?api_key={}&api_username={}'.format(domain, username, discourse_api_key, discourse_api_username))
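Newer Discourse versions document API credentials as Api-Key and Api-Username request headers rather than query-string parameters, so if the query-string form is rejected, a header-based request is worth trying. This is a sketch under that assumption; it uses requests to build (not send) the request so it can be checked offline, and the domain, username, and key values are hypothetical. With urlfetch, the same dict would go in the headers= argument the question already uses.

```python
import requests


def discourse_user_request(domain, username, api_key, api_username):
    """Build, without sending, a GET for /users/{username}.json with header auth."""
    return requests.Request(
        "GET",
        "{}/users/{}.json".format(domain, username),
        # Credentials travel as headers instead of ?api_key=...&api_username=...
        headers={"Api-Key": api_key, "Api-Username": api_username},
    ).prepare()


# Hypothetical values, for illustration only.
req = discourse_user_request("https://forum.example.com", "alice", "KEY123", "system")
print(req.url)  # https://forum.example.com/users/alice.json
```

To actually send it, pass the prepared request to requests.Session().send(req).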
I am having trouble getting a cookie, adding it to my parameters, and then posting it using the requests library.
I've intercepted the POST with Burp Suite, and sessionId is one of the parameters (see the screenshot below):
http://imgur.com/OuRi4bI
Source code for the web page is in the screenshot below
http://imgur.com/TLTgCjc
My code is included below:
import requests
import cookielib
import sys
from bs4 import BeautifulSoup

print "Enter the url",
url = raw_input()
print url
r = requests.get(url)
c = r.content
soup = BeautifulSoup(c)

#Finding Captcha
div1 = soup.find('div', id='Custom')
comment = next(div1.children)
captcha = comment.partition(':')[-1].strip()
print captcha

#Finding viewstate
viewstate = soup.findAll("input", {"type" : "hidden", "name" : "__VIEWSTATE"})
v = viewstate[0]['value']
print v

#Finding eventvalidation
eventval = soup.findAll("input", {"type" : "hidden", "name" : "__EVENTVALIDATION"})
e = eventval[0]['value']
print e

# Get Cookie (this is where I am confused), and yes I have read through the Requests and BS docs
s = r.cookies
print s # Prints the class call but I don't get anything I can pass as a parameter

#Setting Parameters
params = {
    '__VIEWSTATE' : v,
    'txtMessage' : 'test',
    'txtCaptcha' : captcha,
    'cmdSubmit' : 'Submit',
    '__EVENTVALIDATION' : e
    #Need ASP.NET_SessionId Key : Value here
}

#Posting
r = requests.post(url, data=params)
print r.status_code
So to be clear, I am trying to take the sessionId I get when I connect to the web server and use it as a parameter when posting to this message board. This is for a lab on a sandboxed VM, not a live site. This is my first time writing a POST request in Python, so if I have it wrong, I've done the best I can by reading the library documentation and other websites.
Thanks!
Pass s as a parameter to your POST request:
s = r.cookies
print s # Prints the class call but I don't get anything I can pass as a parameter
You need to pass the cookies as a parameter named cookies. The requests source code (https://github.com/kennethreitz/requests/blob/master/requests/sessions.py) says that cookies can be either a CookieJar or a dictionary containing the cookies you want to pass.
In your case, it is easier to pass your cookies straight through to the next POST; there is no need to convert them to a dictionary.
#Setting Parameters
params = {
    '__VIEWSTATE' : v,
    'txtMessage' : 'test',
    'txtCaptcha' : captcha,
    'cmdSubmit' : 'Submit',
    '__EVENTVALIDATION' : e
    #Need ASP.NET_SessionId Key : Value here
}

#Posting
requests.post(url, data=params, cookies=s)
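Because cookies= accepts either form, the jar and its dict conversion should produce the same Cookie header. A minimal sketch that checks this offline with prepared requests; the URL and session ID value are hypothetical stand-ins for the sandbox site and r.cookies, and the code is Python 3, unlike the Python 2 code above.

```python
import requests
from requests.cookies import RequestsCookieJar
from requests.utils import dict_from_cookiejar

# Stand-in for r.cookies after the initial GET (hypothetical value).
jar = RequestsCookieJar()
jar.set("ASP.NET_SessionId", "abc123")

url = "http://sandbox.example/board.aspx"  # hypothetical URL
params = {"txtMessage": "test"}

# Passing the jar directly...
from_jar = requests.Request("POST", url, data=params, cookies=jar).prepare()
# ...or converting it to a plain dict first.
from_dict = requests.Request("POST", url, data=params,
                             cookies=dict_from_cookiejar(jar)).prepare()

print(from_jar.headers["Cookie"])   # ASP.NET_SessionId=abc123
print(from_dict.headers["Cookie"])  # ASP.NET_SessionId=abc123
```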
However, I would strongly suggest you use a requests.Session() object:
session = requests.Session()
session.get(url)
session.post(url, data=params)
# The session keeps track of your cookies automatically, for every domain you use it on. Very handy indeed; I rarely use requests.get unless I don't care at all about cookies.
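The session behaviour can be checked offline as well. In this sketch the cookie is set by hand as a stand-in for the Set-Cookie header the sandbox server would send on the first session.get(url); the URL and cookie value are hypothetical. session.prepare_request() merges the session's cookies into the outgoing request, which is what session.post() does before sending.

```python
import requests

session = requests.Session()
# Stand-in for the cookie the server would set on session.get(url);
# in real use, the Set-Cookie response header fills session.cookies for you.
session.cookies.set("ASP.NET_SessionId", "abc123")  # hypothetical value

url = "http://sandbox.example/board.aspx"  # hypothetical URL
post = requests.Request("POST", url, data={"txtMessage": "test"})

# prepare_request() merges session cookies, as session.post() does internally.
prepared = session.prepare_request(post)
print(prepared.headers["Cookie"])  # ASP.NET_SessionId=abc123
```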