I have a piece of software in Python that uses POST requests to reach an API on an external website.
Sometimes, instead of getting the result of my request, I get a URL with a captcha to solve.
Here is the code of the request:
import json
import requests

headers = {}
headers['Accept'] = 'application/json,application/hal+json'
headers['Content-Type'] = 'application/json; charset=UTF-8'
headers['Connection'] = 'Keep-Alive'
url = 'https://api.website.com/v1/price'
data = [{'category': int(category_id)}]
s = requests.Session()  # keep a session so cookies persist across requests
r = s.post(url, headers=headers, data=json.dumps(data))
So far so good: I then load all the data into the following variable and display it in the software using Tkinter:
json_result = json.loads(r.text)
Unfortunately, sometimes the API instead returns a URL with a link to a captcha to solve:
{"url":"https://geo.captcha-delivery.com/captcha/?initialCid=AHrlqAAAMAwIhGhOSjsfUAsLEFTg==&cid=Y.M~8XWYwAHts6n_H32O8fm~WTeMw.1cDlRGOH16q4PhWtuo8Xm~KgWeW6d1jQptRljywWkJHFMu9IgEZYRheo3OPww6XjgqXQcs1X1m&referer=https%3A%2F%2Fapi.website.com&hash=05B30BD905598BD2EE8F5A199D973&t=fe&s=575"}
If I open this URL in a browser, I can solve the captcha, but then I break the chain and the software is unable to continue.
How could I open the URL in Python, solve the captcha and, once it is solved, continue the request to get the element?
EDIT : the solution must store the cookie in the request session
Thank you
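This isn't answered in the post itself, but here is one hedged sketch of a workflow: since the captcha must be solved by a human, open the challenge URL in the user's browser, have them solve it, and copy the resulting cookie back into the same requests session before retrying. The cookie name 'datadome' is an assumption based on the captcha provider visible in the URL (geo.captcha-delivery.com); check the browser's dev tools for the real name and domain.
import webbrowser

def solve_captcha_interactively(session, captcha_url):
    # Let the user solve the challenge in their default browser.
    webbrowser.open(captcha_url)
    # Assumption: solving it sets a 'datadome' cookie in the browser,
    # which the user then copies back here (e.g. from the dev tools).
    value = input("Paste the value of the 'datadome' cookie: ")
    session.cookies.set('datadome', value, domain='.website.com')

if 'url' in json_result:  # got a captcha challenge instead of the data
    solve_captcha_interactively(s, json_result['url'])
    r = s.post(url, headers=headers, data=json.dumps(data))  # retry
    json_result = json.loads(r.text)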
I am trying a GET request using the following API documentation:
https://github.com/iblockin/pool_web_api_doc/blob/master/api_en.md
Can someone help me with what the headers should look like, or point out where I am messing up? I keep getting an unauthenticated response.
url = "https://api-prod.poolin.com/api/public/v1/subaccount"
r = requests.get(url, headers = {'authorization': 'accessToken tokenhere'})
Sorry in advance if this has been inevitably answered in 1000 other locations.
Severe noob here, trying to learn
Use this:
url = "https://api-prod.poolin.com/api/public/v1/subaccount"
r = requests.get(url, headers = {'authorization': 'Bearer tokenhere'})
As mentioned here, the accessToken should be passed in the header as
{'authorization':'Bearer TOKEN'} # TOKEN is replaced with the token value to be transmitted
So correct your code accordingly.
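For reference, a minimal self-contained sketch with the corrected header (the token value is a placeholder you must replace with your real access token):
import requests

url = "https://api-prod.poolin.com/api/public/v1/subaccount"
token = "tokenhere"  # placeholder: your real access token

# The token is sent in the Authorization header with the 'Bearer' scheme.
r = requests.get(url, headers={"authorization": "Bearer " + token})
print(r.status_code)
print(r.text)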
I'm trying to run a search on a website for the word 'Adrian'. I already understand that I first have to send a request to the website; the response will contain an XSRF token that I need to use for the second request. As I understand it, if I use session.get(), it keeps the cookies automatically for the second request, too.
I run the first request, get a 200 OK response, print out the cookies, and the token is there. I run the second request and get back a 400 error, but if I print out the headers of the second request, the token is there. I don't know where it is going wrong.
Why do I get 400 for the second one?
import requests
session = requests.Session()
response = session.get('https://www.racebets.com/en/horse-racing/formguide')
print(response)
cookies = session.cookies.get_dict()
print(cookies)
XSRFtoken = cookies['XSRF-TOKEN']
print(XSRFtoken)
response = session.get('https://www.racebets.com/ajax/formguide/search?s=Adrian')
print(response)
print(response.request.headers)
I also tried adding the token to the header myself for the second request, but the result is the same:
import requests
session = requests.Session()
response = session.get('https://www.racebets.com/en/horse-racing/formguide')
print(response)
cookies = session.cookies.get_dict()
print(cookies)
XSRFtoken = cookies['XSRF-TOKEN']
print(XSRFtoken)
headers = {'XSRF-TOKEN': XSRFtoken}
response = session.get('https://www.racebets.com/ajax/formguide/search?s=Adrian', headers=headers)
print(response)
print(response.request.headers)
As Paul said:
The API you're trying to make an HTTP GET request to cares about two request headers: cookie and x-xsrf-token. Log your browser's network traffic to see what they're composed of.
The header needs to be named x-xsrf-token. Try this:
token = session.cookies.get('XSRF-TOKEN')
headers = {'X-XSRF-TOKEN': token}
response = session.get('https://www.racebets.com/ajax/formguide/search?s=Adrian', headers=headers)
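For reference, the whole corrected flow as one runnable sketch (the same calls as above, just assembled):
import requests

session = requests.Session()

# The first request sets the XSRF-TOKEN cookie in the session's cookie jar.
session.get('https://www.racebets.com/en/horse-racing/formguide')

# Echo that cookie back in the X-XSRF-TOKEN header on the second request.
token = session.cookies.get('XSRF-TOKEN')
headers = {'X-XSRF-TOKEN': token}
response = session.get('https://www.racebets.com/ajax/formguide/search?s=Adrian',
                       headers=headers)
print(response.status_code)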
So I am trying to log in to a system using the Python Requests module. When I do the POST request:
r = requests.post(url, proxies=proxies, headers=headers, cookies=cookies, data=payload, verify=False)
print(r.status_code)
print(r.cookies)
and try to print out the cookies, it shows none. However, in Burp Suite I can see that the first response contains the cookies I need, but requests then automatically asks for the site's landing page and some other resources (basically it performs two more requests using the cookies it got in the first response). So when I ask for the cookies, I get the cookies of the last response, and obviously there aren't any.
How can I get the cookies from the first request?
A solution could be to set allow_redirects=False, check that response for cookies, and then follow the redirect manually, given that it is a redirect that initiates the next request.
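A minimal sketch of that approach (url and payload are the placeholders from the question; the redirect target may need urljoin if Location is relative):
import requests

s = requests.Session()

# Disable automatic redirect handling so the first response can be inspected.
r = s.post(url, data=payload, allow_redirects=False)
print(r.status_code)  # typically 302 when the login triggers a redirect
print(r.cookies)      # the cookies set by that first response

# Follow the redirect manually; the session re-sends the stored cookies.
if r.is_redirect:
    r2 = s.get(r.headers['Location'])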
I am trying to log in to a page and access another link in the page.
I get a "405 Not Allowed" error from this attempt:
from requests import session

payload = {'username': <username>, 'password': <password>}
with session() as s:
    r = s.post(<URL>, data=payload)
    print(r)
    print(r.content)
I checked the POST request details using Chrome developer tools and found a URL that appeared to be an API endpoint. I posted to that URL with the payload and it seemed to work; I got a response similar to what I could see in the developer tools.
Unfortunately, when trying to 'get' another URL after logging in, I am still getting the content from the login page.
Why is the login not sticking? Should I use cookies? How?
You can use a session object. It stores the cookies across requests and handles them for you:
s = requests.Session()
# all cookies received will be stored in the session object
s.post('http://www...',data=payload)
s.get('http://www...')
Docs: https://requests.readthedocs.io/en/master/user/advanced/#session-objects
You can also save the cookie data to an external file, and then reload it to keep the session persistent without having to log in every time you run the script:
How to save requests (python) cookies to a file?
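A minimal sketch of that idea using pickle (the file name and URLs are placeholders):
import pickle
import requests

s = requests.Session()
s.post('http://www...', data=payload)

# Save the session's cookie jar to disk.
with open('cookies.pkl', 'wb') as f:
    pickle.dump(s.cookies, f)

# Later, restore it into a fresh session to skip logging in again.
s2 = requests.Session()
with open('cookies.pkl', 'rb') as f:
    s2.cookies.update(pickle.load(f))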
From the documentation:
Get a cookie from a response:
url = 'http://example.com/some/cookie/setting/url'
r = requests.get(url)
r.cookies
{'example_cookie_name': 'example_cookie_value'}
Give the cookie back to the server on a subsequent request:
url = 'http://httpbin.org/cookies'
cookies = {'cookies_are': 'working'}
r = requests.get(url, cookies=cookies)
Summary of the other answers (#Freek Wiekmeijer, #gtalarico):

Logic of login
- Many resources (pages, APIs) need authentication before they can be accessed; otherwise you get 405 Not Allowed.
- Common authentication (grant-access) methods are:
  - a cookie
  - an auth header:
    - Basic xxx
    - Authorization xxx

How to use cookies in requests for auth
- First, get or generate the cookie.
- Then send the cookie with each following request, either:
  - manually, by setting the cookie in the headers, or
  - automatically, by letting requests process it:
    - a session auto-manages cookies
    - response.cookies lets you set cookies manually

Use requests' session to auto-manage cookies:
curSession = requests.Session()
# all cookies received will be stored in the session object
payload={'username': "yourName",'password': "yourPassword"}
curSession.post(firstUrl, data=payload)
# internally return your expected cookies, can use for following auth
# internally use previously generated cookies, can access the resources
curSession.get(secondUrl)
curSession.get(thirdUrl)
Manually control cookies via response.cookies:
payload={'username': "yourName",'password': "yourPassword"}
resp1 = requests.post(firstUrl, data=payload)
# manually pass previously returned cookies into following request
resp2 = requests.get(secondUrl, cookies= resp1.cookies)
resp3 = requests.get(thirdUrl, cookies= resp2.cookies)
As others noted, here is an example of how to add cookies as a string variable to the headers parameter:
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) ...',
'cookie': '_fbp=fb.1.1654447470850.2143140577; _ga=GA1.2.1...'
}
response = requests.get(url, headers=headers)
I am writing a Python 2.7 script using Requests to automate access to a particular website. The website requires that a Referer header matching the request URL be provided, for "security reasons". The URL is built up from a number of items in a params dict passed to requests.post().
Is there a way to determine the URL that Requests will use before making the request, so that the Referer header can be set to this correct value? Let's assume that I have a lot of parameters:
params = {'param1': value1, 'param2': value2}  # ... etc
base_url = "http://example.com"
headers = {'Referer': url}  # but what is 'url' to be?
requests.post(base_url, params=params, headers=headers)  # fails as Referer does not match final url
I suppose one workaround is to issue the request and see, after the fact, what URL was used. However, there are two problems with this: 1. it adds significant overhead to the execution time of the script, as there will be a lot of such requests, and 2. it isn't actually useful, because the server redirects the request to another URL, so reading it afterwards doesn't give the correct Referer value.
I'd like to note that I have this script working with urllib/urllib2, and I am attempting to write it with Requests to see whether it is possible and perhaps simpler. It's not a complicated process the script has to follow, but it may perhaps be slightly beyond the scope of Requests. That's fine, I'd just like to confirm that this is the case.
I think I found a solution, based on Prepared Requests. The idea is that Session.prepare_request() does everything to prepare the request except send it, which lets my script read the prepared request's url, which now includes the parameters (whose order is determined by the dict order). It can then set the Referer header appropriately and issue the original request.
import requests

params = {'param1': value1, 'param2': value2}  # ... etc
url = "http://example.com"
session = requests.Session()

# The Referer must be correct.
# To determine the correct Referer URL, prepare a request without actually sending it.
req = requests.Request('POST', url, params=params)
prepped = session.prepare_request(req)
# r = session.send(prepped)  # don't actually send it

# Set the Referer header by examining the prepared URL.
headers = {'Referer': prepped.url}

# Now send normally ('data' is the POST body built elsewhere).
r = session.post(url, params=params, data=data, headers=headers)
It looks like you've correctly found the prepared-request feature in Requests.
However, if you still wanted to use your initial method, I believe you could use your base_url as your Referer:
base_url = "http://example.com"
headers = { 'Referer' : base_url }
requests.post(base_url, params=params, headers=headers)
I suspect this will work because your POST attaches the params directly to the base_url. If, for example, you were on:
http://www.example.com/trying-to-send-upload/
and were adding some params to this POST, you would then use:
referer = "http://www.example.com/trying-to-send-upload/"
headers = { 'Referer' : referer, 'Host' : 'example.com' }
requests.post(referer, params=params, headers=headers)
ADDED
I would also check the URL visually with a simple print statement after you've created the URL string:
print(post_url)
If this is good, you should print out the details of the reply from the server you're posting to, as it might also give you some hints as to why your query was rejected:
s = requests.post(referer, params=params, headers=headers)
print(s.status_code)
print(s.text)
I'd love to hear whether this works for you as well.