Getting the ‘g-recaptcha-response’ from invisible reCAPTCHA - python

I'm creating a web scraper that needs to sign in to a site to access data. By inspecting the site's login page I was able to find the path the form posts to, along with the required arguments. That means that instead of using something like Selenium to fill out the form on the page itself, I can send a direct POST request to the server and sign in that way. There's just one problem with that.
The login sends four fields:
g-recaptcha-response
username
password
token2fa
They are packed into JSON and sent to the server. The problem is that the site uses an invisible reCAPTCHA, so I can't solve it and send the response.
Now I need to figure out how to obtain a g-recaptcha-response and send it before the login page itself does. Any help is appreciated.
I'd rather not use Selenium or a CAPTCHA-solving service, since a solving service costs money.
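For context, a minimal sketch of the direct POST described above might look like the following. The endpoint and field values are placeholders, not the real site's; the g-recaptcha-response value is exactly the part that cannot be produced without the reCAPTCHA widget actually being solved, so the server would reject this as-is.

import requests

# Hypothetical endpoint and placeholder values, mirroring the fields the question lists.
LOGIN_URL = "https://example.com/api/login"

payload = {
    "g-recaptcha-response": "<token>",  # normally generated by Google's widget on the page
    "username": "myusername",
    "password": "mypassword",
    "token2fa": "123456",
}

session = requests.Session()
response = session.post(LOGIN_URL, json=payload)
print(response.status_code, response.text)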

Related

How to run an API that has a CAPTCHA on the Postman platform

I'm getting a CAPTCHA error for a URL, but I'm unable to see the CAPTCHA itself. Is there any method to show the CAPTCHA? I'd like suggestions on how to make the CAPTCHA visible so my request can run.
Unfortunately, it's not possible to display a CAPTCHA in Postman. CAPTCHAs are designed to prevent automated systems (like Postman) from accessing the API and are meant to be displayed and solved by a human. You'll need to access the API manually through a web browser and solve the CAPTCHA before making the API request.

Python requests - submit form via POST request for site with recaptcha

BACKGROUND
I'm trying to automate form submission for a site.
I'm trying to access the quote given for the form data.
The site is protected by reCAPTCHA; I've seen the token being sent in the Network tab of devtools.
QUESTION / HELP
I want to know whether it is possible to automate submitting the form, and how, WITHOUT a browser-automation tool like Selenium. I have all the form data and headers from the form's POST request in devtools.
WHAT I'VE TRIED
I manually filled the form with dummy values and watched the devtools Network tab while clicking "submit". I found the POST request, copied the request headers and form body, and pasted them into Postman. However, in my Postman trials I never get to the second page (with the quote).
WHY
For anyone wondering why I really badly want to do this: I'm trying to create a car insurance quote aggregator tool for my country, as there currently isn't one and filling out the forms is a pain. So I want to create this service (only ~20 providers).
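For reference, a rough sketch of replaying a captured form POST with requests might look like this. The URL, field names, and header values are placeholders; in the real workflow they would be copied from the devtools request, and if the site validates a reCAPTCHA token server-side, this alone won't get past it.

import requests

# Placeholders: in practice these come from the copied devtools request.
FORM_URL = "https://example.com/quote"
form_data = {
    "first_name": "Jane",           # dummy form fields
    "vehicle_reg": "ABC123",
    "g-recaptcha-response": "",     # token normally produced by the reCAPTCHA widget
}
headers = {
    "Content-Type": "application/x-www-form-urlencoded",
    "Referer": "https://example.com/form",
}

with requests.Session() as session:
    # Visiting the form page first picks up any cookies the server expects.
    session.get("https://example.com/form")
    response = session.post(FORM_URL, data=form_data, headers=headers)
    print(response.status_code)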

Python - How to login to web form with hidden values?

I am trying to write a Python script to log in to the following site in order to automatically keep an eye on some account details: https://gateway.usps.com/eAdmin/view/signin
I have the right credentials, but something isn't quite working correctly. I don't know if it is because of the hidden inputs that exist on the form.
import requests
from bs4 import BeautifulSoup

user = 'myusername'
passwd = 'mypassword'

s = requests.Session()

# Load the sign-in page so the session picks up its cookies,
# then parse the hidden form inputs out of the HTML.
r = s.get("https://gateway.usps.com/eAdmin/view/signin")
soup = BeautifulSoup(r.content, "html.parser")
sp = soup.find("input", {"name": "_sourcePage"})['value']
fp = soup.find("input", {"name": "__fp"})['value']
si = soup.find("input", {"name": "securityId"})['value']

data = {
    "securityId": si,
    "username": user,
    "password": passwd,
    "_sourcePage": sp,
    "__fp": fp,
}
headers = {
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "gateway.usps.com",
    "Origin": "https://gateway.usps.com",
    "Referer": "https://gateway.usps.com/eAdmin/view/signin",
}

login_url = "https://gateway.usps.com/eAdmin/view/signin"
# POST the credentials together with the hidden values scraped above.
r = s.post(login_url, headers=headers, data=data, cookies=r.cookies)
print(r.content)
_sourcePage, securityId and __fp are all hidden input values from the page source. I am scraping these from the page, but when I get to the POST request the URL is effectively opened again, so these values change and are no longer valid. However, I'm unsure how to rewrite the POST line to ensure that I submit the correct hidden values.
I don't think this is only relevant to this site, but to any site with hidden random values.
You can't do that.
You are trying to authenticate with an HTTP POST request outside the application's own scope, i.e. outside the login page and its own web form.
For security reasons the page implements different techniques, one of which is an anti-CSRF token (which is probably _sourcePage), to ensure that the login request comes exclusively from the web page itself.
For this reason, every time you scrape the page and grab the content of the hidden security inputs, the web application regenerates them. So when you reuse them to craft the final request, of course they are no longer valid.
See also: https://www.owasp.org/index.php/Cross-Site_Request_Forgery_(CSRF)
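In practice, the usual way to keep such per-page tokens valid is to fetch the login page and submit the form within the same Session, using the tokens from that exact response and not fetching the page again in between. A minimal sketch of that pattern is below; the URL and input names mirror the question, and whether this particular site accepts it depends on its other protections.

import requests
from bs4 import BeautifulSoup

LOGIN_URL = "https://gateway.usps.com/eAdmin/view/signin"

with requests.Session() as session:
    # One GET: this response's cookies and hidden tokens belong together.
    page = session.get(LOGIN_URL)
    soup = BeautifulSoup(page.content, "html.parser")
    tokens = {
        name: soup.find("input", {"name": name})["value"]
        for name in ("_sourcePage", "__fp", "securityId")
    }

    # Immediately POST with those same tokens; the session reuses the cookies
    # that were issued alongside them.
    payload = {"username": "myusername", "password": "mypassword", **tokens}
    result = session.post(LOGIN_URL, data=payload,
                          headers={"Referer": LOGIN_URL})
    print(result.status_code)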

How to logout of website during requests session

I am attempting to scrape some data from a website which requires a login. To complicate matters, I am scraping data from three different accounts. So in other words, I need to login to the site, scrape the data and then logout, three times.
The html behind the logout button looks like this:
The (very simplified) code I've tried is below:
import requests

for account in [account1, account2, account3]:
    with requests.session() as session:
        # [[login code here]]
        # [[scraping code here]]
        session.get(url + "/logout")
The scraping using the first account works fine, but after that it doesn't. I'm assuming this is because I'm not logging out properly. What can I do to fix this?
It's quite simple:
You need to forge the correct login request.
To do that, go to the login page:
Open the 'Inspect' tool, 'Network' tab. Checking the 'Preserve log' option is quite useful as well.
Log in to the site, and you'll see the login request appear in the Network tab (usually it's a POST request).
Right-click the request, select Copy -> Copy as cURL, and then just use this brilliant tool.
Usually you can trim the headers and cookies from the code the tool produces (but be careful trimming the Content-Type header; that can break your code).
Replace requests.[get|post](...) with session.[get|post](...)
Profit. You'll have a logged-in session after executing the code above. Logging out and any form population are done in pretty much the same way.
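Putting that together with the question's loop, a hedged sketch of the per-account flow might look like the following. The base URL, paths, and form field names are placeholders; in practice they would be copied from the devtools request as described above.

import requests

accounts = [
    {"username": "user1", "password": "pass1"},
    {"username": "user2", "password": "pass2"},
    {"username": "user3", "password": "pass3"},
]

BASE_URL = "https://example.com"   # hypothetical site
LOGIN_PATH = "/login"              # hypothetical paths and field names,
LOGOUT_PATH = "/logout"            # taken from devtools in practice

for account in accounts:
    # A fresh session per account keeps cookies from leaking between logins.
    with requests.Session() as session:
        session.post(BASE_URL + LOGIN_PATH, data=account)

        data_page = session.get(BASE_URL + "/account/details")
        print(data_page.status_code)

        # Hitting the logout URL with the same session ends that account's login.
        session.get(BASE_URL + LOGOUT_PATH)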

Cross-domain Ajax POST and user verification

I'm working on an app that saves things from many cross-domain pages via an Ajax POST to my server/app. I need to find a way to send a POST, verify that the user who sent it is already signed in on my site, and then save it to the database.
I'm pretty sure I need a Chrome or Firefox extension to do it, because I need to embed my JS on every page my users visit. The thing is, I don't know where to start or how it should work. I could set up a proxy to make the JSON POST work, but I don't know how to verify whether the user is signed in on my site.
Should I get my users' cookies from the browser via the Chrome API, send them in the POST, and authenticate the cookie/session in Django? What do you suggest?
Thank you for your help. I appreciate every hint.
When the user logs in at http://yourserver.com, you can set a long-lived cookie to identify him (see the SESSION_EXPIRE_AT_BROWSER_CLOSE and SESSION_COOKIE_AGE settings in Django).
Then, when he embeds any JS from the yourserver.com domain on another site, the cookies are automatically sent for that domain, and on the Django side you can check for the cookie's existence and validity and serve the right JS.
Because of cross-domain issues, you may be better off using a form POST as an alternative to AJAX, since it is not subject to the same security restrictions. You can then play with iframes and JavaScript to make the two domains communicate.
To embed the JS in another website, you can use a browser extension, or a simple bookmarklet, which will load your code in the current page when the user clicks it from any webpage.
My 2 cents;
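As a rough illustration of the server-side check described above, here is a minimal Django view sketch (the view name and script bodies are hypothetical). It serves different JS depending on whether the session cookie identifies a signed-in user; the cookie rides along automatically because the script is requested from the yourserver.com domain.

from django.http import HttpResponse

def embed_js(request):
    # request.user is populated from the session cookie by Django's
    # auth middleware; anonymous users get a stub script instead.
    if request.user.is_authenticated:
        body = "console.log('signed in: ready to save items');"
    else:
        body = "console.log('not signed in');"
    return HttpResponse(body, content_type="application/javascript")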
