2 factor authentication handling in selenium webdriver with python

I am logging in to a website with valid credentials, but if my network changes, or even if the device changes (within the same network), it redirects to an authentication page where I have to enter an access code that I receive via email. I want to skip this authentication page and navigate to other pages to continue my process.
Expected result - Home page of the site
Actual result - Secure Access Code page

When you initialise your driver, you can configure the browser to load your Chrome profile (that is, if you're using Chrome). This may allow you to bypass the authentication page if you have previously logged in with this profile. Not sure if this will work, but it's worth a shot.
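A minimal sketch of the profile idea above, assuming Chrome and Selenium 4. The profile path and profile directory name are placeholders, not values from the question, and whether the site skips the access-code page depends on whether it recognises the cookies stored in that profile.

```python
import os


def chrome_profile_args(user_data_dir, profile_name="Default"):
    """Build the Chrome command-line switches that select an existing profile."""
    return [
        f"--user-data-dir={os.path.expanduser(user_data_dir)}",
        f"--profile-directory={profile_name}",
    ]


def start_chrome_with_profile(user_data_dir, profile_name="Default"):
    """Start Chrome through Selenium using an existing profile."""
    # Imported here so the helper above can be used without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for arg in chrome_profile_args(user_data_dir, profile_name):
        options.add_argument(arg)
    return webdriver.Chrome(options=options)
```

Typical use would be driver = start_chrome_with_profile("/path/to/profile") followed by driver.get(...). Chrome generally refuses to share a user data directory between two running instances, so close any normal Chrome window using that profile first.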

Related

Identifying log-in data using Chrome Developer (Network) for Python Requests script

I am trying to build a Python script using Requests to log into a licensed site and eventually pull / export data frames.
Through research I understand that in Chrome Developer Tools > Network there should be "Form Data" within the site's secure login path (Request Method: POST) that shows the login information passed to the server. However, in my case no Form Data section is provided.
What is provided, in the Response Headers, is a series of 'Set-Cookie' values, which I believe are what is being used to authenticate the login.
I am wondering: 1) Am I missing something in Developer Tools, where the Form Data is stored somewhere else and I can clearly see which user name and password keys the site uses to authenticate the login? If not, 2) Can I authenticate against the website via Python Requests using these Set-Cookie values?
Using the secure login URL and Requests I am able to successfully pull in my cookies, but I am at a loss as to how I would use them to access data from the site.
import requests
loginurl = 'https://webisite.com/security/login'
r = requests.get(loginurl)
r.cookies  # cookies set by the login page response
Thank you again.
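One way to use those cookies, sketched under assumptions: a requests.Session stores every Set-Cookie it receives and sends the cookies back on later requests, so logging in and fetching data through the same session is usually enough. The form field names (username, password) and the data URL below are hypothetical. The real names are whatever the login POST shows in the Network tab; for a JSON login they tend to appear under the request payload rather than "Form Data".

```python
def login_and_fetch(session, login_url, data_url, credentials):
    """Log in with a POST, then fetch a protected page with the same session."""
    # The session keeps any cookies set by the login response.
    login_response = session.post(login_url, data=credentials)
    login_response.raise_for_status()
    # Cookies stored on the session are sent automatically here.
    data_response = session.get(data_url)
    data_response.raise_for_status()
    return data_response.text


def main():
    # Third-party dependency (pip install requests), imported lazily so the
    # helper above can be exercised with any session-like object.
    import requests

    with requests.Session() as session:
        text = login_and_fetch(
            session,
            "https://webisite.com/security/login",     # URL from the question
            "https://webisite.com/data/export",        # hypothetical data URL
            {"username": "me", "password": "secret"},  # hypothetical field names
        )
        print(text[:200])
```

Calling main() runs the flow against the real site. If the site expects a JSON body instead of form fields, session.post(login_url, json=credentials) would be the variant to try.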

Selenium gets response code of 429 but firefox private mode does not

I used Selenium in Python 3 to open a page. It does not open under Selenium, but it does open in a Firefox private window.
What is the difference, and how can I fix it?
from selenium import webdriver
from time import sleep
driver = webdriver.Firefox()
driver.get('https://google.com') # creating a google cookie
driver.get_cookies() # check google gets cookies
sleep(3.0)
url='https://www.realestate.com.au/buy/in-sydney+cbd%2c+nsw/list-1'
driver.get(url)
Creating a Google cookie is not necessary. It is not there in a Firefox private window either, and the page works without it. Under Selenium, however, the behavior is different.
I also see that the website returns a [HTTP/2 429 Too Many Requests 173ms] status and the page is blank white. This does not happen in Firefox private mode.
UPDATE:
I turned on the persistent log. Firefox in private mode receives a 429 response too, but it seems the JavaScript then resumes from another URL. This only happens the first time.
Under Selenium, however, the request does not survive the 429 response. It does report something to the cdndex website. I have blocked that website, so you do not see the request go through there. This is still a difference in behavior between Firefox and Selenium.
Selenium with persistent log: (image not included)
Firefox with persistent log: (image not included)

This is just my hunch after working with Selenium and WebDriver for a while: I suspect the user agent under Selenium is set to something unusual by default, and that the server side recognizes this and responds with an error HTTP code and a blank page as a result.
Try setting the user agent to something reasonable and/or disabling Selenium's changes to the browser defaults.
Another tip is to look at the request using Wireshark or similar to see exactly what is sent over the wire.
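A sketch of how the user agent could be overridden for Firefox in Selenium 4, using the general.useragent.override preference. The user agent string below is only an example value, and it is an assumption that the site keys on the user agent at all.

```python
EXAMPLE_USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"
)


def firefox_preferences(user_agent=EXAMPLE_USER_AGENT):
    """Return the Firefox preferences to apply before starting the browser."""
    return {"general.useragent.override": user_agent}


def start_firefox(user_agent=EXAMPLE_USER_AGENT):
    """Start Firefox through Selenium with the overridden user agent."""
    # Imported lazily so firefox_preferences() works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    for name, value in firefox_preferences(user_agent).items():
        options.set_preference(name, value)
    return webdriver.Firefox(options=options)
```

After driver = start_firefox(), running driver.execute_script("return navigator.userAgent") shows the value the site will see.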

429 Too Many Requests
The HTTP 429 Too Many Requests response status code indicates the user has sent too many requests within a short period of time. The 429 status code is intended for use with rate-limiting schemes.
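Servers that rate-limit often include a Retry-After header with the 429 response, either as a number of seconds or as an HTTP date. A minimal sketch of a client-side helper that turns that header into a wait time; the function name and the fallback of 5 seconds are choices made for this example.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(header_value, now=None, default=5.0):
    """Convert a Retry-After header value into a non-negative delay in seconds."""
    if not header_value:
        return default
    value = header_value.strip()
    if value.isdigit():
        # Form 1: delay expressed directly in seconds, e.g. "120".
        return float(value)
    try:
        # Form 2: an HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
        retry_at = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Unparseable header: fall back to the default delay.
        return default
    if retry_at.tzinfo is None:
        retry_at = retry_at.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (retry_at - now).total_seconds())
```

In a script, a 429 response would be followed by time.sleep(retry_after_seconds(response.headers.get("Retry-After"))) before the request is retried.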
Root Cause
When your server detects that a user agent is trying to access a specific page too often in a short period of time, it triggers a rate-limiting feature. The most common example of this is when a user (or an attacker) repeatedly tries to log into a web application.
The server can also identify a bot with cookies, rather than by their login credentials. Requests may also be counted on a per-request basis, across your server, or across several servers. So there are a variety of situations that can result in you seeing an error like one of these:
429 Too Many Requests
429 Error
HTTP 429
Error 429 (Too Many Requests)
This use case
This use case seems to be a classic case of a Selenium-driven, GeckoDriver-initiated Firefox browsing context being detected as a bot, due to the fact that:
Selenium identifies itself
References
You can find a couple of relevant detailed discussions in:
How to Conceal WebDriver in Geckodriver from BotD in Java?
How can I make a Selenium script undetectable using GeckoDriver and Firefox through Python?

Python requests - how to perform SAML SSO login (to login.microsoft.com for example)?

First of all, I googled this question but found only generic explanations, which didn't give me a good understanding of how to do things.
Second, I'm a valid system user (not an admin) and have access to the data. That is, I have valid user credentials and can download the file manually, but for a small automation I would like to have it downloaded by a Python script from my PC.
The download itself is simple; the only thing is that I need to provide a valid session ID cookie with the request. So ultimately I need to get this cookie in the easiest way.
If my understanding is right, in SAML terms I'm a User Agent and want to download a file from a Service Provider, which needs to authenticate me with an Identity Provider (Microsoft). Usually I do it via a browser, and now I'm able to emulate it with the help of PySide6 (QWebEngineView). I load the target URL first in QWebEngineView. It is effectively a small embedded web browser: it redirects me to login.microsoft.com, asks for credentials, then redirects me back to the Service Provider site and sets the session ID cookie. Then I'm able to use this cookie with my requests. It works, but I would like to get rid of the GUI (PySide) if possible.
I decided to replicate the flow that the browser follows and failed almost at the beginning. What happens:
I request a file from my Service Provider with a usual GET request.
The Service Provider replies with an HTML page (instead of the target file), as I'm not authenticated.
This HTML page contains JavaScript triggered by the onPageLoad event; this JavaScript simply redirects the browser to login.microsoft.com (a long URL with some parameters).
The next request, with this long URL for login.microsoft.com, ends with "302 Moved Temporarily" with the same URL in the "Location" header. And when I follow this URL, it again gives me a 302 with the same URL.
In the same scenario, the browser gets only two redirections and finally receives the URL of a web page with a login/password prompt from microsoft.com.
I understand that I should send some more headers/cookies when I go again to the URL provided in the "Location" header of the 302 response. But I have no idea what login.microsoft.com expects here.
So my question is: is there any source where this message flow is described? Or maybe someone has done it already and can give me advice on how to proceed?
I found some SAML-related libraries for Python, but they require quite complex configuration with X.509 certificates and more. It looks like they are targeted more at implementation on the Service Provider side, not at external login.

Can I do a GET request to a page behind a login without logging in

My task was to create an automation script that would log in to a website, navigate to a page, and download a CSV file. I cannot log in to this website using a script due to some security measures (the cookies and hidden variables inside the login form change every time a request is made or the website is loaded). My idea to get around this is to:
Open a window and login manually
Navigate to the page I want
Capture all the cookies, headers, etc and the session_id of this page
Using Python, do a GET request with this page URL and send the trace I captured in step 3
Get the CSV file
Is it possible? What would be an example for this scenario? Could I do this using selenium webdriver, like doing the manual login, storing the trace then using it to open the page I want with driver.get(url)?
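The steps above can be sketched as follows, assuming the site accepts a replayed session (some sites also check the IP address or user agent, or expire sessions quickly). driver.get_cookies() returns a list of dicts with name and value keys; the helper names and the CSV URL in the usage note are hypothetical.

```python
import urllib.request


def cookie_header(selenium_cookies):
    """Join Selenium cookie dicts into a single Cookie header value."""
    return "; ".join(f"{c['name']}={c['value']}" for c in selenium_cookies)


def build_request(url, selenium_cookies, user_agent):
    """Build a GET request that carries the browser's cookies and user agent."""
    return urllib.request.Request(
        url,
        headers={
            "Cookie": cookie_header(selenium_cookies),
            "User-Agent": user_agent,
        },
    )


def download_csv(driver, url, path):
    """After a manual login in `driver`, fetch `url` and save it to `path`."""
    # Reuse the browser's own user agent so the request resembles the session.
    user_agent = driver.execute_script("return navigator.userAgent")
    request = build_request(url, driver.get_cookies(), user_agent)
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())
```

Usage would be: log in by hand in the Selenium-controlled window, navigate to the page, then call download_csv(driver, "https://example.com/report.csv", "report.csv").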

Can I do a GET request to a page behind a login without logging in?
Yes, of course, but the HTTP response will probably be a 401 or 302 status code (Unauthorized, or a redirect to the login page), as you are not authorized to perform that action.
See also: Hypertext Transfer Protocol (HTTP/1.1): Semantics and Content
Open a window and login manually
Capture all the cookies, headers, etc and the session_id of this page
Using Python, do a GET request with this page URL and send the trace I captured in step 3
To be clear, when you open the browser, the SSL/TLS protocol sets up the connection with the remote server, creating an encrypted session that you use to log in and navigate the website.
When you then perform the Python GET request (which is not a tab of your browser), the script does not reuse that encrypted session: it opens its own connection and performs its own TLS handshake.
The HTTP session you captured is a different thing (a Layer 7 concept, carried in cookies and headers). Replaying it only works if the server accepts those cookies from a new client, which is not guaranteed, since the session may expire or be tied to other properties of the original client.
See also: The Transport Layer Security (TLS) Protocol Version 1.3
Get the CSV file
Again, you are trying to do something that the web application isn't built for.
Instead, some organizations provide a REST API to fetch certain data without requiring HTTP sessions.
See also: Representational state transfer
p.s. please provide some feedback if the replies are helping you, we are all sharing and learning ;)

YouTube API - terminal attempts to open the browser and fails

I'm trying to build a project using YouTube's API and Python 3.
As mentioned in the Quick Start guide:
The sample attempts to open a new window or tab in your default browser. If this fails, copy the URL from the console and manually open it in your browser.
I'm using the macOS Terminal, which runs the script, but I always have to copy the URL into my browser manually.
I guess the problem is on my machine, and I'd like to find out how to fix it, as it would be faster and easier each time I run the script.
I've tried to find a similar thread, with no luck.
Could anyone guide me through this, or send me a link on how to solve this problem?
Thanks,
Yoav.

You need the BROWSER environment variable set. It points to the location of the browser.
Use printenv BROWSER (or echo $BROWSER) to see if it is already set
*Command may be different depending on version of Mac OS
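Python's standard webbrowser module, which many scripts use to open URLs, consults the BROWSER environment variable when choosing a browser. A small sketch for checking both the variable and what Python resolves; it is an assumption here that the sample relies on webbrowser to open the URL.

```python
import os
import webbrowser


def browser_setting(env=None):
    """Return the BROWSER environment variable, or None if it is not set."""
    env = os.environ if env is None else env
    return env.get("BROWSER")


def resolved_browser():
    """Return the name of the controller class Python would use, or None."""
    try:
        return type(webbrowser.get()).__name__
    except webbrowser.Error:
        # Raised when no usable browser can be found.
        return None


print("BROWSER =", browser_setting())
print("webbrowser resolves to:", resolved_browser())
```

If the second line prints None, Python could not find a browser to launch, which would match the behaviour of falling back to printing the URL.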

Solution (source):
I had been using run_console(), which doesn't attempt to open the browser but asks the user to open the URL manually.
To make it open the browser automatically, you should use the run_local_server() method, as shown in the example below.
The run_console function instructs the user to open the authorization
URL in their browser. After the user authorizes the application, the
authorization server displays a web page with an authorization code,
which the user then pastes into the application. The authorization
library automatically exchanges the code for an access token.
credentials = flow.run_console()
The run_local_server function attempts to open the authorization URL in the user's browser. It also
starts a local web server to listen for the authorization response.
After the user completes the auth flow, the authorization server
redirects the user's browser to the local web server. That server gets
the authorization code from the browser and shuts down, then exchanges
the code for an access token.
credentials = flow.run_local_server(host='localhost',
    port=8080,
    authorization_prompt_message='Please visit this URL: {url}',
    success_message='The auth flow is complete; you may close this window.',
    open_browser=True)
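For context, a fuller sketch of where flow comes from, assuming the google-auth-oauthlib package and a client_secret.json file downloaded from the Google API Console. The scope and file name are example values. As far as I know, recent releases of the library no longer include run_console, so run_local_server is the option to reach for there.

```python
SCOPES = ["https://www.googleapis.com/auth/youtube.readonly"]  # example scope


def local_server_options(port=8080):
    """Keyword arguments passed to run_local_server."""
    return {
        "host": "localhost",
        "port": port,
        "authorization_prompt_message": "Please visit this URL: {url}",
        "success_message": "The auth flow is complete; you may close this window.",
        "open_browser": True,
    }


def authorize(client_secrets_file="client_secret.json", port=8080):
    """Run the installed-app OAuth flow and return the credentials."""
    # Third-party dependency (pip install google-auth-oauthlib), imported lazily.
    from google_auth_oauthlib.flow import InstalledAppFlow

    flow = InstalledAppFlow.from_client_secrets_file(
        client_secrets_file, scopes=SCOPES
    )
    # Opens the browser and waits on a local web server for the redirect.
    return flow.run_local_server(**local_server_options(port))
```

Calling credentials = authorize() should then open the default browser instead of only printing the URL.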
Thank you @Hassan Voyeau for the help.
