Selenium prevent redirect - python

Does Selenium automatically follow redirects? It seems that the WebDriver isn't loading the page I requested.
And if it does follow redirects automatically, is there any way to prevent this?

No. Selenium drives the browser exactly like a regular user would, which means redirects are followed whenever the web application requests them, either via a 30x HTTP status or via JavaScript.
If a redirect is problematic when it happens to real users, I suggest treating it as a legitimate bug in the application.
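If you only need to see where the redirect points without letting the browser follow it, one workaround (my own suggestion, stepping outside Selenium) is to replay the request with the requests library and disable redirect following:

import requests

# A sketch: fetch the page but do not follow redirects.
# The URL is a placeholder for the page you are requesting.
response = requests.get("https://example.com/some-page", allow_redirects=False)
print(response.status_code)              # e.g. 301 or 302 for a redirect
print(response.headers.get("Location"))  # the redirect target, if any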

Related

Selenium gets response code of 429 but firefox private mode does not

I used Selenium in Python 3 to open a page. The page does not open under Selenium, but it does open in a Firefox private window.
What is the difference, and how can I fix it?
from selenium import webdriver
from time import sleep
driver = webdriver.Firefox()
driver.get('https://google.com') # creating a google cookie
driver.get_cookies() # check google gets cookies
sleep(3.0)
url='https://www.realestate.com.au/buy/in-sydney+cbd%2c+nsw/list-1'
driver.get(url)
Creating the Google cookie is not actually necessary: it is not present in a Firefox private window either, yet the page still loads there. Under Selenium, however, the behavior is different.
I also see the website return an [HTTP/2 429 Too Many Requests 173ms] status, and the page stays blank white. This does not happen in Firefox private mode.
UPDATE:
I turned on the persistent log. Firefox in private mode receives a 429 response too, but its JavaScript appears to resume from another URL. This only happens the first time.
Under Selenium, however, the request does not survive the 429 response. It does report something to the cdndex website; I have blocked that site, so you do not see the request go through there. Either way, this is a genuine behavioral difference between plain Firefox and Selenium.
Selenium with persistent log: (screenshot not reproduced here)
Firefox with persistent log: (screenshot not reproduced here)
This is just my hunch after working with Selenium and WebDriver for a while: I suspect the default user agent Selenium sets is recognizable, and that the server side detects it and responds with an unhelpful HTTP code and a blank page.
Try setting the user agent to something reasonable and/or disabling Selenium's other deviations from browser defaults.
Another tip is to inspect the request with Wireshark or a similar tool to see exactly what is sent over the wire.
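For instance, with Firefox the user agent can be overridden through a profile preference. A minimal sketch, assuming Selenium 4 (the UA string is just an example; substitute a current one):

from selenium import webdriver
from selenium.webdriver.firefox.options import Options

options = Options()
# Override the user agent via Firefox's about:config preference.
options.set_preference(
    "general.useragent.override",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:115.0) Gecko/20100101 Firefox/115.0",
)
driver = webdriver.Firefox(options=options)
driver.get("https://www.realestate.com.au/buy/in-sydney+cbd%2c+nsw/list-1")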
429 Too Many Requests
The HTTP 429 Too Many Requests response status code indicates the user has sent too many requests within a short period of time. The 429 status code is intended for use with rate-limiting schemes.
Root Cause
When a server detects that a user agent is trying to access a specific page too often in a short period of time, it triggers a rate-limiting feature. The most common example is a user (or an attacker) repeatedly trying to log into a web application.
A server can also identify a client by its cookies rather than by login credentials. Requests may be counted per endpoint, per server, or across several servers. So there are a variety of situations that can result in you seeing an error like one of these:
429 Too Many Requests
429 Error
HTTP 429
Error 429 (Too Many Requests)
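If you control the client, the standard remedy is to back off and honor the Retry-After header when the server sends one. A minimal sketch with the requests library (the URL and retry policy are assumptions):

import time
import requests

def get_with_backoff(url, max_retries=5):
    # Retry on 429 responses, waiting as instructed by Retry-After
    # when present, otherwise backing off exponentially.
    delay = 1.0
    response = None
    for _ in range(max_retries):
        response = requests.get(url)
        if response.status_code != 429:
            return response
        retry_after = response.headers.get("Retry-After")
        # Note: Retry-After may also be an HTTP date; this sketch
        # only handles the delay-in-seconds form.
        time.sleep(float(retry_after) if retry_after else delay)
        delay *= 2
    return response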
This use case
This appears to be a classic case of a Selenium-driven, GeckoDriver-initiated Firefox browsing context being detected as a bot, for one core reason:
Selenium identifies itself (for example, navigator.webdriver is set to true in an automated session).
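You can verify this from the automated session itself: by default, a WebDriver-controlled browser exposes navigator.webdriver as true, which is exactly the flag fingerprinting scripts check.

from selenium import webdriver

driver = webdriver.Firefox()
# In a WebDriver-controlled session this prints True; detection
# scripts such as BotD read the same flag.
print(driver.execute_script("return navigator.webdriver"))
driver.quit()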
References
You can find a couple of relevant detailed discussions in:
How to Conceal WebDriver in Geckodriver from BotD in Java?
How can I make a Selenium script undetectable using GeckoDriver and Firefox through Python?

Comparing request module vs selenium in Python

I made a program with Selenium that automates posting comments on some blogs' content. I'm not familiar with Python's requests module (I've been working with it for just a week). What I'm wondering is: my Selenium program is a bit slow at page loading, and it loads everything from ads to images and videos. If I rewrote it with the requests module, would it use less data and run faster than the Selenium version?
I searched some forum sites for this; they generally say requests is a bit faster, but not always. I also couldn't find any information comparing the two modules' data usage.
Please don't downvote straight away; I need a detailed answer.
Selenium is used for web automation: clicking web elements and sending keys to input boxes.
To speed Selenium up, use headless mode, so the browser runs without rendering a visible window and the work goes faster; see Selenium's documentation to learn more about headless mode. For example:
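A minimal sketch of headless Firefox with Selenium 4 (the URL is a placeholder):

from selenium import webdriver
from selenium.webdriver.firefox.options import Options

options = Options()
options.add_argument("-headless")  # no visible browser window
driver = webdriver.Firefox(options=options)
driver.get("https://example.com")
print(driver.title)
driver.quit()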
requests, on the other hand, is used for HTTP methods such as GET and POST; see the requests documentation to learn more.
If the blogging site has a public API, then you can use the requests module.
If you are new to APIs, I recommend watching this YouTube video:
https://youtu.be/GZvSYJDk-us
For example, to create issues on GitHub you can use the GitHub API.
But to comment on a blogging site that has no public API, you need to use Selenium.
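To make the contrast concrete, here is a minimal sketch of posting a comment through a hypothetical JSON API; the endpoint, fields, and token are invented for illustration, not any real blog's API:

import requests

response = requests.post(
    "https://blog.example.com/api/comments",  # hypothetical endpoint
    json={"post_id": 123, "body": "Nice article!"},
    headers={"Authorization": "Bearer YOUR_API_TOKEN"},
)
response.raise_for_status()  # fail loudly on 4xx/5xx
print(response.json())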
requests sends and receives data directly from the server hosting a particular service, so it is fast.
Selenium, by contrast, works through the web browser.
With requests you can perform an action directly, without having to simulate a series of clicks or keystrokes.
Selenium allows you to control a browser and execute actions on a webpage.
The requests library is for making HTTP requests.
So if you know how to post the comments through a plain HTTP API, I'd go with requests; Selenium would be overhead in this case.
If you are comfortable with HTTP requests and verbs (you know how to make a POST request to a server with the requests library), choose requests, and pair it with BeautifulSoup when you need to parse the returned HTML; use Selenium when the site only works through a real browser.

Simulate active session on a website with python

I'm looking for a way to simulate an active session on a website using Python. What I mean is that I want to create software which makes the website think an actual user with an actual browser has the site open. I've found urllib3 and its request.urlopen method, but it seems that this only reads the content at the URL and then closes the connection. Thanks for any suggestions.
You can try simulating the browser's requests to obtain the cookies needed for authentication. Google Chrome DevTools and the requests Python library will do the job.
Some websites handle sessions another way, but I believe the majority use cookies set through POST requests.
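A common pattern for this is requests.Session, which persists cookies across calls so the server sees one continuous session. A sketch with placeholder URLs and form fields (inspect the real site's login POST in DevTools to find the actual ones):

import requests

session = requests.Session()
# Log in once; the session object stores any cookies the server sets.
session.post(
    "https://example.com/login",
    data={"username": "me", "password": "secret"},
)
# Later requests automatically send those cookies back.
page = session.get("https://example.com/dashboard")
print(page.status_code)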

Python+Selenium: Is it possible to lock the browser from manual clicks and inputs?

I have written Python Selenium code that automates actions on a website. Once the user authenticates the login, Selenium takes over the browser and does its thing. Everything works perfectly fine; however, I notice that the code may fail if the user accidentally clicks any link while Selenium is running.
Is there a way to block manual input from the user? Something like:
br = webdriver.Chrome()
br.lock_manual_userinput()
There is no such thing. You can dedicate a machine (or machines) with limited access for running the automation, or just be careful if you are working while the scripts run.
Selenium is purely used for automating repetitive manual tasks. It is worth mentioning at this point that Selenium mocks user interactions.
Hence the statement "once user authenticates the log in, selenium takes over the browser" is pretty much speculative, as reconnecting Selenium to a previous browsing session is not viable.
Next, the statement "code may fail if user accidentally clicks anything" describes expected behavior, since Selenium needs browser focus. Any manual user interaction can make Selenium lose focus and raise an error.
Finally, as mentioned, Selenium mocks user interactions, so there is no direct way to block manual input. The best way to run your Selenium-based automated tests free of manual input is to:
Set up a test bed with all the required hardware and software configuration.
Create the test bed in an isolated environment, preferably in a test lab free from manual intervention.
Automate only the use cases which need no manual intervention. (One practical option is sketched below.)
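On Linux, one practical way to keep the automated browser out of reach of stray clicks (my own suggestion, using the third-party pyvirtualdisplay package, which wraps Xvfb) is to run it inside a virtual display that never appears on the user's desktop:

from pyvirtualdisplay import Display
from selenium import webdriver

# The browser window exists only inside the virtual display,
# so nobody at the real desktop can click into it.
display = Display(visible=0, size=(1920, 1080))
display.start()
driver = webdriver.Firefox()
driver.get("https://example.com")
# ... automated steps ...
driver.quit()
display.stop()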

Some websites block selenium webdriver, how does this work?

So I'm trying to crawl clothing websites to build a list of good deals and products to watch, but I notice that some of the websites I try to load simply don't. How are websites able to block Selenium WebDriver requests? Do they look at the headers or something? Can you give me a step-by-step account of how Selenium WebDriver sends requests and how a server receives them and is able to block them?
Selenium uses a real web browser (typically Firefox or Chrome) to make its requests, so the website probably has no direct way of telling that you're using Selenium behind the scenes.
If the website is blocking you, it's probably because of your usage patterns (i.e. you're clogging up their web server by making 1000 requests every minute; that's rude, don't do that!).
One exception is if you're using a purely headless setup such as the HtmlUnitDriver; websites can detect that.
It's very likely that the website is blocking you based on your AWS IP address.
Not only does that tell the website that somebody is probably scraping them programmatically, but most websites also limit the number of queries they will accept from any one IP address.
You most likely need a proxy service to pipe your requests through.
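If you do go that route, the proxy can be passed to the browser when the driver starts. A sketch for Chrome (the proxy address is a placeholder for whatever service you use):

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
# Placeholder address; substitute your proxy service's host and port.
options.add_argument("--proxy-server=http://proxy.example.com:8080")
driver = webdriver.Chrome(options=options)
driver.get("https://example.com")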
