This question already has an answer here:
Unable to use Selenium to automate Chase site login
(1 answer)
Closed 1 year ago.
Here is what I tried:
from selenium import webdriver
driver = webdriver.Chrome(executable_path='chromedriver.exe')
driver.get("https://secure07c.chase.com/web/auth/#/logon/logon/chaseOnline?")
username = driver.find_element_by_id("userId-text-input-field")
The problem I ran into is that when I simply execute this and then manually fill in the fields and click "Log in", an error page designed to protect against bots pops up.
When I remove the line username = driver.find_element_by_id("userId-text-input-field"), the website works correctly and I can log in manually from the Selenium-driven browser window.
The same problem occurs with driver.page_source and many other calls that request elements from the page.
I tried a lot of things (most options, flags, a custom user agent, ...), but they are not relevant to this issue, which is why I included the simplified version of the code that causes it: basically, any element selection triggers the block.
The way Selenium requests elements seems to be what raises suspicion on the Chase website. I want to understand how Selenium finds/selects elements and how anti-bot systems detect this very simple action. Is there a way around it?
Chase can tell that an automated script, not a real human, is sending the request. Stack Overflow uses similar technology. There is no way to escape this; otherwise, bots would be all over banking websites, DDoSing them.
Related
For my Python Selenium script I have to complete a lot of CAPTCHAs. I noticed that when I get the CAPTCHAs in my regular browser they're much easier and quicker. Is there a way for me to hide the fact that I'm using a web-automation bot so I get the easier CAPTCHAs?
I already tried randomizing the user agent, but with no success.
You can go to your website and inspect the page: open the developer tools, select the Network tab, reload the page, and pick the request for the page you are accessing from the list. If you scroll down, you can see the user agent your browser used to access the page. Use that user agent in your scraper to exactly mimic your browser.
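If your scraper uses plain HTTP requests rather than a driven browser, setting that header is a one-liner. A minimal standard-library sketch; the user-agent string below is only an example, so replace it with the one copied from your own Network tab:

```python
from urllib.request import Request, urlopen

# Example UA string; substitute the one your real browser reports.
USER_AGENT = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
              "AppleWebKit/537.36 (KHTML, like Gecko) "
              "Chrome/120.0.0.0 Safari/537.36")

# Attach the header to the request so the server sees a browser-like UA.
req = Request("https://example.com", headers={"User-Agent": USER_AGENT})
# html = urlopen(req).read()  # uncomment to actually fetch the page
```

The same idea applies to Selenium: Chrome accepts a `--user-agent=...` argument via its options object.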
From a generic perspective there are no proven ways to hide the fact that you are using a Selenium driven web automation bot.
You can find a relevant detailed discussion in Can a website detect when you are using Selenium with chromedriver?
However, in some cases modifying the navigator.webdriver flag helps to prevent detection.
References
You can find a couple of relevant detailed discussions in:
Is there a way to use Selenium WebDriver without informing the document that it is controlled by WebDriver?
Selenium Chrome gets detected
How does recaptcha 3 know I'm using selenium/chromedriver?
I’m trying to log in to a pretty complex (to my beginner’s eye) website and make a reservation. I did not know a single Python statement before starting the project. After many starts and stops, I have successfully logged in using requests_html/HTMLSession, overcome the security/authorization issues, and arrived at the target page. The page displays the server time, and I cannot hit the proper key until that time reaches 7:00 AM.
I am unable to access the field. I have tried the .search and .find commands, but nothing. I am hoping someone can tell me how to read the time into my program so I can test it and wait until it reaches, or almost reaches, 7:00. (I say almost because the reservation is for tee times and there is a real crunch at 7; the whole point of this application is to automate the process and be the fastest!)
So I need to be able to load the time into my Python program and click a date field when the clock reaches 7:00.
No idea what scraping tool you are using, but generally you would access this element via an XPath or CSS selector:
response.css(".jquery_server_clock::text").extract()
This example assumes you are using Scrapy.
Maybe you would be better off using Selenium.
Selenium lets you automate a real browser window, so even when a site can't be interacted with using requests, with Selenium the site thinks you are using a normal browser while you automate everything.
So what I would do if I were you:
from selenium import webdriver
driver = webdriver.Chrome()
driver.get("your_url.com")
input("Navigate to the desired page, then press enter")
while not driver.find_element_by_class_name("jquery_server_clock").text[0] == "7":
    pass
driver.find_element_by_class_name("other_button").click()
This would wait until it is 7 AM and then click the other button immediately.
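The while/pass loop above spins the CPU and hammers the driver with find-element calls. It can be factored into a small, plain-Python polling helper (a sketch of my own, not part of Selenium) that sleeps between checks:

```python
import time

def wait_until(condition, timeout=60.0, poll=0.25, clock=time.monotonic, sleep=time.sleep):
    """Poll `condition` until it returns a truthy value, or raise TimeoutError.

    `clock` and `sleep` are injectable so the helper can be tested without
    real waiting; by default they use the real monotonic clock.
    """
    deadline = clock() + timeout
    while clock() < deadline:
        result = condition()
        if result:
            return result
        sleep(poll)
    raise TimeoutError("condition not met within %.1f seconds" % timeout)
```

With the tee-time page that becomes, e.g., wait_until(lambda: driver.find_element_by_class_name("jquery_server_clock").text.startswith("7"), timeout=600), which re-checks the clock four times a second instead of as fast as the driver can answer.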
I'm having trouble logging into a website using Selenium via Python.
I'm new to web scraping and as part of the learning process I'm attempting to web scrape my account activity from American Airlines with Python. This requires logging in, which is where my code (see below) fails. All the form fields are populated, however, when I submit, the page just seems to refresh and clears my entries.
Checks I've performed:
Login information is correct. I've manually logged in.
I've played around with different lengths for sleep time. No success
Cleared form prior to entry. No success
Run the code up to (but not including) the submit() line, and then manually click the "Log In" button. Login still fails. This makes me think that the fields are somehow populated incorrectly, but I can't see where the issue is.
Thanks in advance for any help!
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time
driver = webdriver.Chrome()
driver.get("https://www.aa.com/homePage.do")
loginId = driver.find_element_by_name("loginId")
lastName = driver.find_element_by_name("lastName")
password = driver.find_element_by_name("password")
time.sleep(2)
loginId.send_keys("my-email-here")
lastName.send_keys("my-last-name-here")
password.send_keys("my-password-here")
time.sleep(2)
password.submit()
I believe that AA and other airlines have sophisticated bot detection that knows you are using Selenium to find and manipulate elements in the DOM, causing the page to navigate back to the login. Even if I do a:
driver.find_element(...)
on the webpage and fill out the fields myself, it fails; but if I do just a:
driver.get('https://www.aa.com')
and then fill out the fields, it lets me continue. I have found a few posts on Google and Reddit that cover this if you do some searching. Also, Can a website detect when you are using selenium with chromedriver? goes over some of the ways they might be doing it.
I've just started to learn coding this month and started with Python. I would like to automate a simple task (my first project) - visit a company's career website, retrieve all the jobs posted for the day and store them in a file. So this is what I would like to do, in sequence:
1. Go to http://www.nov.com/careers/jobsearch.aspx
2. Select the option - 25 Jobs per page
3. Select the date option - Today
4. Click on Search for Jobs
5. Store the results in a file (just the job titles)
I looked around and found that Selenium is the best way to go about handling .aspx pages.
I have done steps 1-4 using Selenium. However, there are two issues:
I do not want the browser opening up. I just need the output saved to a file.
Even if I am OK with the browser popping up, running the Python code (exported from Selenium as WebDriver code) in IDLE (I have Windows) results in errors. When I run the code, the browser opens and the link loads, but none of the form selections happen, and I get the following error message (link below) before the browser closes. So what does the error message mean?
http://i.stack.imgur.com/lmcDz.png
Any help/guidance will be appreciated...Thanks!
First, about the error you got: the NoSuchElementException with the message Unable to locate element means the selector you provided is wrong and the web driver can't find the element.
Since you did not post your code and I can't open the link you provided, I can only give you a sample and include as much detail as I can.
from selenium import webdriver
driver = webdriver.Firefox()
driver.get("url")
number_option = driver.find_element_by_id("id_for_25_option_indicator")
number_option.click()
date_option = driver.find_element_by_id("id_for_today_option_indicator")
date_option.click()
search_button = driver.find_element_by_id("id_for_search_button")
search_button.click()
all_results = driver.find_elements_by_xpath("some_xpath_that_is_common_between_all_job_results")
result_file = open("result_file.txt", "w")
for result in all_results:
    result_file.write(result.text + "\n")
driver.close()
result_file.close()
Since you said you just started to learn coding recently, I should give some explanations:
I recommend using driver.find_element_by_id wherever an element has an ID property; it's the most robust locator.
Instead of result.text, you can use result.get_attribute("value") or result.get_attribute("innerHTML").
That's all that comes to mind for now, but it would be better if you posted your code so we can see what is wrong with it. Additionally, it would be great if you gave me a working link to the website so I can add more details to the code; your current link is broken.
Concerning the first issue, you can simply use a headless browser. This is possible with Chrome as well as Firefox.
Check Grey Li's answer here for example: Python - Firefox Headless
from selenium import webdriver
options = webdriver.FirefoxOptions()
options.add_argument('--headless')
driver = webdriver.Firefox(options=options)
Alright, I'm confused. So I want to scrape a page using Selenium Webdriver and Python. I've recorded a test case in the Selenium IDE. It has stuff like
Command    Target
click      link=14
But I don't see how to run that in Python. The desirable end result is that I have the source of the final page.
Is there a run_test_case command? Or do I have to write individual command lines? I'm rather missing the link between the test case and the actual automation. Every site tells me how to load the initial page and how to get stuff from that page, but how do I enter values and click on stuff and get the source?
I've seen:
submitButton=driver.find_element_by_xpath("....")
submitButton.click()
Ok. And enter values? And get the source once I've submitted a page? I'm sorry that this is so general, but I really have looked around and haven't found a good tutorial that actually shows me how to do what I thought was the whole point of Selenium Webdriver.
I've never used the IDE. I just write my tests or site automation by hand.
from selenium import webdriver
browser = webdriver.Firefox()
browser.get("http://www.google.com")
print(browser.page_source)
You could put that in a script and just do python wd_script.py or you could open up a Python shell and type it in by hand, watch the browser open up, watch it get driven by each line. For this to work you will obviously need Firefox installed as well. Not all versions of Firefox work with all versions of Selenium. The current latest versions of each (Firefox 19, Selenium 2.31) do though.
An example showing logging into a form might look like this:
username_field = browser.find_element_by_css_selector("input[type=text]")
username_field.send_keys("my_username")
password_field = browser.find_element_by_css_selector("input[type=password]")
password_field.send_keys("sekretz")
browser.find_element_by_css_selector("input[type=submit]").click()
print(browser.page_source)
This kind of stuff is much easier to write if you know CSS well. Weird errors can be caused by trying to find elements that are generated by JavaScript: you might be looking for them before they exist, for instance. It's easy enough to tell whether this is the case by putting in a time.sleep for a little while and seeing if that fixes the problem. More elegantly, you can abstract some kind of general wait-for-element function.
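Such a general wait-for-element function can be a small retry loop. A sketch of my own (in Selenium, NoSuchElementException is what the find_element_* calls raise while the element doesn't exist yet):

```python
import time

def wait_for(find, timeout=10.0, poll=0.5, ignore=(Exception,)):
    """Call `find()` until it returns without raising one of `ignore`.

    Re-raises the last error if `timeout` elapses first.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            return find()
        except ignore:
            if time.monotonic() >= deadline:
                raise
            time.sleep(poll)
```

With Selenium you would call it as wait_for(lambda: browser.find_element_by_id("results"), ignore=(NoSuchElementException,)); or just use Selenium's built-in WebDriverWait, which implements the same idea.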
If you want to run Webdriver sessions as part of a suite of integration tests then I would suggest using Python's unittest to create them. You drive the browser to the site under test, and make assertions that the actions you are taking leave the page in a state you expect. I can share some examples of how that might work as well if you are interested.