I never thought I would need to create a new account and post this question, but after researching for more than 4 hours we think we need the experts' help.
We are currently trying to find an element on a website using Selenium. This worked fine for the past 6 months until something changed on the website and it started failing.
We extracted the page source using driver.page_source and discovered the following error -
"Internet Explorer is not supported with the "Company". Please use Google Chrome, Mozilla Firefox or Safari as your browser."
However, we are using Chrome as the browser in Selenium.
What is happening is that, because the website thinks the browser is IE, it hides the top navigation, which then does not exist anywhere in the page source (we tried changing the style from none to block).
Any help on this is really appreciated.
So far we have tried the following but none of them worked -
Tried Firefox as browser (Same issues)
Tried using undetected-chromedriver (no luck).
Sleep timer (no luck)
Adding user agents (no luck)
Running a mobile emulator (still the same error as above)
Related
I'm learning how to crawl data with Selenium, but I find that when a website is opened through Selenium, it looks different from what I get in a normal browser, even when I add headers. I'm very confused.
I would like to upload two screenshots for comparison, but I can't upload them on Stack Overflow at present. I even tried opening the browser through chromedriver and entering the web address manually, but the result is still different.
I use Python 3.6, selenium and chrome 75.0.3770.80
from selenium import webdriver
driver = webdriver.Chrome()  # create the driver instance
url = 'https://www.free-ss.ooo'
driver.get(url)
At present, I can't post pictures on stack overflow, but I just want to figure out how I can use selenium to get normal web pages.
Aha, I found the problem: the target site really was detecting Selenium. The solution is to add this option:
chrome_options.add_experimental_option('excludeSwitches', ['enable-automation'])
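A minimal sketch of how that option might be wired into driver creation, assuming a Selenium release whose Chrome driver accepts an `options` argument. The helper names `apply_automation_switch` and `make_driver` are hypothetical, and there is no guarantee that this alone gets past a given site's detection.

```python
def apply_automation_switch(options):
    """Ask Chrome not to start with the 'enable-automation' switch.

    `options` is expected to be a Selenium ChromeOptions-like object.
    """
    options.add_experimental_option('excludeSwitches', ['enable-automation'])
    return options


def make_driver():
    """Create a Chrome driver with the option applied.

    Needs Selenium and a matching chromedriver to be installed.
    """
    # Imported here so the helper above carries no dependency of its own.
    from selenium import webdriver

    options = apply_automation_switch(webdriver.ChromeOptions())
    return webdriver.Chrome(options=options)
```

Calling `make_driver()` and then `driver.get(url)` as in the question should then load the page with the switch excluded.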
I faced the same issue and was able to resolve it by removing or fixing the user-agent argument; after that it worked fine in both headless and non-headless mode.
The resolution was inspired by PDHide's post.
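One way to check what the site is actually being told, sketched under the assumption that the block is driven by the user agent: read `navigator.userAgent` from the running browser and see whether it carries the `MSIE` or `Trident/` tokens that Internet Explorer sends. The function names are hypothetical, and a site may well use other signals too.

```python
def looks_like_internet_explorer(user_agent):
    """Return True if the user-agent string carries IE's usual tokens."""
    return 'MSIE' in user_agent or 'Trident/' in user_agent


def reported_user_agent(driver):
    """Ask the live browser which user agent it reports to pages."""
    return driver.execute_script('return navigator.userAgent')
```

If `looks_like_internet_explorer(reported_user_agent(driver))` is true for a Chrome session, the user-agent argument passed to the browser is the likely culprit.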
Please note, this question is about Python 3.5.2; only Python answers will be accepted, unless this can definitely be handled in Java. I am automating a process as part of an internal project. Everything works just fine using the IE webdriver, but not the PhantomJS web driver (which is expected due to its limited functionality). However, a work-around / solution is required.
When opening the internal site, a Windows Security login dialog box comes up prompting for a username, password and press 'Ok'. With the IE web driver, it is handled just fine with:
loginAlert = driver.switch_to_alert()
loginAlert.authenticate(username, password)
The javascript:
driver.execute_script("window.confirm = function(){return true;}")
Being run before loading the page that gives the prompt, doesn't seem to confirm the login alert, for either phantom or IE. Even if it did, this doesn't type in the login details. As mentioned, it's a Windows Security prompt from the browser, not an element.
Once logged in, the page is reloaded with an ASP.NET_SessionId Cookie which expires once the session is ended. I've tried logging in through IE, then adding the cookie into Phantom, but it doesn't seem to match up the domains.
I've tried using:
driver.save_screenshot(filename) to see what's happening in phantom
Which works with the IE driver, but with PhantomJS, only a transparent image is saved. The whole http://username:pass@site.com thing doesn't work for either the IE or the phantom driver. It can't load / use the URL when this is done.
How can the Windows Security login dialog be handled, or worked around? I tried looking into alternatives, such as pyvirtualdisplay, but found no information on how to get this working with Python 3 on windows.
I have also tried setting phantomjs desired capabilities custom header authentication, but that doesn't seem to do anything for this either.
I have also tried using ActionChains, however they don't work while the alert is there (in either the IE or the phantom driver). An UnexpectedAlertPresentException is thrown. Even if this is caught and you then try to perform the actions, the alert seems to close once the exception has been caught.
My bad!
Whilst the username:pass@domain.com form didn't work in the IE webdriver, it did work in the PhantomJS web driver.
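A small sketch of building such a URL safely, assuming the credentials-in-URL form with the user info placed before an `@`. The helper name `url_with_credentials` is hypothetical. Percent-encoding matters when the password itself contains characters such as `@` or `:`. Whether a given browser honours the user info part varies, as noted above.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def url_with_credentials(url, username, password):
    """Return `url` with percent-encoded user info placed before the host."""
    parts = urlsplit(url)
    userinfo = '{}:{}'.format(quote(username, safe=''), quote(password, safe=''))
    netloc = '{}@{}'.format(userinfo, parts.netloc)
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

The result can be passed straight to `driver.get(...)`.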
However, the website has limited browser compatibility: it doesn't load properly in either Chrome or Firefox; it is IE-specific.
PhantomJS seems to handle the site the same way as Chrome / Firefox based on page source comparisons.
As such, I am trying to find a way to make the current IE driver invisible / hidden.
I have found:
headless-selenium-for-win using Python
However, despite the user here saying they got it to work, when I try to initialize the driver, it just hangs, the code doesn't proceed and no error messages are provided.
Asking another question regarding this.
I am writing a webscraper using selenium on python. I wrote the script to pull information from one site, then go to another and pull different information (emails).
When I run the script with browser = webdriver.Firefox(), the script behaves perfectly. However, for speed purposes I decided to switch to browser = webdriver.PhantomJS().
When I do this (I tested both scenarios), the driver doesn't seem to switch to the second website and instead pulls the second round of information (searching for an email) from the first site.
Why would the script behave differently with phantomJS when all other things are exactly the same?
I found the answer. With PhantomJS, you need to specify browser.get('http://www.' + website), which is not required for Firefox.
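A hedged sketch of guarding against this: make sure every address handed to `browser.get` carries an explicit scheme. The helper name `with_scheme` is hypothetical, and unlike the line above it only adds the scheme, not the `www.` prefix.

```python
from urllib.parse import urlsplit


def with_scheme(website, default_scheme='http'):
    """Return `website` unchanged if it already has a scheme, else prefix one."""
    if urlsplit(website).scheme in ('http', 'https'):
        return website
    return '{}://{}'.format(default_scheme, website)
```

With this, `browser.get(with_scheme(website))` receives the same kind of address for both drivers.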
I've just started to learn coding this month and started with Python. I would like to automate a simple task (my first project) - visit a company's career website, retrieve all the jobs posted for the day and store them in a file. So this is what I would like to do, in sequence:
1) Go to http://www.nov.com/careers/jobsearch.aspx
2) Select the option - 25 Jobs per page
3) Select the date option - Today
4) Click on Search for Jobs
5) Store results in a file (just the job titles)
I looked around and found that Selenium is the best way to go about handling .aspx pages.
I have done steps 1-4 using Selenium. However, there are two issues:
I do not want the browser opening up. I just need the output saved to a file.
Even if I am OK with the browser popping up, running the Python code (exported from Selenium as WebDriver) in IDLE (I am on Windows) results in errors. When I run the Python code, the browser opens up and the link is loaded, but none of the form selections happen and I get the following error message (link below) before the browser closes. So what does the error message mean?
http://i.stack.imgur.com/lmcDz.png
Any help/guidance will be appreciated...Thanks!
First, about the error you got: judging by the exception NoSuchElementException and the message Unable to locate element, the selector you gave the web driver is wrong and the web driver can't find the element.
Since you did not post your code and I can't open the link to the website you entered, I can only give you sample code, and I will include as many details as I can.
from selenium import webdriver
# Note: the find_element_by_* helpers below are the Selenium 3 API;
# Selenium 4 replaced them with find_element(By.ID, ...) and friends.
driver = webdriver.Firefox()
driver.get("url")
# Pick "25 jobs per page" and "Today", then submit the search.
number_option = driver.find_element_by_id("id_for_25_option_indicator")
number_option.click()
date_option = driver.find_element_by_id("id_for_today_option_indicator")
date_option.click()
search_button = driver.find_element_by_id("id_for_search_button")
search_button.click()
# Collect every job result and write one title per line.
all_results = driver.find_elements_by_xpath("some_xpath_that_is_common_between_all_job_results")
with open("result_file.txt", "w") as result_file:
    for result in all_results:
        result_file.write(result.text + "\n")
driver.close()
Since you said you just started to learn coding recently, I think I have to give some explanations:
I recommend using driver.find_element_by_id whenever the element has an id attribute. It's more robust.
Instead of result.text, you can use result.get_attribute("value") or result.get_attribute("innerHTML").
That's all that comes to mind for now, but it would be better if you posted your code so we can see what is wrong with it. Additionally, it would be great if you gave me a new link to the website, so I can add more details to the code; your current link is broken.
Concerning the first issue, you can simply use a headless browser. This is possible with Chrome as well as Firefox.
Check Grey Li's answer here for example: Python - Firefox Headless
from selenium import webdriver
options = webdriver.FirefoxOptions()
options.add_argument('-headless')  # Firefox expects the leading dash on this flag
driver = webdriver.Firefox(options=options)
I'm not sure how to find this information. I have found a few tutorials so far about using Python with Selenium, but none have so much as touched on this. I am able to run some basic test scripts through Python that automate Selenium, but it just shows the browser window for a few seconds and then closes it. I need to get the browser output into a string / variable (ideally), or at least save it to a file, so that Python can do other things with it (parse it, etc.). I would appreciate it if anyone could point me towards resources on how to do this. Thanks
Using Selenium WebDriver and Python, you can simply access the .page_source property to get the source of the current page.
For example, using the Firefox() driver:
from selenium import webdriver
driver = webdriver.Firefox()
driver.get('http://www.example.com/')
print(driver.page_source)
driver.quit()
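Since the question also mentions saving the output to a file, here is a minimal sketch of that step. The helper name `save_page_source` and the choice of UTF-8 are assumptions; the only thing it relies on from the driver is the `.page_source` property shown above.

```python
def save_page_source(driver, path):
    """Write the driver's current page source to `path` and return it as a string."""
    html = driver.page_source
    with open(path, 'w', encoding='utf-8') as handle:
        handle.write(html)
    return html
```

It would be called after `driver.get(...)` and before `driver.quit()`, for example `html = save_page_source(driver, 'page.html')`, leaving the markup both on disk and in a variable for parsing.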
There's a Selenium.getHtmlSource() method in Java; most likely it is also available in Python. It returns the source of the current page as a string, so you can do whatever you want with it.
OK, so here is how I ended up doing this, for anyone who needs it in the future.
You have to use Firefox for this to work.
1) Create a new Firefox profile (not necessary, but ideal so as to separate this from normal Firefox usage). There is plenty of info on how to do this on Google; how you do it depends on your OS.
2) Get the Firefox plugin: https://addons.mozilla.org/en-US/firefox/addon/2704/ (this automatically saves all pages for a given domain name). You need to configure it to save whichever domains you intend to auto-save.
3) Then just start the Selenium server with the profile you created (below is an example for Linux):
cd /root/Downloads/selenium-remote-control-1.0.3/selenium-server-1.0.3
java -jar selenium-server.jar -firefoxProfileTemplate /path_to_your_firefox_profile/
That's it. It will now save all the pages for a given domain name whenever Selenium visits them. Selenium does create a bunch of garbage pages too, so you could delete these via simple regex parsing; from there it is up to you how to manipulate the saved pages.
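A rough sketch of that clean-up step, assuming the saved pages all land in one directory and the unwanted ones can be recognised by file name. The function names are hypothetical, and so is any pattern passed in: what the garbage pages are actually called depends on the plugin and Selenium setup, so it is worth listing the matches before deleting anything.

```python
import os
import re


def find_unwanted_pages(directory, pattern):
    """Return sorted paths of files in `directory` whose names match `pattern`."""
    matcher = re.compile(pattern)
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if matcher.search(name)
    )


def delete_files(paths):
    """Remove each file in `paths`."""
    for path in paths:
        os.remove(path)
```

Keeping the search and the deletion separate makes it easy to print the candidate list first and only then call `delete_files` on it.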