I have a web scraping script in Python, using Selenium and (Tor) Firefox. It runs completely fine when I start it manually, in my IDE or from cmd. However, the script needs to run when I am not in the office, so I run it via a batch script (along with some other scraping scripts) that is started by the Windows Task Scheduler. When I run this batch file manually, the script again runs perfectly. When the scheduled task runs, on the other hand, it fails as soon as the webdriver is created, with the following unhelpful error message:
Traceback (most recent call last):
...
#private library traceback
...
File "redacted.py", line 322, in redacted_func
driver = webdriver.Firefox()
File "C:\Program Files\Python 3.5\lib\site-packages\selenium\webdriver\firefox\webdriver.py", line 152, in __init__
keep_alive=True)
File "C:\Program Files\Python 3.5\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 98, in __init__
self.start_session(desired_capabilities, browser_profile)
File "C:\Program Files\Python 3.5\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 188, in start_session
response = self.execute(Command.NEW_SESSION, parameters)
File "C:\Program Files\Python 3.5\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 256, in execute
self.error_handler.check_response(response)
File "C:\Program Files\Python 3.5\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 194, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: Process unexpectedly closed with status: 0
Does anyone have any idea why this error occurs only when run from the task scheduler?
After further investigation, I found that the error happens when the task is processed in the background. This happens when the task is set to 'Run whether user is logged on or not'. Changing this setting to 'Run only when user is logged on' allows the task to run without issue, in the foreground.
For some reason, running in the background causes Firefox to crash, but chromedriver (which is used in some other scraping scripts I run) is unaffected. This seems slightly bizarre, but the workaround is sufficient to get this running on Windows.
I'm working on a GUI desktop app to give users control over a web scraper. I have two executables: one runs a Selenium scraper (Python 2.7) and the other runs the GUI, which starts and stops the scraper using subprocess.Popen (Python 3.8). Unfortunately, using the same Python version in both apps is not possible for me right now, although we don't seem to be hitting environment problems because of the version discrepancy.
I'm on Windows, using Selenium 3.141, Geckodriver 0.26, cx_Freeze 5, and Firefox 77.
I am able to run both exes manually without problem, but when I try to run the scraper exe through the GUI exe, it gets caught in an infinite loop of creating a tmp directory with User.js inside, opening a geckodriver command prompt, and then closing and starting over. It ends up continually generating more and more of these tmp directories until I kill the GUI window. The traceback tells me a few things -- Geckodriver is hitting a permissions error, and the app is having trouble establishing the home directory.
I've tested starting a scrape process using Popen, so that's not the issue.
When I run it manually and dump the Firefox options, and compare it to the Firefox options when I run it through the GUI, there is some difference. This is from the FF profile on a manual (successful) run:
'userPrefs': 'c:\\users\\atadmin\\appdata\\local\\temp\\tmpqctvna\\user.js',
'profile_dir': 'c:\\users\\atadmin\\appdata\\local\\temp\\tmpqctvna',
'extensionsDir': 'c:\\users\\atadmin\\appdata\\local\\temp\\tmpqctvna\\extensions',
and this is the profile from an unsuccessful GUI run:
'userPrefs': 'c:\\users\\atadmin\\documents\\aeleads-master\\aeleads-portal-testing\\build\\exe.win-amd64-3.8\\tmpu9ktpl\\user.js',
'profile_dir': 'c:\\users\\atadmin\\documents\\aeleads-master\\aeleads-portal-testing\\build\\exe.win-amd64-3.8\\tmpu9ktpl',
'extensionsDir': 'c:\\users\\atadmin\\documents\\aeleads-master\\aeleads-portal-testing\\build\\exe.win-amd64-3.8\\tmpu9ktpl\\extensions',
And here is the latest traceback:
Can't determine home directory
utils.tss_logging: Logging initailizing with None/None
Error: Traceback (most recent call last):
File "li_scraper.py", line 63, in run_scraper
File "C:\Users\atadmin\Documents\aeleads-master\aeleads-portal-testing\scrape\scraper.py", line 394, in collection
self.login()
File "C:\Users\atadmin\Documents\aeleads-master\aeleads-portal-testing\scrape\scraper.py", line 85, in login
driver = self._get_driver(use_proxy=self.use_proxy)
File "C:\Users\atadmin\Documents\aeleads-master\aeleads-portal-testing\scrape\scraper.py", line 622, in _get_driver
use_proxy=use_proxy, reserve_proxy=False)
File "C:\Users\atadmin\Documents\aeleads-master\aeleads-portal-testing\lib\webdrivers.py", line 164, in open_webdriver
capabilities=capabilities, log_path="C:\\Users\\atadmin\\Documents\\aeleads-master\\geckodriver.log")
File "C:\Python27\Py27\lib\site-packages\selenium\webdriver\firefox\webdriver.py", line 174, in __init__
keep_alive=True)
File "C:\Python27\Py27\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 157, in __init__
self.start_session(capabilities, browser_profile)
File "C:\Python27\Py27\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 252, in start_session
response = self.execute(Command.NEW_SESSION, parameters)
File "C:\Python27\Py27\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
self.error_handler.check_response(response)
File "C:\Python27\Py27\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
raise exception_class(message, screen, stacktrace)
WebDriverException: Message: permission denied
It seems like cx_Freeze creates its own build environment, maybe causing me to lose the user, based off the permissions error. I'm fairly new to Python.
Figured it out. When passing the environment into Popen, I was passing a stripped-down environment that didn't include the TEMP path. I now pass in my full environment, change only the variables that need to be changed, and it works.
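The fix described above can be sketched roughly as follows. This is a minimal illustration, not the poster's actual code: the helper names build_child_env and run_child and the SCRAPER_MODE variable are hypothetical. The idea is to copy the parent's full environment, so that TEMP, PATH, and everything else survive, and then override only what the child process needs.

```python
import os
import subprocess
import sys


def build_child_env(overrides):
    """Return a copy of the parent's environment with a few values changed."""
    env = os.environ.copy()  # keeps TEMP, PATH, and every other inherited variable
    env.update(overrides)    # change only what the child actually needs
    return env


def run_child(code, overrides):
    """Run a short Python snippet in a child process and return its stdout.

    In the original setup the command would be the frozen scraper executable;
    the current interpreter is used here only so the sketch is runnable.
    """
    env = build_child_env(overrides)
    return subprocess.check_output(
        [sys.executable, "-c", code],
        env=env,
        universal_newlines=True,
    )


# Hypothetical usage: SCRAPER_MODE is an invented variable name.
output = run_child(
    "import os; print(os.environ['SCRAPER_MODE'])",
    {"SCRAPER_MODE": "test"},
)
```

Note that a dict passed as env replaces the child's whole environment rather than extending it, which is how a variable such as TEMP can go missing. Python's tempfile module falls back to the current working directory when it finds no usable temp location, which would be consistent with the profile directories appearing inside the build folder in the failing run.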
I have a script called stock_automation.sh that runs a Python script called stock_automation.py.
When I run it from the terminal it works perfectly fine, but when I try to use cron to run it automatically, it gives an error with the Selenium chromedriver.
My crontab entry is the following:
25 14 * * * /home/user/project-scrapers/stock_automation.sh >> /home/daniel/amazon-project-scrapers/stock_automation.sh.log 2>&1
The content of my script is the following:
#!/usr/bin/env bash
echo "Hello!"
source /home/user/.local/share/virtualenvs/project-scrapers-kMVxSu_o/bin/activate
python /home/user/project-scrapers/stock_automation.py
The error in my logfile is the following:
Hello!
/home/user/.local/share/virtualenvs/project-scrapers-kMVxSu_o/lib/python3.7/site-packages/pandas/compat/__init__.py:117: UserWarning: Could not import the lzma module. Your installed Python is incomplete. Attempting to use lzma compression will result in a RuntimeError.
warnings.warn(msg)
Traceback (most recent call last):
File "/home/user/project-scrapers/stock_automation.py", line 235, in <module>
dfs = stock(webpage, MAXIMUM_VALUE)
File "/home/user/project-scrapers/stock_automation.py", line 24, in stock
executable_path=executable_path)
File "/home/user/.local/share/virtualenvs/project-scrapers-kMVxSu_o/lib/python3.7/site-packages/selenium/webdriver/chrome/webdriver.py", line 81, in __init__
desired_capabilities=desired_capabilities)
File "/home/user/.local/share/virtualenvs/project-scrapers-kMVxSu_o/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 157, in __init__
self.start_session(capabilities, browser_profile)
File "/home/user/.local/share/virtualenvs/project-scrapers-kMVxSu_o/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 252, in start_session
response = self.execute(Command.NEW_SESSION, parameters)
File "/home/user/.local/share/virtualenvs/project-scrapers-kMVxSu_o/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
self.error_handler.check_response(response)
File "/home/user/.local/share/virtualenvs/project-scrapers-kMVxSu_o/lib/python3.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: crashed.
(unknown error: DevToolsActivePort file doesn't exist)
(The process started from chrome location /usr/bin/google-chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
If you suspect this is an IPython 7.13.0 bug, please report it at:
https://github.com/ipython/ipython/issues
or send an email to the mailing list at ipython-dev#python.org
You can print a more detailed traceback right now with "%tb", or use "%debug"
to interactively debug it.
Extra-detailed tracebacks for bug-reporting purposes can be enabled via:
%config Application.verbose_crash=True
I have a Python Selenium script that I want to run on a remote PC over SSH. When I run the script directly on that PC, it works, but when I run the same script remotely, it throws an error:
Traceback (most recent call last):
File "crawling_script.py", line 14, in <module>
driver = webdriver.Chrome(executable_path='/var/www/html/hariharan/health_grades/chromedriver')
File "/var/www/html/hariharan/health_grades/env/local/lib/python2.7/site-packages/selenium/webdriver/chrome/webdriver.py", line 81, in __init__
desired_capabilities=desired_capabilities)
File "/var/www/html/hariharan/health_grades/env/local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 157, in __init__
self.start_session(capabilities, browser_profile)
File "/var/www/html/hariharan/health_grades/env/local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 252, in start_session
response = self.execute(Command.NEW_SESSION, parameters)
File "/var/www/html/hariharan/health_grades/env/local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
self.error_handler.check_response(response)
File "/var/www/html/hariharan/health_grades/env/local/lib/python2.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: exited abnormally
(unknown error: DevToolsActivePort file doesn't exist)
(The process started from chrome location /usr/bin/google-chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
(Driver info: chromedriver=73.0.3683.68 (47787ec04b6e38e22703e856e101e840b65afe72),platform=Linux 4.4.0-145-generic x86_64)
Please give me a solution to run the selenium python script remotely.
Thanks in advance.
Try ssh -X; it forwards graphical output to your local machine.
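As a rough way to check whether this applies, the script could test for a display before starting the browser. With X11 forwarding enabled, ssh normally sets the DISPLAY variable in the remote session; in a plain ssh session it is usually absent. The sketch below assumes that a missing DISPLAY is the cause of the crash, which the traceback alone does not prove, and the function name has_display is hypothetical.

```python
import os


def has_display(environ=None):
    """Return True if a non-empty DISPLAY variable is present.

    `environ` defaults to the current process environment; passing a dict
    makes the check easy to try out without changing real variables.
    """
    env = os.environ if environ is None else environ
    return bool(env.get("DISPLAY"))


if not has_display():
    # A non-headless Chrome has no X server to draw on in this case.
    print("DISPLAY is not set; try connecting with ssh -X")
```

Calling this before webdriver.Chrome(...) turns the opaque "Chrome failed to start" message into a clearer hint when no display is available.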
I'm trying to run the Selenium Python package on Debian 9 Stretch for web-scraping purposes. I installed the following software versions:
Python 2.7.13 (with Pycharm 2018.2 Community Edition)
Mozilla Firefox Quantum 61.0.1 (64 bit)
Selenium 3.14 (with GeckoDriver v0.21.0)
When I try to call the web driver by running:
driver = webdriver.Firefox(executable_path="/home/quant/Documenti/Executable/geckodriver")
I get the following error message in the python console:
Traceback (most recent call last):
File "", line 1, in
File "/home/quant/Scrivania/BettingDataDownload/venv/lib/python3.5/site-packages/selenium/webdriver/firefox/webdriver.py", line 167, in __init__
keep_alive=True)
File "/home/quant/Scrivania/BettingDataDownload/venv/lib/python3.5/site-packages/selenium/webdriver/remote/webdriver.py", line 156, in __init__
self.start_session(capabilities, browser_profile)
File "/home/quant/Scrivania/BettingDataDownload/venv/lib/python3.5/site-packages/selenium/webdriver/remote/webdriver.py", line 251, in start_session
response = self.execute(Command.NEW_SESSION, parameters)
File "/home/quant/Scrivania/BettingDataDownload/venv/lib/python3.5/site-packages/selenium/webdriver/remote/webdriver.py", line 320, in execute
self.error_handler.check_response(response)
File "/home/quant/Scrivania/BettingDataDownload/venv/lib/python3.5/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.SessionNotCreatedException: Message: Unable to find a matching set of capabilities
What's wrong?
The path of the executable is correct and the file is executable. Moreover, adding the firefox_binary option to the webdriver.Firefox call as follows:
driver = webdriver.Firefox(firefox_binary="/snap/bin/firefox", executable_path="/home/quant/Documenti/Executable/geckodriver")
produces the same error shown above.
Any help or suggestion will be appreciated.
Thanks all.
I use PyCharm 2018.2.3, geckodriver 0.20, Firefox 63.0, and Python 3.6.5 for my Selenium auto-tests.
This is what I run in the Python Console to start the driver:
from selenium import webdriver
profile = webdriver.FirefoxProfile()
profile.accept_untrusted_certs = True
wd = webdriver.Firefox(executable_path="C:\\Users\\user\\geckodriver.exe", firefox_profile=profile)
geckodriver then starts in a new window. After that I usually write
url = "https://website-address.com/"
wd.get(url)
But this time I get the message:
Previous command is still running. Please wait or press Ctrl+C in console to interrupt.
Then, after a couple of minutes, geckodriver exits with these messages in the log:
Traceback (most recent call last):
File "<input>", line 4, in <module>
File "C:\Users\user\project\venv\lib\site-packages\selenium\webdriver\firefox\webdriver.py", line 170, in __init__
keep_alive=True)
File "C:\Users\user\project\venv\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 156, in __init__
self.start_session(capabilities, browser_profile)
File "C:\Users\user\project\venv\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 251, in start_session
response = self.execute(Command.NEW_SESSION, parameters)
File "C:\Users\user\project\venv\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 320, in execute
self.error_handler.check_response(response)
File "C:\Users\user\project\venv\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: newSession
Please help me to understand what is going on and how to fix it.
I had the same problem, and it was fixed after I updated geckodriver to v0.21.0.
I opened an issue on the geckodriver GitHub (https://github.com/mozilla/geckodriver/issues/1369), which was closed by a developer, and I got an actual answer in another, similar issue: https://github.com/mozilla/geckodriver/issues/1305. They've since released a new version of geckodriver, which fixed everything.
P.S. I'm almost absolutely sure I had no issues with my firewall.
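Since both answers come down to the geckodriver version, a quick way to see which version a script will pick up is to ask the binary itself. The sketch below is only an illustration: the function names are hypothetical, and the parser assumes the first line of `geckodriver --version` begins with the word "geckodriver" followed by a dotted version number.

```python
import re
import subprocess


def parse_geckodriver_version(text):
    """Extract a (major, minor, patch) tuple from `geckodriver --version` output."""
    match = re.search(r"geckodriver\s+(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError("no geckodriver version found in: %r" % text)
    # A missing patch component (e.g. "0.20") is treated as 0.
    return tuple(int(part) if part else 0 for part in match.groups())


def installed_geckodriver_version(executable="geckodriver"):
    """Run the given geckodriver binary and return its parsed version."""
    output = subprocess.check_output(
        [executable, "--version"], universal_newlines=True
    )
    return parse_geckodriver_version(output)


# Tuples compare element by element, so version checks read naturally.
needs_update = parse_geckodriver_version("geckodriver 0.20.0") < (0, 21, 0)
```

Passing the same path that is given to webdriver.Firefox(executable_path=...) as the executable argument confirms which binary Selenium is actually using.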