Run a scheduled task with ChromeDriver - python

I have written a script based on Selenium and ChromeDriver. It logs into a site, posts a comment (read from a .txt file on my computer), and then exits. It is not a spam script, just something I built to get started with Python and Selenium.
The program works fine when I start it manually. ChromeDriver runs headless, since I don't need to watch the process: chrome_options.add_argument("--headless")
I then saw a post here, Scheduling a Python Script,
and followed it.
The issue is that every time the scheduled time arrives and the program starts, the script comes up and then quickly fails with an error, which I managed to print.
I can see there is an issue with ChromeDriver. So the question is: how can I make this script work through a scheduled task, with ChromeDriver running in the background? I might have set something up wrong, but since the program works manually, I suspect the problem is with the Windows scheduled task.
Basically, I just want the script to run in the background every day at xx:xx.
Please feel free to comment if you need more information.
chrome_options = Options()
chrome_options.add_argument("--disable-extensions")
chrome_options.add_argument('--disable-notifications')
chrome_options.add_argument("--headless")
chrome_options.add_argument("--user-agent=Mozilla/5.0 (Linux; Android 6.0; HTC One M9 Build/MRA58K) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.98 Mobile Safari/537.36")
driver = webdriver.Chrome(chrome_options=chrome_options)
driver.implicitly_wait(5)
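For what it's worth, one way to capture that fast-disappearing error under Task Scheduler is to log the traceback to a file from the script itself. A minimal stdlib sketch (the log file name and the failing run() body are illustrative stand-ins for the real script):

```python
import traceback

LOG_PATH = "task_error.log"  # illustrative path; pick somewhere the task can write

def run():
    # The Selenium login/comment logic would go here.
    # Raising here just to demonstrate the logging path.
    raise RuntimeError("chromedriver failed to start")

try:
    run()
except Exception:
    # Persist the full traceback so it survives the console window closing
    with open(LOG_PATH, "w") as f:
        f.write(traceback.format_exc())
    print("error logged to", LOG_PATH)
```

After a failed scheduled run, the log file holds the same traceback you would have seen in the console.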

The ChromeDriver you are using is too old and does not recognize headless Chrome; you need to use version 2.29 or newer:
Selenium ChromeDriver does not recognize newly compiled Headless Chromium (Python)

I had the same issue (with a newer headless version), and the solution was to run the Windows scheduled task as "Administrators" (with an "s").

I had this problem and almost gave up on Task Scheduler; I had started writing a Windows service to run my web-scraping application instead. This applies if you get an exception like "chrome unreachable", or Chrome gets no response from the HTTP server.
The following made my application work even through Task Scheduler:
Task Scheduler -> Go to the task -> Properties -> Conditions -> Under Network -> Check "Start only if the following network connection is available" -> Select "Any connection".
I got this from this post: windows-10-task-scheduler-not-running

When using Selenium with Chrome, you have to make sure that your ChromeDriver version is compatible with the Chrome version you have installed. If you get the app to work and then Chrome updates itself, you might get an unrecognized-version error.
If you have Chrome v102.0 like me, you need the ChromeDriver release that matches your browser version.
See Example Below:
Chrome Driver Download: https://chromedriver.chromium.org/downloads
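What has to line up is the major version. In a live session the two version strings can usually be read from driver.capabilities['browserVersion'] and driver.capabilities['chrome']['chromedriverVersion']; a stdlib sketch of the comparison, with illustrative version strings:

```python
def major(version: str) -> int:
    """First dotted component of a version string, e.g. '102.0.5005.61' -> 102."""
    return int(version.split(".")[0])

def versions_match(browser: str, driver_ver: str) -> bool:
    """Major versions must agree for Chrome and ChromeDriver to be compatible."""
    return major(browser) == major(driver_ver)

browser = "102.0.5005.115"    # e.g. driver.capabilities['browserVersion']
driver_ver = "102.0.5005.61"  # e.g. chromedriverVersion, build hash stripped
print(versions_match(browser, driver_ver))  # True
```

Running such a check at start-up turns the cryptic "session not created" failure into an explicit version-mismatch message.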

Related

Can you set a screen size and use pyautogui / selenium chrome driver without a monitor

Does anyone know how to force pyautogui, or Python in general, to recognize a set screen size and take in display data, even if no display is connected?
I have a task that uses pyautogui and the Selenium Chrome driver, both of which require a display or they fail.
It runs on a server, so starting the program requires my laptop to remote-desktop into the server so that it has a display, which allows launching a page with ChromeDriver and lets pyautogui's click and screen-search components work.
The issue is that if my home network is ever down, the remote desktop session cannot be kicked off, and my automation fails.
My solution would be to emulate a display, or force the program to behave as if a display existed, so it can be run entirely server-side.
All of my servers are Windows, so Xvfb does not seem to be an option, based on:
Xvfb on Windows
I am using something similar on a regular basis: a Windows server that runs my automated Python scripts, which use Selenium WebDriver.
First, the answer: set the screen size in the script itself, and run the script with Windows Task Scheduler, so you don't have to touch your laptop or desktop to start a remote desktop session.
You can use Chrome options to run Chrome as a headless browser.
I also suggest that, if your server is managed by a third party that applies regular updates, you use the chromedriver_autoinstaller package to fetch an updated ChromeDriver that matches your current version of Chrome.
Code:
import chromedriver_autoinstaller
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# Install (or update) a ChromeDriver that matches the installed Chrome
chromedriver_autoinstaller.install()

# maximize the chrome: run headless at a fixed window size
chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.add_argument("--window-size=1920,1080")
driver = webdriver.Chrome(options=chrome_options)
driver.get('website address')
For automating the task, Windows Task Scheduler is the best option. You can refer to this write-up, or find something that suits your needs:
https://towardsdatascience.com/automate-your-python-scripts-with-task-scheduler-661d0a40b279
Note: if you just need to set the screen size, your answer starts at the "#maximize the chrome" comment in the code above.
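Creating the recurring task itself can also be scripted. A sketch that only builds (and prints) a schtasks command line, so it is safe to run anywhere; the task name and script path are placeholders you would adapt:

```python
# Build a Windows `schtasks` command for a daily run of the script.
# Printing instead of executing keeps this sketch platform-neutral.
task_name = "RunMyBot"               # illustrative task name
script = r"C:\bots\my_script.py"     # illustrative path to the Python script

cmd = [
    "schtasks", "/Create",
    "/TN", task_name,                # task name shown in Task Scheduler
    "/TR", f'python "{script}"',     # the command the task runs
    "/SC", "DAILY",                  # schedule type
    "/ST", "09:00",                  # start time (24h clock)
]
print(" ".join(cmd))
```

On the server you would pass the same list to subprocess.run(cmd) from an elevated prompt.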
You should emulate your display driver. Run:
xvfb-run --server-args="-screen 0 1024x768x24" python my_script.py
to launch your script instead of just python my_script.py.
I know this question has been around a little while...
I just found out about using PyAutoGUI on remote or headless machines here: https://github.com/asweigart/pyautogui/issues/133
You can also look at this workaround: http://fredtantini.free.fr/blog/index.php?article58/automatiser-des-actions-avec-selenium-pyautogui-et-xvfb
Thank you

Selenium cannot get webpage content on Linux but work well on Windows for a specific website

I am web-scraping Bitcoin quotations from Coinsuper. It is a JavaScript-heavy page. When I first developed my code on Windows using Python 3.7, Selenium, and Chromium, it worked well.
I want to deploy this code on my server to fetch data continuously. However, it doesn't work under Linux.
I am sure my code can work, at least on most websites, including Apple, Google, Baidu, Xueqiu, etc.
For the OS system, I have tried Debian 9 and Ubuntu 18.04.
For webdriver, I have tried both Chrome and Firefox.
For webdriver parameters, I have tried:
Add header, including fake-useragent
Ignore SSL certificate
Disable GPU
These make no difference.
I think it might be because Coinsuper has some anti-scraping strategy. But I am also confused why the same code works on Windows but not on Linux. Are there any differences that might cause this?
The code:
from selenium import webdriver
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--disable-gpu') # Only included in Linux version
chrome_options.add_argument('--no-sandbox') # Only included in Linux version
driver = webdriver.Chrome(options=chrome_options)
driver.get('https://www.coinsuper.com/trade')
print(driver.page_source)
driver.quit()
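One Windows-vs-Linux difference worth ruling out first: in headless mode, Chrome's default user agent contains "HeadlessChrome", and some sites serve different content when they see it. The live value can be read with driver.execute_script("return navigator.userAgent"); a stdlib check with an illustrative UA string:

```python
def looks_headless(user_agent: str) -> bool:
    """True if the UA string advertises headless Chrome, which sites can block."""
    return "HeadlessChrome" in user_agent

# Illustrative user-agent string as headless Chrome on Linux might report it
ua = ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
      "(KHTML, like Gecko) HeadlessChrome/74.0.3729.157 Safari/537.36")
print(looks_headless(ua))  # True
```

If this is the culprit, overriding the user agent with --user-agent=... usually changes the behaviour.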
I am the one who asked this question. Thank you all for helping me! I have finally solved this problem.
@furas showed that my code could actually get responses from Coinsuper.
@Dalvenjia suggested that this might be caused by an IP blacklist, which is most likely for cloud servers. And yes, I am using a cloud server.
Here is the solution:
Start a Shadowsocks server from my home IP address, or use any proxy you have.
Start the Shadowsocks client on the server:
Add one more argument to ChromeDriver in the Python script:
chrome_options.add_argument('--proxy-server=socks5://127.0.0.1:xxxx')
Now I can get contents by bypassing the IP blacklist.
I recommend you use the WebDriverManager dependency:
https://github.com/bonigarcia/webdrivermanager
With WebDriverManager, you don't need to download drivers or manage driver paths in code.

Selenium Chromedriver Hangs?

I have a long-running Python app that periodically (every 30-60 seconds) opens a webpage with Selenium and ChromeDriver, runs some JavaScript, and takes a screenshot. It's running on an EC2 Ubuntu instance with Chrome in Xvfb, and for the most part everything works, except that intermittently the program hangs. It happens on one of these lines:
options = Options()
options.add_argument("--disable-web-security")
options.add_argument("--webdriver-logfile=webdrive.log")
dc = DesiredCapabilities.CHROME
dc['loggingPrefs'] = {'browser': 'ALL'}
driver = webdriver.Chrome(chrome_options=options, desired_capabilities=dc)
driver.get(url);
(I don't have an exact line, but I know from debug statements that it is somewhere in here.)
Unfortunately, the program doesn't crash, so there are no error messages; it has just been waiting endlessly since 7 pm last night. Running strace -p 'python program pid' returns wait4(-1, and running strace -p 'chromedriver pid' returns recvfrom(20,
I can see in ps axjf that the process is still running; it's just not doing anything. I'm at a bit of a loss now. Any suggestions?
chromedriver version: 2.10.267518
Google Chrome 40.0.2214.111
Selenium (installed with pip): 2.42.1
#https://github.com/cgoldberg/xvfbwrapper
xvfb = Xvfb(width=1920, height=1920)
xvfb.start()
---- EDIT ----
I have just updated to ChromeDriver 2.14.313457 and Selenium 2.44.0, hopefully this will fix the issue. I'm going to leave this open for now. Thanks for the advice so far guys!
---- EDIT ----
So the service still ended up hanging. I'm wondering if this is because I close and restart google-chrome for every screenshot. Could this be causing a memory leak somehow? How could I diagnose this?
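One way to turn a silent hang like this into a diagnosable error is to bound how long the driver start-up may take with a watchdog thread. A stdlib sketch (run_with_timeout is an illustrative helper, and the lambda stands in for the webdriver.Chrome call):

```python
import threading

def run_with_timeout(fn, timeout_s):
    """Run fn in a daemon thread; raise TimeoutError if it doesn't finish in time."""
    result = {}

    def target():
        result["value"] = fn()

    t = threading.Thread(target=target, daemon=True)
    t.start()
    t.join(timeout_s)
    if "value" not in result:
        # fn is still blocked (e.g. a hung chromedriver handshake)
        raise TimeoutError(f"call did not finish within {timeout_s}s")
    return result["value"]

print(run_with_timeout(lambda: "ok", 1.0))  # ok
```

With this, a hung driver launch raises after a known interval, which can then be logged and retried instead of blocking the service overnight.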
I ran into a similar issue and found the answer here and blogged about it here. Setting the environment variable DBUS_SESSION_BUS_ADDRESS=/dev/null worked for me without having to restart Xvfb all the time.
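If it helps, that variable can be set from the Python script itself, before the first driver is created; a minimal sketch:

```python
import os

# Must be set before any Chrome/ChromeDriver process is spawned
# by this interpreter, so do it at the top of the script.
os.environ["DBUS_SESSION_BUS_ADDRESS"] = "/dev/null"
print(os.environ["DBUS_SESSION_BUS_ADDRESS"])  # /dev/null
```

Child processes (including chromedriver and Chrome) inherit the environment, so no shell-level configuration is needed.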
I never found out the specific piece of code that was causing this problem, but creating a fresh instance of Xvfb with each driver load seems to have fixed it. Perhaps there is a memory leak somewhere in the interaction between selenium and Xvfb? Either way, marking this as closed.
In my case, the problem disappeared after enclosing the call in a with block, like this:
options = Options()
path = f"{os.environ['LOCALAPPDATA']}{os.path.sep}Programs{os.path.sep}Chromium{os.path.sep}Win_x64{os.path.sep}1012736{os.path.sep}"
options.binary_location = f"{path}chrome-win{os.path.sep}chrome.exe"
executable_path = f"{path}chromedriver_win32{os.path.sep}chromedriver.exe"
with webdriver.Chrome(chrome_options=options, executable_path=executable_path) as browser:
    browser.get("https://live.euronext.com/en/products/equities")
    sleep(3)
Selenium 4.4.0, Python 3.10, Windows 10, ChromeDriver and Chromium 104.0.5112.0

ChromeDriver Does Not Open Websites

I am experiencing very strange behaviour when testing Chrome via Selenium WebDriver.
Instead of navigating to pages as one would expect, the get command only leads to the download of tiny files (with no file type, or .aspx files) from the target site.
Importantly, this behaviour only occurs when I pass chrome_options as an argument to the Chrome driver.
The same test scripts work flawlessly with the Firefox driver.
Code:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options # tried with and without
proxy = '127.0.0.1:9951' # connects to a proxy application
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--proxy-server=%s' % proxy)
driver = webdriver.Chrome(chrome_options=chrome_options)
driver.get('whatismyip.com')
This leads to the automatic download of a file called download (no file extension, size 2 bytes), while calling other sites results in the download of small .aspx files.
All of this happens while the browser page remains blank and no interaction with elements takes place: the site is not loaded at all.
No error message is thrown, except element not found.
This is really strange.
Additional info:
I run Debian Wheezy 32-bit and use Python 2.7.
Any suggestions on how to solve this issue?
I tried your code and captured the traffic on localhost using a SOCKS v5 proxy through SSH. It is definitely sending data through the proxy, but no data is coming back. I confirmed the proxy was working using Firefox.
I'm running Google Chrome on Ubuntu 14.04 LTS 64-bit. My Chrome browser gives me the following message when I try to configure a proxy in its settings menu:
When running Google Chrome under a supported desktop environment, the
system proxy settings will be used. However, either your system is not
supported or there was a problem launching your system configuration.
But you can still configure via the command line. Please see man
google-chrome-stable for more information on flags and environment
variables.
Unfortunately I don't have a man page for google-chrome-stable.
I also discovered that, according to the Selenium documentation, Chrome uses the system-wide proxy settings, and it is unknown how to set the proxy in Chrome programmatically: http://docs.seleniumhq.org/docs/04_webdriver_advanced.jsp#using-a-proxy

Cannot create browser process when using selenium from python on RHEL5

I'm trying to use Selenium from Python, but I'm having a problem running it on a RHEL 5.5 server: I don't seem to be able to actually start Firefox.
from selenium import webdriver
b = webdriver.Firefox()
On my laptop with Ubuntu this works fine and brings up a Firefox window. When I log in to the server with ssh, I can run Firefox from the command line and have it displayed on my laptop. It is clearly Firefox from the server, since it shows the RHEL 5.5 home page.
When I run the Python script above on the server (or run it in IPython), the script hangs at webdriver.Firefox().
I have also tried
from selenium import webdriver
fb = webdriver.FirefoxProfile()
fb.native_events_enabled=True
b=webdriver.Firefox(fb)
Which also hangs on the final line there.
I'm using Python 2.7 installed in /opt/python2.7. I installed selenium with /opt/python2.7/pip-2.7.
I can see the firefox process on the server with top, and it is using a lot of CPU. I can also see from /proc/<pid>/environ that DISPLAY is set to localhost:10.0, which seems right.
How can I get a browser started with selenium on RHEL5.5? How can I figure out why Firefox is not starting?
It looks like the problem I'm encountering is this selenium bug:
http://code.google.com/p/selenium/issues/detail?id=2852
I used the fix described in comment #9 http://code.google.com/p/selenium/issues/detail?id=2852#c9
That worked for me.
