How to manage several selenium scripts running at once on VDS? - python

Currently, I have two python bots running on VDS, both of them are using selenium and running headless chrome to get dynamically generated content. While there was only one script, there was no problem, but now, it appears that the two scripts fight for the chrome process (or driver?) and only get it once the other one is done.
Have to mention, that in both scripts, Webdriver is instantiated and closed within a function, that itself is ran inside a Process of multiprocessing python module.
Running in virtual environment didn't do anything, each script has their own file of chrome driver in their respective directories, and by using ps -a I found that there are two different processes of chromedriver running and closing, so I am positive that scripts aren't using the same chrome.
Sometimes, the error says "session not started" and sometimes "window already closed".
My question is - how do I properly configure everything, so that the scripts don't interfere with each other?

For anyone having the same problem - double-triple-quadriple-check that the function, that you're passing in the Process, is the one instantiating Webdriver. I can't believe this problem is fixed just like that.

Related

How to close webdriver by different python script

I want to know is there a way to create a webdriver by script1.py , but close the webdriver by script2.py . I don't use the time.sleep() because I set these scripts to execute after few month. And I'm afraid that the scripts will delay due to the network crash. Any ideas are welcome.
Thanks!
The Python Selenium framework, SeleniumBase, comes with a special pytest command-line option called, --reuse-session, which tells all your tests to reuse the same browser session, even if all the tests live in different Python files. (More info on SeleniumBase command-line options here.) When using --reuse-session mode, the first test run will spin up the web browser, and then the last test run will close it. This should give you what you're looking for, where a different Python script closes the browser than the one that originally opened it.

~~Running Python Script on Startup That Opens Chrome Causing Unexpected Problems~~ SOLVED Multi-threading

I am running a python script that runs on startup of Windows 10 due to a shortcut of it being in the startup folder. This script is checking constantly for requirements on whether or not to open a web-page. If a page is launched and chrome has not previously been closed (yes closed, not just opened), the program just "dies" and freezes. After debugging my program the loop is no longer running, reading and writing to files stops, and even if another web-page should open ... it doesn't. I'm guessing this is a Chrome issue or a problem with this code windowsChromePath = 'C:/Program Files (x86)/Google/Chrome/Application/chrome.exe %s'
webbrowser.get(windowsChromePath).open(url)
Anyone know why this could be?
Edit: Forgot to mention I have had problems in this same project with read and write. Debugging leads me to believe it's not this, however if you think it could be a read/write problem it might be
Turns out webbrowser "blocks" waiting for an exit response from the browser. There is no way to make a non-block call. Use multi threading to run the open as a separate task.
Edit: Make sure to use try on webbrowser since it returns a connect timeout error by default if the windows is not closed. Except then raise if you really want to see the error.

running two python script in parallel in pycharm

I have looked at this question but I am not sure I got it correctly or not.
I have opened pycharm and one python script and its running (it's topic modeling).
Also I have another python script in which I opened in another pycharm in the same server. I also run it.
Now these two program are running in the same server, I should mention that I have not changed any configuration neither server nor pycharm.
Do you think its ok in this way? or one script technically won't run(in terms of progressing I mean it just show its running but practically wont run) until the other script finished?
Edit Configurations -> Allow parallel run. Done
First, PyCharm will create independent processes on the server, so both scripts will run. You can check it with something like htop - search for processes and verify that they're running.
Second, you don't have to open second PyCharm window to run the second script. You can run both of them from the single one. There are at least two ways: with run configurations or by spawning multiple terminal windows and running scripts from there.
From the Run/Debug Configurations windows you can add a Compound configuration that contains multiple configurations that will run in parallel. The Allow parallel run option for child configurations make no difference in this case.
The default behaviour was changed starting from version 2018.3. You can allow multiple runs by selecting Allow parallel run within the Edit Configurations menu.

Selenium not deleting profiles on browser close

I'm running some fairly simple tests using browsermob and selenium to open firefox browsers and navigate through a random pages. Each firefox instance is supposed to be independent and none of them share any cookies or cache. On my mac osx machine, this works quite nicely. The browsers open, navigate through a bunch of pages and then close.
On my windows machine, however, even after the firefox browser closes, the tmp** folders remain and, after leavin the test going on for a while, they begin to take up a lot of space. I was under the impression that each newly spawned browser would have its own profile, which it clearly does, but that it would delete the profile it made when the browser closes.
Is there an explicit selenium command I'm missing to enforce this behaviour?
Additionally, I've noticed that some of the tmp folders are showing up in AppData/Local/Temp/2 and that many others are showing up in the folder where I started running the script...
On your mac, have you looked in /var/folders/? You might find a bunch of anonymous*webdriver-profile folders a few levels down. (mine appear in /var/folders/sm/jngvd6s57ldb916b7h25d57r0000dn/T/)
Also, are you using driver.close() or driver.quit()? I thought driver.quit() cleans up the temp folder, but I could be wrong.

Why doesn’t input.send_keys() work in my Selenium WebDriver Python script when run as www-data?

I have a Python script that uses Selenium WebDriver (with PyVirtualDisplay as the display) to log into Flickr.
http://pastebin.com/dqmf4Ecw (you’ll need to add your own Flickr credentials)
When I run it as myself on my Debian server, it works fine. (I’m a sudoer, but I don’t use sudo when running the script.)
When I run it as the user www-data (which is what it’ll be running as eventually, because I want to trigger it from a Django website), I get two problems, one small, one big:
(Small): the webdriver.Firefox() call takes 30–45 seconds to return, compared to 2 seconds when run as myself
(Big): the script fails to log into Flickr. In order to log in, I find the username and password fields on the Flickr signin page (http://www.flickr.com/signin/), and use element.send_keys() to enter the username and password. Although Selenium seems to find the elements (i.e. no NoSuchElementException is thrown), the values do not get entered in the fields when the script is run as www-data (according to the screenshots I take using browser.save_screenshot), unlike when the script is run as myself.
Why does send_keys() not work when the script is run as www-data? (And is it related to the browser taking much longer to start?)
Maybe you have something different in your environment.
Try copy by example your ~/.bashrc in /home/www-data
If it's not sufficient, run this command both as your current user & as www-data:
strace -tt -f -s 1000 -o /tmp/trace ./script.py
And paste it (filter out your logins/passwords) somewhere.
We will see what's happens.
Sometimes, Firefox performs some nasty plugin compatibility check during startup. As each user can have a different set of browser plugins, this could be responsible for the difference in startup times. You could try to sync your Firefox profiles between users.
Then, are you sure that Firefox as user www-data has proper network/internet access? Can you confirm that the Flickr site loads properly via SeleniumHQ? "The script fails to log into Flickr" is too unprecise. Some more details about why it fails might reveal the problem instantaneously.
Edit: Sorry, I just understood that there shouldn't be a difference in profiles, because Selenium creates one. Nevertheless, my second point might be useful, so I won't delete this answer.
Some more things to ponder about:
Could you spawn firefox manually from www-data account once and make sure that Firefox is not updating itself before every execution of the script? I once faced this problem with Selenium RC on Windows and had to let the update finish before starting the script with the updated binary.
As a workaround, I guess you could you try running the script as www-data user but connecting remotely to a webdriver server running in your login (aka "grid" mode). Would that work for you?
I would suggest getting the latest chrome from google and trying input.send_keys() in that browser instead.
Sometimes some features of webdriver get broken with new releases.. If you are bent on testing with firefox, you might have better luck with an older/newer version of selenium webdriver.
I remember having a similar issue regarding send_keys() on a mac.. My issue was that send_keys() did not work in certain modal windows after I updated selenium webdriver.. I fixed it by reverting to an older webdriver that I knew to work. However, I was using Ruby and not Python to drive webdriver.
sometimes, there might also be a problem with getting the correct ENV variables in your shell if you use it as a different user. I would suggest trying to troubleshoot and see if all the shell ENV variables are set properly under www-data.

Categories