Closing Selenium Browser that was Opened in a Child Process - python

Here's the situation:
I create a child process which opens and deals with a webdriver. The child process is finicky and might error, in which case it would close immediately, and control would be returned to the main function. In this situation, however, the browser would still be open (as the child process never completely finished running). How can I close a browser that is initialized in a child process?
Approaches I've tried so far:
1) Initializing the webdriver in the main function and passing it to the child process as an argument.
2) Passing the webdriver between the child and parent process using a queue.
The code:
import multiprocessing
from selenium import webdriver

def foo(queue):
    driver = webdriver.Chrome()
    queue.put(driver)
    # Do some other stuff
    # If finicky stuff happens, this driver.close() will not run
    driver.close()

if __name__ == '__main__':
    queue = multiprocessing.Queue()
    p = multiprocessing.Process(target=foo, name='foo', args=(queue,))
    p.start()
    # Wait for process to finish
    p.join()
    # Try to close the browser if still open
    try:
        driver = queue.get()
        driver.close()
    except Exception:
        pass

I found a solution:
In foo(), get the process ID of the webdriver right after opening the new browser and put that process ID on the queue. Then, in the main function, call time.sleep(60) to wait for a minute, get the process ID from the queue, and use a try-except to try to kill that particular process.
If foo() hangs in its separate process, the main function closes the browser after one minute.
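The approach above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the poster's actual code: a sleeping Python subprocess stands in for the browser so the example runs without Selenium, p.join(timeout=60) replaces the fixed time.sleep(60), and the helper names start_stand_in_browser and kill_pid are hypothetical. With a real driver, the PID would presumably come from driver.service.process.pid, which is the PID of the driver executable.

```python
import multiprocessing
import os
import queue
import signal
import subprocess
import sys


def start_stand_in_browser():
    """Start a sleeping subprocess that stands in for the browser."""
    return subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])


def foo(pid_queue):
    # With Selenium this would be webdriver.Chrome(), and the PID would
    # come from driver.service.process.pid (an assumption, see the lead-in).
    browser = start_stand_in_browser()
    pid_queue.put(browser.pid)
    raise RuntimeError("finicky stuff happened")  # cleanup is never reached


def kill_pid(pid):
    """Send SIGTERM to pid; return False if the signal could not be sent."""
    try:
        os.kill(pid, signal.SIGTERM)
        return True
    except OSError:
        return False


if __name__ == "__main__":
    pid_queue = multiprocessing.Queue()
    p = multiprocessing.Process(target=foo, name="foo", args=(pid_queue,))
    p.start()
    p.join(timeout=60)  # wait at most one minute for the child
    if p.is_alive():
        p.terminate()  # the child hung; stop it before cleaning up
    try:
        leftover_pid = pid_queue.get(timeout=1)
    except queue.Empty:
        leftover_pid = None
    if leftover_pid is not None:
        kill_pid(leftover_pid)
```

Only an integer crosses the process boundary here, which avoids having to send the driver object itself through the queue.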

Related

Can't attach to detached selenium window in python

I can't send commands to a Selenium webdriver in a detached session because the http://localhost:port link has died.
But if I stop at breakpoint 1, the link stays alive.
import multiprocessing
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

def create_driver_pool(q):
    options = Options()
    driver = webdriver.Chrome(options=options)
    pass  # breakpoint 1
    return driver.command_executor._url

windows_pool = multiprocessing.Pool(processes=1)
result = windows_pool.map(create_driver_pool, [1])
print(result)
pass  # breakpoint 2 for testing link
Why is this happening, and what can I do about it?
After some research I finally found the reason for this behavior.
Thanks to https://bentyeh.github.io/blog/20190527_Python-multiprocessing.html and some googling about signals.
It turns out this has nothing to do with signals.
I found this code in selenium.webdriver.common.service (the print call is my own addition for debugging):
def __del__(self):
    print("del detected")
    # `subprocess.Popen` doesn't send signal on `__del__`;
    # so we attempt to close the launched process when `__del__`
    # is triggered.
    try:
        self.stop()
    except Exception:
        pass
This __del__ method runs when the garbage collector destroys the object, and the stop() it calls kills the subprocess via SIGTERM:
self.process.terminate()
self.process.wait()
self.process.kill()
self.process = None
But if you are in debug mode and stopped at a breakpoint, the garbage collector won't collect this object, so __del__ won't run.
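The effect can be reproduced without Selenium. The sketch below assumes CPython, where an object is finalized as soon as its last reference disappears. FakeService is a hypothetical stand-in for an object that owns a driver subprocess, not Selenium's real implementation, and the URL with port 12345 is made up.

```python
class FakeService:
    """Hypothetical stand-in for an object that owns a driver subprocess."""

    def __init__(self, log):
        self.log = log

    def stop(self):
        self.log.append("stopped")

    def __del__(self):
        # Same pattern as the quoted code: clean up on finalization.
        try:
            self.stop()
        except Exception:
            pass


def create_and_return_url(log):
    service = FakeService(log)
    # Only a string leaves the function, so `service` is finalized on return.
    return "http://localhost:12345"


def create_and_keep(log, keep_alive):
    service = FakeService(log)
    keep_alive.append(service)  # an outside reference keeps the object alive
    return "http://localhost:12345"


if __name__ == "__main__":
    log = []
    create_and_return_url(log)
    print(log)  # the service has already been stopped at this point

    log, keep_alive = [], []
    create_and_keep(log, keep_alive)
    print(log)  # nothing has been stopped yet
```

In the question's code the driver is a local variable of the pool worker function, so it plays the role of `service` in create_and_return_url: the returned URL outlives the object that kept the driver process running.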

Get error flag/message from a queued process in Python multiprocessing

I am preparing a Python multiprocessing tool that uses Process and Queue. The queue hands another script to a process so that it runs in parallel. As a sanity check, I want to detect whether an error happens in that other script and return a flag/message if it does (status = os.system() runs the process, and status is the error flag). But I can't pass errors from the consumer (child) process to the parent process. The main parts of my code (shortened) follow:
import os
import time
from multiprocessing import Process, Queue, Lock

command_queue = Queue()
lock = Lock()
p = Process(target=producer, args=(command_queue, lock, test_config_list_path))
for i in range(consumer_num):
    c = Process(target=consumer, args=(command_queue, lock))
    consumers.append(c)
p.daemon = True
p.start()
for c in consumers:
    c.daemon = True
    c.start()
p.join()
for c in consumers:
    c.join()
if error_flag:
    Stop_this_process_and_send_a_message!

def producer(queue, lock, ...):
    for config_path in test_config_list_path:
        queue.put((config_path, process_to_be_queued))

def consumer(queue, lock):
    while True:
        elem = queue.get()
        if elem is None:
            return
        status = os.system(elem[1])
        if status:
            error_flag = 1
    time.sleep(3)
Now I want to get that error_flag and use it in the main code to handle things. But it seems I can't pass error_flag from the consumer (child) part to the main part of the code. I'd appreciate it if someone could help with this.
Given your update, I also pass a multiprocessing.Event instance to your to_do process. The main process can then simply wait on the event, which blocks until set is called on it. When to_do or one of its threads detects a script error, it sets error_flag.value to True and then calls set on the event. This wakes up the main process, which can then call terminate on the process, which does what you want. On normal completion of to_do, it is still necessary to call set on the event, since the main process blocks until the event has been set. In that case the main process just calls join on the process.
Using a multiprocessing.Value instance alone would have required periodically checking its value in a loop, so I think waiting on a multiprocessing.Event is better. I have also made a couple of other updates to your code, with comments, so please review them:
import multiprocessing
from ctypes import c_bool
...

def to_do(event, error_flag):
    # Run the tests
    wrapper_threads.main(event, error_flag)
    # on error or normal process completion:
    event.set()

def git_pull_change(path_to_repo):
    repo = Repo(path_to_repo)
    current = repo.head.commit
    repo.remotes.origin.pull()
    if current == repo.head.commit:
        print("Repo not changed. Sleep mode activated.")
        # Call to time.sleep(some_number_of_seconds) should go here, right?
        return False
    else:
        print("Repo changed. Start running the tests!")
        return True

def main():
    while True:
        status = git_pull_change(git_path)
        if status:
            # The repo was just pulled, so no point in doing it again:
            #repo = Repo(git_path)
            #repo.remotes.origin.pull()
            event = multiprocessing.Event()
            error_flag = multiprocessing.Value(c_bool, False, lock=False)
            process = multiprocessing.Process(target=to_do, args=(event, error_flag))
            process.start()
            # wait for an error or normal process completion:
            event.wait()
            if error_flag.value:
                print('Error! breaking the process!!!!!!!!!!!!!!!!!!!!!!!')
                process.terminate() # Kill the process
            else:
                process.join()
            break
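The Event plus Value handshake can also be tried in isolation. The following is a minimal sketch, assuming a worker whose only job is to succeed or fail on demand; the `fail` parameter and the run_once helper are hypothetical and stand in for the test runner and the git-polling loop.

```python
import multiprocessing
from ctypes import c_bool


def to_do(event, error_flag, fail):
    """Worker: record an error in shared memory, then wake the parent."""
    try:
        if fail:  # `fail` is a hypothetical switch standing in for a script error
            raise RuntimeError("script error")
    except RuntimeError:
        error_flag.value = True
    finally:
        event.set()  # set on error and on normal completion alike


def run_once(fail):
    event = multiprocessing.Event()
    error_flag = multiprocessing.Value(c_bool, False, lock=False)
    process = multiprocessing.Process(target=to_do, args=(event, error_flag, fail))
    process.start()
    event.wait()  # blocks until the worker calls event.set()
    if error_flag.value:
        process.terminate()  # stop whatever the worker is still doing
    process.join()
    return bool(error_flag.value)


if __name__ == "__main__":
    print(run_once(fail=True))
    print(run_once(fail=False))
```

Calling event.set() in a finally clause is one way to make sure the parent is never left waiting, whichever path the worker takes.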
You should always tag multiprocessing questions with the platform you are running on. Since I do not see your process-creating code within an if __name__ == '__main__': block, I have to assume you are running on a platform that uses OS fork calls to create new processes, such as Linux.
That means your newly created processes inherit the value of error_flag when they are created but for all intents and purposes, if a process modifies this variable, it is modifying a local copy of this variable that exists in an address space that is unique to that process.
You need to create error_flag in shared memory and pass it as an argument to your process:
from multiprocessing import Value
from ctypes import c_bool
...
error_flag = Value(c_bool, False, lock=False)
for i in range(consumer_num):
    c = Process(target=consumer, args=(command_queue, lock, error_flag))
    consumers.append(c)
...
if error_flag.value:
    ...
    #Stop_this_process_and_send_a_message!

def consumer(queue, lock, error_flag):
    while True:
        elem = queue.get()
        if elem is None:
            return
        status = os.system(elem[1])
        if status:
            error_flag.value = True
    time.sleep(3)
But I have a few questions and comments for you. You have the following statement in your original code:
if error_flag:
    Stop_this_process_and_send_a_message!
But this statement is located after you have already joined all the started processes. So what processes are there to stop, and where are you sending a message to? (You potentially have multiple consumers, any of which might set error_flag. By the way, there is no need to do this under a lock, since setting the value to True is an atomic action.) And since you are joining all your processes, i.e. waiting for them to complete, I am not sure why you are making them daemon processes. You are also passing a Lock instance to your producer and consumers, but it is not being used at all.
Your consumers return when they get a None record from the queue. So if you have N consumers, the last N elements of test_config_list_path need to be None.
I also see no need for having the producer process. The main process could just as well write all the records to the queue either before or even after it starts the consumer processes.
The call to time.sleep(3) you have at the end of function consumer is unreachable.
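The points above (one None sentinel per consumer, the main process filling the queue itself, a shared error flag) can be put together in a minimal sketch. It is not the original tool: subprocess.call with an argument list is swapped in for os.system, and run_command, fill_queue, and the two sample commands are hypothetical.

```python
import multiprocessing
import subprocess
import sys
from ctypes import c_bool


def run_command(command):
    """Run a command given as an argument list; 0 means success."""
    return subprocess.call(command)


def consumer(command_queue, error_flag):
    while True:
        command = command_queue.get()
        if command is None:  # sentinel: no more work for this consumer
            return
        if run_command(command) != 0:
            error_flag.value = True


def fill_queue(command_queue, commands, consumer_num):
    for command in commands:
        command_queue.put(command)
    for _ in range(consumer_num):  # one sentinel per consumer
        command_queue.put(None)


if __name__ == "__main__":
    consumer_num = 2
    # Hypothetical commands: one succeeds, one exits with status 3.
    commands = [
        [sys.executable, "-c", "raise SystemExit(0)"],
        [sys.executable, "-c", "raise SystemExit(3)"],
    ]
    command_queue = multiprocessing.Queue()
    error_flag = multiprocessing.Value(c_bool, False, lock=False)
    consumers = [
        multiprocessing.Process(target=consumer, args=(command_queue, error_flag))
        for _ in range(consumer_num)
    ]
    for c in consumers:
        c.start()
    fill_queue(command_queue, commands, consumer_num)
    for c in consumers:
        c.join()
    print("error detected:", error_flag.value)
```

Because the flag is only read after every consumer has been joined, this version reports errors at the end rather than stopping the run early.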
So the code summary above is the inner process that runs some tests in parallel. I removed the def function part from it; just assume it is wrapper_threads in the following code summary. Here I'll add the parent process, which checks a variable (let's assume a commit in my git repo). The following process is meant to run indefinitely, and when there is a change it triggers the multiprocessing code from the main question:
def to_do():
    # Run the tests
    wrapper_threads.main()

def git_pull_change(path_to_repo):
    repo = Repo(path_to_repo)
    current = repo.head.commit
    repo.remotes.origin.pull()
    if current == repo.head.commit:
        print("Repo not changed. Sleep mode activated.")
        return False
    else:
        print("Repo changed. Start running the tests!")
        return True

def main():
    process = None
    while True:
        status = git_pull_change(git_path)
        if status:
            repo = Repo(git_path)
            repo.remotes.origin.pull()
            process = multiprocessing.Process(target=to_do)
            process.start()
            if error_flag.value:
                print('Error! breaking the process!!!!!!!!!!!!!!!!!!!!!!!')
                os.system('pkill -U user XXX')
                break
Now I want to propagate that error_flag from the child process to this process and stop process XXX. The problem is that I don't know how to bring that error_flag to this (grand)parent process.

How to get user input while Multiprocessing

I am trying to run multiple selenium instances in which I need to enter captchas, but I am a beginner in multiprocessing.
When the script is running and it is time to give input, it shows an error:
EOFError: EOF when reading a line
Here is an example of the code I am running:
import time
from selenium import webdriver
import multiprocessing

def first():
    chromedriver = r"C:\chromedriver"
    driver = webdriver.Chrome(chromedriver)
    driver.set_window_size(1000, 1000)
    driver.get('https://www.google.com/')
    time.sleep(5)
    captcha1 = input("in1: ")
    print(captcha1)

def sec():
    chromedriver = r"C:\chromedriver"
    driverr = webdriver.Chrome(chromedriver)
    driverr.set_window_size(1000, 1000)
    driverr.get('https://www.google.com/')
    captcha2 = input("in2: ")
    print(captcha2)

if __name__ == '__main__':
    p1 = multiprocessing.Process(target=first)
    p2 = multiprocessing.Process(target=sec)
    p1.start()
    p2.start()
    p1.join()
    p2.join()
Not only do I need to know how to give input but in this instance the 'captcha2' input would be needed first, so the 'captcha1' would have to wait until 'captcha2' is given...
You need to send messages requesting user input back to the main process so that it (and only it) can ask the user about them. The simplest way to do this is probably to create a multiprocessing.Queue object for the requests (so that the main process can listen to all children) and a Pipe for each process for the answers. Each request would of course be labeled with an identifier for the process sending it so that the response could be sent to the right place.
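The arrangement described above might look like the following. It is a minimal sketch with the Selenium steps left out; the function names ask_parent, worker, and serve_requests are hypothetical, and each worker asks exactly one question.

```python
import multiprocessing
import sys


def ask_parent(worker_id, prompt, request_queue, answer_conn):
    """Called in a child: send a labeled request, then block for the answer."""
    request_queue.put((worker_id, prompt))
    return answer_conn.recv()


def worker(worker_id, request_queue, answer_conn):
    # The Selenium steps would run here before the captcha is needed.
    captcha = ask_parent(worker_id, "in%d: " % worker_id, request_queue, answer_conn)
    print("worker %d got %r" % (worker_id, captcha))


def serve_requests(request_queue, answer_conns, n_requests, ask=input):
    """Called in the main process: answer requests in arrival order."""
    for _ in range(n_requests):
        worker_id, prompt = request_queue.get()
        answer_conns[worker_id].send(ask(prompt))


def main():
    request_queue = multiprocessing.Queue()
    answer_conns, procs = {}, []
    for worker_id in (1, 2):
        parent_conn, child_conn = multiprocessing.Pipe()
        answer_conns[worker_id] = parent_conn
        procs.append(multiprocessing.Process(
            target=worker, args=(worker_id, request_queue, child_conn)))
    for p in procs:
        p.start()
    serve_requests(request_queue, answer_conns, n_requests=len(procs))
    for p in procs:
        p.join()


# Prompt only when a terminal is attached, since input() needs one.
if __name__ == "__main__" and sys.stdin is not None and sys.stdin.isatty():
    main()
```

Requests are answered in the order they arrive on the queue, so whichever worker reaches its captcha first is asked first, and the other waits on its pipe until its own prompt has been answered.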

Selenium webdriver + PhantomJS processes not closing

Here's just about the simplest open and close you can do with webdriver and phantom:
from selenium import webdriver
crawler = webdriver.PhantomJS()
crawler.set_window_size(1024,768)
crawler.get('https://www.google.com/')
crawler.quit()
On windows (7), every time I run my code to test something out, new instances of the conhost.exe and phantomjs.exe processes begin and never quit. Am I doing something stupid here? I figured the processes would quit when the crawler.quit() did...
Go figure. Problem resolved with a reboot.
Rebooting is not a solution for this problem. I have tried the following hack on a Linux system. Try modifying the stop() function defined in service.py:
def stop(self):
    """
    Cleans up the process
    """
    if self._log:
        self._log.close()
        self._log = None
    #If its dead dont worry
    if self.process is None:
        return
    #Tell the Server to properly die in case
    try:
        if self.process:
            self.process.stdin.close()
            #self.process.kill()
            self.process.send_signal(signal.SIGTERM)
            self.process.wait()
            self.process = None
    except OSError:
        # kill may not be available under windows environment
        pass
I added the send_signal line to explicitly signal the phantomjs process to quit. Don't forget to add an import signal statement at the start of this file.
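The same idea can be tried outside of Selenium on any subprocess.Popen object. The sketch below is an assumption-laden illustration rather than a patch for service.py: stop_process is a hypothetical helper, it needs Python 3 for the timeout argument of wait(), and it adds a kill() fallback that the hack above does not have.

```python
import signal
import subprocess
import sys


def stop_process(process, timeout=5):
    """Ask a subprocess.Popen child to exit; force-kill it if it ignores SIGTERM."""
    if process.poll() is not None:
        return process.returncode  # already dead, nothing to do
    process.send_signal(signal.SIGTERM)
    try:
        process.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        process.kill()
        process.wait()
    return process.returncode


if __name__ == "__main__":
    # A sleeping Python child stands in for phantomjs here.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    print(stop_process(child))
```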

Restart a process if running longer than x amount of minutes

I have a program that creates a multiprocessing pool to handle a web extraction job. Essentially, a list of product IDs is fed into a pool of 10 processes that handle the queue. The code is pretty simple:
import multiprocessing
import time

num_procs = 10
products = ['92765937', '20284759', '92302047', '20385473', ...etc]

def worker():
    for workeritem in iter(q.get, None):
        time.sleep(10)
        get_product_data(workeritem)
        q.task_done()
    q.task_done()

q = multiprocessing.JoinableQueue()
procs = []
for i in range(num_procs):
    procs.append(multiprocessing.Process(target=worker))
    procs[-1].daemon = True
    procs[-1].start()
for product in products:
    time.sleep(10)
    q.put(product)
q.join()
for p in procs:
    q.put(None)
q.join()
for p in procs:
    p.join()
The get_product_data() function takes the product, opens an instance of Selenium, navigates to a site, logs in, collects the details of the product, and outputs them to a csv file. The problem is that, randomly (literally... it happens at different points of the website's navigation or extraction process), Selenium will stop doing whatever it's doing and just sit there without doing its job. No exceptions are thrown or anything. I've done everything I can in the get_product_data() function to keep this from happening, but it seems to just be a problem with Selenium (I've tried using Firefox, PhantomJS, and Chrome as its driver, and still run into the same problem no matter what).
Essentially, the process should never run for longer than, say, 10 minutes. Is there any way to kill a process and restart it with the same product id if it has been running for longer than the specified time?
This is all running on a Debian Wheezy box with Python 2.7.
You could write your code using multiprocessing.Pool and the timeout() function suggested by @VooDooNOFX. Not tested; consider it executable pseudo-code:
#!/usr/bin/env python
import signal
from contextlib import closing
from multiprocessing import Pool

class Alarm(Exception):
    pass

def alarm_handler(*args):
    raise Alarm("timeout")

def mp_get_product_data(id, timeout=10, nretries=3):
    signal.signal(signal.SIGALRM, alarm_handler) #XXX could move it to initializer
    error = None
    for i in range(nretries):
        signal.alarm(timeout)
        try:
            return id, get_product_data(id), None
        except Alarm as e:
            error = e # keep a reference; Python 3 unbinds e after the except block
            timeout *= 2 # retry with increased timeout
        except Exception as e:
            error = e
            break
        finally:
            signal.alarm(0) # disable alarm, no need to restore handler
    return id, None, str(error)

if __name__=="__main__":
    with closing(Pool(num_procs)) as pool:
        for id, result, error in pool.imap_unordered(mp_get_product_data, products):
            if error is not None: # report and/or reschedule
                print("error: {} for {}".format(error, id))
    pool.join()
You need to ask Selenium to wait an explicit amount of time, or wait for some implicit DOM object to be available. Take a quick look at the selenium docs about that.
From the link, here's a process that waits 10 seconds for the DOM element myDynamicElement to appear.
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait # available since 2.4.0
from selenium.webdriver.support import expected_conditions as EC # available since 2.26.0

ff = webdriver.Firefox()
ff.get("http://somedomain/url_that_delays_loading")
try:
    element = WebDriverWait(ff, 10).until(EC.presence_of_element_located((By.ID, "myDynamicElement")))
except TimeoutException as why:
    # Do something to reject this item, possibly by re-adding it to the worker queue.
    pass
finally:
    ff.quit()
If nothing is available in the given time period, a selenium.common.exceptions.TimeoutException is raised, which you can catch in a try/except block like the one above.
EDIT
Another option is to ask multiprocessing to timeout the process after some amount of time. This is done using the built-in library signal. Here's an excellent example of doing this, however it's still up to you to add that item back into the work queue when you detect a process has been killed. You can do this in the def handler section of the code.
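Putting a timed-out item back into the work queue can be sketched separately from the timeout mechanism itself. The helper below is a hypothetical illustration, not code from either answer: it assumes each task returns an (id, result, error) tuple with error set to None on success, and it takes the mapping function as a parameter so that the built-in map can be swapped for a pool method such as pool.imap_unordered.

```python
def run_with_rescheduling(ids, task, mapper=map, max_rounds=3):
    """Run task over ids, re-submitting failed ids for up to max_rounds rounds.

    task(id) must return (id, result, error), with error None on success.
    Returns the collected results and the ids that never succeeded.
    """
    results = {}
    pending = list(ids)
    for _ in range(max_rounds):
        if not pending:
            break
        failed = []
        for item_id, result, error in mapper(task, pending):
            if error is None:
                results[item_id] = result
            else:
                failed.append(item_id)  # reschedule for the next round
        pending = failed
    return results, pending


if __name__ == "__main__":
    seen = set()

    def flaky_task(item_id):
        # Hypothetical task: every id fails once, then succeeds.
        if item_id not in seen:
            seen.add(item_id)
            return item_id, None, "timeout"
        return item_id, "data for " + item_id, None

    print(run_with_rescheduling(["92765937", "20284759"], flaky_task))
```

Capping the number of rounds keeps a product that always hangs from being retried forever; the ids left over can be reported or logged.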
