Python - Selenium - Single thread to multiple threads

I have an automation project built with Python and Selenium which I'm trying to make run with multiple browsers in parallel.
The current workflow:
open a browser for manual login
save cookies for later use
in a loop, open additional browsers, load the saved session in each newly opened browser
This workflow opens the browsers one by one until all required browsers are open.
My code contains two classes: Browser and Ui.
The object instantiated from the Ui class contains a method which at some point executes the following code:
for asset in Inventory.assets:
    self.browsers[asset] = ui.Browser()
    # self.__open_window(asset)  # if this is uncommented, the code works properly without the multithreading part; all the browsers are opened one by one

# try 1
# threads = []
# for asset in Inventory.assets:
#     threads.append(Thread(target=self.__open_window, args=(asset,), name=asset))
# for thread in threads:
#     thread.start()

# try 2
# with concurrent.futures.ThreadPoolExecutor() as executor:
#     futures = []
#     for asset in Inventory.assets:
#         futures.append(executor.submit(self.__open_window, asset=asset))
#     for future in concurrent.futures.as_completed(futures):
#         print(future.result())
The problem appears when self.__open_window is executed within a thread. There I get a Selenium-related error, something like: 'NoneType' object has no attribute 'get', when self.driver.get(url) is called from the Browser class.
def __open_window(self, asset):
    self.interface = self.browsers[asset]
    self.interface.open_browser()
In class Browser:
def open_browser(self, driver_path=""):
    # ...
    options = webdriver.ChromeOptions()
    # ...
    web_driver = webdriver.Chrome(executable_path=driver_path, options=options)
    self.driver = web_driver
    self.opened_tabs["default"] = web_driver.current_window_handle
    # ...

def get_url(self, url):
    try:
        self.driver.get(url)  # this line causes problems ...
    except Exception as e:
        print(e)
My questions are:
Why do I have this issue in a multithreaded environment?
What should I do to make the code work properly?
Thank you.

I found the mistake; it was caused by a wrong object reference.
After the modification the code works well.
I updated the following lines in __open_window:
def __open_window(self, asset, browser):
    browser.interface = self.browsers[asset]
    browser.interface.open_browser()
and in the # try 1 code section:
threads.append(Thread(target=self.__open_window, args=(asset, browser, ), name=asset))
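For completeness, here is a minimal sketch of how the corrected # try 1 section could look, assuming the Inventory.assets and self.browsers structures from the question and that browser is the per-asset object each thread should work on (the post does not show how it is obtained). The point is that every thread gets its own object instead of all of them writing to the shared self.interface attribute:

threads = []
for asset in Inventory.assets:
    browser = self.browsers[asset]  # assumption: one Browser instance per asset
    threads.append(Thread(target=self.__open_window,
                          args=(asset, browser),
                          name=asset))
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()  # wait until every browser has been opened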

Related

SendMouseWheelEvent does not work on subsequent page loads

I'm trying to use SendMouseWheelEvent to pass mouse scrolls into the browser, but it only works on the first page load. On any subsequent page loads it does not work, unless the browser is interfered with by an actual mouse scroll.
I've prepared the following script. The "errorrep" is actually the name of the script itself, as in "errorrep.py", into which I'm assigning the browser as "mybrowser".
How can I have the browser correctly work with SendMouseWheelEvent on subsequent pageloads?
cefpython version is 66.1.
"""
SendMouseWheelEvent is correctly performed on the first page load. On any pageloads after that, it does not seem to work, but it works again when the actual mouse device performs a scroll on the webpage
'# test 1' which does a javascript code 'scrollBy', works on every pageload.
'# test 2' which does a SendMouseWheelEvent, only works on first page load.
"""
from cefpython3 import cefpython as cef
import time
from threading import Thread
import errorrep
mybrowser = None
def main():
cef.Initialize()
browser = cef.CreateBrowserSync(url="https://stackoverflow.com/")
browser.SetClientHandler(LoadHandler())
errorrep.mybrowser = browser
cef.MessageLoop()
del browser
cef.Shutdown()
class LoadHandler(object):
def OnLoadingStateChange(self, browser, is_loading, **_):
if not is_loading:
Thread(target=test).start()
def test():
print("Page loading is complete!")
for i in range(5):
print('scrolling')
# test 1
errorrep.mybrowser.ExecuteJavascript(jsCode="""scrollBy(0,25)""")
# test 2
# errorrep.mybrowser.SendMouseWheelEvent(0,0,0,-120, cef.EVENTFLAG_NONE)
time.sleep(1)
print('opening new page, clicking on the logo link')
errorrep.mybrowser.ExecuteJavascript(jsCode="""document.getElementsByClassName('s-topbar--logo js-gps-track')[0].click();""")
if __name__ == '__main__':
main()

Can't attach to detached selenium window in python

I can't send commands to the Selenium webdriver in a detached session because the link http://localhost:port has died.
But if I stop at breakpoint 1, the link stays alive.
import multiprocessing
from selenium import webdriver
from selenium.webdriver.chrome.options import Options


def create_driver_pool(q):
    options = Options()
    driver = webdriver.Chrome(options=options)
    pass  # breakpoint 1
    return driver.command_executor._url


windows_pool = multiprocessing.Pool(processes=1)
result = windows_pool.map(create_driver_pool, [1])
print(result)
pass  # breakpoint 2 for testing link
Why is this happening and what can I do about it?
After some research I finally found the reason for this behavior.
Thanks to https://bentyeh.github.io/blog/20190527_Python-multiprocessing.html and some googling about signals.
It is not about signals at all.
I found this code in selenium.common.service:
def __del__(self):
    print("del detected")
    # `subprocess.Popen` doesn't send signal on `__del__`;
    # so we attempt to close the launched process when `__del__`
    # is triggered.
    try:
        self.stop()
    except Exception:
        pass
This is the handler run by the garbage collector, and it kills the driver subprocess via SIGTERM:
self.process.terminate()
self.process.wait()
self.process.kill()
self.process = None
But if you are in debug mode with a breakpoint set, the garbage collector won't collect this object, and __del__ won't run.
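To illustrate that explanation (a hedged sketch, not the original code): holding a reference to the driver inside the worker process keeps the WebDriver object alive, so the garbage collector never triggers Service.__del__ and the chromedriver endpoint keeps responding for as long as the worker process exists.

import multiprocessing
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

_drivers = []  # module-level list in the worker process; the stored reference
               # prevents the WebDriver (and its Service) from being collected
               # when create_driver_pool returns


def create_driver_pool(q):
    options = Options()
    driver = webdriver.Chrome(options=options)
    _drivers.append(driver)  # keep the driver alive after the function returns
    return driver.command_executor._url


if __name__ == '__main__':
    windows_pool = multiprocessing.Pool(processes=1)
    result = windows_pool.map(create_driver_pool, [1])
    print(result)  # the http://localhost:port link should still respond
                   # while the worker process is alive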

How to give each URL its own threading

I have been working on a small PoC to improve my knowledge of threading, but unfortunately I got stuck, and here I am.
import time

found_products = []
site_catalog = [
    "https://www.graffitishop.net/Sneakers",
    "https://www.graffitishop.net/T-shirts",
    "https://www.graffitishop.net/Sweatshirts",
    "https://www.graffitishop.net/Shirts"
]


def threading_feeds():
    # Create own thread for each URL as we want to run concurrently
    for get_links in site_catalog:
        monitor_feed(link=get_links)


def monitor_feed(link: str) -> None:
    old_payload = product_data(...)
    while True:
        new_payload = product_data(...)
        if old_payload != new_payload:
            for found_link in new_payload:
                if found_link not in found_products:
                    logger.info(f'Detected new link -> {found_link} | From -> {link}')
                    # Execute in a new thread as we don't want this function to
                    # wait for filtering() to be done before continuing
                    filtering(link=found_link)
        else:
            logger.info("Nothing new")
            time.sleep(60)
            continue


def filtering(found_link):
    ...
1 - I'm currently building a monitor that starts from multiple links; my plan is to have each URL run concurrently instead of waiting for them one by one:
def threading_feeds():
    # Create own thread for each URL as we want to run concurrently
    for get_links in site_catalog:
        monitor_feed(link=get_links)
How can I do that?
2 - If a new product appears for the given URL inside monitor_feed, how can I start a new thread to execute the call filtering(link=found_link)? I don't want to wait for it to finish before looping back to the while True; instead it should run filtering(link=found_link) in the background while monitor_feed keeps executing.
You can use ThreadPoolExecutor:

import concurrent.futures

with concurrent.futures.ThreadPoolExecutor() as executor:
    executor.map(monitor_feed, site_catalog)
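For the second part of the question, a minimal sketch under the same assumptions (product_data, filtering and found_products as in the question's code): submit filtering() to an executor instead of calling it directly, so monitor_feed() keeps looping while filtering runs in the background.

import concurrent.futures
import time

# illustrative: a separate pool used only for the background filtering work
filter_pool = concurrent.futures.ThreadPoolExecutor()


def monitor_feed(link: str) -> None:
    old_payload = product_data(...)
    while True:
        new_payload = product_data(...)
        if old_payload != new_payload:
            for found_link in new_payload:
                if found_link not in found_products:
                    # non-blocking: the while-loop continues immediately
                    filter_pool.submit(filtering, found_link)
        else:
            time.sleep(60)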

Using WebDriver to take screenshots with always on process

I have a service that takes screenshots of a given URL using the Selenium WebDriver.
It works OK: it spawns a process -> takes the screenshot -> closes the process.
The problem is that it takes too long to return.
Is there a way to keep the web driver process always on, waiting for requests?
Here is my code:
class WebDriver(webdriver.Chrome):
    def __init__(self, *args, **kwargs):
        logger.info('Start WebDriver instance.')
        self.start_time = datetime.now()
        self.lock = threading.Lock()
        kwargs['chrome_options'] = self.get_chrome_options()
        super().__init__(*args, **kwargs)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        logger.info(f'Quiting Webdriver instance {id(self)}, took {datetime.now() - self.start_time}')
        self.quit()

    @staticmethod
    def get_chrome_options():
        chrome_options = ChromeOptions()
        chrome_options.headless = True
        chrome_options.add_argument('--start-maximized')
        chrome_options.add_argument("--no-sandbox")  # Bypass OS security model
        chrome_options.add_argument('--disable-dev-shm-usage')  # overcome limited resource problems
        chrome_options.add_argument("--lang=en")
        chrome_options.add_argument("--disable-infobars")  # disabling infobars
        chrome_options.add_argument("--disable-extensions")  # disabling extensions
        chrome_options.add_argument("--hide-scrollbars")
        return chrome_options

    def capture_screenshot_from_html_string(self, html_str, window_size):
        with tempfile.TemporaryDirectory() as tmpdirname:
            html_filename = tmpdirname + '/template.html'
            with open(html_filename, 'w') as f:
                f.write(html_str)
            url = 'file://' + html_filename
            img_str = self.capture_screenshot(url, window_size)
        return img_str

    def capture_screenshot(self, url, window_size):
        self.lock.acquire()
        try:
            self.set_window_size(*window_size)
            self.get(url)
            self.maximize_window()
            self.set_page_load_timeout(PAGE_LOAD_TIMEOUT)
            img_str = self.get_screenshot_as_png()
        except Exception as exc:
            logger.error(f'Error capturing screenshot url: {url}; {exc}')
            img_str = None
        finally:
            self.lock.release()
        return img_str
After some research I found a solution and I'm posting it in case it helps others with a similar problem.
It uses the py-object-pool library.
The object pool library creates a pool of resource class instances and lets you reuse them in your project. The pool is implemented using Python's built-in Queue.
Creating a new browser instance for each request is a time-consuming task that makes the client wait.
If you have a single browser instance and manage everything with browser tabs, it becomes cumbersome to maintain and debug when an issue arises.
An object pool helps in that situation, as it creates a pool of resources and hands one to each client on request, separating the clients from one another without waiting or creating a new instance on the spot.
Code Example:

ff_browser_pool = ObjectPool(FirefoxBrowser, min_init=2)

with ff_browser_pool.get() as (browser, browser_stats):
    title = browser.get_page_title('https://www.google.co.in/')
For more information, see the link below:
https://pypi.org/project/py-object-pool/
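As a rough sketch (not the library's official example), the FirefoxBrowser resource class used above could wrap a long-lived Selenium driver so the pool can hand the same instance to successive clients; the get_page_title method name simply follows the snippet above, and the exact contract the pool expects from a resource class is described in the py-object-pool documentation.

from selenium import webdriver


class FirefoxBrowser:
    def __init__(self):
        # created once per pooled resource and reused across requests,
        # so clients do not pay the browser start-up cost every time
        self.driver = webdriver.Firefox()

    def get_page_title(self, url):
        self.driver.get(url)
        return self.driver.title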

How to enable selenium webdriver to work with Python remote objects (Pyro) in a bokeh document?

I have a Python script running in the Bokeh server with bokeh serve view_server_so.py --show. The webpage is immediately opened in a browser tab.
For testing purposes I am using Pyro to allow remote access to some of the objects / methods running in the bokeh server.
My server test script creates just one button and one method (exposed to remote control); bokeh serves the webpage at http://localhost:5006/view_server_so.
The button is a Toggle button, each time the button is pressed its active state alternates between True and False, and its color changes.
# view_server_so.py
# This script shall be started from the command line with
# bokeh serve view_server_so.py --show
from bokeh.layouts import layout, row
from bokeh.models import Toggle
from bokeh.io import curdoc
import os
import Pyro4
import socket
import subprocess
import threading


# The layout used in the html-page
class View():
    def __init__(self):
        self.document = curdoc()
        self.btn_task = Toggle(label="Press me",
                               button_type="success",
                               css_classes=["btn_task"])
        self.layout = layout(row(self.btn_task))

    @Pyro4.expose
    def get_btn_task_active_state(self):
        # This method is enabled for remote control
        # The btn_task changes its active state each time it is pressed (bool)
        print('active state: {}'.format(self.btn_task.active))
        return self.btn_task.active


# As the script is started with 'bokeh serve',
# bokeh has control of the main event loop.
# To enable Pyro to listen for client requests for the object view,
# and its exposed method,
# the Pyro event loop needs to run in its own thread.
class PyroDaemon(threading.Thread):
    def __init__(self, obj):
        threading.Thread.__init__(self)
        self.obj = obj
        self.started = threading.Event()
        print('PyroDaemon.__init__()')

    def run(self):
        daemon = Pyro4.Daemon()
        obj = self.obj
        self.uri = daemon.register(obj, "test")
        self.started.set()
        print('PyroDaemon.run()')
        daemon.requestLoop()


view = View()
print('view created')

# starting a Pyro name server
hostname = socket.gethostname()
print('hostname: {}'.format(hostname))
try:
    print('try Pyro4.locateNS()')
    name_server = Pyro4.locateNS()
except Pyro4.errors.NamingError:
    print('except Pyro4.errors.NamingError')
    args = ['python', '-m', 'Pyro4.naming', '-n', hostname]
    p = subprocess.Popen(args)
    name_server = Pyro4.locateNS()
print('Nameserver is started')

# create a pyro daemon with object view, running in its own worker thread
pyro_thread = PyroDaemon(view)
pyro_thread.setDaemon(True)
pyro_thread.start()
pyro_thread.started.wait()
print('Daemon is started')

# The object view is registered at the Pyro name server
name_server.register('view', pyro_thread.uri, metadata={'View'})
print('view is registered')

view.document.add_root(view.layout)
Now I have a second test script. After some setup for the remote object, it simply calls the method get_btn_task_active_state() several times and prints the result. When the button is clicked in the browser between the calls, the active state switches and is correctly printed.
import Pyro4
import time

# looking for the Pyro name server created in view_server_so.py
nameserver = Pyro4.locateNS(broadcast=True)
# get the URI (universal resource identifier)
uris = nameserver.list(metadata_all={'View'})
# create a proxy object that interacts (remote control)
# with the remote object 'view'
view_rc = Pyro4.Proxy(uris['view'])

for i in range(3):
    time.sleep(3)
    # change the state manually: click on the button
    print('state: {}'.format(view_rc.get_btn_task_active_state()))

# prints
# state: False
# state: True
# state: False
As manual testing gets tedious, I want to automate the manual clicks on the button. So I am adding webdriver support: I look up the button on the webpage and let Selenium perform the clicks, with the same calls to get_btn_task_active_state() as before.
import Pyro4
from selenium import webdriver
import socket
import time

# looking for the Pyro name server created in view_server_so.py
nameserver = Pyro4.locateNS(broadcast=True)
# get the URI (universal resource identifier)
uris = nameserver.list(metadata_all={'View'})
# create a proxy object that interacts (remote control)
# with the remote object 'view'
view_rc = Pyro4.Proxy(uris['view'])

# use the Chrome webdriver
driver = webdriver.Chrome()
# open a Chrome Browser and the given webpage
driver.get("http://localhost:5006/view_server_so")
time.sleep(1)

# Find the button
btn_task = driver.find_element_by_class_name("btn_task")

for i in range(3):
    time.sleep(1)
    print('state: {}'.format(view_rc.get_btn_task_active_state()))
    btn_task.click()

# prints
# state: False
# state: False
# state: False
#
# but should alternate between False and True
A webdriver-controlled browser opens, and I can see the color of the button change as it is automatically clicked, but the active state of the button no longer seems to change.
What changes would be necessary, so that the automated script gives the same results as the manual testing?
