Selenium browser instance can be accessible from a different process?

What I am currently trying to do is the following. There are a number of changing values (js driven) in a website that I am monitoring and saving to a database using Selenium. The values are read through infinite loops, from elements found with selenium's find_element.
This works as intended with one process. However, when I try to multiprocess this (to monitor multiple values at the same time), there seems to be no way to do it without opening one separate browser for each process (unfeasible, since we are talking about close to 60 different elements).
The browser I open before multiprocessing seems to not be available from within the various processes. Even if I find the elements before the multiprocessing step, I cannot pass them to the process since the webelements can't be pickled.
Am I doing something wrong, is selenium not the tool for the job, or is there another way?
The code below doesn't actually work; it is only meant to show the structure of what I currently have as a "working version". What I need to get away from is opening the browser inside the function. Instead, all my processes should rely on a single browser.
import time
import datetime
import os
from selenium import webdriver
from multiprocessing import Pool

def sampling(value_ID):
    # Each worker process opens its own browser (the part I want to avoid).
    dir = os.path.dirname(__file__)
    driver = webdriver.Firefox(dir)
    driver.get("https://website.org")
    # 'value_ID' stands in for the real XPath of the monitored element.
    monitored_value = driver.find_element_by_xpath('value_ID')
    while True:
        print(monitored_value.text)
        time.sleep(0.1)

value_array = [1, 2, 3, 4, 5, 6]

if __name__ == '__main__':
    with Pool(6) as p:
        p.map(sampling, value_array)

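You can check out Selenium's abstract event listeners if you want to capture changes to elements. By implementing a listener you can get rid of the infinite loops. Note that these hooks fire around commands issued through the wrapped driver (before_change_value_of, for instance, runs before send_keys or clear), so check that this matches the kind of change you need to observe. Here is an example that I think could work for you.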
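from selenium import webdriver
from selenium.webdriver.support.events import EventFiringWebDriver, AbstractEventListener

class EventListeners(AbstractEventListener):
    def before_change_value_of(self, element, driver):
        # check if this is the element you are looking for
        # do your stuff
        print("element changed!")

driver = webdriver.Firefox()
driver_with_listeners = EventFiringWebDriver(driver, EventListeners())
# Keep the script alive while your listeners are working. Note that
# implicitly_wait only sets the element-lookup timeout (in seconds);
# it does not block by itself.
driver_with_listeners.implicitly_wait(20000)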
You can also check out this post for a more complete implementation.
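Another option for the original problem, which needs neither listeners nor one browser per value, is to read all monitored elements from a single driver in one loop. The following is only a minimal sketch under stated assumptions: poll_values, on_change, interval and max_rounds are hypothetical names, the driver is assumed to expose Selenium's find_element(by, value), and the elements are assumed to stay attached to the page (a re-rendered element would have to be looked up again).

```python
import time


def poll_values(driver, xpaths, on_change, interval=0.1, max_rounds=None):
    """Poll several elements from ONE driver in a single loop.

    driver     -- any object with find_element(by, value), e.g. a Selenium WebDriver
    xpaths     -- dict mapping a label to the XPath of a monitored element
    on_change  -- callback(label, new_text), called whenever a value changes
    max_rounds -- stop after this many passes (None means loop forever)
    """
    # Look every element up once; "xpath" is the locator strategy string
    # that Selenium's By.XPATH constant stands for.
    elements = {label: driver.find_element("xpath", xp) for label, xp in xpaths.items()}
    last_seen = {}
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        for label, element in elements.items():
            text = element.text
            if last_seen.get(label) != text:
                last_seen[label] = text
                on_change(label, text)
        rounds += 1
        time.sleep(interval)
    return last_seen
```

The on_change callback is where the database write from the question would go. Because every call goes through one driver on one thread, nothing has to be pickled or shared between processes.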

Related

Performance testing with JMeter JSR223 sampler

I'm doing performance testing with JMeter, using Python in a JSR223 sampler. I would like to know the following:
How to connect to existing browser window?
How to calculate performance timing?
Suppose I have 10 steps in Python code. I want to calculate timing from step 3 to step 5.
How to call methods from one JSR223 sampler to another?
Kindly help me with it.
Thanks.

If the browser was started from Selenium, you can determine its session ID like:
self.driver.session_id
and then start another WebDriver instance and give it that session_id (url here is the command executor URL of the original driver):
driver = webdriver.Remote(command_executor=url, desired_capabilities={})
driver.session_id = session_id
If the browser wasn't started via Selenium, it's not possible.
You can use a Transaction Controller to measure the cumulative execution time of its children.
You can put your shared logic into a separate .py file and use sys.path to load it where required, like:
from sys import path
path.append(path_to_your_shared_module)
import YourSharedModule
# call functions from the shared module

Getting stuck executing infinite javascript loop in Python's Selenium chromedriver

I am trying to build a service where users can insert their Javascript code and it gets executed on a website of their choice. I use webdriver from python's selenium lib and chromedriver. The problem is that the python script gets stuck if user submits Javascript code with infinite loop.
The python script needs to process many tasks like: go to a website and execute some Javascript code. So, I can't afford to let it get stuck. Infinite loop in Javascript is known to cause a browser to freeze. But isn't there some way to set a timeout for webdriver's execute_script method? I would like to get back to python after a timeout and continue to run code after the execute_script command. Is this possible?
from selenium import webdriver
chromedriver = r"C:\chromedriver\chromedriver.exe"
driver = webdriver.Chrome(chromedriver)
driver.get("http://www.bulletproofpasswords.org/") # Or any other website
driver.execute_script("while (1); // Javascript infinite loop causing freeze")

You could set a timeout for your driver.execute_script("while (1);") call.
I have found another post that could solve this issue.
Basically, if you are on a Unix system, you could use signal to set a timeout for your driver.execute_script("while (1);") call.
Alternatively, you could run it in a separate process and terminate that process if it takes too long, using multiprocessing.Process. I'm including the example that was given in the other post:
import multiprocessing
import time

# bar
def bar():
    for i in range(100):
        print("Tick")
        time.sleep(1)

if __name__ == '__main__':
    # Start bar as a process
    p = multiprocessing.Process(target=bar)
    p.start()
    # Wait for 10 seconds or until process finishes
    p.join(10)
    # If process is still active
    if p.is_alive():
        print("running... let's kill it...")
        # Terminate
        p.terminate()
        p.join()

Python Selenium send request and avoid "Waiting for (website) ...."

I am launching several requests on different tabs. While one tab loads I will iteratively go to other tabs and see whether they have loaded correctly. The mechanism works great except for one thing: a lot of time is wasted "Waiting for (website)..."
I move from one tab to the next by raising an exception whenever a key element that I need to find is missing. But in order to check for this exception (and therefore to proceed to the other tabs, as it should), I have to wait for the request to end (that is, for the message "Waiting for..." to disappear).
Would it be possible not to wait? That is, would it be possible to launch the request via browser.get(..) and then immediately change tab?

Yes, you can do that. You need to change the pageLoadStrategy of the driver. Below is an example for Firefox:
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium import webdriver

# Copy the defaults so the shared DesiredCapabilities.FIREFOX dict is not modified
cap = DesiredCapabilities.FIREFOX.copy()
cap["pageLoadStrategy"] = "none"
print(cap)
driver = webdriver.Firefox(capabilities=cap)
driver.get("http://tarunlalwani.com")
# execute code for tab 2
# execute code for tab 3
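With this setting, get() returns immediately without waiting for the page, so it is up to you to do all the waiting. You can also use eager instead of none, in which case get() returns once the DOM is ready, without waiting for images and stylesheets.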
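Since the waiting becomes your job, a loop that cycles through the tabs and checks each one could look roughly like the sketch below. It is an illustration only: collect_ready_tabs, timeout and poll are hypothetical names, and the driver is assumed to offer switch_to.window(handle) and execute_script(js) as a Selenium WebDriver does. Checking document.readyState is one possible readiness test; looking for the key element mentioned in the question may be more reliable, because a tab can still report the previous page's state right after get().

```python
import time


def collect_ready_tabs(driver, handles, timeout=30.0, poll=0.25):
    """Cycle through tabs until each reports document.readyState == 'complete'.

    driver  -- any object offering switch_to.window(handle) and
               execute_script(js), e.g. a Selenium WebDriver
    handles -- window handles of tabs where get() was already issued
    Returns the handles in the order they finished; tabs still loading
    when the timeout expires are left out.
    """
    pending = list(handles)
    ready = []
    deadline = time.monotonic() + timeout
    while pending and time.monotonic() < deadline:
        for handle in list(pending):
            driver.switch_to.window(handle)
            state = driver.execute_script("return document.readyState")
            if state == "complete":
                pending.remove(handle)
                ready.append(handle)
        if pending:
            time.sleep(poll)
    return ready
```

Each handle can be processed as soon as it appears in the returned list, so a slow tab no longer blocks the others.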

Multi-threading in selenium python

I am working on a project which needs a bit of automation and web scraping, for which I am using Selenium and BeautifulSoup (python2.7).
I want to open only one instance of a web browser and log in to a website. Keeping that session, I am trying to open new tabs which will be independently controlled by threads, each thread controlling a tab and performing its own task. How should I do it? Example code would be nice. Here's my code:
import threading
from selenium import webdriver

def threadFunc(driver, tabId):
    if tabId == 1:
        # open a new tab and do something in it
        pass
    elif tabId == 2:
        # open another new tab with some different link and perform some task
        pass
    # .... other cases

class tabThreads(threading.Thread):
    def __init__(self, driver, tabId):
        threading.Thread.__init__(self)
        self.tabID = tabId
        self.driver = driver

    def run(self):
        print("Executing tab %s" % self.tabID)
        threadFunc(self.driver, self.tabID)

def func():
    # Created a main window
    driver = webdriver.Firefox()
    driver.get("...someLink...")
    # This is the part where i am stuck, whether to create threads and send
    # them the same web-driver to stick with the current session by using the
    # javascript call "window.open('')" or use a separate for each tab to
    # operate on individual pages, but that will open a new browser instance
    # everytime a driver is created
    thread1 = tabThreads(driver, 1)
    thread2 = tabThreads(driver, 2)
    # ...... other threads
I am open to suggestions for using any other module, if needed

My understanding is that Selenium drivers are not thread-safe. In the WebDriver spec, the Thread Safety section is empty...which I take to mean they have not addressed the topic at all. https://www.w3.org/TR/2012/WD-webdriver-20120710/#thread-safety
So while you could share the driver reference with multiple threads and make calls to the driver from multiple threads, there is no guarantee that the driver will be able to handle multiple asynchronous calls correctly.
Instead, you must either synchronize calls from multiple threads to ensure one is completed before the next starts, or you should have just one thread making Selenium API calls...potentially handling commands from a queue that is filled by multiple other threads.
Also, see Can Selenium use multi threading in one browser?
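The "one thread making Selenium API calls, fed by a queue" idea can be sketched as follows. This is a minimal illustration, not a tested Selenium integration: DriverWorker, call and close are hypothetical names, and the driver can be any object, since the worker only passes it to the functions it is given.

```python
import queue
import threading


class DriverWorker:
    """Runs every driver call on one dedicated thread, fed by a queue."""

    def __init__(self, driver):
        self._driver = driver
        self._tasks = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            item = self._tasks.get()
            if item is None:  # sentinel: stop the worker
                break
            func, done, box = item
            try:
                box["result"] = func(self._driver)
            except Exception as exc:  # hand the error back to the caller
                box["error"] = exc
            finally:
                done.set()

    def call(self, func):
        """Queue func(driver), block until it has run, and return its result."""
        done, box = threading.Event(), {}
        self._tasks.put((func, done, box))
        done.wait()
        if "error" in box:
            raise box["error"]
        return box["result"]

    def close(self):
        """Stop the worker thread after the queued tasks have run."""
        self._tasks.put(None)
        self._thread.join()
```

Any thread can then run, for example, worker.call(lambda d: d.title). Note that a tab switch and the commands that depend on it should go into the same function, so that no other thread's request can run in between.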

If you are using the script to submit forms automatically (that is, simply making GET and POST requests), I would recommend looking at requests. You can easily capture POST requests from your browser (Network tab in the developer tools of both Firefox and Chrome) and replay them. Something like:
import requests
from bs4 import BeautifulSoup

session = requests.session()
response = session.get('https://stackoverflow.com/')
soup = BeautifulSoup(response.text, 'html.parser')
and even POST data like:
postdata = {'username': 'John', 'password': password}
response = session.post('https://example.com', data=postdata, allow_redirects=True)
It can be easily threaded and is many times faster than using Selenium. The only problem is that there is no JavaScript execution or form handling, so you need to build the requests yourself, the old-fashioned way.
EDIT:
Also take a look at ThreadPoolExecutor
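As a rough sketch of how ThreadPoolExecutor could be used here: fetch_all below is a hypothetical helper, and the fetch argument is an assumption standing for whatever does the actual request, for example a function that calls requests.get(url) and returns the response text.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(fetch, urls, max_workers=4):
    """Call fetch(url) for every URL on a thread pool; return {url: result}."""
    urls = list(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input URLs
        return dict(zip(urls, pool.map(fetch, urls)))
```

If fetch raises for one URL, the exception is re-raised when that result is collected, so wrap the request in a try/except inside fetch if partial results are wanted.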

Is it possible to parallelize selenium webdriver get_attribute calls in python?

I am running this code
from multiprocessing.pool import ThreadPool
from selenium import webdriver
driver = webdriver.Firefox()
driver.get(url)
elements = driver.find_elements_by_class_name("class-name")
pool = ThreadPool(4)
async_results = [pool.apply_async(fn_which_calls_get_attribute, (element,)) for element in elements]
results = [result.get() for result in async_results]
which works fine for some of the results, but throws an error of ResponseNotReady for other results. It runs as expected if I use "pool.apply" instead of the async version.
Is it a problem that I am making multiple calls to the selenium driver at once, and the error is because it cannot handle it? Or is something wrong with my parallelization?

Just a hint: a Selenium WebDriver instance is meant to be driven from a single thread, so it is not possible to use multi-threading reliably against one driver. You can, however, create a separate driver instance for each thread or process, which can then run on another core of a multi-core system.
This may not fully answer your question, but if you are trying to do something like this, it is better not to.
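If the thread pool has to stay, one way to keep the calls from overlapping is to guard each driver call with a lock. The sketch below assumes that overlapping calls are the cause of the error, as the question suspects. get_attributes_locked and its parameters are hypothetical names, and the elements only need a get_attribute(name) method. The lock serializes the calls, so no speedup over a plain loop should be expected.

```python
import threading
from multiprocessing.pool import ThreadPool


def get_attributes_locked(elements, name, workers=4):
    """Read one attribute from each element on a thread pool, one call at a time."""
    lock = threading.Lock()

    def read(element):
        # Only one thread may talk to the driver at any moment.
        with lock:
            return element.get_attribute(name)

    with ThreadPool(workers) as pool:
        pending = [pool.apply_async(read, (element,)) for element in elements]
        # Collect inside the with block, before the pool is shut down.
        return [result.get() for result in pending]
```

The results come back in the same order as the input elements.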
