I have to automate testing of multiple webpages using Selenium. The preferred setup is WebDriver with Python on Windows. Since the number of webpages to test is very large, I am trying to figure out if I can make this process parallel. For example, from the command line I execute
python script1.py
Say I have 100 such scripts and I want to execute them in batches of 5. One additional requirement is that if 1 of the 5 scripts completes, the master should start the 6th script, so that 5 scripts are always running in parallel.
I have searched the docs and some forums, but I could not find any help with this. I have done a similar thing in the past, but that involved firing up multiple browsers directly from code, so it was somewhat different. This one involves Python and WebDriver.
Any help is appreciated.
Thanks and Regards.
I wanted to do something similar, where I needed to run multiple test cases at once. I guess this can be achieved by using Selenium Grid.
I have no idea why this was downvoted. Anyway, I found a way to do this.
It can be done by importing the subprocess module and then passing the arguments to the call function:
subprocess.call(["python", "d:/pysel/hello.py"])
subprocess.call(["python", "d:/pysel/goodbye.py"])
It is not exactly parallel, but it may work for my situation.
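If you do need the batches-of-5 behaviour from the original question, one possible approach (an untested sketch, with placeholder script paths) is to drive subprocess.call from a pool of 5 worker threads via concurrent.futures, so a new script is started as soon as one of the running ones finishes:

import subprocess
from concurrent.futures import ThreadPoolExecutor

# Placeholder paths - substitute your own 100 test scripts here.
scripts = ["d:/pysel/script%d.py" % i for i in range(1, 101)]

def run_script(path):
    # Each worker blocks until its script exits, so the pool automatically
    # launches the next script as soon as a slot frees up.
    return subprocess.call(["python", path])

# At most 5 scripts run at any given time.
with ThreadPoolExecutor(max_workers=5) as pool:
    return_codes = list(pool.map(run_script, scripts))

print(return_codes)

Note that concurrent.futures is in the standard library from Python 3.2 onwards (there is a "futures" backport for Python 2).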
I am trying to run, in parallel, a GUI that accesses certain files and a program that creates those files. Once the files are created, they won't be touched by the latter anymore.
My question is, how do I best implement this? Do I want multiprocessing? If so, how do I multiprocess just two functions?
I looked into it, and the multiprocessing module seems to be what I want, but I haven't quite understood how to run two pools with it, or how to run two specific functions. It just seems to take in one function and split the work up arbitrarily.
My fallback idea would be to just call two .bat files in parallel, each starting its own Python process, but that's hardly an elegant solution.
So in short, how do I start two specific, separate functions with multiprocessing? Or is there another, more elegant solution, like os.fork or something similar, that does what I want?
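For what it's worth, here is a minimal sketch of the kind of thing I'm hoping exists, assuming multiprocessing.Process is the right tool; run_gui and create_files are just placeholders for my two real functions:

import multiprocessing

def run_gui():
    pass  # placeholder for the GUI entry point

def create_files():
    pass  # placeholder for the program that creates the files

if __name__ == "__main__":  # the guard is required on Windows
    p1 = multiprocessing.Process(target=run_gui)
    p2 = multiprocessing.Process(target=create_files)
    p1.start()
    p2.start()
    p1.join()
    p2.join()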
Is it possible to run functions in different tabs at the same time? (Firefox)
Let me explain.
I am doing scraping... I want to open different tabs and scrape in them at the same time.
I used to do this using different windows (the easy way), but now the site I scrape doesn't let me stay logged in across different windows. If it is the same window with multiple tabs, though, I can be logged in in every one of them.
Or maybe there is another way: is there a way to run two different scripts in the same window? For example, run the first script, and then have the second script open a new tab in the window opened by the first script?
Thank you for the help.
In short, no. In a single Selenium webdriver instance, you can only interact with a single window handle at any given time. This answer has more related details you may find pertinent.
With regard to running two different scripts on the same window in different tabs... it is (or at least was at some point in time) technically possible to do in at least Internet Explorer. Selenium maintainers decided that it would not be feasible to implement a general solution for this. While you may find a hack to do this, it almost certainly will be browser-specific and very fragile, therefore not recommended.
Your best bet will be to just have two separate instances.
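As a rough sketch of what two separate instances might look like, driven from one script with threads (the URLs are placeholders, and whether the site lets both sessions stay logged in at once depends entirely on how it tracks sessions):

import threading
from selenium import webdriver

def scrape(url):
    # Each thread gets its own browser instance and therefore its own session.
    driver = webdriver.Firefox()
    try:
        driver.get(url)
        # ... find elements and scrape here ...
    finally:
        driver.quit()

urls = ("http://example.com/page1", "http://example.com/page2")  # placeholders
threads = [threading.Thread(target=scrape, args=(u,)) for u in urls]
for t in threads:
    t.start()
for t in threads:
    t.join()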
This may lead to a more involved question, but I could use some help with things such as search terms to find more info, links (in case someone has already done this), etc.
First of all, let's assume a CentOS 7 / Apache / mod_wsgi server setup.
Let's also assume no Django, no nginx, no Bootstrap, no PHP, possibly not even a database.
(That should narrow things down quite a bit).
I want to use 2 or 3 python scripts (maybe more), but 3 main ones.
Now, let's say I have an HTML page with a button, an href that calls a Python script. What choices do I have on where to put this script? Right in the Apache root directory, in cgi-bin, somewhere else? And let's say this button and script take the user into a protected directory (can the same script do that?), hook up to another Python script that does some math (maybe random numbers), and lead to yet another Python script that sends the user somewhere else, to another HTML page. Sorry, this is pretty vague, which is why I'm asking the question; I need more info. I think this also applies to security; a lot of the existing questions and answers are very outdated. Where is the best place to put Python scripts in Apache?
Try this for a start: https://www.linux.com/blog/configuring-apache2-run-python-scripts.
Once you've got it running, take the next step (i.e. adapt your python code to do what you want it to do).
Does that help you to take a step towards your goal?
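Since your setup above assumes mod_wsgi (the linked article covers the plain CGI route), the entry point Apache calls is a WSGI application function. A minimal sketch, with the file living wherever your WSGIScriptAlias directive points (typically outside the document root), might look like this:

# myapp.wsgi - placeholder name; point WSGIScriptAlias at this file.
def application(environ, start_response):
    status = '200 OK'
    body = b'Hello from Python behind Apache/mod_wsgi'
    headers = [('Content-Type', 'text/plain'),
               ('Content-Length', str(len(body)))]
    start_response(status, headers)
    return [body]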
First of all - many thanks in advance. I really appreciate it all.
So I need to crawl a small number of URLs rather constantly (around every hour) and get specific data.
A PHP site will be updated with the crawled data; I cannot change that.
I've read this solution: Best solution to host a crawler? which seems to be fine and has the upside of using cloud services if you want something to be scaled up.
I'm also aware of the existence of Scrapy.
Now, I wonder if there is a more complete solution to this without me having to set all these things up myself. It seems to me that the problem I'm trying to solve is not very unusual, and I'd like to save time with a more complete solution or set of instructions.
I would contact the person in this thread to get more specific help, but I can't. (https://stackoverflow.com/users/2335675/marcus-lind)
I'm currently running Windows on my personal machine, and trying to mess with Scrapy is not the easiest thing, with installation problems and the like.
Do you think there is no way of avoiding this specific work?
In case there isn't, how do I know if I should go with Python/Scrapy or Ruby On Rails, for example?
If the data you're trying to get are reasonably well structured, you could use a third party service like Kimono or import.io.
I find setting up a basic crawler in Python to be incredibly easy. After looking at a lot of options, including Scrapy (it didn't play well with my Windows machine either, due to its nightmare dependencies), I settled on using Selenium's Python package driven by PhantomJS for headless browsing.
Defining your crawling function would probably take only a handful of lines of code. This is a little rudimentary, but if you wanted to do it super simply as a straight Python script, you could do something like the following and just let it run while some condition is true or until you kill the script.
from selenium import webdriver
import time

# PhantomJS runs headlessly - no visible browser window is opened.
crawler = webdriver.PhantomJS()
crawler.set_window_size(1024, 768)

def crawl():
    crawler.get('http://www.url.com/')
    # Find your elements, get the contents, parse them using Selenium or BeautifulSoup

while True:
    crawl()
    time.sleep(3600)  # wait an hour between runs
I have 2 questions, so I figured I would cram them into a single post instead of filling the board up with extra threads.
Simple description of the situation: I am attempting to create a Python script that opens the executable of a simple C++ program with an unknown number of inputs in a Windows environment, sends some data into that program, and then checks to see if it has crashed; rinse and repeat.
Question 1: This is a pipes question. Bear with me, I am still learning about pipes, so I may have a misunderstanding of exactly how they work; forgive me if I do. Is it possible to detect how many inputs a program has? Basically, I'm attempting to open an executable from my Python script, one I personally know nothing about, and send garbage data into each available input. If it is NOT possible to detect how many inputs there are: would there be an adverse reaction (like crashing the program I'm sending the data into) if I send in more arguments than there are inputs? As in, the C++ program takes 3 inputs and I send in 6 arguments?
Question 2: Does anyone know if it is possible, using a Python script, to detect whether a program has hung? So far the best information I've been able to find is simply detecting whether the program is running via FindWindow, and then I suppose I could monitor the CPU usage to see if it continues to rise... but that is hardly an ideal method (and may not even work properly!). If there are any better known methods out there, I would be thrilled!
Thanks for your time :)
An Answer to Question 2
You should look into psutil, hosted at https://github.com/giampaolo/psutil. I don't know whether you'll find exactly what you're looking for, but psutil is a decent API, offering access to info such as the number of CPUs in addition to per-process information, which is what you want.
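As a rough, untested sketch of how that might look for your case (the executable path is a placeholder), you could launch the program with subprocess, feed it data on stdin, and then poll it with psutil:

import subprocess
import time
import psutil

# Placeholder path to the program under test.
proc = subprocess.Popen(["d:/tests/target.exe"], stdin=subprocess.PIPE)
proc.stdin.write(b"some garbage input\n")  # send data into the program
proc.stdin.flush()

p = psutil.Process(proc.pid)

# Very rough heuristic: if the process stays alive but its CPU usage is
# pinned (or stuck at zero when you expect activity), it may have hung.
for _ in range(10):
    if proc.poll() is not None:
        print("process exited with return code", proc.returncode)
        break
    print("status:", p.status(), "cpu:", p.cpu_percent(interval=1.0))
    time.sleep(1)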