I am using selenium. I have the following code:
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.keys import Keys
browser = webdriver.Firefox()
browser.get("http://www.mysite.com")
x = browser.find_elements_by_xpath("//div[@id='containeriso3']/div/a[1]")
hrefs = [i.get_attribute('href') for i in x]
Now, this works.
But I want to run this on an Ubuntu server that only has a command line. This means I cannot use this
browser = webdriver.Firefox()
in my code. What alternative can I use to do this from the command line?
You can use HtmlUnitDriver, which is a headless browser based on the Rhino JavaScript engine.
http://code.google.com/p/selenium/wiki/HtmlUnitDriver
If your Ubuntu server and your desktop are on the same network use Selenium Grid. Your code will start on the Linux server and your tests will be executed on your desktop.
Take a look at the following link:
http://code.google.com/p/selenium/wiki/Grid2
The examples are in Java but I'm sure you can adapt them to Python or at least get the idea of what you need to do.
I think you can also try GhostDriver, which is based on PhantomJS:
https://github.com/detro/ghostdriver. Alternatively, you can run the usual Firefox driver under Xvfb.
It depends on what you need.
Related
I am trying to build a TinderBot.
Both Chrome and FireFox ask for geo permission outside of code (the pop-up drops from the browser's address bar, so it's not inside the html and I cannot access it with .find_element)
I found some prompts on Chrome here: https://testingbot.com/support/selenium/permission-popups (didn't try it though, so not sure if they are up to date)
But I cannot find anything for Firefox.
I found this piece of code that disables JavaScript
https://www.selenium.dev/documentation/webdriver/capabilities/firefox/
And I believe that I could build on it, but I cannot find how to set it so that it gives permission for geolocation.
Recently I thought I had found a snippet here that might at least show me how to pass the '-headless' argument correctly, but now it won't open the browser.
from selenium import webdriver
from selenium.webdriver.firefox.service import Service
from webdriver_manager.firefox import GeckoDriverManager
from selenium.webdriver.firefox.options import Options
opts = Options()
opts.add_argument('-headless')
srvc = Service(GeckoDriverManager().install())
driver = webdriver.Firefox(options=opts, service=srvc)
So generally I have 2 problems here:
How do I make add_argument work? (The other approaches I found turned out to be deprecated.)
What arguments do I need to target to allow geolocation on launching browser with the bot?
Am I even on the right path? I cannot ask questions in relevant threads because of insufficient rating, so here I am.
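To start Firefox in headless mode, instead of using add_argument() you can set the headless property to True as follows (later Selenium 4 releases deprecated this property in favor of add_argument('-headless'), so this applies to versions that still provide it):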
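from selenium import webdriver
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.firefox.service import Service
from webdriver_manager.firefox import GeckoDriverManager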
opts = Options()
opts.headless = True
srvc = Service(GeckoDriverManager().install())
driver = webdriver.Firefox(options=opts, service=srvc)
You can find a relevant discussion in How to make Firefox headless programmatically in Selenium with Python?
I want to create an application that takes a keyword as input, searches for it on YouTube, scrapes the resulting links, and saves them in a Notepad file, with all of this running in the background. I am familiar with the BeautifulSoup library and with Selenium, but I want it to work in the background, unlike Selenium, which runs in a visible browser window. I hope the question is clear; if not, feel free to ask.
I am familiar with Selenium, but I want to automate the search in the background.
import time
from selenium import webdriver
driver=webdriver.Chrome("C:\\Users\\MyPC\\Downloads\\chromedriver_win32\\chromedriver.exe")
driver.set_page_load_timeout(10)
driver.get("http://www.youtube.com")
driver.find_element_by_name("search_query").send_keys("Selenium Tutorial")
driver.find_element_by_id("search-icon-legacy").click()
time.sleep(4)
driver.quit()
This code opens the browser and then performs the search, but I want everything to happen in the background and fast, without any delay.
You can run the browser with the --headless option and it will not display its window. This works for both Firefox and Chrome.
Firefox
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
options = Options()
#options.headless = True
options.add_argument('--headless')
driver = webdriver.Firefox(options=options)
driver.get('https://stackoverflow.com')
driver.save_screenshot('screenshot-firefox.png')
driver.close()
Chrome
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
options = Options()
#options.headless = True
options.add_argument('--headless')
#options.add_argument('--disable-gpu')
driver = webdriver.Chrome(options=options)
driver.get('https://stackoverflow.com')
driver.save_screenshot('screenshot-chrome.png')
driver.close()
There used to be a PhantomJS webdriver that simulated a headless web browser, but it is no longer developed. This code still runs, but it gives me an empty page_source and an empty screenshot.png file.
from selenium import webdriver
driver = webdriver.PhantomJS()
driver.get('https://stackoverflow.com')
print(driver.page_source)
driver.save_screenshot('screenshot.png')
driver.close()
On Linux you could use Xvfb to create a fake/virtual monitor that the program can use to display its window. This way you don't see the window on your screen.
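As a rough sketch of the Xvfb approach, a script can be launched through the xvfb-run wrapper, which on Ubuntu is installed with the xvfb package. The helper names xvfb_command and run_under_xvfb, the script name bot.py, and the screen size are assumptions made for illustration only.

```python
import shutil
import subprocess
import sys


def xvfb_command(script, screen="1280x1024x24"):
    """Build the argv that runs a Python script inside a virtual X display."""
    # -a picks a free display number, -s passes arguments to the Xvfb server.
    return ["xvfb-run", "-a", "-s", f"-screen 0 {screen}", sys.executable, script]


def run_under_xvfb(script):
    """Run the script under xvfb-run, failing early if the wrapper is missing."""
    if shutil.which("xvfb-run") is None:
        raise RuntimeError("xvfb-run not found; install the xvfb package first")
    return subprocess.run(xvfb_command(script), check=True)


if __name__ == "__main__":
    # bot.py is a hypothetical Selenium script that calls webdriver.Firefox().
    print(xvfb_command("bot.py"))
```

The Selenium script itself stays unchanged with this approach; only the way it is started differs.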
All these methods have to render the page, so they may not run any faster.
To scrape faster you would have to analyze the requests and responses exchanged between the web browser and the server and reproduce them with the Python module requests, but that is not easy. This way the program doesn't have to render the page or run JavaScript, so it will run a lot faster.
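A minimal sketch of the browser-free idea, assuming the links you need are present in the HTML the server returns (pages that build their results with JavaScript will not expose them this way). To keep it self-contained it uses the standard library's urllib and html.parser rather than requests; LinkExtractor, extract_links, fetch_html and the example.com URLs are illustrative names, not part of any answer above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request, urlopen


class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url=""):
    """Return all link targets found in an HTML string."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


def fetch_html(url, timeout=10):
    """Download a page without rendering it or running its JavaScript."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    sample = '<a href="/watch?v=1">first</a> <a href="https://example.org/">second</a>'
    print(extract_links(sample, "https://example.com/"))
```

The extracted list can then be written to a text file with an ordinary open(..., "w") call.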
But then you may have another problem: if you make requests too often (too fast), the server can block you, and you need proxy servers to get different IPs.
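One way to reduce the risk of being blocked is simply to space requests out. The following is only a sketch under that assumption; the Throttle class and its min_interval parameter are hypothetical names, and a suitable interval depends entirely on the site.

```python
import time


class Throttle:
    """Ensure at least min_interval seconds pass between successive calls."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, if any

    def wait(self):
        """Block until the minimum interval since the previous call has elapsed."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


if __name__ == "__main__":
    throttle = Throttle(min_interval=2.0)
    for keyword in ["first", "second", "third"]:
        throttle.wait()  # call this right before each request
        print("would request results for", keyword)
```

The clock and sleep parameters exist only so the class can be exercised without real delays.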
There's also a workaround using pyvirtualdisplay if you want to hide the Selenium browser. Don't forget to close the browser when you're done.
I think web hosts can detect headless browsers, so you may get different results.
Stop loading the page after you get what you're looking for and/or close the browser after you save the source, insert JavaScript.
I keep getting this error:
https://sites.google.com/a/chromium.org/chromedriver/help/chromedriver-crashes
I get it when running the command:
python Web.py
However, when I go into the file and run the lines one by one, I don't get the error. I always get the error once Web.py has finished running. The script only does very basic things, but I feel like I'm not ending it correctly.
import selenium
from selenium.webdriver.common.keys import Keys
import time
from selenium import webdriver
driver = webdriver.Chrome()
driver.get('espn.com')
I want the window with espn.com to stay on the screen, not quit when the script has finished.
I'm running this on Python. I can share my setup, since maybe that's something I did incorrectly, but any help would be appreciated.
You're passing an invalid url.
You need to pass the url like this:
driver.get("http://www.espn.com")
A bare "espn.com" might work in your browser, but it won't with Selenium. Type "espn.com" into your browser, then copy and paste the resulting URL, and you'll see that it's actually the URL above.
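If URLs come from user input, a small helper can add the missing scheme before the value is handed to driver.get(). This is a minimal sketch; ensure_scheme and its default_scheme parameter are hypothetical names, and defaulting to http is an assumption.

```python
from urllib.parse import urlparse


def ensure_scheme(url, default_scheme="http"):
    """Return url unchanged if it already starts with http(s), else prefix a scheme."""
    if urlparse(url).scheme in ("http", "https"):
        return url
    # Drop leading slashes so "//espn.com" does not become "http:////espn.com".
    return f"{default_scheme}://{url.lstrip('/')}"


if __name__ == "__main__":
    print(ensure_scheme("espn.com"))
    print(ensure_scheme("https://www.espn.com"))
```

With it, the call becomes driver.get(ensure_scheme("espn.com")).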
You should also specify the "chromedriver.exe" path.
You are getting this error because you have not installed the Chrome driver for Selenium on your machine. Selenium by default provides the driver for Firefox, so when you use the webdriver for Firefox it won't raise any error. To resolve this issue with Chrome, you can download the Chrome webdriver from here.
and you can specify the driver as
from selenium import webdriver
d = webdriver.Chrome(executable_path='<your Chrome driver path>')
Adding to what @Pythonista said, it's better to keep the URL as a raw string rather than a normal string
driver.get(r'http://www.espn.com')
so that a backslash won't be taken as an escape sequence in a few cases.
Hope it helps.
Try updating Chrome and getting the latest Chrome driver. Chrome recently made several updates to its driver; you can download the latest one from the link below:
https://chromedriver.storage.googleapis.com/2.27/chromedriver_win32.zip
Or perhaps "hack" the actual library code for webbrowser or Selenium to do this? I'm looking in the current documentation and not seeing that this is possible, but perhaps you could adjust the actual library code and insert this functionality?
Unfortunately, the following approach with Selenium doesn't work - it only returns the original URL:
import selenium
from selenium import webdriver
driver = webdriver.Chrome()
driver.get("https://api.instagram.com/oauth/authorize/?client_id=cb0096f08a387355f&redirect_uri=http://pythondev.instadev.com/instagramredirect.html&response_type=code")
print(driver.current_url)
No, but if you're looking for a module that gives you more control over the browser look into selenium webdriver.
In selenium the function is driver.current_url
As bob0the0mighty's link shows, you'll need a loop that waits for the URL to change. Try something like:
import time
initialurl = "https://api.instagram.com/oauth/authorize/?client_id=cb0096f08a387355f&redirect_uri=http://pythondev.instadev.com/instagramredirect.html&response_type=code"
# driver is the webdriver.Chrome() instance from the question
driver.get(initialurl)
# Poll until the browser has been redirected away from the initial URL.
while True:
    currenturl = driver.current_url
    if currenturl != initialurl:
        print(currenturl)
        break
    time.sleep(0.5)
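The polling idea above can also be given a timeout so the script does not hang if the redirect never happens. This is a sketch only: wait_for_url_change and its parameters are hypothetical names, and the URL is read through a callable so the helper does not depend on a live browser.

```python
import time


def wait_for_url_change(get_url, initial_url, timeout=10.0, poll=0.5):
    """Poll get_url() until it differs from initial_url, then return the new URL.

    Raises TimeoutError if the URL is still unchanged after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        current = get_url()
        if current != initial_url:
            return current
        if time.monotonic() >= deadline:
            raise TimeoutError(f"URL still {initial_url!r} after {timeout} seconds")
        time.sleep(poll)


if __name__ == "__main__":
    # Stand-in for a browser that redirects on the third poll.
    fake_urls = iter(["https://a.example/login", "https://a.example/login",
                      "https://a.example/done?code=123"])
    print(wait_for_url_change(lambda: next(fake_urls), "https://a.example/login", poll=0.01))
```

With Selenium this would be called as wait_for_url_change(lambda: driver.current_url, initialurl). Selenium also provides WebDriverWait together with expected_conditions.url_changes, which covers the same case.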
The python webbrowser module launches an external process that is different depending on platform and even environment variables on that platform. So I'd say in general "no".
I just installed Selenium Web Driver and tried it out. It works great. My use case can be described as follows:
1. Start Firefox on a server with pseudo X server (Xvfb)
2. New Driver.Firefox() object
3. Open 10 tabs and load a webpage in each tab
4. Retrieve the html from all loaded pages
The only step that is not working is step 3. I can not find out how to open new tabs. I found this here on SO : How to open a new tab using Selenium WebDriver with Java? However, I tested this locally (i.e. with visible display) on my Mac for debugging purpose and I saw that the Firefox browser (which was opened when creating the driver object) does not open any tabs when doing as described on the SO thread. So I tried this here:
driver = webdriver.Firefox()
driver.get("https://stackoverflow.com/")
body = driver.find_element_by_tag_name("body")
body.send_keys(Keys.CONTROL + 't')
As I said, it does not work for me. So, how else is it possible to open tabs? I use Selenium 2.39 (pip install selenium) and Python 2.7.
The key combination to open a new tab on OSX is Command+T, so you should use
body.send_keys(Keys.COMMAND + 't')
It's probably slightly more correct to send it to the browser via action chaining since you're not actually typing text; this also makes your code more readable imo
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys
# before correction from DMfll:
# ActionChains(driver).send_keys(Keys.COMMAND, "t").perform()
# correct method
ActionChains(driver).key_down(Keys.COMMAND).send_keys("t").key_up(Keys.COMMAND).perform()
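Since the modifier differs between macOS (Command) and other platforms (Control), the choice can be made at runtime. The sketch below assumes that sys.platform == "darwin" identifies macOS; new_tab_modifier and open_new_tab are hypothetical helper names, and whether the browser actually reacts to a synthesized shortcut depends on the browser and driver version.

```python
import sys


def new_tab_modifier(platform=None):
    """Return the name of the Keys attribute used for the new-tab shortcut."""
    platform = sys.platform if platform is None else platform
    return "COMMAND" if platform == "darwin" else "CONTROL"


def open_new_tab(driver):
    """Send <modifier>+t to the browser through an action chain."""
    # Imported lazily so new_tab_modifier() works without Selenium installed.
    from selenium.webdriver.common.action_chains import ActionChains
    from selenium.webdriver.common.keys import Keys

    modifier = getattr(Keys, new_tab_modifier())
    ActionChains(driver).key_down(modifier).send_keys("t").key_up(modifier).perform()


if __name__ == "__main__":
    print(new_tab_modifier())
```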