How do I change the browser used by the view(response) command in the Scrapy shell? It defaults to Safari on my machine, but I'd like it to use Chrome because Chrome's developer tools are better.
As eLRuLL already mentioned, view(response) uses webbrowser to open the web page you downloaded. To change its behavior, you need to set a BROWSER environment variable.
You could do this by adding the following line at the end of your ~/.bashrc file:
export BROWSER=/usr/bin/firefox (if you would like firefox to be used).
I don't have Chrome installed, but from a quick search on Google, its path seems to be /usr/bin/google-chrome-stable; therefore, you could try export BROWSER=/usr/bin/google-chrome-stable instead. I haven't tested this with Chrome, though.
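The same idea can be sketched from Python instead of ~/.bashrc: pick the first browser that actually exists on PATH and pass it to the Scrapy shell through BROWSER. The helper name browser_env and the candidate executable names are assumptions for illustration, not part of Scrapy; adjust them to your system.

```python
import os
import shutil


def browser_env(candidates, env=None):
    """Return a copy of env with BROWSER set to the first candidate found on PATH."""
    env = dict(os.environ if env is None else env)
    for name in candidates:
        path = shutil.which(name)  # None when the executable is not found
        if path:
            env["BROWSER"] = path
            break
    return env


# Hypothetical usage: start the Scrapy shell with Chrome preferred.
# import subprocess
# subprocess.run(["scrapy", "shell", "https://example.com"],
#                env=browser_env(["google-chrome-stable", "google-chrome"]))
```

If none of the candidates is installed, the environment is returned unchanged, so webbrowser falls back to its usual default.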
There is no current way to specify which browser to use to open the response, as it internally uses the webbrowser package. This package uses your default configured browser to open the current response.
You could always change the default browser on your system to Chrome; that should make webbrowser use it.
This fixed it for me:
If you're on Windows 10, find or create any HTML file on your system.
Right-click the HTML file
Open with
Choose another app
Select your browser (e.g. Google Chrome) and check the box "Always use this app to open .html"
Now attempt to use view(response) in the Scrapy shell again and it should work.
Try this
import webbrowser
from scrapy.utils.response import open_in_browser
open_in_browser(response, _openfunc=webbrowser.get("/usr/bin/google-chrome").open)
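If webbrowser.get() does not recognise that path on your machine, one variation is to register the executable under a name of your own first. This is only a sketch: CHROME_PATH and the name "my-chrome" are assumptions, and the Scrapy call is left as a comment because it needs a live response object.

```python
import webbrowser

# Assumed location of the Chrome executable; adjust for your system.
CHROME_PATH = "/usr/bin/google-chrome"

# Register a controller that launches CHROME_PATH with the URL as its argument.
webbrowser.register("my-chrome", None, webbrowser.BackgroundBrowser(CHROME_PATH))
chrome = webbrowser.get("my-chrome")

# Inside the Scrapy shell, where `response` is defined:
# from scrapy.utils.response import open_in_browser
# open_in_browser(response, _openfunc=chrome.open)
```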
Related
How to use normal chrome completely without chromedriver selenium python not duplicate.
I am using Python 3.8.8 on Windows 7 Ultimate with PyCharm as the IDE, and my Chrome version is around 96. My problem is that whenever I use my Python script to scrape a website, it uses chromedriver, and when I specify what's given below:
from selenium.webdriver.chrome.options import Options

options = Options()
# The value is the Chrome user data directory, not the Chrome executable
options.add_argument(r"user-data-dir=my chrome path which is not Executable instead the user data")
# This works, but Chrome then shows "browser is controlled by automated software"; pointing it at the normal chrome.exe does not work
Sure, it uses normal Chrome with my credentials, but it still needs chromedriver to work, and when I delete chromedriver it throws an error. I then went into the Selenium source code, in a file called site.py (or sites.py), and changed the variable self.executable to the chrome.exe path. With that change Chrome opened without the "browser is controlled by automated software" message, but it didn't do anything; it was just stuck there. What I want is to use Chrome as the browser to scrape without chromedriver on my PC. Is that possible? If yes, please tell me how I should go about it. You can ask for further clarification and details. Thanks in advance.
By default, Selenium is detected as automated software and is flagged by most websites, and Selenium itself offers no way to remove that flag. There are, however, external libraries that can be installed to remove it.
There are options here to try to get around the default flag and hide the fact the browser is automated.
Edit
I understand the question better now and see that you want a more portable Chrome option. ChromeDriver is the specific program Selenium uses to control Chrome, and it must be used. There is no substitute. You can use the Firefox driver or the Internet Explorer driver instead, but some webdriver must be used (hence the name: the driver drives the main browser). When you specify the directory for the Chrome binary, you aren't removing the chromedriver middleman, only specifying where chromedriver needs to look!
Using Selenium you won't be able to initiate/spawn a new Browsing Context i.e. Chrome Browser session without the ChromeDriver.
The Parts and Pieces
At a minimum, WebDriver talks to a browser through a driver (here, ChromeDriver), and the communication is two-way:
WebDriver passes commands to the browser through the driver
Receives information back via the same route.
Hence using ChromeDriver is a mandatory requirement.
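To make the point concrete, here is a minimal sketch in the Selenium 4 style. The function name start_chrome and both paths are placeholders, not anything from the question. Note that binary_location only tells ChromeDriver which Chrome to launch; the driver itself is still part of the call.

```python
def start_chrome(chrome_binary, chromedriver_path):
    """Start Chrome through ChromeDriver, using a specific Chrome executable."""
    # Imported here so the function can be defined without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service

    options = Options()
    options.binary_location = chrome_binary  # which Chrome ChromeDriver should launch
    service = Service(executable_path=chromedriver_path)  # ChromeDriver itself
    return webdriver.Chrome(service=service, options=options)


# Hypothetical usage (both paths are placeholders):
# driver = start_chrome(r"C:\path\to\chrome.exe", r"C:\path\to\chromedriver.exe")
# driver.get("https://example.com")
# driver.quit()
```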
I have an Apache server up and running with Python 3.x already installed on it. Right now I am trying to run a small Python program (let's say filename.py) on the server. This program uses the Chrome webdriver from Selenium. It also uses sleep from time (but I think that comes with Python by default, so I figure it won't be a problem).
from selenium import webdriver
When I first wrote this program on my computer, I not only had to write the line of code above but also had to manually download the webdriver for Chrome and put it in /usr/local/bin. Here is the link to the file in case you wonder: Webdriver for Chrome
Anyway, I do not know what the equivalent steps are to configure this on my server. Do you have any idea how to do it? Or are there any concepts I could learn related to installing packages on an Apache server?
Simple solution:
You don't need to install the driver in /usr/local/bin. You can keep the driver executable anywhere and point Selenium to it with an executable path; see here for an example.
Solution for running on a server
If you have Python installed on the server, ideally 3.4 or later (which ships with pip by default), install ChromeDriver on the standalone server by following the instructions here.
Note that Selenium always needs an instance of a browser to control.
Luckily, there are browsers out there that aren't as heavy as the usual browsers you know. You don't have to open IE / Firefox / Chrome / Opera. You can use HtmlUnitDriver, which controls HTMLUnit, a headless Java browser that does not have any UI. Or you can use PhantomJsDriver, which drives PhantomJS, another headless browser running on WebKit.
Those headless browsers use much less memory, are usually faster (since they don't have to render anything), and don't require a graphical interface on the computer they run on, so they are easy to use server-side.
Sample code of headless setup
from selenium import webdriver

op = webdriver.ChromeOptions()
op.add_argument('headless')  # run Chrome without a visible window
driver = webdriver.Chrome(options=op)
It's also worth reading on running Selenium RC, see here on that.
How do I navigate and click buttons on an already open Chrome instance/window? The scenario is that an application named "ApplicationNAME" is installed on my Windows machine, but it's not a native Windows application: when I run it, it only opens a Chrome instance (not the default Chrome browser). I am also able to open Chrome's Developer Tools and inspect the elements, which means a Chrome instance is opened.
So the idea is to automate this Application using Python and Selenium. I am open for alternate suggestions too. Thank you!
I have used the Python (3.7.3) code below, but it doesn't help: it either opens another Chrome browser or navigates to an already open Chrome window instead of going into the desired "ApplicationNAME" Chrome instance.
from pywinauto.application import Application, WindowSpecification
import time
import requests
import selenium
from selenium import webdriver
app = Application().start(cmd_line=u'"C:\\Program Files (x86)\\ApplicationName.exe" ')
chromewidgetwin = app[u'ApplicationName']
chromewidgetwin.wait('ready')
chromerenderwidgethosthwnd = chromewidgetwin[u'Chrome Legacy Window']
chromerenderwidgethosthwnd.click()
Driver = webdriver.Chrome ("C:\\chromedriver.exe")
Driver.switch_to_window('ApplicationName')
from selenium import webdriver
browser = webdriver.Firefox()
browser.get('http://www.google.com')
Can someone help me with the above code? I expect it to open a new tab in Firefox with google.com; instead, it opens a new Internet Explorer tab.
Setting up geckodriver for Selenium Python should resolve the issue, I think.
You need to pass the geckodriver path to the Firefox driver, as in the code below (note the raw string, so the backslashes are not treated as escapes):
self.driver = webdriver.Firefox(executable_path=r'PATH\TO\geckodriver.exe')
Download geckodriver for your suitable OS from https://github.com/mozilla/geckodriver/releases
Extract it in a folder of your choice
Set the path correctly as mentioned above
IEDriverServer and GeckoDriver are both WebDriver variants that are becoming W3C compliant and evolving every day, so it is quite possible that a script targeting one browser gets hooked to the other for the following reasons:
Your automated tests may be running in an environment where someone is manually opening/closing Internet Explorer and Firefox browsers.
You have one or more dangling instances of IEDriverServer on your system which need to be cleaned up.
Solution :
Here are a few possible solutions for the issue you are facing :
Always be explicit mentioning the absolute location of the GeckoDriver while initializing the WebDriver / Web Browser instance as follows :
driver=webdriver.Firefox(executable_path=r'C:\path\to\geckodriver.exe')
Within the tearDown() method of your Test Execution always use quit() as follows :
driver.quit()
Before you start the Test Execution ensure there are no dangling instances of any WebDriver variant.
In case your Test Framework leaves any dangling instances, add the following Windows command at the end of your script to kill the dangling WebDriver.
taskkill /F /IM <webdriver_variant>.exe /T
Periodically Cleanup your Project WorkSpace in your IDE.
Run CCleaner tool to wipe away all the OS chores before and after Test Execution.
When you uninstall any Browser (any Software) from your system use Revo Uninstaller which cleans up your Unused Registry Settings as well.
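The quit() advice above can be made automatic. As a sketch (managed_driver is a hypothetical helper, not part of Selenium), a context manager guarantees that quit() runs even when the test body raises, so no dangling driver process is left behind:

```python
from contextlib import contextmanager


@contextmanager
def managed_driver(factory):
    """Create a driver with factory() and always call quit() on exit."""
    driver = factory()
    try:
        yield driver
    finally:
        driver.quit()  # ends the browser session and the driver process


# Hypothetical usage with Selenium installed:
# from selenium import webdriver
# with managed_driver(webdriver.Firefox) as driver:
#     driver.get("http://www.google.com")
```

Passing a factory rather than a ready-made driver keeps creation inside the helper, so any driver that was successfully started is always paired with a quit().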
I am trying to open the Chrome extension page ("chrome://extensions/") in Chrome as the last step of a local script (so I can finally reload an extension there).
Unfortunately I am failing with this. :-(
-> Python 2.7 for win 8.1 x64
import webbrowser
webbrowser.get().open("chrome://extensions/")
With Chrome as the default browser, this only results in Windows telling me it does not know how to handle this URL.
And when I call Chrome directly via...
webbrowser.get("C:/Program Files (x86)/Google/Chrome/Application/chrome.exe %s").open_new_tab("chrome://extensions/")
... or ...
import subprocess
subprocess.Popen([r'C:\Program Files (x86)\Google\Chrome\Application\chrome.exe', 'chrome://extensions/']).wait()
... or when I try to open the URL via windows run dialog ...
"C:\Program Files (x86)\Google\Chrome\Application\chrome.exe" "chrome://extensions/"
... it always just opens a new window with the New Tab page, whereas the same calls open an HTTP(S) URL correctly.
Does anyone have an idea how to open this Chrome-specific page?
I believe that by default, passing chrome:// URLs from outside of Chrome is disabled/sandboxed, but there may be CLI switches you can pass to Chrome to change this.
List of Chrome switches here
The extension code docs might help if all you are trying to do is reload an extension, instead of doing it through the chrome:// URI.