How to work with SOCKS5 in Selenium Firefox/Chrome - Python

I have tried everything I could find. I tried connecting with an extension, but it was unsuccessful (I didn't find an extension config). I tried the internal settings (about:config). I tried connecting with JS inside Chrome.
Can I just use a proxy for the entire (WebDriver) process?

Chrome can't work with SOCKS5, but you can use it in Firefox with an add-on (Proxyfoxy):
Create the WebDriver
Install the add-on
Set up the proxy and select it
Done!
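If you'd rather not click through an add-on, a SOCKS5 proxy can also be set programmatically through Firefox preferences. A minimal sketch, assuming Selenium 3.5+ and a proxy at the placeholder address 127.0.0.1:1080:
from selenium import webdriver

options = webdriver.FirefoxOptions()
options.set_preference("network.proxy.type", 1)             # 1 = manual proxy configuration
options.set_preference("network.proxy.socks", "127.0.0.1")  # placeholder host
options.set_preference("network.proxy.socks_port", 1080)    # placeholder port
options.set_preference("network.proxy.socks_version", 5)
driver = webdriver.Firefox(options=options)
(For Chrome, a later answer on this page passes --proxy-server=socks5://host:port as a command-line argument.)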

Related

How can I use a Chrome extension in my Selenium Python program?

I'm just trying to use a VPN extension with Selenium. I have the extension running, but I need to click the button and enable the VPN so it works. Is there a way to do that with Selenium? I'm thinking of using another, similar option like Scrapy or PyAutoGUI...
No, there is no way to enable the VPN in your extension.
If you want to use your VPN extension, you have to set a profile (otherwise Selenium will create a new profile without the installed extension).
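A sketch of both routes, assuming Selenium 3-style APIs; the paths are placeholders:
from selenium import webdriver

# Option 1 (Firefox): reuse an existing profile that already has the
# extension installed.
profile = webdriver.FirefoxProfile("/home/user/.mozilla/firefox/your.profile")
driver = webdriver.Firefox(firefox_profile=profile)

# Option 2 (Chrome): load a packed extension from a .crx file.
options = webdriver.ChromeOptions()
options.add_extension("/path/to/vpn_extension.crx")
driver = webdriver.Chrome(options=options)
Either way, the browser starts with the extension already installed.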

How to install Selenium (Python) on an Apache web server?

I have an Apache server up and running, with Python 3.x already installed on it. Right now I am trying to run a little Python program (say, filename.py) ON the server. This Python program uses the Chrome WebDriver from Selenium. It also uses sleep from time (but I think that comes by default, so I figure it won't be a problem):
from selenium import webdriver
When I coded this program for the first time on my computer, not only did I have to write the line of code above, I also had to manually download the WebDriver for Chrome and place it in /usr/local/bin. Here is the link to the file in case you're wondering: WebDriver for Chrome
Anyway, I do not know what the equivalent steps are to configure this on my server. Do you have any idea how to do it, or any concepts I could learn related to installing packages on an Apache server?
Simple solution:
You don't need to install the driver in /usr/local/bin. You can keep the driver executable anywhere and specify its location with an executable path; see here for an example.
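For instance, with the Selenium 3-style executable_path parameter (the path below is a placeholder):
from selenium import webdriver

# The driver binary can live anywhere, as long as you tell Selenium where it is.
driver = webdriver.Chrome(executable_path="/home/user/drivers/chromedriver")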
Solution for running on a server
If you have Python installed on the server, ideally >= 3.4 (which comes with pip by default), then install ChromeDriver on the standalone server, following the instructions here.
Note that Selenium always needs an instance of a browser to control.
Luckily, there are browsers out there that aren't as heavy as the usual browsers you know. You don't have to open IE / Firefox / Chrome / Opera. You can use HtmlUnitDriver, which controls HTMLUnit - a headless Java browser without any UI - or PhantomJSDriver, which drives PhantomJS - another headless browser, running on WebKit.
Those headless browsers are much less memory-heavy and usually faster (since they don't have to render anything), and they don't require a graphical interface to be available on the machine they run on, which makes them easily usable server-side.
Sample code for a headless setup:
from selenium import webdriver

op = webdriver.ChromeOptions()
op.add_argument('--headless')
driver = webdriver.Chrome(options=op)
It's also worth reading up on running Selenium RC; see here for that.

Selenium cannot get webpage content on Linux but works well on Windows for a specific website

I am web-scraping Bitcoin quotations from Coinsuper. It is a JavaScript page. When I first developed my code on Windows using Python 3.7, Selenium, and Chromium, it worked well.
I want to deploy this code on my server to fetch data continuously. However, it doesn't work under Linux.
I am sure my code works, at least on most websites, including Apple, Google, Baidu, Xueqiu, etc.
For the OS, I have tried Debian 9 and Ubuntu 18.04.
For the WebDriver, I have tried both Chrome and Firefox.
For WebDriver parameters, I have tried:
Adding headers, including fake-useragent
Ignoring SSL certificate errors
Disabling the GPU
None of these made a difference.
I think it might be because Coinsuper has some anti-scraping strategy, but I am also confused about why the same code works on Windows yet not on Linux. Are there any differences that might cause this situation?
The code:
from selenium import webdriver
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--disable-gpu') # Only included in Linux version
chrome_options.add_argument('--no-sandbox') # Only included in Linux version
driver = webdriver.Chrome(options=chrome_options)
driver.get('https://www.coinsuper.com/trade')
print(driver.page_source)
driver.quit()
I am the one who asked this question. Thank you all for helping me! I have finally solved the problem.
@furas showed that my code could actually get responses from Coinsuper.
@Dalvenjia suggested that this might be caused by an IP blacklist, which is very likely for cloud servers. And yes, I am using a cloud server.
Here is the solution:
Start a Shadowsocks server from my home IP address, or use any proxy you have.
Start the Shadowsocks client on the server.
Add one more argument to ChromeOptions in the Python script:
chrome_options.add_argument('--proxy-server=socks5://127.0.0.1:xxxx')
Now I can get contents by bypassing the IP blacklist.
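Put together with the options from the question, the script looks roughly like this (1080 is a placeholder for whatever local port the Shadowsocks client listens on):
from selenium import webdriver

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--proxy-server=socks5://127.0.0.1:1080')  # placeholder port
driver = webdriver.Chrome(options=chrome_options)
driver.get('https://www.coinsuper.com/trade')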
I recommend using the WebDriverManager dependency:
https://github.com/bonigarcia/webdrivermanager
By using WebDriverManager, you don't need to download drivers or manage driver paths in code.
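The linked project is a Java library. For Python there is a similar package on PyPI (pip install webdriver-manager); a sketch assuming that package:
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

# install() downloads a matching ChromeDriver and returns its path.
driver = webdriver.Chrome(ChromeDriverManager().install())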

ChromeDriver Does Not Open Websites

I am experiencing very strange behaviour when testing Chrome via Selenium WebDriver.
Instead of navigating to pages as one would expect, the get command leads only to the download of tiny files (files with no type, or .aspx files) from the target site.
Importantly, this behavior only occurs when I pass chrome_options as an argument to the Chrome driver.
The same testing scripts work flawlessly with the Firefox driver.
Code:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options # tried with and without
proxy = '127.0.0.1:9951' # connects to a proxy application
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--proxy-server=%s' % proxy)
driver = webdriver.Chrome(chrome_options=chrome_options)
driver.get('whatismyip.com')
This leads to the automatic download of a file called 'download' (no file extension, size 2 bytes), while calling other sites results in the download of small .aspx files.
All this happens while the browser page remains blank, and no interaction with elements happens; the site is not loaded at all.
No error message is thrown, apart from 'element not found'.
This is really strange.
Additional info:
I run Debian Wheezy 32-bit and use Python 2.7.
Any suggestions on how to solve this issue?
I tried your code and captured the traffic on localhost using a SOCKS v5 proxy through SSH. It is definitely sending data through the proxy, but no data is coming back. I confirmed the proxy was working by using it with Firefox.
I'm running Google Chrome on Ubuntu 14.04 LTS 64-bit. My Chrome browser gives me the following message when I try to configure a proxy in its settings menu:
When running Google Chrome under a supported desktop environment, the
system proxy settings will be used. However, either your system is not
supported or there was a problem launching your system configuration.
But you can still configure via the command line. Please see man
google-chrome-stable for more information on flags and environment
variables.
Unfortunately, I don't have a man page for google-chrome-stable.
I also discovered that, according to the Selenium documentation, Chrome uses the system-wide proxy settings, and that it is unknown how to set the proxy in Chrome programmatically: http://docs.seleniumhq.org/docs/04_webdriver_advanced.jsp#using-a-proxy

Can't open a URL when using Selenium under Firefox [duplicate]

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait # available since 2.4.0
import time
# Create a new instance of the Firefox driver
driver = webdriver.Firefox()
# go to the google home page
driver.get("http://www.google.com")
This opens a Firefox window but does not open the URL.
1. I have a proxy server (but the address bar does not show the passed URL).
2. I have two Firefox profiles.
Can 1 or 2 be the issue? If yes, how can I resolve it?
It is a defect in Selenium.
I had the same problem on Ubuntu 12.04 behind a proxy.
The problem is incorrect processing of proxy exclusions. The default Ubuntu exclusions are located in the no_proxy environment variable:
no_proxy=localhost,127.0.0.0/8
But it seems that the /8 mask doesn't work for Selenium. To work around the problem, it is enough to change no_proxy to the following:
no_proxy=localhost,127.0.0.1
Removing the proxy settings before running the Python script also helps:
http_proxy= python script.py
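The same workaround can also be applied from inside the script, before the driver starts; a sketch:
import os
from selenium import webdriver

# Replace the /8 mask, which Selenium reportedly mishandles, with an
# explicit loopback address.
os.environ["no_proxy"] = "localhost,127.0.0.1"
driver = webdriver.Firefox()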
I was facing exactly the same issue. After browsing for some time, I came to know that it is basically a version-compatibility issue between Firefox and Selenium. I had the latest Firefox, but the Selenium I had imported was older, which caused the issue. The issue was resolved after upgrading Selenium:
pip install -U selenium
OS: Windows, Python 2.7
I have resolved this issue.
If your JAR files are older than the latest version and the browser has updated to the latest version, then download:
the latest JAR files from the Selenium website http://www.seleniumhq.org/download/, and
the latest geckodriver.exe.
@Neeraj
I've resolved this problem, but I'm not sure whether yours has the same cause.
In general, my problem was caused by permission issues.
I tried moving my whole project into ~/:
mv xxx/ ~/
and then I gave it 777 permissions:
chmod -R 777 xxx/
I'm not familiar with Linux permissions, so I just do this to make sure I have permission to execute the program.
Even if you don't have permission, Selenium will not prompt you about it.
So, good luck.
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;
import java.util.concurrent.TimeUnit;
WebDriver driver = new FirefoxDriver();
driver.manage().timeouts().implicitlyWait(30, TimeUnit.SECONDS);
driver.get("http://www.google.com");
OR
import org.openqa.selenium.By;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;
WebDriverWait wait = new WebDriverWait(driver, 30);
driver.get("http://www.google.com");
// hplogo is the id of the Google logo on google.com
wait.until(ExpectedConditions.presenceOfElementLocated(By.id("hplogo")));
I had a similar problem. All I had to do was delete the existing geckodriver.exe and download the latest release. You can find the latest release here: https://github.com/mozilla/geckodriver/releases
I spent a lot of time on this issue and finally found that Selenium 2.44 does not work with Node version 0.12.
Use Node version 0.10.38.
I got the same error when issuing a URL without the protocol (like localhost:4200) instead of a correct one that also specifies the protocol (e.g. http://localhost:4200).
Google Chrome works fine without the protocol (it takes http as the default), but Firefox crashes with this error.
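A minimal illustration of the difference:
from selenium import webdriver

driver = webdriver.Firefox()
# driver.get("localhost:4200")       # fails in Firefox: no protocol given
driver.get("http://localhost:4200")  # works: protocol included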
I was getting a similar problem, and stating the URL as a string worked for me. :)
package Chrome_Example;

import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

public class Launch_Chrome {
    public static void main(String[] args) {
        System.setProperty("webdriver.chrome.driver", "C:\\Users\\doyes\\Downloads\\chromedriver_win324\\chromedriver.exe");
        String URL = "http://www.google.com";
        WebDriver driver = new ChromeDriver();
        driver.get(URL);
    }
}
I had the same problem, but with Chrome.
I solved it using the following steps:
Install the Firefox/Chrome WebDriver from Google
Put the WebDriver in Chrome's directory.
Here's the code, which worked fine:
from selenium import webdriver

class InstaBot(object):
    def __init__(self):
        # Make sure it is the Chrome driver.
        self.driver = webdriver.Chrome(
            "C:\\Program Files (x86)\\Google\\Chrome\\Application\\chromedriver.exe")
        self.driver.get("https://www.wikipedia.com")

InstaBot()
Check your browser version and do the following.
1. Download the Firefox/Chrome WebDriver from Google.
2. Put the WebDriver in Chrome's directory.
I was having the same issue when trying with Chrome. I finally placed my chromedriver.exe in the same location as my project, and that fixed it for me.
Update your driver based on your browser.
In my case, for Chrome:
Download the latest driver for your Chrome from here: https://chromedriver.chromium.org/downloads
Check your Chrome version in the browser at chrome://settings/help.
When initialising your driver, use:
driver = webdriver.Chrome(executable_path="path/to/downloaded/driver")
Please have a look at this HowTo: http://www.qaautomation.net/?p=373
Have a close look at the section "Instantiating WebDriver".
I think you are missing the following line of code:
wait = new WebDriverWait(driver, 30);
Put it between
driver = webdriver.Firefox();
and
driver.get("http://www.google.com");
I haven't tested it, because I'm not using Selenium at the moment; I'm familiar with Selenium 1.x.
I was having the same issue. I assume you made sure your Java server was running before you started your Python script? The Java server can be downloaded from Selenium's download list.
When I ran netstat to examine the open ports, I noticed that the Java server wasn't listening on the specific "localhost" host.
When I started the server, I found that the port number was 4444:
$ java -jar selenium-server-standalone-2.35.0.jar
Sep 24, 2013 10:18:57 PM org.openqa.grid.selenium.GridLauncher main
INFO: Launching a standalone server
22:19:03.393 INFO - Java: Apple Inc. 20.51-b01-456
22:19:03.394 INFO - OS: Mac OS X 10.8.5 x86_64
22:19:03.418 INFO - v2.35.0, with Core v2.35.0. Built from revision c916b9d
22:19:03.681 INFO - RemoteWebDriver instances should connect to: http://127.0.0.1:4444/wd/hub
22:19:03.683 INFO - Version Jetty/5.1.x
22:19:03.683 INFO - Started HttpContext[/selenium-server/driver,/selenium-server/driver]
22:19:03.685 INFO - Started HttpContext[/selenium-server,/selenium-server]
22:19:03.685 INFO - Started HttpContext[/,/]
22:19:03.755 INFO - Started org.openqa.jetty.jetty.servlet.ServletHandler#21b64e6a
22:19:03.755 INFO - Started HttpContext[/wd,/wd]
22:19:03.765 INFO - Started SocketListener on 0.0.0.0:4444
I was able to view my listening ports and their port numbers (the -n option) by running the following command in the terminal:
$ netstat -an | egrep 'Proto|LISTEN'
This gave me the following output:
Proto Recv-Q Send-Q Local Address Foreign Address (state)
tcp46 0 0 *.4444 *.* LISTEN
I realized this might be the problem, because Selenium's socket utils, found in webdriver/common/utils.py, try to connect via "localhost" or 127.0.0.1:
socket_.connect(("localhost", port))
Once I changed "localhost" to '' (empty single quotes, representing all local addresses), it started working. So now the line from utils.py looks like this:
socket_.connect(('', port))
I am using macOS and Firefox 22. The latest version of Firefox at the time of this post is 24, but I heard there are some security issues with that version that may block some of Selenium's functionality (I have not verified this). Regardless, for this reason, I am using the older version of Firefox.
This worked for me (Tested on Ubuntu Desktop 11.04 with Python-2.7):
from selenium import webdriver
driver = webdriver.Firefox()
driver.get("http://www.stackoverflow.com")
Since you mentioned you use a proxy, try setting up the Firefox driver with a proxy by following the answer given here: proxy selenium python firefox
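In outline, the linked approach configures the proxy through a Firefox profile; a sketch with placeholder host and port:
from selenium import webdriver

profile = webdriver.FirefoxProfile()
profile.set_preference("network.proxy.type", 1)  # manual configuration
profile.set_preference("network.proxy.http", "127.0.0.1")
profile.set_preference("network.proxy.http_port", 8080)
profile.set_preference("network.proxy.ssl", "127.0.0.1")
profile.set_preference("network.proxy.ssl_port", 8080)
driver = webdriver.Firefox(firefox_profile=profile)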
Try the following code:
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;
WebDriver DRIVER = new FirefoxDriver();
DRIVER.get("http://www.google.com");
You first need to declare the URL as a string, as below:
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait # available since 2.4.0
import time

URL = "http://www.google.com"
# Create a new instance of the Firefox driver
driver = webdriver.Firefox()
# go to the google home page
driver.get(URL)
