I am developing a web crawler; unfortunately, the website generates its results via AJAX. Following some other coders' suggestions, I tried Selenium, a browser automation tool that can be driven from Python.
Following the example given in the documentation:
from selenium import webdriver
driver = webdriver.Firefox()
This code opens the Firefox browser; the script then performs actions such as filling in a form, submitting it, and so on.
Frankly speaking, this example works well on my PC (Ubuntu 12.10), but my project will eventually be moved to a CentOS server.
What I am wondering is whether this code (which needs to open a browser GUI) can run successfully on the CentOS server over SSH, because no desktop environment such as KDE or GNOME is installed on that machine.
And if the code cannot work without a browser GUI, are there any other solutions?
Any reply would be appreciated.
You can probably use the HtmlUnit driver if you enable JavaScript, though the only way to be sure is to test it. Another option is to run the browser inside an X virtual framebuffer (Xvfb), which provides a display without a physical screen.
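A minimal sketch of the X framebuffer option, assuming the Xvfb package and the PyVirtualDisplay Python package (the same library used in a later question here) are installed on the server. The helper only creates a virtual display when no DISPLAY variable is set, so the same script still behaves normally on a desktop. The function names are illustrative, not part of any library.

```python
import os


def has_x_display(env=os.environ):
    """Return True when an X server is already available (e.g. a desktop session)."""
    return bool(env.get("DISPLAY"))


def run_with_firefox(task):
    """Run task(driver) in Firefox, creating a virtual X display first if needed."""
    # Imported lazily so has_x_display() works without these packages installed.
    from pyvirtualdisplay import Display
    from selenium import webdriver

    display = None
    if not has_x_display():
        # Backed by Xvfb; the screen size is an arbitrary choice.
        display = Display(visible=False, size=(1280, 1024))
        display.start()  # by default also points DISPLAY at the virtual screen
    try:
        driver = webdriver.Firefox()
        try:
            return task(driver)
        finally:
            driver.quit()
    finally:
        if display is not None:
            display.stop()
```

For example, run_with_firefox(lambda driver: driver.get("https://example.com")) would load a page over SSH without any desktop environment (the URL is a placeholder).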
I am trying to create a Python application that uses Eel to build a user interface in HTML. My operating system is Ubuntu Linux and I'm using Firefox to display the web interface.
The problem is that every time I run the Python code, Firefox opens a blank page saying "Unable to connect", followed by "Firefox can't establish a connection to the server at localhost:8000". However, if I click the "Try Again" button once, twice, or three times, my interface is displayed.
Once it is open, I can navigate to different pages, but I also noticed that after I navigate to a different page, some of my JavaScript stops working (specifically a window.close() call). I don't know whether this is related, but I thought I would mention it just in case.
Any advice on the matter would be greatly appreciated.
Thank you.
I changed my browser from Firefox to Chromium, and now my interface loads on the first try at startup. I know some documentation says Eel can be used with Firefox, and it can, but it seems somewhat buggy and works better with other browsers.
However, I'm still having trouble with my JavaScript not running, but that will be another question.
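If the blank "Unable to connect" page comes from a race between the browser opening and Eel's local server starting (an assumption; the thread does not confirm it), one workaround is to start Eel without letting it open a browser, wait until port 8000 accepts connections, and only then open the page. A sketch, assuming the HTML lives in a folder named web and the start page is index.html (both hypothetical); mode=None, block=False and eel.sleep are Eel's own options for not opening a window and for running non-blocking.

```python
import socket
import time
import webbrowser


def wait_for_port(host, port, timeout=10.0, sleep=time.sleep):
    """Poll until host:port accepts TCP connections; False if timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=0.5):
                return True
        except OSError:
            sleep(0.1)
    return False


def start_ui(page="index.html", port=8000):
    import eel  # imported lazily so wait_for_port() works without Eel

    eel.init("web")  # hypothetical folder holding the HTML files
    # mode=None: serve the page but don't launch a browser; block=False: return at once.
    eel.start(page, mode=None, port=port, block=False)
    # eel.sleep yields to Eel's server while we wait for it to start listening.
    if wait_for_port("localhost", port, sleep=eel.sleep):
        webbrowser.open(f"http://localhost:{port}/{page}")
    while True:  # keep the process (and Eel's server) alive
        eel.sleep(1.0)
```

webbrowser.open uses the system's default browser, so this keeps Firefox while removing the need to click "Try Again".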
When I try to open opensea.io with Selenium, I get a Cloudflare CAPTCHA, and even after I solve it, the CAPTCHA page does not redirect to opensea.io.
Update: Using a VPN solved this, but there must be other ways.
driver.get("https://opensea.io")
[Screenshot: Cloudflare error page]
Edited:
Several things might be causing this kind of problem:
Cloudflare may have blocked your IP address. Try a different IP through a proxy (or a VPN, or another ISP) and see whether it works. (https://community.cloudflare.com/t/cant-bypass-cloudflare-captcha/200335/8)
Depending on the Selenium version and edition, the driver may explicitly mark the browser as automated (for example through the navigator.webdriver property), which lets websites detect Selenium, so Cloudflare blocks the request.
The browser itself may be the problem. Try a different browser, such as Firefox.
Cloudflare or the website you are trying to reach may require special cookies that a new Selenium browser session does not have (this was my wild guess, but it turned out not to be the case).
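The first two points can be sketched with Chrome command-line switches, assuming Chrome with Selenium 4. --proxy-server routes traffic through a proxy (the address below is a placeholder from a documentation IP range), and --disable-blink-features=AutomationControlled is a commonly used Chromium switch that stops navigator.webdriver from reporting true. Neither is guaranteed to get past Cloudflare, and --proxy-server is not meant to carry a username and password.

```python
def chrome_arguments(proxy=None, hide_webdriver_flag=True):
    """Build Chrome switches for the ideas above (no browser is started here)."""
    args = []
    if proxy:
        # e.g. "http://203.0.113.10:3128" (placeholder address)
        args.append(f"--proxy-server={proxy}")
    if hide_webdriver_flag:
        args.append("--disable-blink-features=AutomationControlled")
    return args


def open_with_chrome(url, proxy=None):
    from selenium import webdriver  # imported lazily

    options = webdriver.ChromeOptions()
    for arg in chrome_arguments(proxy):
        options.add_argument(arg)
    driver = webdriver.Chrome(options=options)
    driver.get(url)
    # Shows what the page sees; None or False means the flag is not exposed.
    print("navigator.webdriver:", driver.execute_script("return navigator.webdriver"))
    return driver
```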
P.S.: I have tried to connect to this URL (https://opensea.io), and interestingly, it worked fine for me.
Here is some information about the environment I performed this action on:
Operating System: CentOS 7 (Linux)
Selenium Standalone Version: 4.0.0
Java Version: jre-8u311-linux-x64
The browser I used: Firefox
I have an Apache server up and running with Python 3.x already installed on it. Right now I am trying to run a small Python program on the server (let's say filename.py). This program uses Selenium's WebDriver for Chrome. It also uses sleep from time, but since time is part of Python's standard library, I figure it won't be a problem.
from selenium import webdriver
When I wrote this program on my own computer, I not only had to write the line of code above but also had to manually download the WebDriver for Chrome (ChromeDriver) and put it in /usr/local/bin. Here is the link to the file in case you are wondering: WebDriver for Chrome
However, I do not know the equivalent steps to configure this on my server. Do you have any idea how to do it? Or are there any concepts I could learn about installing packages on an Apache server?
Simple solution:
You don't need to install the driver in /usr/local/bin. You can keep the driver executable anywhere and specify its location as the executable path; see here for an example.
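A sketch of the executable-path approach for Selenium 4, where the driver location is passed through a Service object (Selenium 3 took an executable_path argument on webdriver.Chrome instead). If no usable path is found, the sketch falls back to Service() with no path, which in Selenium 4.6 and later lets Selenium Manager locate or download a matching driver. The function names are illustrative.

```python
import os
import shutil


def find_chromedriver(explicit_path=None):
    """Return an executable chromedriver path, or None if none is found."""
    if explicit_path and os.path.isfile(explicit_path) and os.access(explicit_path, os.X_OK):
        return explicit_path
    return shutil.which("chromedriver")  # searches $PATH, e.g. /usr/local/bin


def make_chrome(explicit_path=None):
    from selenium import webdriver  # imported lazily
    from selenium.webdriver.chrome.service import Service

    path = find_chromedriver(explicit_path)
    # Service() without a path leaves driver discovery to Selenium (4.6+).
    service = Service(executable_path=path) if path else Service()
    return webdriver.Chrome(service=service)
```

For example, make_chrome("/opt/drivers/chromedriver") would use a driver kept outside /usr/local/bin (the path is hypothetical).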
Solution for running on a server
Make sure Python is installed on the server, ideally 3.4 or later, which ships with pip by default. Then install ChromeDriver (which runs as a standalone server) by following the instructions here.
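Before running the script on the server, it can help to confirm that both Chrome (or Chromium) and ChromeDriver are installed and that their major versions match, since ChromeDriver is released per Chrome major version. A sketch, assuming the binaries are named google-chrome, chromium-browser and chromedriver on the PATH (names vary by distribution).

```python
import re
import shutil
import subprocess


def major_version(version_text):
    """Extract the major version from output like 'Google Chrome 114.0.5735.198'."""
    match = re.search(r"(\d+)\.\d+", version_text)
    return int(match.group(1)) if match else None


def installed_major(executable):
    """Run `executable --version` if it is on PATH; return its major version or None."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    return major_version(result.stdout)


if __name__ == "__main__":
    browser = installed_major("google-chrome") or installed_major("chromium-browser")
    driver = installed_major("chromedriver")
    print(f"browser major: {browser}, chromedriver major: {driver}")
    if browser is None or driver is None:
        print("Install the missing component before running the script.")
    elif browser != driver:
        print("Major versions differ: install a chromedriver matching the browser.")
```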
Note that Selenium always needs an instance of a browser to control, so Chrome (or Chromium) itself must be installed on the server as well.
Luckily, there are browsers that aren't as heavy as the usual ones. You don't have to open IE, Firefox, Chrome or Opera: you can use HtmlUnitDriver, which controls HtmlUnit, a headless Java browser without any UI, or PhantomJSDriver, which drives PhantomJS, another headless browser based on WebKit. (PhantomJS development has since been suspended and its driver was removed in Selenium 4; current Chrome and Firefox both offer a built-in headless mode instead.)
These headless browsers use much less memory and are usually faster, since they don't have to display anything. They also don't require a graphical environment on the machine they run on, so they are easy to use server-side.
Sample code for a headless setup:
from selenium import webdriver
op = webdriver.ChromeOptions()
op.add_argument('--headless')  # run Chrome without a visible window
driver = webdriver.Chrome(options=op)
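The same idea extended to Firefox, the browser used in the first question: a sketch assuming Selenium 4 and a recent Firefox, where -headless is Firefox's own headless switch. The window size is set explicitly because headless windows start small, which can hide parts of responsive, AJAX-driven pages; 1920x1080 is an arbitrary choice.

```python
# Command-line switch each browser uses for its built-in headless mode.
HEADLESS_SWITCH = {"chrome": "--headless", "firefox": "-headless"}


def headless_switch(browser):
    """Return the headless switch for 'chrome' or 'firefox' (case-insensitive)."""
    try:
        return HEADLESS_SWITCH[browser.lower()]
    except KeyError:
        raise ValueError(f"unsupported browser: {browser!r}") from None


def make_headless_driver(browser="firefox", width=1920, height=1080):
    """Start Chrome or Firefox headless (Selenium 4 assumed)."""
    switch = headless_switch(browser)  # validate before importing Selenium
    from selenium import webdriver  # imported lazily

    if browser.lower() == "firefox":
        options = webdriver.FirefoxOptions()
        options.add_argument(switch)
        driver = webdriver.Firefox(options=options)
    else:
        options = webdriver.ChromeOptions()
        options.add_argument(switch)
        driver = webdriver.Chrome(options=options)
    # Headless windows start small, which can hide parts of responsive pages.
    driver.set_window_size(width, height)
    return driver
```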
It's also worth reading about running Selenium RC; see here for that (note that Selenium RC has since been superseded by WebDriver).
Background
I have built a Chrome extension that runs tasks automatically with Python and Selenium on my localhost.
I would like to use the extension on my smartphone (on a different network). For that, I just need to use a specific browser, and that part is working well.
The Problem
For my extension to work on a different device, I need a server that receives a request carrying all the information needed to start the job.
The API is done, but I don't know how to proceed with the server part.
What I've Tried
I tried hosting it on Heroku: that works and I can receive requests, but the WebDriver part does not. Because the browser runs headless and the server is located in Europe, the website blocks my access to the content.
I also tried using a proxy, but it requires authentication, which I could not get to work with Selenium.
Further Explanation
Basically, I need to either let my Chrome extension send requests directly to my personal computer, or use a server with a graphical user interface so that I can set up the proxy manually, but I have no idea how to do either, or whether that is even the best option.
Any thoughts on a good workaround?
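For the option of sending requests directly to the personal computer, a minimal sketch using only Python's standard library: a small HTTP endpoint that the extension could POST a JSON job to. The path /job, the payload shape and run_job are hypothetical; run_job is where the existing Selenium task would go. Reaching this from a phone on another network would additionally need port forwarding or a tunnel, plus some form of authentication, which the sketch omits.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def run_job(payload):
    """Hypothetical placeholder: start the existing Selenium task here."""
    return {"status": "accepted", "url": payload.get("url")}


class JobHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/job":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except ValueError:  # covers malformed JSON and undecodable bytes
            self.send_error(400, "body must be JSON")
            return
        if not isinstance(payload, dict):
            self.send_error(400, "body must be a JSON object")
            return
        body = json.dumps(run_job(payload)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(host="127.0.0.1", port=8080):
    """Start the server in a background thread; host="0.0.0.0" accepts remote calls."""
    server = ThreadingHTTPServer((host, port), JobHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def submit_job(base_url, payload):
    """POST a job the way the extension would, e.g. for a quick local test."""
    request = urllib.request.Request(
        f"{base_url}/job",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # An empty ProxyHandler ignores any system proxy settings for this call.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request, timeout=5) as response:
        return json.loads(response.read())
```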
I have a Python script that uses Selenium WebDriver (with PyVirtualDisplay as the display) to log into Flickr.
http://pastebin.com/dqmf4Ecw (you’ll need to add your own Flickr credentials)
When I run it as myself on my Debian server, it works fine. (I’m a sudoer, but I don’t use sudo when running the script.)
When I run it as the user www-data (which is what it’ll be running as eventually, because I want to trigger it from a Django website), I get two problems, one small, one big:
(Small): the webdriver.Firefox() call takes 30–45 seconds to return, compared to 2 seconds when run as myself
(Big): the script fails to log into Flickr. In order to log in, I find the username and password fields on the Flickr signin page (http://www.flickr.com/signin/), and use element.send_keys() to enter the username and password. Although Selenium seems to find the elements (i.e. no NoSuchElementException is thrown), the values do not get entered in the fields when the script is run as www-data (according to the screenshots I take using browser.save_screenshot), unlike when the script is run as myself.
Why does send_keys() not work when the script is run as www-data? (And is it related to the browser taking much longer to start?)
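While debugging, it may help to check whether typed text actually reached the field, instead of relying only on screenshots. A sketch: fill_and_verify types into an element, reads the value back with get_attribute("value") and retries a few times; login shows how it could be combined with an explicit wait. The element IDs and the screenshot path are hypothetical placeholders, not Flickr's real ones.

```python
import time


def fill_and_verify(element, text, attempts=3, pause=0.5):
    """Type text into an input field and confirm the browser received it."""
    for _ in range(attempts):
        element.clear()
        element.send_keys(text)
        if element.get_attribute("value") == text:
            return True
        time.sleep(pause)  # give a slow browser a moment before retrying
    return False


def login(driver, username, password, timeout=30):
    from selenium.webdriver.common.by import By  # imported lazily
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    wait = WebDriverWait(driver, timeout)
    # Hypothetical locators: look up the real IDs on the sign-in page.
    for field_id, value in (("username", username), ("password", password)):
        field = wait.until(EC.element_to_be_clickable((By.ID, field_id)))
        if not fill_and_verify(field, value):
            driver.save_screenshot(f"/tmp/{field_id}-not-filled.png")
            return False
    return True
```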
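Maybe something is different in your environment.
Try, for example, copying your ~/.bashrc into www-data's home directory (on Debian this is usually /var/www rather than /home/www-data).
If that is not sufficient, run this command both as your current user and as www-data (use a different -o output file for the second run so the first trace isn't overwritten):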
strace -tt -f -s 1000 -o /tmp/trace ./script.py
Then paste both traces somewhere (with your logins and passwords filtered out), and we will see what is happening.
Firefox sometimes performs a slow plugin compatibility check during startup. Because each user can have a different set of browser plugins, this could explain the difference in startup times. You could try syncing your Firefox profiles between the users.
Next, are you sure that Firefox running as www-data has proper network/internet access? Can you confirm that the Flickr site loads properly via Selenium? "The script fails to log into Flickr" is too imprecise; more details about why it fails might reveal the problem immediately.
Edit: Sorry, I just realized that the profiles shouldn't differ, because Selenium creates a fresh one. Nevertheless, my second point might be useful, so I won't delete this answer.
Some more things to consider:
Could you start Firefox manually from the www-data account once and make sure that Firefox is not updating itself before every execution of the script? I once faced this problem with Selenium RC on Windows and had to let the update finish before starting the script with the updated binary.
As a workaround, you could try running the script as the www-data user but connecting remotely to a WebDriver server running under your own login (aka "grid" mode). Would that work for you?
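A sketch of that remote setup, assuming Selenium 4 on both sides and a WebDriver server started in your own login session (for example with java -jar selenium-server-<version>.jar standalone, with geckodriver on that session's PATH). The address http://127.0.0.1:4444 is the server's usual default, and the check relies on the JSON returned by its /status endpoint containing value.ready; the function names are illustrative.

```python
import json
import urllib.request


def grid_is_ready(status):
    """Interpret the JSON body of the WebDriver server's /status endpoint."""
    return bool((status.get("value") or {}).get("ready"))


def remote_firefox(server="http://127.0.0.1:4444"):
    """Connect to a WebDriver server started in another user's session."""
    # Ignore system proxy settings when talking to the local server.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(f"{server}/status", timeout=5) as response:
        if not grid_is_ready(json.loads(response.read())):
            raise RuntimeError(f"WebDriver server at {server} is not ready")
    from selenium import webdriver  # imported lazily

    return webdriver.Remote(command_executor=server, options=webdriver.FirefoxOptions())
```

The browser then runs with your user's environment and display, while the Django code running as www-data only sends WebDriver commands over HTTP.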
I would suggest getting the latest Chrome from Google and trying input.send_keys() in that browser instead.
Some WebDriver features occasionally break with new releases. If you are set on testing with Firefox, you might have better luck with an older or newer version of Selenium WebDriver.
I remember having a similar issue with send_keys() on a Mac: send_keys() stopped working in certain modal windows after I updated Selenium WebDriver, and I fixed it by reverting to an older WebDriver version that I knew worked. However, I was driving WebDriver from Ruby, not Python.
Sometimes there can also be a problem with getting the correct environment variables in your shell when running as a different user. I would suggest checking whether all the shell environment variables are set properly under www-data.
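One way to check the environment point is to dump the variables as each user and compare them. A sketch; the script and file names in the comments are hypothetical. Note that the environment under sudo -u www-data may still differ from the one the web server gives the Django process, so logging os.environ from inside a Django view is the most faithful comparison.

```python
import json
import os
import sys


def diff_env(mine, theirs):
    """Return {name: (mine, theirs)} for variables that differ or exist on one side only."""
    keys = set(mine) | set(theirs)
    return {k: (mine.get(k), theirs.get(k)) for k in sorted(keys) if mine.get(k) != theirs.get(k)}


if __name__ == "__main__":
    # Hypothetical usage:
    #   python envdiff.py dump > me.json
    #   sudo -u www-data python envdiff.py dump > www-data.json
    #   python envdiff.py compare me.json www-data.json
    if sys.argv[1:2] == ["dump"]:
        json.dump(dict(os.environ), sys.stdout, indent=2, sort_keys=True)
    elif sys.argv[1:2] == ["compare"] and len(sys.argv) == 4:
        with open(sys.argv[2]) as f1, open(sys.argv[3]) as f2:
            for name, (a, b) in diff_env(json.load(f1), json.load(f2)).items():
                print(f"{name}: {a!r} -> {b!r}")
```

Variables such as DISPLAY, HOME and PATH are the first ones worth looking at for a browser that starts slowly or ignores input.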