Selenium - screenshot images taken by headless server are mangled - python

I am using Selenium WebDriver (Python bindings) to take screenshots of a web page after it loads. When I go to a page and save an image using a Python script on my local computer, it looks fine. However, when I run the script on a server, the screenshots are mangled: the edge might be cut off with text missing, banners on the right side might be pushed to the bottom in a jumbled fashion, etc. I even tried maximizing the window:
driver.get(url)
driver.maximize_window()
time.sleep(4)
driver.save_screenshot('screen.png')
On the server, I cannot run Firefox with a visible window, so I have to start and stop a virtual display manually in my script before and after running Selenium:
from pyvirtualdisplay.xvnc import XvncDisplay
display = XvncDisplay(rfbport='####')
display.start()
So I'm thinking this might have to do with the settings of my display.
Anyone have any ideas on how to fix this? Thanks

Try Chromium + chromedriver instead. It works for me, although I never had the problem you describe. Just an idea.
Download chromedriver to /usr/local/bin (https://sites.google.com/a/chromium.org/chromedriver/downloads) and don't forget to run chmod a+x /usr/local/bin/chromedriver
I used this blog post: http://coreygoldberg.blogspot.cz/2011/07/python-taking-browser-screenshots-with.html
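If the display geometry is the cause, as the question suspects, one thing worth trying is to give both the virtual display and the browser window an explicit size, since maximize_window() can only grow the window to whatever the virtual screen offers (and, as far as I know, pyvirtualdisplay uses a fairly small screen unless a size is given). The sketch below is an illustration under assumptions, not a confirmed fix: it uses pyvirtualdisplay's generic Display class rather than the XvncDisplay from the question, and the 1920x1080 resolution and the helper names display_kwargs and take_screenshot are made up for this example.

```python
import time

SCREEN_SIZE = (1920, 1080)  # assumed target resolution, not taken from the question


def display_kwargs(size=SCREEN_SIZE, color_depth=24):
    """Keyword arguments for pyvirtualdisplay.Display with an explicit geometry."""
    width, height = size
    return {"visible": 0, "size": (width, height), "color_depth": color_depth}


def take_screenshot(url, path="screen.png", size=SCREEN_SIZE, wait=4):
    """Open url on a virtual display of a known size and save a screenshot."""
    # Imported lazily so display_kwargs can be used without these packages.
    from pyvirtualdisplay import Display
    from selenium import webdriver

    display = Display(**display_kwargs(size))
    display.start()
    try:
        driver = webdriver.Firefox()
        try:
            # Set the window size explicitly instead of relying on maximize_window().
            driver.set_window_size(*size)
            driver.get(url)
            time.sleep(wait)  # same fixed wait as in the question
            driver.save_screenshot(path)
        finally:
            driver.quit()
    finally:
        display.stop()
    return path
```

Calling take_screenshot(url) on the server and comparing the result with a local screenshot taken at the same window size should show whether the layout differences come from the screen geometry or from something else.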

Related

localhost whitescreen on selenium chromedriver

When using Selenium with chromedriver in headless mode, I first need to log in to the website I'm trying to scrape, so that the browser data gets stored via
option.add_argument(r'--user-data-dir=.\UserData').
Unfortunately, the cookies don't work in headless mode if I enter the login information in a non-headless session.
The easiest way around that for me was to view the headless session via option.add_argument('--remote-debugging-port=9222').
This worked like a charm on my previous PC, but on my current one it just shows a white screen of death on localhost.
Python: 3.7.9
I pasted the code from my previous PC, where it worked perfectly fine, so it is not a code-side error.

How to use normal chrome completely without chromedriver selenium python not duplicate

I am using Python 3.8.8 on Windows 7 Ultimate, with PyCharm as my IDE and Chrome at around version 96. My problem is that whenever I use my Python script to scrape a website, it uses chromedriver. I specify the following:
from selenium.webdriver.chrome.options import Options
options = Options()
options.add_argument(r"user-data-dir=my chrome path which is not Executable instead the user data")
# This works, but when Chrome opens it shows "browser is controlled by automated software", and changing the path to the normal chrome.exe won't work
Sure, it uses normal Chrome with my credentials, but it still needs chromedriver to work, and when I delete chromedriver it throws an error. I then went into the Selenium source code, to a file called site.py (or sites.py), and changed the variable self.executable to the chrome.exe path. With that change the "browser is controlled by automated software" message no longer appears, but Chrome does nothing; it is just stuck there. What I want is to use Chrome as the browser for scraping without chromedriver on my PC. Is that possible? If yes, please tell me how to go about it. Feel free to ask for further clarification and details. Thanks in advance.
By default, a browser driven by Selenium is detected as automated software and is flagged by most websites, and Selenium itself offers no way to remove the flag. There are, however, external libraries that can be installed to remove it.
There are options here to try to get around the default flag and hide the fact the browser is automated.
Edit
I understand the question better now and see that you want a more portable Chrome option. ChromeDriver is a specific program that Selenium controls, and it must be used; there is no substitute. You can use the Firefox driver or the Internet Explorer driver instead, but some WebDriver must be used (hence the name: it drives the main browser). When you specify the directory of the Chrome binary, you aren't removing the chromedriver middleman, only specifying where chromedriver needs to look!
Using Selenium you won't be able to initiate/spawn a new Browsing Context i.e. Chrome Browser session without the ChromeDriver.
The Parts and Pieces
At a minimum, WebDriver talks to a browser through a driver (here, ChromeDriver), and the communication is two-way:
WebDriver passes commands to the browser through the driver.
WebDriver receives information back via the same route.
Hence using ChromeDriver is a mandatory requirement.
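The two-way route described above can be observed directly, because ChromeDriver is a small HTTP server: Selenium sends it commands and reads JSON replies back. A minimal sketch, assuming a chromedriver process was started by hand (for example with chromedriver --port=9515) and is listening on localhost. The helper names are hypothetical; /status is the readiness endpoint defined by the W3C WebDriver specification.

```python
import json
from urllib.request import urlopen


def status_url(port=9515, host="localhost"):
    """URL of the WebDriver readiness endpoint on a running driver."""
    return f"http://{host}:{port}/status"


def is_ready(payload):
    """Interpret a WebDriver /status reply that was already decoded from JSON."""
    return bool(payload.get("value", {}).get("ready", False))


def query_driver(port=9515, timeout=5):
    """Send one request to the driver and return its decoded reply.

    Returns None if nothing answers on that port or the reply is not JSON.
    """
    try:
        with urlopen(status_url(port), timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError):
        return None
```

With no chromedriver listening, query_driver() returns None: there is nothing on the other end to relay a command to Chrome, which is why the driver cannot simply be deleted.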

How to install Selenium (python) on a Apache Web Server?

I have an Apache server up and running with Python 3.x already installed on it. Right now I am trying to run a little Python program (let's say filename.py) ON the server. This program uses the Chrome webdriver from Selenium. It also uses sleep from time (but I think that comes by default, so I figure it won't be a problem).
from selenium import webdriver
When I first wrote this program on my computer, I not only had to write the line of code above but also had to manually download the webdriver for Chrome and put it in /usr/local/bin. Here is the link to the file in case you wonder: Webdriver for Chrome
Anyway, I do not know what the equivalent steps are to configure this on my server. Do you have any idea how to do it? Or are there any concepts I could learn about installing packages on an Apache server?
Simple solution:
You don't need to install the driver in /usr/local/bin. You can keep the executable anywhere and point Selenium to it with an executable path; see here for an example.
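As a rough illustration of the executable-path approach, the sketch below checks that a driver file exists and is executable before handing it to Selenium. It assumes Selenium 4, where the path is passed through a Service object (Selenium 3 accepted an executable_path argument on webdriver.Chrome directly). The function names are placeholders invented for this example.

```python
import os


def check_driver_path(path):
    """Return the absolute path of the driver, or raise if it is unusable."""
    full = os.path.abspath(os.path.expanduser(path))
    if not os.path.isfile(full):
        raise FileNotFoundError(f"no driver found at {full}")
    if not os.access(full, os.X_OK):
        raise PermissionError(f"driver at {full} is not executable (try chmod a+x)")
    return full


def make_chrome(driver_path):
    """Start Chrome through a chromedriver stored at an arbitrary location."""
    # Imported lazily so check_driver_path works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    service = Service(executable_path=check_driver_path(driver_path))
    return webdriver.Chrome(service=service)
```

Failing early with a clear message tends to be easier to debug on a server than Selenium's generic "executable needs to be in PATH" error.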
Solution for running on a server
Make sure Python is installed on the server, ideally version 3.4 or later, which comes with pip by default. Then install ChromeDriver on the standalone server, following the instructions here.
Note that Selenium always needs an instance of a browser to control.
Luckily, there are browsers out there that aren't as heavy as the usual browsers you know. You don't have to open IE / Firefox / Chrome / Opera. You can use HtmlUnitDriver, which controls HTMLUnit, a headless Java browser that does not have any UI, or PhantomJsDriver, which drives PhantomJS, another headless browser running on WebKit.
Those headless browsers use much less memory, are usually faster (since they don't have to render anything), and don't require a graphical interface on the computer they run on, so they are easy to use server-side.
Sample code of headless setup
from selenium import webdriver
op = webdriver.ChromeOptions()
op.add_argument('--headless')  # run Chrome without a visible window
driver = webdriver.Chrome(options=op)
It's also worth reading up on running Selenium RC; see here.

Running chromedriver Through Django Selenium Testing

It seems that a lot of people are having trouble getting Selenium to find chromedriver, so this may apply to them as well, if they actually have chromedriver.exe in the correct path.
It seems I have everything I need to have these selenium tests working, and when I manually try running the following 2 lines, everything works fine (it finds chromedriver and opens Chrome).
from selenium import webdriver
webdriver.Chrome()
However, when I put the exact same code into a Django test and try running the test through Django, I get a "ChromeDriver executable needs to be available in the path" error. I've tried re-installing Django and Selenium to no success.
Any help would be appreciated!
The problem was actually caused by enabling Celery tasks.
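The answer does not say how Celery changed things, but since the error is about PATH lookup, one way to narrow down a case like this is to compare what the interactive session and the test process each see. A small diagnostic sketch using only the standard library; the function name is made up, and calling it from a Django test's setUp is just one suggested place.

```python
import os
import shutil


def driver_lookup_report(name="chromedriver"):
    """Describe how `name` resolves on the PATH of the current process."""
    path_entries = [p for p in os.environ.get("PATH", "").split(os.pathsep) if p]
    return {
        "resolved": shutil.which(name),  # None when the name is not found on PATH
        "cwd": os.getcwd(),
        "path_entries": path_entries,
    }
```

Printing driver_lookup_report() once from a plain Python shell and once from inside the failing test makes it visible whether the two processes run with a different PATH or working directory.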

Cannot create browser process when using selenium from python on RHEL5

I'm trying to use selenium from python but I'm having a problem running it on a RHEL5.5 server. I don't seem to be able to really start firefox.
from selenium import webdriver
b = webdriver.Firefox()
On my laptop with Ubuntu this works fine: it starts and brings up a Firefox window. When I log in to the server with ssh, I can run Firefox from the command line and get it displayed on my laptop. It is clearly Firefox from the server, since it shows the RHEL5.5 home page.
When I run the Python script above on the server (or run it in ipython), the script hangs at webdriver.Firefox().
I have also tried
from selenium import webdriver
fb = webdriver.FirefoxProfile()
fb.native_events_enabled=True
b=webdriver.Firefox(fb)
Which also hangs on the final line there.
I'm using python2.7 installed in /opt/python2.7. I installed selenium with /opt/python2.7/pip-2.7.
I can see the firefox process on the server with top and it is using a lot of CPU. I can also see from /proc/#/environ that the DISPLAY is set to localhost:10.0 which seems right.
How can I get a browser started with selenium on RHEL5.5? How can I figure out why Firefox is not starting?
It looks like the problem I'm encountering is this selenium bug:
http://code.google.com/p/selenium/issues/detail?id=2852
I used the fix described in comment #9 http://code.google.com/p/selenium/issues/detail?id=2852#c9
That worked for me.
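The /proc check mentioned in the question can be scripted when debugging a hung browser process. A minimal, Linux-only sketch written for Python 3 (the question itself used Python 2.7): /proc/<pid>/environ holds NUL-separated KEY=VALUE pairs describing the environment the process was started with. The function names are hypothetical, and the pid has to be looked up separately, for example with top.

```python
def parse_environ(raw):
    """Turn the NUL-separated bytes of /proc/<pid>/environ into a dict."""
    env = {}
    for entry in raw.split(b"\0"):
        if b"=" not in entry:
            continue  # skips the empty trailing entry
        key, _, value = entry.partition(b"=")
        env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def display_of(pid):
    """Return the DISPLAY a running process was started with, or None."""
    with open(f"/proc/{pid}/environ", "rb") as handle:
        return parse_environ(handle.read()).get("DISPLAY")
```

Comparing display_of() for the Firefox started by Selenium with the DISPLAY of the ssh session shows whether both point at the same X display.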
