Selenium standalone server log level - python

Long story short: I'm trying to change the log level to WARNING on the Selenium standalone server. I'm running 2.48.2 on CentOS 6.7.
I tried the server side, i.e. added -Dselenium.LOGGER.level=WARNING when starting the server; that didn't work. Then I tried a custom properties file, -Djava.util.logging.config.file=/opt/selenium/my.properties, with the default level set to WARNING; that didn't work either.
Then I tried doing it on the client side (I'm using the WebDriver API for Python). I tried both suggestions from this thread, but neither worked.
Is there a nice, non-hacky way to change the level to WARNING, or at least make the server omit the keystrokes? It's dumping passwords into my log files and I don't like that.

-Dselenium.LOGGER.level=WARNING is correct. You need to add it in front of -jar.
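For example (the jar name matches the 2.48.2 release; adjust the path to your install):
java -Dselenium.LOGGER.level=WARNING -jar selenium-server-standalone-2.48.2.jar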

Related

Workaround for Python & Selenium: authenticate against Active Directory

I am using Python (2.7) and Selenium (3.4.3) to drive Firefox (52.2.0 ESR) via geckodriver (0.19.0) to automate a process on a CentOS 7 machine.
I need totally unattended operation of this automation with user credentials passed through; no storage allowed and no breaking in.
One piece of drama is caused by the fact that the internal website required for the process is within an Active Directory domain, while the machine running my automation is not. I have no need to validate the user; I only need to pass the credentials to the website in such a way that no human interaction is required and the person doesn't need to be a local user on the machine.
I have tried various permutations of:
[protocol]://[user]:[pass]@[url]
driver.switch_to_alert() + send_keys
It seems some of those only work in IE, which I have no access to.
I have checked for libraries to handle this and all to no avail.
I can add libraries to python and I have sudo access to the machine - can't touch authentication, so AD integration is not possible.
How can I give this AD website the credentials of an arbitrary user such that no local storage of their credentials happens and no user interaction is required?
Thank you
EDIT
I'm thinking of something like a proxy that could authenticate the user and then retain that authentication for Selenium to do its thing...
Is there a simple LDAP/AD proxy available?
EDIT 2
Perhaps a very simple way of stating this is that I want to pass user credentials and prevent the authentication popup from happening.
Solution Found:
I needed to use a browser extension.
My solution was built for Chromium, but it should port almost unchanged to Firefox and maybe Edge.
First up, you need 2 APIs to be available for your browser:
webRequest.onAuthRequired - Chrome & Firefox
runtime.nativeMessaging - Chrome & Firefox
While both browser APIs are very similar, they do have some significant differences - such as Chrome's implementation lacking Promises.
If you set up your Native Messaging Host to send a properly formed JSON string, you need only poll it once. This means you can use a single call to runtime.sendNativeMessage() and be assured that your credentials are parseable. Pun intended.
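As an aside, the Host side can be tiny. Below is a minimal Python 2 sketch of a Native Messaging Host that answers that single poll with a JSON credentials object; the values are placeholders, and how you obtain real credentials securely is outside the sketch:

import json
import struct
import sys

def read_message():
    # Native messaging frames each message with a 4-byte, native-endian length prefix.
    raw_length = sys.stdin.read(4)
    if not raw_length:
        sys.exit(0)
    length = struct.unpack('I', raw_length)[0]
    return json.loads(sys.stdin.read(length))

def send_message(payload):
    data = json.dumps(payload)
    sys.stdout.write(struct.pack('I', len(data)))
    sys.stdout.write(data)
    sys.stdout.flush()

if __name__ == '__main__':
    read_message()  # the extension's single sendNativeMessage({ text: "Ready" }) poll
    send_message({'username': 'PLACEHOLDER_USER', 'password': 'PLACEHOLDER_PASS'})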
Next, we need to look at how we're supposed to handle the webRequest.onAuthRequired event.
Since I'm working in Chromium, I need to use the promise-less Chrome API.
chrome.webRequest.onAuthRequired.addListener(
  callbackFunctionHere,
  {urls: [targetUrls]},
  ['asyncBlocking']  // --> this line is important, too. Very.
);
The Change:
I'll be calling my function provideCredentials because I'm a big stealy-stealer and used an example from this source. Look for the asynchronous version.
The example code fetches the credentials from storage.local ...
chrome.storage.local.get(null, gotCredentials);
We don't want that. Nope.
We want to get the credentials from a single call to sendNativeMessage so we'll change that one line.
chrome.runtime.sendNativeMessage(hostName, { text: "Ready" }, gotCredentials);
That's all it takes. Seriously. As long as your Host plays nice, this is the big secret. I won't even tell you how long it took me to find it!
Links:
My questions with helpful links:
Here - Workaround for Authenticating against Active Directory
Here - Also has some working code for a functional NM Host
Here - Some enlightening material on promises
So this turns out to be a non-trivial problem.
I haven't implemented the solution, yet, but I know how to get there...
Passing values to an extension is the first step; this can be done in both Chrome and Firefox. Check the browser version to make sure the required API, nativeMessaging, actually exists in it. I have had to switch to Chromium for this reason.
Alternatively, one can use the storage API to put values in browser storage first. [edit: I did not go this way for security concerns]
Next is to use the onAuthRequired event from the webRequest API. Set up a listener on the event and pass in the values you need.
Caveats: I have built everything right up to the extension itself for the nativeMessaging API solution and there's still a problem with getting the script to recognise the data. This is almost certainly my JavaScript skills clashing with the arcane knowledge required to make these APIs make much sense ...
I have yet to attempt the storage method as it's less secure (in my mind) but it does seem to be simpler.

How can a CGI server based on CGIHTTPRequestHandler require that a script start its response with headers that include a `content-type`?

Later note: the issues in the original posting below have been largely resolved.
Here's the background: For an introductory comp sci course, students develop HTML and server-side Python 2.7 scripts using a server provided by the instructors. That server is based on CGIHTTPRequestHandler, like the one at pointlessprogramming. When the students' HTML and scripts seem correct, they port those files to a remote, slow Apache server. Why support two servers? Well, initial development on a local server has the benefit of reducing network issues and dependency on the remote, weak machine that is running Apache. Eventually porting to the Apache-running machine has the benefit of publishing their results for others to see.
For the local development to be most useful, the local server should closely resemble the Apache server. Currently there is an important difference: Apache requires that a script start its response with headers that include a content-type; if the script fails to provide such a header, Apache sends the client a 500 error ("Internal Server Error"), which is too generic to help the students, who cannot use the server logs. CGIHTTPRequestHandler imposes no similar requirement. So it is common for a student to write header-free scripts that work with the local server, but get the baffling 500 error after copying files to the Apache server. It would be helpful to have a version of the local server that checks for a content-type header and gives a good error if there is none.
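For reference, here is the kind of minimal Python 2 CGI script the students write, with the header Apache insists on (the body content is just an example):

#!/usr/bin/env python
# The Content-Type header, followed by a blank line, must be printed before any body output.
print "Content-Type: text/html"
print
print "<html><body><p>Hello from a CGI script</p></body></html>"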
I seek advice about creating such a server. I am new to Python and to writing servers. Here are the issues that occur to me, but any helpful advice would be appreciated.
1. Is a content-type header required by the CGI standard? If so, other people might benefit from an answer to the main question here. Also, if so, I despair of finding a way to disable Apache's requirement. Maybe the relevant part of the CGI RFC is section 6.3.1 (CGI Response, Content-Type): "If an entity body is returned, the script MUST supply a Content-Type field in the response."
2. To make a local server that checks for the content-type header, perhaps I should subclass CGIHTTPServer.CGIHTTPRequestHandler, overriding run_cgi() with a version that issues an error for a missing header. I am looking at CGIHTTPServer.py __version__ = "0.4", which was installed with Python 2.7.3. But run_cgi() does a lot of processing, so it is a little unappealing to copy all its code just to add a couple of calls to a header-checking routine. Is there a better way?
3. If the answer to (2) is something like "No, overriding run_cgi() is recommended," I anticipate writing a version that invokes the desired script, then checks the script's output for headers before that output is sent to the client. There are apparently two places in the existing run_cgi() where the script is invoked:
3a. When run_cgi() is executed on a non-Unix system, the script is executed using Python's subprocess module. As a result, the standard output from the script will be available as an in-memory string, which I can presumably check for headers before the call to self.wfile.write. Does this sound right?
3b. But when run_cgi() is executed on a *nix system, the script is executed by a forked process. I think the child's stdout will write directly to self.wfile (I'm a little hazy on this), so I see no opportunity for the code in run_cgi() to check the output. Ugh. Any suggestions?
4. If analyzing the script's output is recommended, is email.parser the standard way to recognize whether there is a content-type header? Is another standard module recommended instead? (A sketch of such a check appears below.)
5. Is there a more appropriate forum for asking the main question ("How can a CGI server based on CGIHTTPRequestHandler require...")? It seems odd to ask whether there is a better forum for programming questions than Stack Overflow, but I guess anything is possible.
Thanks for any help.
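Added later: here is a minimal sketch of the header check from item 4, assuming the script's raw output is available as a string (as in the subprocess case of 3a); the function and its name are mine, not part of CGIHTTPServer:

from email.parser import Parser

def has_content_type(cgi_output):
    # A CGI response separates its header block from the body with a blank line.
    header_block = cgi_output.split('\r\n\r\n', 1)[0].split('\n\n', 1)[0]
    headers = Parser().parsestr(header_block, headersonly=True)
    return headers['Content-Type'] is not None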

Inexplicable urllib2 problem between virtualenvs

I have some test code (as a part of a webapp) that uses urllib2 to perform an operation I would usually perform via a browser:
Log in to a remote website
Move to another page
Perform a POST by filling in a form
I've created 4 separate, clean virtualenvs (with --no-site-packages) on 3 different machines, all with different versions of Python but the exact same packages (via a pip requirements file), and the code only works on the two virtualenvs on my local development machine (2.6.1 and 2.7.2); it won't work on either of my production VPSs.
In the failing cases, I can log in successfully, move to the correct page but when I submit the form, the remote server replies telling me that there has been an error - it's an application server error page ('we couldn't complete your request') and not a webserver error.
because I can successfully log in and maneuver to a second page, this doesn't seem to be a session or a cookie problem - it's particular to the final POST
because I can perform the operation on a particular machine with the EXACT same headers and data, this doesn't seem to be a problem with what I am requesting/posting
because I am trying the code on two separate VPS rented from different companies, this doesn't seem to be a problem with the VPS physical environment
because the code works on 2 different Python versions, I can't imagine it being an incompatibility problem
I'm completely lost at this stage as to why this wouldn't work. I've even 'turned-it-off-and-turned-it-on-again' because I just can't see what the problem could be.
I think it has to be something to do with the final POST coming from a VPS that the remote server doesn't like, but I can't figure out what that could be. I feel like there is something going on under the hood of urllib2 that is causing the remote server to dislike the request.
EDIT
I've installed the exact same Python version (2.6.1) on the VPS as is on my working local copy and it doesn't work remotely, so it must be something to do with originating from a VPS. How could this affect the HTTP request? Is it something lower-level?
You might try setting the debuglevel=1 for urllib2 and see what it comes up with:
import urllib2
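# debuglevel=1 makes the underlying httplib connection print the request and response headers to stdout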
h = urllib2.HTTPHandler(debuglevel=1)
opener = urllib2.build_opener(h)
...
This is a total shot in the dark, but are your VPSs 64-bit and your home computer 32-bit, or vice versa? Maybe a difference in default sizes or accuracies of something could be freaking out the server.
Barring that, can you try to find out any information on the software stack the web server is using?
I had similar issues with urllib2 (working with Zimbra's REST API); in the end I switched to pycurl with success.
PS
For login/navigate/post operations, I usually find mechanize useful and easier to use. Maybe you can give it a shot.
Well, it looks like I know why the problem was happening, but I'm not 100% sure of the reason for it.
I simply had to make the server wait (time.sleep()) after it sent the 2nd request (Move to another page) before doing the 3rd request (Perform a POST by filling in a form).
I don't know if it's because of some condition on the 3rd-party server, or if it's some sort of odd issue with urllib2. The reason it seemed to work on my development machine is presumably that it was slower than the server at running the code.
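In code form, the workaround amounts to something like this (the URLs, form data and sleep duration are placeholders; cookie handling is assumed to already be in place, since the login works):

import time
import urllib2

# Cookie-aware opener so the login session carries across requests.
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor())

opener.open('http://example.com/login', 'user=me&pass=secret')  # 1. log in
opener.open('http://example.com/next-page')                     # 2. move to another page

time.sleep(2)  # give the remote application server a moment before the final POST

opener.open('http://example.com/form', 'field=value')           # 3. perform the POST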

How to set Selenium browsers to treat Selenium's hub as a proxy server in python on Selenium Grid?

I'm running Selenium 2.0b4dev on Selenium Grid in Ubuntu 10.04, using Python code to write test cases. I've been having trouble getting basic HTTP authentication to a specific site working, and a quick Google search found that my problem could be solved by adding the line self.selenium.add_custom_request_header("Authorization", "Basic %s" % _encoded) (with a proper line break in the middle to conform to PEP 8, of course).
Unfortunately, my search also found that in order for that line of code to work, I need to configure my browser (whichever one I'm using to run the test cases on the grid) to treat Selenium's (automatically running, apparently?) proxy server as its proxy. And apparently I need to modify the profile of the Firefox (or IE) launcher to use that proxy automatically, since the whole point of these Selenium Grid test cases is that they aren't supposed to require user intervention, and I have little to no idea how to do that. I've just been using "ant launch-hub" and "ant launch-remote-control" and then running Python programs on the hub that import selenium and unittest.
If anyone could help, that would be just fantastic.
I wrote up an article on how to do this in Ruby. It links to a complementary article on testing self-signed certificates and gives you the set of flags you need to launch Selenium with.
http://mogotest.com/blog/2010/06/23/how-to-perform-basic-auth-in-selenium
To pass args through from grid to the underlying RC server, you need to use something like:
ant -DseleniumArgs="-trustAllSSLCertificates" launch-remote-control
Re: browsers... Firefox will auto-enable the proxy mode stuff if you pass trustAllSSLCertificates now. Otherwise you need to use *firefoxproxy. IE requires the use of *iexploreproxy or a custom HTA launcher that configures the proxy (the article links to one we open-sourced, but it would need to be updated to work with 2.0 beta 4).
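Putting that together in Python, a test case would look roughly like the sketch below; it assumes the remote control was launched with the flags described above, and the host, port, site URL and credentials are all placeholders:

import base64
import unittest
from selenium import selenium  # the legacy Selenium RC client the question uses

class BasicAuthTest(unittest.TestCase):
    def setUp(self):
        # Hub host/port, browser string and site URL are placeholders.
        self.selenium = selenium("localhost", 4444, "*firefoxproxy",
                                 "http://protected.example.com/")
        self.selenium.start()

    def test_can_open_protected_page(self):
        encoded = base64.b64encode("myuser:mypassword")  # placeholder credentials
        self.selenium.add_custom_request_header(
            "Authorization", "Basic %s" % encoded)
        self.selenium.open("/")

    def tearDown(self):
        self.selenium.stop()

if __name__ == "__main__":
    unittest.main()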

How to Disable Django / mod_WSGI Page Caching

I have Django running in Apache via mod_wsgi. I believe Django is caching my pages server-side, which is causing some of the functionality to not work correctly.
I have a countdown timer that works by getting the current server time, determining the remaining countdown time, and outputting that number to the HTML template. A javascript countdown timer then takes over and runs the countdown for the user.
The problem arises when the user refreshes the page, or navigates to a different page with the countdown timer. The timer appears to jump around to different times sporadically, usually going back to the same time over and over again on each refresh.
Using HTTPFox, the page is not being loaded from my browser cache, so it looks like either Django or Apache is caching the page. Is there any way to disable this functionality? I'm not going to have enough traffic to worry about caching the script output. Or am I completely wrong about why this is happening?
[Edit] From the posts below, it looks like caching is disabled in Django, which means it must be happening elsewhere, perhaps in Apache?
[Edit] I have a more thorough description of what is happening: For the first 7 (or so) requests made to the server, the pages are rendered by the script and returned, although each of those 7 pages seems to be cached as it shows up later. On the 8th request, the server serves up the first page. On the 9th request, it serves up the second page, and so on in a cycle. This lasts until I restart apache, when the process starts over again.
[Edit] I have configured mod_wsgi to run only one process at a time, which causes the timer to reset to the same value in every case. Interestingly though, there's another component on my page that displays a random image on each request, using order_by('?'), and that does refresh with different images each time, which would indicate the caching is happening in Django and not in Apache.
[Edit] In light of the previous edit, I went back and reviewed the relevant views.py file, finding that the countdown start variable was being set globally in the module, outside of the view functions. Moving that setting inside the view functions resolved the problem. So it turned out not to be a caching issue after all. Thanks everyone for your help on this.
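To illustrate the pattern (this is a hypothetical reconstruction, not the actual views.py):

import datetime

from django.http import HttpResponse

TARGET = datetime.datetime(2012, 1, 1)

# Broken: evaluated once per process, at import time, so every request served by a
# given mod_wsgi process sees the same remaining time until that process restarts.
REMAINING_AT_IMPORT = TARGET - datetime.datetime.now()

def countdown_broken(request):
    return HttpResponse(str(REMAINING_AT_IMPORT))

# Fixed: compute the value inside the view, so it is fresh on every request.
def countdown_fixed(request):
    remaining = TARGET - datetime.datetime.now()
    return HttpResponse(str(remaining))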
From my experience with mod_wsgi in Apache, it is highly unlikely that they are causing caching. A couple of things to try:
It is possible that you have some proxy server between your computer and the web server that is appropriately or inappropriately caching pages. Sometimes ISPs run proxy servers to reduce bandwidth outside their network. Can you please provide the HTTP headers for a page that is getting cached (Firebug can give these to you). Headers that I would specifically be interested in include Cache-Control, Expires, Last-Modified, and ETag.
Can you post your MIDDLEWARE_CLASSES from your settings.py file? It's possible that you have a middleware that performs caching for you.
Can you grep your code for the following items: "load cache", "django.core.cache", and "cache_page"? A grep -R "search" * will work.
Does the settings.py (or anything it imports, like "from localsettings import *") include CACHE_BACKEND? (See the illustrative fragment after this list.)
What happens when you restart Apache (e.g. sudo service apache2 restart)? If a restart clears the issue, then it might be Apache doing the caching (it is possible that this could also clear out a locmem Django cache backend).
Did you specifically set up Django caching? From the docs it seems you would clearly know if Django was caching, as it requires work beforehand to get it working. Specifically, you need to define where the cached files are saved.
http://docs.djangoproject.com/en/dev/topics/cache/
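For reference, if Django's cache framework were enabled, a settings.py of that era (Django 1.2/1.3) would contain something like the following; the values are illustrative, and if nothing like this appears, Django caching is probably not the culprit:

CACHE_BACKEND = 'locmem://'  # or e.g. 'memcached://127.0.0.1:11211/'

MIDDLEWARE_CLASSES = (
    'django.middleware.cache.UpdateCacheMiddleware',    # site-wide caching, if enabled
    'django.middleware.common.CommonMiddleware',
    'django.middleware.cache.FetchFromCacheMiddleware',
)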
Are you using a multiprocess configuration for Apache/mod_wsgi? If you are, that would account for why different responses can have different values for the timer, as the point at which the timer is initialised will likely be different for each process handling requests. That is why it can jump around.
Have a read of:
http://code.google.com/p/modwsgi/wiki/ProcessesAndThreading
Work out what mode or configuration you are running Apache/mod_wsgi in and perhaps post what that configuration is. Without knowing, there are too many unknowns.
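For example, a daemon-mode configuration pinned to a single process looks something like this (the process group name and paths are placeholders):
WSGIDaemonProcess myapp processes=1 threads=15
WSGIProcessGroup myapp
WSGIScriptAlias / /var/www/myapp/django.wsgi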
I just came across this:
Support for Automatic Reloading: To help deployment tools, you can activate support for automatic reloading. Whenever something changes the .wsgi file, mod_wsgi will reload all the daemon processes for us. For that, just add the following directive to your Directory section:
WSGIScriptReloading On
