Environment variables - python

I use the module mechanize in order to log in to a site. When I import twill.commands, without any other apparent use of it, some debug messages [0] are displayed [1]. When I remove the import, these messages disappear.
How can I see what is changed in the environment in order to emulate it and remove this dependency?
[0] Using the logging module.
[1] More specifically, I am interested in a Following HTTP-EQUIV=REFRESH message.
UPDATE: It turned out that there was a bug in twill.commands which caused an error when trying to follow the HTTP-EQUIV=REFRESH header. After removing the import of twill.commands and the ugly workaround for it, everything works smoothly.

My guess - without digging into the libraries - is that twill is instantiating a logger, and mechanize is doing the Right Thing for a library: logging if logging has been turned on, and staying silent otherwise.
To enable mechanize's logging, configure the root logger with logging.basicConfig in your application code.
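A minimal sketch of that suggestion (the URL is a placeholder, and it assumes mechanize emits its debug messages through the standard logging module):
import logging
import mechanize

# Give the root logger a DEBUG-level handler so library debug messages
# (e.g. "Following HTTP-EQUIV=REFRESH") become visible without importing
# twill.commands.
logging.basicConfig(level=logging.DEBUG)

br = mechanize.Browser()
br.set_handle_refresh(True)   # follow HTTP-EQUIV=REFRESH headers
br.set_debug_redirects(True)  # trace redirect handling
br.open('http://example.com') # placeholder URL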

twill uses mechanize internally, so you can log in to a web site directly with twill.
To follow http-equiv redirection, just use the go command.
go <url> -- visit the given URL. The Python function returns the final URL visited, after all redirects.
To debug http-equiv redirects, enable the relevant debug level.
debug <what> <level> -- turn on or off debugging/tracing for various functions. The first argument is either 'http' to show HTTP headers, 'equiv-refresh' to test HTTP EQUIV-REFRESH headers, or 'commands' to show twill commands. The second argument is '0' for off, '1' for on.
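A small sketch of those commands used from Python (the URL is a placeholder):
from twill.commands import go, debug

debug('equiv-refresh', '1')           # trace HTTP-EQUIV=REFRESH handling
final_url = go('http://example.com')  # returns the final URL after redirects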

Related

make any internet-accessing python code work (proxy + custom .crt)

The situation
If the following is not done, all outgoing HTTP or HTTPS requests made with Python end in a WinError 10054 Connection Reset, or an SSL bad handshake error.
set the HTTP_PROXY and HTTPS_PROXY environment variables, or their counterparts
whatever needs SSL verification must be verified against a custom .crt file
For example, assuming the .crt file is in place, both of these get me a 200 OK:
import os
import requests
os.environ['HTTP_PROXY'] = # some_appropriate_address
os.environ['HTTPS_PROXY'] = # some_appropriate_address
requests.get('http://www.google.com', verify=r"C:\the_file.crt") # 200 OK
requests.get('http://httpbin.org', verify=False) # 200 OK, but unsafe
requests.get('http://httpbin.org') # SSL bad handshake error
The Problem
There is this massive jumble of pre-written code (heavily utilizing urllib3 and requests and possibly other pieces of internet-accessing code) I have, and I have to make it work under the conditions outlined above.
Sure, I can write verify=r'C:\the_file.crt' for every requests.get(), but that can very quickly get hairy, right? And the code may also be using some other library (that is not requests). So I am looking for a global setting (an environment variable, etc.) that I should alter, so that everything works well (a GET request to a server returns a 200 OK, whether or not the code is written with requests).
Also, if there is no such way, I would like an explanation as to why.
What I tried (am trying)
Maybe editing the .condarc file (via conda config) is a solution. I tried, to no avail: Python gives me an "SSL verification failed" error. On the contrary, note that the code snippet above gave me a 200 OK. To my knowledge, this does not fit nicely with many of the situations previously discussed on Stack Overflow.
By the way, setting ssl_verify to false does not solve the problem either; I still get a bad handshake error for some reason.
I am using Win 10, Python 3.7.4 (Anaconda).
Update
I have edited the question to prevent future misunderstandings about the content of this question. A few answers below are a reiteration of what was written here from the start.
The current answers are not entirely satisfactory either, as they only seem to address the case where I am using requests or urllib3.
You should be able to get any Python code that uses the requests module (which uses urllib3 internally) to work behind a proxy, without modifying the Python code itself, by setting the following environment variables in Windows (a programmatic equivalent is sketched after the steps below):
http_proxy http://[<user>:<pwd>@]<http_host>:<http_port>
https_proxy http://[<user>:<pwd>@]<https_host>:<https_port>
requests_ca_bundle <path_to_ca_bundle.crt>
curl_ca_bundle <path_to_ca_bundle.crt>
You can set environment variables by doing the following:
Press Windows-Key + R, enter sysdm.cpl ,3 (mind the space before the comma) and press Enter
Click the Environment variables button
In either of the fields (User variables or System variables), add the four variables
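If you prefer to set them from within the process rather than system-wide, a rough equivalent (proxy addresses and the bundle path are placeholders) looks like this:
import os
import requests

os.environ['HTTP_PROXY'] = 'http://proxy.example.com:8080'   # placeholder
os.environ['HTTPS_PROXY'] = 'http://proxy.example.com:8080'  # placeholder
os.environ['REQUESTS_CA_BUNDLE'] = r'C:\the_file.crt'        # placeholder path

requests.get('https://www.google.com')  # no per-call verify= needed now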
According to the Requests documentation on proxies:
https://requests.readthedocs.io/en/master/user/advanced/#proxies
you can use proxy in this way:
proxies = { 'http': 'http://10.10.1.10:3128', 'https': 'http://10.10.1.10:1080',}
requests.get('http://example.org', proxies=proxies)
Then, depending on whether you want to use a .crt or a .pem:
requests.get('https://kennethreitz.com', cert=('/path/server.crt', '/path/key'))
requests.get('https://kennethreitz.org', cert='/path/client.pem')
https://2.python-requests.org//en/v1.0.4/user/advanced/
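A session-level variant of the same idea (addresses and the bundle path are placeholders): attributes set on a requests.Session apply to every request made through it, so individual calls need no extra arguments.
import requests

session = requests.Session()
session.proxies = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:1080',
}
session.verify = r'C:\the_file.crt'  # path to the custom CA bundle

session.get('http://example.org')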
You are trying to make HTTPS requests to an external URL, and you need to provide the proper certificate files for verification. You are trying to make these configurations inside each component. I would suggest instead that you make those configurations globally and system-wide, so that none of the components needs to provide certificates or deal with SSL verification itself.
I am awful at Windows-related networking configurations, but I would suggest you check out Proxifier; I am pretty sure you can configure an SSL proxy with the proper certificates there.

Pyramid override default requests log to add new parameters

So I use Pyramid and I need to log all outgoing requests. I added this to configuration.ini:
[logger_requests]
level = DEBUG
handlers = console
qualname = urllib3
And this works fine.
1 2019-12-19T14:44:14.888+02:00 kazibo-msi APPNAME - DEBUG [urllib3.connectionpool][139843373852416 route="/status" x_request_id="9f7286e1-c6be-4136-83ba-2666fe1f854f"] https://website.com:443 "GET /rest/billing/debt/health HTTP/1.1" 200 1502
But I also need to log the time elapsed making the request. Using the requests package I can do it like this:
requests.get(url='https://somewebsite.com/data').elapsed
But how can I add this information to the log now? I know about the option to call logger.log(...) manually, but I would like to avoid it.
For code that I control I'd usually wrap things in my own utility that I can instrument instead of trying to patch/modify how urllib3 works or performs its own logging. This could be just a few functions you use across the codebase or a custom requests.Session subclass, etc.
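For example, a rough sketch of such a Session subclass (the logger name and class name are made up for illustration):
import logging
import requests

log = logging.getLogger('outgoing.requests')

class TimedSession(requests.Session):
    # Log method, URL, status code and elapsed time for every request made
    # through this session.
    def request(self, method, url, *args, **kwargs):
        response = super().request(method, url, *args, **kwargs)
        log.debug('%s %s -> %s in %.3fs', method, url,
                  response.status_code, response.elapsed.total_seconds())
        return response

session = TimedSession()
session.get('https://somewebsite.com/data')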

Selenium standalone server log level

Long story short: I'm trying to change log level to WARNING on selenium standalone server. I'm running 2.48.2 on CentOS 6.7.
I tried the server side, i.e. added -Dselenium.LOGGER.level=WARNING when starting the server - that didn't work. Then I tried a custom properties file, -Djava.util.logging.config.file=/opt/selenium/my.properties, with the default level set to WARNING - that didn't work either.
Then I tried doing it on the client side; I'm using the WebDriver API for Python. I tried both suggestions from this thread, and they didn't work either.
Is there a nice, non-hacky way to change the level to WARNING? Or at least make it omit the keystrokes? It's dumping passwords into my log files and I don't like that.
-Dselenium.LOGGER.level=WARNING is correct. You need to add it in front of -jar.
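For example (assuming the standard name of the 2.48.2 standalone jar):
java -Dselenium.LOGGER.level=WARNING -jar selenium-server-standalone-2.48.2.jar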

Suds ignoring proxy setting

I'm trying to use the salesforce-python-toolkit to make web services calls to the Salesforce API; however, I'm having trouble getting the client to go through a proxy. Since the toolkit is built on top of suds, I tried dropping down to suds itself to see if I could get it to respect the proxy setting there, but that didn't work either.
This is tested on suds 0.3.9 on both OS X 10.7 (Python 2.7) and Ubuntu 12.04.
An example request I've made that did not end up going through the proxy (with just Burp or Charles Proxy running locally):
import suds.client
ws = suds.client.Client('file://sandbox.xml', proxy={'http': 'http://localhost:8888'})
ws.service.login('user', 'pass')
I've tried various things with the proxy - dropping http://, using an IP, using a FQDN. I've stepped through the code in pdb and see it setting the proxy option. I've also tried instantiating the client without the proxy and then setting it with:
ws.set_options(proxy={'http':'http://localhost:8888'})
Is proxy not used by suds any longer? I don't see it listed directly here http://jortel.fedorapeople.org/suds/doc/suds.options.Options-class.html, but I do see it under transport. Do I need to set it differently through a transport? When I stepped through in pdb it did look like it was using a transport, but I'm not sure how.
Thank you!
I went into #suds on freenode and Xelnor/rbarrois provided a great answer! Apparently the custom {'protocol': 'proxy'} mapping in suds overrides urllib2's default behaviour of reading the system proxy configuration from environment variables. This solution relies on having the http_proxy/https_proxy/no_proxy environment variables set accordingly.
I hope this helps anyone else running into issues with proxies and suds (or other libraries that use suds). https://gist.github.com/3721801
from suds.transport.http import HttpTransport as SudsHttpTransport

class WellBehavedHttpTransport(SudsHttpTransport):
    """HttpTransport which properly obeys the ``*_proxy`` environment variables."""

    def u2handlers(self):
        """Return a list of specific handlers to add.

        The urllib2 logic regarding ``build_opener(*handlers)`` is:
        - It has a list of default handlers to use
        - If a subclass or an instance of one of those default handlers is given
          in ``*handlers``, it overrides the default one.

        Suds uses a custom {'protocol': 'proxy'} mapping in self.proxy, and adds
        a ProxyHandler(self.proxy) to that list of handlers.
        This overrides the default behaviour of urllib2, which would otherwise
        use the system configuration (environment variables on Linux, System
        Configuration on Mac OS, ...) to determine which proxies to use for
        the current protocol, and when not to use a proxy (no_proxy).

        Thus, passing an empty list will use the default ProxyHandler which
        behaves correctly.
        """
        return []

client = suds.client.Client(my_wsdl, transport=WellBehavedHttpTransport())
I think you can do it by using a urllib2 opener, like below:
import urllib2
import suds.client
import suds.transport.http

t = suds.transport.http.HttpTransport()
proxy = urllib2.ProxyHandler({'http': 'http://localhost:8888'})
opener = urllib2.build_opener(proxy)
t.urlopener = opener
ws = suds.client.Client('file://sandbox.xml', transport=t)
I was actually able to get it working by doing two things:
making sure there were keys in the proxy dict for http as well as https.
setting the proxy using set_options AFTER creation of the client.
So, my relevant code looks like this:
self.suds_client = suds.client.Client(wsdl)
self.suds_client.set_options(proxy={'http': 'http://localhost:8888', 'https': 'http://localhost:8888'})
I had multiple issues using Suds, even though my proxy was configured properly I could not connect to the endpoint wsdl. After spending significant time attempting to formulate a workaround, I decided to give soap2py a shot - https://code.google.com/p/pysimplesoap/wiki/SoapClient
Worked straight off the bat.
For anyone who's attempting cji's solution over HTTPS, you actually need to keep one of the handlers for basic authentication. I am also using Python 3.7, so urllib2 has been replaced with urllib.request.
from suds.transport.https import HttpAuthenticated as SudsHttpsTransport
from urllib.request import HTTPBasicAuthHandler

class WellBehavedHttpsTransport(SudsHttpsTransport):
    """HttpsTransport which properly obeys the ``*_proxy`` environment variables."""

    def u2handlers(self):
        """Return a list of specific handlers to add.

        The urllib2 logic regarding ``build_opener(*handlers)`` is:
        - It has a list of default handlers to use
        - If a subclass or an instance of one of those default handlers is given
          in ``*handlers``, it overrides the default one.

        Suds uses a custom {'protocol': 'proxy'} mapping in self.proxy, and adds
        a ProxyHandler(self.proxy) to that list of handlers.
        This overrides the default behaviour of urllib2, which would otherwise
        use the system configuration (environment variables on Linux, System
        Configuration on Mac OS, ...) to determine which proxies to use for
        the current protocol, and when not to use a proxy (no_proxy).

        Thus, passing an empty list (aside from the BasicAuthHandler)
        will use the default ProxyHandler which behaves correctly.
        """
        return [HTTPBasicAuthHandler(self.pm)]
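Hypothetical usage, mirroring the HTTP example above:
client = suds.client.Client(my_wsdl, transport=WellBehavedHttpsTransport())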

How to Disable Django / mod_WSGI Page Caching

I have Django running in Apache via mod_wsgi. I believe Django is caching my pages server-side, which is causing some of the functionality to not work correctly.
I have a countdown timer that works by getting the current server time, determining the remaining countdown time, and outputting that number to the HTML template. A javascript countdown timer then takes over and runs the countdown for the user.
The problem arises when the user refreshes the page, or navigates to a different page with the countdown timer. The timer appears to jump around to different times sporadically, usually going back to the same time over and over again on each refresh.
Using HTTPFox, the page is not being loaded from my browser cache, so it looks like either Django or Apache is caching the page. Is there any way to disable this functionality? I'm not going to have enough traffic to worry about caching the script output. Or am I completely wrong about why this is happening?
[Edit] From the posts below, it looks like caching is disabled in Django, which means it must be happening elsewhere, perhaps in Apache?
[Edit] I have a more thorough description of what is happening: For the first 7 (or so) requests made to the server, the pages are rendered by the script and returned, although each of those 7 pages seems to be cached as it shows up later. On the 8th request, the server serves up the first page. On the 9th request, it serves up the second page, and so on in a cycle. This lasts until I restart apache, when the process starts over again.
[Edit] I have configured mod_wsgi to run only one process at a time, which causes the timer to reset to the same value in every case. Interestingly though, there's another component on my page that displays a random image on each request, using order_by('?'), and that does refresh with different images each time, which would indicate that the caching is happening in Django and not in Apache.
[Edit] In light of the previous edit, I went back and reviewed the relevant views.py file, finding that the countdown start variable was being set globally in the module, outside of the view functions. Moving that setting inside the view functions resolved the problem. So it turned out not to be a caching issue after all. Thanks everyone for your help on this.
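To illustrate the kind of bug described in that last edit (the names and helper are made up; this is not the original code):
from datetime import datetime
from django.shortcuts import render

# BAD: evaluated once per process, at import time, so every request served by
# that process reuses the same start time.
# COUNTDOWN_START = datetime.now()

def countdown(request):
    # GOOD: evaluated on every request.
    countdown_start = datetime.now()
    remaining = compute_remaining(countdown_start)  # hypothetical helper
    return render(request, 'countdown.html', {'remaining': remaining})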
From my experience with mod_wsgi in Apache, it is highly unlikely that they are causing caching. A couple of things to try:
It is possible that you have some proxy server between your computer and the web server that is appropriately or inappropriately caching pages. Sometimes ISPs run proxy servers to reduce bandwidth outside their network. Can you please provide the HTTP headers for a page that is getting cached (Firebug can give these to you). Headers that I would specifically be interested in include Cache-Control, Expires, Last-Modified, and ETag.
Can you post your MIDDLEWARE_CLASSES from your settings.py file. It possible that you have a Middleware that performs caching for you.
Can you grep your code for the following items: "load cache", "django.core.cache", and "cache_page"? A grep -R "search" * will work. (Examples of what these would turn up are sketched after this list.)
Does the settings.py (or anything it imports, like "from localsettings import *") include CACHE_BACKEND?
What happens when you restart Apache (e.g. sudo service apache2 restart)? If a restart clears the issue, then it might be Apache doing the caching (though it is possible that this could also clear out a locmem Django cache backend).
Did you specifically setup Django caching? From the docs it seems you would clearly know if Django was caching as it requires work beforehand to get it working. Specifically, you need to define where the cached files are saved.
http://docs.djangoproject.com/en/dev/topics/cache/
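For reference, these are the sorts of things the checks above would find if Django caching were actually enabled (illustrative only):
# settings.py -- old-style cache backend setting
CACHE_BACKEND = 'locmem://'

# views.py -- per-view caching
from django.views.decorators.cache import cache_page

@cache_page(60 * 15)  # cache this view's response for 15 minutes
def my_view(request):
    pass  # view body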
Are you using a multiprocess configuration for Apache/mod_wsgi? If you are, that will account for why different responses can have a different value for the timer, since the time at which the timer is initialised will likely be different for each process handling requests. That is why it can jump around.
Have a read of:
http://code.google.com/p/modwsgi/wiki/ProcessesAndThreading
Work out in what mode or configuration you are running Apache/mod_wsgi and perhaps post what that configuration is. Without knowing, there are too many unknowns.
I just came across this:
Support for Automatic Reloading: To help deployment tools you can activate support for automatic reloading. Whenever the .wsgi file changes, mod_wsgi will reload all the daemon processes for us.
For that, just add the following directive to your Directory section:
WSGIScriptReloading On
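In Apache configuration terms (the path is a placeholder), that would look roughly like:
<Directory /path/to/your/app>
    WSGIScriptReloading On
</Directory>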
