From Python, I would like to retrieve content from a web site via HTTPS with basic authentication. I need the content on disk. I am on an intranet, trusting the HTTPS server. Platform is Python 2.6.2 on Windows.
I have been playing around with urllib2, however did not succeed so far.
I have a solution running, calling wget via os.system():
wget_cmd = r'\path\to\wget.exe -q -e "https_proxy = http://fqdn.to.proxy:port" --no-check-certificate --http-user="username" --http-password="password" -O path\to\output https://fqdn.to.site/content'
I would like to get rid of the os.system(). Is that possible in Python?
Proxy and https wasn't working for a long time with urllib2. It will be fixed in the next released version of python 2.6 (v2.6.3).
In the meantime you can reimplement the correct support, that's what we did for mercurial: http://hg.intevation.org/mercurial/crew/rev/59acb9c7d90f
Try this (notice that you'll have to fill in the realm of your server also):
import urllib2
authinfo = urllib2.HTTPBasicAuthHandler()
authinfo.add_password(realm='Fill In Realm Here',
uri='https://fqdn.to.site/content',
user='username',
passwd='password')
proxy_support = urllib2.ProxyHandler({"https" : "http://fqdn.to.proxy:port"})
opener = urllib2.build_opener(proxy_support, authinfo)
fp = opener.open("https://fqdn.to.site/content")
open(r"path\to\output", "wb").write(fp.read())
You could try this too:
http://code.google.com/p/python-httpclient/
(It also supports the verification of the server certificate.)
Related
It's any way to check my public IP addres using Python? I have had an account on Cloudflare and VPS in a home (but dynamic IP). I need to update VPN IP'S before had an account on OVH and DDNS works after migration dose not work.
requests:
import requests
response = requests.get('http://ifconfig.me')
print(response.text)
python 2:
import urllib2
response = urllib2.urlopen('http://ifconfig.me')
print response.read()
python 3:
from urllib import request
response = request.urlopen('http://ifconfig.me')
print(response.read())
There's a bunch of websites online that offer simple APIs to check the IP address you're connecting from.
As an example:
➜ ~ curl 'https://api.ipify.org?format=json'
{"ip":"1.2.3.4"}
You can use a python library (like Requests) to call the API in python code.
# download requests with `pip install requests`
import requests
res = requests.get("https://api.ipify.org?format=json")
your_ip = res.json() # {"ip" : "1.2.3.4"}
Recently I've created a simple docker image that allows you to keep you local ip in sync with Cloudflare AKA Cloudflare Dynamic DNS (DDNS)
check it out here:
https://github.com/marcelowa/cloudflare-dynamic-dns
I have a code as below :
headers = {'content-type': 'ContentType.APPLICATION_XML'}
uri = "www.client.url.com/hit-here/"
clientCert = "path/to/cert/abc.crt"
clientKey = "path/to/key/abc.key"
PROTOCOL = ssl.PROTOCOL_TLSv1
context = ssl.SSLContext(PROTOCOL)
context.load_default_certs()
context.load_cert_chain(clientCert, clientKey)
conn = httplib.HTTPSConnection(uri, some_port, context=context)
I am not really a network programmer, so i did some googling for handshake connection and found ssl.SSLContext(PROTOCOL) as the needed function, code works fine.
Then i hit the roadblock, my local has version 2.7.10 but all the production boxes have 2.7.3 with them, so SSLContext is not supported and upgrading python version is not an option / in control.
I tried reading ssl — SSL wrapper for socket objects but couldn't make sense out of it.
what i tried (in vain) :
s_ = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s = ssl.wrap_socket(s_, keyfile=clientKey, certfile=clientCert, cert_reqs=ssl.CERT_REQUIRED)
new_conn = s.connect((uri, some_port))
but returns :
SSLError(1, u'[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:590)')
Question - how to generate SSL Context on older version so as to have a secure https connection?
You have to specify the ca_certs file (which should point to trust store)
I've got the perfect solution using the requests library. The requests library has got to be my favorite library I've ever used, cause it takes something in Python that is inherently difficult to do -- SSL and REST requests -- and makes it unbelievably simple. I checked out their version support and Python 2.6+ is supported.
Here is an example of how to use their library.
>>> requests.get(uri)
And that is all you have to do. The requests library takes care of establishing a ssl connection.
Taking this one step farther. If you need to persist cookies between requests, you can do so like this.
>>> sess = requests.Session()
>>> credentials = {"username": "user",
"password": "pass"}
>>> sess.post("https://some-website/login", params=credentials)
<Response [200]>
>>> sess.get("https://some-website/a-backend-page").text
<html> the backend page... </html>
Edit: If you need to, you can also pass in the path to the certificate and the key like so requests.get(uri, cert=('path/to/cert/abc.crt', 'path/to/key/abc.key'))
Now hopefully you can convince them to install the requests library on the production boxes, cause it would be well worth it. Let me know if this works out for you.
Few days back I was wanting to build a proxy that could allow me to securely and anonymously connect to websites and servers. At first it seemed like a pretty easy idea, I would create an HTTP proxy that uses SSL between the client and the proxy, It would then create a SSL connection with what ever website/server the client requested and then forward that information to and from the client and server. I spent about a day researching and writing code that would do just that. But I then realized that someone could compromise the proxy and use the session key that the proxy had to decrypt and read the data being sent to and from the server.
After a little more research it seem that a socks proxy is what I needed. However there is not much documentation on a python version of a socks proxy(Mostly just how to connect to one). I was able to find The PySocks Module and read the Socks.py file. It looks great for creating a client but I don't see how I could use it to make a proxy.
I was wondering if anyone had a simple example of a socks5 proxy or if someone could point me to some material that could help me begin learning and building one?
You create a python server to listen on a port and listen on IP 127.0.0.1. When you connect to your server you send: "www.facebook.com:80". No URL path nor http scheme. If the connect fails you send a failure message which may look something like "number Unable to connect to host." where number is an specific code that signifies a failed connection attempt. Upon success of a connection you send "200 Connection established". Then data is sent and received as normal. You do not want to use an http proxy because it accepts only website traffic.
You may want to use a framework for the proxy server because it should handle multiple connections.
I've read an ebook on asyncio named O'Reilly "Using Asyncio In Python 2020" multiple times and re-read it every now and again to try to grasp multiple connections. I have also just started to search for solutions using Flask because I want my proxy server to run along side a webserver.
I recommend using requesocks along with stem (assumes Tor). The official stem library is provided by Tor. Here's a simplified example based on a scraper that I wrote which also uses fake_useragent so you look like a browser:
import requesocks
from fake_useragent import UserAgent
from stem import Signal
from stem.control import Controller
class Proxy(object):
def __init__(self,
socks_port=9050,
tor_control_port=9051,
tor_connection_password='password')
self._socks_port = int(socks_port)
self._tor_control_port = int(tor_control_port)
self._tor_connection_password = tor_connection_password
self._user_agent = UserAgent()
self._session = None
self._update_session()
def _update_session(self):
self._session = requesocks.session()
# port 9050 is the default SOCKS port
self._session.proxies = {
'http': 'socks5://127.0.0.1:{}'.format(self._socks_port),
'https': 'socks5://127.0.0.1:{}'.format(self._socks_port),
}
def _renew_tor_connection(self):
with Controller.from_port(port=self._tor_control_port) as controller:
controller.authenticate(password=self._tor_connection_password)
controller.signal(Signal.NEWNYM)
def _sample_get_response(self, url):
if not self._session:
self._update_session()
# generate random user agent string for every request
headers = {
'User-Agent': self._user_agent.random,
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Language': 'en-us,en;q=0.5',
} # adjust as desired
response = self._session.get(url, verify=False, headers=headers)
return response
You must have the Tor service running before executing this script and you must also modify your torrc file to enable the control port (9051).
Tor puts the torrc file in /usr/local/etc/tor/torrc if you compiled Tor from source, and /etc/tor/torrc or /etc/torrc if you installed a pre-built package. If you installed Tor Browser, look for
Browser/TorBrowser/Data/Tor/torrc inside your Tor Browser directory (On Mac OS X, you must right-click or command-click on the Tor Browser icon and select "Show Package Contents" before the Tor Browser directories become
visible).
Once you've found your torrc file, you need to uncomment the corresponding lines:
ControlPort 9051
## If you enable the controlport, be sure to enable one of these
## authentication methods, to prevent attackers from accessing it.
HashedControlPassword 16:05834BCEDD478D1060F1D7E2CE98E9C13075E8D3061D702F63BCD674DE
Please note that the HashedControlPassword above is for the password "password". If you want to set a different password (recommended), replace the HashedControlPassword in the torrc file by noting the output from tor --hash-password "<new_password>" where <new_password> is the password that you want to set.
Once you've changed your torrc file, you will need to restart tor for the changes to take effect (note that you actually only need to send Tor a HUP signal, not actually restart it). To restart it:
sudo service tor restart
I hope this helps and at least gets you started for what you were looking for.
I am using gh-issues-import to migrate issues between GitHub and a GitHub Enterprise server. The problem I have is, our GHE requires going through a VPN proxy, while GitHubs API requires HTTPS route to access. I can only get one or the other, but having a hell of a time finding a way to access both via the same Python project using urllib.requests. Here is a scaled down script I used to utilize the library that is failing in gh-issues-import...
import urllib.request
# works through VPN (notice able to use http), requires VPN
GitHubEnterpriseurl = "http://xxxxx/api/v3/"
req = urllib.request.Request(GitHubEnterpriseurl)
response = urllib.request.urlopen(req)
json_data = response.read()
print(json_data)
# does not work on VPN due to https path, but fine outside of VPN
req = urllib.request.Request("https://api.github.com")
response = urllib.request.urlopen(req)
json_data = response.read()
print(json_data)
I have tried other HTTP libraries and comes down to the VPN blocking access to the https://api.github.com. What are some solutions for this? Can I create a script on another server, my VPN has access to, and simply clone the requests and route the data?
* I am able to connect to https://api.github.com using VPN through the browser (Chrome / Firefox) but when running any command line tools or this script to access it fails.
Does urllib2 in Python 2.6.1 support proxy via https?
I've found the following at http://www.voidspace.org.uk/python/articles/urllib2.shtml:
NOTE
Currently urllib2 does not support
fetching of https locations through a
proxy. This can be a problem.
I'm trying automate login in to web site and downloading document, I have valid username/password.
proxy_info = {
'host':"axxx", # commented out the real data
'port':"1234" # commented out the real data
}
proxy_handler = urllib2.ProxyHandler(
{"http" : "http://%(host)s:%(port)s" % proxy_info})
opener = urllib2.build_opener(proxy_handler,
urllib2.HTTPHandler(debuglevel=1),urllib2.HTTPCookieProcessor())
urllib2.install_opener(opener)
fullurl = 'https://correct.url.to.login.page.com/user=a&pswd=b' # example
req1 = urllib2.Request(url=fullurl, headers=headers)
response = urllib2.urlopen(req1)
I've had it working for similar pages but not using HTTPS and I suspect it does not get through proxy - it just gets stuck in the same way as when I did not specify proxy. I need to go out through proxy.
I need to authenticate but not using basic authentication, will urllib2 figure out authentication when going via https site (I supply username/password to site via url)?
EDIT:
Nope, I tested with
proxies = {
"http" : "http://%(host)s:%(port)s" % proxy_info,
"https" : "https://%(host)s:%(port)s" % proxy_info
}
proxy_handler = urllib2.ProxyHandler(proxies)
And I get error:
urllib2.URLError: urlopen error
[Errno 8] _ssl.c:480: EOF occurred in
violation of protocol
Fixed in Python 2.6.3 and several other branches:
_bugs.python.org/issue1424152 (replace _ with http...)
http://www.python.org/download/releases/2.6.3/NEWS.txt
Issue #1424152: Fix for httplib, urllib2 to support SSL while working through
proxy. Original patch by Christopher Li, changes made by Senthil Kumaran.
I'm not sure Michael Foord's article, that you quote, is updated to Python 2.6.1 -- why not give it a try? Instead of telling ProxyHandler that the proxy is only good for http, as you're doing now, register it for https, too (of course you should format it into a variable just once before you call ProxyHandler and just repeatedly use that variable in the dict): that may or may not work, but, you're not even trying, and that's sure not to work!-)
Incase anyone else have this issue in the future I'd like to point out that it does support https proxying now, make sure the proxy supports it too or you risk running into a bug that puts the python library into an infinite loop (this happened to me).
See the unittest in the python source that is testing https proxying support for further information:
http://svn.python.org/view/python/branches/release26-maint/Lib/test/test_urllib2.py?r1=74203&r2=74202&pathrev=74203