urllib2/pycurl in Django: Fetch XML, check HTTP status, check HTTPS connection - python

I need to make an API call (of sorts) in Django as a part of the custom authentication system we require. A username and password is sent to a specific URL over SSL (using GET for those parameters) and the response should be an HTTP 200 "OK" response with the body containing XML with the user's info.
On an unsuccessful auth, it will return an HTTP 401 "Unauthorized" response.
For security reasons, I need to check:
The request was sent over an HTTPS connection
The server certificate's public key matches an expected value (I use 'certificate pinning' to defend against broken CAs)
Is this possible in python/django using pycurl/urllib2 or any other method?

Using M2Crypto:
from M2Crypto import SSL
ctx = SSL.Context('sslv3')
ctx.set_verify(SSL.verify_peer | SSL.verify_fail_if_no_peer_cert, depth=9)
if ctx.load_verify_locations('ca.pem') != 1:
    raise Exception('No CA certs')
c = SSL.Connection(ctx)
c.connect(('www.google.com', 443)) # automatically checks cert matches host
c.send('GET / \n')
c.close()
Using urllib2_ssl (it goes without saying but to be explicit: use it at your own risk):
import urllib2, urllib2_ssl
opener = urllib2.build_opener(urllib2_ssl.HTTPSHandler(ca_certs='ca.pem'))
xml = opener.open('https://example.com/').read()
Related: Making HTTPS Requests secure in Python.
Using pycurl:
import pycurl
c = pycurl.Curl()
c.setopt(pycurl.URL, "https://example.com?param1=val1&param2=val2")
c.setopt(pycurl.HTTPGET, 1)
c.setopt(pycurl.CAINFO, 'ca.pem')
c.setopt(pycurl.SSL_VERIFYPEER, 1)
c.setopt(pycurl.SSL_VERIFYHOST, 2)
c.setopt(pycurl.SSLVERSION, 3)
c.setopt(pycurl.NOBODY, 1)
c.setopt(pycurl.NOSIGNAL, 1)
c.perform()
c.close()
To implement 'certificate pinning', provide a different 'ca.pem' for each domain.
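As a complement to per-domain CA files, here is a minimal sketch of pinning by certificate fingerprint using only the Python 3 standard library. Note that it swaps in a different technique from the one asked about: it pins the SHA-256 digest of the whole DER-encoded certificate, not just the public key. The function names (`fingerprint_sha256`, `matches_pin`, `peer_cert_matches_pin`) are hypothetical, and Python 3 is an assumption since the answers above target Python 2.

```python
import hashlib
import hmac
import socket
import ssl


def fingerprint_sha256(der_bytes):
    """Return the lowercase hex SHA-256 digest of a DER-encoded certificate."""
    return hashlib.sha256(der_bytes).hexdigest()


def matches_pin(der_bytes, expected_hex):
    """Compare a certificate against a pinned fingerprint.

    Accepts pins written with or without colons, in any letter case.
    """
    normalized = expected_hex.replace(":", "").lower()
    return hmac.compare_digest(fingerprint_sha256(der_bytes), normalized)


def peer_cert_matches_pin(host, expected_hex, port=443, timeout=10):
    """Connect over TLS (with normal CA and hostname checks) and test the pin."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            der = tls.getpeercert(binary_form=True)
    return matches_pin(der, expected_hex)
```

Pinning the whole certificate means the pin has to be updated whenever the server renews its certificate, which is one reason public-key pinning is sometimes preferred.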

httplib2 can do https requests with certificate validation:
import httplib2
http = httplib2.Http(ca_certs='/path/to/cert.pem')
try:
    http.request('https://...')
except httplib2.SSLHandshakeError as e:
    pass  # do something
Just make sure that your httplib2 is up to date. The one shipped with my distribution (Ubuntu 10.04) does not have the ca_certs parameter.
Also, a similar question to yours has an example of certificate validation with pycurl.

Related

How to download data from a password protected website

I'm using requests in Python to try to download this file:
http://e4ftl01.cr.usgs.gov/MEASURES/SRTMGL1.003/2000.02.11/N55W003.SRTMGL1.hgt.zip There are 14000 such files, which is why I need to automate the process. The other techniques I've found online don't seem to work. I assume this is because the websites they were designed for use a different authentication method. I don't know much about web development, so I can't work out how this authentication works.
Edit
This is the code:
import json
import requests
from requests.auth import HTTPBasicAuth
file = open("srtm30m_bounding_boxes.json", 'r')
strjson = file.read()
x = json.loads(strjson)
filenamelist = []
url = "http://e4ftl01.cr.usgs.gov/MEASURES/SRTMGL1.003/2000.02.11/N55W003.SRTMGL1.hgt.zip"
for i in range(14295):
    filenamelist.append(x['features'][i]['properties']['dataFile'])
    filenamelist[i] = "http://e4ftl01.cr.usgs.gov/MEASURES/SRTMGL1.003/2000.02.11/" + filenamelist[i]
jar = requests.cookies.RequestsCookieJar()
jar.set('urs_user_already_logged', 'yes')
jar.set('_urs-gui_session','8b972449036e60e3d83a6a819b93124d')
r = requests.get(url, cookies=jar)
And this is the error I get when I run the code:
ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
The simplest thing is to provide the username and password in the URL before the host, e.g.:
requests.get('http://{username}:{password}@e4ftl01.cr.usgs.gov/MEASURES/SRTMGL1.003/2000.02.11/{subpath}'.format(username=username, password=password, subpath=filenamelist[i]))
You can also supply the username/password as the auth parameter to get:
requests.get('http://e4ftl01.cr.usgs.gov/MEASURES/SRTMGL1.003/2000.02.11/{subpath}'.format(subpath=filenamelist[i]), auth=(username, password))
totalhack is right that https is more secure, and it seems to work on this site. This form of authentication transmits the username and password as plaintext, so anyone who can observe the http request would also be able to steal your login. https encrypts the username / password since it encrypts the entire request.
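To make the plaintext point concrete, here is a small sketch of what HTTP Basic authentication puts on the wire. The helper names `basic_auth_header` and `decode_basic_auth_header` are hypothetical and the credentials are placeholders. It shows that the Authorization header is only base64-encoded, so anyone who sees the request can reverse it.

```python
import base64


def basic_auth_header(username, password):
    """Build the Authorization header value used by HTTP Basic auth."""
    raw = "{}:{}".format(username, password).encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def decode_basic_auth_header(value):
    """Recover (username, password) from a Basic Authorization header value."""
    scheme, _, token = value.partition(" ")
    if scheme != "Basic":
        raise ValueError("not a Basic auth header")
    username, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return username, password


header = basic_auth_header("user", "pass")
print(header)                            # Basic dXNlcjpwYXNz
print(decode_basic_auth_header(header))  # ('user', 'pass')
```

Base64 is an encoding, not encryption, which is why the credentials are only protected when the whole request travels over HTTPS.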

python requests verify SSL certificate

I'm trying to pull data from an API which is secured by SSL. I wrote a python script to pull the data. Beforehand I have to convert a .p12 file to an openSSL certificate. When I use the following code it works just fine:
# ----- SCRIPT 1 -----
import contextlib
import json
import tempfile
import OpenSSL.crypto
import requests

# the function yields, so it must be wrapped to be usable in a `with` statement
@contextlib.contextmanager
def pfx_to_pem(pfx_path, pfx_password):
    ''' Decrypts the .pfx file to be used with requests. '''
    with tempfile.NamedTemporaryFile(suffix='.pem') as t_pem:
        f_pem = open(t_pem.name, 'wb')
        pfx = open(pfx_path, 'rb').read()
        p12 = OpenSSL.crypto.load_pkcs12(pfx, pfx_password)
        f_pem.write(OpenSSL.crypto.dump_privatekey(OpenSSL.crypto.FILETYPE_PEM, p12.get_privatekey()))
        f_pem.write(OpenSSL.crypto.dump_certificate(OpenSSL.crypto.FILETYPE_PEM, p12.get_certificate()))
        ca = p12.get_ca_certificates()
        if ca is not None:
            for cert in ca:
                f_pem.write(OpenSSL.crypto.dump_certificate(OpenSSL.crypto.FILETYPE_PEM, cert))
        f_pem.close()
        yield t_pem.name
# read some config
with open('config.json') as config_json:
    config = json.load(config_json)
api_url = config['api_url']
cert = config['cert']
cert_pem_path = cert['file']
cert_key_file = cert['pass']
# make the request
with pfx_to_pem(cert_pem_path, cert_key_file) as cert:
    r = requests.get(api_url, cert = cert)
Because I'm also using the same functionality to authenticate my Flask web service towards the server I split up the cert file into three files:
# ----- SCRIPT 1 -----
# get certificate
f_pem.write(OpenSSL.crypto.dump_certificate(
    OpenSSL.crypto.FILETYPE_PEM, p12.get_certificate())
)
# get keyfile
f_key.write(OpenSSL.crypto.dump_privatekey(
    OpenSSL.crypto.FILETYPE_PEM, p12.get_privatekey())
)
# get CA_BUNDLE
ca = p12.get_ca_certificates()
if ca is not None:
    for cert in ca:
        f_ca.write(
            OpenSSL.crypto.dump_certificate(
                OpenSSL.crypto.FILETYPE_PEM, cert
            ))
Then I'm running the web service with the following code:
# ----- SCRIPT 2 -----
context = ssl.SSLContext(ssl.PROTOCOL_TLSv1_2)
context.verify_mode = ssl.CERT_REQUIRED
context.load_verify_locations(cert_ca)
context.load_cert_chain(cert_pem, cert_key)
app.run(ssl_context = context, host = '0.0.0.0')
and changed the requests call to
# ----- SCRIPT 1 -----
r = requests.get(api_url, cert = (cert_pem, cert_key), verify = cert_ca)
When trying to pull data from the API I get the error
requests.exceptions.SSLError: HTTPSConnectionPool(host='some.host', port=443): Max retries exceeded with url: /some/path/var?ID=xxxxxx (Caused by SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:847)'),))
Question 1: What am I doing wrong creating the CA_BUNDLE?
Question 2: Am I handling the creation of the web service correctly? My goal is to verify my server against the server holding the data to eventually be able to receive the data by push request.
EDIT: when connecting to my web service (in a browser) I receive the warning that the connection is not secure, because the certificate is not valid, despite the fact that I imported the .p12 certificate into my browser.
I'm using the requests and json libraries to call an API. In my case I could set up the request to ignore the certificate, which quickly solved my issue:
requests.get(url, headers=headers, verify=False)
The argument verify=False skips certificate verification, but every time the code runs requests prints a warning about the unverified request. You can add this piece of code to stop the warning from being shown:
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
I know that doesn't answer your question, but you can try it to see whether you are able to get the information without certificate verification.

Python Requests - Use navigate site by servers IP

I want to crawl a site, but Cloudflare was getting in the way. I was able to get the server's IP, so Cloudflare won't bother me.
How can I utilize this in the requests library?
For example, I want to go directly to
www.example.com/foo.php, but requests will resolve the name to an IP on the Cloudflare network instead of the one I want it to use. How can I make it use the one I want?
I would have sent the request to the real IP with the Host header set to www.example.com, but that will just give me the home page. How can I visit other links on the site?
You will have to set a custom Host header with the value example.com, something like:
requests.get('http://127.0.0.1/foo.php', headers={'host': 'example.com'})
should do the trick. If you want to verify that then type in the following command (requires netcat): nc -l -p 80 and then run the above command. It will produce output in the netcat window:
GET /foo.php HTTP/1.1
Host: example.com
Connection: keep-alive
Accept-Encoding: gzip, deflate
Accept: */*
User-Agent: python-requests/2.6.2 CPython/3.4.3 Windows/8
You'd have to tell requests to fake the Host header, and replace the hostname in the URL with the IP address:
requests.get('http://123.45.67.89/foo.php', headers={'Host': 'www.example.com'})
The URL 'patching' can be done with the urlparse library:
parsed = urlparse.urlparse(url)
hostname = parsed.hostname
parsed = parsed._replace(netloc=ipaddress)
ip_url = parsed.geturl()
response = requests.get(ip_url, headers={'Host': hostname})
Demo against Stack Overflow:
>>> import urlparse
>>> import socket
>>> url = 'http://stackoverflow.com/help/privileges'
>>> parsed = urlparse.urlparse(url)
>>> hostname = parsed.hostname
>>> hostname
'stackoverflow.com'
>>> ipaddress = socket.gethostbyname(hostname)
>>> ipaddress
'198.252.206.16'
>>> parsed = parsed._replace(netloc=ipaddress)
>>> ip_url = parsed.geturl()
>>> ip_url
'http://198.252.206.16/help/privileges'
>>> response = requests.get(ip_url, headers={'Host': hostname})
>>> response
<Response [200]>
In this case I looked up the ip address dynamically.
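For Python 3, where the urlparse module became urllib.parse, the same URL patching could be sketched as follows. The helper name `url_with_ip` is hypothetical, and unlike the snippet above it keeps an explicit port if the original URL has one.

```python
from urllib.parse import urlsplit


def url_with_ip(url, ip_address):
    """Swap the hostname in `url` for `ip_address`.

    Returns (patched_url, original_hostname) so the hostname can be sent
    as the Host header. An explicit port in the URL is preserved.
    """
    parsed = urlsplit(url)
    netloc = ip_address
    if parsed.port is not None:
        netloc = "{}:{}".format(ip_address, parsed.port)
    return parsed._replace(netloc=netloc).geturl(), parsed.hostname


ip_url, hostname = url_with_ip("http://stackoverflow.com/help/privileges",
                               "198.252.206.16")
print(ip_url)    # http://198.252.206.16/help/privileges
print(hostname)  # stackoverflow.com
```

The returned pair can then be passed on as `requests.get(ip_url, headers={'Host': hostname})`, as in the answer above.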
Answer for HTTPS/SNI support: Use the HostHeaderSSLAdapter in the requests_toolbelt module:
The above solution works fine with virtual hosts for non-encrypted HTTP connections. For HTTPS you also need to pass SNI (Server Name Indication) in the TLS handshake, because some servers present a different SSL certificate depending on the name passed in via SNI. Also, the Python ssl libraries by default don't look at the Host: header to match the server certificate at connection time.
The requests_toolbelt adapter provides a straightforward way to add a transport adapter to requests that handles this for you.
Example
import requests
from requests_toolbelt.adapters import host_header_ssl
# Create a new requests session
s = requests.Session()
# Mount the adapter for https URLs
s.mount('https://', host_header_ssl.HostHeaderSSLAdapter())
# Send your request
s.get("https://198.51.100.50", headers={"Host": "example.org"})
I think the best way to send HTTPS requests to a specific IP is to add a customized resolver that binds the domain name to the IP you want to hit. This way both SNI and the Host header are set correctly, and certificate verification succeeds just as it does in a web browser.
Otherwise you will see various issues such as InsecureRequestWarning and SSLCertVerificationError, and SNI is always missing from the Client Hello, even if you try different combinations of the headers and verify arguments:
requests.get('https://1.2.3.4/foo.php', headers={"host": "example.com"}, verify=True)
In addition, I tried:
requests_toolbelt
pip install requests[security]
forcediphttpsadapter
all solutions mentioned in "using requests with TLS doesn't give SNI support"
None of them set SNI when hitting https://IP directly.
import socket
import requests
# mock /etc/hosts
# lock it in multithreading or use multiprocessing if an endpoint is bound to multiple IPs frequently
etc_hosts = {}
# decorate python built-in resolver
def custom_resolver(builtin_resolver):
    def wrapper(*args, **kwargs):
        try:
            return etc_hosts[args[:2]]
        except KeyError:
            # fall back to builtin_resolver for endpoints not in etc_hosts
            return builtin_resolver(*args, **kwargs)
    return wrapper
# monkey patching
socket.getaddrinfo = custom_resolver(socket.getaddrinfo)
def _bind_ip(domain_name, port, ip):
    '''
    resolve (domain_name,port) to a given ip
    '''
    key = (domain_name, port)
    # (family, type, proto, canonname, sockaddr)
    value = (socket.AddressFamily.AF_INET, socket.SocketKind.SOCK_STREAM, 6, '', (ip, port))
    etc_hosts[key] = [value]
# the bound name must match the host used in the request URL
_bind_ip('www.example.com', 443, '1.2.3.4')
# this sends requests to 1.2.3.4
response = requests.get('https://www.example.com/foo.php', verify=True)
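Because the monkey patch above stays in place for the whole process, one possible variation is to scope it with a context manager so that the original resolver is restored afterwards. This is only a sketch: `pinned_host` is a hypothetical name, it handles a single IPv4 binding, and like the code above it is not safe to use from several threads at once.

```python
import contextlib
import socket


@contextlib.contextmanager
def pinned_host(hostname, port, ip):
    """Temporarily resolve (hostname, port) to `ip` via socket.getaddrinfo."""
    original = socket.getaddrinfo

    def patched(host, port_, *args, **kwargs):
        if (host, port_) == (hostname, port):
            # (family, type, proto, canonname, sockaddr)
            return [(socket.AF_INET, socket.SOCK_STREAM,
                     socket.IPPROTO_TCP, "", (ip, port))]
        # everything else goes to the real resolver
        return original(host, port_, *args, **kwargs)

    socket.getaddrinfo = patched
    try:
        yield
    finally:
        # always restore the real resolver, even if the body raised
        socket.getaddrinfo = original


with pinned_host("www.example.com", 443, "1.2.3.4"):
    print(socket.getaddrinfo("www.example.com", 443)[0][4])  # ('1.2.3.4', 443)
```

Any HTTPS request made inside the `with` block would still use www.example.com for SNI, the Host header and certificate checks, since only name resolution is overridden.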

Modifying Python code to use SSL for a REST call

I have Python code to call a REST service that is something like this:
import urllib
import urllib2
username = 'foo'
password = 'bar'
passwordManager = urllib2.HTTPPasswordMgrWithDefaultRealm()
passwordManager.add_password(None, MY_APP_PATH, username, password)
authHandler = urllib2.HTTPBasicAuthHandler(passwordManager)
opener = urllib2.build_opener(authHandler)
urllib2.install_opener(opener)
params= { "param1" : param1,
"param2" : param2,
"param3" : param3 }
xmlResults = urllib2.urlopen(MY_APP_PATH, urllib.urlencode(params)).read()
results = MyResponseParser.parse(xmlResults)
MY_APP_PATH is currently an HTTP url. I would like to change it to use SSL ("HTTPS"). How would I go about changing this code to use https in the simplest way possible?
Unfortunately, urllib2 and httplib, at least up to Python 2.7, don't do any certificate verification when using HTTPS. The result is that you're exchanging information with a server you haven't necessarily identified (it's a bit like exchanging a secret with someone whose identity you haven't verified): this defeats the security purpose of HTTPS.
See this quote from httplib (in Python 2.7):
Note: This does not do any certificate verification.
(This is independent of httplib.HTTPSConnection being able to send a client-certificate: that's what its key and cert parameters are for.)
There are ways around this, for example:
http://thejosephturner.com/blog/post/https-certificate-verification-in-python-with-urllib2/
http://code.google.com/p/python-httpclient/ (not using urllib2, so possibly not the shortest way for you)
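On Python 3 (an assumption, since the code in this thread targets Python 2), the standard library can do the verification itself. The sketch below builds an opener whose TLS context requires a valid certificate and a matching hostname. `make_verifying_opener` is a hypothetical helper name, and `cafile` is optional: leave it as None to use the system trust store.

```python
import ssl
import urllib.request


def make_verifying_opener(cafile=None):
    """Return (opener, context) where the opener verifies server certificates.

    ssl.create_default_context() enables certificate verification and
    hostname checking; `cafile` optionally points at a specific CA bundle.
    """
    context = ssl.create_default_context(cafile=cafile)
    handler = urllib.request.HTTPSHandler(context=context)
    return urllib.request.build_opener(handler), context


opener, context = make_verifying_opener()
print(context.verify_mode == ssl.CERT_REQUIRED)  # True
print(context.check_hostname)                    # True
```

Calling `opener.open(...)` on an https URL whose certificate cannot be validated would then fail with an error instead of silently connecting.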
Just using HTTPS:// instead of HTTP:// in the URL you are calling should work, at least if you are trying to reach a known/verified server. If necessary, you can use your client-side SSL certificate to secure the API transaction:
mykey = '/path/to/ssl_key_file'
mycert = '/path/to/ssl_cert_file'
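# HTTPSClientAuthHandler is a custom handler class that is not shown in this answer
passwordManager = urllib2.HTTPPasswordMgrWithDefaultRealm()
passwordManager.add_password(None, MY_APP_PATH, settings.USER_ID, settings.PASSWD)
authHandler = urllib2.HTTPBasicAuthHandler(passwordManager) # add HTTP Basic Authentication information...
opener = urllib2.build_opener(HTTPSClientAuthHandler(mykey, mycert), authHandler)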

python http request with token

how and with which python library is it possible to make an httprequest (https) with a user:password or a token?
basically the equivalent to curl -u user:pwd https://www.mysite.com/
thank you
Use Python requests (HTTP for Humans):
import requests
requests.get("https://www.mysite.com/", auth=('username','pwd'))
you can also use digest auth...
If you need to make thread-safe requests, use pycurl (the python interface to curl):
import pycurl
from StringIO import StringIO
response_buffer = StringIO()
curl = pycurl.Curl()
curl.setopt(curl.URL, "https://www.yoursite.com/")
# Setup the base HTTP Authentication.
curl.setopt(curl.USERPWD, '%s:%s' % ('youruser', 'yourpassword'))
curl.setopt(curl.WRITEFUNCTION, response_buffer.write)
curl.perform()
curl.close()
response_value = response_buffer.getvalue()
Otherwise, use urllib2 (see other responses for more info) as it's builtin and the interface is much cleaner.
class urllib2.HTTPSHandler
A class to handle opening of HTTPS URLs.
21.6.7. HTTPPasswordMgr Objects
These methods are available on HTTPPasswordMgr and HTTPPasswordMgrWithDefaultRealm objects.
HTTPPasswordMgr.add_password(realm, uri, user, passwd)
uri can be either a single URI, or a sequence of URIs. realm, user and passwd must be strings. This causes (user, passwd) to be used as authentication tokens when authentication for realm and a super-URI of any of the given URIs is given.
HTTPPasswordMgr.find_user_password(realm, authuri)
Get user/password for given realm and URI, if any. This method will return (None, None) if there is no matching user/password.
For HTTPPasswordMgrWithDefaultRealm objects, the realm None will be searched if the given realm has no matching user/password.
Check out urllib2. The examples at the bottom will probably be of interest.
http://docs.python.org/library/urllib2.html
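For readers on Python 3, where urllib2 was merged into urllib.request, the password-manager approach described above might look like this sketch. The helper name `make_basic_auth_opener` and the credentials are placeholders, not part of any answer here.

```python
import urllib.request


def make_basic_auth_opener(url, username, password):
    """Return (opener, manager) that answers Basic auth challenges for `url`."""
    manager = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # realm=None means "use these credentials for any realm at this URL"
    manager.add_password(None, url, username, password)
    handler = urllib.request.HTTPBasicAuthHandler(manager)
    return urllib.request.build_opener(handler), manager


opener, manager = make_basic_auth_opener("https://www.mysite.com/", "user", "pwd")
print(manager.find_user_password(None, "https://www.mysite.com/"))  # ('user', 'pwd')
```

With this in place, `opener.open("https://www.mysite.com/")` would send the credentials only after the server responds with a 401 challenge, which differs from curl -u, which sends them up front.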
