I am writing a proxy server in Python, based on BaseHTTPServer.
It opens a connection to a Squid proxy, identifies the browser's request (GET, CONNECT, POST, etc.), adds a Proxy-Authorization header to it, and then forwards the request to the Squid proxy.
The problem is that, as I understand it, when I receive a CONNECT request I should relay all the corresponding traffic to the Squid proxy. But, as I can see in Wireshark, Squid doesn't reply to the 'Client Hello' part of the handshake, which I think is because Squid doesn't understand the binary SSL data that I am just forwarding to it.
How do I process HTTPS requests in this case?
The code is more or less the same as TinyHTTPProxy: http://www.oki-osk.jp/esc/python/proxy/
RFC 2817 defines the CONNECT method. It is different from other HTTP methods in that the receiving proxy (your Python proxy) is directed to establish a raw TCP tunnel directly to the destination host (called the authority in the RFC).
A proxy can make no assumptions about the data that will be sent over that tunnel; it will not necessarily be HTTP – the client can use the tunnel to speak any protocol it likes. Indeed, SSL ≠ HTTP.
You have two options:
1. Open a TCP connection directly to the requested destination host.
2. Make a CONNECT request to your upstream proxy (Squid). This is within spec:
It may be the case that the proxy itself can only reach the
requested origin server through another proxy. In this case, the
first proxy SHOULD make a CONNECT request of that next proxy,
requesting a tunnel to the authority. A proxy MUST NOT respond
with any 2xx status code unless it has either a direct or tunnel
connection established to the authority.
Make sure that your request includes the required Host header.
CONNECT www.google.com:443 HTTP/1.1
Host: www.google.com:443
Proxy-Authorization: ...
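As a rough illustration of the second option, here is a minimal Python 3 sketch that builds such a CONNECT request and asks an upstream proxy for a tunnel. The function names, the Basic authentication scheme, and the timeout are assumptions made for the example; they are not taken from TinyHTTPProxy or from the question's code.

```python
import base64
import socket


def build_connect_request(host, port, username=None, password=""):
    """Return the raw bytes of a CONNECT request for host:port."""
    authority = f"{host}:{port}"
    lines = [
        f"CONNECT {authority} HTTP/1.1",
        f"Host: {authority}",
    ]
    if username is not None:
        # Assumption: the upstream proxy accepts Basic credentials.
        token = base64.b64encode(f"{username}:{password}".encode()).decode("ascii")
        lines.append(f"Proxy-Authorization: Basic {token}")
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def open_upstream_tunnel(proxy_addr, host, port, username=None, password=""):
    """Ask the upstream proxy for a tunnel; return the socket on a 2xx reply."""
    upstream = socket.create_connection(proxy_addr, timeout=10)
    upstream.sendall(build_connect_request(host, port, username, password))
    # Read until the blank line that ends the proxy's response headers.
    reply = b""
    while b"\r\n\r\n" not in reply:
        chunk = upstream.recv(4096)
        if not chunk:
            break
        reply += chunk
    status_line = reply.split(b"\r\n", 1)[0].split()
    if len(status_line) < 2 or not status_line[1].startswith(b"2"):
        upstream.close()
        raise ConnectionError(f"upstream proxy refused CONNECT: {reply[:80]!r}")
    return upstream
```

Only once the upstream proxy has answered with a 2xx status should your own proxy send its 200 to the browser and start copying bytes in both directions; the Client Hello must not be forwarded before that point.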
I am trying to understand how HTTP/3 works. Ultimately, my goal is to send an HTTP/3 request to a host through a proxy and receive a response back.
The host I am trying to reach only accepts HTTP/3 connections.
There is a library that does the heavy lifting of initiating an HTTP/3 connection, but its examples don't demonstrate how to route the traffic through a proxy.
https://github.com/aiortc/aioquic/blob/main/examples/http3_client.py
I am running the following file after cloning the repo like this:
python3 examples/http3_client.py 'https://www.truepeoplesearch.com/'
Doing so routes the request via HTTP/3 over the QUIC protocol. How can I send the same request through a proxy, given the proxy's IP, port, username and password?
I'm trying to send an HTTPS request through an HTTPS tunnel. That is, my proxy expects HTTPS for the CONNECT. It also expects a client certificate.
I'm using Requests' proxy features.
import requests
url = "https://some.external.com/endpoint"
with requests.Session() as session:
    response = session.get(
        url,
        proxies={"https": "https://proxy.host:4443"},
        # client certificates expected by proxy
        cert=(cert_path, key_path),
        verify="/home/savior/proxy-ca-bundle.pem",
    )
    with response:
        ...
This works, but with some limitations:
I can only set client certificates for the TLS connection with the proxy, not for the external endpoint.
The proxy-ca-bundle.pem only verifies the server certificates in the TLS connection with the proxy. The server certificates from the external endpoint are seemingly ignored.
Is there any way to use requests to address these two issues? I'd like to set a different set of CAs for the external endpoint.
I also tried using http.client and HTTPSConnection.set_tunnel but, as far as I can tell, its tunnel is done through HTTP and I need HTTPS.
Looking at the source code, it doesn't seem like requests currently supports this "TLS in TLS", i.e. providing two sets of client certificates and CA bundles for a proxied request.
We can use PycURL, which simply wraps libcurl:
from io import BytesIO
import pycurl
url = "https://some.external.com/endpoint"
buffer = BytesIO()
curl = pycurl.Curl()
curl.setopt(curl.URL, url)
curl.setopt(curl.WRITEDATA, buffer)
# proxy settings
curl.setopt(curl.HTTPPROXYTUNNEL, 1)
curl.setopt(curl.PROXY, "https://proxy.host")
curl.setopt(curl.PROXYPORT, 4443)
curl.setopt(curl.PROXY_SSLCERT, cert_path)
curl.setopt(curl.PROXY_SSLKEY, key_path)
curl.setopt(curl.PROXY_CAINFO, "/home/savior/proxy-ca-bundle.pem")
# endpoint verification
curl.setopt(curl.CAINFO, "/home/savior/external-ca-bundle.pem")
try:
    curl.perform()
except pycurl.error:
    pass  # log or re-raise
else:
    status_code = curl.getinfo(curl.RESPONSE_CODE)
PycURL will use the PROXY_ settings to establish a TLS connection to the proxy and send it an HTTP CONNECT request. It will then establish a new TLS session to the external endpoint through the proxy connection and use the CAINFO bundle to verify that endpoint's server certificates.
When using Python 2.7's urllib2, I do not seem to be able to retrieve a resource from an HTTPS server through an SSL-secured proxy server, i.e. the following:
CLIENT ---- (HTTPS) ---> PROXY ---- (HTTPS) ---> SERVER
Of course, to get through the proxy server one uses CONNECT. Any ideas?
Alternative question: when using CONNECT, one needs to set up a completely independent second SSL session inside the tunnel, right? How could one do that in Python, given that simply calling ssl.wrap_socket does not do the trick?
I am using SSL tunneling with a proxy server to connect to a target server. I use HTTP to connect to the proxy server and HTTPS to connect to the target server. The SSL tunneling works as it should and I can exchange HTTPS messages with the remote server, but there is a problem. In its reply to urllib2's request to establish the SSL tunnel, the proxy server returns a header that I need to see, but I don't see a way to get access to it using urllib2 (Python 2.7.3).
I suppose I could theoretically implement the SSL tunneling handshake myself, but that would get me way deeper into the protocol than I want to be (or with which I feel comfortable).
Is there a way to get access to the reply using urllib2 when establishing the SSL tunnel?
UPDATE:
Here is the code that uses the proxy server to connect to the target server (the proxy server and the target server's URLs are not the actual ones):
import urllib2
proxy_handler = urllib2.ProxyHandler({'https': 'http://proxy.com'})
url_opener = urllib2.build_opener(proxy_handler)
request = urllib2.Request('https://target_server.com/')
response = url_opener.open(request)
print response.headers.dict
I used Wireshark to look at the message traffic. Wireshark won't show me the bodies of the messages exchanged with the target server because they are encrypted, but I can see the body of the SSL tunnel handshake. I can see the header that I'm interested in coming back from the proxy server.
How are you calling the HTTPS page? Are you using something like this?
resp = urllib2.urlopen('https')
resp.info().headers
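On the tunnel question itself: as far as I know, urllib2 reads and discards the proxy's reply to CONNECT, so one workaround is to perform the CONNECT by hand and only then start TLS. Below is a minimal sketch in Python 3 (not the Python 2.7 urllib2 API from the question). The function names are hypothetical, and the code assumes the proxy is reached over plain HTTP, as in the question.

```python
import socket
import ssl
from email.parser import BytesParser


def parse_connect_reply(raw):
    """Split a proxy's CONNECT reply into (status_code, reason, headers)."""
    head = raw.split(b"\r\n\r\n", 1)[0]
    status_line, _, header_block = head.partition(b"\r\n")
    parts = status_line.decode("iso-8859-1").split(None, 2)
    status = int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""
    # The header lines use the same syntax as e-mail headers.
    headers = BytesParser().parsebytes(header_block + b"\r\n", headersonly=True)
    return status, reason, headers


def open_tls_through_proxy(proxy_addr, target_host, target_port=443):
    """CONNECT via the proxy, keep its reply headers, then start TLS."""
    sock = socket.create_connection(proxy_addr, timeout=10)
    authority = f"{target_host}:{target_port}"
    request = f"CONNECT {authority} HTTP/1.1\r\nHost: {authority}\r\n\r\n"
    sock.sendall(request.encode("ascii"))
    raw = b""
    while b"\r\n\r\n" not in raw:
        chunk = sock.recv(4096)
        if not chunk:
            sock.close()
            raise ConnectionError("proxy closed the connection during CONNECT")
        raw += chunk
    status, reason, headers = parse_connect_reply(raw)
    if not 200 <= status < 300:
        sock.close()
        raise ConnectionError(f"proxy answered {status} {reason}")
    # The tunnel is up: the TLS handshake now runs end to end with the target.
    context = ssl.create_default_context()
    tls_sock = context.wrap_socket(sock, server_hostname=target_host)
    return tls_sock, headers
```

The returned headers object supports case-insensitive lookups, so the header seen in Wireshark could be read with something like headers["X-Some-Header"], where the header name is a placeholder.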
Recently I have been playing around with the HTTP proxy in Twisted. After much trial and error I think I finally have something working. What I want to know, though, is how, if it is possible, I can expand this proxy to also handle HTTPS pages. Here is what I've got so far:
from twisted.internet import reactor
from twisted.web import http
from twisted.web.proxy import Proxy, ProxyRequest, ProxyClientFactory, ProxyClient

class HTTPProxyClient(ProxyClient):
    def handleHeader(self, key, value):
        print "%s : %s" % (key, value)
        ProxyClient.handleHeader(self, key, value)

    def handleResponsePart(self, buffer):
        print buffer
        ProxyClient.handleResponsePart(self, buffer)

class HTTPProxyFactory(ProxyClientFactory):
    protocol = HTTPProxyClient

class HTTPProxyRequest(ProxyRequest):
    protocols = {'http' : HTTPProxyFactory}

    def process(self):
        print self.method
        for k,v in self.requestHeaders.getAllRawHeaders():
            print "%s : %s" % (k,v)
        print "\n \n"
        ProxyRequest.process(self)

class HTTPProxy(Proxy):
    requestFactory = HTTPProxyRequest

factory = http.HTTPFactory()
factory.protocol = HTTPProxy
reactor.listenTCP(8001, factory)  # plain TCP; listenSSL would also require a context factory argument
reactor.run()
As this code demonstrates, for now I am just printing out whatever goes through the connection, as an example. Is it possible to handle HTTPS with the same classes? If not, how should I go about implementing such a thing?
If you want to connect to an HTTPS website via an HTTP proxy, you need to use the CONNECT HTTP verb (because that's how a proxy works for HTTPS). In this case, the proxy server simply connects to the target server and relays whatever is sent by the server back to the client's socket (and vice versa). There's no caching involved in this case (but you might be able to log the hosts you're connecting to).
The exchange will look like this (client to proxy):
C->P: CONNECT target.host:443 HTTP/1.0
C->P:
P->C: HTTP/1.0 200 OK
P->C:
To serve this request, the proxy simply opens a plain socket to the target server (no HTTP or SSL/TLS yet), sends the 200 status once that connection is established, and from then on relays everything between the initial client and the target server (including the TLS handshake that the client initiates). The client upgrades its existing socket to the proxy to use TLS/SSL (by starting the SSL/TLS handshake). Once the client has read the '200' status line, as far as the client is concerned, it's as if it had made the connection to the target server directly.
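The relaying step described above can be sketched with plain blocking sockets in Python 3. This is not Twisted code, and the relay function name is made up for the example; it only illustrates that the proxy copies opaque bytes in both directions without interpreting them.

```python
import selectors
import socket


def relay(client, upstream, bufsize=65536):
    """Copy bytes both ways between two connected sockets until one side closes."""
    peers = {client: upstream, upstream: client}
    with selectors.DefaultSelector() as selector:
        for sock in peers:
            selector.register(sock, selectors.EVENT_READ)
        while True:
            for key, _ in selector.select():
                data = key.fileobj.recv(bufsize)
                if not data:
                    return  # one side closed, so the tunnel is finished
                # Forward the bytes untouched; they may be TLS records.
                peers[key.fileobj].sendall(data)
```

The caller is expected to close both sockets after relay returns. A production proxy would also want timeouts and handling for half-closed connections, which this sketch leaves out.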
I'm not sure about Twisted, but I want to warn you that if you implement an HTTPS proxy, a web browser will expect the server's SSL certificate to match the domain name in the URL (address bar). The web browser will issue security warnings otherwise.
There are ways around this, such as generating certificates on the fly, but you'd need the root certificate to be trusted by the browser.