I'm trying to create a TCP socket server in Python that, after receiving a string of bytes from a client, passes the received data (without inspecting it, assuming it is a valid HTTP request) on to an HTTP or HTTPS proxy and waits for the result. My code looks like this:
import socket

def test(host, port):
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect((socket.gethostbyname(host), int(port)))
    msg = """GET / HTTP/1.1
Host: www.bing.com
User-Agent: Firefox
"""
    sent_count = sock.send(msg)
    recv_value = sock.recv(2048)
    print 'received:'
    print str(recv_value)

if __name__ == '__main__':
    test(host='x.x.x.xx', port='80')    # an HTTPS proxy server
    test(host='yy.yy.yy.yy', port='80') # an HTTP proxy server
But when I connect to the HTTP proxy server, it returns something like:
HTTP/1.1 404 Not Found
And when I connect to the HTTPS proxy server, it shows something like:
HTTP/1.0 400 Bad Request
So I wanted to ask: does anybody know how I could send HTTP requests to HTTP/HTTPS servers via sockets in Python, or how I can forward arbitrary strings of data to HTTP/HTTPS proxy servers in general using sockets? Any suggestions are very much appreciated; thanks in advance.
For the HTTP 1.1 protocol, the connections are persistent (keep-alive).
The client should send a Connection: close header to close the connection.
In my Python program, this is what happens for a GET request. However, the connection for a HEAD request is closed even without the Connection: close header.
What is the issue?
I have also tested a Java version of a HEAD request, and the connection is persistent there.
Python program for a HEAD request:
#!/usr/bin/env python

import socket

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.connect(("webcode.me", 80))
    s.sendall(b"HEAD / HTTP/1.1\r\nHost: webcode.me\r\nAccept: text/html\r\n\r\n")
    print(str(s.recv(1024), 'utf-8'))
Python program for a GET request:
#!/usr/bin/env python

import socket

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.connect(("webcode.me", 80))
    s.sendall(b"GET / HTTP/1.1\r\nHost: webcode.me\r\nAccept: text/html\r\nConnection: close\r\n\r\n")
    # s.sendall(b"GET / HTTP/1.0\r\nHost: webcode.me\r\nAccept: text/html\r\n\r\n")

    while True:
        data = s.recv(512)
        if not data:
            break
        print(data.decode())
For the HTTP 1.1 protocol, the connections are persistent (keep-alive)
No, the connections can be persistent if the server also wants them to be. If the client signals support for persistence, the server might still decide to close the connection immediately, 5 seconds later, ... or never on its own.
However, a connection for a HEAD request is closed without the Connection:close header.
It is your client which is closing the connection, not the server. Your client does a single recv and is then done with the socket and the program. If one modified the code to continue calling recv until no more data can be read (similar to your second program), the client would hang, since the server is waiting for the next request from the client.
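For illustration, here is a minimal variant of the HEAD program (just a sketch, reusing the webcode.me example from above): it asks the server to close the connection via Connection: close and keeps reading until recv returns an empty result, which is how the client notices that it was the server that closed the connection:

import socket

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.connect(("webcode.me", 80))
    # explicitly ask the server to close the connection after the response
    s.sendall(b"HEAD / HTTP/1.1\r\nHost: webcode.me\r\nConnection: close\r\n\r\n")
    while True:
        data = s.recv(512)
        if not data:  # empty read: the server has closed the connection
            break
        print(data.decode())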
I'm creating an HTTP proxy in Python, but I'm having trouble: my proxy will only accept the web server's response and will completely ignore the browser's next request, so the transfer of data just stops. Here's the code:
import socket

s = socket.socket()
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
bhost = '192.168.1.115'
port = 8080
s.bind((bhost, port))
s.listen(5)

def server(sock, data, host):
    p = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    p.connect((host, 80))
    p.send(data)
    rdata = p.recv(1024)
    print(rdata)
    sock.send(rdata)

while True:
    sock, addr = s.accept()
    data = sock.recv(1024)
    host = data.splitlines()[1][6:]
    server(sock, data, host)
Sorry about the code; this is just a trial version. Help will be much appreciated, as I am only 14 and have much to learn :-)
Unfortunately, I don't really see how your code is supposed to work, so here are my thoughts on what a simple HTTP proxy might look like.
So what should a basic proxy server do:
Accept connection from a client and receive an HTTP request.
Parse the request and extract its destination.
Forward requests and responses.
(optionally) Support Connection: keep-alive.
Let's go step by step and write some very simplified code.
How does the proxy accept a client? A socket should be created and put into passive (listening) mode:
import socket, select

sock = socket.socket()
sock.bind((your_ip, port))
sock.listen()

while True:
    client_sock, addr = sock.accept()  # accept() returns (connection, address)
    do_stuff(client_sock)
Once the TCP connection is established, it's time to receive a request. Let's assume we're going to get something like this:
GET /?a=1&b=2 HTTP/1.1
Host: localhost
User-Agent: my browser details
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-gb,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
In TCP, message borders aren't preserved, so we should wait until we have received at least the first two lines (for a GET request) in order to know what to do next:
def do_stuff(sock):
    data = receive_two_lines(sock)
    remote_host = parse_request(data)
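receive_two_lines and parse_request are left as placeholders above; a rough, untested sketch of what they could do (the names and the naive parsing are assumptions, not a finished implementation) might look like this:

def receive_two_lines(sock):
    # keep reading until at least the request line and the Host header have arrived
    data = b''
    while data.count(b'\r\n') < 2:
        chunk = sock.recv(1024)
        if not chunk:
            break
        data += chunk
    return data

def parse_request(data):
    # very naive: return the value of the Host header
    for line in data.split(b'\r\n'):
        if line.lower().startswith(b'host:'):
            return line.split(b':', 1)[1].strip().decode()
    return None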
After we have got the remote hostname, it's time to forward the requests and responses:
def do_stuff(client_sock):
    data = receive_two_lines(client_sock)
    remote_host = parse_request(data)
    remote_ip = socket.getaddrinfo(remote_host)  # see the docs for exact use
    webserver = socket.socket()
    webserver.connect((remote_ip, 80))
    webserver.sendall(data)
    while it_makes_sense():
        # wait until either peer has data for us
        readable = select.select([client_sock, webserver], [], [])[0]
        if client_sock in readable:
            webserver.sendall(client_sock.recv(1024))
        if webserver in readable:
            client_sock.sendall(webserver.recv(1024))
Please note select: this is how we know whether a remote peer has sent us data. I haven't run or tested this code, and there are things left to do:
Chances are, you will get several GET requests in a single client_sock.recv(1024) call, because again, message borders aren't preserved in TCP. You should probably look for additional GET requests each time you receive data.
Requests differ for POST, HEAD, PUT, DELETE and other methods. Parse them accordingly.
Browsers and servers usually reuse one TCP connection by setting the Connection: keep-alive option in the headers, but they may also decide to drop it. Be ready to detect disconnects and sockets closed by the remote peer (for simplicity's sake, this is called while it_makes_sense() in the code).
bind, listen, accept, recv, send, sendall, getaddrinfo, select - all these functions can throw exceptions. It's better to catch them and act accordingly.
The code currently serves one client at a time (see the sketch below).
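As a sketch for that last point (my own suggestion, not part of the original answer), each accepted connection could be handed to its own thread so that one slow client does not block the others:

import threading

while True:
    client_sock, addr = sock.accept()
    # one thread per client; do_stuff is the handler defined above
    threading.Thread(target=do_stuff, args=(client_sock,), daemon=True).start()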
I am using Python to write a simple web server and sending requests to it. I use libevent as my HTTP client. But every time I send a keep-alive request, the HTTP connection gets the close callback before the success callback. I think it might be a keep-alive problem. This is my Python (server) code:
import socket

HOST, PORT = '', 8999

listen_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listen_socket.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
listen_socket.setsockopt(socket.SOL_TCP, socket.TCP_KEEPIDLE, 60)
listen_socket.setsockopt(socket.SOL_TCP, socket.TCP_KEEPCNT, 4)
listen_socket.setsockopt(socket.SOL_TCP, socket.TCP_KEEPINTVL, 15)
listen_socket.bind((HOST, PORT))
listen_socket.listen(1)
print 'Serving HTTP on port %s ...' % PORT

while True:
    client_connection, client_address = listen_socket.accept()
    request = client_connection.recv(1024)
    print request

    http_response = """\
HTTP/1.1 200 OK

Hello, World!
"""
    client_connection.sendall(http_response)
    client_connection.close()
But every time I send a keep-alive request, ...
I think you are mixing up the application layer HTTP keep-alive and the transport layer TCP keep-alive.
HTTP keep-alive is used by the client to suggest to the server that the underlying TCP connection should be kept open for further requests from the client. But the server might decline, and your server explicitly closes the connection after it has handled the client's request, i.e. finished sending the response. Apart from that, the server sends the response in a way that makes HTTP keep-alive impossible, because the length of the response is unknown and thus the response ends only with the end of the underlying TCP connection. To fix this you would need to specify a Content-Length or use chunked transfer encoding.
TCP keep-alive, in contrast, is used to detect a break in connectivity, i.e. one side crashed, a router died or similar. It is not related to HTTP keep-alive at all, apart from the similar name. It is set with setsockopt, and that is what you are doing. But there is no such thing as a keep-alive request that you can explicitly send in the case of TCP keep-alive.
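To illustrate the first point, a response with a known length could look roughly like this (a sketch only, reusing client_connection from the question; the exact headers are up to you):

body = b'Hello, World!\n'
http_response = (
    b'HTTP/1.1 200 OK\r\n'
    b'Content-Length: ' + str(len(body)).encode() + b'\r\n'
    b'Connection: keep-alive\r\n'
    b'\r\n'
) + body
client_connection.sendall(http_response)
# and do not close the connection here; instead wait for the next request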
To begin with, I understand there are other modules such as Requests that would be better suited and simpler to use, but I want to use the socket module to better understand HTTP.
I have a simple script that does the following:
Client ---> HTTP Proxy ---> External Resource (GET Google.com)
I am able to connect to the HTTP proxy alright, but when I send the GET request headers for google.com to the proxy, it doesn't serve me any response at all.
#!/usr/bin/python

import socket
import sys

headers = """GET / HTTP/1.1\r\n
Host: google.com\r\n\r\n"""

socket = socket
host = "165.139.179.225"  # proxy server IP
port = 8080               # proxy server port

try:
    s = socket.socket()
    s.connect((host, port))
    s.send(("CONNECT {0}:{1} HTTP/1.1\r\n" + "Host: {2}: {3}\r\n\r\n").format(socket.gethostbyname(socket.gethostname()), 1000, port, host))
    print s.recv(1096)
    s.send(headers)
    response = s.recv(1096)
    print response
    s.close()
except socket.error, m:
    print str(m)
    s.close()
    sys.exit(1)
To make an HTTP request through a proxy, open a connection to the proxy server and then send an HTTP proxy request. This request is mostly the same as a normal HTTP request, but contains the absolute URL instead of the relative URL, e.g.
> GET http://www.google.com HTTP/1.1
> Host: www.google.com
> ...
< HTTP response
To make an HTTPS request, open a tunnel using the CONNECT method and then proceed normally inside this tunnel, that is, do the SSL handshake and then a normal non-proxy request inside the tunnel, e.g.
> CONNECT www.google.com:443 HTTP/1.1
>
< .. read response to CONNECT request, must be 200 ...
.. establish the TLS connection inside the tunnel
> GET / HTTP/1.1
> Host: www.google.com
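In Python, the CONNECT variant could look roughly like the following sketch (the proxy address is a placeholder and error handling is minimal), using the ssl module for the handshake inside the tunnel:

import socket, ssl

proxy_host, proxy_port = '127.0.0.1', 8080   # placeholder proxy address
target = 'www.google.com'

s = socket.create_connection((proxy_host, proxy_port))
s.sendall(('CONNECT %s:443 HTTP/1.1\r\nHost: %s:443\r\n\r\n' % (target, target)).encode())

# read the proxy's reply to CONNECT; it must be a 200 response
reply = s.recv(4096)
if b' 200 ' not in reply.split(b'\r\n', 1)[0]:
    raise RuntimeError('CONNECT failed: %r' % reply)

# do the TLS handshake with the target server inside the tunnel
ctx = ssl.create_default_context()
tls = ctx.wrap_socket(s, server_hostname=target)
tls.sendall(b'GET / HTTP/1.1\r\nHost: www.google.com\r\nConnection: close\r\n\r\n')
print(tls.recv(4096).decode(errors='replace'))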
Python 3 requires the request to be encoded to bytes. Thus, expanding on David's original code, combined with Steffen's answer, here is the solution written for Python 3:
import socket
import sys

def connectThroughProxy():
    headers = """GET http://www.example.org HTTP/1.1
Host: www.example.org\r\n\r\n"""
    host = "192.97.215.348"  # proxy server IP
    port = 8080              # proxy server port
    try:
        s = socket.socket()
        s.connect((host, port))
        s.send(headers.encode('utf-8'))
        response = s.recv(3000)
        print(response)
        s.close()
    except socket.error as m:
        print(str(m))
        s.close()
        sys.exit(1)
This allows me to connect to the example.org host through my corporate proxy (at least for non SSL/TLS connections).
I am currently programming a proxy server using httplib,
and when I try to connect to HTTPS websites (such as facebook and google) my client sends me "CONNECT" requests that look like this:
CONNECT www.google.co.il:443 HTTP/1.1\r\n
User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:35.0) Gecko/20100101 Firefox/35.0\r\n
Proxy-Connection: keep-alive\r\n
Connection: keep-alive\r\n
Host: www.google.co.il:443\r\n
\r\n
I took a working proxy from the internet, ran it, and sniffed the network in Wireshark; the response to this request should look like this:
HTTP/1.1 200 Connection established\n
Proxy-agent: Python Proxy/0.1.0 Draft 1\n
\n
I noticed that the client sends the request to the proxy itself, so I decided to use a socket and send the response to the client this way:
if getmethod(clientreq) is "CONNECT":
    text = "HTTP/1.1 200 Connection established\nProxy-Agent: THE BB Proxy\n\n"
    client.send(text)
I really hoped that handling those "CONNECT" requests would be the solution and that my server would finally handle HTTPS requests, but it doesn't, and the response packets that I send to the client don't even appear in Wireshark.
So my questions are:
1. What does the "CONNECT" method really do?
2. What else do I need except handling "CONNECT" method requests in order to communicate with a HTTPS servers?
I am replying after all this time because I recently worked with this concept. It may help others.
To handle the CONNECT HTTP method, the proxy needs to create a socket connection to the server's HTTPS port (e.g. 443). Once the connection is established, you can send "HTTP/1.1 200 Connection established" as the response.
After this, the client and server communicate with each other through the proxy. The proxy just has to transfer data from the client socket to the server socket and vice versa. The client and server will exchange certificate information during the handshake; once the handshake is done, they exchange data in encrypted form, so the proxy will not be able to understand anything.
The following code may help you.
def _read_write(self):
    socs = [self.client, self.target]
    count = 0
    while 1:
        count += 1
        (recv, _, error) = select.select(socs, [], socs, 3)
        if error:
            break
        if recv:
            for in_ in recv:
                data = in_.recv(BUFLEN)
                if in_ is self.client:
                    out = self.target
                else:
                    out = self.client
                if data:
                    out.send(data)
                    print(data)
                    count = 0
        if count == time_out_max:
            break
Hope this answer helps anyone in need, as I had to go through a lot of things to find it.
I ran into basically the same problem, and the way I finally solved it was to look for sample code on GitHub. It turns out that the proxy2 project is quite helpful. Here is some relevant code that is pretty similar to rushikesh's answer:
def connect_relay(self):
    address = self.path.split(':', 1)
    address[1] = int(address[1]) or 443
    try:
        s = socket.create_connection(address, timeout=self.timeout)
    except Exception as e:
        self.send_error(502)
        return
    self.send_response(200, 'Connection Established')
    self.end_headers()

    conns = [self.connection, s]
    self.close_connection = 0
    while not self.close_connection:
        rlist, wlist, xlist = select.select(conns, [], conns, self.timeout)
        if xlist or not rlist:
            break
        for r in rlist:
            other = conns[1] if r is conns[0] else conns[0]
            data = r.recv(8192)
            if not data:
                self.close_connection = 1
                break
            other.sendall(data)
You can find more information in the repo.