Cache Proxy Server in Python

I have a homework assignment that involves implementing a proxy cache server in Python. The idea is to write the web pages I access to temporary files on my local machine and then, when later requests come in for pages that are already stored, serve them from those files. Right now the code looks like this:
from socket import *
import sys

def main():
    # Create a server socket, bind it to a port and start listening
    tcpSerSock = socket(AF_INET, SOCK_STREAM)  # Initializing socket
    tcpSerSock.bind(("", 8030))  # Binding socket to port
    tcpSerSock.listen(5)  # Listening for page requests
    while True:
        # Start receiving data from the client
        print 'Ready to serve...'
        tcpCliSock, addr = tcpSerSock.accept()
        print 'Received a connection from:', addr
        message = tcpCliSock.recv(1024)
        print message
        # Extract the filename from the given message
        print message.split()[1]
        filename = message.split()[1].partition("/")[2]
        print filename
        fileExist = "false"
        filetouse = "/" + filename
        print filetouse
        try:  # Check whether the file exists in the cache
            f = open(filetouse[1:], "r")
            outputdata = f.readlines()
            fileExist = "true"
            # ProxyServer finds a cache hit and generates a response message
            tcpCliSock.send("HTTP/1.0 200 OK\r\n")
            tcpCliSock.send("Content-Type:text/html\r\n")
            for data in outputdata:
                tcpCliSock.send(data)
            print 'Read from cache'
        except IOError:  # Error handling for file not found in cache
            if fileExist == "false":
                c = socket(AF_INET, SOCK_STREAM)  # Create a socket on the proxyserver
                hostn = filename.replace("www.","",1)
                print hostn
                try:
                    c.connect((hostn, 80))  # https://docs.python.org/2/library/socket.html
                    # Create a temporary file on this socket and ask port 80 for
                    # the file requested by the client
                    fileobj = c.makefile('r', 0)
                    fileobj.write("GET " + "http://" + filename + "HTTP/1.0\r\n")
                    # Read the response into buffer
                    buffr = fileobj.readlines()
                    # Create a new file in the cache for the requested file.
                    # Also send the response in the buffer to client socket and the
                    # corresponding file in the cache
                    tmpFile = open(filename,"wb")
                    for data in buffr:
                        tmpFile.write(data)
                        tcpCliSock.send(data)
                except:
                    print "Illegal request"
            else:  # File not found
                print "404: File Not Found"
        tcpCliSock.close()  # Close the client and the server sockets

main()
To test my code, I run the proxy cache on my localhost and set my browser's proxy settings accordingly.
However, when I run this code and try to access Google with Chrome, I'm greeted with an error page saying ERR_EMPTY_RESPONSE.
Stepping through the code with the debugger, I found that it fails on this line:
c.connect((hostn, 80))
and I have no idea why. Any help would be greatly appreciated.
P.S. I'm testing this with Google Chrome, Python 2.7, and Windows 10

Don't rely on connect() to resolve the name for you. On an AF_INET socket, connect() does accept a hostname, but it only tries a single IPv4 address for it.
You can get the socket information you need to build the connection using getaddrinfo(). In my pure-python-whois package I used the following code to create a connection:
def _openconn(self, server, timeout, port=None):
    port = port if port else 'nicname'
    try:
        for srv in socket.getaddrinfo(server, port, socket.AF_UNSPEC, socket.SOCK_STREAM, 0, socket.AI_ADDRCONFIG):
            af, socktype, proto, _, sa = srv
            try:
                c = socket.socket(af, socktype, proto)
            except socket.error:
                c = None
                continue
            try:
                if self.source_addr:
                    c.bind(self.source_addr)
                c.settimeout(timeout)
                c.connect(sa)
            except socket.error:
                c.close()
                c = None
                continue
            break
    except socket.gaierror:
        return False
    return c
Note that this isn't great code. The point of the loop is to try each alternative address in turn and break only once a connection has actually been established; when every attempt fails, the function just returns None and the real error is lost. Still, it should work as an illustration of how to use getaddrinfo().
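A self-contained variant of the same idea is sketched below, written for Python 3 and without the class context (so there is no self.source_addr). The function name open_connection is hypothetical. It tries every address getaddrinfo() returns, moves on to the next one when a connect fails, and raises the last error instead of returning None when nothing connects.

```python
import socket


def open_connection(host, port, timeout=10.0):
    """Try every address getaddrinfo() returns for (host, port).

    Returns the first socket that connects; if all of them fail, re-raises
    the last error. getaddrinfo() itself raises socket.gaierror for names
    that cannot be resolved.
    """
    last_error = None
    for family, socktype, proto, _canonname, sockaddr in socket.getaddrinfo(
            host, port, socket.AF_UNSPEC, socket.SOCK_STREAM):
        sock = None
        try:
            sock = socket.socket(family, socktype, proto)
            sock.settimeout(timeout)
            sock.connect(sockaddr)
            return sock  # stop at the first address that actually connects
        except OSError as err:  # socket.error is an alias of OSError in Python 3
            last_error = err
            if sock is not None:
                sock.close()
    if last_error is None:
        raise OSError("getaddrinfo returned no addresses for %r" % (host,))
    raise last_error
```

In practice the standard library's socket.create_connection((host, port), timeout) implements essentially this loop, so you can usually call it directly.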
EDIT:
You are also not extracting the hostname correctly. When I try to access http://www.example.com/, I get /www.example.com/, which obviously won't resolve. I'd suggest using a regular expression to get the file name for your cache.
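When a browser is configured to use a proxy, its request line carries an absolute URL, for example GET http://www.example.com/ HTTP/1.1, which is why partition("/")[2] yields /www.example.com/. As an alternative to the regular expression suggested above (a swapped-in technique, not the answerer's), the standard library's urllib.parse.urlsplit can pull the host, port and path out of that URL. A minimal Python 3 sketch; parse_proxy_request is a hypothetical helper name, and it assumes a plain http:// request line with exactly three parts.

```python
from urllib.parse import urlsplit


def parse_proxy_request(request_line):
    """Split a proxy request line such as
    'GET http://www.example.com/index.html HTTP/1.1'
    into (method, host, port, path)."""
    method, target, _version = request_line.split()
    parts = urlsplit(target)
    if parts.scheme != "http" or not parts.hostname:
        raise ValueError("expected an absolute http:// URL, got %r" % (target,))
    path = parts.path or "/"          # an empty path means the site root
    if parts.query:
        path += "?" + parts.query     # keep the query string for the origin server
    return method, parts.hostname, parts.port or 80, path
```

The returned host is what belongs in connect(), and the path is what belongs on the GET line forwarded to the origin server; the port falls back to 80 when the URL doesn't name one.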

Related

Basic python client/server that reads HTML body from server.

I am writing a very simple Python socket program to read an HTML body from the server. If I create a HelloWorld.html file and open it with the designated host and port, I can open the file in my browser with the following server and read the message in the HTML file. However, I am having trouble reading in the same information from my client.
Server
from socket import *

serverSocket = socket(AF_INET, SOCK_STREAM)
host = '127.0.0.1'
port = 6789
serverSocket.bind((host, port))
serverSocket.listen(5)
print("server started...")
(connectionSocket, addr) = serverSocket.accept()
try:
    message = connectionSocket.recv(1024).decode()
    filename = message.split()[1]
    f = open(filename[1:])  # Throws IOError if file not found
    print(filename, "found")
    connectionSocket.send("HTTP/1.0 200 OK\r\n".encode())
    connectionSocket.send("Content-Type: text/html\r\n".encode())
    connectionSocket.send(message.encode())
    outputdata = f.read()
    for i in range(0, len(outputdata)):
        connectionSocket.send(outputdata[i].encode())
    connectionSocket.send("\r\n".encode())
    connectionSocket.close()
    print(filename, "delivered")
except IOError:
    print(filename, "NOT found")
    connectionSocket.send('HTTP/1.0 404 NOT FOUND\r\n')
    connectionSocket.close()
    print("file not found message delivered")
serverSocket.close()
print("server closed...")
My server seems to be working. However, when my client sends the HTML object path to the socket for the server to read, the server does not seem to pick up the message. I have just started socket programming in Python and I am trying to understand how the server receives the message from the socket. My initial thought was that if I send the path of the HTML object (located in the same directory as the client and server) to the socket, the server should be able to read that information, open the file, and return its contents to the client.
Client
from socket import *
import sys

client = socket(AF_INET, SOCK_STREAM)
host = sys.argv[1]
port = sys.argv[2]
obj = sys.argv[3]
port = int(port)
client.connect((host, port))
print(client.getsockname())
request = obj
client.send("hello".encode())
client.send(request.encode())
s = client.recv(1024).decode()
print(s)
For my client, I take the host, the port, and the path to the HTML file as command-line arguments and establish a connection.
When I open the file in the browser with the URL http://127.0.0.1:6789/HelloWorld.html, the server responds correctly. However, when I start the server and run the client from the shell with py capClient.py 127.0.0.1 6789 HelloWorld.html, the server fails on filename = message.split()[1] with IndexError: list index out of range. I assume the problem is that the server cannot split the message coming in on connectionSocket into an acceptable HTML object path.
What are some tips for modifying the client code so that it receives the HTML file from the server?
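The trouble is that you expect the message string to be 'hello HelloWorld.html', but it is actually 'helloHelloWorld.html': nothing separates the two send() calls, and TCP does not preserve message boundaries. So split() gives you the list ['helloHelloWorld.html'], where index 1 does not exist.
# Replace:
f = open(filename[1:])
# with the line below. This client sends the path without a leading "/", so
# [1:] would cut off the "H" and look for "elloWorld.html" (slicing a string
# gives a string, not a list). lstrip("/") works for both this client and a browser.
f = open(filename.lstrip("/"))
# In Python 3, send() needs bytes, so encode() is needed here too:
connectionSocket.send('HTTP/1.0 404 NOT FOUND\r\n'.encode())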
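The more fundamental fix is on the client side: it should send what a browser sends, an HTTP request line such as GET /HelloWorld.html HTTP/1.0, so that message.split()[1] on the server is the path. A minimal Python 3 sketch, assuming the server from the question is running; the function name fetch is hypothetical, and the Host header is added as good practice rather than because this server reads it. It uses sendall() and keeps calling recv() until the server closes the connection, because a single recv(1024) may return only part of the response.

```python
import socket
import sys


def fetch(host, port, path):
    """Send a minimal HTTP/1.0 GET for `path` and return the raw response bytes."""
    if not path.startswith("/"):
        path = "/" + path  # the server strips the leading "/" with filename[1:]
    request = "GET {} HTTP/1.0\r\nHost: {}\r\n\r\n".format(path, host)
    chunks = []
    with socket.create_connection((host, port)) as client:
        client.sendall(request.encode("ascii"))  # sendall retries until every byte is sent
        while True:
            chunk = client.recv(4096)
            if not chunk:  # empty bytes: the server closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks)


# Usage: py capClient.py 127.0.0.1 6789 HelloWorld.html
if __name__ == "__main__" and len(sys.argv) >= 4:
    host, port, obj = sys.argv[1], int(sys.argv[2]), sys.argv[3]
    print(fetch(host, port, obj).decode("utf-8", errors="replace"))
```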

Python proxy server fails to connect to host

I'm making a Python proxy server for a school assignment, and I've got the code below. When I run it in my command prompt and try to connect to Google, the code doesn't make it past connecting the server socket, but the page still loads. I honestly have no idea why it doesn't get through the connection step. Thoughts?
EDIT: And yes, there have been other homework posts about this, but none of them seem to address the fact that the sys.exit() on line 8 ends the script (to my knowledge, anyway), and whenever we comment it out, the script still does not get past connecting the server socket and hits the "Illegal request" exception.
from socket import *
from urllib2 import HTTPError  # Used for 404 Not Found error
import sys
import requests

if len(sys.argv) <= 1:
    print 'Usage : "python ProxyServer.py server_ip"\n[server_ip : It is the IP Address Of Proxy Server]'
    #sys.exit(2)

# POST request extension
print 'Fetching webpage using POST'
r = requests.post('http://httpbin.org/post', data = {'key':'value'})
print 'Printing webpage body'
print r.text
print 'Creating and binding socket for proxy server'
# Create a server socket, bind it to a port and start listening
tcpServerSock = socket(AF_INET, SOCK_STREAM)
# Fill in start.
tcpServerSock.bind(('',8888))
tcpServerSock.listen(10)  # the number is the maximum number of connections we want to have
# Fill in end.
while 1:
    # Start receiving data from the client
    print 'Ready to serve...'
    tcpClientSock, addr = tcpServerSock.accept()
    print 'Received a connection from:', addr
    # Fill in start.
    message = tcpClientSock.recv(4096)  # receive data with buffer size 4096
    # Fill in end.
    print 'Printing message'
    print message
    # Extract the filename from the given message
    print message.split()[1]
    filename = message.split()[1].partition("/")[2]
    print '\n'
    print 'Printing file name'
    print filename
    fileExist = "false"
    filetouse = "/" + filename
    print '\n'
    print 'Printing file to use'
    print filetouse
    print '\n'
    try:
        # Check whether the file exists in the cache
        f = open(filetouse[1:], "r")
        outputdata = f.readlines()
        fileExist = "true"
        # ProxyServer finds a cache hit and generates a response message
        tcpClientSock.send("HTTP/1.0 200 OK\r\n")
        tcpClientSock.send("Content-Type:text/html\r\n")
        # Fill in start.
        for x in range(0,len(outputdata)):
            tcpClientSock.send(outputdata[x])
        # Fill in end.
        print 'Read from cache\n'
    # Error handling for file not found in cache
    except IOError:
        if fileExist == "false":
            # Create a socket on the proxyserver
            # Fill in start.
            print 'Creating server socket\n'
            c = socket(AF_INET, SOCK_STREAM)
            # Fill in end.
            hostn = filename
            #hostn = filename.replace("www.","",1)
            print 'Printing host to connect'
            print hostn
            print '\n'
            print 'Attempting to connect to hostn\n'
            try:
                # Connect to the socket to port 80
                # Fill in start.
                c.connect((hostn,80))  # port 80 is used for http web pages
                # Fill in end.
                # Create a temporary file on this socket and ask port 80
                # for the file requested by the client
                fileobj = c.makefile('r', 0)
                fileobj.write("GET "+"http://" + filename + "HTTP/1.0\n\n")
                # Show what request was made
                print "GET "+"http://" + filename + " HTTP/1.0"
                # Read the response into buffer
                # Fill in start.
                buff = fileobj.readlines()  # reads until EOF and returns a list with the lines read
                # Fill in end.
                # Create a new file in the cache for the requested file.
                # Also send the response in the buffer to client socket
                # and the corresponding file in the cache
                tmpFile = open("./" + filename,"wb")  # creates the temp file for the requested file
                # Fill in start.
                for x in range(0, len(buff)):
                    tmpFile.write(buff[x])  # writes the buffer response into the temp file (cache?)
                    tcpClientSock.send(buff[x])  # sends the response saved in the buffer to the client
                # Fill in end.
                tmpFile.close()
            except:
                print "Illegal request\n"
        else:
            # HTTP response message for file not found
            # Fill in start.
            print 'File not found'
            # Fill in end.
    # 404 not found error handling
    except HTTPError as e:
        print 'The server couldn\'t fulfill the request.'
        print 'Error code: ', e.code
    # Close the client and the server sockets
    tcpClientSock.close()
# Fill in start.
tcpServerSock.close()
# Fill in end
I'm aware this question is old, and Jose M's assignment is probably long past due.
if len(sys.argv) <= 1: checks whether an additional argument was passed, namely the IP address of the server. Commenting out the exit essentially removes that error checking.
A fix for the code above is to change the bind call from tcpServerSock.bind(('', 8888)) to tcpServerSock.bind((sys.argv[1], 8888)).
You must then call the script correctly: python ProxyServer.py 127.0.0.1.
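Restoring the check and binding to the address given on the command line might look like the Python 3 sketch below. make_listener and USAGE are hypothetical names; port 8888 and the backlog of 10 come from the question, and SO_REUSEADDR is an extra convenience (not in the original) so the proxy can be restarted immediately after it exits.

```python
import socket
import sys

USAGE = 'Usage: python ProxyServer.py server_ip'


def make_listener(argv, port=8888, backlog=10):
    """Bind the proxy's listening socket to the IP passed on the command line.

    Without the IP argument, exit with the usage message instead of carrying on.
    """
    if len(argv) <= 1:
        sys.exit(USAGE)  # prints USAGE to stderr and exits with status 1
    server_ip = argv[1]
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Allow an immediate restart without "Address already in use".
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((server_ip, port))
    listener.listen(backlog)
    return listener
```

At the top of the script this would be tcpServerSock = make_listener(sys.argv), started with python ProxyServer.py 127.0.0.1.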

socket.makefile issues in Python 3 while creating an HTTP proxy

from socket import *
import sys

# Create a server socket, bind it to a port and start listening
tcpSerSock = socket(AF_INET, SOCK_STREAM)
serverPort = 12000
tcpSerSock.bind(('', serverPort))
tcpSerSock.listen(1)
print ("Server ready")
while 1==1:
    # Start receiving data from the client. e.g. request = "GET http://localhost:portNum/www.google.com"
    tcpCliSock, addr = tcpSerSock.accept()
    print ('Received a connection from:', addr)
    request = str(tcpCliSock.recv(1024).decode())
    print ("Requested " + request)
    # Extract the file name from the given request
    fileName = request.split()[1]
    print ("File name is " + fileName)
    fileExist = "false"
    fileToUse = "/" + fileName
    print ("File to use: " + fileToUse)
    try:
        # Check whether the file exists in the cache. The open will fail and go to "except" in case the file doesn't exist. Similar to try/catch in Java
        f = open(fileToUse[1:], "r")
        outputData = f.readlines()
        fileExist = "true"
        # ProxyServer finds a cache hit and generates a response message
        tcpCliSock.send("HTTP/1.1 200 OK\r\n")
        tcpCliSock.send("Content-Type:text/html\r\n")
        tcpCliSock.send(outputData)
        print ('This was read from cache')
    except IOError:
        if fileExist == "false":
            # Create a socket on the proxyserver
            c = socket(AF_INET, SOCK_STREAM)
            hostn = fileName.replace("www.","",1)  # max arg specified to 1 in case the webpage contains "www." other than the usual one
            print (hostn)
            try:
                # Connect to the socket to port 80
                c.bind(('', 80))
                # Create a temporary file on this socket and ask port 80 for the file requested by the client
                print("premake")
                fileObj = c.makefile('r', 0)
                print("postmake")
                fileObj.write("GET " + "http://" + fileName + " HTTP/1.1\r\n")
                # Read the response into buffer
                print("post write")
                buff = fileObj.readlines()
                # Create a new file in the cache for the requested file.
                tmpFile = open("./" + filename,"wb")
                # Send the response in the buffer to both client socket and the corresponding file in the cache
                for line in buff:
                    tmpFile.write(line)
                    tcpCliSock.send(tmpFile)
            except:
                print ("Illegal request")
                break
        else:
            # HTTP response message for file not found
            print("HTTP response Not found")
    # Close the client and the server sockets
    tcpCliSock.close()
#tcpSerSock.close()
The code never gets through the try block inside except IOError. The problem seems to be the socket.makefile(mode, buffsize) function, which is poorly documented for Python 3. I tried passing 'rwb', 'r+', 'r+b' and so on, but at best I managed to create the file object and was then unable to write to it.
This is a Python 2.7 vs. Python 3 issue. While makefile('r', 0) works in Python 2.7, in Python 3 it raises ValueError, because an unbuffered stream (buffering=0) is only allowed in binary mode. Use makefile('r', None) (default buffering) for a text file, or a binary mode such as 'rwb' if you need an unbuffered one.
From the documentation for Python 2.7:
socket.makefile([mode[, bufsize]])
From the documentation for Python 3:
socket.makefile(mode='r', buffering=None, *, encoding=None, errors=None, newline=None)
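To see the Python 3 behaviour without any network access, the sketch below uses socket.socketpair() (assumed Python 3.5+, where it is available on all platforms) to get two already-connected local sockets. It shows that makefile('r', 0) is rejected, and that a buffered binary file object must be flushed before the other end can read what was written. The helper names roundtrip and unbuffered_text_is_rejected are illustrative only. Separately, note that the code in the question calls c.bind(('', 80)) where it presumably means c.connect((hostn, 80)); writes on a socket that was never connected fail no matter which mode makefile() gets.

```python
import socket


def roundtrip(payload):
    """Write `payload` through one end of a connected socket pair via
    makefile() and read the first line back from the other end."""
    left, right = socket.socketpair()  # two sockets already connected to each other
    try:
        writer = left.makefile("wb")   # buffered binary writer
        reader = right.makefile("rb")  # buffered binary reader
        writer.write(payload)
        writer.flush()                 # buffered writes stay local until flushed
        line = reader.readline()       # reads up to and including b"\n"
        writer.close()
        reader.close()
        return line
    finally:
        left.close()
        right.close()


def unbuffered_text_is_rejected():
    """Python 3 refuses buffering=0 in text mode, unlike Python 2."""
    a, b = socket.socketpair()
    try:
        a.makefile("r", 0)
    except ValueError:  # "unbuffered streams must be binary"
        return True
    finally:
        a.close()
        b.close()
    return False
```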

Python socket programming for Webserver

I have created a proxy server that receives requests and searches its cache for the requested file. If the file is available, it returns the cached copy. If not, it asks the actual server for the file, stores it in the cache, and returns it to the client.
Following is the code:
from socket import *
import sys

if len(sys.argv) <= 1:
    print 'Usage : "python ProxyServer.py server_ip"\n[server_ip : It is the IP Address Of Proxy Server'
    sys.exit(2)
# Create a server socket, bind it to a port and start listening
tcpSerSock = socket(AF_INET, SOCK_STREAM)
tcpSerSock.bind((sys.argv[1], 8888))
tcpSerSock.listen(100)
while 1:
    # Start receiving data from the client
    print 'Ready to serve...'
    tcpCliSock, addr = tcpSerSock.accept()
    print 'Received a connection from:', addr
    message = tcpCliSock.recv(1024)
    print message
    # Extract the filename from the given message
    print message.split()[1]
    filename = message.split()[1].partition("/")[2]
    print filename
    fileExist = "false"
    filetouse = "/" + filename
    print filetouse
    try:
        # Check whether the file exists in the cache
        f = open(filetouse[1:], "r")
        outputdata = f.readlines()
        fileExist = "true"
        # ProxyServer finds a cache hit and generates a response message
        tcpCliSock.send("HTTP/1.0 200 OK\r\n")
        tcpCliSock.send("Content-Type:text/html\r\n")
        for i in range(0, len(outputdata)):
            tcpCliSock.send(outputdata[i])
        print 'Read from cache'
    # Error handling for file not found in cache
    except IOError:
        if fileExist == "false":
            # Create a socket on the proxyserver
            c = socket(AF_INET, SOCK_STREAM)
            hostn = filename.replace("www.","",1)
            print hostn
            try:
                # Connect to the socket to port 80
                c.connect((hostn, 80))
                # Create a temporary file on this socket and ask port 80 for the file requested by the client
                fileobj = c.makefile('r', 0)
                fileobj.write("GET "+"http://" + filename + " HTTP/1.0\n\n")
                # Read the response into buffer
                buff = fileobj.readlines()
                # Create a new file in the cache for the requested file. Also send the response in the buffer to client socket and the corresponding file in the cache
                tmpFile = open("./" + filename,"wb")
                for line in buff:
                    tmpFile.write(line);
                    tcpCliSock.send(line);
            except:
                print "Illegal request"
        else:
            # HTTP response message for file not found
            tcpCliSock.send("HTTP/1.0 404 sendErrorErrorError\r\n")
            tcpCliSock.send("Content-Type:text/html\r\n")
            tcpCliSock.send("\r\n")
    # Close the client and the server sockets
    tcpCliSock.close()
tcpSerSock.close()
But for every file I request, the only thing printed is "Illegal request". It seems the proxy server is not actually able to retrieve the file the client requested. Can someone tell me how I can improve the code?
This is the first time I am coding in Python so please mention any minor errors.
Your request is illegal. For normal HTTP servers, the GET line must not contain a full URL, only the path. The rest of your proxy also contains many errors. You probably want to use sendall everywhere you use send. recv can return less than one whole message, so you have to handle that case too.
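Two of those points, sketched in Python 3 under the assumption that the proxy has already worked out the host and path: recv_until_blank_line keeps reading until the blank line that ends the request headers instead of trusting a single recv(), and origin_request builds the forwarded request with only the path on the GET line. Both names are hypothetical, and the 64 KiB header limit is an arbitrary safety cap. Whatever the proxy forwards should then go out with sendall().

```python
import socket


def recv_until_blank_line(sock, limit=65536):
    """Read from `sock` until the blank line that ends the HTTP headers.

    A single recv() may return only part of the request, so keep reading.
    """
    data = b""
    while b"\r\n\r\n" not in data:
        chunk = sock.recv(4096)
        if not chunk:  # peer closed the connection before finishing the headers
            break
        data += chunk
        if len(data) > limit:
            raise ValueError("request headers too large")
    return data


def origin_request(host, path):
    """Build the request forwarded to the origin server: path only, no http://host."""
    return ("GET %s HTTP/1.0\r\n"
            "Host: %s\r\n"
            "Connection: close\r\n"
            "\r\n" % (path, host)).encode("ascii")
```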
Why do you use the strings "true" and "false" instead of True and False?
There is a security hole: anyone can read any file on your computer through your proxy. Reading binary files won't work either, because the cache file is opened in text mode ("r"). And you never close the files you open.
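One way to close the file-access hole, sketched in Python 3: derive each cache file name by hashing the full URL, so a request can never name an arbitrary path, and read and write the cache in binary mode inside with blocks so the files are always closed. cache_path, read_cached and store_cached are hypothetical helpers, and SHA-256 is an arbitrary choice of hash.

```python
import hashlib
import os


def cache_path(cache_dir, url):
    """Map a URL to a file directly inside cache_dir.

    The name is a hex digest, so '../' tricks in the URL cannot escape cache_dir.
    """
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return os.path.join(cache_dir, digest)


def read_cached(cache_dir, url):
    """Return the cached bytes for url, or None on a cache miss."""
    try:
        with open(cache_path(cache_dir, url), "rb") as f:  # binary mode, closed automatically
            return f.read()
    except FileNotFoundError:
        return None


def store_cached(cache_dir, url, body):
    """Save the response bytes for url in the cache."""
    os.makedirs(cache_dir, exist_ok=True)
    with open(cache_path(cache_dir, url), "wb") as f:
        f.write(body)
```

Because read_cached returns None on a miss, the caller can test it directly instead of tracking the strings "true" and "false".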

Python WebServer breaks with .send message?

Inserting a .send call to send an OK message apparently makes the rest of the code stop working.
If I remove the header send() calls from the following code, it works. With them, nothing shows up in the browser: Firefox says the request went through, but no page is displayed; it's just blank. Why would these .send calls cause nothing to appear?
from socket import *

server = socket(AF_INET, SOCK_STREAM)
port = 12030
server.bind((gethostname(), port))
server.listen(1)
while True:
    print 'Ready to serve'
    conection, addr = server.accept()
    try:
        print 'Working'
        message = conection.recv(1024)
        conection.send("HTTP/1.0 200 OK\r\n")
        conection.send("Content-Type:text/html\r\n")
        filename = message.split()[1]
        print "FILENAME", filename
        f = open(filename[1:])  # cuts off the '/' in the request page
        outputdata = f.read()
        print "OUTDATA: ", outputdata
        for i in range(0, len(outputdata)):
            conection.send(outputdata[i])
        conection.close()
    except IOError:
        print 'IO ERROR'
        conection.send("404 NOT FOUND")
        print message
        conection.close()
    except KeyboardInterrupt:
        server.close()
        conection.close()
        break;
As the console output below shows, the server itself seems unaffected: it finds and reads the file as usual.
user ##$$ python webServer.py
Ready to serve
Working
FILENAME /HelloWorld.html
OUTDATA: <html>Hello World</html>
Ready to serve
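One likely cause: HTTP requires an empty line (\r\n) after the last header, and this code never sends one, so the browser treats the HTML as more header lines and has no body to display. Without the header sends, the browser instead shows the raw HTML as the whole response, which is why removing them appears to work. The 404 branch has a related problem, since "404 NOT FOUND" on its own is not a valid status line. A Python 3 sketch of building a complete response; build_response is a hypothetical helper, and the Content-Length header is an addition not present in the original code.

```python
def build_response(body, content_type="text/html"):
    """Return a complete HTTP/1.0 response as bytes: status line, headers,
    the blank line that ends the headers, then the body."""
    if isinstance(body, str):
        body = body.encode("utf-8")
    head = ("HTTP/1.0 200 OK\r\n"
            "Content-Type: {}\r\n"
            "Content-Length: {}\r\n"
            "\r\n").format(content_type, len(body))  # the final "\r\n" is the blank line
    return head.encode("ascii") + body
```

The whole response can then be written in one call, for example conection.sendall(build_response(outputdata)), instead of sending one character at a time.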
