I'm making a Python proxy server for a school assignment and I've got the code below. When I run it from my command prompt and attempt to connect to Google, the code doesn't make it past connecting the server socket, but the page still loads. I honestly have no idea why it doesn't even get through the connection step. Thoughts?
EDIT: And yeah, there have been other homework posts about this, but none of them seem to address the fact that the sys.exit() on line 8 ends the script (to my knowledge anyway), and whenever we comment it out, the script still does not get past connecting the server socket and hits the "Illegal request" exception.
from socket import *
from urllib2 import HTTPError #Used for 404 Not Found error
import sys
import requests
if len(sys.argv) <= 1:
    print 'Usage : "python ProxyServer.py server_ip"\n[server_ip : It is the IP Address Of Proxy Server]'
    #sys.exit(2)
#POST request extension
print 'Fetching webpage using POST'
r = requests.post('http://httpbin.org/post', data = {'key':'value'})
print 'Printing webpage body'
print r.text
print 'Creating and binding socket for proxy server'
# Create a server socket, bind it to a port and start listening
tcpServerSock = socket(AF_INET, SOCK_STREAM)
# Fill in start.
tcpServerSock.bind(('',8888))
tcpServerSock.listen(10) #the number is the maximum number of connections we want to have
# Fill in end.
while 1:
    # Start receiving data from the client
    print 'Ready to serve...'
    tcpClientSock, addr = tcpServerSock.accept()
    print 'Received a connection from:', addr
    # Fill in start.
    message = tcpClientSock.recv(4096) #receive data with buffer size 4096
    # Fill in end.
    print 'Printing message'
    print message
    # Extract the filename from the given message
    print message.split()[1]
    filename = message.split()[1].partition("/")[2]
    print '\n'
    print 'Printing file name'
    print filename
    fileExist = "false"
    filetouse = "/" + filename
    print '\n'
    print 'Printing file to use'
    print filetouse
    print '\n'
    try:
        # Check whether the file exist in the cache
        f = open(filetouse[1:], "r")
        outputdata = f.readlines()
        fileExist = "true"
        # ProxyServer finds a cache hit and generates a response message
        tcpClientSock.send("HTTP/1.0 200 OK\r\n")
        tcpClientSock.send("Content-Type:text/html\r\n")
        # Fill in start.
        for x in range(0,len(outputdata)):
            tcpClientSock.send(outputdata[x])
        # Fill in end.
        print 'Read from cache\n'
    # Error handling for file not found in cache
    except IOError:
        if fileExist == "false":
            # Create a socket on the proxyserver
            # Fill in start.
            print 'Creating server socket\n'
            c = socket(AF_INET, SOCK_STREAM)
            # Fill in end.
            hostn = filename
            #hostn = filename.replace("www.","",1)
            print 'Printing host to connect'
            print hostn
            print '\n'
            print 'Attempting to connect to hostn\n'
            try:
                # Connect to the socket to port 80
                # Fill in start.
                c.connect((hostn,80)) #port 80 is used for http web pages
                # Fill in end.
                # Create a temporary file on this socket and ask port 80
                # for the file requested by the client
                fileobj = c.makefile('r', 0)
                fileobj.write("GET "+"http://" + filename + "HTTP/1.0\n\n")
                # Show what request was made
                print "GET "+"http://" + filename + " HTTP/1.0"
                # Read the response into buffer
                # Fill in start.
                buff = fileobj.readlines() #reads until EOF and returns a list with the lines read
                # Fill in end.
                # Create a new file in the cache for the requested file.
                # Also send the response in the buffer to client socket
                # and the corresponding file in the cache
                tmpFile = open("./" + filename,"wb") #creates the temp file for the requested file
                # Fill in start.
                for x in range(0, len(buff)):
                    tmpFile.write(buff[x]) #writes the buffer response into the temp file (cache?)
                    tcpClientSock.send(buff[x]) #sends the response saved in the buffer to the client
                # Fill in end.
                tmpFile.close()
            except:
                print "Illegal request\n"
        else:
            # HTTP response message for file not found
            # Fill in start.
            print 'File not found'
            # Fill in end.
    #404 not found error handling
    except HTTPError as e:
        print 'The server couldn\'t fulfill the request.'
        print 'Error code: ', e.code
    # Close the client and the server sockets
    tcpClientSock.close()
# Fill in start.
tcpServerSock.close()
# Fill in end
I'm aware this question is old, and Jose M's assignment is probably long past due.
The check if len(sys.argv) <= 1: looks for an additional argument that must be passed to the script: the IP address of the server. Commenting out the exit essentially removes that error checking.
A fix for the code above is to change the bind call from tcpServerSock.bind(('', 8888)) to tcpServerSock.bind((sys.argv[1], 8888)).
You must then call the script correctly: python ProxyServer.py 127.0.0.1.
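As a rough sketch of that argument check (written for Python 3, whereas the question uses Python 2; the helper name bind_address is hypothetical and the port 8888 simply mirrors the question's code):

```python
def bind_address(argv, port=8888):
    """Return the (host, port) pair to bind to, taken from the command line.

    `bind_address` is a hypothetical helper used only for illustration.
    """
    if len(argv) <= 1:
        # Same check as in the question, but exiting instead of carrying on.
        raise SystemExit('Usage: python ProxyServer.py server_ip')
    return argv[1], port


# Typical use in the script (assuming a socket named tcpServerSock exists):
#     tcpServerSock.bind(bind_address(sys.argv))
```

Keeping the exit in place means a missing argument is reported immediately rather than surfacing later as an IndexError on sys.argv[1].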
from socket import *
import sys
# Create a server socket, bind it to a port and start listening
tcpSerSock = socket(AF_INET, SOCK_STREAM)
serverPort = 12000
tcpSerSock.bind(('', serverPort))
tcpSerSock.listen(1)
print ("Server ready")
while 1==1:
    # Start receiving data from the client. e.g. request = "GET http://localhost:portNum/www.google.com"
    tcpCliSock, addr = tcpSerSock.accept()
    print ('Received a connection from:', addr)
    request = str(tcpCliSock.recv(1024).decode())
    print ("Requested " + request)
    # Extract the file name from the given request
    fileName = request.split()[1]
    print ("File name is " + fileName)
    fileExist = "false"
    fileToUse = "/" + fileName
    print ("File to use: " + fileToUse)
    try:
        # Check whether the file exists in the cache. The open will fail and go to "except" in case the file doesn't exist. Similar to try/catch in Java
        f = open(fileToUse[1:], "r")
        outputData = f.readlines()
        fileExist = "true"
        # ProxyServer finds a cache hit and generates a response message
        tcpCliSock.send("HTTP/1.1 200 OK\r\n")
        tcpCliSock.send("Content-Type:text/html\r\n")
        tcpCliSock.send(outputData)
        print ('This was read from cache')
    except IOError:
        if fileExist == "false":
            # Create a socket on the proxyserver
            c = socket(AF_INET, SOCK_STREAM)
            hostn = fileName.replace("www.","",1) #max arg specified to 1 in case the webpage contains "www." other than the usual one
            print (hostn)
            try:
                # Connect to the socket to port 80
                c.bind(('', 80))
                # Create a temporary file on this socket and ask port 80 for the file requested by the client
                print("premake")
                fileObj = c.makefile('r', 0)
                print("postmake")
                fileObj.write("GET " + "http://" + fileName + " HTTP/1.1\r\n")
                # Read the response into buffer
                print("post write")
                buff = fileObj.readlines()
                # Create a new file in the cache for the requested file.
                tmpFile = open("./" + filename,"wb")
                # Send the response in the buffer to both client socket and the corresponding file in the cache
                for line in buff:
                    tmpFile.write(line)
                    tcpCliSock.send(tmpFile)
            except:
                print ("Illegal request")
                break
        else:
            # HTTP response message for file not found
            print("HTTP response Not found")
    # Close the client and the server sockets
    tcpCliSock.close()
#tcpSerSock.close()
The code never manages to execute the 'try' entered in 'except IOError'. The problem seems to be the socket.makefile(mode, buffsize) function, which has poor documentation for python 3. I tried passing 'rwb', 'r+', 'r+b' and so on to the function, but at most I would manage to create the file and be unable to write to it thereafter.
This is a Python 2.7 vs Python 3 issue. makefile('r', 0) works in Python 2.7, but Python 3 only allows a buffer size of 0 in binary mode, so there you need makefile('r', None).
From the documentation for python2.7:
socket.makefile([mode[, bufsize]])
From the documentation for python3:
socket.makefile(mode='r', buffering=None, *, encoding=None, errors=None, newline=None)
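A small Python 3 sketch of the difference, using socket.socketpair() so that no network access is needed. The two helper names are made up for illustration. The first shows that an unbuffered text-mode file object is rejected; the second shows one way that should work, with separate binary file objects for writing and reading:

```python
import socket


def text_unbuffered_is_rejected():
    """Return True if makefile('r', 0) raises ValueError, as Python 3 does."""
    left, right = socket.socketpair()
    try:
        left.makefile('r', 0)
    except ValueError:
        return True
    finally:
        left.close()
        right.close()
    return False


def send_and_read_line(payload):
    """Write bytes through one file object and read a line back from the peer."""
    left, right = socket.socketpair()
    try:
        with left.makefile('wb') as writer, right.makefile('rb') as reader:
            writer.write(payload)
            writer.flush()  # the writer is buffered by default, so push the bytes out
            return reader.readline()
    finally:
        left.close()
        right.close()
```

Note that a file object opened with mode 'r' is read-only, so writing the request needs a writable mode such as 'wb' (or simply sendall on the socket itself).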
I have a homework assignment which involves implementing a proxy cache server in Python. The idea is to write the web pages I access to temporary files on my local machine and then access them as requests come in if they are stored. Right now the code looks like this:
from socket import *
import sys
def main():
    #Create a server socket, bind it to a port and start listening
    tcpSerSock = socket(AF_INET, SOCK_STREAM) #Initializing socket
    tcpSerSock.bind(("", 8030)) #Binding socket to port
    tcpSerSock.listen(5) #Listening for page requests
    while True:
        #Start receiving data from the client
        print 'Ready to serve...'
        tcpCliSock, addr = tcpSerSock.accept()
        print 'Received a connection from:', addr
        message = tcpCliSock.recv(1024)
        print message
        #Extract the filename from the given message
        print message.split()[1]
        filename = message.split()[1].partition("/")[2]
        print filename
        fileExist = "false"
        filetouse = "/" + filename
        print filetouse
        try: #Check whether the file exists in the cache
            f = open(filetouse[1:], "r")
            outputdata = f.readlines()
            fileExist = "true"
            #ProxyServer finds a cache hit and generates a response message
            tcpCliSock.send("HTTP/1.0 200 OK\r\n")
            tcpCliSock.send("Content-Type:text/html\r\n")
            for data in outputdata:
                tcpCliSock.send(data)
            print 'Read from cache'
        except IOError: #Error handling for file not found in cache
            if fileExist == "false":
                c = socket(AF_INET, SOCK_STREAM) #Create a socket on the proxyserver
                hostn = filename.replace("www.","",1)
                print hostn
                try:
                    c.connect((hostn, 80)) #https://docs.python.org/2/library/socket.html
                    # Create a temporary file on this socket and ask port 80 for
                    # the file requested by the client
                    fileobj = c.makefile('r', 0)
                    fileobj.write("GET " + "http://" + filename + "HTTP/1.0\r\n")
                    # Read the response into buffer
                    buffr = fileobj.readlines()
                    # Create a new file in the cache for the requested file.
                    # Also send the response in the buffer to client socket and the
                    # corresponding file in the cache
                    tmpFile = open(filename,"wb")
                    for data in buffr:
                        tmpFile.write(data)
                        tcpCliSock.send(data)
                except:
                    print "Illegal request"
            else: #File not found
                print "404: File Not Found"
        tcpCliSock.close() #Close the client and the server sockets
main()
To test my code, I run the proxy cache on my localhost and set my browser's proxy settings to point at it.
However, when I run this code and try to access Google with Chrome, I'm greeted with an error page saying err_empty_response.
Stepping through the code with the debugger made me realize it's failing on this line
c.connect((hostn, 80))
and I have no idea why. Any help would be greatly appreciated.
P.S. I'm testing this with Google Chrome, Python 2.7, and Windows 10
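Avoid passing a raw name to connect. For AF_INET sockets Python will resolve a host name for you, but it simply uses the first address DNS returns, so it is more robust to resolve the name yourself and connect to the resulting address.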
You can get the socket information you need to build the connection using getaddrinfo(). In my pure-python-whois package I used the following code to create a connection:
def _openconn(self, server, timeout, port=None):
    port = port if port else 'nicname'
    try:
        for srv in socket.getaddrinfo(server, port, socket.AF_UNSPEC, socket.SOCK_STREAM, 0, socket.AI_ADDRCONFIG):
            af, socktype, proto, _, sa = srv
            try:
                c = socket.socket(af, socktype, proto)
            except socket.error:
                c = None
                continue
            try:
                if self.source_addr:
                    c.bind(self.source_addr)
                c.settimeout(timeout)
                c.connect(sa)
            except socket.error:
                c.close()
                c = None
                continue
            break
    except socket.gaierror:
        return False
    return c
Note that this isn't great code because the loop is actually there for nothing instead of using the different alternatives. You should only break the loop once you have established a connection. However, this should work as an illustration for using getaddrinfo()
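A leaner sketch of the same idea in Python 3, where the loop is left only once a connection has actually been established. The function name open_connection and the default timeout are assumptions for illustration:

```python
import socket


def open_connection(host, port, timeout=5.0):
    """Try each address from getaddrinfo(); return the first socket that connects."""
    last_error = None
    for family, socktype, proto, _canon, sockaddr in socket.getaddrinfo(
            host, port, socket.AF_UNSPEC, socket.SOCK_STREAM):
        sock = None
        try:
            sock = socket.socket(family, socktype, proto)
            sock.settimeout(timeout)
            sock.connect(sockaddr)
            return sock  # leave the loop only after a successful connect
        except OSError as exc:
            # Remember the failure and fall through to the next candidate.
            last_error = exc
            if sock is not None:
                sock.close()
    raise last_error or OSError("getaddrinfo returned no addresses")
```

The standard library's socket.create_connection((host, port), timeout) follows essentially the same pattern, so in practice it may be simpler to call that directly.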
EDIT:
You are also not cleaning your hostname correctly. I get /www.example.com/ when I try accessing http://www.example.com/ which obviously won't resolve. I'd suggest that you use a regular expression to get the file name for your cache.
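A rough sketch of the regular-expression approach suggested above, in Python 3. The helpers split_proxy_target and cache_name are hypothetical, and the pattern assumes the request target looks like /host/path or http://host/path:

```python
import re

# Optional scheme, any leading slashes, then host, optional port, optional path.
_TARGET = re.compile(
    r"^(?:https?://)?/*(?P<host>[^/:\s]+)(?::(?P<port>\d+))?(?P<path>/.*)?$")


def split_proxy_target(request_line):
    """Split 'GET /www.example.com/ HTTP/1.1' into (host, port, path)."""
    parts = request_line.split()
    if len(parts) < 2:
        raise ValueError("malformed request line: %r" % request_line)
    match = _TARGET.match(parts[1])
    if match is None:
        raise ValueError("cannot find a host in %r" % parts[1])
    host = match.group("host")
    port = int(match.group("port") or 80)
    path = match.group("path") or "/"
    return host, port, path


def cache_name(host, path):
    """Build a flat file name for the cache; unusual characters become '_'."""
    return re.sub(r"[^A-Za-z0-9._-]", "_", host + path)
```

With this, a request for http://www.example.com/ yields the host www.example.com without the surrounding slashes, which is what connect needs. A browser request such as /favicon.ico would still be read as a host name, so a real proxy needs an extra check for that case.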
I have created a proxy server that receives requests, searches for the requested file in its cache. If available it returns the cached file. If file is not available then it will ask the actual server, gets it, stores it in the cache and returns the file to the client.
Following is the code:
from socket import *
import sys
if len(sys.argv) <= 1:
    print 'Usage : "python ProxyServer.py server_ip"\n[server_ip : It is the IP Address Of Proxy Server]'
    sys.exit(2)
# Create a server socket, bind it to a port and start listening
tcpSerSock = socket(AF_INET, SOCK_STREAM)
tcpSerSock.bind((sys.argv[1], 8888))
tcpSerSock.listen(100)
while 1:
    # Start receiving data from the client
    print 'Ready to serve...'
    tcpCliSock, addr = tcpSerSock.accept()
    print 'Received a connection from:', addr
    message = tcpCliSock.recv(1024)
    print message
    # Extract the filename from the given message
    print message.split()[1]
    filename = message.split()[1].partition("/")[2]
    print filename
    fileExist = "false"
    filetouse = "/" + filename
    print filetouse
    try:
        # Check whether the file exists in the cache
        f = open(filetouse[1:], "r")
        outputdata = f.readlines()
        fileExist = "true"
        # ProxyServer finds a cache hit and generates a response message
        tcpCliSock.send("HTTP/1.0 200 OK\r\n")
        tcpCliSock.send("Content-Type:text/html\r\n")
        for i in range(0, len(outputdata)):
            tcpCliSock.send(outputdata[i])
        print 'Read from cache'
    # Error handling for file not found in cache
    except IOError:
        if fileExist == "false":
            # Create a socket on the proxyserver
            c = socket(AF_INET, SOCK_STREAM)
            hostn = filename.replace("www.","",1)
            print hostn
            try:
                # Connect to the socket to port 80
                c.connect((hostn, 80))
                # Create a temporary file on this socket and ask port 80 for the file requested by the client
                fileobj = c.makefile('r', 0)
                fileobj.write("GET "+"http://" + filename + " HTTP/1.0\n\n")
                # Read the response into buffer
                buff = fileobj.readlines()
                # Create a new file in the cache for the requested file. Also send the response in the buffer to client socket and the corresponding file in the cache
                tmpFile = open("./" + filename,"wb")
                for line in buff:
                    tmpFile.write(line);
                    tcpCliSock.send(line);
            except:
                print "Illegal request"
        else:
            # HTTP response message for file not found
            tcpCliSock.send("HTTP/1.0 404 sendErrorErrorError\r\n")
            tcpCliSock.send("Content-Type:text/html\r\n")
            tcpCliSock.send("\r\n")
    # Close the client and the server sockets
    tcpCliSock.close()
tcpSerSock.close()
But for every file I request I only get an "illegal request" message printed. There seems to be an issue that the proxy server actually is not able to retrieve the requested file by the client. Can someone tell me where I can improve the code.
This is the first time I am coding in Python so please mention any minor errors.
Your request is illegal. For normal HTTP servers, the GET line must not contain a full URL, only the path. The rest of your proxy also contains many errors. You probably want to use sendall everywhere you use send. recv can return less than one complete message, so you have to handle that case as well.
Why do you use the strings "true" and "false" instead of True and False?
There is a security hole: anyone can read any file on your computer through your proxy. Reading binary files won't work. You don't close the files you open.
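The points above could be addressed along these lines. This is only a sketch in Python 3; all four helper names, the 64 KiB limit, and the idea of a dedicated cache directory are assumptions, not part of the original code:

```python
import os


def build_origin_request(host, path="/"):
    """The request line carries only the path; the host goes in a Host header."""
    return ("GET %s HTTP/1.0\r\nHost: %s\r\n\r\n" % (path, host)).encode("ascii")


def read_request_head(sock, limit=65536):
    """Call recv() repeatedly until the blank line that ends the headers."""
    data = b""
    while b"\r\n\r\n" not in data and len(data) < limit:
        chunk = sock.recv(4096)
        if not chunk:  # peer closed the connection early
            break
        data += chunk
    return data


def cache_path(cache_dir, name):
    """Resolve name inside cache_dir, refusing anything that escapes it."""
    root = os.path.realpath(cache_dir)
    candidate = os.path.realpath(os.path.join(root, name))
    if os.path.commonpath([root, candidate]) != root:
        raise ValueError("refusing path outside the cache: %r" % name)
    return candidate


def send_cached_file(sock, path):
    """Send a cached file as bytes; 'with' closes it, sendall sends all of it."""
    with open(path, "rb") as cached:
        sock.sendall(cached.read())
```

Opening the cache file in binary mode also takes care of the binary-file problem, since the bytes are relayed untouched.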
I'm making a proxy server using sockets. When the requested file is not in my current directory (the cache), I make an HTTP GET request to the origin server (on the www) and cache the result for later.
The problem with my code is that every time I get a resource from the www I cache it, but the content of the file is always "Moved permanently".
So this is what happens: the user requests "stackoverflow.com" by entering "localhost:8080/stackoverflow.com" into the browser. The browser returns the page correctly. When the user enters "localhost:8080/stackoverflow.com" a second time, the browser returns a page saying that stackoverflow.com has moved permanently.
Here is the code of the method that does the http get request and the caching:
@staticmethod
def find_on_www(conn, requested_file):
    try:
        # Create a socket on the proxy server
        print 'Creating socket on proxy server'
        c = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        host_name = requested_file.replace("www.","",1)
        print 'Host Name: ', host_name
        # Connect to the socket to port 80
        c.connect((host_name, 80))
        print 'Socket connected to port 80 of the host'
        # Create a temporary file on this socket and ask port 80
        # for the file requested by the client
        file_object = c.makefile('r', 0)
        file_object.write("GET " + "http://" + requested_file + " HTTP/1.0\n\n")
        # Read the response into buffer
        buff = file_object.readlines()
        # Create a new file in the cache for the requested file.
        # Also send the response in the buffer to client socket
        # and the corresponding file in the cache
        temp_file = open("./" + requested_file, "wb")
        for i in range(0, len(buff)):
            temp_file.write(buff[i])
            conn.send(buff[i])
        conn.close()
And here is the rest of my code, if anyone is interested:
import socket # Socket programming
import signal # To shut down server on ctrl+c
import time # Current time
import os # To get the last-modified
import mimetypes # To guess the type of requested file
import sys # To exit the program
from threading import Thread
def generate_header_lines(code, modified, length, mimetype):
    """ Generates the header lines for the response message """
    h = ''
    if code == 200:
        # Append status code
        h = 'HTTP/1.1 200 OK\n'
        # Append the date
        # Append the name of the server
        h += 'Server: Proxy-Server-Thomas\n'
        # Append the date of the last modification to the file
        h += 'Last-Modified: ' + modified + '\n'
    elif code == 404:
        # Append the status code
        h = 'HTTP/1.1 404 Not Found\n'
        # Append the date
        h += 'Date: ' + time.strftime("%a, %d %b %Y %H:%M:%S", time.localtime()) + '\n'
        # Append the name of the web server
        h += 'Server: Web-Server-Thomas\n'
    # Append the length of the content
    h += 'Content-Length: ' + str(length) + '\n'
    # Append the type of the content
    h += 'Content-Type: ' + mimetype + '\n'
    # Append the connection closed - let the client know we close the connection
    h += 'Connection: close\n\n'
    return h
def get_mime_type(requested_file):
    # Get the file's mimetype and encoding
    try:
        (mimetype, encoding) = mimetypes.guess_type(requested_file, True)
        if not mimetype:
            print "Mimetype found: text/html"
            return 'text/html'
        else:
            print "Mimetype found: ", mimetype
            return mimetype
    except TypeError:
        print "Mimetype found: text/html"
        return 'text/html'
class WebServer:
    def __init__(self):
        """
        Constructor
        :return:
        """
        self.host = '' # Host for the server
        self.port = 8000 # Port for the server
        # Create socket
        self.socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

    def start_server(self):
        """ Starts the server
        :return:
        """
        # Bind the socket to the host and port
        self.socket.bind((self.host, self.port))
        print "Connection started on ", self.port
        # Start the main loop of the server - start handling clients
        self.main_loop()

    @staticmethod
    def shutdown():
        """ Shuts down the server """
        try:
            s.socket.close()
        except Exception as e:
            print "Something went wrong closing the socket: ", e

    def main_loop(self):
        """Main loop of the server"""
        while True:
            # Start listening
            self.socket.listen(1)
            # Wait for a client to connect
            client_socket, client_address = self.socket.accept()
            # Wait for a request from the client
            data = client_socket.recv(1024)
            t = Thread(target=self.handle_request, args=(client_socket, data))
            t.start()
            # # Handle the request from the client
            # self.handle_request(client_socket, data)

    def handle_request(self, conn, data):
        """ Handles a request from the client """
        # Decode the data
        string = bytes.decode(data)
        # Split the request
        requested_file = string.split(' ')
        # Get the method that is requested
        request_method = requested_file[0]
        if request_method == 'GET':
            # Get the part of the request that contains the name
            requested_file = requested_file[1]
            # Get the name of the file from the request
            requested_file = requested_file[1:]
            print "Searching for: ", requested_file
            try:
                # Open the file
                file_handler = open(requested_file, 'rb')
                # Get the content of the file
                response_content = file_handler.read()
                # Close the handler
                file_handler.close()
                # Get information about the file from the OS
                file_info = os.stat(requested_file)
                # Extract the last modified time from the information
                time_modified = time.ctime(file_info[8])
                # Get the time modified in seconds
                modified_seconds = os.path.getctime(requested_file)
                print "Current time: ", time.time()
                print "Modified: ", time_modified
                if (float(time.time()) - float(modified_seconds)) > 120: # more than 2 minutes
                    print "Time outdated!"
                    #self.find_on_www(conn, requested_file)
                # Get the file's mimetype and encoding
                mimetype = get_mime_type(requested_file)
                print "Mimetype = ", mimetype
                # Create the correct header lines
                response_headers = generate_header_lines(200, time_modified, len(response_content), mimetype)
                # Create the response to the request
                server_response = response_headers.encode() + response_content
                # Send the response back to the client
                conn.send(server_response)
                # Close the connection
                conn.close()
            except IOError: # Couldn't find the file in the cache - Go find file on www
                print "Error: " + requested_file + " not found in cache!"
                self.find_on_www(conn, requested_file)

    @staticmethod
    def find_on_www(conn, requested_file):
        try:
            # Create a socket on the proxy server
            print 'Creating socket on proxy server'
            c = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            host_name = requested_file.replace("www.","",1)
            print 'Host Name: ', host_name
            # Connect to the socket to port 80
            c.connect((host_name, 80))
            print 'Socket connected to port 80 of the host'
            # Create a temporary file on this socket and ask port 80
            # for the file requested by the client
            file_object = c.makefile('r', 0)
            file_object.write("GET " + "http://" + requested_file + " HTTP/1.0\n\n")
            # Read the response into buffer
            buff = file_object.readlines()
            # Create a new file in the cache for the requested file.
            # Also send the response in the buffer to client socket
            # and the corresponding file in the cache
            temp_file = open("./" + requested_file, "wb")
            for i in range(0, len(buff)):
                temp_file.write(buff[i])
                conn.send(buff[i])
            conn.close()
        except Exception as e:
            # Generate a body for the file - so we don't have an empty page
            response_content = "<html><body><p>Error 404: File not found</p></body></html>"
            # Generate the correct header lines
            response_headers = generate_header_lines(404, '', len(response_content), 'text/html')
            # Create the response to the request
            server_response = response_headers.encode() + response_content
            # Send the response back to the client
            conn.send(server_response)
            # Close the connection
            conn.close()
def shutdown_server(sig, dummy):
    """ Shuts down the server """
    # Shutdown the server
    s.shutdown()
    # exit the program
    sys.exit(1)
# Shut down on ctrl+c
signal.signal(signal.SIGINT, shutdown_server)
# Create a web server
s = WebServer()
# Start the server
s.start_server()
The problem is what happens when the origin server answers with a 301 Moved Permanently status. When the page is not in your cache, you forward the origin server's response straight to the client, headers included. The 301 tells the client to make another GET request, which it sends directly to the new location, bypassing your proxy.
The second time you request the page through the proxy, it serves the stored response from the cache. That file still contains the original headers, including the redirect status, but you then prepend your own status line of 200 OK. Because the client reads your status line first, it does not realise it should follow the redirect, so it just displays the body that says the page has moved.
What you need to do is parse the headers returned by the web server when the proxy fetches the page from the internet, and then, depending on those, send the correct headers back to the client.
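A minimal Python 3 sketch of that header parsing. The function names are hypothetical, the "only cache 200 responses" rule is just one possible policy, and the code assumes the origin server separates header lines with \r\n:

```python
def parse_status_code(raw_response):
    """Return the numeric status from the first line of a raw HTTP response."""
    status_line = raw_response.split(b"\r\n", 1)[0]
    parts = status_line.split(None, 2)
    if len(parts) < 2 or not parts[0].startswith(b"HTTP/"):
        raise ValueError("not an HTTP response: %r" % status_line)
    return int(parts[1])


def header_value(raw_response, name):
    """Look up one header (case-insensitive); return None if it is absent."""
    head = raw_response.split(b"\r\n\r\n", 1)[0]
    wanted = name.lower().encode("ascii")
    for line in head.split(b"\r\n")[1:]:  # skip the status line
        key, sep, value = line.partition(b":")
        if sep and key.strip().lower() == wanted:
            return value.strip().decode("latin-1")
    return None


def is_cacheable(raw_response):
    """Assumed policy: keep only plain 200 responses in the cache."""
    return parse_status_code(raw_response) == 200
```

With a check like is_cacheable in place, the 301 response would be relayed to the client but never stored. When a stored response is served later, it already begins with its own status line and headers, so it can be sent as-is without a second 200 OK in front of it.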
Inserting a .send to send an OK message apparently makes the rest of the code not work?
If I remove the client.send messages from the following code, it works. But with it, nothing happens in the browser, checking in Firefox, it says that the request went through, but there isn't any page displayed... it's just blank. Why would .send messages cause nothing to happen?
from socket import *
server = socket(AF_INET, SOCK_STREAM)
port = 12030
server.bind((gethostname(), port))
server.listen(1)
while True:
    print 'Ready to serve'
    conection, addr = server.accept()
    try:
        print 'Working'
        message = conection.recv(1024)
        conection.send("HTTP/1.0 200 OK\r\n")
        conection.send("Content-Type:text/html\r\n")
        filename = message.split()[1]
        print "FILENAME", filename
        f = open(filename[1:]) #cuts off the '/' in the request page
        outputdata = f.read()
        print "OUTDATA: ", outputdata
        for i in range(0, len(outputdata)):
            conection.send(outputdata[i])
        conection.close()
    except IOError:
        print 'IO ERROR'
        conection.send("404 NOT FOUND")
        print message
        conection.close()
    except KeyboardInterrupt:
        server.close()
        conection.close()
        break;
As seen here, it doesn't affect the data stream at all..
user ##$$ python webServer.py
Ready to serve
Working
FILENAME /HelloWorld.html
OUTDATA: <html>Hello World</html>
Ready to serve