I'm sending raw HTTP headers to a website, and I want to detect errors such as 400 Bad Request or 404 Not Found manually, without using urllib or the Requests package. I'm sending a HEAD request like this:
head_request = "HEAD %s HTTP/1.1\r\nHost: %s\r\n\r\n" % (url_path, host)
socket_id = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
socket_id.connect((host, 80))
socket_id.send(head_request)
recv_head = socket_id.recv(1024)
How should I manually detect these error responses?
One way is to parse the status line of the HTTP response with a regular expression.
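For example, a minimal sketch (assuming recv_head holds the raw response from the code above):

import re

# Match the status line, e.g. "HTTP/1.1 404 Not Found\r\n".
match = re.match(br"HTTP/\d\.\d (\d{3}) ([^\r\n]*)", recv_head)
if match:
    code = int(match.group(1))
    reason = match.group(2).decode("ascii", "replace")
    if code >= 400:
        print("Got an HTTP error: %d %s" % (code, reason))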
Another way is to port what you need from the http_parser.c module of the http-parser project.
It can be downloaded from here: https://pypi.python.org/pypi/http-parser/
You can parse the HTTP response using http-parser, which works at the socket level.
Here is the description:
http-parser provides you with parser.HttpParser, a low-level parser in C that you can access from your Python program, and http.HttpStream, which provides higher-level access to a readable, sequential io.RawIOBase object.
Here is how you can parse an HTTP response over a socket in Python, following the example you gave:
https://github.com/benoitc/http-parser/tree/master/http_parser
import socket

try:
    from http_parser.parser import HttpParser  # C extension
except ImportError:
    from http_parser.pyparser import HttpParser  # pure-Python fallback

def main():
    p = HttpParser()
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    body = []
    try:
        s.connect(('gunicorn.org', 80))
        s.send("GET / HTTP/1.1\r\nHost: gunicorn.org\r\n\r\n")
        while True:
            data = s.recv(1024)
            if not data:
                break
            recved = len(data)
            nparsed = p.execute(data, recved)
            assert nparsed == recved
            if p.is_headers_complete():
                print p.get_headers()
            if p.is_partial_body():
                body.append(p.recv_body())
            if p.is_message_complete():
                break
        print "".join(body)
    finally:
        s.close()
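Once is_headers_complete() returns True, the parser should also expose the parsed status line; if I remember the http-parser API correctly, p.get_status_code() would give you the 400 or 404 you want to detect.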
Related
My goal is to send a GET request to the server https://jsonplaceholder.typicode.com/todos/1 using the Python socket module. I do not want to use any of the other modules/libraries like "requests" or "urllib". I'm just having trouble understanding where to use the /todos/1 in my code.
import ssl
import socket
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(('jsonplaceholder.typicode.com', 443))
s = ssl.wrap_socket(s, keyfile=None, certfile=None, server_side=False, cert_reqs=ssl.CERT_NONE, ssl_version=ssl.PROTOCOL_SSLv23)
s.sendall(b"GET / HTTP/1.1\r\nHost: jsonplaceholder.typicode.com\r\nConnection: close\r\n\r\n")
while True:
    new = s.recv(4096)
    if not new:
        s.close()
        break
    print(new)
My end goal is to use what I learned from this exercise to send a request to a CouchDB database from a microcontroller running MicroPython, and to be able to see the response headers so that I can retrieve the returned cookie. MicroPython's urequests does not show the returned headers.
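For what it's worth, the path belongs in the request line itself, so the only change needed to the request above is:

# /todos/1 replaces / in the request line; everything else stays the same.
s.sendall(b"GET /todos/1 HTTP/1.1\r\nHost: jsonplaceholder.typicode.com\r\nConnection: close\r\n\r\n")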
I'm attempting to create a simple threaded server in Python. When making a call from a browser, it seems that two requests are made, which causes the script to error.
I can add additional checks around urlparse, but I would prefer to prevent this second call from occurring.
I wondered if it was my PHP script, so I used Postman to send the request, and I still have the same issue.
Update: on adding a break after the final send, the loop obviously stops. What's odd is that the first request seems to fail, and then on every other request the client has the correct data.
import json
import socket
from threading import Thread
from urllib.parse import urlparse, parse_qs

class ClientThread(Thread):
    def __init__(self, ip, port):
        Thread.__init__(self)
        self.ip = ip
        self.port = port

    def run(self):
        while True:
            data = conn.recv(2048)
            # It errors here, stating "index out of range".
            # I can add checks, but would like to prevent/not detect this second request.
            parsed_url = urlparse(data.split()[1].decode("utf-8"))
            dctParams = parse_qs(parsed_url.query)
            if "ProductID" in dctParams:
                sProductID = dctParams.get("ProductID")[0]
            else:
                print("Unable to find key. Continuing")
                break  # continue
            print("Requested ProductID: ", sProductID, "found. Attempting to find matches")
            bSuccess = True
            try:
                aOut = data_matrix.loc[sProductID].nlargest(10)
                print("Request satisfied")
                print(aOut)
                print("\n\n")
            except:
                bSuccess = False
                print("Error finding product in data set")
            byt = ""
            if bSuccess:
                byt = json.dumps({'Success': True, 'Data': aOut.to_dict()}).encode("utf-8")
            else:
                byt = json.dumps({'Success': False, 'Data': ''}).encode("utf-8")
            # send headers
            conn.send('HTTP/1.0 200 OK\r\n'.encode("utf-8"))
            conn.send("Content-Type: application/json\r\n".encode("utf-8"))
            sLength = "Content-Length: " + str(len(byt)) + "\r\n\r\n"
            conn.send(sLength.encode("utf-8"))
            # send the actual JSON
            conn.send(byt)
# Multithreaded Python server : TCP Server Socket Program Stub
TCP_IP = '0.0.0.0'
TCP_PORT = 12345
BUFFER_SIZE = 20  # Usually 1024, but we need quick response

tcpServer = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
tcpServer.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
tcpServer.bind((TCP_IP, TCP_PORT))
threads = []

while True:
    tcpServer.listen(4)
    print("Multithreaded Python server : Waiting for connections from TCP clients...\n\r")
    (conn, (ip, port)) = tcpServer.accept()
    newthread = ClientThread(ip, port)
    newthread.start()
    threads.append(newthread)

for t in threads:
    t.join()
What are the two requests? Most likely one of them is for the index page and the other is for the favicon.
You cannot prevent the browser from making potentially unwanted requests; all you can do is respond to them with the appropriate response code.
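If you want to keep the favicon request (and empty reads) out of the ProductID logic, one option is to check the request before parsing it; a minimal sketch for the top of the run() loop (the 404 response here is illustrative):

data = conn.recv(2048)
if not data:
    break  # the client closed the connection; recv() returned nothing
path = data.split()[1].decode("utf-8")
if path == "/favicon.ico":
    # Answer the browser's favicon probe instead of parsing it for ProductID.
    conn.send("HTTP/1.0 404 Not Found\r\nContent-Length: 0\r\n\r\n".encode("utf-8"))
    continue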
So I was reading about these partial GET requests that would make a server time out the connection after a while. How would I send a partial GET request?
import socket, sys
host = sys.argv[1]
request = "GET / HTTP/1.1\nHost: "+host+"\n\nUser-Agent:Mozilla 5.0\n" #How could I make this into a partial request?
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((host, 80))
s.sendto(request, (host, 80))
response = s.recv(1024)
How would I do this?
I think you are confusing partial and incomplete requests:
partial: a request for some part of a resource, that is, a Range request as shown in falsetru's answer. This will not cause a timeout but instead a response with code 206 and the requested part of the resource.
incomplete: your request is incomplete and cannot be processed by the server, so it will wait for the rest of the request and time out after a while if it does not get it. In your question you already have such an incomplete request, because you did not finish your request properly (it must end with \r\n\r\n, not a single \n). Other ways are a plain TCP connect without sending any data, or a POST request with a Content-Length that then sends less data than specified in the header.
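A minimal sketch of such an incomplete request (the host name is illustrative; this is the idea behind slowloris-style timeouts):

import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("example.com", 80))
# Send valid header lines but never the terminating blank line (\r\n\r\n),
# so the server keeps waiting for the rest of the request until it times out.
s.send(b"GET / HTTP/1.1\r\nHost: example.com\r\n")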
The HTTP headers end too early (\n\n should come after all the headers, before the content).
import socket, sys
host = sys.argv[1]
request = "GET / HTTP/1.1\nHost: "+host+"\nUser-Agent:Mozilla 5.0\n\n"
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((host, 80))
s.send(request)
response = s.recv(1024)
If you mean partial content retrieval, you can specify the Range header:
"GET / HTTP/1.1\nHost: "+host+"\nUser-Agent:Mozilla 5.0\rRange: bytes=0-999\n\n"
NOTE
The line ending should be \r\n, not \n, even though most (but not all) servers accept \n too.
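With proper CRLF line endings, the Range request would look like this:

request = "GET / HTTP/1.1\r\nHost: "+host+"\r\nUser-Agent: Mozilla 5.0\r\nRange: bytes=0-999\r\n\r\n"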
The following code doesn't output anything (why?).
#!/usr/bin/python
import socket
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("www.python.org" , 80))
print s.recv(4096)
s.close()
What do I have to change in order to output the source code of the Python website, as you would see it when you go to View Source in a browser?
HTTP is a request/response protocol. You're not sending any request, so you're not getting any response.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("www.python.org" , 80))
s.sendall("GET /\r\n") # you're missing this line
print s.recv(4096)
s.close()
Of course that sends the rawest possible HTTP request, without handling HTTP errors, HTTP redirects, etc. I would not recommend it for actual use beyond an exercise to familiarize yourself with socket programming and HTTP.
For HTTP, Python provides a few built-in modules: httplib (a bit lower level) and urllib and urllib2 (the high-level ones).
You'll get a redirect (302) unless you use the full URL in your request.
Try this instead:
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("www.python.org" , 80))
s.sendall("GET http://www.python.org HTTP/1.0\n\n")
print s.recv(4096)
s.close()
Of course if you just want the content of a URL this is far simpler. :)
print urllib2.urlopen('http://www.python.org').read()
I get the HTML with:
import requests

def steal_html():
    url = 'https://some_website.org'
    with open('index.html', 'w') as FILE:
        html = requests.get(url).text
        FILE.write(html)
Following is code that listens on a port for HTTP requests, sends the request packet to the server running on port 80, gets the response, and sends the data back to the client. Now, everything executes fine except the following line of code:
data = req_soc.recv(1024)
It takes a very long time to execute, and I have observed that the delay occurs when it is about to receive the last packet. I have also tried the same code using select.select(), but the results are the same. Since I want to handle the raw data coming from both the client and the actual HTTP server, I have no choice but to use sockets.
import socket
import thread

def handle_client(client):
    data = client.recv(512)
    request = ''
    request += data
    print data
    print '-' * 20
    spl = data.split("\r\n")
    print spl[0]
    print spl[1]
    if len(request):
        req_soc = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        req_soc.connect(('localhost', 80))
        req_soc.send(request)
        response = ''
        data = req_soc.recv(1024)
        while data:
            response += data
            print 1
            data = req_soc.recv(1024)
        req_soc.close()
        print response
        if len(response):
            client.send(response)
    client.close()

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(('localhost', 4422))
server.listen(5)
print("Server is running...\n")

MSGLEN = 1024
while 1:
    client, address = server.accept()
    thread.start_new_thread(handle_client, (client,))
Clients can send multiple requests (e.g. GET) within one connection. You cannot wait for the client to send all of them, because depending on what you return it may request more (e.g. the images of a web page). You have to parse each request in the stream, find its boundary, forward it to the server, and write the answer back to the client, all in a way that doesn't block on reading the client.
I'm not sure what the best way to do this in Python is, but if you spend five minutes googling you'll find a suitable HTTP proxy library.
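That said, the specific delay in the code above usually comes from HTTP keep-alive: with HTTP/1.1 the upstream server holds the connection open after the response, so the final recv() blocks until the server times out. One pragmatic workaround, sketched under the assumption that the whole request arrives in one recv() and has no body, is to rewrite the forwarded request so the server closes the connection when it is done:

def force_connection_close(request):
    # Drop any existing Connection header and ask the server to close
    # the socket after responding; recv() then returns '' promptly.
    lines = [l for l in request.split("\r\n") if not l.lower().startswith("connection:")]
    return "\r\n".join([lines[0], "Connection: close"] + lines[1:])

Calling req_soc.send(force_connection_close(request)) instead of req_soc.send(request) should make the recv() loop end as soon as the full response has arrived.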