I am learning socket programming and tried to write a basic HTTP client of my own. Everything seems to go fine, but I am not receiving any data. Can you please tell me what I am missing?
CODE
import socket
def create_socket():
    return socket.socket( socket.AF_INET, socket.SOCK_STREAM )
def remove_socket(sock):
    sock.close()
    del sock
sock = create_socket()
print "Connecting"
sock.connect( ('en.wikipedia.org', 80) )
print "Sending Request"
print sock.sendall ('''GET /wiki/List_of_HTTP_header_fields HTTP/1.1
Host: en.wikipedia.org
Connection: close
User-Agent: Web-sniffer/1.0.37 (+http://web-sniffer.net/)
Accept-Encoding: gzip
Accept-Charset: ISO-8859-1,UTF-8;q=0.7,*;q=0.7
Cache-Control: no-cache
Accept-Language: de,en;q=0.7,en-us;q=0.3
Referer: d_r_G_o_s
''')
print "Receving Reponse"
while True:
    content = sock.recv(1024)
    if content:
        print content
    else:
        break
print "Completed"
OUTPUT
Connecting
Sending Request
298
Receving Reponse
Completed
I was expecting it to show me the HTML content of the Wikipedia page :'(
Also, it would be great if somebody could share some web resources or books where I can read in detail about Python socket programming for an HTTP request client.
Thanks!
For a minimal HTTP client, you definitely shouldn't send Accept-Encoding: gzip -- the server will most likely reply with a gzipped response you won't be able to make much sense of by eye. :)
You aren't sending the final double \r\n, nor are you actually terminating your lines with \r\n as the spec requires (unless you happen to develop on Windows with Windows line endings, but that's just luck and not programming per se).
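Also, del sock there does not do what you think it does: inside remove_socket it only deletes the function's local name. It does not destroy the caller's socket object; sock.close() is what actually releases the connection.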
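To see why, here is a Python 3 rendering of the question's remove_socket (the final print() call is added purely for illustration). After the call, the caller's variable still refers to the same socket object; it is closed only because of the explicit close():

```python
import socket

def remove_socket(sock):
    sock.close()  # this is what actually releases the OS-level socket
    del sock      # only unbinds the local name "sock" inside this function

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
remove_socket(s)
# The caller's name "s" is untouched and still points at the (now closed) socket.
print(s.fileno())  # prints -1: a closed socket reports -1
```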
Anyway -- this works:
import socket
sock = socket.socket()
sock.connect(('en.wikipedia.org', 80))
for line in (
    "GET /wiki/List_of_HTTP_header_fields HTTP/1.1",
    "Host: en.wikipedia.org",
    "Connection: close",
):
    sock.sendall(line + "\r\n")
sock.sendall("\r\n")
while True:
    content = sock.recv(1024)
    if content:
        print content
    else:
        break
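For readers on Python 3, the same idea might look like the sketch below. Sockets want bytes there, so the request is encoded before sending. The function name http_get and the 4096-byte read size are illustrative choices, not anything from the answer above:

```python
import socket

# http_get is a hypothetical helper name used for illustration only.
def http_get(host: str, path: str, port: int = 80) -> bytes:
    """Send a minimal HTTP/1.1 GET and return the raw response (headers + body)."""
    # The Host header carries the port only when it is not the default 80.
    host_header = host if port == 80 else f"{host}:{port}"
    request_lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host_header}",
        "Connection: close",  # ask the server to close, so recv() eventually returns b""
    ]
    # Every line ends with CRLF, and one extra CRLF (an empty line) ends the header block.
    request = "\r\n".join(request_lines) + "\r\n\r\n"
    with socket.create_connection((host, port)) as sock:
        sock.sendall(request.encode("ascii"))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # b"" means the server closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

For example, http_get("en.wikipedia.org", "/wiki/List_of_HTTP_header_fields") returns the status line, headers and body as raw bytes. Note that Wikipedia will most likely answer a plain-HTTP request on port 80 with a redirect to HTTPS rather than with the page itself.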
EDIT: As for resources/books/reference -- for a reference HTTP client implementation, look at Python's very own httplib.py (renamed http.client in Python 3). :)
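In Python 3, httplib lives on as http.client. A minimal sketch of the same GET done with it (the fetch helper and the 10-second timeout are illustrative assumptions) shows how much of the framing work it takes off your hands:

```python
import http.client

# fetch is a hypothetical helper; http.client.HTTPConnection is the real stdlib API.
def fetch(host: str, path: str, port: int = 80):
    """Return (status, headers, body) for a GET request using http.client."""
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()  # parses the status line and headers
        return resp.status, resp.getheaders(), resp.read()  # read() handles the body framing
    finally:
        conn.close()
```

getresponse() parses the status line and headers, and read() copes with both Content-Length and chunked bodies, which is exactly the bookkeeping the raw-socket versions have to do by hand.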
Related
I want to extract links from a website's JS. Using sockets, I'm trying to fetch the JS file, but I only ever get the response headers and not the actual JS/HTML. Here's what I'm using:
import socket
import ssl
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
cont = ssl.create_default_context()
sock.connect(('blog.clova.line.me', 443))
sock = cont.wrap_socket(sock, server_hostname = 'blog.clova.line.me')
sock.sendall('GET /hs/hsstatic/HubspotToolsMenu/static-1.138/js/index.js HTTP/1.1\r\nHost: blog.clova.line.me\r\n\r\n'.encode())
resp = sock.recv(2048)
print(resp.decode('utf-8'))
It returns only the response headers:
HTTP/1.1 200 OK
Date: Tue, 06 Sep 2022 12:02:38 GMT
Content-Type: application/javascript
Transfer-Encoding: chunked
Connection: keep-alive
CF-Ray: 74670e8b9b594c2f-SIN
Age: 3444278
...
I have tried the following:
Setting Content-Type: text/plain; charset=utf-8 header
Changing the header to GET https://blog.clova.line.me/hs/hsstatic/HubspotToolsMenu/static-1.138/js/index.js HTTP/1.1
From searching around, it seems that other people do receive the HTML data after the response headers, but I only receive the headers and not the HTML data. It does work with requests:
import requests
resp = requests.get('https://blog.clova.line.me/hs/hsstatic/HubspotToolsMenu/static-1.138/js/index.js')
print(resp.text)
How can I achieve a similar result using socket? Honestly, I don't like using 3rd-party module that's why I'm not using requests.
The response is just truncated: sock.recv(2048) returns at most 2048 bytes (possibly fewer), so you only see the beginning of the response. If you keep reading, you will see the body after the headers.
Anyway, I wouldn't recommend doing this with such a low-level library.
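Once you have read past the first recv(), the header block ends at the first empty line (\r\n\r\n) and everything after it is the body. A minimal sketch of splitting a raw response that way (split_response is a hypothetical helper; repeated header names simply overwrite each other here). In this case the body is still chunk-encoded, as the Transfer-Encoding: chunked header says:

```python
# split_response is a hypothetical helper, shown for illustration.
def split_response(raw: bytes):
    """Split a raw HTTP response into (status_line, headers, body)."""
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("header block not complete yet; keep calling recv()")
    lines = head.decode("iso-8859-1").split("\r\n")
    status_line, header_lines = lines[0], lines[1:]
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalise them to lower case.
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body
```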
"Honestly, I don't like using 3rd-party module that's why I'm not using requests."
If your point is to stick to the Python standard library, you can use urllib.request, which provides more abstraction than socket:
import urllib.request
req = urllib.request.urlopen('…')
print(req.read())
From the documentation (Python's Socket Programming HOWTO):
Now we come to the major stumbling block of sockets - send and recv operate on the network buffers. They do not necessarily handle all the bytes you hand them (or expect from them), because their major focus is handling the network buffers. In general, they return when the associated network buffers have been filled (send) or emptied (recv). They then tell you how many bytes they handled. It is your responsibility to call them again until your message has been completely dealt with.
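The loops the HOWTO describes look roughly like the sketch below. socket.sendall() already does the sending half for you, so send_all is shown only to make the pattern visible; recv_exactly is the receiving counterpart you would use once you know how many bytes to expect, for example from a Content-Length header. Both names are hypothetical:

```python
import socket

# send_all / recv_exactly are hypothetical helper names illustrating the HOWTO's advice.
def send_all(sock: socket.socket, data: bytes) -> None:
    """Keep calling send() until every byte has been handed to the kernel."""
    view = memoryview(data)
    while view:
        sent = sock.send(view)  # may accept only part of the buffer
        if sent == 0:
            raise RuntimeError("socket connection broken")
        view = view[sent:]

def recv_exactly(sock: socket.socket, n: int) -> bytes:
    """Keep calling recv() until exactly n bytes have arrived."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))  # may return fewer bytes than asked for
        if not chunk:
            raise ConnectionError(f"peer closed after {len(buf)} of {n} bytes")
        buf += chunk
    return bytes(buf)
```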
I've rewritten your code and added a receive_all function that keeps reading until the terminating chunk arrives (of course, it's a naive implementation):
import socket
import ssl
request_text = (
    "GET /hs/hsstatic/HubspotToolsMenu/static-1.138/js/index.js "
    "HTTP/1.1\r\nHost: blog.clova.line.me\r\n\r\n"
)
host_name = "blog.clova.line.me"
def receive_all(sock):
    chunks: list[bytes] = []
    while True:
        chunk = sock.recv(2048)
        if not chunk:
            # The server closed the connection.
            break
        chunks.append(chunk)  # keep the last chunk too, not just the ones before it
        # Naive end check: a chunked body ends with the zero-size chunk "0\r\n\r\n".
        if chunk.endswith(b"0\r\n\r\n"):
            break
    return b"".join(chunks)
cont = ssl.create_default_context()
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
    sock.settimeout(5)
    with cont.wrap_socket(sock, server_hostname=host_name) as ssock:
        ssock.connect((host_name, 443))
        ssock.sendall(request_text.encode())
        resp = receive_all(ssock)
        print(resp.decode("utf-8"))
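Because the response uses Transfer-Encoding: chunked, the bytes after the headers are not the raw JavaScript yet: each chunk is prefixed with its size in hex plus \r\n and followed by \r\n, and a zero-size chunk ends the body. A minimal decoder, assuming you already have the complete body (for example the part of receive_all's result after the first \r\n\r\n) and ignoring any trailer fields; decode_chunked is a hypothetical name:

```python
# decode_chunked is a hypothetical helper; it ignores trailers and does little validation.
def decode_chunked(body: bytes) -> bytes:
    """Decode a complete Transfer-Encoding: chunked body."""
    out = bytearray()
    pos = 0
    while True:
        line_end = body.index(b"\r\n", pos)  # raises ValueError if the body is incomplete
        # The size line is hex, optionally followed by ";name=value" chunk extensions.
        size = int(body[pos:line_end].split(b";")[0].strip(), 16)
        pos = line_end + 2
        if size == 0:
            return bytes(out)  # the zero-size chunk marks the end of the body
        out += body[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF
```

For example, head, _, body = resp.partition(b"\r\n\r\n") followed by print(decode_chunked(body).decode("utf-8")) would print just the script.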
I wrote a simple proxy server that accepts connections from a client (currently the local machine) and retrieves data from a server (also running on the local machine) through HTTP GET requests that I send with curl.
Here's the curl command:
curl -x http://localhost:12345 http://127.0.0.1:20000/1.txt
But although the connection is made, the proxy doesn't seem to receive a proper GET request from curl. It doesn't seem to be a problem with the code I've written, as it seems to work on other PCs; it just won't work on mine.
Here's the data I received from the socket that is supposed to contain the GET request:
listening to host: ('127.0.0.1', 58274)
GET http:/// HTTP/1.1
Host:
User-Agent: curl/7.55.1
Accept: */*
Proxy-Connection: Keep-Alive
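For comparison, when curl is told to use an HTTP proxy with -x, it sends the request line in absolute form, so for the command above the proxy should see something like GET http://127.0.0.1:20000/1.txt HTTP/1.1 with Host: 127.0.0.1:20000. The captured request has an empty URL (http:///) and an empty Host header, so a proxy would have nothing to forward. A minimal sketch of how the proxy side could pull host, port and path out of that line and reject the empty form seen above (parse_proxy_request_line is a hypothetical helper, assuming only plain http:// targets):

```python
from urllib.parse import urlsplit

# parse_proxy_request_line is a hypothetical helper handling plain http:// targets only.
def parse_proxy_request_line(line: str):
    """Return (host, port, path) from an absolute-form proxy request line."""
    _method, target, _version = line.split()
    parts = urlsplit(target)
    if parts.scheme != "http" or not parts.hostname:
        raise ValueError(f"no usable absolute URL in request line: {line!r}")
    port = parts.port or 80  # default HTTP port when the URL has none
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return parts.hostname, port, path
```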
And here's a snippet of the code I use to make the connection through the socket:
import socket, sys, os
from thread import *
import operator
try:
    listening_port = 12345
except KeyboardInterrupt:
    print "\nUser Requested an interrupt"
    print "Application Exiting ..."
    sys.exit()
max_conn = 5
buffer_size = 40960
def main():
    try:
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        s.bind(('',listening_port))
        s.listen(max_conn)
    except Exception, e:
        print "\nUnable to initialize socket"
        sys.exit(2)
    while 1:
        try:
            conn, addr = s.accept()
            print "Connection with client established"
            print "listening to host: ", addr
            print conn.recv(1024)
        except KeyboardInterrupt:
            print "\nProxy Server shutting down ..."
            sys.exit(1)
    s.close()
Does anybody have any idea why this is happening? I'm new to sockets and networking in general, so I'd really appreciate some help.
Thanks
I have written a web server in Python, and I want to send an HTTP response with status code 400 instead of the response "Website Coming Soon!" for any client request. How can I do this?
The server code is:
import socket
import re
HOST = "localhost"
PORT = 13555
listen_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listen_socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
listen_socket.bind((HOST, PORT))
listen_socket.listen(1)
print ("Serving HTTP on port %s ..." % PORT)
while True:
    client_connection, client_address = listen_socket.accept()
    request = client_connection.recv(2048)
    response = "Website Coming Soon!" #this response should be http response message code:400
    http_response = "HTTP/1.1 200 OK\n"+"Content-Type: text/html\n"+"\n"+"<html><body>"+response+"</body></html>\n"
    client_connection.sendall(http_response)
    client_connection.close()
Try to get to know the protocol you're trying to speak :)
HTTP is fairly simple: all HTTP messages consist of 3 basic parts, of which the 3rd is optional:
The request or status line (first line)
The headers, each on one line (or, in the obsolete folded form, continued on following lines that start with whitespace), followed by an extra empty line
The message body, which is optional for most requests, and absent in some responses (e.g. 204 or 304).
What you want to do is change the "status line" in a response message. Since you want to send the 400 status code, the first line of your response should be:
HTTP/1.1 400 Bad Request
But there are two things wrong here:
You don't actually parse the request, so you can't really tell the client it's doing something wrong (all 4xx codes represent client errors)
You're sending the wrong message. What you probably want is something like 503 Service Unavailable
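A minimal sketch of putting those three parts together (build_response is a hypothetical helper). It uses \r\n line endings as the spec requires, computes Content-Length from the encoded body, and returns bytes, which is what sendall() needs in Python 3:

```python
# build_response is a hypothetical helper, shown for illustration.
def build_response(status: int, reason: str, body: str,
                   content_type: str = "text/html; charset=utf-8") -> bytes:
    """Assemble status line, headers, empty line and body into one HTTP/1.1 response."""
    body_bytes = body.encode("utf-8")
    head = (
        f"HTTP/1.1 {status} {reason}\r\n"           # 1. status line
        f"Content-Type: {content_type}\r\n"          # 2. headers, one per line
        f"Content-Length: {len(body_bytes)}\r\n"
        "Connection: close\r\n"
        "\r\n"                                       # empty line ends the header block
    )
    return head.encode("iso-8859-1") + body_bytes    # 3. body
```

In the server loop that would be, for example, client_connection.sendall(build_response(503, "Service Unavailable", "<html><body>Website Coming Soon!</body></html>")).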
Dive into the specs. They're really, really straightforward. And if you read them thoroughly and start speaking HTTP the way it is intended, the world gets another tiny bit better ;)
In Python 3.3, I want to get the response headers of a YouTube webpage. Using HTTP/1.0, the code below works fine:
import socket
PATH='/watch?v=GVIjOr98B7Q'
HOST='www.youtube.com'
buffer = bytes('HEAD %s HTTP/1.0\r\nHost: %s\r\n\r\n' %(PATH, HOST),'ascii')
PORT=80
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((HOST, PORT))
s.sendall(buffer)
td=b''
while 1:
    data = s.recv(1024)
    td+=data
    if not data:
        break
However, if I replace HTTP/1.0 with HTTP/1.1:
buffer = bytes('HEAD %s HTTP/1.1\r\nHost: %s\r\n\r\n' %(PATH, HOST),'ascii')
and leave all the other lines the same, the program hangs in the while loop for a really long time (it is not looping, but blocked waiting for the end of the connection). Why does this happen?
HTTP/1.1 keeps connections open (persistent connections) unless you send the header Connection: close, so the server does not close the socket after responding and s.recv() blocks instead of returning b''. See: http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html
If you want the connection to close as soon as the response has been sent, either use HTTP/1.0 or send the header Connection: close.
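A sketch combining both ideas (head_request is a hypothetical helper that takes an already connected socket, e.g. after s.connect((HOST, PORT))). It sends Connection: close, and since a response to HEAD never has a body, it also stops reading at the empty line that ends the headers instead of waiting for the server to hang up:

```python
import socket

# head_request is a hypothetical helper; sock must already be connected.
def head_request(sock: socket.socket, host: str, path: str) -> bytes:
    """Send an HTTP/1.1 HEAD request and return the response header block."""
    request = (
        f"HEAD {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"  # ask the server to close once it has answered
        "\r\n"
    )
    sock.sendall(request.encode("ascii"))
    buf = b""
    # A HEAD response never carries a body, so the empty line ends the response;
    # stop there instead of waiting for the server to close the connection.
    while b"\r\n\r\n" not in buf:
        chunk = sock.recv(1024)
        if not chunk:
            break
        buf += chunk
    return buf.split(b"\r\n\r\n", 1)[0]
```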
I have created a server in Python and am trying to send a file to a client when the file is requested. The server receives the request, but then I cannot send the file over TCP.
I used a template to create a response header, and then I send the file afterward, but it does not entirely work. I am able to "send" .py and .html files, and they do display in my browser, but it must be luck, because according to my TA the real test is images... which are not working for me.
First I will post the response and request headers as shown by the Firefox add-on Firebug, then my code, and lastly the error message.
Firebug Request and Response
----------------------------
Response Headers
Accept-Ranges bytes
Connection Keep-Alive (or Connection: close)Content-Type: text/html; charset=ISO-8859-1
Content-Length 10000
Keep-Alive timeout=10, max=100
Request Headers
Accept text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Encoding gzip, deflate
Accept-Language en-US,en;q=0.5
Connection keep-alive
Host xxx.xxx.244.5:10000
User-Agent Mozilla/5.0 (Windows NT 6.3; WOW64; rv:24.0) Gecko/20100101 Firefox/24.0
My Python code:
#import socket module
from socket import *
serverSocket = socket(AF_INET, SOCK_STREAM)
#Prepare a server socket
serverPort = 10000
serverName = 'xxx.xxx.xxx.xx' #Laptop IP
serverSocket.bind((serverName,serverPort))
serverSocket.listen(5)
while True:
    #Establish the connection
    print 'Ready to serve...'
    connectionSocket, addr = serverSocket.accept()
    print addr
    try:
        message = connectionSocket.recv(4096)
        filename = message.split()[1]
        f = open(filename[1:])
        outputdata = f.read()
        f.close()
        print 'length of output data: '
        print len(outputdata)
        print filename
        print message
        header = ("HTTP/1.1 200 OK\r\n"
                  "Accept-Ranges: bytes\r\n"
                  "Content-Length: 100000\r\n"
                  "Keep-Alive: timeout=10, max=100\r\n"
                  "Connection: Keep-Alive\r\n (or Connection: close)"
                  "Content-Type: text/html; charset=ISO-8859-1\r\n"
                  "\r\n")
        connectionSocket.send(header)
        #Send the content of the requested file to the client
        for i in range(0, len(outputdata)):
            connectionSocket.sendall(outputdata[i])
        connectionSocket.close()
        print '\ntry code has executed\n'
    except IOError:
        print 'exception code has been executed'
        connectionSocket.send('HTTP/1.1 404 Not found: The requested document does not exist on this server.')
        connectionSocket.send('If you can read this, then the exception code has run')
        print '\tconnectionSocket.send has executed'
        connectionSocket.close()
        print '\tconnectionSocket.close has executed\n'
#serverSocket.close()
And here is the error message:
This image "http://xxx.xxx.244.5:10000/kitty.jpg" cannot be displayed because it contains errors.
Thank you in advance!
Open your JPEG file in binary mode: open(filename[1:], "rb"). Otherwise Python (on Windows, at least) will "helpfully" translate \r\n byte pairs in the file to \n, which corrupts the image and prevents the browser from making any sense of it.
Also, you should use a Content-Type of image/jpeg for a JPEG image, rather than text/html, although your browser seems to have figured out that it's a JPEG anyway.
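A sketch putting both points together, plus a Content-Length computed from the actual file instead of the hard-coded 100000 in the question. It assumes Python 3; build_file_response is a hypothetical helper, and falling back to application/octet-stream for unknown extensions is an assumption:

```python
import mimetypes

# build_file_response is a hypothetical helper, shown for illustration (Python 3).
def build_file_response(path: str) -> bytes:
    """Read a file in binary mode and wrap it in an HTTP/1.1 200 response."""
    with open(path, "rb") as f:  # "rb": no newline translation, bytes in == bytes out
        data = f.read()
    content_type, _encoding = mimetypes.guess_type(path)  # e.g. "image/jpeg" for .jpg
    if content_type is None:
        content_type = "application/octet-stream"  # assumed fallback for unknown types
    head = (
        "HTTP/1.1 200 OK\r\n"
        f"Content-Type: {content_type}\r\n"
        f"Content-Length: {len(data)}\r\n"  # the real size, not a hard-coded guess
        "Connection: close\r\n"
        "\r\n"
    )
    return head.encode("ascii") + data
```

In the question's loop this would replace both the header string and the byte-by-byte for loop with a single connectionSocket.sendall() of the returned bytes.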