How to download image from HTTP server [Python/sockets]


I want to download an example image from an HTTP server using the methods defined in the HTTP protocol (and sockets, of course).
I tried to implement it, but my code does not download the whole image, whether or not I use the while loop.
An example image is here: https://httpbin.org/image/png.
My code downloads only part of the image, and I do not know how to fix it. I do not want to use any libraries such as urllib; I want to use just sockets.
Any ideas?
import socket
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(('httpbin.org', 80))
s.sendall('GET /image/png HTTP/1.1\r\nHOST: httpbin.org\r\n\r\n')
reply = ""
while True:
    data = s.recv(2048)
    if not data: break
    reply += data
# get image size
size = -1
tmp = reply.split('\r\n')
for line in tmp:
    if "Content-Length:" in line:
        size = int(line.split()[1])
        break
headers = reply.split('\r\n\r\n')[0]
image = reply.split('\r\n\r\n')[1]
# save image
f = open('image.png', 'wb')
f.write(image)
f.close()

You are making an HTTP/1.1 request. This HTTP version implicitly behaves as if Connection: keep-alive were set. This means that the server might not close the TCP connection immediately after sending the response, as your code expects, but might keep it open to wait for more HTTP requests.
If you replace the version with HTTP/1.0, the server closes the connection once the response has been sent and the image is complete, because HTTP/1.0 implies Connection: close.
Apart from that: HTTP is way more complex than you might think. Please don't just design your code after some example messages you've seen somewhere but actually read and follow the standards if you really want to implement HTTP yourself.

import socket
import select
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(('httpbin.org', 80))
s.sendall(b'GET /image/png HTTP/1.1\r\nHOST: httpbin.org\r\n\r\n')
reply = b''
while select.select([s], [], [], 3)[0]:
    data = s.recv(2048)
    if not data: break
    reply += data
headers = reply.split(b'\r\n\r\n')[0]
image = reply[len(headers)+4:]
# save image
f = open('image.png', 'wb')
f.write(image)
f.close()
Note that this example is not perfect. The more elegant way is to check the Content-Length header and recv exactly that many bytes, instead of hard-coding a 3-second timeout. If the server uses chunked encoding, it becomes even more complicated.
The example is in Python 3.
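The Content-Length approach can be sketched roughly as follows. This is only a minimal sketch, assuming the server sends a Content-Length header and does not use chunked encoding; the function name read_http_response is hypothetical and not part of any library.

```python
import socket


def read_http_response(sock):
    """Read one HTTP response whose body length is given by Content-Length.

    Assumption: no chunked transfer encoding. The name is illustrative only.
    """
    buf = b""
    # Read until the blank line that ends the headers.
    while b"\r\n\r\n" not in buf:
        data = sock.recv(2048)
        if not data:
            raise ConnectionError("connection closed before headers ended")
        buf += data
    head, _, body = buf.partition(b"\r\n\r\n")

    # Header names are case-insensitive, so compare them in lower case.
    length = None
    for line in head.split(b"\r\n")[1:]:
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    if length is None:
        raise ValueError("no Content-Length header in response")

    # Keep receiving until the whole body has arrived.
    while len(body) < length:
        data = sock.recv(min(2048, length - len(body)))
        if not data:
            raise ConnectionError("connection closed before body ended")
        body += data
    return head, body[:length]
```

With a connected socket s that has already sent the GET request, head, image = read_http_response(s) would return as soon as the last body byte arrives, without waiting for a timeout or for the server to close the connection.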

Related

Python Socket only returns Response header instead of HTML

I want to extract links from a website's JS. Using sockets, I'm trying to get the JS file, but it always shows only the response header and not the actual JS/HTML. Here's what I'm using:
import socket
import ssl
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
cont = ssl.create_default_context()
sock.connect(('blog.clova.line.me', 443))
sock = cont.wrap_socket(sock, server_hostname = 'blog.clova.line.me')
sock.sendall('GET /hs/hsstatic/HubspotToolsMenu/static-1.138/js/index.js HTTP/1.1\r\nHost: blog.clova.line.me\r\n\r\n'.encode())
resp = sock.recv(2048)
print(resp.decode('utf-8'))
It returns only response header:
HTTP/1.1 200 OK
Date: Tue, 06 Sep 2022 12:02:38 GMT
Content-Type: application/javascript
Transfer-Encoding: chunked
Connection: keep-alive
CF-Ray: 74670e8b9b594c2f-SIN
Age: 3444278
...
I have tried the following:
Setting a Content-Type: text/plain; charset=utf-8 header
Changing the request line to GET https://blog.clova.line.me/hs/hsstatic/HubspotToolsMenu/static-1.138/js/index.js HTTP/1.1
From searching for related questions, it seems that other people are able to get the HTML data after the response header is received, but I only receive the headers and not the HTML data. It does work with requests:
resp = requests.get('https://blog.clova.line.me/hs/hsstatic/HubspotToolsMenu/static-1.138/js/index.js')
print(resp.text)
How can I achieve a similar result using socket? Honestly, I don't like using 3rd-party modules; that's why I'm not using requests.

The response is just truncated: sock.recv(2048) reads at most 2048 bytes. If you read more bytes, you will see the body after the headers.
Anyway, I wouldn't recommend doing that with such a low-level library.
"Honestly, I don't like using 3rd-party module that's why I'm not using requests."
If your point is to stick to the Python standard library, you can use urllib.request, which provides more abstraction than socket:
import urllib.request
req = urllib.request.urlopen('…')
print(req.read())

From documentation:
Now we come to the major stumbling block of sockets - send and recv
operate on the network buffers. They do not necessarily handle all the
bytes you hand them (or expect from them), because their major focus
is handling the network buffers. In general, they return when the
associated network buffers have been filled (send) or emptied (recv).
They then tell you how many bytes they handled. It is your
responsibility to call them again until your message has been
completely dealt with.
I've rewritten your code and added a receive_all function that keeps reading until the response is complete (of course, it's a naive implementation):
import socket
import ssl
request_text = (
    "GET /hs/hsstatic/HubspotToolsMenu/static-1.138/js/index.js "
    "HTTP/1.1\r\nHost: blog.clova.line.me\r\n\r\n"
)
host_name = "blog.clova.line.me"
def receive_all(sock):
    chunks: list[bytes] = []
    while True:
        chunk = sock.recv(2048)
        if not chunk:
            # The peer closed the connection
            break
        # Keep every chunk, including the one that carries the terminator
        chunks.append(chunk)
        # Naive end check: a chunked body ends with a zero-size chunk
        if chunk.endswith(b"0\r\n\r\n"):
            break
    return b"".join(chunks)
cont = ssl.create_default_context()
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
    sock.settimeout(5)
    with cont.wrap_socket(sock, server_hostname=host_name) as ssock:
        ssock.connect((host_name, 443))
        ssock.sendall(request_text.encode())
        resp = receive_all(ssock)
        print(resp.decode("utf-8"))
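Because the response in the question uses Transfer-Encoding: chunked, the bytes after the headers are not the plain file: each piece is preceded by a hexadecimal size line. A minimal sketch of decoding such a body is shown here, assuming the complete body is already in memory and ignoring trailers; decode_chunked is a hypothetical helper name.

```python
def decode_chunked(body):
    """Decode a complete HTTP/1.1 chunked body (the bytes after the headers).

    Assumptions: the whole body is in memory and any trailers are ignored.
    """
    decoded = []
    pos = 0
    while True:
        line_end = body.index(b"\r\n", pos)
        # The size line is hexadecimal and may carry ";extension" parameters.
        size = int(body[pos:line_end].split(b";")[0].strip(), 16)
        if size == 0:
            break  # the zero-size chunk marks the end of the body
        start = line_end + 2
        decoded.append(body[start:start + size])
        pos = start + size + 2  # skip the CRLF that follows the chunk data
    return b"".join(decoded)
```

Given a full response in resp, something like head, _, raw = resp.partition(b"\r\n\r\n") followed by decode_chunked(raw) would give the file contents without the chunk-size lines.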

Peer not responding to Handshake Message in BitTorrent Protocol

I am sending a handshake to a peer. This is what the handshake looks like:
b'\x13BitTorrent Protocol\x00\x00\x00\x00\x00\x00\x00\x00\x08O\xae=J2\xc5g\x98Y\xafK\x9e\x8d\xbb\x7f`qcG\x08O\xff=J2\xc5g\x98Y\xafK\x9e\x8d\xbb\x7f`qcG'
However, I get an empty b'' in response. I have set timeout to 10.
Here's my code:
clientsocket=socket.socket(socket.AF_INET, socket.SOCK_STREAM)
clientsocket.settimeout(5)
print("trying")
try:
    clientsocket.connect((ip,port))
except:
    continue
print('connected')
#print(req)
clientsocket.send(req)
clientsocket.settimeout(10)
try:
    buffer = clientsocket.recv(1048)
except:
    continue
Any idea what my mistake is?

There are a few issues with your sample code. The core issue is that the header in your handshake mistakenly capitalizes "Protocol". Most BitTorrent implementations will drop the TCP connection if this header isn't byte-for-byte correct.
The following is a slightly cleaned up version of the code that works:
# IP and Port, obviously change these to match where the server is
ip, port = "127.0.0.1", 6881
import socket
# Broken up the BitTorrent header to multiple lines just to make it easier to read
# The main header, note the lower "p" in protocol, that's important
req = b'\x13'
req += b'BitTorrent protocol'
# The optional bits, note that normally some of these are set by most clients
req += b'\x00\x00\x00\x00\x00\x00\x00\x00'
# The Infohash we're interested in. Let python convert the human readable
# version to a byte array just to make it easier to read
req += bytearray.fromhex("5fff0e1c8ac414860310bcc1cb76ac28e960efbe")
# Our client ID. Just a random blob of bytes, note that most clients
# use the first bytes of this to mark which client they are
req += bytearray.fromhex("5b76c604def8aa17e0b0304cf9ac9caab516c692")
# Open the socket
clientsocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
clientsocket.settimeout(5)
print("Trying")
clientsocket.connect((ip,port))
print('Connected')
# Note: Use sendall, in case the handshake doesn't make it one packet for
# whatever reason
clientsocket.sendall(req)
# And see what the server sends back. Note that really you should keep reading
# till one of two things happens:
# - Nothing is returned, likely meaning the server "hung up" on us, probably
# because it doesn't care about the infohash we're talking about
# - We get 68 bytes in the handshake response, so we have a full handshake
buffer = clientsocket.recv(1048)
print(buffer)
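The "keep reading until 68 bytes or nothing" idea from the comments above can be sketched like this. It is a rough sketch rather than a complete client; recv_exact and parse_handshake are hypothetical helper names, and the 68 bytes are 1 length byte, the 19-byte protocol string, 8 reserved bytes, the 20-byte infohash and the 20-byte peer ID.

```python
import socket


def recv_exact(sock, n):
    """Receive exactly n bytes, or fewer if the peer closes the connection."""
    buf = b""
    while len(buf) < n:
        data = sock.recv(n - len(buf))
        if not data:
            break  # the peer hung up
        buf += data
    return buf


def parse_handshake(data):
    """Split a 68-byte BitTorrent handshake into its fields."""
    if len(data) != 68 or data[0] != 19 or data[1:20] != b"BitTorrent protocol":
        raise ValueError("not a valid handshake")
    reserved = data[20:28]
    info_hash = data[28:48]
    peer_id = data[48:68]
    return reserved, info_hash, peer_id
```

In the code above, buffer = recv_exact(clientsocket, 68) would replace the single recv call; an empty or short result then means the peer closed the connection before completing its handshake.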

Sending png files over python sockets

I've set up a Python client and server using socket that allows the server to send text to the client, and I've been trying to extend it so that images can be sent to the client as well.
Server code:
import socket
#setup and bind server socket
s_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)#setup socket
s_socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)#reuses same port (allows reconnection)
s_socket.bind(('192.168.178.52', 9001))
s_socket.listen(1)
#connects and prints clients data and send message
clientsocket, address = s_socket.accept()
print('Connection from {}'.format(address))
clientsocket.send(bytes('Welcome to the server', 'utf-8'))
#Loops for server to sent text data to client
while True:
    m = input('Enter')
    try:
        file = open(m, 'rb')
        b = file.read(2048)
        clientsocket.send(b)
    except:
        clientsocket.send(bytes(m, 'utf-8'))
Client code:
import socket
import webbrowser
import os
import pyautogui
#setup and bind client socket
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
s.connect(('----------', 9001))#ip swapped for post
while True:
    message = s.recv(2048)#recieves all messages sent with buffer size
    if message:
        txt = str(message)
        with open('plswork.png', 'wb') as file:
            file.write(message)
            file.close()
The problem I'm having is that it sends the file over and creates it perfectly fine, but only part of the image loads when I open it (see image). I am pretty sure this has something to do with the buffer size; however, when I increase it, the file isn't recognised at all and I get an error trying to open the photo (preferably you would be able to send most photos). I'm new to Python sockets, so any help would be appreciated!
(at the moment trying to send a pic of tux...)
https://i.stack.imgur.com/lBblq.png

I don't know the size of the file, but shouldn't you read the file until it is read completely and send data in chunks?
while True:
    m = input('Enter')
    try:
        file = open(m, 'rb')
        while True:
            b = file.read(2048)
            if not b:
                break
            # sendall keeps sending until the whole chunk has been written
            clientsocket.sendall(b)
    except:
        clientsocket.send(bytes(m, 'utf-8'))
The client side has to be adapted as well.
Most network protocols add more information to simplify reception.
For example, it could be a good idea to first send the number of bytes in the welcome message, then the welcome message, then some indicator that a file will follow, then the number of bytes in the image, and only then the bytes of the image.
You will also find that it is complicated for the client to know which bytes are a text message and which are part of the PNG file.
In fact, if you removed the input() call from the server and hard-coded a file name, you would probably notice that the welcome message and the PNG file can arrive combined at the client side, which would then have difficulty separating the two.
So to make your code robust, there's a lot of work to do.
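One possible shape for such a protocol is sketched here: every message starts with a 1-byte type and a 4-byte length, so the receiver always knows what is coming and how many bytes to wait for. The type codes TEXT and FILE and the helpers send_msg, recv_exact and recv_msg are assumptions made up for this illustration, not an existing API.

```python
import socket
import struct

TEXT, FILE = b"T", b"F"  # hypothetical one-byte message types


def send_msg(sock, kind, payload):
    """Send a 1-byte type, a 4-byte big-endian length, then the payload."""
    sock.sendall(kind + struct.pack("!I", len(payload)) + payload)


def recv_exact(sock, n):
    """Receive exactly n bytes or raise if the connection closes early."""
    buf = b""
    while len(buf) < n:
        data = sock.recv(n - len(buf))
        if not data:
            raise ConnectionError("connection closed mid-message")
        buf += data
    return buf


def recv_msg(sock):
    """Return (kind, payload) for the next complete message."""
    header = recv_exact(sock, 5)
    kind = header[:1]
    (length,) = struct.unpack("!I", header[1:])
    return kind, recv_exact(sock, length)
```

The server would then call send_msg(clientsocket, FILE, data) with the whole file contents, and the client would write the payload to disk only when recv_msg returns the FILE type, so a welcome text arriving in the same TCP segment as the image no longer gets mixed into it.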

HTTP server on pure sockets in python

I'm trying to write a very simple HTTP server in Python. The working version looks like this:
def run(self, host='localhost',port=8000):
    with socket.socket(socket.AF_INET,socket.SOCK_STREAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        s.bind((host,port))
        s.listen(1)
        while True:
            connection, adress = s.accept()
            with connection:
                data = b''
                while True:
                    recived = connection.recv(1024)
                    data += recived
                    if len(recived) < 1024:
                        break
                if data != b'':
                    handle_request(data,connection)
It works, but I don't fully understand what is going on.
As I understand it, the socket "s" accepts a connection from the client and returns a new socket object "connection", from which I can read what the client sends me and to which I can send the response. I read data from the connection until the client sends an empty line b''. After this point the TCP part ends and I pass the received bytes to a handler, which parses the received data as HTTP.
Questions: At this point I read all the data the client sends me, but if I want to limit the maximum size of an HTTP request, should I just do something like this:
..................................
with connection:
    data = b''
    request_size_limit=1024*100 # some desired http request max size
    while True:
        recived = connection.recv(1024)
        data += recived
        if len(recived) < 1024 or len(data) > request_size_limit:
            break
    if data != b'':
        handle_request(data,connection)
If I do something like this, how can I inform the client that, for example, I have at most 1024*1024 bytes of free RAM and can't handle requests larger than this?
If a client wants to send more than this limit, must it send several separate requests, each containing one part of the necessary data?
Or, for example, for a big POST request, must I parse each recv(1024) until I find the \r\n\r\n sequence, check the content length, recv() that many bytes in 1024-byte parts into some file, and proceed afterwards?

A1) If you can't handle the request because it is too large, consider just closing the connection. Alternatively, you can read (and discard) everything the client sends and then respond with 413 Payload Too Large.
A2) You'll need to work out a protocol for sending just parts of a request at a time. HTTP doesn't do this natively.
A3) If you can read the whole request in chunks and save it to a file, then it sounds like you have a solution to the 1024*1024 RAM limit, doesn't it?
But first fix how you read data off the socket: a recv() that returns fewer than 1024 bytes does not mean the request is complete.
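A rough sketch of that reading strategy follows: read until the blank line that ends the headers, check Content-Length against a limit, and only then read the body. The limits MAX_HEAD and MAX_BODY and the function names are assumptions chosen for illustration. For brevity the sketch replies 413 straight away instead of first draining the oversized body, and it leaves closing the connection to the caller.

```python
import socket

MAX_HEAD = 8 * 1024      # hypothetical limit for request line plus headers
MAX_BODY = 1024 * 1024   # hypothetical limit for the request body


def reject(conn):
    """Tell the client the request is too large; the caller should then close."""
    conn.sendall(b"HTTP/1.1 413 Payload Too Large\r\n"
                 b"Connection: close\r\nContent-Length: 0\r\n\r\n")
    return None


def read_request(conn):
    """Return (head, body), or None after replying 413 to an oversized request."""
    buf = b""
    while b"\r\n\r\n" not in buf:
        if len(buf) > MAX_HEAD:
            return reject(conn)
        data = conn.recv(1024)
        if not data:
            raise ConnectionError("client closed before sending full headers")
        buf += data
    head, _, body = buf.partition(b"\r\n\r\n")

    # A request without Content-Length is treated as having no body.
    length = 0
    for line in head.split(b"\r\n")[1:]:
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    if length > MAX_BODY:
        return reject(conn)

    # A short recv() is normal; loop until the declared length has arrived.
    while len(body) < length:
        data = conn.recv(min(1024, length - len(body)))
        if not data:
            raise ConnectionError("client closed before sending full body")
        body += data
    return head, body[:length]
```

To keep memory use bounded for large uploads, the final loop could write each received piece to a file instead of appending it to body.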

Downloading just one image from a server that keeps sending them

There is a server that sends images from a CCTV. The data looks like the following:
--BoundaryString
Content-type: image/jpeg
Content-Length: 15839
... first image in binary...
--BoundaryString
Content-type: image/jpeg
Content-Length: 15895
... second image in binary...
and so on (it continues indefinitely). I was trying pyCurl to fetch just one image like so:
curl = pycurl.Curl()
curl.setopt(curl.URL, 'http://localhost:8080')
with open('image.jpg', 'w') as fd:
    curl.setopt(curl.WRITEFUNCTION, fd.write)
    curl.perform()
but it doesn't stop after one image and it continues to read from the server. Is there a way to tell curl to stop after one part?
Alternatively, I could just use a socket and implement a simple GET / myself. That's not a problem. However I'm wondering if it's possible to use pyCurl for this case and I'd also like to know what this is since it doesn't look like a proper multipart message to me.
The server is something called "motion" (a video motion detection daemon for Linux).
Thank you.

This is some code that works for me (Python 2).
This will get you all the images sent by the server. If you only need one, call sys.exit(0) after you have saved the image.
from functools import partial
import socket
def readline(s):
    fx = partial(s.recv, 1)
    ret = [x for x in iter(fx, '\n')]
    return ''.join(ret)
def main():
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect(("127.0.0.1", 8080))
    while True:
        line = readline(s)
        if line.rstrip('\r') == '--BoundaryString':
            content_type = readline(s)
            length = int(readline(s).rstrip('\r').split()[-1])
            _ = readline(s) # we skip an empty line
            image = ''
            while length:
                data = s.recv(length) # here is receiving only 1375 bytes even if you tell it more
                length -= len(data) # so we decrement and retry
                image += data
            # print repr(image[:20]) # was for debug
            # TODO --> open a file and save the image
if __name__ == "__main__":
    main()
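For Python 3, the same idea can be sketched on top of a binary file-like object, for example the one returned by sock.makefile("rb"), so that line reading and exact-length reads are handled by the buffered stream. This is only a sketch under the assumption that every part carries a Content-Length header, as in the question; read_one_part is a hypothetical name.

```python
import io


def read_one_part(stream, boundary=b"--BoundaryString"):
    """Read a single part from a binary file-like object and return its body."""
    # Skip everything (HTTP headers, blank lines) up to the next boundary line.
    while True:
        line = stream.readline()
        if not line:
            raise EOFError("stream ended before a boundary was found")
        if line.rstrip(b"\r\n") == boundary:
            break
    # Read the part headers until the empty line that ends them.
    length = None
    while True:
        line = stream.readline().rstrip(b"\r\n")
        if not line:
            break
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    if length is None:
        raise ValueError("part has no Content-Length header")
    # A buffered stream keeps reading until it has `length` bytes or hits EOF.
    return stream.read(length)
```

After connecting and sending the GET request, calling read_one_part(sock.makefile("rb")) once and then closing the socket would fetch exactly one image and stop.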
