Socket received invalid start byte (UnicodeDecodeError, SOCK_STREAM) - python

I am using a blocking python socket of the type socket.socket(socket.AF_INET, socket.SOCK_STREAM) to send messages from my client to my server. If I send messages in quick succession (but not simultaneously), I get the following error on my server:
in receive
size = int(rec_sock.recv(HEADER_SIZE).decode('utf-8'))
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte
Before each message I send a header with the length of the following message. The header is encoded in UTF-8 by the client and therefore shouldn't throw this error. The header is also the only part of the message that the client attempts to decode with UTF-8 so I am not sure how this error can happen.
I am using the following methods to send, receive, and make a header:
BUF_SIZE = 16384
HEADER_SIZE = 16
def receive(rec_sock: socket.socket) -> Any:
    message = b''
    size = int(rec_sock.recv(HEADER_SIZE).decode('utf-8'))
    if size:
        while len(message) < size:
            data = rec_sock.recv(BUF_SIZE)
            message += data
    return pickle.loads(message) if len(message) else None
def send(resp: Any, send_sock: socket.socket) -> None:
    pickeled = pickle.dumps(resp)
    send_sock.send(make_header(len(pickeled)))
    send_sock.send(pickeled)
def make_header(msg_size: int) -> bytes:
    encoded = str(msg_size).encode('utf-8')
    return b'0' * (HEADER_SIZE - len(encoded)) + encoded

The issue was that my receive method always requested a full buffer (BUF_SIZE bytes), even when fewer bytes of the current message remained. TCP is a byte stream with no message boundaries, so if two messages were sent in quick succession, the previous call to receive read past the end of its own message and consumed the header of the next one. The next call then read pickled content as the header, which cannot be decoded as UTF-8. Pickled data (protocol 2 and later) starts with the byte 0x80, which matches the error message.
Changing the receive method to this fixed it:
def receive(rec_sock: socket.socket) -> Any:
    message = b''
    size = int(rec_sock.recv(HEADER_SIZE).decode('utf-8'))
    print("Waiting for", size, "bytes ...")
    if size:
        while len(message) < size:
            remaining = size - len(message)
            read_len = BUF_SIZE if remaining >= BUF_SIZE else remaining
            data = rec_sock.recv(read_len)
            message += data
    print("Received", len(message), "bytes.")
    return pickle.loads(message) if len(message) else None
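A slightly more defensive variant of the same idea, as a sketch only: recv() may return fewer bytes than requested (this applies to the header as well), and it returns b'' once the peer has closed the connection, so it can help to move the loop into a helper that reads an exact byte count. The names recv_exact, send_msg and recv_msg are hypothetical; the header is the same 16-byte zero-padded length that make_header produces.

```python
import pickle
import socket
from typing import Any

HEADER_SIZE = 16  # fixed-width, zero-padded ASCII length header


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes and never more; raise if the peer closes early."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)  # request only what is still missing
        if not chunk:  # b'' means the connection was closed
            raise ConnectionError(f"socket closed with {remaining} bytes missing")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def send_msg(sock: socket.socket, obj: Any) -> None:
    payload = pickle.dumps(obj)
    header = str(len(payload)).zfill(HEADER_SIZE).encode("utf-8")
    sock.sendall(header + payload)  # sendall keeps sending until all bytes are out


def recv_msg(sock: socket.socket) -> Any:
    size = int(recv_exact(sock, HEADER_SIZE).decode("utf-8"))
    return pickle.loads(recv_exact(sock, size)) if size else None


if __name__ == "__main__":
    # Two messages sent back to back over a local socket pair.
    a, b = socket.socketpair()
    send_msg(a, {"n": 1})
    send_msg(a, [1, 2, 3])
    print(recv_msg(b), recv_msg(b))
    a.close()
    b.close()
```

Because each read is capped at the number of bytes still owed to the current message, a second message that is already waiting in the socket buffer stays there until the next call.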

Related

infinite loop when receiving a large pickled object with sockets

I'm using a remote Linux server and I want to send an array from a client over a socket with Python, so I used this code:
message = pickle.dumps(faceBlob)
message_header = bytes(f"{len(message):<{HEADER_LENGTH}}", "utf-8")
client_socket.send(message_header + message)
To receive it on the server I used a while loop to read the whole message, since it is larger than 4096 bytes:
def receive_blop(client_socket):
    # Receive our "header" containing message length, it's size is defined and constant
    message_header = client_socket.recv(HEADER_LENGTH)
    # If we received no data, client gracefully closed a connection, for example using socket.close() or socket.shutdown(socket.SHUT_RDWR)
    if not len(message_header):
        return False
    # Convert header to int value
    message_length = int(message_header.decode('utf-8').strip())
    fragments = []
    print(message_length)
    while True:
        # this loop is infinite
        print("I arrived her")
        chunk = client_socket.recv(4096)
        if not chunk:
            break
        fragments.append(chunk)
    data_arr = b"".join(fragments)
    # Return an object of message header and message data
    return {'header': message_header, 'data': data_arr}
The server keeps printing 'I arrived here' and does not return the message until the client closes the connection.
Your loop will continue until the client closes the connection. If they don't close until they get a response from you, you've got a deadlock.
Since you know the message length, you can stop the loop when you've received that many bytes.
received_length = 0
while received_length < message_length:
    print("I arrived here")
    chunk = client_socket.recv(message_length - received_length)
    if not chunk:
        break
    fragments.append(chunk)
    received_length += len(chunk)

Why is the byte stream returned by Python's socket.recvfrom different from the one captured by Wireshark?

I used a Python socket to send a DNS query packet and listen for the response. I received a DNS response packet from socket.recvfrom(2048) as expected. Strangely, when I compared it with the packet captured by Wireshark, I found many differences.
The differences show up as 3f bytes in the second picture.
The DNS response packet (the highlighted part) captured by Wireshark
The DNS response packet returned by socket.recvfrom(2048)
The code that creates the socket:
ipv = check_ip(dst)
udp = socket.getprotobyname(Proto.UDP)
if ipv == IPV.ERROR:
    return None
elif ipv == IPV.IPV4:
    return socket.socket(socket.AF_INET, socket.SOCK_DGRAM, udp)
elif ipv == IPV.IPV6:
    return socket.socket(socket.AF_INET6, socket.SOCK_DGRAM, udp)
else:
    return None
The code that receives the DNS response packet:
remained_time = 0
while True:
    remained_time = self.timeout - timeit.default_timer() + sent_time
    readable = select.select([sock], [], [], remained_time)[0]
    if len(readable) == 0:
        return (-1, None)
    packet, addr = sock.recvfrom(4096)
Byte 0x3F is the ASCII '?' character. That commonly means the data is being treated as text and is passing through a charset conversion that doesn't support the bytes being converted.
Notice that 0x3F is replacing only the bytes that are > 0x7F (the last byte supported by ASCII). Non-ASCII bytes in the range of 0x80-0xFF are subject to charset interpretation.
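That makes sense if the received data ends up as a text string at some point (for example when it is decoded or printed), because the bytes then have to pass through a string encoding. Note that in Python 3 recvfrom() itself returns a bytes object, so any such conversion would happen after the call.
If you want to be sure you are working with raw bytes, you can use recvfrom_into() to fill a pre-allocated bytearray, e.g.: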
packet = bytearray(4096)
remained_time = 0
while True:
    remained_time = self.timeout - timeit.default_timer() + sent_time
    readable = select.select([sock], [], [], remained_time)[0]
    if len(readable) == 0:
        return (-1, None)
    nbytes, addr = sock.recvfrom_into(packet)
Then you can use packet up to nbytes number of bytes as needed.
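The 0x3F pattern described above can be reproduced without a socket. This is a small sketch with made-up sample bytes (they are not taken from the packets in the question): the bytes are turned into text and then forced into ASCII, which replaces every byte above 0x7F with '?'.

```python
# Made-up sample: six bytes, two of them above 0x7F.
raw = bytes([0x12, 0x34, 0x81, 0x80, 0x00, 0x01])

# Step 1: treat the bytes as text. latin-1 maps every byte to one character.
as_text = raw.decode("latin-1")

# Step 2: squeeze the text into ASCII. Characters ASCII cannot represent
# are replaced with '?', which is byte 0x3F.
lossy = as_text.encode("ascii", errors="replace")

print(raw.hex())    # 123481800001
print(lossy.hex())  # 12343f3f0001
```

Only the two bytes above 0x7F change, which is the same pattern as in the comparison with the Wireshark capture.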

Decoding error message from tracker

I am trying to decode an error message from a UDP tracker.
Below is my code.
import struct, socket
client_socket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
info_hash = "%1D%D4%D1%EDQn%DB%5CL%83%90%1B%2B%F8%83%A2%19%C0%7C%98"
peer_id = "-UT1234-m%09%B2%D5%99%FA%1Fj%88%AC%0D%A7"
action = 1 # announce
downloaded = 0
left = 0
uploaded = 0
event = 0
ip = 0
key = 0
num_want = -1
port = 9999
announce_pack = struct.pack(">QLL20s20sQQQLLLLi",connection_id,action,transaction_id,info_hash,peer_id,downloaded,left,uploaded,event,ip,key,num_want,port)
client_socket.sendto(announce_pack, ("tracker.ccc.de", 80))
res = client_socket.recv(1024)
try:
    action = struct.unpack(">HLLLLQQQ20s20sLLH", res[:98])
except Exception as e:
    error_action, error_tid, error_message = struct.unpack(">ii8s", res)
    raise TrackerRequestException(error_message.decode('utf-16'), "")
I am able to unpack the message, but for some reason the error message I get is
\uc061\u51be\u5841\ud3bf
How do I decode this into proper text?
I got the protocol description from this link http://bittorrent.org/beps/bep_0015.html
There can be an exception for any number of reasons; you could have read too little data for example (socket.recv(1024) can return fewer bytes if that's all that's available at that time).
You need to follow the BEP more closely. You need to check that you have received at least 8 bytes first, and then check for the TID, and the action code. Only if your action code is set to 3 is the response an error message.
The message is not encoded in UTF-16. It should just be ASCII data.
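The checks described above could look roughly like this. It is a sketch, not a full client: the function name parse_error_response and the expected_tid parameter are hypothetical. The layout follows the error response in BEP 15: a 32-bit action, a 32-bit transaction ID, then the message text.

```python
import struct
from typing import Optional

ACTION_ERROR = 3  # action code of an error response in BEP 15


def parse_error_response(res: bytes, expected_tid: int) -> Optional[str]:
    """Return the tracker's error text, or None if res is not an error response."""
    if len(res) < 8:
        raise ValueError(f"response too short: {len(res)} bytes")
    # Both fields are unsigned 32-bit integers in network byte order.
    action, tid = struct.unpack(">II", res[:8])
    if tid != expected_tid:
        raise ValueError("transaction ID does not match the request")
    if action != ACTION_ERROR:
        return None
    # Everything after the 8-byte header is the message itself.
    return res[8:].decode("ascii", errors="replace")


# Example with a hand-built error packet (the message text is made up):
sample = struct.pack(">II", ACTION_ERROR, 0x1234) + b"Invalid connection ID"
print(parse_error_response(sample, 0x1234))
```

Decoding with errors="replace" is a defensive choice here, so that an unexpected non-ASCII byte does not raise a second exception while an error is being reported.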

Having trouble building a Dns Packet in Python

I'm trying to build a DNS packet to send over a socket. I don't want to use any libraries because I want direct access to the socket variable that sends it. Whenever I send the DNS packet, Wireshark says that it's malformed. What exactly am I doing wrong?
Some things that are wrong with the DNS packet itself:
It says it has 256 questions, no class and no type
class DnsPacketBuilder:
    def __init__(self):
        pass
    def build_packet(self, url):
        packet = struct.pack("H", 12049) # Query Ids (Just 1 for now)
        packet += struct.pack("H", 256) # Flags
        packet += struct.pack("H", 1) # Questions
        packet += struct.pack("H", 0) # Answers
        packet += struct.pack("H", 0) # Authorities
        packet += struct.pack("H", 0) # Additional
        split_url = url.split(".")
        for part in split_url:
            packet += struct.pack("B", len(part))
            for byte in bytes(part):
                packet += struct.pack("c", byte)
        packet += struct.pack("B", 0) # End of String
        packet += struct.pack("H", 1) # Query Type
        packet += struct.pack("H", 1) # Query Class
        return packet
# Sending the packet
builder = DnsPacketBuilder()
packet = builder.build_packet("www.northeastern.edu")
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(('', 8888))
sock.settimeout(2)
sock.sendto(bytes(packet), ("208.67.222.222", 53))
print("Packet Sent")
data, addr = sock.recvfrom(1024)
print("Response: " + data)
sock.close()
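Your system uses "little endian" byte order natively, and struct.pack("H", ...) packs in native order. That is why the question count 1 goes out as the bytes 01 00, which Wireshark reads as 256.
You need to pack the 16-bit fields in "big endian" (aka "network order") using the ">H" format string in struct.pack().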
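As a sketch of that change, the query from the question can be built like this in Python 3. The function name build_query is hypothetical. It uses "!" (network order), which gives the same byte order as ">" for these fields.

```python
import struct


def build_query(hostname: str, query_id: int = 12049) -> bytes:
    # Header: ID, flags (0x0100 = recursion desired), then the four counts.
    # "!" selects network (big-endian) byte order for every field.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed with its length; a zero byte ends the name.
    qname = b"".join(
        struct.pack("!B", len(label)) + label
        for label in hostname.encode("ascii").split(b".")
    ) + b"\x00"
    # QTYPE = 1 (A record), QCLASS = 1 (IN), also in network order.
    return header + qname + struct.pack("!HH", 1, 1)


packet = build_query("www.northeastern.edu")
print(packet.hex())
```

With this byte order the question count is sent as 00 01, and the type and class fields at the end are 00 01 each.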

Python Socket, No Data Received after Initial Transmission

I'm looking to make a very basic remote desktop application. Right now I am able to capture the screen data using the Python win32 API, send one image over the socket connection, and rebuild it correctly on the receiving end. I send the size of the image and some other data encoded as an 11-byte string before sending the actual image data. The problem is when I try to send the second 11-character string: no data comes through the socket. The client sends the data, prints out some information confirming its progress and then closes, but on the server side nothing arrives on the socket. I am not sure what is going on.
Here is my code, comments in line:
Client Side:
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(('localhost', 8888))
imgLength = sys.getsizeof(bmpstr) ## bmpstr is the pixel data
prefix = str(imgLength) # message length
prefixSize = sys.getsizeof(prefix)
if prefixSize < 30:
    prefix = ('0' * (30 - prefixSize)) + prefix
prefix = "5" + "1" + prefix ## BLOCK LOCATION
s.send(prefix.encode("UTF-8"))
totalSent = 0
while totalSent < imgLength:
    totalSent += 4096
    if (totalSent >= imgLength):
        s.send(bmpstr[totalSent :])
        break
    else:
        s.send(bmpstr[totalSent : totalSent + 4096])
Right now I simply run this twice, sending the prefix and pixel data the same way. It's literally copy and paste. I don't close socket s; I use the same connection for both images. I'm wondering if maybe that is my problem? I am hoping to have a somewhat real-time transmission of data, maybe 3-4 FPS, so I would like to do this as efficiently as possible.
Server Side:
serversocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
serversocket.bind(('localhost', 8888))
serversocket.listen(5)
transmission = clientsocket.recv(4096)
transmissionMetaData = decode_meta_data(transmission)
transmissionLength = transmissionMetaData[0]
blockX = transmissionMetaData[1]
blockY = transmissionMetaData[2]
while 1:
    thisData = clientsocket.recv(4096)
    data += thisData
    if len(data) >= transmissionLength or not(thisData):
        break
## rebuild the image...
# prepare for second image
data = ""
transmission = ""
prefixTransmission = ""
## here is the problem, I am trying to receive the prefix data which will contain
# the size of the second transmission. But for some reason this never gets any data
# it works just fine when i do it above.
while 1:
    thisData = clientsocket.recv(4096)
    prefixTransmission += thisData
    ### this line always prints an empty string for the data
    print sys.getsizeof(prefixTransmission), " :", prefixTransmission
    if sys.getsizeof(prefixTransmission) >= 32:
        transmissionMetaData = prefixTransmission[0:11]
        if sys.getsizeof(prefixTransmission) > 32:
            data = prefixTransmission[11:]
        break
transmissionMetaData = decode_meta_data(transmission)
transmissionLength = transmissionMetaData[0]
blockX = transmissionMetaData[1]
blockY = transmissionMetaData[2]
while 1:
    thisData = clientsocket.recv(4096)
    data += thisData
    if len(data) >= transmissionLength or not(thisData):
        break
So my current problem is that the second piece of metadata is simply not coming through the socket. If I just send the 11-character metadata, the program hangs with an empty transmission. If I have the client send the 11-character metadata followed by the image data itself, the server crashes because it cannot decode the first 11 bytes.
UnicodeDecodeError: 'utf8' cannot decode byte 0xff in position 2: invalid start byte
I think maybe I am pulling out the 11 characters improperly?
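One thing worth checking, offered as a sketch rather than a confirmed diagnosis: sys.getsizeof() reports the memory footprint of a Python object, not the number of characters or bytes it holds, so the padding and slicing above may not produce an 11-byte prefix. A fixed-width prefix can be built and parsed with len() instead. The names PREFIX_SIZE, make_prefix and parse_prefix are hypothetical, and the layout (one digit for each block coordinate followed by a zero-padded length) is an assumption based on the description above.

```python
import sys

PREFIX_SIZE = 11  # assumed fixed prefix width in bytes


def make_prefix(block_x: int, block_y: int, payload: bytes) -> bytes:
    """Build 'XY' plus a zero-padded payload length, exactly PREFIX_SIZE bytes."""
    if not (0 <= block_x <= 9 and 0 <= block_y <= 9):
        raise ValueError("block coordinates must be single digits")
    width = PREFIX_SIZE - 2
    length_text = str(len(payload)).zfill(width)  # len(), not sys.getsizeof()
    if len(length_text) > width:
        raise ValueError("payload too large for the prefix")
    return f"{block_x}{block_y}{length_text}".encode("ascii")


def parse_prefix(prefix: bytes):
    """Return (length, block_x, block_y) from a PREFIX_SIZE-byte prefix."""
    if len(prefix) != PREFIX_SIZE:
        raise ValueError("prefix has the wrong size")
    text = prefix.decode("ascii")
    return int(text[2:]), int(text[0]), int(text[1])


payload = b"\xff" * 5000  # stand-in for the pixel data
prefix = make_prefix(5, 1, payload)
print(prefix, parse_prefix(prefix))
# The two measures disagree: one is a byte count, the other a memory size.
print(len(payload), sys.getsizeof(payload))
```

On the receiving side the same idea applies in reverse: read exactly PREFIX_SIZE bytes for the prefix and then exactly the announced number of payload bytes, so that pixel data is never passed to the text decoder.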
