Bittorrent and sockets: how to handle multiple messages? - python

I'm writing a bittorrent client in python, and have been using a loop to continually read messages from the peer sockets using recv().
When I run my program I look in wireshark to see what bittorrent messages I'm getting. It's pretty easy to tell what kind of message you got from the first 5 bytes of the message, since the length and message ID are specified there.
I'm running into some problems when dealing with receiving data containing multiple messages.
I've tried tackling it by writing a method like this:
def handleMultiple(self, message, peer):
    total_length = len(message)
    parsed = 0
    while parsed < total_length:
        m_len, m_id = struct.unpack(">IB", message[parsed:parsed + 5])
        m_total = m_len + 4
        print(m_len, total_length, parsed, m_id, peer.made_handshake, peer.ip)
        self.handleMessage(message[parsed:m_total + parsed], peer)
        parsed += m_total
The function just breaks down the received bytes into its constituent messages and hands it off to the message handler that knows how to deal with individual messages.
The problem is that when I printed out the length prefix and message ID from a message I received using recv(), sometimes it looks like just garbage numbers.
This is really my first time experimenting with sockets, so I lack the intuition to know what I'm really getting when calling recv(). Should I just call recv() for the first 5 bytes of data, check that the length and ID are valid, and then call recv() for the rest of the message?
How should I go about handling multiple messages incoming at a time?
Edit:
I wanted to provide some images of the results I'm seeing to see if anyone can help identify the issue I'm having.
Here's a picture of the bittorrent messages I'm receiving:
Here's a corresponding logging output:
The columns are supposed to be message length + 4, total message length, message id, and the IP from the sender:
As you can see, the length prefixes for the first messages (the ones where multiple messages were sent to me at a time) are far too large. The fifth message I got from 95.211.212.26 is a well-formed bitfield message.
Another thing I noticed is that the supposed message ID from each of the multi-message messages is 255. Also given that the total length of a bitfield message for this given torrent is 126, the total lengths (303, 328, 325) are not inconceivable for messages of a bitfield followed by several have messages.

Alright so I've managed to figure out where I was going wrong. I was reading from the socket assuming that my message would be there in full. In reality, I was reading the initial snippet of the message, and at a later time I was reading the middle of the message. The 255 values I was seeing weren't message IDs but actually the middle of the peer's bitfield (0xff).
I changed my approach to append the bytes read from the socket to a per-peer message buffer. Once the buffer holds at least one full message (the 4-byte length prefix plus the number of bytes it announces), I parse that message and trim it off the front of the buffer. Now all of my message IDs look as I expect.
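A minimal sketch of that buffering approach, assuming the handshake has already been consumed (it is not framed with a 4-byte length prefix). The class name `MessageBuffer` and its `feed()` method are made up for illustration and are not taken from the original client:

```python
import struct


class MessageBuffer:
    """Accumulates raw bytes from recv() and hands back complete messages."""

    def __init__(self):
        self._buf = bytearray()

    def feed(self, data):
        """Append newly received bytes; return a list of (msg_id, payload)."""
        self._buf += data
        messages = []
        while len(self._buf) >= 4:
            (length,) = struct.unpack(">I", self._buf[:4])
            if len(self._buf) < 4 + length:
                break  # the rest of this message has not arrived yet
            body = bytes(self._buf[4:4 + length])
            del self._buf[:4 + length]  # trim what was just parsed
            if length == 0:
                messages.append((None, b""))  # keep-alive carries no ID
            else:
                messages.append((body[0], body[1:]))
        return messages
```

Each peer would own one such buffer; whatever recv() returns is passed to `feed()`, and only the complete messages it returns go on to the message handler.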

Related

Modify Hl7 messages inline using Python.

I need to be able to either modify some text within existing Hl7 message (mostly PID and OBX segments), or create a copy Hl7 message from the existing message, but alter some of the fields based on some criteria (drop PHI strings)
The OBX segment is used to transmit a single observation or
observation fragment. It represents the smallest indivisible unit of a
report. Its mission is to carry information about observations in
report messages.
HL7 messages should not be modified or "copied" once they have been received or sent. Each HL7 message records a step in a transaction in which several actors interact.
An HL7 message should be generated from an event that other systems need to be notified of; it is then either generated and sent, or received and processed.
You can check the python-hl7 library, as it is useful for parsing.
Use hl7apy (see its docs).
from hl7apy.parser import parse_message
hl7 = "your hl7 message"
message = parse_message(hl7)
# you can modify whatever you want
message.MSH.MSH_3.value = "your value"
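If a library is not an option, a redacted copy can also be built with plain string handling. This is only a sketch: it assumes the default HL7 v2 delimiters (carriage return between segments, `|` between fields) and ignores escape sequences and repetitions. The function name `redact_fields` is hypothetical. It should not be used on MSH, where field numbering is shifted because MSH-1 is the field separator itself.

```python
def redact_fields(message, segment_id, field_numbers, placeholder=""):
    """Return a copy of an HL7 v2 message with the given fields blanked.

    Assumes default delimiters: "\r" between segments, "|" between fields.
    """
    segments = message.split("\r")
    for i, segment in enumerate(segments):
        fields = segment.split("|")
        if fields[0] != segment_id:
            continue  # leave other segments untouched
        for n in field_numbers:
            if n < len(fields):
                fields[n] = placeholder
        segments[i] = "|".join(fields)
    return "\r".join(segments)
```

For example, `redact_fields(msg, "PID", [5, 7])` blanks PID-5 (patient name) and PID-7 (date of birth) and leaves the original string unchanged.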

how to delete kafka message after reading

I am using the code below to read messages from a topic. How do I delete a message after it is read?
from kafka import KafkaConsumer
consumer = KafkaConsumer('my-topic',
                         group_id='my-group',
                         bootstrap_servers=['localhost:9092'])
for message in consumer:
    # message value and key are raw bytes -- decode if necessary!
    # e.g., for unicode: `message.value.decode('utf-8')`
    print ("%s:%d:%d: key=%s value=%s" % (message.topic, message.partition,
                                          message.offset, message.key,
                                          message.value))
There is no way to delete a specific message from Kafka; Kafka simply is not designed to do that. Messages are only removed by the retention policy, for example by setting log.retention.hours in Kafka's config/server.properties to a value of your liking. The default is 168, meaning that messages are deleted once they are older than 168 hours (7 days).
If you instead are looking for a way to read messages from a specific offset - i.e. not read from the beginning every time, look here http://kafka-python.readthedocs.org/en/master/apidoc/KafkaConsumer.html
commit() - committing read offsets to kafka
seek_to_end() - fast forward to consuming only newly arriving messages
seek() - moving to a given offset (presumably stored somewhere else than in kafka)

TCP Sockets: Double messages

I'm having a problem with sockets in python.
I have a TCP server and client that send each other data in a while 1 loop.
Each message packs 2 shorts with the struct module (struct.pack("hh", mousex, mousey)). But sometimes when receiving the data on the other computer, it seems like 2 messages have been glued together. Is this Nagle's algorithm?
What exactly is going on here? Thanks in advance.
I agree with other posters, that "TCP just does that". TCP guarantees that your bytes arrive in the right order, but makes no guarantees about the sizes of the chunks they arrive in. I would add that TCP is also allowed to split a single send into multiple recv's, or even for example to split aabb, ccdd into aab, bcc, dd.
I put together this module for dealing with the relevant issues in python:
http://stromberg.dnsalias.org/~strombrg/bufsock.html
It's under an opensource license and is owned by UCI. It's been tested on CPython 2.x, CPython 3.x, Pypy and Jython.
HTH
To be sure I'd have to see actual code, but it sounds like you are expecting a send of n bytes to show up on the receiver as exactly n bytes all the time, every time.
TCP streams don't work that way. It's a "streaming" protocol, as opposed to a "datagram" (record-oriented) one like UDP or SCTP or RDS.
For fixed-data-size protocols (or any where the next chunk size is predictable in advance), you can build your own "datagram-like receiver" on a stream socket by simply recv()ing in a loop until you get exactly n bytes:
def recv_n_bytes(sock, n):
    "attempt to receive exactly n bytes; return what we got"
    data = []
    while True:
        have = sum(len(x) for x in data)
        if have >= n:
            break
        want = n - have
        got = sock.recv(want)
        if not got:
            break  # peer closed the connection
        data.append(got)  # keep the chunk, otherwise the loop never ends
    return b''.join(data)
(untested; originally python 2.x code, the bytes literal makes it valid on 3.x too; not necessarily efficient; etc).
You may not assume that data becomes available for reading on the local socket in the same-sized pieces in which it was handed over for sending at the other end. As you have seen, this is often the case, but by no means reliably so. What TCP guarantees is that what goes in one end will eventually come out the other, in order and with nothing missing; if that cannot be achieved by the means built into the protocol, such as retries, the whole connection breaks with an error.
Nagle is one possible cause, but not the only one.
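For the two-shorts case in the question, one way to cope with glued or split chunks is to collect everything in a buffer and only unpack complete 4-byte records. This is a sketch; the names `RECORD` and `extract_records` are made up, and the explicit network byte order (`!`) is an assumption added so that both machines agree on the layout (the sender would then pack with "!hh" as well):

```python
import struct

RECORD = struct.Struct("!hh")  # two signed shorts, network byte order


def extract_records(buf):
    """Remove and return every complete (x, y) record from a bytearray."""
    records = []
    while len(buf) >= RECORD.size:
        records.append(RECORD.unpack_from(buf))
        del buf[:RECORD.size]  # leftover bytes stay for the next call
    return records
```

The receive loop would do `buf += sock.recv(4096)` and then call `extract_records(buf)`. It does not matter whether the bytes arrived as one chunk, two glued messages, or half a message at a time.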

Python Parse Minecraft Packet

I have a script that connects to a Minecraft server, receives packets, and sends packets.
So, I send a 'login' packet, and I receive a 'login' packet. Unfortunately, the received login packet is encoded (Information about encoding here: http://wiki.vg/Protocol#0x01).
The received login packet is stored in a variable named received_login_packet. I need to decode it so that I can get the separate bits of data, such as the packet type, the dimension, etc...
I've looked around a bit, but I have absolutely no idea as to how to go about doing this.
Here's some code if it helps:
#encoding the packet to send
encuserlen = str(len(enc_user)) # enc_user is just my username
packfmt = '>bih%sshiibBB' % encuserlen
packetbytes = struct.pack(packfmt, 1, 28, len(data['user']), enc_user, 0, 0, 0, 0, 0, 0)
s.send(packetbytes)
time.sleep(2)
#login packet sent, waited for response
response = s.recv(1024) #this is the raw login response.
#it's encoded as detailed above. i want to decode it
Any help would be appreciated and please don't hesitate to say if it's not clear enough.
So, if I understand this right, you want to decode the packet response, which is up to 1024 bytes long, to get the correct information out. You were already able to use struct.pack; there is a matching struct.unpack function, as seen in the documentation. Basically, it looks like this.
packfmt = '>issiibBB'
output=struct.unpack(packfmt,response)
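Note that struct.unpack() requires the buffer to be exactly struct.calcsize(packfmt) bytes long, and that an `s` without a count stands for a single byte, so the format has to be adapted to the actual field sizes in the response. Also, I'm not quite convinced that your request was being sent right, but I'll leave that as an exercise for you to figure out. See the struct documentation on format strings.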
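When a packet contains a length-prefixed string, a single fixed format string cannot describe it; the usual approach is to walk through the buffer with struct.unpack_from() and an offset. The sketch below uses a made-up layout (signed byte ID, 4-byte int, 2-byte string length in bytes, UTF-8 string, signed byte) purely for illustration. It is not the real login packet layout, which is described at the wiki.vg link in the question.

```python
import struct


def decode_packet(data):
    """Decode a hypothetical packet; return (fields, bytes_consumed)."""
    offset = 0
    # Fixed-size head: packet id, entity id, length of the string that follows
    packet_id, entity_id, name_len = struct.unpack_from(">bih", data, offset)
    offset += struct.calcsize(">bih")
    # Variable-size part: build the format from the length just read
    (raw_name,) = struct.unpack_from("%ds" % name_len, data, offset)
    offset += name_len
    # Fixed-size tail
    (dimension,) = struct.unpack_from(">b", data, offset)
    offset += 1
    fields = {
        "packet_id": packet_id,
        "entity_id": entity_id,
        "name": raw_name.decode("utf-8"),
        "dimension": dimension,
    }
    return fields, offset
```

Returning the number of bytes consumed makes it possible to continue decoding the next packet from the same recv() buffer.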

How do I dump the TCP client's buffer in order to accept more data?

I've got a simple TCP server and client. The client receives data:
received = sock.recv(1024)
It seems trivial, but I can't figure out how to receive data larger than the buffer. I tried chunking my data and sending it multiple times from the server (worked for UDP), but it just told me that my pipe was broken.
Suggestions?
If you have no idea how much data is going to pour over the socket, and you simply want to read everything until the socket closes, then you need to put socket.recv() in a loop:
# Assumes a blocking socket.
while True:
    data = sock.recv(4096)
    if not data:
        break
    # Do something with `data` here.
Mike's answer is the one you're looking for, but that's not a situation you want to find yourself in. You should develop an over-the-wire protocol that uses a fixed-length field that describes how much data is going to be sent. It's a Type-Length-Value protocol, which you'll find again and again and again in network protocols. It future-proofs your protocol against unforeseen requirements and helps isolate network transmission problems from programmatic ones.
The sending side becomes something like:
sock.sendall(struct.pack("!B", msg_type))   # send a one-byte msg type
sock.sendall(struct.pack("!H", len(data)))  # send a two-byte size field ("!" = network byte order)
sock.sendall(data)
And the receiving side something like:
stream = sock.makefile("rb")  # buffered reader: read(n) waits for n bytes or EOF
msg_type = struct.unpack("!B", stream.read(1))[0]    # get the type of msg
dataToRead = struct.unpack("!H", stream.read(2))[0]  # get the len of the msg
data = stream.read(dataToRead)                       # read the msg
if TYPE_FOO == msg_type:
    handleFoo(data)
elif TYPE_BAR == msg_type:
    handleBar(data)
else:
    raise UnknownTypeException(msg_type)
You end up with an over-the-wire message format that looks like:
struct {
    unsigned char type;
    unsigned short length;
    void *data;
}
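The type/length/value layout above can be tried end to end with a local socket pair. This is a sketch under assumptions: the helper names `send_msg` and `recv_msg` are invented here, the header uses network byte order, and the reader relies on the buffered file object from `makefile("rb")`, whose `read(n)` returns fewer than n bytes only at end of stream.

```python
import socket
import struct

HEADER = struct.Struct("!BH")  # 1-byte type, 2-byte length, network order


def send_msg(sock, msg_type, data):
    """Send one type/length/value message."""
    sock.sendall(HEADER.pack(msg_type, len(data)) + data)


def recv_msg(stream):
    """Read one message from a buffered reader; None when the peer closed."""
    header = stream.read(HEADER.size)
    if len(header) < HEADER.size:
        return None
    msg_type, length = HEADER.unpack(header)
    data = stream.read(length)
    if len(data) < length:
        raise EOFError("connection closed in the middle of a message")
    return msg_type, data


# Usage: two messages sent back to back are read back one at a time.
a, b = socket.socketpair()
send_msg(a, 1, b"hello")
send_msg(a, 2, b"")
a.close()
stream = b.makefile("rb")
first = recv_msg(stream)
second = recv_msg(stream)
third = recv_msg(stream)  # the sender has closed, so nothing is left
stream.close()
b.close()
```

Because the length travels with every message, the receiver never has to guess where one message ends and the next begins, however TCP chunks the bytes.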
Keep in mind that:
Your operating system has its own idea of what its TCP/IP socket buffer size is.
The maximum size of a packet on the wire is limited (the Ethernet MTU is generally 1500 bytes), so larger sends are split across several packets.
pydoc for socket suggests that 4096 is a good buffer size
With that said, it'd really be helpful to see the code around that one line. A few things could play into this: whether you're using select or just polling, whether the socket is non-blocking, etc.
It also matters how you're sending the data, and whether your remote end disconnects. More details would help.
