Python and ZMQ: REP socket not receiving all requests

I have a REP socket that's connected to many REQ sockets, each running on a separate Google Compute Engine instance. I'm trying to accomplish the synchronization detailed in the ZMQ Guide's syncpub/syncsub example, and my code looks pretty similar to that example:
import zmq

context = zmq.Context()
sync_reply = context.socket(zmq.REP)
sync_reply.bind('tcp://*:5555')
# start a bunch of other sockets ...
ready = 0
while ready < len(all_instances):
    sync_reply.recv()
    sync_reply.send(b'')
    ready += 1
And each instance is running the following code:
import zmq

context = zmq.Context()
sync_request = context.socket(zmq.REQ)
sync_request.connect('tcp://IP_ADDRESS:5555')
sync_request.send(b'')
sync_request.recv()
# start other sockets and do other work ...
This system works fine up until a certain number of instances (around 140). Any more, though, and the REP socket will not receive all of the requests. It also seems like the requests it drops are from different instances each time, which leads me to believe that all the requests are indeed being sent, but the socket is just not receiving any more than (about) 140 of them.
I've tried setting the high water mark for the sockets, spacing out the requests over the span of a few seconds, switching to ROUTER/DEALER sockets - all with no improvement. The part that confuses me the most is that the syncsub/syncpub example code (linked above) works fine for me with up to 200 Google Compute Engine instances, which is as many as I can start. I'm not sure what about my code specifically is causing this problem - any help or tips would be appreciated.
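For reference, this is roughly how the high water mark was being set (a sketch only; pyzmq exposes it either through set_hwm() or the individual SNDHWM/RCVHWM socket options, and the value 10000 here is arbitrary):

import zmq

context = zmq.Context()
sync_reply = context.socket(zmq.REP)

# Raise both high water marks before bind(); set_hwm() covers SNDHWM and RCVHWM.
sync_reply.set_hwm(10000)
# Or set them individually:
sync_reply.setsockopt(zmq.SNDHWM, 10000)
sync_reply.setsockopt(zmq.RCVHWM, 10000)

sync_reply.bind('tcp://*:5555')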

Answering my own question - it seems like it was an issue with the large number of sockets I was using, and also possibly the memory limitations of the GCE instances used. See comment thread above for more details.

Related

Telemetry data through python socket, without stopping execution of the program

I'm building photovoltaic motorized solar trackers. They're controlled by Raspberry Pis running a Python script. The RPis are connected to my public OpenVPN server for remote control and continuous software development. That's working fine. Recently a passionate customer asked me for some sort of telemetry data for his tracker - let's say its current orientation, measured wind speed, etc. Being new to Python, I'm really struggling with this part.
I've decided to use the socket approach from guides like this. The Python script listens on a socket, and my OpenVPN server, which is also a web server, connects to it using PHP's fsockopen. Python sends the telemetry data, PHP makes it user-friendly and displays it on the web. Everything so far works; however, I don't know how to design my Python script around it.
The problem is that my script has to run continuously, and socket.accept() halts its execution while waiting for a connection. I didn't find any obvious solution on the web. Would multi-threading work for this? It sounds a bit like overkill.
Is there a way to run the socket listening asynchronously? Like, for example, the pigpio callbacks which I'm using abundantly?
Or alternatively, is there a better way to accomplish my goal?
I tried remotely accessing a status file that my script maintains, but that proved to be extremely involved to set up and prone to errors while the file was being written.
I also tried running a second script. The problem is that it then has no access to the relevant data, or it has to read the aforementioned status file, which leads to the same problems as above.
The relevant bit of code is literally only this:
# Main loop
try:
    while True:
        # Telemetry
        conn, addr = S.accept()
        conn.send(data.encode())
        conn.close()
Best regards.
For a simple case like this I would probably just wrap the socket code into a separate thread.
With multithreading in Python, the Global Interpreter Lock (GIL) means that only one thread executes Python bytecode at a time, so you don't really need to add any further locks around the data if you're just reading the values and don't care whether they're also being updated at the same time.
Your code would essentially read something like:
from threading import Thread

def handle_telemetry_requests():
    # Main loop
    try:
        while True:
            # Telemetry
            conn, addr = S.accept()
            conn.send(data.encode())
            conn.close()
    except:
        # Error handling here (this will cause the thread to exit if any error occurs)
        pass

socket_thread = Thread(target=handle_telemetry_requests)
socket_thread.daemon = True
socket_thread.start()
Setting the daemon flag means that when the main application ends, the thread will also be terminated.
Python does provide the asyncio module - which may provide the callbacks you're looking for (though I don't have any experience with this).
Other options are to run a Flask server in the Python apps, which will handle the sockets for you, so you can just code endpoints that return the data. Or think about using an MQTT broker - the current data can be written to that, and other apps can subscribe to updates.
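If you do want to go the asyncio route instead of threads, a minimal sketch could look like the following (assuming the telemetry lives in a module-level data string and an arbitrary port 5000 - not tested against your setup):

import asyncio

data = "telemetry placeholder"  # updated elsewhere by the control loop

async def handle_client(reader, writer):
    # Send the current telemetry snapshot and close the connection.
    writer.write(data.encode())
    await writer.drain()
    writer.close()

async def main():
    server = await asyncio.start_server(handle_client, "0.0.0.0", 5000)
    async with server:
        await server.serve_forever()

asyncio.run(main())

Note that this only really helps if the rest of the control loop is also restructured around asyncio; otherwise the threaded version above is simpler.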

CANopen communication (Python): 1 slave and CAN-USB adapter

I am currently trying to implement simple communication between an I/O module as a CANopen slave and my computer (Python script). The I/O module is connected to my computer with a PEAK USB-CAN adapter.
My goal would be to read or write the inputs/outputs. Is this even possible with the hardware, since I don't have a real "master" from that point of view?
Unfortunately I don't know what else I have to do to be able to communicate correctly with my I/O module.
import canopen
import time

network = canopen.Network()
network.connect(bustype='pcan', channel='PCAN_USBBUS1', bitrate=500000)

# add node and DCF file
IO_module = network.add_node(1, 'path to my DIO.DCF')
network.add_node(IO_module)

IO_module.nmt.state = 'RESET COMMUNICATION'  # 000h 82 01
print(IO_module.nmt.state)
time.sleep(5)

IO_module.nmt.state = 'OPERATIONAL'
print(IO_module.nmt.state)

for node_id in network:
    print(network[node_id])

IO_module.load_configuration()
I see some kind of communication in my console, along with timeout errors:
INITIALISING
OPERATIONAL
<canopen.node.remote.RemoteNode object at 0x000002A023493A30>
Transfer aborted by client with code 0x05040000
No SDO response received
Transfer aborted by client with code 0x05040000
No SDO response received
Any advice? I can't get any further with the documentation alone:
https://canopen.readthedocs.io/en/latest/
Thank you.
The good news is, you probably have all the required hardware. You are doing the "master" part from Python, that's fine. (The CAN bus isn't really master/slave, just broadcast. CANopen can be master/slave sometimes, but it's still all broadcast messages among equals on the same bus.)
You haven't provided information about your device, but I would start checking at a lower level.
Do you even have the CAN bus wired up correctly, and if so, what did you do to verify it? (The most common mistake: the CAN bus is not terminated with two 120 ohm resistors - though you can usually get away with just one instead of two.) And have you verified that you are using the correct baud rate?
The library documentation's example suggests waiting for the heartbeat with node.nmt.wait_for_heartbeat(). Why are you using a sleep instead? If there is no heartbeat, you don't need to continue. (Unless the device documentation says that it doesn't implement the NMT heartbeat - that would be unusual.)
I certainly wouldn't try to go ahead with SDOs if you cannot confirm an NMT heartbeat. Also, some devices don't implement SDOs but only PDOs.
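As a rough sketch of what that heartbeat check could look like with the canopen package (the timeout value here is arbitrary):

import canopen

network = canopen.Network()
network.connect(bustype='pcan', channel='PCAN_USBBUS1', bitrate=500000)
IO_module = network.add_node(1, 'path to my DIO.DCF')

# Block until the node sends an NMT heartbeat; this raises if nothing arrives
# within the timeout, which tells you the bus/device isn't talking at all.
IO_module.nmt.wait_for_heartbeat(timeout=10)
print(IO_module.nmt.state)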
Try sniffing the CAN bus at a lower level (i.e. not PDOs/SDOs, just print the raw messages received - from Python, or with a separate application such as candump on Linux). Try getting statistics of the CAN "network" interface (on Linux, e.g. with ifconfig). If everything is okay, the adapter should be in state "ERROR-ACTIVE", and you should see the frame counter increase for frames you've sent from Python.
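A minimal sketch of that kind of raw sniffing from Python, assuming the python-can package (which the canopen library uses underneath) and the same interface settings as in the question:

import can

bus = can.interface.Bus(bustype='pcan', channel='PCAN_USBBUS1', bitrate=500000)

try:
    # Print raw frames until the bus stays silent for 10 seconds.
    while True:
        msg = bus.recv(timeout=10.0)
        if msg is None:
            print("no frames received - check wiring, termination and bitrate")
            break
        print(msg)
finally:
    bus.shutdown()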

Continuous UDP inquiry in Python stops randomly

I'm rather new to Python and working on a small script for UDP inquiries to a network camera on a given port. I'm sending the inquiry string and expect to receive a string with the needed information. I got the basic functionality running with the code shown below and am receiving the expected response. However, I need the inquiry to be done continuously, ideally roughly in line with common camera framerates (25 fps). My script works for this case as well, but after some time it just stops, after sending one last inquiry which never gets answered. The number of inquiries before this happens varies - it might be a few hundred, or just less than that.
So far I have some difficulties wrapping my head around all the functions of sockets, so I'm not sure where to start looking for my problem. I was hoping that socket.SO_REUSEADDR might be part of a solution, but so far this didn't work out. My first guess was that I might just be flooding the camera with too many requests, but the issue still comes up with a longer sleep time after each message. Also, the inquiry works fine when sent continuously with a tool like Packetsender, so the issue seems to be with my script.
I would be grateful for any hint in which direction I should be looking for a solution.
import socket
import time

UDP_IP_ADDRESS = "192.168.17.25"
UDP_PORT_NO = 52381
Message = b'\x01\x10\x00\x05\xff\xff\xff\xff\x81\x09\x06\x12\xff'

clientSock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
clientSock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
clientSock.connect((UDP_IP_ADDRESS, UDP_PORT_NO))

while True:
    clientSock.send(Message)
    reply = clientSock.recv(1024)
    reply_hex = reply.hex()
    print(reply_hex)
    time.sleep(0.04)
I think I solved my problem by adding clientSock.settimeout(0.1) and putting reply = clientSock.recv(32) into a try block.
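For completeness, the loop then looks roughly like this (a sketch of the fix described above, building on the snippet from the question; the 0.1 s timeout and the 32-byte buffer are just the values mentioned, not anything the camera mandates):

clientSock.settimeout(0.1)  # don't block forever on a lost reply

while True:
    clientSock.send(Message)
    try:
        reply = clientSock.recv(32)
        print(reply.hex())
    except socket.timeout:
        # This inquiry went unanswered; skip it and keep polling.
        pass
    time.sleep(0.04)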

Python 'requests' GET in loop eventually throws [WinError 10048]

Disclaimer: this is similar to some other questions relating to this error, but my program is not using any multi-threading/processing, and I'm working with the 'requests' module instead of raw socket commands, so none of the solutions I saw related to my issue.
I have a basic status-checking program running Python 3.4 on Windows that uses a GET request to pull some data off a status site hosted by a number of servers I have to keep watch over. The core code is set up like this:
import requests
import time

URL_LIST = [some, list, of, the, status, sites]  # https:// sites
session = requests.session()
previous_data = ""

while 1:
    data = ""
    for url in URL_LIST:
        headers = {'X-Auth-Token': Associated_Auth_Token}
        try:
            status = session.get(url, headers=headers).json()['status']
        except ConnectionError:
            status = "SERVER DOWN"
        data += "%s \t%s\n" % (url, status)
    if data != previous_data:
        print(data)
        previous_data = data
    time.sleep(15)
...which typically runs just fine for hours (this script is intended to run 24/7 and has additional logging built in that I left out here for simplicity and relevance), but eventually it crashes and throws the error mentioned in the title:
[WinError 10048] Only one usage of each socket address (protocol/network address/port) is normally permitted
The servers I'm requesting from are notoriously slow at times (and sometimes go down entirely, hence the try/except), so my inclination would be that, after looping this over and over, eventually a request has not fully finished before the next request comes through and Windows tries to step on itself - but I don't see how that could happen with my code, since it iterates through the URLs serially.
Also, if this is a TIME_WAIT issue, as some other related posts ran into, I'd rather not have to wait for that to finish, since I'd like to update every 15 seconds or better. So I considered closing and opening a new requests session every so often, since the script typically works fine for hours before hitting a snag - but based off Lukasa's comment here:
To avoid getting sockets in TIME_WAIT, the best thing to do is to use a single Session object at as high a scope as you can and leave it open for the lifetime of your program. Requests will do its best to re-use the sockets as much as possible, which should prevent them lapsing into TIME_WAIT
...it sounds like that is not a good idea - though when he says 'lifetime of your program' he may not intend the statement to include 24/7 use as in my case.
So instead of blindly trying things and waiting some number of hours for the program to crash again so I can see if the error changes, I wanted to consult the wealth of knowledge here first to see if anyone can see what's going wrong and knows how I should fix it.
Thanks!

Linux libnetfilter_queue delayed packet problem

I have to filter and modify network traffic using the Linux kernel's libnetfilter_queue (specifically the Python binding) and dpkt, and I'm trying to implement delayed packet forwarding.
Normal filtering works really well, but if I try to delay packets with a function like this:
import threading

def setVerdict(pkt, nf_payload):
    nf_payload.set_verdict_modified(nfqueue.NF_ACCEPT, str(pkt), len(pkt))

t = threading.Timer(10, setVerdict, [pkt, nf_payload])
t.start()
It crashes without throwing any exception (it's surely a low-level crash). Can I implement the delay directly with libnetfilter like this, or must I copy the packet, drop it, and send the copy using a standard socket.socket.send()?
Thank you
Sorry for the late reply, but I needed to do something like this, although slightly more complicated. I used the C version of the library: I copied packets to a buffer inside my program and then issued a DROP verdict. After a timeout corresponding to your delay, I reinject the packet using a raw socket. This works fine and seems quite efficient.
I think the reason for your crash is that you didn't issue a verdict fast enough.
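A rough Python sketch of that copy-drop-reinject idea (this assumes the NetfilterQueue binding rather than the older python-nfqueue used in the question, dpkt for parsing, and plain IPv4 traffic, so treat it as an outline rather than a drop-in answer):

import socket
import threading

import dpkt
from netfilterqueue import NetfilterQueue

DELAY = 10  # seconds

# Raw IPv4 socket used to reinject the buffered packets later (requires root).
raw_sock = socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_RAW)

def reinject(payload):
    # Parse the IP header only to recover the destination address.
    ip = dpkt.ip.IP(payload)
    raw_sock.sendto(payload, (socket.inet_ntoa(ip.dst), 0))

def on_packet(pkt):
    payload = pkt.get_payload()   # full IP packet as bytes
    pkt.drop()                    # issue the verdict immediately
    threading.Timer(DELAY, reinject, [payload]).start()

nfq = NetfilterQueue()
nfq.bind(1, on_packet)            # queue number 1, as configured in iptables
try:
    nfq.run()
finally:
    nfq.unbind()

The key point, matching the crash explanation above, is that the verdict is issued right away and only the reinjection is delayed.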
I can't answer your question, but why not use the "netem" traffic-queue module on the outgoing interface to delay the packet?
It is possible to configure tc queues to apply different policies to packets which are "marked" in some way; the normal way to mark such packets is with a netfilter module (e.g. iptables or nfqueue).
