non-blocking read/log from an http stream - python

I have a client that connects to an HTTP stream and logs the text data it consumes.
I send the streaming server an HTTP GET request... The server replies and continuously publishes data... It will either publish text or send a ping (text) message regularly... and will never close the connection.
I need to read and log the data it consumes in a non-blocking manner.
I am doing something like this:
import urllib2

req = urllib2.urlopen(url)
for dat in req:
    with open('out.txt', 'a') as f:
        f.write(dat)
My questions are:
will this ever block when the stream is continuous?
how much data is read in each chunk and can it be specified/tuned?
is this the best way to read/log an http stream?

Hey, that's three questions in one! ;-)
It could block sometimes - even if your server is generating data quite quickly, network bottlenecks could in theory cause your reads to block.
Reading the URL data using "for dat in req" will mean reading a line at a time - not really useful if you're reading binary data such as an image. You get better control if you use
chunk = req.read(size)
which can of course block.
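For example, a rough sketch of a chunked read loop (the block size is just illustrative, and this reuses req from the question):

CHUNK = 8192  # tune to taste: bigger means fewer reads, but more latency per write
with open('out.txt', 'a') as f:
    while True:
        chunk = req.read(CHUNK)  # blocks until some data (up to CHUNK bytes) arrives
        if not chunk:            # empty only if the server ever closes the stream
            break
        f.write(chunk)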
Whether it's the best way depends on specifics not available in your question. For example, if you need to run with no blocking calls whatever, you'll need to consider a framework like Twisted. If you don't want blocking to hold you up and don't want to use Twisted (which is a whole new paradigm compared to the blocking way of doing things), then you can spin up a thread to do the reading and writing to file, while your main thread goes on its merry way:
import threading

def func(req):
    # read from the URL stream and write to the file here
    ...

t = threading.Thread(target=func, args=(req,))  # pass the response object to the worker
t.start()  # will execute func in a separate thread
...
t.join()  # will wait for the spawned thread to die
Obviously, I've omitted error checking/exception handling etc. but hopefully it's enough to give you the picture.

You're using too high-level an interface to have good control over issues such as blocking and buffering block sizes. If you're not willing to go all the way to an async interface (in which case Twisted, already suggested, is hard to beat!), why not httplib, which is after all in the standard library? An HTTPResponse instance's .read(amount) method is more likely to block for no longer than needed to read amount bytes than the similar method on the object returned by urlopen (although admittedly there are no documented specs about that in either module, hmmm...).
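For instance, a minimal sketch of that approach (the host and path are hypothetical; in Python 3 the module is http.client):

import httplib

conn = httplib.HTTPConnection('stream.example.com')  # hypothetical streaming host
conn.request('GET', '/feed')                         # hypothetical path
resp = conn.getresponse()

with open('out.txt', 'a') as f:
    while True:
        chunk = resp.read(1024)  # blocks only until up to 1024 bytes are available
        if not chunk:            # empty string: the server closed the stream
            break
        f.write(chunk)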

Another option is to use the socket module directly. Establish a connection, send the HTTP request, set the socket to non-blocking mode, and then read the data with socket.recv(), handling 'Resource temporarily unavailable' exceptions (which mean that there is nothing to read). A very rough example is this:

import errno
import socket
import time

BUFSIZE = 1024

s = socket.socket()
s.connect(('localhost', 1234))
s.send('GET /path HTTP/1.0\r\n\r\n')  # HTTP lines are terminated with CRLF
s.setblocking(False)

running = True
while running:
    try:
        print "Attempting to read from socket..."
        while True:
            data = s.recv(BUFSIZE)
            if len(data) == 0:  # remote end closed
                print "Remote end closed"
                running = False
                break
            print "Received %d bytes: %r" % (len(data), data)
    except socket.error, e:
        if e.args[0] != errno.EAGAIN:  # anything but 'Resource temporarily unavailable'
            print e
            raise
    # perform other program tasks
    print "Sleeping..."
    time.sleep(1)
However, urllib.urlopen() has some benefits if the web server redirects, you need URL-based basic authentication, etc. Alternatively, you could make use of the select module, which will tell you when there is data to read.
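For instance, a rough sketch of the select variant (same hypothetical server as above; the socket can stay in blocking mode because select tells us when recv is safe):

import select, socket

BUFSIZE = 1024
s = socket.socket()
s.connect(('localhost', 1234))
s.send('GET /path HTTP/1.0\r\n\r\n')

while True:
    # wait at most 1 second for the socket to become readable
    readable, _, _ = select.select([s], [], [], 1.0)
    if readable:
        data = s.recv(BUFSIZE)
        if not data:  # remote end closed
            break
        print "Received %d bytes" % len(data)
    else:
        # timeout expired with nothing to read: do other program tasks here
        print "No data yet..."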

Yes, when you catch up with the server it will block until the server produces more data.
Each dat will be one line, including the newline on the end.
twisted is a good option.
I would swap the with and for around in your example - do you really want to open and close the file for every line that arrives?
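That is, something like this (reusing req from the question), so the file is opened once and the file object's buffering does its job:

with open('out.txt', 'a') as f:
    for dat in req:
        f.write(dat)
        f.flush()  # optional: only if you need each line on disk immediately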

Related

don't want to wait for socket.accept() each loop iteration

I'm using the line "conn, addr = httpSocket.accept()", but I don't want to wait for it every iteration of my loop because there won't always be someone trying to connect. Is there a way to check if anyone is trying to connect, and move on if there isn't?
I have looked at using asyncio (I can't use threads because this is micropython on an esp8266, and threading is not supported) but my line is not awaitable.
with open('page.html', 'r') as file:
    html = file.read()

while True:
    conn, addr = httpSocket.accept()
    print('Got a connection from %s' % str(addr))
    conn.send('HTTP/1.1 200 OK\n')
    conn.send('Content-Type: text/html\n\n')  # a blank line is needed to end the headers
    conn.sendall(html)
    conn.close()
If threads aren't an option, you can always use the select module.
With select you basically split your sockets into 3 categories:
Sockets that you want to read data from (including new connections).
Sockets that you want to send data to.
Exceptional sockets (usually for error checking).
With each iteration, select returns lists of sockets in these categories, so you know how to handle each one instead of waiting for a new connection each time.
You can see an example here:
https://steelkiwi.com/blog/working-tcp-sockets/
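And a minimal sketch for your case: poll the listening socket with a zero timeout, so accept() is only called when a connection is actually pending (MicroPython's select module supports this form of select):

import select

while True:
    # a zero timeout makes this a pure poll: it returns immediately
    readable, _, _ = select.select([httpSocket], [], [], 0)
    if readable:
        conn, addr = httpSocket.accept()
        print('Got a connection from %s' % str(addr))
        conn.send('HTTP/1.1 200 OK\n')
        conn.send('Content-Type: text/html\n\n')
        conn.sendall(html)
        conn.close()
    # no pending connection: carry on with the rest of the loop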

pySerial Capturing a long response

Hi guys, I'm working on a script that will get data from a host using the Data Communications Standard (developed by the Data Communication Standard Committee, Lens Processing Division of The Vision Council) over a serial port, and pass the data into the Modbus protocol for the device to perform its operations.
Since I don't physically have access to the host machine, I'm trying to develop a secondary script to emulate the host. I am currently at the stage where I need to read a lot of information from the serial port, and I get only part of the data. I was hoping to get the whole string sent by the send_job() function in my host emulator script.
Guys, can any of you also tell me if this would be a good approach? The only thing the machine is supposed to do is grab 2 values from the host response and assign them to two Modbus holding registers.
NOTE: the initialization function is hard-coded because it will always be the same, and the actual response data will not matter except for status. The job request is also hard-coded; I only pass the job # that I get from a Modbus holding register. The exact logic of how the host resolves this should not matter; I only need to send the job number scanned from the device in this format.
main script:
def request_job_modbus(job):
    data = F'[06][1c]req=33[0d][0a]job={job}[0d][0a][1e][1d]'.encode('ascii')
    writer(data)

def get_job_from_serial():
    response = serial_client.read_all()
    resp = response.decode()
    return resp

# TODO: SEND INIT SEQUENCE ONCE AND VERIFY IF REQUEST status=0
initiation_request()
init_response_status = get_init_status()
print('init method being active')
print(get_init_status())

while True:
    # TODO: get job request data
    job_serial = get_job_from_serial()
    print(job_serial)
host emulation script:
def send_job():
    job_response = '''[06][1c]ans=33[0d]job=30925[0d]status=0;"ok"[0d]do=l[0d]add=;2.50[0d]ar=1[0d]
bcerin=;3.93[0d]bcerup=;-2.97[0d]crib=;64.00[0d]do=l[0d]ellh=;64.00[0d]engmask=;613l[0d]
erdrin=;0.00[0d]erdrup=;10.00[0d]ernrin=;2.00[0d]ernrup=;-8.00[0d]ersgin=;0.00[0d]
ersgup=;4.00[0d]gax=;0.00[0d]gbasex=;-5.30[0d]gcrosx=;-7.96[0d]kprva=;275[0d]kprvm=;0.55[0d]
ldpath=\\uscqx-tcpmain-at\lds\iot\do\800468.sdf[0d]lmatid=;151[0d]lmatname=;f50[0d]
lnam=;vsp_basic_fh15[0d]sgerin=;0.00[0d]sgerup=;0.00[0d]sval=;5.18[0d]text_11=;[0d]
text_12=;[0d]tind=;1.53[0d][1e][1d]'''.encode('ascii')
    writer(job_response)

def get_init_request():
    req = p.readline()
    print(req)
    request = req.decode()[4:11]
    # print(request)
    if request == 'req=ini':
        print('request == req=ini??? <<<<<<< condition met, sending the response')
        send_init_response()
        send_job()

while True:
    # print(get_init_request())
    get_init_request()
What I get on screen (main script):
init method being active
bce
erd
condition was met init status=0
outside loop
ers
condition was met init status=0
inside while loop
trigger reset <<<--------------------
5782
`:lmatid=;151[0d]lmatname=;f50[0d]
lnam=;vsp_basic_fh15[0d]sgerin=;0.00[0d]sgerup=;0.00[0d]sval=;5.18[0d]text_11=;[0d]
text_12=;[0d]tind=;1.53[0d][1e][1d]
outside loop
condition was met init status=0
outside loop
What I get on screen (host emulation script):
b'[1c]req=ini[0d][0a][1e][1d]'
request == req=ini??? <<<<<<< condition met, sending the response
b''
b'[06][1c]req=33[0d][0a]job=5782[0d][0a][1e][1d]'
b''
b''
b''
b''
b''
b''
I suspect you're trying to write too much at once to a hardware buffer that is fairly small. Especially when dealing with low-power hardware, assuming you can stuff an entire message into a buffer is often not correct. Even modern PCs sometimes have very small buffers for legacy hardware like serial ports. You may find when you switch from development to actual hardware that the RTS and DTR lines need to be used to determine when to send or receive data. This will be up to whoever designed the hardware, unfortunately, as those lines are also often ignored.
I would try chunking your data transfer into smaller bits as a test to see if the whole message gets through. This is a quick and dirty first attempt that may have bugs, but it should get you down the right path:
def get_job_from_serial():
    response = b''  # buffer for the response
    while True:
        chunk = serial_client.read()  # read any available data, or block until the timeout
        # this technically could be reading 1 char at a time, but any
        # remotely modern pc should easily keep up with 9600 baud
        if not chunk:
            # read() returns b'' once the port's timeout expires, which here
            # probably means end of data (note that pySerial's read() does not
            # raise on a read timeout; SerialTimeoutException is for writes).
            # you could also check the length of the buffer, if messages are
            # always a fixed length, to determine whether the entire message
            # has arrived yet.
            break
        response += chunk
    return response

def writer(command):
    written = 0      # how many bytes have actually been written
    chunksize = 128  # the smaller you go, the less likely you are to overflow
                     # a buffer, but the slower you go
    while written < len(command):
        # you presumably might have to wait for p.dtr() == True or similar,
        # though it's just as likely not to have been implemented
        written += p.write(command[written:written + chunksize])
        p.flush()  # probably don't actually need this
P.S. I had to go to the source code for p.read_all (for some reason I couldn't find it online), and it does not do what I think you expect it to do. The exact code for it is:
def read_all(self):
    """\
    Read all bytes currently available in the buffer of the OS.
    """
    return self.read(self.in_waiting)
There is no concept of waiting for a complete message; it's just shorthand for grabbing everything currently available.
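P.P.S. Since your messages appear to end with the literal [1e][1d] text, another option (assuming pySerial 3.x, and assuming that terminator really is reliable) is read_until(), which collects bytes until the expected sequence shows up or the port's timeout expires:

def get_job_from_serial():
    # read_until returns everything up to and including the terminator,
    # or whatever arrived before the timeout hit
    response = serial_client.read_until(b'[1e][1d]')
    return response.decode()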

python socket programming for transferring a photo

I'm new to socket programming in Python. Here is an example of opening a TCP socket in a Mininet host and sending a photo from one host to another. In fact, I changed code that I had used to send a simple message to another host (writing the received data to a text file) in order to meet my requirements. Although this revised code runs without errors and seems to transfer correctly, I am not sure whether this is a correct way to do the transmission. Since I'm running both hosts on the same machine, I thought that might influence the result. Could you check whether this is a correct way to transfer, or whether I should add or remove something?
mininetSocketTest.py
#!/usr/bin/python
from mininet.topo import Topo, SingleSwitchTopo
from mininet.net import Mininet
from mininet.log import lg, info
from mininet.cli import CLI

def main():
    lg.setLogLevel('info')
    net = Mininet(SingleSwitchTopo(k=2))
    net.start()
    h1 = net.get('h1')
    p1 = h1.popen('python myClient2.py')
    h2 = net.get('h2')
    h2.cmd('python myServer2.py')
    CLI(net)
    #p1.terminate()
    net.stop()

if __name__ == '__main__':
    main()
myServer2.py
import socket
import sys

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
s.bind(('10.0.0.1', 12345))
buf = 1024
f = open("2.jpg", 'wb')
s.listen(1)
conn, addr = s.accept()
while 1:
    data = conn.recv(buf)
    print(data[:10])
    #print "PACKAGE RECEIVED..."
    f.write(data)
    if not data:
        break
    #conn.send(data)
conn.close()
s.close()
myClient2.py:
import socket
import sys

f = open("1.jpg", "rb")
print sys.getsizeof(f)
buf = 1024
data = f.read(buf)
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(('10.0.0.1', 12345))
while (data):
    if(s.sendall(data)):
        #print "sending ..."
        data = f.read(buf)
        print(f.tell(), data[:10])
    else:
        s.close()
s.close()
This loop in client2 is wrong:

while (data):
    if(s.send(data)):
        print "sending ..."
        data = f.read(buf)

As the send docs say:
Returns the number of bytes sent. Applications are responsible for checking that all data has been sent; if only some of the data was transmitted, the application needs to attempt delivery of the remaining data. For further information on this topic, consult the Socket Programming HOWTO.
You're not even attempting to do this. So, while it probably works on localhost, on a lightly-loaded machine, with smallish files, it's going to break as soon as you try to use it for real.
As the help says, you need to do something to deliver the rest of the buffer. Since there's probably no good reason you can't just block until it's all sent, the simplest thing to do is to call sendall:
Unlike send(), this method continues to send data from bytes until either all data has been sent or an error occurs. None is returned on success. On error, an exception is raised…
And this brings up the next problem: You're not doing any exception handling anywhere. Maybe that's OK, but usually it isn't. For example, if one of your sockets goes down, but the other one is still up, do you want to abort the whole program and hard-drop your connection, or do you maybe want to finish sending whatever you have first?
You should probably at least use a with clause or a finally, to make sure you close your sockets cleanly, so the other side will get a nice EOF instead of an exception.
Also, your server code just serves a single client and then quits. Is that actually what you wanted? Usually, even if you don't need concurrent clients, you at least want to loop around accepting and servicing them one by one.
Finally, a server almost always wants to do this:
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
Without this, if you try to run the server again within a few seconds after it finished (a platform-specific number of seconds, which may even depend on whether it finished with an exception instead of a clean shutdown), the bind will fail, in the same way as if you tried to bind a socket that's actually in use by another program.
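Putting those pieces together, here's a sketch of what the client's transfer loop could look like with sendall and a finally (names reused from myClient2.py; this is an illustration, not the only way to structure it):

import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    s.connect(('10.0.0.1', 12345))
    with open("1.jpg", "rb") as f:
        while True:
            data = f.read(1024)
            if not data:     # end of file: stop sending
                break
            s.sendall(data)  # blocks until this whole chunk is delivered
finally:
    s.close()                # always close, so the server sees a clean EOF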
First of all, you should use TCP and not UDP. TCP will ensure that your client/server has received the whole photo properly. UDP is used more for content streaming - absolutely not your use case.

Python 2.7 Script works with breakpoint in Debug mode but not when Run

def mp_worker(row):
    ip = row[0]
    ip_address = ip
    tcp_port = 2112
    buffer_size = 1024
    # Read the reset message sent from the sign when a new connection is established
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        print('Connecting to terminal: {0}'.format(ip_address))
        s.connect((ip_address, tcp_port))
        # Putting a breakpoint on this call in debug makes the script work
        s.send(":08a8RV;")
        #data = recv_timeout(s)
        data = s.recv(buffer_size)
        strip = data.split("$", 1)[-1].rstrip()
        strip = strip[:-1]
        print(strip)
        termStat = [ip_address, strip]
        terminals.append(termStat)
    except Exception as exc:
        print("Exception connecting to: " + ip_address)
        print(exc)
The above code is the section of the script that is causing the problem. It's a pretty simple function that connects to a socket based on a passed in IP from a DB query and receives a response that indicates the hardware's firmware version.
Now, the issue is that when I run it in debug with a breakpoint on the send call I get the entire expected response from the hardware, but if I don't have a breakpoint in there, or I full-on Run the script, it only responds with part of the expected message. I tried putting a time.sleep() in after the send to see if it would get the entire response, and I tried using the commented-out recv_timeout() method, which uses a non-blocking socket and a timeout to try to get an entire response, both with the exact same results.
As another note, this works in a script with everything in one main code block, but I need this part separated into a function so I can use it with the multiprocessing library. I've tried running it on both my local Windows 7 machine and on a Unix server with the same results.
I'll expand and reiterate on what I put in a comment a moment ago. I am still not entirely sure what is behind the different behavior in either scenario (apart from a timing guess, apparently disproved by the attempt to include sleep).
However, it's somewhat immaterial, as stream sockets do not guarantee that you get all the requested data at once, in chunks as requested. This is up to the application to deal with. If the server closes the socket after the full response was sent, you could replace:
data = s.recv(buffer_size)
with calls to recv() until zero bytes are received, which is the equivalent of getting 0 (EOF) from the underlying syscall:
data = ''
while True:
    received = s.recv(buffer_size)
    if len(received) == 0:
        break
    data += received
If that is not the case, you would have to rely on a fixed or known size (sent at the beginning) that you want to read in total, or deal with this at the protocol level (look for characters or sequences used to signal message boundaries).
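For the delimiter variant, a rough sketch (the ';' terminator here is only a guess based on the command format; substitute whatever the device actually uses to end its responses):

def recv_until(sock, delimiter, buffer_size=1024):
    # accumulate chunks until the delimiter appears or the peer closes
    data = ''
    while delimiter not in data:
        received = sock.recv(buffer_size)
        if len(received) == 0:  # connection closed before the delimiter arrived
            break
        data += received
    return data

# e.g.: data = recv_until(s, ';')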
I just recently found a solution, and thought I'd post it in case anyone else has this issue: I decided to try calling socket.recv() before calling socket.send(), and then calling socket.recv() again afterwards, and it seems to have fixed the issue; I couldn't really tell you why it works though.
data = s.recv(buffer_size)
s.send(":08a8RV;")
data = s.recv(buffer_size)

Python Sockets, requesting file from server then waiting to receive it

I am attempting to send a string from my client to my server with a specific filename, and then have the server send that file back to the client. For some reason it hangs even after it has received all of the file. It hangs on the:
m = s.recv(1024)
client.py
import socket
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("192.168.1.2", 54321))
s.send(b"File:test.txt")
f = open("newfile.txt", "wb")
data = None
while True:
    m = s.recv(1024)
    data = m
    if m:
        while m:
            m = s.recv(1024)
            data += m
    else:
        break
f.write(data)
f.close()
print("Done receiving")
server.py
import socket
import os

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind(("", 54321))
s.listen(1)           # missing from the original snippet: the socket must listen...
c, addr = s.accept()  # ...and accept a connection, so that c exists for the loop below
while True:
    client_input = c.recv(1024)
    command = client_input.split(":")[0]
    if command == "File":
        command_parameter = client_input.split(":")[1]
        f = open(command_parameter, "rb")
        l = os.path.getsize(command_parameter)
        m = f.read(l)
        c.sendall(m)
        f.close()
TLDR
The reason recv blocks is because the socket connection is not shutdown after the file data was sent. The implementation currently has no way to know when the communication is over, which results in a deadlock between the two, remote processes. To avoid this, close the socket connection in the server, which will generate an end-of-file event in the client (i.e. recv returns a zero-length string).
More insight
Whenever you design any software where two processes communicate with each other, you have to define a protocol that disambiguates the communication such that both peers know exactly which state they are in at all times. Typically this involves using the syntax of the communication to help guide the interpretation of the data.
Currently, there are some problems with your implementation: it doesn't define an adequate protocol to resolve potential ambiguity. This becomes apparent when you consider the fact that each call to send in one peer doesn't necessarily correspond to exactly one call to recv in the other. That is, the calls to send and recv are not necessarily one-to-one. Consider sending the file name to the server on a heavily congested network: perhaps only half of the file name makes it to the server when the first call to recv returns. The server has no way (currently) to know if it has finished receiving the file name. The same is true in the client: how does the client know when the file has finished?
To work around this, we can introduce some syntax into the protocol and some logic into the server to ensure we get the complete file name before continuing. A simple solution would be to use an EOL character, i.e. \n to denote the end of the client's message. Now, 99.99% of the time in your testing this will take a single call to recv to read in. However you have to anticipate the cases in which it might take more than one call to recv. This can be implemented using a loop, obviously.
The client end is simpler for this demo. If the communication is over after the sending of the file, then that event can be used to denote the end of the data stream. This happens when the server closes the connection on its end.
If we were to expand the implementation to, say, allow for requests for multiple, back-to-back files, then we'd have to introduce some mechanism in the protocol for distinguishing the end of one file and the beginning of the next. Note that this also means the server would need to potentially buffer extra bytes that it reads in on previous iterations in case there is overlap. A stream implementation is generally useful for these sorts of things.
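To make the protocol concrete, here is a sketch of both ends with those fixes applied: the client terminates the file name with \n, the server reads until it sees that newline, sends the file, and closes the connection so the client's recv eventually returns b''. Names follow the original scripts; socket setup and error handling are omitted for brevity.

# server side: read the newline-terminated request, send the file, close
request = b""
while not request.endswith(b"\n"):       # keep reading until the full name arrives
    chunk = c.recv(1024)
    if not chunk:                        # client went away early
        break
    request += chunk
command, _, command_parameter = request.strip().decode().partition(":")
if command == "File":
    with open(command_parameter, "rb") as f:
        c.sendall(f.read())
c.close()                                # the close is what ends the client's loop

# client side: recv until the zero-length read that signals EOF
s.send(b"File:test.txt\n")
with open("newfile.txt", "wb") as f:
    while True:
        m = s.recv(1024)
        if not m:                        # b'' means the server closed: we're done
            break
        f.write(m)
print("Done receiving")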
