Python sockets buffering

Let's say I want to read a line from a socket, using the standard socket module:
def read_line(s):
    ret = ''
    while True:
        c = s.recv(1)
        if c == '\n' or c == '':
            break
        else:
            ret += c
    return ret
What exactly happens in s.recv(1)? Will it issue a system call each time? I guess I should add some buffering, anyway:
For best match with hardware and network realities, the value of bufsize should be a relatively small power of 2, for example, 4096.
http://docs.python.org/library/socket.html#socket.socket.recv
But it doesn't seem easy to write efficient and thread-safe buffering. What if I use file.readline()?
# does this work well, is it efficiently buffered?
s.makefile().readline()

If you are concerned with performance and control the socket completely
(you are not passing it into a library for example) then try implementing
your own buffering in Python -- Python string.find and string.split and such can
be amazingly fast.
def linesplit(socket):
    buffer = socket.recv(4096)
    buffering = True
    while buffering:
        if "\n" in buffer:
            (line, buffer) = buffer.split("\n", 1)
            yield line + "\n"
        else:
            more = socket.recv(4096)
            if not more:
                buffering = False
            else:
                buffer += more
    if buffer:
        yield buffer
If you expect the payload to consist of lines that are not too huge, that should run pretty fast, and avoid jumping through too many layers of function calls unnecessarily. I'd be interested in knowing how this compares to file.readline() or using socket.recv(1).
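For anyone who wants to run that comparison, here is a rough timing harness; a sketch only, assuming the linesplit() generator above. socket.socketpair() is Unix-only and the payload size is arbitrary, so treat the numbers as indicative at best.
import socket
import threading
import time

N = 100000
PAYLOAD = 'x' * 60 + '\n'            # arbitrary line length

def writer(w):
    for _ in xrange(N):              # range() on Python 3
        w.sendall(PAYLOAD)
    w.close()                        # signals EOF to the reader

def timed(reader_func):
    r, w = socket.socketpair()       # Unix-only
    t = threading.Thread(target=writer, args=(w,))
    t.start()
    start = time.time()
    reader_func(r)
    t.join()
    return time.time() - start

def with_linesplit(r):
    for _ in linesplit(r):           # the generator defined above
        pass

def with_makefile(r):
    f = r.makefile('rb')
    while f.readline():
        pass

print 'linesplit: %.3fs' % timed(with_linesplit)
print 'makefile : %.3fs' % timed(with_makefile)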

The recv() call is handled directly by calling the C library function.
It will block waiting for the socket to have data. In reality it will just let the recv() system call block.
file.readline() is an efficient buffered implementation. It is not thread-safe, because it presumes it's the only one reading the file (for example, by buffering upcoming input).
If you are using the file object, every time read() is called with a positive argument, the underlying code will recv() only the amount of data requested, unless it's already buffered.
It would be buffered if you had called readline(), which reads a full buffer, and the end of the line was before the end of the buffer, thus leaving data in the buffer. Otherwise the buffer is generally not overfilled.
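As for the makefile() question itself: yes, the file object's buffering applies, so iteration and readline() pull data in large recv() chunks. A minimal sketch, assuming a connected TCP socket s (handle() is a hypothetical per-line callback):
f = s.makefile('rb')        # buffered file object wrapping the socket
try:
    for line in f:          # readline()/iteration consume the internal
        handle(line)        # buffer, so recv() is not called per byte
finally:
    f.close()
    s.close()               # the socket must be closed separately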
The goal of the question is not clear. If you need to see whether data is available before reading, you can select() or set the socket to non-blocking mode with s.setblocking(False). Then, if there is no waiting data, recv() raises socket.error (errno EWOULDBLOCK) rather than blocking.
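A minimal sketch of that non-blocking pattern, assuming a connected socket s; the errno check distinguishes "no data yet" from a real failure:
import errno
import socket

s.setblocking(False)                 # recv() now raises instead of blocking
try:
    data = s.recv(4096)
    if not data:
        print 'peer closed the connection'
except socket.error as e:
    if e.args[0] not in (errno.EAGAIN, errno.EWOULDBLOCK):
        raise                        # a real error, not just "no data yet"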
Are you reading one file or socket with multiple threads? I would put a single worker on reading the socket and feeding received items into a queue for handling by other threads.
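A sketch of that single-reader pattern, assuming a connected socket sock and the linesplit() generator from the first answer; process() is hypothetical. Queue.Queue is itself thread-safe, so the consumers need no extra locking:
import threading
from Queue import Queue              # "queue" on Python 3

items = Queue()

def reader(sock, q):
    for line in linesplit(sock):     # sole reader of the socket
        q.put(line)
    q.put(None)                      # sentinel: no more data

t = threading.Thread(target=reader, args=(sock, items))
t.daemon = True
t.start()

# worker threads consume from the queue instead of the socket:
while True:
    line = items.get()
    if line is None:
        break
    process(line)                    # process() is hypothetical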
I suggest consulting the Python socket module source and the C source that makes the system calls.

def buffered_readlines(pull_next_chunk, buf_size=4096):
    """
    pull_next_chunk is a callable that accepts one positional argument,
    max_len (e.g. socket.recv or file().read), and returns a string up to
    max_len long, or an empty one when nothing is left to read.

    >>> for line in buffered_readlines(socket.recv, 16384):
    ...     print line
    ...
    >>> # the following code won't read the whole file into memory
    ... # before splitting it into lines like the .readlines method
    ... # of file does. Also it won't block until a FIFO-file is closed
    ...
    >>> for line in buffered_readlines(open('huge_file').read):
    ...     # process it on a per-line basis
    ...
    >>>
    """
    chunks = []
    while True:
        chunk = pull_next_chunk(buf_size)
        if not chunk:
            if chunks:
                yield ''.join(chunks)
            break
        if '\n' not in chunk:
            chunks.append(chunk)
            continue
        chunk = chunk.split('\n')
        if chunks:
            yield ''.join(chunks + [chunk[0]])
        else:
            yield chunk[0]
        for line in chunk[1:-1]:
            yield line
        if chunk[-1]:
            chunks = [chunk[-1]]
        else:
            chunks = []

Related

Receiving data from multiple connections using select in python

I'm a bit confused about how to keep calling recv() when using select(). This code isn't complete, but it demonstrates the issue. Let's assume we are receiving a decent amount of data from each connection (10-20 MB).
Should you keep looping using recv() until you get the desired number of bytes after the call to select()?
while True:
    r, w, e = select.select(r_ready, w_ready, [], timeout)
    for client in r:
        if client == sock:
            acceptConnection(sock)
        else:
            chunks = []
            bytesRead = 0
            while bytesRead < desiredBytes:
                chunk = client.recv(1024)
                bytesRead += len(chunk)
Or should you only call recv() once after each select() loop?
clientBuffers = {}
while True:
    r, w, e = select.select(r_ready, w_ready, [], timeout)
    for client in r:
        if client == sock:
            acceptConnection(sock)
        else:
            chunk = client.recv(1024)
            clientBuffers[client].append(chunk)
Should you keep looping using recv() until you get the desired number
of bytes after the call to select()?
In general, no, because you have no way of knowing how long that will take. (For example, for all you know the client might not send, or the network might not deliver, the entire sequence of desired bytes until an hour after it sends the first bytes in the sequence. That means that if you stay in a loop calling recv() until you get all of the bytes, it's possible that all of the other clients will get no response from your server for a very long time: clearly not desirable behavior for a multi-client server!)
Instead, just get as many bytes from recv() as you currently can, and if you didn't receive enough bytes to take action yet, then store the received bytes in a buffer somewhere for later and go back to your regular select() call. select() should be the only place in your event loop that you ever block. Making all of your sockets non-blocking is highly recommended, in order to guarantee that you won't ever accidentally block inside a recv() call.
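A minimal sketch of that approach, building on the second snippet above; MSG_LEN and handle_message() are hypothetical, and every accepted socket is made non-blocking:
import select

MSG_LEN = 10 * 1024 * 1024               # hypothetical message size
timeout = 1.0
clientBuffers = {}                        # socket -> list of chunks

while True:
    r, w, e = select.select(list(clientBuffers) + [sock], [], [], timeout)
    for client in r:
        if client is sock:
            conn, addr = sock.accept()
            conn.setblocking(False)       # never block in recv()
            clientBuffers[conn] = []
        else:
            chunk = client.recv(4096)
            if not chunk:                 # peer closed the connection
                del clientBuffers[client]
                client.close()
                continue
            clientBuffers[client].append(chunk)
            if sum(len(c) for c in clientBuffers[client]) >= MSG_LEN:
                data = ''.join(clientBuffers[client])
                clientBuffers[client] = []
                handle_message(client, data)   # hypothetical handler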

In python, how to use queues properly?

So far I have the following:
fnamw = input("Enter name of file:")

def carrem(fnamw):
    s = Queue()
    for line in fnamw:
        s.enqueue(line)
    return s

print(carrem(fnamw))
The above doesn't print a list of the numbers in the file that I input instead the following is obtained:
<__main__.Queue object at 0x0252C930>
When printing a Queue, you're just printing the object directly, which is why you get that result.
You don't want to print the object representation, but I'm assuming you want to print the contents of the Queue. To do so you need to call the get method of the Queue. It's worth noting that in doing so, you will exhaust the Queue.
Replacing print(carrem(fnamw)) with print(carrem(fnamw).get()) should print the first item of the Queue.
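If you do want to see everything that went into the Queue, you have to drain it; note that this empties it as it goes. A small sketch:
from Queue import Queue, Empty       # "queue" on Python 3

q = Queue()
for item in ('a', 'b', 'c'):
    q.put(item)

drained = []
try:
    while True:
        drained.append(q.get_nowait())   # raises Empty when exhausted
except Empty:
    pass
print(drained)                           # ['a', 'b', 'c']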
If you really just want to print the list of items in the Queue, you should just use a list. Queues are specifically for when you need a FIFO (first-in-first-out) data structure.
It seems to me that you don't actually have any need for a Queue in that program. A Queue is used primarily for synchronization and data transfer in multithreaded programming. And it really doesn't seem as if that is what you're attempting to do.
For your usage, you could just as well use an ordinary Python list:
fnamw = input("Enter name of file:")

def carrem(fnamw):
    s = []
    for line in fnamw:
        s.append(line)
    return s

print(carrem(fnamw))
On that same note, however, you're not actually reading the file. The program as you quoted it will simply put each character of the filename into the list (or Queue) as an item of its own. What you really want is this:
def carrem(fnamw):
    s = []
    with open(fnamw) as fp:
        for line in fp:
            s.append(line)
    return s
Or, even simpler:
def carrem(fnamw):
    with open(fnamw) as fp:
        return list(fp)

Reads after select() - blocking on a pipe

My Python program needs to multiplex reads from several different file descriptors. Some of them are the stdout/stderr descriptors of subprocesses; others are the file descriptors associated with inotify calls.
My problem is being able to do a "non-blocking"[1] read after select(). According to the documentation, sockets that select() reports to be ready for writes "are guaranteed to not block on a write of up to PIPE_BUF bytes".
I suppose that no such guarantee makes sense for a read: select() reporting that there is data waiting to be read in the kernel pipe buffer doesn't mean that you can go ahead and .read(socket.PIPE_BUF), as there could be just a few bytes in there.
This means that when I'm calling read() on the socket, I can get what is effectively a deadlock as some of the subprocesses produce output very rarely.
Is there any way around this? My current workaround is to call readline() on it, and I'm lucky enough that everything I'm reading from has line-by-line output. Is select() of any use at all when reading from a pipe like this, seeing as there's no way to know how many bytes you can safely read without blocking?
[1] I'm aware that this is distinct from an O_NONBLOCK socket
It's OK to go ahead and read each pipe and socket: you'll get whatever data are available now:
>>> import os
>>> desc = os.pipe()
>>> desc
(3, 4)
>>> os.write(desc[1], 'foo')
3
>>> os.read(desc[0], 100)
'foo'
>>> os.read(desc[0], 100)
[hangs here as there's no input available, interrupt with ^C]
...
KeyboardInterrupt
>>> os.write(desc[1], 'a')
1
>>> os.read(desc[0], 100)
'a'
>>>
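If the possibility of blocking (as in the hang above) is the worry, one way around it is to put the descriptor into O_NONBLOCK mode yourself; then os.read() returns whatever is available, or raises EAGAIN, instead of hanging. A sketch for a pipe read end rfd:
import errno
import fcntl
import os

flags = fcntl.fcntl(rfd, fcntl.F_GETFL)
fcntl.fcntl(rfd, fcntl.F_SETFL, flags | os.O_NONBLOCK)

try:
    data = os.read(rfd, 4096)     # up to 4096 bytes, never blocks now
except OSError as e:
    if e.errno != errno.EAGAIN:
        raise
    data = ''                     # nothing available right now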
Just as an alternative, I ran into exactly the same problem and solved it by using readline(1) and appending that to an internal buffer until readline returned a character that I was interested in tokenizing on (newline, space, etc.).
More detail: I called select() on a file descriptor and then called readline(1) on any file descriptor that was returned by select, appended that char to a buffer, and repeated until readline returned what I wanted. Then I returned my buffer, cleared it, and moved on. Incidentally, I also returned a Boolean that let the calling method know whether the data I was returning was empty because of a bad read or just because it wasn't done.
I also implemented a version that would tokenize on a timeout. If I'd been buffering for x ms without finding a newline or EOF, go ahead and return the buffer.
I'm currently trying to find out if there's a way to ask a file descriptor how many bytes it has waiting to be read, then just readline([that many bytes])...
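That query does exist on Unix: the FIONREAD ioctl reports how many bytes are immediately readable on a descriptor. A sketch (not portable to Windows):
import fcntl
import struct
import termios

def bytes_waiting(fd):
    # ask the kernel how many bytes can be read without blocking
    buf = struct.pack('i', 0)
    buf = fcntl.ioctl(fd, termios.FIONREAD, buf)
    return struct.unpack('i', buf)[0]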
Hope that helps.

How to limit file size when writing one?

I am using the output streams from the io module and writing to files. I want to be able to detect when I have written 1G of data to a file and then start writing to a second file. I can't seem to figure out how to determine how much data I have written to the file.
Is there something easy built in to io? Or might I have to count the bytes before each write manually?
If you are using this file for logging purposes, I suggest using the RotatingFileHandler in the logging module, like this:
import logging
import logging.handlers
file_name = 'test.log'
test_logger = logging.getLogger('Test')
handler = logging.handlers.RotatingFileHandler(file_name, maxBytes=10**9)
test_logger.addHandler(handler)
N.B.: you can also use this method even if you don't use the file for logging, if you like doing hacks :)
See the Python documentation for File Objects, specifically tell().
Example:
>>> f=open('test.txt','w')
>>> f.write(10*'a')
>>> f.tell()
10L
>>> f.write(100*'a')
>>> f.tell()
110L
See the tell() method on the stream object.
One fairly straightforward approach is to subclass the built-in file class and have it keep track of the amount of output written to the file. Below is some sample code showing how that might be done, which appears to mostly work.
I say mostly because the size of the files produced is sometimes slightly over the maximum while testing it, but that's because the test file was opened in "text" mode and on Windows this means that all the '\n' linefeed characters get converted into '\r\n' (carriage-return, linefeed) pairs, which throws the size accumulator off. Also, as currently written, the bufsize argument that the standard file() and open() functions accept is not supported, so the system's default size and mode will always be used.
Depending on exactly what you're doing, the size issue may not be a big problem; however, for large maximum sizes it might be off significantly. If anyone has a good platform-independent fix for this, by all means let us know.
import os.path

verbose = False

class LtdSizeFile(file):
    ''' A file subclass which limits size of file written to approximately "maxsize" bytes '''
    def __init__(self, filename, mode='wt', maxsize=None):
        self.root, self.ext = os.path.splitext(filename)
        self.num = 1
        self.size = 0
        if maxsize is not None and maxsize < 1:
            raise ValueError('"maxsize" argument should be a positive number')
        self.maxsize = maxsize
        file.__init__(self, self._getfilename(), mode)
        if verbose: print 'file "%s" opened' % self._getfilename()

    def close(self):
        file.close(self)
        self.size = 0
        if verbose: print 'file "%s" closed' % self._getfilename()

    def write(self, text):
        lentext = len(text)
        if self.maxsize is None or self.size + lentext <= self.maxsize:
            file.write(self, text)
            self.size += lentext
        else:
            self.close()
            self.num += 1
            file.__init__(self, self._getfilename(), self.mode)
            if verbose: print 'file "%s" opened' % self._getfilename()
            file.write(self, text)
            self.size += lentext

    def writelines(self, lines):
        for line in lines:
            self.write(line)

    def _getfilename(self):
        return '{0}{1}{2}'.format(self.root, self.num if self.num > 1 else '', self.ext)

if __name__ == '__main__':
    import random
    import string

    def randomword():
        letters = []
        for i in range(random.randrange(2, 7)):
            letters.append(random.choice(string.lowercase))
        return ''.join(letters)

    def randomsentence():
        words = []
        for i in range(random.randrange(2, 10)):
            words.append(randomword())
        words[0] = words[0].capitalize()
        words[-1] = ''.join([words[-1], '.\n'])
        return ' '.join(words)

    lsfile = LtdSizeFile('LtdSizeTest.txt', 'wt', 100)
    for i in range(100):
        sentence = randomsentence()
        if verbose: print '  writing: {!r}'.format(sentence)
        lsfile.write(sentence)
    lsfile.close()
I noticed an ambiguity in your question. Do you want the file to be (a) over (b) under (c) exactly 1GiB large, before switching?
It's easy to tell if you've gone over. tell() is sufficient for that kind of thing; just check if tell() > 1024*1024*1024: and you'll know.
Checking if you're under 1GiB, but will go over 1GiB on your next write, is a similar technique. if len(data_to_write) + tell > 1024*1024*1024: will suffice.
The trickiest thing to do is to get the file to exactly 1GiB. You will need to tell() the length of the file, and then partition your data appropriately in order to hit the mark precisely.
Regardless of exactly which semantics you want, tell() is always going to be at least as slow as doing the counting yourself, and possibly slower. This doesn't mean that it's the wrong thing to do; if you're writing the file from a thread, then you almost certainly will want to tell() rather than hope that you've correctly preempted other threads writing to the same file. (And do your locks, etc., but that's another question.)
By the way, I noticed a definite direction in your last couple questions. Are you aware of #twisted and #python IRC channels on Freenode (irc.freenode.net)? You will get timelier, more useful answers.
~ C.
I recommend counting. There's no internal language counter that I'm aware of. Somebody else mentioned using tell(), but an internal counter will take roughly the same amount of work and eliminate the constant OS calls.
# pseudocode
if written + len(new_data) > 1024*1024*1024:
    rotateFile()
written += len(new_data)

Troubles with python list and file saving

I really don't know why my code is not saving the readings from the ADC and GPS receiver to the file I open in the first line of the code. It saves only one record from each of the ADC and GPS receiver.
This is my code:
import MDM

f = open("cord+adc.txt", 'w')

def getADC():
    res = MDM.send('AT#ADC?\r', 0)
    res = MDM.receive(100)
    if(res.find('OK') != -1):
        return res
    else:
        return ""

def AcquiredPosition():
    res = MDM.send('AT$GPSACP\r', 0)
    res = MDM.receive(30)
    if(res.find('OK') != -1):
        tmp = res.split("\r\n")
        res = tmp[1]
        tmp = res.split(" ")
        return tmp[1]
    else:
        return ""

while (1):
    cordlist = []
    adclist = []
    p = AcquiredPosition()
    res = MDM.receive(60)
    cordlist.append(p)
    cordlist.append("\r\n")
    f.writelines(cordlist)
    q = getADC()
    res = MDM.receive(60)
    adclist.append(q)
    adclist.append("\r\n")
    f.writelines(adclist)
and this is the file called "cord+adc.txt":
174506.000,2612.7354N,05027.5971E,1.0,23.1,3,192.69,0.18,0.09,191109,07
#ADC: 0
If there is another way to write my code, please advise me, or just point me to the error in the above code.
Thanks for any suggestions.
You have two problems here. First, you are not closing your file. The bigger problem, though, is that your while loop will run forever (or until something else goes wrong in your program): it has no terminating condition, since you are looping while 1 but never explicitly breaking out. I assume that when AcquiredPosition() returns an empty string you want the loop to terminate, so I added if not p: break after the call to that function; if it returns an empty string, the loop terminates and the file is closed, thanks to the with statement. You should restructure your while loop like below:
with open("cord+adc.txt", 'w') as f:
    while (1):
        cordlist = []
        adclist = []
        p = AcquiredPosition()
        if not p:
            break
        res = MDM.receive(60)
        cordlist.append(p)
        cordlist.append("\r\n")
        f.writelines(cordlist)
        q = getADC()
        res = MDM.receive(60)
        adclist.append(q)
        adclist.append("\r\n")
        f.writelines(adclist)
Because you never explicitly flush() or close() your file, there's no guarantee at all about what will wind up in it. You should probably flush() it after each packet, and you must explicitly close() it when you wish your program to exit.
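A sketch of that fix applied to the loop above; every record is pushed out with flush(), and close() runs even if the loop dies (MDM and the two helper functions are as in the question):
f = open("cord+adc.txt", 'w')
try:
    while True:
        p = AcquiredPosition()
        f.write(p + "\r\n")
        f.flush()                 # push this record to the OS now
        q = getADC()
        f.write(q + "\r\n")
        f.flush()
finally:
    f.close()                     # runs even if the loop raises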
If your modem connection is a socket,
make sure your socket is functioning by calling getADC() and AcquiredPosition() directly from the interactive interpreter. Just drop the while(1) loop in a function (main() is the common practice), then import the module from the interactive prompt.
Your example is missing the initialization of the socket object, MDM. Make sure it is correctly set up to the appropriate address, with code like:
import socket
MDM = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
MDM.connect((HOST, PORT))
If MDM doesn't refer to a TCP socket, you can still try calling the mentioned methods interactively.
I don't see you closing the file anywhere. Add this as the last line of your code:
f.close()
That should contribute to fixing your problem. I don't know much about sockets, etc., so I can't help you there.
When you write a line to a file, it is actually buffered in memory first (this is the C way of handling files). When the maximum size of the buffer is hit, or you close the file, the buffer is emptied into the specified file.
From the explanation so far, I think you have a scary picture of file manipulation. Now, the best way to solve any and all problems is to flush the buffer's contents to the file: after the flush() function is executed and the buffer is empty, you have all the content safely saved in your file. Of course it would be a good thing to close the file as well, but in an infinite loop that's hardly possible (you could hardcode an event maybe, send it to the actual function, and when the infinite loop stops, that is, when the program closes, close the file too; just a suggestion of course, the flush() call should do the trick).
