I'm developing a program in python that plays audio files from various websites, mostly mp3s. My first thought to play these files was to try streaming them with requests then decoding the chunks, seeking would be some sort of range in the request header.
So I've tried doing some tests with this and ran into some problems converting small chunks of data to the playable form. Before I dig in and try to fix them, I was wondering if streaming is even necessary. Is that how programs like vlc deal with it? How would you go about dealing with it?
I did a lot of google searching and didn't come up with anything useful.
Yes, if you're playing back as you're downloading, streaming is the way to go.
This generally doesn't require any special requests on your part. Simply make a request, decode the data as it comes in, buffer it, and put backpressure on the stream if your buffers are getting full.
What ends up happening is that the TCP window size will be reduced, slowing the speed at which the server transmits to you, until it matches the rate of playback. (In practice, this means that the window slams to zero pretty quickly, then spurts open for a few packets and back to zero again, since internet connections these days are typically much faster than required.)
Now, you still may want to handle ranged requests if you lose your connection. That is, if I'm listening to audio for several minutes and then lose my connection (such as when changing from WiFi to LTE for example), your app can reconnect and request all bytes from the point at which it left off. Browsers do this. It becomes more important when using common HTTP CDNs which are less tolerant of connections that stay open for long periods of time. Typically, if the TCP window size stays at zero for 2 minutes, expect that TCP connection to close.
You might download a copy of Wireshark or some other packet sniffer and watch what happens over the wire while you play one of these HTTP streams in VLC. You'll get a better idea of what's going on under-the-hood.
Related
I have 3 Raspberry Pi's, all on the same LAN doing stuff that is monitored by Python and I want them to talk to each other, and to my PC. Sockets seem like the way to go, but the examples are so simplistic. Here's the issue I am stuck on - the listen and receive processes are all blocking, unless you set a timeout, in which case they still block, just for less time.
So, if I set up a round-robin, then each Pi will only be listened to (or received on) for 1/3 of the time, or less if there is stuff to transmit as well.
What I'd like to understand better is what happens to the data (or connection requests) when I am not listening/receiving - are these buffered by the OS, or lost..? What happens to the socket when there is no method called, is it happy to be ignored for a while, or will the socket itself be dumped by the OS..?
I am starting to split these into separate processes now, which is getting messy and seems inefficient, but I can't think of another way except to run this as 3 (currently), maybe 6 (transmit/receive) or even 9 (listen/transmit/receive) separate processes..?
Sorry I don't have a code example, but it is already way tooo big, and it doesn't work. plus a lot of the issue seems to me to be in the murky part of the sockets - that part between the socket and the OS. I feel I need to understand this better to get to the right architecture for my bit of code before I really start debugging the various exceptions and communication failures...
You can handle multiple sockets in a single process using I/O multiplexing. This is usually done using calls such as epoll(), poll() or select(). These calls monitor multiple sockets and return when one or more sockets have data available for reading. Or are ready to write data to. In many cases this is more convenient than using multiple processes and/or threads.
These calls are pretty low level OS calls. Python seems to have higher level functionality that might be easier to use but I haven't tried this myself.
I have been trying to do some coding with sockets recently and found out that i often get broken pipe errors, especially when working with bad connections.
In an attempt to solve this problem I made sure to sleep after every socket operation. It works but it is slow and ugly.
Is there any other way of making a socket connection more stable?
...server and client getting out of sync
Basically you say that your application is buggy. And the way to make the connection more stable is therefor to fix these bugs, not to work around it with some explicit sleep.
While you don't show any code, a common cause of "getting out of sync" is the assumption that a send on one side is matched exactly by a recv on the other side. Another common assumption is that send will actually send all data given and recv(n) will receive exactly n bytes of data.
All of these assumptions are wrong. TCP is not a message based protocol but a byte stream. Any message semantics need to be explicitly added on top of this byte stream, for example by prefixing messages with a length or by having a unique message separator or by having a fixed message size. And the result of send and recv need to be checked to be sure that all data have been send or all expected data have been received - and if not more send or recv would need to be done until all data are processed.
Adding some sleep often seems to "fix" some of these problems by basically adding "time" as a message separator. But it is not a real fix, i.e. it affects performance but it is also not 100% reliable either.
I've been using Python's Sockets for a long time and I can tell that as long as your code (which you unfortunately didn't provide) is clean and synchronized in itself you shouldn't get any problems. I use Sockets for small applications where I don't necessarily want/need to write/use an API, and it works like a dream.
As #Steffen already mentioned in his answer, TCP is not a message based protocol. It is a "stream oriented protocol" which means that is sends data byte-by-byte and not message-by-message..
Take a look at this thread and this paper to get a better understanding about the differences.
I would also suggest taking a look at this great answer to know how to sync your messages between your server and your client(s).
I've got a program which receives information from about 10 other (sensor reading) programs (all controlled by myself). I now want to make them communicate using ZeroMQ.
For most of the queues the important thing is that the central receiving program always has the latest sensor data, all older messages are not important anymore. If a couple messages get lost I don't care. So for all of them I started out with a separate PUB/SUB socket; one for each program. But I'm not sure if that is the right way to do it. As far as I understand I have two options:
Make a separate socket for every program and read them out in a loop. That way I know by the socket what the information is I'm receiving (I'm often just sending an int).
Make one socket to which all the programs connect, and with every message I send a string which tells the receiving end what the message is about.
All connections are on a PUB/SUB basis, so creating one socket would well work out. I'm just not sure if that is the most efficient way to do it.
All tips are welcome!
- PUB/SUB is fine and allows an easy conversion from N-sensors:1-logger into N-sensors:2+-loggers- one might also benefit from a conceptual separation of a socket from an access-port, where more than one sockets may get connected
How to get always JUST THE ACTUAL ( LAST ) SENSOR READOUT:
If not bound, due to system-integration constraints, to some early ZeroMQ API, there is a lovely feature exactly for this via a .setsockopt( ZMQ_CONFLATE, True ) method:
ZMQ_CONFLATE: Keep only last message
If set, a socket shall keep only one message in its inbound/outbound queue, this message being the last message received/the last message to be sent. Ignores ZMQ_RCVHWM and ZMQ_SNDHWM options. Does not support multi-part messages, in particular, only one part of it is kept in the socket internal queue.
On design dilemma:
Unless your real-time control stability introduces some hard-real-time limit, the PUB-side freely decides, how often a new value is instructed to .send() to SUB(-s). Here no magic is needed, the less with ZMQ_CONFLATE option set on the internal outgoing queue managed.
The SUB(-s) side receiver(s) will also benefit from the ZMQ_CONFLATE option set on the internal incoming queue managed, but given a set of individual .bind()-s instantiate separate landing ports for delivery of different individual sensoric readouts, your "last" values will remain consistently the "last"-readouts. If all readouts would go into a common landing pad, your receiving process will get masked-out ( lost ) all readouts but the one that was just accidentally the "last" right before .recv() took place, which would not help much, would it?
If some I/O-performance related tweaking becomes necessary, the .Context( n_IO_threads ) + ZMQ_AFFINITY-mapping options may increase and prioritise the resources the ioDataPump may harness for increased IO-performance
Unless you're up against a tight real time requirement there's not much point in having more sockets than necessary. ZMQ's fair queuing ought to take care of giving each sensor program equal attention (see Figure 6 in the guide)
If your sensor programs are on other devices connected by Ethernet, the ultimate performance of your programs is limited by the bandwidth of the Ethernet NIC in your computer. A single thread program handling a single PULL socket stands a good chance of being able to process the data coming in faster than it can transit the NIC.
If that's so, then you may as well stick to a single socket and enjoy the simpler code. It's not very hard dealing with multiple sockets, but it's far easier to deal with one. For example, with one single socket you don't have to tell each sensor program what network port to connect to - it can be a constant.
PUSH/PULL sounds like a more natural pattern for your situation than PUB/SUB, but that won't make much difference.
Lastness
Lastness is going to be your (potential) problem. The whole point of things like ZMQ is that they will deliver messages in the order they're sent. Thus you read a message, it is by definition the "last" message so far as the recipient is concerned. The recipient has no idea as to whether or not there is another message on the way, in transit.
This is a feature of Actor model architectures (which is what ZMQ is). Messages get buffered up in the transport, and there's no information about the newness of the message to be learned when it's read. All you know is that it was sent some time beforehand. There is no execution rendezvous with the sender.
Now, you either process it as if it is the last message, or you wait for a period of time to see if another one comes along before processing it. The easiest thing to do is to simply process each message as if it is the last.
Contrast this with a Communicating Sequential Processes architecture. It's basically the same as an Actor model architecture, except that the transport does not buffer messages. Message sends block until the recipient has called message read.
Thus when you read a message, the recipient knows that it the last one sent by the sender. And the sender knows that the message it has sent has been received at that very instant by the recipient. So the knowledge of lastness is absolute - the message received really is the last one sent.
However, unless you have something fairly heavyweight going on I wouldn't worry about it. You are quite likely to be able to keep up with your sensor data stream even if the messages you're reading aren't the latest in the queue.
You can nearly make ZMQ into CSP by setting the high water limit on the sending end's socket to 1. That means that you can buffer up at most 1 message. That's not the same as 0, and unfortunately setting the HWM to 0 means "unlimited size buffer".
I am sending coordinates of points to a visualizational client-side script via TCP over the internet. I wonder which option I should use:
concat the coordinates into a large string and send them together, or
send them one by one
I don't know which one the faster is. I have some other questions too:
Which one should I use?
Is there a maximum size of packet of TCP? (python: maximum size of string for client.send(string))
As it is a visualization project should I use UDP instead of TCP?
Could you please tell me a bit about lost packet? When do they occur? How to deal with them?
Sorry for the many questions, but I really struggle with this issue...
When you send a string, that might be sent in multiple TCP packets. If you send multiple strings, they might all be sent in one TCP packet. You are not exposed to the packets, TCP sockets are just a constant stream of data. Do not expect that every call to recv() is paired with a single call to send(), because that isn't true. You might send "abcd" and "efg", and might read in "a", "bcde", and "fg" from recv().
It is probably best to send data as soon as you get it, so that the networking stack has as much information about what you're sending, as soon as possible. It will decide exactly what to do. You can send as big a string as you like, and if necessary, it will be broken up to send over the wire. All automatically.
Since in TCP you don't deal with packets, things like lost packets also aren't your concern. That's all handled automatically -- either the data gets through eventually, or the connection closes.
As for UDP - you probably don't want UDP. :)
UDP has a maximum theoretical packet size of 65535, in reality this size is much lower and depends on operation systems and routing hardware. UDP does not have any mechanism to ensure data delivery, this make it much faster but unreliable without some hand crafted data protection mechanism.
TCP, you don't have to worry about packet size, if you try to send a bigger chunk then possible, wil be automatically and transparently split by the operating system for you. Sending small chunks of data can be inefficient due the overhead of the TCP communication, but even here, by default you have Nagle's algorithm enabled on most OS (See Nagle's algorithm). The algorithm basically tries to join small chunks of data, for most scenarios this is fast and efficient.
I'm serving requests from several XMLRPC clients over WAN. The thing works great for, let's say, a period of one day (sometimes two), then freezes in socket.py:
data = self._sock.recv(self._rbufsize)
_sock.timeout is -1, _sock.gettimeout is None
There is nothing special I do in the main thread (just receiving XMLRPC calls), there are another two threads talking to DB. Both these threads work fine and survive this block (did a check with WinPdb). Clients are sending requests not being longer than 1KB, and there isn't any special content: just nice and clean strings in dictionary. Between two blockings I serve tens of thousands requests without problems.
Firewall is off, no strange software on the same machine, etc...
I use Windows XP and Python 2.6.4. I've checked differences between 2.6.4. and 2.6.5, and didn't find anything important (or am I mistaking?). 2.7 version is not an option as I would miss binaries for MySqlDB.
The only thing that happens from time to time caused by the clients that have poor internet connection is that sockets break. This is happening, every 5-10 minutes (there are just five clients accessing server every 2 seconds).
I've spent great deal of time on this issue, now I'm beginning to lose any ideas what to do. Any hint or thought would be highly appreciated.
What exactly is happening in your OS's TCP/IP stack (possibly in the python layers on top, but that's less likely) to cause this is a mystery. As a practical workaround, I'd set a timeout longer than the delays you expect between requests (10 seconds should be plenty if you expect a request every 2 seconds) and if one occurs, close and reopen. (Calibrate the delay needed to work around freezes without interrupting normal traffic by trial and error). Unpleasant to hack a fix w/o understanding the problem, I know, but being pragmatical about such things is a necessary survival trait in the world of writing, deploying and operating actual server systems. Be sure to comment the workaround accurately for future maintainers!
thanks so much for the fast response. Right after I've receive it I augmented the timeout to 10 seconds. Now it is all running without problems, but of course I would need to wait another day or two to have sort of confirmation, but only after 5 days I'll be sure and will come back with the results. I see now that 140K request went well already, having so hard experience on this one I would wait at least another 200K.
What you were proposing about auto adaptation of timeouts (without putting the system down) sounds also reasonable. Would the right way to go be in creating a small class (e.g. AutoTimeoutCalibrator) and embedding it directly into serial.py?
Yes - being pragmatical is the only way without loosing another 10 days trying to figure out the real reason behind.
Thanks again, I'll be back with the results.
(sorry, but for some reason I was not able to post it as a reply to your post)