Stream GSM codec audio from network to speakers on the fly - Python

I'm writing VoIP software in Python, trying to recreate the protocol of a specific ham radio program, which uses the GSM audio codec.
Python has no easy way to play GSM files, but I at least managed to convert one with it, so I know it is possible.
I use myfile.write(data3) on the data from the network stream to write a .gsm file to disk.
Then I use PySoundFile to convert it to a WAV file:
data, samplerate = sf.read('temppi.gsm')
sf.write('temppi.wav', data, samplerate)
Afterwards I can play it with PyAudio, but this introduces a huge delay. It needs to play on the fly, not after the whole audio packet has come in.
My question: how can I play the stream directly, on the fly, with SoundFile? I tried searching Google, but everything is about converting files; is there really no way to play it directly? Any suggestions on what I could do are appreciated. Thanks, and happy new year :)
EDIT:
Now I have it playing on the fly, but the approach is bad and it produces a lot of chunky, clicking sounds:
# here we start the audio output thread (aaniulos)
if ekabitti == b'\x01':
    dataaa = self.socket.recv(198)
    data3 = io.BytesIO(bytes(dataaa))
    while True:
        global aani
        #global data3
        if aani:
            print('Stopping the audio thread..')
            break
        data, samplerate = sf.read(io.BytesIO(bytes(data3.getbuffer())),
                                   format='RAW', channels=1, samplerate=8000,
                                   dtype='int16', subtype='GSM610', endian='FILE')
        virtuaalifilu = io.BytesIO()
        sf.write(virtuaalifilu, data, 8000, format='WAV', subtype='PCM_16')
        sound_file = io.BytesIO(bytes(virtuaalifilu.getbuffer()))
        print('Streaming sound to the speakers now!!!')
        # (playback code omitted here)
    stream.stop_stream()
    stream.close()
    return

Since you omit much detail I can only guess how your implementation works. It sounds like you are not doing it correctly. My guess is that the huge delay you experience is because you are sending too much audio in each packet, maybe even a whole audio file? To achieve audio streaming with low latency you basically need to follow this crude scheme:
At the sender:
Record audio to a buffer.
Continuously slice the buffer in chunks of a pre-defined length, e.g. 20 milliseconds.
Encode each chunk with a suitable audio codec, e.g. GSM.
Send each chunk in a packet to the receiver, preferably using a datagram based protocol like UDP (a rough sender sketch follows below).
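For example, a minimal sender sketch with sounddevice and SoundFile, under the assumption that libsndfile's RAW/GSM610 mode can encode a single 160-sample chunk (the address and port are made up):

import io
import socket
import sounddevice as sd
import soundfile as sf

DEST = ('192.0.2.1', 5005)   # placeholder receiver address
SAMPLE_RATE = 8000
CHUNK = 160                  # 160 samples = 20 ms at 8 kHz

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
with sd.InputStream(samplerate=SAMPLE_RATE, channels=1, dtype='int16') as mic:
    while True:
        chunk, _overflowed = mic.read(CHUNK)   # raw PCM samples
        encoded = io.BytesIO()
        # Encode the 20 ms chunk with the GSM codec via libsndfile.
        sf.write(encoded, chunk, SAMPLE_RATE, format='RAW', subtype='GSM610')
        sock.sendto(encoded.getvalue(), DEST)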
At the receiver:
Read packets from network when available.
Decode each packet to raw audio data and put it in an audio buffer.
Continuously play audio from the audio buffer.
If using UDP as transfer protocol you also need to handle packet losses and out-of-order packets. Depending on the latency requirements you could probably also use (or at least try) TCP to send each audio chunk.
To achieve continuous audio recording and playback sounddevice seems to be a good alternative. For recording, check out InputStream or RawInputStream. For playback, have a look at OutputStream or RawOutputStream.
It might still be possible to use SoundFile to decode the GSM codec to raw audio, but you need to do that for each chunk, and the chunks must be quite small, e.g. 20 milliseconds. A matching receiver sketch follows.
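This is only a sketch of the receiving side, under the same assumptions as the sender sketch above (one 33-byte GSM 06.10 frame, i.e. 20 ms of 8 kHz mono audio, per UDP datagram; whether libsndfile happily decodes single frames this way is something you would need to verify):

import io
import socket
import sounddevice as sd
import soundfile as sf

SAMPLE_RATE = 8000
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(('', 5005))        # placeholder port, matching the sender sketch

with sd.RawOutputStream(samplerate=SAMPLE_RATE, channels=1, dtype='int16') as out:
    while True:
        packet, _addr = sock.recvfrom(4096)
        # Decode one GSM chunk to int16 PCM and play it immediately.
        pcm, _ = sf.read(io.BytesIO(packet), format='RAW', channels=1,
                         samplerate=SAMPLE_RATE, subtype='GSM610',
                         dtype='int16', endian='FILE')
        out.write(pcm.tobytes())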

Related

Need some advice for an audio processing app

I want to ask for some advice about real-time audio data processing.
For the moment, I have created a simple server and client using Python sockets which send and receive audio data from the microphone until I stop them (4096 bytes per packet, but it could be much more).
I have two different kinds of analysis in mind:
real-time: perform analysis on each X-byte packet and send the result back in the response
after receiving a lot of bytes (for example every 1 h), append these bytes and store them in a DB. When the microphone is stopped, concatenate all the previous chunks and perform some actions on the result (like creating a waveplot image for the recorded session).
For this kind of usage, which kind of self-hosted DB can I use?
How can I concatenate these large volumes of data at regular intervals and add them to the DB?
For only 6 minutes, I received something like 32 MB of data. Maybe I should put each chunk in Redis as soon as I receive it, rather than keeping it in a Python object. Another way could be to serialize the audio data into base64. I'm just afraid of losing speed, since I'm currently using TCP to send the data.
Thanks for your help!
On your question about the size: is there any reason not to compress the audio data? It's very easy to do, and 32 MB for 6 minutes of uncompressed (mono) audio is normal. You could store smaller chunks and/or append incoming chunks to a bigger file. Have a look at these, they might help:
https://realpython.com/playing-and-recording-sound-python/
How to join two wav files using python?
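For the concatenation step, here is a minimal sketch using the standard-library wave module (the file names are made up, and it assumes every chunk shares the same sample rate, sample width and channel count):

import wave

# Hypothetical chunk files recorded earlier in the session.
infiles = ['chunk_0001.wav', 'chunk_0002.wav']
outfile = 'session.wav'

with wave.open(outfile, 'wb') as out:
    for i, name in enumerate(infiles):
        with wave.open(name, 'rb') as w:
            if i == 0:
                # Copy the header parameters from the first chunk.
                out.setparams(w.getparams())
            out.writeframes(w.readframes(w.getnframes()))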

Is there a way to live stream audio from server without the local file?

I'm looking for a way to continuously stream audio from a server. The main issue is that the server-side code will receive many URLs to stream audio from. There will also be instances where the URL is swapped live and a new piece of audio is streamed instead. I have not yet found a solution that wouldn't involve downloading each file before streaming it, which would defeat the live feature.
I've attempted to use VLC for Python, but it wouldn't let me change the URL being streamed on the fly. I've also attempted to use PyAudio, but I haven't been able to get the correct audio format, let alone swap the audio source.
An example link, fair warning, it'll autoplay: audio
To make a continuous stream that is sent to clients, you'll need to break this project into two halves.
Playout
You need something to decode the source streams from their compressed formats to a non-compressed standardized format that you can manipulate... raw PCM samples. Use a child process and have it output to STDOUT so you can get that data in your Python script. You can use VLC for this if you want, but FFmpeg is pretty easy:
ffmpeg -i "http://example.com/stream" -ar 48000 -ac 2 -f f32le -acodec pcm_f32le -
That will output raw PCM to STDOUT as 32-bit floats, in stereo, at 48 kHz. Once in this standard format, you can arbitrarily join streams. So, when you're done playing one stream, just kill the process, switch to the next, and start playing back samples from the new one.
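Reading that decoded PCM in Python could look roughly like this (a sketch; the stream URL and chunk size are placeholders):

import subprocess

# Placeholder URL; the trailing '-' sends raw f32le PCM to STDOUT as described above.
proc = subprocess.Popen(
    ['ffmpeg', '-i', 'http://example.com/stream',
     '-ar', '48000', '-ac', '2', '-f', 'f32le', '-acodec', 'pcm_f32le', '-'],
    stdout=subprocess.PIPE)

BYTES_PER_FRAME = 4 * 2          # 32-bit float x 2 channels
CHUNK_FRAMES = 4800              # ~100 ms at 48 kHz

while True:
    pcm = proc.stdout.read(CHUNK_FRAMES * BYTES_PER_FRAME)
    if not pcm:
        break                    # stream ended; kill/switch processes here
    # hand the raw samples to the next stage (mixing, encoding, ...)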
Encoding
You want to create a single PCM stream that you can then re-encode with some external encoder, basically the reverse of what you did on playout. Again, something FFmpeg can do for you:
ffmpeg -f f32le -ar 48000 -ac 2 -i - -f opus -acodec libopus icecast://...
Now, you'll note the output example here, I suggested sending this off to Icecast. Icecast is a decent streaming server you can use. If you'd rather just output directly over HTTP, you can. But if you're playing this stream out to more than one listener, I'd suggest letting Icecast or similar take care of it for you.
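Feeding the encoder from Python might look like this sketch (the Icecast URL, credentials and mount point are placeholders, and pcm is a chunk of raw samples from the playout side above):

import subprocess

# Placeholder Icecast URL; substitute your own source credentials and mount.
encoder = subprocess.Popen(
    ['ffmpeg', '-f', 'f32le', '-ar', '48000', '-ac', '2', '-i', '-',
     '-f', 'opus', '-acodec', 'libopus',
     'icecast://source:hackme@localhost:8000/stream.opus'],
    stdin=subprocess.PIPE)

# Write each PCM chunk from the playout side into the encoder.
encoder.stdin.write(pcm)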

Stream rtsp video from opencv in python in h.264 with low latency

I'm quite new to video streaming and opencv in general.
I wanted to stream my computations to another device via rtsp from a raspberry pi 3 using h264.
I tried writing to a pipe using popen with FFmpeg to an ffserver, and creating RTSP servers with VLC to stream the content. Unfortunately I get huge lag in the stream; the best I could do was get it down to 3 seconds.
Is there any way to achieve this? I'm open to consider other technologies.
Thank you
RTMP is not the best way to achieve low latency (< 5 s).
I suggest you use FFmpeg with pure RTP to stream the video to an RTSP server, or use GStreamer directly with Gst-RTSP-server; both are open solutions in C.
Latency will also be affected by your encoder and the hardware it uses for processing.
This question has more information.
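As a starting point, piping raw OpenCV frames into an FFmpeg child process that pushes to an RTSP server might look like this sketch (the capture source, frame rate and server URL are placeholders, and the server must accept RTSP publishing, e.g. MediaMTX):

import subprocess
import cv2

cap = cv2.VideoCapture(0)                     # placeholder capture source
w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

# Low-latency H.264 encode; zerolatency disables frame buffering in x264.
ffmpeg = subprocess.Popen(
    ['ffmpeg', '-f', 'rawvideo', '-pix_fmt', 'bgr24',
     '-s', '{}x{}'.format(w, h), '-r', '30', '-i', '-',
     '-c:v', 'libx264', '-preset', 'ultrafast', '-tune', 'zerolatency',
     '-f', 'rtsp', 'rtsp://192.0.2.1:8554/stream'],
    stdin=subprocess.PIPE)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    ffmpeg.stdin.write(frame.tobytes())       # raw BGR bytes, one frame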
I would recommend using RTMP instead. Latency can be as low as hundreds of milliseconds.
Another thing to consider is that VLC and other clients will introduce a video delay due to internal buffering by the player. Look for the option to not buffer the video and you should be able to shave off a couple of seconds from the video latency.
With ffplay you can try the following:
ffplay -fflags nobuffer -loglevel verbose rtmp://your.server.ip/path/to/stream
If you transmux to DASH or HLS, you can also expect additional latency in the video stream.

Extracting each individual frame from an H264 stream for real-time analysis with OpenCV

Problem Outline
I have an h264 real-time video stream (I'll call this "the stream") being captured in Process1. My goal is to extract each frame from the stream as it comes through and use Process2 to analyze it with OpenCV. (Process1 is nodejs, Process2 is Python)
Things I've tried, and their failure modes:
Send the stream directly from Process1 to Process2 over a named FIFO pipe:
I succeeded in directing the stream from Process1 into the pipe. However, in Process2 (which is Python) I could not (a) extract individual frames from the stream, and (b) convert any extracted data from h264 into an OpenCV format (e.g. JPEG, numpy array).
I had hoped to use OpenCV's VideoCapture() method, but it does not allow you to pass a FIFO pipe as an input. I was able to use VideoCapture by saving the h264 stream to a .h264 file, and then passing that as the file path. This doesn't help me, because I need to do my analysis in real time (i.e. I can't save the stream to a file before reading it in to OpenCV).
Pipe the stream from Process1 to FFMPEG, use FFMPEG to change the stream format from h264 to MJPEG, then pipe the output to Process2:
I attempted this using the command:
cat pipeFromProcess1.fifo | ffmpeg -i pipe:0 -f h264 -f mjpeg pipe:1 | cat > pipeToProcess2.fifo
The biggest issue with this approach is that FFMPEG takes inputs from Process1 until Process1 is killed, and only then does Process2 begin to receive the data.
Additionally, on the Process2 side, I still don't understand how to extract individual frames from the data coming over the pipe. I open the pipe for reading (as "f") and then execute data = f.readline(). The size of data varies drastically (some reads have length on the order of 100, others length on the order of 1,000). When I use f.read() instead of f.readline(), the length is much larger, on the order of 100,000.
If I were to know that I was getting the correct size chunk of data, I would still not know how to transform it into an OpenCV-compatible array because I don't understand the format it's coming over in. It's a string, but when I print it out it looks like this:
��_M~0A0����tQ,\%��e���f/�H�#Y�p�f#�Kus�} F����ʳa�G������+$x�%V�� }[����Wo �1'̶A���c����*�&=Z^�o'��Ͽ� SX-ԁ涶V&H|��$
~��<�E�� ��>�����u���7�����cR� �f�=�9 ��fs�q�ڄߧ�9v�]�Ӷ���& gr]�n�IRܜ�檯����
� ����+ �I��w�}� ��9�o��� �w��M�m���IJ ��� �m�=�Soՙ}S �>j �,�ƙ�'���tad =i ��WY�FeC֓z �2�g�;EXX��S��Ҁ*, ���w� _|�&�y��H��=��)� ���Ɗ3# �h���Ѻ�Ɋ��ZzR`��)�y�� c�ڋ.��v�!u���� �S�I#�$9R�Ԯ0py z ��8 #��A�q�� �͕� ijc �bp=��۹ c SqH
Converting from base64 doesn't seem to help. I also tried:
array = np.fromstring(data, dtype=np.uint8)
which does convert to an array, but not one of a size that makes sense based on the 640x368x3 dimensions of the frames I'm trying to decode.
Using decoders such as Broadway.js to convert the h264 stream:
These seem to be focused on streaming to a website, and I did not have success trying to re-purpose them for my goal.
Clarification about what I'm NOT trying to do:
I've found many related questions about streaming h264 video to a website. This is a solved problem, but none of the solutions help me extract individual frames and put them in an OpenCV-compatible format.
Also, I need to use the extracted frames in real time on a continual basis. So saving each frame as a .jpg is not helpful.
System Specs
Raspberry Pi 3 running Raspbian Jessie
Additional Detail
I've tried to generalize the problem I'm having in my question. If it's useful to know, Process1 is using the node-bebop package to pull down the h264 stream (using drone.getVideoStream()) from a Parrot Bebop 2.0. I tried using the other video stream available through node-bebop (getMjpegStream()). This worked, but was not nearly real-time; I was getting very intermittent data streams. I've entered that specific problem as an Issue in the node-bebop repository.
Thanks for reading; I really appreciate any help anyone can give!
I was able to solve opening a Parrot Anafi stream with OpenCV (built with FFMPEG) in Python by setting the following environment variable:
export OPENCV_FFMPEG_CAPTURE_OPTIONS="rtsp_transport;udp"
FFMPEG defaults to TCP transport, but the feed from the drone is UDP so this sets the correct mode for FFMPEG.
Then use:
cap = cv2.VideoCapture(<stream URI>, cv2.CAP_FFMPEG)
ret, frame = cap.read()
while ret:
    cv2.imshow('frame', frame)
    # do other processing on frame...
    ret, frame = cap.read()
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()
as usual.
This should also work with a Parrot Bebop, but I don't have one to test it.
There are some suggestions online for piping the h264 stream into the opencv program using standard in:
some-h264-stream | ./opencv-program
where opencv-program contains something like:
VideoCapture cap("/dev/stdin");
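The Python equivalent would presumably be the following (an untested sketch; whether /dev/stdin is accepted depends on the FFmpeg backend OpenCV was built against):

import cv2

cap = cv2.VideoCapture('/dev/stdin')
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # analyze the frame with OpenCV here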

Use (Python) Gstreamer to decode audio (to PCM data)

I'm writing an application that uses the Python Gstreamer bindings to play audio, but I'm now trying to also just decode audio -- that is, I'd like to read data using a decodebin and receive a raw PCM buffer. Specifically, I want to read chunks of the file incrementally rather than reading the whole file into memory.
Some specific questions: How can I accomplish this with Gstreamer? With pygst specifically? Is there a particular "sink" element I need to use to read data from the stream? Is there a preferred way to read data from a pygst Buffer object? How do I go about controlling the rate at which I consume data (rather than just entering a "main loop")?
To get the data back in your application, the recommended way is appsink.
Based on a simple audio player like this one (replacing the oggdemux/vorbisdec with decodebin and a capsfilter with caps = "audio/x-raw-int"), change autoaudiosink to appsink, connect the "new-buffer" signal to a Python function, and set "emit-signals" to True. The function will receive decoded chunks of PCM/int data. The rate of decoding will depend on how fast you can decode and consume the data. Since the new-buffer signal runs in the GStreamer thread context, you can simply sleep/wait in that function to control or slow down the decoding speed.
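The answer above describes the old GStreamer 0.10/pygst API; with the current GStreamer 1.0 Python bindings the equivalent appsink signal is "new-sample". A rough sketch (the file URI is a placeholder):

import gi
gi.require_version('Gst', '1.0')
from gi.repository import Gst, GLib

Gst.init(None)
pipeline = Gst.parse_launch(
    'uridecodebin uri=file:///tmp/test.ogg ! audioconvert ! '
    'audio/x-raw,format=S16LE ! appsink name=sink emit-signals=true')
sink = pipeline.get_by_name('sink')

def on_new_sample(appsink):
    sample = appsink.emit('pull-sample')
    buf = sample.get_buffer()
    ok, info = buf.map(Gst.MapFlags.READ)
    if ok:
        pcm = bytes(info.data)   # one decoded chunk of raw S16LE PCM
        buf.unmap(info)
        # consume pcm here; blocking in this callback slows decoding down
    return Gst.FlowReturn.OK

sink.connect('new-sample', on_new_sample)
pipeline.set_state(Gst.State.PLAYING)
GLib.MainLoop().run()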
