I have a service that sends text to an external text-to-speech service, which returns audio in the response. This is how I access the audio:
res = requests.get(TTS_SERVICE_URL, params={"text": text_to_synth})
bytes_content = io.BytesIO(res.content)  # res.content is already bytes, no bytes() call needed
audio = bytes_content.getvalue()
Now I would like to send multiple lines of text in separate requests, receive all the audio content as bytes, merge them into one audio clip, and then play it. Can anyone guide me on how to merge each bytes_content into one audio byte stream?
I got this to work; posting the answer here in case someone else faces the same problem. I solved it as follows.
Read the bytes_content into a numpy array using soundfile:
data, samplerate = sf.read(bytes_content)
datas.append(data)
where datas is an initially empty list to which each file to be concatenated is appended
Then combine the arrays:
combined = np.concatenate(datas)
and convert back to a byte stream if needed
out = io.BytesIO()
sf.write(out, combined, samplerate=samplerate, format="wav")
I am pretty sure this isn't the right way to do things, but this is what worked for me.
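Putting the pieces together, a minimal end-to-end sketch (assuming every response is a WAV file at the same sample rate; lines_to_synth is a hypothetical stand-in for your list of text lines):

import io

import numpy as np
import requests
import soundfile as sf

datas = []
samplerate = None
for line in lines_to_synth:  # hypothetical list of text lines to synthesize
    res = requests.get(TTS_SERVICE_URL, params={"text": line})
    data, samplerate = sf.read(io.BytesIO(res.content))  # decode WAV bytes into samples
    datas.append(data)

combined = np.concatenate(datas)  # stack the clips end to end

out = io.BytesIO()
sf.write(out, combined, samplerate=samplerate, format="WAV")
merged_audio = out.getvalue()  # the merged clip as WAV bytes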
Related
I have a numpy array from a some.npy file that contains the data of an audio file encoded in the .wav format.
The some.npy was created with sig, sr = librosa.load(some_wav_file, sr=22050) and np.save('some.npy', sig).
I want to convert this numpy array as if its content was encoded with .mp3 instead.
Unfortunately, I am restricted to the use of in-memory file objects for two reasons:
1. I have many .npy files. They are cached in advance, and it would be highly inefficient to have that much "real" I/O when actually running the application.
2. Conflicting access rights of the people who execute the application on a server.
First, I looked for a way to convert the data in the numpy array directly, but there seems to be no library function for it. So, is there a simple way to achieve this with in-memory file objects?
NOTE: I found the question How to convert MP3 to WAV in Python; its solution could theoretically be adapted, but it is not in-memory.
You can read and write memory using BytesIO, like this:
import io
# Create "in-memory" buffer
memoryBuff = io.BytesIO()
And you can read and write MP3 using the pydub module:
from pydub import AudioSegment
# Read a file in
sound = AudioSegment.from_wav('stereo_file.wav')
# Write to memory buffer as MP3
sound.export(memoryBuff, format='mp3')
Your MP3 data is now available at memoryBuff.getvalue()
You can convert between AudioSegments and Numpy arrays using this answer.
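For instance, a small round-trip sketch (assuming numpy is imported as np): rewind the buffer, read the MP3 back, and pull the samples into numpy:

import numpy as np

memoryBuff.seek(0)  # rewind so pydub reads from the start of the buffer
mp3_sound = AudioSegment.from_file(memoryBuff, format='mp3')
samples = np.array(mp3_sound.get_array_of_samples())  # AudioSegment -> numpy array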
I finally found a working solution. This is what I wanted.
import io

import numpy as np
from pydub import AudioSegment

wav = np.load('some.npy')
with io.BytesIO() as inmemoryfile:
    compression_format = 'mp3'
    n_channels = 2 if wav.shape[0] == 2 else 1  # handles stereo and mono files
    AudioSegment(wav.tobytes(), frame_rate=my_sample_rate,  # my_sample_rate: the original rate, e.g. 22050
                 sample_width=wav.dtype.itemsize,
                 channels=n_channels).export(inmemoryfile, format=compression_format)
    wav = np.array(AudioSegment.from_file_using_temporary_files(inmemoryfile)
                   .get_array_of_samples())
There exists a wrapper package (audiosegment) with which one could convert the last line to:
wav = audiosegment.AudioSegment.to_numpy_array(AudioSegment.from_file_using_temporary_files(inmemoryfile))
I'm using a Windows build of gphoto2 to generate a byte stream. I take the byte stream, look for the JPEG header (ff d8) and footer (ff d9), and try to display a single image from the stream. Whenever I pass the parsed byte string into imdecode, it returns None. I pass all of the data, including the ff d8 / ff d9 markers, into imdecode.
import subprocess as sp

import cv2
import numpy as np

pipe = sp.Popen('gphoto2 --stdout-size --capture-movie', stdout=sp.PIPE)
founda = False
foundb = False
bytesl = b''
while True:
    bytesl = bytesl + pipe.stdout.readline()
    if not founda:  # note: 'if ~founda' is always truthy in Python; use 'not'
        a = bytesl.find(b'\xff\xd8')  # JPEG start marker
        if a != -1:
            bytesl = bytesl[a:]  # only trim once the marker is actually found
            founda = True
    if founda and not foundb:
        b = bytesl.find(b'\xff\xd9')  # JPEG end marker
        if b != -1:
            foundb = True
    if founda and foundb:
        jpgi = bytesl[:b + 2]  # include the two end-marker bytes
        imfbuffh = cv2.imdecode(np.frombuffer(jpgi, dtype=np.uint8), cv2.IMREAD_COLOR)
        break
I keep getting nothing from imdecode, and I'm not sure why. The byte string appears to be parsed correctly. Any help would be greatly appreciated.
Edit:
Something else I've noticed: if I just read a JPEG from a file and call np.shape on the array from np.frombuffer, I get something like (140000, 1), whereas when I take np.shape reading from the byte string I get (140000,). I've tried expanding the dimensions, but that didn't work.
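For what it's worth, the flat (N,) shape should not be the problem; a quick sanity check (a sketch, with a hypothetical known-good JPEG file) is:

import cv2
import numpy as np

with open('known_good.jpg', 'rb') as f:  # hypothetical known-good JPEG
    buf = np.frombuffer(f.read(), dtype=np.uint8)  # shape (N,), same as from the stream
img = cv2.imdecode(buf, cv2.IMREAD_COLOR)
print(buf.shape, img is not None)  # imdecode accepts the flat (N,) buffer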
Edit2:
Well, I realized that the header of the MJPEG frames is not just a standard JPEG header. I'm not sure how to convert it to the standard format. If anyone has any tips, that would be great.
Edit3:
I simplified the output-and-write-to-file code to just read the pipe data.
I have two test cases: one using --capture-movie 2 and one using --capture-image-and-download, so in the first case I capture 2 frames of MJPEG data and in the second I capture 1 frame of JPEG data. I tried to display the data for both cases with my previous code, and it failed to display the image even when I wait for stdout to finish rather than reading the data in real time.
Here is the code I used just to write the bytes to a file. In my previous comment I was recording the byte string from a print statement (stupid, I know; I'm not very good at this). It should be noted that I think these byte strings need to be decoded.
import subprocess as sp

pipe = sp.Popen('gphoto2 --stdout-size --capture-movie 2', stdout=sp.PIPE)
pipedata = pipe.stdout.read()
with open('C:\\Users\\Work\\frame2out.txt', 'wb') as f:
    f.write(pipedata)
Attached are links to the two cases.
2 Frames from --capture-movie
https://www.dropbox.com/s/3wvyg8s1tflzwaa/frame2out.txt?dl=0
Bytes from --capture-image-and-download
https://www.dropbox.com/s/3arozhvfz6a77lr/imageout.txt?dl=0
I'm writing a REST API function that should take a video from a POST request, process the video using OpenCV, and return a text response. I got stuck at reading the video from its string representation.
I looked at documentation describing how to read a video in OpenCV, and all of it reads either from a file path or from the webcam. For example, cv2.VideoCapture and FileVideoStream from imutils both load the video from a file path. However, I want to avoid redundant I/O operations and don't want to write the video to a file first.
Related part in my project:
@app.route('/processvideo', methods=['POST'])
def process_video():
    fStr = request.files['file'].read()  # the uploaded file is read as bytes from the request
    npimg = np.frombuffer(fStr, np.uint8)  # bytes -> numpy array (np.fromstring is deprecated)
    # image = cv2.imdecode(npimg, cv2.IMREAD_COLOR)  # doesn't work here: imdecode decodes images, not video
    return jsonify({'output': 'test'})
I'm sending the request from the CLI for testing as follows:
curl -F 'file=@demo.mp4' http://localhost:5000/processvideo
I want to process the incoming video frame by frame, so I need the frames as images. Thanks in advance for any help.
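One way to decode frames without touching disk (not from the original post, just a sketch) is PyAV, whose av.open accepts file-like objects:

import io

import av  # PyAV: pip install av


def frames_from_bytes(video_bytes):
    # Decode a video held entirely in memory, yielding OpenCV-style BGR arrays
    container = av.open(io.BytesIO(video_bytes))
    for frame in container.decode(video=0):
        yield frame.to_ndarray(format='bgr24')  # numpy array, BGR like cv2

# Inside the view: for frame in frames_from_bytes(fStr): ...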
I have small-sized sound files stored in MongoDB as BSON.
The task is to retrieve the binary data from the database, convert it to an appropriate format, and send it back to the front end.
The problem is with the converting. I have found that pydub can be used for this.
My code is as follows:
query_param = json_data['retriever']
query_param1 = query_param.replace('"', '')
data = db.soundData
y = data.find_one({'name': query_param1})
s = y['data']  # here I retrieve the binary data
AudioSegment.from_file(s).export(x, format="mp3")
return send_file(x, 'audio/mp3')
The question is about the AudioSegment line, as it does not follow the standard pattern of
AudioSegment.from_wav("/input/file.wav").export("/output/file.mp3", format="mp3")
and an error of 'bytes' object has no attribute 'read' is still thrown. Is this achievable with pydub?
AudioSegment.from_file() takes a file path or a file-like object as its first argument. Assuming you have the raw bytes of a whole WAV file (including the WAV headers, not just the audio data), then you can:
import io
s = io.BytesIO(y['data'])
AudioSegment.from_file(s).export(x, format='mp3')
If you only have the bytes of the audio samples you would need to know some metadata about your audio data:
AudioSegment(y['data'], sample_width=???, frame_rate=???, channels=???)
- sample_width is the number of bytes in each sample (for 16-bit/CD audio, you'd use 2)
- frame_rate is the number of samples per second (aka sample rate; for CD audio it's 44100)
- channels is how many audio streams there are (stereo is 2, mono is 1, etc.)
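For example, a minimal sketch assuming the stored bytes are raw 16-bit, 44.1 kHz stereo PCM (adjust the three values to your data):

import io

from pydub import AudioSegment

seg = AudioSegment(y['data'], sample_width=2, frame_rate=44100, channels=2)
out = io.BytesIO()
seg.export(out, format='mp3')  # the MP3 bytes are now in out.getvalue()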
I'm experiencing problems with an image not being encoded properly in my custom multipart/form-data POST.
After sending out the HTTP POST packet, I noticed the bytes representing the image are completely different. I did this comparison by capturing data packets for a working scenario (using the web browser) and for my Python app.
There are no issues otherwise with how the multipart form body is constructed; it's just that the image is not encoded properly for the body.
Here's what I did to open the image and prep it to be sent out:
image_data = open('plane.jpg', mode='rb').read()  # image_data is the JPEG as bytes -- the first bytes
body.append(str(image_data))  # converting the bytes to a string so it can be appended to the body list -- bytes to string
body.append(CRLF)
body.append('--' + boundary + '--')
body.append(CRLF)
body = ''.join(body)

# starting the post
unicode_data = body.encode('utf-8', errors='ignore')  # the string, encoded
multipart_header['content-length'] = len(unicode_data)
req = urllib.request.Request('http://localhost/api/image/upload', data=unicode_data, headers=multipart_header)
# The packet is sent here; the image section of unicode_data looks wrong, but the other sections look good.
Image being uploaded: http://tinypic.com/view.php?pic=5aq3w6&s=6
So what is the correct way to encode this image and append it to the body to be sent? I don't want to use any APIs other than the ones that come with Python 3.3, and I would like to stay within urllib and urllib2.
I tried appending the byte version of the image to the body, but apparently a list of strings can only be joined with other strings, which is why I created a new string from the image bytes; I think this is where it goes downhill.
Thanks, help is much appreciated!
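For anyone landing here: a sketch of keeping the whole body as bytes so the image is never pushed through str() or a lossy encode (boundary and the URL are reused from the question):

import urllib.request

CRLF = b'\r\n'
parts = []
parts.append(b'--' + boundary.encode('ascii'))
parts.append(b'Content-Disposition: form-data; name="file"; filename="plane.jpg"')
parts.append(b'Content-Type: image/jpeg')
parts.append(b'')  # blank line separates part headers from part data
with open('plane.jpg', 'rb') as f:
    parts.append(f.read())  # raw JPEG bytes, untouched
parts.append(b'--' + boundary.encode('ascii') + b'--')
parts.append(b'')
payload = CRLF.join(parts)

headers = {
    'Content-Type': 'multipart/form-data; boundary=' + boundary,
    'Content-Length': str(len(payload)),
}
req = urllib.request.Request('http://localhost/api/image/upload', data=payload, headers=headers)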