wave.error: # channels not specified - python

I am trying to edit the length of wav files using the wave module. However, it seems that I can't get anywhere because I keep getting the same error that the number of channels is not specified. I get that error even when I just try to read the number of channels, or when I try to set it, as seen here:
def editLength(wavFile):
    file = wave.open(wavFile, 'w')
    file.setnchannels(file.getnchannels())
    x = file.getnchannels()
    print(x)

from https://docs.python.org/3.7/library/wave.html#wave.open
wave.open(file, mode=None)
If file is a string, open the file by that name, otherwise treat it as a file-like
object.
mode can be:
'rb' Read only mode.
'wb' Write only mode.
Note that it does not allow read/write WAV files.
You attempt to read and write from a WAV file at the same time. Because the file is opened in write mode, the file object has, at the time of the first file.getnchannels() call, not yet had the number of channels specified.
def editLength(wavFile):
    with wave.open(wavFile, "rb") as file:
        x = file.getnchannels()
        print(x)
If you want to edit the file, you should first read from the original file and write to a temporary file, then copy the temporary file over the original file, for example as sketched below.
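A minimal sketch of that read-then-replace workflow, assuming you want to keep a given number of seconds (the function name, argument and temporary-file handling here are my own, not from the question):
import shutil
import tempfile
import wave

def edit_length(wav_file, seconds):
    # Read the original file and its parameters.
    with wave.open(wav_file, "rb") as src:
        params = src.getparams()
        frames = src.readframes(int(seconds * params.framerate))

    # Write the trimmed audio to a temporary file first.
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
        tmp_name = tmp.name
    with wave.open(tmp_name, "wb") as dst:
        dst.setparams(params)  # writeframes() corrects nframes when the file is closed
        dst.writeframes(frames)

    # Replace the original file with the temporary one.
    shutil.move(tmp_name, wav_file)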

It's maybe not super obvious from the docs: https://docs.python.org/3/library/wave.html
The Wave_write object expects you to explicitly set all params for the object.
After a little trial and error I was able to read my wav file and write a specific duration to disk.
For example if I have a 44.1k sample rate wav file with 2 channels...
import wave

with wave.open("some_wavfile.wav", "rb") as handle:
    params = handle.getparams()
    # only read the first 10 seconds of audio
    frames = handle.readframes(441000)
    print(handle.tell())
    print(params)

params = list(params)
params[3] = len(frames)
print(params)

with wave.open("output_wavfile.wav", "wb") as handle:
    handle.setparams(params)
    handle.writeframes(frames)
This should leave you with stdout output looking something like this.
441000
_wave_params(nchannels=2, sampwidth=2, framerate=44100, nframes=10348480, comptype='NONE', compname='not compressed')
[2, 2, 44100, 1764000, 'NONE', 'not compressed']
nframes here is 1764000 because readframes() returns bytes, and with nchannels=2 and sampwidth=2 each frame is 4 bytes, so 1764000/4 = 441000 frames.
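If you want nframes in the params list to be the actual frame count rather than the byte count (an adjustment of my own, not from the answer; writeframes() corrects the header on close either way), you could divide by the frame size:
nchannels, sampwidth = params[0], params[1]
params[3] = len(frames) // (nchannels * sampwidth)  # bytes -> frames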
Oddly enough, setparams was able to accept a list instead of a tuple.
ffprobe shows exactly 10 seconds of audio for the output file and sounds perfect to me.

Related

Get size of compressed file while compressing

I currently try to create a module that writes a *.gz file up to a specific size. I want to use it for a custom log handler to specify the maximum size of a zipped logfile. I already made my way through the gzip documentation and also the zlib documentation.
I could use zlib right away and measure the length of my compressed bytearray, but then I would have to create and write the gzip file header by myself. The zlib documentation itself says: For reading and writing .gz files see the gzip module.
But I do not see any option for getting the size of the compressed file in the gzip module.
The logfile opened via logfile = gzip.open("test.gz", "ab", compresslevel=6) does have a .size attribute, but this is the size of the original (uncompressed) data, not the compressed file.
Also, os.path.getsize("test.gz") is zero until logfile is closed and the data is actually written to disk.
Do you have any idea how I can use the built-in gzip module to close a compressed file once it reached a certain size? Without closing and re-opening it all the time?
Or is this even possible?
Thanks for any help on this!
Update:
It is not true that no data is written to disk until the file is closed; it just takes some time to collect a few kilobytes before the file size changes. This is good enough for me and my use case, so this is solved. Thanks for any input!
My test code for this:
import os
import gzip
import time
data = 'Hello world'
limit = 10000
i = 0
logfile = gzip.open("test.gz", "wb", compresslevel=6)
while i < limit:
    msg = f"{data} {str(i)} \n"
    logfile.write(msg.encode("utf-8"))
    print(os.path.getsize("test.gz"))
    print(logfile.size)
    if i > 1000:
        logfile.flush()
        break
    # time.sleep(0.03)
    i += 1
logfile.close()
print(f"final size of *.gz file: {os.path.getsize('test.gz')}")
print(f"final size of logfile object file: {logfile.size}")
gzip does not actually compress the file until after you close it, so it does not really make sense to ask for the size of the compressed file beforehand. One thing you could do is look at the size of compressed files you obtain on real data from your use case and do a linear regression to get some kind of approximation of the compression ratio.
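As a rough sketch of that estimation idea (the sample file names and the simple averaging here are my own assumptions, not part of the answer), you could compress representative log data in memory with zlib and use the observed ratio to decide how much uncompressed data fits under your limit:
import zlib

def estimate_ratio(sample_paths):
    # Average compressed/uncompressed ratio over representative files.
    ratios = []
    for path in sample_paths:
        with open(path, "rb") as f:
            raw = f.read()
        compressed = zlib.compress(raw, 6)  # same compresslevel as the gzip logfile
        ratios.append(len(compressed) / len(raw))
    return sum(ratios) / len(ratios)

ratio = estimate_ratio(["old_log_1.txt", "old_log_2.txt"])  # hypothetical sample files
max_compressed = 1_000_000                                  # target .gz size in bytes
max_uncompressed = max_compressed / ratio                   # stop writing around this point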

How do I read in wav files in .gz?

I am learning machine learning and data analysis on wav files.
I know if I have wav files directly I can do something like this to read in the data
import librosa
mono, fs = librosa.load('./small_data/time_series_audio.wav', sr = 44100)
Now I'm given a gz-file "music_feature_extraction_test.tar.gz"
I'm not sure what to do now.
I tried:
with gzip.open('music_train.tar.gz', 'rb') as f:
    for files in f:
        mono, fs = librosa.load(files, sr = 44100)
but it gives me:
TypeError: lstat() argument 1 must be encoded string without null bytes, not str
Can anyone help me out?
There are several things going on:
The file you are given is a gzip-compressed tarball. Take a look at the tarfile module; it can read gzip-compressed files directly. You'll get an iterator over its members, each of which is an individual file.
As far as I can see, librosa can't read from an in-memory buffer, so you have to unpack the tar members to temporary files. The tempfile module is your friend here: a NamedTemporaryFile will provide you with a self-deleting file that you can uncompress to and provide to librosa.
You probably want to implement this as a simple generator function that takes the tarfile name as its input, iterates over its members and yields what librosa.load() provides you. That way everything gets cleaned up automatically.
The basic loop would therefore be
Open the tarball using the tarfile-module. For each member
Get a new temporary file using NamedTemporaryFile. Copy the content of the tarball-member to that file. You may want to use shutil.copyfileobj to avoid reading the entire wav-file into memory before writing it to disk.
The NamedTemporaryFile has a name attribute. Pass that to librosa.load().
yield the return value of librosa.load() to the caller. A sketch of this generator follows below.
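A minimal sketch of that generator, assuming the tarball contains .wav members and using the same sample rate as the question (the function name and variables are my own):
import shutil
import tarfile
import tempfile

import librosa

def load_wavs_from_tarball(tar_path, sr=44100):
    # Yield (audio, sample_rate) for every .wav member of a gzip-compressed tarball.
    with tarfile.open(tar_path, "r:gz") as tar:
        for member in tar:
            if not member.isfile() or not member.name.endswith(".wav"):
                continue
            source = tar.extractfile(member)
            # Copy the member into a self-deleting temporary file for librosa.
            # (On Windows you may need delete=False and manual cleanup.)
            with tempfile.NamedTemporaryFile(suffix=".wav") as tmp:
                shutil.copyfileobj(source, tmp)
                tmp.flush()
                yield librosa.load(tmp.name, sr=sr)

for mono, fs in load_wavs_from_tarball('music_train.tar.gz'):
    print(mono.shape, fs)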
You can use PySoundFile to read from the compressed file.
https://pysoundfile.readthedocs.io/en/0.9.0/#virtual-io
import gzip
import soundfile

with gzip.open('music_train.tar.gz', 'rb') as gz_f:
    for file in gz_f:
        mono, fs = soundfile.read(file, samplerate=44100)
Maybe you should also check if you need to resample the data before processing it with librosa:
https://librosa.github.io/librosa/ioformats.html#read-specific-formats
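Since the archive is a tar.gz rather than a plain .gz, one way to actually feed its members to soundfile's virtual I/O (my own combination of tarfile and an in-memory buffer, not shown in the answer above) would be:
import io
import tarfile

import soundfile

with tarfile.open('music_train.tar.gz', 'r:gz') as tar:
    for member in tar:
        if member.isfile() and member.name.endswith('.wav'):
            # Read the member into memory and hand soundfile a file-like object.
            buffer = io.BytesIO(tar.extractfile(member).read())
            mono, fs = soundfile.read(buffer)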

How can I truncate an mp3 audio file by 30%?

I am trying to truncate an audio file by 30%: if the audio file was 4 minutes long, after truncating it, it should be around 72 seconds. I have written the code below to do it, but it only returns a 0 byte file size. Please tell me where I went wrong?
def loadFile():
    with open('music.mp3', 'rb') as in_file:
        data = len(in_file.read())
        with open('output.mp3', 'wb') as out_file:
            ndata = newBytes(data)
            out_file.write(in_file.read()[:ndata])

def newBytes(bytes):
    newLength = (bytes/100) * 30
    return int(newLength)

loadFile()
You are trying to read your file a second time, which will result in no data, because len(in_file.read()) already consumed the whole file. Instead, read the whole file into a variable and then calculate the length of that. The variable can then be used a second time.
def newBytes(bytes):
    return (bytes * 70) // 100

def loadFile():
    with open('music.mp3', 'rb') as in_file:
        data = in_file.read()
    with open('output.mp3', 'wb') as out_file:
        ndata = newBytes(len(data))
        out_file.write(data[:ndata])
Also it is better to multiply first and then divide to avoid having to work with floating point numbers.
You cannot reliably truncate an MP3 file by byte size and expect it to be equivalently truncated in audio time length.
MP3 frames can change bitrate. While your method will sort of work, it won't be all that accurate. Additionally, you'll undoubtedly break frames leaving glitches at the end of the file. You will also lose ID3v1 tags (if you still use them... better to use ID3v2 anyway).
Consider executing FFmpeg with -acodec copy instead. This will simply copy the bytes over while maintaining the integrity of the file, and ensuring a good clean cut where you want it to be.
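For example, a sketch of calling FFmpeg from Python to keep the first 72 seconds without re-encoding (the file names and duration are placeholders, and FFmpeg must be installed and on the PATH):
import subprocess

# -t 72 keeps the first 72 seconds; -acodec copy copies the audio stream
# without re-encoding, so frames are cut cleanly rather than corrupted.
subprocess.run(
    ["ffmpeg", "-i", "music.mp3", "-t", "72", "-acodec", "copy", "output.mp3"],
    check=True,
)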

Reading 32 Bytes of file at a time

Working on reading and interpreting FAT12 directory entries. I am trying to determine how to read 32 bytes of the file at a time. So far I have the following:
f_name = sys.argv[1]  # set file name as the argument to be passed in command line
with open(f_name, mode='rb') as file:
    data = file.read()
    struct.unpack(,data[:])
Most of the things I have seen say to use struct.unpack(). I have looked at the documentation on this and I am having trouble understanding how to use it. Is there an easier way to read 32 bytes at a time until I have read a full 512 bytes?
file.read() takes an optional argument: the maximum number of bytes/characters to read, and it advances the read pointer by that amount:
data = file.read(32)
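So a loop over the first 512 bytes could look like the sketch below. The format string shown is the standard 32-byte FAT12/16 directory entry layout (8.3 name, attribute byte, reserved/timestamp fields, first cluster, file size); double-check it against your specification:
import struct
import sys

ENTRY_FORMAT = "<11s3B7HI"                 # 11 + 3 + 14 + 4 = 32 bytes
assert struct.calcsize(ENTRY_FORMAT) == 32

with open(sys.argv[1], mode='rb') as file:
    for _ in range(512 // 32):             # 16 directory entries per 512-byte sector
        entry = file.read(32)
        if len(entry) < 32:
            break
        fields = struct.unpack(ENTRY_FORMAT, entry)
        name, attributes, size = fields[0], fields[1], fields[-1]
        print(name, attributes, size)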

Stop Python Script from Writing to File after it reaches a certain size in linux

Somewhat new to Python and new to linux. I created a script that mines Twitter's streaming API. Script writes to a .csv file when things in the stream match my parameters.
I'd like to know if there's any way to stop my script once the file has reached 1 gig. I know cron can be used to time the script and everything, but I'm more concerned about the file size than the time it takes.
Thanks for your input and consideration.
In your case you probably don't need os.stat, and os.stat may give you a false size in some cases (namely when buffers are not flushed). Why not just use f.tell() to read the size, with something like this:
import csv

with open('out.txt', 'w', encoding='utf-8') as f:
    csvfile = csv.writer(f)
    maxsize = 1024  # max file size in bytes
    for row in data():  # data() stands in for whatever yields rows from the stream
        csvfile.writerow(row)
        if f.tell() > maxsize:  # f.tell() gives byte offset, no need to worry about multiwide chars
            break
Use python's os.stat() to get info on the file, then check the total number of bytes of the existing file (fileInfo.st_size) plus the size of the data you are about to write.
import os
fileInfo = os.stat('twitter_stream.csv')
fileSize = fileInfo.st_size
print(fileSize)
# Now get data from twitter
# determine number of bytes in data
# write data if file size + data bytes < 1GB
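A sketch of how those comments might be filled in (the stream_rows() generator and the CSV layout are hypothetical, not from the answer):
import csv
import os

MAX_BYTES = 1024 ** 3  # 1 GB

with open('twitter_stream.csv', 'a', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    for row in stream_rows():  # stream_rows() stands in for the Twitter stream matches
        encoded = ','.join(map(str, row)).encode('utf-8')  # rough estimate of bytes this row adds
        if os.stat('twitter_stream.csv').st_size + len(encoded) >= MAX_BYTES:
            break  # stop before the file exceeds 1 GB
        writer.writerow(row)
        f.flush()  # flush so os.stat sees an up-to-date size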
