Unable to read some .wav files using scipy.io.wavfile.read() - Python

I am trying to read .wav files using scipy.io.wavfile. It reads some files properly, but for others it gives the following error:
Warning (from warnings module):
File "D:\project\cardiocare-1.0\src\scipy\io\wavfile.py", line 121
warnings.warn("chunk not understood", WavFileWarning)
WavFileWarning: chunk not understood
Traceback (most recent call last):
File "D:\project\cardiocare-1.0\src\ccare\plot.py", line 37, in plot
input_data = read(p.bitfile)
File "D:\project\cardiocare-1.0\src\scipy\io\wavfile.py", line 119, in read
data = _read_data_chunk(fid, noc, bits)
File "D:\project\cardiocare-1.0\src\scipy\io\wavfile.py", line 56, in _read_data_chunk
data = data.reshape(-1,noc)
ValueError: total size of new array must be unchanged
Can anyone suggest a solution?

I use the code below to read wav files. I know it does not solve your problem directly, but maybe you could read your wav file with this code and figure out what is wrong.
My experience is that wav files sometimes contain "strange" things that must be handled or removed.
Hope it helps you out.
Regards,
Cyrex
import wave
import struct
import numpy as np

def wavRead(fileN):
    waveFile = wave.open(fileN, 'r')
    NbChanels = waveFile.getnchannels()
    data = [[] for _ in range(NbChanels)]
    for i in range(waveFile.getnframes()):
        # one frame holds one 16-bit sample per channel
        waveData = waveFile.readframes(1)
        samples = struct.unpack("<%dh" % NbChanels, waveData)
        for ch in range(NbChanels):
            data[ch].append(samples[ch])
    RetAR = []
    BitDepth = waveFile.getsampwidth() * 8
    for x in range(NbChanels):
        # normalise the integer samples to floats in [-1, 1)
        RetAR.append(np.array(data[x]) / float(2 ** (BitDepth - 1)))
    fs = waveFile.getframerate()
    return RetAR, fs
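A quick usage sketch (the file name here is just a placeholder):
channels, fs = wavRead("problem_file.wav")  # hypothetical path
print("sample rate:", fs)
print("first channel, first 10 samples:", channels[0][:10])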

Related

wave write function not working, what am I doing wrong?

I am trying to halve the existing sampling rate of a folder full of .wav files. This is the only way I have found to do it, but it is not working. The read part works fine up until f.close(); then the write part causes the error.
import wave
import contextlib
import os

for file_name in os.listdir(os.getcwd()):
    if file_name.endswith(".wav"):
        with contextlib.closing(wave.open(file_name, 'rb')) as f:
            rate = f.getframerate()
            new_rate = rate/2
            f.close()
        with contextlib.closing(wave.open(file_name, 'wb')) as f:
            rate = f.setframerate(new_rate)
This is the output when I run it.
Traceback (most recent call last):
File "C:\Users\hsash\OneDrive\Desktop\used AR1-20210513T223533Z-001 - Copy (2)\sounds\python code.py", line 36, in <module>
rate = f.setframerate(new_rate)
File "C:\Users\hsash\AppData\Local\Programs\Python\Python39\lib\contextlib.py", line 303, in __exit__
self.thing.close()
File "C:\Users\hsash\AppData\Local\Programs\Python\Python39\lib\wave.py", line 444, in close
self._ensure_header_written(0)
File "C:\Users\hsash\AppData\Local\Programs\Python\Python39\lib\wave.py", line 462, in _ensure_header_written
raise Error('# channels not specified')
wave.Error: # channels not specified
It says right there: # channels not specified. When you open a wave file for writing, Python sets all of the header fields to zero, irrespective of the current state of the file.
In order to make sure the other fields are preserved, you need to copy them over from the old file when you read it the first time.
In the snippet below I'm using getparams and setparams to copy the header fields over, and I'm using readframes and writeframes to copy the wave data.
import wave
import contextlib
import os

for file_name in os.listdir(os.getcwd()):
    if file_name.endswith(".wav"):
        with contextlib.closing(wave.open(file_name, 'rb')) as f:
            rate = f.getframerate()
            params = f.getparams()
            frames = f.getnframes()
            data = f.readframes(frames)
            new_rate = rate/2
            f.close()
        with contextlib.closing(wave.open(file_name, 'wb')) as f:
            f.setparams(params)
            f.setframerate(new_rate)
            f.writeframes(data)
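One caveat, in case it matters: opening the same file with 'wb' truncates it, so a crash between the read and the write would lose the original. A safer variant of the same idea (the output folder name is my own placeholder) writes the result to a separate file:
import wave
import contextlib
import os

out_dir = "halved"  # hypothetical output folder
os.makedirs(out_dir, exist_ok=True)

for file_name in os.listdir(os.getcwd()):
    if file_name.endswith(".wav"):
        with contextlib.closing(wave.open(file_name, 'rb')) as f:
            params = f.getparams()
            data = f.readframes(f.getnframes())
            new_rate = f.getframerate() / 2
        with contextlib.closing(wave.open(os.path.join(out_dir, file_name), 'wb')) as f:
            f.setparams(params)
            f.setframerate(new_rate)  # override the rate copied in by setparams
            f.writeframes(data)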

Python OSError: [Errno 9] Bad file descriptor after opening big json file

I just tried to read a big JSON file (the Wikidata JSON dump) in Python line by line and got this error:
Traceback (most recent call last):
File "C:/.../test_json_wiki_file.py", line 19, in <module>
test_fct()
File "C:/.../test_json_wiki_file.py", line 12, in test_fct
for line in f:
OSError: [Errno 9] Bad file descriptor
Here is my code:
import json

def test_fct():
    data = []
    i = 0
    with open('E:/.../20200713.json/20200713.json') as f:
        for line in f:
            data.append(json.loads(line))
            i = i + 1
            if i > 1:
                input_file.close()
                return data

test_data = test_fct()
The file size is around 700 GB, and the description of the file (https://www.wikidata.org/wiki/Wikidata:Database_download) states that it can be read line by line. I don't know if this is important, but the E:/ drive is an external one.
Thank you for your help in advance :)
I don't have any firsthand knowledge of opening large files in Python, but did you mean to have the path as 20200713.json/20200713.json? Is the first one actually a directory that has a .json extension? I'd also suggest first trying a smaller sample of the file (opening the whole thing might be hard, so maybe just use the more command in a terminal).
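For what it's worth, here is a minimal sketch for peeking at the first few entities without holding the whole dump in memory. It assumes the documented Wikidata layout: a top-level JSON array with one entity per line, each line ending in a comma:
import json
from itertools import islice

def peek_dump(path, n=3):
    entities = []
    with open(path, encoding='utf-8') as f:
        for line in islice(f, n + 2):  # a couple of extra lines to cover the brackets
            line = line.strip().rstrip(',')
            if line in ('[', ']', ''):
                continue  # skip the array brackets wrapping the dump
            entities.append(json.loads(line))
    return entities

test_data = peek_dump('E:/.../20200713.json/20200713.json')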

Reading TDMS File with python nptdms, cannot open tdms file

I am having issues getting basic functions of the nptdms module working.
First, I am just trying to open a TDMS file and print the contents of specific channels within specific groups.
I am using Python 2.7 and the nptdms quick start here.
Following this, I will be writing these specific pieces of data into a new TDMS file. My ultimate goal is to take a set of source files, open each, and write (append) to a new file. The source data files contain far more information than is needed, so I am breaking the specifics out into their own file.
The problem I have is that I cannot get past a basic error.
When running this code, I get:
Traceback (most recent call last):
File "PullTDMSdataIntoNewFile.py", line 27, in <module>
tdms_file = TdmsFile(r"C:\\Users\daniel.worts\Desktop\this_is_my_tdms_file.tdms","r")
File "C:\Anaconda2\lib\site-packages\nptdms\tdms.py", line 94, in __init__
self._read_segments(f)
File "C:\Anaconda2\lib\site-packages\nptdms\tdms.py", line 119, in _read_segments
object._initialise_data(memmap_dir=self.memmap_dir)
File "C:\Anaconda2\lib\site-packages\nptdms\tdms.py", line 709, in _initialise_data
mode='w+b', prefix="nptdms_", dir=memmap_dir)
File "C:\Anaconda2\lib\tempfile.py", line 475, in NamedTemporaryFile
(fd, name) = _mkstemp_inner(dir, prefix, suffix, flags)
File "C:\Anaconda2\lib\tempfile.py", line 244, in _mkstemp_inner
fd = _os.open(file, flags, 0600)
OSError: [Errno 2] No such file or directory: 'r\\nptdms_yjfyam'
Here is my code:
from nptdms import TdmsFile
import numpy as np
import pandas as pd

# set TDMS file path
tdms_file = TdmsFile(r"C:\\Users\daniel.worts\Desktop\this_is_my_tdms_file.tdms", "r")

# set variables for TDMS groups
group_nameone = '101'
group_nametwo = '752'

# set objects for TDMS channels
channel_dataone = tdms_file.object(group_nameone, 'Payload_1')
channel_datatwo = tdms_file.object(group_nametwo, 'Payload_2')

# get data from channels
data_dataone = channel_dataone.data
data_datatwo = channel_datatwo.data

print data_dataone
print data_datatwo
Big thanks to anyone who may have encountered this before and can help point to what I am missing.
Best,
- Dan
edit:
Solved the read issue by removing the 'r' argument passed after the file path.
Now I am having another error I can't trace when trying to write.
from nptdms import TdmsFile, TdmsWriter, RootObject, GroupObject, ChannelObject
import numpy as np
import pandas as pd

newfilepath = r"C:\\Users\daniel.worts\Desktop\Mined.tdms"

datetimegroup101_channel_object = ChannelObject('101', DateTime, data_datetimegroup101)

with TdmsWriter(newfilepath) as tdms_writer:
    tdms_writer.write_segment([datetimegroup101_channel_object])
Returns error:
Traceback (most recent call last):
File "PullTDMSdataIntoNewFile.py", line 82, in <module>
tdms_writer.write_segment([datetimegroup101_channel_object])
File "C:\Anaconda2\lib\site-packages\nptdms\writer.py", line 68, in write_segment
segment = TdmsSegment(objects)
File "C:\Anaconda2\lib\site-packages\nptdms\writer.py", line 88, in __init__
paths = set(obj.path for obj in objects)
File "C:\Anaconda2\lib\site-packages\nptdms\writer.py", line 88, in <genexpr>
paths = set(obj.path for obj in objects)
File "C:\Anaconda2\lib\site-packages\nptdms\writer.py", line 254, in path
self.channel.replace("'", "''"))
AttributeError: 'TdmsObject' object has no attribute 'replace'
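For the record, that AttributeError is consistent with ChannelObject receiving something other than a string for the channel name: it expects the group and channel names as plain strings, and here DateTime is presumably a TdmsObject (or an undefined name). Assuming the intent was a channel called 'DateTime' in group '101', a sketch of the fix (the data variable is a placeholder):
from nptdms import TdmsWriter, ChannelObject
import numpy as np

data_datetimegroup101 = np.arange(10)  # stand-in for the real channel data

newfilepath = r"C:\\Users\daniel.worts\Desktop\Mined.tdms"

# group and channel names must be strings, not TdmsObjects
datetimegroup101_channel_object = ChannelObject('101', 'DateTime', data_datetimegroup101)

with TdmsWriter(newfilepath) as tdms_writer:
    tdms_writer.write_segment([datetimegroup101_channel_object])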

pydub - MemoryError

I am trying to split large podcast mp3 files into smaller 5-minute chunks using Python and the pydub library. This is my code:
folder = r"C:\temp"
filename = r"p967.mp3"
from pydub import AudioSegment
sound = AudioSegment.from_mp3(folder + "\\" + filename)
This works fine for small files, but for the large podcasts I am interested in (100 MB+), it returns the following error.
Traceback (most recent call last):
File "C:\temp\mp3split.py", line 6, in <module>
sound = AudioSegment.from_mp3(folder + "\\" + filename)
File "C:\Python27\lib\site-packages\pydub\audio_segment.py", line 522, in from_mp3
return cls.from_file(file, 'mp3', parameters)
File "C:\Python27\lib\site-packages\pydub\audio_segment.py", line 511, in from_file
obj = cls._from_safe_wav(output)
File "C:\Python27\lib\site-packages\pydub\audio_segment.py", line 544, in _from_safe_wav
return cls(data=file)
File "C:\Python27\lib\site-packages\pydub\audio_segment.py", line 146, in __init__
data = data if isinstance(data, (basestring, bytes)) else data.read()
MemoryError
Is this a limitation of the library? Should I be using an alternative approach to achieve this?
If I add the following code to check the memory status at the point of running:
import psutil
print psutil.virtual_memory()
This prints:
svmem(total=8476975104L, available=5342715904L, percent=37.0, used=3134259200L, free=5342715904L)
This suggests to me that there is plenty of memory at the start of the operation, though I am happy to be proven wrong.
Yes, the most likely cause is that you've simply run out of available memory. Do you know how much memory you have available just before you execute that statement? Consider checking it right before the failing statement.
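For example, a minimal sketch using psutil (which you already import) to snapshot memory right before and after the failing call:
import psutil
from pydub import AudioSegment

folder = r"C:\temp"
filename = r"p967.mp3"

def log_memory(tag):
    # print available virtual memory in MB at this point in the run
    mem = psutil.virtual_memory()
    print("%s: %.0f MB available" % (tag, mem.available / 1024.0 / 1024.0))

log_memory("before from_mp3")
sound = AudioSegment.from_mp3(folder + "\\" + filename)
log_memory("after from_mp3")
Also worth checking: pydub decodes the mp3 to raw PCM in memory, so a 100 MB mp3 can expand to a gigabyte or more, and on a 32-bit Python build that can hit the per-process address-space limit even while the machine reports plenty of free RAM.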

Use codecs to read file with correct encoding: TypeError

I need to read from a file line by line, and I also need to make sure the encoding is handled correctly.
I wrote the following code:
#!/bin/bash
import codecs

filename = "something.x10"
f = open(filename, 'r')
fEncoded = codecs.getreader("ISO-8859-15")(f)

totalLength = 0
for line in fEncoded:
    totalLength += len(line)
print("Total Length is "+totalLength)
This code does not work on all files, on some files I get a
Traceback (most recent call last):
File "test.py", line 11, in <module>
for line in fEncoded:
File "/usr/lib/python3.2/codecs.py", line 623, in __next__
line = self.readline()
File "/usr/lib/python3.2/codecs.py", line 536, in readline
data = self.read(readsize, firstline=True)
File "/usr/lib/python3.2/codecs.py", line 480, in read
data = self.bytebuffer + newdata
TypeError: can't concat bytes to str
I'm using Python 3.3 and the script must work with this Python version.
What am I doing wrong? I was not able to figure out which files work and which do not; even some plain ASCII files fail.
You are opening the file in text mode. If you read from it, you get a string decoded according to your default encoding (http://docs.python.org/3/library/functions.html?highlight=open%20builtin#open), while codecs' StreamReader needs a byte stream (http://docs.python.org/3/library/codecs#codecs.StreamReader).
So this should work:
import codecs

filename = "something.x10"
f = open(filename, 'rb')
f_decoded = codecs.getreader("ISO-8859-15")(f)

total_length = 0
for line in f_decoded:
    total_length += len(line)
print("Total Length is " + str(total_length))
or you can use the encoding parameter on open:
f_decoded = open(filename, mode='r', encoding='ISO-8859-15')
The reader returns decoded data, so I fixed your variable name. Also, consider PEP 8 as a guide for formatting and coding style.
