Got an error during UBM speaker-adaptation with sidekit - python

I've already trained a UBM model, and while implementing speaker adaptation I got the following error:
Exception: show enroll/something.wav is not in the HDF5 file
I have two folders, "enroll" and "test", under the folder "feat"; they contain the features (.h5) for training and testing respectively. My enroll_idmap is generated from the training audio (.wav) only, and my wav files and feature files are stored separately. I think the problem is in my idmap: "enroll/something.wav" is a rightid of my enroll_idmap, but what does "HDF5 file" refer to?
Could anyone tell me what this error means and how to fix it?
Here's the code of my enroll_idmap
def __init__(self):
    BASE_DIR = "./Database/sidekit_data"
    self.AUDIO_DIR = os.path.join(BASE_DIR, "audio")
    self.FEATURE_DIR = os.path.join(BASE_DIR, "feat")
    self.TASK_DIR = os.path.join(BASE_DIR, "task")

def create_idMap(self, group):
    # Make enrollment (IdMap) file list
    group_dir = os.path.join(self.AUDIO_DIR, group) # enrollment data directory
    group_files = os.listdir(group_dir)
    group_models = [files.split('_')[0] for files in group_files] # list of model IDs
    group_segments = [group+"/"+f for f in group_files]
    # Generate IdMap
    group_idmap = sidekit.IdMap()
    group_idmap.leftids = np.asarray(group_models)
    group_idmap.rightids = np.asarray(group_segments)
    group_idmap.start = np.empty(group_idmap.rightids.shape, '|O')
    group_idmap.stop = np.empty(group_idmap.rightids.shape, '|O')
    if group_idmap.validate():
        group_idmap.write(os.path.join(self.TASK_DIR, group+'_idmap.h5'))
    else:
        raise RuntimeError('Problems with creating idMap file')
After that I generated enroll_idmap and test_idmap with:
create_idMap("enroll")
create_idMap("test")
And here's the speaker-adaptation code; the error above is raised during the execution of enroll_stat.accumulate_stat(…):
BASE_DIR = "./Database/sidekit_data"
enroll_idmap = sidekit.IdMap.read(os.path.join(BASE_DIR, "task", "enroll_idmap.h5"))
ubm = sidekit.Mixture()
model_name = "ubm_{}.h5".format(NUM_GUASSIANS)
ubm.read(os.path.join(BASE_DIR, "ubm", model_name))
server_eval = sidekit.FeaturesServer(feature_filename_structure="./Database/sidekit_data/feat/{}.h5",
                                     ...
                                     ...)
print("Compute the sufficient statistics")
enroll_stat.accumulate_stat(ubm=ubm,
                            feature_server=server_eval,
                            seg_indices=range(enroll_stat.segset.shape[0]),
                            num_thread=nbThread
                            )
This doesn't seem like a big problem, but it has had me stuck for a few days. Please help.

I finally fixed this problem by changing the path of the training and test features so that it points outside "BASE_DIR":
server_eval = sidekit.FeaturesServer(feature_filename_structure="./enroll/{}.h5",
                                     ...)
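To see why the location of the feature files matters, it can help to print the path that each rightid expands to. The sketch below uses plain string formatting only. It assumes (this is not verified against the sidekit source here) that FeaturesServer substitutes the show name, i.e. the rightid from the IdMap, into the {} placeholder of feature_filename_structure and then looks for that show inside the resulting HDF5 file. show_id is a hypothetical helper, not part of sidekit.

```python
import os

# Template from the question; "{}" is assumed to be filled with the show (rightid).
feature_filename_structure = "./Database/sidekit_data/feat/{}.h5"


def show_id(group, wav_name):
    """Hypothetical helper: build a rightid without the audio extension."""
    return group + "/" + os.path.splitext(wav_name)[0]


# Rightid as built in create_idMap (keeps the ".wav" suffix)
with_ext = "enroll/something.wav"
# Rightid with the extension stripped
without_ext = show_id("enroll", "something.wav")

path_with_ext = feature_filename_structure.format(with_ext)
path_without_ext = feature_filename_structure.format(without_ext)

print(path_with_ext)
print(path_without_ext)
```

If the feature files were written under show names without the "enroll/" prefix or the ".wav" suffix, the rightids in the IdMap would presumably have to be built the same way for the lookup to succeed.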


AzureML Python SDK OutputFileDatasetConfig and Datastore

I am new to Azure, and dealing with all these paths is proving extremely challenging. I am trying to create a pipeline that contains a dataprep.py step and an AutoML step. What I want to do (after passing the input to the dataprep step and performing several operations on it) is save the resulting TabularDataset in the datastore and expose it as an output that I can then reuse in my train step.
My dataprep.py file
# ----- dataprep stuff and imports -----
parser = argparse.ArgumentParser()
parser.add_argument("--month_train", required=True)
parser.add_argument("--year_train", required=True)
parser.add_argument('--output_path', dest = 'output_path', required=True)
args = parser.parse_args()
run = Run.get_context()
ws = run.experiment.workspace
datastore = ws.get_default_datastore()
name_dataset_input = 'Customer_data_'+str(args.year_train)
name_dataset_output = 'DATA_PREP_'+str(args.year_train)+'_'+str(args.month_train)
# get the input dataset by name
ds = Dataset.get_by_name(ws, name_dataset_input)
df = ds.to_pandas_dataframe()
# apply is one of my dataprep functions that i defined earlier
df = apply(df, args.month_train)
# this is where i am having issues, I want to save this in the datastore but also have it as output
ds = Dataset.Tabular.register_pandas_dataframe(df, args.output_path ,name_dataset_output)
The pipeline block instructions.
from azureml.data import OutputFileDatasetConfig
from azureml.pipeline.steps import PythonScriptStep
prepped_data_path = OutputFileDatasetConfig(name="output_path", destination = (datastore, 'managed-dataset/{run-id}/{output-name}'))
dataprep_step = PythonScriptStep(
    name="dataprep",
    script_name="dataprep.py",
    compute_target=compute_target,
    runconfig=aml_run_config,
    arguments=["--output_path", prepped_data_path, "--month_train", month_train, "--year_train", year_train],
    allow_reuse=True
)
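On the script side, argparse derives each attribute name from the option string, so a value passed as --month_train is read back as args.month_train. A minimal sketch of how the arguments listed in the step arrive in dataprep.py; the argument values below are hypothetical stand-ins for what the pipeline would pass:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--month_train", required=True)
parser.add_argument("--year_train", required=True)
parser.add_argument("--output_path", dest="output_path", required=True)

# Hypothetical values standing in for what the pipeline step passes
args = parser.parse_args(
    ["--output_path", "/tmp/prepped", "--month_train", "3", "--year_train", "2021"]
)

# Attribute names follow the option names, so only these three exist on args
name_dataset_output = "DATA_PREP_" + str(args.year_train) + "_" + str(args.month_train)
has_mois_train = hasattr(args, "mois_train")

print(name_dataset_output, args.output_path, has_mois_train)
```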

Error when using a custom dataset with fastai

I am getting an error when trying to use my custom fastai dataset
The error:
Exception: Can't infer the type of your targets.
It's either because your data source is empty or because your labeling function raised an error.
The code:
from fastai import *
from fastai.vision import *
class URL:
    MURDERHORNETS = f"https://superdata.quinniboi10.repl.co/MurderHornetImages"
path = untar_data(URL.MURDERHORNETS)
'''
path = untar_data(URLs.PETS)
files = get_image_files(path)
import PIL
img = PIL.Image.open(files[0])
img
'''
fnames = get_image_files(path)
fnames[:5]
np.random.seed(2)
pat = r'/([^/]+)_\d+\.(png|jpg|jpeg)$'
data = ImageDataBunch.from_folder(path, train=path, test=None, valid_pct=0.2,
                                  ds_tfms=get_transforms(),
                                  size=160)
data.normalize(imagenet_stats)
data.show_batch(rows=3, figsize=(7,6))
print(data.classes)
len(data.classes), data.c
learn = cnn_learner(data, models.resnet50, metrics=error_rate)
learn.fit_one_cycle(5)
learn.save('stage-1')
The dataset is here, don't comment on the name, I don't know why that is what I chose :/
Get the zip file of the dataset here
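Note that pat is defined in the code above but never used. As a quick check of what that pattern would extract as a label, here is a minimal sketch using only the re module; the file names are hypothetical and label_from_name is an illustrative helper, not a fastai function:

```python
import re

# Pattern copied from the question
pat = r'/([^/]+)_\d+\.(png|jpg|jpeg)$'


def label_from_name(file_path):
    """Illustrative helper: return the captured label, or None if nothing matches."""
    match = re.search(pat, file_path)
    return match.group(1) if match else None


# Hypothetical file names
examples = [
    "/images/hornet_12.jpg",         # label followed by _<number>
    "/images/murder_hornet_3.png",   # underscores inside the label are kept
    "/images/hornet.jpg",            # no _<number> suffix, so no label
]
labels = [label_from_name(name) for name in examples]
print(labels)
```

If most file names come back as None with a check like this, a labelling step based on the pattern would have nothing to work with, which is one of the two causes the error message mentions.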

Invalid Syntax while importing dataset

I have a script that trains on video files, and a dataset.py file that loads the data for it.
Whenever I try to run the script, it throws SyntaxError: invalid syntax, even though I have
1) set the path correctly
2) gave the whole path
My folder structure is /home/Videos/project_folder_name/data/train
and my script is
class Dataset:
def __init__(self,
folder:/data/train,
resize:(int, int),
batch_size:int,
timesteps:int,
windowsteps:int,
shift:int,
train:bool):
self.folder = folder
self.resize = resize
self.batch_size = batch_size
self.timesteps = timesteps
self.train = train
self.images = sorted(os.listdir(folder + 'images/'))
self.labels = open(folder + 'labels.txt').readlines()
self.data = self._sliding_window(self.images, shift, windowsteps)
This error has been troubling me a lot.
I put your code into Sublime Text 3 (I highly recommend using a text editor that helps you find syntax errors like this).
Most of your errors boil down to lines that are not properly indented. I was able to complete my build once it was formatted like this:
class Dataset:
    def __init__(self,
                 folder:data/train,
                 resize:(int, int),
                 batch_size:int,
                 timesteps:int,
                 windowsteps:int,
                 shift:int,
                 train:bool):
        self.folder = folder
        self.resize = resize
        self.batch_size = batch_size
        self.timesteps = timesteps
        self.train = train
        self.images = sorted(os.listdir(folder + 'images/'))
        self.labels = open(folder + 'labels.txt').readlines()
        self.data = self._sliding_window(self.images, shift, windowsteps)
EDIT:
I know formatting on the website can be odd so if this code didn't solve your problem let me know and I'll delete or edit.
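A note on the folder parameter: in a Python signature, the text after the colon is a type annotation, so a path such as /data/train cannot go there, and the leading slash alone is enough to trigger SyntaxError: invalid syntax. Below is a minimal sketch of the likely intent, assuming the path is meant to be passed as a value (or default) and annotated as str. The trimmed parameter list and the temporary directory are for illustration only.

```python
import os
import tempfile

# A path used as an annotation is rejected by the parser itself.
try:
    compile("def f(folder:/data/train): pass", "<example>", "exec")
    path_annotation_is_valid = True
except SyntaxError:
    path_annotation_is_valid = False


class Dataset:
    # Annotations name types; the actual path is supplied as a value or default.
    def __init__(self, folder: str = "data/train/", batch_size: int = 1, train: bool = True):
        self.folder = folder
        self.batch_size = batch_size
        self.train = train
        # os.path.join avoids depending on a trailing slash in `folder`
        self.images = sorted(os.listdir(os.path.join(folder, "images")))


# Usage with a throwaway directory standing in for data/train
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "images"))
    for name in ("b.png", "a.png"):
        open(os.path.join(tmp, "images", name), "w").close()
    ds = Dataset(folder=tmp, batch_size=4)
    image_names = ds.images

print(path_annotation_is_valid, image_names)
```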

NotADirectoryError: [WinError 267] The directory name is invalid: 'C:\\Users\\username\\MYD06_L2.A2008001.0000.006.2013341193524.hdf'

I am using Windows 10 and running the code in Jupyter Notebook (in Chrome).
This is my code:
if __name__ == '__main__':
    import itertools
    MOD03_path = r"C:\Users\saviosebastian\MYD03.A2008001.0000.006.2012066122450.hdf"
    MOD06_path = r"C:\Users\saviosebastian\MYD06_L2.A2008001.0000.006.2013341193524.hdf"
    satellite = 'Aqua'
    yr = [2008]
    mn = [1] #np.arange(1,13)
    dy = [1]
    # latitude and longitude boundaries of level-3 grid
    lat_bnd = np.arange(-90,91,1)
    lon_bnd = np.arange(-180,180,1)
    nlat = 180
    nlon = 360
    TOT_pix = np.zeros(nlat*nlon)
    CLD_pix = np.zeros(nlat*nlon)
    ### To use Spark in Python
    spark = SparkSession\
        .builder\
        .appName("Aggregation")\
        .getOrCreate()
    filenames0=['']*500
    i=0
    for y,m,d in itertools.product(yr,mn,dy):
        #-------------find the MODIS products--------------#
        date = datetime.datetime(y,m,d)
        JD01, JD02 = gcal2jd(y,1,1)
        JD1, JD2 = gcal2jd(y,m,d)
        JD = int((JD2+JD1)-(JD01+JD02) + 1)
        granule_time = datetime.datetime(y,m,d,0,0)
        while granule_time <= datetime.datetime(y,m,d,23,55): # 23,55
            print('granule time:',granule_time)
            MOD03_fp = 'MYD03.A{:04d}{:03d}.{:02d}{:02d}.006.?????????????.hdf'.format(y,JD,granule_time.hour,granule_time.minute)
            MOD06_fp = 'MYD06_L2.A{:04d}{:03d}.{:02d}{:02d}.006.?????????????.hdf'.format(y,JD,granule_time.hour,granule_time.minute)
            MOD03_fn, MOD06_fn =[],[]
            for MOD06_flist in os.listdir(MOD06_path):
                if fnmatch.fnmatch(MOD06_flist, MOD06_fp):
                    MOD06_fn = MOD06_flist
            for MOD03_flist in os.listdir(MOD03_path):
                if fnmatch.fnmatch(MOD03_flist, MOD03_fp):
                    MOD03_fn = MOD03_flist
            if MOD03_fn and MOD06_fn: # if both MOD06 and MOD03 products are in the directory
I am getting the following error:
Do you know any solution to this problem?
I can't give you a specific answer without knowledge of the directory system on your computer, but for now it's obvious that there is something wrong with the name of the directory that you are referencing. Use File Explorer to make sure that the directory actually exists, and also make sure that you haven't misspelled the name of the file, which could easily happen given the filename.
You are passing the full path including the file name. In Python, os.listdir(path) returns the names of the files and directories inside the given directory, so it expects a directory, not a file. If no directory is specified, it lists the current working directory.
You can just write "C:/Users/saviosebastian" in path.
Same goes for os.chdir("C:/Users/saviosebastian").
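A minimal sketch of that fix, using a temporary directory in place of C:/Users/saviosebastian. find_granule is a hypothetical helper, and the empty .hdf file only stands in for a real granule: the directory part of the path is listed, and the file names are matched against the pattern.

```python
import fnmatch
import os
import tempfile


def find_granule(file_or_dir, pattern):
    """Hypothetical helper: list the directory and return the first matching name."""
    directory = file_or_dir if os.path.isdir(file_or_dir) else os.path.dirname(file_or_dir)
    for name in sorted(os.listdir(directory)):
        if fnmatch.fnmatch(name, pattern):
            return name
    return None


with tempfile.TemporaryDirectory() as tmp:
    hdf_name = "MYD06_L2.A2008001.0000.006.2013341193524.hdf"
    hdf_path = os.path.join(tmp, hdf_name)
    open(hdf_path, "w").close()  # empty stand-in for a real granule

    # Same pattern shape as in the question: 13 wildcard characters for the timestamp
    pattern = "MYD06_L2.A{:04d}{:03d}.{:02d}{:02d}.006.{}.hdf".format(2008, 1, 0, 0, "?" * 13)

    # Listing the file itself reproduces the error from the question
    try:
        os.listdir(hdf_path)
        listdir_on_file_failed = False
    except NotADirectoryError:
        listdir_on_file_failed = True

    # Listing its parent directory works
    found = find_granule(hdf_path, pattern)

print(listdir_on_file_failed, found)
```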

how to import my function to python file and get input from it?

I have a function called analyze() which looks like the following:
def analyze():
    for stmt in irsb.statements:
        if isinstance(stmt, pyvex.IRStmt.WrTmp):
            wrtmp(stmt)
        if isinstance(stmt, pyvex.IRStmt.Store):
            address = stmt.addr
            address1 = '{}'.format(address)[1:]
            print address1
            data = stmt.data
            data1 = '{}'.format(data)[1:]
            tmp3 = store64(address1, int64(data1))
        if isinstance(stmt, pyvex.IRStmt.Put):
            expr = stmt.expressions[0]
            putoffset = stmt.offset
            data = stmt.data
            data4 = '{}'.format(data)[1:]
            if (str(data).startswith("0x")):
                #const_1 = ir.Constant(int64, data4)
                tmp = put64(putoffset, ZERO_TAG)
            else:
                put64(putoffset, int64(data4))
            if isinstance(stmt.data, pyvex.IRExpr.Const):
                reg_name = irsb.arch.translate_register_name(stmt.offset, stmt.data.result_size(stmt.data.tag))
                print reg_name
        stmt.pp()
This function gets the following input and tries to analyze it:
CODE = b"\xc1\xe0\x05"
irsb = pyvex.block.IRSB(CODE, 0x80482f0, archinfo.ArchAMD64())
When this input is in the same file as my code (let's call the whole thing analyze.py), it works, and python analyze.py gives me output. However, I want to make a separate file (call it array.py), call analyze from there, put the inputs inside it as well, and run python array.py to get the same result. I did the following for array.py:
from analyze import analyze
CODE = b"\xc1\xe0\x05"
irsb = pyvex.block.IRSB(CODE, 0x80482f0, archinfo.ArchAMD64())
analyze()
However, when I run array.py, it stops with this error:
NameError: name 'CODE' is not defined
How can I resolve this problem?
A simple change in your function, add parameters:
def analyze(irsb): # irsb here is called a parameter
    ...
    # The rest is the same
And then pass arguments when calling it:
from analyze import analyze
CODE = b"\xc1\xe0\x05"
irsb_as_arg = pyvex.block.IRSB(CODE, 0x80482f0, archinfo.ArchAMD64())
analyze(irsb_as_arg) # irsb_as_arg is an argument
I renamed irsb to irsb_as_arg here only to draw attention to the difference; it can keep the same name.
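The reason behind this: a function looks up global names in the module where it was defined, not in the module that calls it. A minimal sketch without pyvex, where an in-memory module stands in for analyze.py (its contents and the names analyze_global, analyze_param and statements are hypothetical):

```python
import types

# Build a stand-in for analyze.py as an in-memory module.
analyze_module = types.ModuleType("analyze")
exec(
    "def analyze_global():\n"
    "    # looks up 'statements' in the analyze module's own globals\n"
    "    return len(statements)\n"
    "\n"
    "def analyze_param(statements):\n"
    "    # uses the value handed over by the caller\n"
    "    return len(statements)\n",
    analyze_module.__dict__,
)

# Defined in the calling module only, like CODE and irsb in array.py
statements = ["WrTmp", "Store", "Put"]

try:
    analyze_module.analyze_global()
    global_lookup_failed = False
except NameError:
    global_lookup_failed = True

param_result = analyze_module.analyze_param(statements)
print(global_lookup_failed, param_result)
```

Passing the value as an argument removes the dependency on which module happens to define the name.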
