How to get the mfcc of song(.wav) using python?

I have been trying to compute the MFCCs of every file in my dataset.
I want to create a preprocessing function that takes source_path as input and returns a dictionary mydict with two keys, labels and mfcc.
I have tried the following function:
import os
import librosa
def preprocess_data(source_path):
    mydict = {
        "labels": [],
        "mfcc": []
    }
    music = ['reggae', 'jazz', 'country', 'hiphop', 'rock', 'metal', 'classical', 'disco', 'blues', 'pop']
    # Use the argument rather than a hard-coded path such as 'Data/genres_original/'
    for i, n in enumerate(music):
        new_path = os.path.join(source_path, n)
        for p in os.listdir(new_path):
            final_path = os.path.join(new_path, p)
            melody, sr = librosa.load(final_path)
            # Pass the audio as keyword argument y (required by recent librosa versions)
            mfcc = librosa.feature.mfcc(y=melody, sr=sr, n_mfcc=13)
            mydict["labels"].append(i)
            mydict["mfcc"].append(mfcc.tolist())
    return mydict
But this does not work. Instead, it shows the warning "PySoundFile failed. Trying audioread instead." and then raises an error.
To resolve it I also installed ffmpeg, but that did not help.
Note: I am using Google Colab and would like code that runs there.
here is the link
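One way to find out which file triggers the error is to wrap the per-file work in a try/except and collect the failures instead of aborting the whole run. The following is only a minimal sketch: collect_features and its extract_fn parameter are hypothetical names that do not appear in the question, and the feature extraction itself is passed in as a callable so the loop can be tried without any audio library.

```python
import os

def collect_features(source_path, genres, extract_fn):
    """Apply extract_fn to every file in source_path/<genre>/.

    extract_fn is a hypothetical callable (file path -> feature list).
    Files for which it raises are skipped and reported, not fatal.
    """
    data = {"labels": [], "mfcc": []}
    skipped = []
    for label, genre in enumerate(genres):
        genre_dir = os.path.join(source_path, genre)
        for file_name in sorted(os.listdir(genre_dir)):
            file_path = os.path.join(genre_dir, file_name)
            try:
                features = extract_fn(file_path)
            except Exception as exc:  # e.g. a file that cannot be decoded
                skipped.append((file_path, repr(exc)))
                continue
            data["labels"].append(label)
            data["mfcc"].append(features)
    return data, skipped
```

With librosa installed, extract_fn could load the file with librosa.load and return librosa.feature.mfcc(y=..., sr=..., n_mfcc=13).tolist(); the skipped list then shows which files, if any, could not be read.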

Related

How can i optimize my Python loop for speed

I wrote some code that uses OCR to extract text from screenshots of follower lists and then transfer them into a data frame.
The reason I have to go through the hassle with "name" / "display name" and removing blank lines is that the initial text extraction looks something like this:
Screenname 1
name 1
Screenname 2
name 2
(and so on)
So I know in which order each extraction will be.
My code works well for 1-30 images, but if I take more than that it gets a bit slow. My goal is to run around 5-10k screenshots through it at once. I'm pretty new to programming, so any ideas or tips on how to optimize the speed would be much appreciated! Thank you all in advance :)
from PIL import Image
from pytesseract import pytesseract
import os
import pandas as pd
from itertools import chain
list_final = [""]
list_name = [""]
liste_anzeigename = [""]
list_raw = [""]
anzeigename = [""]
name = [""]
sort = [""]
f = r'/Users/PycharmProjects/pythonProject/images'
myconfig = r"--psm 4 --oem 3"
# Crop every screenshot in place to the region that contains the names
for file in os.listdir(f):
    f_img = f+"/"+file
    img = Image.open(f_img)
    img = img.crop((240, 400, 800, 2400))
    img.save(f_img)
for file in os.listdir(f):
    f_img = f + "/" + file
    # Image is imported directly from PIL above; PIL.Image would raise a NameError
    test = pytesseract.image_to_string(Image.open(f_img), config=myconfig)
    lines = test.split("\n")
    list_raw = [line for line in lines if line.strip() != ""]
    sort.append(list_raw)
    name = {list_raw[0], list_raw[2], list_raw[4],
            list_raw[6], list_raw[8], list_raw[10],
            list_raw[12], list_raw[14], list_raw[16]}
    list_name.append(name)
    anzeigename = {list_raw[1], list_raw[3], list_raw[5],
                   list_raw[7], list_raw[9], list_raw[11],
                   list_raw[13], list_raw[15], list_raw[17]}
    liste_anzeigename.append(anzeigename)
reihenfolge_name = list(chain.from_iterable(list_name))
index_anzeigename = list(chain.from_iterable(liste_anzeigename))
sortieren = list(chain.from_iterable(sort))
print(list_raw)
sort_name = sorted(reihenfolge_name, key=sortieren.index)
sort_anzeigename = sorted(index_anzeigename, key=sortieren.index)
final = pd.DataFrame(zip(sort_name, sort_anzeigename), columns=['name', 'anzeigename'])
print(final)

Use a multiprocessing.Pool.
Combine the code under the for-loops and put it into a function process_file.
This function should accept a single argument: the name of a file to process.
Next, using listdir, create a list of files to process.
Then create a Pool and use its map method to process the list:
import multiprocessing as mp
import os
def process_file(name):
    # your code goes here.
    return anzeigename  # Or whatever the result should be.
if __name__ == "__main__":
    f = r'/Users/PycharmProjects/pythonProject/images'
    p = mp.Pool()
    liste_anzeigename = p.map(process_file, os.listdir(f))
This will run your code in parallel on as many cores as your CPU has.
For an N-core CPU this will take, at best, approximately 1/N of the time it takes without multiprocessing.
Note that the return value of the worker function should be picklable; it has to be returned from the worker process to the parent process.
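Inside such a worker, the alternating screen name / display name lines can be separated by slicing, which keeps the original order and avoids the sets and the sorted(..., key=sortieren.index) step (each index lookup scans the whole list, so that sort gets slow as the list grows). This is only a sketch: split_names is a hypothetical helper, and it assumes the non-blank lines strictly alternate as described in the question.

```python
def split_names(ocr_text):
    """Split OCR output into (screen names, display names).

    Assumes non-blank lines strictly alternate:
    screen name, display name, screen name, display name, ...
    """
    lines = [line.strip() for line in ocr_text.split("\n") if line.strip()]
    if len(lines) % 2:  # drop an unpaired trailing line
        lines = lines[:-1]
    return lines[0::2], lines[1::2]
```

Both returned values are plain lists of strings, so they can be sent back from a worker process without any extra work.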

Error when using a custom dataset with fastai

I am getting an error when trying to use my custom fastai dataset
The error:
Exception: Can't infer the type of your targets.
It's either because your data source is empty or because your labeling function raised an error.
The code:
from fastai import *
from fastai.vision import *
class URL:
    MURDERHORNETS = "https://superdata.quinniboi10.repl.co/MurderHornetImages"
path = untar_data(URL.MURDERHORNETS)
'''
path = untar_data(URLs.PETS)
files = get_image_files(path)
import PIL
img = PIL.Image.open(files[0])
img
'''
fnames = get_image_files(path)
fnames[:5]
np.random.seed(2)
pat = r'/([^/]+)_\d+\.(png|jpg|jpeg)$'
data = ImageDataBunch.from_folder(path, train=path, test=None, valid_pct=0.2,
                                  ds_tfms=get_transforms(),
                                  size=160)
data.normalize(imagenet_stats)
data.show_batch(rows=3, figsize=(7,6))
print(data.classes)
len(data.classes), data.c
learn = cnn_learner(data, models.resnet50, metrics=error_rate)
learn.fit_one_cycle(5)
learn.save('stage-1')
The dataset is here, don't comment on the name, I don't know why that is what I chose :/
Get the zip file of the dataset here

How can I save the headers and values in Html <script> as a table in the csv file?

I'm new to writing code. Using Selenium and BeautifulSoup, I managed to reach the script I want among dozens of scripts on the web page. I am looking for script [17]. When this code is executed, script [17] gives the following result.
The last part of my code:
html=driver.page_source
soup=BeautifulSoup(html, "html.parser")
scripts=soup.find_all("script")
x=scripts[17]
print(x)
Result (output):
Note: the list of dates is at the beginning of script [17]; scroll the bar to see the rest.
Dummy data:
<script language="JavaScript"> var theHlp='/yardim/matris.asp';var theTitle = 'Piyasa Değeri';var theCaption='Cam (TL)';var lastmod = '';var h='<a class=hisselink href=../Hisse/HisseAnaliz.aspx?HNO=';var e='<a class=hisselink href=../endeks/endeksAnaliz.aspx?HNO=';var d='<center><font face=symbol size=1 color=#FF0000><b>ß</b></font></center>';var u='<center><font face=symbol size=1 color=#008000><b>İ</b></font></center>';var n='<center><font face=symbol size=1 color=#00A000><b>=</b></font></center>';var fr='<font color=#FF0000>';var fg='<font color=#008000>';var theFooter=new Array();var theCols = new Array();theCols[0] = new Array('Hisse',4,50);theCols[1] = new Array('2012.12',1,60);theCols[2] = new Array('2013.03',1,60);theCols[3] = new Array('2013.06',1,60);theCols[4] = new Array('2013.09',1,60);theCols[5] = new Array('2013.12',1,60);theCols[6] = new Array('2014.03',1,60);theCols[7] = new Array('2014.06',1,60);theCols[8] = new Array('2014.09',1,60);theCols[9] = new Array('2014.12',1,60);theCols[10] = new Array('2015.03',1,60);theCols[11] = new Array('2015.06',1,60);theCols[12] = new Array('2015.09',1,60);theCols[13] = new Array('2015.12',1,60);theCols[14] = new Array('2016.03',1,60);theCols[15] = new Array('2016.06',1,60);theCols[16] = new Array('2016.09',1,60);theCols[17] = new Array('2016.12',1,60);theCols[18] = new Array('2017.03',1,60);theCols[19] = new Array('2017.06',1,60);theCols[20] = new Array('2017.09',1,60);theCols[21] = new Array('2017.12',1,60);theCols[22] = new Array('2018.03',1,60);theCols[23] = new Array('2018.06',1,60);theCols[24] = new Array('2018.09',1,60);theCols[25] = new Array('2018.12',1,60);theCols[26] = new Array('2019.03',1,60);theCols[27] = new Array('2019.06',1,60);theCols[28] = new Array('2019.09',1,60);theCols[29] = new Array('2019.12',1,60);theCols[30] = new Array('2020.03',1,60);var theRows = new Array();
theRows[0] = new Array ('<b>'+h+'30>ANA</B></a>','1,114,919,783.60','1,142,792,778.19','1,091,028,645.38','991,850,000.48','796,800,000.38','697,200,000.34','751,150,000.36','723,720,000.33','888,000,000.40','790,320,000.36','883,560,000.40','927,960,000.42','737,040,000.33','879,120,000.40','914,640,000.41','927,960,000.42','1,172,160,000.53','1,416,360,000.64','1,589,520,000.72','1,552,500,000.41','1,972,500,000.53','2,520,000,000.67','2,160,000,000.58','2,475,000,000.66','2,010,000,000.54','2,250,000,000.60','2,077,500,000.55','2,332,500,000.62','3,270,000,000.87','2,347,500,000.63');
theRows[1] = new Array ('<b>'+h+'89>DEN</B></a>','55,200,000.00','55,920,000.00','45,960,000.00','42,600,000.00','35,760,000.00','39,600,000.00','40,200,000.00','47,700,000.00','50,460,000.00','45,300,000.00','41,760,000.00','59,340,000.00','66,600,000.00','97,020,000.00','81,060,000.00','69,300,000.00','79,800,000.00','68,400,000.00','66,900,000.00','66,960,000.00','71,220,000.00','71,520,000.00','71,880,000.00','60,600,000.00','69,120,000.00','62,640,000.00','57,180,000.00','89,850,000.00','125,100,000.00','85,350,000.00');
theRows[2] = new Array ('<b>'+h+'269>SIS</B></a>','4,425,000,000.00','4,695,000,000.00','4,050,000,000.00','4,367,380,000.00','4,273,120,000.00','3,644,720,000.00','4,681,580,000.00','4,913,000,000.00','6,188,000,000.00','5,457,000,000.00','6,137,000,000.00','5,453,000,000.00','6,061,000,000.00','6,954,000,000.00','6,745,000,000.00','6,519,000,000.00','7,851,500,000.00','8,548,500,000.00','9,430,000,000.00','9,225,000,000.00','10,575,000,000.00','11,610,000,000.00','9,517,500,000.00','13,140,000,000.00','12,757,500,000.00','13,117,500,000.00','11,677,500,000.00','10,507,500,000.00','11,857,500,000.00','9,315,000,000.00');
theRows[3] = new Array ('<b>'+h+'297>TRK</B></a>','1,692,579,200.00','1,983,924,800.00','1,831,315,200.00','1,704,000,000.00','1,803,400,000.00','1,498,100,000.00','1,803,400,000.00','1,884,450,000.00','2,542,160,000.00','2,180,050,000.00','2,069,200,000.00','1,682,600,000.00','1,619,950,000.00','1,852,650,000.00','2,040,600,000.00','2,315,700,000.00','2,641,200,000.00','2,938,800,000.00','3,599,100,000.00','4,101,900,000.00','5,220,600,000.00','5,808,200,000.00','4,689,500,000.00','5,375,000,000.00','3,787,500,000.00','4,150,000,000.00','3,662,500,000.00','3,712,500,000.00','4,375,000,000.00','3,587,500,000.00');
var thetable=new mytable();thetable.tableWidth=650;thetable.shownum=false;thetable.controlaccess=true;thetable.visCols=new Array(true,true,true,true,true);thetable.initsort=new Array(0,-1);thetable.inittable();thetable.refreshTable();</script>
My goal is to extract this output into a table and save it as a CSV file. How can I extract the script the way I want?
All dates should be on top, all names should be on the far left, and all values should be between the two.
Hisse 2012.12 2013.3 2013.4 ...
ANA 1,114,919,783.60 1,142,792,778.19 1,091,028,645.38 ...
DEN 55,200,000.00 55,920,000.00 45,960,000.00 ....
.
.
.

Solution
The custom function process_scripts() will produce what you are looking for. I am using the dummy data given below (at the end). First we check that the code does what is expected, so we create a pandas DataFrame to see the output.
You could also open this Colab Jupyter Notebook and run it in the cloud for free. That way you do not have to worry about any installation or setup and can simply focus on examining the solution itself.
1. Processing A Single Script
## Define CSV file-output folder-path
OUTPUT_PATH = './output'
## Process scripts
dfs = process_scripts(scripts = [s],
                      output_path = OUTPUT_PATH,
                      save_to_csv = False,
                      verbose = 0)
print(dfs[0].reset_index(drop=True))
Output:
  Name           2012.12  ...            2019.12           2020.03
0  ANA  1,114,919,783.60  ...   3,270,000,000.87  2,347,500,000.63
1  DEN     55,200,000.00  ...     125,100,000.00     85,350,000.00
2  SIS  4,425,000,000.00  ...  11,857,500,000.00  9,315,000,000.00
3  TRK  1,692,579,200.00  ...   4,375,000,000.00  3,587,500,000.00
[4 rows x 31 columns]
2. Processing All the Scripts
You can process all your scripts using the custom-defined function process_scripts(). The code is given below.
## Define CSV file-output folder-path
OUTPUT_PATH = './output'
## Process scripts
dfs = process_scripts(scripts,
                      output_path = OUTPUT_PATH,
                      save_to_csv = True,
                      verbose = 0)
## To clear the output dir-contents
#!rm -f $OUTPUT_PATH/*
I did this on Google Colab and it worked as expected.
3. Making Paths in OS-agnostic Manner
Paths on Windows and Unix-based systems can look very different. The following shows a way to build them without having to worry about which OS the code will run on. I have used the os library here; however, I would suggest you look at the pathlib library as well.
import os
# Define relative path for output-folder
OUTPUT_PATH = './output'
# Resolve it to an absolute path, relative to the present working directory
OUTPUT_PATH = os.path.abspath(OUTPUT_PATH)
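For comparison, the same idea with pathlib might look like the sketch below. The helper name make_output_dir is hypothetical and only used for illustration; Path.cwd, resolve and mkdir are standard-library calls.

```python
from pathlib import Path

def make_output_dir(relative="output"):
    """Return an absolute Path for `relative`, creating the folder if needed."""
    out_dir = (Path.cwd() / relative).resolve()
    # parents=True creates missing parent folders; exist_ok=True makes reruns safe
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir
```

The returned Path can be combined with a file name using the / operator, for example make_output_dir() / "Script_00.csv", and the separator is chosen for the current OS.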
4. Code: custom function process_scripts()
Here we use the re (regular expression) library, along with pandas for organizing the data in tabular form and then writing it to a CSV file. The tqdm library is used to give you a nice progress bar while processing multiple scripts. Please see the comments in the code to know what to change if you are not running it from a Jupyter notebook. The os library is used for path manipulation and for creating the output directory.
#pip install -U pandas
#pip install tqdm
import pandas as pd
import re # regex
import os
from tqdm.notebook import tqdm
# Use the following line if not using a jupyter notebook
# from tqdm import tqdm
def process_scripts(scripts,
                    output_path='./output',
                    save_to_csv: bool=False,
                    verbose: int=0):
    """Process all scripts and return a list of dataframes and
    optionally write each dataframe to a CSV file.
    Parameters
    ----------
    scripts: list of scripts
    output_path (str): output-folder-path for csv files
    save_to_csv (bool): default is False
    verbose (int): prints output for verbose>0
    Example
    -------
    OUTPUT_PATH = './output'
    dfs = process_scripts(scripts,
                          output_path = OUTPUT_PATH,
                          save_to_csv = True,
                          verbose = 0)
    ## To clear the output dir-contents
    #!rm -f $OUTPUT_PATH/*
    """
    ## Define regex patterns and compile for speed
    pat_header = re.compile(r"theCols\[\d+\] = new Array\s*\([\'](\d{4}\.\d{1,2})[\'],\d+,\d+\)")
    pat_line = re.compile(r"theRows\[\d+\] = new Array\s*\((.*)\).*")
    pat_code = re.compile("([A-Z]{3})")
    # Calculate zfill-digits
    zfill_digits = len(str(len(scripts)))
    print(f'Total scripts: {len(scripts)}')
    # Create output_path
    if not os.path.exists(output_path):
        os.makedirs(output_path)
    # Define a list of dataframes:
    # An accumulator of all scripts
    dfs = []
    ## If you do not have tqdm installed, uncomment the
    # next line and comment out the following line.
    #for script_num, script in enumerate(scripts):
    for script_num, script in enumerate(tqdm(scripts, desc='Scripts Processed')):
        ## Extract: Headers, Rows
        # Rows : code (Name: single column), line-data (multi-column)
        # str() lets BeautifulSoup Tag objects be passed as well as plain strings
        script = str(script)
        headers = ['Name'] + re.findall(pat_header, script)
        lines = re.findall(pat_line, script)
        codes = [re.findall(pat_code, line)[0] for line in lines]
        # Clean data for each row
        lines_data = dict()
        for line, code in zip(lines, codes):
            line_data = line.replace("','", "|").split('|')
            line_data[-1] = line_data[-1].replace("'", "")
            line_data[0] = code
            lines_data.update({code: line_data.copy()})
        if verbose>0:
            print('{}: {}'.format(script_num, codes))
        ## Load data into a pandas-dataframe
        # and write to csv.
        df = pd.DataFrame(lines_data).T
        df.columns = headers
        dfs.append(df.copy()) # update list
        # Write to CSV
        if save_to_csv:
            num_label = str(script_num).zfill(zfill_digits)
            script_file_csv = f'Script_{num_label}.csv'
            script_path = os.path.join(output_path, script_file_csv)
            df.to_csv(script_path, index=False)
    return dfs
5. Dummy Data
## Dummy Data
s = """
<script language="JavaScript"> var theHlp='/yardim/matris.asp';var theTitle = 'Piyasa Değeri';var theCaption='Cam (TL)';var lastmod = '';var h='<a class=hisselink href=../Hisse/HisseAnaliz.aspx?HNO=';var e='<a class=hisselink href=../endeks/endeksAnaliz.aspx?HNO=';var d='<center><font face=symbol size=1 color=#FF0000><b>ß</b></font></center>';var u='<center><font face=symbol size=1 color=#008000><b>İ</b></font></center>';var n='<center><font face=symbol size=1 color=#00A000><b>=</b></font></center>';var fr='<font color=#FF0000>';var fg='<font color=#008000>';var theFooter=new Array();var theCols = new Array();theCols[0] = new Array('Hisse',4,50);theCols[1] = new Array('2012.12',1,60);theCols[2] = new Array('2013.03',1,60);theCols[3] = new Array('2013.06',1,60);theCols[4] = new Array('2013.09',1,60);theCols[5] = new Array('2013.12',1,60);theCols[6] = new Array('2014.03',1,60);theCols[7] = new Array('2014.06',1,60);theCols[8] = new Array('2014.09',1,60);theCols[9] = new Array('2014.12',1,60);theCols[10] = new Array('2015.03',1,60);theCols[11] = new Array('2015.06',1,60);theCols[12] = new Array('2015.09',1,60);theCols[13] = new Array('2015.12',1,60);theCols[14] = new Array('2016.03',1,60);theCols[15] = new Array('2016.06',1,60);theCols[16] = new Array('2016.09',1,60);theCols[17] = new Array('2016.12',1,60);theCols[18] = new Array('2017.03',1,60);theCols[19] = new Array('2017.06',1,60);theCols[20] = new Array('2017.09',1,60);theCols[21] = new Array('2017.12',1,60);theCols[22] = new Array('2018.03',1,60);theCols[23] = new Array('2018.06',1,60);theCols[24] = new Array('2018.09',1,60);theCols[25] = new Array('2018.12',1,60);theCols[26] = new Array('2019.03',1,60);theCols[27] = new Array('2019.06',1,60);theCols[28] = new Array('2019.09',1,60);theCols[29] = new Array('2019.12',1,60);theCols[30] = new Array('2020.03',1,60);var theRows = new Array();
theRows[0] = new Array ('<b>'+h+'30>ANA</B></a>','1,114,919,783.60','1,142,792,778.19','1,091,028,645.38','991,850,000.48','796,800,000.38','697,200,000.34','751,150,000.36','723,720,000.33','888,000,000.40','790,320,000.36','883,560,000.40','927,960,000.42','737,040,000.33','879,120,000.40','914,640,000.41','927,960,000.42','1,172,160,000.53','1,416,360,000.64','1,589,520,000.72','1,552,500,000.41','1,972,500,000.53','2,520,000,000.67','2,160,000,000.58','2,475,000,000.66','2,010,000,000.54','2,250,000,000.60','2,077,500,000.55','2,332,500,000.62','3,270,000,000.87','2,347,500,000.63');
theRows[1] = new Array ('<b>'+h+'89>DEN</B></a>','55,200,000.00','55,920,000.00','45,960,000.00','42,600,000.00','35,760,000.00','39,600,000.00','40,200,000.00','47,700,000.00','50,460,000.00','45,300,000.00','41,760,000.00','59,340,000.00','66,600,000.00','97,020,000.00','81,060,000.00','69,300,000.00','79,800,000.00','68,400,000.00','66,900,000.00','66,960,000.00','71,220,000.00','71,520,000.00','71,880,000.00','60,600,000.00','69,120,000.00','62,640,000.00','57,180,000.00','89,850,000.00','125,100,000.00','85,350,000.00');
theRows[2] = new Array ('<b>'+h+'269>SIS</B></a>','4,425,000,000.00','4,695,000,000.00','4,050,000,000.00','4,367,380,000.00','4,273,120,000.00','3,644,720,000.00','4,681,580,000.00','4,913,000,000.00','6,188,000,000.00','5,457,000,000.00','6,137,000,000.00','5,453,000,000.00','6,061,000,000.00','6,954,000,000.00','6,745,000,000.00','6,519,000,000.00','7,851,500,000.00','8,548,500,000.00','9,430,000,000.00','9,225,000,000.00','10,575,000,000.00','11,610,000,000.00','9,517,500,000.00','13,140,000,000.00','12,757,500,000.00','13,117,500,000.00','11,677,500,000.00','10,507,500,000.00','11,857,500,000.00','9,315,000,000.00');
theRows[3] = new Array ('<b>'+h+'297>TRK</B></a>','1,692,579,200.00','1,983,924,800.00','1,831,315,200.00','1,704,000,000.00','1,803,400,000.00','1,498,100,000.00','1,803,400,000.00','1,884,450,000.00','2,542,160,000.00','2,180,050,000.00','2,069,200,000.00','1,682,600,000.00','1,619,950,000.00','1,852,650,000.00','2,040,600,000.00','2,315,700,000.00','2,641,200,000.00','2,938,800,000.00','3,599,100,000.00','4,101,900,000.00','5,220,600,000.00','5,808,200,000.00','4,689,500,000.00','5,375,000,000.00','3,787,500,000.00','4,150,000,000.00','3,662,500,000.00','3,712,500,000.00','4,375,000,000.00','3,587,500,000.00');
var thetable=new mytable();thetable.tableWidth=650;thetable.shownum=false;thetable.controlaccess=true;thetable.visCols=new Array(true,true,true,true,true);thetable.initsort=new Array(0,-1);thetable.inittable();thetable.refreshTable();</script>
"""
## Make a dummy list of scripts
scripts = [s for _ in range(10)]
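The values in the resulting dataframes are still strings such as '1,114,919,783.60'. If they are needed as numbers, a small conversion step could be applied before analysis. The sketch below uses a hypothetical helper to_number and assumes ',' is the thousands separator and '.' the decimal point, as in the sample data.

```python
def to_number(text):
    """Convert a string such as '1,114,919,783.60' to a float.

    Assumes ',' is the thousands separator and '.' the decimal point.
    """
    return float(text.replace(",", ""))

row = ['55,200,000.00', '55,920,000.00', '45,960,000.00']
values = [to_number(cell) for cell in row]
```

When reading the saved CSV files back with pandas, passing thousands=',' to read_csv is another way to get numeric columns.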

Based on the <script> provided in your question, you can do something like the code below to get a list of values for each name (ANA, DEN, ...):
import re
# aaa is assumed to hold the text of the script, e.g. aaa = str(scripts[17])
parts = aaa.split("<b>'")
# parts[0] is everything before the first row, so start at 1
for idx in range(1, len(parts)):
    name = None
    lst = []
    for i in parts[idx].split("'"):
        if "</B>" in i:
            name = i.split('>')[1].split("<")[0]
        # Keep only numeric cells such as 1,114,919,783.60
        elif re.fullmatch(r"[\d,]+\.\d+", i):
            lst.append(i)
    print("{} = {}".format(name, lst))
It's just example code, so it's not that beautiful :)
Hope this will help you.

Converting CSV files to TF Records

I've been running my script for more than 5 hours already. I have 258 CSV files that I want to convert to TF Records. I wrote the following script, and as I've said, I've been running it for more than 5 hours already:
import argparse
import os
import sys
import standardize_data
import tensorflow as tf
FLAGS = None
PATH = '/home/darth/GitHub Projects/gru_svm/dataset/train'
def _int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))
def _float_feature(value):
    return tf.train.Feature(float_list=tf.train.FloatList(value=[value]))
def convert_to(dataset, name):
    """Converts a dataset to tfrecords"""
    filename_queue = tf.train.string_input_producer(dataset)
    # TF reader
    reader = tf.TextLineReader()
    # default values, in case of empty columns
    record_defaults = [[0.0] for x in range(24)]
    key, value = reader.read(filename_queue)
    duration, service, src_bytes, dest_bytes, count, same_srv_rate, \
        serror_rate, srv_serror_rate, dst_host_count, dst_host_srv_count, \
        dst_host_same_src_port_rate, dst_host_serror_rate, dst_host_srv_serror_rate, \
        flag, ids_detection, malware_detection, ashula_detection, label, src_ip_add, \
        src_port_num, dst_ip_add, dst_port_num, start_time, protocol = \
        tf.decode_csv(value, record_defaults=record_defaults)
    features = tf.stack([duration, service, src_bytes, dest_bytes, count, same_srv_rate,
                         serror_rate, srv_serror_rate, dst_host_count, dst_host_srv_count,
                         dst_host_same_src_port_rate, dst_host_serror_rate, dst_host_srv_serror_rate,
                         flag, ids_detection, malware_detection, ashula_detection, src_ip_add,
                         src_port_num, dst_ip_add, dst_port_num, start_time, protocol])
    filename = os.path.join(FLAGS.directory, name + '.tfrecords')
    print('Writing {}'.format(filename))
    writer = tf.python_io.TFRecordWriter(filename)
    with tf.Session() as sess:
        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(coord=coord)
        try:
            while not coord.should_stop():
                example, l = sess.run([features, label])
                print('Writing {dataset} : {example}, {label}'.format(dataset=sess.run(key),
                      example=example, label=l))
                example_to_write = tf.train.Example(features=tf.train.Features(feature={
                    'duration' : _float_feature(example[0]),
                    'service' : _int64_feature(int(example[1])),
                    'src_bytes' : _float_feature(example[2]),
                    'dest_bytes' : _float_feature(example[3]),
                    'count' : _float_feature(example[4]),
                    'same_srv_rate' : _float_feature(example[5]),
                    'serror_rate' : _float_feature(example[6]),
                    'srv_serror_rate' : _float_feature(example[7]),
                    'dst_host_count' : _float_feature(example[8]),
                    'dst_host_srv_count' : _float_feature(example[9]),
                    'dst_host_same_src_port_rate' : _float_feature(example[10]),
                    'dst_host_serror_rate' : _float_feature(example[11]),
                    'dst_host_srv_serror_rate' : _float_feature(example[12]),
                    'flag' : _int64_feature(int(example[13])),
                    'ids_detection' : _int64_feature(int(example[14])),
                    'malware_detection' : _int64_feature(int(example[15])),
                    'ashula_detection' : _int64_feature(int(example[16])),
                    'label' : _int64_feature(int(l)),
                    'src_ip_add' : _float_feature(example[17]),
                    'src_port_num' : _float_feature(example[18]),
                    'dst_ip_add' : _float_feature(example[19]),
                    'dst_port_num' : _float_feature(example[20]),
                    'start_time' : _float_feature(example[21]),
                    'protocol' : _int64_feature(int(example[22])),
                }))
                writer.write(example_to_write.SerializeToString())
            writer.close()
        except tf.errors.OutOfRangeError:
            print('Done converting -- EOF reached.')
        finally:
            coord.request_stop()
        coord.join(threads)
def main(unused_argv):
    files = standardize_data.list_files(path=PATH)
    convert_to(dataset=files, name='train')
It already got me thinking that perhaps it's stuck in an infinite loop? What I want to do is to read all rows in each CSV file (258 CSV files), and write those rows into a TF Record (a feature and a label, that is, of course). And then, stop the loop when there are no more rows available, or the CSV files have been exhausted already.
standardize_data.list_files(path) is a function I wrote in a different module and simply re-used for this script. It returns a list of all the files found in PATH. Note that my PATH contains only CSV files.

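Set num_epochs=1 in string_input_producer. Without it, the producer cycles through the file list indefinitely, so the reader never raises OutOfRangeError and the loop never ends. When num_epochs is set, the epoch counter is a local variable, so run tf.local_variables_initializer() in the session before starting the queue runners. Another note: converting these CSV files to TFRecords may not give you the advantage you expect from TFRecords; the overhead is high for this kind of data (a large number of single-valued features and labels). You may want to experiment with this part.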

Unable to display all the information except for first selection

I am using the following code to process a list of images found in my scene, before the gathered information, namely tifPath and texPath, is used in another function.
However, in my scene, for example, there are 3 textures, so I should be seeing 3 sets of tifPath and texPath, but I only see 1 of them. If I check surShaderOut or surShaderTex instead, I can see the info for all 3 textures.
For example, the 3 texture file paths are as follows (in surShaderTex): /user_data/testShader/textureTGA_01.tga, /user_data/testShader/textureTGA_02.tga, /user_data/testShader/textureTGA_03.tga
I guess what I am trying to ask is why my for statement is able to print all 3 results, yet anything past it prints only a single result.
Any advice?
surShader = cmds.ls(type = 'surfaceShader')
for con in surShader:
    surShaderOut = cmds.listConnections('%s.outColor' % con)
    surShaderTex = cmds.getAttr("%s.fileTextureName" % surShaderOut[0])
path = os.path.dirname(surShaderTex)
f = surShaderTex.split("/")[-1]
tifName = os.path.splitext(f)[0] + ".tif"
texName = os.path.splitext(f)[0] + ".tex"
tifPath = os.path.join(path, tifName)
texPath = os.path.join(path, texName)
convertText(surShaderTex, tifPath, texPath)

Only two lines are part of your for loop. The rest only execute once.
So first this runs:
surShader = cmds.ls(type = 'surfaceShader')
for con in surShader:
    surShaderOut = cmds.listConnections('%s.outColor' % con)
    surShaderTex = cmds.getAttr("%s.fileTextureName" % surShaderOut[0])
Then after that loop, with only one surShader, one surShaderOut, and one surShaderTex, the following is executed once:
path = os.path.dirname(surShaderTex)
f = surShaderTex.split("/")[-1]
tifName = os.path.splitext(f)[0] + ".tif"
texName = os.path.splitext(f)[0] + ".tex"
tifPath = os.path.join(path, tifName)
texPath = os.path.join(path, texName)
Indent those lines to the same level as the two lines inside the loop, and they will run for each element of surShader instead of only once.
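The effect of the indentation can be tried outside Maya with plain strings. In this sketch the three texture paths from the question stand in for the values that cmds.getAttr would return, and derived_paths is a hypothetical helper that bundles the path manipulation.

```python
import os

def derived_paths(texture_path):
    """Return the (.tif, .tex) paths that sit next to texture_path."""
    folder, file_name = os.path.split(texture_path)
    stem = os.path.splitext(file_name)[0]
    return (os.path.join(folder, stem + ".tif"),
            os.path.join(folder, stem + ".tex"))

textures = [
    "/user_data/testShader/textureTGA_01.tga",
    "/user_data/testShader/textureTGA_02.tga",
    "/user_data/testShader/textureTGA_03.tga",
]
results = []
for tex in textures:
    # Everything that must happen once per texture is indented under the loop
    results.append(derived_paths(tex))
```

Here results ends up with one (tifPath, texPath) pair per texture, because the call is part of the loop body.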
