Multiple loops logic and speed optimization in Python?

Multiple loops logic and speed optimization in Python? - python

Here is two python functions to transfer data from one file to another file. Both source file and target file have the same number of objects but with different data.
def getBlock(rigObj, objName):
rigObj.seek(0)
Tag = False
block = ""
for line in rigObj:
if line.find("ObjectAlias " + str(objName) + "\n") != -1:
for line in rigObj:
if line.find("BeginKeyframe") != -1:
Tag = True
elif line.lstrip().startswith("0.000 ") and line.rstrip().endswith("default"):
Tag = False
break
elif Tag:
block += line
return (block)
def buildScene(sceneObj, rigObj, objList):
sceneObj.seek(0)
rigObj.seek(0)
newscene = ""
for line in sceneObj:
newscene += line
for obj in objList:
if line.find("ObjectAlias " + obj + "\n") != -1:
Tag = True
for line in sceneObj:
if line.find("BeginKeyframe") != -1:
newscene += line
newscene += getBlock(rigObj, obj)
Tag = False
elif line.lstrip().startswith("0.000 ") and line.rstrip().endswith("default"):
newscene += line
Tag = True
break
elif Tag:
newscene += line
return (newscene)
getBlock is a sub-function for getting data from rigobj;
buildScene is my main function, it has three parameters:
First parameter(sceneobj) is the file that I want to put data into;
Second parameter(rigobj) is the file that I get the data from;
Third parameter(objlist) is a list of what object's data to be transfered.
So far, the function does its job, the only problem is a bit of slow(sceneobj<10MB, rigobj<2MB, objlist<10 objects), I am not sure if there are a logic problem in the code, should I loop the sceneObj first or loop the objList first? Does it affect the speed?
UPDATE:
Both sceneObj and rigObj have similar data like this:
lines
BeginObject
lines
ObjectAlias xxx #--> object in transfer list
lines
BeginKeyframe 10 12
-9.000 4095 default #--> transfer begins
lines #--> transfer from rigObj to sceneObj and override lines in sceneObj
-8.000 63 default #--> same
lines #--> same
-7.000 63 default #--> same
lines #--> same
-1.000 63 default #--> same
lines #--> transfer ends
0.000 -1 default
lines
EndKeyframe
EndMotion
lines
EndObject
The data want to be transfered and overrided is only lines bewteen BeginKeyframe and 0.000 -1 default of any specified objects(by objList)

Most obvious optimization is to index data for getBlock function, so you will able to seek to needed position instead of always parsing full file from beginning.
like so:
def create_rig_index(rig_obj):
""" This function creates dict of offsets for specific ObjectAlias
Example:
data:
line with offset 100: ObjectAlias xxx
more lines
line with offset 200: ObjectAlias yyy
more lines
line with offset 300: ObjectAlias xxx
more lines
result will be:
xxx: [100, 300]
yyy: [200]
"""
idx = defaultdict( list )
position = 0
for line in rig_obj:
strip_line = line.strip()
if strip_line.startswith( "ObjectAlias" ):
obj_name = strip_line.split()[1]
idx[ obj_name ].append( position )
# unfortunately python prevent `tell` calls during iteration.
position += len( bytes( line, 'utf-8' ) )
# if data guaranteed to be ascii only its possible to use len( line )
# or you can write custom line generator on top of read function.
return idx;
def getBlock(rigObj, rigIdx, objName):
""" same as your getBlock, but uses precalculated offsets"""
block = ""
for idx in rigIdx[ objName ]:
rigObj.seek( idx )
Tag = False
for line in rigObj:
if line.find("BeginKeyframe") != -1:
Tag = True
elif line.lstrip().startswith("0.000 ") and line.rstrip().endswith("default"):
break
elif Tag:
block += line
return (block)
In buildScene method you should create rig_index before running for loop, and use this index in getBlock function.

Related

ERROR: File "mtrand.pyx", line 902, in numpy.random.mtrand.RandomState.choice

..can.. Hi all, can someone take a look on this code, I have one problem but I don't know what is?
I'm working on generating various length and shapes of text on image, and when segmentated area is enough big then text is placed, but when the length of text a little bigger then this error shows me. Example, when the text has 1-8 words then the output is fine, but when the length is bigger then it shows me this error, but on some images works fine because it have bigger area to render the text. So I don't know what to do?
Terminal shows me these errors:
File "/..../text_utils.py", line 679, in sample
return self.fdict[kind](nline_max,nchar_max)
File "/..../text_utils.py", line 725, in sample_para
lines = self.get_lines(nline, nword, nchar_max, f=0.35)
File "/..../text_utils.py", line 657, in get_lines
lines = h_lines(niter=100)
File "/..../text_utils.py", line 649, in h_lines
line_start = np.random.choice(len(self.txt)-nline)
File "mtrand.pyx", line 902, in numpy.random.mtrand.RandomState.choice
ValueError: a must be greater than 0 unless no samples are taken
I saw this on this link: https://github.com/numpy/numpy/blob/main/numpy/random/mtrand.pyx there is some statement at 902 line but I don't understand.
And this is my code:
def get_lines(self, nline, nword, nchar_max, f=0.35, niter=100):
def h_lines(niter=100):
lines = ['']
iter = 0
while not np.all(self.is_good(lines,f)) and iter < niter:
iter += 1
**649 ---->** line_start = np.random.choice(len(self.txt)-nline)
lines = [self.txt[line_start+i] for i in range(nline)]
return lines
lines = ['']
iter = 0
while not np.all(self.is_good(lines,f)) and iter < niter:
iter += 1
**657 ---->** lines = h_lines(niter=100)
# get words per line:
nline = len(lines)
for i in range(nline):
words = lines[i].split()
dw = len(words)-nword[i]
if dw > 0:
first_word_index = random.choice(range(dw+1))
lines[i] = ' '.join(words[first_word_index:first_word_index+nword[i]])
while len(lines[i]) > nchar_max: #chop-off characters from end:
if not np.any([ch.isspace() for ch in lines[i]]):
lines[i] = ''
else:
lines[i] = lines[i][:len(lines[i])-lines[i][::-1].find(' ')].strip()
if not np.all(self.is_good(lines,f)):
return #None
else:
return lines
def sample(self, nline_max,nchar_max,kind='WORD'):
**679 ---->** return self.fdict[kind](nline_max,nchar_max)
def sample_para(self,nline_max,nchar_max):
# get number of lines in the paragraph:
nline = nline_max*sstat.beta.rvs(a=self.p_para_nline[0], b=self.p_para_nline[1])
nline = max(1, int(np.ceil(nline)))
# get number of words:
nword = [self.p_para_nword[2]*sstat.beta.rvs(a=self.p_para_nword[0], b=self.p_para_nword[1])
for _ in range(nline)]
nword = [max(1,int(np.ceil(n))) for n in nword]
**725 ---->** lines = self.get_lines(nline, nword, nchar_max, f=0.35)
if lines is not None:
# center align the paragraph-text:
if np.random.rand() < self.center_para:
lines = self.center_align(lines)
return '\n'.join(lines)
else:
return []

Python: text log file processing and transposing rows to columns

I am new to python and stuck with a log file in text format, where it has following repetitive structure and I am required to extract the data from rows and change it into column depending upon the data. e.g.
First 50 line are trash like below(in first six lines):
-------------------------------------------------------------
Logging to file xyz.
Char
1,
3
r
=
----------------------------------------------
Pid 0
Name SAB=1, XYZ=3
----------------------------------------------
a 1
b 2
c 3
----------------------------------------------
Pid 0
Name SAB=1, XYZ=3, P_NO=546467
----------------------------------------------
Test_data_1 00001
Test_data_2 FOXABC
Test_data_3 SHEEP123
Country US
----------------------------------------------
Pid 0
Name SAB=1
----------------------------------------------
Sno 893489423
Log FileFormat
------------Continues for another million lines.
Now the required output is like below:
Required output format
PID, Name, a,b,c
0, "SAB=1, XYZ=3", 1,2,3
PID, Name , Test_data_1, Test_data_2, Test_data_3, Country
0, "SAB=1, XYZ=3, P_NO=546467", 00001, FOXABC, SHEEP123, US
Pid, Name, Sno
0, SAB=1, 893489423
I tried to write a code but failed to get the desired results: My attempt was as below:
'''
fn=open(file_name,'r')
for i,line in enumerate(fn ):
if i >= 50 and "Name " in line: # for first 50 line deletion/or starting point
last_tag=line.split(",")[-1]
last_element=last_tag.split("=")[0]
print(last_element)
'''
Any help would be really appreciated.
Newly Discovered Structure
RBY Structure

The solution I came up with is a bit messy but it works, check it out below:
import sys
import re
import StringIO
ifile = open(sys.argv[1],'r') #Input log file as command-line argument
ofile = open(sys.argv[1][:-4]+"_formatted.csv",'w') #output formatted log txt
stringOut = ""
i = 0
flagReturn = True
j = 0
reVal = re.compile("Pid[\s]+(.*)\nName[\s]+(.*)\n[-]+\<br\>(.*)\<br\>") #Regex pattern for separating the Pid & Name from the variables
reVar = re.compile("(.*)[ ]+(.*)") #Regex pattern for getting vars and their values
reVarStr = re.compile(">>> [0-9]+.(.*)=(.*)") #Regex Pattern for Struct
reVarStrMatch = re.compile("Struct(.*)+has(.*)+members:") #Regex pattern for Struct check
for lines in ifile.readlines():
if(i>8): #Omitting the first 9 lines of Garbage values
if(lines.strip()=="----------------------------------------------"): #Checking for separation between PID & Name group and the Var group
j+=1 #variable keeping track of whether we are inside the vars section or not (between two rows of hyphens)
flagReturn = not flagReturn #To print the variables in single line to easily separate them with regex pattern reVal
if(not flagReturn):
stringTmp = lines.strip()+"<br>" #adding break to the end of each vars line in order for easier separation
else:
stringTmp = lines #if not vars then save each line as is
stringOut += stringTmp #concatenating each lines to form the searchable string
i+=1 #incrementing for omitting lines (useless after i=8)
if(j==2): #Once a complete set of PIDs, Names and Vars have been collected
j=0 #Reset j
matchObj = reVal.match(stringOut) #Match for PID, Name & Vars
line1 = "Pid,Name,"
line2 = matchObj.group(1).strip()+",\""+matchObj.group(2)+"\","
buf = StringIO.StringIO(matchObj.group(3).replace("<br>","\n"))
structFlag = False
for line in buf.readlines(): #Separate each vars and add to the respective strings for writing to file
if(not (reVarStrMatch.match(line) is None)):
structFlag = True
elif(structFlag and (not (reVarStr.match(line) is None))):
matchObjVars = reVarStr.match(line)
line1 += matchObjVars.group(1).strip()+","
line2 += matchObjVars.group(2).strip()+","
else:
structFlag = False
matchObjVars = reVar.match(line)
try:
line1 += matchObjVars.group(1).strip()+","
line2 += matchObjVars.group(2).strip()+","
except:
line1 += line.strip()+","
line2 += " ,"
ofile.writelines(line1[:-1]+"\n")
ofile.writelines(line2[:-1]+"\n")
ofile.writelines("\n")
stringOut = "" #Reseting the string
ofile.close()
ifile.close()
EDIT
This is what I came up with to include the new pattern as well.
I suggest you do the following:
Run the parser script on a copy of the log file and see where it fails next.
Identify and write down the new pattern that broke the parser.
Delete all data in the newly identified pattern.
Repeat from Step 1 till all patterns have been identified.
Create individual regular expressions pattern for each type of pattern and call them in separate functions to write to the string.
EDIT 2
structFlag = False
RBYflag = False
for line in buf.readlines(): #Separate each vars and add to the respective strings for writing to file
if(not (reVarStrMatch.match(line) is None)):
structFlag = True
elif(structFlag and (not (reVarStr.match(line) is None))):
matchObjVars = reVarStr.match(line)
if(matchObjVars.group(1).strip()=="RBY" and not RBYFlag):
line1 += matchObjVars.group(1).strip()+","
line2 += matchObjVars.group(2).strip()+"**"
RBYFlag = True
elif(matchObjVars.group(1).strip()=="RBY"):
line2 += matchObjVars.group(2).strip()+"**"
else:
if(RBYFlag):
line2 = line2[:-2]
RBYFlag = False
line1 += matchObjVars.group(1).strip()+","
line2 += matchObjVars.group(2).strip()+","
else:
structFlag = False
if(RBYFlag):
line2 = line2[:-2]
RBYFlag = False
matchObjVars = reVar.match(line)
try:
line1 += matchObjVars.group(1).strip()+","
line2 += matchObjVars.group(2).strip()+","
except:
line1 += line.strip()+","
line2 += " ,"`
NOTE
This loop has become very bloated and it is better to create a separate function to identify the type of data and return some value accordingly.

Python: read line and modify it (if needed)

let's say I have a file Example.txt like this:
alpha_1 = 10
%alpha_2 = 20
Now, I'd like to have a python script which performs these tasks:
If the line containing alpha_1 parameter is not commented (% symbol), to rewrite the line adding %, like it is with alpha_2
To perform the task in 1. independently of the line order
To leave untouched the rest of the file Example.txt
The file I wrote is:
with open('Example.txt', 'r+') as config:
while 1:
line = config.readline()
if not line:
break
# remove line returns
line = line.strip('\r\n')
# make sure it has useful data
if (not "=" in line) or (line[0] == '%'):
continue
# split across equal sign
line = line.split("=",1)
this_param = line[0].strip()
this_value = line[1].strip()
for case in switch(this_param):
if case("alpha1"):
string = ('% alpha1 =', this_value )
s = str(string)
config.write(s)
Up to now the output is the same Example.txt with a further line (%alpha1 =, 10) down the original line alpha1 = 10.
Thanks everybody

I found the solution after a while. Everything can be easily done writing everything on another file and substituting it at the end.
configfile2 = open('Example.txt' + '_temp',"w")
with open('Example.txt', 'r') as configfile:
while 1:
line = configfile.readline()
string = line
if not line:
break
# remove line returns
line = line.strip('\r\n')
# make sure it has useful data
if (not "=" in line) or (line[0] == '%'):
configfile2.write(string)
else:
# split across equal sign
line = line.split("=",1)
this_param = line[0].strip()
this_value = line[1].strip()
#float values
if this_param == "alpha1":
stringalt = '% alpha1 = '+ this_value + ' \r\n'
configfile2.write(stringalt)
else:
configfile2.write(string)
configfile.close()
configfile2.close()
# the file is now replaced
os.remove('Example.txt' )
os.rename('Example.txt' + '_temp','Example.txt' )

Call a sub-function in a function to deal with a same file in Python?

I am still fight with python copy and replace lines, a question Here. Basically, I want to statistics the number of a pattern in a section, and renew it in the line. I think I have found the problem in my question: I call a sub-function to interate an same file in the main function, and the interation is messing up at time. I am pretty new to programming, I don't know how to do this copy-statistics-replace-copy thing in another way. Any suggestions or hints is welcome.
Here is part of code what I got now:
# sum number of keyframes
def sumKeys (sceneObj, objName):
sceneObj.seek(0)
block = []
Keys = ""
for line in sceneObj:
if line.find("ObjectAlias " + objName + "\n") != -1:
for line in sceneObj:
if line.find("BeginKeyframe") != -1:
for line in sceneObj:
if line.find("default") != -1:
block.append(line.rstrip())
Keys = len(block)
elif line.find("EndKeyframe") != -1:
break
break
break
return (Keys)
# renew number of keyframes
def renewKeys (sceneObj, objName):
sceneObj.seek(0)
newscene = ""
item = []
for line in sceneObj:
newscene += line
for obj in objName:
if line.find("ObjectAlias " + obj + "\n") != -1:
for line in sceneObj:
if line.find("EndKeyframe") != -1:
newscene += line
break
if line.find("BeginKeyframe") != -1:
item = line.split()
newscene += item[0] + " " + str(sumKey(sceneObj, obj)) + " " + item[-1] + "\n"
continue
else:
newscene += line
return (newscene)
Original:
lines
BeginObjects
lines
ObjectAlias xxx
lines
BeginKeyframe 34 12 ----> 34 is what I want to replace
lines
EndObject
BeginAnotherObjects
...
Goal:
lines
BeginObjects
lines
ObjectAlias xxx
lines
BeginKeyframe INT 12 ---->INT comes from sumKeys function
lines
EndObject
BeginAnotherObjects
...

You can use tell and seek to move inside a file, so to do what you want to do, you could use something like this, which I hacked together:
import re
# so, we're looking for the object 'HeyThere'
objectname = 'HeyThere'
with open('input.txt', 'r+') as f:
line = f.readline()
pos = f.tell()
found = False
while line:
# we only want to alter the part with the
# right ObjectAlias, so we use the 'found' flag
if 'ObjectAlias ' + objectname in line:
found = True
if 'EndObject' in line:
found = False
if found and 'BeginKeyframe' in line:
# we found the Keyframe part, so we read all lines
# until EndKeyframe and count each line with 'default'
sub_line = f.readline()
frames = 0
while not 'EndKeyframe' in sub_line:
if 'default' in sub_line:
frames += 1
sub_line = f.readline()
# since we want to override the 'BeginKeyframe', we
# have to move back in the file to before this line
f.seek(pos)
# now we read the rest of the file, but we skip the
# old 'BeginKeyframe' line we want to replace
f.readline()
rest = f.read()
# we jump back to the right position again
f.seek(pos)
# and we write our new 'BeginKeyframe' line
f.write(re.sub('\d+', str(frames), line, count=1))
# and write the rest of the file
f.write(rest)
f.truncate()
# nothing to do here anymore, just quit the loop
break
# before reading a new line, we keep track
# of our current position in the file
pos = f.tell()
line = f.readline()
The comments pretty much explain what's going on.
Given an input file like
foo
bar
BeginObject
something
something
ObjectAlias NotMe
lines
more lines
BeginKeyframe 22 12
foo
bar default
foo default
bar default
EndKeyframe
EndObject
foo
bar
BeginObject
something
something
ObjectAlias HeyThere
lines
more lines
BeginKeyframe 43243 12
foo
bar default
foo default
bar default
foo default
bar default
foo default
bar
EndKeyframe
EndObject
it will replace the line
BeginKeyframe 43243 12
with
BeginKeyframe 6 12

How to proceed next line while reading specific line?

I got file called 'datafile', which contains data like this:
tag
12
22
33
tag
234
1
23
43
tag
8
tag
0
12
The number of numbers between the "tag"s varies.
What I need to do is to access every first( or second) number (if exist) after "tag".
My uncompleted Python code:
f = open('datafile', 'r')
for line in f.readlines():
line = line.strip()
if line == 'tag':
print('tag found!')
# how can I access next number here?
How can I proceed to next line inside the for loop?

It's easier to just reset a counter every time you encounter "tag":
k = 1 # or 2, or whatever
with open('datafile', 'r') as f:
since_tag = 0
for line in f:
line = line.strip()
if line == 'tag':
print('tag found!')
since_tag = 0
if since_tag == k:
print "kth number found: ", line
since_tag += 1

Use an iterator fed into a generator.
def getFirstSecond(browser):
for line in browser:
line = line.strip()
if line == 'tag':
first = next(browser).strip()
try:
second = next(browser).strip()
except StopIteration:
second = None
yield first, second if second != 'tag' else first, None
with open('datafile', 'r') as f:
browser = iter(f)
pairs = list(getFirstSecond(browser))
To answer your question, you proceed to the next line using the next function.
Note the use of the with statement; this is how you should be opening a file (it ensures the file is closed when you are done with it).

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Multiple loops logic and speed optimization in Python? - python

Related

ERROR: File "mtrand.pyx", line 902, in numpy.random.mtrand.RandomState.choice

Python: text log file processing and transposing rows to columns

Python: read line and modify it (if needed)

Call a sub-function in a function to deal with a same file in Python?

How to proceed next line while reading specific line?

Categories

Resources