optimization for faster calculation on python defaultdict - python

I have a script like this:
for b in range(len(xy_alignments.keys())):
    print str(b) + " : " + str(len(xy_alignments.keys()))
    x = xy_alignments.keys()[b][0]
    y = xy_alignments.keys()[b][1]
    yx_prob = yx_alignments[(y,x)] / x_phrases[x]
    xy_prob = xy_alignments[(x,y)] / y_phrases[y]
    line_str = x + "\t" + y + "\t" + str(yx_prob) + "\t" + str(xy_prob) + "\n"
    of.write(line_str.encode("utf-8"))
of.close()
xy_alignments, yx_alignments, x_phrases, and y_phrases are
Python defaultdict objects with millions of keys.
When I run the loop above, it is extremely slow.
Does anyone have a suggestion for making it faster?
Thanks,

Here's a more idiomatic version that should also be faster.
for (x, y), xy_alignment in xy_alignments.iteritems():
    yx_prob = yx_alignments[(y, x)] / x_phrases[x]
    xy_prob = xy_alignment / y_phrases[y]
    of.write(b'%s\t%s\t%s\t%s\n' % (x, y, yx_prob, xy_prob))
This
saves the keys() calls, which create a new list every time,
saves one dict lookup per iteration by using iteritems(),
saves string allocations by using string formatting, and
saves the encode() call, assuming all output is in the ASCII range.
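For readers on Python 3, the same loop can be sketched as below. This is only an illustration under assumptions: the helper name format_lines and the tiny sample dictionaries are made up, items() stands in for iteritems(), and / is true division in Python 3 (in Python 2, / on two integer counts truncates).

```python
from collections import defaultdict


def format_lines(xy_alignments, yx_alignments, x_phrases, y_phrases):
    """Yield one tab-separated output line per (x, y) key.

    Hypothetical helper: iterates the dict once instead of
    rebuilding the key list on every pass.
    """
    for (x, y), xy_count in xy_alignments.items():
        yx_prob = yx_alignments[(y, x)] / x_phrases[x]
        xy_prob = xy_count / y_phrases[y]
        yield "%s\t%s\t%s\t%s\n" % (x, y, yx_prob, xy_prob)


# Made-up sample data, only to show the call shape.
xy_alignments = defaultdict(int, {("a", "b"): 2})
yx_alignments = defaultdict(int, {("b", "a"): 1})
x_phrases = defaultdict(int, {"a": 4})
y_phrases = defaultdict(int, {"b": 8})

lines = list(format_lines(xy_alignments, yx_alignments, x_phrases, y_phrases))
print(lines)
```

Writing the generated lines with of.writelines(...) on a text-mode file would also avoid one write() call per key.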

Related

print statement never gets executed in a loop

In the code posted below, the first nested for loops print their log output as expected, but the second nested for loops, which use k and l as indices, never print anything.
Please let me know why the print statement
print(str(x) + ",,,,,,,,,,,,,,,,,,," + str(y))
never gets displayed even though polygonCoordinatesInEPSG25832 contains values.
Python code:
for feature in featuresArray:
    polygonCoordinatesInEPSG4326.append(WebServices.fetchCoordinateForForFeature(feature))
for i in range(len(polygonCoordinatesInEPSG4326)):
    for j in range(len(polygonCoordinatesInEPSG4326[i])):
        lon = polygonCoordinatesInEPSG4326[i][j][0]
        lat = polygonCoordinatesInEPSG4326[i][j][1]
        x, y = transform(inputProj, outputProj, lon, lat)
        xy.append([x,y])
        print ("lon:" + str(lon) + "," + "lat:" + str(lat) + "<=>" + "x:" + str(x) + "," + "y:" , str(y))
        print(str(x) + "," + str(y))
        print("xy[%d]: %s"%(len(xy)-1,str(xy[len(xy)-1])))
    print("\n")
    print("len(xy): %d"%(len(xy)))
    polygonCoordinatesInEPSG25832.append(xy)
    print("len(polygonCoordinatesInEPSG25832[%d]: %d"%(i,len(polygonCoordinatesInEPSG25832[i])))
    xy.clear()
print("len(polygonCoordinatesInEPSG25832 = %d" %(len(polygonCoordinatesInEPSG25832)))
for k in range(len(polygonCoordinatesInEPSG25832)):
    for l in range(len(polygonCoordinatesInEPSG25832[k])):
        x = polygonCoordinatesInEPSG25832[k][l][0]
        y = polygonCoordinatesInEPSG25832[k][l][1]
        print(str(x) + ",,,,,,,,,,,,,,,,,,," + str(y))
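polygonCoordinatesInEPSG25832 contains values, but each polygonCoordinatesInEPSG25832[k] is empty.
You append xy to it, but what gets stored is a reference to the same list object, not a copy, so when you call xy.clear() the stored entry becomes empty too. Try appending a copy (for example a deep copy) instead.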
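The aliasing effect can be reproduced in a few lines. This is a minimal sketch with made-up names (aliased, copied) and made-up coordinates. A shallow copy via list(xy) is enough here because clear() only empties the outer list; copy.deepcopy(xy) would work as well.

```python
xy = []
aliased = []  # stores the same list object that is cleared later
copied = []   # stores an independent copy taken before clearing

xy.append([1.0, 2.0])
aliased.append(xy)       # reference to xy itself
copied.append(list(xy))  # new outer list with the same inner items

xy.clear()  # empties xy, and therefore also the entry inside `aliased`

print(aliased)  # [[]]
print(copied)   # [[[1.0, 2.0]]]
```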

Python: Substituting variables with functions SymPy

I'm writing code in which I need to substitute the variables of a function with multiple functions.
For example, I have B=x1**2+x2**2+x3**2 where I need to substitute x1=cos(x1+x2), x2=sin(x2+x3) and x3=x1 so as to get this value: cos(x1+x2)**2+sin(x2+x3)**2+x1**2
However, when I do this iteratively like this:
for j in range(nvar):
    B=expand(B.subs(x[j],f[j]))
where nvar=3 and x is defined as a list of symbols and f as a list of symbolic functions, at each iteration, x[j] from the previous substitution is replaced and gives a wrong answer: x1**2 + sin(x1 + cos(x1 + sin(x1 + x2)))**2 + cos(x1 + sin(x1 + cos(x1 + sin(x1 + x2))))**2
How can I perform this substitution simultaneously?
You can use the simultaneous keyword for subs which was made for cases like this:
>>> (x1**2+x2**2+x3**2).subs(dict(x1=cos(x1+x2), x2=sin(x2+x3), x3=x1), simultaneous=True)
x1**2 + sin(x2 + x3)**2 + cos(x1 + x2)**2
Or, if x and f contain all instances of replacements you are interested in,
>>> reps = dict(zip(x, f))
>>> B = expand(B.subs(reps, simultaneous=True))
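To see why one-at-a-time substitution goes wrong, here is a small string-based analogy. It does not use SymPy at all: the two helper functions are hypothetical and work on plain text with the re module, purely to contrast sequential and simultaneous replacement.

```python
import re


def substitute_sequential(expr, replacements):
    """Apply replacements one after another; later ones see earlier results."""
    for name, value in replacements:
        expr = re.sub(r"\b%s\b" % re.escape(name), lambda m, v=value: v, expr)
    return expr


def substitute_simultaneous(expr, replacements):
    """Apply all replacements in a single pass over the original text."""
    table = dict(replacements)
    pattern = re.compile(r"\b(%s)\b" % "|".join(re.escape(n) for n in table))
    return pattern.sub(lambda m: table[m.group(1)], expr)


reps = [("x1", "cos(x1+x2)"), ("x2", "sin(x2+x3)"), ("x3", "x1")]
expr = "x1**2+x2**2+x3**2"

sequential = substitute_sequential(expr, reps)
simultaneous = substitute_simultaneous(expr, reps)
print(sequential)
print(simultaneous)
```

In the sequential version the x2 introduced by the first replacement is rewritten again by the second one, which is the same effect as in the loop over subs.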

Python: Calculating difference of values in a nested list by using a While Loop

I have a list that is composed of nested lists. Each nested list contains two values: a numeric value (the file creation date) and a string (the name of the file).
For example:
n_List = [[201609070736L, 'GOPR5478.MP4'], [201609070753L, 'GP015478.MP4'],[201609070811L, 'GP025478.MP4']]
The nested list is already sorted in order of ascending values (creation dates). I am trying to use a while loop to calculate the difference between each pair of consecutive numeric values.
For Example: 201609070753 - 201609070736 = 17
The goal is to use the time difference values as the basis for grouping the files.
The problem I am having is that when the count reaches the last value for len(n_List) it throws an IndexError because count+1 is out of range.
IndexError: list index out of range
I can't figure out how to work around this error. No matter what I try, the count is always out of range when it reaches the last value in the list.
Here is the While loop I've been using.
count = 0
while count <= len(n_List):
    full_path = source_folder + "/" + n_List[count][1]
    time_dif = n_List[count+1][0] - n_List[count][0]
    if time_dif < 100:
        f_List.write(full_path + "\n")
        count = count + 1
    else:
        f_List.write(full_path + "\n")
        f_List.close()
        f_List = open(source_folder + 'GoPro' + '_' + str(count) + '.txt', 'w')
        f_List.write(full_path + "\n")
        count = count + 1
PS. The only workaround I can think of is to assume that the last value will always be appended to the final group of files. So, when the count reaches len(n_List) - 1, I skip the time difference calculation and just automatically add that final value to the last group. While this will probably work most of the time, I can see edge cases where the final value in the list may need to go in a separate group.
I think using zip would make it easier to compute the differences.
res1,res2 = [],[]
for i,j in zip(n_List,n_List[1:]):
    target = res1 if j[0]-i[0] < 100 else res2
    target.append(i[1])
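n_List[len(n_List)] will always raise an index out of range error, because count starts at 0, not 1.
while count < len(n_List) - 1:
is the bound you need here: count < len(n_List) keeps n_List[count] in range, but the loop body also reads n_List[count+1]. The last item then has to be handled after the loop.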
FYI, here is the solution I used, thanks to #galaxyman for the help.
I handled the last value in the nested list by simply
adding it after the loop completes. I don't know if that's the most
elegant way to do it, but it works.
(Note: I'm only posting the function that uses the zip method suggested in the previous posts.)
def list_zip(gp_List):
    # output_path and gopro_folder are defined elsewhere in the script
    ffmpeg_list = open(output_path + '\\' + gp_List[0][1][0:8] + '.txt', 'a')
    for a,b in zip(gp_List,gp_List[1:]):
        full_path = gopro_folder + '\\' + a[1]
        time_dif = b[0]-a[0]
        if time_dif < 100:
            ffmpeg_list.write("file " + full_path + "\n")
        else:
            # gap too large: finish the current list file and start a new one
            ffmpeg_list.write("file " + full_path + "\n")
            ffmpeg_list.close()
            ffmpeg_list = open(output_path + '\\' + b[1][0:8] + '.txt', 'a')
    # the last entry never appears as the first element of a pair, so add it here
    last_val = gp_List[-1][1]
    ffmpeg_list.write("file " + gopro_folder + '\\' + last_val + "\n")
    ffmpeg_list.close()
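One way to avoid special-casing the last entry is to compare each item with the previous one instead of the next one. The sketch below uses a hypothetical helper, group_by_gap, and returns groups of file names rather than writing files. It keeps the raw integer difference from the posts above. If the values are timestamps encoded as YYYYMMDDHHMM (an assumption based on the sample data), that difference is not elapsed minutes across an hour boundary: 201609070811 - 201609070753 is 58, although only 18 minutes passed.

```python
def group_by_gap(entries, max_gap=100):
    """Group consecutive [stamp, name] entries whose stamp gap is below max_gap.

    Hypothetical helper: `entries` must already be sorted by stamp.
    """
    groups = []
    previous_stamp = None
    for stamp, name in entries:
        if previous_stamp is not None and stamp - previous_stamp < max_gap:
            groups[-1].append(name)  # close enough: same group as before
        else:
            groups.append([name])    # first item or large gap: new group
        previous_stamp = stamp
    return groups


n_list = [
    [201609070736, 'GOPR5478.MP4'],
    [201609070753, 'GP015478.MP4'],
    [201609070811, 'GP025478.MP4'],
    [201609071200, 'GOPR5479.MP4'],  # made-up extra entry to force a second group
]
groups = group_by_gap(n_list)
print(groups)
```

Because every entry is placed when it is visited, the final item lands in its own group whenever the gap before it is too large.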

Leading/prefix 0s in output of for loop

I am writing a for loop in my program that writes data to a file. I want the output to be formatted as follows:
frame001 + K.1
frame002 + K.2
...
frame099 + K.99
frame100 + K.100
So far I am doing
for f in range(1, 100):
    file.write('frame' + str(f) + ' + K.' + str(f) + '\n')
The K part comes out correctly as K.1 to K.100, but I don't know how to add the leading zeros so that the output runs from frame00F to frameFFF with the appropriate number of preceding zeros.
Using str.format:
>>> 'frame{0:03d} + K.{0}\n'.format(1)
'frame001 + K.1\n'
>>> 'frame{0:03d} + K.{0}\n'.format(100)
'frame100 + K.100\n'
BTW, range(1, 100) will not yield 100. If you want 100 to be included, that should be range(1, 101).
If you are using an old version of Python (2.5 or earlier), use the % operator (the string formatting operator) instead. Unlike str.format, it needs the argument repeated for each placeholder:
>>> 'frame%03d + K.%d\n' % (1, 1)
'frame001 + K.1\n'
>>> 'frame%03d + K.%d\n' % (100, 100)
'frame100 + K.100\n'
If you don't want to repeat arguments, you can pass a mapping instead, with a slightly different format specifier:
>>> 'frame%(i)03d + K.%(i)d\n' % {'i': 1}
'frame001 + K.1\n'
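Put together with the corrected range, the whole loop might look like the sketch below. The function name write_frames is made up, and an in-memory io.StringIO buffer stands in for the real output file.

```python
import io


def write_frames(out, count=100):
    """Write 'frameNNN + K.N' lines for 1..count to a file-like object."""
    for f in range(1, count + 1):  # count + 1 so that `count` itself is included
        out.write('frame{0:03d} + K.{0}\n'.format(f))


buffer = io.StringIO()  # stand-in for an open text file
write_frames(buffer)
lines = buffer.getvalue().splitlines()
print(lines[0])   # frame001 + K.1
print(lines[-1])  # frame100 + K.100
```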

Swapping lines in a text

How to change this:
fv (x,y,z) begin print x;;; print y ;;; return x + y + z end;
x = fv(2,34,5)
g (x) begin y = x + 45 ;;; return y end;
z = g(23)
r = 53
h (x,y,z,r) begin print x;;; print y ;;; print z;;;print r;;;return x + y + z end;
To this:
def fv (x,y,z) :
    print x
    print y
    return x + y + z
x = fv(2,34,5)
def g (x) :
    y = x + 45
    return y
z = g(23)
r = 53
def h (x,y,z,r) :
    print x
    print y
    print z
    print r
    return x + y + z
I'm not asking for full code or for someone to do my homework; I only need advice, samples, or a pointer on how to approach this.
Since you're only looking for a starting hint, and this is probably homework...
Do a replace() on the various line-enders (e.g. "begin", ";;;", "end;") converting them to "\n", with possibly a ':' in one of them.
Split the resulting text into lines with .split("\n")
Walk the lines to adjust the line prefixes ("def ", indentation)
Put the lines back together using "\n".join(...)
Write the output text
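The steps above can be sketched roughly as follows. This is one possible variation, not a full solution: the function name convert is made up, it splits on the markers instead of replacing them first, and it assumes every function definition sits on a single line containing " begin " and ending in "end;".

```python
def convert(source):
    """Turn 'name (args) begin a;;; b end;' lines into indented def blocks."""
    out = []
    for line in source.splitlines():
        line = line.strip()
        if ' begin ' in line and line.endswith('end;'):
            # Drop the trailing 'end;' and separate the header from the body.
            header, body = line[:-len('end;')].split(' begin ', 1)
            out.append('def ' + header.strip() + ' :')
            for statement in body.split(';;;'):
                statement = statement.strip()
                if statement:
                    out.append('    ' + statement)
        elif line:
            out.append(line)  # ordinary statement: keep unchanged
    return '\n'.join(out)


sample = "g (x) begin y = x + 45 ;;; return y end;\nz = g(23)"
converted = convert(sample)
print(converted)
```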
This could get you started:
for line in code:
    line = line.replace( "begin", " :\n" + " " * 4 ).replace( ";;;", "\n" + " " * 4 ).replace( "end;", "\n" + " " * 4 )
Look at the sed command line tool, for instance. It's a bit hard to know what tools you're expected/allowed to use ...
Well, for starters you open() the file and use its readlines() method to get it into a list of strings.
From there you could iterate through that list and use a combination of split(";;;") calls, or something more complex from the re module, on the strings.
This might be overkill, but take a look at the Ply parser project. You will have to learn about regular expressions and Backus-Naur form.
