Damerau-Levenshtein distance code throwing errors?


For some reason, when I try to run the following code (I'm using Sublime Text 2), I get an "Invalid Syntax" error on line 18. I found the code here and it apparently should work, so I have no idea why it doesn't. Any tips?
Here is the code:
def damerau_levenshtein_distance(word1, word2):
    distances = {}
    len_word1 = len(word1)
    len_word2 = len(word2)
    for i in xrange(-1, (len_word1 + 1)):
        distances[(i,-1)] = i + 1
    for j in xrange(-1, (len_word2 + 1)):
        distances[(-1,j)] = j + 1
    for i in xrange(len_word1):
        for j in xrange(len_word2):
            if word1[i] == word2[j]:
                distance_total = 0
            else:
                distance_total = 1
            distances[(i, j)] = min(
                distances[(i-1,j)] + 1, # deletion
                distances[(i,j-1)] + 1 # insertion
                distances[(i-1,j-1)] + distance_total #substitution
            )
            if i and j and word1[i] == word2[j-1] and word1[i-1] == word2[j]:
                distances[(i,j)] = min(distances[(i,j)], distances[i-2,j-2] + distance_total) # transposition
    return distances[len_word1-1,len_word2-1]

There is an error: the insertion line inside min(...) is missing its trailing comma. It should be:
    distances[(i,j-1)] + 1, # insertion
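For reference, here is a sketch of the question's function with the comma restored. It is written in Python 3 syntax (range instead of xrange), which is an assumption about your interpreter; the variable name cost is mine. As far as I can tell, this recurrence is the restricted form of Damerau-Levenshtein (often called optimal string alignment), since it only considers swaps of adjacent characters.

```python
def damerau_levenshtein_distance(word1, word2):
    """Edit distance with adjacent transpositions, following the question's recurrence."""
    distances = {}
    len_word1 = len(word1)
    len_word2 = len(word2)
    # Border row and column: cost of building a prefix from an empty string.
    for i in range(-1, len_word1 + 1):
        distances[(i, -1)] = i + 1
    for j in range(-1, len_word2 + 1):
        distances[(-1, j)] = j + 1

    for i in range(len_word1):
        for j in range(len_word2):
            cost = 0 if word1[i] == word2[j] else 1
            distances[(i, j)] = min(
                distances[(i - 1, j)] + 1,         # deletion
                distances[(i, j - 1)] + 1,         # insertion (note the comma)
                distances[(i - 1, j - 1)] + cost,  # substitution
            )
            if i and j and word1[i] == word2[j - 1] and word1[i - 1] == word2[j]:
                distances[(i, j)] = min(
                    distances[(i, j)],
                    distances[(i - 2, j - 2)] + cost,  # transposition
                )
    return distances[(len_word1 - 1, len_word2 - 1)]


print(damerau_levenshtein_distance("ab", "ba"))
print(damerau_levenshtein_distance("kitten", "sitting"))
```

Swapping two adjacent letters counts as a single edit here, whereas plain Levenshtein would count it as two.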

Looks like you've fixed this issue, but if you don't want to implement all of these yourself, you can use the jellyfish package from PyPI: https://pypi.python.org/pypi/jellyfish. I've used it with great success in the past.
It contains several distance functions, including Damerau-Levenshtein distance.

Related

NameError: Name 'ShiftArr' is not defined

# indices to calculate pair-wise products (H, V, D1, D2)
shifts = [[0,1], [1,0], [1,1], [-1,1]]
# calculate pairwise components in each orientation
for itr_shift in range(1, len(shifts) + 1):
    OrigArr = structdis
    reqshift = shifts[itr_shift-1] # shifting index
    for i in range(structdis.shape[0]):
        for j in range(structdis.shape[1]):
            if(i + reqshift[0] >= 0 and i + reqshift[0] < structdis.shape[0] \
            and j + reqshift[1] >= 0 and j + reqshift[1] < structdis.shape[1]):
                ShiftArr[i, j] = OrigArr[i + reqshift[0], j + reqshift[1]]
            else:
                ShiftArr[i, j] = 0
If I try to run the code, I get the following error:
NameError: Name 'ShiftArr' is not defined
How can I solve this error?

By the looks of things, you have not defined ShiftArr before using it, which is what the error is saying.
You first use ShiftArr inside your nested loop, but nowhere before that do you have an assignment such as ShiftArr = ...
Defining it before your first for loop should solve the issue. Note that because you index it as ShiftArr[i, j], an empty list (ShiftArr = []) is not enough; assuming structdis is a NumPy array, something like ShiftArr = np.zeros(structdis.shape) gives it the right shape. It's a little difficult to understand what you're trying to do, as your variable names aren't very informative; more descriptive names might help you when trying to fix errors in your code.
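To illustrate the idea of allocating the output before filling it, here is a dependency-free sketch that uses nested lists in place of the array the question appears to use. The names shift_grid and shifted are made up for the example, and the small structdis grid is sample data, not the asker's.

```python
def shift_grid(grid, shift):
    """Return a grid whose cell (i, j) holds grid[i + di][j + dj], or 0 if that is out of range."""
    rows, cols = len(grid), len(grid[0])
    di, dj = shift
    # Allocate the output before the loops; this is the step missing in the question.
    shifted = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            si, sj = i + di, j + dj
            if 0 <= si < rows and 0 <= sj < cols:
                shifted[i][j] = grid[si][sj]
    return shifted


# Sample data standing in for the question's structdis.
structdis = [[1, 2, 3],
             [4, 5, 6]]
shifts = [[0, 1], [1, 0], [1, 1], [-1, 1]]
for shift in shifts:
    print(shift, shift_grid(structdis, shift))
```

Creating a fresh output for every shift also avoids overwriting the result of the previous orientation.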

using jacobi method to solve laplace equation PYTHON

I am fairly new to Python and am trying to recreate the electric potential in a metal box using the Laplace equation and the Jacobi method. I have written code that seems to work initially, but I am getting the error IndexError: index 8 is out of bounds for axis 0 with size 7 and cannot figure out why. Any help would be awesome!
from visual import*
from visual.graph import*
import numpy as np
lenx = leny = 7
delta = 2
vtop = [-1,-.67,-.33,.00,.33,.67,1]
vbottom = [-1,-.67,-.33,.00,.33,.67,1]
vleft = -1
vright = 1
vguess= 0
x,y = np.meshgrid(np.arange(0,lenx), np.arange(0,leny))
v = np.empty((lenx,leny))
v.fill(vguess)
v[(leny-1):,:] = vtop
v [:1,:] = vbottom
v[:,(lenx-1):] = vright
v[:,:1] = vleft
maxit = 500
for iteration in range (0,maxit):
    for i in range(1,lenx):
        for j in range(1,leny-1):
            v[i,j] = .25*(v[i+i][j] + v[i-1][j] + v[i][j+1] + v[i][j-1])
print v

From a quick glance at your code, the indexing error comes from this line, which can be changed as follows:
# you had v[i+i][j] instead of v[i+1][j]
v[i,j] = .25*(v[i+1][j] + v[i-1][j] + v[i][j+1] + v[i][j-1])
You simply added an extra i to your index, which goes out of range once i reaches 4 (index 8 on an axis of size 7). Note also that the outer loop should stop at lenx-1, i.e. for i in range(1,lenx-1), otherwise v[i+1] is still out of bounds when i is 6.
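For what it's worth, here is a minimal sketch of a Jacobi sweep over the interior points only, using plain lists so it runs without extra packages. The function name jacobi_laplace is made up, and the boundary values are copied from the question. One difference worth knowing: the question updates v in place, which is closer to Gauss-Seidel, while this sketch writes each sweep into a separate copy, as a textbook Jacobi iteration would.

```python
def jacobi_laplace(grid, iterations=500):
    """Relax the interior points of grid toward a solution of Laplace's equation."""
    rows, cols = len(grid), len(grid[0])
    v = [row[:] for row in grid]
    for _ in range(iterations):
        # Jacobi reads only the previous iterate, so write into a fresh copy.
        new_v = [row[:] for row in v]
        for i in range(1, rows - 1):       # interior rows only
            for j in range(1, cols - 1):   # interior columns only
                new_v[i][j] = 0.25 * (v[i + 1][j] + v[i - 1][j]
                                      + v[i][j + 1] + v[i][j - 1])
        v = new_v
    return v


# Boundary values taken from the question; the interior starts at 0.
size = 7
edge = [-1, -.67, -.33, .00, .33, .67, 1]
grid = [[0.0] * size for _ in range(size)]
grid[0] = edge[:]            # bottom row
grid[size - 1] = edge[:]     # top row
for row in grid:
    row[0] = -1              # left wall
    row[size - 1] = 1        # right wall

solution = jacobi_laplace(grid)
for row in solution:
    print(" ".join("%6.3f" % value for value in row))
```

Because both loops stop one short of the last index, i+1 and j+1 never leave the grid.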

Find length of a string that includes its own length?

I want to get the length of a string including a part of the string that represents its own length without padding or using structs or anything like that that forces fixed lengths.
So for example I want to be able to take this string as input:
"A string|"
And return this:
"A string|11"

On the basis of the OP tolerating such an approach (and to provide an implementation technique for the eventual python answer), here's a solution in Java.
final String s = "A String|";
int n = s.length(); // `length()` returns the length of the string.
String t; // the result
do {
    t = s + n; // append the stringified n to the original string
    if (n == t.length()){
        return t; // string length no longer changing; we're good.
    }
    n = t.length(); // n must hold the total length
} while (true); // round again
The problem, of course, is that appending n changes the string length. But luckily, the length only ever increases or stays the same, so it converges very quickly, thanks to the logarithmic growth of the number of digits in n. In this particular case, the attempted values of n are 9, 10, and 11. And that's a pernicious case.

A simple solution is :
def addlength(string):
    n1=len(string)
    n2=len(str(n1))+n1
    n2 += len(str(n2))-len(str(n1)) # a carry can arise
    return string+str(n2)
Since a possible carry will increase the length by at most one unit.
Examples :
In [2]: addlength('a'*8)
Out[2]: 'aaaaaaaa9'
In [3]: addlength('a'*9)
Out[3]: 'aaaaaaaaa11'
In [4]: addlength('a'*99)
Out[4]: 'aaaaa...aaa102'
In [5]: addlength('a'*999)
Out[5]: 'aaaa...aaa1003'

Here is a simple python port of Bathsheba's answer :
def str_len(s):
    n = len(s)
    t = ''
    while True:
        t = s + str(n)
        if n == len(t):
            return t
        n = len(t)
This is a much more clever and simple way than anything I was thinking of trying!
Suppose you had s = 'abcdefgh|', so n starts at 9. On the first pass through, t = 'abcdefgh|9'.
Since n != len(t) (which is now 10), n becomes 10 and it goes through again: t = 'abcdefgh|' + str(n) with str(n) = '10', so you have 'abcdefgh|10', which is still not quite right, because its length is 11. Now n = len(t) = 11, and on the next pass t = 'abcdefgh|11', whose length matches n, so it is returned. Pretty clever solution!

It is a tricky one, but I think I've figured it out.
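Done in a hurry in Python 2.7, please fully test. This should handle strings up to 996 characters (from 997 characters on, the appended number needs four digits, which the code below does not cover):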
import sys
orig = sys.argv[1]
origLen = len(orig)
if (origLen >= 98):
    extra = str(origLen + 3)
elif (origLen >= 8):
    extra = str(origLen + 2)
else:
    extra = str(origLen + 1)
final = orig + extra
print final
Results of very brief testing
C:\Users\PH\Desktop>python test.py "tiny|"
tiny|6
C:\Users\PH\Desktop>python test.py "myString|"
myString|11
C:\Users\PH\Desktop>python test.py "myStringWith98Characters.........................................................................|"
myStringWith98Characters.........................................................................|101

Just find the length of the string. Then iterate over the number of digits the appended length could have. For each candidate digit count, check whether the initial string length plus that digit count gives a total with exactly that many digits.
def get_length(s):
    s = s + "|"
    result = ""
    len_s = len(s)
    i = 1
    while True:
        candidate = len_s + i
        if len(str(candidate)) == i:
            result = s + str(len_s + i)
            break
        i += 1
    return result

This code gives the result.
I used a few variables, but in the end it gives the output you want:
def len_s(s):
    s = s + '|'
    b = len(s)
    z = s + str(b)
    length = len(z)
    new_s = s + str(length)
    new_len = len(new_s)
    return s + str(new_len)
s = "A string"
print len_s(s)

Here's a direct equation for this (so it's not necessary to construct the string). If s is the string, then the length of the string including the length of the appended length will be:
L1 = len(s) + 1 + int(log10(len(s) + 1 + int(log10(len(s)))))
The idea here is that a direct calculation is only problematic when the appended length will push the length past a power of ten; that is, at 9, 98, 99, 997, 998, 999, 9996, etc. To work this through, 1 + int(log10(len(s))) is the number of digits in the length of s. If we add that to len(s), then 9->10, 98->100, 99->101, etc, but still 8->9, 97->99, etc, so we can push past the power of ten exactly as needed. That is, adding this produces a number with the correct number of digits after the addition. Then do the log again to find the length of that number and that's the answer.
To test this:
from math import log10
def find_length(s):
    L1 = len(s) + 1 + int(log10(len(s) + 1 + int(log10(len(s)))))
    return L1
# test, just looking at lengths around 10**n
for i in range(9):
    for j in range(30):
        L = abs(10**i - j + 10) + 1
        s = "a"*L
        x0 = find_length(s)
        new0 = s+`x0`
        if len(new0)!=x0:
            print "error", len(s), x0, log10(len(s)), log10(x0)
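The same idea can be sketched in Python 3 with integer-only digit counting. Note the swap: instead of 1 + int(log10(n)), this version counts digits with len(str(n)), which is my substitution to sidestep floating-point rounding and the log10(0) error for an empty string. The helper names digits, total_length and append_length are made up for the example.

```python
def digits(n):
    """Number of decimal digits in a non-negative integer."""
    return len(str(n))


def total_length(n):
    """Length of an n-character string once its own total length is appended."""
    # n + digits(n) has the same digit count as the final total.
    return n + digits(n + digits(n))


def append_length(s):
    """Return s followed by the length of the whole resulting string."""
    return s + str(total_length(len(s)))


# Check: the appended number must equal the length of the whole result.
for n in range(1200):
    result = append_length("a" * n)
    assert len(result) == int(result[n:]), n

print(append_length("A string|"))
```

The loop covers the awkward lengths around 9, 98 and 99, and 997 to 999, where the appended number gains a digit.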

correct formation of if elif else statement in Python

Please find the below code block in python 2.7.
for i in range(len(p_routes)):
    if len(p_routes[i]) > 2 :
        if p_routes[i][2] == 'No Backup':
            K = K + 1
            for z in range(len(p_routes[i])):
                nbup.write(K + 1 , z , p_routes[i][z])
        elif p_routes[i][0][0] == 'E' :
            L = L + 1
            for z in range(len(p_routes[i])):
                ex.write(L, z , (p_routes[i][z])
        elif p_routes[i][0][0] == 'G':
            M = M + 1
            for z in range(len(p_routes[i]))
                gh.write(M ,z, p_routes[i][z])
    else len(p_routes[i]) < 2:
        pass
        print "\nFor some reason. "
Well, I am getting a syntax error pointing at elif p_routes[i][0][0] == 'G': . I couldn't figure out why this error occurs, as I believe there is no syntax error in this line.
ex and gh are two Excel sheet variables created before this code block, and p_routes is a list of lists (two levels deep). The format is like p_routes = [['prov1' , 'address1' , 'No Backup'] , ['prov2', 'address2', 'Back1', 'Back2' ]]
As you might have gathered, the inner lists vary in length. Any advice would be much appreciated. Sorry for the silly question, but I did a lot of searching and reformatted my if..else block in a number of ways, and every time I get this error.
By the way, previously the syntax error was reported at L = L + 1. Funny! Then I changed the type of L with L = int(L), and that error went away.

Notes:
Always close every ( with a matching ).
else runs only when none of the conditions above matched, so you should not give the else statement a condition.
Don't forget the : at the end of if, elif, else and for lines.
Changes to your code:
for i in range(len(p_routes)):
    if len(p_routes[i]) > 2 :
        if p_routes[i][2] == 'No Backup':
            K = K + 1
            for z in range(len(p_routes[i])):
                nbup.write(K + 1 , z , p_routes[i][z])
        elif p_routes[i][0][0] == 'E' :
            L = L + 1
            for z in range(len(p_routes[i])):
                ex.write(L, z , (p_routes[i][z]))
        elif p_routes[i][0][0] == 'G':
            M = M + 1
            for z in range(len(p_routes[i])):
                gh.write(M ,z, p_routes[i][z])
    else :
        pass
        print "\nFor some reason. "

First off, as Vignesh pointed out, your error is actually on the previous line, as you forgot to close your parenthesis ( ).
Second, the else clause of an if, elif, else structure does not take a condition.
Here is a video I made a while ago on how selection works in Python, linked to the relevant time
(May not be relevant)
Also keep in mind, with your current logic, what happens if len(p_routes[i]) is 2? You currently only check whether it's less than 2 or greater than 2.

With syntax errors, it is always wise to have a look at the preceding line to make sure it is also correct. As you missed the closing ), Python kept looking for it on the next line.
There are a number of places where you could make the code a bit cleaner. For example, it is not necessary to keep using range(len(x)) when you can just iterate over the list itself.
Hopefully you find the following ideas helpful:
for route in p_routes:
    length = len(route)
    if length > 2 :
        if route[2] == 'No Backup':
            K += 1
            for z in range(length):
                nbup.write(K + 1, z, route[z])
        elif route[0][0] == 'E':
            L += 1
            for z in range(length):
                ex.write(L, z, route[z])
        elif route[0][0] == 'G':
            M += 1
            for z in range(length):
                gh.write(M, z, route[z])
    elif length == 2:
        print "It is equal to 2"
    else:
        print "It must be less than 2"
Note, if x > 2 followed by an else, the else would imply the value is <= 2.
It is also possible to add one to a variable as follows: L += 1, this is shorthand for L = L + 1.

Inbuilt Python Function for String Comparison like N-gram

Is there any built-in function in Python that performs a string comparison like Ngram.Compare('text','text2')? I don't want to install the N-gram module. I tried all the public and private functions which I got by doing dir('text').
I want to get a percentage match when comparing two strings.

You want the Levenshtein distance which is implemented through
http://pypi.python.org/pypi/python-Levenshtein/
Not wanting to install something means: you have to write the code yourself.
http://en.wikipedia.org/wiki/Levenshtein_distance

difflib in the standard library.
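A minimal sketch of how difflib could give a percentage match, assuming that SequenceMatcher's ratio (twice the number of matching characters divided by the combined length) is an acceptable notion of similarity for your case. The wrapper name percent_match is made up for the example.

```python
from difflib import SequenceMatcher


def percent_match(a, b):
    """Similarity of two strings as a percentage, based on SequenceMatcher.ratio()."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


print(percent_match("text", "text2"))
print(percent_match("spa", "spam"))
```

This is not an n-gram comparison, so the scores will not match what the N-gram module reports, but it needs nothing outside the standard library.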
You can also do a Levenshtein distance:
def lev(seq1, seq2):
    oneago = None
    thisrow = range(1, len(seq2) + 1) + [0]
    for x in xrange(len(seq1)):
        twoago, oneago, thisrow = oneago, thisrow, [0] * len(seq2) + [x + 1]
        for y in xrange(len(seq2)):
            delcost = oneago[y] + 1
            addcost = thisrow[y - 1] + 1
            subcost = oneago[y - 1] + (seq1[x] != seq2[y])
            thisrow[y] = min(delcost, addcost, subcost)
    return thisrow[len(seq2) - 1]
def di(seq1,seq2):
    return float(lev(seq1,seq2))/min(len(seq1),len(seq2))
print lev('spa','spam')
print di('spa','spam')
