Comparing Strings - python

Is there any built-in function in Python that can return the number of matching characters in two strings? For example:
INPUT:
TICK TOCK
CAT DOG
APPLE APPLES
OUTPUT:
3
0
5
The words "TICK" and "TOCK" have a score of 3, since three characters (T, C, K) are the same. Similarly, "CAT" and "DOG" score 0, since no letters match.
I am new to Python, so please help me with examples.

Here's a version using list comprehensions:
[x == y for (x, y) in zip("TICK", "TOCK")].count(True)
Or, shorter (using operator). In Python 3, map returns an iterator, so wrap it in list() before calling count:
import operator
list(map(operator.eq, "TICK", "TOCK")).count(True)
According to @Kabie, <expr>.count(True) can be replaced by sum(<expr>) in both versions.
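Wrapping the map/operator.eq comparison in a small function makes it easy to run the question's three sample pairs. This is a sketch; the name count_positional_matches is hypothetical. Note that it counts characters at the same index only.

```python
import operator

def count_positional_matches(s1, s2):
    """Count the positions at which s1 and s2 hold the same character."""
    # map() over two iterables stops at the shorter one, so the extra
    # trailing "S" of "APPLES" is simply ignored.
    return sum(map(operator.eq, s1, s2))

# the sample input from the question, one pair per line
for line in ["TICK TOCK", "CAT DOG", "APPLE APPLES"]:
    first, second = line.split()
    print(count_positional_matches(first, second))  # 3, then 0, then 5
```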

There is no built-in function, but you can do it with a couple of simple expressions:
>>> A, B = sorted("APPLE APPLES".split(), key=len)
>>> len([e for e in A if e in B])
5

If the position and order of the characters matter, the accepted answer is enough. The problem is that it will not work if they don't.
If position is not important but order is, you could write a function that returns the length of the longest common subsequence. Here is a sample implementation:
def lcs(string1, string2):
    m = len(string1)
    n = len(string2)
    # build each row separately; [[0] * (n + 1)] * (m + 1) would make every row the same list
    C = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if string1[i - 1] == string2[j - 1]:
                C[i][j] = C[i - 1][j - 1] + 1
            else:
                C[i][j] = max(C[i][j - 1], C[i - 1][j])
    return C[m][n]
If neither position nor order matters, you can use collections.Counter (Python 2.7/3.1+; or http://code.activestate.com/recipes/576611/) like so:
from collections import Counter
def f(string1, string2):
    set_string1 = Counter(string1)
    set_string2 = Counter(string2)
    # get common characters (each with the smaller of its two counts)
    common = set_string1 & set_string2
    # return the sum of the number of occurrences for each character
    return sum(common.values())
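The three answers above read "matching characters" in three different ways (same position, same order, or just shared letters). To see how they differ, here is a self-contained sketch that runs all three on the question's pair and on a reversed pair; the function names are hypothetical.

```python
from collections import Counter

def positional(s1, s2):
    # same character at the same index
    return sum(a == b for a, b in zip(s1, s2))

def lcs_length(s1, s2):
    # dynamic-programming table, one fresh row per i (no aliasing)
    table = [[0] * (len(s2) + 1) for _ in range(len(s1) + 1)]
    for i in range(1, len(s1) + 1):
        for j in range(1, len(s2) + 1):
            if s1[i - 1] == s2[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i][j - 1], table[i - 1][j])
    return table[len(s1)][len(s2)]

def multiset(s1, s2):
    # shared characters counted with multiplicity, ignoring order
    return sum((Counter(s1) & Counter(s2)).values())

for pair in [("TICK", "TOCK"), ("TICK", "KCIT")]:
    print(pair, positional(*pair), lcs_length(*pair), multiset(*pair))
```

For TICK/TOCK all three give 3; for TICK/KCIT they give 0, 1 and 4, so the choice of function depends on which reading you need.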

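Yes. Import the operator module:
import operator
and use the operator.eq function like this:
operator.eq(string1, string2)
Note that this compares the two strings as a whole and returns True or False; to count matching characters, apply it pairwise, e.g. sum(map(operator.eq, string1, string2)).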

Hope this helps:
def CommonLetters(s1, s2):
    # drop whitespace, then keep the letters of s1 that also occur in s2
    l1 = list(''.join(s1.split()))
    l2 = list(''.join(s2.split()))
    return [x for x in l1 if x in l2]
x = CommonLetters('cerberus', 'atorb')
print(len(x))

Related

How to create an iterator with Python itertools that returns progressively larger repeats? [duplicate]

I would like to make an alphabetical list for an application, similar to the columns of an Excel worksheet.
A user would input the number of cells and I would like to generate the list.
For example, if a user needs 54 cells, I would generate
'a','b','c',...,'z','aa','ab','ac',...,'az', 'ba','bb'
I can generate the list from [ref]
from string import ascii_lowercase
L = list(ascii_lowercase)
How do I stitch them together?
A similar question for PHP has been asked here. Does someone have the Python equivalent?
Use itertools.product.
from string import ascii_lowercase
import itertools
def iter_all_strings():
    for size in itertools.count(1):
        for s in itertools.product(ascii_lowercase, repeat=size):
            yield "".join(s)
for s in iter_all_strings():
    print(s)
    if s == 'bb':
        break
Result:
a
b
c
d
e
...
y
z
aa
ab
ac
...
ay
az
ba
bb
This has the added benefit of going well beyond two-letter combinations. If you need a million strings, it will happily give you three and four and five letter strings.
Bonus style tip: if you don't like having an explicit break inside the bottom loop, you can use islice to make the loop terminate on its own:
for s in itertools.islice(iter_all_strings(), 54):
    print(s)
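For the question's use case (a user asks for a number of cells and gets a list back), the generator and islice can be wrapped into one function. A minimal sketch; column_labels is a hypothetical name.

```python
import itertools
from string import ascii_lowercase

def iter_all_strings():
    # a..z, then aa..zz, then aaa..zzz, and so on forever
    for size in itertools.count(1):
        for letters in itertools.product(ascii_lowercase, repeat=size):
            yield "".join(letters)

def column_labels(n):
    """Return the first n labels as a list."""
    return list(itertools.islice(iter_all_strings(), n))

labels = column_labels(54)
print(labels[:3], labels[25:28], labels[-2:])
# ['a', 'b', 'c'] ['z', 'aa', 'ab'] ['ba', 'bb']
```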
You can use a list comprehension.
from string import ascii_lowercase
L = list(ascii_lowercase) + [letter1+letter2 for letter1 in ascii_lowercase for letter2 in ascii_lowercase]
Following @Kevin's answer:
from string import ascii_lowercase
import itertools
# define the generator itself
def iter_all_strings():
    size = 1
    while True:
        for s in itertools.product(ascii_lowercase, repeat=size):
            yield "".join(s)
        size += 1
The code below lets you fetch one string at a time, which can be used, for example, to generate unique labels.
# create the generator once
gen = iter_all_strings()
def label_gen():
    # advance the shared generator by one step
    return next(gen)
# call it whenever needed
print(label_gen())  # a
print(label_gen())  # b
print(label_gen())  # c
I ended up writing my own.
I think it can create any number of letters.
def AA(n, s):
    r = n % 26
    r = r if r > 0 else 26
    n = (n - r) // 26
    s = chr(64 + r) + s
    if n > 26:
        s = AA(n, s)
    elif n > 0:
        s = chr(64 + n) + s
    return s
Here n is the quantity (cell number), r the remainder (one of the 26 letters A-Z) and s the string built so far.
To print the list:
def uprint(nc):
    for x in range(1, nc + 1):
        print(AA(x, '').lower())
I used VBA before converting it to Python:
Function AA(n, s)
r = n Mod 26
r = IIf(r > 0, r, 26)
n = (n - r) / 26
s = Chr(64 + r) & s
If n > 26 Then
s = AA(n, s)
ElseIf n > 0 Then
s = Chr(64 + n) & s
End If
AA = s
End Function
Using neo's insight on a while loop.
For a given iterable of characters in ascending order, e.g. 'abcd...', n is the position of the label, starting with 1 for the first one.
def char_label(n, chars):
    indexes = []
    while n:
        residual = n % len(chars)
        if residual == 0:
            residual = len(chars)
        indexes.append(residual)
        n = (n - residual)
        n = n // len(chars)
    indexes.reverse()
    label = ''
    for i in indexes:
        label += chars[i-1]
    return label
You can then print the first n labels with a for loop:
my_chrs = 'abc'
n = 15
for i in range(1, n+1):
    print(char_label(i, my_chrs))
or build a list comprehension etc...
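The while loop above is bijective base-k numeration: there is no zero digit, so 'z' is followed by 'aa'. The same idea can be written compactly by shifting to 0-based with divmod(n - 1, base) at each step. A sketch; label_for is a hypothetical name.

```python
from string import ascii_lowercase

def label_for(n, chars=ascii_lowercase):
    """Label for position n >= 1: 1 -> 'a', 26 -> 'z', 27 -> 'aa' (with a-z)."""
    base = len(chars)
    digits = []
    while n > 0:
        # subtract 1 first so each step yields a digit 0..base-1 with no "zero" letter
        n, remainder = divmod(n - 1, base)
        digits.append(chars[remainder])
    return "".join(reversed(digits))

print([label_for(i) for i in (1, 26, 27, 702, 703)])
# ['a', 'z', 'aa', 'zz', 'aaa']
```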
To print a range of Excel-style cell names (the same works with lowercase characters):
Uppercase:
from string import ascii_uppercase
import itertools
def iter_all_columns():
    # A, B, ..., Z, AA, AB, ...
    for size in itertools.count(1):
        for letters in itertools.product(ascii_uppercase, repeat=size):
            yield "".join(letters)
column_range = ['A', 'B']  # first and last column; iteration always starts at 'A'
row_range = [1, 2]         # first and last row, inclusive
for column in iter_all_columns():
    for row in range(row_range[0], row_range[1] + 1):
        print(column + str(row))
    if column == column_range[1]:
        break
Result:
A1
A2
B1
B2
In two lines (plus an import; the := operator needs Python 3.8+, and this only covers labels up to ZZ, i.e. count <= 702):
from string import ascii_uppercase as ABC
count = 100
ABC+=' '
[(ABC[x[0]] + ABC[x[1]]).strip() for i in range(count) if (x:= divmod(i-26, 26))]
Wrap it in a function/lambda if you need to reuse.
code:
alphabet = ["a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z"]
for i in range(len(alphabet)):
    for a in range(len(alphabet)):
        print(alphabet[i] + alphabet[a])
result:
aa
ab
ac
ad
ae
af
ag
ah
ai
aj
ak
al
am
...

Check if string contains a certain amount of words of another string

Say we have a string 1 A B C D E F and a string 2 B D E (the letters are just for the demo; in reality they are words). Now I would like to find out whether there are any n consecutive "words" from string 2 in string 1. To convert the strings to "words", I'd use string.split().
For example for n equals 2, I would like to check whether B D or D E is - in this order - in string 1. B D is not in this order in the string, but D E is.
Does anyone see a pythonic way of doing this?
I do have a solution for n equals 2, but realized that I need it for arbitrary n. Also, it is not particularly beautiful:
def string_contains_words_of_string(words_str, words_to_check_str):
    words = words_str.split()
    words_to_check = words_to_check_str.split()
    found_word_index = None
    for word in words:
        start = 0 if found_word_index is None else found_word_index + 1
        for i, word_to_check in enumerate(words_to_check[start:]):
            if word_to_check == word:
                if found_word_index is not None:
                    return True
                found_word_index = i
                break
        else:
            found_word_index = None
    return False
This is easy with a regex:
>>> import re
>>> st1='A B C D E F'
>>> st2='B D E'
>>> n=2
>>> pat=r'(?=({}))'.format(r'\s+'.join(r'\w+' for i in range(n)))
>>> print([(s, s in st1) for s in re.findall(pat, st2)])
[('B D', False), ('D E', True)]
The key is to use a zero-width lookahead to find overlapping matches in the string. So:
>>> re.findall('(?=(\\w+\\s+\\w+))', 'B D E')
['B D', 'D E']
Now build that for n repetitions of the word found by \w+ with:
>>> n=2
>>> r'(?=({}))'.format(r'\s+'.join(r'\w+' for i in range(n)))
'(?=(\\w+\\s+\\w+))'
Since you have two strings, use Python's in operator to pair each regex match s with whether it occurs in the target string.
Of course, if you want to do this without a regex, just produce the substrings of n consecutive words:
>>> li=st2.split()
>>> n=2
>>> [(s, s in st1) for s in (' '.join(li[i:i+n]) for i in range(len(li)-n+1))]
[('B D', False), ('D E', True)]
And if you want the index (either method) you can use str.find:
>>> [(s, st1.find(s)) for s in (' '.join(li[i:i+n]) for i in range(len(li)-n+1))
... if s in st1]
[('D E', 6)]
For a regex that goes word by word, make sure you use a word boundary anchor:
>>> st='wordW wordX wordY wordZ'
>>> re.findall(r'(?=(\b\w+\s\b\w+))', st)
['wordW wordX', 'wordX wordY', 'wordY wordZ']
You could build n-grams like so:
a = 'this is an example, whatever'.split()
b = 'this is another example, whatever'.split()
def ngrams(string, n):
    # string is a list of words; zipping its n shifted copies yields n-word tuples
    return set(zip(*[string[i:] for i in range(n)]))
def common_ngrams(string1, string2, n):
    return ngrams(string1, n) & ngrams(string2, n)
results:
print(common_ngrams(a, b, 2))
{('this', 'is'), ('example,', 'whatever')}
print(common_ngrams(a, b, 1))
{('this',), ('is',), ('example,',), ('whatever',)}
Note that the tricky bit is the zip call in the ngrams function:
zip(*[string[i:] for i in range(n)])
This is essentially the same as
zip(string, string[1:], string[2:])
for n = 3.
Also note that we're using sets of tuples, so finding the common n-grams is a set intersection, which on average takes time proportional to the smaller set.
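The substring test s in st1 used by the earlier answer works on characters, so it can report a match that straddles word boundaries (for example 'D E' is a substring of 'AD EF'). Comparing tuples of words, as the n-gram answer does, avoids that. The sketch below turns the n-gram idea into the yes/no check the question asks for; the function names are hypothetical.

```python
def ngram_set(words, n):
    # every run of n consecutive words, as a tuple
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def shares_n_consecutive_words(text1, text2, n):
    """True if some run of n consecutive words of text2 also occurs in text1."""
    return bool(ngram_set(text1.split(), n) & ngram_set(text2.split(), n))

print(shares_n_consecutive_words("A B C D E F", "B D E", 2))  # True ('D', 'E')
print(shares_n_consecutive_words("A B C D E F", "B D E", 3))  # False
print(shares_n_consecutive_words("AD EF", "D E", 2))          # False
```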
Let's say you have two strings (this works just as well when the words are longer than one letter):
a = 'this is a beautiful day'
b = 'this day is awful'
Then to get all the words of b that also belong to a, you write
x = [x for x in b.split() if x in a.split()]
Now x contains (after one line of code)
['this', 'day', 'is']
Then you check which consecutive runs of x (from 0 up to len(x)) occur in a:
for i in range(len(x)):
    for j in range(i + 1, len(x) + 1):
        word = ' '.join(x[i:j])
        if word in a:
            print(word)
The example prints the (order-preserving) runs of b's words that are also present in a in the same order (restricting it to a given number of words takes a small tweak to the if statement in the nested loop).
The longest common substring algorithm will work here if you pass in split lists instead of plain strings, with the added bonus that it also gives the longest common run of characters if you pass in the unsplit strings.
def longest_common_substring(s1, s2):
    # m[x][y] = length of the common run ending at s1[x - 1] and s2[y - 1]
    m = [[0] * (1 + len(s2)) for i in range(1 + len(s1))]
    longest, x_longest = 0, 0
    for x in range(1, 1 + len(s1)):
        for y in range(1, 1 + len(s2)):
            if s1[x - 1] == s2[y - 1]:
                m[x][y] = m[x - 1][y - 1] + 1
                if m[x][y] > longest:
                    longest = m[x][y]
                    x_longest = x
            else:
                m[x][y] = 0
    return s1[x_longest - longest: x_longest]
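Python's standard library can do the same search: difflib.SequenceMatcher.find_longest_match returns the longest matching block of two sequences, and like the function above it accepts word lists as well as strings. A sketch; the wrapper name longest_common_run is hypothetical, and autojunk=False turns off a popularity heuristic that would otherwise discard frequent elements in sequences of 200 or more items.

```python
from difflib import SequenceMatcher

def longest_common_run(seq1, seq2):
    """Longest contiguous run shared by two sequences (strings or word lists)."""
    matcher = SequenceMatcher(None, seq1, seq2, autojunk=False)
    match = matcher.find_longest_match(0, len(seq1), 0, len(seq2))
    # match.a is the start index in seq1, match.size the run length
    return seq1[match.a:match.a + match.size]

print(longest_common_run("A B C D E F".split(), "B D E".split()))  # ['D', 'E']
print(longest_common_run("xabcdy", "zabcw"))                       # abc
```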

Sum of even integers from a to b in Python

This is my code:
def sum_even(a, b):
    count = 0
    for i in range(a, b, 1):
        if(i % 2 == 0):
            count += [i]
        return count
An example I put was print(sum_even(3,7)) and the output is 0. I cannot figure out what is wrong.
Your indentation is off; it should be:
def sum_even(a, b):
    count = 0
    for i in range(a, b, 1):
        if(i % 2 == 0):
            count += i
    return count
so that return count doesn't get scoped to your for loop (in which case it would return on the 1st iteration, causing it to return 0)
(And change [i] to i)
NOTE: another problem - you should be careful about using range:
>>> list(range(3, 7))
[3, 4, 5, 6]
so if you were to do calls to:
sum_even(3,7)
sum_even(3,8)
right now, they would both output 10, which is incorrect for sum of even integers between 3 and 8, inclusive.
What you really want is probably this instead:
def sum_even(a, b):
return sum(i for i in range(a, b + 1) if i % 2 == 0)
Move the return statement out of the scope of the for loop (otherwise you will return on the first loop iteration).
Change count += [i] to count += i.
Also (not sure if you knew this), range(a, b, 1) will contain all the numbers from a to b - 1 (not b). Moreover, you don't need the 1 argument: range(a,b) will have the same effect. So to contain all the numbers from a to b you should use range(a, b+1).
A concise way to add all the even numbers from a to b is
sum(i for i in range(a, b + 1) if not i % 2)
You can make it far simpler than that, by properly using the step argument to the range function.
def sum_even(a, b):
    return sum(range(a + a%2, b + 1, 2))
You don't need the loop; you can use simple algebra:
def sum_even(a, b):
    if (a % 2 == 1):
        a += 1
    if (b % 2 == 1):
        b -= 1
    return a * (0.5 - 0.25 * a) + b * (0.25 * b + 0.5)
Edit:
As NPE pointed out, my original solution above uses floating-point maths. I wasn't too concerned, since the overhead of floating-point maths is negligible compared with the removal of the looping (e.g. if calling sum_even(10, 10000)). Furthermore, the calculations use (negative) powers of two, so for moderately sized inputs they shouldn't be subject to rounding errors.
Anyhow, with the simple trick of multiplying everything by 4 and then dividing again at the end, we can use integers throughout, which is preferable.
def sum_even(a, b):
    if (a % 2 == 1):
        a += 1
    if (b % 2 == 1):
        b -= 1
    return (a * (2 - a) + b * (2 + b)) // 4
I'd like to see how your loops perform if b is close to 2^32 ;-)
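A quick way to gain confidence in a closed form like the integer one above is to compare it with the step-2 range version over many small inputs, including negative ones. A sketch; sum_even_brute is a hypothetical name, and the check is restricted to a <= b, which the formula assumes.

```python
def sum_even(a, b):
    # integer-only closed form: round a up and b down to even numbers
    if a % 2 == 1:
        a += 1
    if b % 2 == 1:
        b -= 1
    return (a * (2 - a) + b * (2 + b)) // 4

def sum_even_brute(a, b):
    # reference: step 2 from the first even number, b inclusive
    return sum(range(a + a % 2, b + 1, 2))

mismatches = [(a, b) for a in range(-20, 21) for b in range(a, 41)
              if sum_even(a, b) != sum_even_brute(a, b)]
print(mismatches)          # []
print(sum_even(3, 8))      # 18 (4 + 6 + 8)
```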
As Matthew said, there is no loop needed, but he does not explain why.
The problem is just a simple arithmetic sequence (see Wikipedia). The sum of all items in such a sequence is:
$$S_n = \frac{a + b}{2} \cdot n$$
where a is the first item, b is the last and n is the number of items.
If we make a and b even numbers, we can easily solve the given problem.
So making a and b even is just:
if ((a & 1) == 1):
    a = a + 1
if ((b & 1) == 1):
    b = b - 1
Now think about how many items there are between two even numbers:
$$n = \frac{b - a}{2} + 1$$
Put it into the equation and you get:
$$S_n = \frac{a + b}{2} \cdot \left( \frac{b - a}{2} + 1 \right)$$
so your code looks like:
def sum_even(a, b):
    if ((a & 1) == 1):
        a = a + 1
    if ((b & 1) == 1):
        b = b - 1
    # a and b are now even, so both divisions are exact
    return ((a + b) // 2) * (1 + ((b - a) // 2))
Of course, you may want to add some code to handle a being greater than b, etc.
Indentation matters in Python. The code you write returns after the first item processed.
This might be a simple way of doing it using the range function.
The third argument to range is the step, so range(0, 102, 2) yields 0, 2, 4, 6, ..., 100:
total = 0
for even_number in range(0, 102, 2):
    total += even_number
print(total)
def sum_even(a, b):
    count = 0
    for i in range(a, b):
        if(i % 2 == 0):
            count += i
    return count
Two mistakes here:
add i instead of [i]
you return the value directly at the first iteration. Move the return count out of the for loop
The sum of all the even numbers between the start and end number (inclusive):
def addEvenNumbers(start, end):
    total = 0
    if end % 2 == 0:
        for x in range(start, end):
            if x % 2 == 0:
                total += x
        return total + end
    else:
        for x in range(start, end):
            if x % 2 == 0:
                total += x
        return total
print(addEvenNumbers(4, 12))
A little bit fancier, using functools.reduce:
from functools import reduce
def add(a, b):
    return a + b
def evensum(a, b):
    # the initial value 0 keeps reduce from failing on an empty range
    return reduce(add, [x for x in range(a, b) if x % 2 == 0], 0)
Sum of even numbers, including the minimum and maximum:
def sum_evens(minimum, maximum):
    total = 0
    for i in range(minimum, maximum + 1):
        if i % 2 == 0:
            total = total + i
    return total
print(sum_evens(2, 6))
OUTPUT is: 12
sum_evens(2, 6) -> 12 (2 + 4 + 6 = 12)
List-based approach (use b+1 if you want to include the last value):
def sum_even(a, b):
    even = [x for x in range(a, b) if x % 2 == 0]
    return sum(even)
print(sum_even(3, 6))
4
This will add up all your even values between 1 and 10 (inclusive) and output the answer, which is stored in the variable x:
x = 0
for i in range(1, 11):
    if i % 2 == 0:
        x = x + i
print(x)

Testing equivalence of mathematical expressions in Python

I have got two strings in Python,
A m * B s / (A m + C m)
and
C m * B s / (C m + A m)
that are both equivalent functions of the unordered set (A, C) and the unordered set (B). m and s indicate units: symbols with the same unit can be swapped with each other, but not with a symbol of another unit.
So far, I'm doing permutations of A, B, and C and testing them using eval and SymPy's == operator. This has multiple drawbacks:
for more complicated expressions, I have to generate a large number of permutations (in my case 8 nested for loops)
I need to define A, B, C as symbols, which is not optimal when I don't know which parameters I will have (so I have to generate all of them -> terribly inefficient and messing up my variable namespace)
Is there a Pythonic way to test for this kind of equivalence? It should work on arbitrary expressions.
Here is a simplified approach based on my previous answer.
The idea is that if two expressions are equivalent under permutations, the permutation carrying one to the other must map the ith symbol in the first string (ordered by index of first occurrence) to the ith symbol in the second string (again ordered by index of first occurrence). This principle can be used to construct a permutation, apply it to the first string and then check for equality with the second string - if they are equal they are equivalent, otherwise they are not.
Here is one possible implementation:
import re
from functools import reduce
# Unique-ify list, preserving order
def uniquify(l):
    return reduce(lambda s, e: s + ([] if e in s else [e]), l, [])
# Replace all keys in replacements with corresponding values in str
def replace_all(str, replacements):
    for old, new in replacements.items():
        str = str.replace(old, new)
    return str
class Expression:
    units = ["m", "s"]
    def __init__(self, exp):
        self.exp = exp
    # Returns a list of symbols in the expression that are followed
    # by the given unit, ordered by first appearance. Assumes the
    # symbol and unit are separated by a space. For example:
    # Expression("A m * B s / (A m + C m)").symbols_for_unit("m")
    # returns ['A', 'C']
    def symbols_for_unit(self, unit):
        sym_re = re.compile("(.) %s" % unit)
        symbols = sym_re.findall(self.exp)
        return uniquify(symbols)
    # Returns a string with all symbols that have units other than
    # unit "muted", that is, replaced with the empty string. Example:
    # Expression("A m * B s / (A m + C m)").mute_symbols_for_other_units("m")
    # returns "A m *  s / (A m + C m)"
    def mute_symbols_for_other_units(self, unit):
        other_units = "".join(set(self.units) - set(unit))
        return re.sub("(.) ([%s])" % other_units, r" \g<2>", self.exp)
    # Returns a string with all symbols that have the given unit
    # replaced with tokens of the form $0, $1, ..., by order of their
    # first appearance in the string, and all other symbols muted.
    # For example:
    # Expression("A m * B s / (A m + C m)").canonical_form("m")
    # returns "$0 m *  s / ($0 m + $1 m)"
    def canonical_form(self, unit):
        symbols = self.symbols_for_unit(unit)
        muted_self = self.mute_symbols_for_other_units(unit)
        for i, sym in enumerate(symbols):
            muted_self = muted_self.replace("%s %s" % (sym, unit), "$%s %s" % (i, unit))
        return muted_self
    # Define a permutation, represented as a dictionary, according to
    # the following rule: replace $i with the ith distinct symbol
    # occurring in the expression with the given unit. For example:
    # Expression("C m * B s / (C m + A m)").permutation("m")
    # returns {'$0':'C', '$1':'A'}
    def permutation(self, unit):
        enum = enumerate(self.symbols_for_unit(unit))
        return dict(("$%s" % i, sym) for i, sym in enum)
    # Return a string produced from the expression by first converting it
    # into canonical form, and then performing the replacements defined
    # by the given permutation. For example:
    # Expression("A m * B s / (A m + C m)").permute("m", {"$0":"C", "$1":"A"})
    # returns "C m *  s / (C m + A m)"
    def permute(self, unit, permutation):
        new_exp = self.canonical_form(unit)
        return replace_all(new_exp, permutation)
    # Test for equality under permutation and muting of all symbols
    # other than those with the given unit.
    def eq_under_permutation(self, unit, other_exp):
        muted_self = self.mute_symbols_for_other_units(unit)
        other_permuted_str = other_exp.permute(unit, self.permutation(unit))
        return muted_self == other_permuted_str
    # Test for equality under permutation. This is done for each of
    # the possible units using eq_under_permutation
    def __eq__(self, other):
        return all([self.eq_under_permutation(unit, other) for unit in self.units])
e1 = Expression("A m * B s / (A m + C m)")
e2 = Expression("C m * B s / (C m + A m)")
e3 = Expression("A s * B s / (A m + C m)")
f1 = Expression("A s * (B s + D s) / (A m + C m)")
f2 = Expression("A s * (D s + B s) / (C m + A m)")
f3 = Expression("D s")
print("e1 == e2: ", e1 == e2)  # True
print("e1 == e3: ", e1 == e3)  # False
print("e2 == e3: ", e2 == e3)  # False
print("f1 == f2: ", f1 == f2)  # True
print("f1 == f3: ", f1 == f3)  # False
As you pointed out, this checks for string equivalence under permutations without any regard to mathematical equivalence, but it is half the battle. If you had a canonical form for mathematical expressions, you could use this approach on two expressions in canonical form. Perhaps SymPy's simplify() could do the trick.
Instead of iterating over all possible permutations, assume one exists and attempt to construct it. I believe that, done the right way, failure of the algorithm would imply that no such permutation exists.
Here is the outline of the idea applied to the expressions above:
let:
str1 = "A m * B s / (A m + C m)"
str2 = "C m * B s / (C m + A m)"
We're looking for a permutation of the set (A, C) that would render the expressions identical. Relabel A and C as X1 and X2 according to the order of their first appearance in str2, so:
X1 = C
X2 = A
because C appears before A in str2. Next, create the array Y such that Y[i] is the ith symbol (A or C) in order of first appearance in str1. So:
Y[1] = A
Y[2] = C
Because A appears before C in str1.
Now construct str3 from str2 by replacing A and C with X1 and X2:
str3 = "X1 m * B s / (X1 m + X2 m)"
And then start replacing each Xi with Y[i]. First, X1 becomes Y[1]=A:
str3_1 = "A m * B s / (A m + X2 m)"
At this stage, compare str3_1 and str1 up to the first occurrence of any remaining Xi, in this case X2. These two prefixes are equal:
str3_1[:19] = "A m * B s / (A m + "
str1[:19] = "A m * B s / (A m + "
so you still have a chance of constructing the permutation. If they were unequal, you would have proven that no suitable permutation exists (because any permutation would have had to make at least that substitution) and could conclude that the expressions are not equivalent. But they are equal, so you proceed to the next step, replacing X2 with Y[2]=C:
str3_2 = "A m * B s / (A m + C m)"
And this is equal to str1, so you have your permutation (A->C, C->A) and have shown that the expressions are equivalent.
This only demonstrates the algorithm on a particular case, but it should generalize. I'm not sure what the best achievable complexity is, but it should be much quicker than the n! cost of generating all permutations of n variables.
If I understand the significance of the units correctly, they limit which variables may be swapped for which others by the permutations. So that A can be substituted with C in the above expressions because both have 'm' units, but not with B which has 's' units. You can handle this in the following way:
construct expressions str1_m and str2_m from str1 and str2 by removing all symbols that don't have m units, and then carry out the above algorithm for str1_m and str2_m. If construction fails, no permutation exists. If construction succeeds, keep that permutation (call it the m-permutation) and construct str1_s and str2_s from str1 and str2 by removing all symbols that don't have s units, then carry out the algorithm again for str1_s and str2_s. If construction fails, they are not equivalent. If it succeeds, the final permutation will be a combination of the m-permutation and the s-permutation (although you probably don't even need to construct it, you just care that it exists).
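Both approaches above boil down to relabeling each unit's symbols in order of first appearance. If two strings are permutations of each other within each unit, they produce the same relabeled string, so the whole check reduces to one string comparison. A sketch, assuming the "symbol space unit" format of the question and only the units m and s (the names canonical and equal_under_unit_permutation are hypothetical). Like the answers above it compares strings only, with no mathematical simplification, and it also treats expressions over different symbol names (say D and E instead of A and C) as equal.

```python
import re

# Assumption: each symbol is a word token followed by one space and a unit
# letter from UNITS, as in "A m" or "B s".
UNITS = "ms"
TOKEN = re.compile(r"(\w+) ([%s])\b" % UNITS)

def canonical(expr):
    """Relabel symbols per unit by order of first appearance: 'A m' -> 'm0'."""
    labels = {}      # (symbol, unit) -> canonical label
    next_index = {}  # unit -> next unused index for that unit

    def relabel(match):
        key = (match.group(1), match.group(2))
        if key not in labels:
            unit = key[1]
            index = next_index.get(unit, 0)
            labels[key] = "%s%d" % (unit, index)
            next_index[unit] = index + 1
        return labels[key]

    return TOKEN.sub(relabel, expr)

def equal_under_unit_permutation(expr1, expr2):
    return canonical(expr1) == canonical(expr2)

print(canonical("A m * B s / (A m + C m)"))  # m0 * s0 / (m0 + m1)
print(equal_under_unit_permutation("A m * B s / (A m + C m)",
                                   "C m * B s / (C m + A m)"))  # True
```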
If you pass a string to SymPy's sympify() function, it will automatically create the Symbols for you (no need to define them all).
>>> from sympy import *
>>> x
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'x' is not defined
>>> sympify("x**2 + cos(x)")
x**2 + cos(x)
>>> sympify("diff(x**2 + cos(x), x)")
2*x - sin(x)
I did this once, in a simulator for mathematics studies.
In my case, I knew which variables would be used,
so I tested the result by putting values into the variables:
import math
A = 10
B = 20
C = 30
m = math.e
s = math.pi
And so, we evaluate:
s1 = A*m * B*s / (A*m + C*m)
s2 = C*m * B*s / (C*m + A*m)
If s1 != s2, you have proved that the expressions are not equivalent.
With this method it is impossible to prove that two expressions are equivalent,
but you can prove that they are not:
if s1 != s2:
    print("Not equivalent")
else:
    print("Try with another sample")
I hope this helps.
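Plugging in values is a form of randomized testing. With floating-point arithmetic, a plain != can report a difference for expressions that are mathematically equal, so a tolerance (math.isclose) is safer, and trying several random points makes an accidental agreement unlikely. A sketch, assuming the expressions are written in Python syntax (A*m rather than A m) and come from a trusted source, since it uses eval; probably_equivalent is a hypothetical name.

```python
import math
import random

def probably_equivalent(expr1, expr2, names, trials=20, rel_tol=1e-9, seed=0):
    """Randomized check: False proves inequivalence, True is only evidence.

    Assumes expr1/expr2 are trusted Python arithmetic expressions over `names`.
    """
    rng = random.Random(seed)
    no_builtins = {"__builtins__": {}}
    for _ in range(trials):
        # positive values keep denominators such as (A + C) away from zero
        env = {name: rng.uniform(1.0, 10.0) for name in names}
        v1 = eval(expr1, no_builtins, env)
        v2 = eval(expr2, no_builtins, env)
        if not math.isclose(v1, v2, rel_tol=rel_tol):
            return False
    return True

print(probably_equivalent("A*B/(A+C)", "B*A/(C+A)", ["A", "B", "C"]))  # True
print(probably_equivalent("A*B/(A+C)", "C*B/(C+A)", ["A", "B", "C"]))  # False
```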
This, like all the other answers to date, is not a robust solution to the problem; instead, it contains more helpful information for a future meticulous reader who sets out to solve it.
I provide a difficult example using Euler's formula: https://en.wikipedia.org/wiki/Euler%27s_formula
I am certain that none of the other answers to date will succeed on my example.
I show that all the suggestions on SymPy's FAQ (https://github.com/sympy/sympy/wiki/Faq) also fail on my example.
# SOURCE FOR HELPERS: https://github.com/sympy/sympy/wiki/Faq
import sympy
import sympy.parsing.sympy_parser
# Note: parse_expr reads 'i' as an ordinary symbol; SymPy's imaginary unit is 'I',
# so write I instead of i if Euler's formula is meant to apply.
ExampleExpressionString1 = 'exp( i*( (x0 - 1)*(x0 + 2) ) )'
ExampleExpressionSympy1 = sympy.parsing.sympy_parser.parse_expr(ExampleExpressionString1)
ExampleExpressionString2 = 'i*sin( (x0 - 1)*(x0 + 2) ) + cos( (x0 - 1)*(x0 + 2) )'
ExampleExpressionSympy2 = sympy.parsing.sympy_parser.parse_expr(ExampleExpressionString2)
print('(ExampleExpressionSympy1 == ExampleExpressionSympy2):')
print(' ', (ExampleExpressionSympy1 == ExampleExpressionSympy2))
print('(ExampleExpressionSympy1.simplify() == ExampleExpressionSympy2.simplify()):')
print(' ', (ExampleExpressionSympy1.simplify() == ExampleExpressionSympy2.simplify()))
print('(ExampleExpressionSympy1.expand() == ExampleExpressionSympy2.expand()):')
print(' ', (ExampleExpressionSympy1.expand() == ExampleExpressionSympy2.expand()))
print('(ExampleExpressionSympy1.trigsimp() == ExampleExpressionSympy2.trigsimp()):')
print(' ', (ExampleExpressionSympy1.trigsimp() == ExampleExpressionSympy2.trigsimp()))
print('(ExampleExpressionSympy1.simplify().expand().trigsimp() == ExampleExpressionSympy2.simplify().expand().trigsimp()):')
print(' ', (ExampleExpressionSympy1.simplify().expand().trigsimp() == ExampleExpressionSympy2.simplify().expand().trigsimp()))
MORE NOTES:
I suspect this is a difficult problem to solve generically and robustly. To properly check mathematical equivalence, you not only have to try order permutations, but you also need a library of mathematically equivalent transformations and have to try all those permutations as well.
I do, however, believe this might be a solvable problem, because Wolfram Alpha seems to have an 'alternate expression' section, which seems to do the trick of providing all permutations most of the time on arbitrary expressions using these kinds of equivalences.
IN SUMMARY:
I suggest the following, with the expectation that it will break:
import sympy
import sympy.parsing.sympy_parser
expression.simplify().expand().trigsimp()  # expression: a parsed SymPy expression

Average of two strings in alphabetical/lexicographical order

Suppose you take the strings 'a' and 'z' and list all the strings that come between them in alphabetical order: ['a','b','c' ... 'x','y','z']. Take the midpoint of this list and you find 'm'. So this is kind of like taking an average of those two strings.
You could extend it to strings with more than one character, for example the midpoint between 'aa' and 'zz' would be found in the middle of the list ['aa', 'ab', 'ac' ... 'zx', 'zy', 'zz'].
Might there be a Python method somewhere that does this? If not, even knowing the name of the algorithm would help.
I began making my own routine that simply goes through both strings and finds midpoint of the first differing letter, which seemed to work great in that 'aa' and 'az' midpoint was 'am', but then it fails on 'cat', 'doggie' midpoint which it thinks is 'c'. I tried Googling for "binary search string midpoint" etc. but without knowing the name of what I am trying to do here I had little luck.
I added my own solution as an answer
If you define an alphabet of characters, you can just convert to base 10, do an average, and convert back to base-N where N is the size of the alphabet.
alphabet = 'abcdefghijklmnopqrstuvwxyz'
def enbase(x):
    n = len(alphabet)
    if x < n:
        return alphabet[x]
    return enbase(x // n) + alphabet[x % n]
def debase(x):
    n = len(alphabet)
    result = 0
    for i, c in enumerate(reversed(x)):
        result += alphabet.index(c) * (n**i)
    return result
def average(a, b):
    a = debase(a)
    b = debase(b)
    return enbase((a + b) // 2)
print(average('a', 'z'))  # m
print(average('aa', 'zz'))  # mz
print(average('cat', 'doggie'))  # budeel
print(average('google', 'microsoft'))  # gebmbqkil
print(average('microsoft', 'google'))  # gebmbqkil
Edit: Based on comments and other answers, you might want to handle strings of different lengths by appending the first letter of the alphabet to the shorter word until they're the same length. This will result in the "average" falling between the two inputs in a lexicographical sort. Code changes and new outputs below.
def pad(x, n):
    p = alphabet[0] * (n - len(x))
    return '%s%s' % (x, p)
def average(a, b):
    n = max(len(a), len(b))
    a = debase(pad(a, n))
    b = debase(pad(b, n))
    return enbase((a + b) // 2)
print(average('a', 'z'))  # m
print(average('aa', 'zz'))  # mz
print(average('aa', 'az'))  # m (equivalent to am, since a is the zero digit)
print(average('cat', 'doggie'))  # cumqec
print(average('google', 'microsoft'))  # jlilzyhcw
print(average('microsoft', 'google'))  # jlilzyhcw
If you mean alphabetically, simply use FogleBird's algorithm but reverse the parameters and the result!
>>> print(average('cat'[::-1], 'doggie'[::-1])[::-1])
cumdec
or rewrite average like so:
>>> def average(a, b):
...     a = debase(a[::-1])
...     b = debase(b[::-1])
...     return enbase((a + b) // 2)[::-1]
...
>>> print(average('cat', 'doggie'))
cumdec
>>> print(average('google', 'microsoft'))
jlvymlupj
>>> print(average('microsoft', 'google'))
jlvymlupj
It sounds like what you want is to treat alphabetical characters as a base-26 value between 0 and 1. When you have strings of different length (an example in base 10), say 305 and 4202, you're coming out with a midpoint of 3, since you're looking at the characters one at a time. Instead, treat them as a floating-point mantissa: 0.305 and 0.4202. From that, it's easy to come up with a midpoint of .3626 (you can round if you'd like).
Do the same with base 26 (a=0...z=25, ba=26, bb=27, etc.) to do the calculations for letters:
cat becomes 'a.cat' and doggie becomes 'a.doggie', doing the math gives cat a decimal value of 0.078004096, doggie a value of 0.136390697, with an average of 0.107197397 which in base 26 is roughly "cumcqo"
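The fractional reading can also be computed exactly with fractions.Fraction instead of floats, which avoids any precision loss. A sketch, assuming lowercase a-z only with 'a' as the zero digit (so trailing 'a's do not change the value); the names to_fraction, from_fraction and midpoint are hypothetical. Both inputs have finite base-26 expansions, so their average needs at most one extra digit (because 1/2 = 13/26) and the result is exact.

```python
from fractions import Fraction

ALPHABET = "abcdefghijklmnopqrstuvwxyz"  # assumption: only a-z, 'a' is digit 0
BASE = len(ALPHABET)

def to_fraction(word):
    """Read word as the base-26 fraction 0.<word>."""
    value = Fraction(0)
    for position, ch in enumerate(word, start=1):
        value += Fraction(ALPHABET.index(ch), BASE ** position)
    return value

def from_fraction(value, digits):
    """Write value (0 <= value < 1) as `digits` base-26 letters."""
    letters = []
    for _ in range(digits):
        value *= BASE
        digit = int(value)  # floor, since value >= 0
        letters.append(ALPHABET[digit])
        value -= digit
    return "".join(letters)

def midpoint(a, b):
    mid = (to_fraction(a) + to_fraction(b)) / 2
    # one digit more than the longer input is enough to write mid exactly
    return from_fraction(mid, max(len(a), len(b)) + 1)

print(midpoint("a", "z"))         # mn
print(midpoint("cat", "doggie"))  # lies strictly between 'cat' and 'doggie'
```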
Based on your proposed usage, consistent hashing ( http://en.wikipedia.org/wiki/Consistent_hashing ) seems to make more sense.
Thanks to everyone who answered, but I ended up writing my own solution because the others weren't exactly what I needed. I am trying to average App Engine key names, and after studying them a bit more I discovered they actually allow any 7-bit ASCII characters in the names. Additionally, I couldn't really rely on the solutions that first converted the key names to floating point, because I suspected floating-point accuracy just isn't enough.
To take an average, first you add two numbers together and then divide by two. These are both such simple operations that I decided to just make functions to add and divide base 128 numbers represented as lists. This solution hasn't been used in my system yet so I might still find some bugs in it. Also it could probably be a lot shorter, but this is just something I needed to get done instead of trying to make it perfect.
# Given two lists representing a number with one digit left of the decimal point and the
# rest after it, for example 1.555 = [1,5,5,5] and 0.235 = [0,2,3,5], returns a similar
# list representing those two numbers added together.
#
def ladd(a, b, base=128):
    i = max(len(a), len(b))
    lsum = [0] * i
    while i > 1:
        i -= 1
        av = bv = 0
        if i < len(a): av = a[i]
        if i < len(b): bv = b[i]
        lsum[i] += av + bv
        if lsum[i] >= base:
            lsum[i] -= base
            lsum[i-1] += 1
    return lsum
# Given a list of digits after the decimal point, returns a new list of digits
# representing that number divided by two.
#
def ldiv2(vals, base=128):
    vs = vals[:]
    vs.append(0)
    i = len(vs)
    while i > 0:
        i -= 1
        if (vs[i] % 2) == 1:
            vs[i] -= 1
            vs[i+1] += base // 2
        vs[i] = vs[i] // 2
    if vs[-1] == 0: vs = vs[0:-1]
    return vs
# Given two app engine key names, returns the key name that comes between them.
#
def average(a_kn, b_kn):
    a = [0] + [ord(c) for c in a_kn]
    b = [0] + [ord(c) for c in b_kn]
    avg = ldiv2(ladd(a, b))
    return "".join(chr(x) for x in avg[1:])
print(average('a', 'z'))  # m@
print(average('aa', 'zz'))  # n-@
print(average('aa', 'az'))  # am@
print(average('cat', 'doggie'))  # d(mstr@
print(average('google', 'microsoft'))  # jlim.,7s:
print(average('microsoft', 'google'))  # jlim.,7s:
import math
def avg(str1, str2):
    y = ''
    s = 'abcdefghijklmnopqrstuvwxyz'
    for i in range(len(str1)):
        x = s.index(str2[i]) + s.index(str1[i])
        x = math.floor(x / 2)
        y += s[x]
    return y
print(avg('z', 'a'))  # m
print(avg('aa', 'az'))  # am
print(avg('cat', 'dog'))  # chm
Still working on strings with different lengths... any ideas?
This version treats 'abc' as a fraction like 0.abc. In this approach, space is the zero digit and a valid input/output.
MAX_ITER = 10
letters = " abcdefghijklmnopqrstuvwxyz"
def to_double(name):
    d = 0
    for i, ch in enumerate(name):
        idx = letters.index(ch)
        d += idx * len(letters) ** (-i - 1)
    return d
def from_double(d):
    name = ""
    for i in range(MAX_ITER):
        d *= len(letters)
        name += letters[int(d)]
        d -= int(d)
    return name
def avg(w1, w2):
    w1 = to_double(w1)
    w2 = to_double(w2)
    return from_double((w1 + w2) * 0.5)
print(avg('a', 'a'))  # 'a'
print(avg('a', 'aa'))  # 'a mmmmmmmm'
print(avg('aa', 'aa'))  # 'a zzzzzzzz'
print(avg('car', 'duck'))  # 'cxxemmmmmm'
Unfortunately, the naïve algorithm cannot detect the periodic 'z's, which are like 0.99999... in decimal; therefore 'a zzzzzzzz' is actually 'aa' (the space before the 'z' period must be increased by one).
In order to normalise this, you can use the following function:
def remove_z_period(name):
    if len(name) != MAX_ITER:
        return name
    if name[-1] != 'z':
        return name
    n = ""
    overflow = True
    for ch in reversed(name):
        if overflow:
            if ch == 'z':
                ch = ' '
            else:
                ch = letters[(letters.index(ch) + 1)]
                overflow = False
        n = ch + n
    return n
print(remove_z_period('a zzzzzzzz'))  # 'aa'
I haven't programmed in Python in a while, and this seemed interesting enough to try.
Bear with my recursive programming; too many functional languages look like Python.
def stravg_half(a, ln):
    # If you have a problem it will probably be in here.
    # The floor of the character's value is 0, but you may want something different
    f = 0
    #f = ord('a')
    L = ln - 1
    if 0 == L:
        return ''
    A = ord(a[0])
    return chr(A // 2) + stravg_half(a[1:], L)
def stravg_helper(a, b, ln, x):
    L = ln - 1
    A = ord(a[0])
    B = ord(b[0])
    D = (A + B) // 2
    if 0 == L:
        if 0 == x:
            return chr(D)
        # NOTE: The caller of helper makes sure that len(a)>=len(b)
        return chr(D) + stravg_half(a[1:], x)
    return chr(D) + stravg_helper(a[1:], b[1:], L, x)
def stravg(a, b):
    la = len(a)
    lb = len(b)
    if 0 == la:
        if 0 == lb:
            return a # which is empty
        return stravg_half(b, lb)
    if 0 == lb:
        return stravg_half(a, la)
    x = la - lb
    if x > 0:
        return stravg_helper(a, b, lb, x)
    return stravg_helper(b, a, la, -x) # Note the order of the args
