I am trying to solve LeetCode problem 295. Find Median from Data Stream:
The median is the middle value in an ordered integer list. If the size of the list is even, there is no middle value and the median is the mean of the two middle values.
For example, for arr = [2,3,4], the median is 3.
For example, for arr = [2,3], the median is (2 + 3) / 2 = 2.5.
Implement the MedianFinder class:
MedianFinder() initializes the MedianFinder object.
void addNum(int num) adds the integer num from the data stream to the data structure.
double findMedian() returns the median of all elements so far.
Answers within 10^-5 of the actual answer will be accepted.
Example 1:
[...]
MedianFinder medianFinder = new MedianFinder();
medianFinder.addNum(1); // arr = [1]
medianFinder.addNum(2); // arr = [1, 2]
medianFinder.findMedian(); // return 1.5 (i.e., (1 + 2) / 2)
medianFinder.addNum(3); // arr = [1, 2, 3]
medianFinder.findMedian(); // return 2.0
For this question, I add numbers to a heap and later use the smallest one in some operations.
When I tried to do the operation, I found out that my heap returns a different value when doing heapq.heappop(self.small) than when doing self.small[0].
Could you please explain this to me? Any hint is much appreciated.
(Every number in self.small is added using heapq.heappush)
Here is my code when it works:
import heapq
class MedianFinder:
    def __init__(self):
        # small is a max-heap (values stored negated), large is a min-heap
        self.small, self.large = [], []
    def addNum(self, num):
        heapq.heappush(self.small, -1 * num)
        # keep every value in small <= every value in large
        if (self.small and self.large) and -1 * self.small[0] > self.large[0]:
            val = -1 * heapq.heappop(self.small)
            heapq.heappush(self.large, val)
        # rebalance so the sizes differ by at most one
        if len(self.small) > len(self.large) + 1:
            val = -1 * heapq.heappop(self.small)
            heapq.heappush(self.large, val)
        if len(self.large) > len(self.small) + 1:
            val = -1 * heapq.heappop(self.large)
            heapq.heappush(self.small, val)
    def findMedian(self):
        if len(self.small) > len(self.large):
            return -1 * self.small[0]
        elif len(self.small) < len(self.large):
            return self.large[0]
        else:
            return (-1 * self.small[0] + self.large[0]) / 2
For the last line, if I change:
-1 * self.small[0] + self.large[0]
into:
-1 * heapq.heappop(self.small) + heapq.heappop(self.large)
then the tests fail.
Why would that be any different?
When you change -1 * self.small[0] + self.large[0] into -1 * heapq.heappop(self.small) + heapq.heappop(self.large) then it will still work the first time findMedian is called, but when it is called again, it will return (in general) a different result. The reason is that with heappop, you remove a value from the heap. This should not happen, as this changes the data that a next call of findMedian will have to deal with. findMedian is supposed to leave the data structure unchanged.
Note how the challenge says this:
double findMedian() returns the median of all elements so far.
I highlight the last two words. These indicate that findMedian is not (only) called when the whole stream of data has been processed, but will be called several times during the processing of the data stream. That makes it crucial that findMedian does not modify the data structure, and so heappop should not be used.
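To make the difference concrete, here is a minimal sketch (the values 5, 1, 3 are just made up for illustration) showing that heap[0] and heapq.heappop(heap) return the same value, but only heappop changes what the next call will see:

```python
import heapq

heap = []
for value in (5, 1, 3):          # illustrative values only
    heapq.heappush(heap, value)

# Peeking: heap[0] is the smallest item and the heap is left untouched.
peeked = heap[0]
size_after_peek = len(heap)

# Popping: returns the same smallest item, but also removes it.
popped = heapq.heappop(heap)
size_after_pop = len(heap)

print(peeked, size_after_peek)   # 1 3
print(popped, size_after_pop)    # 1 2
print(heap[0])                   # 3, so a second "median" query would see different data
```

So the first findMedian call gives the same answer either way; it is every later call that works on a heap with items missing.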
I want to append to a string its own total length, where that total includes the appended digits themselves, without padding, structs, or anything else that forces a fixed length.
So for example I want to be able to take this string as input:
"A string|"
And return this:
"A string|11"
On the basis of the OP tolerating such an approach (and to provide an implementation technique for the eventual python answer), here's a solution in Java.
final String s = "A String|";
int n = s.length(); // `length()` returns the length of the string.
String t; // the result
do {
    t = s + n; // append the stringified n to the original string
    if (n == t.length()){
        return t; // string length no longer changing; we're good.
    }
    n = t.length(); // n must hold the total length
} while (true); // round again
The problem, of course, is that appending n changes the string length. But luckily, the length only ever increases or stays the same, so it converges very quickly, because the number of digits in n grows only logarithmically. In this particular case, the attempted values of n are 9, 10, and 11. And that's a pernicious case.
A simple solution is:
def addlength(string):
    n1=len(string)
    n2=len(str(n1))+n1
    n2 += len(str(n2))-len(str(n1)) # a carry can arise
    return string+str(n2)
This works because a possible carry increases the length by at most one unit.
Examples :
In [2]: addlength('a'*8)
Out[2]: 'aaaaaaaa9'
In [3]: addlength('a'*9)
Out[3]: 'aaaaaaaaa11'
In [4]: addlength('a'*99)
Out[4]: 'aaaaa...aaa102'
In [5]: addlength('a'*999)
Out[5]: 'aaaa...aaa1003'
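The carry argument can also be checked by brute force. The following is only a sketch in Python 3: add_length repeats the arithmetic of addlength above, while is_self_describing and the upper bound of 11000 are my own choices for the check, not part of the answer.

```python
def add_length(s):
    """Same arithmetic as addlength above, written for Python 3."""
    n1 = len(s)
    n2 = n1 + len(str(n1))
    n2 += len(str(n2)) - len(str(n1))  # at most one extra digit from a carry
    return s + str(n2)

def is_self_describing(original, result):
    """True if result is original followed by the decimal value of len(result)."""
    return result.startswith(original) and result[len(original):] == str(len(result))

# Hypothetical test range; it covers the boundaries near 10, 100, 1000 and 10000.
failures = [n for n in range(0, 11000)
            if not is_self_describing("a" * n, add_length("a" * n))]
print(failures)  # []
```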
Here is a simple Python port of Bathsheba's answer:
def str_len(s):
    n = len(s)
    t = ''
    while True:
        t = s + str(n)
        if n == len(t):
            return t
        n = len(t)
This is a much more clever and simple way than anything I was thinking of trying!
Suppose you had s = 'abcdefgh|'. On the first pass through, t = 'abcdefgh|9'.
Since n != len(t) (which is now 10), it goes through again: t = 'abcdefgh|' + str(n) with str(n) = '10', so you have 'abcdefgh|10', which is still not quite right. Now n = len(t) = 11, and on the next pass t = 'abcdefgh|11', whose length really is 11, so the loop returns. Pretty clever solution!
It is a tricky one, but I think I've figured it out.
Done in a hurry in Python 2.7, please fully test - this should handle strings up to 996 characters (at 997 characters the appended length gains a fourth digit, so a further branch would be needed):
import sys
orig = sys.argv[1]
origLen = len(orig)
if (origLen >= 98):
    extra = str(origLen + 3)
elif (origLen >= 8):
    extra = str(origLen + 2)
else:
    extra = str(origLen + 1)
final = orig + extra
print final
Results of very brief testing
C:\Users\PH\Desktop>python test.py "tiny|"
tiny|6
C:\Users\PH\Desktop>python test.py "myString|"
myString|11
C:\Users\PH\Desktop>python test.py "myStringWith98Characters.........................................................................|"
myStringWith98Characters.........................................................................|101
Just find the length of the string. Then iterate through each value of the number of digits the length of the resulting string can possibly have. While iterating, check if the sum of the number of digits to be appended and the initial string length is equal to the length of the resulting string.
def get_length(s):
    s = s + "|"
    result = ""
    len_s = len(s)
    i = 1  # candidate number of digits in the appended length
    while True:
        candidate = len_s + i
        if len(str(candidate)) == i:
            result = s + str(len_s + i)
            break
        i += 1
    return result
This code gives the result.
I used a few variables, but in the end it shows the output you want:
def len_s(s):
    s = s + '|'
    b = len(s)
    z = s + str(b)
    length = len(z)
    new_s = s + str(length)
    new_len = len(new_s)
    return s + str(new_len)
s = "A string"
print len_s(s)
Here's a direct equation for this (so it's not necessary to construct the string). If s is the string, then the length of the string including the length of the appended length will be:
L1 = len(s) + 1 + int(log10(len(s) + 1 + int(log10(len(s)))))
The idea here is that a direct calculation is only problematic when the appended length will push the length past a power of ten; that is, at 9, 98, 99, 997, 998, 999, 9996, etc. To work this through, 1 + int(log10(len(s))) is the number of digits in the length of s. If we add that to len(s), then 9->10, 98->100, 99->101, etc, but still 8->9, 97->99, etc, so we can push past the power of ten exactly as needed. That is, adding this produces a number with the correct number of digits after the addition. Then do the log again to find the length of that number and that's the answer.
To test this:
from math import log10
def find_length(s):
    L1 = len(s) + 1 + int(log10(len(s) + 1 + int(log10(len(s)))))
    return L1
# test, just looking at lengths around 10**n
for i in range(9):
    for j in range(30):
        L = abs(10**i - j + 10) + 1
        s = "a"*L
        x0 = find_length(s)
        new0 = s+`x0`
        if len(new0)!=x0:
            print "error", len(s), x0, log10(len(s)), log10(x0)
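For readers on Python 3 (where the backtick syntax above no longer exists), here is a rough equivalent check that works on lengths only and never builds the strings. The function names and the range of 1 to 100000 are my own assumptions; the closed form is the equation above, and the reference is the iterate-until-stable idea from the other answers. Note that the formula needs a length of at least 1, since log10(0) is undefined.

```python
from math import log10

def total_length_closed_form(n):
    """Total length for an original string of length n >= 1 (equation above)."""
    return n + 1 + int(log10(n + 1 + int(log10(n))))

def total_length_iterative(n):
    """Reference: grow the candidate total until it describes itself."""
    total = n
    while total != n + len(str(total)):
        total = n + len(str(total))
    return total

# Hypothetical test range, chosen to cross several powers of ten.
mismatches = [n for n in range(1, 100001)
              if total_length_closed_form(n) != total_length_iterative(n)]
print(mismatches)                    # []
print(total_length_closed_form(9))   # 11
```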
I have a string that is a mathematical equation, but with some custom functions. I need to find all such functions and replace them with some code.
For example, I have a string:
a+b+f1(f2(x,y),x)
I want code that will replace (say) f2(x,y) with x+y^2 and f1(x,y) with sin(x+y).
It would be ideal if nested functions were supported, like in the example. However, it would still be useful if nesting was not supported.
As I understand from similar topics, this can be done using a compiler module, like compiler.parse(eq). How can I work with the AST object created by compiler.parse(eq) to reconstruct my string, replacing all the functions found?
I only need to perform the substitution; the resulting string will then be used in another program. Evaluation is not needed.
Here is a minimal working example (the binary and unary operations +, -, *, /, ** and function calls are implemented). Operator precedence is made explicit with parentheses.
It does a little more than the example in the question requires:
from __future__ import print_function
import ast
def transform(eq,functions):
    class EqVisitor(ast.NodeVisitor):
        def visit_BinOp(self,node):
            #generate("=>BinOp")
            generate("(")
            self.visit(node.left)
            self.visit(node.op)
            #generate("ici",str(node.op),node._fields,node._attributes)
            #generate(dir(node.op))
            self.visit(node.right)
            generate(")")
            #ast.NodeVisitor.generic_visit(self,node)
        def visit_USub(self,node):
            generate("-")
        def visit_UAdd(self,node):
            generate("+")
        def visit_Sub(self,node):
            generate("-")
        def visit_Add(self,node):
            generate("+")
        def visit_Pow(self,node):
            generate("**")
        def visit_Mult(self,node):
            generate("*")
        def visit_Div(self,node):
            generate("/")
        def visit_Name(self,node):
            generate(node.id)
        def visit_Call(self,node):
            debug("function",node.func.id)
            if node.func.id in functions:
                debug("defined function")
                func_visit(functions[node.func.id],node.args)
                return
            debug("not defined function",node.func.id)
            #generate(node._fields)
            #generate("args")
            generate(node.func.id)
            generate("(")
            sep = ""
            for arg in node.args:
                generate (sep)
                self.visit(arg)
                sep=","
            generate(")")
        def visit_Num(self,node):
            generate(node.n)
        def generic_visit(self, node):
            debug ("\n",type(node).__name__)
            debug (node._fields)
            ast.NodeVisitor.generic_visit(self, node)
    def func_visit(definition,concrete_args):
        class FuncVisitor(EqVisitor):
            def visit_arguments(self,node):
                #generate("visit arguments")
                #generate(node._fields)
                self.arguments={}
                for concrete_arg,formal_arg in zip(concrete_args,node.args):
                    #generate(formal_arg._fields)
                    self.arguments[formal_arg.id]=concrete_arg
                debug(self.arguments)
            def visit_Name(self,node):
                debug("visit Name",node.id)
                if node.id in self.arguments:
                    eqV.visit(self.arguments[node.id])
                else:
                    generate(node.id)
        funcV=FuncVisitor()
        funcV.visit(ast.parse(definition))
    eqV=EqVisitor()
    result = []
    def generate(s):
        #following line may be useful for debug
        debug(str(s))
        result.append(str(s))
    eqV.visit(ast.parse(eq,mode="eval"))
    return "".join(result)
def debug(*args,**kwargs):
    #print(*args,**kwargs)
    pass
Usage:
functions= {
    "f1":"def f1(x,y):return x+y**2",
    "f2":"def f2(x,y):return sin(x+y)",
}
eq="-(a+b)+f1(f2(+x,y),z)*4/365.12-h"
print(transform(eq,functions))
Result
((-(a+b)+(((sin((+x+y))+(z**2))*4)/365.12))-h)
WARNING
The code works with Python 2.7; because it depends on the AST, it is not guaranteed to work with other versions of Python. It does not work under Python 3.
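For Python 3, the same idea can be sketched with ast.NodeTransformer and ast.unparse (which needs Python 3.9 or later). This is not a port of the code above, only a rough illustration under my own assumptions: the name inline_calls and the shape of the definitions dictionary (function name mapped to a parameter list and a body expression) are made up for the example, and calls that appear inside a definition body are not expanded further.

```python
import ast
import copy

def inline_calls(expression, definitions):
    """Inline calls to user-defined functions inside an expression string.

    definitions maps a function name to (parameter_names, body_expression).
    Hypothetical helper for illustration; requires Python 3.9+ (ast.unparse).
    """
    parsed = {
        name: (params, ast.parse(body, mode="eval").body)
        for name, (params, body) in definitions.items()
    }

    class ParamSubstituter(ast.NodeTransformer):
        """Replace parameter names by argument subtrees in a single pass."""
        def __init__(self, bindings):
            self.bindings = bindings
        def visit_Name(self, node):
            if node.id in self.bindings:
                return copy.deepcopy(self.bindings[node.id])
            return node

    class CallInliner(ast.NodeTransformer):
        def visit_Call(self, node):
            self.generic_visit(node)  # inline nested calls in the arguments first
            if isinstance(node.func, ast.Name) and node.func.id in parsed:
                params, body = parsed[node.func.id]
                if len(params) != len(node.args):
                    raise ValueError("wrong number of arguments for " + node.func.id)
                bindings = dict(zip(params, node.args))
                return ParamSubstituter(bindings).visit(copy.deepcopy(body))
            return node

    tree = CallInliner().visit(ast.parse(expression, mode="eval"))
    return ast.unparse(tree.body)

definitions = {
    "f1": (["x", "y"], "sin(x + y)"),
    "f2": (["x", "y"], "x + y**2"),   # ** rather than ^, which is XOR in Python
}
result = inline_calls("a+b+f1(f2(x,y),x)", definitions)
print(result)
```

Because the output is generated from the tree, ast.unparse inserts parentheses only where the structure needs them, so no manual bracketing is required.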
The full substitution is quite tricky. Here is my attempt at it. It inlines expressions successfully,
but not in all scenarios. The code works purely on the AST produced by the ast module, and uses codegen to turn it back into source code. Stringifying and modifying an AST in general is covered in another SO Q/A: "Parse a .py file, read the AST, modify it, then write back the modified source code".
First we define a few helpers:
import ast
import codegen
import copy
def parseExpr(expr):
    # Strip:
    # Module(body=[Expr(value=
    return ast.parse(expr).body[0].value
def toSource(expr):
    return codegen.to_source(expr)
After that we define a substitution function using NodeTransformer.
For example:
substitute(parseExpr("a + b"), { "a": parseExpr("1") }) # 1 + b
Simultaneous substitution of multiple variables is needed to avoid nasty situations.
For example, consider replacing both a and b with a + b in a + b.
The result should be (a + b) + (a + b), but if we first replace a with a + b, we get (a + b) + b, and if we then replace b, we get (a + (a + b)) + (a + b), which is the wrong result! So simultaneous substitution is important:
class NameTransformer(ast.NodeTransformer):
    def __init__(self, names):
        self.names = names
    def visit_Name(self, node):
        if node.id in self.names:
            return self.names[node.id]
        else:
            return node
def substitute(expr, names):
    print "substitute"
    for varName, varValue in names.iteritems():
        print " name " + varName + " for " + toSource(varValue)
    print " in " + toSource(expr)
    return NameTransformer(names).visit(expr)
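To see why one-pass substitution matters, here is a small Python 3 sketch (3.9+ for ast.unparse) that compares it with substituting one name after the other. NameSubstituter and substitute_names are illustrative names of my own, not the helpers defined above, and the values a = 1, b = 10 are arbitrary.

```python
import ast
import copy

class NameSubstituter(ast.NodeTransformer):
    """Replace names by expression subtrees; replaced subtrees are not revisited."""
    def __init__(self, replacements):
        self.replacements = replacements
    def visit_Name(self, node):
        if node.id in self.replacements:
            return copy.deepcopy(self.replacements[node.id])
        return node

def substitute_names(source, replacements):
    """Substitute every name in `replacements` in a single pass over `source`."""
    trees = {name: ast.parse(text, mode="eval").body
             for name, text in replacements.items()}
    tree = NameSubstituter(trees).visit(ast.parse(source, mode="eval"))
    return ast.unparse(tree.body)

# Both names at once versus one after the other.
simultaneous = substitute_names("a + b", {"a": "a + b", "b": "a + b"})
sequential = substitute_names(substitute_names("a + b", {"a": "a + b"}),
                              {"b": "a + b"})

values = {"a": 1, "b": 10}
print(eval(simultaneous, dict(values)))  # 22, i.e. (a + b) + (a + b)
print(eval(sequential, dict(values)))    # 23, the second pass also hit the b introduced by the first
```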
Then we write a similar NodeTransformer to find calls, where we can inline function definitions:
class CallTransformer(ast.NodeTransformer):
    def __init__(self, fnName, varNames, fnExpr):
        self.fnName = fnName
        self.varNames = varNames
        # substitute in new fn expr for each CallTransformer
        self.fnExpr = copy.deepcopy(fnExpr)
        self.modified = False
    def visit_Call(self, node):
        if (node.func.id == self.fnName):
            if len(node.args) == len(self.varNames):
                print "expand call to " + self.fnName + "(" + (", ".join(self.varNames)) + ")" + " with arguments "+ ", ".join(map(toSource, node.args))
                # We substitute in args too!
                old_node = node
                args = map(self.visit, node.args)
                names = dict(zip(self.varNames, args))
                node = substitute(self.fnExpr, names)
                self.modified = True
                return node
            else:
                raise Exception("invalid arity " + toSource(node))
        else:
            return self.generic_visit(node)
def substituteCalls(expr, definitions, n = 3):
    while True:
        if (n <= 0):
            break
        n -= 1
        modified = False
        for fnName, varNames, fnExpr in definitions:
            transformer = CallTransformer(fnName, varNames, fnExpr)
            expr = transformer.visit(expr)
            modified = modified or transformer.modified
        if not modified:
            break
    return expr
substituteCalls repeats the substitution pass, so we can inline recursive functions too. There is also an explicit limit (n), because some definitions might be infinitely recursive (such as fact below). There is a bit of ugly-looking copying, but it is required to keep different subtrees separate.
And the example code:
if True:
    print "f1 first, unique variable names"
    ex = parseExpr("a+b+f1(f2(x, y), x)")
    ex = substituteCalls(ex, [
        ("f1", ["u", "v"], parseExpr("sin(u + v)")),
        ("f2", ["i", "j"], parseExpr("i + j ^ 2"))])
    print toSource(ex)
    print "---"
if True:
    print "f1 first"
    ex = parseExpr("a+b+f1(f2(x, y), x)")
    ex = substituteCalls(ex, [
        ("f1", ["x", "y"], parseExpr("sin(x + y)")),
        ("f2", ["x", "y"], parseExpr("x + y ^ 2"))])
    print toSource(ex)
    print "---"
if True:
    print "f2 first"
    ex = parseExpr("f1(f1(x, x), y)")
    ex = substituteCalls(ex, [
        ("f1", ["x", "y"], parseExpr("x + y"))])
    print toSource(ex)
    print "---"
if True:
    print "fact"
    ex = parseExpr("fact(n)")
    ex = substituteCalls(ex, [
        ("fact", ["n"], parseExpr("n if n == 0 else n * fact(n-1)"))])
    print toSource(ex)
    print "---"
Which prints out:
f1 first, unique variable names
expand call to f1(u, v) with arguments f2(x, y), x
substitute
name u for f2(x, y)
name v for x
in sin((u + v))
expand call to f2(i, j) with arguments x, y
substitute
name i for x
name j for y
in ((i + j) ^ 2)
((a + b) + sin((((x + y) ^ 2) + x)))
---
f1 first
expand call to f1(x, y) with arguments f2(x, y), x
substitute
name y for x
name x for f2(x, y)
in sin((x + y))
expand call to f2(x, y) with arguments x, y
substitute
name y for y
name x for x
in ((x + y) ^ 2)
((a + b) + sin((((x + y) ^ 2) + x)))
---
f2 first
expand call to f1(x, y) with arguments f1(x, x), y
expand call to f1(x, y) with arguments x, x
substitute
name y for x
name x for x
in (x + y)
substitute
name y for y
name x for (x + x)
in (x + x)
((x + x) + ((x + x) + x))
---
fact
expand call to fact(n) with arguments n
substitute
name n for n
in n if (n == 0) else (n * fact((n - 1)))
expand call to fact(n) with arguments (n - 1)
substitute
name n for (n - 1)
in n if (n == 0) else (n * fact((n - 1)))
expand call to fact(n) with arguments ((n - 1) - 1)
substitute
name n for ((n - 1) - 1)
in n if (n == 0) else (n * fact((n - 1)))
n if (n == 0) else (n * (n - 1) if ((n - 1) == 0) else ((n - 1) * ((n - 1) - 1) if (((n - 1) - 1) == 0) else (((n - 1) - 1) * fact((((n - 1) - 1) - 1)))))
Unfortunately, the codegen version on PyPI is buggy. It doesn't parenthesise expressions properly, even when the AST says it should. I used jbremer/codegen (pip install git+git://github.com/jbremer/codegen). It adds unnecessary parentheses too, but that's better than none at all. Thanks to @XavierCombelle for the tip.
The substitution gets trickier if you have anonymous functions, i.e. lambdas. Then you need to rename variables. You could try searching for lambda calculus together with substitution or implementation. However, I had no luck finding any articles that use Python for the task.
Do you know the variables beforehand?
I recommend using SymPy!
Take for example the following:
import sympy
a,b,x,y = sympy.symbols('a b x y')
f1 = sympy.Function('f1')
f2 = sympy.Function('f2')
readString = "a+b+f1(f2(x,y),x)"
z = eval(readString)
'z' will now be a symbolic term representing the mathematical formula. You can print it out. You can then use subs to replace symbolic terms or functions. You can either represent sine symbolically again (like f1 and f2) or you can possibly use the sin() in sympy.mpmath.
Depending on your needs, this approach is great because you can eventually compute, evaluate or simplify this expression.
What is your long term goal? Is it to evaluate the function or simply perform substitution? In the former case you can simply try this (note that f1 and f2 could also be dynamically defined):
import math
math.sin
def f2(x, y):
    return x + y ** 2
def f1(x, y):
    return math.sin(x + y)
a, b = 1, 2
x, y = 3, 4
eval('a + b + f1(f2(x, y), x)')
# 2.991148690709596
If you want to replace the functions and get back the modified version, you will indeed have to resort to some sort of AST parser. Be careful though with the use of eval, as this opens up a security hole for malicious user input code.
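One way to reduce (not eliminate) that risk is to parse the input with ast first and reject anything outside a small whitelist before it ever reaches eval. The sketch below is an assumption-laden illustration: ALLOWED_NODES and check_expression are names I made up, the whitelist only covers simple arithmetic and plain function calls, and it is not a sandbox (for example, a huge exponent can still consume a lot of CPU).

```python
import ast

# Assumption: only these node types are needed for simple arithmetic expressions.
ALLOWED_NODES = (
    ast.Expression, ast.BinOp, ast.UnaryOp, ast.Call, ast.Name, ast.Load,
    ast.Constant, ast.Add, ast.Sub, ast.Mult, ast.Div, ast.Pow,
    ast.UAdd, ast.USub,
)

def check_expression(source, allowed_names):
    """Raise ValueError unless source uses only whitelisted syntax and names."""
    tree = ast.parse(source, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError("disallowed syntax: " + type(node).__name__)
        if isinstance(node, ast.Name) and node.id not in allowed_names:
            raise ValueError("unknown name: " + node.id)
        if isinstance(node, ast.Constant) and not isinstance(node.value, (int, float)):
            raise ValueError("disallowed constant: " + repr(node.value))
    return tree

names = {"a", "b", "x", "y", "f1", "f2"}
check_expression("a + b + f1(f2(x, y), x)", names)  # accepted, returns the parsed tree

try:
    check_expression("__import__('os').system('echo hi')", names)
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True: attribute access is not on the whitelist
```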
(Using sympy as adrianX suggested, with some extra code.)
Code below converts a given string to a new string after combining given functions. It's hasty and poorly documented, but it works.
WARNING!
Contains exec and eval; malicious code could take effect if the input is provided by external users.
UPDATE:
Rewrote the whole code. Works in Python 2.7.
Function arguments can be separated by comma or whitespace or both.
All examples in question and comments are working.
import re
import sympy
##################################################
# Input string and functions
initial_str = 'a1+myf1(myf2(a, b),y)'
given_functions = {'myf1(x,y)': 'cross(x,y)', 'myf2(a, b)': 'value(a,b)'}
##################################################
print '\nEXECUTED/EVALUATED STUFF:\n'
processed_str = initial_str
def fixed_power_op(str_to_fix):
    return str_to_fix.replace('^', '**')
def fixed_multiplication(str_to_fix):
    """
    Inserts multiplication symbol wherever omitted.
    """
    pattern_digit_x = r"(\d)([A-Za-z])" # 4x -> 4*x
    pattern_par_digit = r"(\))(\d)" # )4 -> )*4
    pattern_digit_par = r"[^a-zA-Z]?_?(\d)(\()" # 4( -> 4*(
    for patt in (pattern_digit_x, pattern_par_digit, pattern_digit_par):
        str_to_fix = re.sub(patt, r'\1*\2', str_to_fix)
    return str_to_fix
processed_str = fixed_power_op(processed_str)
class FProcessing(object):
    def __init__(self, func_key, func_body):
        self.func_key = func_key
        self.func_body = func_body
    def sliced_func_name(self):
        return re.sub(r'(.+)\(.+', r'\1', self.func_key)
    def sliced_func_args(self):
        return re.search(r'\((.*)\)', self.func_key).group()
    def sliced_args(self):
        """
        Returns arguments found for given function. Arguments can be separated by comma or whitespace.
        :returns (list)
        """
        if ',' in self.sliced_func_args():
            arg_separator = ','
        else:
            arg_separator = ' '
        return self.sliced_func_args().replace('(', '').replace(')', '').split(arg_separator)
    def num_of_sliced_args(self):
        """
        Returns number of arguments found for given function.
        """
        return len(self.sliced_args())
    def functions_in_function_body(self):
        """
        Detects functions in function body.
        e.g. f1(x,y): sin(x+y**2), will result in "sin"
        :returns (set)
        """
        return set(re.findall(r'([a-zA-Z]+_?\w*)\(', self.func_body))
    def symbols_in_func_body(self):
        """
        Detects non argument symbols in function body.
        """
        symbols_in_body = set(re.findall(r'[a-zA-Z]+_\w*', self.func_body))
        return symbols_in_body - self.functions_in_function_body()
# --------------------------------------------------------------------------------------
# SYMBOL DETECTION (x, y, z, mz,..)
# Prohibited symbols
prohibited_symbol_names = set()
# Custom function names are prohibited symbol names.
for key in given_functions.keys():
    prohibited_symbol_names |= {FProcessing(func_key=key, func_body=None).sliced_func_name()}
def symbols_in_str(provided_str):
    """
    Returns a set of symbol names that are contained in provided string.
    Allowed symbols start with a letter followed by 0 or more letters,
    and then 0 or more numbers (eg. x, x1, Na, Xaa_sd, xa123)
    """
    symbol_pattern = re.compile(r'[A-Za-z]+\d*')
    symbol_name_set = re.findall(symbol_pattern, provided_str)
    # Filters out prohibited.
    symbol_name_set = {i for i in symbol_name_set if (i not in prohibited_symbol_names)}
    return symbol_name_set
# ----------------------------------------------------------------
# EXEC SYMBOLS
symbols_in_given_str = symbols_in_str(initial_str)
# e.g. " x, y, sd = sympy.symbols('x y sd') "
symbol_string_to_exec = ', '.join(symbols_in_given_str)
symbol_string_to_exec += ' = '
symbol_string_to_exec += "sympy.symbols('%s')" % ' '.join(symbols_in_given_str)
exec symbol_string_to_exec
# -----------------------------------------------------------------------------------------
# FUNCTIONS
# Detects secondary functions (functions contained in body of given_functions dict)
sec_functions = set()
for key, val in given_functions.items():
    sec_functions |= FProcessing(func_key=key, func_body=val).functions_in_function_body()
def secondary_function_as_exec_str(func_key):
    """
    Used for functions that are contained in the function body of given_functions.
    E.g. given_functions = {f1(x): sin(4+x)}
    "my_f1 = sympy.Function('sin')(x)"
    :param func_key: (str)
    :return: (str)
    """
    returned_str = "%s = sympy.Function('%s')" % (func_key, func_key)
    print returned_str
    return returned_str
def given_function_as_sympy_class_as_str(func_key, func_body):
    """
    Converts given_function to sympy class and executes it.
    E.g. class f1(sympy.Function):
             nargs = (1, 2)
             @classmethod
             def eval(cls, x, y):
                 return cross(x+y**2)
    :param func_key: (str)
    :return: (None)
    """
    func_proc_instance = FProcessing(func_key=func_key, func_body=func_body)
    returned_str = 'class %s(sympy.Function): ' % func_proc_instance.sliced_func_name()
    returned_str += '\n\tnargs = %s' % func_proc_instance.num_of_sliced_args()
    returned_str += '\n\t@classmethod'
    returned_str += '\n\tdef eval(cls, %s):' % ','.join(func_proc_instance.sliced_args())
    returned_str = returned_str.replace("'", '')
    returned_str += '\n\t\treturn %s' % func_body
    returned_str = fixed_power_op(returned_str)
    print '\n', returned_str
    return returned_str
# Executes functions in given_functions' body
for name in sec_functions:
    exec secondary_function_as_exec_str(func_key=name)
# Executes given_functions
for key, val in given_functions.items():
    exec given_function_as_sympy_class_as_str(func_key=key, func_body=val)
final_result = eval(initial_str)
# PRINTING
print '\n' + ('-'*40)
print '\nRESULTS'
print '\nInitial string: \n%s' % initial_str
print '\nGiven functions:'
for key, val in given_functions.iteritems():
    print '%s: ' % key, val
print '\nResult: \n%s' % final_result
I think you want to use something like PyBison which is a parser generator.
See an example that contains the basic code you need here:
http://freenet.mcnabhosting.com/python/pybison/calc.py
You need to add a token type for functions, and a rule for functions, and then what happens with that function if it is encountered.
If you need other information about parsing and so on, try to read some basic tutorials on Lex and (Yacc or Bison).