Faster method of evaluating a boolean expression as a string in Python - python

I have been working on this project for a couple months right now. The ultimate goal of this project is to evaluate an entire digital logic circuit similar to functional testing; just to give a scope of the problem. The topic I created here deals with the issue I'm having with performance of analyzing a boolean expression. For any gate inside a digital circuit, it has an output expression in terms of the global inputs. EX: ((A&B)|(C&D)^E). What I want to do with this expression is then calculate all possible outcomes and determine how much influence each input has on the outcome.
The fastest way that I have found was by building a truth table as a matrix and looking at certain rows (won't go into specifics of that algorithm as it's offtopic), the problem with that is once the number of unique inputs goes above 26-27 (something around that) the memory usage is well beyond 16GB (Max my computer has). You might say "Buy more RAM", but as every increase in inputs by 1, memory usage doubles. Some of the expressions I analyze are well over 200 unique inputs...
The method I use right now uses the compile method to take the expression as the string. Then I create an array with all of the inputs found from the compile method. Then I generate a list row by row of "True" and "False" randomly chosen from a sample of possible values (that way it will be equivalent to rows in a truth table if the sample size is the same size as the range and it will allow me to limit the sample size when things get too long to calculate). These values are then zipped with the input names and used to evaluate the expression. This will give the initial result, after that I go column by column in the random boolean list and flip the boolean then zip it with the inputs again and evaluate it again to determine if the result changed.
So my question is this: Is there a faster way? I have included the code that performs the work. I have tried regular expressions to find and replace but it is always slower (from what I've seen). Take into account that the inner for loop will run N times where N is the number of unique inputs. The outside for loop I limit to run 2^15 if N > 15. So this turns into eval being executed Min(2^N, 2^15) * (1 + N)...
As an update to clarify what I am asking exactly (Sorry for any confusion). The algorithm/logic for calculating what I need is not the issue. I am asking for an alternative to the python built-in 'eval' that will perform the same thing faster. (take a string in the format of a boolean expression, replace the variables in the string with the values in the dictionary and then evaluate the string).
#value is expression as string
comp = compile(value.strip(), '-', 'eval')
inputs = comp.co_names
control = [0]*len(inputs)
#Sequences of random boolean values to be used
random_list = gen_rand_bits(len(inputs))
for row in random_list:
valuedict = dict(zip(inputs, row))
answer = eval(comp, valuedict)
for column in range(len(row)):
row[column] = ~row[column]
newvaluedict = dict(zip(inputs, row))
newanswer = eval(comp, newvaluedict)
row[column] = ~row[column]
if answer != newanswer:
control[column] = control[column] + 1

My question:
Just to make sure that I understand this correctly: Your actual problem is to determine the relative influence of each variable within a boolean expression on the outcome of said expression?
OP answered:
That is what I am calculating but my problem is not with how I calculate it logically but with my use of the python eval built-in to perform evaluating.
So, this seems to be a classic XY problem. You have an actual problem which is to determine the relative influence of each variable within the a boolean expression. You have attempted to solve this in a rather ineffective way, and now that you actually “feel” the inefficiency (in both memory usage and run time), you look for ways to improve your solution instead of looking for better ways to solve your original problem.
In any way, let’s first look at how you are trying to solve this. I’m not exactly sure what gen_rand_bits is supposed to do, so I can’t really take that into account. But still, you are essentially trying out every possible combination of variable assignments and see if flipping the value for a single variable changes the outcome of the formula result. “Luckily”, these are just boolean variables, so you are “only” looking at 2^N possible combinations. This means you have exponential run time. Now, O(2^N) algorithms are in theory very very bad, while in practice it’s often somewhat okay to use them (because most have an acceptable average case and execute fast enough). However, being an exhaustive algorithm, you actually have to look at every single combination and can’t shortcut. Plus the compilation and value evaluation using Python’s eval is apparently not so fast to make the inefficient algorithm acceptable.
So, we should look for a different solution. When looking at your solution, one might say that more efficient is not really possible, but when looking at the original problem, we can argue otherwise.
You essentially want to do things similar to what compilers do as static analysis. You want to look at the source code and analyze it just from there without having to actually evaluate that. As the language you are analyzing is highly restricted (being only a boolean expression with very few operators), this isn’t really that hard.
Code analysis usually works on the abstract syntax tree (or an augmented version of that). Python offers code analysis and abstract syntax tree generation with its ast module. We can use this to parse the expression and get the AST. Then based on the tree, we can analyze how relevant each part of an expression is for the whole.
Now, evaluating the relevance of each variable can get quite complicated, but you can do it all by analyzing the syntax tree. I will show you a simple evaluation that supports all boolean operators but will not further check the semantic influence of expressions:
import ast
class ExpressionEvaluator:
def __init__ (self, rawExpression):
self.raw = rawExpression
self.ast = ast.parse(rawExpression)
def run (self):
return self.evaluate(self.ast.body[0])
def evaluate (self, expr):
if isinstance(expr, ast.Expr):
return self.evaluate(expr.value)
elif isinstance(expr, ast.Name):
return self.evaluateName(expr)
elif isinstance(expr, ast.UnaryOp):
if isinstance(expr.op, ast.Invert):
return self.evaluateInvert(expr)
else:
raise Exception('Unknown unary operation {}'.format(expr.op))
elif isinstance(expr, ast.BinOp):
if isinstance(expr.op, ast.BitOr):
return self.evaluateBitOr(expr.left, expr.right)
elif isinstance(expr.op, ast.BitAnd):
return self.evaluateBitAnd(expr.left, expr.right)
elif isinstance(expr.op, ast.BitXor):
return self.evaluateBitXor(expr.left, expr.right)
else:
raise Exception('Unknown binary operation {}'.format(expr.op))
else:
raise Exception('Unknown expression {}'.format(expr))
def evaluateName (self, expr):
return { expr.id: 1 }
def evaluateInvert (self, expr):
return self.evaluate(expr.operand)
def evaluateBitOr (self, left, right):
return self.join(self.evaluate(left), .5, self.evaluate(right), .5)
def evaluateBitAnd (self, left, right):
return self.join(self.evaluate(left), .5, self.evaluate(right), .5)
def evaluateBitXor (self, left, right):
return self.join(self.evaluate(left), .5, self.evaluate(right), .5)
def join (self, a, ratioA, b, ratioB):
d = { k: v * ratioA for k, v in a.items() }
for k, v in b.items():
if k in d:
d[k] += v * ratioB
else:
d[k] = v * ratioB
return d
expr = '((A&B)|(C&D)^~E)'
ee = ExpressionEvaluator(expr)
print(ee.run())
# > {'A': 0.25, 'C': 0.125, 'B': 0.25, 'E': 0.25, 'D': 0.125}
This implementation will essentially generate a plain AST for the given expression and the recursively walk through the tree and evaluate the different operators. The big evaluate method just delegates the work to the type specific methods below; it’s similar to what ast.NodeVisitor does except that we return the analyzation results from each node here. One could augment the nodes instead of returning it instead though.
In this case, the evaluation is just based on ocurrence in the expression. I don’t explicitely check for semantic effects. So for an expression A | (A & B), I get {'A': 0.75, 'B': 0.25}, although one could argue that semantically B has no relevance at all to the result (making it {'A': 1} instead). This is however something I’ll leave for you. As of now, every binary operation is handled identically (each operand getting a relevance of 50%), but that can be of course adjusted to introduce some semantic rules.
In any way, it will not be necessary to actually test variable assignments.

Instead of reinventing the wheel and getting into risk like performance and security which you are already in, it is better to search for industry ready well accepted libraries.
Logic Module of sympy would do the exact thing that you want to achieve without resorting to evil ohh I meant eval. More importantly, as the boolean expression is not a string you don;t have to care about parsing the expression which generally turns out to be the bottleneck.

You don't have to prepare a static table for computing this. Python is a dynamic language, thus it's able to interpret and run a code by itself during runtime.
In you case, I would suggest a soluation that:
import random, re, time
#Step 1: Input your expression as a string
logic_exp = "A|B&(C|D)&E|(F|G|H&(I&J|K|(L&M|N&O|P|Q&R|S)&T)|U&V|W&X&Y)"
#Step 2: Retrieve all the variable names.
# You can design a rule for naming, and use regex to retrieve them.
# Here for example, I consider all the single-cap-lettler are variables.
name_regex = re.compile(r"[A-Z]")
#Step 3: Replace each variable with its value.
# You could get the value with reading files or keyboard input.
# Here for example I just use random 0 or 1.
for name in name_regex.findall(logic_exp):
logic_exp = logic_exp.replace(name, str(random.randrange(2)))
#Step 4: Replace the operators. Python use 'and', 'or' instead of '&', '|'
logic_exp = logic_exp.replace("&", " and ")
logic_exp = logic_exp.replace("|", " or " )
#Step 5: interpret the expression with eval(exp) and output its value.
print "exporession =", logic_exp
print "expression output =",eval(logic_exp)
This would be very fast and take very little memory. For a test, I run the example above with 25 input variables:
exporession = 1 or 1 and (1 or 1) and 0 or (0 or 0 or 1 and (1 and 0 or 0 or (0 and 0 or 0 and 0 or 1 or 0 and 0 or 0) and 1) or 0 and 1 or 0 and 1 and 0)
expression output= 1
computing time: 0.000158071517944 seconds
According to your comment, I see that you are computing all the possible combinations instead of the output at a given input values. If so, it would become a typical NP-complete Boolean satisfiability problem. I don't think there's any algorithm that could make it by a complexity lower than O(2^N). I suggest you to search with the keywords fast algorithm to solve SAT problem, you would find a lot of interesting things.

Related

How to optimize if statement given the if statement has: > or <

How to make code similar to the one below run faster.
I know you can use a dictionary for equality if-statements but no sure with this one.
Delta = 3
if (x - y) >= Delta:
pass
elif y < Delta:
pass
else:
pass
Here's an example of how a dictionary lookup would look here, if you really wanted to use one:
def do_something():
pass
def do_something_else_1():
pass
def do_something_else_2():
pass
{
y < Delta: do_something_else_1,
x - y >= Delta: do_something
}.get(True, do_something_else_2)()
But I can guarantee you this will run slower (mainly because all the conditions are greedily evaluated now, instead of lazily). The reason you cannot optimize your existing code with a dictionary lookup is because a dictionary lookup excels where computing a hash followed by computing equality with the narrowed search space is faster than computing equality with the entire search space. Because of this benefit, you have to pay the upfront cost of constructing the hash table in the first place.
However, you aren't checking equality here. You're using the inequality functions < and >=, which don't play nice with the concept of a hash table. The hash of a bool (the result of this inequality function) is no quicker to compute compared to using the bool itself, which means that constructing the hash table here will outweigh any time savings you get by using the constructed hash table immediately afterwards. Seeing as x and y may change each time, there's no way for you to cache this hash table, meaning you suffer the cost of construction each time.
Keep the code as it is.
Optimization usually takes advantage of some common expression or common code. You have none here. At the register level, your comparisons look like:
load r1, x
sub r1, y
sub r1, 3
brlt ELIF # if less than 0, branch to ELIF
# TRUE-clause stuff, aka "Do something"
br ENDIF
ELIF:
load r1, y
sub r1, 3
brge ELSE # if >= 0, branch to ELSE
# ELIF-clause stuff, aka "Do something else #1"
ELSE:
# ELSE-caluse stuff, aka "Do something else #2"
ENDIF:
# Remainder of program
The only commonality her in either data or flow is loading y into a register. Any reasonably aware optimization level will do this for you -- it will alter the first expression to load y into r2, a trivial cost in micro-instruction utilization.
There appears to be nothing else to optimize here. The normal flow analysis will recognize that 3 is a constant within this block, and substitute the immediate operand for Delta.

How can I get my function to add together its output?

So this is my line of code so far,
def Adder (i,j,k):
if i<=j:
for x in range (i, j+1):
print(x**k)
else:
print (0)
What it's supposed to do is get inputs (i,j,k) so that each number between [i,j] is multiplied the power of k. For example, Adder(3,6,2) would be 3^2 + 4^2 + 5^2 + 6^2 and eventually output 86. I know how to get the function to output the list of numbers between i and j to the power of K but I don't know how to make it so that the function sums that output. So in the case of my given example, my output would be 9, 16, 25, 36.
Is it possible to make it so that under my if conditional I can generate an output that adds up the numbers in the range after they've been taken to the power of K?
If anyone can give me some advice I would really appreciate it! First week of any coding ever and I don't quite know how to ask this question so sorry for vagueness!
Question now Answered, thanks to everyone who responded so quickly!
You could use built-in function sum()
def adder(i,j,k):
if i <= j:
print(sum(x**k for x in range(i,j+1)))
else:
print(0)
The documentation is here
I'm not sure if this is what you want but
if i<=j:
sum = 0
for x in range (i, j+1):
sum = sum + x**k #sum += x**k for simplicity
this will give you the sum of the powers
Looking at a few of the answers posted, they do a good job of giving you pythonic code for your solution, I thought I could answer your specific questions:
How can I get my function to add together its output?
A perhaps reasonable way is to iteratively and incrementally perform your calculations and store your interim solutions in a variable. See if you can visualize this:
Let's say (i,j,k) = (3,7,2)
We want the output to be: 135 (i.e., the result of the calculation 3^2 + 4^2 + 5^2 + 6^2 + 7^2)
Use a variable, call it result and initialize it to be zero.
As your for loop kicks off with x = 3, perform x^2 and add it to result. So result now stores the interim result 9. Now the loop moves on to x = 4. Same as the first iteration, perform x^2 and add it to result. Now result is 25. You can now imagine that result, by the time x = 7, contains the answer to the calculation 3^2+4^2+5^2+6^2. Let the loop finish, and you will find that 7^2 is also added to result.
Once loop is finished, print result to get the summed up answer.
A thing to note:
Consider where in your code you need to set and initialize the _result_ variable.
If anyone can give me some advice I would really appreciate it! First week of any coding ever and I don't quite know how to ask this question so sorry for vagueness!
Perhaps a bit advanced for you, but helpful to be made aware I think:
Alright, let's get some nuance added to this discussion. Since this is your first week, I wanted to jot down some things I had to learn which have helped greatly.
Iterative and Recursive Algorithms
First off, identify that the solution is an iterative type of algorithm. Where the actual calculation is the same, but is executed over different cumulative data.
In this example, if we were to represent the calculation as an operation called ADDER(i,j,k), then:
ADDER(3,7,2) = ADDER(3,6,2)+ 7^2
ADDER(3,6,2) = ADDER(3,5,2) + 6^2
ADDER(3,5,2) = ADDER(3,4,2) + 5^2
ADDER(3,4,2) = ADDER(3,3,2) + 4^2
ADDER(3,3,2) = 0 + 3^2
Problems like these can be solved iteratively (like using a loop, be it while or for) or recursively (where a function calls itself using a subset of the data). In your example, you can envision a function calling itself and each time it is called it does the following:
calculates the square of j and
adds it to the value returned from calling itself with j decremented
by 1 until
j < i, at which point it returns 0
Once the limiting condition (Point 3) is reached, a bunch of additions that were queued up along the way are triggered.
Learn to Speak The Language before using Idioms
I may get down-voted for this, but you will encounter a lot of advice displaying pythonic idioms for standard solutions. The idiomatic solution for your example would be as follows:
def adder(i,j,k):
return sum(x**k for x in range(i,j+1)) if i<=j else 0
But for a beginner this obscures a lot of the "science". It is far more rewarding to tread the simpler path as a beginner. Once you develop your own basic understanding of devising and implementing algorithms in python, then the idioms will make sense.
Just so you can lean into the above idiom, here's an explanation of what it does:
It calls the standard library function called sum which can operate over a list as well as an iterator. We feed it as argument a generator expression which does the job of the iterator by "drip feeding" the sum function with x^k values as it iterates over the range (1, j+1). In cases when N (which is j-i) is arbitrarily large, using a standard list can result in huge memory overhead and performance disadvantages. Using a generator expression allows us to avoid these issues, as iterators (which is what generator expressions create) will overwrite the same piece of memory with the new value and only generate the next value when needed.
Of course it only does all this if i <= j else it will return 0.
Lastly, make mistakes and ask questions. The community is great and very helpful
Well, do not use print. It is easy to modify your function like this,
if i<=j:
s = 0
for x in range (i, j+1):
s += x**k
return s # print(s) if you really want to
else:
return 0
Usually functions do not print anything. Instead they return values for their caller to either print or further process. For example, someone may want to find the value of Adder(3, 6, 2)+1, but if you return nothing, they have no way to do this, since the result is not passed to the program. A side note, do not capitalize functions. Those are for classes.

genetic programming for trading: How to represent chromosomes

I am working on a genetic algorithm in python that can be used for trading. The principe is simple if you are familiar with evolutionary algorithms:
The genes represent trading strategies: To be more specific, each gene is a tree of this form:
this can be interpreted as a boolean value like this:
If:
the average of the 50 last stock values is less the
actual price
and the min of the 6 last stock values is less the actual price.
then answer *True**, else False
If the answer is True, then send a BUY signal, and SELL signal otherwise.
This is an example of how I represent a tree like this in python:
class BinaryRule:
def __init__(self, child1, child2):
self.child1 = child1
self.child2 = child2
class LessThan(BinaryRule):
name = '<'
def eval(self):
return self.child1.eval() < self.child2.eval()
# Here there is the definition of the other classes
# and then I create the tree
tree = rules.LessThan(
rules.Max( rules.Float(1) ),
rules.SMA( rules.Float(15) ),
)
print tree.eval() # True or false
The problem is that I can't think of good technique for the crossover and the mutation operators. Any ideas?
This is not typically the way that genetic algorithms are represented, and I personally don't feel that a genetic algorithm is the right approach for this, but nonetheless this is certainly possible.
Assuming you just want to interact with this specific set of variables, you have a small set of potential values:
boolean = "and"
comparator = "<"
etc...
this means you can easily represent these as a flat list:
chromosome = ["and", "<", ...]
crossover is then just mixing two chromosomes at some specific split point:
def crossover(cr1, cr2):
split_point = random.randint(1, len(cr1))
return [cr1[:split_point] + cr2[split_point:]], [cr2[:split_point] + cr1[split_point:]]
mutation then is also relatively trivial, just changing a number at random, switching the boolean operator, etc... I'll leave that one as an exercise to the reader though.
Your operators should be safely interchangeable - should all accept the same input-types and return the same output-type (most likely float).
Not necessary but a good idea (by way of neural networks): use differentiable operator-functions - ie instead of returning YES or NO, return "degree-of-yes-ness". This provides more graduated feedback on improvements to your expression. It can also be useful to give each operator a setpoint control value to tweak its behavior.
A mutation operator can change an operator's type or setpoint; a crossover operation usually swaps sub-trees.
A fairly standard binary tree representation is in the form of a list like [head, left, right, left-left, left-right, right-left, right-right, left-left-left, ...]. The nice thing is that, given the index of any node, you can immediately calculate the parent, left-child, and right-child indices; a downside is that it can waste a lot of space if you have a sparse tree structure. Also, properly copying subtrees should be done via a recursive function, not a simple slice-and-splice as #Slater shows above.

Is there a way to use this MACRO-based language to calculate fibnacci iteratively?

I tried to work out a python-like language which combined with the feature of MACRO(weird, but just for fun..), for example, the codes to calculate fibonacci seq analyticly is like this:
from math import *
def analytic_fibonacci(n):
sqrt_5 = sqrt(5);
p = (1 + sqrt_5) / 2;
q = 1/p;
return int( (p**n + q**n) / sqrt_5 + 0.5 )
print analytic_fibonacci(10),
And I can rewrite it in the python-like-with-MACRO language like this:
from math import sqrt
sqrt
def analytic_fibonacci(n):
_2(5)
(1+_1)/2
1/_1
return int((_2**n+_1**n)/_3+0.5)
print analytic_fibonacci(10)
The idea is to use line number to expand the expression so that no explicit assignment is needed. The _2 means to replace it with the expression appeared 2 lines smaller than the current line, so the _2 in the 4th line becomes the expression in the 2nd line, which is sqrt, and _2(5) is expanded to sqrt(5). (Lines before current line starts with _, after current line starts with |)
The example above is simple. When I tried to rewrite a more complex example, I encountered problem:
def fibIter(n):
if n < 2:
return n
fibPrev = 1
fib = 1
for num in xrange(2, n):
fibPrev, fib = fib, fib + fibPrev
return fib
I don't know how to use the line-number-based MACRO to express fibPrev, fib = fib, fib + fibPrev. I think some features is missing in this "MACRO langugage" , and fibPrev, fib = fib, fib+fibPrev is expressible if I fixed it.. (I heard that the MACRO in Lisp is Turing Complete so I think the example above should be expressed by MACRO) Does anyone have ideas about this?
I see two ways to interpret your language. Neither is very powerful.
The first way is to literally expand the macros to expressions, rather than values. Then analytic_fibonacci expands to
def analytic_fibonacci(n):
return int(((1+sqrt(5))/2**n+1/(1+sqrt(5))/2**n)/sqrt(5)+0.5)
You probably want some parentheses in there; depending on how you define the language, those may or may not be added for you.
This is pretty useless. Multiple-evaluation problems abound (where a function is reexecuted every time a macro refers to it), and it only lets you do things you could have done with ordinary expressions.
The second interpretation is that every statement consisting of a Python expression implicitly assigns that expression to a variable. This is also pretty useless, because only one statement can assign to any of these implicit variables. There's no way to do
x = 0
for i in range(5):
x += i
because you can't have the equivalent of x refer to either _2 or _0 depending on where the last assignment came from. Also, this really isn't a macro system at all.
Using the second interpretation, we can add a new operator to bring back the power of ordinary variable assignments. We'll call this the merge operator.
merge(_1, _2)
evaluates to either _1 or _2, depending on which was evaluated most recently. If one of the arguments hasn't yet been evaluated, it defaults to the other. fibIter then becomes
def fibIter(n):
if n < 2:
return n
1 # fibPrev
1 # fib
for num in xrange(2, n):
merge(_2, _-1) # temp
merge(_4, _-1) + merge(_3, _0) # fib
_2 # fibPrev
return merge(_2, _5)
This is quite awkward; essentially, we have to replace every use of a variable like x by a merge of every location it could have been assigned. It also requires awkward line counting, making it hard to tell which "variable" is which, and it doesn't handle multiple assignments, for loop targets, etc. I had to use negative indices to refer to future lines, because we need some way to refer to things assigned later.
Lisp macros are more powerful than your language because they let you apply arbitrary Lisp code to your Lisp code. Your language only allows a macro to expand to fixed expressions. A Lisp macro can take arbitrary code as arguments, cut it up, rearrange it, replace parts of it with different things depending on conditionals, recurse, etc. Your macros can't even take arguments.

Testing a vector without scanning it [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 9 years ago.
Improve this question
For a series of algorithms I'm implementing I need to simulate things like sets of coins being weighed or pooled blood samples. The overriding goal is to identify a sparse set of interesting items in a set of otherwise identical items. This identification is done by testing groups of items together. For example the classic problem is to find a light counterfeit coin in a group of 81 (identical) coins, using as few weightings of a pan balance as possible. The trick is to split the 81 coins into three groups and weigh two groups against each other. You then do this on the group which doesn't balance until you have 2 coins left.
The key point in the discussion above is that the set of interesting items is sparse in the wider set - the algorithms I'm implementing all outperform binary search etc for this type of input.
What I need is a way to test the entire vector that indicates the presence of a single, or more ones, without scanning the vector componentwise.
I.e. a way to return the Hamming Weight of the vector in an O(1) operation - this will accurately simulate pooling blood samples/weighing groups of coins in a pan balance.
It's key that the vector isn't scanned - but the output should indicate that there is at least one 1 in the vector. By scanning I mean looking at the vector with algorithms such as binary search or looking at each element in turn. That is need to simulate pooling groups of items (such as blood samples) and s single test on the group which indicates the presence of a 1.
I've implemented this 'vector' as a list currently, but this needn't be set in stone. The task is to determine, by testing groups of the sublist, where the 1s in the vector are. An example of the list is:
sparselist = [0]*100000
sparselist[1024] = 1
But this could equally well be a long/set/something else as suggested below.
Currently I'm using any() as the test but it's been pointed out to me that any() will scan the vector - defeating the purpose of what I'm trying to achieve.
Here is an example of a naive binary search using any to test the groups:
def binary_search(inList):
low = 0
high = len(inList)
while low < high:
mid = low + (high-low) // 2
upper = inList[mid:high]
lower = inList[low:mid]
if any(lower):
high = mid
elif any(upper):
low = mid+1
else:
# Neither side has a 1
return -1
return mid
I apologise if this code isn't production quality. Any suggestions to improve it (beyond the any() test) will be appreciated.
I'm trying to come up with a better test than any() as it's been pointed out that any() will scan the list - defeating the point of what I'm trying to do. The test needn't return the exact Hamming weight - it merely needs to indicate that there is (or isn't!) a 1 in the group being tested (i.e. upper/lower in the code above).
I've also thought of using a binary xor, but don't know how to use it in a way that isn't componentwise.
Here is a sketch:
class OrVector (list):
def __init__(self):
self._nonzero_counter = 0
list.__init__(self)
def append(self, x):
list.append(self, x)
if x:
self._nonzero_counter += 1
def remove(self, x):
if x:
self._nonzero_counter -= 1
list.remove(self, x)
def hasOne(self):
return self._nonzero_counter > 0
v = OrVector()
v.append(0)
print v
print v.hasOne()
v.append(1);
print v
print v.hasOne()
v.remove(1);
print v
print v.hasOne()
Output:
[0]
False
[0, 1]
True
[0]
False
The idea is to inherit from list, and add a single variable which stores the number of nonzero entries. While the crucial functionality is delegated to the base list class, at the same time you monitor the number of nonzero entries in the list, and can query it in O(1) time using hasOne() member function.
HTH.
any will only scan the whole vector if does not find you you're after before the end of the "vector".
From the docs it is equivalent to
def any(iterable):
for element in iterable:
if element:
return True
return False
This does make it O(n). If you have things sorted (in your "binary vector") you can use bisect.
e.g. position = index(myVector, value)
Ok, maybe I will try an alternative answer.
You cannot do this with out any prior knowledge of your data. The only thing you can do it to make a test and cache the results. You can design a data structure that will help you determine a result of any subsequent tests in case your data structure is mutable, or a data structure that will be able to determine answer in better time on a subset of your vector.
However, your question does not indicate this. At least it did not at the time of writing the answer. For now you want to make one test on a vector, for a presence of a particular element, giving no prior knowledge about the data, in time complexity less than O(log n) in average case or O(n) in worst. This is not possible.
Also keep in mind you need to load a vector at some point which takes O(n) operations, so if you are interested in performing one test over a set of elements you wont loose much. On the average case with more elements, the loading time will take much more than testing.
If you want to perform a set of tests you can design an algorithm that will "build up" some knowledge during the subsequent test, that will help it determine results in better times. However, that holds only if you want make more than one test!

Categories