Speed/structure optimization for a recursive tree - python

Edit -> short version:
In Python, unlike in C, if I pass a parameter to a function (say, a dict), the changes made within the function call are visible outside (as if I had passed a pointer instead of just the value).
I want to avoid this, so:
-> I make a copy of my dict and pass the copy to my function.
But the values of my dict can themselves be dicts, and this nesting goes on to an indefinite depth
-> the recursive copy takes very long.
Question: what is a Pythonic way to go about this?
Long version:
I'm coding a Mastermind-playing robot with an n-digit code in Python.
You try to guess the code and for each try you get an answer in terms of how many white/black/none you have, meaning resp. "good digit good position"/"good digit wrong position"/"wrong digit" (but you don't know to which digit the whites/blacks/none refer)
I analyze the answers and build a tree of possibilities with a dictionary storing white/black/none.
I store a map of the possible positions of the numbers 0-9 within the code (a digit can appear more than once) in a list.
Ex: for a 3-digit game I will have [[x,y1,y2,y3][-1,0,1,4][...][...][][][][][][]] with:
x: the total number of times this digit appears in the code (the default value being n+1, i.e. 4 in the example), with a positive value meaning "exactly" and a negative value meaning "at least"
y1,y2,..,yn: the positions within the code: 1 means I know the digit is in this position, 0 means I know it's not, and 4 (or anything else) is the default
In my example: I know that '1' appears at least once in the code (-1), that it is present in position 2, that it is NOT present in position 1, and that position 3 is still hypothetically possible.
While I explore my tree of possibilities, I update this list. Which means that each branch of the tree will have its own copy of the list.
Since I recently discovered that, unlike in C, when I pass my list to a sub-method, any change made to it within the sub will reflect on the list outside, I manually copy my list each time with a small method:
def bak_symb(_s):
    _b = [[z for z in _s[i]] for i in xrange(10)]
    return _b
Now, I profiled my program and noticed that 90% of the time is spent either in
append()
(the branches of my tree are nested dictionaries {w:{},b:0,n:{}} to which I append each branch of possibilities that I explore; for each branch, the program has to find an n-digit code)
or
my copying function
So I have three questions.
Is there a way to make this function faster?
Is there something better adapted than the structures I chose (a 2-depth list for the symbols and a nested dict for the hypotheses)?
Is there a more adequate way of doing this than building this huge tree?
All comments and remarks are welcome.
I'm self-taught and might have missed some obvious Pythonic way of doing some things.
Last but not least, I tried to find a good compromise between keeping this short and making it clear, so don't hesitate to ask for more details.
Thanks in advance,
Matt
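For reference, the aliasing behaviour described in the question, and a one-level copy of the symbol list, can be sketched as follows. This is only a minimal illustration in Python 3: copy_symbols and mark_position are hypothetical names, the 4-column layout is taken from the 3-digit example above, and the sketch assumes each row holds only integers, so copying one level deep is enough.

```python
import copy

def copy_symbols(symbols):
    """Copy a list of lists one level deep (enough when rows hold only ints)."""
    return [row[:] for row in symbols]

def mark_position(symbols, digit, position):
    """Hypothetical helper: record that `digit` is known to sit at `position`."""
    symbols[digit][position] = 1

# Ten rows (digits 0-9), each [x, y1, y2, y3] with the default value 4.
symbols = [[4, 4, 4, 4] for _ in range(10)]

# Mutating the copy leaves the original untouched.
branch = copy_symbols(symbols)
mark_position(branch, 1, 2)

# copy.deepcopy handles arbitrary nesting, at the cost of more bookkeeping.
deep = copy.deepcopy(symbols)
mark_position(deep, 3, 1)
```

Whether slicing is actually faster than the original comprehension or than copy.deepcopy on a given machine is something to measure with a profiler or timeit rather than assume.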

Related

str.split() in the for-loop instantiation, does it cause slower execution?

I'm a sucker for reducing code to its bare minimum and love keeping it short and slim, but occasionally I get into the dilemma of whether I'm doing more harm than good. Below is an example of a situation I frequently encounter and where I start pondering if I am minifying at the expense of speed.
str = "my name is john"
##Alternative 1
for el in str.split(" "):
    print(el)
##Alternative 2
splittedStr = str.split(" ")
for el in splittedStr:
    print(el)
What is faster? I'd assume it's the second one because we don't split the string after every iteration (not even sure we do that)?
str.split(" ") does the exact same thing in both cases. It creates an anonymous list of the split strings. In the second case you have the minor overhead of assigning it to a variable and then fetching the value of the variable. It's wasted time if you don't need to keep the object for other reasons. But this is a trivial amount of time compared to the other object referencing taking place in the same loop. Alternative 2 also keeps the data in memory after the loop, which is another small performance issue.
The real reason Alternative 1 is better than 2, IMHO, is that it doesn't leave the hint that splittedStr is going to be needed later.
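A quick way to convince yourself that the expression in the for statement is evaluated only once is to count the calls. This is a small sketch; split_words and call_count are made-up names used only for the demonstration.

```python
call_count = 0

def split_words(text):
    """Wrap str.split so that every call is counted."""
    global call_count
    call_count += 1
    return text.split(" ")

words_seen = []
# The iterable expression is evaluated once, before the first iteration.
for el in split_words("my name is john"):
    words_seen.append(el)
```

After the loop has visited all four words, call_count is still 1, so Alternative 1 does not re-split the string on each iteration.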
If you want to reduce the running time of your code in general, loop over a tuple instead of a list. Assigning the result to a variable and then using the variable is not the best approach, since you reserve a memory location just to store the value. Sometimes it is still worth doing for the sake of clean code, for example when you have more than one operation on one line, like
min(str.split(mylist)[3:10])
In this case, it is better to have a variable called min_value, for example, just to make things cleaner.
Returning to the performance issue, you may notice a difference in performance depending on whether you loop through a list or a tuple.
This is looping through a tuple:
for i in (1,2,3):
    print(i)
And this is looping through a list:
for i in [1,2,3]:
    print(i)
You may find that the tuple version is slightly faster, although for small literals like these the difference is usually negligible.

My python code that converts numbers between bases has several errors. What could be wrong and how can I find them?

My program is a function that converts numbers from one base to another. It takes three arguments: the initial value, the base of the initial value, then the base it is to be converted to.
The thing has several errors. For one, it won't accept any value that contains a letter for cnum. I don't know why. And I can't seem to figure out how to force it to recognize the argument 'cnum' as a string within the function call. I have to convert it into a string in the code itself.
Also, I can't get the second half, the part that converts the number to the final base, to work. Either it gives me an infinite loop (for some reason I can't figure out), or it doesn't do the complete calculation. For example, fconbase(100, 10, 12) should convert 100 from base 10 to base 12. It only spits out 8. The answer should be 84.
Here's my entire function.
#declaring variables
cnum=0 #number to be converted
cbase1=0 #base the number is written in
cbase2=0 #base the number will be converted to
cnumlen=0 #number of digits
digitNum=0 #used to fetch out each digit one by one in order
exp=0 #used to calculate position in result
currentDigit="blank" #stores the digit that's been pulled from the string
result=0 #stores the result of internal calculations
decimalResult=0 #stores cnum as a base 10 number
finalResult=0 #the final result of the conversion
def fconbase(cnum, cbase1, cbase2):
    #converts number into base 10, because the math must be done in base 10
    #resets variables used in calculations
    exp=0
    result=0
    decimalResult=0
    currentDigit="blank"
    cnumlen=len(str(cnum)) #finds length of cnum, stays constant
    digitNum=cnumlen #sets starting placement
    while exp<cnumlen:
        currentDigit=str(cnum)[digitNum-1:digitNum]
        #the following converts letters into their corresponding integers
        if currentDigit=="a" or currentDigit=="A":
            currentDigit="10"
        if currentDigit=="b" or currentDigit=="B":
            currentDigit="11"
        if currentDigit=="c" or currentDigit=="C":
            currentDigit="12"
        if currentDigit=="d" or currentDigit=="D":
            currentDigit="13"
        if currentDigit=="e" or currentDigit=="E":
            currentdigit="14"
        if currentDigit=="f" or currentDigit=="F":
            currentDigit="15"
        result=int(currentDigit)
        decimalResult=decimalResult+result*(cbase1**exp)
        exp=exp+1
        digitNum=digitNum-1
    #this part converts the decimal number into the target base
    #resetting variables again
    exp=0
    result=0
    finalResult=""
    while int(decimalResult)>(cbase2**exp):
        exp=exp+1
    exp=exp-1
    while int(decimalResult)/cbase2**exp!=int(decimalResult):
        result=int(decimalResult/(cbase2**exp))
        if result==10:
            result="a"
        if result==11:
            result="b"
        if result==12:
            result="c"
        if result==13:
            result="d"
        if result==14:
            result="e"
        if result==15:
            result="f"
        finalResult=str(finalResult)+str(result)
        decimalResult=decimalResult%cbase2**exp
        exp=exp+1
    print(finalResult)
Here is what is supposed to happen in the latter half of the equation:
The program solves cbase2^exp. Exp starts at 0. If that number is less than the decimalResult, then it increases the exp(onent) by 1 and tries again until it results in a number that's greater than the decimalResult.
Then, it divides the decimalResult by cbase2^exp. It converts numbers between 10 and 15 to letters (for bases higher than 10), then appends the result to the final result. It should be concatenating the results together to form the final result that gets printed. I don't understand why it's not doing that.
Why does it not generate the right result and why can't I enter a string into the function call?
Without going into specific problems with your code, which as you stated are many, I'll give a brief answer to the actual question in the title
What could be wrong and how can I find [the errors in my code]?
Rather than treating your code as one big complicated function that you have to stare at and understand all at once (I can rarely hold more than 10 lines of code in my own internal brain cache at once), try to break it down into smaller pieces "first I do this and expect this result. Then I take that result and do this to it, and expect another result."
From your description of the problem it seems like you're already thinking that way, but you still dumped this big chunk of code and seemed to struggle with figuring out exactly where the problem is. A lot of beginners will write some big pile of code, and then treat it as a black box while testing it. Like "I'm not getting the right answer and I don't know where the problem begins." This is where learning good debugging skills is crucial.
I would first break things into smaller pieces to try out at the interactive Python prompt. Put in dummy values for different variables and make sure small snippets of code (1 to 5 lines or so, small enough that they are easy to reason about) do exactly what you expect with different values of the variables.
If that doesn't help, then for starters the tried and true method, for beginners and advanced developers alike, is to riddle your code with print statements. In as many places as you think necessary, put a statement to print the values of one or more variables, like print("exp = %s; result = %s" % (exp, result)). Put something like this in as many places as you need to trace the values of some variables through the execution. See where it starts to give answers that don't make sense.
Sometimes this is hard to do though. You might not be able to guess the most effective places to put print statements, or even what's important to print. In cases like this (and IMO in most cases) it is more effective to use an interactive debugger like Python's built-in pdb. There are many good resources for learning pdb, but the basics shouldn't take too long to get down and will save you a whole lot of headache.
pdb will run your code line by line, stopping after each line (in loops it steps through every iteration), allowing you to examine the contents of each variable before advancing to the next line. This gives you full power to check that each part of your code does or doesn't do what you expect, and should help you pinpoint numerous problem areas.
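As a small illustration of the print-tracing approach, here is one isolated piece of the conversion (finding the highest power of the base that fits in the value) with a trace line in the loop. The function name find_top_exponent is made up for this sketch, and it uses >= in the loop condition, which is an assumption about what the loop is meant to do.

```python
def find_top_exponent(value, base):
    """Return the largest exp such that base ** exp <= value (value >= 1)."""
    exp = 0
    while value >= base ** exp:
        # Trace line: shows every power that is tried before the loop stops.
        print("exp = %s; power = %s" % (exp, base ** exp))
        exp += 1
    return exp - 1

top = find_top_exponent(100, 12)
```

To step through the same function interactively instead, you could put a call to the built-in breakpoint() (Python 3.7+) on the first line of the function, or run the whole script with python -m pdb.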
You should use the exp you find in the first step:
while int(decimalResult)>=(cbase2**exp):
    exp=exp+1
exp -= 1
while exp >= 0:
    ...
    finalResult=str(finalResult)+str(result)
    decimalResult=decimalResult%cbase2**exp
    exp -= 1
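Put together as a standalone function, the descending-exponent idea might look like the sketch below. The name to_base, the digit table and the special case for zero are assumptions added for illustration; they are not part of the original code, and the digit lookup is only one possible way to fill in the part elided above.

```python
def to_base(decimal_value, base):
    """Convert a non-negative int to a string in the given base (2 to 16)."""
    digits = "0123456789abcdef"
    if decimal_value == 0:
        return "0"
    # Find the highest exponent whose power still fits in the value.
    exp = 0
    while decimal_value >= base ** exp:
        exp += 1
    exp -= 1
    result = ""
    # Peel off one digit per power, from the highest exponent down to 0.
    while exp >= 0:
        digit = decimal_value // base ** exp
        result += digits[digit]
        decimal_value %= base ** exp
        exp -= 1
    return result
```

With this sketch, to_base(100, 12) gives "84", which is the result the question expects.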
First of all, the entire first part of the code is not needed, as the int function does it for you. Instead of all that, you can do this (cnum must be passed as a string, e.g. "ff"):
int(cnum, base=cbase1)
This converts cnum from base cbase1 to an ordinary (base 10) integer.
The second part might go to an infinite loop because at the bottom, it says
exp = exp + 1
When it should say
exp = exp - 1
Since you want to go from (for example) 5^2 down to 5^0.
The result is missing its last digit because the loop exits at exp = 0,
so that digit is never added to the result. A simple fix for that is
finalResult = str(finalResult) + str(decimalResult)
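The int call can be tried on its own. In the sketch below, to_decimal is a hypothetical wrapper that adds str() so that plain integers such as 100 are accepted as well. It also hints at why letters seemed to be rejected in the function call: a value like ff has to be written in quotes, presumably because without them Python treats it as a variable name.

```python
def to_decimal(cnum, cbase1):
    """Parse cnum (a str or an int) as a number written in base cbase1."""
    # int() with an explicit base only accepts strings, hence the str() call.
    return int(str(cnum), base=cbase1)

from_hex = to_decimal("ff", 16)      # letters work when passed as a string
from_int = to_decimal(100, 10)       # a plain int is converted via str()

# Passing a non-string together with a base is an error.
try:
    int(100, base=10)
    rejected = False
except TypeError:
    rejected = True
```

int accepts upper-case and lower-case letters alike, so no manual mapping of "a"/"A" to 10 is needed.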

Need help understanding some code (Beginner)

I am trying to learn about while and for loops. This function prints out the highest number in a list, but I'm not entirely sure how it works. Can anyone break down how it works for me, maybe step by step and/or with a flowchart? I'm struggling and want to learn.
def highest_number(list_tested):
    x=list_tested[0]
    for number in list_tested:
        if x<number:
            x=number
    print(x)
highest_number([1,5,3,2,3,4,5,8,5,21,2,8,9,3])
One of the most helpful things for understanding new code is going through it step by step:
PythonTutor has a visualizer: Paste in your code and hit visualize execution.
What this is doing is going from the first to the last number and asking:
Is this new number bigger than the one I have? If so, keep the new number, if not keep the old number.
At the end, x will be the largest number.
See my comments for step by step explanation of each line
def highest_number(list_tested): # function defined to take a list
    x=list_tested[0] # x is assigned the value of first element of list
    for number in list_tested: # iterate over all the elements of input list
        if x<number: # if value in 'x' is smaller than the current number
            x=number # then store the value of current element in 'x'
    print(x) # after iteration complete, print the value of 'x'
highest_number([1,5,3,2,3,4,5,8,5,21,2,8,9,3]) # just call to the function defined above
So basically, the function finds the largest number in the list by value.
It starts by setting the largest number so far (x) to the first element of the list, and then compares it with each element of the list in turn. Whenever it finds an element greater than the largest number found so far (which is stored in x), it stores that element in x. So at the end, the largest value is stored in x.
It looks like you are new to programming. It may help to start with some basic concepts, for and while loops among them, before jumping into something like this.
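To watch x change, one option is a variant of the function that records every value x takes. highest_number_trace and history are names made up for this sketch, and it returns the values instead of printing them.

```python
def highest_number_trace(list_tested):
    """Return the largest number and every value x held along the way."""
    x = list_tested[0]
    history = [x]               # x starts as the first element
    for number in list_tested:
        if x < number:
            x = number          # a new largest value was found
            history.append(x)
    return x, history

largest, history = highest_number_trace([1, 5, 3, 2, 3, 4, 5, 8, 5, 21, 2, 8, 9, 3])
# history is [1, 5, 8, 21]: x only changes when a bigger number shows up
```

The result agrees with the built-in max() on the same list.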
Here is one of the explanations you may easily find on the Internet http://www.teamten.com/lawrence/programming/intro/intro8.html

Tuples birthday paradox

Tuples is n
The birthday problem equation is this:
Question:
For n = 200, write an algorithm (in Python) for enumerating the number of tuples in the sample space that satisfy the condition that at least two people have the same birthday. (Note that your algorithm will need to scan each tuple)
import itertools
print(list(itertools.permutations([0,0,0])))
I am wondering: for this question, how do I insert an n into this?
"how to get n in there":
n = 200
space = itertools.permutations(bday_pairs, n)
I've left out a couple parts of your code:
itertools.permutations returns an iterator, not a list; you don't need to coerce it to a list.
Printing this result is likely not what you want with n = 200; that's a huge list.
Now, all you need to do is to build bday_pairs, the list of all possible pairs of birthdays. For convenience, I suggest that you use the integers 1-365. Since you haven't attacked that part of the problem at all, I'll leave that step up to you.
You still need to do the processing to count the sets with at least one matching birthday, another part of the problem you haven't attacked. However, I trust that the above code solves your stated problem?
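For the counting step, a brute-force sketch is shown below. Note that it swaps in itertools.product in place of itertools.permutations, because the sample space consists of tuples in which the same day may appear more than once, and it uses deliberately tiny values of n and days: scanning every tuple for n = 200 and 365 days is far beyond what can be enumerated in practice. The function name and parameters are assumptions made for illustration.

```python
import itertools

def count_shared_birthdays(n, days):
    """Count n-tuples of birthdays (1..days) where at least two entries match."""
    total = 0
    for birthdays in itertools.product(range(1, days + 1), repeat=n):
        # A tuple contains a repeated birthday iff it has fewer than n distinct values.
        if len(set(birthdays)) < n:
            total += 1
    return total

small_case = count_shared_birthdays(3, 3)
```

For 3 people and 3 possible days there are 27 tuples, 6 of which have all-distinct birthdays, so the count is 21. Dividing the count by days ** n gives the probability of a shared birthday.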

Python: Nested for loops or "next" statement

I'm a rookie hobbyist and I nest for loops when I write python, like so:
dict = {
    key1: {subkey/value1: value2}
    ...
    keyn: {subkeyn/valuen: valuen+1}
}
for key in dict:
    for subkey/value in key:
        do it to it
I'm aware of a "next" keyword that would accomplish the same goal in one line (I asked a question about how to use it but I didn't quite understand it).
So to me, a nested for loop is much more readable. Why, then, do people use "next"? I read somewhere that Python is a dynamically typed, interpreted language and that, because + both concatenates strings and sums numbers, it must check variable types on each loop iteration in order to know what the operators mean, etc. Does using "next" prevent this in some way, speeding up the execution, or is it just a matter of style/preference?
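For reference, a runnable version of the nested loop over a dict of dicts could look like this. The inventory data is invented for the example, and .items() is used so that the inner loop runs over the inner dict and not over the key itself.

```python
inventory = {
    "fruit": {"apple": 3, "pear": 5},
    "tools": {"hammer": 1},
}

pairs = []
for key, inner in inventory.items():        # outer dict: key -> inner dict
    for subkey, value in inner.items():     # inner dict: subkey -> value
        pairs.append((key, subkey, value))
```

Iterating with "for subkey in key" would loop over the characters of the key string here, which is probably not what the pseudocode intends.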
next is precious to advance an iterator when necessary, without that advancement controlling an explicit for loop. For example, if you want "the first item in S that's greater than 100", next(x for x in S if x > 100) will give it to you, no muss, no fuss, no unneeded work (as everything terminates as soon as a suitable x is located) -- and you get an exception (StopIteration) if unexpectedly no x matches the condition. If a no-match is expected and you want None in that case, next((x for x in S if x > 100), None) will deliver that. For this specific purpose, it might be clearer to you if next was actually named first, but that would betray its much more general use.
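The two forms of next mentioned above can be tried on a small list. The values in S are made up for the sketch.

```python
S = [5, 250, 101, 7]

# First item greater than 100; the generator stops as soon as it is found.
first_big = next(x for x in S if x > 100)

# With a default, a missing match gives None and no exception is raised.
no_match = next((x for x in S if x > 1000), None)

# Without a default, a missing match raises StopIteration.
try:
    next(x for x in S if x > 1000)
    raised = False
except StopIteration:
    raised = True
```

Here first_big is 250 (the first match in list order, not the smallest one), and no_match is None.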
Consider, for example, the task of merging multiple sequences (e.g., a union or intersection of sorted sequences -- say, sorted files, where the items are lines). Again, next is just what the doctor ordered, because none of the sequences can dominate over the others by controlling a "main for loop". So, assuming for simplicity no duplicates can exist (a condition that's not hard to relax if needed), you keep pairs (currentitem, itsfile) in a list controlled by heapq, and the merging becomes easy... but only thanks to the magic of next to advance the correct file once its item has been used, and that file only.
import heapq
def merge(*theopentextfiles):
    theheap = []
    for afile in theopentextfiles:
        theitem = next(afile, '')
        if theitem: theheap.append((theitem, afile))
    heapq.heapify(theheap)
    while theheap:
        theitem, afile = heapq.heappop(theheap)
        yield theitem
        theitem = next(afile, '')
        if theitem: heapq.heappush(theheap, (theitem, afile))
Just try to do anything anywhere this elegant without next...!-)
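As a point of comparison, the standard library's heapq.merge performs the same kind of lazy merge of already-sorted inputs. The sketch below uses in-memory io.StringIO objects as stand-ins for open text files; the file contents are invented.

```python
import heapq
import io

# Two "files" whose lines are already sorted.
first = io.StringIO("apple\ncherry\n")
second = io.StringIO("banana\ndate\n")

# heapq.merge pulls one line at a time from whichever input has the smallest next line.
merged = list(heapq.merge(first, second))
```

The merged list contains the four lines in sorted order, each still ending in its newline character.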
One could go on for a long time, but the two use cases "advance an iterator by one place (without letting it control a whole for loop)" and "get just the first item from an iterator" account for most important uses of next.
