Python: multiplication in for loop skipped on second iteration

I am trying to implement the Sieve of Euler (as described on programmingpraxis.com). I have code that runs fine on the first iteration; however, on the next iteration my multiplication is skipped for some reason that escapes me (probably I'm just missing some Python behavior that is common sense to a more experienced programmer).
I am running this:
import numpy as np
#set up parameters
primes = [2]
startval = 2
stopval = 10000
firstrun = True
candidates = np.arange(start=startval,stop=stopval+1,step=1)
#actual program
for idx in range(int(np.ceil(np.sqrt(stopval)))): #only up until sqrt(n) needs checking
    print('pos1')
    print(candidates)
    print(candidates[0])
    times = candidates[0]*candidates
    print(times)
    diffset = list(set(candidates)^set(times))
    if len(diffset) is not 0: #to make sure the program quits properly if diffset=[]
        primes.append(diffset.pop(0))
        print('pos2')
        print(diffset)
        candidates = diffset
        print('pos3')
        print(candidates)
    else:
        break
print(primes)
The various print statements are just there so I can get a grasp on what's going on. Note that the first outputs are fine; the interesting part starts the second time pos1 is printed. My candidates are updated just as I want them to be, and the new first element is also correct. So my question is:
Why is times = candidates[0]*candidates apparently skipped on the second iteration?
Please note: I am not asking for a "scratch your code, copy this working and faster, better, nicer code" answer. There are plenty of Python implementations out there; I want to do this myself. I think I am missing a fairly important concept of Python here, and that's why my code doesn't behave.
(Should anyone ask: No, this is not a homework assignment. I am using a bit of Python at my workplace and like to do these sorts of things at home to get better at coding.)

I just ran your code. Looking at the output of times (printed on line 14 of your code), you can see that after the first iteration the operation is performed, but not in the way you intended. The list times is just the list candidates repeated three times, one copy after another. To elaborate:
1st iteration
candidates = np.arange(start=startval,stop=stopval+1,step=1)
so candidates is a numpy array. Doing
candidates*candidates[0]
is the same as candidates*2, which is "numpy_array*number", which is just element-wise multiplication.
Now further down you do
diffset = list(set(candidates) ^ set(times))
....
candidates = diffset
which sets up:
2nd iteration
candidates is now a list (see above). Doing
candidates*candidates[0]
is just candidates*3, which is now "list*number". In Python that does not mean "multiply each list element by number", but instead "create a new list consisting of the original list concatenated with itself number times". This is why you don't see the output you expected.
To fix this, simply do:
candidates = np.array(diffset)
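A minimal illustration of the difference, using a hypothetical three-element list standing in for diffset (it assumes NumPy is installed, as in the question):

```python
import numpy as np

# A plain Python list, like `diffset` after the first iteration
as_list = [3, 5, 7]

# list * int repeats (concatenates) the list; it does not scale the elements
repeated = as_list[0] * as_list

# Converting back to an ndarray restores element-wise multiplication
as_array = np.array(as_list)
scaled = as_array[0] * as_array

print(repeated)         # [3, 5, 7, 3, 5, 7, 3, 5, 7]
print(scaled.tolist())  # [9, 15, 21]
```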

Related

Efficient reverse order comparison of huge growing list in Python

In Python, my goal is to maintain a unique list of points (complex scalars, rounded), while steadily creating new ones with a function, like in this pseudocode:
list_of_points = []
while True:
    # generate new point according to some rule
    z = generate()
    # check whether this point is already there
    if z not in list_of_points:
        list_of_points.append(z)
    if some_condition:
        break
Now list_of_points can become potentially huge (like 10 million entries or even more) during the process and duplicates are quite frequent. In fact about 50% of the time, a newly created point is already somewhere in the list. However, what I know is that oftentimes the already existing point is near the end of the list. Sometimes it is in the "bulk" and only very occasionally it can be found near the beginning.
This brought me to the idea of doing the search in reverse order. But how would I do this most efficiently (in terms of raw speed), given my potentially large list, which grows during the process? Is the list container even the best way here?
I managed to gain some performance by doing this:
list_of_points = []
while True:
    # generate new point according to some rule
    z = generate()
    # check very end of list
    if z in list_of_points[-10:]:
        continue
    # check deeper into the list
    if z in list_of_points[-100:-10]:
        continue
    # check the rest
    if z not in list_of_points[:-100]:
        list_of_points.append(z)
    if some_condition:
        break
Admittedly, this is not very elegant. Using a second, FIFO-type container (collections.deque) instead gives about the same speed-up.
Your best bet might be to use a set instead of a list. Python sets use hashing to insert items, so it is very fast. And you can skip the step of checking whether an item is already there by simply trying to add it: if it is already in the set, it won't be added, since duplicates are not allowed.
Stealing your pseudocode example:
set_of_points = set()  # note: {} would create an empty dict, which has no .add()
while True:
    # get size of set
    a = len(set_of_points)
    # generate new point according to some rule
    z = generate()
    # try to add z to the set
    set_of_points.add(z)
    b = len(set_of_points)
    # if a == b it was not added, thus already existed in the set
    if some_condition:
        break
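A self-contained sketch of the same idea, assuming a hypothetical iterable `points` in place of generate() and a unique-count `limit` in place of some_condition. It uses the set only for the fast membership test and keeps a list as well, in case the generation order matters:

```python
def collect_unique(points, limit):
    """Keep the first occurrence of each point, in generation order.

    `points` stands in for repeated calls to generate(); `limit` is a
    hypothetical stop condition (number of unique points to collect).
    """
    seen = set()    # hash-based membership test, O(1) on average
    ordered = []    # only needed if the insertion order matters
    for z in points:
        if z in seen:
            continue  # duplicate: no scan of the list required
        seen.add(z)
        ordered.append(z)
        if len(ordered) >= limit:
            break
    return ordered


pts = [complex(round(x, 2), 0) for x in (0.1, 0.2, 0.1, 0.3, 0.2, 0.4)]
print(collect_unique(pts, limit=3))  # [(0.1+0j), (0.2+0j), (0.3+0j)]
```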
Use a set. This is what sets are for. Ah, you already have an answer saying that. So, my other comment: this part of your code appears to be incorrect:
# check the rest
if z not in list_of_points[100:]:
    list_of_points.append(z)
In context, I believe you meant to write list_of_points[:-100] there instead. You already checked the last 100, but, as is, you're skipping checking the first 100 instead.
But even better, use plain list_of_points. Slicing creates a copy, so as the list grows longer, the cost of possibly doing 100 redundant comparisons becomes trivial compared to the cost of copying len(list_of_points) - 100 elements.
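If you do stay with a list, a reverse search does not need slices at all: reversed() iterates over the existing list from the end without copying it. A sketch (the helper name is hypothetical); it is still a linear scan in the worst case, so the set remains the better choice:

```python
def contains_from_end(items, z):
    """Return True if z is in items, scanning from the last element backwards.

    reversed() walks the existing list in place, so unlike items[-100:]
    no copy is made; the worst case is still O(len(items)).
    """
    for item in reversed(items):
        if item == z:
            return True
    return False


points = [1 + 1j, 2 + 2j, 3 + 3j]
print(contains_from_end(points, 3 + 3j))  # True, found on the first comparison
print(contains_from_end(points, 4 + 4j))  # False, after scanning everything
```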

Optimization of Unique, non-reversible permutations in Python

I'm currently trying to solve the 'dance recital' Kattis challenge in Python 3. See here.
After taking input on how many performances there are in the dance recital, you must arrange performances in such a way that sequential performances by a dancer are minimized.
I've seen this challenge completed in C++, but my code kept running out of time and I wanted to optimize it.
Question: As of right now, I generate all possible permutations of performances and run comparisons off of that. A faster way would be to not generate all permutations, as some of them are simply reversed and would result in the exact same output.
import itertools
print(list(itertools.permutations(range(2))))  # --> [(0, 1), (1, 0)]  They're the same, backwards and forwards
print(magic_algorithm(range(2)))  # --> [(0, 1)]  This is what I want
How might I generate such a list of permutations?
I've tried:
- Generating all permutations, running over them again to remove reversed() duplicates, and saving them. This takes too long, and the result cannot be hard-coded into the solution because the file becomes too big.
- Only generating permutations up to the halfway mark, then stopping, assuming that after that no unique permutations are generated (not true, as I found out).
- I've checked out questions here, but no one seems to have the same question as me; ditto on the web.
Here's my current code:
from itertools import permutations
number_of_routines = int(input()) #first line is number of routines
dance_routine_list = [0]*10
permutation_list = list(permutations(range(number_of_routines))) #generate permutations
for q in range(number_of_routines):
    s = input()
    for c in s:
        v = ord(c) - 65
        dance_routine_list[q] |= (1 << v) #each routine ex.'ABC' is A-Z where each char represents a performer in the routine
def calculate():
    least_changes_possible = 1e9 #this will become smaller, as optimizations are found
    for j in permutation_list:
        tmp = 0
        for i in range(1,number_of_routines):
            tmp += (bin(dance_routine_list[j[i]] & dance_routine_list[j[i - 1]]).count('1')) #each 1 represents a performer who must complete sequential routines
        least_changes_possible = min(least_changes_possible, tmp)
    return least_changes_possible
print(calculate())
Edit: Took a shower and decided adding a 2-element-comparison look-up table would speed it up, as many of the operations are repeated. Still doesn't fix iterating over the whole permutations, but it should help.
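One way to build that look-up table, sketched under the question's encoding (bit k set when performer chr(65 + k) is in the routine); the helper names are hypothetical. The inner loop of calculate() could then add table[j[i - 1]][j[i]] instead of recomputing bin(...).count('1') for every permutation:

```python
def routine_masks(routine_strings):
    """Encode each routine like the question does: bit k <-> letter chr(65 + k)."""
    masks = []
    for s in routine_strings:
        mask = 0
        for c in s:
            mask |= 1 << (ord(c) - 65)
        masks.append(mask)
    return masks


def pair_cost_table(masks):
    """table[a][b] = number of performers shared by routines a and b."""
    n = len(masks)
    return [[bin(masks[a] & masks[b]).count('1') for b in range(n)]
            for a in range(n)]


table = pair_cost_table(routine_masks(["ABC", "ABEF", "DEF"]))
print(table)  # [[3, 2, 0], [2, 4, 2], [0, 2, 3]]
```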
Edit: Found another thread that answered this pretty well: How to generate permutations of a list without "reverse duplicates" in Python using generators
Thank you all!
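For reference, here is one common way to skip reversed duplicates (not necessarily the code from the linked thread). For n ≥ 2 distinct items, the first and last elements of a permutation differ, and reversing it swaps them, so keeping only permutations with p[0] < p[-1] keeps exactly one of each pair. Note that this still iterates over all n! permutations internally; it only halves the number the comparison loop has to process:

```python
from itertools import permutations


def permutations_without_reversals(items):
    """Yield one permutation from each {p, reversed(p)} pair.

    Assumes the items are distinct and comparable; reversal swaps the first
    and last positions, so exactly one of p and its reverse has p[0] < p[-1].
    """
    for p in permutations(list(items)):
        if len(p) < 2 or p[0] < p[-1]:
            yield p


print(list(permutations_without_reversals(range(2))))  # [(0, 1)]
print(list(permutations_without_reversals(range(3))))  # [(0, 1, 2), (0, 2, 1), (1, 0, 2)]
```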
There are at most 10 possible dance routines, so at most 3.6M permutations, and even bad algorithms like generate 'em all and test will be done very quickly.
If you wanted a fast solution for up to 24 or so routines, then I would do it like this...
Given the R dance routines, at any point in the recital, in order to decide which routine you can perform next, you need to know:
Which routines you've already performed, because you can't do those ones next. There are 2^R possible sets of already-performed routines; and
Which routine was performed last, because that helps determine the cost of the next one. There are at most R possible values for that.
So there are fewer than R*2^R possible recital states...
Imagine a directed graph that connects each possible state to all the possible following states, by an edge for the routine that you would perform to get to that state. Label each edge with the cost of performing that routine.
For example, if you've performed routines 5 and 6, with 5 last, then you would be in state (5,6):5, and there would be an edge to (3,5,6):3 that you could get to after performing routine 3.
Starting at the initial "nothing performed yet" state ():-, use Dijkstra's algorithm to find the least cost path to a state with all routines performed.
Total complexity is O(R^2*2^R) or so, depending on exactly how you implement it.
For R=10, R^2*2^R is ~100 000, which won't take very long at all. For R=24 it's about 9.7 billion, which is going to take under half a minute in pretty good C++.
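A Python sketch of the state-graph search described above, assuming the question's bitmask encoding for routines (the function name is hypothetical). States are (performed_mask, last_routine) pairs, and heapq provides the priority queue for Dijkstra. Because every edge adds exactly one routine, the graph is acyclic and a plain dynamic program over masks would also work; the heap version follows the answer's description:

```python
import heapq


def min_quick_changes(masks):
    """Dijkstra over recital states (performed_mask, last_routine).

    `masks` holds one performer bitmask per routine (the question's
    dance_routine_list encoding). last_routine == -1 means nothing has been
    performed yet; an edge costs the number of performers shared with the
    previously performed routine.
    """
    r = len(masks)
    full = (1 << r) - 1
    dist = {(0, -1): 0}
    heap = [(0, 0, -1)]  # (cost so far, performed mask, last routine)
    while heap:
        cost, performed, last = heapq.heappop(heap)
        if cost > dist[(performed, last)]:
            continue  # stale entry: a cheaper path to this state was found
        if performed == full:
            return cost  # the first complete state popped is the cheapest
        for nxt in range(r):
            if performed & (1 << nxt):
                continue  # each routine is performed exactly once
            step = 0 if last < 0 else bin(masks[last] & masks[nxt]).count('1')
            state = (performed | (1 << nxt), nxt)
            new_cost = cost + step
            if new_cost < dist.get(state, float('inf')):
                dist[state] = new_cost
                heapq.heappush(heap, (new_cost, state[0], nxt))
    return 0  # not reached: the full state is always reachable


# Routines ABC, ABEF, DEF as bitmasks (bit 0 = A, bit 1 = B, ...)
print(min_quick_changes([0b111, 0b110011, 0b111000]))  # 2
```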

My python code that converts numbers between bases has several errors. What could be wrong and how can I find them?

My program is a function that converts numbers from one base to another. It takes three arguments: the initial value, the base of the initial value, then the base it is to be converted to.
The thing has several errors. For one, it won't accept any value for cnum that contains a letter, and I don't know why. And I can't seem to figure out how to force it to recognize the argument 'cnum' as a string within the function call. I have to convert it into a string in the code itself.
Also, I can't get the second half, the part that converts the number to the final base, to work. Either it gives me an infinite loop (for some reason I can't figure out), or it doesn't do the complete calculation. For example, fconbase(100, 10, 12) should convert 100 from base 10 to base 12, but it only spits out 8. The answer should be 84.
Here's my entire function.
#declaring variables
cnum=0 #number to be converted
cbase1=0 #base the number is written in
cbase2=0 #base the number will be converted to
cnumlen=0 #number of digits
digitNum=0 #used to fetch out each digit one by one in order
exp=0 #used to calculate position in result
currentDigit="blank" #stores the digit that's been pulled from the string
result=0 #stores the result of internal calculations
decimalResult=0 #stores cnum as a base 10 number
finalResult=0 #the final result of the conversion
def fconbase(cnum, cbase1, cbase2):
    #converts number into base 10, because the math must be done in base 10
    #resets variables used in calculations
    exp=0
    result=0
    decimalResult=0
    currentDigit="blank"
    cnumlen=len(str(cnum)) #finds length of cnum, stays constant
    digitNum=cnumlen #sets starting placement
    while exp<cnumlen:
        currentDigit=str(cnum)[digitNum-1:digitNum]
        #the following converts letters into their corresponding integers
        if currentDigit=="a" or currentDigit=="A":
            currentDigit="10"
        if currentDigit=="b" or currentDigit=="B":
            currentDigit="11"
        if currentDigit=="c" or currentDigit=="C":
            currentDigit="12"
        if currentDigit=="d" or currentDigit=="D":
            currentDigit="13"
        if currentDigit=="e" or currentDigit=="E":
            currentdigit="14"
        if currentDigit=="f" or currentDigit=="F":
            currentDigit="15"
        result=int(currentDigit)
        decimalResult=decimalResult+result*(cbase1**exp)
        exp=exp+1
        digitNum=digitNum-1
    #this part converts the decimal number into the target base
    #resetting variables again
    exp=0
    result=0
    finalResult=""
    while int(decimalResult)>(cbase2**exp):
        exp=exp+1
    exp=exp-1
    while int(decimalResult)/cbase2**exp!=int(decimalResult):
        result=int(decimalResult/(cbase2**exp))
        if result==10:
            result="a"
        if result==11:
            result="b"
        if result==12:
            result="c"
        if result==13:
            result="d"
        if result==14:
            result="e"
        if result==15:
            result="f"
        finalResult=str(finalResult)+str(result)
        decimalResult=decimalResult%cbase2**exp
        exp=exp+1
    print(finalResult)
Here is what is supposed to happen in the latter half of the function:
The program computes cbase2^exp, with exp starting at 0. If that number is less than decimalResult, it increases the exp(onent) by 1 and tries again, until it gets a number that's greater than decimalResult.
Then it divides decimalResult by cbase2^exp. It converts numbers between 10 and 15 to letters (for bases higher than 10), then appends the result to the final result. It should be concatenating the results together to form the final result that gets printed. I don't understand why it's not doing that.
Why does it not generate the right result and why can't I enter a string into the function call?
Without going into the specific problems with your code, which, as you stated, are many, I'll give a brief answer to the actual question in the title:
What could be wrong and how can I find [the errors in my code]?
Rather than treating your code as one big complicated function that you have to stare at and understand all at once (I can rarely hold more than 10 lines of code in my own internal brain cache at once), try to break it down into smaller pieces: "first I do this and expect this result. Then I take that result and do this to it, and expect another result."
From your description of the problem it seems like you're already thinking that way, but you still dumped this big chunk of code and seemed to struggle with figuring out exactly where the problem is. A lot of beginners will write some big pile of code, and then treat it as a black box while testing it. Like "I'm not getting the right answer and I don't know where the problem begins." This is where learning good debugging skills is crucial.
I would first break things into smaller pieces to try out at the interactive Python prompt. Put in dummy values for different variables and make sure small snippets of code (1 to 5 lines or so, small enough that they're easy to reason about) do exactly what you expect them to do with different values of the variables.
If that doesn't help, then the tried-and-true method, for beginners and advanced developers alike, is to riddle your code with print statements. In as many places as you think necessary, put a statement that prints the values of one or more variables, like print("exp = %s; result = %s" % (exp, result)). Put something like this in as many places as you need to trace the values of some variables through the execution, and see where it starts to give answers that don't make sense.
Sometimes this is hard to do, though. You might not be able to guess the most effective places to put print statements, or even what's important to print. In cases like this (and IMO in most cases) it is more effective to use an interactive debugger like Python's built-in pdb. There are many good resources for learning pdb, but the basics shouldn't take long to get down and will save you a whole lot of headache.
pdb will run your code line by line, stopping after each line (in loops it stops on every iteration), allowing you to examine the contents of each variable before advancing to the next line. This gives you full power to check that each part of your code does or doesn't do what you expect, and should help you pinpoint numerous problem areas.
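As an illustration of the print-statement approach, here is a hypothetical, cut-down version of the first half of fconbase with a trace line inside the loop; the commented-out breakpoint() marks where you could drop into pdb instead. Like the question, it assumes digits up to base 16:

```python
def to_decimal_traced(cnum, cbase1):
    """First half of fconbase: read the digits of `cnum` (base 2-16) from
    the right, printing the loop variables on every pass."""
    decimal_result = 0
    for exp, ch in enumerate(reversed(str(cnum))):
        digit = int(ch, 16)  # maps '0'-'9', 'a'-'f' and 'A'-'F' to 0-15
        decimal_result += digit * cbase1 ** exp
        print("exp = %s; digit = %s; decimalResult = %s" % (exp, digit, decimal_result))
        # breakpoint()  # Python 3.7+: uncomment to pause here in pdb instead
    return decimal_result


to_decimal_traced("1A", 16)
# exp = 0; digit = 10; decimalResult = 10
# exp = 1; digit = 1; decimalResult = 26
```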
You should use the exp you find in the first step:
while int(decimalResult)>=(cbase2**exp):
    exp=exp+1
exp -= 1
while exp >= 0:
    ...
    finalResult=str(finalResult)+str(result)
    decimalResult=decimalResult%cbase2**exp
    exp -= 1
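Putting that fix together, here is a sketch of the second half as a standalone function (the name from_decimal is hypothetical; it returns the string instead of printing it, and uses // so each digit is an exact integer):

```python
def from_decimal(decimal_result, cbase2):
    """Second half of fconbase: find the highest power of cbase2 that fits,
    then count exp DOWN to 0, emitting one digit per power."""
    digit_chars = "0123456789abcdef"
    exp = 0
    while decimal_result >= cbase2 ** exp:  # >= so that e.g. 12 in base 12 works
        exp += 1
    exp -= 1
    final_result = ""
    while exp >= 0:
        result = decimal_result // cbase2 ** exp  # digit for this power
        final_result += digit_chars[result]
        decimal_result %= cbase2 ** exp           # what is left for lower powers
        exp -= 1
    return final_result or "0"  # 0 produces no digits in the loop


print(from_decimal(100, 12))  # 84
```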
First of all, the entire first part of the code is not needed, as the int function does it for you. Instead of all that, you can do this.
int(cnum, base=cbase1)
This converts cnum from cbase1 to base 10.
The second part might go to an infinite loop because at the bottom, it says
exp = exp + 1
When it should say
exp = exp - 1
Since you want to go from (for example) 5^2 to 5^0.
The result is missing the last digit because the loop exits as soon as exp reaches 0, so the units digit is never added to the result.
A simple fix for that is to append it after the loop:
finalResult = str(finalResult) + str(decimalResult)
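A short demonstration of the int() suggestion (the wrapper name to_decimal is hypothetical). It also bears on the "why can't I enter a letter" part of the question: a call such as fconbase(1A, 16, 10) is a SyntaxError before the function even runs, so the value has to be quoted, as in fconbase("1A", 16, 10). Conversely, int() refuses non-string input when an explicit base is given, which is why str() is applied here:

```python
def to_decimal(cnum, cbase1):
    """Parse cnum (written in base cbase1, 2-36) with the built-in int().

    str() is applied because int() raises TypeError for non-string input
    when an explicit base is given, e.g. int(100, base=10).
    """
    return int(str(cnum), base=cbase1)


print(to_decimal("1A", 16))  # 26
print(to_decimal("ff", 16))  # 255
print(to_decimal(100, 10))   # 100

try:
    int(100, base=10)
except TypeError as exc:
    print("TypeError:", exc)
```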

slow script trying to find unique values in list

I've got a problem in Python:
I want to find how many UNIQUE a**b values exist if:
2 ≤ a ≤ 100 and 2 ≤ b ≤ 100?
I wrote the following script, but it's too slow on my laptop (and doesn't even produce the results):
List=[]
a = 2
b = 2
c = pow(a, b)
while b != 101:
    while a != 101:
        if List.count(c) == 0:
            List.append(c)
            a += 1
    b += 1
print len(List)
Is it good? Why is it slow?
This code doesn't work; it's an infinite loop because you don't increment a on every iteration of the loop. After you fix that, you still won't get the right answer, because you never reset a to 2 before moving on to the next value of b.
Then, List will only ever contain 4, because you set c outside the loop to 2 ** 2 and never change it inside the loop. And when you fix that, it'll still be slower than it needs to be, because you read the entire list each time through to get the count, and as the list gets longer, that takes more and more time.
You should generally use in rather than count if you just need to know whether an item is in a list, since in stops as soon as it finds the item; but in this specific instance you should be using a set anyway, since you are looking for unique values. You can just add to the set without checking whether the item is already in it.
Finally, using for loops is more readable than using while loops.
result = set()
for a in range(2, 101):
    for b in range(2, 101):
        result.add(a ** b)
print(len(result))
This takes less than a second on my machine.
The reason your script is slow and doesn't return a value is that you have created an infinite loop. You need to dedent the a += 1 line by one level; otherwise, once the first value has been appended, the if condition is never true again and a never gets incremented.
There are some additional issues with the script that have been pointed out in the comments, but this is what is responsible for the issues you are experiencing.
Your code is not good, since it does not produce correct results. As the comment by @grael pointed out, you do not recalculate the value of c inside the loop, so you are counting only one value over and over again. There are also other problems, as other people have noted.
Your code is not fast for several reasons.
You are using a brute-force method. The answer can be found more simply by using number theory and combinatorics. Look at the prime factorization of each number between 2 and 100 and consider the prime factorization of each power of that number. You never need to calculate the complete number; the prime factorization is enough. I'll leave the details to you, but this would be much faster.
You are rolling your own loop, but it is faster to use Python's for loop. Loop over a and b with:
for a in range(2,101):
    for b in range(2,101):
        c = pow(a, b)
        # other code here
This code uses the built-in capabilities of the language and should be faster. This also avoids your errors since it is simpler.
You use a very slow method to see if a number has already been calculated. Your if List.count(c) == 0 must check every previous number to see if the current number has been seen. This will become very slow when you have already seen thousands of numbers. It is much faster to keep the already-seen numbers in a set rather than a list. Checking if a number is in a set is much faster than using count() on a list.
Try combining all these suggestions. As another answer shows, just using the last two probably suffices.
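A sketch of the factorization idea from the first point above, assuming a plain trial-division factorizer is good enough for numbers up to 100 (the helper names are hypothetical). It applies only part of that suggestion: it still visits all 99 × 99 pairs, but compares exponent tuples instead of computing large integers such as 100**100. By unique factorization, two powers are equal exactly when their tuples are equal:

```python
def prime_factorization(n):
    """Trial division: return {prime: exponent} for n >= 2."""
    factors = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def count_distinct_powers(lo=2, hi=100):
    """Count distinct a**b for lo <= a, b <= hi without computing any a**b.

    a**b is represented by the factorization of a with every exponent
    multiplied by b, e.g. 4**2 and 2**4 both become ((2, 4),).
    """
    seen = set()
    for a in range(lo, hi + 1):
        base_factors = sorted(prime_factorization(a).items())
        for b in range(lo, hi + 1):
            seen.add(tuple((p, e * b) for p, e in base_factors))
    return len(seen)


print(count_distinct_powers(2, 5))  # 15 (16 appears as both 2**4 and 4**2)
```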

Unexpected behaviour in python random number generation

I have the following code:
import random
rand1 = random.Random()
rand2 = random.Random()
rand1.seed(0)
rand2.seed(0)
rand1.jumpahead(1)
rand2.jumpahead(2)
x = [rand1.random() for _ in range(0,5)]
y = [rand2.random() for _ in range(0,5)]
According to the documentation of jumpahead() function I expected x and y to be (pseudo)independent sequences. But the output that I get is:
x: [0.038378463064751012, 0.79353887395667977, 0.13619161852307016, 0.82978789012683285, 0.44296031215986331]
y: [0.98374801970498793, 0.79353887395667977, 0.13619161852307016, 0.82978789012683285, 0.44296031215986331]
Notice that the 2nd to 5th numbers are the same. This happens each time I run the code.
Am I missing something here?
rand1.seed(0)
rand2.seed(0)
You initialize them with the same values, so you get the same (non-)randomness. Seed them with different values, for example the current Unix timestamp, and you will get different sequences. Note, though, that if you initialize two RNGs at the same moment with the current time, you will of course get the same "random" values from them.
Update: Just noticed the jumpahead() stuff: Have a look at How should I use random.jumpahead in Python - it seems to answer your question.
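A minimal sketch of the seeding point in Python 3, where random.Random no longer has a jumpahead() method (it existed only in Python 2). The seeds 0 and 1 are arbitrary choices for illustration:

```python
import random

# Two generators seeded with the same value produce the same sequence...
same_a = random.Random(0)
same_b = random.Random(0)
# ...while different seeds give different (pseudo)random sequences.
diff_a = random.Random(0)
diff_b = random.Random(1)

xs = [same_a.random() for _ in range(5)]
ys = [same_b.random() for _ in range(5)]
us = [diff_a.random() for _ in range(5)]
vs = [diff_b.random() for _ in range(5)]

print(xs == ys)  # True
print(us == vs)  # False
```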
I think there is a bug: Python's documentation does not make this as clear as it should.
The difference between your two parameters to jumpahead is 1, which means you are only guaranteed to get 1 unique value (which is what happens). If you want more values, you need parameters that are further apart.
EDIT: Further Explanation
Originally, as the name suggests, jumpahead merely jumped ahead in the sequence. In that case it's clear why jumping 1 or 2 places ahead would not produce independent results. As it turns out, jumping ahead in most random number generators is inefficient, so Python only approximates it, which lets it use a more efficient algorithm. However, because the method is only "pretending" to jump ahead, passing two similar integers will not result in very different sequences.
To get different sequences, you need the integers passed in to be far apart. In particular, if you want to read a million random values, you need to separate your jumpaheads by a million.
As a final note, if you have two random number generators, you only need to jumpahead on one of them. You can (and should) leave the other in its original state.
