I'm a very new Python user (with only a little prior experience in HTML/JavaScript as far as programming goes), and I was trying to find a way to print only intermittent values from the loop of a basic bicycle racing simulation (10,000 lines of biker positions would be pretty excessive :P).
In this loop I tried several 'reasonable' ways to test whether a floating-point number equals its integer floor (int, floor division), so that I could print every 100 iterations or so:
for i in range (0,10000):
    i = i + 1
    t = t + t_step #t is initialized at 0 while t_step is set at .01
    acceleration_rider1 = (power_rider1 / (70 * velocity_rider1)) - (force_drag1 / 70)
    velocity_rider1 = velocity_rider1 + (acceleration_rider1 * t_step)
    position_rider1 = position_rider1 + (velocity_rider1 * t_step)
    force_drag1 = area_rider1 * (velocity_rider1 ** 2)
    acceleration_rider2 = (power_rider2 / (70 * velocity_rider1)) - (force_drag2 / 70)
    velocity_rider2 = velocity_rider2 + (acceleration_rider2 * t_step)
    position_rider2 = position_rider2 + (velocity_rider2 * t_step)
    force_drag2 = area_rider1 * (velocity_rider2 ** 2)
    if t == int(t): #TRIED t == t // 1 AND OTHER VARIANTS THAT DON'T WORK HERE:(
        print t, "biker 1", position_rider1, "m", "\t", "biker 2", position_rider2, "m"
The for loop auto increments for you, so you don't need to use i = i + 1.
You don't need t for the check; just apply the % (modulo) operator to i to find multiples of a number.
# Log every 1000 iterations.
LOG_EVERY_N = 1000
for i in range(10000):
    ... # calculations with i
    if (i % LOG_EVERY_N) == 0:
        print "logging: ..."
To print out every 100 iterations, I'd suggest
if i % 100 == 0: ...
If you'd rather not print the very first time, then maybe
if i and i % 100 == 0: ...
(as another answer noted, the i = i + 1 is supererogatory given that i is the control variable of the for loop anyway -- it's not particularly damaging though, just somewhat superfluous, and is not really relevant to the issue of why your if doesn't trigger).
While basing the condition on t may seem appealing, t == int(t) is unlikely to work unless t_step is a multiple of 1.0 / 2**N for some integer N -- fractions cannot be represented exactly in a float unless this condition holds, because floats use a binary base. (You could use decimal.Decimal, but that would seriously impact the speed of your computation, since float computations are directly supported by your machine's hardware, while decimal computations are not).
The other answers suggest that you use the integer variable i instead. That also works, and is the solution I would recommend. This answer is mostly for educational value.
I think it's a roundoff error that is biting you. Floating point numbers often cannot be represented exactly, so adding .01 to t 100 times is not guaranteed to result in t == 1:
>>> sum([.01]*100)
1.0000000000000007
So when you compare to an actual integer number, you need to build in a small tolerance margin. Something like this should work (round() is used rather than int() so that a value just below an integer, such as 0.9999999999, is caught as well):
if abs(t - round(t)) < 1e-6:
    print t, "biker 1", position_rider1, "m", "\t", "biker 2", position_rider2, "m"
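To make the difference between the three checks concrete, here is a minimal Python 3 sketch (the thread itself uses Python 2 print statements). The function name count_hits and its parameters are made up for illustration; it simply counts how often each condition fires over 1000 steps of 0.01.

```python
def count_hits(n_steps=1000, t_step=0.01, every=100):
    """Count how often each 'print now?' condition fires (illustrative helper)."""
    t = 0.0
    exact = tolerant = by_counter = 0
    for i in range(1, n_steps + 1):
        t += t_step                      # accumulates binary rounding error
        if t == int(t):                  # exact float comparison: unreliable
            exact += 1
        if abs(t - round(t)) < 1e-6:     # comparison with a tolerance
            tolerant += 1
        if i % every == 0:               # integer counter: always exact
            by_counter += 1
    return exact, tolerant, by_counter


if __name__ == "__main__":
    exact, tolerant, by_counter = count_hits()
    print("exact:", exact, "tolerant:", tolerant, "counter:", by_counter)
```

The tolerant check and the integer counter both fire at every whole second; the exact comparison fires only on the steps where the accumulated float happens to land exactly on an integer, which is why the counter-based check is the one recommended above.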
You can use a Python library called tqdm (the name derives from the Arabic word taqaddum (تقدّم), which can mean "progress") to show progress, and use tqdm's write() method to print intermittent log statements, as answered by @Stephen.
Why is tqdm useful in your case?
Shows a compact, fancy progress bar with a very minimal code change.
Does not fill your console with thousands of log statements, yet shows accurate iteration progress for your for loop.
Caveats:
tqdm.write() does not go through the logging library; by default it writes to stdout only, though you can easily redirect that to a log file.
Adds a little performance overhead.
Code
from tqdm import tqdm
from time import sleep
# Log every 100 iterations.
LOG_EVERY_N = 100
for i in tqdm(range(1,1000)):
    if i%LOG_EVERY_N == 0:
        tqdm.write(f"logging : {i}")
    sleep(0.5)  # stands in for the real work of one iteration
How to install ?
pip install tqdm
I am attempting to calculate the probability of a for loop returning a value lower than 10% of the initial value input using a Monte Carlo simulation.
for i in range(0, period):
    if i < 1:
        r=(rtn_daily[i]+sig_daily[i]*D[i])
        stock = stock_initial * (1+r)
    elif i >=1:
        r=(rtn_daily[i]+sig_daily[i]*D[i])
        stock = stock * (1+r)
    print(stock)
This is the for-loop that I wish to run a large number of times (200000 as a rough number) and calculate the probability that:
stock < stock_initial * .9
I've found examples that define their initial loop as a function and then will use that function in the loop, so I have tried to define a function from my loop:
def stock_value(period):
    for i in range(0, period):
        if i < 1:
            r=(rtn_daily[i]+sig_daily[i]*D[i])
            stock = stock_initial * (1+r)
        elif i >=1:
            r=(rtn_daily[i]+sig_daily[i]*D[i])
            stock = stock * (1+r)
        return(stock)
This produces values for 'stock' that don't seem to fall in the same range as they did before the loop was wrapped in a function.
Using this code, I tried to run a Monte Carlo simulation:
# code to implement monte-carlo simulation
number_of_loops = 200 # lower number to run quicker
for stock_calc in range(1,period+1):
    moneyneeded = 0
    for i in range(number_of_loops):
        stock=stock_value(stock_calc)
        if stock < stock_initial * 0.90:
            moneyneeded += 1
        #print(stock) this is to check the value of stock being produced.
    stock_percentage = float(moneyneeded) / number_of_loops
    print(stock_percentage)
But this returns no results outside the 10% range, even when looped 200000 times; it seems the range/spread of results gets hugely reduced in my defined function somehow.
Can anyone see a problem in my defined function 'stock_value', or a way of implementing a Monte Carlo simulation that I've not come across?
My full code for reference:
#import all modules required
import numpy as np # using different notation for easier writting
import scipy as sp
import matplotlib.pyplot as plt
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#collect variables provided by the user
stock_initial = float(12000) # can be input for variable price of stock initially.
period = int(63) # can be edited to an input() command for variable periods.
mrgn_dec = .10 # decimal value of 10%, can be manipulated to produce a 10% increase/decrease
addmoremoney = stock_initial*(1-mrgn_dec)
rtn_annual = np.repeat(np.arange(0.00,0.15,0.05), 31)
sig_annual = np.repeat(np.arange(0.01,0.31,0.01), 3) #use .31 as python doesn't include the upper range value.
#functions for variables of daily return and risk.
rtn_daily = float((1/252))*rtn_annual
sig_daily = float((1/(np.sqrt(252))))*sig_annual
D=np.random.normal(size=period) # unsure of range to use for standard distribution
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# returns the final value of stock after 63rd day(possibly?)
def stock_value(period):
    for i in range(0, period):
        if i < 1:
            r=(rtn_daily[i]+sig_daily[i]*D[i])
            stock = stock_initial * (1+r)
        elif i >=1:
            r=(rtn_daily[i]+sig_daily[i]*D[i])
            stock = stock * (1+r)
        return(stock)
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# code to implement monte-carlo simulation
number_of_loops = 20000
for stock_calc in range(1,period+1):
    moneyneeded = 0
    for i in range(number_of_loops):
        stock=stock_value(stock_calc)
        if stock < stock_initial * 0.90:
            moneyneeded += 1
        print(stock)
    stock_percentage = float(moneyneeded) / number_of_loops
    print(stock_percentage)
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Posting an answer as I don't have the points to comment. Some queries about your code - going through these might help you find an answer:
Why have you defined rtn_annual as an array, np.repeat(np.arange(0.00,0.15,0.05), 31)? Since it just holds each of the values [0.0, 0.05, 0.1] repeated 31 times in a row, why not define it as a function?:
def rtn_annual(i):
    vals = [0.0, 0.05, 0.1]
    return vals[i // 31]  # np.repeat repeats each element 31 times consecutively
Likewise for sig_annual, rtn_daily, and sig_daily - the contents of these are all straightforward functions of the index, so I'm not sure what the advantage could be of making them arrays.
What does D actually represent? As you've defined it, it's a random variable with mean of 0.0, and standard deviation 1.0. So around 95% of the values in D will be in the range (-2.0, +2.0) - is that what you expect?
Have you tested your stock_value() function even on small periods (e.g. from 0 to a few days) to ensure it's doing what you think it should? It's not clear from your question whether you've verified that it's ever doing the right thing, for any input, and your comment "...(possibly?)" doesn't sound very confident.
Spoiler alert - it almost certainly doesn't. In the function stock_value, your return statement is within the for loop. It will get executed the first time round, when i = 0, and the loop will never get any further than that. This would be the chief reason why the function is giving different results to the loop.
Also, where you say "returning a value lower than 10% of...", I assume you mean "returning a value at least 10% lower than...", since that's what your probability stock < stock_initial * .9 is calculating.
I hope this helps. You may want to step through your code with a debugger in your preferred IDE (idle, or thonny, or eclipse, whatever it may be) to see what your code is actually doing.
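For reference, here is a minimal Python 3 sketch of the function with the return moved out of the loop. It is not the asker's exact program: it uses only the standard library, passes everything in as parameters, and draws a fresh normal shock on every call instead of reading a shared global D. That last point is an assumption about what the simulation is meant to do (with a single global D, every call would walk the same path). The name probability_of_drop and the flat daily return and volatility in the demo are made up for illustration.

```python
import random


def stock_value(stock_initial, rtn_daily, sig_daily, period, rng):
    """Value after `period` days along one freshly drawn random path."""
    stock = stock_initial
    for i in range(period):
        shock = rng.gauss(0.0, 1.0)          # new draw per day and per call
        r = rtn_daily[i] + sig_daily[i] * shock
        stock = stock * (1 + r)
    return stock                             # after the loop: all days applied


def probability_of_drop(stock_initial, rtn_daily, sig_daily, period,
                        n_paths, rng, level=0.90):
    """Fraction of simulated paths that end below level * stock_initial."""
    below = 0
    for _ in range(n_paths):
        final = stock_value(stock_initial, rtn_daily, sig_daily, period, rng)
        if final < stock_initial * level:
            below += 1
    return below / n_paths


if __name__ == "__main__":
    period = 63
    rtn_daily = [0.05 / 252] * period          # illustrative flat 5% annual return
    sig_daily = [0.20 / 252 ** 0.5] * period   # illustrative flat 20% annual volatility
    rng = random.Random(0)
    print(probability_of_drop(12000.0, rtn_daily, sig_daily, period, 2000, rng))
```

Checking the function on a trivial input first (zero return and zero volatility should give back exactly stock_initial) is a quick way to do the small-period test suggested above.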
I am running an algorithm which reads an excel document by rows, and pushes the rows to a SQL Server, using Python. I would like to print some sort of progression through the loop. I can think of two very simple options and I would like to know which is more lightweight and why.
Option A:
for x in xrange(1, sheet.nrows):
    print x
    cur.execute() # pushes to sql
Option B:
for x in xrange(1, sheet.nrows):
    if x % some_check_progress_value == 0:
        print x
    cur.execute() # pushes to sql
I have a feeling that the second one would be more efficient but only for larger scale programs. Is there any way to calculate/determine this?
I'm a newbie, so I can't comment. An "answer" might be overkill, but it's all I can do for now.
My favorite thing for this is tqdm. It's minimally invasive, both code-wise and output-wise, and it gets the job done.
I am one of the developers of tqdm, a Python progress bar that tries to be as efficient as possible while providing as many automated features as possible.
The biggest performance sink we had was indeed I/O: printing to the console/file/whatever.
But if your loop is tight (more than 100 iterations/second), then it's useless to print every update, you'd just as well print just 1/10 of the updates and the user would see no difference, while your bar would be 10 times less overhead (faster).
To fix that, at first we added a mininterval parameter which updated the display only every x seconds (which is by default 0.1 seconds, the human eye cannot really see anything faster than that). Something like that:
import time
def my_bar(iterator, mininterval=0.1):
    counter = 0
    last_print_t = 0
    for item in iterator:
        if (time.time() - last_print_t) >= mininterval:
            last_print_t = time.time()
            print_your_bar_update(counter)
        counter += 1
This will mostly fix your issue as your bar will always have a constant display overhead which will be more and more negligible as you have bigger iterators.
If you want to go further in the optimization, time.time() is a call into the system clock and thus costs more than simple Python statements. To avoid that, you want to minimize the calls you make to time.time() by introducing another variable: miniters, which is the minimum number of iterations you want to skip before even checking the time:
import time
def my_bar(iterator, mininterval=0.1, miniters=10):
    counter = 0
    last_print_t = 0
    last_print_counter = 0
    for item in iterator:
        if (counter - last_print_counter) >= miniters:
            if (time.time() - last_print_t) >= mininterval:
                last_print_t = time.time()
                last_print_counter = counter
                print_your_bar_update(counter)
        counter += 1
You can see that miniters is similar to your Option B modulus solution, but it's better fitted as an added layer over time because time is more easily configured.
With these two parameters, you can manually finetune your progress bar to make it the most efficient possible for your loop.
However, miniters (or modulus) is tricky to get to work generally for everyone without manual finetuning; you need good assumptions and clever tricks to automate this finetuning. This is one of the major ongoing efforts on tqdm. Basically, we try to calculate miniters so that miniters iterations take about mininterval seconds, so that time checking is hardly needed anymore. This automagic setting kicks in after mininterval gets triggered, something like this:
from __future__ import division
import time
def my_bar(iterator, mininterval=0.1, miniters=10, dynamic_miniters=True):
    counter = 0
    last_print_t = 0
    last_print_counter = 0
    for item in iterator:
        if (counter - last_print_counter) >= miniters:
            cur_time = time.time()
            if (cur_time - last_print_t) >= mininterval:
                if dynamic_miniters:
                    # Simple rule of three
                    delta_it = counter - last_print_counter
                    delta_t = cur_time - last_print_t
                    miniters = delta_it * mininterval / delta_t
                last_print_t = cur_time
                last_print_counter = counter
                print_your_bar_update(counter)
        counter += 1
There are various ways to compute miniters automatically, but usually you want to update it to match mininterval.
If you are interested in digging more, you can check the dynamic_miniters internal parameters, maxinterval and an experimental monitoring thread of the tqdm project.
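The snippets above are sketches around an undefined print_your_bar_update(). As a self-contained illustration of the same mininterval/miniters idea, here is a small Python 3 generator. It is not tqdm's implementation: the name throttled, the report callback and the injectable clock parameter are assumptions made so the behaviour can be exercised without real waiting.

```python
import time


def throttled(iterable, report, mininterval=0.1, miniters=1, clock=time.monotonic):
    """Yield items unchanged; call report(n) at most once per mininterval.

    The clock is only consulted after at least `miniters` items have passed
    since the last report, which keeps the per-item overhead low.
    """
    last_t = clock()
    last_n = 0
    for n, item in enumerate(iterable, start=1):
        yield item
        if n - last_n >= miniters:           # cheap integer test first
            now = clock()
            if now - last_t >= mininterval:  # time test only when needed
                report(n)
                last_t, last_n = now, n


if __name__ == "__main__":
    for _ in throttled(range(200000), lambda n: print("done:", n), miniters=1000):
        pass
```

Passing a different clock (for example a counter) makes it possible to check exactly when the report fires, independent of how fast the machine is.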
Using the modulus check (counter % N == 0) is almost free compared with printing, and it is a great solution if you run a high-frequency iteration (one that would otherwise log a lot).
This is especially true if you do not need to print on each iteration but want some feedback along the way.
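As for the question of how to determine the difference: one way is simply to measure it with timeit. The Python 3 sketch below is an assumption-laden stand-in for the real loop: it writes to an in-memory io.StringIO instead of a console and leaves out the SQL call, so it understates the real cost of printing. The names run and time_option are illustrative.

```python
import io
import timeit


def run(n_rows, every, out):
    """Loop over row numbers and write a progress line every `every` rows."""
    for x in range(1, n_rows):
        if x % every == 0:
            out.write(f"{x}\n")


def time_option(n_rows, every, repeats=3):
    """Best-of-`repeats` wall time in seconds for one full pass."""
    return min(timeit.repeat(lambda: run(n_rows, every, io.StringIO()),
                             number=1, repeat=repeats))


if __name__ == "__main__":
    print("every row:      ", time_option(200000, 1))
    print("every 1000 rows:", time_option(200000, 1000))
```

With every=1 this behaves like Option A, and with a larger value like Option B. Against a real terminal the gap is typically larger than this in-memory measurement suggests, because console output is far slower than writing to a buffer.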
I'm writing a script to download videos from a website. I've added a report hook to get download progress. So, far it shows the percentage and size of the downloaded data. I thought it'd be interesting to add download speed and eta.
Problem is, if I use a simple speed = chunk_size/time the speeds shown are accurate enough but jump around like crazy. So, I've used the history of time taken to download individual chunks. Something like, speed = chunk_size*n/sum(n_time_history).
Now it shows a stable download speed, but it is most certainly wrong because its value is in a few bits/s, while the downloaded file visibly grows at a faster pace.
Can somebody tell me where I'm going wrong?
Here's my code.
def dlProgress(count, blockSize, totalSize):
    global init_count
    global time_history
    try:
        time_history.append(time.monotonic())
    except NameError:
        time_history = [time.monotonic()]
    try:
        init_count
    except NameError:
        init_count = count
    percent = count*blockSize*100/totalSize
    dl, dlu = unitsize(count*blockSize) #returns size in kB, MB, GB, etc.
    tdl, tdlu = unitsize(totalSize)
    count -= init_count #because continuation of partial downloads is supported
    if count > 0:
        n = 5 #length of time history to consider
        _count = n if count > n else count
        time_history = time_history[-_count:]
        time_diff = [i-j for i,j in zip(time_history[1:],time_history[:-1])]
        speed = blockSize*_count / sum(time_diff)
    else: speed = 0
    n = int(percent//4)
    try:
        eta = format_time((totalSize-blockSize*(count+1))//speed)
    except:
        eta = '>1 day'
    speed, speedu = unitsize(speed, True) #returns speed in B/s, kB/s, MB/s, etc.
    sys.stdout.write("\r" + percent + "% |" + "#"*n + " "*(25-n) + "| " + dl + dlu + "/" + tdl + tdlu + speed + speedu + eta)
    sys.stdout.flush()
Edit:
Corrected the logic. Download speed shown is now much better.
As I increase the length of history used to calculate the speed, the stability increases but sudden changes in speed (if download stops, etc.) aren't shown.
How do I make it stable, yet sensitive to large changes?
I realize the question is now more math oriented, but it'd be great if somebody could help me out or point me in the right direction.
Also, please do tell me if there's a more efficient way to accomplish this.
_count = n if count > n else count
time_history = time_history[-_count:]
time_weights = list(range(1,len(time_history))) #just a simple linear weights
time_diff = [(i-j)*k for i,j in zip(time_history[1:], time_history[:-1],time_weights)]
speed = blockSize*(sum(time_weights)) / sum(time_diff)
To make it more stable, so that it does not react when the download spikes up or down, you could also drop the extreme values. Note that time_history holds timestamps, which only ever increase, so the filtering has to be applied to the intervals between them rather than to time_history itself:
_count = n if count > n else count
time_history = time_history[-_count:]
intervals = [i-j for i,j in zip(time_history[1:], time_history[:-1])]
if len(intervals) > 2: #keep at least one interval
    intervals.remove(min(intervals))
    intervals.remove(max(intervals))
time_weights = list(range(1, len(intervals)+1)) #just a simple linear weights
time_diff = [d*k for d,k in zip(intervals, time_weights)]
speed = blockSize*(sum(time_weights)) / sum(time_diff)
This removes the longest and the shortest interval, which makes the displayed number more stable. If you want to be picky, you could generate the weights before the removal, and then filter the mapped values using time_diff.index(min(time_diff)).
Also, using a non-linear function (like sqrt()) to generate the weights will give you better results. Oh, and as I said in the comments: adding statistical methods to filter the times should be marginally better, but I suspect it's not worth the overhead it would add.
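As a self-contained illustration of the linear weighting idea, here is a Python 3 sketch that can be tried outside the report hook. The function name weighted_speed is made up, and it assumes every interval corresponds to one block of block_size bytes, as in the question.

```python
def weighted_speed(timestamps, block_size):
    """Bytes per second, weighting recent intervals more heavily.

    timestamps: increasing arrival times, one per downloaded block.
    Interval k (oldest first) gets weight k, so the newest counts most.
    """
    intervals = [later - earlier
                 for earlier, later in zip(timestamps[:-1], timestamps[1:])]
    if not intervals:
        return 0.0                      # not enough history yet
    weights = range(1, len(intervals) + 1)
    weighted_time = sum(w * dt for w, dt in zip(weights, intervals))
    if weighted_time == 0:
        return 0.0                      # avoid dividing by zero
    return block_size * sum(weights) / weighted_time


if __name__ == "__main__":
    # Blocks arrived after 2 s and then after 1 s: the recent, faster
    # interval dominates, so the estimate is above the plain average.
    print(weighted_speed([0.0, 2.0, 3.0], 100))
```

A longer history with weights like these stays fairly smooth while still reacting to a sudden stall, because the newest interval carries the largest weight.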
I am researching wireless security and trying to write a python script to generate passwords, not random, but a dictionary of hex numbers. The letters need to be capital, and it has to go from 12 characters to 20 characters. I went from 11 f's to 20 f's, this seems like it would meet the requirements. I then tried to place them in a text file. After I made the file, I chmod'ed it to 777 and then clicked run. It has been a few minutes, but I cannot tell if it is working or not. I am running it in kali right now, on a 64 bit core i3 with 8gb of ram. I'm not sure how long it would be expected to take, but this is my code, let me know if it looks right please:
# generate 10 to 32 character password list using hex numbers, 0-9 A-F
def gen_pwd(x):
    x = range(17592186044415 -295147905179352830000)
    def toHex(dec):
        x = (dec % 16)
        digits = "0123456789ABCDEF"
        rest = dec / 16
        if (rest == 0):
            return digits[x]
        return toHex(rest) + digits[x]
    for x in range(x):
        print toHex(x)
    f = open(/root/Home/sdnlnk_pwd.txt)
    print f
    value = x
    string = str(value)
    f.write(string)
gen_pwd
How about just
password = hex(random.randint(1000000,100000000))[2:]
or
pw_len = 12
my_alphabet = "1234567890ABCDEF"
password = "".join(random.choice(my_alphabet) for _ in range(pw_len))
or, what may be closer to what you are trying to do,
struct.pack("Q",12365468987654).encode("hex").upper()
Basically, you are overcomplicating a very simple task.
To do exactly what you are asking, you can simplify it to this:
import itertools, struct
def int_to_chars(d):
    '''
    step 1: break into bytes
    '''
    while d > 0: # while we have not consumed
        yield struct.pack("B",d&0xFF) # decode char
        d>>=8 # shift right one byte
    yield "" # a terminator just in case its empty

def to_password(d):
    # this will convert an arbitrarily large number to a password
    return "".join(int_to_chars(d)).encode("hex").upper()
    # you could probably just get away with `return hex(d)[2:]`

def all_the_passwords(minimum,maximum):
    #: since our numbers are so big we need to resort to some trickery
    all_pw = itertools.takewhile(lambda x:x<maximum,
                                 itertools.count(minimum))
    for pw in all_pw:
        yield to_password(pw)

all_passwords = all_the_passwords( 0xfffffffffff ,0xffffffffffffffffffff)
#this next bit is gonna take a while ... go get some coffee or something
for pw in all_passwords:
    print pw
#you will be waiting for it to finish for a very long time ... but it will get there
You can use time.time() to get the execution time. If you are using Python 2, use xrange() instead of range(), because xrange() produces its values lazily instead of building the whole list in memory:
import time
def gen_pwd(x):
    def toHex(dec):
        x = (dec % 16)
        digits = "0123456789ABCDEF"
        rest = dec / 16
        if (rest == 0):
            return digits[x]
        return toHex(rest) + digits[x]
    for x in xrange(x):
        print toHex(x)
    f = open("/root/Home/sdnlnk_pwd.txt", "w")  # "w" so that f.write() is allowed
    print f
    value = x
    string = str(value)
    f.write(string)
start= time.time()
gen_pwd(1000)  # 1000 is only an illustrative small upper bound
last=time.time()-start
print last
Note: you need () to call your function and "" around the path in your open() call. Also, I think the first range line in your function is superfluous and wrong, so you should remove it.
Disclaimer
I'd like to comment on the OP question but I need to show some code and also the output that said code produces, so that I eventually decided to present my comment in the format of an answer.
OTOH, I hope that this comment persuades the OP that her/his undertaking, while conceptually simple (see my previous answer, 6 lines of Python code), is not feasible with available resources (I mean, available on Planet Earth).
Code
import locale
locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')
pg = lambda n: locale.format("%d", n, grouping=True)
def count_bytes(low, hi):
    count = low+1
    for i in range(low+1,hi+1):
        nn = 15*16**(i-1)
        nc = i+1
        count = count + nn*nc
    return count
n_b = count_bytes(10,20)
n_d = n_b/4/10**12
dollars = 139.99*n_d
print "Total number of bytes to write on disk:", pg(n_b)
print """
Considering the use of
WD Green WD40EZRX 4TB IntelliPower 64MB Cache SATA 6.0Gb/s 3.5\" Internal Hard Drives,
that you can shop at $139.99 each
(see <http://www.newegg.com/Product/Product.aspx?Item=N82E16822236604>,
retrieved on December 29th, 2014)."""
print "\nNumber of 4TB hard disk drives necessary:", pg(n_d)
print "\nCost of said hard disks: $" + pg(dollars)
Output
Total number of bytes to write on disk: 25,306,847,157,254,216,063,385,611
Considering the use of
WD Green WD40EZRX 4TB IntelliPower 64MB Cache SATA 6.0Gb/s 3.5" Internal Hard Drives,
that you can shop at $139.99 each
(see <http://www.newegg.com/Product/Product.aspx?Item=N82E16822236604>,
retrieved on December 29th, 2014).
Number of 4TB hard disk drives necessary: 6,326,711,789,313
Cost of said hard disks: $885,676,383,385,926
My comment on what the OP wants to do
Quite a bit of disk storage (and money) is needed to accomplish your undertaking.
Perspective
Projected US Federal debt at the end of fiscal year 2014 is $18.23 trillion, my estimated cost, not considering racks, power supplies and energy bills, is $886 trillion.
Recommended reading
Combinatorial_Explosion#SussexUniversity,
There is hope
If you are still convinced to pursue your research project on wireless security in the direction you've described, it is possible that you can get a substantial volume discount on the drives' purchase.
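The estimate above is written for Python 2 (print statements, locale.format). For readers who want to redo the arithmetic on Python 3, here is a minimal sketch of the same per-length term (15 * 16**(i-1) strings of i hex digits, each taking i + 1 bytes with its newline). The function names are made up, and the 12 to 20 digit range is taken from the question rather than from the answer's count_bytes(10, 20) call, so the total is not expected to match the output shown above digit for digit.

```python
def candidates_with_length(n_digits):
    """Number of hex strings with exactly n_digits digits and a non-zero lead."""
    return 15 * 16 ** (n_digits - 1)


def bytes_needed(min_digits, max_digits):
    """Bytes to store every candidate, one per line (digits plus newline)."""
    return sum(candidates_with_length(n) * (n + 1)
               for n in range(min_digits, max_digits + 1))


if __name__ == "__main__":
    total = bytes_needed(12, 20)
    print(f"bytes to write: {total:,}")
    print(f"4 TB drives:    {total // (4 * 10**12):,}")
```

Even this narrower range is dominated by the 20-digit strings, so trimming the lower end of the range barely changes the conclusion.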
characters=["a","b","c"]
for x,y in zip(range(5),characters):
    print (hex(x)+y)
Output:
>>>
0x0a
0x1b
0x2c
>>>
As you can see, this does the job in a short way. It is not feasible with a range as large as yours, so keep the range small and add other things to your result instead.
Also, for file handling, here is a better way:
with open("filepath/name","a+") as f:
    f.write("whateveryouwanttowrite")
I have worked with password generators; it is better to define a dict that maps letters to complicated character sequences and combine them like this:
passw={"h":"_*2ac","e":"=.kq","y":"%.hq1"}
x=input("Wanna make some passwords? Enter a sentence or word: ")
for i in x:
    print (passw[i],end="")
    with open("passwords.txt","a+") as f:
        f.write(passw[i])
Output:
>>>
Wanna make some passwords? Enter a sentence or word: hey
_*2ac=.kq%.hq1
>>>
So, just define a dict whose keys are the letters of the alphabet and whose values are complicated character sequences, and you can make very strong passwords from simple words or sentences. I only wrote three entries as an example; of course you can add the rest to the dict later. I think this basic approach is the better way.
Preamble
I don't want to comment on what you want to do.
Code MkI
Your code can be trimmed (quite a bit) to the following
with open("myfile", "w") as f:
    for x in xrange(0xff,0xff*2+1): f.write("%X\n"%x)
Comments on my code
Please note that
You can write hex numbers in source code as, ehm, hex numbers and you can mix hex and decimal notation as well
The to_hex function is redundant as python has (surprise!) a number of different ways to format your output as you please (here I've used so called string interpolation).
Of course you have to change the filename in the open statement and adjust the extremes of the interval generated by xrange (it seems you're using python 2.x) to your content.
Code MkII
Joran Beasley remarked that (at least in Python 2.7) xrange internally uses a C long and as such it cannot step up to the task of representing 0XFFFFFFFFFFFFFFFFFFFF. This alternative code may be a possibility:
f = open("myfile", "w")
cursor = 0XFFFFFFFFFF
end = 0XFFFFFFFFFFFFFFFFFFFF
while cursor <= end:
    f.write("%X\n"%cursor)
    cursor += 1
f.close()
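On Python 3 the xrange limitation does not apply, since integers and itertools.count() are unbounded. A minimal sketch of the same sequential enumeration as a generator follows; the name hex_words and the optional stop argument are illustrative, and it yields the strings rather than writing them to a file.

```python
from itertools import count, islice


def hex_words(start, stop=None):
    """Yield upper-case hex strings for start, start + 1, ... (up to stop, inclusive)."""
    for value in count(start):
        if stop is not None and value > stop:
            return
        yield format(value, "X")


if __name__ == "__main__":
    # Only look at the first few values; the full range is far too large.
    for word in islice(hex_words(0xFFFFFFFFFFF), 5):
        print(word)
```

Because the values are produced lazily, islice() can be used to sample or resume the sequence without ever materializing it, though the total size of the full range remains the real obstacle.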
All of this is well and good; however, none of it accomplishes my purpose. If Python cannot handle such large numbers, I will have to use something else. As I stated, I do not want to generate random anything; I need a list of sequential hex strings which are anywhere from 12 characters to 20 characters long. It is to make a dictionary of passwords which are nothing more than a hex number that should be about 16 characters long.
Does anyone have any suggestions on what I can use for this purpose? I think some type of C language should do the trick, but I know less about C or C++ than Python. Sounds like this will take a while, but that's OK, it is just a research project.
I have come up with another possibility: counting in hex starting from 11 f's and going until I reach 20 f's. This would produce about 4.3 billion numbers, which should fit in a 79 million page Word document. Sounds like it is a little large, but if I go from 14 f's to 18 f's, that should be manageable. Here is the code I am proposing now:
x = 0xffffffffffffff
def gen_pwd(x):
    while x <= 0xffffffffffffffffff:
        return x
        string = str(x)
        f = open("root/Home/sdnlnk_pwd.txt")
        print f.upper(string, 'a')
        f.write(string)
        x = x + 0x1
gen_pwd()
I was trying to solve the Infinite Monkey Theorem which is part of a programming assignment that I came across online.
The problem statement is:
The theorem states that a monkey hitting keys at random on a typewriter keyboard for an infinite amount of time will almost surely type a given text, such as the complete works of William Shakespeare. Well, suppose we replace a monkey with a Python function. How long do you think it would take for a Python function to generate just one sentence of Shakespeare? The sentence we’ll shoot for is: “methinks it is like a weasel”
I am trying to see a) whether it will be possible to generate the string b) After how many iterations was the string generated
I have set recursion limit as 10000 looking at a previous SO question, but I am still getting the run time error for Maximum recursion depth reached.
I am still finding my way around python. I hope to see suggestions on how I could do it in a better way without coming across recursion depth issue.
Here is my code so far:
import random
import sys
alphabet=['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z',' ']
quote="methinks it is like a weasel"
msg='cont'
count=0
sys.setrecursionlimit(10000)

def generate(msg):
    sentence=''
    while len(sentence)!=27:
        #random.choice() prints a random element from list 'alphabet'
        sentence=sentence+random.choice(alphabet)
    if msg=='cont':
        verify(sentence)

def verify(msg2):
    global count
    if msg2.find(quote)==-1:
        count+=1
        generate('cont')
    else:
        print 'sentence is ',msg2 ,'count is',count

if __name__ == '__main__':
    generate(msg)
This is a case where it's better to think before doing. If we ignore capitalization and punctuation, your string is comprised of 28 characters, each of which can in principle be any of the 26 letters of the alphabet or a space. The number of combinations is 27^28, which happens to be 11972515182562019788602740026717047105681. If you could enumerate a billion guesses per second, 27^28 / 1E9 (tries/sec) / 3600 (sec/hr) / 24 (hrs/day) / 365.25 (days/yr) / 14E9 (yrs/current age of universe) => 27099008032844.297. The good news is that you might stumble on the answer at any point, so the expected amount of time is only half of 27 trillion times the current age of the universe.
Blowing out the stack is the least of your problems.
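The arithmetic is easy to reproduce. A short Python 3 sketch, using the same assumed rate of a billion guesses per second and the same 14-billion-year age of the universe as above (the constant names are just for illustration):

```python
ALPHABET_SIZE = 27                       # 26 letters plus the space
TARGET = "methinks it is like a weasel"
TARGET_LENGTH = len(TARGET)

combinations = ALPHABET_SIZE ** TARGET_LENGTH    # exact integer arithmetic

GUESSES_PER_SECOND = 1e9
SECONDS_PER_YEAR = 3600 * 24 * 365.25
AGE_OF_UNIVERSE_YEARS = 14e9

universe_ages = (combinations / GUESSES_PER_SECOND
                 / SECONDS_PER_YEAR / AGE_OF_UNIVERSE_YEARS)

if __name__ == "__main__":
    print("target length:", TARGET_LENGTH)
    print("combinations: ", combinations)
    print("universe ages to enumerate them all:", universe_ages)
```

Note that the target is 28 characters long, so a generator that stops at 27 characters can never contain it, regardless of how many tries it gets.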
The reason it's called the infinite monkey theorem is that you can divide by the number of monkeys who can process this in parallel, and if that's infinity the solution time becomes the per monkey amount of time to generate a guess, 1 billionth of a second.
It would be better not to call verify() from generate() (and vice-versa) in the likely event that the monkeys have not written Shakespeare.
Having two functions which repeatedly call one another without returning is what causes the recursion depth to be exceeded.
Instead of using recursion, you could simply check whether you've produced your sentence with an iterative approach. For example have a loop which takes a random sentence, then checks whether it matches your required sentence, and if so, outputs the number of tries it took (and if not loops back to the start).
done = False
count = 1
while not done:
    msg = generate()
    if verify(msg):
        print 'success, count = ', count
        done = True
    count += 1
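The loop above leaves generate() and verify() undefined. A runnable Python 3 sketch of the same iterative idea is shown below; the names tries_until_match and max_tries are made up, and the demo uses a deliberately tiny target, since the full sentence would not finish in any practical amount of time.

```python
import random
import string

ALPHABET = string.ascii_lowercase + " "   # 26 letters plus the space


def generate(length, rng):
    """Return one random string of the given length."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def tries_until_match(target, rng, max_tries=1000000):
    """Number of random guesses until `target` comes up, or None if
    max_tries guesses were not enough. No recursion is involved."""
    for count in range(1, max_tries + 1):
        if generate(len(target), rng) == target:
            return count
    return None


if __name__ == "__main__":
    rng = random.Random(0)
    print("tries needed for 'hi':", tries_until_match("hi", rng))
```

The max_tries cap is a design choice for the sketch: it guarantees the loop terminates even when the target is hopelessly long.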
Maybe something like the following. It runs on CPython 2.[67], CPython 3.[01234], pypy 2.4.0, pypy3 2.3.1 and jython 2.7b3. It should take a very long time to run with --production, even on pypy or pypy3.
#!/usr/local/cpython-3.4/bin/python

'''Infinite monkeys randomly typing Shakespeare (or one monkey randomly typing Shakespeare very fast)'''

# pylint: disable=superfluous-parens
# superfluous-parens: Parentheses are good for clarity and portability

import sys
import itertools

def generate(alphabet, desired_string, divisor):
    '''Generate matches'''
    desired_tuple = tuple(desired_string)
    num_possibilities = len(alphabet) ** len(desired_string)
    for candidateno, candidate_tuple in enumerate(itertools.product(alphabet, repeat=len(desired_string))):
        if candidateno % divisor == 0:
            sys.stderr.write('checking candidateno {0} ({1}%)\n'.format(candidateno, candidateno * 100.0 / num_possibilities))
        if candidate_tuple == desired_tuple:
            match = ''.join(candidate_tuple)
            yield match

def usage(retval):
    '''Output a usage message'''
    sys.stderr.write('Usage: {0} --production\n'.format(sys.argv[0]))
    sys.exit(retval)

def print_them(alphabet, quote, divisor):
    '''Print the matches'''
    for matchno, match in enumerate(generate(alphabet, quote, divisor)):
        print('{0} {1}'.format(matchno, match))

def main():
    '''Main function'''
    production = False
    while sys.argv[1:]:
        if sys.argv[1] == '--production':
            production = True
        elif sys.argv[1] in ['--help', '-h']:
            usage(0)
        else:
            sys.stderr.write('{0}: Unrecognized option: {1}\n'.format(sys.argv[0], sys.argv[1]))
            usage(1)
        del sys.argv[1]  # consume the option, otherwise this loop never ends
    if production:
        print_them(alphabet='abcdefghijklmnopqrstuvwxyz ', quote='methinks it is like a weasel', divisor=10000)
    else:
        print_them(alphabet='abcdef', quote='cab', divisor=10)

main()