Why (in Python) is random.randint so much slower than random.random?

I got curious about the relative speeds of some random integer generating code. I wrote the following to check it out:
from random import random
from random import choice
from random import randint
from math import floor
import time
def main():
    times = 1000000
    startTime = time.time()
    for i in range(times):
        randint(0, 9)
    print(time.time() - startTime)
    startTime = time.time()
    for i in range(times):
        choice([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    print(time.time() - startTime)
    startTime = time.time()
    for i in range(times):
        floor(10 * random())  # generates random integers in the same range as randint(0, 9)
    print(time.time() - startTime)

main()
The results of one trial of this code were
0.9340872764587402
0.6552846431732178
0.23188304901123047
Even with the extra multiplication and the call to math.floor, the final way of generating integers was by far the fastest. Changing the size of the range from which numbers were generated didn't change anything.
So, why is random so much faster than randint? And is there any reason (besides ease of use, readability, and not inviting mistakes) to prefer randint over random, e.g., does randint produce better pseudo-random integers? And if floor(x*random()) doesn't feel readable enough but you want faster code, should you write a specialized routine?
def myrandint(low, high):
    # Returns a random integer between low and high, inclusive. Still about 1.6 times
    # slower than the floor(10*random()) approach above, but almost 2.5 times faster
    # than random.randint. Results may not be what you expect if int(low) != low, etc.,
    # but the numpty who writes randint(1.9, 3.2) gets what they deserve.
    return floor((high - low + 1) * random()) + low

Before I answer your question (and don't worry, I do get there), take note of the programmers' common adage:
Premature optimization is the root of all evil.
While this isn't always the case, don't worry about micro-optimizations unless you need them.
This goes double for Python: if you're writing something where speed is critical, you'll usually want to write it in a language that will run faster, like C. You can then write Python bindings for that C code if you want to use Python for the non-critical parts of your application (as is the case with, for example, NumPy).
Instead of focusing on making individual expressions or functions in your code run as fast as possible, focus on the algorithms you use and the overall structure of your code (and on making it readable, but you are already aware of that). Then, when your application starts running slowly, you can profile it to figure out what parts take the most time, and improve only those parts.
The changes will be easier to make to well-structured, readable code and optimizing the actual bottlenecks will generally give a much better speedup-to-time-coding ratio than most micro-optimizations. The time spent wondering which of two expressions runs faster is time you could have spent getting other things done.
As an exception, I'd say learning why one option is faster than the other is sometimes worth the time, because then you can incorporate that more general knowledge into your future programming, letting you make quicker calls without worrying about the details.
But enough about why we shouldn't waste time worrying about speed, let's talk about speed.
Taking a look at the source of the random module (for CPython 3.7.4), this line from the end of the opening comment provides a short answer:
* The random() method is implemented in C, executes in a single Python step,
and is, therefore, threadsafe.
The first statement is the one that matters most to us. random is a Python binding for a C function, so its work runs at the speed of machine code rather than the comparatively slow speed of interpreted Python.
randint, on the other hand, is implemented in Python, and suffers a significant speed penalty for it. randint calls randrange, which checks that the range's bounds (and step size) are integers, that the range isn't empty, and that the step size isn't zero, before eventually calling getrandbits, which is implemented in C.
This alone produces the majority of randint's slowness. However, there is one more variable in play.
Going a little deeper, into the internal function _randbelow, it turns out that the algorithm for getting a random number between 0 and n is very straightforward: it gets the number of bits in n, then repeatedly generates that many random bits until the result is less than n.
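A minimal sketch of that loop (a paraphrase for illustration, not the exact CPython source):

from random import getrandbits

def randbelow_sketch(n):
    # Return a random int in [0, n) by rejection sampling, roughly the way
    # random._randbelow does it.
    k = n.bit_length()     # number of bits needed to represent n
    r = getrandbits(k)     # a random k-bit integer, i.e. in [0, 2**k)
    while r >= n:          # out of range: reject and draw again
        r = getrandbits(k)
    return r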
On average (across all possible values of n), this has little effect, but comparing the extremes, it is noticeable.
I wrote a function that tests the impact of that loop. Here are the results:
bits    2 ** (n - 1)    (2 ** n) - 1    ratio
  64    1.358526759     1.084741422     1.2523968675
 128    1.43073282      1.02119227      1.4010415688
 256    1.600253063     1.271662798     1.2583941793
 512    1.845024581     1.363168823     1.3534820852
1024    2.371779281     1.620392686     1.4637064839
2048    2.98949864      2.01788896      1.48149809
The first column is the number of bits; the second and third are the average time, in microseconds over 1,000,000 runs, to draw a random integer below a bound of that bit length, where the bound is the smallest (2**(n-1)) and the largest ((2**n)-1) number with that many bits, respectively. The last column is the ratio of the second and third columns.
You'll notice that the average runtimes for the smallest bound of a given bit length are larger than those for the largest bound of the same bit length. This is because of that loop:
When looking for a number below the largest n-bit number, a second attempt is needed only if that largest number itself is generated, which is unlikely except for very small n. But to find a number below the smallest n-bit number (2**(n-1) is a single 1-bit followed by n-1 0-bits), half of the attempts fail.
Addendum: I removed the tests for bit lengths from 1 to 32 because, upon inspection of the C source for getrandbits, I discovered that it uses a separate, faster, function for those numbers.
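The timing function itself isn't reproduced above, but a sketch of one way to run such a benchmark (names and loop structure are my own; randrange(n) is used because it calls _randbelow(n) internally) might look like this:

import timeit
from random import randrange

def avg_call_us(n, runs=1_000_000):
    # Average time per randrange(n) call, in microseconds.
    return timeit.timeit(lambda: randrange(n), number=runs) / runs * 1e6

for bits in (64, 128, 256, 512, 1024, 2048):
    low = 2 ** (bits - 1)    # smallest integer with this bit length
    high = (2 ** bits) - 1   # largest integer with this bit length
    t_low, t_high = avg_call_us(low), avg_call_us(high)
    print(bits, t_low, t_high, t_low / t_high)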

Related

Random module in python

How does the random module work in Python (or any comparable language)? Is there any way we can actually write a random number generator ourselves? What does the inner working of the random module look like?
Do you think the time complexities vary between functions like randint(), randrange(), and others? Or is it just that the time complexities differ between the functions mentioned above and methods like random.choice()?
There are many good articles on random number generators. They're not hard to create, but it's very easy to create one that isn't really random. The previous generation used "linear congruential" random number generators, where you just did a multiply, an addition, and a modulo operation. Python today uses a "Mersenne twister" algorithm, which requires a bit more arithmetic, but has a very, very long sequence before repeating.
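As an illustration of the older approach, a toy linear congruential generator takes only a few lines (the constants below are widely published LCG parameters, used here purely for illustration; this is not how Python's random module is implemented today):

class ToyLCG:
    # A toy linear congruential generator: multiply, add, then take a modulus.
    # x_{k+1} = (a * x_k + c) mod m, with illustrative constants a, c, m.
    def __init__(self, seed):
        self.state = seed % 2**32

    def next_int(self):
        self.state = (1664525 * self.state + 1013904223) % 2**32
        return self.state

    def random(self):
        # Scale to a float in [0, 1), like random.random().
        return self.next_int() / 2**32

gen = ToyLCG(seed=42)
print([gen.random() for _ in range(3)])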
Almost all of the functions in random just grab the next batch of random bits from the generator and then do a simple conversion to produce the result of randint or random(). choice, it should be clear, just does one list lookup after picking a random index.
https://en.wikipedia.org/wiki/Random_number_generation
https://www.howtogeek.com/183051/htg-explains-how-computers-generate-random-numbers/
https://en.wikipedia.org/wiki/Mersenne_Twister

Python's pow function : How to integrate pow function into assembly language

I am designing an operating system, written in NASM. The purpose is to run Fermat's primality test on a low-level system in protected mode, without multitasking. I use DOS-level system calls through DPMI. The code is standard, and I don't think it is a good idea to blow up this question with long listings.
The question is about Python's pow(base, exponent, modulus) function. I will obviously be calculating very long numbers, of magnitude around 10^100000000 (example: 2**332200000-1). The code gets the input from the user or reads it from a file. It takes approximately 40 MB to cache an integer of this size in a file or in memory.
This means I simply allocate about 40 MB of memory in protected mode.
Fermat's little theorem works as you know:
if p is prime,
a is an integer and gcd(a, p) = 1, then
a**(p-1) mod p = 1
In Python this is calculated like a charm with no extra effort, and it gives a lot back. But extraordinary integers like 2**332200000-1 are slow, so I decided to make my own operating-system shell that starts when my computer boots. The reason is to get the most out of my poor computer system, without any system calls slowing down my calculations. I have the following questions:
Is there a website where I can study the assembly code behind Python's power function?
Either way, can you give me a hint on how to do this effectively in assembly, short and fast?
The idea is very basic and brief:
A 4-byte integer will not do in assembly, so I decided to read the long hex integer from a file into allocated memory (40 MB). When I do a calculation with this very long integer, for example multiplying by 2, I shift each 4-byte word into a second free area of memory. If a carry is left over, it is added to the next 4-byte calculation, and so on. So it is possible to work with these long integers in memory. Everything is designed and ready, but making it work in assembly is still in the research phase. Can you help or point me in some direction?
Again: how do I calculate with very, very long numbers in assembly, and how do I build a Python-like power function that takes an exponent and a modulus? What would that look like in code form?
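No answer is recorded here, but the textbook technique behind a three-argument pow is binary (square-and-multiply) modular exponentiation, which reduces every intermediate result modulo the modulus so the working values never grow beyond roughly twice the bit length of the modulus. A Python sketch of the idea (CPython's built-in pow is written in C and far more optimized; this is only an illustration):

def pow_mod(base, exponent, modulus):
    # Square-and-multiply: walk the exponent's bits from least significant
    # to most significant, reducing mod `modulus` at every step.
    result = 1
    base %= modulus
    while exponent > 0:
        if exponent & 1:                    # current low bit is set
            result = (result * base) % modulus
        base = (base * base) % modulus      # square for the next bit
        exponent >>= 1
    return result

# Fermat check for a candidate p with witness a: a**(p-1) mod p.
assert pow_mod(3, 560, 561) == pow(3, 560, 561)

Each intermediate product is at most (modulus - 1)**2, so an assembly version needs one fixed-size multi-precision multiply and one reduction step per exponent bit, and the number of iterations equals the bit length of the exponent.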

Does this sentence contradict the Python paradigm "lists should not be initialized"?

People coming to Python from other programming languages often ask how they should pre-allocate or initialize their lists. This is especially true for people coming from Matlab, where code such as
l = [];
for i = 1:100
    l(end+1) = 1;
end
produces a warning that explicitly suggests you initialize the list.
There are several posts on SO explaining (and showing through tests) that list initialization isn't required in python. A good example with a fair bit of discussion is this one (but the list could be very long): Create a list with initial capacity in Python
The other day, however, while looking up the complexity of operations in Python, I stumbled upon this sentence on the official Python wiki:
the largest [cost for list operations] come from growing beyond the current allocation size (because everything must move),
This seems to suggest that lists do have a pre-allocated size and that growing beyond that size causes the whole list to move.
This shook my foundations a bit. Can list pre-allocation reduce the overall complexity (in terms of number of operations) of a piece of code? If not, what does that sentence mean?
EDIT:
Clearly my question regards the (very common) code:
container = ...  # some iterable with 1 gazillion elements
new_list = []
for x in container:
    ...  # do whatever you want with x
    new_list.append(x)  # or something computed using x
In this case the compiler cannot know how many items there are in container, so new_list could potentially require its allocated memory to change an incredible number of times, if what that sentence says is true.
I know that this is different for list comprehensions.
Can list pre-allocation reduce the overall complexity (in terms of number of operations) of a code?
No, the overall time complexity of the code will be the same, because the time cost of reallocating the list is O(1) when amortised over all of the operations which increase the size of the list.
If not, what does that sentence means?
In principle, pre-allocating the list could reduce the running time by some constant factor, by avoiding multiple re-allocations. This doesn't mean the complexity is lower, but it may mean the code is faster in practice. If in doubt, benchmark or profile the relevant part of your code to compare the two options; in most circumstances it won't matter, and when it does, there are likely to be better alternatives anyway (e.g. NumPy arrays) for achieving the same goal.
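For example, a rough comparison might look like the sketch below (the function names and element count are my own choices; exact timings will vary by machine and Python version):

import timeit

N = 1_000_000

def by_append():
    out = []
    for i in range(N):
        out.append(i)        # list grows, occasionally reallocating
    return out

def by_preallocation():
    out = [None] * N         # allocate the full list up front
    for i in range(N):
        out[i] = i
    return out

print("append       :", timeit.timeit(by_append, number=10))
print("preallocation:", timeit.timeit(by_preallocation, number=10))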
new_list could potentially require its allocated memory to change an incredible number of times
List reallocation follows a geometric progression, so if the final length of the list is n then the list is reallocated only O(log n) times along the way; not an "incredible number of times". The way the maths works out, the average number of times each element gets copied to a new underlying array is a constant regardless of how large the list gets, hence the O(1) amortised cost of appending to the list.
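To see that geometric growth directly, you can watch the reported size of a list jump as elements are appended; the jumps get further and further apart. A quick sketch (the exact capacities are a CPython implementation detail and vary between versions):

import sys

lst = []
last_size = sys.getsizeof(lst)
for i in range(100):
    lst.append(i)
    size = sys.getsizeof(lst)
    if size != last_size:        # the underlying array was reallocated
        print(f"len={len(lst):3d}  size={size} bytes")
        last_size = size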

Why is the following algorithm O(1) space?

Input: A list of positive integers where one entry occurs exactly once, and all other entries occur exactly twice (for example [1,3,2,5,3,4,1,2,4])
Output: The unique entry (5 in the above example)
The following algorithm is supposed to be O(m) time and O(1) space where m is the size of the list.
def get_unique(intlist):
    unique_val = 0
    for int in intlist:
        unique_val ^= int
    return unique_val
My analysis: Given a list of length m there will be (m + 1)/2 unique positive integers in the input list, so the smallest possible maximum integer in the list is (m + 1)/2. If we assume this best case, then when taking the XOR sum the variable unique_val will require ceiling(log2((m + 1)/2)) bits of memory, so I thought the space complexity should be at least O(log(m)).
Your analysis is certainly one correct answer, particularly in a language like Python which gracefully handles arbitrarily large numbers.
It's important to be clear about what you're trying to measure when thinking about space and time complexity. A reasonable assumption might be that the size of an integer is constant (e.g. you're using 64-bit integers). In that case, the space complexity is certainly O(1), but the time complexity is still O(m).
Now, you could also argue that using a fixed-size integer means you have a constant upper-bound on the size of m, so perhaps the time complexity is also O(1). But in most cases where you need to analyze the running time of this sort of algorithm, you're probably very interested in the difference between a list of length 10 and one of length 1 billion.
I'd say it's important to clarify and state your assumptions when analyzing space- and time-complexity. In this case, I would assume we have a fixed size integer and a value of m much smaller than the maximum integer value. In that case, O(1) space and O(m) time are probably the best answers.
EDIT (based on discussion in other answers)
Since all m gives you is a lower bound on the maximum value in the list, you really can't provide a worst-case estimate of the space. I.e., a number in the list can be arbitrarily large. To give any reasonable answer for the space complexity of this algorithm, you need to make some assumption about the maximum size of the input values.
The (space/time) complexity analysis is usually applied to algorithms on a higher level. While you can drop down to specific language implementation level, it may not be useful in all cases.
Your analysis is both right and possibly wrong. It's right for the current CPython implementation, where integers do not have a maximum value. It's also fine if all your integers are relatively small and fit into the implementation-specific small-number representation.
But it doesn't have to be valid for all other implementations of Python. For example, you could have an optimizing implementation which figures out that intlist is not used again and, instead of allocating unique_val separately, reuses the space of the consumed list elements (basically transforming this function into a space-optimized reduce call).
Then again, can we even talk about space complexity in a garbage-collected language with heap-allocated integers? Your analysis of the complexity is wrong in that case, because a ^= b will allocate new memory for a big value b, and the size of that allocation depends on the system, architecture, Python version, and luck.
Your original question, however, is "Why is the following algorithm O(1) space?". If you look at the algorithm itself and assume there is some arbitrary maximum integer limit, or that your language can represent any number in a bounded amount of space, then the answer is yes: the algorithm itself, under those conditions, uses constant space.
The complexity of an algorithm is always dependent on the machine model (= platform) you use. E.g. we often say that multiplying and dividing IEEE floating point numbers is of run-time complexity O(1) - which is not always the case (e.g. on an 8086 processor without FPU).
For the above algorithm, the space complexity of O(1) only holds as long as your input list has no element greater than 2147483647 (= sys.maxint in 32-bit Python 2). Below that limit, Python stores integers as signed 32-bit values. For those, your processor has all the relevant operations implemented in hardware, and it generally takes only a constant number of clock cycles (in most cases just one) to perform them (run-time complexity O(1)), and only a constant number of memory words (just one) to store the result (space complexity O(1)).
However, if your input exceeds 2147483647, Python switches to a software-implemented arbitrary-precision integer type to store these big values. Operations on those are no longer O(1), and they require more than constant O(1) space.
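A quick way to see the growing footprint in present-day CPython 3, where every int is an arbitrary-precision object, is to check sys.getsizeof for values of increasing magnitude:

import sys

# Bigger integers occupy more memory, so operating on them cannot be
# assumed to take constant space (or constant time).
for value in (1, 2**31 - 1, 2**63, 2**1000):
    print(value.bit_length(), "bits ->", sys.getsizeof(value), "bytes")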

How should I use random.jumpahead in Python

I have an application that runs a certain experiment 1000 times (multi-threaded, so that multiple experiments are done at the same time). Every experiment needs approximately 50,000 random.random() calls.
What is the best approach to make this really random? I could copy a Random object into every experiment and then do a jumpahead of 50,000 * expid. The documentation suggests that jumpahead(1) already scrambles the state, but is that really true?
Or is there another way to do this in 'the best way'?
(No, the random numbers are not used for security, but for a Metropolis-Hastings algorithm. The only requirement is that the experiments are independent, not that the random sequence is somehow unpredictable.)
I could copy a Random object into every experiment and then do a jumpahead of 50,000 * expid.
Approximately correct. Each thread gets its own Random instance.
Seed all of them with the same seed value. Use a constant for testing; use /dev/random when you "run for the record".
Edit. Outside Python, and in older implementations, use jumpahead(50000 * expid) to avoid the situation where two generators wind up with parallel sequences of values. In any reasonably current (post-2.3) Python, jumpahead is no longer linear and using expid is sufficient to scramble the state.
You can't simply do jumpahead(1) in each thread, since that would ensure they are synchronized. Use jumpahead(expid) to ensure each thread's state is scrambled distinctly.
The documentation suggests that jumpahead(1) already scrambles the state, but is that really true?
Yes, jumpahead does indeed "scramble" the state. Recall that for a given seed you get one -- long -- but fixed sequence of pseudo-random numbers. You're jumping ahead in this sequence. To pass randomness tests, you must get all your values from this one sequence.
Edit. Once upon a time, jumpahead(1) was limited. Now jumpahead(1) really does a larger scrambling. The scrambling, however, is deterministic. You can't simply do jumpahead(1) in each thread.
If you have multiple generators with different seeds, you violate the "one sequence from one seed" assumption and your numbers aren't going to be as random as if you get them from a single sequence.
If you only jumpahead 1, you may get parallel sequences which might be similar. [This similarity might not be detectable; theoretically, there's a similarity.]
When you jumpahead 50,000, you assure that you follow the 1-sequence-1-seed premise. You also assure that you won't have adjacent sequences of numbers in two experiments.
Finally, you also have repeatability. For a given seed, you get consistent results.
Same jumpahead: Not Good.
>>> y=random.Random( 1 )
>>> z=random.Random( 1 )
>>> y.jumpahead(1)
>>> z.jumpahead(1)
>>> [ y.random() for i in range(5) ]
[0.99510321786951772, 0.92436920169905545, 0.21932404923057958, 0.20867489035315723, 0.91525579001682567]
>>> [ z.random() for i in range(5) ]
[0.99510321786951772, 0.92436920169905545, 0.21932404923057958, 0.20867489035315723, 0.91525579001682567]
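By contrast, giving each generator a different jumpahead argument scrambles their states differently. A sketch (Python 2 only, since jumpahead was removed in Python 3):

import random

y = random.Random(1)
z = random.Random(1)
y.jumpahead(1)
z.jumpahead(2)    # a different argument scrambles the state differently
print([y.random() for i in range(5)])
print([z.random() for i in range(5)])    # expect a different list than y's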
You shouldn't use that function. There is no proof that it works correctly with the Mersenne Twister generator. Indeed, it was removed from Python 3 for that reason.
For more information about generation of pseudo-random numbers on parallel environments, see this article from David Hill.
jumpahead(1) is indeed sufficient (and identical to jumpahead(50000) or any other such call, in the current implementation of random -- I believe that came in at the same time as the Mersenne Twister based implementation). So use whatever argument fits in well with your programs' logic. (Do use a separate random.Random instance per thread for thread-safety purposes of course, as your question already hints).
(random module generated numbers are not meant to be cryptographically strong, so it's a good thing that you're not using them for security purposes ;-).
Per the random module docs at python.org:
"You can instantiate your own instances of Random to get generators that don’t share state."
And there's also a relevant-looking note on jumpahead, as you mention. But the guarantees there are kind of vague. If the calls to OS-provided randomness aren't so expensive as to dominate your running time, I'd skip all the subtlety and do something like:
randoms = [random.Random(os.urandom(4)) for _ in range(num_expts)]
If num_expts is ~1000, then you're unlikely to have any collisions in your seed (birthday paradox says you need about 65000 experiments before there's a >50% probability that you have a collision). If this isn't good enough for you or if the number of experiments is more like 100k instead of 1k, then I think it's reasonable to follow this up with
for idx, r in enumerate(randoms):
    r.jumpahead(idx)
Note that I don't think it will work to just make your seed longer (os.urandom(8), for example), since the random docs state that the seed must be hashable, and so on a 32-bit platform you're only going to get at most 32 bits (4 bytes) of useful entropy in your seed.
This question piqued my curiosity, so I went and looked at the code implementing the random module. I am definitely not a PRNG expert, but it does seem like slightly differing values of n in jumpahead(n) will lead to markedly different Random instance states. (Always scary to contradict Alex Martelli, but the code does use the value of n when shuffling the random state).
