Python frozenset hashing algorithm / implementation

I'm currently trying to understand the mechanism behind the hash function defined for Python's built-in frozenset data type. The implementation is shown at the bottom for reference. What I'm interested in particular is the rationale for the choice of this scattering operation:
lambda h: (h ^ (h << 16) ^ 89869747) * 3644798167
where h is the hash of each element. Does anyone know where these came from? (That is, was there any particular reason to pick these numbers?) Or were they simply chosen arbitrarily?
Here is the snippet from the official CPython implementation,
static Py_hash_t
frozenset_hash(PyObject *self)
{
    PySetObject *so = (PySetObject *)self;
    Py_uhash_t h, hash = 1927868237UL;
    setentry *entry;
    Py_ssize_t pos = 0;

    if (so->hash != -1)
        return so->hash;

    hash *= (Py_uhash_t)PySet_GET_SIZE(self) + 1;
    while (set_next(so, &pos, &entry)) {
        /* Work to increase the bit dispersion for closely spaced hash
           values.  This is important because some use cases have many
           combinations of a small number of elements with nearby
           hashes so that many distinct combinations collapse to only
           a handful of distinct hash values. */
        h = entry->hash;
        hash ^= (h ^ (h << 16) ^ 89869747UL) * 3644798167UL;
    }
    hash = hash * 69069U + 907133923UL;
    if (hash == -1)
        hash = 590923713UL;
    so->hash = hash;
    return hash;
}
and an equivalent implementation in Python:
def _hash(self):
    MAX = sys.maxint
    MASK = 2 * MAX + 1
    n = len(self)
    h = 1927868237 * (n + 1)
    h &= MASK
    for x in self:
        hx = hash(x)
        h ^= (hx ^ (hx << 16) ^ 89869747) * 3644798167
        h &= MASK
    h = h * 69069 + 907133923
    h &= MASK
    if h > MAX:
        h -= MASK + 1
    if h == -1:
        h = 590923713
    return h

The problem being solved is that the previous hash algorithm in Lib/sets.py had horrendous performance on datasets that arise in a number of graph algorithms (where nodes are represented as frozensets):
# Old-algorithm with bad performance
def _compute_hash(self):
    result = 0
    for elt in self:
        result ^= hash(elt)
    return result

def __hash__(self):
    if self._hashcode is None:
        self._hashcode = self._compute_hash()
    return self._hashcode
A new algorithm was created because it had much better performance. Here is an overview of the salient parts of the new algorithm:
1) The xor-equal in h ^= (hx ^ (hx << 16) ^ 89869747) * 3644798167 is necessary so that the algorithm is commutative (the hash does not depend on the order that set elements are encountered). Since sets have an unordered equality test, the hash for frozenset([10, 20]) needs to be the same as for frozenset([20, 10]). (A small sketch of points 1-3 follows this list.)
2) The xor with 89869747 was chosen for its interesting bit pattern 101010110110100110110110011, which is used to break up sequences of nearby hash values prior to multiplying by 3644798167, a randomly chosen large prime with another interesting bit pattern.
3) The xor with hx << 16 was included so that the lower bits had two chances to affect the outcome (resulting in better dispersion of nearby hash values). In this, I was inspired by how CRC algorithms shuffled bits back on to themselves.
4) If I recall correctly, the only one of the constants that is special is 69069. It had some history from the world of linear congruential random number generators. See https://www.google.com/search?q=69069+rng for some references.
5) The final step of computing hash = hash * 69069U + 907133923UL was added to handle cases with nested frozensets and to make the algorithm disperse in a pattern orthogonal to the hash algorithms for other objects (strings, tuples, ints, etc).
6) Most of the other constants were randomly chosen large prime numbers.
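A minimal sketch of points 1-3, using the per-element scatter step exactly as it appears in the question (a 64-bit mask is assumed here, matching the pure-Python _hash on a 64-bit build):
MASK = (1 << 64) - 1

def scatter(h):
    return ((h ^ (h << 16) ^ 89869747) * 3644798167) & MASK

# Point 1: XOR-accumulation keeps the result independent of element order.
print(scatter(hash(10)) ^ scatter(hash(20)) ==
      scatter(hash(20)) ^ scatter(hash(10)))        # True

# Points 2-3: closely spaced input hashes come out widely dispersed.
for h in (1, 2, 3):
    print(hex(scatter(h)))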
Though I would like to claim divine inspiration for the hash algorithm, the reality was that I took a bunch of badly performing datasets, analyzed why their hashes weren't dispersing, and then toyed with the algorithm until the collision statistics stopped being so embarrassing.
For example, here is an efficacy test from Lib/test/test_set.py that failed for algorithms with less diffusion:
def test_hash_effectiveness(self):
    n = 13
    hashvalues = set()
    addhashvalue = hashvalues.add
    elemmasks = [(i+1, 1<<i) for i in range(n)]
    for i in xrange(2**n):
        addhashvalue(hash(frozenset([e for e, m in elemmasks if m&i])))
    self.assertEqual(len(hashvalues), 2**n)
Other failing examples included powersets of strings and small integer ranges as well as the graph algorithms in the test suite: See TestGraphs.test_cuboctahedron and TestGraphs.test_cube in Lib/test/test_set.py.

Unless Raymond Hettinger (the code's author) chimes in, we'll never know for sure ;-) But there's usually less "science" in these things than you might expect: you take some general principles, and a test suite, and fiddle the constants almost arbitrarily until the results look "good enough".
Some general principles "obviously" at work here:
To get the desired quick "bit dispersion", you want to multiply by a large integer. Since CPython's hash result has to fit in 32 bits on many platforms, an integer that requires 32 bits is best for this. And, indeed, (3644798167).bit_length() == 32.
To avoid systematically losing the low-order bit(s), you want to multiply by an odd integer. 3644798167 is odd.
More generally, to avoid compounding patterns in the input hashes, you want to multiply by a prime. And 3644798167 is prime.
And you also want a multiplier whose binary representation doesn't have obvious repeating patterns. bin(3644798167) == '0b11011001001111110011010011010111'. That's pretty messed up, which is a good thing ;-)
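Those properties are easy to check (a quick sketch; the trial-division primality test below is slow in general but fine for this one 32-bit constant):
M = 3644798167
print(M.bit_length())                                      # 32
print(M % 2)                                               # 1, i.e. odd
print(bin(M))                                              # no obvious repeating pattern
print(all(M % d for d in range(3, int(M ** 0.5) + 1, 2)))  # True, so prime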
The other constants look utterly arbitrary to me. The
if h == -1:
    h = 590923713
part is needed for another reason: internally, CPython takes a -1 return value from an integer-valued C function as meaning "an exception needs to be raised"; i.e., it's an error return. So you'll never see a hash code of -1 for any object in CPython. The value returned instead of -1 is wholly arbitrary - it just needs to be the same value (instead of -1) each time.
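You can see the reserved value at work from the interactive prompt (CPython-specific behaviour):
>>> hash(-1)    # would be -1, so CPython substitutes -2
-2
>>> hash(-2)
-2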
EDIT: playing around
I don't know what Raymond used to test this. Here's what I would have used: look at hash statistics for all subsets of a set of consecutive integers. Those are problematic because hash(i) == i for a great many integers i.
>>> all(hash(i) == i for i in range(1000000))
True
Simply xor'ing hashes together will yield massive cancellation on inputs like that.
So here's a little function to generate all subsets, and another to do a dirt-simple xor across all hash codes:
def hashxor(xs):
    h = 0
    for x in xs:
        h ^= hash(x)
    return h

def genpowerset(xs):
    from itertools import combinations
    for length in range(len(xs) + 1):
        for t in combinations(xs, length):
            yield t
Then a driver, and a little function to display collision statistics:
def show_stats(d):
    total = sum(d.values())
    print "total", total, "unique hashes", len(d), \
          "collisions", total - len(d)

def drive(n, hasher=hashxor):
    from collections import defaultdict
    d = defaultdict(int)
    for t in genpowerset(range(n)):
        d[hasher(t)] += 1
    show_stats(d)
Using the dirt-simple hasher is disastrous:
>>> drive(20)
total 1048576 unique hashes 32 collisions 1048544
Yikes! OTOH, using the _hash() designed for frozensets does a perfect job in this case:
>>> drive(20, _hash)
total 1048576 unique hashes 1048576 collisions 0
Then you can play with that to see what does - and doesn't - make a real difference in _hash(). For example, it still does a perfect job on these inputs if
h = h * 69069 + 907133923
is removed. And I have no idea why that line is there. Similarly, it continues to do a perfect job on these inputs if the ^ 89869747 in the inner loop is removed - don't know why that's there either. And initialization can be changed from:
h = 1927868237 * (n + 1)
to:
h = n
without harm here too. That all jibes with what I expected: it's the multiplicative constant in the inner loop that's crucial, for reasons already explained. For example, add 1 to it (use 3644798168) and then it's no longer prime or odd, and the stats degrade to:
total 1048576 unique hashes 851968 collisions 196608
Still quite usable, but definitely worse. Change it to a small prime, like 13, and it's worse:
total 1048576 unique hashes 483968 collisions 564608
Use a multiplier with an obvious binary pattern, like 0b01010101010101010101010101010101, and worse again:
total 1048576 unique hashes 163104 collisions 885472
Play around! These things are fun :-)

In
(h ^ (h << 16) ^ 89869747) * 3644798167
the multiplicative integer is a large prime to reduce collisions. This is especially relevant since the operation is under modulo.
The rest is probably arbitrary; I see no reason for the 89869747 to be specific. The most important usage you would get out of that is enlarging hashes of small numbers (most integers hash to themselves). This prevents high collisions for sets of small integers.
That's all I can think of. What do you need this for?

How could I check if a number is a perfect square?
Speed is of no concern, for now, just working.
See also: Integer square root in python.
The problem with relying on any floating point computation (math.sqrt(x), or x**0.5) is that you can't really be sure it's exact (for sufficiently large integers x, it won't be, and might even overflow). Fortunately (if one's in no hurry;-) there are many pure integer approaches, such as the following...:
def is_square(apositiveint):
    x = apositiveint // 2
    seen = set([x])
    while x * x != apositiveint:
        x = (x + (apositiveint // x)) // 2
        if x in seen: return False
        seen.add(x)
    return True

for i in range(110, 130):
    print i, is_square(i)
Hint: it's based on the "Babylonian algorithm" for square root, see wikipedia. It does work for any positive number for which you have enough memory for the computation to proceed to completion;-).
Edit: let's see an example...
x = 12345678987654321234567 ** 2
for i in range(x, x+2):
    print i, is_square(i)
this prints, as desired (and in a reasonable amount of time, too;-):
152415789666209426002111556165263283035677489 True
152415789666209426002111556165263283035677490 False
Please, before you propose solutions based on floating point intermediate results, make sure they work correctly on this simple example -- it's not that hard (you just need a few extra checks in case the sqrt computed is a little off), just takes a bit of care.
And then try with x**7 and find a clever way to work around the problem you'll get,
OverflowError: long int too large to convert to float
you'll have to get more and more clever as the numbers keep growing, of course.
If I was in a hurry, of course, I'd use gmpy -- but then, I'm clearly biased;-).
>>> import gmpy
>>> gmpy.is_square(x**7)
1
>>> gmpy.is_square(x**7 + 1)
0
Yeah, I know, that's just so easy it feels like cheating (a bit the way I feel towards Python in general;-) -- no cleverness at all, just perfect directness and simplicity (and, in the case of gmpy, sheer speed;-)...
Use Newton's method to quickly zero in on the nearest integer square root, then square it and see if it's your number. See isqrt.
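A minimal sketch of that idea (hypothetical helper names, not from the answer), using integer Newton iteration so no floating point is involved; on Python 3.8+ you could simply call math.isqrt, as the next answer notes:
def isqrt(n):
    # Integer square root by Newton's method; the initial guess is >= sqrt(n).
    if n < 2:
        return n
    x = 1 << ((n.bit_length() + 1) // 2)
    while True:
        y = (x + n // x) // 2
        if y >= x:
            return x
        x = y

def is_square(n):
    return n >= 0 and isqrt(n) ** 2 == n

print(is_square(152415789666209426002111556165263283035677489))   # True
print(is_square(152415789666209426002111556165263283035677490))   # False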
Python ≥ 3.8 has math.isqrt. If using an older version of Python, look for the "def isqrt(n)" implementation here.
import math

def is_square(i: int) -> bool:
    return i == math.isqrt(i) ** 2
Since you can never depend on exact comparisons when dealing with floating point computations (such as these ways of calculating the square root), a less error-prone implementation would be
import math

def is_square(integer):
    root = math.sqrt(integer)
    return integer == int(root + 0.5) ** 2
Imagine integer is 9. math.sqrt(9) could be 3.0, but it could also be something like 2.99999 or 3.00001, so squaring the result right off isn't reliable. Knowing that int takes the floor value, increasing the float value by 0.5 first means we'll get the value we're looking for if we're in a range where float still has a fine enough resolution to represent numbers near the one for which we are looking.
If you're interested, I have a pure-math response to a similar question at math stackexchange, "Detecting perfect squares faster than by extracting square root".
My own implementation of isSquare(n) may not be the best, but I like it. Took me several months of study in math theory, digital computation and python programming, comparing myself to other contributors, etc., to really click with this method. I like its simplicity and efficiency though. I haven't seen better. Tell me what you think.
def isSquare(n):
    ## Trivial checks
    if type(n) != int:   ## integer
        return False
    if n < 0:            ## positivity
        return False
    if n == 0:           ## 0 pass
        return True

    ## Reduction by powers of 4 with bit-logic
    while n & 3 == 0:
        n = n >> 2

    ## Simple bit-logic test. All perfect squares, in binary,
    ## end in 001, when powers of 4 are factored out.
    if n & 7 != 1:
        return False
    if n == 1:
        return True      ## is power of 4, or even power of 2

    ## Simple modulo equivalency test
    c = n % 10
    if c in {3, 7}:
        return False     ## Not 1,4,5,6,9 in mod 10
    if n % 7 in {3, 5, 6}:
        return False     ## Not 1,2,4 mod 7
    if n % 9 in {2, 3, 5, 6, 8}:
        return False
    if n % 13 in {2, 5, 6, 7, 8, 11}:
        return False

    ## Other patterns
    if c == 5:                               ## if it ends in a 5
        if (n // 10) % 10 != 2:
            return False                     ## then it must end in 25
        if (n // 100) % 10 not in {0, 2, 6}:
            return False                     ## and in 025, 225, or 625
        if (n // 100) % 10 == 6:
            if (n // 1000) % 10 not in {0, 5}:
                return False                 ## that is, 0625 or 5625
    else:
        if (n // 10) % 4 != 0:
            return False                     ## (4k)*10 + (1,9)

    ## Babylonian Algorithm. Finding the integer square root.
    ## Root extraction.
    s = (len(str(n)) - 1) // 2
    x = (10 ** s) * 4
    A = {x, n}
    while x * x != n:
        x = (x + (n // x)) >> 1
        if x in A:
            return False
        A.add(x)
    return True
Pretty straightforward. First it checks that we have an integer, and a positive one at that. Otherwise there is no point. It lets 0 slip through as True (necessary or else the next block is an infinite loop).
The next block of code systematically removes powers of 4 in a very fast sub-algorithm using bit shift and bit logic operations. We ultimately are not finding the isSquare of our original n but of a k<n that has been scaled down by powers of 4, if possible. This reduces the size of the number we are working with and really speeds up the Babylonian method, but also makes other checks faster too.
The third block of code performs a simple Boolean bit-logic test. The least significant three digits, in binary, of any perfect square are 001. Always. Save for leading zeros resulting from powers of 4, anyway, which has already been accounted for. If it fails the test, you immediately know it isn't a square. If it passes, you can't be sure.
Also, if we end up with a 1 for a test value then the test number was originally a power of 4, including perhaps 1 itself.
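A quick empirical check of that bit-pattern claim (a sketch; it simply restates that odd squares are congruent to 1 mod 8):
def strip_fours(m):
    # Factor out powers of 4, as the reduction block above does.
    while m & 3 == 0:
        m >>= 2
    return m

print(all(strip_fours(k * k) & 7 == 1 for k in range(1, 100001)))   # True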
Like the third block, the fourth tests the ones-place value in decimal using simple modulus operator, and tends to catch values that slip through the previous test. Also a mod 7, mod 8, mod 9, and mod 13 test.
The fifth block of code checks for some of the well-known perfect square patterns. Numbers ending in 1 or 9 are preceded by a multiple of four. And numbers ending in 5 must end in 5625, 0625, 225, or 025. I had included others but realized they were redundant or never actually used.
Lastly, the sixth block of code resembles very much what the top answerer - Alex Martelli - answer is. Basically finds the square root using the ancient Babylonian algorithm, but restricting it to integer values while ignoring floating point. Done both for speed and extending the magnitudes of values that are testable. I used sets instead of lists because it takes far less time, I used bit shifts instead of division by two, and I smartly chose an initial start value much more efficiently.
By the way, I did test Alex Martelli's recommended test number, as well as a few numbers many orders magnitude larger, such as:
x=1000199838770766116385386300483414671297203029840113913153824086810909168246772838680374612768821282446322068401699727842499994541063844393713189701844134801239504543830737724442006577672181059194558045164589783791764790043104263404683317158624270845302200548606715007310112016456397357027095564872551184907513312382763025454118825703090010401842892088063527451562032322039937924274426211671442740679624285180817682659081248396873230975882215128049713559849427311798959652681930663843994067353808298002406164092996533923220683447265882968239141724624870704231013642255563984374257471112743917655991279898690480703935007493906644744151022265929975993911186879561257100479593516979735117799410600147341193819147290056586421994333004992422258618475766549646258761885662783430625 ** 2
for i in range(x, x+2):
    print(i, isSquare(i))
printed the following results:
1000399717477066534083185452789672211951514938424998708930175541558932213310056978758103599452364409903384901149641614494249195605016959576235097480592396214296565598519295693079257885246632306201885850365687426564365813280963724310434494316592041592681626416195491751015907716210235352495422858432792668507052756279908951163972960239286719854867504108121432187033786444937064356645218196398775923710931242852937602515835035177768967470757847368349565128635934683294155947532322786360581473152034468071184081729335560769488880138928479829695277968766082973795720937033019047838250608170693879209655321034310764422462828792636246742456408134706264621790736361118589122797268261542115823201538743148116654378511916000714911467547209475246784887830649309238110794938892491396597873160778553131774466638923135932135417900066903068192088883207721545109720968467560224268563643820599665232314256575428214983451466488658896488012211237139254674708538347237589290497713613898546363590044902791724541048198769085430459186735166233549186115282574626012296888817453914112423361525305960060329430234696000121420787598967383958525670258016851764034555105019265380321048686563527396844220047826436035333266263375049097675787975100014823583097518824871586828195368306649956481108708929669583308777347960115138098217676704862934389659753628861667169905594181756523762369645897154232744410732552956489694024357481100742138381514396851789639339362228442689184910464071202445106084939268067445115601375050153663645294106475257440167535462278022649865332161044187890625 True
1000399717477066534083185452789672211951514938424998708930175541558932213310056978758103599452364409903384901149641614494249195605016959576235097480592396214296565598519295693079257885246632306201885850365687426564365813280963724310434494316592041592681626416195491751015907716210235352495422858432792668507052756279908951163972960239286719854867504108121432187033786444937064356645218196398775923710931242852937602515835035177768967470757847368349565128635934683294155947532322786360581473152034468071184081729335560769488880138928479829695277968766082973795720937033019047838250608170693879209655321034310764422462828792636246742456408134706264621790736361118589122797268261542115823201538743148116654378511916000714911467547209475246784887830649309238110794938892491396597873160778553131774466638923135932135417900066903068192088883207721545109720968467560224268563643820599665232314256575428214983451466488658896488012211237139254674708538347237589290497713613898546363590044902791724541048198769085430459186735166233549186115282574626012296888817453914112423361525305960060329430234696000121420787598967383958525670258016851764034555105019265380321048686563527396844220047826436035333266263375049097675787975100014823583097518824871586828195368306649956481108708929669583308777347960115138098217676704862934389659753628861667169905594181756523762369645897154232744410732552956489694024357481100742138381514396851789639339362228442689184910464071202445106084939268067445115601375050153663645294106475257440167535462278022649865332161044187890626 False
And it did this in 0.33 seconds.
In my opinion, my algorithm works the same as Alex Martelli's, with all the benefits thereof, but has the added benefit of highly efficient simple-test rejections that save a lot of time, not to mention the reduction in size of the test numbers by powers of 4, which improves speed, efficiency, accuracy and the size of numbers that are testable. Probably especially true in non-Python implementations.
Roughly 99% of all integers are rejected as non-Square before Babylonian root extraction is even implemented, and in 2/3 the time it would take the Babylonian to reject the integer. And though these tests dont speed up the process that significantly, the reduction in all test numbers to an odd by dividing out all powers of 4 really accelerates the Babylonian test.
I did a time comparison test. I tested all integers from 1 to 10 Million in succession. Using just the Babylonian method by itself (with my specially tailored initial guess) it took my Surface 3 an average of 165 seconds (with 100% accuracy). Using just the logical tests in my algorithm (excluding the Babylonian), it took 127 seconds, it rejected 99% of all integers as non-Square without mistakenly rejecting any perfect squares. Of those integers that passed, only 3% were perfect Squares (a much higher density). Using the full algorithm above that employs both the logical tests and the Babylonian root extraction, we have 100% accuracy, and test completion in only 14 seconds. The first 100 Million integers takes roughly 2 minutes 45 seconds to test.
EDIT: I have been able to bring down the time further. I can now test the integers 0 to 100 Million in 1 minute 40 seconds. A lot of time is wasted checking the data type and the positivity. Eliminate the very first two checks and I cut the experiment down by a minute. One must assume the user is smart enough to know that negatives and floats are not perfect squares.
import math

def is_square(n):
    sqrt = math.sqrt(n)
    return (sqrt - int(sqrt)) == 0
A perfect square is a number that can be expressed as the product of two equal integers. math.sqrt(number) returns a float; int(math.sqrt(number)) casts the outcome to int.
If the square root is an integer, like 3, then math.sqrt(number) - int(math.sqrt(number)) will be 0 and the function returns True. If the square root is a non-integer like 3.2, the difference is nonzero and it returns False.
It fails for a large non-square such as 152415789666209426002111556165263283035677490.
My answer is:
def is_square(x):
    return x**.5 % 1 == 0
It basically does a square root, then modulo by 1 to strip the integer part and if the result is 0 return True otherwise return False. In this case x can be any large number, just not as large as the max float number that python can handle: 1.7976931348623157e+308
It is incorrect for a large non-square such as 152415789666209426002111556165263283035677490.
This can be solved using the decimal module to get arbitrary precision square roots and easy checks for "exactness":
import math
from decimal import localcontext, Context, Inexact

def is_perfect_square(x):
    # If you want to allow negative squares, then set x = abs(x) instead
    if x < 0:
        return False
    # Create localized, default context so flags and traps unset
    with localcontext(Context()) as ctx:
        # Set a precision sufficient to represent x exactly; `x or 1` avoids
        # math domain error for log10 when x is 0
        ctx.prec = math.ceil(math.log10(x or 1)) + 1  # Wrap ceil call in int() on Py2
        # Compute integer square root; don't even store result, just setting flags
        ctx.sqrt(x).to_integral_exact()
        # If previous line couldn't represent square root as exact int, sets Inexact flag
        return not ctx.flags[Inexact]
For demonstration with truly huge values:
# I just kept mashing the numpad for awhile :-)
>>> base = 100009991439393999999393939398348438492389402490289028439083249803434098349083490340934903498034098390834980349083490384903843908309390282930823940230932490340983098349032098324908324098339779438974879480379380439748093874970843479280329708324970832497804329783429874329873429870234987234978034297804329782349783249873249870234987034298703249780349783497832497823497823497803429780324
>>> sqr = base ** 2
>>> sqr ** 0.5 # Too large to use floating point math
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
OverflowError: int too large to convert to float
>>> is_perfect_square(sqr)
True
>>> is_perfect_square(sqr-1)
False
>>> is_perfect_square(sqr+1)
False
If you increase the size of the value being tested, this eventually gets rather slow (takes close to a second for a 200,000 bit square), but for more moderate numbers (say, 20,000 bits), it's still faster than a human would notice for individual values (~33 ms on my machine). But since speed wasn't your primary concern, this is a good way to do it with Python's standard libraries.
Of course, it would be much faster to use gmpy2 and just test gmpy2.mpz(x).is_square(), but if third party packages aren't your thing, the above works quite well.
I just posted a slight variation on some of the examples above on another thread (Finding perfect squares) and thought I'd include a slight variation of what I posted there here (using nsqrt as a temporary variable), in case it's of interest / use:
import math

def is_square(n):
    if not (isinstance(n, int) and (n >= 0)):
        return False
    else:
        nsqrt = math.sqrt(n)
        return nsqrt == math.trunc(nsqrt)
It is incorrect for a large non-square such as 152415789666209426002111556165263283035677490.
A variant of Alex Martelli's solution without a set
When x in seen is True:
In most cases, it is the last one added, e.g. 1022 produces the x's sequence 511, 256, 129, 68, 41, 32, 31, 31;
In some cases (i.e., for the predecessors of perfect squares), it is the second-to-last one added, e.g. 1023 produces 511, 256, 129, 68, 41, 32, 31, 32.
Hence, it suffices to stop as soon as the current x is greater than or equal to the previous one:
def is_square(n):
    assert n > 1
    previous = n
    x = n // 2
    while x * x != n:
        x = (x + (n // x)) // 2
        if x >= previous:
            return False
        previous = x
    return True

x = 12345678987654321234567 ** 2
assert not is_square(x-1)
assert is_square(x)
assert not is_square(x+1)
Equivalence with the original algorithm tested for 1 < n < 10**7. On the same interval, this slightly simpler variant is about 1.4 times faster.
This is my method:
def is_square(n) -> bool:
    return int(n**0.5)**2 == int(n)
Take square root of number. Convert to integer. Take the square. If the numbers are equal, then it is a perfect square otherwise not.
It is incorrect for a large square such as 152415789666209426002111556165263283035677489.
If the modulus (remainder) leftover from dividing by the square root is 0, then it is a perfect square.
import math

def is_square(num: int) -> bool:
    return num % math.sqrt(num) == 0
I checked this against a list of perfect squares going up to 1000.
It is possible to improve the Babylonian method by observing that the successive terms form a decreasing sequence if one starts above the square root of n.
def is_square(n):
    assert n > 1
    a = n
    b = (a + n // a) // 2
    while b < a:
        a = b
        b = (a + n // a) // 2
    return a * a == n
If it's a perfect square, its square root will be an integer and the fractional part will be 0. We can use the modulus operator to check whether the fractional part is 0; since that check can fail for some numbers, for safety we also check that squaring the integer square root gives back n, even when the fractional part is 0.
import math

def isSquare(n):
    root = math.sqrt(n)
    if root % 1 == 0:
        if int(root) * int(root) == n:
            return True
    return False

isSquare(4761)
You could binary-search for the rounded square root. Square the result to see if it matches the original value.
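A sketch of that idea (a hypothetical helper; pure integer arithmetic, so no floating point is involved):
def is_square(n):
    if n < 0:
        return False
    lo, hi = 0, n
    while lo <= hi:
        mid = (lo + hi) // 2
        sq = mid * mid
        if sq == n:
            return True
        if sq < n:
            lo = mid + 1
        else:
            hi = mid - 1
    return False

print(is_square(152415789666209426002111556165263283035677489))   # True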
You're probably better off with FogleBird's answer - though beware, as floating point arithmetic is approximate, which can throw that approach off. You could in principle get a false positive from a large integer which is one more than a perfect square, for instance, due to lost precision.
A simple way to do it (faster than the second one):
def is_square(n):
    return str(n**(1/2)).split(".")[1] == '0'
Another way:
def is_square(n):
    if n == 0:
        return True
    else:
        if n % 2 == 0:
            for i in range(2, n, 2):
                if i*i == n:
                    return True
        else:
            for i in range(1, n, 2):
                if i*i == n:
                    return True
        return False
This response doesn't pertain to your stated question, but to an implicit question I see in the code you posted, ie, "how to check if something is an integer?"
The first answer you'll generally get to that question is "Don't!" And it's true that in Python, typechecking is usually not the right thing to do.
For those rare exceptions, though, instead of looking for a decimal point in the string representation of the number, the thing to do is use the isinstance function:
>>> isinstance(5,int)
True
>>> isinstance(5.0,int)
False
Of course this applies to the variable rather than a value. If I wanted to determine whether the value was an integer, I'd do this:
>>> x=5.0
>>> round(x) == x
True
But as everyone else has covered in detail, there are floating-point issues to be considered in most non-toy examples of this kind of thing.
If you want to loop over a range and do something for every number that is NOT a perfect square, you could do something like this:
def non_squares(upper):
    next_square = 0
    diff = 1
    for i in range(0, upper):
        if i == next_square:
            next_square += diff
            diff += 2
            continue
        yield i
If you want to do something for every number that IS a perfect square, the generator is even easier:
(n * n for n in range(upper))
I think that this works and is very simple:
import math

def is_square(num):
    sqrt = math.sqrt(num)
    return sqrt == int(sqrt)
It is incorrect for a large non-square such as 152415789666209426002111556165263283035677490.
a = int(input('enter any number'))
flag = 0
for i in range(1, a):
    if a == i*i:
        print(a, 'is perfect square number')
        flag = 1
        break
if flag == 1:
    pass
else:
    print(a, 'is not perfect square number')
In Kotlin:
It's quite easy and it passed all test cases as well.
Thanks to https://www.quora.com/What-is-the-quickest-way-to-determine-if-a-number-is-a-perfect-square
fun isPerfectSquare(num: Int): Boolean {
    var result = false
    var sum = 0L
    var oddNumber = 1L
    while (sum < num) {
        sum = sum + oddNumber
        oddNumber = oddNumber + 2
    }
    result = sum == num.toLong()
    return result
}
def isPerfectSquare(self, num: int) -> bool:
    left, right = 0, num
    while left <= right:
        mid = (left + right) // 2
        if mid**2 < num:
            left = mid + 1
        elif mid**2 > num:
            right = mid - 1
        else:
            return True
    return False
Decide how long the number will be.
Take a delta, e.g. 0.000000000000.......000001.
See whether (sqrt(x))^2 - x is greater than / equal to / smaller than the delta, and decide based on that delta error.
import math

def is_square(n):
    sqrt = math.sqrt(n)
    return sqrt == int(sqrt)
It fails for a large non-square such as 152415789666209426002111556165263283035677490.
The idea is to run a loop from i = 1 to floor(sqrt(n)) then check if squaring it makes n.
bool isPerfectSquare(int n)
{
    for (int i = 1; i * i <= n; i++) {
        // If (i * i = n)
        if ((n % i == 0) && (n / i == i)) {
            return true;
        }
    }
    return false;
}

Time Complexity - letter combinations of a phone number [duplicate]

Most people with a degree in CS will certainly know what Big O stands for.
It helps us to measure how well an algorithm scales.
But I'm curious, how do you calculate or approximate the complexity of your algorithms?
I'll do my best to explain it here in simple terms, but be warned that this topic takes my students a couple of months to finally grasp. You can find more information in Chapter 2 of the Data Structures and Algorithms in Java book.
There is no mechanical procedure that can be used to get the BigOh.
As a "cookbook", to obtain the BigOh from a piece of code you first need to realize that you are creating a math formula to count how many steps of computations get executed given an input of some size.
The purpose is simple: to compare algorithms from a theoretical point of view, without the need to execute the code. The lesser the number of steps, the faster the algorithm.
For example, let's say you have this piece of code:
int sum(int* data, int N) {
    int result = 0;               // 1
    for (int i = 0; i < N; i++) { // 2
        result += data[i];        // 3
    }
    return result;                // 4
}
This function returns the sum of all the elements of the array, and we want to create a formula to count the computational complexity of that function:
Number_Of_Steps = f(N)
So we have f(N), a function to count the number of computational steps. The input of the function is the size of the structure to process. It means that this function is called such as:
Number_Of_Steps = f(data.length)
The parameter N takes the data.length value. Now we need the actual definition of the function f(). This is done from the source code, in which each interesting line is numbered from 1 to 4.
There are many ways to calculate the BigOh. From this point forward we are going to assume that every sentence that doesn't depend on the size of the input data takes a constant number C of computational steps.
We are going to add the individual number of steps of the function, and neither the local variable declaration nor the return statement depends on the size of the data array.
That means that lines 1 and 4 take C steps each, and the function is somewhat like this:
f(N) = C + ??? + C
The next part is to define the value of the for statement. Remember that we are counting the number of computational steps, meaning that the body of the for statement gets executed N times. That's the same as adding C, N times:
f(N) = C + (C + C + ... + C) + C = C + N * C + C
There is no mechanical rule to count how many times the body of the for gets executed, you need to count it by looking at what does the code do. To simplify the calculations, we are ignoring the variable initialization, condition and increment parts of the for statement.
To get the actual BigOh we need the Asymptotic analysis of the function. This is roughly done like this:
Take away all the constants C.
From f() get the polynomium in its standard form.
Divide the terms of the polynomium and sort them by the rate of growth.
Keep the one that grows bigger when N approaches infinity.
Our f() has two terms:
f(N) = 2 * C * N ^ 0 + 1 * C * N ^ 1
Taking away all the C constants and redundant parts:
f(N) = 1 + N ^ 1
Since the last term is the one which grows bigger when N approaches infinity (think of limits), this is the BigOh argument, and the sum() function has a BigOh of:
O(N)
There are a few tricks to solve some tricky ones: use summations whenever you can.
As an example, this code can be easily solved using summations:
for (i = 0; i < 2*n; i += 2) { // 1
    for (j=n; j > i; j--) {    // 2
        foo();                 // 3
    }
}
The first thing you need to ask is the order of execution of foo(). While it is usually O(1), you need to ask your professors about it. O(1) means (almost, mostly) constant C, independent of the size N.
The for statement on the sentence number one is tricky. While the index ends at 2 * N, the increment is done by two. That means that the first for gets executed only N steps, and we need to divide the count by two.
f(N) = Summation(i from 1 to 2 * N / 2)( ... ) =
= Summation(i from 1 to N)( ... )
The sentence number two is even trickier since it depends on the value of i. Take a look: the index i takes the values 0, 2, 4, 6, 8, ..., 2 * N, and the second for gets executed N times the first time, N - 2 the second, N - 4 the third... up to the N / 2 stage, at which the second for never gets executed.
As a formula, that means:
f(N) = Summation(i from 1 to N)( Summation(j = ???)( ) )
Again, we are counting the number of steps. And by definition, every summation should always start at one, and end at a number greater than or equal to one.
f(N) = Summation(i from 1 to N)( Summation(j = 1 to (N - (i - 1) * 2))( C ) )
(We are assuming that foo() is O(1) and takes C steps.)
We have a problem here: when i takes the value N / 2 + 1 upwards, the inner Summation ends at a negative number! That's impossible and wrong. We need to split the summation in two, being the pivotal point the moment i takes N / 2 + 1.
f(N) = Summation(i from 1 to N / 2)( Summation(j = 1 to (N - (i - 1) * 2)) * ( C ) ) + Summation(i from 1 to N / 2) * ( C )
Since the pivotal moment i > N / 2, the inner for won't get executed, and we are assuming a constant C execution complexity on its body.
Now the summations can be simplified using some identity rules:
Summation(w from 1 to N)( C ) = N * C
Summation(w from 1 to N)( A (+/-) B ) = Summation(w from 1 to N)( A ) (+/-) Summation(w from 1 to N)( B )
Summation(w from 1 to N)( w * C ) = C * Summation(w from 1 to N)( w ) (C is a constant, independent of w)
Summation(w from 1 to N)( w ) = (N * (N + 1)) / 2
Applying some algebra:
f(N) = Summation(i from 1 to N / 2)( (N - (i - 1) * 2) * ( C ) ) + (N / 2)( C )
f(N) = C * Summation(i from 1 to N / 2)( (N - (i - 1) * 2)) + (N / 2)( C )
f(N) = C * (Summation(i from 1 to N / 2)( N ) - Summation(i from 1 to N / 2)( (i - 1) * 2)) + (N / 2)( C )
f(N) = C * (( N ^ 2 / 2 ) - 2 * Summation(i from 1 to N / 2)( i - 1 )) + (N / 2)( C )
=> Summation(i from 1 to N / 2)( i - 1 ) = Summation(i from 1 to N / 2 - 1)( i )
f(N) = C * (( N ^ 2 / 2 ) - 2 * Summation(i from 1 to N / 2 - 1)( i )) + (N / 2)( C )
f(N) = C * (( N ^ 2 / 2 ) - 2 * ( (N / 2 - 1) * (N / 2 - 1 + 1) / 2) ) + (N / 2)( C )
=> (N / 2 - 1) * (N / 2 - 1 + 1) / 2 =
(N / 2 - 1) * (N / 2) / 2 =
((N ^ 2 / 4) - (N / 2)) / 2 =
(N ^ 2 / 8) - (N / 4)
f(N) = C * (( N ^ 2 / 2 ) - 2 * ( (N ^ 2 / 8) - (N / 4) )) + (N / 2)( C )
f(N) = C * (( N ^ 2 / 2 ) - ( (N ^ 2 / 4) - (N / 2) )) + (N / 2)( C )
f(N) = C * (( N ^ 2 / 2 ) - (N ^ 2 / 4) + (N / 2)) + (N / 2)( C )
f(N) = C * ( N ^ 2 / 4 ) + C * (N / 2) + C * (N / 2)
f(N) = C * ( N ^ 2 / 4 ) + 2 * C * (N / 2)
f(N) = C * ( N ^ 2 / 4 ) + C * N
f(N) = C * 1/4 * N ^ 2 + C * N
And the BigOh is:
O(N²)
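A quick empirical check of that result (a sketch in Python, not part of the original answer): count how many times foo() would run in the nested loops above and compare against N^2/4.
def count_calls(n):
    calls = 0
    for i in range(0, 2 * n, 2):
        for j in range(n, i, -1):
            calls += 1
    return calls

for n in (10, 100, 1000):
    print(n, count_calls(n), n * n // 4)   # the ratio approaches 1, i.e. O(N^2)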
Big O gives the upper bound for time complexity of an algorithm. It is usually used in conjunction with processing data sets (lists) but can be used elsewhere.
A few examples of how it's used in C code.
Say we have an array of n elements
int array[n];
If we wanted to access the first element of the array this would be O(1) since it doesn't matter how big the array is, it always takes the same constant time to get the first item.
x = array[0];
If we wanted to find a number in the list:
for(int i = 0; i < n; i++){
    if(array[i] == numToFind){ return i; }
}
This would be O(n) since at most we would have to look through the entire list to find our number. The Big-O is still O(n) even though we might find our number the first try and run through the loop once because Big-O describes the upper bound for an algorithm (omega is for lower bound and theta is for tight bound).
When we get to nested loops:
for(int i = 0; i < n; i++){
    for(int j = i; j < n; j++){
        array[j] += 2;
    }
}
This is O(n^2) since for each pass of the outer loop ( O(n) ) we have to go through the entire list again so the n's multiply leaving us with n squared.
This is barely scratching the surface but when you get to analyzing more complex algorithms complex math involving proofs comes into play. Hope this familiarizes you with the basics at least though.
While knowing how to figure out the Big O time for your particular problem is useful, knowing some general cases can go a long way in helping you make decisions in your algorithm.
Here are some of the most common cases, lifted from http://en.wikipedia.org/wiki/Big_O_notation#Orders_of_common_functions (a rough numeric comparison follows the list):
O(1) - Determining if a number is even or odd; using a constant-size lookup table or hash table
O(log n) - Finding an item in a sorted array with a binary search
O(n) - Finding an item in an unsorted list; adding two n-digit numbers
O(n^2) - Multiplying two n-digit numbers by a simple algorithm; adding two n×n matrices; bubble sort or insertion sort
O(n^3) - Multiplying two n×n matrices by simple algorithm
O(c^n) - Finding the (exact) solution to the traveling salesman problem using dynamic programming; determining if two logical statements are equivalent using brute force
O(n!) - Solving the traveling salesman problem via brute-force search
O(n^n) - Often used instead of O(n!) to derive simpler formulas for asymptotic complexity
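Not part of the list above, just a rough numeric illustration (in Python) of how quickly those classes diverge, even for a modest input size:
import math

n = 20
print("log n  :", round(math.log2(n), 1))    # ~4.3
print("n      :", n)                         # 20
print("n log n:", round(n * math.log2(n)))   # ~86
print("n^2    :", n ** 2)                    # 400
print("n^3    :", n ** 3)                    # 8000
print("2^n    :", 2 ** n)                    # 1048576
print("n!     :", math.factorial(n))         # 2432902008176640000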
Small reminder: the big O notation is used to denote asymptotic complexity (that is, when the size of the problem grows to infinity), and it hides a constant.
This means that between an algorithm in O(n) and one in O(n^2), the fastest is not always the first one (though there always exists a value of n such that for problems of size >n, the first algorithm is the fastest).
Note that the hidden constant very much depends on the implementation!
Also, in some cases, the runtime is not a deterministic function of the size n of the input. Take sorting using quick sort for example: the time needed to sort an array of n elements is not a constant but depends on the starting configuration of the array.
There are different time complexities:
Worst case (usually the simplest to figure out, though not always very meaningful)
Average case (usually much harder to figure out...)
...
A good introduction is An Introduction to the Analysis of Algorithms by R. Sedgewick and P. Flajolet.
As you say, premature optimisation is the root of all evil, and (if possible) profiling really should always be used when optimising code. It can even help you determine the complexity of your algorithms.
Seeing the answers here I think we can conclude that most of us do indeed approximate the order of the algorithm by looking at it and use common sense instead of calculating it with, for example, the master method as we were taught at university.
With that said I must add that even the professor encouraged us (later on) to actually think about it instead of just calculating it.
Also I would like to add how it is done for recursive functions:
suppose we have a function like (scheme code):
(define (fac n)
  (if (= n 0)
      1
      (* n (fac (- n 1)))))
which recursively calculates the factorial of the given number.
The first step is to try to determine the performance characteristic of the body of the function only. In this case, nothing special is done in the body, just a multiplication (or the return of the value 1).
So the performance for the body is: O(1) (constant).
Next try and determine this for the number of recursive calls. In this case we have n-1 recursive calls.
So the performance for the recursive calls is: O(n-1) (order is n, as we throw away the insignificant parts).
Then put those two together and you then have the performance for the whole recursive function:
1 * (n-1) = O(n)
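A small sketch of the same idea in Python (a hypothetical instrumented version, not from the answer): count the recursive calls and watch them grow linearly with n.
def fac(n, counter):
    counter[0] += 1                  # one O(1) body execution per call
    if n == 0:
        return 1
    return n * fac(n - 1, counter)

for n in (5, 50, 500):
    counter = [0]
    fac(n, counter)
    print(n, "calls:", counter[0])   # n + 1 calls, i.e. O(n)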
Peter, to answer your raised issues; the method I describe here actually handles this quite well. But keep in mind that this is still an approximation and not a full mathematically correct answer. The method described here is also one of the methods we were taught at university, and if I remember correctly was used for far more advanced algorithms than the factorial I used in this example.
Of course it all depends on how well you can estimate the running time of the body of the function and the number of recursive calls, but that is just as true for the other methods.
If your cost is a polynomial, just keep the highest-order term, without its multiplier. E.g.:
O((n/2 + 1)*(n/2)) = O(n^2/4 + n/2) = O(n^2/4) = O(n^2)
This doesn't work for infinite series, mind you. There is no single recipe for the general case, though for some common cases, the following inequalities apply:
O(log N) < O(N) < O(N log N) < O(N^2) < O(N^k) < O(e^N) < O(N!)
I think about it in terms of information. Any problem consists of learning a certain number of bits.
Your basic tool is the concept of decision points and their entropy. The entropy of a decision point is the average information it will give you. For example, if a program contains a decision point with two branches, its entropy is the sum of the probability of each branch times the log2 of the inverse probability of that branch. That's how much you learn by executing that decision.
For example, an if statement having two branches, both equally likely, has an entropy of 1/2 * log(2/1) + 1/2 * log(2/1) = 1/2 * 1 + 1/2 * 1 = 1. So its entropy is 1 bit.
Suppose you are searching a table of N items, like N=1024. That is a 10-bit problem because log(1024) = 10 bits. So if you can search it with IF statements that have equally likely outcomes, it should take 10 decisions.
That's what you get with binary search.
Suppose you are doing linear search. You look at the first element and ask if it's the one you want. The probabilities are 1/1024 that it is, and 1023/1024 that it isn't. The entropy of that decision is 1/1024*log(1024/1) + 1023/1024 * log(1024/1023) = 1/1024 * 10 + 1023/1024 * about 0 = about .01 bit. You've learned very little! The second decision isn't much better. That is why linear search is so slow. In fact it's exponential in the number of bits you need to learn.
Suppose you are doing indexing. Suppose the table is pre-sorted into a lot of bins, and you use some of all of the bits in the key to index directly to the table entry. If there are 1024 bins, the entropy is 1/1024 * log(1024) + 1/1024 * log(1024) + ... for all 1024 possible outcomes. This is 1/1024 * 10 times 1024 outcomes, or 10 bits of entropy for that one indexing operation. That is why indexing search is fast.
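A quick numeric check of those entropy figures (a sketch; N = 1024 is just the table size used above):
import math

def entropy(probabilities):
    # Average information, in bits, delivered by one decision.
    return sum(p * math.log2(1 / p) for p in probabilities if p > 0)

N = 1024
print(entropy([0.5, 0.5]))               # balanced IF: 1.0 bit
print(entropy([1 / N, (N - 1) / N]))     # one linear-search probe: ~0.011 bits
print(entropy([1 / N] * N))              # indexing into 1024 bins: 10.0 bits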
Now think about sorting. You have N items, and you have a list. For each item, you have to search for where the item goes in the list, and then add it to the list. So sorting takes roughly N times the number of steps of the underlying search.
So sorts based on binary decisions having roughly equally likely outcomes all take about O(N log N) steps. An O(N) sort algorithm is possible if it is based on indexing search.
I've found that nearly all algorithmic performance issues can be looked at in this way.
Let's start from the beginning.
First of all, accept the principle that certain simple operations on data can be done in O(1) time, that is, in time that is independent of the size of the input. These primitive operations in C consist of
Arithmetic operations (e.g. + or %).
Logical operations (e.g., &&).
Comparison operations (e.g., <=).
Structure accessing operations (e.g. array-indexing like A[i], or pointer following with the -> operator).
Simple assignment such as copying a value into a variable.
Calls to library functions (e.g., scanf, printf).
The justification for this principle requires a detailed study of the machine instructions (primitive steps) of a typical computer. Each of the described operations can be done with some small number of machine instructions; often only one or two instructions are needed.
As a consequence, several kinds of statements in C can be executed in O(1) time, that is, in some constant amount of time independent of input. These simple statements include
Assignment statements that do not involve function calls in their expressions.
Read statements.
Write statements that do not require function calls to evaluate arguments.
The jump statements break, continue, goto, and return expression, where expression does not contain a function call.
In C, many for-loops are formed by initializing an index variable to some value and incrementing that variable by 1 each time around the loop. The for-loop ends when the index reaches some limit. For instance, the for-loop
for (i = 0; i < n-1; i++)
{
    small = i;
    for (j = i+1; j < n; j++)
        if (A[j] < A[small])
            small = j;
    temp = A[small];
    A[small] = A[i];
    A[i] = temp;
}
uses index variable i. It increments i by 1 each time around the loop, and the iterations stop when i reaches n − 1.
However, for the moment, focus on the simple form of for-loop, where the difference between the final and initial values, divided by the amount by which the index variable is incremented tells us how many times we go around the loop. That count is exact, unless there are ways to exit the loop via a jump statement; it is an upper bound on the number of iterations in any case.
For instance, the for-loop iterates ((n − 1) − 0)/1 = n − 1 times, since 0 is the initial value of i, n − 1 is the highest value reached by i (i.e., when i reaches n−1, the loop stops and no iteration occurs with i = n−1), and 1 is added to i at each iteration of the loop.
In the simplest case, where the time spent in the loop body is the same for each iteration, we can multiply the big-oh upper bound for the body by the number of times around the loop. Strictly speaking, we must then add O(1) time to initialize the loop index and O(1) time for the first comparison of the loop index with the limit, because we test one more time than we go around the loop. However, unless it is possible to execute the loop zero times, the time to initialize the loop and test the limit once is a low-order term that can be dropped by the summation rule.
Now consider this example:
(1) for (j = 0; j < n; j++)
(2)     A[i][j] = 0;
We know that line (1) takes O(1) time. Clearly, we go around the loop n times, as we can determine by subtracting the lower limit from the upper limit found on line (1) and then adding 1. Since the body, line (2), takes O(1) time, we can neglect the time to increment j and the time to compare j with n, both of which are also O(1). Thus, the running time of lines (1) and (2) is the product of n and O(1), which is O(n).
Similarly, we can bound the running time of the outer loop consisting of lines (2) through (4), which is
(2) for (i = 0; i < n; i++)
(3)     for (j = 0; j < n; j++)
(4)         A[i][j] = 0;
We have already established that the loop of lines (3) and (4) takes O(n) time. Thus, we can neglect the O(1) time to increment i and to test whether i < n in each iteration, concluding that each iteration of the outer loop takes O(n) time. The initialization i = 0 of the outer loop and the (n + 1)st test of the condition i < n likewise take O(1) time and can be neglected. Finally, we observe that we go around the outer loop n times, taking O(n) time for each iteration, giving a total O(n^2) running time.
A more practical example.
If you want to estimate the order of your code empirically rather than by analyzing the code, you could stick in a series of increasing values of n and time your code. Plot your timings on a log scale. If the code is O(x^n), the values should fall on a line of slope n.
This has several advantages over just studying the code. For one thing, you can see whether you're in the range where the run time approaches its asymptotic order. Also, you may find that some code that you thought was order O(x) is really order O(x^2), for example, because of time spent in library calls.
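A rough sketch of that empirical approach in Python (the quadratic work function is just a hypothetical stand-in): time the code at a few sizes and estimate the exponent from the slope on a log-log scale.
import math, time

def work(n):                 # deliberately O(n^2)
    total = 0
    for i in range(n):
        for j in range(n):
            total += i * j
    return total

sizes = [200, 400, 800, 1600]
times = []
for n in sizes:
    start = time.perf_counter()
    work(n)
    times.append(time.perf_counter() - start)

for k in range(1, len(sizes)):
    slope = math.log(times[k] / times[k - 1]) / math.log(sizes[k] / sizes[k - 1])
    print(sizes[k - 1], "->", sizes[k], "estimated exponent ~", round(slope, 2))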
Basically the thing that crops up 90% of the time is just analyzing loops. Do you have single, double, triple nested loops? Then you have O(n), O(n^2), O(n^3) running time.
Very rarely (unless you are writing a platform with an extensive base library, like for instance the .NET BCL or C++'s STL) will you encounter anything that is more difficult than just looking at your loops (for statements, while, goto, etc...).
Less useful generally, I think, but for the sake of completeness there is also a Big Omega Ω, which defines a lower-bound on an algorithm's complexity, and a Big Theta Θ, which defines both an upper and lower bound.
Big O notation is useful because it's easy to work with and hides unnecessary complications and details (for some definition of unnecessary). One nice way of working out the complexity of divide and conquer algorithms is the tree method. Let's say you have a version of quicksort with the median procedure, so you split the array into perfectly balanced subarrays every time.
Now build a tree corresponding to all the arrays you work with. At the root you have the original array, the root has two children which are the subarrays. Repeat this until you have single element arrays at the bottom.
Since we can find the median in O(n) time and split the array in two parts in O(n) time, the work done at each node is O(k) where k is the size of the array. Each level of the tree contains (at most) the entire array so the work per level is O(n) (the sizes of the subarrays add up to n, and since we have O(k) per level we can add this up). There are only log(n) levels in the tree since each time we halve the input.
Therefore we can upper bound the amount of work by O(n*log(n)).
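A small sketch of the tree method's bookkeeping (assuming the perfectly balanced splits described above): summing the O(k) work over every node of the recursion tree comes out close to n*log2(n).
import math

def total_work(n):
    # Work at this node is n (find the median and split), plus the two halves.
    if n <= 1:
        return n
    half = n // 2
    return n + total_work(half) + total_work(n - half)

for n in (16, 1024, 32768):
    print(n, total_work(n), round(n * math.log2(n)))   # same order of growth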
However, Big O hides some details which we sometimes can't ignore. Consider computing the Fibonacci sequence with
a = 0;
b = 1;
for (i = 0; i < n; i++) {
    tmp = b;
    b = a + b;
    a = tmp;
}
and let's just assume that a and b are BigIntegers in Java or something that can handle arbitrarily large numbers. Most people would say this is an O(n) algorithm without flinching. The reasoning is that you have n iterations in the for loop and O(1) work inside the loop.
But Fibonacci numbers are large, the n-th Fibonacci number is exponential in n so just storing it will take on the order of n bytes. Performing addition with big integers will take O(n) amount of work. So the total amount of work done in this procedure is
1 + 2 + 3 + ... + n = n(n+1)/2 = O(n^2)
So this algorithm runs in quadratic time!
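A quick way to see the hidden cost (a Python sketch): the size of the numbers being added grows linearly with n, so each addition is O(n) work rather than O(1).
def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

for n in (1000, 2000, 4000):
    print(n, fib(n).bit_length())   # bit length (storage and addition cost) grows linearly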
Familiarity with the algorithms/data structures I use and/or quick glance analysis of iteration nesting. The difficulty is when you call a library function, possibly multiple times - you can often be unsure of whether you are calling the function unnecessarily at times or what implementation they are using. Maybe library functions should have a complexity/efficiency measure, whether that be Big O or some other metric, that is available in documentation or even IntelliSense.
Break down the algorithm into pieces you know the big O notation for, and combine through big O operators. That's the only way I know of.
For more information, check the Wikipedia page on the subject.
As to "how do you calculate" Big O, this is part of Computational complexity theory. For some (many) special cases you may be able to come with some simple heuristics (like multiplying loop counts for nested loops), esp. when all you want is any upper bound estimation, and you do not mind if it is too pessimistic - which I guess is probably what your question is about.
If you really want to answer your question for any algorithm the best you can do is to apply the theory. Besides of simplistic "worst case" analysis I have found Amortized analysis very useful in practice.
For the 1st case, the inner loop is executed n-i times, so the total number of executions is the sum for i going from 0 to n-1 (because lower than, not lower than or equal) of the n-i. You finally get n*(n + 1) / 2, so O(n²/2) = O(n²).
For the 2nd loop, i is between 0 and n included for the outer loop; then the inner loop is executed when j is strictly greater than n, which is then impossible.
I would like to explain Big-O from a slightly different angle.
Big-O is just for comparing the complexity of programs, which means how fast they grow as the inputs increase, and not the exact time spent doing the action.
IMHO, in big-O formulas you'd better not use more complex equations (you might just stick to the common ones). You can still use other, more precise formulas (like 3^n, n^3, ...), but more than that can sometimes be misleading! So it's better to keep it as simple as possible.
I would like to emphasize once again that here we don't want to get an exact formula for our algorithm. We only want to show how it grows when the inputs are growing and to compare it with other algorithms in that sense. Otherwise you'd be better off using different methods, like benchmarking.
In addition to using the master method (or one of its specializations), I test my algorithms experimentally. This can't prove that any particular complexity class is achieved, but it can provide reassurance that the mathematical analysis is appropriate. To help with this reassurance, I use code coverage tools in conjunction with my experiments, to ensure that I'm exercising all the cases.
As a very simple example say you wanted to do a sanity check on the speed of the .NET framework's list sort. You could write something like the following, then analyze the results in Excel to make sure they did not exceed an n*log(n) curve.
In this example I measure the number of comparisons, but it's also prudent to examine the actual time required for each sample size. However then you must be even more careful that you are just measuring the algorithm and not including artifacts from your test infrastructure.
int nCmp = 0;
System.Random rnd = new System.Random();

// measure the number of comparisons needed to sort a list of n integers
void DoTest(int n)
{
    List<int> lst = new List<int>(n);
    for( int i=0; i<n; i++ )
        lst.Add( rnd.Next(0,1000) );   // Add, not lst[i] = ..., since the list starts empty

    // as we sort, keep track of the number of comparisons performed!
    nCmp = 0;
    lst.Sort( delegate( int a, int b ) { nCmp++; return (a<b) ? -1 : ((a>b) ? 1 : 0); } );

    System.Console.WriteLine( "{0},{1}", n, nCmp );
}
// Perform measurement for a variety of sample sizes.
// It would be prudent to check multiple random samples of each size, but this is OK for a quick sanity check
for( int n = 0; n<1000; n++ )
    DoTest(n);
Don't forget to also allow for space complexity, which can also be a cause for concern if memory resources are limited. For example, you may hear someone ask for a constant-space algorithm, which is basically a way of saying that the amount of space taken by the algorithm doesn't depend on any factors inside the code.
Sometimes the complexity comes from how many times something is called, how often a loop is executed, how often memory is allocated, and so on - that is another part of answering this question.
Lastly, big O can be used for worst case, best case, and amortized cases; generally it is the worst case that is used to describe how bad an algorithm may be.
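For example (a minimal sketch of my own, not from the answer above): both functions below compute the same sum, but the first needs O(n) extra space while the second runs in constant space:
def sum_with_list(n):
    values = list(range(1, n + 1))   # allocates O(n) extra memory
    return sum(values)

def sum_constant_space(n):
    total = 0                        # O(1) extra memory, independent of n
    for value in range(1, n + 1):
        total += value
    return total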
First of all, the accepted answer is trying to explain nice fancy stuff, but I think intentionally complicating Big-Oh is not the solution that programmers (or at least people like me) are searching for.
Big Oh (in short)
function f(text) {
  var n = text.length;
  for (var i = 0; i < n; i++) {
    f(text.slice(0, n-1))
  }
  // ... other JS logic here, which we can ignore ...
}
Big Oh of the above is f(n) = O(n!), where n represents the number of items in the input set,
and f represents the operation done per item.
Big-Oh notation is the asymptotic upper-bound of the complexity of an algorithm.
In programming: The assumed worst-case time taken,
or assumed maximum repeat count of logic, for size of the input.
Calculation
Keep in mind (from above meaning) that; We just need worst-case time and/or maximum repeat count affected by N (size of input),
Then take another look at (accepted answer's) example:
for (i = 0; i < 2*n; i += 2) { // line 123
for (j=n; j > i; j--) { // line 124
foo(); // line 125
}
}
Begin with this search-pattern:
Find the first line where N causes repeat behavior,
or causes an increase in the logic executed;
constant or not, ignore anything before that line.
Seems line hundred-twenty-three is what we are searching ;-)
On first sight, line seems to have 2*n max-looping.
But looking again, we see i += 2 (and that half is skipped).
So, max repeat is simply n, write it down, like f(n) = O( n but don't close parenthesis yet.
Repeat search till method's end, and find next line matching our search-pattern, here that's line 124
Which is tricky, because strange condition, and reverse looping.
But after remembering that we just need to consider maximum repeat count (or worst-case time taken).
It's as easy as saying "Reverse-Loop j starts with j=n, am I right? yes, n seems to be maximum possible repeat count", so:
Add n to previous write down's end,
but like "( n " instead of "+ n" (as this is inside previous loop),
and close parenthesis only if we find something outside of previous loop.
Search Done! why? because line 125 (or any other line after) does not match our search-pattern.
We can now close any parenthesis (left-open in our write down), resulting in below:
f(n) = O( n( n ) )
Try to further shorten "n( n )" part, like:
n( n ) = n * n
       = n²
Finally, just wrap it with Big Oh notation, like O(n²), or O(n^2) without formatting.
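If you want to sanity-check that result, a tiny sketch of my own can count the calls to foo() in the example loop and show that the count grows on the order of n²:
def count_foo_calls(n):
    calls = 0
    i = 0
    while i < 2 * n:       # mirrors line 123: i += 2, so about n iterations
        j = n
        while j > i:       # mirrors line 124: at most n iterations
            calls += 1     # stands in for foo() on line 125
            j -= 1
        i += 2
    return calls

for n in (10, 100, 1000):
    print(n, count_foo_calls(n))   # grows on the order of n^2 (roughly n^2/4 here)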
What often gets overlooked is the expected behavior of your algorithms. It doesn't change the Big-O of your algorithm, but it does relate to the statement "premature optimization. . .."
Expected behavior of your algorithm is -- very dumbed down -- how fast you can expect your algorithm to work on data you're most likely to see.
For instance, if you're searching for a value in a list, it's O(n), but if you know that most lists you see have your value up front, typical behavior of your algorithm is faster.
To really nail it down, you need to be able to describe the probability distribution of your "input space" (if you need to sort a list, how often is that list already going to be sorted? how often is it totally reversed? how often is it mostly sorted?) It's not always feasible that you know that, but sometimes you do.
great question!
Disclaimer: this answer contains false statements; see the comments below.
If you're using Big O, you're talking about the worst case (more on what that means later). Additionally, there is capital theta for average case and big omega for best case.
Check out this site for a lovely formal definition of Big O: https://xlinux.nist.gov/dads/HTML/bigOnotation.html
f(n) = O(g(n)) means there are positive constants c and k, such that 0 ≤ f(n) ≤ cg(n) for all n ≥ k. The values of c and k must be fixed for the function f and must not depend on n.
Ok, so now what do we mean by "best-case" and "worst-case" complexities?
This is probably most clearly illustrated through examples. For example if we are using linear search to find a number in a sorted array then the worst case is when we decide to search for the last element of the array as this would take as many steps as there are items in the array. The best case would be when we search for the first element since we would be done after the first check.
The point of all these adjective-case complexities is that we're looking for a way to graph the amount of time a hypothetical program runs to completion in terms of the size of particular variables. However, for many algorithms you can argue that there is not a single time for a particular size of input. Notice that this contradicts the fundamental requirement of a function: any input should have no more than one output. So we come up with multiple functions to describe an algorithm's complexity. Now, even though searching an array of size n may take varying amounts of time depending on what you're looking for in the array, and scales with n, we can still create an informative description of the algorithm using best-case, average-case, and worst-case classes.
Sorry this is so poorly written and lacks much technical information. But hopefully it'll make time complexity classes easier to think about. Once you become comfortable with these it becomes a simple matter of parsing through your program and looking for things like for-loops that depend on array sizes and reasoning based on your data structures what kind of input would result in trivial cases and what input would result in worst-cases.
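As a concrete illustration (my sketch, not part of the original answer), here is a linear search instrumented to count comparisons; the best and worst cases described above correspond to the target being the first or the last element:
def linear_search(items, target):
    comparisons = 0
    for index, value in enumerate(items):
        comparisons += 1
        if value == target:
            return index, comparisons
    return -1, comparisons

data = list(range(1, 101))      # a sorted array of 100 items
print(linear_search(data, 1))   # best case: 1 comparison
print(linear_search(data, 100)) # worst case: 100 comparisons (one per element)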
I don't know how to solve this programmatically, but the first thing people do is sample the algorithm for certain patterns in the number of operations done, say 4n^2 + 2n + 1. We have 2 rules:
If we have a sum of terms, the term with the largest growth rate is kept, with other terms omitted.
If we have a product of several factors constant factors are omitted.
If we simplify f(x), where f(x) is the formula for the number of operations done (4n^2 + 2n + 1 above), we obtain the big-O value, O(n^2) in this case. But this would have to account for Lagrange interpolation in the program, which may be hard to implement. And what if the real big-O value was O(2^n), or we had something like O(x^n)? So this algorithm probably wouldn't be programmable. But if someone proves me wrong, give me the code . . . .
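To see why those two rules are safe, here is a quick numeric sketch of my own: for f(n) = 4n^2 + 2n + 1 the ratio f(n)/n^2 settles toward the constant 4, so only the n^2 term matters asymptotically:
def f(n):
    return 4 * n**2 + 2 * n + 1

for n in (10, 100, 10000, 1000000):
    print(n, f(n) / n**2)   # tends to the constant 4, so f(n) = O(n^2)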
For code A, the outer loop executes n+1 times; the extra '1' is the final check of whether i still meets the condition. The inner loop runs n times, then n-2 times, and so on. Thus 0+2+...+(n-2)+n = (0+n)(n/2+1)/2 = O(n²).
For code B, although the inner loop never steps in to execute foo(), its condition is still checked once for each of the n iterations of the outer loop, so the whole thing is O(n).

How do I find all 32 bit binary numbers that have exactly six 1s and the rest 0s

I could do this in brute force, but I was hoping there was clever coding, or perhaps an existing function, or something I am not realising...
So some examples of numbers I want:
00000000001111110000
11111100000000000000
01010101010100000000
10101010101000000000
00100100100100100100
The full permutation. Except with results that have ONLY six 1's. Not more. Not less. 64 or 32 bits would be ideal. 16 bits if that provides an answer.
I think what you need here is to use the itertools module.
BAD SOLUTION
But you need to be careful: for instance, using something like permutations would only work for very small inputs, i.e.:
Something like the below would give you a binary representation:
>>> ["".join(v) for v in set(itertools.permutations(["1"]*2+["0"]*3))]
['11000', '01001', '00101', '00011', '10010', '01100', '01010', '10001', '00110', '10100']
then just get the decimal representation of those numbers:
>>> [int("".join(v), 2) for v in set(itertools.permutations(["1"]*2+["0"]*3))]
[24, 9, 5, 3, 18, 12, 10, 17, 6, 20]
if you wanted 32 bits with 6 ones and 26 zeros, you'd use:
>>> [int("".join(v), 2) for v in set(itertools.permutations(["1"]*6+["0"]*26))]
but this computation would take a supercomputer to deal with (32! = 263130836933693530167218012160000000)
DECENT SOLUTION
So a more clever way to do it is using combinations, maybe something like this:
import itertools
num_bits = 32
num_ones = 6
lst = [
    f"{sum([2**vv for vv in v]):b}".zfill(num_bits)
    for v in list(itertools.combinations(range(num_bits), num_ones))
]
print(len(lst))
this would tell us there are 906192 numbers with 6 ones in the whole spectrum of 32-bit numbers.
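As a quick cross-check (my addition, not part of the original answer), the binomial coefficient gives the same count directly; math.comb requires Python 3.8 or later:
import math
print(math.comb(32, 6))   # 906192, matching len(lst) above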
CREDITS:
Credits for this answer go to @Mark Dickinson, who pointed out that using permutations was unfeasible and suggested the use of combinations.
Well, I am not a Python coder so I can not post valid Python code for you. Instead I can do a C++ one...
If you look at your problem, you set 6 bits and many zeros... so I would approach this with 6 nested for loops computing all the possible positions of the 1s and setting those bits...
Something like:
for (i0= 0;i0<32-5;i0++)
for (i1=i0+1;i1<32-4;i1++)
for (i2=i1+1;i2<32-3;i2++)
for (i3=i2+1;i3<32-2;i3++)
for (i4=i3+1;i4<32-1;i4++)
for (i5=i4+1;i5<32-0;i5++)
// here i0,...,i5 marks the set bits positions
So the O(2^32) brute force becomes less than ~O(32·31·30·29·28·27/6!) = 906192 iterations, and you cannot go faster than that, as going faster would mean you miss valid solutions...
I assume you want to print the number so for speed up you can compute the number as a binary number string from the start to avoid slow conversion between string and number...
The nested for loops can be encoded as increment operation of an array (similar to bignum arithmetics)
When I put all together I got this C++ code:
int generate()
{
    const int n1=6;     // number of set bits
    const int n=32;     // number of bits
    char x[n+2];        // output number string
    int i[n1],j,cnt;    // nested for loops iterator variables and found solutions count
    for (j=0;j<n;j++) x[j]='0'; x[j]='b'; j++; x[j]=0;  // x = 0
    for (j=0;j<n1;j++){ i[j]=j; x[i[j]]='1'; }          // first solution
    for (cnt=0;;)
    {
        // Form1->mm_log->Lines->Add(x);  // here x is the valid answer to print
        cnt++;
        for (j=n1-1;j>=0;j--)   // this emulates n1 nested for loops
        {
            x[i[j]]='0'; i[j]++;
            if (i[j]<n-n1+j+1){ x[i[j]]='1'; break; }
        }
        if (j<0) break;
        for (j++;j<n1;j++){ i[j]=i[j-1]+1; x[i[j]]='1'; }
    }
    return cnt; // found valid answers
}
When I use this with n1=6,n=32 I got this output (without printing the numbers):
cnt = 906192
and it finished in 4.246 ms on an AMD A8-5500 3.2GHz (Win7 x64, 32-bit app, no threads), which is fast enough for me...
Beware, once you start outputting the numbers somewhere the speed will drop drastically. Especially if you output to a console or whatever... it might be better to buffer the output somehow, like outputting 1024 string numbers at once, etc. But as I mentioned before, I am no Python coder, so it might already be handled by the environment...
On top of all this, once you play with the variables n1, n you can do the same for zeros instead of ones and use the faster approach (if there are fewer zeros than ones, use the nested for loops to mark zeros instead of ones).
If the solutions are wanted as numbers (not strings), then it's possible to rewrite this so that i[] (or i0,...,i5) holds the bitmask instead of the bit positions... instead of inc/dec you just shift left/right... and there is no need for the x array anymore, as the number would be x = i0|...|i5 ...
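Since the question asks about Python, here is a rough, hypothetical translation of the nested-loop idea above (my sketch, not the answer's); the six indices are simply the positions of the set bits:
def six_bit_numbers(n=32):
    count = 0
    results = []
    for i0 in range(0, n - 5):
        for i1 in range(i0 + 1, n - 4):
            for i2 in range(i1 + 1, n - 3):
                for i3 in range(i2 + 1, n - 2):
                    for i4 in range(i3 + 1, n - 1):
                        for i5 in range(i4 + 1, n):
                            results.append((1 << i0) | (1 << i1) | (1 << i2) |
                                           (1 << i3) | (1 << i4) | (1 << i5))
                            count += 1
    return count, results

count, numbers = six_bit_numbers()
print(count)   # 906192, the same count the C++ version reports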
You could create a counter array for positions of 1s in the number and assemble it by shifting the bits in their respective positions. I created an example below. It runs pretty fast (less than a second for 32 bits on my laptop):
bitCount = 32
oneCount = 6
maxBit = 1<<(bitCount-1)
ones = [1<<b for b in reversed(range(oneCount))] # start with bits on low end
ones[0] >>= 1 # shift back 1st one because it will be incremented at start of loop
index = 0
result = []
while index < len(ones):
    ones[index] <<= 1 # shift one at current position
    if index == 0:
        number = sum(ones) # build output number
        result.append(number)
    if ones[index] == maxBit:
        index += 1 # go to next position when bit reaches max
    elif index > 0:
        index -= 1 # return to previous position
        ones[index] = ones[index+1] # and prepare it to move up (relative to next)
64 bits takes about a minute, roughly proportional to the number of values that are output. O(n)
The same approach can be expressed more concisely in a recursive generator function which will allow more efficient use of the bit patterns:
def genOneBits(bitcount=32,onecount=6):
    for bitPos in range(onecount-1,bitcount):
        value = 1<<bitPos
        if onecount == 1: yield value; continue
        for otherBits in genOneBits(bitPos,onecount-1):
            yield value + otherBits
result = [ n for n in genOneBits(32,6) ]
This is not faster when you get all the numbers but it allows partial access to the list without going through all values.
If you need direct access to the Nth bit pattern (e.g. to get a random one-bits pattern), you can use the following function. It works like indexing a list but without having to generate the list of patterns.
def numOneBits(bitcount=32,onecount=6):
    def factorial(X): return 1 if X < 2 else X * factorial(X-1)
    return factorial(bitcount)//factorial(onecount)//factorial(bitcount-onecount)

def nthOneBits(N,bitcount=32,onecount=6):
    if onecount == 1: return 1<<N
    bitPos = 0
    while bitPos<=bitcount-onecount:
        group = numOneBits(bitcount-bitPos-1,onecount-1)
        if N < group: break
        N -= group
        bitPos += 1
    if bitPos>bitcount-onecount: return None
    result = 1<<bitPos
    result |= nthOneBits(N,bitcount-bitPos-1,onecount-1)<<(bitPos+1)
    return result
# bit pattern at position 1000:
nthOneBits(1000) # --> 10485799 (00000000101000000000000000100111)
This allows you to get the bit patterns on very large integers that would be impossible to generate completely:
nthOneBits(10000, bitcount=256, onecount=9)
# 77371252457588066994880639
# 100000000000000000000000000000000001000000000000000000000000000000000000000000001111111
It is worth noting that the pattern order does not follow the numerical order of the corresponding numbers.
Although nthOneBits() can produce any pattern instantly, it is much slower than the other functions when mass producing patterns. If you need to manipulate them sequentially, you should go for the generator function instead of looping on nthOneBits().
Also, it should be fairly easy to tweak the generator to have it start at a specific pattern so you could get the best of both approaches.
Finally, it may be useful to obtain the next bit pattern given a known pattern. This is what the following function does:
def nextOneBits(N=0,bitcount=32,onecount=6):
    if N == 0: return (1<<onecount)-1
    bitPositions = []
    for pos in range(bitcount):
        bit = N%2
        N //= 2
        if bit==1: bitPositions.insert(0,pos)
    index = 0
    result = None
    while index < onecount:
        bitPositions[index] += 1
        if bitPositions[index] == bitcount:
            index += 1
            continue
        if index == 0:
            result = sum( 1<<bp for bp in bitPositions )
            break
        if index > 0:
            index -= 1
            bitPositions[index] = bitPositions[index+1]
    return result
nthOneBits(12) #--> 131103 00000000000000100000000000011111
nextOneBits(131103) #--> 262175 00000000000001000000000000011111 5.7ns
nthOneBits(13) #--> 262175 00000000000001000000000000011111 49.2ns
Like nthOneBits(), this one does not need any setup time. It could be used in combination with nthOneBits() to get subsequent patterns after getting an initial one at a given position. nextOneBits() is much faster than nthOneBits(i+1) but is still slower than the generator function.
For very large integers, using nthOneBits() and nextOneBits() may be the only practical options.
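For example, a usage sketch of my own that combines the two helpers defined above (with the default 32-bit patterns and six ones): jump to an arbitrary position with nthOneBits() and then walk forward with nextOneBits():
pattern = nthOneBits(1000)        # jump straight to the pattern at position 1000
sequence = [pattern]
for _ in range(5):                # then walk forward, pattern by pattern
    pattern = nextOneBits(pattern)
    sequence.append(pattern)
print([f"{p:032b}" for p in sequence])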
You are dealing with permutations of multisets. There are many ways to achieve this and, as @BPL points out, doing this efficiently is non-trivial. There are many great methods mentioned here: permutations with unique values. The cleanest (though not necessarily the most efficient) is to use multiset_permutations from the sympy module.
import time
from sympy.utilities.iterables import multiset_permutations
t = time.process_time()
## Credit to @BPL for the general setup
multiPerms = ["".join(v) for v in multiset_permutations(["1"]*6+["0"]*26)]
elapsed_time = time.process_time() - t
print(elapsed_time)
On my machine, the above computes in just over 8 seconds. It generates just under a million results as well:
len(multiPerms)
906192

Counting the number of set bits in a number

The problem statement is:
Write an efficient program to count number of 1s in binary representation of an integer.
I found a post on this problem here which outlines multiple solutions that run in log(n) time, including Brian Kernighan's algorithm and the gcc __builtin_popcount() method.
One solution that wasn't mentioned was the python method: bin(n).count("1")
which also achieves the same effect. Does this method also run in log n time?
You are converting the integer to a string, which means it'll have to produce N '0' and '1' characters. You then use str.count() which must visit every character in the string to count the '1' characters.
All in all you have an O(N) algorithm, with a relatively high constant cost.
Note that this is the same complexity as the code you linked to; the integer n has log(n) bits, but the algorithm still has to make N = log(n) steps to calculate the number of bits. The bin(n).count('1') algorithm is thus equivalent, but slow as there is a high cost to produce the string in the first place.
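For comparison, a hedged sketch of my own (not from this answer) of the Brian Kernighan trick mentioned in the question: it clears the lowest set bit on each pass, so it loops once per set bit, which is at most the number of bits in n:
def popcount_kernighan(n):
    count = 0
    while n:
        n &= n - 1     # clears the lowest set bit
        count += 1
    return count

n = 0b101101
assert popcount_kernighan(n) == bin(n).count("1") == 4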
At the cost of a table, you could move to processing integers per byte:
table = [0]
while len(table) < 256:
    table += [t + 1 for t in table]
length = sum(map(table.__getitem__, n.to_bytes(n.bit_length() // 8 + 1, 'little')))
However, because Python needs to produce a series of new objects (a bytes object and several integers) this method never quite is fast enough to beat the bin(n).count('1') method:
>>> from random import choice
>>> import timeit
>>> table = [0]
>>> while len(table) < 256:
...     table += [t + 1 for t in table]
...
>>> def perbyte(n): return sum(map(table.__getitem__, n.to_bytes(n.bit_length() // 8 + 1, 'little')))
...
>>> def strcount(n): return bin(n).count('1')
...
>>> n = int(''.join([choice('01') for _ in range(2 ** 16)]))
>>> for f in (strcount, perbyte):
...     print(f.__name__, timeit.timeit('f(n)', 'from __main__ import f, n', number=1000))
...
strcount 1.11822146497434
perbyte 1.4401431040023454
No matter the bit-length of the test number, perbyte is always a percentage slower.
Let's say you are trying to count the number of set bits of n. On typical Python implementations, bin will compute the binary representation in O(log n) time, and count will go through the string, resulting in an overall O(log n) complexity.
However, note that usually, the input parameter of algorithms is the "size" of the input. When you work with integers, this corresponds to their logarithm. That's why the current algorithm is said to have a linear complexity (the variable is m = log n, and the complexity O(m)).

Times-two faster than bit-shift, for Python 3.x integers?

I was looking at the source of sorted_containers and was surprised to see this line:
self._load, self._twice, self._half = load, load * 2, load >> 1
Here load is an integer. Why use bit shift in one place, and multiplication in another? It seems reasonable that bit shifting may be faster than integral division by 2, but why not replace the multiplication by a shift as well? I benchmarked the following cases:
(times, divide)
(shift, shift)
(times, shift)
(shift, divide)
and found that #3 is consistently faster than other alternatives:
# self._load, self._twice, self._half = load, load * 2, load >> 1
import random
import timeit
import pandas as pd

x = random.randint(10 ** 3, 10 ** 6)

def test_naive():
    a, b, c = x, 2 * x, x // 2

def test_shift():
    a, b, c = x, x << 1, x >> 1

def test_mixed():
    a, b, c = x, x * 2, x >> 1

def test_mixed_swapped():
    a, b, c = x, x << 1, x // 2

def observe(k):
    print(k)
    return {
        'naive': timeit.timeit(test_naive),
        'shift': timeit.timeit(test_shift),
        'mixed': timeit.timeit(test_mixed),
        'mixed_swapped': timeit.timeit(test_mixed_swapped),
    }

def get_observations():
    return pd.DataFrame([observe(k) for k in range(100)])
The question:
Is my test valid? If so, why is (multiply, shift) faster than (shift, shift)?
I run Python 3.5 on Ubuntu 14.04.
Edit
Above is the original statement of the question. Dan Getz provides an excellent explanation in his answer.
For the sake of completeness, the question originally included sample illustrations for larger x, when the multiplication optimizations do not apply (the images are not reproduced here); a rough sketch of that comparison follows below.
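This is a sketch of my own approximating that comparison (not the original illustrations): time the four operations on an x of several hundred bits, where the single-digit fast path described in the answer below cannot apply, so x * 2 and x << 1 should behave much more alike:
import random
import timeit

x = random.getrandbits(512)   # far larger than a single 30-bit internal digit

for stmt in ('x * 2', 'x << 1', 'x // 2', 'x >> 1'):
    t = timeit.timeit(stmt, globals={'x': x}, number=1000000)
    print(stmt, round(t, 3))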
This seems to be because multiplication of small numbers is optimized in CPython 3.5, in a way that left shifts by small numbers are not. Positive left shifts always create a larger integer object to store the result, as part of the calculation, while for multiplications of the sort you used in your test, a special optimization avoids this and creates an integer object of the correct size. This can be seen in the source code of Python's integer implementation.
Because integers in Python are arbitrary-precision, they are stored as arrays of integer "digits", with a limit on the number of bits per integer digit. So in the general case, operations involving integers are not single operations, but instead need to handle the case of multiple "digits". In pyport.h, this bit limit is defined as 30 bits on 64-bit platforms, or 15 bits otherwise. (I'll just call this 30 from here on to keep the explanation simple. But note that if you were using Python compiled for 32-bit, your benchmark's result would depend on whether x were less than 32,768 or not.)
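If you want to check what your own build uses, sys.int_info (a standard-library attribute) reports the internal digit size; for example:
import sys
print(sys.int_info)
# e.g. sys.int_info(bits_per_digit=30, sizeof_digit=4) on a typical 64-bit build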
When an operation's inputs and outputs stay within this 30-bit limit, the operation can be handled in an optimized way instead of the general way. The beginning of the integer multiplication implementation is as follows:
static PyObject *
long_mul(PyLongObject *a, PyLongObject *b)
{
    PyLongObject *z;
    CHECK_BINOP(a, b);
    /* fast path for single-digit multiplication */
    if (Py_ABS(Py_SIZE(a)) <= 1 && Py_ABS(Py_SIZE(b)) <= 1) {
        stwodigits v = (stwodigits)(MEDIUM_VALUE(a)) * MEDIUM_VALUE(b);
#ifdef HAVE_LONG_LONG
        return PyLong_FromLongLong((PY_LONG_LONG)v);
#else
        /* if we don't have long long then we're almost certainly
           using 15-bit digits, so v will fit in a long. In the
           unlikely event that we're using 30-bit digits on a platform
           without long long, a large v will just cause us to fall
           through to the general multiplication code below. */
        if (v >= LONG_MIN && v <= LONG_MAX)
            return PyLong_FromLong((long)v);
#endif
    }
So when multiplying two integers where each fits in a 30-bit digit, this is done as a direct multiplication by the CPython interpreter, instead of working with the integers as arrays. (MEDIUM_VALUE() called on a positive integer object simply gets its first 30-bit digit.) If the result fits in a single 30-bit digit, PyLong_FromLongLong() will notice this in a relatively small number of operations, and create a single-digit integer object to store it.
In contrast, left shifts are not optimized this way, and every left shift deals with the integer being shifted as an array. In particular, if you look at the source code for long_lshift(), in the case of a small but positive left shift, a 2-digit integer object is always created, if only to have its length truncated to 1 later: (my comments in /*** ***/)
static PyObject *
long_lshift(PyObject *v, PyObject *w)
{
    /*** ... ***/
    wordshift = shiftby / PyLong_SHIFT;             /*** zero for small w ***/
    remshift = shiftby - wordshift * PyLong_SHIFT;  /*** w for small w ***/
    oldsize = Py_ABS(Py_SIZE(a));                   /*** 1 for small v > 0 ***/
    newsize = oldsize + wordshift;
    if (remshift)
        ++newsize;  /*** here newsize becomes at least 2 for w > 0, v > 0 ***/
    z = _PyLong_New(newsize);
    /*** ... ***/
}
Integer division
You didn't ask about the worse performance of integer floor division compared to right shifts, because that fit your (and my) expectations. But dividing a small positive number by another small positive number is not as optimized as small multiplications, either. Every // computes both the quotient and the remainder using the function long_divrem(). This remainder is computed for a small divisor with a multiplication, and is stored in a newly-allocated integer object, which in this situation is immediately discarded.
Or at least, that was the case when this question was originally asked. In CPython 3.6, a fast path for small int // was added, so // now beats >> for small ints too.
