Counting the number of set bits in a number - python

The problem statement is:
Write an efficient program to count the number of 1s in the binary representation of an integer.
I found a post on this problem here which outlines multiple solutions that run in O(log n) time, including Brian Kernighan's algorithm and gcc's __builtin_popcount().
One solution that wasn't mentioned was the Python approach bin(n).count("1"), which achieves the same effect. Does this method also run in O(log n) time?

You are converting the integer to a string, which means producing a string of N '0' and '1' characters. You then use str.count(), which must visit every character in the string to count the '1' characters.
All in all, you have an O(N) algorithm with a relatively high constant cost.
Note that this is the same complexity as the code you linked to: the integer n has N = log(n) bits, and the algorithm still has to make N steps to count them. The bin(n).count('1') approach is thus asymptotically equivalent, just slower, because producing the string in the first place is expensive.
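For reference, here is a minimal Python sketch of the Brian Kernighan approach mentioned in the question's linked post (my illustration, not code from either post); it clears the lowest set bit on every pass, so it loops once per set bit rather than once per bit:
def popcount_kernighan(n):
    """Count set bits of a non-negative integer by repeatedly clearing the lowest set bit."""
    count = 0
    while n:
        n &= n - 1   # drops the lowest set bit
        count += 1
    return count

assert popcount_kernighan(0b101101) == bin(0b101101).count('1') == 4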
At the cost of a table, you could move to processing integers per byte:
table = [0]
while len(table) < 256:
    table += [t + 1 for t in table]
length = sum(map(table.__getitem__, n.to_bytes(n.bit_length() // 8 + 1, 'little')))
However, because Python needs to produce a series of new objects (a bytes object and several integers), this method is never quite fast enough to beat the bin(n).count('1') method:
>>> from random import choice
>>> import timeit
>>> table = [0]
>>> while len(table) < 256:
...     table += [t + 1 for t in table]
...
>>> def perbyte(n): return sum(map(table.__getitem__, n.to_bytes(n.bit_length() // 8 + 1, 'little')))
...
>>> def strcount(n): return bin(n).count('1')
...
>>> n = int(''.join([choice('01') for _ in range(2 ** 16)]))
>>> for f in (strcount, perbyte):
...     print(f.__name__, timeit.timeit('f(n)', 'from __main__ import f, n', number=1000))
...
strcount 1.11822146497434
perbyte 1.4401431040023454
No matter the bit length of the test number, perbyte is always somewhat slower.

Let's say you are trying to count the number of set bits of n. On typical Python implementations, bin will compute the binary representation in O(log n) time and count will go through the string, resulting in an overall O(log n) complexity.
However, note that the complexity of an algorithm is usually expressed in terms of the "size" of the input. For integers, that size is the number of bits, i.e. the logarithm of the value. That's why this algorithm is said to have linear complexity: the variable is m = log n, and the complexity is O(m).
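As an aside not covered by either answer: on Python 3.10 and later, integers have a built-in popcount method, which sidesteps the intermediate string entirely:
n = 0b101101
print(n.bit_count())       # 4, Python 3.10+
print(bin(n).count('1'))   # 4, works on any version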

Fast way to find bit length of large positive integer from decimal

Given the decimal string representation of a large positive integer, what's a fast way to find the integer's bit length? Using int() and then bit_length() is slow. This example with a million digits takes over five seconds to tell me it has 3321926 bits:
s = '1234567890' * 10**5
print(int(s).bit_length())
The result should be exact, at least for all strings one can actually have in memory (so let's say up to 100 billion decimal digits).
If storage space is not an issue and you don't mind spending time up-front (and you'd rather have a solution that doesn't depend on floating-point accuracy, even if it's otherwise impractical), you can solve just about any speed issue with more memory. Build a lookup table of the string representations of 2**n. Set up a dictionary keying each string length to a list of (string of that length, corresponding n value) pairs. To test an input, look up the list for its length, then use ordinary string comparison to figure out which bit-length category it falls in.
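A minimal sketch of that lookup-table idea (my own illustration with hypothetical helper names, checked here only against bit_length() on modest inputs):
def build_table(max_bits):
    # Map decimal length -> list of (n, decimal string of 2**n), ascending in n.
    table = {}
    value = 1
    for n in range(max_bits + 1):
        s = str(value)
        table.setdefault(len(s), []).append((n, s))
        value *= 2
    return table

def bit_length_from_decimal(s, table):
    # Largest n with 2**n <= int(s); equal-length decimal strings compare
    # numerically, so plain string comparison is enough.
    for n, power in reversed(table.get(len(s), [])):
        if power <= s:
            return n + 1
    # No power of two with this many digits is <= s, so take the largest shorter one.
    n, _ = table[len(s) - 1][-1]
    return n + 1

table = build_table(10_000)   # covers inputs up to roughly 3000 decimal digits
for probe in ('1', '5', '1024', '1234567890' * 10):
    assert bit_length_from_decimal(probe, table) == int(probe).bit_length()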
This should be accurate for billions of digits, I think. Compute the contribution of the trailing len(s) - 10 digits from the bits-per-digit ratio alone, then add the base-2 log of the first 10 digits.
import math

s = '1234567890' * 10**5
dper = math.log(10) / math.log(2)            # bits per decimal digit
base = (len(s) - 10) * dper
extra = math.log(int(s[:10])) / math.log(2)
print(int(base + extra + 0.99))
This does the example in about 0.15 seconds, '1234567890' * 10**6 in about 2 seconds, and '1234567890' * 10**7 in about 20 seconds. First I approximate the bit length with logarithms (similar to Tim's way), then I use decimal.Decimal to adjust until it is exact. That class uses base 10, so it doesn't need a costly base conversion.
Bit length b covers the interval [2**(b-1), 2**b). So we want (the exponent of) the smallest power of 2 larger than the number.
from time import time
from math import log2
from decimal import *

setcontext(Context(prec=MAX_PREC, Emax=MAX_EMAX, Emin=MIN_EMIN))

def bit_length(s):
    if len(s) <= 20:
        return int(s).bit_length()
    head_bits = log2(int(s[:20]))
    tail_bits = (len(s) - 20) * log2(10)
    b = int(head_bits + tail_bits)
    n = Decimal(s)
    power = Decimal(2) ** b
    while power > n:
        b -= 1
        power //= 2
    while power <= n:
        b += 1
        power *= 2
    return b

s = '1234567890' * 10**5
start = time()
print(bit_length(s))
print(time() - start, 'seconds')

String sorting problem with code execution time limit

I was recently trying to solve a HackerEarth problem. The code works on the sample inputs and some custom inputs that I gave, but when I submitted it, it exceeded the time limit. Can someone explain how I can make the code run faster?
Problem Statement: Cyclic shift
A large binary number is represented by a string A of size N and consists of 0s and 1s. You must perform a cyclic shift on this string. The cyclic shift operation is defined as follows:
If the string A is [A0, A1,..., An-1], then after performing one cyclic shift, the string becomes [A1, A2,..., An-1, A0].
You perform the shift an infinite number of times, and each time you record the value of the binary number represented by the string. The maximum binary number formed after performing the operation (possibly 0 times) is B. Your task is to determine the number of cyclic shifts that must be performed such that the value represented by the string A equals B for the Kth time.
Input format:
First line: A single integer T denoting the number of test cases
For each test case:
First line: Two space-separated integers N and K
Second line: A denoting the string
Output format:
For each test case, print a single line containing one integer that represents the number of cyclic shift operations performed such that the value represented by string A is equal to B for the Kth time.
Code:
import math

def value(s):
    u = len(s)
    d = 0
    for h in range(u):
        d = d + (int(s[u-1-h]) * math.pow(2, h))
    return d

t = int(input())
for i in range(t):
    x = list(map(int, input().split()))
    n = x[0]
    k = x[1]
    a = input()
    v = 0
    for j in range(n):
        a = a[1:] + a[0]
        if value(a) > v:
            b = a
            v = value(a)
    ctr = 0
    cou = 0
    while ctr < k:
        a = a[1:] + a[0]
        cou = cou + 1
        if a == b:
            ctr = ctr + 1
    print(cou)
In the problem, the constraint on n is 0 <= n <= 1e5. In the function value(), you are converting a binary string whose length can be up to 1e5 into an integer, so the integer you compute can be as large as 2**1e5. That is surely impractical.
As mentioned by Prune, you should use an efficient algorithm to find the subsequence, say sub1, whose repetitions make up the given string A. If you solve this by brute force, the time complexity is O(n*n); since the maximum value of n is 1e5, the time limit will be exceeded. So use an efficient algorithm.
I can't do much with the code you posted, since it is obfuscated by meaningless variable names and a lack of explanation. When I scan it, I get the impression that you've taken the straightforward approach: do a single-digit shift in a long-running loop and count iterations until you hit B for the Kth time.
This is easy to understand, but cumbersome and inefficient.
Since the cycle repeats every N iterations, you gain no new information from repeating that process. All you need to do is find where in the series of N iterations you encounter B ... which could be multiple times.
In order for B to appear multiple times, A must consist of a particular sub-sequence of bits, repeated 2 or more times. For instance, 101010 or 011011. You can detect this with a simple addition to your current algorithm: at each iteration, check to see whether the current string matches the original. The first time you hit this, simply compute the repetition factor as rep = len(a) / j. At this point, exit the shifting loop: the present value of b is the correct one.
Now that you have b and its position in the first j rotations, you can directly compute the needed result without further processing.
I expect that you can finish the algorithm and do the coding from here.
Ah -- taken as a requirements description, the wording of your problem suggests that B is a given. If not, then you need to detect the largest value.
To find B, append A to itself and find the A-length substring with the largest value. You can speed this up by locating the longest runs of 1s and comparing, with well-known string-search techniques, only the candidates that begin with such a run.
Note that, while you iterate over A, you also look for the first position at which you reproduce the original value: that is the desired repetition length, which drives the direct-computation phase in the first part of my answer.
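Here is a minimal sketch of that direct computation (my own illustration, not the answerer's code; the maximum rotation is found naively in O(N^2) below, and for large N you would swap in a linear-time method such as Booth's algorithm):
def kth_time_shift_count(a: str, k: int) -> int:
    """Number of single cyclic shifts after which the string equals its
    maximum rotation B for the k-th time (counting starts after the first
    shift, matching the brute-force loop above)."""
    n = len(a)
    doubled = a + a
    b = max(doubled[i:i + n] for i in range(n))   # maximum rotation B (naive)
    # Shift counts 1..n that reproduce B = occurrences of B in the doubled string.
    hits = []
    i = doubled.find(b, 1)
    while 0 < i <= n:
        hits.append(i)
        i = doubled.find(b, i + 1)
    # B then reappears every n further shifts, len(hits) times per full cycle.
    full_cycles, rem = divmod(k - 1, len(hits))
    return full_cycles * n + hits[rem]

print(kth_time_shift_count('110', 1))   # 3: shifts give 101, 011, 110 = B
print(kth_time_shift_count('1010', 3))  # 6: B = 1010 reappears at shifts 2, 4, 6, ...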

find the duplicate number with O(1) space and O(n) time

I'm solving a question in leetcode
Given an array nums containing n + 1 integers where each integer is between 1 and n (inclusive), prove that at least one duplicate number must exist. Assume that there is only one duplicate number, find the duplicate one in O(n) time and O(1) space complexity
class Solution(object):
    def findDuplicate(self, nums):
        """
        :type nums: List[int]
        :rtype: int
        """
        xor = 0
        for num in nums:
            newx = xor ^ (2**num)
            if newx < xor:
                return num
            else:
                xor = newx
I got the solution accepted but I have been told that it is neither O(1) space nor O(n) time.
can anyone please help me understand why?
Your question is actually hard to answer. Typically when dealing with complexities, there's an assumed machine model. A standard model assumes that memory locations are of size log(n) bits when the input is of size n, and that arithmetic operations on numbers of size log(n) bits are O(1).
In this model, your code isn't O(1) in space and O(n) in time. Your xor value has n bits, and this doesn't fit in a constant number of memory locations (it actually needs n/log(n) locations). Similarly, it's not O(n) in time, since the arithmetic operations are on numbers much larger than log(n) bits.
To solve your problem in O(1) space and O(n) time, you've got to make sure your values don't get too large. One approach is to xor all the numbers in the array, which gives you 1^2^3^...^n ^ d, where d is the duplicate. You can then xor that with 1^2^3^...^n to recover the duplicate value.
def find_duplicate(ns):
    r = 0
    for i, n in enumerate(ns):
        r ^= i ^ n
    return r

print(find_duplicate([1, 3, 2, 4, 5, 4, 6]))
This is O(1) space, and O(n) time, since r never uses more bits than n does (that is, about log2(n) bits).
Your solution is not O(1) in space, meaning your space/memory usage is not constant but depends on the input!
newx=xor^(2**num)
This is a bitwise XOR over log_2(2**num) = num bits, where num is one of your input numbers, and it produces a num-bit result.
So num=10 means log_2(2**10) = 10 bits, and num=100 means log_2(2**100) = 100 bits. The operand size grows linearly with the input values; it is not constant.
It's also not within O(n) time complexity, as you have:
an outer loop over all n numbers
and a non-constant / non-O(1) inner operation (see above)
assumption: XOR is not constant with respect to the bit length of its operands
that's not always how it's treated, but physics supports this claim (Chandrasekhar limit, speed of light, ...)
This question is meant to be solved with Floyd's cycle-finding algorithm on a linked list.
Interpret the array as a linked list: there are n+1 positions but only n distinct values.
For example, take the array [1,3,4,2,2] and view it as a linked list.
How the pointing works
Starting from index 0, look at which element is at that position: it is 1, so index 0 points to nums[1]. nums[1] is 3, so next figure out where 3 points: that is nums[3], and so on.
Now that you have converted this to a linked list, use Floyd's hare-and-tortoise algorithm. You keep two pointers, slow and fast; if there is a cycle, the slow and fast pointers are going to meet at some point.
from typing import List

class Solution:
    def findDuplicate(self, nums: List[int]) -> int:
        # slow and fast are indices into nums
        slow, fast = 0, 0
        while True:
            slow = nums[slow]
            fast = nums[nums[fast]]
            if slow == fast:
                break
        # So far we have found where slow and fast meet.
        # To find where the cycle starts, initialize another pointer, start, at
        # index 0; start and slow then advance one step at a time, and the point
        # where they meet is the duplicate you are looking for.
        start = 0
        while True:
            slow = nums[slow]
            start = nums[start]
            if slow == start:
                return slow
Notice that none of the elements ever points back to index 0, because the values range over 1..n. We follow pointers via nums[value], and since no value is ever 0, nothing points to nums[0].
You can find the xor of all the numbers in the array (let's call it x) and then calculate the xor of the numbers 1, 2, 3, ..., n (let's call it y). Now x xor y will be your answer.
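A minimal sketch of that suggestion (my illustration; like the earlier XOR answer, it assumes the duplicate occurs exactly twice):
from functools import reduce
from operator import xor

def find_duplicate_xor(nums):
    n = len(nums) - 1                      # values are drawn from 1..n
    x = reduce(xor, nums)                  # xor of all array elements
    y = reduce(xor, range(1, n + 1))       # xor of 1, 2, ..., n
    return x ^ y                           # everything cancels except the duplicate

print(find_duplicate_xor([1, 3, 2, 4, 5, 4, 6]))   # 4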

Efficiently calculating mathematical formulas with exponents

I'm implementing a program that calculates an equation: F(n) = F(n-1) + 'a' + func1(func2(F(n-1))).
func1 takes every 'a' and makes it 'c' and every 'c' becomes 'a'.
func2 reverses the string (e.x. "xyz" becomes "zyx").
I want to calculate the Kth character of F(10**2017).
The basic rules are F(0) = "" (empty string), and examples are F(1) = "a", F(2) = "aac", and so on.
How do I do this efficiently?
The basic part of my code is this:
def op1 (str1):
    if str1 == 'a':
        return 'c'
    else:
        return 'a'

def op2 (str2):
    return str2[::-1]

Finitial = ''
counter = 0
while (counter < 10**2017):
    Finitial = Finitial + 'a' + op1(op2(Finitial))
    counter += 1
print Finitial
Let's start by fixing your original code and defining a function to compute F(n) for small n. We'll also print out the first few values of F. All code below is for Python 3; if you're using Python 2, you'll need to make some minor changes, like replacing str.maketrans with string.maketrans and range with xrange.
swap_ac = str.maketrans({ord('a'): 'c', ord('c'): 'a'})

def F(n):
    s = ''
    for _ in range(n):
        s = s + 'a' + s[::-1].translate(swap_ac)
    return s

for n in range(7):
    print("F({}) = {!r}".format(n, F(n)))
This gives the following output:
F(0) = ''
F(1) = 'a'
F(2) = 'aac'
F(3) = 'aacaacc'
F(4) = 'aacaaccaaaccacc'
F(5) = 'aacaaccaaaccaccaaacaacccaaccacc'
F(6) = 'aacaaccaaaccaccaaacaacccaaccaccaaacaaccaaaccacccaacaacccaaccacc'
A couple of observations at this point:
F(n) is a string of length 2**n-1. That means that F(n) grows fast. Computing F(50) would already require some serious hardware: even if we stored one character per bit, we'd need over 100 terabytes to store the full string. F(200) has more characters than there are estimated atoms in the solar system. So the idea of computing F(10**2017) directly is laughable: we need a different approach.
By construction, each F(n) is a prefix of F(n+1). So what we really have is a well-defined infinite string, where each F(n) merely gives us the first 2**n-1 characters of that infinite string, and we're looking to compute its kth character. And for any practical purpose, F(10**2017) might as well be that infinite string: for example, when we do our computation, we don't need to check that k < 2**(10**2017)-1, since a k exceeding this can't even be represented in normal binary notation in this universe.
Luckily, the structure of the string is simple enough that computing the kth character directly is straightforward. The major clue comes when we look at the characters at even and odd positions:
>>> F(6)[::2]
'acacacacacacacacacacacacacacacac'
>>> F(6)[1::2]
'aacaaccaaaccaccaaacaacccaaccacc'
The characters at even positions simply alternate between a and c (and it's straightforward to prove that this is true, based on the construction). So if our k is even, we can simply look at whether k/2 is odd or even to determine whether we'll get an a or a c.
What about the odd positions? Well F(6)[1::2] should look somewhat familiar: it's just F(5):
>>> F(6)[1::2] == F(5)
True
Again, it's straightforward to prove (e.g., by induction) that this isn't simply a coincidence, and that F(n+1)[1::2] == F(n) for all nonnegative n.
We now have an effective way to compute the kth character in our infinite string: if k is even, we just look at the parity of k/2. If k is odd, then we know that the character at position k is equal to that at position (k-1)/2. So here's a first solution to computing that character:
def char_at_pos(k):
    """
    Return the character at position k of the string F(n), for any
    n satisfying 2**n-1 > k.
    """
    while k % 2 == 1:
        k //= 2
    return 'ac'[k//2%2]
And a check that this does the right thing:
>>> ''.join(char_at_pos(i) for i in range(2**6-1))
'aacaaccaaaccaccaaacaacccaaccaccaaacaaccaaaccacccaacaacccaaccacc'
>>> ''.join(char_at_pos(i) for i in range(2**6-1)) == F(6)
True
But we can do better. We're effectively staring at the binary representation of k, removing all trailing '1's and the next '0', then simply looking at the next bit to determine whether we've got an 'a' or a 'c'. Identifying the trailing 1s can be done by bit-operation trickery. This gives us the following semi-obfuscated loop-free solution, which I leave it to you to unwind:
def char_at_pos2(k):
    """
    Return the character at position k of the string F(n), for any
    n satisfying 2**n-1 > k.
    """
    return 'ac'[k//(1+(k+1^k))%2]
Again, let's check:
>>> F(20) == ''.join(char_at_pos2(i) for i in range(2**20-1))
True
Final comments: this is a very well-known and well-studied sequence: it's called the dragon curve sequence, or the regular paper-folding sequence, and is sequence A014577 in the On-Line Encyclopedia of Integer Sequences. Some Google searches will likely give you many other ways to compute its elements. See also this codegolf question.
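As a small aside of my own (not from the answer): the same trailing-ones trick can be phrased via the lowest set bit of k+1, which is the form the paper-folding sequence is usually stated in; the check below assumes char_at_pos2 from above is in scope.
def char_at_pos3(k):
    """Equivalent bit-trick variant: take the odd part of k+1 and read
    the bit just above its lowest set bit."""
    m = (k + 1) // ((k + 1) & -(k + 1))   # odd part of k+1
    return 'ac'[(m >> 1) & 1]

assert all(char_at_pos3(i) == char_at_pos2(i) for i in range(2**20 - 1))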
Based on what you have already coded, here's my suggestion:
def main_function(num):
    if num == 0:
        return ''
    previous = main_function(num-1)
    return previous + 'a' + op1(op2(previous))

print(main_function(10**2017))
P.S: I'm not sure of the efficiency.

Random number generator that returns only one number each time

Does Python have a random number generator that returns only one random integer each time its next() function is called? The numbers should not repeat, and the generator should return unique random integers from the interval [1, 1 000 000].
I need to generate more than a million different numbers, and that sounds very memory-consuming if all the numbers are generated at the same time and stored in a list.
You are looking for a linear congruential generator with a full period. This will allow you to get a pseudo-random sequence of non-repeating numbers in your target number range.
Implementing a LCG is actually very simple, and looks like this:
def lcg(a, c, m, seed=None):
    num = seed or 0
    while True:
        num = (a * num + c) % m
        yield num
Then, it just comes down to choosing the correct values for a, c, and m to guarantee that the LCG will generate a full period (which is the only guarantee that you get non-repeating numbers). As the Wikipedia article explains, the following three conditions need to be true:
m and c need to be relatively prime.
a - 1 is divisible by all prime factors of m
a - 1 is divisible by 4, if m is also divisible by 4.
The first one is very easily guaranteed by simply choosing a prime for c. Also, this is the value that can be chosen last, and this will ultimately allow us to mix up the sequence a bit.
The relationship between a - 1 and m is more complicated though. In a full period LCG, m is the length of the period. Or in other words, it is the number range your numbers come from. So this is what you are usually choosing first. In your case, you want m to be around 1000000. Choosing exactly your maximum number might be difficult since that restricts you a lot (in both your choice of a and also c), so you can also choose numbers larger than that and simply skip all numbers outside of your range later.
Let’s choose m = 1000000 now though. The prime factors of m are 2 and 5. And it’s also obviously divisible by 4. So for a - 1, we need a number that is a multiple of 2 * 2 * 5 to satisfy the conditions 2 and 3. Let’s choose a - 1 = 160, so a = 161.
For c, we are using a random prime somewhere inside our range: c = 506903
Putting that into our LCG gives us our desired sequence. We can choose any seed value from the range 0 <= seed < m as the starting point of our sequence.
So let’s try it out and verify that what we thought of actually works. For this purpose, we are just collecting all numbers from the generator in a set until we hit a duplicate. At that point, we should have m = 1000000 numbers in the set:
>>> g = lcg(161, 506903, 1000000)
>>> numbers = set()
>>> for n in g:
        if n in numbers:
            raise Exception('Number {} already encountered before!'.format(n))
        numbers.add(n)

Traceback (most recent call last):
  File "<pyshell#5>", line 3, in <module>
    raise Exception('Number {} already encountered before!'.format(n))
Exception: Number 506903 already encountered before!
>>> len(numbers)
1000000
And it’s correct! So we did create a pseudo-random sequence of numbers that allowed us to get non-repeating numbers from our range m. Of course, by design, this sequence will be always the same, so it is only random once when you choose those numbers. You can switch up the values for a and c to get different sequences though, as long as you maintain the properties mentioned above.
The big benefit of this approach is of course that you do not need to store all the previously generated numbers. It is a constant space algorithm as it only needs to remember the initial configuration and the previously generated value.
It will also not deteriorate as you get further into the sequence. This is a general problem with solutions that just keep generating a random number until a new one is found that hasn't been encountered before: the longer the list of generated numbers gets, the less likely you are to hit a number that's not in that list with an evenly distributed random algorithm, so generating the 1000000th number will likely take a very long time with memory-based approaches.
But of course, having this simple algorithm which just performs some multiplication and some addition may not appear very random. You have to keep in mind that this is actually the basis for most pseudo-random number generators out there, so random.random() uses something like this internally; it's just that the m is a lot larger there, so you don't notice it.
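To tie this back to the requested [1, 1 000 000] range, here is a small wrapper along the lines of the "skip numbers outside your range" remark above (my sketch; m = 2**20 and a = 165 are my own choices that satisfy the three full-period conditions, and c = 506903 is reused from above):
def lcg_in_range(a, c, m, lo, hi, seed=0):
    """Yield every value in [lo, hi] exactly once per full period of the LCG,
    skipping generated values that fall outside the requested range."""
    num = seed
    for _ in range(m):                  # one full period visits every residue once
        num = (a * num + c) % m
        if lo <= num <= hi:
            yield num

# m = 2**20 > 1_000_000; its only prime factor is 2, a - 1 = 164 is a multiple
# of 4 (and of 2), and c is odd, so the conditions above give a full period.
gen = lcg_in_range(165, 506903, 2**20, 1, 1_000_000)
print(next(gen), next(gen), next(gen))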
If you really care about the memory you could use a NumPy array (or a Python array).
A one-million-element NumPy array of int32 (more than enough to hold integers between 0 and 1 000 000) consumes only ~4MB; Python itself would require ~36MB (roughly 28 bytes per integer plus 8 bytes per list element, plus overallocation) for an equivalent list:
>>> # NumPy array
>>> import numpy as np
>>> np.arange(1000000, dtype=np.int32).nbytes
4 000 000
>>> # Python list
>>> import sys
>>> import random
>>> l = list(range(1000000))
>>> random.shuffle(l)
>>> size = sys.getsizeof(l) # size of the list
>>> size += sum(sys.getsizeof(item) for item in l) # size of the list elements
>>> size
37 000 108
You only want unique values and you have a consecutive range (1 million requested items and 1 million different numbers), so you can simply shuffle that range and then yield items from the shuffled array:
def generate_random_integer():
    arr = np.arange(1000000, dtype=np.int32)
    np.random.shuffle(arr)
    yield from arr
    # yield from is equivalent to:
    # for item in arr:
    #     yield item
And it can be called using next:
>>> gen = generate_random_integer()
>>> next(gen)
443727
However, that throws away the performance benefit of using NumPy, so if you want to use NumPy, don't bother with the generator and just perform the operations (vectorized, if possible) on the array. It consumes much less memory than plain Python and can be orders of magnitude faster (factors of 10-100 are not uncommon!).
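For instance, a sketch of that vectorized style (my illustration, with an arbitrary batch size): shuffle once, then slice off whole chunks instead of calling next() once per element.
import numpy as np

arr = np.arange(1, 1_000_001, dtype=np.int32)   # the values 1 .. 1 000 000
np.random.shuffle(arr)

batch_size = 1000
batches = (arr[i:i + batch_size] for i in range(0, arr.size, batch_size))
print(next(batches)[:5])   # 1000 unique random values at a time; show the first 5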
For a large quantity of non-repeating random numbers, use encryption. With a given key, encrypt the numbers 0, 1, 2, 3, ... Since encryption is uniquely reversible, each encrypted number is guaranteed to be unique, provided you use the same key. For 64-bit numbers use DES; for 128-bit numbers use AES. For other sizes use some format-preserving encryption. For plain numeric ranges you might find the Hasty Pudding cipher useful, as it allows a large range of bit sizes and non-bit sizes as well, like [0..5999999].
Keep track of the key and the last number you encrypted. When you need a new unique random number, just encrypt the next number you haven't used so far.
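As a rough, hedged illustration of that idea (my own sketch: a toy hash-based Feistel permutation stands in for a real format-preserving cipher, so treat it as a demonstration of the encrypt-a-counter / cycle-walking technique rather than as cryptography):
import hashlib

def _round(half, key, rnd, half_bits):
    # Keyed round function: hash key, round number and half; keep half_bits bits.
    data = key + bytes([rnd]) + half.to_bytes(4, 'big')
    return int.from_bytes(hashlib.sha256(data).digest()[:4], 'big') & ((1 << half_bits) - 1)

def permute(x, key, bits=20, rounds=4):
    # Balanced Feistel network: a bijection on [0, 2**bits); 2**20 >= 1_000_000.
    half_bits = bits // 2
    left, right = x >> half_bits, x & ((1 << half_bits) - 1)
    for rnd in range(rounds):
        left, right = right, left ^ _round(right, key, rnd, half_bits)
    return (left << half_bits) | right

def unique_numbers(limit=1_000_000, key=b'some key'):
    # "Encrypt" 0, 1, 2, ...; cycle-walk any value that lands outside [0, limit).
    for i in range(limit):
        y = permute(i, key)
        while y >= limit:
            y = permute(y, key)
        yield y + 1            # shift into [1, limit]

gen = unique_numbers()
print(next(gen), next(gen), next(gen))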
Considering your numbers fit in 64-bit integers, a million of them stored in a Python list takes a few tens of megabytes once per-object and list overhead are included. If your machine can afford that, the easiest way is to use shuffle:
import random
randInts = list(range(1000000))
random.shuffle(randInts)
print(randInts)
Note that the other method is to keep track of the previously generated numbers, which will get you to the point of having all of them stored too.
I just needed such a function, and to my huge surprise I hadn't found anything that suited my needs. #poke's answer didn't satisfy me because I needed precise bounds, and the other answers, which build lists, consumed too much memory.
Initially, I needed a function that would generate numbers from a to b, where b - a could be anything from 0 to 2^32 - 1, which means the size of the range could be as large as the maximal 32-bit unsigned integer.
The idea of my algorithm is simple both to understand and to implement. It's a binary tree, where the next branch is chosen by a 50/50 boolean generator. Basically, we divide all the numbers from a to b into two branches, then randomly decide which one we yield the next value from, and recurse until we end up with single nodes, which are also picked at random.
The depth of recursion is about log2(b - a), which implies that for a stack limit of 256 the range could be as large as 2^256, which is impressive.
Things to note:
a must be less than or equal to b - otherwise no output will be produced.
Boundaries are included, meaning unique_random_generator(0, 3) will generate [0, 1, 2, 3].
TL;DR - here's the code
import math, random

# a, b - inclusive
def unique_random_generator(a, b):
    # corner case on wrong input
    if a > b:
        return
    # end node of the tree
    if a == b:
        yield a
        return
    # middle point of the tree division
    c = math.floor((a + b) / 2)
    generator_left = unique_random_generator(a, c)       # left branch - all the numbers between 'a' and 'c'
    generator_right = unique_random_generator(c + 1, b)  # right branch - all the numbers between 'c + 1' and 'b'
    has_values = True
    while has_values:
        # decide whether we pick the next value from the left branch or the right
        decision = bool(random.getrandbits(1))
        if decision:
            next_left = next(generator_left, None)
            # if the left branch is empty, check the right one
            if next_left is None:
                next_right = next(generator_right, None)
                # if both are empty, this subtree is exhausted
                if next_right is None:
                    has_values = False
                else:
                    yield next_right
            else:
                yield next_left
                next_right = next(generator_right, None)
                if next_right is not None:
                    yield next_right
        else:
            next_right = next(generator_right, None)
            # if the right branch is empty, check the left one
            if next_right is None:
                next_left = next(generator_left, None)
                # if both are empty, this subtree is exhausted
                if next_left is None:
                    has_values = False
                else:
                    yield next_left
            else:
                yield next_right
                next_left = next(generator_left, None)
                if next_left is not None:
                    yield next_left
Usage:
for i in unique_random_generator(0, 2**32):
    print(i)
import random

# number of random entries wanted
x = 1000
# the set of all values generated so far
y = set()
while x > 0:
    a = random.randint(0, 10**10)
    if a not in y:
        y.add(a)
        x -= 1
This way you are sure you have perfectly random, unique values.
x represents the number of values you want.
You can easily make one yourself:
from random import random

def randgen():
    while True:
        yield random()

ran = randgen()
next(ran)
next(ran)
...
