Answer
Thanks to @TheDark for spotting the overflow. The new C++ solution is pretty freakin' funny, too. It's extremely redundant:
if(2*i > n && 2*i > i)
replaced the old line of code if(2*i > n).
Background
I'm doing this problem on HackerRank, though the problem itself may not be essential to this question. If you cannot see the page, or don't want to make an account, the problem is reproduced in plain text below.
Question
My C++ code is timing out, but my Python code is not. I first suspected overflow, but I used sizeof to confirm that unsigned long long can hold values up to 2^64 - 1, the upper limit of the problem.
I practically translated my C++ code directly into Python to see if it was my algorithms causing the timeouts, but to my surprise my Python code passed every test case.
C++ code:
#include <iostream>
bool pot(unsigned long long n)
{
if (n % 2 == 0) return pot(n/2);
return (n==1); // returns true if n is power of two
}
unsigned long long gpt(unsigned long long n)
{
unsigned long long i = 1;
while(2*i < n) {
i *= 2;
}
return i; // returns greatest power of two less than n
}
int main()
{
unsigned int t;
std::cin >> t;
std::cout << sizeof(unsigned long long) << std::endl;
for(unsigned int i = 0; i < t; i++)
{
unsigned long long n;
unsigned long long count = 1;
std::cin >> n;
while(n > 1) {
if (pot(n)) n /= 2;
else n -= gpt(n);
count++;
}
if (count % 2 == 0) std::cout << "Louise" << std::endl;
else std::cout << "Richard" << std::endl;
}
}
Python 2.7 code:
def pot(n):
while n % 2 == 0:
n/=2
return n==1
def gpt(n):
i = 1
while 2*i < n:
i *= 2
return i
t = int(raw_input())
for i in range(t):
n = int(raw_input())
count = 1
while n != 1:
if pot(n):
n /= 2
else:
n -= gpt(n)
count += 1
if count % 2 == 0:
print "Louise"
else:
print "Richard"
To me, both versions look identical. I still think I'm somehow being fooled and am actually getting overflow, causing timeouts, in my C++ code.
Problem
Louise and Richard play a game. They have a counter set to N. Louise gets the first turn and the turns alternate thereafter. In the game, they perform the following operations.
If N is not a power of 2, they reduce the counter by the largest power of 2 less than N.
If N is a power of 2, they reduce the counter by half of N.
The resultant value is the new N which is again used for subsequent operations.
The game ends when the counter reduces to 1, i.e., N == 1, and the last person to make a valid move wins.
Given N, your task is to find the winner of the game.
Input Format
The first line contains an integer T, the number of testcases.
T lines follow. Each line contains N, the initial number set in the counter.
Constraints
1 ≤ T ≤ 10
1 ≤ N ≤ 2^64 - 1
Output Format
For each test case, print the winner's name in a new line. So if Louise wins the game, print "Louise". Otherwise, print "Richard". (Quotes are for clarity)
Sample Input
1
6
Sample Output
Richard
Explanation
As 6 is not a power of 2, Louise reduces the counter by the largest power of 2 less than 6, i.e., 4, and hence the counter reduces to 2.
As 2 is a power of 2, Richard reduces the counter by half of 2, i.e., 1. Hence the counter reduces to 1.
As we reach the terminating condition with N == 1, Richard wins the game.
When n is greater than 2^63, your gpt function will eventually reach i = 2^63; multiplying that by 2 overflows to 0, and from then on you are stuck in an infinite loop, multiplying 0 by 2 each time.
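If you want to keep the doubling loop, one overflow-free variant (just a sketch, not the only possible fix) is to compare i against n - i instead of computing 2*i:
// Sketch of an overflow-free gpt(): for i < n, "i < n - i" is equivalent to
// "2*i < n" but never multiplies, so i can safely grow to 2^63 even when
// n is close to 2^64 - 1.
unsigned long long gpt(unsigned long long n)
{
    unsigned long long i = 1;
    while (i < n - i) {
        i *= 2;
    }
    return i; // greatest power of two less than n (for n > 1)
}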
Try this bit-twiddling hack, which is probably slightly faster:
unsigned long largest_power_of_two_not_greater_than(unsigned long x) {
for (unsigned long y; (y = x & (x - 1)); x = y) {}
return x;
}
x&(x-1) is x without its least significant one-bit. So y will be zero (terminating the loop) exactly when x has been reduced to a power of two, which will be the largest power of two not greater than the original x. The loop is executed once for every 1-bit in x, which is on average half as many iterations as your approach. Also, this one has no issues with overflow. (It does return 0 if the original x was 0. That may or may not be what you want.)
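For example, with x = 6 (binary 110): y = 6 & 5 = 4, so x becomes 4; then y = 4 & 3 = 0, so the loop stops and 4 is returned.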
Note that if the original x was a power of two, that value is simply returned immediately. So the function doubles as a test of whether x is a power of two (or 0).
While that is fun and all, in real-life code you'd probably be better off finding your compiler's equivalent to this gcc built-in (unless your compiler is gcc, in which case here it is):
Built-in Function: int __builtin_clz (unsigned int x)
Returns the number of leading 0-bits in X, starting at the most
significant bit position. If X is 0, the result is undefined.
(Also available as __builtin_clzl for unsigned long arguments and __builtin_clzll for unsigned long long.)
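For illustration, here is a sketch of a gpt-style helper built on __builtin_clzll (assuming GCC or Clang and a 64-bit unsigned long long; the name gpt_clz is mine):
#include <iostream>

// For x > 0 the highest set bit sits at position 63 - __builtin_clzll(x),
// so shifting 1 there gives the greatest power of two not greater than x.
// __builtin_clzll is undefined for x == 0, hence the guard.
unsigned long long gpt_clz(unsigned long long x)
{
    return x ? 1ULL << (63 - __builtin_clzll(x)) : 0;
}

int main()
{
    std::cout << gpt_clz(6) << std::endl; // prints 4
}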
Related
I was participating in a competitive programming contest and faced a question where my answer was correct on 3 of the 4 test cases but exceeded the time limit on the 4th.
I tried to get better results by converting my code from Python to C++ (I know the time complexity remains the same, but it was worth a shot :)).
Following is the question:
A string is said to be using strong language if it contains at least K consecutive characters '*'.
You are given a string S with length N. Determine whether it uses strong language or not.
Input:
The first line of the input contains a single integer T denoting the number of test cases. The description of T test cases follows.
The first line of each test case contains two space-separated integers N and K.
The second line contains a single string S with length N.
Output:
Print a single line containing the string "YES" if the string contains strong language, or "NO" if it does not.
My python approach:
for _ in range(int(input())):
k = int(input().split()[1])
s = input()
s2 = "".join(["*"]*k)
if len(s.split(s2))>1:
print("YES")
else:
print("NO")
My converted C++ code (I converted it myself):
#include <iostream>
#include<string>
using namespace std;
int main() {
// your code goes here
int t;
std::cin >> t;
for (int i = 0; i < t; i++) {
/* code */
int n,k;
std::cin >> n >> k;
string str;
cin >> str;
string str2(k,'*');
size_t found = str.find(str2);
if (found != string::npos){
std::cout << "YES" << std::endl;
} else {
std::cout << "NO" << std::endl;
}
}
return 0;
}
Please guide me: how can I reduce my time complexity?
Other approaches I was given: "Using the find() function instead of split, or using a for loop".
Edit:
Sample Input :
2
5 1
abd
5 2
*i**j
Output :
NO
YES
The bounds you posted suggest that linear time is OK in Python. You can simply keep a running count of how many asterisks you have seen in a row.
T = int(input())
for _ in range(T):
    n, k = map(int, input().split())
    s = input()
    count, ans = 0, False
    for c in s:
        if c == "*":
            count += 1
        else:
            count = 0
        ans = ans or count >= k
    if ans:
        print("YES")
    else:
        print("NO")
I can also tell you why you are TLE'ing. Consider the case where n = 1e6, k = 5e5, and s is a string where the first k-1 characters are asterisks. The find method you have is going to check every position for matching the str2 you created. This will take O(n^2) time, giving you a TLE.
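If you still want a C++ version, here is a sketch of the same linear scan (the I/O layout mirrors your program; the variable names are mine):
#include <iostream>
#include <string>

int main()
{
    int t;
    std::cin >> t;
    while (t--) {
        long long n, k;
        std::string s;
        std::cin >> n >> k >> s; // n is implied by s.size(), but we read it to consume the input
        // One pass over s, tracking the length of the current run of '*'.
        long long run = 0;
        bool strong = false;
        for (char c : s) {
            run = (c == '*') ? run + 1 : 0;
            if (run >= k) strong = true;
        }
        std::cout << (strong ? "YES" : "NO") << std::endl;
    }
    return 0;
}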
There's a bit manipulation problem that asks you to sum up two integers without using + or - operators. Below is the code in Java:
public int getSum(int a, int b) {
while (b != 0) {
int carry = a & b;
a = a ^ b;
b = carry << 1;
}
return a;
}
When you try to sum up -1 and 1, the intermediate values take on [-2, 2], [-4, 4], and so on until the number overflows and the result reaches 0. You can't do the same in Python: the process goes on forever, taking up an entire CPU thread and slowly growing in memory. It seems that on my machine the numbers grow for a while until no memory is left.
def getSum(a, b):
    while b != 0:
        carry = a & b
        a = a ^ b
        b = carry << 1
    return a

if __name__ == '__main__':
    print getSum(-1, 1)  # will run forever
This is a rather peculiar example; are there any real-world implications of not having integers limited in size?
The implication is that you must know and enforce your own integer widths when computing checksums.
You make the values whatever width you want:
carry = (a & b)&255
a = (a ^ b)&255
b = (carry << 1)&255
would give you one-byte-wide integers...
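For comparison, here is a sketch of the same carry/XOR loop in C++ with a fixed-width unsigned type, which is roughly what the Java version relies on: the carry bit eventually shifts out of the 32-bit word, so the loop always terminates.
#include <cstdint>
#include <iostream>

// Wraparound on uint32_t is well defined, so the carry is eventually
// shifted out of the word and the loop ends; converting back to int32_t
// recovers the signed result.
int32_t getSum(int32_t a, int32_t b)
{
    uint32_t ua = static_cast<uint32_t>(a);
    uint32_t ub = static_cast<uint32_t>(b);
    while (ub != 0) {
        uint32_t carry = ua & ub;
        ua ^= ub;
        ub = carry << 1; // the overflowing bit is simply dropped here
    }
    return static_cast<int32_t>(ua);
}

int main()
{
    std::cout << getSum(-1, 1) << std::endl; // prints 0
}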
I was looking at the source of sorted_containers and was surprised to see this line:
self._load, self._twice, self._half = load, load * 2, load >> 1
Here load is an integer. Why use a bit shift in one place and multiplication in another? It seems reasonable that bit shifting may be faster than integer division by 2, but why not replace the multiplication by a shift as well? I benchmarked the following cases:
1. (times, divide)
2. (shift, shift)
3. (times, shift)
4. (shift, divide)
and found that #3 is consistently faster than other alternatives:
# self._load, self._twice, self._half = load, load * 2, load >> 1
import random
import timeit
import pandas as pd
x = random.randint(10 ** 3, 10 ** 6)
def test_naive():
a, b, c = x, 2 * x, x // 2
def test_shift():
a, b, c = x, x << 1, x >> 1
def test_mixed():
a, b, c = x, x * 2, x >> 1
def test_mixed_swapped():
a, b, c = x, x << 1, x // 2
def observe(k):
print(k)
return {
'naive': timeit.timeit(test_naive),
'shift': timeit.timeit(test_shift),
'mixed': timeit.timeit(test_mixed),
'mixed_swapped': timeit.timeit(test_mixed_swapped),
}
def get_observations():
return pd.DataFrame([observe(k) for k in range(100)])
The question:
Is my test valid? If so, why is (multiply, shift) faster than (shift, shift)?
I run Python 3.5 on Ubuntu 14.04.
Edit
Above is the original statement of the question. Dan Getz provides an excellent explanation in his answer.
For the sake of completeness, here are sample illustrations for larger x when multiplication optimizations do not apply.
This seems to be because multiplication of small numbers is optimized in CPython 3.5, in a way that left shifts by small numbers are not. Positive left shifts always create a larger integer object to store the result, as part of the calculation, while for multiplications of the sort you used in your test, a special optimization avoids this and creates an integer object of the correct size. This can be seen in the source code of Python's integer implementation.
Because integers in Python are arbitrary-precision, they are stored as arrays of integer "digits", with a limit on the number of bits per integer digit. So in the general case, operations involving integers are not single operations, but instead need to handle the case of multiple "digits". In pyport.h, this bit limit is defined as 30 bits on 64-bit platforms, or 15 bits otherwise. (I'll just call this 30 from here on to keep the explanation simple. But note that if you were using Python compiled for 32-bit, your benchmark's result would depend on whether x was less than 32,768.)
When an operation's inputs and outputs stay within this 30-bit limit, the operation can be handled in an optimized way instead of the general way. The beginning of the integer multiplication implementation is as follows:
static PyObject *
long_mul(PyLongObject *a, PyLongObject *b)
{
PyLongObject *z;
CHECK_BINOP(a, b);
/* fast path for single-digit multiplication */
if (Py_ABS(Py_SIZE(a)) <= 1 && Py_ABS(Py_SIZE(b)) <= 1) {
stwodigits v = (stwodigits)(MEDIUM_VALUE(a)) * MEDIUM_VALUE(b);
#ifdef HAVE_LONG_LONG
return PyLong_FromLongLong((PY_LONG_LONG)v);
#else
/* if we don't have long long then we're almost certainly
using 15-bit digits, so v will fit in a long. In the
unlikely event that we're using 30-bit digits on a platform
without long long, a large v will just cause us to fall
through to the general multiplication code below. */
if (v >= LONG_MIN && v <= LONG_MAX)
return PyLong_FromLong((long)v);
#endif
}
So when multiplying two integers where each fits in a 30-bit digit, this is done as a direct multiplication by the CPython interpreter, instead of working with the integers as arrays. (MEDIUM_VALUE() called on a positive integer object simply gets its first 30-bit digit.) If the result fits in a single 30-bit digit, PyLong_FromLongLong() will notice this in a relatively small number of operations, and create a single-digit integer object to store it.
In contrast, left shifts are not optimized this way, and every left shift deals with the integer being shifted as an array. In particular, if you look at the source code for long_lshift(), in the case of a small but positive left shift, a 2-digit integer object is always created, if only to have its length truncated to 1 later: (my comments in /*** ***/)
static PyObject *
long_lshift(PyObject *v, PyObject *w)
{
/*** ... ***/
wordshift = shiftby / PyLong_SHIFT; /*** zero for small w ***/
remshift = shiftby - wordshift * PyLong_SHIFT; /*** w for small w ***/
oldsize = Py_ABS(Py_SIZE(a)); /*** 1 for small v > 0 ***/
newsize = oldsize + wordshift;
if (remshift)
++newsize; /*** here newsize becomes at least 2 for w > 0, v > 0 ***/
z = _PyLong_New(newsize);
/*** ... ***/
}
Integer division
You didn't ask about the worse performance of integer floor division compared to right shifts, because that fit your (and my) expectations. But dividing a small positive number by another small positive number is not as optimized as small multiplications, either. Every // computes both the quotient and the remainder using the function long_divrem(). This remainder is computed for a small divisor with a multiplication, and is stored in a newly-allocated integer object, which in this situation is immediately discarded.
Or at least, that was the case when this question was originally asked. In CPython 3.6, a fast path for small int // was added, so // now beats >> for small ints too.
Find the largest palindrome made from the product of two 3-digit numbers.
Even though the algorithm is fast enough for the problem at hand, I'd like to know if I missed any obvious optimizations.
from __future__ import division
from math import sqrt
def createPalindrome(m):
m = str(m) + str(m)[::-1]
return int(m)
def problem4():
for x in xrange(999,99,-1):
a = createPalindrome(x)
for i in xrange(999,int(sqrt(a)),-1):
j = a/i
if (j < 1000) and (j % 1 == 0):
c = int(i * j)
return c
It seems the biggest slowdown in my code is converting an integer to a string, adding its reverse and converting the result back to an integer.
I looked up more information on palindromes and stumbled upon this formula, which allows me to convert a 3-digit number "n" into a 6-digit palindrome "p" (can be adapted for other digits but I'm not concerned about that).
p = 1100*n − 990*⌊n/10⌋ − 99*⌊n/100⌋
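For example, n = 123 gives p = 1100*123 − 990*12 − 99*1 = 135300 − 11880 − 99 = 123321.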
My original code runs in about 0.75 ms and the new one takes practically the same amount of time (not to mention the formula would have to be adapted depending on the number of digits "n" has), so I guess there weren't many optimizations left to perform.
Look here for Ideas
In C++ I do it like this:
int euler004()
{
// A palindromic number reads the same both ways. The largest palindrome
// made from the product of two 2-digit numbers is 9009 = 91 * 99.
// Find the largest palindrome made from the product of two 3-digit numbers.
const int N=3;
const int N2=N<<1;
int min,max,a,b,c,i,j,s[N2],aa=0,bb=0,cc=0;
for (min=1,a=1;a<N;a++) min*=10; max=(min*10)-1;
i=-1;
for (a=max;a>=min;a--)
for (b=a;b>=min;b--)
{
c=a*b; if (c<cc) continue;
for (j=c,i=0;i<N2;i++) { s[i]=j%10; j/=10; }
for (i=0,j=N2-1;i<j;i++,j--)
if (s[i]!=s[j]) { i=-1; break; }
if (i>=0) { aa=a; bb=b; cc=c; }
}
return cc; // cc is the output
}
no need for sqrt ...
the subcall to createPalindrome can slow things down due to heap/stack thrashing
the string manipulation m = str(m) + str(m)[::-1] is slow
string-to-int conversion can be faster if you do it yourself on a fixed-size array
my implementation runs in around 1.7 ms, but a big portion of that time is the app output and formatting (AMD 3.2GHz, 32-bit app on W7 x64)...
[edit1] implementing your formula
int euler004()
{
int i,c,cc,c0,a,b;
for (cc=0,i=999,c0=1100*i;i>=100;i--,c0-=1100)
{
c=c0-(990*int(i/10))-(99*int(i/100));
for(a=999;a>=300;a--)
if (c%a==0)
{
b=c/a;
if ((b>=100)&&(b<1000)) { cc=c; i=0; break; }
}
}
return cc;
}
this takes ~0.4 ms
[edit2] further optimizations
//---------------------------------------------------------------------------
int euler004()
{
// A palindromic number reads the same both ways. The largest palindrome
// made from the product of two 2-digit numbers is 9009 = 91 * 99.
// Find the largest palindrome made from the product of two 3-digit numbers.
int i0,i1,i2,c0,c1,c,cc=0,a,b,da;
for (c0= 900009,i0=9;i0>=1;i0--,c0-=100001) // first digit must be non zero so <1,9>
for (c1=c0+90090,i1=9;i1>=0;i1--,c1-= 10010) // all the rest <0,9>
for (c =c1+ 9900,i2=9;i2>=0;i2--,c -= 1100) // c is palindrome from 999999 to 100001
for(a=999;a>=948;a-- )
if (c%a==0)
{
// biggest palindrome is starting with 9
// so smallest valid result is 900009
// it is odd and sqrt(900009)=948 so test in range <948,999>
b=c/a;
if ((b>=100)&&(b<1000)) { cc=c; i0=0; i1=0; i2=0; break; }
}
return cc;
}
//---------------------------------------------------------------------------
this is too fast for me to properly measure the time (raw time is around 0.037 ms)
removed the divisions and multiplications from palindrome generation
changed the ranges after some numeric analysis and thinking while waiting for bus
the first loop can be eliminated (result starts with 9)
I wrote this a while back when I had just started learning Python, but here it is:
for i in range(999, 800, -1):
    for j in range(999, 800, -1):
        number = i * j
        str_number = str(number)
        rev_str_number = str_number[::-1]
        if str_number == rev_str_number:
            print("%s is a palindrome" % number)
I did not check all the numbers you did, but I still got the correct answer. What I really learned in this exercise is the "::" and how it works. You can check that out here.
Good luck with Euler!
EDIT: Solved! Simple mistake: I accidentally left the values as plain int, which can't hold a number that big. Thanks for the help!
I already completed Project Euler's third problem:
"The prime factors of 13195 are 5, 7, 13 and 29. What is the largest prime factor of the number 600851475143?"
In Python with this code (that works):
def main():
num = 600851475143 # You can replace this number with any number you want to find the largest prime to
x = 2
while x * x < num:
while num % x == 0:
num = num / x #Divide number by generated number (X) to get the prime number.
x = x + 1 # Continue in formula searching for largest prime
print num #Prints largest prime of the assigned number (600851475143)
main()
and that worked fine. However, when I tried replicating said code in C++ with this code:
#include "stdafx.h"
#include <iostream>
int main()
{
int num = 600851475143;
int x = 2;
while (x*x < num)
{
while (num % x == 0)
{
num /= x;
}
x = x++;
}
std::cout << num;
char z;
std::cin >> z;
return 0;
}
I always get the output "-443946297" instead of the correct and very different output I was expecting, "6857"
Can anyone help explain how I am getting such an extremely crazy answer from essentially the same code? Thanks in advance!
600851475143 is probably too large to fit in an int, leading to overflow. Try changing the type to long long. (You should probably change x to long long too, although it might not matter in this case.)
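A sketch of the corrected program with 64-bit types (I have also replaced x = x++, whose behavior is problematic in C++, with a plain increment):
#include <iostream>

int main()
{
    long long num = 600851475143LL; // fits comfortably in a 64-bit long long
    long long x = 2;
    while (x * x < num)
    {
        while (num % x == 0)
        {
            num /= x; // strip out the factor x completely
        }
        ++x;
    }
    std::cout << num << std::endl; // prints 6857
    return 0;
}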