NaN float identity compares False, but NaNs in tuples compare True

Can someone explain how the following is possible? I tried it in Python 2 and 3, and got the same result. Shouldn't the nans always compare not equal? Or, if it's comparing pointers, shouldn't the pointers always compare equal? What's going on?
>>> n = float('nan')
>>> n == n
False
>>> (n,) == (n,)
True

For n == n, Python uses float's comparison, and under the IEEE 754 rules a NaN compares unequal to everything, including itself.
For (n,) == (n,), it calls tuple's comparison, which searches for the first index where the items differ:
    /* Search for the first index where items are different.
     * Note that because tuples are immutable, it's safe to reuse
     * vlen and wlen across the comparison calls.
     */
    for (i = 0; i < vlen && i < wlen; i++) {
        int k = PyObject_RichCompareBool(vt->ob_item[i],
                                         wt->ob_item[i], Py_EQ);
        if (k < 0)
            return NULL;
        if (!k)
            break;
    }
To compare the items at each index, it calls PyObject_RichCompareBool, which returns true immediately if the two objects are the same object:
    /* Quick result when objects are the same.
       Guarantees that identity implies equality. */
    if (v == w) {
        if (op == Py_EQ)
            return 1;
        else if (op == Py_NE)
            return 0;
    }
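In other words, before the tuple comparison ever consults float.__eq__, PyObject_RichCompareBool notices that both tuples hold the very same NaN object and declares the items equal. A rough Python rendering of that shortcut (a sketch for illustration, not the actual CPython code):

def rich_compare_bool_eq(v, w):
    # Identity implies equality: the same object is treated as equal to itself,
    # so for two references to one NaN we never reach float.__eq__ at all.
    if v is w:
        return True
    return v == w

The same shortcut is why, for example, n in [n] and [n] == [n] are also True even though n == n is False.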

Related

Is there a way to convert Python to R?

Hey, I am trying to convert my Python code to R and can't seem to figure out the last part. If anyone with experience in both languages could help, that would be great!
def robber(nums):
    if len(nums) == 0: return 0
    elif len(nums) <= 2: return max(nums)
    else:
        A = [nums[0], max(nums[0:2])]
        for i in range(2, len(nums)):
            A.append(max(A[i-1], A[i-2] + nums[i]))
        return A[-1]
Above is the Python version, and below is my attempt so far at converting it to R:
robbing <- function(nums) {
  if (length(nums) == 0){
    result <- 0
  }
  else if(length(nums) <= 2){
    result <- max(nums)
  }
  else{
    a <- list(nums[0], max(nums(0:2)))
    for (i in range(2, length(nums))){
      result <- max(a[i-1], a[i-2] + nums[i])
    }
  }
  #result <- a[-1]
}
You have a couple of problems.
You are zero-indexing your vectors. R is 1-indexed (the first element of y is y[1], not y[0]).
Ranges (slices in Python) in R are inclusive. E.g. in R 0:2 = c(0, 1, 2), while Python is right-exclusive: 0:2 = [0, 1].
R uses negative indices to "remove" elements from vectors, while Python uses them to index from the end. E.g. y[-1] = y[2:length(y)] in R.
R's range function is not the same as Python's range function. The equivalent in R would be seq or a:b (e.g. 3:n). Note again that it is right-inclusive while Python's is right-exclusive!
You are not storing your intermediate results in a as you do in Python. You need to store each value as the loop runs.
And last: R functions return the last evaluated expression by default, so there is no need for an explicit return. This is not a problem per se, but it can make code look cleaner (or less clean, in some cases). So one option to fix your problem would be:
robber <- function(nums){
  n <- length(nums) # <= Only compute length **once** =>
  if(n == 0)
    0 # <= Returned because no more code is run after this =>
  else if(n <= 2)
    max(nums) # <= Returned because no more code is run after this =>
  else{
    a <- numeric(n) # <= pre-allocate our vector =>
    a[1:2] <- cummax(nums[1:2]) # <= cummax instead of c(nums[1], max(nums[1:2])) =>
    for(i in 3:n){ # <= Note that we start at 3, because of R's 1-indexing =>
      a[i] <- max(a[i - 1], a[i - 2] + nums[i])
    }
    a[n]
  }
}
Note a few things:
I use the fact that R vectors are 1-indexed, and my loop starts at 3 as a consequence of this.
I pre-allocate the a vector (here using numeric(n)). Growing an R vector element by element is slow, whereas appending to a Python list is amortized constant time, so pre-allocation is the recommended approach in R.
I compute the length once and store it in a variable: n <- length(nums). There is no need to evaluate it multiple times, and storing such intermediate results in a variable is good practice in any language, whether R, Python, or even a compiled language such as C++ (although there the compiler is often smart enough not to recompute the result).
Last, I use cummax where I can. I feel there is an optimized way to get the result almost immediately using vectorization, but I can't quite see it.
I would avoid using a list, because appending to lists is slow (especially in R; a vector is much better). But we don't need any sequence or indexing at all if we use plain variables, as I show here.
You don't need to build a list: all you need to keep in memory are the previous and the pre-previous values of res.
def robber(nums, res=0, prev=0, preprev=0): # local vars predefined here
    for x in nums:
        prev, preprev = res, prev
        res = max(prev, preprev + x)
    return res
This Python function does the same as the one you gave. (Try it out!)
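For instance (input values of my own choosing, just to check it against the original):

>>> robber([2, 7, 9, 3, 1])
12
>>> robber([])
0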
In R this would be:
robber <- function(nums, res=0, prev=0, preprev=0) {
  for (x in nums) {
    preprev <- prev
    prev <- res # correct order important!
    res <- max(prev, preprev + x)
  }
  res
}
Moving the local variable definitions into the argument list saves three lines of R code, which is why I did it.
I suggest changing result to return(), defining the object a outside the function, and returning a[length(a)] at the end of the function.
a <- list(nums[0], max(nums(0:2)))

robbing <- function(nums) {
  if (length(nums) == 0){
    return(0)
  }
  else if(length(nums) <= 2){
    return(max(nums))
  }
  else{
    for (i in range(2, length(nums))){
      return(max(a[i-1], a[i-2] + nums[i]))
    }
  }
  return(a[length(a)])
}

Worst-case time complexity of Python's int.bit_length()

When we call int.bit_length on an integer n, is the worst-case time complexity O(log(n)), or does Python use some trick to improve it (e.g. storing the position of the most significant bit of n when it is created)?
In CPython, for values with fewer internal-representation digits than PY_SSIZE_T_MAX/PyLong_SHIFT – i.e. fewer than PY_SSIZE_T_MAX binary digits – it’s calculated from the number of internal digits, yes:
    msd = ((PyLongObject *)self)->ob_digit[ndigits-1];
    msd_bits = bits_in_digit(msd);

    if (ndigits <= PY_SSIZE_T_MAX/PyLong_SHIFT)
        return PyLong_FromSsize_t((ndigits-1)*PyLong_SHIFT + msd_bits);
Otherwise, it goes through bigints again, for overall time complexity of O(log log N) (which isn’t exactly true either in this strange mix of practice and theory, so…).
    /* expression above may overflow; use Python integers instead */
    result = (PyLongObject *)PyLong_FromSsize_t(ndigits - 1);
    if (result == NULL)
        return NULL;
    x = (PyLongObject *)PyLong_FromLong(PyLong_SHIFT);
    if (x == NULL)
        goto error;
    y = (PyLongObject *)long_mul(result, x);
    Py_DECREF(x);
    if (y == NULL)
        goto error;
    Py_DECREF(result);
    result = y;

    x = (PyLongObject *)PyLong_FromLong((long)msd_bits);
    if (x == NULL)
        goto error;
    y = (PyLongObject *)long_add(result, x);
    Py_DECREF(x);
    if (y == NULL)
        goto error;
    Py_DECREF(result);
    result = y;

    return (PyObject *)result;
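For intuition, here is the fast path's arithmetic written out in Python (a sketch that assumes 30-bit internal digits; PyLong_SHIFT is 15 on some builds, and the helper name is made up):

PYLONG_SHIFT = 30  # assumed digit width; CPython may also use 15

def bit_length_fast_path(n):
    # Split abs(n) into base-2**PYLONG_SHIFT digits, least significant first,
    # mirroring the internal ob_digit array.
    n = abs(n)
    digits = []
    while True:
        digits.append(n & ((1 << PYLONG_SHIFT) - 1))
        n >>= PYLONG_SHIFT
        if n == 0:
            break
    ndigits = len(digits)
    msd = digits[-1]  # most significant digit, at most PYLONG_SHIFT bits
    # bits_in_digit(msd) in C is a bounded computation, hence O(1).
    return (ndigits - 1) * PYLONG_SHIFT + msd.bit_length()

The point is that ndigits is already stored on the int object, so no scan over the digits is needed.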
tl;dr: it’s O(1)

Python Runtime of collections.Counter Equality

I am wondering what the big-O runtime complexity is for comparing two collections.Counter objects. Here is some code to demonstrate what I mean:
import collections
counter_1 = collections.Counter("abcabcabcabcabcabcdefg")
counter_2 = collections.Counter("xyzxyzxyzabc")
comp = counter_1 == counter_2 # What is the runtime of this comparison statement?
Is the runtime of the equality comparison in the final statement O(1)? Or is it O(num_of_unique_keys_in_largest_counter)? Or is it something else?
For reference, here is the source code for collections.Counter: https://github.com/python/cpython/blob/0250de48199552cdaed5a4fe44b3f9cdb5325363/Lib/collections/__init__.py#L497
I do not see the class implementing an __eq__() method.
Bonus points: if the answer to this question changes between Python 2 and Python 3, I would love to hear the difference.
Counter is a subclass of dict, therefore the big-O analysis is that of dict, with the caveat that Counter objects are specialized to hold only int values (i.e. they do not hold collections of values the way dicts can); this simplifies the analysis.
Looking at the C implementation of the equality comparison:
There is an early exit if the number of keys is different (this does not influence the big-O).
Then a loop iterates over all the keys and exits early if a key is not found or if the corresponding value is different (again, this has no bearing on the big-O).
If all keys are found and the corresponding values are all equal, the dictionaries are declared equal. The lookup and comparison of each key-value pair is O(1) on average; this operation is repeated at most n times (n being the number of keys).
In all, the time complexity is O(n), with n the number of keys.
This applies to both Python 2 and 3.
from dictobject.c
/* Return 1 if dicts equal, 0 if not, -1 if error.
 * Gets out as soon as any difference is detected.
 * Uses only Py_EQ comparison.
 */
static int
dict_equal(PyDictObject *a, PyDictObject *b)
{
    Py_ssize_t i;

    if (a->ma_used != b->ma_used)
        /* can't be equal if # of entries differ */
        return 0;
    /* Same # of entries -- check all of 'em.  Exit early on any diff. */
    for (i = 0; i < a->ma_keys->dk_nentries; i++) {
        PyDictKeyEntry *ep = &DK_ENTRIES(a->ma_keys)[i];
        PyObject *aval;
        if (a->ma_values)
            aval = a->ma_values[i];
        else
            aval = ep->me_value;
        if (aval != NULL) {
            int cmp;
            PyObject *bval;
            PyObject *key = ep->me_key;
            /* temporarily bump aval's refcount to ensure it stays
               alive until we're done with it */
            Py_INCREF(aval);
            /* ditto for key */
            Py_INCREF(key);
            /* reuse the known hash value */
            b->ma_keys->dk_lookup(b, key, ep->me_hash, &bval);
            if (bval == NULL) {
                Py_DECREF(key);
                Py_DECREF(aval);
                if (PyErr_Occurred())
                    return -1;
                return 0;
            }
            cmp = PyObject_RichCompareBool(aval, bval, Py_EQ);
            Py_DECREF(key);
            Py_DECREF(aval);
            if (cmp <= 0)  /* error or not equal */
                return cmp;
        }
    }
    return 1;
}
Internally, collections.Counter stores the counts in a dictionary (that's why it subclasses dict), so the same rules apply as for comparing dictionaries: each key and its associated value from both dictionaries must be compared to establish equality. For CPython, that is implemented in dict_equal(); other implementations may vary, but logically you have to do this key-by-key comparison to establish equality.
This also means that the complexity is O(N) at its worst (loop through one of the dictionaries, and for each key check whether the value is the same in the other). There are no significant changes between Python 2.x and Python 3.x in this regard.
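Put differently, the C loop above behaves roughly like this Python sketch (the names here are my own, for illustration only):

_MISSING = object()

def dicts_equal(a, b):
    if len(a) != len(b):              # early exit when the key counts differ
        return False
    for key, aval in a.items():       # at most n iterations, n = number of keys
        bval = b.get(key, _MISSING)   # average O(1) hash lookup
        if bval is _MISSING or not (aval == bval):
            return False
    return True

Applied to the example above, comparing counter_1 and counter_2 therefore costs at most time proportional to the number of distinct keys, not the lengths of the original strings.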

Can a string ever get shorter when converted to upper/lowercase?

A string may get longer (in terms of Unicode codepoints) when converted to upper or lower case. For example, 'ß'.upper() evaluates to 'SS'. But are there strings that get shorter? That is, does there exist a string s such that the expression
len(s.lower()) < len(s) or len(s.upper()) < len(s)
evaluates to True?
I think this may be implementation dependent. I'll answer based on the CPython source.
It seems to me that there are two possible situations where calling lower on a string could make it shorter.
Some combination of two Unicode code points next to one another gets converted into one code point.
A single Unicode code point gets converted into an empty string.
We can determine whether case 1 is possible by examining the type signature of the internal lowercase-conversion function. Here it is in Objects/unicodectype.c.
int _PyUnicode_ToLowerFull(Py_UCS4 ch, Py_UCS4 *res)
{
    const _PyUnicode_TypeRecord *ctype = gettyperecord(ch);

    if (ctype->flags & EXTENDED_CASE_MASK) {
        int index = ctype->lower & 0xFFFF;
        int n = ctype->lower >> 24;
        int i;
        for (i = 0; i < n; i++)
            res[i] = _PyUnicode_ExtendedCase[index + i];
        return n;
    }
    res[0] = ch + ctype->lower;
    return 1;
}
I don't 100% understand this code, but I observe that the first parameter ch is a single Unicode code point. Since it operates only on individual code points and not on combinations of them, case 1 seems to be ruled out; combinations of code points won't get turned into a shorter sequence.
With that out of the way, we can determine whether case 2 ever occurs by iterating up to sys.maxunicode and checking whether any single character has length zero after lowering.
>>> import sys
>>> unicode_chars = list(map(chr, range(sys.maxunicode+1)))
>>> [x for x in unicode_chars if len(x.lower()) == 0]
[]
Looks like case 2 is also busted.
We can apply the above logic to upper as well. For case 1, the implementation for _PyUnicode_ToUpperFull is nearly identical to its lower counterpart; and for case 2, the corresponding list comprehension likewise returns an empty list.
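For completeness, that check looks like this (run in the same session, so it assumes the unicode_chars list from above is still defined):
>>> [x for x in unicode_chars if len(x.upper()) == 0]
[]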
Conclusion
Nope, lower and upper never make anything shorter.

What is the difference between this C++ code and this Python code?

Answer
Thanks to #TheDark for spotting the overflow. The new C++ solution is pretty freakin' funny, too. It's extremely redundant:
if(2*i > n && 2*i > i)
replaced the old line of code if(2*i > n).
Background
I'm doing this problem on HackerRank, though the problem may not be entirely related to this question. If you cannot see the webpage, or have to make an account and don't want to, the problem is listed in plain text below.
Question
My C++ code is timing out, but my Python code is not. I first suspected overflow, but I used sizeof to be sure that unsigned long long can reach 2^64 - 1, the upper limit of the problem.
I practically translated my C++ code directly into Python to see if my algorithm was causing the timeouts, but to my surprise my Python code passed every test case.
C++ code:
#include <iostream>

bool pot(unsigned long long n)
{
    if (n % 2 == 0) return pot(n/2);
    return (n == 1); // returns true if n is power of two
}

unsigned long long gpt(unsigned long long n)
{
    unsigned long long i = 1;
    while (2*i < n) {
        i *= 2;
    }
    return i; // returns greatest power of two less than n
}

int main()
{
    unsigned int t;
    std::cin >> t;
    std::cout << sizeof(unsigned long long) << std::endl;
    for (unsigned int i = 0; i < t; i++)
    {
        unsigned long long n;
        unsigned long long count = 1;
        std::cin >> n;
        while (n > 1) {
            if (pot(n)) n /= 2;
            else n -= gpt(n);
            count++;
        }
        if (count % 2 == 0) std::cout << "Louise" << std::endl;
        else std::cout << "Richard" << std::endl;
    }
}
Python 2.7 code:
def pot(n):
    while n % 2 == 0:
        n /= 2
    return n == 1

def gpt(n):
    i = 1
    while 2*i < n:
        i *= 2
    return i

t = int(raw_input())
for i in range(t):
    n = int(raw_input())
    count = 1
    while n != 1:
        if pot(n):
            n /= 2
        else:
            n -= gpt(n)
        count += 1
    if count % 2 == 0:
        print "Louise"
    else:
        print "Richard"
To me, both versions look identical. I still think I'm somehow being fooled and am actually getting overflow, causing timeouts, in my C++ code.
Problem
Louise and Richard play a game. They have a counter set to N. Louise gets the first turn and the turns alternate thereafter. In the game, they perform the following operations:
If N is not a power of 2, they reduce the counter by the largest power of 2 less than N.
If N is a power of 2, they reduce the counter by half of N.
The resultant value is the new N which is again used for subsequent operations.
The game ends when the counter reduces to 1, i.e., N == 1, and the last person to make a valid move wins.
Given N, your task is to find the winner of the game.
Input Format
The first line contains an integer T, the number of testcases.
T lines follow. Each line contains N, the initial number set in the counter.
Constraints
1 ≤ T ≤ 10
1 ≤ N ≤ 2^64 - 1
Output Format
For each test case, print the winner's name in a new line. So if Louise wins the game, print "Louise". Otherwise, print "Richard". (Quotes are for clarity)
Sample Input
1
6
Sample Output
Richard
Explanation
As 6 is not a power of 2, Louise reduces the counter by the largest power of 2 less than 6, i.e., 4, and hence the counter reduces to 2.
As 2 is a power of 2, Richard reduces the counter by half of 2 i.e., 1. Hence the counter reduces to 1.
As we reach the terminating condition with N == 1, Richard wins the game.
When n is greater than 2^63, your gpt function will eventually have i equal to 2^63; multiplying it by 2 then overflows to 0, which is still less than n, so you end up in an infinite loop, multiplying 0 by 2 each time.
Try this bit-twiddling hack, which is probably slightly faster:
unsigned long largest_power_of_two_not_greater_than(unsigned long x) {
    for (unsigned long y; (y = x & (x - 1)); x = y) {}
    return x;
}
x&(x-1) is x without its least significant one-bit. So y will be zero (terminating the loop) exactly when x has been reduced to a power of two, which will be the largest power of two not greater than the original x. The loop executes once for every 1-bit in x, which is on average half as many iterations as your approach. Also, this one has no issues with overflow. (It does return 0 if the original x was 0. That may or may not be what you want.)
Note that if the original x was a power of two, that value is simply returned immediately. So the function doubles as a test of whether x is a power of two (or 0).
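The same trick reads naturally in Python as well (a sketch purely to illustrate the bit manipulation; Python ints never overflow, so the original loop was already safe there):

def largest_power_of_two_not_greater_than(x):
    # x & (x - 1) clears the lowest set bit; repeat until at most one bit remains.
    while x & (x - 1):
        x &= x - 1
    return x

As in the C version, a power of two (or 0) is returned unchanged.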
While that is fun and all, in real-life code you'd probably be better off finding your compiler's equivalent to this gcc built-in (unless your compiler is gcc, in which case here it is):
Built-in Function: int __builtin_clz (unsigned int x)
Returns the number of leading 0-bits in X, starting at the most
significant bit position. If X is 0, the result is undefined.
(Also available as __builtin_clzl for unsigned long arguments and __builtin_clzll for unsigned long long.)
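On the Python side, by the way, the doubling loop in gpt can be replaced with a constant-time expression built on int.bit_length (the method whose cost was discussed a few questions above). This is my own suggestion, not part of the original answers:

def gpt(n):
    # Greatest power of two strictly less than n, for n >= 2;
    # behaves like the while-loop version in the question.
    return 1 << ((n - 1).bit_length() - 1)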
