Python String Overflow?

For some reason, the code below outputs 0. I'm using a very large string (100,000 characters) and expecting a large integer, in the hundreds of billions, e.g. 500,000,000,000. Is there something special I need to do? The goal is to find the number of subsequences of 1, 2, 3 in the first 100,000 digits of pi. I know the below is algorithmically correct. It's just not "code-correct."
pi100k = "3.14159[100,000 digits of pi]"
subSeqInit = 0
subSeqPair = 0
subSeqTotal = 0
for c in pi100k:
    if c == 1:
        subSeqInit = subSeqInit + 1
    elif c == 2 and subSeqInit > 0:
        subSeqPair = subSeqPair + 1
    elif c == 3 and subSeqTotal > 0:
        subSeqTotal = subSeqTotal + 1
print(subSeqTotal)

The simplest and fastest way is probably this:
subSeqTotal = pi100k.count("123")

pi100k = "3.14159[100,000 digits of pi]"
subSeqInit = 0
subSeqTotal = 0
for c in pi100k:
    if c == '1':
        subSeqInit = 1
    elif c == '2' and subSeqInit == 1:
        subSeqInit = 2
    elif c == '3' and subSeqInit == 2:
        subSeqTotal = subSeqTotal + 1
        subSeqInit = 0
print(subSeqTotal)
Python does not implicitly convert string characters to integers, so a comparison such as c == 1 is never true for a character of a string; compare against '1' instead. Furthermore, your algorithm is not sound; what I have above will work better.
EDIT:
You could make this much shorter by using the regular expression module:
import re
subSeqTotal = len(re.findall('123', pi100k))
EDIT 2: As MRAB pointed out, the best thing to use is pi100k.count('123')

It appears none of these solutions are correct. I don't think they search for the sub-sequence correctly.
I solved it recursively in C, with this algorithm:
#include <stdint.h>

/* global cstring for our pi data */
const char *g_numbers = "3.14........[100,000 digits]";
/* global to hold our large total : some compilers don't support uint64_t */
uint64_t g_total = 0;
/* recursively compute the subsequences of 1-2-3 */
void count_sequences(const char c, unsigned int index)
{
    while (index < 100000){
        switch (g_numbers[index]){
            case '1': if (c == '1') count_sequences('2', index+1); break;
            case '2': if (c == '2') count_sequences('3', index+1); break;
            case '3':
                if (c == '3'){
                    g_total++;
                    count_sequences('3', index+1);
                    return;
                }
                /* otherwise fall through */
            default: break;
        }
        index++;
    }
}
Sorry I can't hand out the Python solution-- but I hope this helps. It shouldn't be too hard to re-work. I tried the given answers in Python and they didn't seem to work.
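For reference, here is a minimal Python sketch of the counting idea the question seems to be aiming for, assuming "sub-sequence" means the digits 1, 2, 3 appearing in that order but not necessarily next to each other. The function name count_123_subsequences is made up for this illustration. It keeps three running counts in a single pass, and since Python integers have arbitrary precision, a total in the hundreds of billions is not a problem.

```python
def count_123_subsequences(digits):
    """Count index triples i < j < k with digits[i], digits[j], digits[k] == '1', '2', '3'."""
    ones = 0    # number of '1' characters seen so far
    pairs = 0   # number of "1 ... 2" subsequences seen so far
    total = 0   # number of "1 ... 2 ... 3" subsequences seen so far
    for c in digits:
        if c == '1':
            ones += 1
        elif c == '2':
            pairs += ones    # this '2' extends every earlier '1'
        elif c == '3':
            total += pairs   # this '3' extends every earlier "1 ... 2"
    return total


print(count_123_subsequences("123123"))  # 4
```

Characters other than '1', '2' and '3' (including the decimal point) are simply skipped, so the pi string can be passed in as is.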

Related

C function to Python (different results)

I am trying to port this snippet of code to python from C. The outputs are different even though it's the same code.
This is the C version of the code which works:
int main(void)
{
    uint8_t pac[] = {0x033,0x55,0x22,0x65,0x76};
    uint8_t len = 5;
    uint8_t chan = 0x64;
    btLeWhiten(pac, len, chan);
    for(int i = 0;i<=len;i++)
    {
        printf("Whiten %02d \r\n",pac[i]);
    }
    while(1)
    {
    }
    return 0;
}
void btLeWhiten(uint8_t* data, uint8_t len, uint8_t whitenCoeff)
{
    uint8_t m;
    while(len--){
        for(m = 1; m; m <<= 1){
            if(whitenCoeff & 0x80){
                whitenCoeff ^= 0x11;
                (*data) ^= m;
            }
            whitenCoeff <<= 1;
        }
        data++;
    }
}
What I currently have in Python is:
def whiten(data, len, whitenCoeff):
    idx = len
    while(idx > 0):
        m = 0x01
        for i in range(0,8):
            if(whitenCoeff & 0x80):
                whitenCoeff ^= 0x11
                data[len - idx -1 ] ^= m
                whitenCoeff <<= 1
            m <<= 0x01
        idx = idx - 1
pac = [0x33,0x55,0x22,0x65,0x76]
len = 5
chan = 0x64
def main():
    whiten(pac,5,chan)
    print pac
if __name__=="__main__":
    main()
The problem I see is that whitenCoeff always remains 8 bits wide in the C snippet, but in Python it grows beyond 8 bits on each loop pass.

You've got a few more problems.
whitenCoeff <<= 1; is outside of the if block in your C code, but it's inside of the if block in your Python code.
data[len - idx -1 ] ^= m wasn't translated correctly, it works backwards from the C code.
This code produces the same output as your C code:
def whiten(data, whitenCoeff):
    for index in range(len(data)):
        for i in range(8):
            if (whitenCoeff & 0x80):
                whitenCoeff ^= 0x11
                data[index] ^= (1 << i)
            # keep whitenCoeff within 8 bits, like uint8_t in C
            whitenCoeff = (whitenCoeff << 1) & 0xff
    return data
if __name__=="__main__":
    print(whiten([0x33,0x55,0x22,0x65,0x76], 0x64))

In C you are writing data from 0 to len-1 but in Python you are writing data from -1 to len-2. Remove the -1 from this line:
data[len - idx -1 ] ^= m
like this
data[len - idx] ^= m
you also need to put this line outside the if:
whitenCoeff <<= 1

whitenCoeff <<= 1 in C discards the bits shifted past bit 7, because whitenCoeff is an 8-bit value.
In python, there's no such limit, so you have to write:
whitenCoeff = (whitenCoeff<<1) & 0xFF
to mask higher bits out.
(don't forget to check vz0 remark on array boundary)
plus there was an indentation issue.
rewritten code which gives same result:
def whiten(data, whitenCoeff):
    idx = len(data)
    while(idx > 0):
        m = 0x01
        for i in range(0,8):
            if(whitenCoeff & 0x80):
                whitenCoeff ^= 0x11
                data[-idx] ^= m
            whitenCoeff = (whitenCoeff<<1) & 0xFF
            m <<= 0x01
        idx = idx - 1
pac = [0x33,0x55,0x22,0x65,0x76]
chan = 0x64
def main():
    whiten(pac,chan)
    print(pac)
if __name__=="__main__":
    main()
Slightly off-topic: Note that the C version already has problems:
for(int i = 0;i<=len;i++)
should be
for(int i = 0;i<len;i++)

I solved it by anding the python code with 0xFF. That keeps the variable from increasing beyond 8 bits.
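To make the effect of the mask concrete, here is a small sketch comparing a plain Python shift with a masked one. The helper name shl8 is invented for this example; it is only meant to mimic what a left shift does to a uint8_t in C.

```python
def shl8(value, shift=1):
    """Left shift that keeps only the low 8 bits, like a uint8_t in C."""
    return (value << shift) & 0xFF


print(0xC8 << 1)    # 400: a plain Python int just keeps growing
print(shl8(0xC8))   # 144 (0x90): the bit shifted past bit 7 is dropped
print(shl8(0x64))   # 200 (0xC8): nothing is lost while the value still fits
```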

Your code in C does not appear to work as intended since it displays one more value than is available in pac. Correcting for this should cause 5 values to be displayed instead of 6 values. To copy the logic from C over to Python, the following was written in an attempt to duplicate the results:
#! /usr/bin/env python3
def main():
    pac = bytearray(b'\x33\x55\x22\x65\x76')
    chan = 0x64
    bt_le_whiten(pac, chan)
    print('\n'.join(map('Whiten {:02}'.format, pac)))
def bt_le_whiten(data, whiten_coeff):
    for offset in range(len(data)):
        m = 1
        while m & 0xFF:
            if whiten_coeff & 0x80:
                whiten_coeff ^= 0x11
                data[offset] ^= m
            whiten_coeff <<= 1
            whiten_coeff &= 0xFF
            m <<= 1
if __name__ == '__main__':
    main()
To simulate 8-bit unsigned integers, the snippet & 0xFF is used in several places to truncate numbers to the proper size. The bytearray data type is used to store pac since that appears to be the most appropriate storage method in this case. The code still needs documentation to properly understand it.

Primitive Calculator - Dynamic Approach

I'm having some trouble getting the correct solution for the following problem:
Your goal is given a positive integer n, find the minimum number of
operations needed to obtain the number n starting from the number 1.
More specifically the test case I have in the comments below.
# Failed case #3/16: (Wrong answer)
# got: 15 expected: 14
# Input:
# 96234
#
# Your output:
# 15
# 1 2 4 5 10 11 22 66 198 594 1782 5346 16038 16039 32078 96234
# Correct output:
# 14
# 1 3 9 10 11 22 66 198 594 1782 5346 16038 16039 32078 96234
# (Time used: 0.10/5.50, memory used: 8601600/134217728.)
import sys

def optimal_sequence(n):
    sequence = []
    while n >= 1:
        sequence.append(n)
        if n % 3 == 0:
            n = n // 3
            optimal_sequence(n)
        elif n % 2 == 0:
            n = n // 2
            optimal_sequence(n)
        else:
            n = n - 1
            optimal_sequence(n)
    return reversed(sequence)
input = sys.stdin.read()
n = int(input)
sequence = list(optimal_sequence(n))
print(len(sequence) - 1)
for x in sequence:
    print(x, end=' ')
It looks like I should be outputting 9 where I'm outputting 4 & 5 but I'm not sure why this isn't the case. What's the best way to troubleshoot this problem?

You are doing a greedy approach.
When n == 10, you check and see if it's divisible by 2 assuming that's the best step, which is wrong in this case.
What you need to do is proper dynamic programming. v[x] will hold the minimum number of steps to get to result x.
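def solve(n):
    v = [0] * (n + 1)  # so that v[n] is there
    v[1] = 1  # length of the sequence to 1 is 1
    for i in range(1, n):
        if not v[i]:
            continue
        # relax every value reachable from i in one step: i+1, i*2, i*3
        for j in (i + 1, i * 2, i * 3):
            if j <= n and (v[j] == 0 or v[j] > v[i] + 1):
                v[j] = v[i] + 1
    # walk back from n, always moving to a predecessor that is one step shorter
    solution = []
    while n > 1:
        solution.append(n)
        if v[n - 1] == v[n] - 1:
            n = n - 1
        elif n % 2 == 0 and v[n // 2] == v[n] - 1:
            n = n // 2
        else:  # the only remaining option is n // 3
            n = n // 3
    solution.append(1)
    return solution[::-1]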
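To see the greedy failure at n == 10 described in this answer, here is a small sketch of the same greedy rule as in the question (prefer dividing by 3, then by 2, else subtract 1). The function name greedy_sequence is made up for the illustration.

```python
def greedy_sequence(n):
    """Greedy walk from n down to 1: try /3, then /2, else -1."""
    seq = [n]
    while n > 1:
        if n % 3 == 0:
            n //= 3
        elif n % 2 == 0:
            n //= 2
        else:
            n -= 1
        seq.append(n)
    return seq[::-1]


print(greedy_sequence(10))  # [1, 2, 4, 5, 10], i.e. 4 operations
```

The sequence 1 3 9 10 reaches 10 in only 3 operations, so halving 10 first is not the best move. This is the same 4-and-5 versus 9 difference that shows up in the failed test case.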

One more solution:
private static List<Integer> optimal_sequence(int n) {
    List<Integer> sequence = new ArrayList<>();
    int[] arr = new int[n + 1];
    for (int i = 1; i < arr.length; i++) {
        arr[i] = arr[i - 1] + 1;
        if (i % 2 == 0) arr[i] = Math.min(1 + arr[i / 2], arr[i]);
        if (i % 3 == 0) arr[i] = Math.min(1 + arr[i / 3], arr[i]);
    }
    for (int i = n; i > 1; ) {
        sequence.add(i);
        if (arr[i - 1] == arr[i] - 1)
            i = i - 1;
        else if (i % 2 == 0 && (arr[i / 2] == arr[i] - 1))
            i = i / 2;
        else if (i % 3 == 0 && (arr[i / 3] == arr[i] - 1))
            i = i / 3;
    }
    sequence.add(1);
    Collections.reverse(sequence);
    return sequence;
}

List<Integer> sequence = new ArrayList<Integer>();
while (n > 0) {
    sequence.add(n);
    if (n % 3 == 0 && n % 2 == 0)
        n = n / 3;
    else if (n % 3 == 0)
        n = n / 3;
    else if (n % 2 == 0 && n != 10)
        n = n / 2;
    else
        n = n - 1;
}
Collections.reverse(sequence);
return sequence;

Here's my Dynamic programming (bottom-up & memoized)solution to the problem:
public class PrimitiveCalculator {
1. public int minOperations(int n){
2. int[] M = new int[n+1];
3. M[1] = 0; M[2] = 1; M[3] = 1;
4. for(int i = 4; i <= n; i++){
5. M[i] = M[i-1] + 1;
6. M[i] = Math.min(M[i], (i %3 == 0 ? M[i/3] + 1 : (i%3 == 1 ? M[(i-1)/3] + 2 : M[(i-2)/3] + 3)));
7. M[i] = Math.min(M[i], i%2 == 0 ? M[i/2] + 1: M[(i-1)/2] + 2);
8. }
9. return M[n];
10. }
public static void main(String[] args) {
System.out.println(new PrimitiveCalculator().minOperations(96234));
}
}
Before going ahead with the explanation of the algorithm I would like to add a quick disclaimer:
A DP solution is not usually reached at the first attempt unless you have good
experience solving a lot of DP problems.
Approach to solving through DP
If you are not comfortable with DP problems, then the best approach to solving the problem is the following:
Try to get a working brute-force recursive solution.
Once we have a recursive solution, we can look for ways to reduce the recursive work by adding memoization, where we remember the solutions to the smaller subproblems already solved in an earlier recursive step. This is generally a top-down solution.
After memoization, we try to flip the solution around and solve it bottom-up (my Java solution above is a bottom-up one).
Once you have done the above 3 steps, you have reached a DP solution.
Now coming to the explanation of the solution above:
Given a number 'n' and only 3 operations {+1, x2, x3}, the minimum number of operations needed to reach 'n' from 1 is given by the recursive formula:
min_operations_to_reach(n) = 1 + Math.min(min_operations_to_reach(n-1), min_operations_to_reach(n/2), min_operations_to_reach(n/3)), where the n/2 and n/3 terms apply when n is divisible by 2 or 3 (the non-divisible cases are handled below).
If we flip the memoization process around and begin with the number 1 itself, then the above code starts to make better sense.
Starting off with the trivial cases of 1, 2, 3:
min_operations_to_reach(1) = 0 because we don't need to do any operation.
min_operations_to_reach(2) = 1 because we can either do (1 +1) or (1 x2); in either case the number of operations is 1.
Similarly, min_operations_to_reach(3) = 1 because we can multiply 1 by 3, which is one operation.
Now taking any number x > 3, min_operations_to_reach(x) is the minimum of the following 3:
min_operations_to_reach(x-1) + 1, because whatever the minimum number of operations to reach (x-1) is, we can add 1 to it to get the operation count for the number x.
Or, if we consider making the number x from 1 using multiplication by 3, then we have to consider the following 3 options:
If x is divisible by 3, then min_operations_to_reach(x/3) + 1.
If x is not divisible by 3, then x%3 can be 1, in which case it's min_operations_to_reach((x-1)/3) + 2; +2 because one operation is needed to multiply by 3 and another operation is needed to add 1 to make the number 'x'.
Similarly, if x%3 == 2, then the value will be min_operations_to_reach((x-2)/3) + 3; +3 because 1 operation is needed to multiply by 3 and then two more to add 1 twice to make the number x.
Or, if we consider making the number x from 1 using multiplication by 2, then we have to consider the following 2 options:
If x is divisible by 2, then it's min_operations_to_reach(x/2) + 1.
If x%2 == 1, then it's min_operations_to_reach((x-1)/2) + 2.
Taking the minimum of the above 3, we get the minimum number of operations to reach x. That's what is done in the code above in lines 5, 6 and 7.
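The top-down step described earlier in this answer (a recursive solution plus memoization) could look roughly like the Python sketch below. It borrows the name min_operations_to_reach from the explanation and uses functools.lru_cache as the memo table. It is an illustration rather than a submission-ready solution: the recursion on n - 1 goes about n levels deep, so for inputs like 96234 it would hit Python's recursion limit, which is one practical reason to prefer the bottom-up form.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def min_operations_to_reach(n):
    """Minimum number of +1, x2, x3 operations needed to get from 1 to n."""
    if n == 1:
        return 0
    # Option 1: the last operation was +1
    best = min_operations_to_reach(n - 1)
    # Option 2: the last operation was x2 (only possible if n is even)
    if n % 2 == 0:
        best = min(best, min_operations_to_reach(n // 2))
    # Option 3: the last operation was x3 (only possible if n is a multiple of 3)
    if n % 3 == 0:
        best = min(best, min_operations_to_reach(n // 3))
    return best + 1


print(min_operations_to_reach(10))  # 3, e.g. 1 -> 3 -> 9 -> 10
```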

def DPoptimal_sequence(n,operations):
    MinNumOperations=[0]
    l_no=[]
    l_no2=[]
    for i in range(1,n+1):
        MinNumOperations.append(None)
        for operation in operations:
            if operation==1:
                NumOperations=MinNumOperations[i-1]+1
            if operation==2 and i%2==0:
                NumOperations=MinNumOperations[int(i/2)]+1
            if operation==3 and i%3==0:
                NumOperations=MinNumOperations[int(i/3)]+1
            if MinNumOperations[i]==None:
                MinNumOperations[i]=NumOperations
            elif NumOperations<MinNumOperations[i]:
                MinNumOperations[i]=NumOperations
        if MinNumOperations[i] == MinNumOperations[i-1]+1:
            l_no2.append((i,i-1))
        elif MinNumOperations[i] == MinNumOperations[int(i/2)]+1 and i%2 == 0:
            l_no2.append((i,int(i/2)))
        elif MinNumOperations[i] == MinNumOperations[int(i/3)]+1 and i%3 == 0:
            l_no2.append((i,int(i/3)))
        l_no.append((i,MinNumOperations[i]-1))
    x=MinNumOperations[n]-1
    l_no3=[n]
    while n>1:
        a,b = l_no2[n-1]
        if b == a-1:
            n = n-1
            l_no3.append(n)
        elif b == int(a/2) and a%2==0:
            n = int(n/2)
            l_no3.append(n)
        elif b == int(a/3) and a%3==0:
            n = int(n/3)
            l_no3.append(n)
    return x,l_no3

def optimal_sequence(n):
    hop_count = [0] * (n + 1)
    hop_count[1] = 1
    for i in range(2, n + 1):
        indices = [i - 1]
        if i % 2 == 0:
            indices.append(i // 2)
        if i % 3 == 0:
            indices.append(i // 3)
        min_hops = min([hop_count[x] for x in indices])
        hop_count[i] = min_hops + 1
    ptr = n
    optimal_seq = [ptr]
    while ptr != 1:
        candidates = [ptr - 1]
        if ptr % 2 == 0:
            candidates.append(ptr // 2)
        if ptr % 3 == 0:
            candidates.append(ptr // 3)
        ptr = min(
            [(c, hop_count[c]) for c in candidates],
            key=lambda x: x[1]
        )[0]
        optimal_seq.append(ptr)
    return reversed(optimal_seq)

private int count(int n, Map<Integer, Integer> lookup) {
    if(lookup.containsKey(n)) {
        return lookup.get(n);
    }
    if(n==1) {
        return 0;
    } else {
        int result;
        if(n%2==0 && n%3==0) {
            result =1+
                //Math.min(count(n-1, lookup),
                Math.min(count(n/2, lookup),
                         count(n/3, lookup));
        } else if(n%2==0) {
            result = 1+ Math.min(count(n-1, lookup),
                                 count(n/2, lookup));
        } else if(n%3==0) {
            result = 1+ Math.min(count(n-1, lookup), count(n/3, lookup));
        } else {
            result = 1+ count(n-1, lookup);
        }
        //System.out.println(result);
        lookup.put(n, result);
        return result;
    }
}

Why does this algorithm work so much faster in python than in C++?

I was reading "Algorithms in C++" by Robert Sedgewick and I was given this exercise: rewrite this weighted quick-union with path compression by halving algorithm in another programming language.
The algorithm is used to check whether two objects are connected. For example, for input like 1 - 2, 2 - 3 and 1 - 3, the first two entries create new connections, whereas in the third entry 1 and 3 are already connected because 3 can be reached from 1: 1 - 2 - 3. So the third entry would not require creating a new connection.
Sorry if the algorithm description is hard to follow; English is not my mother tongue.
So here is the algorithm itself:
#include <iostream>
#include <cstdlib>
#include <ctime>
using namespace std;
static const int N {100000};
int main()
{
    srand(time(NULL));
    int i;
    int j;
    int id[N];
    int sz[N];    // Stores tree sizes
    int Ncount{}; // Counts the number of new connections
    int Mcount{}; // Counts the number of all attempted connections
    for (i = 0; i < N; i++)
    {
        id[i] = i;
        sz[i] = 1;
    }
    while (Ncount < N - 1)
    {
        i = rand() % N;
        j = rand() % N;
        for (; i != id[i]; i = id[i])
            id[i] = id[id[i]];
        for (; j != id[j]; j = id[j])
            id[j] = id[id[j]];
        Mcount++;
        if (i == j) // Checks if i and j are connected
            continue;
        if (sz[i] < sz[j]) // Smaller tree will be
                           // connected to a bigger one
        {
            id[i] = j;
            sz[j] += sz[i];
        }
        else
        {
            id[j] = i;
            sz[i] += sz[j];
        }
        Ncount++;
    }
    cout << "Mcount: " << Mcount << endl;
    cout << "Ncount: " << Ncount << endl;
    return 0;
}
I know a tiny bit of Python, so I chose it for this exercise. This is what I got:
import random
N = 100000
idList = list(range(0, N))
sz = [1] * N
Ncount = 0
Mcount = 0
while Ncount < N - 1:
    i = random.randrange(0, N)
    j = random.randrange(0, N)
    while i is not idList[i]:
        idList[i] = idList[idList[i]]
        i = idList[i]
    while j is not idList[j]:
        idList[j] = idList[idList[j]]
        j = idList[j]
    Mcount += 1
    if i is j:
        continue
    if sz[i] < sz[j]:
        idList[i] = j
        sz[j] += sz[i]
    else:
        idList[j] = i
        sz[i] += sz[j]
    Ncount += 1
print("Mcount: ", Mcount)
print("Ncount: ", Ncount)
But I stumbled upon this interesting nuance: when I set N to 100000 or more, the C++ version appears to be a lot slower than the Python one. The Python version took about 10 seconds to complete the task, whereas the C++ version was so slow that I just had to shut it down.
So my question is: what is the cause of that? Does this happen because of the difference in rand() % N and random.randrange(0, N)? Or have I just done something wrong?
I'd be very grateful if someone could explain this to me, thanks in advance!

The two programs do different things.
In Python you have to compare numbers with ==, not with is.
>>> x=100000
>>> y=100000
>>> x is y
False
There might be other problems; I haven't checked. Have you compared the results of the two programs?

As pointed out above, the two programs are not equivalent, especially in their use of is versus ==.
Look at the following Python code:
while i is not idList[i]:
    idList[i] = idList[idList[i]]
    i = idList[i]
Because is compares object identity rather than value, whether this loop stops no longer depends on i having reached the root of its tree, so it does not perform the same walk as the C++ loop.
The equivalent C++:
for (; i != id[i]; i = id[i])
    id[i] = id[id[i]];
Here the code compares values, and the loop keeps running until i is the root of its tree.
So yes, using is instead of == makes a huge difference, because in Python is tests whether two names refer to the same object, not whether they hold equal values.
The comparison of Python and C++ above is like comparing apples and pears.
Note: the short answer to the question would be: the Python version runs much faster because it is doing a lot less work than the C++ version.
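For comparison, here is a sketch of the Python version with the identity tests replaced by value comparisons (!= and ==), so that it follows the same logic as the C++ loops. Wrapping it in a function, the names count_connections, parent and size, and the explicit random.Random argument are choices made for this example only, not part of the original code.

```python
import random


def count_connections(n, rng):
    """Weighted quick-union with path halving on n nodes.

    Returns (attempts, new_links): how many random pairs were tried
    and how many of them created a new connection.
    """
    parent = list(range(n))
    size = [1] * n          # tree sizes, only meaningful for roots
    new_links = 0
    attempts = 0
    while new_links < n - 1:
        i = rng.randrange(n)
        j = rng.randrange(n)
        # Find the root of i, halving the path along the way (compare VALUES)
        while i != parent[i]:
            parent[i] = parent[parent[i]]
            i = parent[i]
        while j != parent[j]:
            parent[j] = parent[parent[j]]
            j = parent[j]
        attempts += 1
        if i == j:          # already connected
            continue
        if size[i] < size[j]:   # hang the smaller tree under the bigger one
            parent[i] = j
            size[j] += size[i]
        else:
            parent[j] = i
            size[i] += size[j]
        new_links += 1
    return attempts, new_links


attempts, new_links = count_connections(1000, random.Random(0))
print("Mcount: ", attempts)
print("Ncount: ", new_links)
```

With value comparisons, the number of attempts is normally well above the number of new connections, because many random pairs turn out to be connected already.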

Python: read integers from a stream step by step

In Python I can write:
n = int(input())
a = [int(x) for x in input().split()]
In C++ I can write:
int main()
{
    int n, x;
    cin >> n;
    for (int i = 0; i < n; i++)
    {
        cin >> x;
        somthing(x);
    }
}
How can I write the same thing in Python (3.x)? Can I process the numbers as they arrive from the stream, without saving them all in a list?
Input data (for example):
6
1 4 4 4 1 1
Can I use sys.stdin?
UPD:
Ok, I wrote this:
import sys
n = int(input())
i = 0
c = ""
s = ""
while i < n:
    c = sys.stdin.read(1)
    if c in [" ","\n"]:
        x = int(s)
        somthing(x)
        s = ""
        i += 1
    else:
        s += c
Is there a more elegant solution?

Python doesn't special-case such a specific form of input for you. By default, the input() function reads one line from the input (delimited by a newline character) and returns it as a string.
You'll have to use split() to separate the values.
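If the goal is just to avoid building a list of integers, one possible approach is a small generator that walks the stream line by line and yields one integer at a time. This is only a sketch: iter_ints is a name invented here, and each line is still read as a whole string before being split. The demo feeds it an io.StringIO as a stand-in for sys.stdin.

```python
import io


def iter_ints(stream):
    """Yield whitespace-separated integers from a text stream, one at a time."""
    for line in stream:
        for token in line.split():
            yield int(token)


# io.StringIO stands in for sys.stdin here; pass sys.stdin for real input.
numbers = iter_ints(io.StringIO("6\n1 4 4 4 1 1\n"))
n = next(numbers)
for _ in range(n):
    x = next(numbers)
    print(x)  # replace with whatever should be done with each number
```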

Need help porting C function to Python

I'm trying to port a C function which calculates a GPS checksum over to Python. According to the receiving end I am sometimes miscalculating the checksum, so must still have a bug in there.
C code is
void ComputeAsciiChecksum(unsigned char *data, unsigned int len,
                          unsigned char *p1, unsigned char *p2)
{
    unsigned char c,h,l;
    assert(Stack_Low());
    c = 0;
    while (len--) {
        c ^= *data++;
    }
    h = (c>>4);
    l = c & 0xf;
    h += '0';
    if (h > '9') {
        h += 'A'-'9'-1;
    }
    l += '0';
    if (l > '9') {
        l += 'A'-'9'-1;
    }
    *p1 = h;
    *p2 = l;
}
My attempt at a Python function is
def calcChecksum(line):
    c = 0
    i = 0
    while i < len(line):
        c ^= ord(line[i]) % 256
        i += 1
    return '%02X' % c;

Here is how you can set up a testing environment to diagnose your problem.
Copy the above C function to a file, remove the assert() line, and compile it to a shared library with
gcc -shared -o checksum.so checksum.c
(If you are on Windows or whatever, do the equivalent of the above.)
Copy this code to a Python file:
import ctypes
import random
c = ctypes.CDLL("./checksum.so")
# the C function returns void; the ctypes attribute is spelled "restype"
c.ComputeAsciiChecksum.restype = None
c.ComputeAsciiChecksum.argtypes = [ctypes.c_char_p, ctypes.c_uint,
                                   ctypes.c_char_p, ctypes.c_char_p]
def compute_ascii_checksum_c(line):
    p1 = ctypes.create_string_buffer(1)
    p2 = ctypes.create_string_buffer(1)
    c.ComputeAsciiChecksum(line, len(line), p1, p2)
    return p1.value + p2.value
def compute_ascii_checksum_py(line):
    c = 0
    i = 0
    while i < len(line):
        c ^= ord(line[i]) % 256
        i += 1
    return '%02X' % c;
Now you have access to both versions of the checksum function and can compare the results. I wasn't able to find any differences.
(BTW, how are you computing the length of the string in C? If you are using strlen(), this would stop at NUL bytes.)
As a side note, your Python version isn't really idiomatic Python. Here are two more idiomatic versions:
def compute_ascii_checksum_py(line):
    checksum = 0
    for c in line:
        checksum ^= ord(c)
    return "%02X" % checksum
or
import operator
from functools import reduce
def compute_ascii_checksum_py(line):
    return "%02X" % reduce(operator.xor, map(ord, line))
Note that these implementations should do exactly the same as yours.
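The versions above take a text string and call ord() on each character. If the data is handled as bytes in Python 3, iterating over it already yields integers, so a variant along the following lines could be used. This is a sketch under that assumption, and xor_checksum_hex is a made-up name.

```python
import operator
from functools import reduce


def xor_checksum_hex(payload: bytes) -> str:
    """XOR all bytes together and format the result as two uppercase hex digits."""
    # Iterating over bytes yields ints in Python 3; the initial value 0
    # makes an empty payload produce "00" instead of raising TypeError.
    return "%02X" % reduce(operator.xor, payload, 0)


print(xor_checksum_hex(b"AB"))  # 03, since 0x41 ^ 0x42 == 0x03
```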

Have you checked out this cookbook recipe? It hints at what input you should include in "line", returns an asterisk in front of the checksum, and gives one (input, output) data pair that you can use as test data.
Are you sure that "the receiver" is working correctly? Is the problem due to upper vs lower case hex letters?
