My goal is to translate the following C code into Python so I can evaluate the loop over a string of digits. The original loops over the first nine characters, converts each digit character to its numeric value, multiplies it by its (1-based) position, sums those products, and finally returns the remainder after dividing by 11.
int checksum(char *str) {
    int i, sum = 0;
    for (i = 0; i < 9; i++) {
        sum += (str[i] - '0') * (i + 1);
    }
    return sum % 11;
}
A direct (but not very idiomatic) translation of the code would look as follows; notice that I renamed some of the variables to avoid clashes with Python's built-ins:
def checksum(s):
    c = 0
    for i in range(9):
        c += int(s[i]) * (i + 1)
    return c % 11
Another more Pythonic option would be to use generator expressions instead of explicit loops:
def checksum(s):
    return sum(int(e) * i for i, e in enumerate(s[:9], 1)) % 11
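For a quick sanity check (using a made-up 9-digit string, so this is just an illustration), both versions give the same result:

print(checksum("123456789"))  # 1*1 + 2*2 + ... + 9*9 = 285, and 285 % 11 == 10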
Given an excellent solution to the question "Cartesian product in Gray code order with itertools?", is there a simple way to extend that solution so it also reports the set (by its index) that changed in going from one element of the Cartesian product to the next? That is, a gray_code_product_with_change(['a','b','c'], [0,1], ['x','y']) that would produce something like:
(('a',0,'x'), -1)
(('a',0,'y'), 2)
(('a',1,'y'), 1)
(('a',1,'x'), 2)
(('b',1,'x'), 0)
(('b',1,'y'), 2)
(('b',0,'y'), 1)
(('b',0,'x'), 2)
(('c',0,'x'), 0)
(('c',0,'y'), 2)
(('c',1,'y'), 1)
(('c',1,'x'), 2)
I want to avoid taking the "difference" between consecutive tuples and instead have constant-time updates --- hence the Gray code order in the first place. One solution could be to write an index_changed iterator, i.e., index_changed(3,2,2) would return the sequence -1,2,1,2,0,2,1,2,0,2,1,2 that I want, but can something even simpler be added to the solution above to achieve the same result?
There are several things wrong with this question, but I'll keep it as is rather than make it worse by turning it into a "chameleon question".
Indeed, why even ask for the elements of the Cartesian product in Gray code order when you already have this "index changed" sequence? So I suppose what I was really looking for was an efficient computation of this sequence. I ended up implementing the above-mentioned gray_code_product_with_change, which takes a base set of sets, e.g., ['a','b','c'], [0,1], ['x','y'], computes this "index changed" sequence, and updates the base set of sets as it moves through the sequence. Since the implementation ended up being more interesting than I expected, I figured I would share it, should someone find it useful:
(Disclaimer: probably not the most pythonic code, rather almost C-like)
def gray_code_product_with_change(*args, repeat=1):
    sets = args * repeat
    s = [len(x) - 1 for x in sets]
    n = len(s)
    # set up the parity array and the first combination
    p = n * [True]  # True: move forward (False: move backward)
    c = n * [0]     # initial combo: all 0's (first element of each set)
    # emit the first combination
    yield tuple(sets[i][x] for i, x in enumerate(c))
    # incrementally update the combination in Gray code order
    has_next = True
    while has_next:
        # look for the rightmost index that can be incremented/decremented
        has_next = False
        for j in range(n - 1, -1, -1):
            if p[j]:  # currently moving forward..
                if c[j] < s[j]:
                    c[j] += 1
                    has_next = True
                    # emit the affected set (forward direction)
                    yield j
            else:  # ..moving backward
                if c[j] > 0:
                    c[j] -= 1
                    has_next = True
                    # emit the affected set (reverse direction)
                    yield -j
            # we did manage to increment/decrement at position j..
            if has_next:
                # emit the combination
                yield tuple(sets[i][x] for i, x in enumerate(c))
                for q in range(n - 1, j, -1):  # cascade the parities
                    p[q] = not p[q]
                break
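For reference, here is one way to consume the generator; it yields the first combination, then alternates between a change index and the combination that follows it (the arguments are just the example from the question):

g = gray_code_product_with_change(['a', 'b', 'c'], [0, 1], ['x', 'y'])
print(next(g), -1)               # first combination, no previous change
for change, combo in zip(g, g):  # remaining output alternates: change index, then combination
    print(combo, change)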
Trying to tease out as much performance as I could in just computing this sequence --- since the number of elements in the Cartesian product of a set of sets grows exponentially with the number of sets (of size 2 or more) --- I implemented this in C. What it essentially does is implement the above-mentioned index_changed (using a slightly different notation):
(Disclaimer: there is much room for optimization here)
void gray_code_sequence(int s[], int n) {
    // set up the parity array
    int p[n];
    for (int i = 0; i < n; ++i) {
        p[i] = 1; // 1: move forward (-1: move backward)
    }
    // initialize / emit the first combination
    int c[n];
    printf("(");
    for (int i = 0; i < n - 1; ++i) {
        c[i] = 0; // initial combo: all 0's (first element of each set)
        printf("%d, ", c[i]); // emit the first combination
    }
    c[n - 1] = 0;
    printf("%d)\n", c[n - 1]);
    int has_next = 1;
    while (has_next) {
        // look for the rightmost index that can be incremented/decremented
        has_next = 0;
        for (int j = n - 1; j >= 0; --j) {
            if (p[j] > 0) { // currently moving forward..
                if (c[j] < s[j]) {
                    c[j] += 1;
                    has_next = 1;
                    printf("%d\n", j);
                }
            }
            else { // ..moving backward
                if (c[j] > 0) {
                    c[j] -= 1;
                    has_next = 1;
                    printf("%d\n", -j);
                }
            }
            if (has_next) {
                for (int q = n - 1; q > j; --q) {
                    p[q] = -1 * p[q]; // cascade the parities
                }
                break;
            }
        }
    }
}
When compared to the above Python (with the yielding of the Cartesian product elements suppressed, so that only the elements of the sequence are yielded and the output is essentially the same, for a fair comparison), this C implementation seems to be about 15 times as fast, asymptotically.
Again, this C code could be optimized much further (the irony that the Python code is so C-like is duly noted): for example, the parity array could be stored in a single int, manipulated with bit-shift (>>) operations, etc., so I bet that even a 30x or 40x speedup could be achieved.
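To illustrate that idea (a hypothetical sketch in Python, not the C optimization itself): the parity flags can be packed into the bits of a single integer, so the cascade becomes one XOR with a mask instead of a loop:

n, j = 4, 1                                  # example values, chosen arbitrarily
p = (1 << n) - 1                             # bit q set means index q is moving forward
forward = (p >> j) & 1                       # test the parity of index j instead of p[j]
p ^= ((1 << n) - 1) ^ ((1 << (j + 1)) - 1)   # cascade: flip the parity bits of all q > j
print(bin(p))                                # 0b11, i.e. bits 2 and 3 have been flipped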
I want to find the element that occurs exactly once in a list in which every other element occurs more than once. For example, if L = [1,4,2,6,4,3,2,6,3], then we want 1 as the unique element. Here's pseudocode of what I had in mind:
initialize a dictionary to store the number of occurrences of each element: ~O(n),
look through the dictionary to find the element whose count is 1: ~O(n)
This keeps the total time complexity at O(n). Does this seem like the right idea?
Also, if the array were sorted, how would the time complexity change? I'm thinking some variation of binary search could reduce it to O(log n).
You can use collections.Counter
from collections import Counter
uniques = [k for k, cnt in Counter(L).items() if cnt == 1]
Complexity will always be O(n). You only ever need to traverse the list once (which is what Counter is doing). Sorting doesn't matter, since dictionary insertion is O(1) on average.
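The same idea without Counter, using a plain dict, might look like this (a sketch, using the example list from the question):

L = [1, 4, 2, 6, 4, 3, 2, 6, 3]
counts = {}
for x in L:
    counts[x] = counts.get(x, 0) + 1                     # first pass: count occurrences
uniques = [k for k, cnt in counts.items() if cnt == 1]   # second pass: keep the count-1 elements
print(uniques)                                           # [1]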
There is a very simple-looking solution that is O(n): XOR the elements of your sequence together using the ^ operator. The end value of the accumulator will be the value of the unique number.
The proof is simple: XOR-ing a number with itself yields zero, so every number that appears an even number of times cancels itself out, and XOR-ing the remaining unique number with zero yields the number itself.
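A minimal sketch of this in Python (it assumes every value except one appears an even number of times, as in the example):

from functools import reduce
from operator import xor

L = [1, 4, 2, 6, 4, 3, 2, 6, 3]
print(reduce(xor, L))   # 1 -- all the paired values cancel out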
Your outlined algorithm is basically correct, and it's what the Counter-based solution by @BrendanAbel does. I encourage you to implement the algorithm yourself without Counter as a good exercise.
You can't beat O(n) even if the array is sorted (unless the array is sorted by the number of occurrences!). The unique element could be anywhere in the array, and until you find it, you can't narrow down the search space (unlike binary search, where you can eliminate half of the remaining possibilities with each test).
In the general case, where duplicates can be present any number of times, I don't think you can reduce the complexity below O(N), but for the special case outlined in dasblinkenlight's answer, one can do better.
If the array is already sorted and duplicates are present an even number of times, as is the case in the simple example shown, you can find the unique element in O(log N) time with a binary search: look for the smallest position n where a[2*n] != a[2*n+1]:
size_t find_unique_index(const int *array, size_t size) {
    size_t a = 0, b = size / 2;
    while (a < b) {
        size_t m = (a + b) / 2;
        if (array[2 * m] == array[2 * m + 1]) {
            /* the unique element is in the right half */
            a = m + 1;
        } else {
            b = m;
        }
    }
    return 2 * a;  /* the unique element is array[2 * a] */
}
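The same idea in Python might look like this (a sketch, assuming a sorted list in which every duplicate occurs an even number of times):

def find_unique(a):
    lo, hi = 0, len(a) // 2
    while lo < hi:
        mid = (lo + hi) // 2
        if a[2 * mid] == a[2 * mid + 1]:
            lo = mid + 1          # the unique element is in the right half
        else:
            hi = mid
    return a[2 * lo]

print(find_unique([1, 2, 2, 3, 3, 4, 4, 6, 6]))   # 1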
You can use a variation of binary search if the array is already sorted. It will reduce your cost to O(lg N): you just have to find the leftmost and rightmost positions of the key. Here is a C implementation (I am assuming the array is already sorted):
#include <stdio.h>
#include <stdlib.h>

// Input: index range [l ... r)
// Invariant: A[l] <= key and A[r] > key
int GetRightPosition(int A[], int l, int r, int key)
{
    int m;
    while (r - l > 1)
    {
        m = l + (r - l) / 2;
        if (A[m] <= key)
            l = m;
        else
            r = m;
    }
    return l;
}

// Input: index range (l ... r]
// Invariant: A[l] < key and A[r] >= key
int GetLeftPosition(int A[], int l, int r, int key)
{
    int m;
    while (r - l > 1)
    {
        m = l + (r - l) / 2;
        if (A[m] >= key)
            r = m;
        else
            l = m;
    }
    return r;
}

int CountOccurances(int A[], int size, int key)
{
    // Observe the boundary conditions
    int left = GetLeftPosition(A, 0, size, key);
    int right = GetRightPosition(A, 0, size, key);
    return (A[left] == key && key == A[right]) ?
           (right - left + 1) : 0;
}

int main() {
    int arr[] = {1, 1, 1, 2, 2, 2, 3};
    printf("%d", CountOccurances(arr, 7, 2));
    return 0;
}
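In Python, the same left/right boundary searches are available through the bisect module, so counting the occurrences of a key in a sorted list in O(log n) is a one-liner (a sketch):

from bisect import bisect_left, bisect_right

def count_occurrences(A, key):
    # number of elements equal to key in the sorted list A
    return bisect_right(A, key) - bisect_left(A, key)

print(count_occurrences([1, 1, 1, 2, 2, 2, 3], 2))   # 3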
I was reading "Algorithms in C++" by Robert Sedgewick and I was given this exercise: rewrite this weighted quick-union with path compression by halving algorithm in another programming language.
The algorithm is used to check whether two objects are connected. For example, for entries like 1 - 2, 2 - 3 and 1 - 3, the first two entries create new connections, whereas in the third entry 1 and 3 are already connected, because 3 can be reached from 1 via 1 - 2 - 3, so the third entry does not require creating a new connection.
Sorry if the algorithm description is not understandable; English is not my mother tongue.
So here is the algorithm itself:
#include <iostream>
#include <cstdlib>
#include <ctime>
using namespace std;

static const int N {100000};

int main()
{
    srand(time(NULL));
    int i;
    int j;
    int id[N];
    int sz[N];     // Stores tree sizes
    int Ncount{};  // Counts the number of new connections
    int Mcount{};  // Counts the number of all attempted connections
    for (i = 0; i < N; i++)
    {
        id[i] = i;
        sz[i] = 1;
    }
    while (Ncount < N - 1)
    {
        i = rand() % N;
        j = rand() % N;
        for (; i != id[i]; i = id[i])
            id[i] = id[id[i]];
        for (; j != id[j]; j = id[j])
            id[j] = id[id[j]];
        Mcount++;
        if (i == j) // Checks if i and j are connected
            continue;
        if (sz[i] < sz[j]) // The smaller tree will be
                           // connected to the bigger one
        {
            id[i] = j;
            sz[j] += sz[i];
        }
        else
        {
            id[j] = i;
            sz[i] += sz[j];
        }
        Ncount++;
    }
    cout << "Mcount: " << Mcount << endl;
    cout << "Ncount: " << Ncount << endl;
    return 0;
}
I know a tiny bit of Python, so I chose it for this exercise. This is what I got:
import random

N = 100000
idList = list(range(0, N))
sz = [1] * N
Ncount = 0
Mcount = 0

while Ncount < N - 1:
    i = random.randrange(0, N)
    j = random.randrange(0, N)
    while i is not idList[i]:
        idList[i] = idList[idList[i]]
        i = idList[i]
    while j is not idList[j]:
        idList[j] = idList[idList[j]]
        j = idList[j]
    Mcount += 1
    if i is j:
        continue
    if sz[i] < sz[j]:
        idList[i] = j
        sz[j] += sz[i]
    else:
        idList[j] = i
        sz[i] += sz[j]
    Ncount += 1

print("Mcount: ", Mcount)
print("Ncount: ", Ncount)
But I stumbled upon an interesting nuance: when I set N to 100000 or more, the C++ version appears to be a lot slower than the Python one. The Python version took about 10 seconds to complete the task, whereas the C++ version was so slow I just had to shut it down.
So my question is: what is the cause of that? Does this happen because of the difference in rand() % N and random.randrange(0, N)? Or have I just done something wrong?
I'd be very grateful if someone could explain this to me, thanks in advance!
Those two pieces of code do different things.
You have to compare numbers in Python with ==.
>>> x=100000
>>> y=100000
>>> x is y
False
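whereas == compares values:
>>> x == y
True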
There might be other problems; I haven't checked. Have you compared the results of the two programs?
As pointed out above, the two programs are not equivalent, especially when it comes to the use of is vs. ==.
Look at the following Python code:
while i is not idList[i]:
    idList[i] = idList[idList[i]]
    i = idList[i]
This loop body runs 0 or 1 times. Why? Because if the while condition evaluates to True the first time, then i = idList[i] rebinds i to the very object stored in idList, so on the second pass i is idList[i] holds and the loop exits.
The equivalent C++:
for (; i != id[i]; i = id[i])
    id[i] = id[id[i]];
Here the code checks for value equality, not object identity, and the number of times it runs is not limited to 0 or 1.
So yes... using is instead of == makes a huge difference, because in Python is tests object identity (whether two names refer to the same object) rather than value equality.
The comparison of Python and C++ above is like comparing apples and pears.
Note: the short answer to the question would be that the Python version runs much faster because it is doing a lot less work than the C++ version.
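For completeness, the inner find loops written with == (matching the C++ behaviour) would be:

while i != idList[i]:
    idList[i] = idList[idList[i]]
    i = idList[i]

and likewise for j (and i == j for the connectivity test); with that change the Python version does the same amount of work as the C++ version.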
I have a working algorithm in Python which I want to convert to C++:
def gcd(a, b):
    if (a % b == 0):
        return b
    else:
        return gcd(b, a % b)

def solution(N, M):
    lcm = N * M / gcd(N, M)
    return lcm / M
I'm having a problem with large input values, as the product of N and M causes an integer overflow, and using long to store its value doesn't seem to help, unless I'm doing something wrong.
Here's my current code:
int gcd(int a, int b)
{
    if (a % b == 0)
        return b;
    else
        return gcd(b, a % b);
}

int solution(int N, int M) {
    // Calculate the greatest common divisor
    int g = gcd(N, M);
    // Calculate the least common multiple
    long m = N * M;
    int lcm = m / g;
    return lcm / M;
}
You are computing g=gcd(N,M), then m=N*M, then lcm=m/g, and finally returning lcm/M. That's the same as returning N/gcd(N,M). You don't need those intermediate calculations. Get rid of them. Now there's no problem with overflow (unless M=0, that is, which you aren't protecting against).
int solution(int N, int M) {
    if (M == 0) {
        handle_error();
    }
    else {
        return N / gcd(N, M);
    }
}
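For comparison, the same simplification in Python (where integers are arbitrary precision, so overflow is not a concern; math.gcd requires Python 3.5+) would be just:

from math import gcd

def solution(N, M):
    return N // gcd(N, M)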
To begin with, change:
long m = N * M;
int lcm = m / g;
To:
long long m = (long long)N * M;
int lcm = (int)(m / g);
In general, you might as well change every int in your code to unsigned long long...
But if you have some BigInt class at hand, then you might want to use it instead.
Here is one for free: http://planet-source-code.com/vb/scripts/ShowCode.asp?txtCodeId=9735&lngWId=3
It stores a natural number of any conceivable size, and supports all arithmetic operators provided in C++.
The problem is in long m = N*M.
The multiplication is carried out in 32-bit integer arithmetic: since both operands are of type int, the result overflows before it is assigned to the long.
The correction is long long m = (long long)N * M;
I'm trying to port a C function which calculates a GPS checksum over to Python. According to the receiving end I am sometimes miscalculating the checksum, so I must still have a bug in there.
The C code is:
void ComputeAsciiChecksum(unsigned char *data, unsigned int len,
                          unsigned char *p1, unsigned char *p2)
{
    unsigned char c, h, l;

    assert(Stack_Low());

    c = 0;
    while (len--) {
        c ^= *data++;
    }

    h = (c >> 4);
    l = c & 0xf;

    h += '0';
    if (h > '9') {
        h += 'A' - '9' - 1;
    }

    l += '0';
    if (l > '9') {
        l += 'A' - '9' - 1;
    }

    *p1 = h;
    *p2 = l;
}
My attempt at a Python function is
def calcChecksum(line):
    c = 0
    i = 0
    while i < len(line):
        c ^= ord(line[i]) % 256
        i += 1
    return '%02X' % c
Here is how you can set up a testing environment to diagnose your problem.
Copy the above C function to a file, remove the assert() line, and compile it to a shared library with
gcc -shared -fPIC -o checksum.so checksum.c
(If you are on Windows or whatever, do the equivalent of the above.)
Copy this code to a Python file:
import ctypes
import random

c = ctypes.CDLL("./checksum.so")
c.ComputeAsciiChecksum.restype = None
c.ComputeAsciiChecksum.argtypes = [ctypes.c_char_p, ctypes.c_uint,
                                   ctypes.c_char_p, ctypes.c_char_p]

def compute_ascii_checksum_c(line):
    p1 = ctypes.create_string_buffer(1)
    p2 = ctypes.create_string_buffer(1)
    c.ComputeAsciiChecksum(line, len(line), p1, p2)
    return p1.value + p2.value

def compute_ascii_checksum_py(line):
    c = 0
    i = 0
    while i < len(line):
        c ^= ord(line[i]) % 256
        i += 1
    return '%02X' % c
Now you have access to both versions of the checksum function and can compare the results. I wasn't able to find any differences.
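For example, a quick randomized comparison might look like this (a sketch; it assumes Python 3, ASCII-only test data, and the two functions defined above):

import string

for _ in range(10000):
    n = random.randint(1, 80)
    line = ''.join(random.choice(string.ascii_letters + string.digits + '$,.*') for _ in range(n))
    expected = compute_ascii_checksum_py(line)
    got = compute_ascii_checksum_c(line.encode('ascii')).decode('ascii')
    assert got == expected, (line, got, expected)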
(BTW, how are you computing the length of the string in C? If you are using strlen(), this would stop at NUL bytes.)
As a side note, your Python version isn't really idiomatic Python. Here are two more idiomatic versions:
def compute_ascii_checksum_py(line):
    checksum = 0
    for c in line:
        checksum ^= ord(c)
    return "%02X" % checksum
or
import operator
from functools import reduce  # reduce is not a builtin in Python 3

def compute_ascii_checksum_py(line):
    return "%02X" % reduce(operator.xor, map(ord, line))
Note that these implementations should do exactly the same as yours.
Have you checked out this cookbook recipe? It hints at what input you should include in "line", returns an asterisk in front of the checksum, and gives one (input, output) data pair that you can use as test data.
Are you sure that "the receiver" is working correctly? Is the problem due to upper vs lower case hex letters?