Translating a C++ for loop to Python

I'm having a little trouble translating C++ to Python. The trouble I have is with the boolean conditions in the for loops for (Nbound = 1; Nbound < (Nobs + 1) && B < Beta; Nbound++) and for (Ndm = 0; (Ndm < (i + 1) && P3 > 0) || PP == 0; Ndm++). I'm unsure how this would work in Python; I don't think Python allows boolean conditions in a for loop, so I think I would have to check them inside the loop with an if statement, but I'm not entirely sure. Thanks for your help!
Also, I've noticed a lot of uninitialized variables in this code, for example float PP. Is there a way of doing this in Python, or would I just assign it a value of 0 and then change it later?
float Pf = 0;      // The complement of Beta
float B = 0;       // Beta
float P3;
float PP;
float Nbound = 1;
for (Nbound = 1; Nbound < (Nobs + 1) && B < Beta; Nbound++) {
    int Ndm = 0;
    int Nbgd = Nobs;  // Setting Nbgd = Nobs
    Pf = 0;           // Zeroing the placeholder for the sum
    float exp;        // A variable to store the exponential
    for (int i = 0; i < (Nobs + 1); i++)  // Summing over Nbgd + Ndm < Nobs
    {
        P3 = 1;
        PP = 0;
        if (P1[Nbgd] > 0) {
            for (Ndm = 0; (Ndm < (i + 1) && P3 > 0) || PP == 0; Ndm++) {
                //P3 = dist(Ndm, Nbound);
                Pf = Pf + (P1[Nbgd] * P3);  // Summing over the probability
                PP = PP + P3;
            }
        }
    }
}

For loops in Python are meant for iteration over objects. If you want a loop with a specific exit condition, you should use a while loop.
A for loop in C can be described as:
for (initialization_statement; condition_expression; update_statement)
{
    body_statement_1;
    body_statement_2;
    ...
    body_statement_n;
}
The corresponding loop in Python is:
initialization_statement
while condition_expression:
    body_statement_1
    body_statement_2
    ...
    body_statement_n
    update_statement
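Applied to the loops in the question, a rough sketch (assuming Nobs, Beta, P1 and dist are defined elsewhere, as in the original C++) could look like this; the compound conditions become while conditions, and the "empty" C++ floats become None until first assigned:
Pf = 0.0                         # the complement of Beta
B = 0.0                          # Beta
P3 = None                        # Python has no uninitialized variables:
PP = None                        # use None (or 0.0) and assign before use

Nbound = 1
while Nbound < Nobs + 1 and B < Beta:
    Nbgd = Nobs                  # setting Nbgd = Nobs
    Pf = 0.0                     # zeroing the placeholder for the sum
    for i in range(Nobs + 1):    # summing over Nbgd + Ndm < Nobs
        P3 = 1.0
        PP = 0.0
        if P1[Nbgd] > 0:
            Ndm = 0
            while (Ndm < i + 1 and P3 > 0) or PP == 0:
                # P3 = dist(Ndm, Nbound)
                Pf += P1[Nbgd] * P3
                PP += P3
                Ndm += 1
    Nbound += 1                  # B is never updated here, exactly as in the C++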


Is there an efficient algorithm to compute the Jacobsthal matrix or quadratic character in GF(q)?

Is there an efficient algorithm to compute the Jacobsthal matrix [WP], or equivalently the quadratic character χ in GF(q),
J[i, j] = χ(i − j) = 0 if i = j, else 1 if i − j is a square in GF(q), else −1,
where i, j run over the elements of GF(q)?
The ordering of the elements (i.e., of the rows/columns) does not really matter, so the task is mainly to know whether a given element of GF(q) is a square.
Unfortunately, when q = p^n with n > 1, one cannot just take i, j ∈ Z/qZ (which works well iff q is prime, i.e., n = 1).
On the other hand, implementing arithmetic in GF(q) appears to be a nontrivial task, at least done the naive way (constructing an irreducible polynomial P of degree n over Z/pZ and implementing multiplication as multiplication of polynomials modulo P...).
The problem is easily solved in Python using the galois package (see here), but this is quite heavy artillery which I'd like to avoid deploying.
Of course, dedicated number theory software may also have GF arithmetic implemented. But I need this just to produce Hadamard matrices through the Paley construction [WP], so I'd like to be able to compute it without sophisticated software (and in any case I think it would be interesting to know whether there's a simple algorithm for this).
Since we only need to know which elements are squares, I hoped there might be an efficient way to determine that.
EDIT: Let me clarify again that the question is whether there exists an efficient way of implementing this function (for arbitrary q = p^k) without implementing general arithmetic in GF(q). It's not difficult to solve the problem using dedicated software: for example, Python's galois package provides the is_quadratic_residue() function, which immediately gives the matrix elements. (In spite of its name: quadratic residues mod p^k aren't the same as squares in GF(p^k). Indeed, default modular arithmetic, i.e., issquare(Mod(i-j, p^k)), will usually yield incorrect results when k > 1. For example, in GF(2^k) every element is a square, but 2 and 3 aren't squares mod 2^2.) A crude check is to compute J J^T, which should equal qI − U (for p > 2), where U is the all-ones matrix.
Here is a basic math table setup for GF(3^4) based on the polynomial 1 x^4 + 1 x^3 + 1 x^2 + 2 x + 2. At the end of this answer is a brute force search for any primitive polynomial (one for which the powers of x map to all non-zero elements). Numbers are stored as integer equivalents; for example, x^2 + 2 x + 1 = 1*(3^2) + 2*(3) + 1 = 16, so I store this as 16. Add and subtract just map from integer to vector and back. Multiply and divide use exp and log tables. The exp table is generated by taking powers of alpha = x (stored as the integer 3); the log table is the reverse-mapped exp table. InitGF initializes the exp table using GFMpyA (multiply by alpha == multiply by x). The worked example below starts at the integer 27 = 1 x^3 and multiplies by x twice, showing the long hand division of
ex = e0 * x modulo polynomial
1 q = 1 = quotient
-----------
1 1 1 2 2 | 1 0 0 0 0 poly | ex
1 1 1 2 2 poly * q
---------
2 2 1 1 remainder
2 q = 2 = quotient
-----------
1 1 1 2 2 | 2 2 1 1 0 poly | ex
2 2 2 1 1 poly * 2
---------
0 2 0 2 remainder
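Before the C code, here is a small pure-Python sketch of the same reduction (the names are mine, not from the original answer; it mirrors the GFMpyA routine below) that reproduces the two remainders above:
GFS = 3                          # characteristic
POLY = [1, 1, 1, 2, 2]           # x^4 + x^3 + x^2 + 2x + 2

def to_digits(e, n=4):
    # integer encoding -> base-3 coefficient list, highest degree first
    digits = []
    for _ in range(n):
        digits.append(e % GFS)
        e //= GFS
    return digits[::-1]

def from_digits(digits):
    v = 0
    for d in digits:
        v = v * GFS + d
    return v

def mul_by_x(e):
    # multiply by alpha = x, then reduce modulo POLY (cf. GFMpyA)
    ex = to_digits(e) + [0]                           # shift = multiply by x
    q = ex[0]                                         # leading digit = quotient
    ex = [(c - q * p) % GFS for c, p in zip(ex, POLY)]
    return from_digits(ex[1:])

print(to_digits(mul_by_x(27)))              # [2, 2, 1, 1], the first remainder
print(to_digits(mul_by_x(mul_by_x(27))))    # [0, 2, 0, 2], the second remainder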
Basic math code with initialization:
typedef unsigned char BYTE;

/* GF(3) base field size */
#define GFS 3
/* GF(3^4) */
#define GF 81
/* alpha = 1x + 0 */
#define ALPHA 3

typedef struct{     /* element of field */
    int d;          /* = dx^3 + cx^2 + bx + a */
    int c;
    int b;
    int a;
}ELEM;

typedef struct{     /* extended element of field */
    int e;          /* = ex^4 + dx^3 + cx^2 + bx + a */
    int d;
    int c;
    int b;
    int a;
}ELEMX;

/* global tables and polynomial (declarations not shown in the original snippet) */
static ELEM aiI2E[GF];      /* index-to-element table */
static int aiExp[2*GF];     /* antilog table */
static int aiLog[GF];       /* log table */
static struct{int size; int data[5];}pGFPoly;

/*----------------------------------------------------------------------*/
/*      GFAdd(i0, i1)                                                   */
/*----------------------------------------------------------------------*/
static int GFAdd(int i0, int i1)
{
    ELEM e0, e1;
    e0 = aiI2E[i0];
    e1 = aiI2E[i1];
    e0.d = (e0.d + e1.d);
    if(e0.d >= GFS)e0.d -= GFS;
    e0.c = (e0.c + e1.c);
    if(e0.c >= GFS)e0.c -= GFS;
    e0.b = (e0.b + e1.b);
    if(e0.b >= GFS)e0.b -= GFS;
    e0.a = (e0.a + e1.a);
    if(e0.a >= GFS)e0.a -= GFS;
    return (((((e0.d*GFS)+e0.c)*GFS)+e0.b)*GFS)+e0.a;
}

/*----------------------------------------------------------------------*/
/*      GFSub(i0, i1)                                                   */
/*----------------------------------------------------------------------*/
static int GFSub(int i0, int i1)
{
    ELEM e0, e1;
    e0 = aiI2E[i0];
    e1 = aiI2E[i1];
    e0.d = (e0.d - e1.d);
    if(e0.d < 0)e0.d += GFS;
    e0.c = (e0.c - e1.c);
    if(e0.c < 0)e0.c += GFS;
    e0.b = (e0.b - e1.b);
    if(e0.b < 0)e0.b += GFS;
    e0.a = (e0.a - e1.a);
    if(e0.a < 0)e0.a += GFS;
    return (((((e0.d*GFS)+e0.c)*GFS)+e0.b)*GFS)+e0.a;
}

/*----------------------------------------------------------------------*/
/*      GFMpy(i0, i1)        i0*i1 using logs                           */
/*----------------------------------------------------------------------*/
static int GFMpy(int i0, int i1)
{
    if(i0 == 0 || i1 == 0)
        return(0);
    return(aiExp[aiLog[i0]+aiLog[i1]]);
}

/*----------------------------------------------------------------------*/
/*      GFDiv(i0, i1)        i0/i1                                      */
/*----------------------------------------------------------------------*/
static int GFDiv(int i0, int i1)
{
    if(i0 == 0)
        return(0);
    return(aiExp[(GF-1)+aiLog[i0]-aiLog[i1]]);
}

/*----------------------------------------------------------------------*/
/*      GFPow(i0, i1)        i0^i1                                      */
/*----------------------------------------------------------------------*/
static int GFPow(int i0, int i1)
{
    if(i1 == 0)
        return (1);
    if(i0 == 0)
        return (0);
    return(aiExp[(aiLog[i0]*i1)%(GF-1)]);
}

/*----------------------------------------------------------------------*/
/*      GFMpyA(i0)           i0*ALPHA using low level math              */
/*----------------------------------------------------------------------*/
/*      hard coded for elements of size 4                               */
static int GFMpyA(int i0)
{
    ELEM e0;
    ELEMX ex;
    int q;                  /* quotient */
    e0 = aiI2E[i0];         /* e0 = i0 split up */
    ex.e = e0.d;            /* ex = e0*x */
    ex.d = e0.c;
    ex.c = e0.b;
    ex.b = e0.a;
    ex.a = 0;
    q = ex.e;
/*  ex.e -= q * pGFPoly.data[0] % GFS;  ** always == 0 */
/*  if(ex.e < 0)ex.e += GFS;            ** always == 0 */
    ex.d -= q * pGFPoly.data[1] % GFS;
    if(ex.d < 0)ex.d += GFS;
    ex.c -= q * pGFPoly.data[2] % GFS;
    if(ex.c < 0)ex.c += GFS;
    ex.b -= q * pGFPoly.data[3] % GFS;
    if(ex.b < 0)ex.b += GFS;
    ex.a -= q * pGFPoly.data[4] % GFS;
    if(ex.a < 0)ex.a += GFS;
    return (((((ex.d*GFS)+ex.c)*GFS)+ex.b)*GFS)+ex.a;
}

/*----------------------------------------------------------------------*/
/*      InitGF               Initialize Galois Stuff                    */
/*----------------------------------------------------------------------*/
static void InitGF(void)
{
    int i;
    int t;
    for(i = 0; i < GF; i++){    /* init index to element table */
        t = i;
        aiI2E[i].a = t%GFS;
        t /= GFS;
        aiI2E[i].b = t%GFS;
        t /= GFS;
        aiI2E[i].c = t%GFS;
        t /= GFS;
        aiI2E[i].d = t;
    }
    pGFPoly.size = 5;           /* init GF() polynomial */
    pGFPoly.data[0] = 1;
    pGFPoly.data[1] = 1;
    pGFPoly.data[2] = 1;
    pGFPoly.data[3] = 2;
    pGFPoly.data[4] = 2;
    t = 1;                      /* init aiExp[] */
    for(i = 0; i < GF*2; i++){
        aiExp[i] = t;
        t = GFMpyA(t);
    }
    aiLog[0] = -1;              /* init aiLog[] */
    for(i = 0; i < GF-1; i++)
        aiLog[aiExp[i]] = i;
}

/*----------------------------------------------------------------------*/
/*      main                                                            */
/*----------------------------------------------------------------------*/
int main()
{
    InitGF();
    return(0);
}
Code to display a list of primitive polynomials for GF(3^4)
    pGFPoly.size = 5;           /* display primitive polynomials */
    pGFPoly.data[0] = 1;
    pGFPoly.data[1] = 0;
    pGFPoly.data[2] = 0;
    pGFPoly.data[3] = 0;
    pGFPoly.data[4] = 1;
    while(1){
        i = 0;
        t = 1;
        do{
            i++;
            t = GFMpyA(t);
        }while(t != 1);
        if(i == (GF-1)){        /* x has maximal order: primitive */
            printf("pGFPoly: ");
            ShowVector(&pGFPoly);
        }
        pGFPoly.data[4] += 1;   /* advance to the next candidate polynomial */
        if(pGFPoly.data[4] == GFS){
            pGFPoly.data[4] = 1;
            pGFPoly.data[3] += 1;
            if(pGFPoly.data[3] == GFS){
                pGFPoly.data[3] = 0;
                pGFPoly.data[2] += 1;
                if(pGFPoly.data[2] == GFS){
                    pGFPoly.data[2] = 0;
                    pGFPoly.data[1] += 1;
                    if(pGFPoly.data[1] == GFS){
                        break;
                    }
                }
            }
        }
    }
This produces this list:
1 0 0 1 2 The one normally used x^4 + x + 2
1 0 0 2 2
1 1 0 0 2
1 1 1 2 2 I used this to test all 5 terms
1 1 2 2 2
1 2 0 0 2
1 2 1 1 2
1 2 2 1 2
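To tie this back to the question: once the exp/log tables exist, the quadratic character costs one table lookup, because for odd q the nonzero squares are exactly the even powers of a primitive element. A pure-Python sketch of that last step, reusing the mul_by_x, to_digits and from_digits helpers from the sketch earlier in this answer (all names are illustrative):
GF_ORDER = 81                        # 3^4, matching the tables above

exp_table = []                       # built exactly as InitGF does
t = 1
for _ in range(GF_ORDER - 1):
    exp_table.append(t)
    t = mul_by_x(t)
assert t == 1                        # holds because the polynomial is primitive

log_table = {e: i for i, e in enumerate(exp_table)}

def chi(e):
    # quadratic character: for odd q, even discrete log <=> nonzero square
    if e == 0:
        return 0
    return 1 if log_table[e] % 2 == 0 else -1

def gf_sub(i, j):
    # subtraction in GF(3^4) is digit-wise mod 3 in the integer encoding
    return from_digits([(a - b) % 3 for a, b in zip(to_digits(i), to_digits(j))])

# Jacobsthal matrix for the Paley construction: J[i][j] = chi(i - j)
J = [[chi(gf_sub(i, j)) for j in range(GF_ORDER)] for i in range(GF_ORDER)]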

Unusual behaviour of Ant Colony Optimization for Closest String Problem in Python and C++

This is probably going to be a long question, so I apologize in advance.
I'm working on a project with the goal of researching different solutions for the closest string problem.
Let s_1, ..., s_n be strings of length m. Find a string s of length m such that it minimizes max{d(s, s_i) | i = 1, ..., n}, where d is the Hamming distance.
One solution that has been tried is one using ant colony optimization, as described here.
The paper itself does not go into implementation details, so I've done my best to be efficient. However, efficiency is not the only unusual behaviour here.
I'm not sure whether it's common practice to do so, but I will present my code through pastebin since I believe it would overwhelm the thread if I put it directly here. If that turns out to be a problem, I won't mind editing the thread to put it here directly. As with all the previous algorithms I've experimented with, I wrote this one in Python first. Here's the code:
def solve_(self, problem: CSProblem) -> CSSolution:
    m, n, alphabet, strings = problem.m, problem.n, problem.alphabet, problem.strings
    A = len(alphabet)
    rho = self.config['RHO']
    colony_size = self.config['COLONY_SIZE']

    global_best_ant = None
    global_best_metric = m

    ants = np.full((colony_size, m), '')
    world_trails = np.full((m, A), 1 / A)

    for iteration in range(self.config['MAX_ITERS']):
        local_best_ant = None
        local_best_metric = m
        for ant_idx in range(colony_size):
            for next_character_index in range(m):
                ants[ant_idx][next_character_index] = random.choices(alphabet, weights=world_trails[next_character_index], k=1)[0]

            ant_metric = utils.problem_metric(ants[ant_idx], strings)
            if ant_metric < local_best_metric:
                local_best_metric = ant_metric
                local_best_ant = ants[ant_idx]

        # First we perform pheromone evaporation
        for i in range(m):
            for j in range(A):
                world_trails[i][j] = world_trails[i][j] * (1 - rho)

        # Now, using the elitist strategy, only the best ant is allowed to update his pheromone trails
        best_ant_ys = (alphabet.index(a) for a in local_best_ant)
        best_ant_xs = range(m)
        for x, y in zip(best_ant_xs, best_ant_ys):
            world_trails[x][y] = world_trails[x][y] + (1 - local_best_metric / m)

        if local_best_metric < global_best_metric:
            global_best_metric = local_best_metric
            global_best_ant = local_best_ant

    return CSSolution(''.join(global_best_ant), global_best_metric)
The utils.problem_metric function looks like this:
def hamming_distance(s1, s2):
    return sum(c1 != c2 for c1, c2 in zip(s1, s2))

def problem_metric(string, references):
    return max(hamming_distance(string, r) for r in references)
I've seen that there are a lot more tweaks and other parameters you can add to ACO, but I've kept it simple for now. The configuration I'm using is 250 iterations, a colony size of 10 ants, and rho = 0.1. The problem that I'm testing it on is from here: http://tcs.informatik.uos.de/research/csp_cssp , the one called 2-10-250-1-0.csp (the first one). The alphabet consists only of '0' and '1', the strings are of length 250, and there are 10 strings in total.
For the ACO configuration that I've mentioned, this problem, using the Python solver, gets solved in 5 seconds on average, and the average target function value is 108.55 (simulated 20 times). The correct target function value is 96. Ironically, the 5-second average is good compared to what it used to be in my first attempt at implementing this solution. However, it's still surprisingly slow.
After doing all kinds of optimizations, I decided to try and implement the exact same solution in C++ to see whether there would be a significant difference between the running times. Here's the C++ solution:
#include <iostream>
#include <vector>
#include <algorithm>
#include <fstream>
#include <iomanip>
#include <sstream>
#include <string>
#include <random>
#include <chrono>
#include <map>
#include <numeric>   // for std::accumulate
#include <ctime>     // for time()

class CSPProblem{
public:
    int m;
    int n;
    std::vector<char> alphabet;
    std::vector<std::string> strings;

    CSPProblem(int m, int n, std::vector<char> alphabet, std::vector<std::string> strings)
        : m(m), n(n), alphabet(alphabet), strings(strings)
    {
    }

    static CSPProblem from_csp(std::string filepath){
        std::ifstream file(filepath);
        std::string line;
        std::vector<std::string> input_lines;
        while (std::getline(file, line)){
            input_lines.push_back(line);
        }
        int alphabet_size = std::stoi(input_lines[0]);
        int n = std::stoi(input_lines[1]);
        int m = std::stoi(input_lines[2]);
        std::vector<char> alphabet;
        for (int i = 3; i < 3 + alphabet_size; i++){
            alphabet.push_back(input_lines[i][0]);
        }
        std::vector<std::string> strings;
        for (int i = 3 + alphabet_size; i < input_lines.size(); i++){
            strings.push_back(input_lines[i]);
        }
        return CSPProblem(m, n, alphabet, strings);
    }

    int hamm(const std::string& s1, const std::string& s2) const{
        int h = 0;
        for (int i = 0; i < s1.size(); i++){
            if (s1[i] != s2[i])
                h++;
        }
        return h;
    }

    int measure(const std::string& sol) const{
        int mm = 0;
        for (const auto& s: strings){
            int h = hamm(sol, s);
            if (h > mm){
                mm = h;
            }
        }
        return mm;
    }

    friend std::ostream& operator<<(std::ostream& out, CSPProblem problem){
        out << "m: " << problem.m << std::endl;
        out << "n: " << problem.n << std::endl;
        out << "alphabet_size: " << problem.alphabet.size() << std::endl;
        out << "alphabet: ";
        for (const auto& a: problem.alphabet){
            out << a << " ";
        }
        out << std::endl;
        out << "strings:" << std::endl;
        for (const auto& s: problem.strings){
            out << "\t" << s << std::endl;
        }
        return out;
    }
};

std::random_device rd;
std::mt19937 gen(rd());

int get_from_distrib(const std::vector<float>& weights){
    std::discrete_distribution<> d(std::begin(weights), std::end(weights));
    return d(gen);
}

int max_iter = 250;
float rho = 0.1f;
int colony_size = 10;

int ant_colony_solver(const CSPProblem& problem){
    srand(time(NULL));
    int m = problem.m;
    int n = problem.n;
    auto alphabet = problem.alphabet;
    auto strings = problem.strings;
    int A = alphabet.size();
    float init_pher = 1.0 / A;

    std::string global_best_ant;
    int global_best_matric = m;

    std::vector<std::vector<float>> world_trails(m, std::vector<float>(A, 0.0f));
    for (int i = 0; i < m; i++){
        for (int j = 0; j < A; j++){
            world_trails[i][j] = init_pher;
        }
    }

    std::vector<std::string> ants(colony_size, std::string(m, ' '));

    for (int iteration = 0; iteration < max_iter; iteration++){
        std::string local_best_ant;
        int local_best_metric = m;
        for (int ant_idx = 0; ant_idx < colony_size; ant_idx++){
            for (int next_character_idx = 0; next_character_idx < m; next_character_idx++){
                char next_char = alphabet[get_from_distrib(world_trails[next_character_idx])];
                ants[ant_idx][next_character_idx] = next_char;
            }
            int ant_metric = problem.measure(ants[ant_idx]);
            if (ant_metric < local_best_metric){
                local_best_metric = ant_metric;
                local_best_ant = ants[ant_idx];
            }
        }

        // Evaporation
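        // (the "+" on the next line is the typo identified in the MAJOR EDIT
        // at the end of the question: evaporation should multiply by (1.0 - rho))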
        for (int i = 0; i < m; i++){
            for (int j = 0; j < A; j++){
                world_trails[i][j] = world_trails[i][j] + (1.0 - rho);
            }
        }

        std::vector<int> best_ant_xs;
        for (int i = 0; i < m; i++){
            best_ant_xs.push_back(i);
        }
        std::vector<int> best_ant_ys;
        for (const auto& c: local_best_ant){
            auto loc = std::find(std::begin(alphabet), std::end(alphabet), c);
            int idx = loc - std::begin(alphabet);
            best_ant_ys.push_back(idx);
        }
        for (int i = 0; i < m; i++){
            int x = best_ant_xs[i];
            int y = best_ant_ys[i];
            world_trails[x][y] = world_trails[x][y] + (1.0 - static_cast<float>(local_best_metric) / m);
        }
        if (local_best_metric < global_best_matric){
            global_best_matric = local_best_metric;
            global_best_ant = local_best_ant;
        }
    }
    return global_best_matric;
}

int main(){
    auto problem = CSPProblem::from_csp("in.csp");

    int TRIES = 20;
    std::vector<int> times;
    std::vector<int> measures;

    for (int i = 0; i < TRIES; i++){
        auto start = std::chrono::high_resolution_clock::now();
        int m = ant_colony_solver(problem);
        auto stop = std::chrono::high_resolution_clock::now();
        int duration = std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
        times.push_back(duration);
        measures.push_back(m);
    }

    float average_time = static_cast<float>(std::accumulate(std::begin(times), std::end(times), 0)) / TRIES;
    float average_measure = static_cast<float>(std::accumulate(std::begin(measures), std::end(measures), 0)) / TRIES;

    std::cout << "Average running time: " << average_time << std::endl;
    std::cout << "Average solution: " << average_measure << std::endl;
    std::cout << "all solutions: ";
    for (const auto& m: measures) std::cout << m << " ";
    std::cout << std::endl;

    return 0;
}
The average running time is now only 530.4 milliseconds. However, the average target function value is 122.75, which is significantly higher than that of the Python solution.
If the average function values were the same and the times were as they are, I would simply write this off as 'C++ is faster than Python' (even though the difference in speed is also very suspicious). But since C++ yields worse solutions, it leads me to believe that I've done something wrong in C++. What I'm suspicious of is the way I'm generating an alphabet index using weights. In Python I've done it using random.choices as follows:
ants[ant_idx][next_character_index] = random.choices(alphabet, weights=world_trails[next_character_index], k=1)[0]
As for C++, I haven't done it in a while, so I'm a bit rusty on reading cppreference (which is a skill of its own), and the std::discrete_distribution solution is something I've plainly copied from the reference:
std::random_device rd;
std::mt19937 gen(rd());

int get_from_distrib(const std::vector<float>& weights){
    std::discrete_distribution<> d(std::begin(weights), std::end(weights));
    return d(gen);
}
The suspicious thing here is the fact that I'm declaring the std::random_device and std::mt19937 objects globally and using the same ones every time. I have not been able to find an answer to whether this is the way they're meant to be used. However, if I put them in the function:
int get_from_distrib(const std::vector<float>& weights){
    std::random_device rd;
    std::mt19937 gen(rd());
    std::discrete_distribution<> d(std::begin(weights), std::end(weights));
    return d(gen);
}
the average running time gets significantly worse, clocking in at 8.84 seconds. However, even more surprisingly, the average function value gets worse as well, at 130.
Again, if only one of the two things changed (say if only the time went up) I would have been able to draw some conclusions. This way it only gets more confusing.
So, does anybody have an idea of why this is happening?
Thanks in advance.
MAJOR EDIT: I feel embarrassed having asked such a huge question when in fact the problem lies in a simple typo: in the evaporation step of the C++ version, I put a + instead of a *.
Now the algorithms behave identically in terms of average solution quality.
However, I could still use some tips on how to optimize the python version.
Apart from the dumb mistake I mentioned in the question edit, it seems I've finally found a way to optimize the Python solution decently. First of all, keeping world_trails and ants as numpy arrays instead of lists of lists actually slowed things down. Furthermore, I stopped keeping a list of ants altogether, since I only ever need the best one per iteration.
Lastly, running cProfile indicated that a lot of the time was spent on random.choices, so I decided to implement my own version of it, suited specifically to this case. I did this by pre-computing the total weight sum per position for each iteration (in the trail_row_wise_sums array) and using the following function:
def fast_pick(arr, weights, ws):
    r = random.random() * ws
    for i in range(len(arr)):
        if r < weights[i]:
            return arr[i]
        r -= weights[i]
    return 0
The new version now looks like this:
def solve_(self, problem: CSProblem) -> CSSolution:
    m, n, alphabet, strings = problem.m, problem.n, problem.alphabet, problem.strings
    A = len(alphabet)
    rho = self.config['RHO']
    colony_size = self.config['COLONY_SIZE']
    miters = self.config['MAX_ITERS']

    global_best_ant = None
    global_best_metric = m
    init_pher = 1.0 / A

    world_trails = [[init_pher for _ in range(A)] for _ in range(m)]
    trail_row_wise_sums = [1.0 for _ in range(m)]

    for iteration in tqdm(range(miters)):
        local_best_ant = None
        local_best_metric = m
        for _ in range(colony_size):
            ant = ''.join(fast_pick(alphabet, world_trails[next_character_index], trail_row_wise_sums[next_character_index]) for next_character_index in range(m))
            ant_metric = utils.problem_metric(ant, strings)
            if ant_metric <= local_best_metric:
                local_best_metric = ant_metric
                local_best_ant = ant

        # First we perform pheromone evaporation
        for i in range(m):
            for j in range(A):
                world_trails[i][j] = world_trails[i][j] * (1 - rho)

        # Now, using the elitist strategy, only the best ant is allowed to update his pheromone trails
        best_ant_ys = (alphabet.index(a) for a in local_best_ant)
        best_ant_xs = range(m)
        for x, y in zip(best_ant_xs, best_ant_ys):
            world_trails[x][y] = world_trails[x][y] + (1 - 1.0 * local_best_metric / m)

        if local_best_metric < global_best_metric:
            global_best_metric = local_best_metric
            global_best_ant = local_best_ant

        trail_row_wise_sums = [sum(world_trails[i]) for i in range(m)]

    return CSSolution(global_best_ant, global_best_metric)
The average running time is now down to 800 milliseconds (compared to the 5 seconds it was before). Granted, applying the same fast_pick optimization to the C++ solution also sped up the C++ version (to around 150 ms), but I guess now I can write it off as C++ being faster than Python.
The profiler also showed that a lot of the time was spent on calculating Hamming distances, but that's to be expected; apart from that, I see no other way of computing the Hamming distance between arbitrary strings more efficiently.
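One further idea, which I haven't benchmarked: encode the strings once as numpy integer arrays, after which the whole metric becomes a couple of vectorized operations. The helpers below (encode, problem_metric_np) are hypothetical, not part of the code above:
import numpy as np

def encode(strings):
    # one-time conversion: each string becomes a row of byte codes
    return np.array([[ord(c) for c in s] for s in strings], dtype=np.uint8)

refs = encode(strings)               # shape (n, m), built once per problem

def problem_metric_np(candidate, refs):
    cand = np.frombuffer(candidate.encode('ascii'), dtype=np.uint8)
    # rows of mismatches -> per-string Hamming distances -> max of them
    return int((refs != cand).sum(axis=1).max())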

Incrementing a for loop upward in C

I am hoping to mimic a Python for loop with the range() function in C. I'd like to accomplish a task an increasing number of times each loop until I reach the value of a given variable, in this case 5 (the variable h in the C version below). Here it is in Python:
x = 5
y = 0
while x > y:
    for i in range(y+1):
        print("#", end='')
    print('')
    y += 1
Output:
#
##
###
####
#####
I was able to accomplish the opposite (executing something a decreasing number of times) in C, as below:
{
    int h = 5;
    while (h > 0)
    {
        for (int i = 0; i < h; i++)
        {
            printf("#");
        }
        printf("\n");
        h--;
    }
}
Output:
#####
####
###
##
#
When I attempt the top version in C, with the increasing number of executions, I run into the problem of not knowing how to control the various incrementing and decrementing variables.
I suggest you think of it simply:
1. Increment the number of # to print.
2. Use a loop to print that number of #.
#include <stdio.h>

int main(void)
{
    int h = 5;
    for (int c = 1; c <= h; c++)    // the number of # to print
    {
        for (int i = 0; i < c; i++)
        {
            printf("#");
        }
        printf("\n");
    }
    return 0;
}
Another way is to write it in just the same way as the Python version:
#include <stdio.h>

int main(void)
{
    int x = 5;
    int y = 0;
    while (x > y)
    {
        for (int i = 0; i < y+1; i++)
        {
            printf("#");
        }
        printf("\n");
        y += 1;
    }
    return 0;
}
The solution in C:
#include <stdio.h>

int main(void)
{
    int x = 5;
    int y = 0;
    while (x > y)
    {
        for (int i = 0; i < y+1; i++)
        {
            printf("#");
        }
        printf("\n");
        y += 1;   /* without this increment, the while loop never terminates */
    }
    return 0;
}
In Python's for loop, the variable is initialized to zero and incremented by 1 by default; in C, you need to write the initialization and the increment explicitly.
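For reference, a minimal sketch of the correspondence (using Python's string repetition as a shortcut):
# range(start, stop, step) visits the same values as the C loop
#     for (int i = start; i < stop; i += step)
for i in range(0, 5):        # i = 0, 1, 2, 3, 4
    print("#" * (i + 1))     # prints the increasing triangle directly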

Python+C is (slightly) faster than pure C

I've been implementing the same code (the number of ways to deal a hand in Blackjack without busting) in a variety of languages and implementations. One oddity I've noticed is that the implementation of Python calling the partitions function in C is actually slightly faster than the entire program written in C. The same appears to be true for other languages (Ada vs Python calling Ada, Nim vs Python calling Nim). This seems counterintuitive to me - any idea how this is possible?
The code is all in my GitHub repo here:
https://github.com/octonion/puzzles/tree/master/blackjack
Here's the C code, compiled using 'gcc -O3 outcomes.c'.
#include <stdio.h>

int partitions(int cards[10], int subtotal)
{
    //writeln(cards,subtotal);
    int m = 0;
    int total;
    // Hit
    for (int i = 0; i < 10; i++)
    {
        if (cards[i] > 0)
        {
            total = subtotal + i + 1;
            if (total < 21)
            {
                // Stand
                m += 1;
                // Hit again
                cards[i] -= 1;
                m += partitions(cards, total);
                cards[i] += 1;
            }
            else if (total == 21)
            {
                // Stand; hit again is an automatic bust
                m += 1;
            }
        }
    }
    return m;
}

int main(void)
{
    int deck[] = { 4, 4, 4, 4, 4, 4, 4, 4, 4, 16 };
    int d = 0;
    for (int i = 0; i < 10; i++)
    {
        // Dealer showing
        deck[i] -= 1;
        int p = 0;
        for (int j = 0; j < 10; j++)
        {
            deck[j] -= 1;
            int n = partitions(deck, j + 1);
            deck[j] += 1;
            p += n;
        }
        printf("Dealer showing %i partitions = %i\n", i, p);
        d += p;
        deck[i] += 1;
    }
    printf("Total partitions = %i\n", d);
    return 0;
}
Here's the C function, compiled using 'gcc -O3 -fPIC -shared -o libpartitions.so partitions.c'.
int partitions(int cards[10], int subtotal)
{
    int m = 0;
    int total;
    // Hit
    for (int i = 0; i < 10; i++)
    {
        if (cards[i] > 0)
        {
            total = subtotal + i + 1;
            if (total < 21)
            {
                cards[i] -= 1;
                // Stand
                m += 1;
                // Hit again
                m += partitions(cards, total);
                cards[i] += 1;
            }
            else if (total == 21)
            {
                // Stand; hit again is an automatic bust
                m += 1;
            }
        }
    }
    return m;
}
And here's the Python wrapper for the C function:
#!/usr/bin/env python
from ctypes import *
import os

test_lib = cdll.LoadLibrary(os.path.abspath("libpartitions.so"))
test_lib.partitions.argtypes = [POINTER(c_int), c_int]
test_lib.partitions.restype = c_int

deck = [4]*9
deck.append(16)
d = 0
for i in xrange(10):
    # Dealer showing
    deck[i] -= 1
    p = 0
    for j in xrange(10):
        deck[j] -= 1
        nums_arr = (c_int*len(deck))(*deck)
        n = test_lib.partitions(nums_arr, c_int(j+1))
        deck[j] += 1
        p += n
    print('Dealer showing ', i, ' partitions =', p)
    d += p
    deck[i] += 1
print('Total partitions =', d)
I think the reason here is how GCC compiles the partitions function in the two cases. You can compare the asm code in the outcomes binary and in libpartitions.so by using objdump to see the differences:
objdump -d -M intel <file name>
When building the shared library, GCC has no idea how partitions will be called, while in the C program GCC knows exactly how partitions is called (which, in this case, leads to worse performance). This difference in context makes GCC optimize differently.
You can try different compilers to compare the results. I have checked with GCC 5.4 and Clang 6.0: with GCC 5.4 the Python script runs faster, while with Clang the C program runs faster.

Efficient set union and intersection in C++

Given two sets set1 and set2, I need to compute the ratio of the size of their intersection to the size of their union. So far, I have the following code:
double ratio(const set<string>& set1, const set<string>& set2)
{
    if( set1.size() == 0 || set2.size() == 0 )
        return 0;

    set<string>::const_iterator iter;
    set<string>::const_iterator iter2;
    set<string> unionset;

    // compute intersection and union
    int len = 0;
    for (iter = set1.begin(); iter != set1.end(); iter++)
    {
        unionset.insert(*iter);
        if( set2.count(*iter) )
            len++;
    }
    for (iter = set2.begin(); iter != set2.end(); iter++)
        unionset.insert(*iter);

    return (double)len / (double)unionset.size();
}
It seems to be very slow (I'm calling the function about 3M times, always with different sets). The Python counterpart, on the other hand, is much faster:
def ratio(set1, set2):
    if not set1 or not set2:
        return 0
    return len(set1.intersection(set2)) / len(set1.union(set2))
Any idea about how to improve the C++ version (preferably without using Boost)?
It can be done in linear time and without extra memory, because std::set iterates in sorted order, so the two sets can be walked in lockstep like a merge:
double ratio(const std::set<string>& set1, const std::set<string>& set2)
{
    if (set1.empty() || set2.empty()) {
        return 0.;
    }

    std::set<string>::const_iterator iter1 = set1.begin();
    std::set<string>::const_iterator iter2 = set2.begin();

    int union_len = 0;
    int intersection_len = 0;
    while (iter1 != set1.end() && iter2 != set2.end())
    {
        ++union_len;
        if (*iter1 < *iter2) {
            ++iter1;
        } else if (*iter2 < *iter1) {
            ++iter2;
        } else { // *iter1 == *iter2
            ++intersection_len;
            ++iter1;
            ++iter2;
        }
    }
    union_len += std::distance(iter1, set1.end());
    union_len += std::distance(iter2, set2.end());

    return static_cast<double>(intersection_len) / union_len;
}
You don't actually need to construct the union set. In Python terms, len(s1.union(s2)) == len(s1) + len(s2) - len(s1.intersection(s2)); the size of the union is the sum of the sizes of s1 and s2, minus the number of elements counted twice, which is the number of elements in the intersection. Thus, you can do
int len = 0;
for (const string &s : set1) {
    len += set2.count(s);
}
return ((double) len) / (set1.size() + set2.size() - len);
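That identity is easy to sanity-check in Python (illustrative values only):
s1, s2 = {"a", "b", "c"}, {"b", "c", "d"}
assert len(s1 | s2) == len(s1) + len(s2) - len(s1 & s2)   # 4 == 3 + 3 - 2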
