This is probably going to be a long question; I apologize in advance.
I'm working on a project with the goal of researching different solutions for the closest string problem.
Let s_1, ..., s_n be strings of length m. Find a string s of length m that minimizes max{d(s, s_i) | i = 1, ..., n}, where d is the Hamming distance.
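For very small instances the definition can be checked directly by exhaustive search, which is handy for validating a heuristic such as ACO. The sketch below uses a hypothetical helper, closest_string_brute_force, that is not part of the original post. It tries all |alphabet|^m candidates, so it only works for tiny m. The 250-character instance mentioned below is far out of reach.

```python
from itertools import product

def hamming_distance(s1, s2):
    """Number of positions at which two equal-length sequences differ."""
    return sum(c1 != c2 for c1, c2 in zip(s1, s2))

def closest_string_brute_force(strings, alphabet):
    """Hypothetical helper: exact answer by trying every length-m candidate."""
    m = len(strings[0])
    best, best_radius = None, m + 1
    for candidate in product(alphabet, repeat=m):   # |alphabet| ** m candidates
        radius = max(hamming_distance(candidate, s) for s in strings)
        if radius < best_radius:                     # keep the first minimizer found
            best, best_radius = ''.join(candidate), radius
    return best, best_radius

print(closest_string_brute_force(["0011", "0101", "1001"], "01"))  # ('0001', 1)
```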
One solution that has been tried uses ant colony optimization, as described here.
The paper itself does not go into implementation details, so I've done my best regarding efficiency. However, efficiency is not the only unusual behaviour I've run into.
I'm not sure whether it's common practice to do so, but I will present my code through pastebin, since I believe it would overwhelm the thread if I put it directly here. If that turns out to be a problem, I won't mind editing the thread to put it here directly. As with all the previous algorithms I've experimented with, I initially wrote this one in Python. Here's the code:
import random
import numpy as np
def solve_(self, problem: CSProblem) -> CSSolution:
    m, n, alphabet, strings = problem.m, problem.n, problem.alphabet, problem.strings
    A = len(alphabet)
    rho = self.config['RHO']
    colony_size = self.config['COLONY_SIZE']
    global_best_ant = None
    global_best_metric = m
    ants = np.full((colony_size, m), '')
    world_trails = np.full((m, A), 1 / A)
    for iteration in range(self.config['MAX_ITERS']):
        local_best_ant = None
        local_best_metric = m
        for ant_idx in range(colony_size):
            for next_character_index in range(m):
                ants[ant_idx][next_character_index] = random.choices(alphabet, weights=world_trails[next_character_index], k=1)[0]
            ant_metric = utils.problem_metric(ants[ant_idx], strings)
            if ant_metric < local_best_metric:
                local_best_metric = ant_metric
                # copy: ants[ant_idx] is a view of a row that is overwritten in the next iteration
                local_best_ant = ants[ant_idx].copy()
        # First we perform pheromone evaporation
        for i in range(m):
            for j in range(A):
                world_trails[i][j] = world_trails[i][j] * (1 - rho)
        # Now, using the elitist strategy, only the best ant is allowed to update his pheromone trails
        best_ant_ys = (alphabet.index(a) for a in local_best_ant)
        best_ant_xs = range(m)
        for x, y in zip(best_ant_xs, best_ant_ys):
            world_trails[x][y] = world_trails[x][y] + (1 - local_best_metric / m)
        if local_best_metric < global_best_metric:
            global_best_metric = local_best_metric
            global_best_ant = local_best_ant
    return CSSolution(''.join(global_best_ant), global_best_metric)
The utils.problem_metric function looks like this:
def hamming_distance(s1, s2):
    return sum(c1 != c2 for c1, c2 in zip(s1, s2))
def problem_metric(string, references):
    return max(hamming_distance(string, r) for r in references)
I've seen that there are many more tweaks and parameters you can add to ACO, but I've kept it simple for now. The configuration I'm using is 250 iterations, a colony size of 10 ants and rho = 0.1. The problem I'm testing it on is from here: http://tcs.informatik.uos.de/research/csp_cssp, the one called 2-10-250-1-0.csp (the first one). The alphabet consists only of '0' and '1', the strings are of length 250, and there are 10 strings in total.
For the ACO configuration I've mentioned, the Python solver solves this problem in 5 seconds on average, and the average target function value is 108.55 (over 20 runs). The correct target function value is 96. Ironically, the 5-second average is good compared to my first attempt at implementing this solution. However, it's still surprisingly slow.
After doing all kinds of optimizations, I decided to implement the exact same solution in C++ to see whether there would be a significant difference between the running times. Here's the C++ solution:
#include <iostream>
#include <vector>
#include <algorithm>
#include <fstream>
#include <iomanip>
#include <sstream>
#include <string>
#include <random>
#include <chrono>
#include <map>
#include <numeric>  // std::accumulate
#include <cstdlib>  // srand
#include <ctime>    // time
class CSPProblem{
public:
    int m;
    int n;
    std::vector<char> alphabet;
    std::vector<std::string> strings;
    CSPProblem(int m, int n, std::vector<char> alphabet, std::vector<std::string> strings)
        : m(m), n(n), alphabet(alphabet), strings(strings)
    {
    }
    static CSPProblem from_csp(std::string filepath){
        std::ifstream file(filepath);
        std::string line;
        std::vector<std::string> input_lines;
        while (std::getline(file, line)){
            input_lines.push_back(line);
        }
        int alphabet_size = std::stoi(input_lines[0]);
        int n = std::stoi(input_lines[1]);
        int m = std::stoi(input_lines[2]);
        std::vector<char> alphabet;
        for (int i = 3; i < 3 + alphabet_size; i++){
            alphabet.push_back(input_lines[i][0]);
        }
        std::vector<std::string> strings;
        for (int i = 3 + alphabet_size; i < input_lines.size(); i++){
            strings.push_back(input_lines[i]);
        }
        return CSPProblem(m, n, alphabet, strings);
    }
    int hamm(const std::string& s1, const std::string& s2) const{
        int h = 0;
        for (int i = 0; i < s1.size(); i++){
            if (s1[i] != s2[i])
                h++;
        }
        return h;
    }
    int measure(const std::string& sol) const{
        int mm = 0;
        for (const auto& s: strings){
            int h = hamm(sol, s);
            if (h > mm){
                mm = h;
            }
        }
        return mm;
    }
    friend std::ostream& operator<<(std::ostream& out, CSPProblem problem){
        out << "m: " << problem.m << std::endl;
        out << "n: " << problem.n << std::endl;
        out << "alphabet_size: " << problem.alphabet.size() << std::endl;
        out << "alphabet: ";
        for (const auto& a: problem.alphabet){
            out << a << " ";
        }
        out << std::endl;
        out << "strings:" << std::endl;
        for (const auto& s: problem.strings){
            out << "\t" << s << std::endl;
        }
        return out;
    }
};
std::random_device rd;
std::mt19937 gen(rd());
int get_from_distrib(const std::vector<float>& weights){
    std::discrete_distribution<> d(std::begin(weights), std::end(weights));
    return d(gen);
}
int max_iter = 250;
float rho = 0.1f;
int colony_size = 10;
int ant_colony_solver(const CSPProblem& problem){
    srand(time(NULL));
    int m = problem.m;
    int n = problem.n;
    auto alphabet = problem.alphabet;
    auto strings = problem.strings;
    int A = alphabet.size();
    float init_pher = 1.0 / A;
    std::string global_best_ant;
    int global_best_metric = m;
    std::vector<std::vector<float>> world_trails(m, std::vector<float>(A, 0.0f));
    for (int i = 0; i < m; i++){
        for (int j = 0; j < A; j++){
            world_trails[i][j] = init_pher;
        }
    }
    std::vector<std::string> ants(colony_size, std::string(m, ' '));
    for (int iteration = 0; iteration < max_iter; iteration++){
        std::string local_best_ant;
        int local_best_metric = m;
        for (int ant_idx = 0; ant_idx < colony_size; ant_idx++){
            for (int next_character_idx = 0; next_character_idx < m; next_character_idx++){
                char next_char = alphabet[get_from_distrib(world_trails[next_character_idx])];
                ants[ant_idx][next_character_idx] = next_char;
            }
            int ant_metric = problem.measure(ants[ant_idx]);
            if (ant_metric < local_best_metric){
                local_best_metric = ant_metric;
                local_best_ant = ants[ant_idx];
            }
        }
        // Evaporation
        // NOTE: this '+' is the typo identified in the edit below; it should be '*'
        for (int i = 0; i < m; i++){
            for (int j = 0; j < A; j++){
                world_trails[i][j] = world_trails[i][j] + (1.0 - rho);
            }
        }
        std::vector<int> best_ant_xs;
        for (int i = 0; i < m; i++){
            best_ant_xs.push_back(i);
        }
        std::vector<int> best_ant_ys;
        for (const auto& c: local_best_ant){
            auto loc = std::find(std::begin(alphabet), std::end(alphabet), c);
            int idx = loc - std::begin(alphabet);
            best_ant_ys.push_back(idx);
        }
        for (int i = 0; i < m; i++){
            int x = best_ant_xs[i];
            int y = best_ant_ys[i];
            world_trails[x][y] = world_trails[x][y] + (1.0 - static_cast<float>(local_best_metric) / m);
        }
        if (local_best_metric < global_best_metric){
            global_best_metric = local_best_metric;
            global_best_ant = local_best_ant;
        }
    }
    return global_best_metric;
}
int main(){
    auto problem = CSPProblem::from_csp("in.csp");
    int TRIES = 20;
    std::vector<int> times;
    std::vector<int> measures;
    for (int i = 0; i < TRIES; i++){
        auto start = std::chrono::high_resolution_clock::now();
        int m = ant_colony_solver(problem);
        auto stop = std::chrono::high_resolution_clock::now();
        int duration = std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
        times.push_back(duration);
        measures.push_back(m);
    }
    float average_time = static_cast<float>(std::accumulate(std::begin(times), std::end(times), 0)) / TRIES;
    float average_measure = static_cast<float>(std::accumulate(std::begin(measures), std::end(measures), 0)) / TRIES;
    std::cout << "Average running time: " << average_time << std::endl;
    std::cout << "Average solution: " << average_measure << std::endl;
    std::cout << "all solutions: ";
    for (const auto& m: measures) std::cout << m << " ";
    std::cout << std::endl;
    return 0;
}
The average running time is now only 530.4 milliseconds. However, the average target function value is 122.75, which is significantly higher than that of the Python solution.
If the average function values were the same and the times were as they are, I would simply write this off as 'C++ is faster than Python' (even though the size of the speed difference is also very suspicious). But since C++ yields worse solutions, I believe I've done something wrong in C++. What I'm suspicious of is the way I'm generating an alphabet index using weights. In Python I've done it using random.choices as follows:
ants[ant_idx][next_character_index] = random.choices(alphabet, weights=world_trails[next_character_index], k=1)[0]
As for C++, I haven't used it in a while, so I'm a bit rusty at reading cppreference (which is a skill of its own), and the std::discrete_distribution solution is something I've copied directly from the reference:
std::random_device rd;
std::mt19937 gen(rd());
int get_from_distrib(const std::vector<float>& weights){
    std::discrete_distribution<> d(std::begin(weights), std::end(weights));
    return d(gen);
}
The suspicious thing here is that I'm declaring the std::random_device and std::mt19937 objects globally and reusing the same ones every time. I have not been able to find out whether this is how they're meant to be used. However, if I put them in the function:
int get_from_distrib(const std::vector<float>& weights){
    std::random_device rd;
    std::mt19937 gen(rd());
    std::discrete_distribution<> d(std::begin(weights), std::end(weights));
    return d(gen);
}
the average running time gets significantly worse, clocking in at 8.84 seconds. However, even more surprisingly, the average function value gets worse as well, at 130.
Again, if only one of the two things had changed (say, only the time went up), I would have been able to draw some conclusions. This way it only gets more confusing.
So, does anybody have an idea of why this is happening?
Thanks in advance.
MAJOR EDIT: I feel embarrassed having asked such a huge question when in fact the problem lies in a simple typo: in the evaporation step of the C++ version, I put a + instead of a *.
Now the algorithms behave identically in terms of average solution quality.
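To see why that one character matters so much, here is a small simulation of a single position with a binary alphabet. It is a simplified model, not the actual solver. It assumes the best ant reinforces the same symbol at that position in every iteration, with a deposit of 0.56, i.e. roughly 1 - 110/250. preferred_probability is a hypothetical name. With multiplicative evaporation, the unreinforced weight decays towards zero. With the '+' typo, every weight grows by 0.9 per iteration. That growth swamps the deposit and keeps sampling far from converged.

```python
def preferred_probability(iterations, rho, deposit, multiplicative):
    """Hypothetical helper: chance of sampling the reinforced symbol (index 0).

    Simplifying assumption: the best ant always deposits `deposit` on index 0.
    """
    w = [0.5, 0.5]                          # initial pheromone 1 / A for A = 2
    for _ in range(iterations):
        if multiplicative:
            w = [x * (1 - rho) for x in w]  # intended evaporation
        else:
            w = [x + (1 - rho) for x in w]  # the '+' typo: every weight grows
        w[0] += deposit                     # elitist deposit on the reinforced symbol
    return w[0] / sum(w)

for mult in (True, False):
    print(mult, preferred_probability(250, 0.1, 0.56, mult))
```

With rho = 0.1 and 250 iterations, the multiplicative rule gives a probability above 0.999. The '+' version stays at about 0.62, so the C++ ants kept sampling almost randomly.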
However, I could still use some tips on how to optimize the python version.
Apart from the dumb mistake I mentioned in the question edit, it seems I've finally found a decent way to optimize the Python solution. First of all, keeping world_trails and ants as numpy arrays instead of lists of lists actually slowed things down. Furthermore, I stopped keeping a list of ants altogether, since I only ever need the best one per iteration.
Lastly, running cProfile indicated that a lot of the time was spent in random.choices, so I decided to implement my own version of it, suited specifically to this case. I did this by pre-computing the total pheromone weight at each character position once per iteration (in the trail_row_wise_sums list) and using the following function:
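import random
def fast_pick(arr, weights, ws):
    # ws is the precomputed sum of weights; subtract weights until r falls inside a bucket
    r = random.random()*ws
    for i in range(len(arr)):
        if r < weights[i]:
            return arr[i]
        r -= weights[i]
    return arr[-1]  # only reachable through floating-point rounding; fall back to the last symbol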
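If the trails are stored as an (m, A) numpy array, the same cumulative-weight rule that fast_pick applies to one position can be applied to all m positions in one call. sample_ant is a hypothetical name, and the array layout is an assumption. Since the post found numpy slower for element-by-element updates, this only pays off when whole rows are processed at once.

```python
import numpy as np

def sample_ant(trails, rng):
    """Hypothetical helper: one symbol index per position from unnormalized weights.

    trails has shape (m, A); row i holds the pheromone weights for position i.
    Same rule as fast_pick: pick the first bucket whose running total exceeds r.
    """
    cum = np.cumsum(trails, axis=1)               # running totals per row, shape (m, A)
    r = rng.random(trails.shape[0]) * cum[:, -1]  # one uniform draw per row in [0, row total)
    idx = (r[:, None] >= cum).sum(axis=1)         # count of running totals <= r
    return np.minimum(idx, trails.shape[1] - 1)   # guard against floating-point rounding

rng = np.random.default_rng(0)
trails = np.full((250, 2), 0.5)                   # m = 250 positions, binary alphabet
alphabet = np.array(['0', '1'])
ant = ''.join(alphabet[sample_ant(trails, rng)])
```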
The new version now looks like this:
from tqdm import tqdm
def solve_(self, problem: CSProblem) -> CSSolution:
    m, n, alphabet, strings = problem.m, problem.n, problem.alphabet, problem.strings
    A = len(alphabet)
    rho = self.config['RHO']
    colony_size = self.config['COLONY_SIZE']
    miters = self.config['MAX_ITERS']
    global_best_ant = None
    global_best_metric = m
    init_pher = 1.0 / A
    world_trails = [[init_pher for _ in range(A)] for _ in range(m)]
    trail_row_wise_sums = [1.0 for _ in range(m)]
    for iteration in tqdm(range(miters)):
        local_best_ant = None
        local_best_metric = m
        for _ in range(colony_size):
            ant = ''.join(fast_pick(alphabet, world_trails[next_character_index], trail_row_wise_sums[next_character_index]) for next_character_index in range(m))
            ant_metric = utils.problem_metric(ant, strings)
            if ant_metric <= local_best_metric:
                local_best_metric = ant_metric
                local_best_ant = ant
        # First we perform pheromone evaporation
        for i in range(m):
            for j in range(A):
                world_trails[i][j] = world_trails[i][j] * (1 - rho)
        # Now, using the elitist strategy, only the best ant is allowed to update his pheromone trails
        best_ant_ys = (alphabet.index(a) for a in local_best_ant)
        best_ant_xs = range(m)
        for x, y in zip(best_ant_xs, best_ant_ys):
            world_trails[x][y] = world_trails[x][y] + (1 - 1.0*local_best_metric / m)
        if local_best_metric < global_best_metric:
            global_best_metric = local_best_metric
            global_best_ant = local_best_ant
        # refresh the per-position weight totals used by fast_pick
        trail_row_wise_sums = [sum(world_trails[i]) for i in range(m)]
    return CSSolution(global_best_ant, global_best_metric)
The average running time is now down to 800 milliseconds (compared to the 5 seconds it was before). Granted, applying the same fast_pick optimization to the C++ solution also sped up the C++ version (to around 150 ms), but I guess now I can write it off as C++ being faster than Python.
The profiler also showed that a lot of the time was spent calculating Hamming distances, but that's to be expected, and I see no way of computing the Hamming distance between arbitrary strings more efficiently.
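One common way to speed up the metric is sketched below, assuming numpy is available. Encode the reference strings once as an (n, m) array of symbol indices. After that, the distances from an ant to every reference come from a single vectorized comparison. encode and problem_metric_np are hypothetical names, not part of the original code.

```python
import numpy as np

def encode(strings, alphabet):
    """Hypothetical helper: map each string to a row of symbol indices, shape (n, m)."""
    lookup = {c: i for i, c in enumerate(alphabet)}
    return np.array([[lookup[c] for c in s] for s in strings], dtype=np.uint8)

def problem_metric_np(ant, references):
    """Hypothetical helper: largest Hamming distance from ant (shape (m,)) to all references (n, m)."""
    return int((references != ant).sum(axis=1).max())

refs = encode(["0011", "0101", "1001"], "01")   # encode once, outside the main loop
ant = encode(["0001"], "01")[0]
print(problem_metric_np(ant, refs))  # 1
```

For a binary alphabet like the one in the test instance, another option is to convert each string to an integer once and count differing bits with bin(int(a, 2) ^ int(b, 2)).count('1').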
I am hoping to mimic a Python for loop with the range() function in C. I'd like to perform a task an increasing number of times on each pass until I reach the value of a given variable, in this case 5 (for the variable h). Here it is in Python:
x = 5
y = 0
while x > y:
    for i in range(y+1):
        print("#",end='')
    print('')
    y+=1
Output:
#
##
###
####
#####
I was able to accomplish the opposite (executing something a decreasing number of times) in C, as below:
{
    int h = 5;
    while (h > 0)
    {
        for (int i = 0; i < h; i++)
        {
            printf("#");
        }
        printf("\n");
        h--;
    }
}
Output:
#####
####
###
##
#
When I attempted the first version in C, with an increasing number of executions, I ran into the problem of not knowing how to manage the various incrementing and decrementing variables.
I suggest thinking about it simply:
Increment up the number of # to print
Use loop to print that number of #
#include <stdio.h>
int main(void)
{
    int h = 5;
    for (int c = 1; c <= h; c++) // the number of # to print
    {
        for (int i = 0; i < c; i++)
        {
            printf("#");
        }
        printf("\n");
    }
    return 0;
}
Another way is to write it exactly the same way as the Python version:
#include <stdio.h>
int main(void)
{
    int x = 5;
    int y = 0;
    while (x > y)
    {
        for (int i = 0; i < y+1; i++)
        {
            printf("#");
        }
        printf("\n");
        y += 1;
    }
    return 0;
}
The solution in C:
#include <stdio.h>
int main ()
{
    int x = 5;
    int y = 0;
    while (x > y)
    {
        for (int i=0;i<y+1;i++)
        {
            printf("#");
        }
        printf("\n");
        y++;  // without this increment the while loop never terminates
    }
    return 0;
}
In Python, range() starts the loop variable at zero and increments it by 1 by default. In C, you need to write the initialization and the increment explicitly.
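To make the correspondence concrete, the hypothetical generator below spells out in Python the three parts that a C for loop writes explicitly: initialization, condition and increment. For a positive step it visits the same values as range(start, stop, step), and it reproduces the staircase from the question.

```python
def c_style_for(start, stop, step=1):
    """Hypothetical helper: values visited by C's for (i = start; i < stop; i += step), step > 0."""
    i = start          # initialization, written explicitly in C
    while i < stop:    # condition, checked before every pass
        yield i
        i += step      # increment, also written explicitly in C

# The staircase from the question: row y prints y + 1 characters.
for y in c_style_for(0, 5):
    print('#' * (y + 1))
```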
I wrote a program to make a list of primes from 2 up to a user-given number in both Python and C. I ran both programs looking for primes up to the same number and looked at their respective processes in Activity Monitor. I found that the Python implementation used exactly 9 times as much memory as the C implementation. Why does Python require so much more memory, and why that specific multiple, to store the same array of integers? Here are both implementations of the program:
Python version:
import math
import sys
top = int(input('Please enter the highest number you would like to have checked: '))
num = 3
prime_list = [2]
while num <= top:
    n = 0
    prime = True
    while int(prime_list[n]) <= math.sqrt(num):
        if num % prime_list[n] == 0:
            prime = False
            n = 0
            break
        n = n + 1
    if prime == True:
        prime_list.append(num)
    prime = False
    num = num + 1
print("I found ", len(prime_list), " primes")
print("The largest prime I found was ", prime_list[-1])
C version:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <sys/types.h>
#include <unistd.h>
int main(){
    int N;
    int arraySize = 1;
    int *primes = malloc(100*sizeof(int));
    int isPrime = 1;
    primes[0] = 2;
    int timesRealloc = 0;
    int availableSlots = 100;
    printf("Please enter the largest number you want checked: \n");
    scanf("%d", &N);
    int j = 0;
    int i;
    for (i = 3; i <= N; i+=2){
        j = 0;
        isPrime = 1;
        while (primes[j] <= sqrt(i)) {
            if (i%primes[j] == 0) {
                isPrime = 0;
                break;
            }
            j++;
        }
        if (isPrime == 1){
            primes[arraySize] = i;
            arraySize++;
        }
        if (availableSlots == arraySize){
            timesRealloc++;
            availableSlots += 100;
            primes = realloc(primes, availableSlots*sizeof(int));
        }
    }
    printf("I found %d primes\n", arraySize);
    printf("Memory was reallocated %d times\n", timesRealloc);
    printf("The largest prime I found was %d\n", primes[(arraySize-1)]);
    return 0;
}
>>> import sys
>>> sys.getsizeof(123456)
28
That's 7 times the size of a C int. In Python 3, integers are instances of struct _longobject, a.k.a. PyLong:
struct _longobject {
    PyVarObject ob_base;
    digit ob_digit[1];
};
where PyVarObject is
typedef struct {
    PyObject ob_base;
    Py_ssize_t ob_size;
} PyVarObject;
and PyObject is
typedef struct _object {
    Py_ssize_t ob_refcnt;
    struct _typeobject *ob_type;
} PyObject;
From that we get the following memory usage for the object 123456 in a 64-bit Python build:
8 bytes for the reference counter (Py_ssize_t)
8 bytes for the pointer to the type object &PyLong_Type (of type PyTypeObject *)
8 bytes for ob_size, the number of 30-bit digits in the variable-length part of the object (of type Py_ssize_t; its sign stores the sign of the integer)
4 bytes for each 30 bits of digits in the integer.
Since 123456 fits in a single 30-bit digit, this sums up to 28 bytes, or 7 * sizeof(int).
In addition, each element in a Python list is a PyObject * that points to the actual object; each of these pointers is 64 bits on 64-bit Python builds, which means that each list element reference alone consumes twice as much memory as a C int.
Add together 7 and 2 and you get 9.
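The arithmetic above can be reproduced in a few lines of Python. This sketch assumes a 64-bit CPython build with 30-bit digits, the layout described above. It ignores the list's over-allocation and the cache of small ints, so it is an estimate rather than a measurement. pylong_size and list_bytes_per_int are hypothetical names.

```python
import math

POINTER_BYTES = 8          # one PyObject * slot in the list (64-bit build)
HEADER_BYTES = 8 + 8 + 8   # ob_refcnt + ob_type + ob_size
DIGIT_BYTES = 4            # each 30-bit digit is stored in a 32-bit unsigned integer
C_INT_BYTES = 4            # assumed sizeof(int) on the C side

def pylong_size(value):
    """Hypothetical helper: estimated sys.getsizeof(value) using the layout above."""
    ndigits = math.ceil(abs(value).bit_length() / 30)
    return HEADER_BYTES + DIGIT_BYTES * ndigits

def list_bytes_per_int(value):
    """Hypothetical helper: the list's pointer slot plus the int object it refers to."""
    return POINTER_BYTES + pylong_size(value)

per_element = list_bytes_per_int(123456)
print(per_element, per_element / C_INT_BYTES)  # 36 9.0
```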
For more storage-efficient code, you can use the array module; with type code 'i', the memory consumption should be quite close to that of the C version. Arrays have an append method, which makes growing one even easier than managing it yourself with realloc in C.
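A rough comparison, assuming 64-bit CPython, is sketched below. For the list, the total is the pointer array reported by sys.getsizeof plus every int object it points to. An array('i') keeps the values in one buffer of C ints. Exact byte counts depend on the build and on over-allocation, so no output is shown. The final ratio should be in the same ballpark as the factor of 9 discussed above.

```python
import sys
from array import array

N = 10_000
values = list(range(100_000, 100_000 + N))  # ints above the small-int cache: one object each

packed = array('i', values)                 # contiguous buffer of C ints
packed.append(100_000 + N)                  # grows in place; no manual realloc needed

# A list stores pointers; the int objects they point to are counted separately.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)
array_bytes = sys.getsizeof(packed)

print(packed.itemsize, list_bytes, array_bytes, round(list_bytes / array_bytes, 1))
```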
Some time ago I wrote a word-list generator in C, which took me about 2 days. I used a dedicated int array to store the indices and incremented them using the idea of number bases. Here is my source code:
/*
 * Generates word lists from a character set
 */
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int power(int x, int y);
void ntoarray(int n, int base, int seq[], int len);
void setchar(char charset[], char word[], int indices[]);
int main(int argc, char *argv[])
{
    // checks arguments
    if (argc != 4)
    {
        printf("Usage: %s start_length end_length charset", argv[0]);
        return 1;
    }
    else
    {
        // loops from start length to end length
        for (int lenword = atoi(argv[1]), lim = atoi(argv[2]); lenword <= lim; lenword++)
        {
            // pointer to character set
            char *charset = argv[3];
            // length of the character set
            int lencharset = strlen(charset);
            // array for the word
            char word[lenword + 1];
            word[lenword] = '\0';
            // array for storing the indices to generate the word from the character set
            int indices[lenword];
            // number to convert to required base
            int n = 0;
            // repetition is allowed so the number of times to loop is number^choice
            for (int i = 0; i < power(lencharset, lenword); i++)
            {
                // converts number to an integer array with length of the charset as the base
                ntoarray(n++, lencharset, indices, lenword);
                // sets the word according to the indices array
                setchar(charset, word, indices);
                // prints the word
                printf("%s\n", word);
            }
        }
        return 0;
    }
}
// simple power algorithm which raises x to the power y
int power(int x, int y)
{
    int n = 1;
    for (int i = 0; i < y; i++)
        n *= x;
    return n;
}
// converts n to the required base and stores it in an integer array
void ntoarray(int n, int base, int seq[], int len)
{
    int i = len - 1;
    memset(seq, 0, sizeof(int) * len);
    while (i >= 0 && n >= base)
    {
        int r = n % base;
        n /= base;
        seq[i--] = r;
    }
    seq[i] = n;
}
// sets the word by combining the data from the indices array and the charset array
void setchar(char charset[], char word[], int indices[])
{
    int len = strlen(word);
    for (int i = 0; i < len; i++)
    {
        word[i] = charset[indices[i]];
    }
}
Now I have rewritten it in python3 using a similar idea, but it took only about an hour.
"""
Generates word lists from a character set.
"""
import argparse
def main():
    # checks arguments
    parser = argparse.ArgumentParser()
    parser.add_argument("start_length", help = "starting length for your words", type = int)
    parser.add_argument("end_length", help = "ending length for your words", type = int)
    parser.add_argument("charset", help = "character set to be used", type = str)
    args = parser.parse_args()
    charset = args.charset
    len_charset = len(charset)
    # loops from start_length to end_length
    for length in range(args.start_length, args.end_length + 1):
        # initializes indices list
        indices = [0] * length
        # prints the word
        print(genword(charset, indices))
        # increments the indices list and prints the word until the limit
        # repetition is allowed so the number of loops is base ^ length - 1 (-1 for the printed word)
        for i in range(len_charset ** length - 1):
            inc_seq(indices, len_charset)
            print(genword(charset, indices))
def inc_seq(seq, base, index=-1):
    """
    Increments a number sequence with a specified base by one.
    """
    if seq[index] < base - 1:
        seq[index] += 1
    else:
        inc_seq(seq, base, index - 1)
        seq[index] = 0
def genword(charset, indices):
    """
    Generates a word by combining a character set and a list of indices.
    """
    return "".join([charset[i] for i in indices])
if __name__ == "__main__":
    main()
There is a notable difference: in C, I incremented an intermediary number n and converted it into the int array; in Python, I harnessed negative indices to increment the int list directly.
I learned to code mostly by self-study (i.e. reading books and using online resources), but I do not yet know how to analyze algorithms properly. So my question is: Which version is more efficient, in terms of time and space?
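For comparison, Python's standard library can enumerate the same words with itertools.product. It yields combinations in the same order as inc_seq, with the rightmost position changing fastest, so timing it against both versions with timeit gives an empirical baseline. As a rough analysis, all three approaches do work proportional to the word length for every word printed. Each keeps only one word's worth of indices in memory. words is a hypothetical name.

```python
from itertools import product

def words(charset, start_length, end_length):
    """Hypothetical helper: every word over charset, shortest first, same order as inc_seq."""
    for length in range(start_length, end_length + 1):
        for combo in product(charset, repeat=length):  # rightmost position varies fastest
            yield ''.join(combo)

for w in words("ab", 1, 2):
    print(w)
```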
Given a positive integer N, print all integers between 1 and 2^N whose binary representation contains no consecutive 1s.
I have the code below, but it sometimes prints duplicates. Is it possible to print them without duplicates?
#include <stdio.h>
int a[100];
void foo(int i, int size)
{
    if (i >= size) {
        int i;
        for (i=0;i<size;i++)
            printf("%d\n", a[i]);
        printf("----\n");
        return;
    }
    if (a[i-1] == 1 || a[i-1] == 0)
        a[i] = 0;
    foo(i+1, size);
    if (a[i-1] == 0)
        a[i] = 1;
    foo(i+1, size);
}
int main(void) {
    int i = 0;
    int size = 5;
    a[i] = 1;
    foo(1, size);
    return 0;
}
I have this Python program (http://ideone.com/cT4Hco), which uses hash maps to print the elements, but I think this can be done without hash maps too.
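A couple of notes:
you shouldn't start the backtracking from index 1. Instead, start from 0, since the bits of each number occupy indices [0, size-1] of array a
you shouldn't initialize a[0] to 1, since a[0] = 0 is also a valid case.
if (a[i-1] == 1 || a[i-1] == 0) is redundant: a only ever holds 0 or 1, so the condition is always true
the second foo(i+1, size) call has to be inside the if (a[i-1] == 0) block; without braces it also runs when a[i-1] == 1, which explores the same subtree again and produces the duplicates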
Code:
#include <stdio.h>
int a[100];
void foo(int i, int size)
{
    if (i >= size) {
        int i;
        for (i=0;i<size;i++)
            printf("%d ", a[i]);
        printf("\n----\n");
        return;
    }
    a[i] = 0;
    foo(i+1, size);
    if ( i == 0 || a[i-1] == 0) {
        a[i] = 1;
        foo(i+1, size);
    }
}
int main(void) {
    int i = 0;
    int size = 5;
    foo(0, size);
    return 0;
}
You might also want to filter out the solution 0 0 0 ... 0 when printing, since you only need the numbers from 1 to 2^n. If 2^n itself is included, you should print it as well (its binary form 100...0 has no consecutive 1s). The backtracking only covers the numbers 0, ..., 2^n - 1.
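The corrected backtracking translates directly to Python. The sketch below uses a hypothetical helper, no_consecutive_ones, that builds each number bit by bit from the most significant end and only appends a 1 after a 0. It also skips 0 and adds 2**n separately, following the filtering suggestion above. A brute-force filter with (v & (v >> 1)) == 0 is a convenient way to check it.

```python
def no_consecutive_ones(n):
    """Hypothetical helper: integers in [1, 2**n] with no two adjacent 1 bits."""
    results = []

    def extend(bits, value, prev_bit):
        if bits == n:
            if value != 0:                          # skip 0: only 1..2**n is wanted
                results.append(value)
            return
        extend(bits + 1, value << 1, 0)             # appending a 0 is always allowed
        if prev_bit == 0:                           # a 1 only after a 0 (or at the start)
            extend(bits + 1, (value << 1) | 1, 1)

    extend(0, 0, 0)
    results.append(2 ** n)                          # 100...0 lies outside the n-bit range
    return results

print(no_consecutive_ones(3))  # [1, 2, 4, 5, 8]
```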