Pygmo2: migration between islands in an archipelago during evolution - python

I'm trying to use the Python library Pygmo2 (https://esa.github.io/pagmo2/index.html) to parallelize an optimization problem.
To my understanding, parallelization can be achieved with an archipelago of islands (in this case, mp_island).
As a minimal working example, one of the tutorials from the official site can serve: https://esa.github.io/pagmo2/docs/python/tutorials/using_archipelago.html
I extracted the code:
import pygmo as pg

class toy_problem:
    def __init__(self, dim):
        self.dim = dim

    def fitness(self, x):
        return [sum(x), 1 - sum(x*x), - sum(x)]

    def gradient(self, x):
        return pg.estimate_gradient(lambda x: self.fitness(x), x)

    def get_nec(self):
        return 1

    def get_nic(self):
        return 1

    def get_bounds(self):
        return ([-1] * self.dim, [1] * self.dim)

    def get_name(self):
        return "A toy problem"

    def get_extra_info(self):
        return "\tDimensions: " + str(self.dim)

a_cstrs_sa = pg.algorithm(pg.cstrs_self_adaptive(iters=1000))
p_toy = pg.problem(toy_problem(50))
p_toy.c_tol = [1e-4, 1e-4]
archi = pg.archipelago(n=32, algo=a_cstrs_sa, prob=p_toy, pop_size=70)
print(archi)
archi.evolve()
print(archi)
Looking at the documentation of the old version of the library (http://esa.github.io/pygmo/documentation/migration.html), migration between islands seems to be an essential feature of the island parallelization model.
Also, to my understanding, optimization algorithms like evolutionary algorithms could not work without it.
However, I cannot find anywhere in the Pygmo2 documentation how to perform migration.
Does it happen automatically in an archipelago?
Does it depend on the selected algorithm?
Is it not yet implemented in Pygmo2?
Is the documentation on this still missing, or did I just not find it?
Can somebody enlighten me?

pagmo2 has implemented migration since v2.11; the PR has been completed and merged into master. Almost all capabilities present in pagmo1.x are restored. We will still add more topologies in the future, but they can already be implemented manually. Refer to the docs here: https://esa.github.io/pagmo2/docs/cpp/cpp_docs.html
A tutorial and an example are still missing and will be added in the near future (help is welcome).
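To make concrete what migration between islands means, here is a minimal, library-independent sketch. It does not use pygmo and does not reproduce pagmo's implementation; the function names (evolve, migrate_ring, run), the toy objective, and the "best replaces worst" policy on a ring are all assumptions chosen for illustration.

```python
import random

def fitness(x):
    # Toy objective to minimise (an assumption for this sketch).
    return x * x

def evolve(pop, rng, sigma=0.1):
    # One generation: mutate every individual, keep the mutant only if it is better.
    out = []
    for x in pop:
        y = x + rng.gauss(0.0, sigma)
        out.append(y if fitness(y) < fitness(x) else x)
    return out

def migrate_ring(islands):
    # Each island receives a copy of the best individual of its predecessor on a ring.
    bests = [min(pop, key=fitness) for pop in islands]
    for i, pop in enumerate(islands):
        migrant = bests[(i - 1) % len(islands)]
        worst = max(range(len(pop)), key=lambda j: fitness(pop[j]))
        # The migrant replaces the local worst individual only if it improves on it.
        if fitness(migrant) < fitness(pop[worst]):
            pop[worst] = migrant

def run(n_islands=4, pop_size=5, generations=50, migrate_every=10, seed=0):
    rng = random.Random(seed)
    islands = [[rng.uniform(-1, 1) for _ in range(pop_size)]
               for _ in range(n_islands)]
    start_best = min(fitness(x) for pop in islands for x in pop)
    for g in range(1, generations + 1):
        islands = [evolve(pop, rng) for pop in islands]
        if g % migrate_every == 0:
            migrate_ring(islands)
    end_best = min(fitness(x) for pop in islands for x in pop)
    return start_best, end_best, islands

if __name__ == "__main__":
    start_best, end_best, _ = run()
    print("best fitness before:", start_best, "after:", end_best)
```

Without the migrate_ring call the islands are simply independent runs of the same algorithm, which corresponds to an unconnected topology.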

The migration framework has not been fully ported from pagmo1 to pagmo2 yet. There is a long-standing PR open here:
https://github.com/esa/pagmo2/pull/102
We will complete the implementation of the migration framework in the next few months, hopefully by the beginning of the summer.

IMHO, the PyGMO2/pagmo documentation confirms that the migration feature is present:
The archipelago class is the main parallelization engine of pygmo. It essentially is a container of island able to initiate evolution (optimization tasks) in each island asynchronously while keeping track of the results and of the information exchange (migration) between the tasks ...
With the exception of thread_island (where some automated inference may take place and enforce it for thread-safe UDIs), all other island types, i.e. mp_island and ipyparallel_island, create a GIL-independent form of parallelism, yet the computing is still performed via the asynchronously operated .evolve() method.
In the original PyGMO, the archipelago class was automatically initialised with the attribute topology = unconnected() unless specified explicitly, as documented in PyGMO. The archipelago.__init__() method has several call interfaces (only the matching one is shown):
__init__( <PyGMO.algorithm> algo,
          <PyGMO.problem>   prob,
          <int>             n_isl,
          <int>             n_ind [, topology            = unconnected(),
                                     distribution_type   = point_to_point,
                                     migration_direction = destination
                                     ]
          )
But, adding that, one may redefine the default, so as to meet one's PyGMO evolutionary process preferences:
topo = topology.erdos_renyi( nodes = 100,
                             p     = 0.03
                             )              # Erdos-Renyi ( random ) topology
or
set a Clustered Barabási-Albert, with ageing vertices graph topology:
topo = topology.clustered_ba( m0    = 3,
                              m     = 3,
                              p     = 0.5,
                              a     = 1000,
                              nodes = 0
                              )             # clustered Barabasi-Albert,
#                                           # with Ageing vertices topology
or:
topo = topology.watts_strogatz( nodes = 100,
                                p     = 0.1
                                )           # Watts-Strogatz ( circle
                                            # + links ) topology
and finally, set it by assignment into the class-instance attribute:
archi = pg.archipelago( n        = 32,
                        algo     = a_cstrs_sa,
                        prob     = p_toy,
                        pop_size = 70
                        )                   # constructs an archipelago
archi.topology = topo                       # sets the topology to the
#                                           # above selected, pre-defined <topo>

Related

Minimal cost flow network when using multiple demand/supply sets (networkx)

This is my first stackoverflow submission, so please let me know whether I am asking this question according to community guidelines.
Question in short: Is there a way to have networkx find the min cost flow network given multiple demand/supply sets?
I am currently working on a network optimization problem in which I use the nx.min_cost_flow() function to calculate the minimum cost network layout given a supply and demand set for the nodes, and a maximum capacity for certain edges. The code works, but now I am trying to find the optimal network layout when given multiple supply and demand sets over time. I have tried a method that works for optimal trees (iterating through all the demand/supply sets and using the maximum capacity for each edge in each time step), but sadly this does not provide the optimal solution when using nx.min_cost_flow(), as I want the model to be able to create loops as well.
I have added the relevant code below, but it might be a bit confusing as there are many elements referring to other parts of the model unrelated to the problem I am currently facing. The complete model is too large to share.
I have tried looking for a solution online without success. Hopefully you know a solution. Thanks in advance!
def full_existing(T,G,folder,rpc,routing):
    fullG = G.copy()
    for (i,j) in fullG.edges():
        fullG[i][j]['capacity'] = 0
    for (i,j) in T.edges():
        if 'current' in T[i][j]:
            fullG[i][j]['weight'] = T[i][j]['weight']*rpc
    f_demand = open(folder+'/demand.txt','r')
    for X in f_demand:
        print('Demand set is called')
        Y = X.split(',')
        demand = {i: float(Y[i+1]) for i in range(len(Y)-1)}
        G1 = fullG.copy()
        for i in fullG.nodes():
            if i in demand.keys():
                G1.nodes[i]['demand'] = demand[i]
            else:
                G1.nodes[i]['demand'] = 0
        for (i,j) in G1.edges():
            G1[i][j]['capacity'] = inf
        for (i,j) in T.edges():
            if 'current' in T[i][j]:
                G1[i][j]['capacity'] = T[i][j]['current']
                G1[i][j]['weight'] = T[i][j]['weight']*rpc
            else:
                G1[i][j]['capacity'] = inf
        G1 = simplex(G1)
        for i,j in G1.edges():
            fullG[i][j]['capacity'] = max(fullG[i][j]['capacity'],G1[i][j]['capacity'])
    for (i,j) in T.edges():
        if 'current' in T[i][j]:
            fullG[i][j]['weight'] = T[i][j]['weight']
    nx.set_edge_attributes(fullG,nx.get_edge_attributes(T,'current'),'current')
    return fullG
The simplex procedure is called because the input graph is undirected.
def simplex(G):
    GX = G.copy()
    G1 = GX.to_directed()
    ns = nx.min_cost_flow(G1)
    for i,j in G1.edges():
        if ns[i][j] > 0 or ns[j][i] > 0:
            GX[i][j]['capacity'] = max(ns[i][j],ns[j][i])
        else:
            GX[i][j]['capacity'] = 0
    return GX
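For readers unfamiliar with the undirected-to-directed step used in simplex(), here is a small self-contained sketch for a single demand set. The three-node graph, its weights, capacities and demands are made up for illustration; only nx.Graph.to_directed(), nx.min_cost_flow() and nx.cost_of_flow() are actual networkx calls. It does not address the multi-demand-set question itself.

```python
import networkx as nx

# Hypothetical undirected network: a cheap two-hop route and an expensive direct link.
G = nx.Graph()
G.add_edge("s", "a", weight=1)
G.add_edge("a", "t", weight=1)
G.add_edge("s", "t", weight=3)

# min_cost_flow needs a directed graph; to_directed() creates both arcs per edge
# and copies the edge attributes onto each of them.
D = G.to_directed()
for u, v in D.edges():
    D[u][v]["capacity"] = 1

# Negative demand means supply; nodes without a 'demand' attribute default to 0.
D.nodes["s"]["demand"] = -2
D.nodes["t"]["demand"] = 2

flow = nx.min_cost_flow(D)          # dict of dicts: flow[u][v]
total_cost = nx.cost_of_flow(D, flow)

if __name__ == "__main__":
    print(flow)
    print("total cost:", total_cost)
```

Because every arc has capacity 1 and two units must leave s, both routes are used here, which is the kind of loop a tree-based method cannot produce.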

Implementing the TD-Gammon algorithm

I am attempting to implement the algorithm from the TD-Gammon article by Gerald Tesauro. The core of the learning algorithm is described in the following paragraph:
I have decided to have a single hidden layer (if that was enough to play world-class backgammon in the early 1990's, then it's enough for me). I am pretty certain that everything except the train() function is correct (they are easier to test), but I have no idea whether I have implemented this final algorithm correctly.
import numpy as np

class TD_network:
    """
    Neural network with a single hidden layer and a Temporal Difference training algorithm
    taken from G. Tesauro's 1995 TD-Gammon article.
    """
    def __init__(self, num_input, num_hidden, num_output, hnorm, dhnorm, onorm, donorm):
        self.w21 = 2*np.random.rand(num_hidden, num_input) - 1
        self.w32 = 2*np.random.rand(num_output, num_hidden) - 1
        self.b2 = 2*np.random.rand(num_hidden) - 1
        self.b3 = 2*np.random.rand(num_output) - 1
        self.hnorm = hnorm
        self.dhnorm = dhnorm
        self.onorm = onorm
        self.donorm = donorm

    def value(self, input):
        """Evaluates the NN output"""
        assert(input.shape == self.w21[1,:].shape)
        h = self.w21.dot(input) + self.b2
        hn = self.hnorm(h)
        o = self.w32.dot(hn) + self.b3
        return(self.onorm(o))

    def gradient(self, input):
        """
        Calculates the gradient of the NN at the given input. Outputs a list of dictionaries
        where each dict corresponds to the gradient of an output node, and each element in
        a given dict gives the gradient for a subset of the weights.
        """
        assert(input.shape == self.w21[1,:].shape)
        J = []
        h = self.w21.dot(input) + self.b2
        hn = self.hnorm(h)
        o = self.w32.dot(hn) + self.b3
        for i in range(len(self.b3)):
            db3 = np.zeros(self.b3.shape)
            db3[i] = self.donorm(o[i])
            dw32 = np.zeros(self.w32.shape)
            dw32[i, :] = self.donorm(o[i])*hn
            db2 = np.multiply(self.dhnorm(h), self.w32[i,:])*self.donorm(o[i])
            dw21 = np.transpose(np.outer(input, db2))
            J.append(dict(db3 = db3, dw32 = dw32, db2 = db2, dw21 = dw21))
        return(J)

    def train(self, input_states, end_result, a = 0.1, l = 0.7):
        """
        Trains the network using a single series of input states representing a game from beginning
        to end, and a final (supervised / desired) output for the end state
        """
        # The class defines no __call__, so the network is evaluated through value()
        outputs = [self.value(input_state) for input_state in input_states]
        outputs.append(end_result)
        for t in range(len(input_states)):
            delta = dict(
                db3 = np.zeros(self.b3.shape),
                dw32 = np.zeros(self.w32.shape),
                db2 = np.zeros(self.b2.shape),
                dw21 = np.zeros(self.w21.shape))
            grad = self.gradient(input_states[t])
            for i in range(len(self.b3)):
                for key in delta.keys():
                    td_sum = sum([l**(t-k)*grad[i][key] for k in range(t + 1)])
                    delta[key] += a*(outputs[t + 1][i] - outputs[t][i])*td_sum
            self.w21 += delta["dw21"]
            self.w32 += delta["dw32"]
            self.b2 += delta["db2"]
            self.b3 += delta["db3"]
The way I use this is I play through a whole game (or rather, the neural net plays against itself), and then I send the states of that game, from start to finish, into train(), along with the final result. It then takes this game log, and applies the above formula to alter weights using the first game state, then the first and second game states, and so on until the final time, when it uses the entire list of game states. Then I repeat that many times and hope that the network learns.
To be clear, I am not after feedback on my code writing. This was never meant to be more than a quick and dirty implementation to see that I have all the nuts and bolts in the right spots.
However, I have no idea whether it is correct, as I have thus far been unable to make it capable of playing tic-tac-toe at any reasonable level. There could be many reasons for that. Maybe I'm not giving it enough hidden nodes (I have used 10 to 12). Maybe it needs more games to train (I have used 200 000). Maybe it would do better with different normalisation functions (I've tried sigmoid and ReLU, leaky and non-leaky, in different variations). Maybe the learning parameters are not tuned right. Maybe tic-tac-toe and its deterministic gameplay means it "locks in" on certain paths in the game tree. Or maybe the training implementation is just wrong. Which is why I'm here.
Have I misunderstood Tesauro's algorithm?
I can't say that I entirely understand your implementation, but this line jumps out to me:
td_sum = sum([l**(t-k)*grad[i][key] for k in range(t + 1)])
Comparing with the formula you reference:
I see at least two differences:
Your implementation sums over t+1 elements compared to t elements in the formula
The gradient should be indexed with the same k as used in l**(t-k), but in your implementation it is indexed with i and key, without any reference to k
Perhaps if you fix these discrepancies your solution will behave more as expected.
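The second point can be illustrated with a short sketch. It assumes the referenced formula is the usual TD(lambda) weight update, alpha * (Y[t+1] - Y[t]) * sum over k of lambda**(t-k) * grad Y[k], with steps counted from zero here so that the sum runs over k = 0..t. The function names are hypothetical, and the gradients are passed in as plain arrays rather than computed by a network.

```python
import numpy as np

def td_lambda_explicit(grads, outputs, alpha=0.1, lam=0.7):
    # Direct transcription: the gradient inside the sum is indexed by k, not by t.
    deltas = []
    for t in range(len(grads)):
        td_sum = sum(lam ** (t - k) * grads[k] for k in range(t + 1))
        deltas.append(alpha * (outputs[t + 1] - outputs[t]) * td_sum)
    return deltas

def td_lambda_incremental(grads, outputs, alpha=0.1, lam=0.7):
    # Same quantity via an eligibility trace: e_t = lam * e_{t-1} + grad Y_t.
    trace = np.zeros_like(grads[0])
    deltas = []
    for t, g in enumerate(grads):
        trace = lam * trace + g
        deltas.append(alpha * (outputs[t + 1] - outputs[t]) * trace)
    return deltas

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grads = [rng.normal(size=3) for _ in range(5)]   # stand-in gradients, one per step
    outputs = list(rng.random(5)) + [1.0]            # predictions plus the final result
    a = td_lambda_explicit(grads, outputs)
    b = td_lambda_incremental(grads, outputs)
    print(all(np.allclose(x, y) for x, y in zip(a, b)))
```

The trace form avoids recomputing the whole sum at every step and makes it harder to index the gradient with the wrong time step.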

Good ways to log/store results/metrics of reinforcement learning experiments in python?

I'm currently experimenting with different RL algorithms in environments like the ones in the OpenAI gym. Currently I'm just using environments and code I implemented myself because it helps me to understand how things work.
Now I'm looking for a good way to log and store all data created during a course of many episodes.
A few examples:
states visited
loss of my neural network
number of steps/episode
reward per episode
I thought about using the python logging module although it's probably intended for a different use. Also I thought about using the observer pattern to push events (agent takes action, newstate, end of episode etc.) to different loggers I'm attaching as observers.
Are there better ways to realize this functionality?
Or maybe there is some good example code I can learn from?
Is using the logging module a good idea? I thought it could be beneficial because I could control what is logged and turn logging on or off. But if I'm using the observer pattern I don't really need this.
Sincerely
David
Most people implement this from scratch for their experiments according to their needs. You may want to reference the way BURLAP, a popular Java RL library, structures its plotting (it doesn't do logging, but the same information is required in either case). The example experimenter setup is here.
Typically, I whip up a class that allows me to quickly take means/confidences of some sequence of observations, whether that's episode reward or evaluation steps, etc.
from typing import List
import numpy as np
import scipy.stats

class ExperimentLog():
    def __init__(self, series: List[float], signfigance_level: float):
        self.means = []
        self.variances = []
        self.confidences = []
        self.n = 1  # index of the current trial, starting at 1
        self.current_observation_num = 0
        self.series = series
        self.signfigance_level = signfigance_level

    def observe(self, value: float):
        mean = None
        variance = None
        if self.current_observation_num > len(self.means) - 1:
            self.means.append(0.0)
            self.variances.append(0.0)
            mean = 0.0
            variance = 0.0
        else:
            mean = self.means[self.current_observation_num]
            variance = self.variances[self.current_observation_num]
        # Online (Welford-style) update of the mean and the sum of squared deviations
        delta = value - mean
        mean += delta / self.n
        variance += delta * (value - mean)
        self.means[self.current_observation_num] = mean
        self.variances[self.current_observation_num] = variance
        self.current_observation_num += 1

    def finalize_confidences(self):
        assert self.n > 1
        self.variances = [variance / (self.n - 1) for variance in
                          self.variances]
        for (mean, variance) in zip(self.means, self.variances):
            crit = scipy.stats.t.ppf(1.0 - self.signfigance_level, self.n - 1)
            # np.sqrt replaces np.math.sqrt, which recent NumPy releases no longer provide
            width = crit * np.sqrt(variance) / np.sqrt(self.n)
            self.confidences.append(width)

    def observe_trial_end(self):
        self.n += 1
        self.current_observation_num = 0
I populate it directly in the learning or evaluation loop. Then it's simple to save this to a file:
import os

def save(name, log: ExperimentLog, out_dir: str, unique_num: int = 0):
    # experiment_num and num_trials are expected to be defined by the surrounding experiment script
    out_prefix = out_dir
    if not os.path.exists(out_prefix):
        os.makedirs(out_prefix)
    filename = str(experiment_num) + "_" + str(num_trials) + "_" + name + str(unique_num) + ".csv"
    full_out_path = os.path.join(out_prefix, filename)
    if log.n > 1:
        log.finalize_confidences()
        data = np.c_[(log.series, log.means, log.variances, log.confidences)]
    else:
        data = np.c_[(log.series, log.means)]
    # One format per column, so the two-column case does not fail on a four-entry fmt list
    fmt = ["%d"] + ["%f"] * (data.shape[1] - 1)
    np.savetxt(full_out_path, data,
               fmt=fmt,
               delimiter=",")
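The update inside observe() is an online mean/variance recurrence (commonly attributed to Welford). As a sanity check, here is a standalone sketch of the same recurrence for a single sequence, compared against the standard library. The function name running_mean_var is made up for this example and is not part of the class above.

```python
import statistics

def running_mean_var(values):
    # Online update: after each value, mean is exact and m2 holds the
    # running sum of squared deviations from the current mean.
    n = 0
    mean = 0.0
    m2 = 0.0
    for v in values:
        n += 1
        delta = v - mean
        mean += delta / n
        m2 += delta * (v - mean)
    # Sample variance (divide by n - 1), matching finalize_confidences()
    variance = m2 / (n - 1) if n > 1 else 0.0
    return mean, variance

if __name__ == "__main__":
    rewards = [1.0, 2.0, 4.0, 8.0]
    mean, var = running_mean_var(rewards)
    print(mean, statistics.mean(rewards))
    print(var, statistics.variance(rewards))
```

The benefit in an RL loop is that no per-episode history has to be kept in memory to report a mean and a confidence interval.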

Back-propagation in Tensorflow

I am constructing a time delayed recurrent model and I need to know about how and when TensorFlow computes its backwards step.
Consider the following model and pseudo-code:
unit_1 = LSTM(unit_size)
unit_2 = LSTM(unit_size)
unit_3 = LSTM(unit_size)
unit_4 = LSTM(unit_size)
ip_W = Variable([4 * unit_size, output_size])
ip_b = Variable([output_size])
prev_1 = tf.zeros([unit_size])
prev_2 = tf.zeros([unit_size])
prev_3 = tf.zeros([unit_size])
prev_4 = tf.zeros([unit_size])
for t, input in enumerate(input_data):
    if t%1==0:
        prev_1 = unit_1([input])
    if t%2==0:
        prev_2 = unit_2([input])
    if t%3==0:
        prev_3 = unit_3([input])
    if t%4==0:
        prev_4 = unit_4([input])
    concat = tf.concat(0,[prev_1, prev_2, prev_3, prev_4])
    output[t] = tf.matmul(concat, ip_W) + ip_b
Here is a gist to a usable version of this code, based on tensorflow/python/ops/rnn.py
My Question:
For time steps where cells are not called (i.e. at t=1, unit_1 is called, while all the rest are not), are their weights updated? I'm torn as to whether it would be a good idea to have them update at every step or not. The cells themselves haven't been exposed to any new data in those steps, so I am afraid that back-propagating may result in over-correction. I would appreciate others' insights into this.
Let me know if any clarification is necessary.

Create a model that switches between two different states using Temporal Logic?

I'm trying to design a model that can manage different requests for different water sources.
Platform: Mac OS X, using the latest Python with the TuLiP module installed.
For example,
Definitions :
Two water sources : w1 and w2
3 different requests : r1,r2,and r3
Specifications :
Water 1 (w1) is preferred, but w2 will be used if w1 unavailable.
Water 2 is only used if w1 is depleted.
r1 has the maximum priority.
If all entities request simultaneously, r1's supply must not fall below 50%.
The water sources are not discrete but rather continuous, this will increase the difficulty of creating the model. I can do a crude discretization for the water levels but I prefer finding a model for the continuous state first.
So how do I start doing that ?
Some of my thoughts :
Create a matrix W where w1,w2 ∈ W
Create a matrix R where r1,r2,r3 ∈ R
or leave all variables singular without putting them in a matrix
I'm not an expert in coding so that's why I need help. Not sure what is the best way to start tackling this problem.
I am only interested in the model, or a code sample of how this can be put together.
edit
Now imagine I do a crude discretization of the water sources to have w1=[0...4] and w2=[0...4] for 0, 25, 50, 75,100 percent respectively.
==> means implies
Usage of water sources :
if w1[0]==>w2[4] -- meaning if water source 1 has 0%, then use 100% of water source 2 etc
if w1[1]==>w2[3]
if w1[2]==>w2[2]
if w1[3]==>w2[1]
if w1[4]==>w2[0]
r1=r2=r3=[0,1] -- 0 means request OFF and 1 means request ON
Now what model can be designed that will give each request 100% water depending on the values of w1 and w2 (w1 and w2 values are uncontrollable so cannot define specific value, but 0...4 is used for simplicity )
This is called the flow problem: http://en.wikipedia.org/wiki/Maximum_flow_problem
Wiki has some code for the solution: http://en.wikipedia.org/wiki/Ford%E2%80%93Fulkerson_algorithm
I'm not sure temporal logic is of much help here. For example load balancing is a major research topic, and I believe most of it doesn't use this formalism.
I have coded something which only represents a simple priority list, which is kind of trivial. I would use classes and functions to represent states, not matrices. The dependencies in terms of priority are simple enough. Otherwise you can add those to the Watersource class as well (a class WaterSourcePriorityQueue or something like that). To get a simulation it is good to use threads, which I haven't done here. You can use stepwise iteration (rounds), which is more in line with a procedural program.
import time
from random import random
from math import floor
import operator

class Watersource:
    def __init__(self,initlevel,prio,name):
        self.level = initlevel
        self.priority = prio
        self.name = name

    def requestWater(self,amount):
        if amount < self.level:
            self.level -= amount
            return True
        else:
            return False

#watersources
w1 = Watersource(40,1,"A")
w2 = Watersource(30,2,"B")
w3 = Watersource(20,3,"C")
probA = 0.8 # probability A will be requested
probB = 0.7
probC = 0.9
probs = {w1:probA,w2:probB,w3:probC}
amounts = {w1:10,w2:10,w3:20} # amounts requested
ws = [w1,w2,w3]
numrounds = 100
for round_num in range(1,numrounds):
    print('round ',round_num)
    done = False
    i = 0
    priorRequest = False
    prioramount = 0
    while not done or priorRequest:
        if i==len(ws):
            done=True
            break
        w = ws[i]
        probtresh = probs[w]
        prob = random()
        if prob < probtresh: # request water with probability probtresh
            if prioramount != 0:
                # a previous source could not serve the request; pass it on
                amount = prioramount
            else:
                amount = floor(random()*amounts[w])
                prioramount = amount
            print('requesting ',amount)
            success = w.requestWater(amount)
            if not success:
                print('not enough')
                priorRequest=True
            else:
                print('got water')
                done = True
                priorRequest=False
        i+=1
    time.sleep(1)
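The preference rule from the question (use w1 first and fall back to w2 only once w1 is depleted) can also be written as a pure function over the discretised levels 0..4 from the question's edit. This is only a sketch under that discretisation; the function name allocate and its return shape are assumptions for illustration, and it does not model the temporal-logic side or the 50% guarantee for r1.

```python
def allocate(levels, request):
    """Draw `request` units from the sources in priority order.

    `levels` lists the available units per source, highest priority first.
    Returns (new_levels, drawn_per_source, unmet), where `unmet` is whatever
    could not be served by any source.
    """
    remaining = request
    new_levels = []
    drawn = []
    for level in levels:
        take = min(level, remaining)   # never draw more than the source holds
        new_levels.append(level - take)
        drawn.append(take)
        remaining -= take
    return new_levels, drawn, remaining

if __name__ == "__main__":
    # w1 holds 3 units, w2 holds 4 units; a request for 5 units drains w1 first.
    print(allocate([3, 4], 5))
```

Keeping the allocation free of side effects makes it easy to call once per round for each request in priority order (r1 first) and to test it in isolation.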
