Custom Network and Policy in Stable-Baselines3 (Python)

I am attempting to create a small working example of how to use a MultiDiscrete action space together with a Box observation space. One of the problems I have run into is that the dimension returned by a normal policy does not fit the Box dimensions: the base policy returns something of size 25, whereas I need something that is (5, 5).
I have tried to alleviate this problem by writing a custom "policy" (actually a network) where, as the last step, I reshape the output to (5, 5) rather than 25. This has resulted in an array of problems. I have attempted to read the documentation on how to create custom policies; however, I cannot for the life of me find the issue. So far I have tried:
1. Using policy_kwargs; however, I don't know how to specify that the network output should be reshaped.
2. Using a BaseFeaturesExtractor, with no luck either.
3. Various combinations of 1 and 2.
I have included some of the error messages I get for the different attempts as comments in the code below. Does anyone know what I am missing? Is it something completely fundamental that I have misunderstood?
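For context, sampling from the two spaces shows the shapes involved (a quick check, separate from the script below):

import gym
import numpy as np

obs_space = gym.spaces.Box(low=0, high=1, shape=(5, 5), dtype=np.float32)
act_space = gym.spaces.MultiDiscrete([5, 3])
print(obs_space.sample().shape)  # (5, 5)
print(act_space.sample())        # e.g. [4 2] -- a length-2 integer vector, not a (5, 5) array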
import numpy as np
import gym
import torch.nn as nn
import torch as th
from stable_baselines3 import PPO
from stable_baselines3.common.torch_layers import BaseFeaturesExtractor # don't know if this is necessary
# -------- Attempt using BaseFeaturesExtractor
# class CustomPolicy(BaseFeaturesExtractor):  # Don't know if BaseFeaturesExtractor is correct
#     def __init__(self, observation_space, action_space, features_dim: int = 25):  # Features should perhaps be (5,5)
#         super().__init__(observation_space, features_dim)
# --------

# Define a custom neural network architecture
class CustomPolicy():
    def __init__(self, observation_space, action_space):
        super().__init__()
        # Define the layers of the neural network
        self.fc1 = nn.Linear(observation_space.shape[0], 64)
        self.fc2 = nn.Linear(64, 64)
        self.fc3 = nn.Linear(64, action_space.shape[0])

    # Reshape the output to match the Box observation space shape
    def forward(self, x):
        x = nn.functional.relu(self.fc1(x))
        x = nn.functional.relu(self.fc2(x))
        x = self.fc3(x)
        x = th.reshape(x, (5, 5))
        return x
# Define the grid world environment
class GridWorldEnv(gym.Env):
    def __init__(self):
        self.observation_space = gym.spaces.Box(low=0, high=1, shape=(5, 5), dtype=np.float32)
        self.action_space = gym.spaces.MultiDiscrete([5, 3])  # 5 movement directions, 3 movement distances
        self.state = np.zeros((5, 5))
        self.state[0, 0] = 1  # Start location
        self.goal = (4, 4)  # Goal location
        self.steps = 0
        self.state.flatten()

    def reset(self):
        self.state = np.zeros((5, 5))
        self.state[0, 0] = 1  # Start location
        self.goal = (4, 4)  # Goal location
        self.steps = 0
        return self.state.flatten()

    def step(self, action):
        direction, distance = action
        reward = -1
        done = False
        # Calculate the movement offset based on the selected direction and distance
        if direction == 0:
            offset = (distance, 0)
        elif direction == 1:
            offset = (-distance, 0)
        elif direction == 2:
            offset = (0, distance)
        elif direction == 3:
            offset = (0, -distance)
        else:
            offset = (0, 0)
        # Calculate the new position based on the current position and movement offset
        current_pos = np.argwhere(self.state == 1)[0]
        new_pos = tuple(np.clip(current_pos + np.array(offset), 0, 4))
        # Update the state with the new position (index with a tuple to address a single cell)
        self.state[tuple(current_pos)] = 0
        self.state[new_pos] = 1
        # Check if the agent has reached the goal
        if np.argmax(self.state) == np.ravel_multi_index(self.goal, self.state.shape):
            reward = 10
            done = True
        # Increment step count and check if episode should end
        self.steps += 1
        if self.steps >= 50:
            done = True
        return self.state, reward, done, {}
# Press the green button in the gutter to run the script.
if __name__ == '__main__':
    # Create an environment with the CustomEnv environment
    env = GridWorldEnv()

    # Create policy
    policy = CustomPolicy(env.observation_space, env.action_space)

    # Create a PPO agent with the CustomPolicy
    model = PPO(policy=policy, env=env, verbose=1)
    # --------- TypeError: 'CustomPolicy' object is not callable

    # --------- Attempt at using policy_kwargs
    # policy_kwargs = dict(activation_fn=th.nn.ReLU,
    #                      net_arch=dict(pi=[32, 32], vf=[32, 32]))
    # model = PPO("MlpPolicy", env=env, verbose=1, policy_kwargs=policy_kwargs)
    # --------- ValueError: could not broadcast input array from shape (25,) into shape (5,5)

    # --------- Attempt at using policy_kwargs with custom policy
    # policy_kwargs = dict(
    #     features_extractor_class=CustomPolicy,
    #     features_extractor_kwargs=dict(features_dim=25),  # should perhaps be (5,5)
    # )
    # model = PPO(policy=policy, env=env, verbose=1, policy_kwargs=policy_kwargs)
    # --------- TypeError: CustomPolicy.forward() got an unexpected keyword argument 'use_sde'

    # Train the agent for 1000 steps
    model.learn(total_timesteps=1000)
Thank you in advance.
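For comparison, the documented SB3 pattern never reshapes the network output to (5, 5): a features extractor subclasses BaseFeaturesExtractor, takes the observation space plus a flat features_dim, and returns a flat tensor, while the (5, 3) MultiDiscrete action head is built by the policy itself. Note also that reset() above returns a flattened (25,) state while observation_space is declared as (5, 5), which is consistent with the broadcast ValueError. A minimal sketch along those lines (the class name is illustrative):

import gym
import numpy as np
import torch as th
import torch.nn as nn
from stable_baselines3.common.torch_layers import BaseFeaturesExtractor

class GridExtractor(BaseFeaturesExtractor):
    """Flatten the (5, 5) Box observation into a flat features_dim vector."""
    def __init__(self, observation_space: gym.spaces.Box, features_dim: int = 25):
        super().__init__(observation_space, features_dim)
        n_input = int(np.prod(observation_space.shape))  # 25 for a (5, 5) grid
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(n_input, features_dim),
            nn.ReLU(),
        )

    def forward(self, observations: th.Tensor) -> th.Tensor:
        return self.net(observations)

policy_kwargs = dict(
    features_extractor_class=GridExtractor,
    features_extractor_kwargs=dict(features_dim=25),
)
# model = PPO("MlpPolicy", env=GridWorldEnv(), policy_kwargs=policy_kwargs, verbose=1)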

Related

Deep Galerkin Method for optimization

I am trying to use the Deep Galerkin Method (DGM) to solve high-dimensional PDEs and I am running into a problem. For illustrative purposes, I am posting a simple optimization problem below. The feed-forward network successfully recovers the optimal function, but the DGM network fails to do so. Any help is highly appreciated.
import logging, os
os.system('clear')
logging.disable(logging.WARNING)
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"
import tensorflow as tf
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)
import numpy as np
import pandas as pd
import time
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings("ignore")
# CLASS DEFINITIONS FOR NEURAL NETWORKS USED IN DEEP GALERKIN METHOD
#%% import needed packages
import tensorflow as tf
#%% LSTM-like layer used in DGM (see Figure 5.3 and set of equations on p. 45) - modification of Keras layer class
class LSTMLayer(tf.keras.layers.Layer):
    # constructor/initializer function (automatically called when a new instance of the class is created)
    def __init__(self, output_dim, input_dim, trans1="tanh", trans2="tanh"):
        '''
        Args:
            input_dim (int):      dimensionality of input data
            output_dim (int):     number of outputs for LSTM layers
            trans1, trans2 (str): activation functions used inside the layer;
                                  one of: "tanh" (default), "relu" or "sigmoid"
        Returns: customized Keras layer object used as intermediate layers in DGM
        '''
        # create an instance of a Layer object (call initializer of superclass of LSTMLayer)
        super(LSTMLayer, self).__init__()
        # add properties for layer including activation functions used inside the layer
        self.output_dim = output_dim
        self.input_dim = input_dim
        if trans1 == "tanh":
            self.trans1 = tf.nn.tanh
        elif trans1 == "relu":
            self.trans1 = tf.nn.relu
        elif trans1 == "sigmoid":
            self.trans1 = tf.nn.sigmoid
        if trans2 == "tanh":
            self.trans2 = tf.nn.tanh
        elif trans2 == "relu":
            self.trans2 = tf.nn.relu
        elif trans2 == "sigmoid":
            self.trans2 = tf.nn.sigmoid
        ### define LSTM layer parameters (use Xavier initialization)
        # U matrices (weights for the original inputs x)
        self.Uz = self.add_variable("Uz", shape=[self.input_dim, self.output_dim],
                                    initializer=tf.contrib.layers.xavier_initializer())
        self.Ug = self.add_variable("Ug", shape=[self.input_dim, self.output_dim],
                                    initializer=tf.contrib.layers.xavier_initializer())
        self.Ur = self.add_variable("Ur", shape=[self.input_dim, self.output_dim],
                                    initializer=tf.contrib.layers.xavier_initializer())
        self.Uh = self.add_variable("Uh", shape=[self.input_dim, self.output_dim],
                                    initializer=tf.contrib.layers.xavier_initializer())
        # W matrices (weights for the output of the previous layer)
        self.Wz = self.add_variable("Wz", shape=[self.output_dim, self.output_dim],
                                    initializer=tf.contrib.layers.xavier_initializer())
        self.Wg = self.add_variable("Wg", shape=[self.output_dim, self.output_dim],
                                    initializer=tf.contrib.layers.xavier_initializer())
        self.Wr = self.add_variable("Wr", shape=[self.output_dim, self.output_dim],
                                    initializer=tf.contrib.layers.xavier_initializer())
        self.Wh = self.add_variable("Wh", shape=[self.output_dim, self.output_dim],
                                    initializer=tf.contrib.layers.xavier_initializer())
        # bias vectors
        self.bz = self.add_variable("bz", shape=[1, self.output_dim])
        self.bg = self.add_variable("bg", shape=[1, self.output_dim])
        self.br = self.add_variable("br", shape=[1, self.output_dim])
        self.bh = self.add_variable("bh", shape=[1, self.output_dim])

    # main function to be called
    def call(self, S, X):
        '''Compute output of an LSTMLayer for given inputs S, X.
        Args:
            S: output of previous layer
            X: data input
        Returns: layer output S_new
        '''
        # compute components of LSTM layer output (note H uses a separate activation function)
        Z = self.trans1(tf.add(tf.add(tf.matmul(X, self.Uz), tf.matmul(S, self.Wz)), self.bz))
        G = self.trans1(tf.add(tf.add(tf.matmul(X, self.Ug), tf.matmul(S, self.Wg)), self.bg))
        R = self.trans1(tf.add(tf.add(tf.matmul(X, self.Ur), tf.matmul(S, self.Wr)), self.br))
        H = self.trans2(tf.add(tf.add(tf.matmul(X, self.Uh), tf.matmul(tf.multiply(S, R), self.Wh)), self.bh))
        # compute LSTM layer output
        S_new = tf.add(tf.multiply(tf.subtract(tf.ones_like(G), G), H), tf.multiply(Z, S))
        return S_new
#%% Fully connected (dense) layer - modification of Keras layer class
class DenseLayer(tf.keras.layers.Layer):
    # constructor/initializer function (automatically called when a new instance of the class is created)
    def __init__(self, output_dim, input_dim, transformation=None):
        '''
        Args:
            input_dim:      dimensionality of input data
            output_dim:     number of outputs for dense layer
            transformation: activation function used inside the layer; using
                            None is equivalent to the identity map
        Returns: customized Keras (fully connected) layer object
        '''
        # create an instance of a Layer object (call initializer of superclass of DenseLayer)
        super(DenseLayer, self).__init__()
        self.output_dim = output_dim
        self.input_dim = input_dim
        ### define dense layer parameters (use Xavier initialization)
        # W matrix (weights for the output of the previous layer)
        self.W = self.add_variable("W", shape=[self.input_dim, self.output_dim],
                                   initializer=tf.contrib.layers.xavier_initializer())
        # bias vector
        self.b = self.add_variable("b", shape=[1, self.output_dim])
        if transformation:
            if transformation == "tanh":
                self.transformation = tf.tanh
            elif transformation == "relu":
                self.transformation = tf.nn.relu
        else:
            self.transformation = transformation

    # main function to be called
    def call(self, X):
        '''Compute output of a dense layer for a given input X
        Args:
            X: input to layer
        '''
        # compute dense layer output
        S = tf.add(tf.matmul(X, self.W), self.b)
        if self.transformation:
            S = self.transformation(S)
        return S
#%% Neural network architecture used in DGM - modification of Keras Model class
class DGMNet(tf.keras.Model):
    # constructor/initializer function (automatically called when a new instance of the class is created)
    def __init__(self, layer_width, n_layers, input_dim, final_trans=None):
        '''
        Args:
            layer_width: number of neurons per layer
            n_layers:    number of intermediate LSTM layers
            input_dim:   spatial dimension of input data (EXCLUDES time dimension)
            final_trans: transformation used in final layer
        Returns: customized Keras model object representing DGM neural network
        '''
        # create an instance of a Model object (call initializer of superclass of DGMNet)
        super(DGMNet, self).__init__()
        # define initial layer as fully connected
        # NOTE: to account for time inputs we would use input_dim+1 as the input dimensionality
        self.initial_layer = DenseLayer(layer_width, input_dim, transformation="tanh")
        # define intermediate LSTM layers
        self.n_layers = n_layers
        self.LSTMLayerList = []
        for _ in range(self.n_layers):
            self.LSTMLayerList.append(LSTMLayer(layer_width, input_dim))
        # define final layer as fully connected with a single output (function value)
        self.final_layer = DenseLayer(1, layer_width, transformation=final_trans)

    # main function to be called
    def call(self, x):
        '''
        Args:
            x: sampled space inputs
        Run the DGM model and obtain the fitted function value at the inputs x
        '''
        # define input vector
        X = tf.concat([x], 1)
        # call initial layer
        S = self.initial_layer.call(X)
        # call intermediate LSTM layers
        for i in range(self.n_layers):
            S = self.LSTMLayerList[i].call(S, X)
        # call final dense layer
        result = self.final_layer.call(S)
        return result
#%% main class
class check():
    def __init__(self, v, x, layers, learning_rate, adam_iter, params):
        self.params = params
        self.v = v
        self.x = x
        self.learning_rate = learning_rate
        self.adam_iter = adam_iter
        self.lb = np.array([self.x[0][0]])
        self.ub = np.array([self.x[-1][0]])
        self.sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=True, log_device_placement=True))
        self.x_tf = tf.placeholder(tf.float32, shape=[None, self.x.shape[1]])
        self.v_tf = tf.placeholder(tf.float32, shape=[None, self.v.shape[1]])
        self.x_u_tf = tf.placeholder(tf.float32, shape=[None, self.x.shape[1]])
        self.v_u_tf = tf.placeholder(tf.float32, shape=[None, self.v.shape[1]])
        self.weights_v, self.biases_v = self.initialize_nn(layers)
        self.weights_i, self.biases_i = self.initialize_nn(layers)
        with tf.variable_scope("control", reuse=True):
            self.i_pred = self.net_i(self.x_tf)
        with tf.variable_scope("value", reuse=True):
            self.v_pred = self.net_v(self.x_tf)
        self.error_i = self.policy_error(self.x_tf)
        self.loss_v = tf.math.reduce_max(tf.abs(self.v_pred - self.v_tf))
        self.loss = tf.math.reduce_max(tf.abs(self.v_pred - self.v_tf)) + tf.reduce_mean(tf.square(self.error_i))
        self.optimizer_Adam = tf.train.AdamOptimizer(learning_rate=self.learning_rate)
        self.train_op_Adam = self.optimizer_Adam.minimize(self.loss)
        self.optimizer_Adam_v = tf.train.AdamOptimizer(learning_rate=self.learning_rate)
        self.train_op_Adam_v = self.optimizer_Adam.minimize(self.loss_v)
        init = tf.global_variables_initializer()
        self.sess.run(init)

    def policy_error(self, x):
        i = self.net_i(x)
        v_ = self.net_v(x + i)
        l = v_ - i * x ** 2
        error_i = tf.gradients(l, i)[0]
        return error_i

    def initialize_nn(self, layers):
        weights = []
        biases = []
        num_layers = len(layers)
        for l in range(num_layers - 1):
            W = self.xavier_init(size=[layers[l], layers[l + 1]])
            b = tf.Variable(tf.zeros([1, layers[l + 1]], dtype=tf.float32), dtype=tf.float32)
            weights.append(W)
            biases.append(b)
        return weights, biases

    def xavier_init(self, size):
        in_dim = size[0]
        out_dim = size[1]
        xavier_stddev = np.sqrt(2 / (in_dim + out_dim))
        try:
            val = tf.Variable(tf.random.truncated_normal([in_dim, out_dim], stddev=xavier_stddev), dtype=tf.float32)
        except:
            val = tf.Variable(tf.truncated_normal([in_dim, out_dim], stddev=xavier_stddev), dtype=tf.float32)
        return val
    def neural_net(self, X, weights, biases):
        num_layers = len(weights) + 1
        H = 2.0 * (X - self.lb) / (self.ub - self.lb) - 1
        # H = X
        for l in range(num_layers - 2):
            W = weights[l]
            b = biases[l]
            H = tf.tanh(tf.add(tf.matmul(H, W), b))
        W = weights[-1]
        b = biases[-1]
        Y = tf.add(tf.matmul(H, W), b)
        return Y

    def net_v(self, eta):
        if self.params['DGM'] == True:
            model_v = DGMNet(self.params['neurons_per_layer'], self.params['num_layers'], 1)
            v_u = model_v(eta)
        else:
            X = tf.concat([eta], 1)
            v_u = self.neural_net(X, self.weights_v, self.biases_v)
        return v_u

    def net_i(self, eta):
        if self.params['DGM'] == True:
            model_i = DGMNet(self.params['neurons_per_layer'], self.params['num_layers'], 1)
            i_u = model_i(eta)
        else:
            X = tf.concat([eta], 1)
            i_u = self.neural_net(X, self.weights_i, self.biases_i)
        return i_u

    def callback(self, loss):
        print('Loss: ', loss)

    def train(self):
        # K.clear_session()
        start_time = time.time()
        if True:  # set this to True if you want Adam to run on the value loss
            tf_dict = {self.v_tf: self.v, self.x_tf: self.x}
            for it in range(self.adam_iter):
                self.sess.run(self.train_op_Adam_v, tf_dict)
                # Print
                if it % 1000 == 0:
                    elapsed = time.time() - start_time
                    loss_value = self.sess.run(self.loss_v, tf_dict)
                    print('It: %d, Loss: %.3e, Time: %.2f' %
                          (it, loss_value, elapsed))
                    start_time = time.time()
            start_time = time.time()
        if True:  # set this to True if you want Adam to run on the full loss
            tf_dict = {self.v_tf: self.v, self.x_tf: self.x}
            for it in range(self.adam_iter):
                self.sess.run(self.train_op_Adam, tf_dict)
                # Print
                if it % 1000 == 0:
                    elapsed = time.time() - start_time
                    loss_value = self.sess.run(self.loss, tf_dict)
                    print('It: %d, Loss: %.3e, Time: %.2f' %
                          (it, loss_value, elapsed))
                    start_time = time.time()
            start_time = time.time()

    def predict(self, X_star):
        i_star = self.sess.run(self.i_pred, {self.x_tf: X_star[:, 0:1]})
        v_star = self.sess.run(self.v_pred, {self.x_tf: X_star[:, 0:1]})
        error = self.sess.run(self.error_i, {self.x_tf: X_star[:, 0:1]})
        tf.reset_default_graph()
        return i_star, v_star, error

#%%
if __name__ == "__main__":
    params = {'DGM': True, 'neurons_per_layer': 50, 'num_layers': 4}
    x = np.linspace(-1, 1, 100).reshape(-1, 1).astype(np.float32)
    v = (10 - x ** 2).reshape(-1, 1).astype(np.float32)
    # architecture for the feed-forward network
    layers = [1, 10, 1]
    learning_rate = 0.001
    adam_iter = 5000
    run = check(v, x, layers, learning_rate, adam_iter, params)
    run.train()
    i_star, v_star, error = run.predict(x)
The problem is to find the optimal function i that maximizes the function v = 10 - (x + i)^2 - i*x^2, where x is the state variable; the optimal function i will therefore depend on x. If I set 'DGM' to False in the parameter dictionary and run the code, I get the right solution (in this case the functions are coded as feed-forward neural networks), where the correct analytical solution is i_star = 0.5*(-2x - x^2). If I set 'DGM' to True, the solution is incorrect. I tried different numbers of layers and neurons per layer, but DGM always gives an incorrect solution.
Am I doing something wrong? Many thanks.
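For reference, the quoted analytical solution follows from the first-order condition: with l(i) = 10 - (x + i)^2 - i*x^2, setting dl/di = -2*(x + i) - x^2 = 0 gives i_star = -x - x^2/2 = 0.5*(-2x - x^2), which is the expression above.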

Set variable network layers based on parameters in PyTorch

I want to make the following network definition parametric. The number of continuous and discrete columns varies across datasets. I first pass the whole input data, which in this case is 110-dimensional, through a linear layer with a ReLU activation. The output size of each categorical field of my data varies, based on a previous one-hot-encoding transformation. I need to define an nn.Linear(110, number_of_encodings) for each of them.
import torch
import torch.nn as nn
import torch.nn.functional as f

class Generator(nn.Module):
    def __init__(self):
        super(Generator, self).__init__()
        self.lin1 = nn.Linear(110, 110)
        self.lin_numerical = nn.Linear(110, 6)
        self.lin_cat_job = nn.Linear(110, 9)
        self.lin_cat_sex = nn.Linear(110, 2)
        self.lin_cat_incomeclass = nn.Linear(110, 7)

    def forward(self, x):
        x = torch.relu(self.lin1(x))
        x_numerical = f.leaky_relu(self.lin_numerical(x))
        x_cat1 = f.gumbel_softmax(self.lin_cat_job(x), tau=0.2)
        x_cat2 = f.gumbel_softmax(self.lin_cat_sex(x), tau=0.2)
        x_cat3 = f.gumbel_softmax(self.lin_cat_incomeclass(x), tau=0.2)
        x_final = torch.cat((x_numerical, x_cat1, x_cat2, x_cat3), 1)
        return x_final
I have managed to change the __init__ part, using a discrete_columns input, which is an OrderedDict whose keys are the names of the categorical fields of my data and whose values are their numbers of one-hot encodings, and continuous_columns, which is just a list of the names of the continuous columns. But I have no idea how to edit the forward part:
class Generator(nn.Module):
    def __init__(self, input_dim, continuous_columns, discrete_columns):
        super(Generator, self).__init__()
        self._input_dim = input_dim
        self._discrete_columns = discrete_columns
        self._num_continuous_columns = len(continuous_columns)
        self.lin1 = nn.Linear(self._input_dim, self._input_dim)
        self.lin_numerical = nn.Linear(self._input_dim, self._num_continuous_columns)
        for key, value in self._discrete_columns.items():
            setattr(self, "lin_cat_{}".format(key), nn.Linear(self._input_dim, value))

    def forward(self, x):
        x = torch.relu(self.lin1(x))
        x_numerical = f.leaky_relu(self.lin_numerical(x))
        ####
        # This is the problematic part
        ####
        return x
You don't need to use setattr, and honestly you shouldn't: you'd then need getattr as well, and it brings more trouble than it solves if there is any other way to do the job.
Here is what I'd do for this task instead:
self.lin_cat = nn.ModuleDict()
for key, value in self._discrete_columns.items():
    self.lin_cat[key] = nn.Linear(self._input_dim, value)
    # setattr(self, "lin_cat_{}".format(key), nn.Linear(self._input_dim, value))

def forward(self, x):
    x = torch.relu(self.lin1(x))
    x_numerical = f.leaky_relu(self.lin_numerical(x))
    x_cat = []
    for key in self.lin_cat:
        x_cat.append(f.gumbel_softmax(self.lin_cat[key](x), tau=0.2))
    x_final = torch.cat((x_numerical, *x_cat), 1)
    return x_final
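A quick usage sketch under the same assumptions (the column names and sizes are illustrative, mirroring the hard-coded version above):

from collections import OrderedDict
import torch

discrete = OrderedDict([("job", 9), ("sex", 2), ("incomeclass", 7)])
continuous = ["c%d" % i for i in range(6)]  # six continuous columns
gen = Generator(input_dim=110, continuous_columns=continuous, discrete_columns=discrete)
out = gen(torch.randn(4, 110))
print(out.shape)  # torch.Size([4, 24]) -- 6 numerical outputs + 9 + 2 + 7 categorical logits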

ModuleNotFoundError: No module named 'grad'

I am trying to run this neural-network script (for a regression model).
The script defines two classes. One is a Standardizer class and the other is a NeuralNet class. The Standardizer class normalizes all the values, and the NeuralNet class builds a neural network that learns the data through feed-forward and back-propagation.
The constructor takes the number of inputs, hidden units, and outputs as its three parameters.
The set_hunit function is used to either update or initialize the weights; it takes the weights as its parameter.
The pack function packs the multiple weight matrices of each layer into one vector; the unpack function does the reverse.
The forward pass of the network propagates as shown below, where Xl and Zl denote the inputs with a bias column of ones appended:
Z = h(Xl · V),  Y = Zl · W
The activation function h is used to make the network non-linear; we may use tanh, an RBF, or similar.
In the backward pass, the function takes the Z values, the target values T, and the error as input. Based on the delta value, the weights and biases are updated accordingly. This method returns the packed weight vector of that particular layer. Below are the update rules executed during the backward pass (with hidden and output learning rates α_h and α_o, N samples, and K outputs):
V ← V + α_h · (1/N) · (1/K) · Xlᵀ · ((T − Y) · Wᵀ ⊙ (1 − Z²))
W ← W + α_o · (1/N) · (1/K) · Zlᵀ · (T − Y)
The train function takes the features and the target as input. gradientf unpacks the weights and runs the forward pass by calling forward. The error is then calculated from the results of the forward pass, and back-propagation proceeds by calling backward with the error, Z, T (target), and _lambda as parameters.
The optimtargetf function tries to reduce the error by using the objective function and updates the weights accordingly.
The use method is applied to the test data after training the model. The testing data is passed as a parameter and standardized; then forward is applied, which returns the predictions.
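To make the forward-pass notation concrete, here is a tiny numeric sketch of Z = h(Xl · V), Y = Zl · W with made-up shapes (2 inputs, 3 hidden units, 1 output):

import numpy as np

N, ni, nh, no = 5, 2, 3, 1
X = np.random.randn(N, ni)
V = np.random.randn(ni + 1, nh)  # +1 row for the bias
W = np.random.randn(nh + 1, no)

Xl = np.hstack((np.ones((N, 1)), X))  # append bias column -> Xl
Z = np.tanh(Xl @ V)                   # Z = h(Xl . V)
Zl = np.hstack((np.ones((N, 1)), Z))
Y = Zl @ W                            # Y = Zl . W
print(Y.shape)                        # (5, 1)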
Running the script shows a module-not-found error, but I have installed the grad module with pip:
#Importing required libraries
import pandas as pd
import numpy as np
import seaborn as sns
import grad
import matplotlib.pyplot as plt
# Reading data using pandas library
vehicle_data=pd.read_csv('processed_Data.csv')
# Overall idea about distribution of data
vehicle_data.hist(bins=40, figsize=(20,15))
plt.show()
# Count plot of Electric Range
sns.countplot(x='Electric Range',data=vehicle_data)
# Joint plot between Latitude on x axis and Longitude on y axis
sns.jointplot(x=vehicle_data.BaseMSRP.values,y=vehicle_data.LegislativeDistrict.values,height=10)
plt.xlabel("Base MSRP",fontsize=10)
plt.ylabel("Legislative District",fontsize=10)
# Drop the rows that have null or missing values
vehicle_data=vehicle_data.dropna()
# Data is already clean and has no missing values
vehicle_data.shape
#Dropping unwanted columns
vehicle_data=vehicle_data.drop(['VIN (1-10)','County', 'City', 'State', 'ZIP Code', 'DOL Vehicle ID'],axis=1)
vehicle_data.shape
# Separating target variable
t=pd.DataFrame(vehicle_data.iloc[:,8])
vehicle_data=vehicle_data.drop(['Electric Range'],axis=1)
t
vehicle_data.head()
#NeuralNet class for regression
# standardization class
class Standardizer:
    """ class version of standardization """
    def __init__(self, X, explore=False):
        self._mu = np.mean(X, 0)
        self._sigma = np.std(X, 0)
        if explore:
            print("mean: ", self._mu)
            print("sigma: ", self._sigma)
            print("min: ", np.min(X, 0))
            print("max: ", np.max(X, 0))

    def set_sigma(self, s):
        self._sigma[:] = s

    def standardize(self, X):
        return (X - self._mu) / self._sigma

    def unstandardize(self, X):
        return (X * self._sigma) + self._mu

def add_ones(w):
    return np.hstack((np.ones((w.shape[0], 1)), w))
from grad import scg, steepest
from copy import copy
class NeuralNet:
    def __init__(self, nunits):
        self._nLayers = len(nunits) - 1
        self.rho = [1] * self._nLayers
        self._W = []
        wdims = []
        lenweights = 0
        for i in range(self._nLayers):
            nwr = nunits[i] + 1
            nwc = nunits[i + 1]
            wdims.append((nwr, nwc))
            lenweights = lenweights + nwr * nwc
        self._weights = np.random.uniform(-0.1, 0.1, lenweights)
        start = 0  # fixed index error 20110107
        for i in range(self._nLayers):
            end = start + wdims[i][0] * wdims[i][1]
            self._W.append(self._weights[start:end])
            self._W[i].resize(wdims[i])
            start = end
        self.stdX = None
        self.stdT = None
        self.stdTarget = True

    def add_ones(self, w):
        return np.hstack((np.ones((w.shape[0], 1)), w))

    def get_nlayers(self):
        return self._nLayers

    def set_hunit(self, w):
        for i in range(self._nLayers - 1):
            if w[i].shape != self._W[i].shape:
                print("set_hunit: shapes do not match!")
                break
            else:
                self._W[i][:] = w[i][:]

    def pack(self, w):
        return np.hstack(map(np.ravel, w))

    def unpack(self, weights):
        self._weights[:] = weights[:]  # unpack

    def cp_weight(self):
        return copy(self._weights)

    def RBF(self, X, m=None, s=None):
        if m is None: m = np.mean(X)
        if s is None: s = 2  # np.std(X)
        r = 1. / (np.sqrt(2 * np.pi) * s)
        return r * np.exp(-(X - m) ** 2 / (2 * s ** 2))

    def forward(self, X):
        t = X
        Z = []
        for i in range(self._nLayers):
            Z.append(t)
            if i == self._nLayers - 1:
                t = np.dot(self.add_ones(t), self._W[i])
            else:
                t = np.tanh(np.dot(self.add_ones(t), self._W[i]))
                # t = self.RBF(np.dot(np.hstack((np.ones((t.shape[0], 1)), t)), self._W[i]))
        return (t, Z)

    def backward(self, error, Z, T, lmb=0):
        delta = error
        N = T.size
        dws = []
        for i in range(self._nLayers - 1, -1, -1):
            rh = float(self.rho[i]) / N
            if i == 0:
                lmbterm = 0
            else:
                lmbterm = lmb * np.vstack((np.zeros((1, self._W[i].shape[1])),
                                           self._W[i][1:, ]))
            dws.insert(0, (-rh * np.dot(self.add_ones(Z[i]).T, delta) + lmbterm))
            if i != 0:
                delta = np.dot(delta, self._W[i][1:, :].T) * (1 - Z[i] ** 2)
        return self.pack(dws)

    def _errorf(self, T, Y):
        return T - Y

    def _objectf(self, T, Y, wpenalty):
        return 0.5 * np.mean(np.square(T - Y)) + wpenalty

    def train(self, X, T, **params):
        verbose = params.pop('verbose', False)
        # training parameters
        _lambda = params.pop('Lambda', 0.)
        # parameters for scg
        niter = params.pop('niter', 1000)
        wprecision = params.pop('wprecision', 1e-10)
        fprecision = params.pop('fprecision', 1e-10)
        wtracep = params.pop('wtracep', False)
        ftracep = params.pop('ftracep', False)
        # optimization
        optim = params.pop('optim', 'scg')
        if self.stdX is None:
            explore = params.pop('explore', False)
            self.stdX = Standardizer(X, explore)
        Xs = self.stdX.standardize(X)
        if self.stdT is None and self.stdTarget:
            self.stdT = Standardizer(T)
            T = self.stdT.standardize(T)

        def gradientf(weights):
            self.unpack(weights)
            Y, Z = self.forward(Xs)
            error = self._errorf(T, Y)
            return self.backward(error, Z, T, _lambda)

        def optimtargetf(weights):
            """ optimization target function : MSE """
            self.unpack(weights)
            # self._weights[:] = weights[:]  # unpack
            Y, _ = self.forward(Xs)
            Wnb = np.array([])
            for i in range(self._nLayers):
                if len(Wnb) == 0:
                    Wnb = self._W[i][1:, ].reshape(self._W[i].size - self._W[i][0, ].size, 1)
                else:
                    Wnb = np.vstack((Wnb, self._W[i][1:, ].reshape(self._W[i].size - self._W[i][0, ].size, 1)))
            wpenalty = _lambda * np.dot(Wnb.flat, Wnb.flat)
            return self._objectf(T, Y, wpenalty)

        if optim == 'scg':
            result = scg(self.cp_weight(), gradientf, optimtargetf,
                         wPrecision=wprecision, fPrecision=fprecision,
                         nIterations=niter,
                         wtracep=wtracep, ftracep=ftracep,
                         verbose=False)
            self.unpack(result['w'][:])
            self.f = result['f']
        elif optim == 'steepest':
            result = steepest(self.cp_weight(), gradientf, optimtargetf,
                              nIterations=niter,
                              xPrecision=wprecision, fPrecision=fprecision,
                              xtracep=wtracep, ftracep=ftracep)
            self.unpack(result['w'][:])
        if ftracep:
            self.ftrace = result['ftrace']
        if 'reason' in result.keys() and verbose:
            print(result['reason'])
        return result

    def use(self, X, retZ=False):
        if self.stdX:
            Xs = self.stdX.standardize(X)
        else:
            Xs = X
        Y, Z = self.forward(Xs)
        if self.stdT is not None:
            Y = self.stdT.unstandardize(Y)
        if retZ:
            return Y, Z
        return Y
Try opening a command prompt and typing pip install grad, or if you are using a Jupyter notebook, make a new code cell and run !pip install grad before importing it.
Hope that solves your problem.
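Unrelated to the import error, a quick sanity check of the Standardizer class as posted (illustrative):

import numpy as np

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
std = Standardizer(X, explore=True)           # prints mean / sigma / min / max per column
Xs = std.standardize(X)                       # zero mean, unit variance per column
print(np.allclose(std.unstandardize(Xs), X))  # True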

Define action values in keras-rl

I have a custom environment in keras-rl with the following configuration in the constructor:
def __init__(self, data):
    # Declare the episode as the first episode
    self.episode = 1
    # Initialize data
    self.data = data
    # Declare low and high as vectors of -inf and +inf values
    self.low = numpy.array([-numpy.inf])
    self.high = numpy.array([+numpy.inf])
    self.observation_space = spaces.Box(self.low, self.high, dtype=numpy.float32)
    # Define the space of actions as 3 (I want them to be 0, 1 and 2)
    self.action_space = spaces.Discrete(3)
    self.currentObservation = 0
    self.limit = len(data)
    # Initiate the values to be returned by the environment
    self.reward = None
As you can see, my agent will perform 3 actions; depending on the action, a different reward will be calculated in the step() function below:
def step(self, action):
    assert self.action_space.contains(action)
    # Initiate the reward
    self.reward = 0
    # Get the reward
    self.possibleGain = self.data.iloc[self.currentObservation]['delta_next_day']
    # If action is 1, calculate the reward
    if action == 1:
        self.reward = self.possibleGain - self.operationCost
    # If action is 2, calculate the reward as negative
    elif action == 2:
        self.reward = (-self.possibleGain) - self.operationCost
    # If action is 0, no reward
    elif action == 0:
        self.reward = 0
    # Finish episode
    self.done = True
    self.episode += 1
    self.currentObservation += 1
    if self.currentObservation >= self.limit:
        self.currentObservation = 0
    # Return the state, reward and if it's done or not
    return self.getObservation(), self.reward, self.done, {}
The problem is the fact that, if I print the actions at every episode, they are 0, 2, and 4. I want them to be 0, 1 and 2. How can I force the agent to recognize only these 3 actions with keras-rl?
I am not sure why self.action_space = spaces.Discrete(3) is giving you actions 0, 2, and 4, since I cannot reproduce the error with the code snippet you posted, so I would suggest the following for defining your action space:
self.action_space = gym.spaces.Box(low=np.array([1]),high= np.array([3]), dtype=np.int)
And this is what I get when I sample from the action space:
actions = gym.spaces.Box(low=np.array([1]), high=np.array([3]), dtype=np.int)
for i in range(10):
    print(actions.sample())
[1]
[3]
[2]
[2]
[3]
[3]
[1]
[1]
[2]
[3]
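For reference, sampling directly from spaces.Discrete(3) only ever yields 0, 1, and 2, so the 0/2/4 values presumably come from somewhere outside the space itself (the agent or a wrapper). A quick check, with illustrative output:

import gym

actions = gym.spaces.Discrete(3)
print(sorted({actions.sample() for _ in range(100)}))  # almost surely [0, 1, 2]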

Always getting the same path with A* implementation

I'm trying to implement A* from the pseudocode on Wikipedia; however, I'm getting some weird results.
The implementation finds what at first looks like a good path, but on closer inspection it always produces the same path!
Can anyone spot anything wrong? The code is written in Python 3.1 and uses pygame.
import pygame
import sys, traceback
import random
import math

TILE_WIDTH = 30
TILE_HEIGHT = 30
NUM_TILES_X = 30
NUM_TILES_Y = 30
NUM_TILES = NUM_TILES_X * NUM_TILES_Y
GRID_WIDTH = TILE_WIDTH * NUM_TILES_X
GRID_HEIGHT = TILE_HEIGHT * NUM_TILES_Y

# h(x,y)
def heuristic_dist(source, dest):
    return int(((source.x - dest.x) ** 2 + (source.y - dest.y) ** 2) ** 0.5)

def a_star(nodes, start, goal):
    # Set up data structures
    closedset = []
    openset = [start]
    came_from = {}
    g_score = {}
    g_score[start.index] = 0
    h_score = {}
    h_score[start.index] = heuristic_dist(start, goal)
    f_score = {}
    f_score[start.index] = h_score[start.index]

    while len(openset) > 0:
        # Find node with least f_score in openset
        x = min(openset, key=lambda el: f_score[el.index])
        # We have reached our goal!
        if x.index == goal.index:
            path = reconstruct_path(came_from, goal.index)
            # Mark the path with green color
            for node in path:
                nodes[node].color = (0, 255, 0)
            print("Yihaaa!")
            return True
        # Filter out x from openset and add it to closedset
        openset = list(filter(lambda y: y.index != x.index, openset))
        closedset.append(x)
        # Go through all neighbours
        for y in x.get_neighbours():
            # If this neighbour has been closed, skip it
            if y in closedset: continue
            # Not sure that this is correct.
            tentative_g_score = g_score[x.index] + heuristic_dist(x, y)
            if y not in openset:
                openset.append(y)
                tentative_is_better = True
            elif tentative_g_score < g_score[y.index]:
                tentative_is_better = True
            else:
                tentative_is_better = False
            if tentative_is_better:
                if y.index in came_from:
                    if f_score[x.index] < f_score[came_from[y].index]:
                        came_from[y.index] = x
                else:
                    came_from[y.index] = x
                g_score[y.index] = tentative_g_score
                h_score[y.index] = heuristic_dist(y, goal)
                f_score[y.index] = g_score[y.index] + h_score[y.index]
    print("Couldn't find a path!")
    return False

# Traverse the path backwards
def reconstruct_path(came_from, current_node, depth=0):
    if current_node in came_from:
        p = reconstruct_path(came_from, came_from[current_node].index)
        return p + [current_node]
    else:
        return [current_node]

def draw_string(surface, string, x, y):
    s = font.render(string, True, (0, 0, 0))
    surface.blit(s, (x, y))

# Tile or Node that has a couple of attributes: color, cost and x,y
class Tile:
    def __init__(self, x, y, cost, index):
        self.x = x
        self.y = y
        self.cost = cost
        self.index = index
        self.color = (255, 255, 255)

    def draw(self, surface):
        surface.fill(self.color, pygame.Rect(self.x * TILE_WIDTH, self.y * TILE_HEIGHT, TILE_WIDTH, TILE_HEIGHT))
        pygame.draw.rect(surface, (255, 180, 180), pygame.Rect(self.x * TILE_WIDTH, self.y * TILE_HEIGHT, TILE_WIDTH, TILE_HEIGHT), 2)
        draw_string(surface, str(self.cost), self.x * TILE_WIDTH + TILE_WIDTH // 3, self.y * TILE_HEIGHT + TILE_HEIGHT // 3)

    def get_neighbours(self):
        nbs = []
        # Where are our neighbours?
        offsets = [(0, -1), (-1, 0), (1, 0), (0, 1)]
        for offset in offsets:
            x = self.x + offset[0]
            y = self.y + offset[1]
            try:  # coord_to_tile throws an exception if no such neighbour exists (out of bounds, for example)
                nbs.append(coord_to_tile(x, y))
            except Exception as e:
                pass
        return nbs

    def __eq__(self, other):
        return self.x == other.x and self.y == other.y

# Small helper function to convert x,y coords to a tile instance
nodes_lookup = {}

def coord_to_tile(x, y):
    return nodes_lookup[(x, y)]

def main():
    global nodes_lookup
    screen = pygame.display.set_mode((GRID_WIDTH, GRID_HEIGHT))
    tiles = []
    for x in range(NUM_TILES_X):
        for y in range(NUM_TILES_Y):
            # Create a random distribution where max grows
            cost = random.randint(1, min(x * y, 98) + 1)
            # Let the bottom line cost 1 as well
            if y == NUM_TILES_Y - 1: cost = 1
            t = Tile(x, y, cost, len(tiles))
            nodes_lookup[(x, y)] = t
            tiles.append(t)
    # Do a*
    a_star(tiles, tiles[0], tiles[len(tiles) - 1])
    while True:
        event = pygame.event.wait()
        if event.type == pygame.QUIT:
            break
        for tile in tiles:
            tile.draw(screen)
        pygame.display.flip()

pygame.init()
font = pygame.font.SysFont("Times New Roman", 18)
try:
    main()
except Exception as e:
    tb = sys.exc_info()[2]
    traceback.print_exception(e.__class__, e, tb)
pygame.quit()
I really have no clue, since I think I have pretty much implemented the pseudocode statement by statement.
Here's a screenshot as well:
http://andhen.mine.nu/uploads/astar.dib
Thanks!
You access came_from one time with y, and one time with y.index in

if tentative_is_better:
    if y.index in came_from:
        if f_score[x.index] < f_score[came_from[y].index]:  # index by y
            came_from[y.index] = x                          # index by y.index
    else:

You probably meant

if f_score[x.index] < f_score[came_from[y.index].index]:

in the first line.
Besides that, the code looks ok.
Anyway, what do you mean by "always produces the same path"? The algorithm is supposed to return the optimal path, which should always be the same... (or did you mean it always produces the same path independently of start and goal?)
EDIT:
You don't use your random cost anywhere in the algorithm. The 'costs' the algorithm uses are always the distances between two adjacent nodes: they are defined in heuristic_dist and used in the line

tentative_g_score = g_score[x.index] + heuristic_dist(x, y)

If you want to define random costs, you must first realize that this algorithm assigns costs to edges, not to vertices. You'll have to define some function real_costs(x, y) which calculates the cost of going from node x to node y, and use this cost function instead of heuristic_dist in the line above.
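A minimal sketch of such an edge-cost function for the grid above, assuming (as one reasonable choice, not the only one) that stepping onto a tile charges that tile's random cost:

def real_costs(x, y):
    # Cost of the edge from tile x to the adjacent tile y:
    # charge the cost assigned to the destination tile.
    return y.cost

# ...and inside a_star, instead of heuristic_dist:
# tentative_g_score = g_score[x.index] + real_costs(x, y)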
