For big ML models with many parameters, it is helpful if one can interrupt and resume the hyperparameter optimization search.
Optuna allows doing that with the RDB backend, which stores the study in a SQLite database (https://optuna.readthedocs.io/en/stable/tutorial/20_recipes/001_rdb.html#sphx-glr-tutorial-20-recipes-001-rdb-py).
However, when interrupting and resuming a study, the results are not the same as that of an uninterrupted study.
Expected: for a fixed seed, the results of an optimization run with n_trials = x are identical to those of a study with n_trials = x/5 that is resumed 5 times, and to those of a study that is interrupted with KeyboardInterrupt 5 times and resumed 5 times until n_trials = x is reached.
Actual: The results are equal up to the point of the first interruption. From then on, they differ.
The figures show the optimization history of all trials in a study. The left-most figure (A) shows the uninterrupted run, the center figure (B) shows a run interrupted by keyboard, and the right-most figure (C) shows the run interrupted by reaching n_trials. In B and C, the red dotted line marks the point of the first interruption. Left of the line, the results equal those of the uninterrupted study; to the right, they differ.
Is it possible to interrupt and resume a study, so that another study with the same seed that has not been interrupted generates exactly the same result?
(Obviously assuming that the objective function behaves in a deterministic way.)
Minimal working example to reproduce:
import optuna
import logging
import sys
import numpy as np
def objective(trial):
    x = trial.suggest_float("x", -10, 10)
    return (x - 4) ** 2
def set_study(db_name,
              study_name,
              seed,
              direction="minimize"):
    '''
    Creates a new study in a sqlite database located in results/ .
    The study can be resumed after keyboard interrupt by simply creating it
    using the same command used for the initial creation.
    '''
    # Add stream handler of stdout to show the messages
    optuna.logging.get_logger("optuna").addHandler(logging.StreamHandler(sys.stdout))
    sampler = optuna.samplers.TPESampler(seed=seed, n_startup_trials=0)
    storage_name = f"sqlite:///{db_name}.db"
    storage = optuna.storages.RDBStorage(storage_name, heartbeat_interval=1)
    study = optuna.create_study(storage=storage,
                                study_name=study_name,
                                sampler=sampler,
                                direction=direction,
                                load_if_exists=True)
    return study
study = set_study('optuna_test', 'optuna_test_study', 1)

try:
    # Press CTRL+C to stop the optimization.
    study.optimize(objective, n_trials=100)
except KeyboardInterrupt:
    pass

df = study.trials_dataframe(attrs=("number", "value", "params", "state"))
print(df)
print("Best params: ", study.best_params)
print("Best value: ", study.best_value)
print("Best Trial: ", study.best_trial)
# print("Trials: ", study.trials)

fig = optuna.visualization.plot_optimization_history(study)
fig.show()
Found what's causing the problem: the sampler's random number generator is initialized from the seed, but it naturally returns different numbers once the study is interrupted and resumed, because it is then reinitialized.
This is especially bad with random search and a fixed seed, as the search then essentially starts over.
If reproducible runs are really needed, one can dump the sampler's random state (rng) to a binary file after a run or a keyboard interrupt, and on resuming overwrite the sampler's freshly created rng with the saved one.
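A minimal sketch of that workaround, assuming the sampler keeps its random state in a private _rng attribute (an implementation detail that may change between Optuna versions); the helper names and file name are illustrative only:

import pickle

RNG_FILE = "sampler_rng.pkl"  # arbitrary file name for the saved random state

def save_sampler_rng(sampler, path=RNG_FILE):
    # Persist the sampler's internal random state, e.g. in the KeyboardInterrupt handler.
    with open(path, "wb") as f:
        pickle.dump(sampler._rng, f)  # _rng is private API, not guaranteed to be stable

def load_sampler_rng(sampler, path=RNG_FILE):
    # Overwrite the freshly seeded random state with the saved one before resuming.
    with open(path, "rb") as f:
        sampler._rng = pickle.load(f)

For example, save_sampler_rng(study.sampler) could be called in the except KeyboardInterrupt branch above, and load_sampler_rng(study.sampler) right after set_study() when resuming.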
Related
I am reproducing some simple 10-arm bandit experiments from Sutton and Barto's book Reinforcement Learning: An Introduction.
Some of these require significant computation time, so I tried to take advantage of my multicore CPU.
Here is the function which I need to run 2000 times. It has 1000 sequential steps which incrementally improve the reward:
import numpy as np

def foo(eps):  # need an (unused) argument to use pool.map()
    # initialising
    # the true values of the actions
    q = np.random.normal(0, 1, size=10)
    # the estimated values
    q_est = np.zeros(10)
    # the counter of how many times each of the 10 actions was chosen
    n = np.zeros(10)
    rewards = []
    for i in range(1000):
        # choose an action based on its estimated value
        a = np.argmax(q_est)
        # get the normally distributed reward
        rewards.append(np.random.normal(q[a], 1))
        # increment the chosen action counter
        n[a] += 1
        # update the estimated value of the action
        q_est[a] += (rewards[-1] - q_est[a]) / n[a]
    return rewards
I execute this function 2000 times to get a (2000, 1000) array:
reward = np.array([foo(0) for _ in range(2000)])
Then I plot the mean reward across 2000 experiments:
import matplotlib.pyplot as plt
plt.plot(np.arange(1000), reward.mean(axis=0))
(figure: sequential plot)
which fully corresponds to the expected result (it looks the same as in the book).
But when I try to execute it in parallel, I get a much greater standard deviation of the average reward:
import multiprocessing as mp

with mp.Pool(mp.cpu_count()) as pool:
    reward_p = np.array(pool.map(foo, [0]*2000))

plt.plot(np.arange(1000), reward_p.mean(axis=0))
(figure: parallel plot)
I suppose this is due to the parallelization of the loop inside foo. As I reduce the number of cores allocated to the task, the reward plot approaches the expected shape.
Is there a way to get the advantage of the multiprocessing here while getting the correct results?
UPD:
I tried running the same code on Windows 10, and the sequential and parallel results turned out to be the same! What may be the reason?
Ubuntu 20.04, Python 3.8.5, jupyter
Windows 10, Python 3.7.3, jupyter
As we found out, the behaviour differs between Windows and Ubuntu. It is probably because of this:
spawn The parent process starts a fresh python interpreter process.
The child process will only inherit those resources necessary to run
the process objects run() method. In particular, unnecessary file
descriptors and handles from the parent process will not be inherited.
Starting a process using this method is rather slow compared to using
fork or forkserver.
Available on Unix and Windows. The default on Windows and macOS.
fork The parent process uses os.fork() to fork the Python interpreter.
The child process, when it begins, is effectively identical to the
parent process. All resources of the parent are inherited by the child
process. Note that safely forking a multithreaded process is
problematic.
Available on Unix only. The default on Unix.
Try adding this line to your code:
mp.set_start_method('spawn')
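The likely mechanism: with fork, every worker inherits the parent's global NumPy random state, so several of the 2000 runs replay the same random sequences and the mean is effectively averaged over fewer independent experiments. Besides switching to spawn, a sketch of an alternative that stays correct under either start method is to create a fresh, independently seeded generator inside foo (np.random.default_rng() seeds itself from OS entropy); the function and variable names follow the question:

import numpy as np
import multiprocessing as mp

def foo(eps):  # unused argument kept so pool.map() can be used, as in the question
    rng = np.random.default_rng()   # fresh generator per call, independent of fork/spawn
    q = rng.normal(0, 1, size=10)   # true action values
    q_est = np.zeros(10)            # estimated values
    n = np.zeros(10)                # how often each action was chosen
    rewards = []
    for _ in range(1000):
        a = np.argmax(q_est)
        rewards.append(rng.normal(q[a], 1))
        n[a] += 1
        q_est[a] += (rewards[-1] - q_est[a]) / n[a]
    return rewards

if __name__ == "__main__":
    mp.set_start_method("spawn")    # optional with per-call generators; must run before the Pool is created
    with mp.Pool(mp.cpu_count()) as pool:
        reward_p = np.array(pool.map(foo, [0] * 2000))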
I am running a program in pycharm on a linux server which uses multiprocessing.Pool().map for increased performance.
The code looks something like this:
import multiprocessing
from functools import partial
for episode in episodes:
    with multiprocessing.Pool() as mpool:
        func_part = partial(worker_function)
        mpool.map(func_part, range(step))
The weird thing is that it runs perfectly fine on my Windows 10 laptop, but as soon as I try to run it on a Linux server the program gets stuck at the very last process (measurement count 241/242), i.e. right before proceeding to the next iteration of the loop, e.g. the next episode.
No error message is given. I am running PyCharm on both machines. The Step layer is where I placed the multiprocessing.Pool().map call.
Edit:
I've added mpool.close() and mpool.join(), but it seems to have no effect:
import multiprocessing
from functools import partial

for episode in episodes:
    with multiprocessing.Pool() as mpool:
        func_part = partial(worker_function)
        mpool.map(func_part, range(step))
        mpool.close()
        mpool.join()
It still gets stuck at the last process.
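One detail worth checking (a hedged aside, not necessarily the cause of the hang): exiting a with multiprocessing.Pool() block calls terminate(), so close() and join() only have an effect if they run inside the block. The explicit equivalent, assuming episodes, worker_function and step as defined elsewhere in the program, would look roughly like this:

import multiprocessing
from functools import partial

for episode in episodes:
    mpool = multiprocessing.Pool()
    try:
        func_part = partial(worker_function)
        mpool.map(func_part, range(step))  # map() already blocks until every task has returned
        mpool.close()                      # no further tasks will be submitted
        mpool.join()                       # wait for the worker processes to exit cleanly
    finally:
        mpool.terminate()                  # roughly what leaving a `with Pool()` block does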
Edit2:
This is the worker function:
def worker_func(steplength, episode, episodes, env, agent, state, log_data_qvalues, log_data, steps):
    env.time_ = step
    action = agent.act(state, env)  # given the state, the agent acts (eps-greedy) either by choosing randomly or relying on its own prediction (weights are considered here to sum up the q-values of all objectives)
    next_state, reward = env.steplength(action, state)  # given the action, the environment gives back the next_state and the reward for the transaction for all objectives separately
    agent.remember(state, action, reward, next_state, env.future_reward)  # agent puts the experience in its memory
    q_values = agent.model.predict(np.reshape(state, [1, env.state_size]))  # This part is not necessary for the framework, but lets the agent predict every time_ to
    start = 0  # store the development of the prediction and to investigate the development of the Q-values
    machine_start = 0
    for counter, machine in enumerate(env.list_of_machines):
        liste = [episode, steplength, state[counter]]
        q_values_objectives = []
        for objective in range(1, env.number_of_objectives + 1):
            liste.append(objective)
            liste.append(q_values[0][start:machine.actions + start])
            start = int(agent.action_size / env.number_of_objectives) + start
        log_data_qvalues.append(liste)
        machine_start += machine.actions
        start = machine_start
    state = next_state
    steps.append(state)
    env.current_step += 1
    if len(agent.memory) > agent.batch_size:  # If the agent has collected more than batch_size experiences, the agent's networks start
        agent.replay(env)                     # to be trained; with the replay function, batch_size samples are selected from the agent's memory
        agent.update_target_model()           # the Q-target is updated after one batch-training
    if steplength == env.steplength - 2:      # for plotting the process during training
        # agent.update_target_model()
        print(f'Episode: {episode + 1}/{episodes} Score: {steplength} e: {agent.epsilon:.5}')
        log_data.append([episode, agent.epsilon])
As you can see, it uses several classes to pass attributes. I don't know how I would reproduce it. I am still experimenting to find out where exactly the process gets stuck. The worker function communicates with the env and agent classes and passes the information required to train a neural network. The agent class controls the learning process, while the env class simulates the environment the network has control over.
step is an integer variable:
step = 12
Are you calling
mpool.close()
mpool.join()
at the end?
EDIT
The problem is not with multiprocessing but with the measurement count part. According to the screenshot, the pool map successfully ends with step 11 (range(12) starts at 0). measurement count appears nowhere in the provided snippets, so that part cannot be debugged from here.
I am attempting to develop a GUI program in Python (using PyQt5) to interact with a data acquisition device (DAQ) that will be connected via LAN or USB to a Windows PC. On the click of a button (in the GUI), the DAQ will perform a test.
Each "Test" will consist of collecting a reading (collecting a reading takes about 1.5 seconds) at user-defined intervals from the start of the test (e.g., 0.1 min, 0.2 min, 0.5 min, 1 min, 2 min, 5 min...1000 min etc.). A reading is collected by execution of a function, so code for a single test might look like this:
import time

t = [0, 0.1, 0.2, 0.5, 1, 2, 5, 10, 20, 50, 100, 200, 500, 1000]  # times from start of test to collect readings at (min)
intervals = [t[i] - t[i-1] for i in range(1, len(t))]  # time delta between readings (min)

def GetReading():
    # some code to connect to the DAQ (using pyvisa) and collect reading
    reading = ['2020-01-02 17:33:33', 1.23456]  # the data returned from the DAQ
    return reading

def RunTest(r):
    results = [GetReading()]  # get the initial (t=0) reading
    ReadTime = 1.5  # time in seconds to collect 1 reading (I may use an implementation of
                    # time.run_process() or similar to actually calculate this instead)
    for j in r:
        time.sleep(j*60 - ReadTime)
        results.append(GetReading())
    return results

RunTest(intervals)
The DAQ can only perform one reading at a time. I would like to be able to run multiple tests simultaneously, and have my program automatically wait and start a new test when it is feasible (i.e., delay starting a test on click if another test is already running).
The first, say 5 readings, are imperative that they happen on time, but the subsequent readings of a given test can be delayed by a bit without affecting the quality of the test. For example, if a test is running at the 0.2 min reading interval, and the user initiates a new test, the program would wait until the current test completed say, the 5 min reading, before starting the additional test sequence.
Subsequent readings beyond the 5 min reading could be delayed to collect the first 5 readings of a new test sequence, or collect a reading from another test.
I'm struggling with how to program this, conceptually. I think I need to use multiprocessing or something similar to allow multiple tests to run in parallel (though no actual parallel readings can occur). Or perhaps I can use a scheduler? I'm just not sure how to implement either of these; I've never used them before, and I'm having trouble understanding the examples I find in the context of my problem.
Furthermore, I need to be able to access the results (the output from RunTest) between calls to GetReading() (e.g., to view data as the test progresses), and using time.sleep wouldn't allow that.
UPDATE
The measurement the DAQ is collecting is deformation, via an LVDT.
The time zero in var t is not actually the button click supplied by the user. On button click, the DAQ will open the specified channel and the program will monitor for a change in deformation above a certain threshold. The user will then physically start the test (which involves adding a weight to some material, to measure stress-strain properties), and time zero will occur at reading i-1, where i is the first reading at which a change above the threshold is detected (i.e., t=0 corresponds to the zero-deformation reading the instant before the weight is added). I need the whole process from button click, to adding the weight, to collecting up to the 5 minute reading to be uninterrupted for a single test (deformation occurs most rapidly, and potentially erratically, in the first 5 minutes or so).
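A minimal sketch of that trigger logic, reusing GetReading() from above (the helper name WaitForTestStart and the threshold value are illustrative only, not part of the actual program):

def WaitForTestStart(threshold=0.01):
    # Poll the DAQ until the deformation jumps by more than `threshold` between
    # two consecutive readings; the reading taken just before the jump (reading i-1)
    # is returned and serves as the t=0 reading of the test.
    previous = GetReading()
    while True:
        current = GetReading()
        if abs(current[1] - previous[1]) > threshold:
            return previous  # zero-deformation reading the instant before the weight was added
        previous = current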
The code below works, but doesn't ensure that the first measurements of a new test are prioritized.
If that is essential, the solution will be a little more difficult.
In order to be sure that only one function/thread is reading data at a given time, you can use a mutex (threading.Lock):
from threading import Lock

read_lock = Lock()

def get_data():
    with read_lock:
        # some code to connect to the DAQ (using pyvisa) and collect reading
        reading = ['2020-01-02 17:33:33', 1.23456]  # the data returned from the DAQ
        return reading
I'd propose to write a function that fetches the result and appends it to a results list.
Any object that is modified by one thread and read by another should be protected with a Lock; therefore there is a second lock to avoid simultaneous reading/writing of results.
results_lock = Lock()

def get_and_store_data(results):
    result = get_data()
    with results_lock:
        results.append(result)
You can schedule a get_and_store_data action with threading.Timer
Below is the full code example:
from threading import Lock
from threading import Timer

t = [0, 0.1, 0.2, 0.5, 1, 2, 5, 10, 20, 50, 100, 200, 500, 1000]  # times from start of test to collect readings at (min)

read_lock = Lock()
results_lock = Lock()

def get_data():
    import time
    import datetime
    import random
    with read_lock:
        time.sleep(1.5)
        # some code to connect to the DAQ (using pyvisa) and collect reading
        reading = [
            datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
            random.randint(0, 10000) / 1000,
        ]
        return reading

def get_and_store_data(results):
    result = get_data()
    with results_lock:
        results.append(result)

# schedule measures for one test
def schedule_measures(measure_times, results):
    timers = []
    for t in measure_times:
        timer = Timer(t, get_and_store_data, args=[results])
        timer.start()
        timers.append(timer)

def main():
    results = []
    meas_times = [tim * 1 for tim in t]
    schedule_measures(meas_times, results)
    while True:
        msg = "Please press enter to display results or q and enter to quit"
        choice = input(msg).strip()
        if choice == "q":
            break
        print("Results:")
        with results_lock:
            for result in results:
                print(result)

main()
If you want to reduce the 'drift' between measurements, then you could do something like:
import time

def schedule_measures(measure_times, results):
    timers = []
    t_0 = time.time()
    for t in measure_times:
        now = time.time()
        timer = Timer(t - (now - t_0), get_and_store_data, args=[results])
        timer.start()
        timers.append(timer)
The drift will probably be low enough anyway, but it's a neat trick if you have more CPU-intensive actions in your scheduling function or if you do not want to schedule all events at startup.
For prioritizing measurements, it might be easier to build a sorted list of the calculated times at which each measurement should be performed and to start the next timer only when the previous one has fired; some logic would be needed to decide which measurement should be scheduled next. I don't have time now, but will probably come back within the next 12 hours with a suggested algorithm.
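In the meantime, a minimal sketch of one possible shape for that algorithm, reusing get_and_store_data from above: keep all pending measurements in a single heap keyed by (priority, due time), give the first five readings of every test priority 0 and the rest priority 1, and let one scheduler thread serve the most urgent due entry. The priority_count of 5 and the single-thread design are illustrative choices, and this simple ordering can leave the DAQ idle while an overdue low-priority reading waits for an upcoming high-priority one:

import heapq
import itertools
import threading
import time

schedule_lock = threading.Lock()
pending = []                    # heap of (priority, due_time, tie_breaker, results_list)
tie_breaker = itertools.count() # keeps heap comparisons away from the results lists

def add_test(measure_times_min, results, priority_count=5):
    # Register one test; measure_times_min are minutes from the start of the test,
    # as in the question.  The first `priority_count` readings get priority 0, the
    # rest priority 1, so the early readings of a newly started test are served
    # before the late readings of an older test.
    t_start = time.time()
    with schedule_lock:
        for i, t_rel in enumerate(measure_times_min):
            prio = 0 if i < priority_count else 1
            heapq.heappush(pending, (prio, t_start + t_rel * 60, next(tie_breaker), results))

def scheduler_loop():
    # Single scheduling thread: look at the most urgent pending entry; if it is due,
    # take the reading, otherwise sleep briefly and look again, so newly added
    # high-priority readings are noticed quickly.
    while True:
        with schedule_lock:
            item = pending[0] if pending else None
            if item is not None and item[1] <= time.time():
                heapq.heappop(pending)
            else:
                item = None
        if item is None:
            time.sleep(0.1)
            continue
        _, _, _, results = item
        get_and_store_data(results)   # still serialised by read_lock, so only one DAQ reading at a time

threading.Thread(target=scheduler_loop, daemon=True).start()

The GUI's button handler would then call add_test(t, results_for_that_test) each time a new test is started.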
I am trying to sample a signal at 10 kHz in Python. There is no problem when I try to run this code (at 1 kHz):
import sched, time

i = 0

def f():  # sampling function
    s.enter(0.001, 1, f, ())
    global i
    i += 1
    if i == 1000:
        i = 0
        print("one second")

s = sched.scheduler(time.time, time.sleep)
s.enter(0.001, 1, f, ())
s.run()
When I try to make the delay smaller, one second's worth of samples starts to take longer than one second (on my computer, 1.66 s at a delay of 10e-6).
Is it possible to run a sampling function at a specific frequency in Python?
You didn't account for the code's overhead. Each iteration, this error adds up and skews the "clock".
I'd suggest using a loop with time.sleep() instead (see the comments to https://stackoverflow.com/a/14813874/648265) and counting the time to sleep from the next reference moment, so the inevitable error doesn't add up:
period = 0.001
t = time.time()
while True:
    t += period
    <...>
    time.sleep(max(0, t - time.time()))  # max is needed on Windows due to
                                         # sleep's behaviour with a negative argument
Note that the OS scheduling will not allow you to reach precisions beyond a certain level since other processes have to preempt yours from time to time. In this case, you'll need to use some OS-specific facilities for multimedia applications or work out a solution that doesn't need this level of accuracy (e.g. sample the signal with a specialized app and work with its saved output).
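A self-contained sketch of that pattern at 1 kHz, with a placeholder sample() standing in for the real sampling work (the names and the one-second loop length are just for demonstration):

import time

def sample():
    pass  # placeholder for the real sampling work

period = 0.001                            # 1 kHz; pure-Python overhead typically makes 10 kHz unrealistic
t = time.time()
start = t
for _ in range(1000):                     # one second's worth of samples
    sample()
    t += period
    time.sleep(max(0, t - time.time()))   # sleep until the next reference moment, so errors don't accumulate
print("elapsed:", time.time() - start)    # stays close to 1.0 s despite per-iteration overhead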
I am doing gradient descent (100 iterations to be precise). Each data point can be analyzed in parallel, there are 50 data points. Since I have 4 cores, I create a pool of 4 workers using multiprocessing.Pool. The core of the program looks like following:
# Read the sgf files (total 50)
(intermediateBoards, finalizedBoards) = read_sgf_files()
# Create a pool of processes to analyze game boards in parallel with as
# many processes as number of cores
pool = Pool(processes=cpu_count())
# Initialize the parameter object
param = Param()
# maxItr = 100 iterations of gradient descent
for itr in range(maxItr):
    args = []
    # Prepare argument vector for each file
    for i in range(len(intermediateBoards)):
        args.append((intermediateBoards[i], finalizedBoards[i], param))
    # 4 processes analyze 50 data points in parallel in each iteration of
    # gradient descent
    result = pool.map_async(train_go_crf_mcmc, args)
Now, I haven't included the definition of the function train_go_crf_mcmc, but the very first line in the function is a print statement. So when I execute this code, the print statement should be executed 100*50 times. But that does not happen. What's more, I get a different number of console outputs each time.
What's wrong?
Your problem is that you are using map_async instead of map. This means that once all of the work is farmed out to the pool, the loop continues even if the work has not finished. It is not clear to me what happens to the work still running when the next iteration starts, but if these are supposed to be sequential iterations, I can't imagine it is (a) good or (b) well defined.
If you use map, it will block until all of the worker functions have finished before moving on to the next step. You could do this with sleep, but that would just make things more complicated for no gain. map waits exactly the minimum amount of time needed for everything to finish.
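A minimal sketch of both options, with train_go_crf_mcmc and args as in the question:

# Option 1: map() blocks until all 50 data points of this iteration are finished.
result = pool.map(train_go_crf_mcmc, args)

# Option 2: keep map_async(), but wait for the AsyncResult before the next iteration;
# get() also re-raises any exception that occurred in a worker, which would otherwise
# be swallowed silently and could explain the missing print output.
async_result = pool.map_async(train_go_crf_mcmc, args)
result = async_result.get()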