I am trying to manage the results of machine learning with mlflow and hydra.
So I tried to run it using the multi-run feature of hydra.
I used the following code as a test.
import mlflow
import hydra
from hydra import utils
from pathlib import Path
import time
@hydra.main('config.yaml')
def main(cfg):
    print(cfg)
    mlflow.set_tracking_uri('file://' + utils.get_original_cwd() + '/mlruns')
    mlflow.set_experiment(cfg.experiment_name)
    mlflow.log_param('param1', 5)
    # mlflow.log_param('param1', 5)
    # mlflow.log_param('param1', 5)
    with mlflow.start_run():
        mlflow.log_artifact(Path.cwd() / '.hydra/config.yaml')

if __name__ == '__main__':
    main()
This code does not work; I got the following error:
Exception: Run with UUID [RUNID] is already active. To start a new run, first end the current run with mlflow.end_run(). To start a nested run, call start_run with nested=True
So I modified the code as follows:
import mlflow
import hydra
from hydra import utils
from pathlib import Path
import time
@hydra.main('config.yaml')
def main(cfg):
    print(cfg)
    mlflow.set_tracking_uri('file://' + utils.get_original_cwd() + '/mlruns')
    mlflow.set_experiment(cfg.experiment_name)
    mlflow.log_param('param1', 5)
    # mlflow.log_param('param1', 5)
    # mlflow.log_param('param1', 5)
    with mlflow.start_run(nested=True):
        mlflow.log_artifact(Path.cwd() / '.hydra/config.yaml')

if __name__ == '__main__':
    main()
This code works, but the artifact is not saved. I made the following corrections to save the artifacts:
import mlflow
import hydra
from hydra import utils
from pathlib import Path
import time
@hydra.main('config.yaml')
def main(cfg):
    print(cfg)
    mlflow.set_tracking_uri('file://' + utils.get_original_cwd() + '/mlruns')
    mlflow.set_experiment(cfg.experiment_name)
    mlflow.log_param('param1', 5)
    # mlflow.log_param('param1', 5)
    # mlflow.log_param('param1', 5)
    mlflow.log_artifact(Path.cwd() / '.hydra/config.yaml')

if __name__ == '__main__':
    main()
As a result, the artifacts are now saved. However, when I run the following command:
python test.py model=A,B hidden=12,212,31 -m
only the artifact from the last set of run conditions is saved.
How can I modify this so that MLflow manages each experiment's parameters while taking advantage of Hydra's multirun feature?
MLFlow is not officially supported by Hydra. At some point there will be a plugin that will make this smoother.
Looking at the errors you are reporting (and without running your code):
One thing that you can try is to use the Joblib launcher plugin to get job isolation through processes (this requires Hydra 1.0.0rc1 or newer).
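For example (a sketch, assuming the hydra-joblib-launcher plugin is installed alongside Hydra 1.0.0rc1+), the launcher can be selected as an override on the command line:
python test.py -m hydra/launcher=joblib model=A,B hidden=12,212,31
Each job then runs in its own process, so MLflow's per-process active-run state is not shared between the jobs of a multirun.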
What you are observing is due to the interaction between MLFlow and Hydra. As far as MLflow can tell, all of your Hydra multiruns are the same MLflow run!
Since both frameworks use the term "run", I will need to be verbose in the following text. Please bear with me.
If you didn't explicitly start a MLflow run, MLflow will do it for you when you do mlflow.log_params or mlflow.log_artifacts. Within a Hydra multirun context, it appears that instead of creating a new MLflow run for each Hydra run, the previous MLflow run is inherited after the first Hydra run. This is why you would get this error where MLflow thinks you are trying to update parameter values in logging: mlflow.exceptions.MlflowException: Changing param values is not allowed.
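One way to observe this (a minimal diagnostic sketch, not part of the original answer) is to check MLflow's active-run state at the top of each Hydra job:
import mlflow

# Under a Hydra multirun, this prints 'no active MLflow run' for the first
# job, but an inherited run ID for later jobs if no run was ended explicitly.
active = mlflow.active_run()
print(active.info.run_id if active else 'no active MLflow run')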
You can fix this by wrapping your MLFlow logging code within a with mlflow.start_run() context manager:
import mlflow
import hydra
from hydra import utils
from pathlib import Path
@hydra.main(config_path="", config_name='config.yaml')
def main(cfg):
    print(cfg)
    mlflow.set_tracking_uri('file://' + utils.get_original_cwd() + '/mlruns')
    mlflow.set_experiment(cfg.experiment_name)
    with mlflow.start_run() as run:
        mlflow.log_params(cfg)
        mlflow.log_artifact(Path.cwd() / '.hydra/config.yaml')
        print(run.info.run_id)  # just to show each run is different

if __name__ == '__main__':
    main()
The context manager will start and end MLflow runs properly, preventing the issue from occurring.
Alternatively, you can also start and end an MLFlow run manually:
activerun = mlflow.start_run()
mlflow.log_params(cfg)
mlflow.log_artifact(Path.cwd() / '.hydra/config.yaml')
print(activerun.info.run_id) # just to show each run is different
mlflow.end_run()
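A note on the manual version (an addition, not from the original answer): if any of the logging calls can raise, wrapping them in try/finally ensures the run is closed even on an exception, so the next Hydra job does not inherit it:
activerun = mlflow.start_run()
try:
    mlflow.log_params(cfg)
    mlflow.log_artifact(Path.cwd() / '.hydra/config.yaml')
finally:
    # always close the run, even if logging fails
    mlflow.end_run()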
This is related to the way you defined your MLflow run. You call log_param and then start_run, so you have two concurrent MLflow runs, which explains the error. You could try getting rid of the following line in your first code sample and see what happens:
mlflow.log_param('param1',5)
Related
Hi, I am trying to write a Python script that does exactly what the following command does:
gcloud logging read "logName=projects/[project_name]/logs/[id]"
When I run that command from the CLI, it does not give me any error; it outputs the logs as expected.
However, when I run my Python script:
import argparse
import datetime
import os
import sys
from pprint import pprint
from google.cloud import bigquery
from google.cloud import logging
assert "GOOGLE_APPLICATION_CREDENTIALS" in os.environ
def main():
    client = logging.Client()
    log_name = 'log_id'
    logger = client.logger(log_name)
    for entry in logger.list_entries():
        print(entry.payload)

if __name__ == "__main__":
    main()
I get the error:
google.api_core.exceptions.PermissionDenied: 403 The caller does not have permission
I'm not sure what to do here; since the command works from the command line, I clearly have permission.
Any thoughts would be greatly appreciated.
I see that you are trying to read and show your logs from Cloud Logging using Python.
From the error code you got:
error: google.api_core.exceptions.PermissionDenied: 403
I think this comes from an authentication problem. I would like to share these documents with you: the Python quickstart to write, read, delete, and export log entries [1]; and authentication on GCE instances [2].
[1] https://cloud.google.com/logging/docs/quickstart-python#linux
[2] https://googleapis.dev/python/google-api-core/latest/auth.html#using-google-compute-engine
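A common cause is that the gcloud CLI uses your user credentials while the client library uses the service account pointed to by GOOGLE_APPLICATION_CREDENTIALS, and that account may lack the Logging viewer role. As a hedged sketch (the key path and project ID below are placeholders), you can pass credentials explicitly to rule out picking up the wrong account:
from google.cloud import logging
from google.oauth2 import service_account

# Placeholder path: point this at a key for an account with roles/logging.viewer
creds = service_account.Credentials.from_service_account_file('/path/to/key.json')
client = logging.Client(credentials=creds, project='your-project-id')
for entry in client.logger('log_id').list_entries():
    print(entry.payload)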
I found an example online of a config file that builds on the base logging module and colorlog.
File Structure
main.py
directory.py
client.py
/utils/log.py
main, directory, and client all have from utils.log import init_logger, which is the function inside my log.py.
For processing, main.py imports a function from directory.py, which imports a function from client.py.
VS Code complains that the modules can't be imported, but the script runs fine with no errors apart from that complaint.
I think I have a circular dependency issue going on, but I can't figure out how I should share the logging config between all my files without importing it.
main.py
"""
Setup Trusts Between Directories
"""
import json
from utils.log import init_logger
from directory import stage
###############
def main():
"""
MAIN TRUST FUNCTION
"""
## Initialize Logger
if __name__ == "__main__":
logger = init_logger(__name__, testing_mode=False)
logger.info("Running Creator")
## Get Outputs
new_dirs = get_output("newDirectoryIds")
config_dirs = get_output("configDirectories")
## Get Directory Stage
stage(new_dirs, config_dirs)
###############
## Run Main Function
main()
directory.py
"""
directory.py
- Loops Through List of Directories
- Returns back when stage is Active or Failed
"""
import time
from utils.log import init_logger
from get_client import get_new_client
###########
def stage(new_dirs, config_dirs):
"""
Returns Directory Stage Value
- Loops until Directory is Failed or Active
"""
## Initialize Logger
logger = init_logger(__name__, testing_mode=True)
for config_dir in config_dirs:
## Get Describe Directory Services Client
ds_client = get_new_client(config_dir["account"], "ds", "us-west-
2")
## Get All Directories in Account
temp_dirs = ds_client.describe_directories()
This is what worked for me, per Pylint "unresolved import" error in Visual Studio Code:
"The best solution for now is to create a .env file in your project root folder. Then add a PYTHONPATH to it like this:"
PYTHONPATH=YOUR/MODULES/PATH
and in your settings.json add
"python.envFile": ".env"
I have a script called run_test.py; here's the content:
import sys
import nose

if __name__ == '__main__':
    nose.main(argv=sys.argv)
Running all my tests is as simple as doing this:
run_test.py unittests/test_*.py
I'm now trying to incorporate the output reporting for this into TeamCity.
I'm referring to this https://github.com/JetBrains/teamcity-messages
I tried changing all my unittests/test_*.py programs following the documentation. It works if I run a test individually, like this:
unittest/test_one.py
But it does not work when running it through nose, like this:
run_test.py unittest/test_one.py
According to the documentation link, nose reporting is enabled automatically under a TeamCity build. I don't quite get what that means.
Is there anything I'm missing here?
Any help is greatly appreciated. Thanks.
Have a look at the xunit plugin of nose. It will generate an XML file with the results, which Jenkins and TeamCity can use.
There is some documentation for TeamCity.
This post tells you how to enable the plugin in your test script:
import sys
import nose

if __name__ == '__main__':
    argv = sys.argv[:]
    argv.insert(1, "--with-xunit")
    nose.main(argv=argv)
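If I remember correctly, the plugin writes its report to nosetests.xml in the working directory by default, and the --xunit-file option changes that path; TeamCity can then pick the file up as an XML report.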
I finally found the way to achieve that. Here's what I modified in my run_test.py:
#!/usr/bin/env python
import os
import sys
import unittest
from teamcity import is_running_under_teamcity
from teamcity.unittestpy import TeamcityTestRunner
loader = unittest.TestLoader()
start_dir = sys.argv[1]
suite = loader.discover(start_dir, pattern='test_*.py')
#runner = unittest.TextTestRunner()
runner = TeamcityTestRunner(verbosity=2)
runner.run(suite)
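One refinement (this is the pattern shown in the teamcity-messages README, and it uses the is_running_under_teamcity import that is already in the script): pick the runner conditionally, so the same script also works outside TeamCity:
if is_running_under_teamcity():
    runner = TeamcityTestRunner(verbosity=2)
else:
    # plain text output when not running under a TeamCity build
    runner = unittest.TextTestRunner(verbosity=2)
runner.run(suite)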
Pardon my relative inexperience in Python. I am trying to run this code (taken from GitHub), but the interpreter is unable to resolve the references for ini_file_io and model. I have seen a similar post, but I am seeing the same issue in both PyCharm and MS Visual Studio Code.
Here is the main.py; both ini_file_io.py and model.py are in the same directory:
import os
import tensorflow as tf
from ini_file_io import load_train_ini #Unresolved Reference
from model import cgan_unet_xy #Unresolved Reference
def main(_):
    # load training parameter #
    ini_file = '../outcome/model/ini/tr_param.ini'
    param_sets = load_train_ini(ini_file)
    param_set = param_sets[0]

    print('====== Phase >>> %s <<< ======' % param_set['phase'])

    if not os.path.exists(param_set['chkpoint_dir']):
        os.makedirs(param_set['chkpoint_dir'])
    if not os.path.exists(param_set['labeling_dir']):
        os.makedirs(param_set['labeling_dir'])

    with tf.Session() as sess:
        model = cgan_unet_xy(sess, param_set)
        if param_set['phase'] == 'train':
            model.train()
        elif param_set['phase'] == 'test':
            model.test()
        elif param_set['phase'] == 'crsv':
            model.test4crsv()

if __name__ == '__main__':
    tf.app.run()
Any help will be really appreciated.
Try adding an empty file named __init__.py at the same directory level as the three files in question: main.py, ini_file_io.py, and model.py. Python uses these files as markers for directory-level imports.
Note that this should not be a concern for files at the same directory level. I believe you may be running the code from the wrong place; try to cd into the directory these files live in and run main.py there instead of from some other directory.
If you can't do that, then you'd have to add that directory to your Python path.
You can also try a relative import — try from .ini_file_io import load_train_ini instead.
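For reference, the layout the __init__.py suggestion assumes looks like this (file names from the question; the directory name is hypothetical):
project/
    __init__.py
    main.py
    ini_file_io.py
    model.py
Note that the relative-import variant only works when main.py is run as part of a package (for example via python -m project.main from one level up); a script run directly at the top level cannot use relative imports.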
I am running the following script inside AWS Lambda:
#!/usr/bin/python
from __future__ import print_function
import json
import os
import ansible.inventory
import ansible.playbook
import ansible.runner
import ansible.constants
from ansible import utils
from ansible import callbacks
print('Loading function')
def run_playbook(**kwargs):
    stats = callbacks.AggregateStats()
    playbook_cb = callbacks.PlaybookCallbacks(verbose=utils.VERBOSITY)
    runner_cb = callbacks.PlaybookRunnerCallbacks(
        stats, verbose=utils.VERBOSITY)

    # use /tmp instead of $HOME
    ansible.constants.DEFAULT_REMOTE_TMP = '/tmp/ansible'

    out = ansible.playbook.PlayBook(
        callbacks=playbook_cb,
        runner_callbacks=runner_cb,
        stats=stats,
        **kwargs
    ).run()
    return out

def lambda_handler(event, context):
    return main()

def main():
    out = run_playbook(
        playbook='little.yml',
        inventory=ansible.inventory.Inventory(['localhost'])
    )
    return out

if __name__ == '__main__':
    main()
However, I get the following error: failed=True msg='boto required for this module'
According to this comment (https://github.com/ansible/ansible/issues/5734#issuecomment-33135727), it should work, but I don't understand how to specify that in my script. Or can I have a separate hosts file and include it in the script, like how I call my playbook?
If so, how?
[EDIT - 1]
I have added the line inventory=ansible.inventory.Inventory('hosts')
with hosts file as:
[localhost]
127.0.0.1 ansible_python_interpreter=/usr/local/bin/python
But, I get this error: /bin/sh: /usr/local/bin/python: No such file or directory
So, where is python located inside AWS Lambda?
I installed boto just like I installed other packages in the Lambda's deployment package: pip install boto -t <folder-name>
The bash command which python will usually give the location of the Python binary. There's an example of how to call a bash script from AWS Lambda here.
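Alternatively (a small sketch, not from the linked example), you can print the interpreter path from inside the Lambda runtime itself:
import sys

def lambda_handler(event, context):
    # Return the path of the Python binary running this Lambda
    return sys.executable
Whatever path this returns is the value to use for ansible_python_interpreter in the hosts file.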