How to overwrite defaults of subconfigs in hydra? - python

I have a hydra configuration in which I have to use dataclasses. As values for some members I again want to use configs which inherit from some common base class. Let's have a look at the following minimal example:
from dataclasses import dataclass, field
from typing import List, Any
import hydra.utils
from hydra.core.config_store import ConfigStore
from omegaconf import MISSING

# data.py:
# for a machine learning project, I have two different dataset classes.
class Dataset1:
    def __init__(self, member1):
        pass

class Dataset2:
    def __init__(self, member2):
        pass

# They have each their own config dataclass with different members.
# For later use they are also based on a common base class.
@dataclass
class DataConfig:
    """This is just a common base class."""
    pass

@dataclass
class Dataset1Config(DataConfig):
    _target_: str = "Dataset1"
    member1: int = 1

@dataclass
class Dataset2Config(DataConfig):
    _target_: str = "Dataset2"
    member2: int = 2

# I register them at some place in my folder structure.
cs = ConfigStore.instance()
cs.store(group="some/folder", name=Dataset1Config.__name__, node=Dataset1Config)
cs.store(group="some/folder", name=Dataset2Config.__name__, node=Dataset2Config)

# main.py:
# for the training routine of my machine learning project, I also have a config that needs a dataset
# usually I would use dataset1, so I have this as a default.
# In any case, I want to be able to use any config inheriting from `DataConfig` as config for my dataset.
@dataclass
class TrainConfig:
    defaults: List[Any] = field(default_factory=lambda: ["some/folder/Dataset1Config@dataset"])
    dataset: DataConfig = MISSING

cs.store(name="TrainConfig", node=TrainConfig)

@hydra.main(config_name="my_config", version_base="1.2", config_path=".")
def main(cfg):
    instance_dict = hydra.utils.instantiate(cfg)

if __name__ == "__main__":
    main()
Now I want to use Dataset2Config instead of the default Dataset1Config. To this end, I pass the following my_config.yaml to the script.
# my_config.yaml
defaults:
  - TrainConfig  # I want to have this, because in reality there are other defaults set in it I want to use
  - /some/folder/Dataset2Config@dataset  # trying to overwrite the default value in the TrainConfig
I would now like everything from Dataset1Config to be replaced by Dataset2Config. However, the cfg I obtain when running main.py is {'dataset': {'_target_': 'Dataset2', 'member1': 1, 'member2': 2}}. (In other, slightly more complex examples, the config wasn't built at all, but I can't reproduce that yet.)
What do I have to do to end up with a cfg like {'dataset': {'_target_': 'Dataset2', 'member2': 2}}?
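The output reported above is consistent with both dataset configs being merged into the same dataset node, with later values winning on shared keys. The sketch below reproduces that effect with plain Python dicts. It only illustrates merge semantics and is not Hydra's actual composition code; merge is a hypothetical helper and the dicts are stand-ins for the two registered configs.

```python
def merge(base: dict, override: dict) -> dict:
    """Recursively merge two dicts; values from `override` win on conflicts."""
    result = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = value
    return result


# Stand-ins for the two registered dataset configs (assumption: plain dicts).
dataset1 = {"_target_": "Dataset1", "member1": 1}
dataset2 = {"_target_": "Dataset2", "member2": 2}

# Placing both configs in the same package merges them key by key.
cfg = {"dataset": {}}
for node in (dataset1, dataset2):
    cfg = merge(cfg, {"dataset": node})

print(cfg)
# {'dataset': {'_target_': 'Dataset2', 'member1': 1, 'member2': 2}}
```

Under this reading, ending up with only member2 requires the second config to replace the first one rather than being added next to it.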

Related

define a value dynamically using hydra for yaml files

Let's say I have an app.py like this
class myClassA:
    def __init__(self):
        self.id = 100

class myClassB:
    def __init__(self, objA, id):
        pass
Is there a way to use hydra to make a config file like the one below work the way it intuitively should?
myClassA:
  _target_: myapp.myClassA
myclassB:
  _target_: myapp.myClassB
  param1: ${myClassA}
  param2: ${myclassB.param1.id}
My issue is that in order to instantiate my class B, I need an attribute from the class A object, but this attribute is set in the __init__ function of class A and cannot be set in the config file.
I've tried putting id: ??? but it didn't work.
Thanks a lot!
The following does the trick:
# app.py
import hydra
from hydra.utils import instantiate
from omegaconf import OmegaConf

class myClassA:
    def __init__(self):
        self.id = 100

class myClassB:
    def __init__(self, objA, objA_id):
        assert isinstance(objA, myClassA)
        assert objA_id == 100
        print("myClassB __init__ ran")

@hydra.main(config_name="conf.yaml", config_path=".", version_base="1.2")
def app(cfg):
    instantiate(cfg)

if __name__ == "__main__":
    app()
# conf.yaml
myClassA:
  _target_: __main__.myClassA
myClassB:
  _target_: __main__.myClassB
  objA: ${myClassA}
  objA_id:
    _target_: builtins.getattr
    _args_:
      - ${myClassA}
      - "id"
$ python app.py
myClassB __init__ ran
How does this work? Using builtins.getattr as a target allows for looking up the "id" attribute on an instance of myClassA.
NOTE: Several instances of myClassA will be created here. There is an open feature request in Hydra regarding support for a singleton pattern in recursive instantiation, which would enable re-using the same instance of myClassA in several places.
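As a plain-Python sketch of what the myClassB node above amounts to: the calls below are written by hand and merely stand in for what instantiate does, and the instances counter is added purely for illustration (it is not part of the original classes). It shows both points made above: getattr(obj, "id") is the same lookup as obj.id, and each reference to myClassA builds a separate object.

```python
class MyClassA:
    instances = 0  # illustration only: counts how many objects were built

    def __init__(self):
        MyClassA.instances += 1
        self.id = 100


class MyClassB:
    def __init__(self, objA, objA_id):
        self.objA = objA
        self.objA_id = objA_id


# Hand-written equivalent of the myClassB node: one MyClassA for objA and
# another one that only exists so that getattr can read its "id".
obj_b = MyClassB(
    objA=MyClassA(),
    objA_id=getattr(MyClassA(), "id"),
)

print(obj_b.objA_id)       # 100
print(MyClassA.instances)  # 2
```

This only models the construction of myClassB; the top-level myClassA node in the config is a further, separate instance.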

How to build a good registration mechanism in python?

I want to build a well-modularized Python project, where all alternative modules should be registered and accessed via a function named xxx_builder.
Taking data class as an example:
register.py:
import logging

logger = logging.getLogger(__name__)

def register(key, module, module_dict):
    """Register and maintain the data classes
    """
    if key in module_dict:
        logger.warning(
            'Key {} is already pre-defined, overwritten.'.format(key))
    module_dict[key] = module

data_dict = {}

def register_data(key, module):
    register(key, module, data_dict)
data.py:
from register import register_data
import ABCDEF

class MyData:
    """An alternative data class
    """
    pass

def call_my_data(data_type):
    if data_type == 'mydata':
        return MyData

register_data('mydata', call_my_data)
builder.py:
import register

def get_data(type):
    """Obtain the corresponding data class
    """
    for func in register.data_dict.values():
        data = func(type)
        if data is not None:
            return data
main.py:
from data import MyData
from builder import get_data

if __name__ == '__main__':
    data_type = 'mydata'
    data = get_data(type=data_type)
My problem
In main.py, to register MyData class into register.data_dict before calling the function get_data, I need to import data.py in advance to execute register_data('mydata', call_my_data).
It's okay when the project is small, and all the data-related classes are placed according to some rules (e.g. all data-related class should be placed under the directory data) so that I can import them in advance.
However, this registration mechanism means that all data-related classes will be imported, so I need to install all of their packages even if I never actually use them. For example, when data_type in main.py is not mydata, I still need to install the ABCDEF package for the class MyData.
So is there any good idea to avoid importing all the packages?
Python's packaging tools come with a solution for this: entry points. There's even a tutorial about how to use entry points for plugins (which seems like what you're doing) (in conjunction with this Setuptools tutorial).
In other words, something like this (untested): if you have a plugin package that has defined
[options.entry_points]
myapp.data_class =
    someplugindata = my_plugin.data_classes:SomePluginData
in setup.cfg (or pyproject.toml or setup.py, with their respective syntaxes), you could register all of these plugin classes (shown here together with a locally registered class).
from importlib.metadata import entry_points

data_class_registry = {}

def register(key):
    def decorator(func):
        data_class_registry[key] = func
        return func
    return decorator

@register("mydata")
class MyData:
    ...

def register_from_entrypoints():
    # entry_points(group=...) requires Python 3.10 or newer
    for entrypoint in entry_points(group="myapp.data_class"):
        register(entrypoint.name)(entrypoint.load())

def get_constructor(type):
    return data_class_registry[type]

def main():
    register_from_entrypoints()
    get_constructor("mydata")(...)
    get_constructor("someplugindata")(...)
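Entry points name their targets as "module:attribute" strings and only import the module when load() is called. To address the "don't import what I don't use" part of the question, the same deferral can be hand-rolled without packaging metadata. The following is a minimal sketch under that assumption, not the entry points API itself: LAZY_REGISTRY and its keys are hypothetical, and two standard-library classes stand in for optional dependencies such as ABCDEF.

```python
from importlib import import_module

# Map each key to a "module:attribute" string instead of the object itself,
# so nothing is imported at registration time.
LAZY_REGISTRY = {
    "ordered": "collections:OrderedDict",
    "fraction": "fractions:Fraction",
}


def get_constructor(key):
    """Import the module behind `key` on demand and return the named attribute."""
    module_name, _, attr_name = LAZY_REGISTRY[key].partition(":")
    module = import_module(module_name)
    return getattr(module, attr_name)


fraction_cls = get_constructor("fraction")
print(fraction_cls(1, 2))  # 1/2
```

Only the module behind the requested key is imported, so a missing package for an unused key never raises an ImportError.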

Registering classes to factory with classes in different files

I have a factory as shown in the following code:
class ClassFactory:
    registry = {}

    @classmethod
    def register(cls, name):
        def inner_wrapper(wrapped_class):
            if name in cls.registry:
                print(f'Class {name} already exists. Will replace it')
            cls.registry[name] = wrapped_class
            return wrapped_class
        return inner_wrapper

    @classmethod
    def create_type(cls, name):
        exec_class = cls.registry[name]
        type = exec_class()
        return type

@ClassFactory.register('Class 1')
class M1():
    def __init__(self):
        print("Starting Class 1")

@ClassFactory.register('Class 2')
class M2():
    def __init__(self):
        print("Starting Class 2")
This works fine and when I do
if __name__ == '__main__':
    print(ClassFactory.registry.keys())
    foo = ClassFactory.create_type("Class 2")
I get the expected result:
dict_keys(['Class 1', 'Class 2'])
Starting Class 2
Now the problem is that I want to isolate classes M1 and M2 to their own files m1.py and m2.py, and in the future add other classes using their own files in a plugin manner.
However, simply placing each class in its own file
m2.py
from test_ import ClassFactory

@ClassFactory.register('Class 2')
class M2():
    def __init__(self):
        print("Starting Class 2")
gives the result dict_keys(['Class 1']) since it never gets to register the class.
So my question is: How can I ensure that the class is registered when placed in a file different from the factory, without making changes to the factory file whenever I want to add a new class? How to self register in this way? Also, is this decorator way a good way to do this kind of thing, or are there better practices?
Thanks
How can I ensure that the class is registered when placed in a file different from the factory, without making changes to the factory file whenever I want to add a new class?
I'm playing around with a similar problem, and I've found a possible solution. It seems too much of a 'hack' though, so set your critical thinking levels to 'high' when reading my suggestion below :)
As you've mentioned in one of your comments above, the trick is to force the loading of the individual *.py files that contain individual class definitions.
Applying this to your example, this would involve:
Keeping all class implementations in a specific folder, e.g., structuring the files as follows:
.
├─ factory.py          # file with the ClassFactory class
└─ classes/
   ├─ __init__.py
   ├─ m1.py            # file with M1 class
   └─ m2.py            # file with M2 class
Adding the following statement to the end of your factory.py file, which will take care of loading and registering each individual class:
from classes import *
Adding a piece of code like the snippet below to the __init__.py within the classes/ folder, so as to dynamically load all classes [1]:
from inspect import isclass
from pkgutil import iter_modules
from pathlib import Path
from importlib import import_module

# iterate through the modules in the current package
package_dir = Path(__file__).resolve().parent
# iter_modules expects string paths, so convert the Path explicitly
for (_, module_name, _) in iter_modules([str(package_dir)]):
    # import the module and iterate through its attributes
    module = import_module(f"{__name__}.{module_name}")
    for attribute_name in dir(module):
        attribute = getattr(module, attribute_name)
        if isclass(attribute):
            # Add the class to this package's variables
            globals()[attribute_name] = attribute
If I then run your test code, I get the desired result:
# test.py
from factory import ClassFactory

if __name__ == "__main__":
    print(ClassFactory.registry.keys())
    foo = ClassFactory.create_type("Class 2")
$ python test.py
dict_keys(['Class 1', 'Class 2'])
Starting Class 2
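If you want to try the "force the import" idea without touching a real project, here is a self-contained sketch. Everything it creates is hypothetical: it writes a throwaway demo_factory.py and a demo_classes/ package into a temporary directory, then shows that the registry stays empty until the plugin modules are actually imported.

```python
import sys
import tempfile
from importlib import import_module, invalidate_caches
from pathlib import Path
from pkgutil import iter_modules

# Write a throwaway project to a temporary directory (all names hypothetical).
root = Path(tempfile.mkdtemp())
(root / "demo_factory.py").write_text(
    "registry = {}\n"
    "def register(name):\n"
    "    def wrap(cls):\n"
    "        registry[name] = cls\n"
    "        return cls\n"
    "    return wrap\n"
)
package_dir = root / "demo_classes"
package_dir.mkdir()
(package_dir / "__init__.py").write_text("")
for i in (1, 2):
    (package_dir / f"m{i}.py").write_text(
        "from demo_factory import register\n"
        f"@register('Class {i}')\n"
        f"class M{i}:\n"
        "    pass\n"
    )

sys.path.insert(0, str(root))
invalidate_caches()
demo_factory = import_module("demo_factory")

# Nothing is registered until the plugin modules are actually imported.
print(sorted(demo_factory.registry))  # []

# Import every module found in the package; the decorators run on import.
for _, module_name, _ in iter_modules([str(package_dir)]):
    import_module(f"demo_classes.{module_name}")

print(sorted(demo_factory.registry))  # ['Class 1', 'Class 2']
```

Dropping a new m3.py into the package directory would be picked up by the same loop, with no change to the factory module.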
Also, is this decorator way a good way to do this kind of thing, or are there better practices?
Unfortunately, I'm not experienced enough to answer this question. However, when searching for answers to this problem, I've come across the following sources that may be helpful to you:
[2] : this presents a method for registering class existence based on Python Metaclasses. As far as I understand, it relies on the registering of subclasses, so I don't know how well it applies to your case. I did not follow this approach, as I've noticed that the new edition of the book suggests the use of another technique (see bullet below).
[3], item 49 : this is the 'current' suggestion for subclass registering, which relies on the definition of the __init_subclass__() function in a base class.
If I had to apply the __init_subclass__() approach to your case, I'd do the following:
Add a Registrable base class to your factory.py (and slightly re-factor ClassFactory), like this:
class Registrable:
    def __init_subclass__(cls, name: str):
        ClassFactory.register(name, cls)

class ClassFactory:
    registry = {}

    @classmethod
    def register(cls, name: str, sub_class: Registrable):
        if name in cls.registry:
            print(f'Class {name} already exists. Will replace it')
        cls.registry[name] = sub_class

    @classmethod
    def create_type(cls, name):
        exec_class = cls.registry[name]
        type = exec_class()
        return type

from classes import *
Slightly modify your concrete classes to inherit from the Registrable base class, e.g.:
from factory import Registrable

class M2(Registrable, name='Class 2'):
    def __init__(self):
        print("Starting Class 2")
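To see the __init_subclass__() mechanism in isolation, here is a single-file sketch that condenses the pieces above. The only things I've assumed beyond the code above are the **kwargs pass-through and the super() call. Registration happens at class-definition time, with no decorator on the subclasses.

```python
class ClassFactory:
    registry = {}

    @classmethod
    def register(cls, name, sub_class):
        cls.registry[name] = sub_class

    @classmethod
    def create_type(cls, name):
        return cls.registry[name]()


class Registrable:
    def __init_subclass__(cls, name, **kwargs):
        # Runs once for every subclass, at the moment the class is defined.
        super().__init_subclass__(**kwargs)
        ClassFactory.register(name, cls)


class M1(Registrable, name="Class 1"):
    pass


class M2(Registrable, name="Class 2"):
    pass


print(list(ClassFactory.registry))  # ['Class 1', 'Class 2']
obj = ClassFactory.create_type("Class 2")
print(type(obj).__name__)  # M2
```

Note that this does not remove the need to import the module that defines each subclass; it only replaces the decorator as the registration hook.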

how to test a Python class that requires command line argument?

I have a python class that requires a command line argument:
import sys

class SomeClass:
    request = sys.argv[1] + ".json"

    def __init__(self):
        self.req = self.request
I'd run someClass.py on the command line, i.e. python someClass.py 1234, which would set the JSON file name to 1234.json.
I want a second file, testClass.py, to be able to test methods inside of the main class. But first, I just want to make sure it's connected by printing variables:
from someClass import SomeClass

i = SomeClass()
print(i.req)
If I run python testClass.py (without any input), I get a missing input error,
error: the following arguments are required: input
so if I run python testClass.py 1234, I get
none
I just want to know how to pull the class in and make sure it's provided with an argument so I can test individual components inside of it.
Just overwrite request in every test that needs it:
import unittest
from x import SomeClass

class TestClass(unittest.TestCase):
    def setUp(self):
        SomeClass.request = ''
In general, don't make classes which set themselves up. Make the class take parameters which are not defaulted.
You can always make higher level code which supplies default values.
Don't make your class depend on sys.argv in the first place.
class SomeClass:
    def __init__(self, base):
        self.req = base + ".json"
Then
import sys
from someClass import SomeClass

i = SomeClass(sys.argv[1])  # or any other value
print(i.req)
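Once the class takes its value as a parameter, testing it no longer involves the command line at all. A minimal sketch of what such a test could look like follows; the main(argv) helper is hypothetical and just marks the one place that would read sys.argv in real use.

```python
import unittest


class SomeClass:
    def __init__(self, base):
        self.req = base + ".json"


def main(argv):
    """Hypothetical entry point: the only place that reads the command line."""
    return SomeClass(argv[1])


class TestSomeClass(unittest.TestCase):
    def test_req_uses_given_base(self):
        self.assertEqual(SomeClass("1234").req, "1234.json")

    def test_main_reads_first_argument(self):
        # Pass a fake argv instead of relying on the real sys.argv.
        instance = main(["someClass.py", "1234"])
        self.assertEqual(instance.req, "1234.json")


# Run the tests explicitly so the result does not depend on sys.argv.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestSomeClass)
result = unittest.TextTestRunner().run(suite)
```

In a real project you would more likely end the file with unittest.main() or use a test runner; the explicit suite here just keeps the sketch independent of how it is launched.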

Access instance in other modules

I have a class instance I want to access in other modules. This class loads config values using configParser to update the instance's __dict__ attribute, as per this post.
I want to access this instance in other module. The instance is only created in the main.py file where it has access to the required parameters, which come via command line arguments.
I have three files: main.py, config.py and file.py. I don't know the best way to access the instance in the file.py. I only have access to it in main.py and not other modules.
I've looked at the following answers, here and here but they don't fully answer my scenario.
#config.py
class Configuration():
    def __init__(self, *import_sections):
        #use configParser, get config for relevant sections, update self.__dict__

#main.py
from config import Configuration
conf = Configuration('general', 'dev')
# other lines of code use conf instance ... e.g. config.log_path in log setup

#file.py
#I want to use config instance like this:
class File():
    def __init__(self, conf.feed_path):
        # other code here...
Options considered:
Initialise Configuration in config.py module
In config.py after class definition I could add:
conf = Configuration('general', 'dev')
and in file.py and main.py:
from config import conf
but the general and dev variables are only found in main.py so doesn't look like it will work.
Make Configuration class a function
I could make it a function and create a module-level dictionary and import data into other modules:
#config.py
conf = {}

def set_config(*import_section):
    # use configParser, update conf dictionary
    conf.update(...)
This would mean referring to it as config.conf['log_path'] for example. I'd prefer conf.log_path as it's used multiple times.
Pass via other instances
I could pass the conf instance as parameters via other class instances from main.py, even if the intermediate instances don't use it. Seems very messy.
Other options?
Can I use Configuration as an instance somehow?
By changing your Configuration class into a Borg, you are guaranteed to get a common state from wherever you want. You can either provide initialization through a specific __init__:
#config.py
class Configuration:
    __shared_state = {}

    def __init__(self, *import_sections):
        self.__dict__ = self.__shared_state
        if not import_sections:  # we are not initializing this time
            return
        #your old code verbatim
Initialization is done as usual with c = config.Configuration('general','dev'), and any later call to conf = config.Configuration() will get the state that c created.
Or you can provide an initialization method to avoid tampering with the shared state in the __init__:
#config.py
class Configuration:
    __shared_state = {}

    def __init__(self):
        self.__dict__ = self.__shared_state

    # `import` is a reserved word in Python, so the method needs another name
    def import_(self, *import_sections):
        #your old __init__
That way there is only one meaning to the __init__ method, which is cleaner.
In both cases, you can get the shared state, once initialized, from anywhere in your code by using config.Configuration().
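Here is a runnable sketch of the shared-state behaviour, squeezed into one file. The keyword arguments are a hypothetical stand-in for the configParser loading step, and the paths are made-up example values.

```python
class Configuration:
    _shared_state = {}  # one dict shared by every instance

    def __init__(self, **values):
        # Point this instance's attribute dict at the shared one.
        self.__dict__ = self._shared_state
        # Hypothetical stand-in for the configParser loading step.
        self.__dict__.update(values)


# "main.py": initialise once with the values known only there.
Configuration(log_path="/tmp/app.log", feed_path="/tmp/feed")

# "file.py": a fresh, argument-free instance sees the same attributes.
conf = Configuration()
print(conf.feed_path)  # /tmp/feed

# Instances are distinct objects, but their state is shared.
other = Configuration()
other.log_path = "/var/log/app.log"
print(conf.log_path)  # /var/log/app.log
print(conf is other)  # False
```

The one ordering constraint is that the initialising call in main.py has to run before any other module reads an attribute.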
