AWS - Step functions, use execution input within a TuningStep - python

I've written a simple AWS step functions workflow with a single step:
from stepfunctions.inputs import ExecutionInput
from stepfunctions.steps import Chain, TuningStep
from stepfunctions.workflow import Workflow

import train_utils


def main():
    workflow_execution_role = 'arn:aws:iam::MY ARN'

    execution_input = ExecutionInput(schema={
        'app_id': str
    })

    estimator = train_utils.get_estimator()
    tuner = train_utils.get_tuner(estimator)

    tuning_step = TuningStep(state_id="HP Tuning", tuner=tuner, data={
        'train': f's3://my-bucket/{execution_input["app_id"]}/data/'},
        wait_for_completion=True,
        job_name='HP-Tuning')

    workflow_definition = Chain([
        tuning_step
    ])

    workflow = Workflow(
        name='HP-Tuning',
        definition=workflow_definition,
        role=workflow_execution_role,
        execution_input=execution_input
    )
    workflow.create()


if __name__ == '__main__':
    main()
My goal is to have the train input pulled from the execution JSON provided at runtime. When I execute the workflow (from the Step Functions console), providing the JSON {"app_id": "My App ID"}, the tuning step does not get the right data; instead it gets the string representation of the stepfunctions.inputs.placeholders.ExecutionInput object. Furthermore, when looking at the generated ASL I can see that the execution input was rendered as a string:
...
"DataSource": {
"S3DataSource": {
"S3DataType": "S3Prefix",
"S3Uri": "s3://my-bucket/<stepfunctions.inputs.placeholders.ExecutionInput object at 0x12261f7d0>/data/",
"S3DataDistributionType": "FullyReplicated"
}
},
...
What am I doing wrong?
Update:
As mentioned by @yoodan, the SDK is probably behind, so I'll have to edit the definition before calling create. I can see there is a way to review the definition before calling create, but can I modify the graph definition? How?

The Python SDK for Step Functions generates the corresponding ASL definition, and accomplishing what you want requires a string concatenation/format built into the Amazon States Language.
In August 2020, the Amazon States Language introduced built-in (intrinsic) functions, including string formatting, into its language spec: https://states-language.net/#appendix-b
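For example, with the new string-format intrinsic, the S3Uri from the generated ASL above could be expressed like this (a sketch of the syntax; note the ".$" suffix, which is required for fields whose value is computed at runtime):
"S3Uri.$": "States.Format('s3://my-bucket/{}/data/', $.app_id)"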
Unfortunately, the Python SDK is not up to date and does not support these new features.
As a workaround, maybe manually modify the definition before calling workflow.create()?

Your step function definition is good and it looks like it should work.
It is unclear from your code example where the execution actually happens (you stated directly from the console), which leaves several options that may cause the problem, and I believe that's the source of the issue. Please provide more information about that.
Are you somehow exporting your created Workflow object so it can be executed?
It seems that something is missing. As a sanity check, append the following to your main function:
workflow.execute(inputs={"app_id": "My App ID"})
and check the logs again.

It seems like what you're looking for is not supported by the SDK.
https://github.com/aws/aws-step-functions-data-science-sdk-python/issues/79
You can, however, change the definition before creating the state machine. Here is a function that recursively walks the definition dict and replaces any placeholder wrapped in double curly braces ({{PH}}) with the corresponding intrinsic-function syntax, for example:
s3://my-bucket/{{app_id}}/data/
import re

def get_updated_definition(data):
    if isinstance(data, dict):
        for k, v in data.copy().items():
            if isinstance(v, dict):  # For DICT
                data[k] = get_updated_definition(v)
            elif isinstance(v, list):  # For LIST
                data[k] = [get_updated_definition(i) for i in v]
            elif isinstance(v, str) and re.search(r'{{([a-z_]+)}}', v):  # Update Key-Value
                del data[k]  # or data.pop(k)
                keys = re.findall(r'{{([a-z_]+)}}', v)
                data[f"{k}.$"] = f"States.Format('{re.sub(r'{{[a-z_]+}}', '{}', v)}',{','.join(['$.'+k for k in keys])})"
    return data
usage:
workflow_definition = get_updated_definition(workflow.definition.to_dict())
create_state_machine(json.dumps(workflow_definition)) #<-- implement using boto3 (https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/stepfunctions.html#SFN.Client.create_state_machine)
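For the S3Uri in the question, the rewritten field would then look something like this (assuming the data URI in the definition uses the {{app_id}} placeholder form shown above):
"S3Uri.$": "States.Format('s3://my-bucket/{}/data/',$.app_id)"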


Forward function signature in VSCode

Say I have a function in src/f1.py with the following signature:
def my_func(a1: int, a2: bool) -> float:
    ...
In a separate file src/f2.py I create a dictionary:
from src.f1 import my_func
my_dict = {
    "func": my_func
}
In my final file src/test1.py I call the function:
from src.f2 import my_dict
print(my_dict["func"](1, True))
I'm getting IDE auto-suggestions for my_func in src/f2.py, but not in src/test1.py. I have tried using typing.Callable, but it doesn't create the same signature and it loses the function documentation. Is there any way I can get these in src/test1.py without changing the structure of my code?
I don't want to change the files in which my functions, dictionaries, or tests are declared.
I use VSCode version 1.73.1 and Python version 3.8.13. I cannot change my Python version.
I tried creating different types of Callable objects, but had problems getting the desired results:
- They seem to have no docstring support.
- Some types are optional; I couldn't get that to work.
- They do not work with variable names, only data types. I want the variable names (the function's argument names) to appear in the IDE suggestion.
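For reference, this is roughly what such an attempt looks like (a sketch; the Callable annotation keeps the parameter and return types but drops the parameter names and the docstring):
from typing import Callable, Dict
from src.f1 import my_func

my_dict: Dict[str, Callable[[int, bool], float]] = {
    "func": my_func
}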
What am I trying to do really?
I am trying to implement a mechanism where a user can set the configurations for a library in a single file. That single file is where all the dictionaries are stored (and it imports all essential functions).
This "configuration dictionary" is called in the main python file (or wherever needed). I have functions in a set of files for accomplishing a specific set of tasks.
Say functions fa1 and fa2 for task A; fb1, fb2, and fb3 for task B; and fc1 for task C. I want configurations (choices) for task A, followed by B, then C. So I do
work_ABC = {"A": fa1, "B": fb2, "C": fc1}
Now, I have a function like
wd = work_ABC

def do_work_abc(i, do_B=True):
    res_a = wd["A"](i)
    res_b = res_a
    if do_B:
        res_b = wd["B"](res_a)
    res_c = wd["C"](res_b)
    return res_c
If you have a more efficient way to implement the same thing, I'd love to hear it.
I want IntelliSense to give me the function signature of the function set for the dictionary.
There is no type annotation construct in Python that covers docstrings or parameter names of functions. There isn't even one for positional-only or keyword-only parameters (which would be actually meaningful in a type sense).
As I already mentioned in my comment, docstrings and names are not type-related. Ultimately, this is an IDE issue. PyCharm for example has no problem inferring those details with the setup you provided. I get the auto-suggestion for my_dict["func"] with the parameter names because PyCharm is smart/heavy enough to track it to the source. But it has its limits. If I change the code in f2 to this:
from src.f1 import my_func

_other_dict = {
    "func": my_func
}
my_dict = {}
my_dict.update(_other_dict)
Then the suggestion engine is lost.
The reason is simply the discrepancy between runtime and static analysis. At some point it becomes silly/unreasonable to expect a static analysis tool to essentially run the code for you. This is why I always say:
Static type checkers don't execute your code, they just read it.
Even the fact that PyCharm "knows" the signature of my_func with your setup entails it running some non-trivial code in the background to back-track from the dictionary key to the dictionary to the actual function definition.
So in short: It appears you are out of luck with VSCode. And parameter names and docstrings are not part of the type system.

Easily entering arguments from dbutils.notebook.run when using a notebook directly

I'm calling a notebook like this:
dbutils.notebook.run(path, timeout, arguments)
where arguments is a dictionary containing many fields for the notebook's widgets.
I want to debug the called notebook interactively: copy/pasting the widget parameters takes time and, when not done perfectly, can cause hard-to-spot errors.
It would be nice to just take the arguments dictionary and use it directly. Perhaps copying it, then populating the widgets from the dictionary.
How can I do this, or something like it?
If we get some variables like this:
dbutils.widgets.text('myvar', '-1', '')
myvar = dbutils.widgets.get('myvar')
I can override them like this:
config = {'myvar': '42'}
import sys
module = sys.modules[__name__]
for k, v in config.items():
    setattr(module, k, v)
Which means all the overriding happens in a cell I can later delete, leaving no edits in the real code.
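Alternatively, to populate the widgets themselves from the arguments dictionary (a sketch; arguments here stands for the same dict you pass to dbutils.notebook.run):
arguments = {'myvar': '42'}

for name, value in arguments.items():
    dbutils.widgets.text(name, str(value), name)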
Pass the values as JSON in a widget; let's call it "Filters":
{"Type":"Fruit","Item":"Apple"}
To read the JSON you need to make use of the json library:
import json
filters = dbutils.widgets.get("Filters")
jsonfilter = json.loads(filters)
Now you can access individual items by
jsonfilter["Item"]

Is it possible to add attributes to built in python objects dynamically in Python?

I need to add an attribute (holding a tuple or object) to Python objects dynamically. This works for Python classes written by me, but not for built-in classes.
Consider the following program:
import numpy as np
class My_Class():
    pass
my_obj = My_Class()
my_obj2 = My_Class()
my_obj.__my_hidden_field = (1,1)
my_obj2.__my_hidden_field = (2,1)
print(my_obj.__my_hidden_field, my_obj2.__my_hidden_field)
This correctly prints (1, 1) (2, 1). However, the following program doesn't work:
X = np.random.random(size=(2,3))
X.__my_hidden_field = (3,1)
setattr(X, '__my_hidden_field', (3,1))
Both of the above lines throw the following error: AttributeError: 'numpy.ndarray' object has no attribute '__my_hidden_field'
Now, the reason found in these questions (i.e., Attribute assignment to built-in object, Can't set attributes of object class, python: dynamically adding attributes to a built-in class) is that Python does not allow dynamically adding attributes to built-in objects.
Excerpt from the answer: https://stackoverflow.com/a/22103924/8413477
This is prohibited intentionally to prevent accidental fatal changes to built-in types (fatal to parts of the code that you never though of). Also, it is done to prevent the changes to affect different interpreters residing in the address space, since built-in types (unlike user-defined classes) are shared between all such interpreters.
However, all the answers are quite old, and I am badly in need of doing this for my research project.
There is a module that allows adding methods to built-in classes, though:
https://pypi.org/project/forbiddenfruit/
However, it doesn't allow adding objects/attributes to each individual object.
Any help?
You probably want weakref.WeakKeyDictionary. From the doc,
This can be used to associate additional data with an object owned by other parts of an application without adding attributes to those objects.
Like an attribute, and unlike a plain dict, this allows the objects to get garbage collected when there are no other references to them.
You'd look up the field with
my_hidden_field[X]
instead of
X._my_hidden_field
Two caveats: First, since a weak key may be deleted at any time without warning, you shouldn't iterate over a WeakKeyDictionary. Looking up an object you have a reference to is fine though. And second, you can't make a weakref to an object type written in C that doesn't have a slot for it (true for many builtins), or a type written in Python that doesn't allow a __weakref__ attribute (usually due to __slots__).
If this is a problem, you can just use a normal dict for those types, but you'll have to clean it up yourself.
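A minimal sketch of the approach (using a numpy array as the key, since ndarray supports weak references):
import weakref

import numpy as np

my_hidden_field = weakref.WeakKeyDictionary()

X = np.random.random(size=(2, 3))
my_hidden_field[X] = (3, 1)
print(my_hidden_field[X])  # -> (3, 1)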
Quick answer
Is it possible to add attributes to built in python objects dynamically in Python?
No, the reasons you read about in the links you posted are the same nowadays. But I came up with a recipe that I think might be the starting point of your tracer.
Instrumenting using subclassing combined with AST
After reading a lot about this, I came up with a recipe that might not be the complete solution, but it sure looks like you can start from here.
The good thing about this recipe is that it doesn't use third-party libraries; everything is achieved with the standard (Python 3.5, 3.6, 3.7) libraries.
The target code.
This recipe will make code like the following be instrumented (only simple instrumentation is performed here; this is just a proof of concept) and executed.
# target/target.py
d = {1: 2}
d.update({3: 4})
print(d) # Should print "{1: 2, 3: 4}"
print(d.hidden_field) # Should print "(0, 0)"
Subclassing
First we have to add the hidden_field to anything we want (this recipe has been tested only with dictionaries).
The following code receives a value, finds out its type/class, and subclasses it in order to add the mentioned hidden_field.
def instrument_node(value):
    VarType = type(value)

    class AnalyserHelper(VarType):
        def __init__(self, *args, **kwargs):
            self.hidden_field = (0, 0)
            super(AnalyserHelper, self).__init__(*args, **kwargs)

    return AnalyserHelper(value)
with that in place you are able to:
d = {1: 2}
d = instrument_node(d)
d.update({3: 4})
print(d) # Do print "{1: 2, 3: 4}"
print(d.hidden_field) # Do print "(0, 0)"
At this point, we know already a way to "add instrumentation to a built-in dictionary" but there is no transparency here.
Modify the AST.
The next step is to "hide" the instrument_node call and we will do that using the ast Python module.
The following is an AST node transformer that will take any dictionary it finds and wrap it in an instrument_node call:
class AnalyserNodeTransformer(ast.NodeTransformer):
    """Wraps all dicts in a call to instrument_node()."""
    def visit_Dict(self, node):
        return ast.Call(func=ast.Name(id='instrument_node', ctx=ast.Load()),
                        args=[node], keywords=[])
Putting it all together.
With those tools, you can then write a script that will:
Read the target code.
Parse the program.
Apply AST changes.
Compile it.
And execute it.
import ast
import os

from ast_transformer import AnalyserNodeTransformer
# instrument_node needs to be in the namespace here.
from ast_transformer import instrument_node

if __name__ == "__main__":
    target_path = os.path.join(os.path.dirname(__file__), 'target/target.py')
    with open(target_path, 'r') as program:
        # Read and parse the target script.
        tree = ast.parse(program.read())
        # Make transformations.
        tree = AnalyserNodeTransformer().visit(tree)
        # Fix locations.
        ast.fix_missing_locations(tree)
        # Compile and execute.
        compiled = compile(tree, filename='target.py', mode='exec')
        exec(compiled)
This takes our target code, wraps every dictionary in an instrument_node() call, and executes the result of that change.
The output of running this against our target code,
# target/target.py
d = {1: 2}
d.update({3: 4})
print(d) # Will print "{1: 2, 3: 4}"
print(d.hidden_field) # Will print "(0, 0)"
is:
>>> {1: 2, 3: 4}
>>> (0, 0)
Working example
You can clone a working example here.
Yes, it is possible; it is one of the coolest things about Python. In Python, all classes are created by the type metaclass.
You can read about it in detail here, but what you need to do is this:
In [58]: My_Class = type("My_Class", (My_Class,), {"__my_hidden_field__": X})
In [59]: My_Class.__my_hidden_field__
Out[59]:
array([[0.73998002, 0.68213825, 0.41621582],
[0.05936479, 0.14348496, 0.61119082]])
*Edited because inheritance was missing: you need to pass the original class as the second argument (in a tuple) so that it extends the original class; otherwise it simply re-writes the class.
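For reference, the type(...) call above is equivalent to this class statement (note that __my_hidden_field__ ends up as a class attribute shared by all instances, not a per-instance field):
class My_Class(My_Class):
    __my_hidden_field__ = X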

Is there a Python test framework suitable to run simulations

I am looking into characterising some software by running many simulations with different parameters.
Each simulation can be treated as a test with different input parameters.
The test specification lists the different parameters:
param_a = 1
param_b = range(1,10)
param_c = {'package_1':1.1, 'params':[1,2,34]}
function = algo_1
and that would generate a list of tests:
{'test-0': {'param_a': 1, 'param_b': 1, 'param_c': ...},
 'test-1': {'param_a': 1, 'param_b': 2, 'param_c': ...},
 ...}
and call the function with these parameters. The return value of the function is the test results that should be reported in a 'friendly way'.
test-0: performance = X%, accuracy = Y%, runtime = Zsec ...
For example, Erlang's Common Test and Quickcheck are very suitable for this task, and provide HTML reporting of the tests.
Is there anything similar in Python?
You could give Robot Framework a chance. It is easy/native to call your Python code from Robot test cases. You will get nice HTML reports as well. If you get blocked, you can get help on SO (tag robotframework) or on the Robot User Mailing List.
Considering the lack of available packages, here is an implementation of a couple of the wanted features:
test definition:
a Python file that contains a config variable, which is a dictionary of static requirements, and a variables variable, which is a dictionary of varying requirements (stored as lists).
config = {'db' : 'database_1'}
variables = {'threshold' : [1,2,3,4]}
The test specification is imported using imp, after parsing the script's arguments into args:
testspec = imp.load_source("testspec", args.test)
test generation:
The list of tests is generated using a modified version of product from itertools:
from itertools import izip, product

def my_product(dicts):
    return (dict(izip(dicts, x)) for x in product(*dicts.itervalues()))

def generate_tests(testspec):
    return [dict(testspec.config.items() + x.items())
            for x in my_product(testspec.variables)]
which returns:
[{'db': 'database_1', 'threshold': 1},
{'db': 'database_1', 'threshold': 2},
{'db': 'database_1', 'threshold': 3},
{'db': 'database_1', 'threshold': 4}]
dynamic module loading:
To load the correct module database_1 under the generic name db, I again used imp in combination with the testspec, in the class that uses the module:
dbModule = testspec['db']
global db
db = imp.load_source('db', 'config/'+dbModule+'.py')
pretty printing:
not much here, just logging to terminal.

Most Pythonic way to provide function metadata at compile time?

I am building a very basic platform in the form of a Python 2.7 module. This module has a read-eval-print loop where entered user commands are mapped to function calls. Since I am trying to make it easy to build plugin modules for my platform, the function calls will be from my Main module to an arbitrary plugin module. I'd like a plugin builder to be able to specify the command that he wants to trigger his function, so I've been looking for a Pythonic way to remotely enter a mapping in the command->function dict in the Main module from the plugin module.
I've looked at several things:
- Method name parsing: the Main module would import the plugin module and scan it for method names that match a certain format. For example, it might add the download_file_command(file) method to its dict as "download file" -> download_file_command. However, getting a concise, easy-to-type command name (say, "dl") requires that the function's name also be short, which isn't good for code readability. It also requires the plugin developer to conform to a precise naming format.
- Cross-module decorators: decorators would let the plugin developer name his function whatever he wants and simply add something like @Main.register("dl"), but they would necessarily require that I both modify another module's namespace and keep global state in the Main module. I understand this is very bad.
- Same-module decorators: using the same logic as above, I could add a decorator that adds the function's name to some command name->function mapping local to the plugin module and retrieve the mapping to the Main module with an API call. This requires that certain methods always be present or inherited though, and - if my understanding of decorators is correct - the function will only register itself the first time it is run and will unnecessarily re-register itself every subsequent time thereafter.
Thus, what I really need is a Pythonic way to annotate a function with the command name that should trigger it, and that way can't be the function's name. I need to be able to extract the command name->function mapping when I import the module, and any less work on the plugin developer's side is a big plus.
Thanks for the help, and my apologies if there are any flaws in my Python understanding; I'm relatively new to the language.
Building on the first part of @ericstalbot's answer, you might find it convenient to use a decorator like the following.
################################################################################
import functools

def register(command_name):
    def wrapped(fn):
        @functools.wraps(fn)
        def wrapped_f(*args, **kwargs):
            return fn(*args, **kwargs)
        wrapped_f.__doc__ += "(command=%s)" % command_name
        wrapped_f.command_name = command_name
        return wrapped_f
    return wrapped

################################################################################
@register('cp')
def copy_all_the_files(*args, **kwargs):
    """Copy many files."""
    print "copy_all_the_files:", args, kwargs

################################################################################
print "Command Name: ", copy_all_the_files.command_name
print "Docstring    : ", copy_all_the_files.__doc__
copy_all_the_files("a", "b", keep=True)
Output when run:
Command Name: cp
Docstring : Copy many files.(command=cp)
copy_all_the_files: ('a', 'b') {'keep': True}
User-defined functions can have arbitrary attributes. So you could specify that plug-in functions have an attribute with a certain name. For example:
def a():
    return 1

a.command_name = 'get_one'
Then, in your module you could build a mapping like this:
import inspect  # from standard library
import plugin

mapping = {}
for v in plugin.__dict__.itervalues():
    if inspect.isfunction(v) and hasattr(v, 'command_name'):
        mapping[v.command_name] = v
To read about arbitrary attributes for user-defined functions, see the docs.
There are two parts in a plugin system:
Discover plugins
Trigger some code execution in a plugin
The proposed solutions in your question address only the second part.
There are many ways to implement both, depending on your requirements. E.g., to enable plugins, they could be specified in a configuration file for your application:
plugins = some_package.plugin_for_your_app
another_plugin_module
# ...
To implement loading of the plugin modules:
plugins = [importlib.import_module(name) for name in config.get("plugins")]
To get a dictionary: command name -> function:
commands = {name: func
            for plugin in plugins
            for name, func in plugin.get_commands().items()}
A plugin author can use any method to implement get_commands(), e.g., using prefixes or decorators — your main application shouldn't care, as long as get_commands() returns the command dictionary for each plugin.
For example, some_plugin.py (full source):
def f(a, b):
    return a + b

def get_commands():
    return {"add": f, "multiply": lambda x, y: x * y}
It defines two commands: add and multiply.
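A plugin that prefers decorators could implement the same interface along these lines (a sketch; the command decorator and _commands dict are illustrative names, not a prescribed API):
_commands = {}

def command(name):
    def register(fn):
        _commands[name] = fn
        return fn
    return register

@command("add")
def f(a, b):
    return a + b

def get_commands():
    return dict(_commands)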
