How can I implement an optional parameter to an AWS Glue Job?
I have created a job that currently have a string parameter (an ISO 8601 date string) as an input that is used in the ETL job. I would like to make this parameter optional, so that the job use a default value if it is not provided (e.g. using datetime.now and datetime.isoformatin my case). I have tried using getResolvedOptions:
import sys
from awsglue.utils import getResolvedOptions
args = getResolvedOptions(sys.argv, ['ISO_8601_STRING'])
However, when I am not passing an --ISO_8601_STRING job parameter I see the following error:
awsglue.utils.GlueArgumentError: argument --ISO_8601_STRING is required
matsev and Yuriy solutions is fine if you have only one field which is optional.
I wrote a wrapper function for python that is more generic and handle different corner cases (mandatory fields and/or optional fields with values).
import sys
from awsglue.utils import getResolvedOptions
def get_glue_args(mandatory_fields, default_optional_args):
"""
This is a wrapper of the glue function getResolvedOptions to take care of the following case :
* Handling optional arguments and/or mandatory arguments
* Optional arguments with default value
NOTE:
* DO NOT USE '-' while defining args as the getResolvedOptions with replace them with '_'
* All fields would be return as a string type with getResolvedOptions
Arguments:
mandatory_fields {list} -- list of mandatory fields for the job
default_optional_args {dict} -- dict for optional fields with their default value
Returns:
dict -- given args with default value of optional args not filled
"""
# The glue args are available in sys.argv with an extra '--'
given_optional_fields_key = list(set([i[2:] for i in sys.argv]).intersection([i for i in default_optional_args]))
args = getResolvedOptions(sys.argv,
mandatory_fields+given_optional_fields_key)
# Overwrite default value if optional args are provided
default_optional_args.update(args)
return default_optional_args
Usage :
# Defining mandatory/optional args
mandatory_fields = ['my_mandatory_field_1','my_mandatory_field_2']
default_optional_args = {'optional_field_1':'myvalue1', 'optional_field_2':'myvalue2'}
# Retrieve args
args = get_glue_args(mandatory_fields, default_optional_args)
# Access element as dict with args[‘key’]
Porting Yuriy's answer to Python solved my problem:
if ('--{}'.format('ISO_8601_STRING') in sys.argv):
args = getResolvedOptions(sys.argv, ['ISO_8601_STRING'])
else:
args = {'ISO_8601_STRING': datetime.datetime.now().isoformat()}
There is a workaround to have optional parameters. The idea is to examine arguments before resolving them (Scala):
val argName = 'ISO_8601_STRING'
var argValue = null
if (sysArgs.contains(s"--$argName"))
argValue = GlueArgParser.getResolvedOptions(sysArgs, Array(argName))(argName)
I don't see a way to have optional parameters, but you can specify default parameters on the job itself, and then if you don't pass that parameter when you run the job, your job will receive the default value (note that the default value can't be blank).
Wrapping matsev's answer in a function:
def get_glue_env_var(key, default="none"):
if f'--{key}' in sys.argv:
return getResolvedOptions(sys.argv, [key])[key]
else:
return default
It's possible to create a Step Function that starts the same Glue job with different parameters. The state machine starts with a Choice state and uses different number of inputs depending on which is present.
stepFunctions:
stateMachines:
taskMachine:
role:
Fn::GetAtt: [ TaskExecutor, Arn ]
name: ${self:service}-${opt:stage}
definition:
StartAt: DefaultOrNot
States:
DefaultOrNot:
Type: Choice
Choices:
- Variable: "$.optional_input"
IsPresent: false
Next: DefaultTask
- Variable: "$. optional_input"
IsPresent: true
Next: OptionalTask
OptionalTask:
Type: Task
Resource: "arn:aws:states:::glue:startJobRun.task0"
Parameters:
JobName: ${self:service}-${opt:stage}
Arguments:
'--log_group.$': "$.specs.log_group"
'--log_stream.$': "$.specs.log_stream"
'--optional_input.$': "$. optional_input"
Catch:
- ErrorEquals: [ 'States.TaskFailed' ]
ResultPath: "$.errorInfo"
Next: TaskFailed
Next: ExitExecution
DefaultTask:
Type: Task
Resource: "arn:aws:states:::glue:startJobRun.sync"
Parameters:
JobName: ${self:service}-${opt:stage}
Arguments:
'--log_group.$': "$.specs.log_group"
'--log_stream.$': "$.specs.log_stream"
Catch:
- ErrorEquals: [ 'States.TaskFailed' ]
ResultPath: "$.errorInfo"
Next: TaskFailed
Next: ExitExecution
TaskFailed:
Type: Fail
Error: "Failure"
ExitExecution:
Type: Pass
End: True
If you're using the interface, you must provide your parameter names starting with "--" like "--TABLE_NAME", rather than "TABLE_NAME", then you can use them like the following (python) code:
args = getResolvedOptions(sys.argv, ['JOB_NAME', 'TABLE_NAME'])
table_name = args['TABLE_NAME']
Related
My problem is i can't find a default value for FlagConvert if the previous parameter of FlagConvert has a another default value.
For example:
class ClearFlag(FlagConverter, prefix='--', delimiter=':'):
has: str = None
match: str = None
by: Tuple[Union[User, Member], ...] = None
check_all: bool = True
#command('clear', aliases=['cls', 'clean'])
async def clear_messages(ctx: Context, limit: int = 10, flags: ClearFlag = ???):
...
In docs examples on commands.html,
always FlagConvert is required and I can't find any answer.
I use discord.py v2.0, so you need these:
Docs
Installation (install the development version).
discord.ext.commands.FlageConvert on API Refrence and Commands
I can use ClearFlags object. because has all of attributes I need. For example:
#command('clear', aliases=['cls', 'clean'])
async def clear_messages(ctx: Context, limit: int = 10, flags: ClearFlag = ClearFlag()):
...
But should never be a type
I simplifying my problem here. I need to do this:
python test.py --arg1 '--no-route53'
I added the parser like this:
parser.add_argument("--arg1", nargs="?", type=str, help="option")
args = parser.parse_args()
I wanted to get the '--no-route53' as a whole string and use it later in my script. But I keep getting this error:
test.py: error: unrecognized arguments: --no-route53
How can I work around it?
UPDATE1: if i give extra space after '--no-route53', like this and it worked:
python test.py --arg1 '--no-route53 '
I had the exact same issue a while ago. I ended up pre-parsing the sys.argvlist:
def bug_workaround(argv):
# arg needs to be prepended by space in order to not be interpreted as
# an option
add_space = False
args = []
for arg in argv[1:]: # Skip argv[0] as that should not be passed to parse_args
if add_space:
arg = " " + arg
add_space = True if arg == "--args" else False
args.append(arg)
return args
parser = argparse.ArgumentParser(description='My progrm.')
.
.
.
parser.add_argument('--args', type=str,
help='Extra args (in "quotes")')
args = parser.parse_args(bug_workaround(sys.argv))
# Prune added whitespace (for bug workaround)
if args.args:
args.args = args.args[1:]
It is straight forward creating a string parameter such as --test_email_address below.
class Command(BaseCommand):
option_list = BaseCommand.option_list + (
make_option('--test_email_address',
action='store',
type="string",
dest='test_email_address',
help="Specifies test email address."),
make_option('--vpds',
action='store',
type='list', /// ??? Throws exception
dest='vpds',
help="vpds list [,]"),
)
But how can I define a list to be passed in? such as [1, 3, 5]
You should add a default value and change the action to 'append':
make_option('--vpds',
action='append',
default=[],
dest='vpds',
help="vpds list [,]"),
The usage is as follows:
python manage.py my_command --vpds arg1 --vpds arg2
I am attempting to use argparse to convert an argument into a timedelta object. My program reads in strings supplied by the user and converts them to various datetime objects for later usage. I cannot get the filter_length argument to process correctly though. My code:
import datetime
import time
import argparse
def mkdate(datestring):
return datetime.datetime.strptime(datestring, '%Y-%m-%d').date()
def mktime(timestring):
return datetime.datetime.strptime(timestring, '%I:%M%p').time()
def mkdelta(deltatuple):
return datetime.timedelta(deltatuple)
parser = argparse.ArgumentParser()
parser.add_argument('start_date', type=mkdate, nargs=1)
parser.add_argument('start_time', type=mktime, nargs=1, )
parser.add_argument('filter_length', type=mkdelta, nargs=1, default=datetime.timedelta(1))#default filter length is 1 day.
I run the program, passing 1 as the timedelta value (I only want it to be one day):
> python program.py 2012-09-16 11:00am 1
But I get the following error:
>>> program.py: error: argument filter_length: invalid mkdelta value: '1'
I don't understand why the value is invalid. If I call the mkdelta function on its own, like this:
mkdelta(1)
print mkdelta(1)
It returns:
datetime.timedelta(1)
1 day, 0:00:00
This is exactly the value that I'm looking for. Can someone help me figure out how to do this conversion properly using argparse?
Notice the quotes around '1' in your error message? You pass a string to mkdelta, whereas in your test code, you pass an integer.
Your function doesn't handle a string argument, which is what argparse is handing it; call int() on it:
def mkdelta(deltatuple):
return datetime.timedelta(int(deltatuple))
If you need to support more than days, you'll have to find a way to parse the argument passed in into timedelta arguments.
You could, for example, support d, h, m or s postfixes to denote days, hours, minutes or seconds:
_units = dict(d=60*60*24, h=60*60, m=60, s=1)
def mkdelta(deltavalue):
seconds = 0
defaultunit = unit = _units['d'] # default to days
value = ''
for ch in list(str(deltavalue).strip()):
if ch.isdigit():
value += ch
continue
if ch in _units:
unit = _units[ch]
if value:
seconds += unit * int(value)
value = ''
unit = defaultunit
continue
if ch in ' \t':
# skip whitespace
continue
raise ValueError('Invalid time delta: %s' % deltavalue)
if value:
seconds = unit * int(value)
return datetime.timedelta(seconds=seconds)
Now your mkdelta method accepts more complete deltas, and even integers still:
>>> mkdelta('1d')
datetime.timedelta(1)
>>> mkdelta('10s')
datetime.timedelta(0, 10)
>>> mkdelta('5d 10h 3m 10s')
datetime.timedelta(5, 36190)
>>> mkdelta(5)
datetime.timedelta(5)
>>> mkdelta('1')
datetime.timedelta(1)
The default unit is days.
You could use a custom action to collect all the remaining args and parse them into a timedelta.
This will allow you to write CLI commands such as
% test.py 2012-09-16 11:00am 2 3 4 5
datetime.timedelta(2, 3, 5004) # args.filter_length
You could also provide optional arguments for --days, --seconds, etc, so you can write CLI commands such as
% test.py 2012-09-16 11:00am --weeks 6 --days 0
datetime.timedelta(42) # args.filter_length
% test.py 2012-09-16 11:00am --weeks 6.5 --days 0
datetime.timedelta(45, 43200)
import datetime as dt
import argparse
def mkdate(datestring):
return dt.datetime.strptime(datestring, '%Y-%m-%d').date()
def mktime(timestring):
return dt.datetime.strptime(timestring, '%I:%M%p').time()
class TimeDeltaAction(argparse.Action):
def __call__(self, parser, args, values, option_string = None):
# print '{n} {v} {o}'.format(n = args, v = values, o = option_string)
setattr(args, self.dest, dt.timedelta(*map(float, values)))
parser = argparse.ArgumentParser()
parser.add_argument('start_date', type = mkdate)
parser.add_argument('start_time', type = mktime)
parser.add_argument('--days', type = float, default = 1)
parser.add_argument('--seconds', type = float, default = 0)
parser.add_argument('--microseconds', type = float, default = 0)
parser.add_argument('--milliseconds', type = float, default = 0)
parser.add_argument('--minutes', type = float, default = 0)
parser.add_argument('--hours', type = float, default = 0)
parser.add_argument('--weeks', type = float, default = 0)
parser.add_argument('filter_length', nargs = '*', action = TimeDeltaAction)
args = parser.parse_args()
if not args.filter_length:
args.filter_length = dt.timedelta(
args.days, args.seconds, args.microseconds, args.milliseconds,
args.minutes, args.hours, args.weeks)
print(repr(args.filter_length))
This gist seems to solve your problem: https://gist.github.com/jnothman/4057689
Just in case somebody lands here looking to add a pandas.Timedelta to an argument parser (as I just did), it turns out the following works just fine:
parser = argparse.ArgumentParser()
parser.add_argument('--timeout', type=pd.Timedelta)
Then:
>>> parser.parse_args(['--timeout', '24 hours'])
Namespace(timeout=Timedelta('1 days 00:00:00'))
I'm writing a server querying tool, and I have a little bit of code to parse arguments at the very top:
# Parse arguments
p = argparse.ArgumentParser()
g = p.add_mutually_exclusive_group(required=True)
g.add_argument('--odam', dest='query_type', action='store_const',
const='odam', help="Odamex Master query.")
g.add_argument('--odas', dest='query_type', action='store_const',
const='odas', help="Odamex Server query.")
p.add_argument('address', nargs='*')
args = p.parse_args()
# Default master server arguments.
if args.query_type == 'odam' and not args.address:
args.address = [
'master1.odamex.net:15000',
'master2.odamex.net:15000',
]
# If we don't have any addresses by now, we can't go on.
if not args.address:
print "If you are making a server query, you must pass an address."
sys.exit(1)
Is there a nicer way to do this, preferably all within the parser? That last error looks a little out of place, and it would be nice if I could make nargs for address depend on if --odam or ---odas is passed. I could create a subparser, but that would make help look a little odd since it would leave off the addresses part of the command.
You can do this with an custom argparse.Action:
import argparse
import sys
class AddressAction(argparse.Action):
def __call__(self, parser, args, values, option = None):
args.address=values
if args.query_type=='odam' and not args.address:
args.address=[
'master1.odamex.net:15000',
'master2.odamex.net:15000',
]
if not args.address:
parser.error("If you are making a server query, you must pass an address.")
p = argparse.ArgumentParser()
g = p.add_mutually_exclusive_group(required=True)
g.add_argument('--odam', dest='query_type', action='store_const',
const='odam', help="Odamex Master query.")
g.add_argument('--odas', dest='query_type', action='store_const',
const='odas', help="Odamex Server query.")
p.add_argument('address', nargs='*', action=AddressAction)
args = p.parse_args()
yields
% test.py --odas
If you are making a server query, you must pass an address.
% test.py --odam
Namespace(address=['master1.odamex.net:15000', 'master2.odamex.net:15000'], query_type='odam')
% test.py --odam 1 2 3
Namespace(address=['1', '2', '3'], query_type='odam')