Convert Python script to Airflow DAG - python

I have identified the below script as being really useful for anyone running Amazon Redshift:
#!/usr/bin/env python
from __future__ import print_function
'''
analyze-vacuum-schema.py
* Copyright 2015, Amazon.com, Inc. or its affiliates. All Rights Reserved.
*
* Licensed under the Amazon Software License (the "License").
* You may not use this file except in compliance with the License.
* A copy of the License is located at
*
* http://aws.amazon.com/asl/
*
* or in the "license" file accompanying this file. This file is distributed
* on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
* express or implied. See the License for the specific language governing
* permissions and limitations under the License.
The Redshift Analyze Vacuum Utility gives you the ability to automate VACUUM and ANALYZE operations.
When run, it will analyze or vacuum an entire schema or individual tables. This Utility Analyzes
and Vacuums table(s) in a Redshift Database schema, based on certain parameters like unsorted,
stats off and size of the table and system alerts from stl_explain & stl_alert_event_log.
By turning on/off '--analyze-flag' and '--vacuum-flag' parameters, you can run it as 'vacuum-only'
or 'analyze-only' utility. This script can be scheduled to run VACUUM and ANALYZE as part of
regular maintenance/housekeeping activities, when there are less database activities (quiet period).
This script will:
1) Analyze a single table or tables in a schema based on,
a) Alerts from stl_explain & stl_alert_event_log.
b) 'stats off' metrics from SVV_TABLE_INFO.
2) Vacuum a single table or tables in a schema based on,
a) The alerts from stl_alert_event_log.
b) The 'unsorted' and 'size' metrics from SVV_TABLE_INFO.
c) Vacuum reindex to analyze the interleaved sort keys
Srinikri Amazon Web Services (2015)
11/21/2015 : Added support for vacuum reindex to analyze the interleaved sort keys.
09/01/2017 : Fixed issues with interleaved sort key tables per https://github.com/awslabs/amazon-redshift-utils/issues/184
11/09/2017 : Refactored to support running in AWS Lambda
14/12/2017 : Refactored to support a more sensible interface style with kwargs
'''
import os
import sys
import argparse
# add the lib directory to the sys path
try:
    sys.path.append(os.path.join(os.path.dirname(__file__), "lib"))
    sys.path.append(os.path.join(os.path.dirname(__file__), ".."))
except:
    pass
import getopt
import analyze_vacuum
import config_constants
__version__ = ".9.2.1"
OK = 0
ERROR = 1
INVALID_ARGS = 2
NO_WORK = 3
TERMINATED_BY_USER = 4
NO_CONNECTION = 5
# setup cli args
parser = argparse.ArgumentParser()
parser.add_argument("--analyze-flag", dest="analyze_flag", default=True, type=bool,
help="Flag to turn ON/OFF ANALYZE functionality (True or False : Default = True ")
parser.add_argument("--max-unsorted-pct", dest="max_unsorted_pct",
help="Maximum unsorted percentage(% to consider a table for vacuum : Default = 50%")
parser.add_argument("--min-interleaved-cnt", dest="min_interleaved_cnt", type=int,
help="Minimum stv_interleaved_counts records to consider a table for vacuum reindex: Default = 0")
parser.add_argument("--min-interleaved-skew", dest="min_interleaved_skew",
help="Minimum index skew to consider a table for vacuum reindex: Default = 1.4")
parser.add_argument("--min-unsorted-pct", dest="min_unsorted_pct",
help="Minimum unsorted percentage(% to consider a table for vacuum : Default = 5%")
parser.add_argument("--stats-off-pct ", dest="stats_off_pct",
help="Minimum stats off percentage(% to consider a table for analyze : Default = 10%")
parser.add_argument("--table-name", dest="table_name",
help="A specific table to be Analyzed or Vacuumed if analyze-schema is not desired")
parser.add_argument("--vacuum-flag", dest="vacuum_flag", default=True, type=bool,
help="Flag to turn ON/OFF VACUUM functionality (True or False : Default = True")
parser.add_argument("--vacuum-parameter", dest="vacuum_parameter",
help="Vacuum parameters [ FULL | SORT ONLY | DELETE ONLY | REINDEX ] Default = FULL")
parser.add_argument("--blacklisted-tables", dest="blacklisted_tables", help="The tables we do not want to Vacuum")
parser.add_argument("--db-conn-opts", dest="db_conn_opts",
help="Additional connection options. name1=opt1[ name2=opt2]..")
parser.add_argument("--db-host", dest="db_host", required=True, help="The Cluster endpoint")
parser.add_argument("--db-port", dest="db_port", type=int, required=True,
help="The Cluster endpoint port : Default = 5439")
parser.add_argument("--db-pwd", dest="db_pwd", help="The Password for the Database User to connect to")
parser.add_argument("--db-user", dest="db_user", required=True, help="The Database User to connect to")
parser.add_argument("--debug ", dest="debug", default=False,
help="Generate Debug Output including SQL Statements being run")
parser.add_argument("--ignore-errors", dest="ignore_errors", default=True,
help="Ignore errors raised when running and continue processing")
parser.add_argument("--max-table-size-mb", dest="max_table_size_mb", type=int,
help="Maximum table size in MB : Default = 700*1024 MB")
parser.add_argument("--output-file", dest="output_file", help="The full path to the output file to be generated")
parser.add_argument("--predicate-cols", dest="predicate_cols", help="Analyze predicate columns only")
parser.add_argument("--query-group", dest="query_group", help="Set the query_group for all queries")
parser.add_argument("--require-ssl", dest="require_ssl", default=False,
help="Does the connection require SSL? (True | False")
parser.add_argument("--schema-name", dest="schema_name",
help="The Schema to be Analyzed or Vacuumed (REGEX: Default = public")
parser.add_argument("--slot-count", dest="slot_count", help="Modify the wlm_query_slot_count : Default = 1")
parser.add_argument("--suppress-cloudwatch", dest="suppress_cw",
help="Don't emit CloudWatch metrics for analyze or vacuum when set to True")
parser.add_argument("--db", dest="db", help="The Database to Use")
full_args = parser.parse_args()
parse_args = {}
# remove args that end up as None
for k, v in vars(full_args).items():
    if v is not None:
        parse_args[k] = v
def main(argv):
    # get environmental args
    args = {config_constants.DB_NAME: os.environ.get('PGDATABASE', None),
            config_constants.DB_USER: os.environ.get('PGUSER', None),
            config_constants.DB_PASSWORD: os.environ.get('PGPASSWORD', None),
            config_constants.DB_HOST: os.environ.get('PGHOST', None),
            config_constants.DB_PORT: os.environ.get('PGPORT', 5439)}
    # add argparse args
    args.update(parse_args)
    if args.get(config_constants.OUTPUT_FILE) is not None:
        sys.stdout = open(args.get(config_constants.OUTPUT_FILE), 'w')
    # invoke the main method of the utility
    result = analyze_vacuum.run_analyze_vacuum(**args)
    if result is not None:
        sys.exit(result)
    else:
        sys.exit(0)


if __name__ == "__main__":
    main(sys.argv)
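Run from the command line (and therefore from cron), an invocation looks roughly like the following; the cluster endpoint, user and database names here are placeholders, and the password comes from the PGPASSWORD environment variable or --db-pwd:

python analyze-vacuum-schema.py --db-host mycluster.xxxxxxxx.us-east-1.redshift.amazonaws.com --db-port 5439 --db-user admin --db mydb --schema-name public --analyze-flag True --vacuum-flag True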
However, you can only "easily" schedule it with cron on an EC2 instance or a similar scheduler, so I have been trying to find a way to run it as an Airflow DAG.
I have found two similar questions on Stack Overflow, and I guess the script is perhaps just missing some import statements, like the ones below:
from airflow import DAG
from airflow.models import Variable
I'm hoping someone familiar enough with Airflow can help me find the required entries or point me in the right direction.
If so, I can create a fork of the original script:
https://github.com/awslabs/amazon-redshift-utils/blob/master/src/AnalyzeVacuumUtility/analyze-vacuum-schema.py
which will assist others with the same goal in the future.

How about creating a new custom operator? It should accept all the CLI arguments, which you can then pass on to the code from the existing script. Here is a rough draft of what I would do:
import os
import sys

from airflow.models import BaseOperator

# imports from the utility itself; its directory (and lib/) must be importable
import analyze_vacuum
import config_constants


class AnalyzeVacuumOperator(BaseOperator):
    def __init__(
        self,
        *,
        analyze_flag: bool = True,
        max_unsorted_pct: float = None,
        # ... here go all the other arguments from argparse
        **kwargs
    ):
        super().__init__(**kwargs)
        self.args = {
            config_constants.DB_NAME: os.environ.get('PGDATABASE', None),
            # Here goes all the rest of the args defined in the original script
            # ...
            # Then we add the new arguments configured by users,
            # which were previously passed via the CLI (argparse).
            # TODO: check what the current key names are so we match them
            'analyze_flag': analyze_flag,
            'max_unsorted_pct': max_unsorted_pct,
            # ...
        }

    def execute(self, context):
        args = self.args
        # Here we copy-paste from the existing script
        if args.get(config_constants.OUTPUT_FILE) is not None:
            sys.stdout = open(args.get(config_constants.OUTPUT_FILE), 'w')
        # invoke the main method of the utility
        result = analyze_vacuum.run_analyze_vacuum(**args)
        if result is not None:
            sys.exit(result)
        else:
            sys.exit(0)
Then you can use such an operator in your DAGs:
with DAG(...) as dag:
    AnalyzeVacuumOperator(
        task_id="vacuum_task",
        analyze_flag=True,
        slot_count=3,
    )
Note that you may need to adjust the imports and the sys.path changes from the original script.
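For completeness, a slightly fuller DAG file using the operator sketched above might look like this; the module path of the operator, the dag_id and the schedule are my own assumptions, not something prescribed by the script:

from datetime import datetime

from airflow import DAG

# hypothetical module path: adjust to wherever you place AnalyzeVacuumOperator
from analyze_vacuum_operator import AnalyzeVacuumOperator

with DAG(
    dag_id="redshift_analyze_vacuum",    # hypothetical dag_id
    start_date=datetime(2021, 1, 1),
    schedule_interval="0 3 * * *",       # nightly, during a quiet period
    catchup=False,
) as dag:
    AnalyzeVacuumOperator(
        task_id="analyze_vacuum",
        analyze_flag=True,
        vacuum_flag=True,
    )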

Related

How to support dig's #<ip> syntax in python argparse

The Linux dig utility allows you to specify the target DNS server with '#ip' syntax, like dig #8.8.8.8 A www.example.com. How can I support that same syntax in Python's argparse module? (Installing dnsutils in my container was adding 50MB to my image size, so I'm trying to put together a small and simple resolver with dnspython that mimics my most commonly used dig features.)
#!/usr/local/bin/python
import dns.resolver
import argparse

parser = argparse.ArgumentParser(description='Small and incomplete python replacement for dig.')
#parser = argparse.ArgumentParser(prefix_chars='-#', description='Small and incomplete python replacement for dig.')
parser.add_argument('query_type', nargs='?', default='A', help='The type of record to resolve')
parser.add_argument('url', type=str,
                    help='The URL to resolve')
parser.add_argument('#', dest='dns',
                    help='The DNS server to query.')

if __name__ == "__main__":
    resolver = dns.resolver.Resolver()
    args = parser.parse_args()
    if args.dns:
        print(f'DNS is {args.dns}')
        resolver.nameservers = [args.dns]
    if args.query_type:
        print(f'query_type is {args.query_type}')
    if args.url:
        print(f'url is {args.url}')
    records = resolver.resolve('_imaps._tcp.gmail.com', args.query_type)
    for record in records:
        print(record)
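One possible approach (a sketch of my own, not from the original post; the '--dns' option name is an assumption) is to rewrite a dig-style '#ip' token into an ordinary option before handing the argument list to argparse:

#!/usr/local/bin/python
# sketch: translate '#8.8.8.8' into '--dns 8.8.8.8' before parsing
import sys
import argparse

import dns.resolver


def rewrite_server_tokens(argv):
    """Turn any argument starting with '#' into a '--dns <ip>' pair."""
    rewritten = []
    for token in argv:
        if token.startswith('#') and len(token) > 1:
            rewritten.extend(['--dns', token[1:]])
        else:
            rewritten.append(token)
    return rewritten


parser = argparse.ArgumentParser(description='Small and incomplete python replacement for dig.')
parser.add_argument('query_type', nargs='?', default='A', help='The type of record to resolve')
parser.add_argument('url', help='The name to resolve')
parser.add_argument('--dns', help='The DNS server to query.')

if __name__ == "__main__":
    args = parser.parse_args(rewrite_server_tokens(sys.argv[1:]))
    resolver = dns.resolver.Resolver()
    if args.dns:
        resolver.nameservers = [args.dns]
    for record in resolver.resolve(args.url, args.query_type):
        print(record)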

How do I pass variables defined from argparse in config as a module to other scripts?

The main problem I have is accessing, from other scripts, variables that are defined in a configuration file used as a module, when those variables are set by argparse inputs. If I encapsulate this code within a function, I will not have access to it in other scripts. Similarly, if I don't use a function, the configuration file will expect inputs whenever it is imported as a module. I think global variables might be the answer, but I don't know how to set them up in this use case.
import os
import shutil
import inspect
import configparser
import sys
import argparse


def cmd_inputs():
    parser = argparse.ArgumentParser()
    parser.add_argument('-dir', '--geofloodhomedir', help="File path to directory that will hold the cfg file \
                        and the inputs and outputs folders. Default is the GeoNet3 directory.",
                        type=str)
    parser.add_argument('-p', '--project',
                        help="Folder within GIS, Hydraulics, and NWM directories \
                        of Inputs and Outputs. Default is 'my_project'.",
                        type=str)
    parser.add_argument('-n', '--DEM_name',
                        help="Name of Input DEM (without extension) and the prefix used for all \
                        project outputs. Default is 'dem'",
                        type=str)
    parser.add_argument('--no_chunk', help="Dont batch process DEM's > 1.5 GB for Network Extraction. Default is to chunk DEMs larger than 1.5 GB.",
                        action='store_true')
    global home_dir
    args = parser.parse_args()
    if args.geofloodhomedir:
        home_dir = args.geofloodhomedir
        if os.path.abspath(home_dir) == os.path.dirname(os.path.abspath(__file__)):
            home_dir = os.getcwd()
            print("hello")
        print('GeoNet and GeoFlood Home Directory: ')
        print(home_dir)
    else:
        ab_path = os.path.abspath(__file__)
        home_dir = os.path.dirname(ab_path)
        print('Using default GeoNet and GeoFlood home directory:')
        print(home_dir)
    if args.project:
        project_name = args.project
        print(f"Project Name: {project_name}")
    else:
        project_name = 'my_project'
        print(f"Default Project Name: {project_name}")
    if args.DEM_name:
        dem_name = args.DEM_name
        print(f'DEM Name: {dem_name}')
    else:
        dem_name = 'dem'
        print(f'Default DEM: {dem_name}')
    if not args.no_chunk:
        print("Chunking DEMs > 1.5 GBs in Network Extraction script")
        chunk = 1
    else:
        print("Not chunking input DEM or its products in Network Extraction script")
        chunk = 0
    config = configparser.ConfigParser()
    config['Section'] = {'geofloodhomedir': home_dir,
                         'projectname': project_name,
                         'dem_name': dem_name,
                         'Chunk_DEM': chunk}
    with open(os.path.join(home_dir, 'GeoFlood.cfg'), 'w') as configfile:
        config.write(configfile)


if __name__ == '__main__':
    cmd_inputs()
    print("Configuration Complete")

how to create vms with pyvmomi

I am trying to create a Python program that will create a provided number of identical virtual machines. I have used the community sample scripts to get as much as I can running, but I am completely stuck now.
#!/usr/bin/env python
"""
vSphere SDK for Python program for creating tiny VMs (1vCPU/128MB)
"""
import atexit
import hashlib
import json
import random
import time
import requests
from pyVim import connect
from pyVmomi import vim
from tools import cli
from tools import tasks
from add_nic_to_vm import add_nic, get_obj
def get_args():
    """
    Use the tools.cli methods and then add a few more arguments.
    """
    parser = cli.build_arg_parser()
    parser.add_argument('-c', '--count',
                        type=int,
                        required=True,
                        action='store',
                        help='Number of VMs to create')
    parser.add_argument('-d', '--datastore',
                        required=True,
                        action='store',
                        help='Name of Datastore to create VM in')
    parser.add_argument('--datacenter',
                        required=True,
                        help='Name of the datacenter to create VM in.')
    parser.add_argument('--folder',
                        required=True,
                        help='Name of the vm folder to create VM in.')
    parser.add_argument('--resource-pool',
                        required=True,
                        help='Name of resource pool to create VM in.')
    parser.add_argument('--opaque-network',
                        help='Name of the opaque network to add to the new VM')
    # NOTE (hartsock): as a matter of good security practice, never ever
    # save a credential of any kind in the source code of a file. As a
    # matter of policy we want to show people good programming practice in
    # these samples so that we don't encourage security audit problems for
    # people in the future.
    args = parser.parse_args()
    return cli.prompt_for_password(args)


def create_dummy_vm(vm_name, service_instance, vm_folder, resource_pool,
                    datastore):
    """Creates a dummy VirtualMachine with 1 vCpu, 128MB of RAM.
    :param name: String Name for the VirtualMachine
    :param service_instance: ServiceInstance connection
    :param vm_folder: Folder to place the VirtualMachine in
    :param resource_pool: ResourcePool to place the VirtualMachine in
    :param datastore: DataStrore to place the VirtualMachine on
    """
    datastore_path = '[' + datastore + '] ' + vm_name
    # bare minimum VM shell, no disks. Feel free to edit
    vmx_file = vim.vm.FileInfo(logDirectory=None,
                               snapshotDirectory=None,
                               suspendDirectory=None,
                               vmPathName=datastore_path)
    config = vim.vm.ConfigSpec(name=vm_name, memoryMB=128, numCPUs=1,
                               files=vmx_file, guestId='dosGuest',
                               version='vmx-07')
    print("Creating VM {}...".format(vm_name))
    task = vm_folder.CreateVM_Task(config=config, pool=resource_pool)
    tasks.wait_for_tasks(service_instance, [task])


A = 1


def main():
    """
    Simple command-line program for creating Dummy VM based on Marvel character
    names
    """
    name = "computer" + str(A)
    args = get_args()
    service_instance = connect.SmartConnectNoSSL(host=args.host,
                                                 user=args.user,
                                                 pwd=args.password,
                                                 port=int(args.port))
    if not service_instance:
        print("Could not connect to the specified host using specified "
              "username and password")
        return -1
    atexit.register(connect.Disconnect, service_instance)
    content = service_instance.RetrieveContent()
    datacenter = get_obj(content, [vim.Datacenter], args.datacenter)
    vmfolder = get_obj(content, [vim.Folder], args.folder)
    resource_pool = get_obj(content, [vim.ResourcePool], args.resource_pool)
    vm_name = name
    create_dummy_vm(vm_name, service_instance, vmfolder, resource_pool,
                    args.datastore)
    A + 1
    if args.opaque_network:
        vm = get_obj(content, [vim.VirtualMachine], vm_name)
        add_nic(service_instance, vm, args.opaque_network)
    return 0


# Start program
if __name__ == "__main__":
    main()
The error I get when running it is:
Creating VM computer1...
Traceback (most recent call last):
  File "create_vm.py", line 142, in <module>
    main()
  File "create_vm.py", line 133, in main
    args.datastore)
  File "create_vm.py", line 98, in create_dummy_vm
    task = vm_folder.CreateVM_Task(config=config, pool=resource_pool)
AttributeError: 'NoneType' object has no attribute 'CreateVM_Task'
I know that my CreateVM_Task is getting a parameter of None, but I can't seem to figure out why.
The problem was with the config parameters. With the current code, the datacenter and vmfolder objects come back as None when printed. To fix this, I edited it to the following block.
content = service_instance.RetrieveContent()
datacenter = content.rootFolder.childEntity[0]
vmfolder = datacenter.vmFolder
hosts = datacenter.hostFolder.childEntity
resource_pool = hosts[0].resourcePool
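Since the stated goal is a provided number of identical VMs, a simple follow-on (my own sketch, not part of the original answer) is to loop over args.count inside main() instead of relying on the module-level A counter:

# hypothetical sketch: create args.count VMs by looping over a counter
for i in range(1, args.count + 1):
    vm_name = "computer{}".format(i)
    create_dummy_vm(vm_name, service_instance, vmfolder, resource_pool,
                    args.datastore)
    if args.opaque_network:
        vm = get_obj(content, [vim.VirtualMachine], vm_name)
        add_nic(service_instance, vm, args.opaque_network)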

argparse solution requested for commands and subcommands with subcommands

Hopefully this will translate into an elegant solution, but I am unable to figure it out myself. I have been reading huge numbers of examples and explanations, but I can't seem to get it to work.
I am writing a program that needs the following options:
myprogram:
-h help
--DEBUG debugging on
Commands are: dashboard / promote / deploy / auto-tag / dbquery
Some commands have sub-commands.
Dashboard processing:
dashboard <ALL | DEBUG | AAA [sub1-aaa | ...] >
Promoting:
promote <environment> [to environment] <system> [version]
Deployment
deploy <system> [version] <in environment>
Auto-tagging
auto-tag <repo> <project> [--days-back]
Database queries (--tab is optional)
(system or env or both are required)
dbquery lock set <system | environment> [--tab]
dbquery lock clear <system | environment> [--tab]
dbquery lock show <system | environment | before-date | after-date> [--tab]
dbquery promote list <system| version| environment | before-date | after-date> [--tab]
dbquery deploy list <system| version| environment | before-date | after-date> [--tab]
If a command is used, some sub-commands or options are required.
I have a hard time getting this done using the argparse library. I tried to use add_argument_group, subparsers, etc., but I think I am missing something basic here. All the examples I found that come close are about svn, but they seem to only go one level below svn, and I need more, or a different approach.
If possible I would like to make all parameters after dbquery deploy list optional, with at least one option required. But distinguishing between the name of a system and an environment might become tricky, so it might be better to change this:
dbquery lock set <system | environment>
into
dbquery lock set <system=system | env=environment>
p.s. Options enclosed between [] are optional, options between <> are required.
Thanks in advance.
In response to the comment asking for my code, let's focus on dbquery, because the rest would largely be a repetition:
import argparse
parser = argparse.ArgumentParser(description="Main cli tool for processing in the CD pipeline (%s)" % VERSION)
subparsers = parser.add_subparsers(help='additional help',title='subcommands')
dbq_parser=subparsers.add_parser("dbqueue", aliases=['dbq'])
dbq_group_lock = dbq_parser.add_argument_group('lock', 'lock desc')
dbq_group_promote =dbq_parser.add_argument_group('promote')
dbq_group_deploy = dbq_parser.add_argument_group('deploy','Deployment description')
dbq_group_lock.add_argument('set', help="Sets lock")
dbq_group_lock.add_argument('clear', help='Clears lock')
dbq_group_lock.add_argument('show', help='Show lock status')
dbq_group_deploy.add_argument('name system etc')
Executing results in:
# python3 main.py -h
usage: main.py [-h] [--debug] {dbqueue,dbq} ...

Main cli tool for processing in the CD pipeline (cdv3, Jun 2019, F.IJskes)

optional arguments:
  -h, --help     show this help message and exit
  --debug        Generate debug output and keep temp directories

subcommands:
  {dbqueue,dbq}  additional help
This looks okay, but:
# python3 main.py dbq -h
usage: main.py dbqueue [-h] set clear show name system etc

optional arguments:
  -h, --help  show this help message and exit

lock:
  lock desc

  set         Sets lock
  clear       Clears lock
  show        Show lock status
shows that the expected parameters are not lock, promote or deploy.
Okay, the feedback helped my understanding. I now understand that parsers can get subparsers, and those can get parsers of their own, so there seems to be no limit to how deep one can go.
This new insight got me to the following (a partial copy from my working example):
import argparse
main_parser = argparse.ArgumentParser(description="Main cli tool for processing in the CD pipeline (%s)" % VERSION)
main_subparsers = main_parser.add_subparsers(help='',title='Possible commnds')
dashbrd_subparser = main_subparsers.add_parser('dashboard', help="Proces specified dashboards", allow_abbrev=True)
dashbrd_subparser.add_argument('who?',help='Proces dashboard for ALL, Supplier or DEBUG what should happen.')
dashbrd_subparser.add_argument('-subsystem', help='Select subsystem to proces for given supplier')
dbq_main=main_subparsers.add_parser("dbquery", help="Query database for locks,deployments or promotes")
dbq_main_sub=dbq_main.add_subparsers(help="additions help", title='sub commands for dbquery')
dbq_lock=dbq_main_sub.add_parser('lock', help='query db for lock(s)')
dbq_lock_sub=dbq_lock.add_subparsers(help='', title='subcommands for lock')
dbq_lock_sub_set=dbq_lock_sub.add_parser('set', help='sets a lock')
dbq_lock_sub_set.add_argument('-env', required=True)
dbq_lock_sub_set.add_argument('--tab',required=False, action="store_true" )
dbq_lock_sub_clear=dbq_lock_sub.add_parser('clear', help='clears a lock')
# dbq_lock_sub_set.add_argument('-env', required=True)
# dbq_lock_sub_set.add_argument('--tab', required=False)
dbq_lock_sub_show=dbq_lock_sub.add_parser('show', help='shows a lock/locks')
# dbq_lock_sub_set.add_argument('-env', required=True)
# dbq_lock_sub_set.add_argument('--tab', required=False)
print( vars(main_parser.parse_args()))
exit(1)
I now only seem to have issues using parameters such as '-env' and '-subsystem' across the different sub-commands, because there is a conflict when I add them to another parser.
I also have no data about which options were chosen, which is something that is also needed.
Almost there. What I now have is usable to a great extent, so I'll post it so that other users might benefit from my work, which was helped along by the comments of hpaulj; they led me to other documents that clarified the remaining parts.
import argparse

# see: https://pymotw.com/3/argparse/
# BLOCK FOR SHARED options used in more than one place,
# see also: https://stackoverflow.com/questions/7498595/python-argparse-add-argument-to-multiple-subparsers
env_shared = argparse.ArgumentParser(add_help=False)
env_shared.add_argument('-env', action="store", help='name of environment <tst|acc|acc2|prod|...>', metavar='<environment>', required=True)
ds_shared = argparse.ArgumentParser(add_help=False)
ds_shared.add_argument('-ds', action="store", help='name of subsystem to use <ivs-gui|ivs-vpo|...>', metavar='<system name>', required=True)
version_shared = argparse.ArgumentParser(add_help=False)
version_shared.add_argument('-version', action="store", help='version to use. If used, specify at least sprint as in 1.61', metavar='<version>')
tab_shared=argparse.ArgumentParser(add_help=False)
tab_shared.add_argument('--tab', required=False, action="store_true", help='Use nice tabulation for output')
# END OF SHARED options
# MAIN
main_p = argparse.ArgumentParser(description="Main cli tool for use in the CD v3 pipeline (%s)" % VERSION, epilog='Defaults are taken from the configuration file, which is pulled from git when needed.', )
# Change formatting of help output for main-parser
main_p.formatter_class = lambda prog: argparse.HelpFormatter(prog, max_help_position=30, width=80)
main_p.add_argument("--debug", action="store_true", help="Generate debug output and keep temp directories")
# MAIN-SUB
main_s_p = main_p.add_subparsers(title='Commands', dest='main_cmd')
# MAIN-SUB-DASHBOARD
dashbrd_p = main_s_p.add_parser('dashboard', aliases=['db'], help="Proces specified dashboards", parents=[ds_shared])
dashbrd_p.formatter_class = lambda prog: argparse.HelpFormatter(prog, max_help_position=40, width=80)
dashbrd_p.add_argument('which', help='Proces dashboard for <ALL|Supplier|DEBUG what would happen>. If supplier is specified an optional subsystem to be processed can be passed using -ds')
# MAIN-SUB-QUERY
query_p = main_s_p.add_parser("query", aliases=['qry'], help="Query database for locks,deployments or promotes")
query_p.formatter_class = lambda prog: argparse.HelpFormatter(prog, max_help_position=40, width=80)
# MAIN-SUB-QUERY-SUB
query_s_p = query_p.add_subparsers(help="additions help", title='sub commands for query', dest='query_cmd')
# MAIN-SUB-QUERY-LOCK
q_lock_p = query_s_p.add_parser('lock', help='query db for lock(s)', )
# MAIN-SUB-QUERY-LOCK-SUB
q_lock_sub = q_lock_p.add_subparsers(help='', title='subcommands for dbquery lock', dest='lock_cmd')
# MAIN-SUB-QUERY-LOCK-SUB-SET
q_lock_set_p = q_lock_sub.add_parser('set', help='sets a lock', parents=[env_shared, ds_shared])
# MAIN-SUB-QUERY-LOCK-SUB-CLEAR
q_lock_clear_p = q_lock_sub.add_parser('clear', help='clears a lock', parents=[env_shared, ds_shared], )
# MAIN-SUB-QUERY-LOCK-SUB-SHOW
q_lock_show_p = q_lock_sub.add_parser('show', help='shows a lock/locks', parents=[env_shared, tab_shared])
# MAIN-SUB-QUERY-PROMOTE
q_promote_p = query_s_p.add_parser('promote', help='query db for promotions', parents=[env_shared,ds_shared] )
# MAIN-SUB-QUERY-DEPLOY
q_deploy_p = query_s_p.add_parser('deploy', help='query db for deployments')
# MAIN-SUB-PROMOTE
promote_p = main_s_p.add_parser('promote', help='Do a promotion.', parents=[env_shared, ds_shared, version_shared])
promote_p.add_argument('-dest', required=False, action="store", help='If used specifies the destination, otherwise defaults are used. (-dest is optional, -env is not)', metavar='<dest_env>')
# MAIN-SUB-DEPLOY
deploy_p = main_s_p.add_parser('deploy', help='Deploy software', parents=[ds_shared, env_shared])
# MAIN-SUB-AUTOTAG
autotag_p = main_s_p.add_parser('autotag', help="Autotags specified repository", parents=[ds_shared])
print("------------arguments-----------")
print(vars(main_p.parse_args()))
This creates most of the functionality I intended.
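As a quick way to see which command was chosen (a sketch of my own against the parser above), the dest= names given to each add_subparsers call end up in the parsed namespace:

# sketch: the dest= values record which (sub)command was selected
args = vars(main_p.parse_args(['query', 'lock', 'set', '-env', 'tst', '-ds', 'ivs-gui']))
# args now contains roughly:
# {'debug': False, 'main_cmd': 'query', 'query_cmd': 'lock', 'lock_cmd': 'set',
#  'env': 'tst', 'ds': 'ivs-gui'}
if args.get('main_cmd') == 'query' and args.get('query_cmd') == 'lock':
    print('lock sub-command:', args.get('lock_cmd'))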

How to pass arguments to main function within Python module?

I have recently started learning more about Python packages and modules. I'm currently busy updating my existing modules so that they can be run as a script or imported as a module into my other code. I'm not sure how to construct the input arguments within my module and pass them to the main() function within my module.
I've written my main() function and called it under if __name__ == '__main__', passing the input arguments. The inputs are currently hard-coded to show what I'm trying to achieve. Any help on how to correctly construct the input arguments that the user will pass, which are then passed on to the main function, will be appreciated.
As mentioned, I'm trying to be able to use the following as a script when run directly, or to import it as a module into my other code and run it from there. If I import it as a module, would I call the main() function when importing it? Is the structure of what I have written below correct? Any advice is appreciated.
'''
Created on March 12, 2017
Create a new ArcHydro Schema
File Geodatabase and Rasters
Folder
#author: PeterW
'''
# import site-packages and modules
import re
from pathlib import Path
import arcpy
# set environment settings
arcpy.env.overwriteOutput = True
def archydro_rasters_folder(workspace):
    """Create rasters folder directory
    if it doesn't already exist"""
    model_name = Path(workspace).name
    layers_name = re.sub(r"\D+", "Layers", model_name)
    layers_folder = Path(workspace, layers_name)
    if layers_folder.exists():
        arcpy.AddMessage("Rasters folder: {0} exists".format(layers_name))
    else:
        layers_folder.mkdir(parents=True)
        arcpy.AddMessage("Rasters folder {0} created".format(layers_name))


def archydro_fgdb_schema(workspace, schema, dem):
    """Create file geodatabase using XML
    schema and set coordinate system based
    on input DEM if it doesn't already exist"""
    model_name = Path(workspace).name
    fgdb = "{0}.gdb".format(model_name)
    if arcpy.Exists(str(Path(workspace, fgdb))):
        arcpy.AddMessage("{0} file geodatabase exists".format(fgdb))
    else:
        new_fgdb = arcpy.CreateFileGDB_management(str(workspace), fgdb)
        import_type = "SCHEMA_ONLY"
        config_keyword = "DEFAULTS"
        arcpy.AddMessage("New {0} file geodatabase created".format(fgdb))
        arcpy.ImportXMLWorkspaceDocument_management(new_fgdb, schema,
                                                    import_type,
                                                    config_keyword)
        arcpy.AddMessage("ArcHydro schema imported")
        projection = arcpy.Describe(dem).spatialReference
        projection_name = projection.PCSName
        feature_dataset = Path(workspace, fgdb, "Layers")
        arcpy.DefineProjection_management(str(feature_dataset),
                                          projection)
        arcpy.AddMessage("Changed projection to {0}".format(projection_name))


def main(workspace, dem, schema):
    """main function to create rasters folder
    and file geodatabase"""
    archydro_rasters_folder(workspace)
    archydro_fgdb_schema(workspace, schema, dem)


if __name__ == '__main__':
    main(workspace=r"E:\Projects\2016\01_Bertrand_Small_Projects\G113268\ArcHydro\Model04",
         dem=r"E:\Projects\2016\01_Bertrand_Small_Projects\G113268\ArcHydro\DEM2\raw",
         schema=r"E:\Python\Masters\Schema\ESRI_UC12\ModelBuilder\Schema\Model01.xml")
Updated: 17/03/13
The following is my updated Python module based on Jonathan's suggestions:
'''
Created on March 12, 2017
Create a new ArcHydro Schema
File Geodatabase and Rasters
Folder
#author: PeterW
'''
# import site-packages and modules
import re
from pathlib import Path
import arcpy
import argparse
# set environment settings
arcpy.env.overwriteOutput = True
def rasters_directory(workspace):
    """Create rasters folder directory
    if it doesn't already exist"""
    model_name = Path(workspace).name
    layers_name = re.sub(r"\D+", "Layers", model_name)
    layers_folder = Path(workspace, layers_name)
    if layers_folder.exists():
        arcpy.AddMessage("Rasters folder: {0} exists".format(layers_name))
    else:
        layers_folder.mkdir(parents=True)
        arcpy.AddMessage("Rasters folder {0} created".format(layers_name))


def fgdb_schema(workspace, schema, dem):
    """Create file geodatabase using XML
    schema and set coordinate system based
    on input DEM if it doesn't already exist"""
    model_name = Path(workspace).name
    fgdb = "{0}.gdb".format(model_name)
    if arcpy.Exists(str(Path(workspace, fgdb))):
        arcpy.AddMessage("{0} file geodatabase exists".format(fgdb))
    else:
        new_fgdb = arcpy.CreateFileGDB_management(str(workspace), fgdb)
        import_type = "SCHEMA_ONLY"
        config_keyword = "DEFAULTS"
        arcpy.AddMessage("New {0} file geodatabase created".format(fgdb))
        arcpy.ImportXMLWorkspaceDocument_management(new_fgdb, schema,
                                                    import_type,
                                                    config_keyword)
        arcpy.AddMessage("ArcHydro schema imported")
        projection = arcpy.Describe(dem).spatialReference
        projection_name = projection.PCSName
        feature_dataset = Path(workspace, fgdb, "Layers")
        arcpy.DefineProjection_management(str(feature_dataset),
                                          projection)
        arcpy.AddMessage("Changed projection to {0}".format(projection_name))


def model_schema(workspace, schema, dem):
    """Create model schema: rasters folder
    and file geodatabase"""
    rasters_directory(workspace)
    fgdb_schema(workspace, schema, dem)


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Create a ArcHydro schema')
    parser.add_argument('--workspace', metavar='path', required=True,
                        help='the path to workspace')
    parser.add_argument('--schema', metavar='path', required=True,
                        help='path to schema')
    parser.add_argument('--dem', metavar='path', required=True,
                        help='path to dem')
    args = parser.parse_args()
    model_schema(workspace=args.workspace, schema=args.schema, dem=args.dem)
This looks correct to me, and yes, if you're looking to use this as a module you would import main. Though it would probably be better to name it in a more descriptive way.
To clarify how __main__ and the function main() work: when you execute a module it has a name, which is stored in __name__. If you execute the module standalone as a script, its name is __main__. If you import it into another module, its name is the name of the module.
The function main() can be named anything you would like, and that wouldn't affect your program. It's commonly named main in small scripts but it's not a particularly good name if it's part of a larger body of code.
In terms of letting a user pass arguments when running it as a script, I would look into using either argparse or click.
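For the module case, importing it and calling the entry function directly might look like this (a hypothetical sketch; the module file name and the paths are placeholders, not taken from the question):

# hypothetical usage from another script; module name and paths are placeholders
import archydro_schema

archydro_schema.main(workspace=r"C:\data\Model04",
                     dem=r"C:\data\DEM2\raw",
                     schema=r"C:\data\Schema\Model01.xml")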
An example of how argparse would work.
if __name__ == '__main__':
    import argparse

    parser = argparse.ArgumentParser(description='Create a ArcHydro schema')
    parser.add_argument('--workspace', metavar='path', required=True,
                        help='the path to workspace')
    parser.add_argument('--schema', metavar='path', required=True,
                        help='path to schema')
    parser.add_argument('--dem', metavar='path', required=True,
                        help='path to dem')
    args = parser.parse_args()
    main(workspace=args.workspace, schema=args.schema, dem=args.dem)
