How to get pod volume list using python? - python

My pod has a volume as:
"volumes": [
{
"name": "configs",
"secret": {
"defaultMode": 420,
"secretName": "some_secret"
}
},
....]
I want to be able to read it using Python as V1Volume.
Tried to do:
from kubernetes import config
config.load_incluster_config()
spec = client.V1PodSpec()
But I'm stuck as it gives me
raise ValueError("Invalid value for `containers`, must not be `None`")
and I'm not sure how to continue. How can I get the volumes from the V1PodSpec?

It gives you the error because you initialise V1PodSpec without any arguments. V1PodSpec used to create pods, not to read them.
To read pod spec from Kubernetes:
from kubernetes import client,config
config.load_kube_config()
# or
# config.load_incluster_config()
core_api = client.CoreV1Api()
response = core_api.read_namespaced_pod(name="debug-pod", namespace='dev')
# access volumes in the returned response
type(response.spec.volumes[0])
# returns:
# <class 'kubernetes.client.models.v1_volume.V1Volume'>

Related

Run a YAML SSM Run Document from AWS Lambda with Parameters

I am trying to run a YAML SSM document from a Python AWS Lambda, using boto3 ssm.send_command with parameters, but even if I'm just trying to run the sample "Hello World", I get:
"errorMessage": "An error occurred (InvalidParameters) when calling the SendCommand operation: document TestMessage does not support parameters.
JSON Run Documents work without an issue, so it seems like the parameters are being passed in JSON format, but the document I intend this for contains a relatively long Powershell script, JSON needing to run it all on a single line would be awkward, and I am hoping to avoid needing to run it from an S3 bucket. Can anyone suggest a way to run a YAML Run Document with parameters from the Lambda?
As far as I know AWS lambda always gets it's events as JSON. My suggestion would be that in the lambda_handler.py file declare a new variable like this:
import json
import yaml
def handler_name(event, context):
yaml_event = yaml.dump(json.load(event))
#rest of the code...
This way the event will be in YAML format and you can use that variable instead of the event, which is in JSON format.
Here is an example of running a YAML Run Command document using boto3 ssm.send_command in a Lambda running Python 3.8. Variables are passed to the Lambda using either environment variables or SSM Parameter Store. The script is retrieved from S3 and accepts a single parameter formatted as a JSON string which is passed to the bash script running on Linux (sorry I don't have one for PowerShell).
The SSM Document is deployed using CloudFormation but you could also create it through the console or CLI. Based on the error message you cited, perhaps verify the Document Type is set as "Command".
SSM Document (wrapped in CloudFormation template, refer to the Content property)
Neo4jLoadQueryDocument:
Type: AWS::SSM::Document
Properties:
DocumentType: "Command"
DocumentFormat: "YAML"
TargetType: "/AWS::EC2::Instance"
Content:
schemaVersion: "2.2"
description: !Sub "Load Neo4j for ${AppName}"
parameters:
sourceType:
type: "String"
description: "S3"
default: "S3"
sourceInfo:
type: "StringMap"
description: !Sub "Downloads all files under the ${AppName} scripts prefix"
default:
path: !Sub 'https://{{resolve:ssm:/${AppName}/${Stage}/${AWS::Region}/DataBucketName}}.s3.amazonaws.com/config/scripts/'
commandLine:
type: "String"
description: "These commands are invoked by a Lambda script which sets the correct parameters (Refer to documentation)."
default: 'bash start_task.sh'
workingDirectory:
type: "String"
description: "Working directory"
default: "/home/ubuntu"
executionTimeout:
type: "String"
description: "(Optional) The time in seconds for a command to complete before it is considered to have failed. Default is 3600 (1 hour). Maximum is 28800 (8 hours)."
default: "86400"
mainSteps:
- action: "aws:downloadContent"
name: "downloadContent"
inputs:
sourceType: "{{ sourceType }}"
sourceInfo: "{{ sourceInfo }}"
destinationPath: "{{ workingDirectory }}"
- action: "aws:runShellScript"
name: "runShellScript"
inputs:
runCommand:
- ""
- "directory=$(pwd)"
- "export PATH=$PATH:$directory"
- " {{ commandLine }} "
- ""
workingDirectory: "{{ workingDirectory }}"
timeoutSeconds: "{{ executionTimeout }}"
Lambda function
import os
import boto3
neo4j_load_query_document_name = os.environ["NEO4J_LOAD_QUERY_DOCUMENT_NAME"]
# neo4j_database_instance_id = os.environ["NEO4J_DATABASE_INSTANCE_ID"]
neo4j_database_instance_id_param = os.environ["NEO4J_DATABASE_INSTANCE_ID_SSM_PARAM"]
load_neo4j_activity = os.environ["LOAD_NEO4J_ACTIVITY"]
app_name = os.environ["APP_NAME"]
# Get SSM Document Neo4jLoadQuery
ssm = boto3.client('ssm')
response = ssm.get_document(Name=neo4j_load_query_document_name)
neo4j_load_query_document_content = json.loads(response["Content"])
# Get Instance ID
neo4j_database_instance_id = ssm.get_parameter(Name=neo4j_database_instance_id_param)["Parameter"]["Value"]
# Extract document parameters
neo4j_load_query_document_parameters = neo4j_load_query_document_content["parameters"]
command_line_default = neo4j_load_query_document_parameters["commandLine"]["default"]
source_info_default = neo4j_load_query_document_parameters["sourceInfo"]["default"]
def lambda_handler(event, context):
params = {
"params": {
"app_name": app_name,
"activity_arn": load_neo4j_activity,
}
}
# Include params JSON as command line argument
cmd = f"{command_line_default} \'{json.dumps(params)}\'"
try:
response = ssm.send_command(
InstanceIds=[
neo4j_database_instance_id,
],
DocumentName=neo4j_load_query_document_name,
Parameters={
"commandLine":[cmd],
"sourceInfo":[json.dumps(source_info_default)]
},
MaxConcurrency='1')
if response['ResponseMetadata']['HTTPStatusCode'] != 200:
logger.error(json.dumps(response, cls=DatetimeEncoder))
raise Exception("Failed to send command")
else:
logger.info(f"Command `{cmd}` invoked on instance {neo4j_database_instance_id}")
except Exception as err:
logger.error(err)
raise err
return
Parameters in a JSON document are not necessarily in JSON themselves, they can easily be string or numeric values (more likely IMO). If you want to pass a parameter in JSON format (not the same as a JSON document), pay attention to quotes and escaping.

Azure CLI returning second array when only expecting one

I'm working with azure CLI to script out a storage upgrade as well as add a policy, all in a python script. However, when I run the script I'm getting some expected and some very NOT expected output.
What I'm using so far:
from azure.cli.core import get_default_cli
def az_cli (args_str):
args = args_str.split()
cli = get_default_cli()
cli.invoke(args)
if cli.result.result:
return cli.result.result
elif cli.result.error:
raise cli.result.error
return True
sas = az_cli("storage account list --query [].{Name:name,ResourceGroup:resourceGroup,Kind:kind}")
print(sas)
By using this SO article as reference I'm pretty easily making Azure CLI calls, however my output is the following:
[
{
"Kind": "StorageV2",
"Name": "TestStorageName",
"ResourceGroup": "my_test_RG"
},
{
"Kind": "Storage",
"Name": "TestStorageName2",
"ResourceGroup": "my_test_RG_2"
}
]
[OrderedDict([('Name', 'TestStorageName'), ('ResourceGroup', 'my_test_RG'), ('Kind', 'StorageV2')]), OrderedDict([('Name', 'TestStorageName2'), ('ResourceGroup', 'my_test_RG_2'), ('Kind', 'Storage')])]
I appear to be getting 2 arrays back, and I'm unsure of what the cause is. I'm assuming it has to do with my using the --query to narrow down the output I get back, but I'm at a loss as to why it then repeats itself. Expected result would just be the first part that's in json format. I have also tried with tsv output as well with the same results. I appreciate any insight!

How to create a Dataproc cluster with time to live using Python SDK

I try to create a Dataproc cluster which has a time to live of 1 day using python SDK. For this purpose, v1beta2 of the Dataproc API introduces the LifecycleConfig object which is child of the ClusterConfig object.
I use this object in the JSON file which I pass to the create_cluster method. To set the particular TTL, I use the field auto_delete_ttl which shall have the value 86,400 seconds (one day).
The documentation of Google Protocol Buffers is rather specific about how to represent a duration in the JSON file: Durations shall be represented as string with suffix s for seconds and there shall be 0,3,6 or 9 fractional seconds:
However, if I pass the duration using this format, I get the error:
Parameter to MergeFrom() must be instance of same class: expected google.protobuf.Duration got str
This is how I create the cluster:
from google.cloud import dataproc_v1beta2
project = "your_project_id"
region = "europe-west4"
cluster = "" #see below for cluster JSON file
client = dataproc_v1beta2.ClusterControllerClient(client_options={
'api_endpoint': '{}-dataproc.googleapis.com:443'.format(region)
})
# Create the cluster
operation = client.create_cluster(project, region, cluster)
The variable cluster holds the JSON object describing the desired cluster:
{
"cluster_name":"my_cluster",
"config":{
"config_bucket":"my_conf_bucket",
"gce_cluster_config":{
"zone_uri":"europe-west4-a",
"metadata":{
"PIP_PACKAGES":"google-cloud-storage google-cloud-bigquery"
},
"subnetwork_uri":"my subnet",
"service_account_scopes":[
"https://www.googleapis.com/auth/cloud-platform"
],
"tags":[
"some tags"
]
},
"master_config":{
"num_instances":1,
"machine_type_uri":"n1-highmem-4",
"disk_config":{
"boot_disk_type":"pd-standard",
"boot_disk_size_gb":200,
"num_local_ssds":0
},
"accelerators":[
]
},
"software_config":{
"image_version":"1.4-debian9",
"properties":{
"dataproc:dataproc.allow.zero.workers":"true",
"yarn:yarn.log-aggregation-enable":"true",
"dataproc:dataproc.logging.stackdriver.job.driver.enable":"true",
"dataproc:dataproc.logging.stackdriver.enable":"true",
"dataproc:jobs.file-backed-output.enable":"true"
},
"optional_components":[
]
},
"lifecycle_config":{
"auto_delete_ttl":"86400s"
},
"initialization_actions":[
{
"executable_file":"gs://some-init-script"
}
]
},
"project_id":"project_id"
}
Package versions I am using:
google-cloud-dataproc: 0.6.1
protobuf: 3.11.3
googleapis-common-protos: 1.6.0
Am I doing something wrong here, is it an issue with wrong package versions or is it even a bug?
You should use 100s format for a duration type when you construct protobuf in a text format (i.e. json, etc), but you are using a Python object to construct API request body, that's why you need to create a Duration object instead of a string:
duration_message.FromSeconds(86400)

Read laz files are stored on IBM COS

I have a problem with reading laz files that are stored at IBM cloud object storage. I have built pywren-ibm library with all requirements that pdal one of them with docker and I then deployed it to IBM cloud function as an action, where the error that appear is "Unable to open stream for 'Colorea.laz" with error 'No such file or directory.' How can I read the files with pdal in IBM cloud function?
Here is some of the code:
import pywren_ibm_cloud as pywren
import pdal
import json
def manip_data(bucket, key, data_stream):
data = data_stream.read()
cr_json ={
"pipeline": [
{
"type": "readers.las",
"filename": f"{key}"
},
{
"type":"filters.range",
"limits":"Classification[9:9]"
}
]
}
pipeline = pdal.Pipeline(json.dumps(cr_json, indent=4))
pipeline.validate()
pipeline.loglevel = 8
n_points = pipeline.execute()
bucketname = 'The bucket name'
pw = pywren.ibm_cf_executor(runtime='ammarokran/pywren-pdal:1.0')
pw.map(manip_data, bucketname, chunk_size=None)
print(pw.get_result())
The code is running from local pc with jupyter notebook.
You'll need to specify some credentials and the correct endpoint for the bucket holding the files you're trying to access. Not totally sure how that works with a custom runtime, but typically you can just pass a config object in the executor.
import pywren_ibm_cloud as pywren
config = {'pywren' : {'storage_bucket' : 'BUCKET_NAME'},
'ibm_cf': {'endpoint': 'HOST',
'namespace': 'NAMESPACE',
'api_key': 'API_KEY'},
'ibm_cos': {'endpoint': 'REGION_ENDPOINT',
'api_key': 'API_KEY'}}
pw = pywren.ibm_cf_executor(config=config)

How do I create a partitioned collection in Cosmos DB with pydocumentdb?

The pydocumentdb.document_client.DocumentClient object has a CreateCollection() method, defined here.
When creating a collection with this method, one needs to specify the database link (already known), the collection (I don't know how to reference it if it hasn't been made) and options.
Parameters that I would like to control when creating the collection are:
name of collection
type of collection (fixed size vs. partitioned)
partition keys
RU value
Indexing policy (or at least be able to create a default template somewhere and automatically copy it to the newly created one)
Enums for some of these parameters seem to be defined here, but I don't see any potentially useful HTTP headers in http_constants.py, and I don't see where RUs come in to play or where a cohesive "Collection" object would be passed as a parameter.
You could refer to the source sample code from here and the rest api from here.
import pydocumentdb;
import pydocumentdb.errors as errors
import pydocumentdb.document_client as document_client
config = {
'ENDPOINT': 'https://***.documents.azure.com:443/',
'MASTERKEY': '***'
};
# Initialize the Python DocumentDB client
client = document_client.DocumentClient(config['ENDPOINT'], {'masterKey': config['MASTERKEY']})
databaseLink = "dbs/db"
coll = {
"id": "testCreate",
"indexingPolicy": {
"indexingMode": "lazy",
"automatic": False
},
"partitionKey": {
"paths": [
"/AccountNumber"
],
"kind": "Hash"
}
}
collection_options = { 'offerThroughput': 400 }
client.CreateCollection(databaseLink , coll, collection_options)
Hope it helps you.

Categories