I'm launching a Dataflow pipeline from a Google Cloud Function and the workflow creation fails with the error shown in the screenshot.
I've created the LaunchTemplateParameters according to the official documentation, but some parameters are causing errors:
I would like to set the europe-west1-b zone and the n1-standard-4 machine type. What am I missing?
from googleapiclient.discovery import build


def call_dataflow(dataflow_name):
    service = build('dataflow', 'v1b3')
    gcp_project = 'PROJECT_ID'
    template_path = 'TEMPLATE_PATH'
    template_body = {
        'parameters': {
        },
        'environment': {
            'workerZone': 'europe-west1-b',
            'subnetwork': 'regions/europe-west1/subnetworks/europe-west1-subnet',
            'network': 'dataflow-network',
            'machineType': 'n1-standard-4',
            'numWorkers': 5,
            'tempLocation': 'TEMP_BUCKET_PATH',
            'ipConfiguration': 'WORKER_IP_PRIVATE'
        },
        'jobName': dataflow_name
    }
    print('series_fulfillment - call_dataflow ' + dataflow_name + ' - launching')
    request = service.projects().templates().launch(
        projectId=gcp_project, gcsPath=template_path, body=template_body)
    response = request.execute()
    return response
I am in the midst of coding a lambda function which will create an alarm based upon some disk metrics. The code so far looks like this:
import collections
from datetime import datetime
import calendar

import boto3


def lambda_handler(event, context):
    client = boto3.client('cloudwatch')
    alarm = client.put_metric_alarm(
        AlarmName='Disk Monitor',
        MetricName='disk_used_percent',
        Namespace='CWAgent',
        Statistic='Maximum',
        ComparisonOperator='GreaterThanOrEqualToThreshold',
        Threshold=60.0,
        Period=10,
        EvaluationPeriods=3,
        Dimensions=[
            {
                'Name': 'InstanceId',
                'Value': '{instance_id}'
            },
            {
                'Name': 'AutoScalingGroupName',
                'Value': '{instance_id}'
            },
            {
                'Name': 'fstype',
                'Value': 'xfs'
            },
            {
                'Name': 'path',
                'Value': '/'
            }
        ],
        Unit='Percent',
        ActionsEnabled=True)
As seen, {instance_id} is a variable because the idea is that this will be used for every instance. However, I am wondering how I would do the same for AutoScalingGroupName, because I need that to be a variable too. I know that the command below pulls out the AutoScalingGroupName for me, but my problem is how to add that to the above block in terms of syntax:
aws autoscaling describe-auto-scaling-instances --output text --query "AutoScalingInstances[?InstanceId == '<instance_dets>'].{AutoScalingGroupName:AutoScalingGroupName}"
For example, would I add a block beginning as below:
def lambda_handler(event, context):
    client = boto3.client('autoscaling')
And if so, how would I then write the syntax needed to get 'Value': '{AutoScalingGroupName}', by which I mean a variable holding the ASG name?
describe_auto_scaling_instances takes InstanceIds as a parameter. So if you know your instance_id, you can find its ASG as follows:
import boto3

client = boto3.client('autoscaling')
response = client.describe_auto_scaling_instances(
    InstanceIds=[instance_id])

asg_name = ''
if response['AutoScalingInstances']:
    asg_name = response['AutoScalingInstances'][0]['AutoScalingGroupName']

print(asg_name)
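Putting it together with the alarm from the question, here is a minimal sketch assuming instance_id is already known and asg_name has been looked up as above (the per-instance AlarmName is my assumption; the other parameters mirror the question):
import boto3

cloudwatch = boto3.client('cloudwatch')

# instance_id and asg_name are assumed to be set as shown above.
cloudwatch.put_metric_alarm(
    AlarmName='Disk Monitor ' + instance_id,  # assumed per-instance naming
    MetricName='disk_used_percent',
    Namespace='CWAgent',
    Statistic='Maximum',
    ComparisonOperator='GreaterThanOrEqualToThreshold',
    Threshold=60.0,
    Period=10,
    EvaluationPeriods=3,
    Dimensions=[
        {'Name': 'InstanceId', 'Value': instance_id},
        {'Name': 'AutoScalingGroupName', 'Value': asg_name},
        {'Name': 'fstype', 'Value': 'xfs'},
        {'Name': 'path', 'Value': '/'}
    ],
    Unit='Percent',
    ActionsEnabled=True)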
Azure Python SDK: how to deploy a VM as an Azure Spot instance?
If you want to create an Azure Spot VM, please refer to the following code. For more details, please refer to the documentation.
from azure.common.credentials import ServicePrincipalCredentials
from azure.mgmt.compute.v2019_07_01 import ComputeManagementClient
from azure.mgmt.compute.v2019_07_01.models import VirtualMachinePriorityTypes, VirtualMachineEvictionPolicyTypes, BillingProfile

SUBSCRIPTION_ID = 'subscription-id'
GROUP_NAME = 'myResourceGroup'
LOCATION = 'westus'
VM_NAME = 'myVM'

credentials = ServicePrincipalCredentials(
    client_id='application-id',
    secret='authentication-key',
    tenant='tenant-id'
)

compute_client = ComputeManagementClient(
    credentials,
    SUBSCRIPTION_ID
)

vm_parameters = {
    'location': LOCATION,
    'os_profile': {
        'computer_name': VM_NAME,
        'admin_username': 'azureuser',
        'admin_password': 'Azure12345678'
    },
    'hardware_profile': {
        'vm_size': 'Standard_DS1'
    },
    'storage_profile': {
        'image_reference': {
            'publisher': 'MicrosoftWindowsServer',
            'offer': 'WindowsServer',
            'sku': '2012-R2-Datacenter',
            'version': 'latest'
        }
    },
    'network_profile': {
        'network_interfaces': [{
            'id': nic.id  # nic: a network interface created or fetched beforehand
        }]
    },
    'priority': VirtualMachinePriorityTypes.spot,  # use an Azure Spot instance
    'eviction_policy': VirtualMachineEvictionPolicyTypes.deallocate,  # for Azure Spot VMs, the only supported value is 'Deallocate'
    'billing_profile': BillingProfile(max_price=float(2))
}

creation_result = compute_client.virtual_machines.create_or_update(
    GROUP_NAME,
    VM_NAME,
    vm_parameters
)

print(creation_result.result())
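The snippet above references nic.id without showing where nic comes from. Here is a minimal sketch of fetching an existing network interface with the network management client (the NIC name 'myVMNic' is an assumption; credentials, SUBSCRIPTION_ID and GROUP_NAME are reused from above):
from azure.mgmt.network import NetworkManagementClient

network_client = NetworkManagementClient(credentials, SUBSCRIPTION_ID)

# Fetch a NIC that already exists in the resource group (name assumed).
nic = network_client.network_interfaces.get(GROUP_NAME, 'myVMNic')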
I have written a Python script, run on a cron schedule, to send instance information over email and to populate metrics as well. With the following code I can see all the logs in the CloudWatch Logs console. However, the "dimension" never gets created under the CloudWatch Events section, and it is not triggering any mail either.
import boto3
import json
import logging
from datetime import datetime

logger = logging.getLogger()
logger.setLevel(logging.INFO)


def post_metric(example_namespace, example_dimension_name, example_metric_name, example_dimension_value, example_metric_value):
    cw_client = boto3.client("cloudwatch")
    response = cw_client.put_metric_data(
        Namespace=example_namespace,
        MetricData=[
            {
                'MetricName': example_metric_name,
                'Dimensions': [
                    {
                        'Name': example_dimension_name,
                        'Value': example_dimension_value
                    },
                ],
                'Timestamp': datetime.now(),
                'Value': int(example_metric_value)
            },
        ]
    )
def lambda_handler(event, context):
    logger.info(event)
    ec2_client = boto3.client("ec2")
    sns_client = boto3.client("sns")

    response = ec2_client.describe_instances(
        Filters=[
            {
                'Name': 'tag:Name',
                'Values': [
                    'jenkins-slave-*'
                ]
            }
        ]
    )['Reservations']

    for reservation in response:
        ec2_instances = reservation["Instances"]
        for instance in ec2_instances:
            myInstanceId = instance['InstanceId']
            myInstanceState = instance['State']['Name']
            myInstance = {
                'InstanceId': myInstanceId,
                'InstanceState': myInstanceState,
            }
            logger.info(json.dumps(myInstance))
            post_metric("Jenkins", "ciname", "orphaned-slaves", myInstanceId, 1)
            # Send message to SNS (testing purposes)
            SNS_TOPIC_ARN = 'arn:aws:sns:us-east-1:1234567890:example-instance-alarms'
            sns_client.publish(
                TopicArn=SNS_TOPIC_ARN,
                Subject='Instance Info: ' + myInstanceId,
                Message='Instance id: ' + myInstanceId
            )
Can anyone please tell me if I am missing anything here? Thanks in advance.
You forgot to add required fields such as EvaluationPeriods, AlarmName, etc. to go with your put_metric_data, according to the documentation.
You can use this as an example.
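If the goal is to get a mail when the custom metric is reported, a minimal sketch of an alarm on the metric posted by post_metric() above, with the SNS topic as the alarm action (the alarm name, threshold and period here are assumptions):
import boto3

cloudwatch = boto3.client('cloudwatch')

# Alarm on the custom metric posted above; thresholds and naming are assumptions.
cloudwatch.put_metric_alarm(
    AlarmName='orphaned-slaves-' + myInstanceId,
    Namespace='Jenkins',
    MetricName='orphaned-slaves',
    Dimensions=[{'Name': 'ciname', 'Value': myInstanceId}],
    Statistic='Maximum',
    ComparisonOperator='GreaterThanOrEqualToThreshold',
    Threshold=1,
    Period=300,
    EvaluationPeriods=1,
    ActionsEnabled=True,
    AlarmActions=['arn:aws:sns:us-east-1:1234567890:example-instance-alarms']
)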
I'm trying to create a cluster programmatically in python:
import googleapiclient.discovery

dataproc = googleapiclient.discovery.build('dataproc', 'v1')

zone_uri = 'https://www.googleapis.com/compute/v1/projects/{project_id}/zones/{zone}'.format(
    project_id=my_project_id,
    zone=my_zone,
)

cluster_data = {
    'projectId': my_project_id,
    'clusterName': my_cluster_name,
    'config': {
        'gceClusterConfig': {
            'zoneUri': zone_uri
        },
        'softwareConfig': {
            'properties': {'string': {'spark:spark.executor.memory': '10gb'}},
        },
    },
}

result = dataproc \
    .projects() \
    .regions() \
    .clusters() \
    .create(
        projectId=my_project_id,
        region=my_region,
        body=cluster_data,
    ) \
    .execute()
And I keep getting this error: Invalid JSON payload received. Unknown name "spark:spark.executor.memory" at 'cluster.config.software_config.properties[0].value': Cannot find field.
The API documentation is here: https://cloud.google.com/dataproc/docs/reference/rest/v1/projects.regions.clusters#SoftwareConfig
Property keys are specified in prefix:property format, such as core:fs.defaultFS.
And even when I change the properties to {'string' : {'core:fs.defaultFS' : 'hdfs://'}}, I get that same error.
Properties is a key/value mapping:
'properties': {
    'spark:spark.executor.memory': 'foo'
}
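Applied to the request body from the question, the softwareConfig block would look like this (the rest of cluster_data is unchanged; '10g' as the Spark memory string is an assumption about the intended value, since Spark does not accept '10gb'):
cluster_data = {
    'projectId': my_project_id,
    'clusterName': my_cluster_name,
    'config': {
        'gceClusterConfig': {
            'zoneUri': zone_uri
        },
        'softwareConfig': {
            # Flat key/value mapping, with no nested 'string' wrapper.
            'properties': {'spark:spark.executor.memory': '10g'},
        },
    },
}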
The documentation could have had a better example. In general, the best way to find out what the API looks like is to click "Equivalent REST" in the Cloud Console, or --log-http when using gcloud. For example:
$ gcloud dataproc clusters create clustername --properties spark:spark.executor.memory=foo --log-http
=======================
==== request start ====
uri: https://dataproc.googleapis.com/v1/projects/projectid/regions/global/clusters?alt=json
method: POST
== body start ==
{"clusterName": "clustername", "config": {"gceClusterConfig": {"internalIpOnly": false, "zoneUri": "us-east1-d"}, "masterConfig": {"diskConfig": {}}, "softwareConfig": {"properties": {"spark:spark.executor.memory": "foo"}}, "workerConfig": {"diskConfig": {}}}, "projectId": "projectid"}
== body end ==
==== request end ====
I'm trying to follow some examples from Python Facebook Marketing Api but, when I run:
i_async_job = account.get_insights(params={'level': 'adgroup'}, async=True)
r_async_job = account.get_report_stats(
    params={
        'data_columns': ['adgroup_id'],
        'date_preset': 'last_30_days'
    },
    async=True
)
I'm getting
Status: 400
Response:
{
    "error": {
        "message": "(#12) adaccount/reportstats is deprecated for versions v2.4 and higher",
        "code": 12,
        "type": "OAuthException"
    }
}
I get this even with the examples from Facebook.
I found this page, but there are only curl examples.
Is there a working example on how to get data from Insights edge with the Python Ads API?
Here is a full example of how to export some insights asynchronously from the new Insights endpoints:
from facebookads import test_config as config
from facebookads.objects import *
import time

account_id = <YOUR_ACCOUNT_ID>
account_id = 'act_' + str(account_id)

fields = [
    Insights.Field.impressions,
    Insights.Field.clicks,
    Insights.Field.actions,
    Insights.Field.spend,
    Insights.Field.campaign_group_name,
]

params = {
    'date_preset': Insights.Preset.last_7_days,
    'level': Insights.Level.adgroup,
    'sort_by': 'date_start',
    'sort_dir': 'desc',
}

ad_account = AdAccount(account_id)
job = ad_account.get_insights(fields=fields, params=params, async=True)

insights = None
while insights is None:
    time.sleep(1)
    job.remote_read()
    completion = job[AsyncJob.Field.async_percent_completion]
    print("Percent done: " + str(completion))
    if int(completion) == 100:
        insights = job.get_result(params={'limit': 100})

for ad_insight in insights:
    print(ad_insight)