I am writing a script to pull back metrics for ELBv2 (Network LB) using Boto3, but it just keeps returning empty datapoints. I have read through the AWS and Boto docs and scoured here for answers, but nothing seems to be correct. I am aware CloudWatch likes everything to be exact, so I have played with different dimensions, different time windows, datapoint periods, different metrics, with and without specifying units, etc., to no avail.
My script is here:
#!/usr/bin/env python
import boto3
from pprint import pprint
from datetime import datetime
from datetime import timedelta


def initialize_client():
    client = boto3.client(
        'cloudwatch',
        region_name='eu-west-1'
    )
    return client


def request_metric(client):
    response = client.get_metric_statistics(
        Namespace='AWS/NetworkELB',
        Period=300,
        StartTime=datetime.utcnow() - timedelta(days=5),
        EndTime=datetime.utcnow() - timedelta(days=1),
        MetricName='NewFlowCount',
        Statistics=['Sum'],
        Dimensions=[
            {
                'Name': 'LoadBalancer',
                'Value': 'net/nlb-name/1111111111'
            },
            {
                'Name': 'AvailabilityZone',
                'Value': 'eu-west-1a'
            }
        ],
    )
    return response


def main():
    client = initialize_client()
    response = request_metric(client)
    pprint(response['Datapoints'])
    return 0


main()
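One way to double-check the exact dimension values CloudWatch has registered (a sketch added for illustration, reusing the client configuration above; it is not part of the original script) is to list the metrics that actually exist in the namespace and compare them against the dimensions used here:

import boto3

client = boto3.client('cloudwatch', region_name='eu-west-1')

# List the metrics CloudWatch has actually registered for the NLB namespace,
# so the Dimensions above can be copied exactly as CloudWatch stores them.
paginator = client.get_paginator('list_metrics')
for page in paginator.paginate(Namespace='AWS/NetworkELB', MetricName='NewFlowCount'):
    for metric in page['Metrics']:
        print(metric['Dimensions'])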
import subprocess
import datetime

StartTime = datetime.datetime.utcnow() - datetime.timedelta(hours=1)
EndTime = datetime.datetime.utcnow()
instances = ['i-xxx1', 'i-xxx2']

list_files = subprocess.run([
    "aws", "cloudwatch", "get-metric-statistics",
    "--metric-name", "CPUUtilization",
    "--start-time", StartTime,
    "--end-time", EndTime,
    "--period", "300",
    "--namespace", "AWS/EC2",
    "--statistics", "Maximum",
    "--dimensions", "Name=InstanceId,#call the instances#"
])
print("The exit code was: %d" % list_files.returncode)
Quick and dirty code. How do I loop the subprocess.run call over the instances list and print the results in the same loop? I am also having an issue passing the StartTime and EndTime datetime values in the right format.
Thank you
It is recommended to use the boto3 library to call AWS from Python. It is fairly easy to translate CLI commands to boto3 commands.
list_files = subprocess.run([
    "aws", "cloudwatch", "get-metric-statistics",
    "--metric-name", "CPUUtilization",
    "--start-time", StartTime,
    "--end-time", EndTime,
    "--period", "300",
    "--namespace", "AWS/EC2",
    "--statistics", "Maximum",
    "--dimensions", "Name=InstanceId,#call the instances#"
])
Instead of the above, you can run the following:
import boto3

client = boto3.client('cloudwatch')

list_files = client.get_metric_statistics(
    MetricName='CPUUtilization',
    StartTime=StartTime,  # These should be datetime objects
    EndTime=EndTime,      # These should be datetime objects
    Period=300,
    Namespace='AWS/EC2',
    Statistics=['Maximum'],
    Dimensions=[
        {
            'Name': 'InstanceId',
            'Value': '#call the instances#'
        }
    ]
)
You can run help(client.get_metric_statistics) to get detailed information about the function. The boto3 library is pretty well documented; the response structure and syntax are also documented there.
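To address the looping part of the question, here is a minimal sketch (assuming the instances list and the StartTime/EndTime datetime values defined above; the instance IDs are placeholders) that issues one get_metric_statistics call per instance and prints each result:

import boto3
from datetime import datetime, timedelta

client = boto3.client('cloudwatch')
StartTime = datetime.utcnow() - timedelta(hours=1)
EndTime = datetime.utcnow()
instances = ['i-xxx1', 'i-xxx2']  # placeholder instance IDs from the question

for instance_id in instances:
    # One call per instance, with the instance ID substituted into the dimension.
    response = client.get_metric_statistics(
        MetricName='CPUUtilization',
        StartTime=StartTime,
        EndTime=EndTime,
        Period=300,
        Namespace='AWS/EC2',
        Statistics=['Maximum'],
        Dimensions=[{'Name': 'InstanceId', 'Value': instance_id}]
    )
    print(instance_id, response['Datapoints'])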
I want to delete snapshots that are more than 10 days old in GCP using Python. I tried the program below, using a filter expression, but unfortunately I ran into the errors below.
from datetime import datetime
from googleapiclient import discovery
import google.oauth2.credentials
from oauth2client.service_account import ServiceAccountCredentials
import sys


def get_disks(project, zone):
    credentials = ServiceAccountCredentials.from_json_keyfile_name(
        r"D:\Users\ganeshb\Desktop\Json\auth.json",
        scopes='https://www.googleapis.com/auth/compute')
    service = discovery.build('compute', 'v1', credentials=credentials)
    request = service.snapshots().list(project='xxxx',
                                       FILTER="creationTimestamp<'2021-05-31'")
    response = request.execute()
    print(response)


output = get_disks("xxxxxxxx", "europe-west1-b")
Your problem is a known Google Cloud bug.
Please read these issue trackers: 132365111 and 132676194
Solution:
Remove the filter statement and process the returned results:
from datetime import datetime
from dateutil import parser

request = service.snapshots().list(project=project)
response = request.execute()

# Watch for timezone issues here!
filter_date = '2021-05-31'
d1 = parser.parse(filter_date)

for item in response['items']:
    d2 = datetime.fromisoformat(item['creationTimestamp'])
    if d2.timestamp() < d1.timestamp():
        # Process the result here. This is a print statement stub.
        print("{} {}".format(item['name'], item['creationTimestamp']))
I have created a function in AWS Lambda which looks like this:
import boto3
import numpy as np
import pandas as pd
import s3fs
from io import StringIO


def test(event=None, context=None):
    # creating a pandas dataframe from an api
    # placing 2 csv files in S3 bucket
This function queries an external API and places two CSV files in an S3 bucket. I want to trigger this function from Airflow, and I have found this code:
import boto3, json, typing


def invokeLambdaFunction(*, functionName:str=None, payload:typing.Mapping[str, str]=None):
    if functionName == None:
        raise Exception('ERROR: functionName parameter cannot be NULL')
    payloadStr = json.dumps(payload)
    payloadBytesArr = bytes(payloadStr, encoding='utf8')
    client = boto3.client('lambda')
    response = client.invoke(
        FunctionName=functionName,
        InvocationType="RequestResponse",
        Payload=payloadBytesArr
    )
    return response


if __name__ == '__main__':
    payloadObj = {"something" : "1111111-222222-333333-bba8-1111111"}
    response = invokeLambdaFunction(functionName='test', payload=payloadObj)
    print(f'response:{response}')
But as I understand it, this code snippet does not connect to S3. Is this the right approach to trigger an AWS Lambda function from Airflow, or is there a better way?
I would advise using the AwsLambdaHook:
https://airflow.apache.org/docs/stable/_api/airflow/contrib/hooks/aws_lambda_hook/index.html#module-airflow.contrib.hooks.aws_lambda_hook
And you can check a test showing how it is used to trigger a Lambda function:
https://github.com/apache/airflow/blob/master/tests/providers/amazon/aws/hooks/test_lambda_function.py
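For illustration, a minimal sketch of the hook's usage (assuming the Airflow 1.10-era contrib import path and the 'test' function name from the question; the region is a placeholder, and this is not taken verbatim from the linked test file):

import json
from airflow.contrib.hooks.aws_lambda_hook import AwsLambdaHook

# Sketch: invoke the Lambda function synchronously via the hook.
hook = AwsLambdaHook(function_name='test',
                     region_name='eu-west-1',
                     invocation_type='RequestResponse')
response = hook.invoke_lambda(payload=json.dumps({"something": "value"}))
print(response)

In a DAG this call would typically be wrapped in a PythonOperator callable.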
I have a very small Flask app that looks very much like this:
import flask
from collections import namedtuple

Point = namedtuple('Point', ['lat', 'lng', 'alt'])
p1 = Point(38.897741, -77.036450, 20)


def create_app():
    app = flask.Flask(__name__)

    @app.route('/position')
    def position():
        return flask.jsonify({
            'vehicle': p1,
        })

    return app
It exists only to feed position data to a web UI. I was expecting that the Point namedtuple would be rendered as a JSON array, but to my surprise I was getting:
{
"vehicle": {
"alt": 20,
"lat": 38.897741,
"lng": -77.03645
}
}
...which, you know, that's fine. I can work with that. But then I was writing some unit tests, which look something like this:
from unittest import TestCase
import json

import tupletest


class TupleTestTest(TestCase):
    def setUp(self):
        _app = tupletest.create_app()
        _app.config['TESTING'] = True
        self.app = _app.test_client()

    def test_position(self):
        rv = self.app.get('/position')
        assert rv.status_code == 200
        assert rv.mimetype == 'application/json'
        data = json.loads(rv.get_data())
        assert data['vehicle']['lat'] == 38.897741
...and they failed, because suddenly I wasn't getting dictionaries:
> assert data['vehicle']['lat'] == 38.897741
E TypeError: list indices must be integers, not str
And indeed, if I wrote the return value out to a file in the test, I had:
{
"vehicle": [
38.897741,
-77.03645,
20
]
}
What.
What is going on here? I can't even reproduce this for the purposes of this question; the unit test above renders dictionaries. As does my actual webapp, when it is running, but not when it's being tested. But on another system I appear to be getting arrays from the actual app.
Looking at the source code, this is in Flask's jsonify.py:
# Use the same json implementation as itsdangerous on which we
# depend anyways.
from itsdangerous import json as _json
and in itsdangerous.py there is:
try:
    import simplejson as json
except ImportError:
    import json
The simplejson library has an option namedtuple_as_object which is enabled by default.
So when the 3rd party simplejson is installed, the app uses it and serializes a namedtuple to a JSON object (a dict in Python).
On systems where that library is not installed, the app falls back to standard json and serializes a namedtuple to an array (list).
But if simplejson is installed and imported by Flask, while the test program imports the standard json directly, the two sides end up using different implementations, which changes the behaviour between running and testing.
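To make the difference concrete, here is a small sketch (it assumes simplejson is installed in the environment; the Point values are the ones from the question) showing the two serializers side by side:

from collections import namedtuple
import json        # standard library
import simplejson  # third-party; only available if installed

Point = namedtuple('Point', ['lat', 'lng', 'alt'])
p1 = Point(38.897741, -77.036450, 20)

# The standard library treats a namedtuple as a plain tuple -> JSON array.
print(json.dumps(p1))        # [38.897741, -77.03645, 20]

# simplejson's namedtuple_as_object (True by default) -> JSON object.
print(simplejson.dumps(p1))  # {"lat": 38.897741, "lng": -77.03645, "alt": 20}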
I'm trying to upload a local CSV to Google BigQuery using Python.
def uploadCsvToGbq(self, table_name):
    load_config = {
        'destinationTable': {
            'projectId': self.project_id,
            'datasetId': self.dataset_id,
            'tableId': table_name
        }
    }

    load_config['schema'] = {
        'fields': [
            {'name': 'full_name', 'type': 'STRING'},
            {'name': 'age', 'type': 'INTEGER'},
        ]
    }

    load_config['sourceFormat'] = 'CSV'

    upload = MediaFileUpload('sample.csv',
                             mimetype='application/octet-stream',
                             # This enables resumable uploads.
                             resumable=True)

    start = time.time()
    job_id = 'job_%d' % start

    # Create the job.
    result = bigquery.jobs.insert(
        projectId=self.project_id,
        body={
            'jobReference': {
                'jobId': job_id
            },
            'configuration': {
                'load': load_config
            }
        },
        media_body=upload).execute()

    return result
When I run this, it throws an error like:
"NameError: global name 'MediaFileUpload' is not defined"
Is there a module I need to import? Please help.
One of the easiest methods to upload a CSV file to GBQ is through pandas. Just read the CSV file into pandas (pd.read_csv()), then push it from pandas to GBQ (df.to_gbq(full_table_id, project_id=project_id)).
import pandas as pd
import csv

df = pd.read_csv('/..localpath/filename.csv')
df.to_gbq(full_table_id, project_id=project_id)
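For reference, to_gbq expects the destination table in 'dataset.table' form, so the two undefined names above would be set to something like the following (hypothetical values):

# Hypothetical values; substitute your own dataset, table and project IDs.
full_table_id = 'my_dataset.new_table'
project_id = 'my-project-id'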
Or you can use the client API:
from google.cloud import bigquery
import pandas as pd

df = pd.read_csv('/..localpath/filename.csv')

client = bigquery.Client()
dataset_ref = client.dataset('my_dataset')
table_ref = dataset_ref.table('new_table')
client.load_table_from_dataframe(df, table_ref).result()
pip install --upgrade google-api-python-client
Then, at the top of your Python file, write:
from googleapiclient.http import MediaFileUpload
But be careful: you are missing some parentheses. Better write:
result = bigquery.jobs().insert(
    projectId=PROJECT_ID,
    body={
        'jobReference': {'jobId': job_id},
        'configuration': {'load': load_config}
    },
    media_body=upload).execute(num_retries=5)
And by the way, you are going to upload all your CSV rows, including the top one that defines columns.
The class MediaFileUpload is in http.py. See https://google-api-python-client.googlecode.com/hg/docs/epy/apiclient.http.MediaFileUpload-class.html
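Regarding the header row mentioned above: one option, sketched here for illustration on top of the question's load_config, is to tell the BigQuery load job to skip it via the skipLeadingRows setting:

# Skip the first CSV row (the column headers) when loading.
load_config['skipLeadingRows'] = 1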