azure batch network_configuration -> "failed to authenticate"

azure batch network_configuration -> "failed to authenticate" - python

I am using the Azure batch service for calculations on ubuntu nodes, and it works fine. Recently I wanted to change the nodes to be on the same subnet, so I will have the possibility to use mpi in the future as well as use NFS for file acces to a common files server also on azure.
But after adding:
network_configuration = batchmodels.NetworkConfiguration(subnet_id=subnet.id)
to my batchmodels.PoolAddParameter I suddenly receive:
{'value': 'Server failed to authenticate the request. Make sure the
value of Authorization header is formed
correctly.\nRequestId:a815194a-8a66-4cb4-847e-60db4ca3ff10\nTime:2017-10-23T15:04:00.3938448Z',
'lang': 'en-US'}
Any ideas to why? Without the network_configuration my pool starts fine...

finally got it to work...
I needed to have the same credentials (and then again not fully) for the two clients in use here. Also I needed to activate batch in the app I have to set up to get credentials... I ended with something like this:
def get_credentials(res):
if res=='mgmt':
r='https://management.core.windows.net/'
elif res=='batch':
r="https://batch.core.windows.net/"
credentials = ServicePrincipalCredentials(
client_id = id,
secret = secret,
tenant = tenant,
resource = r
)
return credentials
network_client = NetworkManagementClient(get_credentials('mgmt'), sub_id)
batch_client = batch.BatchServiceClient( get_credentials('batch'), base_url=batchserviceurl)

You will need to use Azure Active Directory for authenticating with the Batch Service to enable NetworkConfiguration on a pool with Batch Service pool allocation mode accounts (which are the default).

Related

Flask web app on Cloud Run - google.auth.exceptions.DefaultCredentialsError:

I'm hosting a Flask web app on Cloud Run. I'm also using Secret Manager to store Service Account keys. (I previously downloaded a JSON file with the keys)
In my code, I'm accessing the payload then using os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = payload to authenticate. When I deploy the app and try to visit the page, I get an Internal Service Error. Reviewing the logs, I see:
File "/usr/local/lib/python3.10/site-packages/google/auth/_default.py", line 121, in load_credentials_from_file
raise exceptions.DefaultCredentialsError(
google.auth.exceptions.DefaultCredentialsError: File {"
I can access the secret through gcloud just fine with: gcloud secrets versions access 1 --secret="<secret_id>" while acting as the Service Account.
Here is my Python code:
# Grabbing keys from Secret Manager
def access_secret_version():
# Create the Secret Manager client.
client = secretmanager.SecretManagerServiceClient()
# Build the resource name of the secret version.
name = "projects/{project_id}/secrets/{secret_id}/versions/1"
# Access the secret version.
response = client.access_secret_version(request={"name": name})
payload = response.payload.data.decode("UTF-8")
return payload
#app.route('/page/page_two')
def some_random_func():
# New way
payload = access_secret_version() # <---- calling the payload
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = payload
# Old way
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "service-account-keys.json"
I'm not technically accessing a JSON file like I was before. The payload variable is storing entire key. Is this why it's not working?

Your approach is incorrect.
When you run on a Google compute service like Cloud Run, the code runs under the identity of the compute service.
In this case, by default, Cloud Run uses the Compute Engine default service account but, it's good practice to create a Service Account for your service and specify it when you deploy it to Cloud Run (see Service accounts).
This mechanism is one of the "legs" of Application Default Credentials when your code is running on Google Cloud, you don't specify the environment variable (you also don't need to create a key) and Cloud Run service acquires the credentials from the Metadata service:
import google.auth
credentials, project_id = google.auth.default()
See google.auth package
It is bad practice to define|set an environment variable within code. By their nature, environment variables should be provided by the environment. Doing this with APPLICATION_DEFAULT_CREDENTIALS means that your code always sets this value when it should only do this when the code is running off Google Cloud.
For completeness, if you need to create Credentials from a JSON string rather than from a file contain a JSON string, you can use from_service_account_info (see google.oauth2.service_account)

Correct way to connect AWS Secret Manager

I am trying to connect to AWS Secret manage, on local it works fine on the python server it gives "botocore.exceptions.NoCredentialsError: Unable to locate credentials"
error
session = boto3.session.Session()
client = session.client(
service_name='secretsmanager',
region_name=region_name
)
So I had 2 ways to correct this :
First Method:
session = boto3.session.Session()
client = session.client(
service_name='secretsmanager',
region_name=region_name,aws_access_key_id=Xxxxxx,
aws_secret_access_key=xxxxxx
)
Second Method: To have this in a config file (Which will again expose keys)
session = boto3.session.Session()
client = session.client(
service_name='secretsmanager',
region_name=region_name,aws_access_key_id=confg.access,
aws_secret_access_key=confg.key
)
Arent, we exposing our key and access keys in github if we are specifying it here.?
What is the correct way to access Secret Manager without specifying it here

You are correct you shouldn't pass your access key and secret key to any running server or service in AWS to avoid exposing it. On your local machine, it worked because your environment is getting your user's permissions via AWS CLI.
What you need to do for a server is to add to the service role a policy allowing it to access the Secrets Manager, then you won't face permissions issues anymore
On Permissions policy examples - AWS Secrets Manager can find examples of how those policies need to be.
And on Assign an IAM Role to an EC2 Instance you can see how to attach a role with a specific policy to an EC2 instance.

Trigger an Azure Function upon VM creation

I'm looking for a way to trigger an Azure Function which will run some Python code, each time a new virtual machine is created. I have already done the same thing in AWS using CloudWatch + Lambda, but I can't find where/how achieve the same thing in Azure.
I have tried to use Logic App with Event Grid but there is no trigger to monitor VM state.
Anyone could provide me with some guidance here ?
Many thanks in advance.

Azure service don't have built-in method to achieve your requirement, but I think you can achieve this by your own python code. The main logic is to polling the VM names from your subscription and then store the VM names in somewhere, if they changes, post a request to something like 'HttpTrigger' endpoint(Or just put the logic in the polling algorithm.).
And the for the polling algorithm, you can design by yourself or just use the 'TimeTrigger' to achieve.
I notice you add the 'Python' tag, so just use code like below and put them inside a polling algorithm:
import requests
from azure.identity import ClientSecretCredential
import json
client_id = 'xxx'
tenant_id = 'xxx'
client_secret = 'xxx'
subscription_id = 'xxx'
credential = ClientSecretCredential(tenant_id=tenant_id, client_id=client_id, client_secret=client_secret)
accesstoken = str(credential.get_token('https://management.azure.com/.default'))[19:1287]
bearertoken = "Bearer "+accesstoken
r = requests.get("https://management.azure.com/subscriptions/"+subscription_id+"/resources?$filter=resourceType eq 'Microsoft.Compute/virtualMachines'&api-version=2020-06-01",headers={'Authorization': bearertoken})
items = json.loads(r.text)
print(r.text)
for item in items['value']:
print(item['name'])#This line is print, you need to store this in some place such as database, azure blob storage, azure table storage etc.
#check the VM names here. If some VM been added, post a request to the HttpTrigger function.
If you use azure function 'Time Trigger' instead of self-designed algorithm, then you can store the client id, tenent id, client_secret and subscription id to the keyvault and then let your function app configuration settings refer to the keyvault, this will make it safe.
Above code is based on AAD bearer token, you need to create a AAD App and let it have the 'Owner' RBAC role of the subscription. You need to something like this:
This just like a 'custom trigger' that trigger by the VM created in your 'subscription'. And I think your VM will not be many, so it will not consume much computing resources.

Trying to connect to Google cloud storage (GCS) using python

I've build the following script:
import boto
import sys
import gcs_oauth2_boto_plugin
def check_size_lzo(ds):
# URI scheme for Cloud Storage.
CLIENT_ID = 'myclientid'
CLIENT_SECRET = 'mysecret'
GOOGLE_STORAGE = 'gs'
dir_file= 'date_id={ds}/apollo_export_{ds}.lzo'.format(ds=ds)
gcs_oauth2_boto_plugin.SetFallbackClientIdAndSecret(CLIENT_ID, CLIENT_SECRET)
uri = boto.storage_uri('my_bucket/data/apollo/prod/'+ dir_file, GOOGLE_STORAGE)
key = uri.get_key()
if key.size < 45379959:
raise ValueError('umg lzo file is too small, investigate')
else:
print('umg lzo file is %sMB' % round((key.size/1e6),2))
if __name__ == "__main__":
check_size_lzo(sys.argv[1])
It works fine locally but when I try and run on kubernetes cluster I get the following error:
boto.exception.GSResponseError: GSResponseError: 403 Access denied to 'gs://my_bucket/data/apollo/prod/date_id=20180628/apollo_export_20180628.lzo'
I have updated the .boto file on my cluster and added my oauth client id and secret but still having the same issue.
Would really appreciate help resolving this issue.
Many thanks!

If it works in one environment and fails in another, I assume that you're getting your auth from a .boto file (or possibly from the OAUTH2_CLIENT_ID environment variable), but your kubernetes instance is lacking such a file. That you got a 403 instead of a 401 says that your remote server is correctly authenticating as somebody, but that somebody is not authorized to access the object, so presumably you're making the call as a different user.
Unless you've changed something, I'm guessing that you're getting the default Kubernetes Engine auth, with means a service account associated with your project. That service account probably hasn't been granted read permission for your object, which is why you're getting a 403. Grant it read/write permission for your GCS resources, and that should solve the problem.
Also note that by default the default credentials aren't scoped to include GCS, so you'll need to add that as well and then restart the instance.

Azure python sdk - getting the machine state

Using the python api for azure, I want to get the state of one of my machines.
I can't find anywhere to access this information.
Does someone know?
After looking around, I found this:
get_with_instance_view(resource_group_name, vm_name)
https://azure-sdk-for-python.readthedocs.org/en/latest/ref/azure.mgmt.compute.computemanagement.html#azure.mgmt.compute.computemanagement.VirtualMachineOperations.get_with_instance_view

if you are using the legacy api (this will work for classic virtual machines), use
from azure.servicemanagement import ServiceManagementService
sms = ServiceManagementService('your subscription id', 'your-azure-certificate.pem')
your_deployment = sms.get_deployment_by_name('service name', 'deployment name')
for role_instance in your_deployment.role_instance_list:
print role_instance.instance_name, role_instance.instance_status
if you are using the current api (will not work for classic vm's), use
from azure.common.credentials import UserPassCredentials
from azure.mgmt.compute import ComputeManagementClient
import retry
credentials = UserPassCredentials('username', 'password')
compute_client = ComputeManagementClient(credentials, 'your subscription id')
#retry.retry(RuntimeError, tries=3)
def get_vm(resource_group_name, vm_name):
'''
you need to retry this just in case the credentials token expires,
that's where the decorator comes in
this will return all the data about the virtual machine
'''
return compute_client.virtual_machines.get(
resource_group_name, vm_name, expand='instanceView')
#retry.retry((RuntimeError, IndexError,), tries=-1)
def get_vm_status(resource_group_name, vm_name):
'''
this will just return the status of the virtual machine
sometime the status may be unknown as shown by the azure portal;
in that case statuses[1] doesn't exist, hence retrying on IndexError
also, it may take on the order of minutes for the status to become
available so the decorator will bang on it forever
'''
return compute_client.virtual_machines.get(resource_group_name, vm_name, expand='instanceView').instance_view.statuses[1].display_status

If you are using Azure Cloud Services, you should use the Role Environment API, which provides state information regarding the current instance of your current service instance.
https://msdn.microsoft.com/en-us/library/azure/microsoft.windowsazure.serviceruntime.roleenvironment.aspx

In the new API resource manager
There's a function:
get_with_instance_view(resource_group_name, vm_name)
It's the same function as get machine, but it also returns an instance view that contains the machine state.
https://azure-sdk-for-python.readthedocs.org/en/latest/ref/azure.mgmt.compute.computemanagement.html#azure.mgmt.compute.computemanagement.VirtualMachineOperations.get_with_instance_view

Use this method get_deployment_by_name to get the instances status:
subscription_id = '****-***-***-**'
certificate_path = 'CURRENT_USER\\my\\***'
sms = ServiceManagementService(subscription_id, certificate_path)
result=sms.get_deployment_by_name("your service name","your deployment name")
You can get instance status via "instance_status" property.
Please see this post https://stackoverflow.com/a/31404545/4836342

As mentioned in other answers the Azure Resource Manager API has an instance view query to show the state of running VMs.
The documentation listing for this is here: VirtualMachineOperations.get_with_instance_view()
Typical code to get the status of a VM is something like this:
resource_group = "myResourceGroup"
vm_name = "myVMName"
creds = azure.mgmt.common.SubscriptionCloudCredentials(…)
compute_client = azure.mgmt.compute.ComputeManagementClient(creds)
vm = compute_client.virtual_machines.get_with_instance_view(resource_group, vm_name).virtual_machine
# Index 0 is the ProvisioningState, index 1 is the Instance PowerState, display_status will typically be "VM running, VM stopped, etc.
vm_status = vm.instance_view.statuses[1].display_status

There is no direct way to get the state of a virtual machine while listing them.
But, we can list out the vms by looping into them to get the instance_view of a machine and grab its power state.
In the code block below, I am doing the same and dumping the values into a .csv file to make a report.
import csv
from azure.common.credentials import ServicePrincipalCredentials
from azure.mgmt.compute import ComputeManagementClient
def get_credentials():
subscription_id = "*******************************"
credential = ServicePrincipalCredentials(
client_id="*******************************",
secret="*******************************",
tenant="*******************************"
)
return credential, subscription_id
credentials, subscription_id = get_credentials()
# Initializing compute client with the credentials
compute_client = ComputeManagementClient(credentials, subscription_id)
resource_group_name = "**************"
json_list = []
json_object = {"Vm_name": "", "Vm_state": "", "Resource_group": resource_group_name}
# listing out the virtual machine names
vm_list = compute_client.virtual_machines.list(resource_group_name=resource_group_name)
# looping inside the list of virtual machines, to grab the state of each machine
for i in vm_list:
vm_state = compute_client.virtual_machines.instance_view(resource_group_name=resource_group_name, vm_name=i.name)
json_object["Vm_name"] = i.name
json_object["Vm_state"] = vm_state.statuses[1].code
json_list.append(json_object)
csv_columns = ["Vm_name", "Vm_state", "Resource_group"]
f = open("vm_state.csv", 'w+')
csv_file = csv.DictWriter(f, fieldnames=csv_columns)
csv_file.writeheader()
for i in json_list:
csv_file.writerow(i)
To grab the state of a single virtual machine, where you know its resource_group_name and vm_name, just use the block below.
vm_state = compute_client.virtual_machines.instance_view(resource_group_name="foo_rg_name", vm_name="foo_vm_name")
power_state = vm_state.statuses[1].code
print(power_state)

As per the new API reference, this worked for me
vm_status = compute_client.virtual_machines.instance_view(GROUP_NAME, VM_NAME).statuses[1].code
it will return any one of these states, based on the current state
"PowerState/stopped", "PowerState/running","PowerState/stopping", "PowerState/starting"

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

azure batch network_configuration -> "failed to authenticate" - python

You will need to use Azure Active Directory for authenticating with the Batch Service to enable NetworkConfiguration on a pool with Batch Service pool allocation mode accounts (which are the default).

Related

Flask web app on Cloud Run - google.auth.exceptions.DefaultCredentialsError:

Correct way to connect AWS Secret Manager

Trigger an Azure Function upon VM creation

Trying to connect to Google cloud storage (GCS) using python

Azure python sdk - getting the machine state

Categories

Resources