AWS NoCredentials in training - python

I am attempting to run the example code for Amazon Sagemaker on a local GPU. I have copied the code from the Jupyter notebook to the following Python script:
import boto3
import subprocess
import sagemaker
from sagemaker.mxnet import MXNet
from mxnet import gluon
from sagemaker import get_execution_role
import os

sagemaker_session = sagemaker.Session()

instance_type = 'local'
if subprocess.call('nvidia-smi') == 0:
    # Set type to GPU if one is present
    instance_type = 'local_gpu'

# role = get_execution_role()

gluon.data.vision.MNIST('./data/train', train=True)
gluon.data.vision.MNIST('./data/test', train=False)

# successfully connects and uploads data
inputs = sagemaker_session.upload_data(path='data', key_prefix='data/mnist')

hyperparameters = {
    'batch_size': 100,
    'epochs': 20,
    'learning_rate': 0.1,
    'momentum': 0.9,
    'log_interval': 100
}

m = MXNet("mnist.py",
          role=role,
          train_instance_count=1,
          train_instance_type=instance_type,
          framework_version="1.1.0",
          hyperparameters=hyperparameters)

# fails in Docker container
m.fit(inputs)

predictor = m.deploy(initial_instance_count=1, instance_type=instance_type)

m.delete_endpoint()
where the referenced mnist.py file is exactly as specified on GitHub. The script fails at m.fit inside the Docker container with the following error:
algo-1-1DUU4_1 | Downloading s3://<S3-BUCKET>/sagemaker-mxnet-2018-10-07-00-47-10-435/source/sourcedir.tar.gz to /tmp/script.tar.gz
algo-1-1DUU4_1 | 2018-10-07 00:47:29,219 ERROR - container_support.training - uncaught exception during training: Unable to locate credentials
algo-1-1DUU4_1 | Traceback (most recent call last):
algo-1-1DUU4_1 | File "/usr/local/lib/python2.7/dist-packages/container_support/training.py", line 36, in start
algo-1-1DUU4_1 | fw.train()
algo-1-1DUU4_1 | File "/usr/local/lib/python2.7/dist-packages/mxnet_container/train.py", line 169, in train
algo-1-1DUU4_1 | mxnet_env.download_user_module()
algo-1-1DUU4_1 | File "/usr/local/lib/python2.7/dist-packages/container_support/environment.py", line 89, in download_user_module
algo-1-1DUU4_1 | cs.download_s3_resource(self.user_script_archive, tmp)
algo-1-1DUU4_1 | File "/usr/local/lib/python2.7/dist-packages/container_support/utils.py", line 37, in download_s3_resource
algo-1-1DUU4_1 | script_bucket.download_file(script_key_name, target)
algo-1-1DUU4_1 | File "/usr/local/lib/python2.7/dist-packages/boto3/s3/inject.py", line 246, in bucket_download_file
algo-1-1DUU4_1 | ExtraArgs=ExtraArgs, Callback=Callback, Config=Config)
algo-1-1DUU4_1 | File "/usr/local/lib/python2.7/dist-packages/boto3/s3/inject.py", line 172, in download_file
algo-1-1DUU4_1 | extra_args=ExtraArgs, callback=Callback)
algo-1-1DUU4_1 | File "/usr/local/lib/python2.7/dist-packages/boto3/s3/transfer.py", line 307, in download_file
algo-1-1DUU4_1 | future.result()
algo-1-1DUU4_1 | File "/usr/local/lib/python2.7/dist-packages/s3transfer/futures.py", line 73, in result
algo-1-1DUU4_1 | return self._coordinator.result()
algo-1-1DUU4_1 | File "/usr/local/lib/python2.7/dist-packages/s3transfer/futures.py", line 233, in result
algo-1-1DUU4_1 | raise self._exception
algo-1-1DUU4_1 | NoCredentialsError: Unable to locate credentials
I am confused that I can authenticate to S3 outside of the container (to upload the training/test data) but cannot within the Docker container. So I am guessing the issue has to do with passing the AWS credentials to the Docker container. Here is the generated docker-compose file:
networks:
  sagemaker-local:
    name: sagemaker-local
services:
  algo-1-1DUU4:
    command: train
    environment:
      - AWS_REGION=us-west-2
      - TRAINING_JOB_NAME=sagemaker-mxnet-2018-10-07-00-47-10-435
    image: 123456789012.dkr.ecr.us-west-2.amazonaws.com/sagemaker-mxnet:1.1.0-gpu-py2
    networks:
      sagemaker-local:
        aliases:
          - algo-1-1DUU4
    stdin_open: true
    tty: true
    volumes:
      - /tmp/tmpSkaR3x/algo-1-1DUU4/input:/opt/ml/input
      - /tmp/tmpSkaR3x/algo-1-1DUU4/output:/opt/ml/output
      - /tmp/tmpSkaR3x/algo-1-1DUU4/output/data:/opt/ml/output/data
      - /tmp/tmpSkaR3x/model:/opt/ml/model
version: '2.1'
Should the AWS credentials be passed in as environment variables?
I upgraded my sagemaker install after reading "Using boto3 in local mode?", but that had no effect. I checked the credentials that are being fetched in the SageMaker session (outside the container) and they appear to be blank, even though I have ~/.aws/config and ~/.aws/credentials files:
{'_token': None, '_time_fetcher': <function _local_now at 0x7f4dbbe75230>, '_access_key': None, '_frozen_credentials': None, '_refresh_using': <bound method AssumeRoleCredentialFetcher.fetch_credentials of <botocore.credentials.AssumeRoleCredentialFetcher object at 0x7f4d2de48bd0>>, '_secret_key': None, '_expiry_time': None, 'method': 'assume-role', '_refresh_lock': <thread.lock object at 0x7f4d9f2aafd0>}
I am new to AWS so I do not know how to diagnose the issue regarding AWS credentials. My .aws/config file has the following information (with placeholder values):
[default]
output = json
region = us-west-2
role_arn = arn:aws:iam::123456789012:role/SageMakers
source_profile = sagemaker-test
[profile sagemaker-test]
output = json
region = us-west-2
Where the sagemaker-test profile has AmazonSageMakerFullAccess in the IAM Management Console.
The .aws/credentials file has the following information (represented by placeholder values):
[default]
aws_access_key_id = 1234567890
aws_secret_access_key = zyxwvutsrqponmlkjihgfedcba
[sagemaker-test]
aws_access_key_id = 0987654321
aws_secret_access_key = abcdefghijklmopqrstuvwxyz
Lastly, these are the versions of the applicable libraries from a pip freeze:
awscli==1.16.19
boto==2.48.0
boto3==1.9.18
botocore==1.12.18
docker==3.5.0
docker-compose==1.22.0
mxnet-cu91==1.1.0.post0
sagemaker==1.11.1
Please let me know if I left out any relevant information and thanks for any help/feedback that you can provide.
UPDATE: Thanks for your help, everyone! While attempting some of your suggested fixes, I noticed that boto3 was out of date and updated it (to boto3-1.9.26 and botocore-1.12.26), which appeared to resolve the issue. I was not able to find any documentation on this being an issue with boto3==1.9.18. If someone could help me understand what the issue was with boto3, I would be happy to mark their answer as correct.

SageMaker local mode is designed to pick up whatever credentials are available in your boto3 session, and pass them into the docker container as environment variables.
However, the version of the sagemaker sdk that you are using (1.11.1 and earlier) will ignore the credentials if they include a token, because that usually indicates short-lived credentials that won't remain valid long enough for a training job to complete or endpoint to be useful.
If you are using temporary credentials, try replacing them with permanent ones, or running from an ec2 instance (or SageMaker notebook!) that has an appropriate instance role assigned.
Also, the sagemaker sdk's handling of credentials changed in v1.11.2 and later -- temporary credentials will be passed to local mode containers, but with a warning message. So you could just upgrade to a newer version and try again (pip install -U sagemaker).
Finally, boto3's credential handling can also change between releases, so try upgrading to the latest version as well.
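To see which situation you are in, here is a small diagnostic sketch (my own suggestion, not part of the SDK) that prints what credentials boto3 actually resolves and whether they include a session token, before you start local mode:
# Diagnostic sketch: show which credentials boto3 resolves and whether a token is present.
import boto3

session = boto3.Session()
creds = session.get_credentials()
if creds is None:
    print("boto3 found no credentials at all")
else:
    frozen = creds.get_frozen_credentials()  # forces the assume-role provider to actually fetch
    print("method:", creds.method)
    print("token present:", frozen.token is not None)
If the token is present, you are in the temporary-credentials case described above.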

I just confirmed that this example works on my machine locally. Please make sure the role you are using has permission to use the buckets whose names start with sagemaker; SageMaker creates buckets with that prefix by default.
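A quick check you could run from the same script (just a suggestion, not part of the example code) is to print the bucket the session resolves and confirm the role can read and write it:
# The execution role needs S3 access to this bucket (its name starts with "sagemaker-").
print(sagemaker_session.default_bucket())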

It looks like you have the credentials configured on your host at ~/.aws/credentials but are trying to access them on a docker container running on the host.
The simplest solution seems to be mounting your AWS credentials in the container at the expected location. You appear to be using the sagemaker-mxnet:1.1.0-gpu-py2 image, which appears to run as the root user. Based on this, if you update the volumes in your docker-compose file for the algo-1-1DUU4 service to include:
volumes:
  ...
  - ~/.aws/:/root/.aws/
this will mount your credentials into the root user's home directory inside the container, so your Python script should be able to access them.
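For reference, a sketch of how the volumes section for that service could look with the extra mount added (the existing paths are copied from your generated compose file above; the ~/.aws entry is the only addition):
volumes:
  - /tmp/tmpSkaR3x/algo-1-1DUU4/input:/opt/ml/input
  - /tmp/tmpSkaR3x/algo-1-1DUU4/output:/opt/ml/output
  - /tmp/tmpSkaR3x/algo-1-1DUU4/output/data:/opt/ml/output/data
  - /tmp/tmpSkaR3x/model:/opt/ml/model
  - ~/.aws/:/root/.aws/
Keep in mind that local mode regenerates this file on each run, so this is mainly useful for confirming that missing credentials inside the container are the problem.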

I'll assume that the library you're using has boto3 at its core. boto3 advises that there are several methods of authentication available to you.
Passing credentials as parameters in the boto3.client() method
Passing credentials as parameters when creating a Session object
Environment variables
Shared credential file (~/.aws/credentials)
AWS config file (~/.aws/config)
Assume Role provider
Boto2 config file (/etc/boto.cfg and ~/.boto)
Instance metadata service on an Amazon EC2 instance that has an IAM role configured.
But it sounds like the docker sandbox does not have access to your ~/.aws/credentials file, so I'd consider the other options that may be available to you. As I'm unfamiliar with docker, I can't give you a guaranteed solution for your scenario.
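For example, since the container can't see your host's ~/.aws files, one option is to build the boto3 session with explicit, long-lived credentials and hand it to SageMaker so local mode has something concrete to forward. This is only a sketch, reusing the placeholder key values from the question; in real code, load them from somewhere safer than source:
# Sketch: construct a boto3 session with explicit long-lived credentials
# (placeholder values from the question) and pass it to the SageMaker session.
import boto3
import sagemaker

boto_session = boto3.Session(
    aws_access_key_id="1234567890",                       # placeholder
    aws_secret_access_key="zyxwvutsrqponmlkjihgfedcba",   # placeholder
    region_name="us-west-2",
)
sagemaker_session = sagemaker.Session(boto_session=boto_session)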

Related

Read Azure KeyVault Secret from Function App

This Python script is deployed to run from an Azure Function App on the Linux Consumption plan. The script is expected to read secrets from Azure Key Vault.
Apart from code deployment, the following configurations are made:
1.) System Assigned Managed Identity enabled for the Azure Function App
2.) Azure Key Vault's role assignments reference this Function App with the Reader role.
Here is the script from __init__.py:
import logging

import azure.functions as func
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient


def main(req: func.HttpRequest) -> func.HttpResponse:
    logging.info('Python HTTP trigger function processed a request.')
    # Get url and filename from postman by using POST method
    # identity = ManagedIdentityCredential()
    credentials = DefaultAzureCredential()
    secretClient = SecretClient(vault_url="https://kvkkpbedpdev.vault.azure.net/", credential=credentials)
    secret = secretClient.get_secret(name='st-cs-kkpb-edp-dev')
This function app requires the following libraries, defined in the requirements.txt file:
azure-functions
azure-keyvault-secrets
azure-identity
This function runs and ends up with the following exception.
warn: Function.Tide_GetFiles.User[0]
python | SharedTokenCacheCredential.get_token failed: SharedTokenCacheCredential authentication unavailable. No accounts were found in the cache.
python | Traceback (most recent call last):
python | File "/usr/local/lib/python3.8/site-packages/azure/identity/_internal/decorators.py", line 27, in wrapper
python | token = fn(*args, **kwargs)
python | File "/usr/local/lib/python3.8/site-packages/azure/identity/_credentials/shared_cache.py", line 88, in get_token
python | account = self._get_account(self._username, self._tenant_id)
python | File "/usr/local/lib/python3.8/site-packages/azure/identity/_internal/decorators.py", line 45, in wrapper
python | return fn(*args, **kwargs)
python | File "/usr/local/lib/python3.8/site-packages/azure/identity/_internal/shared_token_cache.py", line 166, in _get_account
python | raise CredentialUnavailableError(message=NO_ACCOUNTS)
python | azure.identity._exceptions.CredentialUnavailableError: SharedTokenCacheCredential authentication unavailable. No accounts were found in the cache.
python | info: Function.Tide_GetFiles.User[0]
python | DefaultAzureCredential - SharedTokenCacheCredential is unavailab
and the error:
fail: Function.Tide_GetFiles[3]
python | Executed 'Functions.Tide_GetFiles' (Failed, Id=9d514a1f-aeae-4625-9379-b2f0bc89f38f, Duration=1673ms)
python | Microsoft.Azure.WebJobs.Host.FunctionInvocationException: Exception while executing function: Functions.Tide_GetFiles
python | ---> Microsoft.Azure.WebJobs.Script.Workers.Rpc.RpcException: Result: Failure
python | Exception: ClientAuthenticationError: DefaultAzureCredential failed to retrieve a token from the included credentials.
python | Attempted credentials:
python | EnvironmentCredential: EnvironmentCredential authentication unavailable. Environment variables are not fully configured.
python | ManagedIdentityCredential: ManagedIdentityCredential authentication unavailable, no managed identity endpoint found.
python | SharedTokenCacheCredential: SharedTokenCacheCredential authentication unavailable. No accounts were found in the cache.
How can I fix this?
From the error, it seems the managed identity is not applied to your Function App correctly. You should be able to verify that by going to the Identity blade of the Function App.
Additionally, if you are not using the new (preview) role-based access control, you should add the required access policy (separate from role assignments in access control; the secret 'get' permission here) to allow the identity (it has the same name as the app) to access Key Vault. Refer to How to set and get secrets from Azure Key Vault with Azure Managed Identities and Python.
Using the Azure Portal, go to the Key Vault's access policies, and grant required access to the Key Vault.
Search for your Key Vault in “Search Resources dialog box” in Azure Portal.
Select "Overview", and click on Access policies
Click on "Add Access Policy", select required permissions.
Click on "Select Principal", add your account
Save the Access Policies
You can also create an Azure service principal either through
Azure CLI, PowerShell or the portal and grant it the same access.
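If you want to rule out the other credentials in the chain, here is a minimal sketch (using the vault URL and secret name from the question) that targets the managed identity directly instead of going through DefaultAzureCredential:
# Sketch: use the Function App's managed identity directly so DefaultAzureCredential's
# fallbacks (environment, shared token cache) are taken out of the picture.
from azure.identity import ManagedIdentityCredential
from azure.keyvault.secrets import SecretClient

credential = ManagedIdentityCredential()
secret_client = SecretClient(vault_url="https://kvkkpbedpdev.vault.azure.net/", credential=credential)
secret = secret_client.get_secret("st-cs-kkpb-edp-dev")
If this still fails with "no managed identity endpoint found", the identity is not enabled on the app; if it fails with a 403, the access policy (or role assignment) is the missing piece.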

Polyglot Processor with Local Dataflow Server

I have been trying to work with polyglot and build a simple python processor. I followed the polyglot recipe and I could not get the stream to deploy. I originally deployed the same processor that is used in the example and got the following errors:
Unknown command line arg requested: spring.cloud.stream.bindings.input.destination
Unknown environment variable requested: SPRING_CLOUD_STREAM_KAFKA_BINDER_BROKERS
Traceback (most recent call last):
  File "/processor/python_processor.py", line 10, in <module>
    consumer = KafkaConsumer(get_input_channel(), bootstrap_servers=[get_kafka_binder_brokers()])
  File "/usr/local/lib/python2.7/dist-packages/kafka/consumer/group.py", line 353, in __init__
    self._client = KafkaClient(metrics=self._metrics, **self.config)
  File "/usr/local/lib/python2.7/dist-packages/kafka/client_async.py", line 203, in __init__
    self.cluster = ClusterMetadata(**self.config)
  File "/usr/local/lib/python2.7/dist-packages/kafka/cluster.py", line 67, in __init__
    self._bootstrap_brokers = self._generate_bootstrap_brokers()
  File "/usr/local/lib/python2.7/dist-packages/kafka/cluster.py", line 71, in _generate_bootstrap_brokers
    bootstrap_hosts = collect_hosts(self.config['bootstrap_servers'])
  File "/usr/local/lib/python2.7/dist-packages/kafka/conn.py", line 1336, in collect_hosts
    host, port, afi = get_ip_port_afi(host_port)
  File "/usr/local/lib/python2.7/dist-packages/kafka/conn.py", line 1289, in get_ip_port_afi
    host_and_port_str = host_and_port_str.strip()
AttributeError: 'NoneType' object has no attribute 'strip'
Exception AttributeError: "'KafkaClient' object has no attribute '_closed'" in <bound method KafkaClient.__del__ of <kafka.client_async.KafkaClient object at 0x7f8b7024cf10>> ignored
I then attempted to pass the environment and binding arguments through the deployment stream, but that did not work. When I manually inserted the SPRING_CLOUD_STREAM_KAFKA_BINDER_BROKERS and spring.cloud.stream.bindings.input.destination parameters into Kafka's consumer, I was able to deploy the stream as a workaround. I am not entirely sure what is causing the issue; would deploying this on Kubernetes be any different, or is this an issue with Polyglot and Dataflow? Any help with this would be appreciated.
Steps to reproduce:
Attempt to deploy polyglot-processor stream from polyglot recipe on local dataflow server. I am also using the same stream definition as in the example: http --server.port=32123 | python-processor --reversestring=true | log.
Additional context:
I am attempting to deploy the stream on a local installation of SPDF and Kafka since I had some issues deploying custom python applications with Docker.
The recipe you have posted above expects the SPRING_CLOUD_STREAM_KAFKA_BINDER_BROKERS environment variable to be present as part of the server configuration (since the streams are managed via the Skipper server, you would need to set this environment variable in your Skipper server configuration).
You can check this documentation on how to set SPRING_CLOUD_STREAM_KAFKA_BINDER_BROKERS as an environment property in the Skipper server deployment.
You can also pass this property as a deployer property when deploying the python-processor stream app. You can refer to this documentation on how to pass a deployment property to set the Spring Cloud Stream properties (here the binder configuration property) at the time of stream deployment.
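If you would rather keep your workaround in the processor code while the Skipper configuration is being sorted out, here is a rough sketch of defensive versions of the recipe's helper lookups. The fallback order, the localhost broker, and the default destination name are assumptions for local testing, not part of the recipe:
# Sketch: resolve the binder and binding values that Skipper normally injects,
# falling back to local defaults (assumptions) so the processor can start while testing.
import os
import sys

def _from_args(name):
    # Spring-style values may arrive as command-line args,
    # e.g. --spring.cloud.stream.bindings.input.destination=mytopic
    prefix = '--' + name + '='
    for arg in sys.argv[1:]:
        if arg.startswith(prefix):
            return arg[len(prefix):]
    return None

def get_kafka_binder_brokers():
    return os.environ.get('SPRING_CLOUD_STREAM_KAFKA_BINDER_BROKERS', 'localhost:9092')

def get_input_channel():
    return _from_args('spring.cloud.stream.bindings.input.destination') or 'python-processor-input'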

Logging Artifacts from MlFlow on GCS Bucket

I have an MLflow server running on a GCP VM instance. I have created a bucket to log the artifacts.
This is the command I'm running to start the server and specify the bucket path:
mlflow server --default-artifact-root gs://gcs_bucket/artifacts --host x.x.x.x
But facing this error:
TypeError: stat: path should be string, bytes, os.PathLike or integer, not ElasticNet
Note: the MLflow server runs fine with the specified host alone. The problem only appears when I specify the storage bucket path.
I have granted permission to the storage API by using these commands:
gcloud auth application-default login
gcloud auth login
Also, on printing the artifact URI, this is what I'm getting:
mlflow.get_artifact_uri()
Output:
gs://gcs_bucket/artifacts/0/122481bf990xxxxxxxxxxxxxxxxxxxxx/artifacts
So, in the above path, where is 0/122481bf990xxxxxxxxxxxxxxxxxxxxx/artifacts coming from, and why is it not getting auto-created at gs://gcs_bucket/artifacts?
After debugging more, it seems it is not able to get the local path from the VM.
This is the error I'm getting on the VM:
WARNING:root:Malformed experiment 'mlruns'. Detailed error Yaml file './mlruns/mlruns/meta.yaml' does not exist.
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/mlflow/store/tracking/file_store.py", line 197, in list_experiments
experiment = self._get_experiment(exp_id, view_type)
File "/usr/local/lib/python3.6/dist-packages/mlflow/store/tracking/file_store.py", line 256, in _get_experiment
meta = read_yaml(experiment_dir, FileStore.META_DATA_FILE_NAME)
File "/usr/local/lib/python3.6/dist-packages/mlflow/utils/file_utils.py", line 160, in read_yaml
raise MissingConfigException("Yaml file '%s' does not exist." % file_path)
mlflow.exceptions.MissingConfigException: Yaml file './mlruns/mlruns/meta.yaml' does not exist.
Can I get a solution to this, and what am I missing?
I think the main error comes from the tracking structure you are deploying. For your use case, the suitable structure is the one described here: you are missing the URI that stores the backend metadata. So please set up a SQL database (PostgreSQL, ...) first, then pass its URI to --backend-store-uri.
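For example, a hedged sketch of what the start command could look like once a SQL backend is available (the PostgreSQL URI is a placeholder):
mlflow server --backend-store-uri postgresql://user:password@localhost/mlflow --default-artifact-root gs://gcs_bucket/artifacts --host x.x.x.x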
In case you want to use MLflow as a model registry and store artifacts on GCS, you can use the structure described here, adding the flags --artifacts-only --serve-artifacts.
Hope this can help you.

GCP cloud function - Could not find kubectl on the path

I'm writing this Google Cloud Function (Python):
def create_kubeconfig(request):
    subprocess.check_output('curl https://sdk.cloud.google.com | bash | echo "" ', stdin=subprocess.PIPE, shell=True)
    os.system("./google-cloud-sdk/install.sh")
    os.system("gcloud init")
    os.system("curl -LO https://storage.googleapis.com/kubernetes-release/release/v1.17.0/bin/linux/amd64/kubectl")
    os.system("gcloud container clusters get-credentials **cluster name** --zone us-west2-a --project **project name**")
    os.system("gcloud container clusters get-credentials **cluster name** --zone us-west2-a --project **project name**")
    conf = KubeConfig()
    conf.use_context('**cluster name**')
When I run the code, it gives me the error:
'Invalid kube-config file. ' kubernetes.config.config_exception.ConfigException: Invalid kube-config file. No configuration found.
Please help me solve it.
You have to reach the K8S API programmatically. You have the description of the API in the documentation.
But it's not easy and simple to do. However, here are some inputs for achieving what you want.
First, get the GKE master IP.
Then you can access the cluster easily. Here is an example for reading the deployments:
import google.auth
from google.auth.transport import requests
credentials, project_id = google.auth.default()
session = requests.AuthorizedSession(credentials)
response = session.get('https://34.76.28.194/apis/apps/v1/namespaces/default/deployments', verify=False)
response.raise_for_status()
print(response.json())
For creating one, you can do this
import google.auth
from google.auth.transport import requests
credentials, project_id = google.auth.default()
session = requests.AuthorizedSession(credentials)
with open("deployment.yaml", "r") as f:
data = f.read()
response = session.post('https://34.76.28.194/apis/apps/v1/namespaces/default/deployments', data=data,
headers={'content-type': 'application/yaml'}, verify=False)
response.raise_for_status()
print(response.json())
Depending on the object that you want to build, you have to use the correct file definition and the correct API endpoint. I don't know a way to apply a whole YAML file with several definitions in only one API call.
Last thing: be sure to grant the correct GKE roles to the Cloud Function service account.
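For example, something along these lines grants the function's runtime service account access to GKE (the project ID and service account e-mail are placeholders, and the exact role depends on what the function needs):
gcloud projects add-iam-policy-binding PROJECT_ID --member serviceAccount:FUNCTION_SA_EMAIL --role roles/container.developer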
UPDATE
Another solution is to use Cloud Run. Indeed, with Cloud Run, and thanks to its container capability, you have the ability to install and call system processes (it's not totally open because your container runs in a gVisor sandbox, but most common usages are allowed).
The idea is the following: use a gcloud SDK base image and deploy your application on it. Then, code your app to perform system calls.
Here is a working example in Go.
Dockerfile:
FROM golang:1.13 as builder
# Copy local code to the container image.
WORKDIR /app/
COPY go.mod .
ENV GO111MODULE=on
RUN go mod download
COPY . .
# Perform test for building a clean package
RUN go test -v ./...
RUN CGO_ENABLED=0 GOOS=linux go build -v -o server
# Gcloud capable image
FROM google/cloud-sdk
COPY --from=builder /app/server /server
CMD ["/server"]
Note: The image cloud-sdk image is heavy: 700Mb
The code example (only the happy path; I removed error management and the stderr/stdout feedback to simplify the code):
.......
// Example here: recover the yaml file into a bucket
client,_ := storage.NewClient(ctx)
reader,_ := client.Bucket("my_bucket").Object("deployment.yaml").NewReader(ctx)
content,_:= ioutil.ReadAll(reader)
// You can store locally the file into /tmp directory. It's an in-memory file system. Don't forget to purge it to avoid any out of memory crash
ioutil.WriteFile("/tmp/file.yaml",content, 0644)
// Execute external command
// 1st Recover the kube authentication
exec.Command("gcloud","container","clusters","get-credentials","cluster-1","--zone=us-central1-c").Run()
// Then interact with the cluster with kubectl tools and simply apply your description file
exec.Command("kubectl","apply", "-f","/tmp/file.yaml").Run()
.......
Instead of using gcloud inside the Cloud Function (and attempting to install it on every request, which will significantly increase the runtime of your function), you should use the google-cloud-container client library to make the same API calls directly from Python, for example:
from google.cloud import container_v1
client = container_v1.ClusterManagerClient()
project_id = 'YOUR_PROJECT_ID'
zone = 'YOUR_PROJECT_ZONE'
response = client.list_clusters(project_id, zone)

GCS with GKE, 403 Insufficient permission for writing into GCS bucket [duplicate]

This question already has an answer here:
Is it necessary to recreate a Google Container Engine cluster to modify API permissions?
(1 answer)
Closed 5 years ago.
Currently I'm trying to write files into a Google Cloud Storage bucket. For this, I have used the django-storages package.
I have deployed my code, and I get into the running container through the Kubernetes kubectl utility to check that writing to the GCS bucket works.
$ kubectl exec -it foo-pod -c foo-container --namespace=testing python manage.py shell
I am able to read the bucket, but if I try to write into it, it shows the traceback below.
>>> from django.core.files.storage import default_storage
>>> f = default_storage.open('storage_test', 'w')
>>> f.write('hi')
2
>>> f.close()
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/google/cloud/storage/blob.py", line 946, in upload_from_file
client, file_obj, content_type, size, num_retries)
File "/usr/local/lib/python3.6/site-packages/google/cloud/storage/blob.py", line 867, in _do_upload
client, stream, content_type, size, num_retries)
File "/usr/local/lib/python3.6/site-packages/google/cloud/storage/blob.py", line 700, in _do_multipart_upload
transport, data, object_metadata, content_type)
File "/usr/local/lib/python3.6/site-packages/google/resumable_media/requests/upload.py", line 98, in transmit
self._process_response(result)
File "/usr/local/lib/python3.6/site-packages/google/resumable_media/_upload.py", line 110, in _process_response
response, (http_client.OK,), self._get_status_code)
File "/usr/local/lib/python3.6/site-packages/google/resumable_media/_helpers.py", line 93, in require_status_code
status_code, u'Expected one of', *status_codes)
google.resumable_media.common.InvalidResponse: ('Request failed with status code', 403, 'Expected one of', <HTTPStatus.OK: 200>)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "/usr/local/lib/python3.6/site-packages/storages/backends/gcloud.py", line 75, in close
self.blob.upload_from_file(self.file, content_type=self.mime_type)
File "/usr/local/lib/python3.6/site-packages/google/cloud/storage/blob.py", line 949, in upload_from_file
_raise_from_invalid_response(exc)
File "/usr/local/lib/python3.6/site-packages/google/cloud/storage/blob.py", line 1735, in _raise_from_invalid_response
raise exceptions.from_http_response(error.response)
google.api_core.exceptions.Forbidden: 403 POST https://www.googleapis.com/upload/storage/v1/b/foo.com/o?uploadType=multipart: Insufficient Permission
>>> default_storage.url('new docker')
'https://storage.googleapis.com/foo.appspot.com/new%20docker'
>>>
It seems like this is related to the bucket permissions, so I assigned the Storage Admin and Storage Object Creator roles to the Google Cloud Build service account (through bucket -> manage permissions), but it still shows the same error.
A possible explanation for this would be that you haven't assigned the correct scope to your cluster. If this is the case, the nodes in the cluster would not have the required authorisation/permission to write to Google Cloud Storage, which could explain the 403 error you're seeing.
If no scope is set when the cluster is created, the default scope is assigned and this only provides read permission for Cloud Storage.
In order to check the cluster's current scopes using the Cloud SDK, you could try running a 'describe' command from Cloud Shell, for example:
gcloud container clusters describe CLUSTER-NAME --zone ZONE
The oauthScopes section of the output contains the current scopes assigned to the cluster/nodes.
The default read only Cloud Storage scope would display:
https://www.googleapis.com/auth/devstorage.read_only
If the Cloud Storage read/write scope is set the output will display:
https://www.googleapis.com/auth/devstorage.read_write
The scope can be set during cluster creation using the --scopes switch followed by the desired scope identifier. In your case, this would be “storage-rw”. For example, you could run something like:
gcloud container clusters create CLUSTER-NAME --zone ZONE --scopes storage-rw
The storage-rw scope, combined with your service account should then allow the nodes in your cluster to write to Cloud Storage.
Alternatively, if you don't want to recreate the cluster, you can create a new node pool with the new desired scopes, then delete your old node pool. See the accepted answer for Is it necessary to recreate a Google Container Engine cluster to modify API permissions? for information on how to achieve this.
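For example, a sketch of the node-pool route with the read/write storage scope (the pool, cluster, and zone names are placeholders):
gcloud container node-pools create NEW-POOL-NAME --cluster CLUSTER-NAME --zone ZONE --scopes storage-rw
Once the new pool is up and your workloads have moved over, the old pool can be deleted.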
