How to upload files to GCS from a python script in GCP? - python

I'm trying to upload a file to GCS, but I'm running into a permission issue that I'm not sure how to resolve. Reading a file from a bucket in GCS works fine; the problem only occurs on upload.
from io import StringIO

import pandas as pd
from google.cloud import storage

# Read the CSV of links from the bucket (this part works).
client = storage.Client()
bucket = client.get_bucket('fda-drug-label-data')
blob = bucket.get_blob('fda-label-doc-links.csv')
bt = blob.download_as_string()
s = StringIO(str(bt, 'utf-8'))
df = pd.read_csv(s)
df_doc_links = list(df['Link'])

# Write the number of links to a local CSV file.
a = pd.DataFrame([len(df_doc_links)])
a.to_csv('test.csv', index=False)

# Upload the local file to the same bucket (this part fails with 403).
blob = bucket.blob('test.csv')
blob.upload_from_filename('test.csv')
This is the message I'm getting:
Traceback (most recent call last):
File "/home/.../.local/lib/python3.7/site-packages/google/cloud/storage/blob.py", line 1567, in upload_from_file
if_metageneration_not_match,
File "/home/.../.local/lib/python3.7/site-packages/google/cloud/storage/blob.py", line 1420, in _do_upload
if_metageneration_not_match,
File "/home/.../.local/lib/python3.7/site-packages/google/cloud/storage/blob.py", line 1098, in _do_multipart_upload
response = upload.transmit(transport, data, object_metadata, content_type)
File "/home/.../.local/lib/python3.7/site-packages/google/resumable_media/requests/upload.py", line 108, in transmit
self._process_response(response)
File "/home/.../.local/lib/python3.7/site-packages/google/resumable_media/_upload.py", line 109, in _process_response
_helpers.require_status_code(response, (http_client.OK,), self._get_status_code)
File "/home/.../.local/lib/python3.7/site-packages/google/resumable_media/_helpers.py", line 96, in require_status_code
*status_codes
google.resumable_media.common.InvalidResponse: ('Request failed with status code', 403, 'Expected one of', <HTTPStatus.OK: 200>)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "scrape.py", line 134, in <module>
blob.upload_from_filename('test.csv')
File "/home/.../.local/lib/python3.7/site-packages/google/cloud/storage/blob.py", line 1655, in upload_from_filename
if_metageneration_not_match=if_metageneration_not_match,
File "/home/.../.local/lib/python3.7/site-packages/google/cloud/storage/blob.py", line 1571, in upload_from_file
_raise_from_invalid_response(exc)
File "/home/.../.local/lib/python3.7/site-packages/google/cloud/storage/blob.py", line 2620, in _raise_from_invalid_response
raise exceptions.from_http_status(response.status_code, message, response=response)
google.api_core.exceptions.Forbidden: 403 POST https://storage.googleapis.com/upload/storage/v1/b/fda-drug-label-data/o?uploadType=multipart: ('Request failed with status code', 403, 'Expected one of', <HTTPStatus.OK: 200>)

Your service account doesn't have permission to upload data to the bucket. Go to the IAM & Admin section and, under service accounts, assign your account a role that allows creating objects (for example Storage Object Creator or Storage Object Admin). After that, generate the key again.
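As a rough illustration of the permission problem described above, the sketch below wraps the upload so that a 403 turns into a readable hint. It assumes the google-cloud-storage package is installed and that credentials are picked up from the environment. The helper names permission_hint and upload_csv are hypothetical and do not come from the question.

```python
def permission_hint(bucket_name, permission="storage.objects.create"):
    """Build a readable hint for a 403 on upload (hypothetical helper)."""
    return (
        f"The active service account lacks '{permission}' on bucket "
        f"'{bucket_name}'. Grant it a role such as Storage Object Creator "
        "in IAM & Admin, then retry."
    )


def upload_csv(bucket_name, local_path, blob_name):
    """Upload a local file to GCS and translate a 403 into a clearer error."""
    # Imported lazily so permission_hint() works without the library installed.
    from google.api_core.exceptions import Forbidden
    from google.cloud import storage

    client = storage.Client()
    # client.bucket() only builds a local reference; unlike get_bucket() it
    # sends no request, so it does not need permission to read the bucket.
    blob = client.bucket(bucket_name).blob(blob_name)
    try:
        blob.upload_from_filename(local_path)
    except Forbidden as exc:
        raise PermissionError(permission_hint(bucket_name)) from exc
    return blob.name
```

With the names from the question, a call might look like upload_csv('fda-drug-label-data', 'test.csv', 'test.csv').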

Related

pymongo: Resolver configuration could not be read or specified no nameservers

It's my first time using MongoDB and I can't seem to fix this one issue. My friend who uses MongoDB doesn't know Python, so he can't really help me.
Here's my code:
import pymongo

# Replace the uri string with your MongoDB deployment's connection string.
conn_str = "mongodb+srv://sqdnoises:{mypass}@sqd.d4kjb.mongodb.net/myFirstDatabase?retryWrites=true&w=majority"
# set a 5-second connection timeout
client = pymongo.MongoClient(conn_str, serverSelectionTimeoutMS=5000)
try:
    print(client.server_info())
    print('\n\n\n aka connected')
except Exception:
    print("Unable to connect to the server.")
Where {mypass} is my MongoDB password.
I keep getting this error:
Traceback (most recent call last):
File "/data/data/com.termux/files/usr/lib/python3.9/site-packages/dns/resolver.py", line 782, in read_resolv_conf
f = stack.enter_context(open(f))
FileNotFoundError: [Errno 2] No such file or directory: '/etc/resolv.conf'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/data/data/com.termux/files/usr/lib/python3.9/site-packages/pymongo/srv_resolver.py", line 88, in _resolve_uri
results = _resolve('_' + self.__srv + '._tcp.' + self.__fqdn,
File "/data/data/com.termux/files/usr/lib/python3.9/site-packages/pymongo/srv_resolver.py", line 41, in _resolve
return resolver.resolve(*args, **kwargs)
File "/data/data/com.termux/files/usr/lib/python3.9/site-packages/dns/resolver.py", line 1305, in resolve
return get_default_resolver().resolve(qname, rdtype, rdclass, tcp, source,
File "/data/data/com.termux/files/usr/lib/python3.9/site-packages/dns/resolver.py", line 1278, in get_default_resolver
reset_default_resolver()
File "/data/data/com.termux/files/usr/lib/python3.9/site-packages/dns/resolver.py", line 1290, in reset_default_resolver
default_resolver = Resolver()
File "/data/data/com.termux/files/usr/lib/python3.9/site-packages/dns/resolver.py", line 734, in __init__
self.read_resolv_conf(filename)
File "/data/data/com.termux/files/usr/lib/python3.9/site-packages/dns/resolver.py", line 785, in read_resolv_conf
raise NoResolverConfiguration
dns.resolver.NoResolverConfiguration: Resolver configuration could not be read or specified no nameservers.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/storage/emulated/0/! workspace/mongolearn/main.py", line 7, in <module>
client = pymongo.MongoClient(conn_str, serverSelectionTimeoutMS=5000)
File "/data/data/com.termux/files/usr/lib/python3.9/site-packages/pymongo/mongo_client.py", line 677, in __init__
res = uri_parser.parse_uri(
File "/data/data/com.termux/files/usr/lib/python3.9/site-packages/pymongo/uri_parser.py", line 532, in parse_uri
nodes = dns_resolver.get_hosts()
File "/data/data/com.termux/files/usr/lib/python3.9/site-packages/pymongo/srv_resolver.py", line 119, in get_hosts
_, nodes = self._get_srv_response_and_hosts(True)
File "/data/data/com.termux/files/usr/lib/python3.9/site-packages/pymongo/srv_resolver.py", line 99, in _get_srv_response_and_hosts
results = self._resolve_uri(encapsulate_errors)
File "/data/data/com.termux/files/usr/lib/python3.9/site-packages/pymongo/srv_resolver.py", line 95, in _resolve_uri
raise ConfigurationError(str(exc))
pymongo.errors.ConfigurationError: Resolver configuration could not be read or specified no nameservers.
How do I fix this?
I am following https://docs.mongodb.com/drivers/pymongo/
Indeed, the problem is that dnspython tries to open /etc/resolv.conf, which does not exist in your Termux environment. You can configure the default resolver by hand instead:
import dns.resolver
dns.resolver.default_resolver = dns.resolver.Resolver(configure=False)
dns.resolver.default_resolver.nameservers = ['8.8.8.8']
Just add this code to the top of your main script, and that should be sufficient to get you past this hurdle.
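A slightly more defensive variant of the same idea is sketched below: it only overrides the resolver when /etc/resolv.conf is actually missing, so the script behaves normally on systems that have one. It assumes the dnspython package is installed. The function names are hypothetical, and 8.8.8.8 is simply the nameserver used in the answer above.

```python
import os


def resolv_conf_missing(path="/etc/resolv.conf"):
    """Return True when the system resolver configuration file is absent."""
    return not os.path.isfile(path)


def use_fallback_nameservers(nameservers=("8.8.8.8",), path="/etc/resolv.conf"):
    """Install a manually configured default resolver if resolv.conf is missing.

    Returns True if the override was applied, False otherwise.
    """
    if not resolv_conf_missing(path):
        return False
    # Imported lazily; requires the dnspython package.
    import dns.resolver

    resolver = dns.resolver.Resolver(configure=False)
    resolver.nameservers = list(nameservers)
    dns.resolver.default_resolver = resolver
    return True
```

Call use_fallback_nameservers() before creating the pymongo.MongoClient, since the SRV lookup happens while the client is being constructed.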

PermissionDenied: 403 error when I'm using Google Cloud Billing Budget API

I'm using Google Cloud Functions and the Cloud Billing Budget API to get a list of all my budgets, but I'm getting the following error:
Traceback (most recent call last):
File "/env/local/lib/python3.7/site-packages/google/api_core/grpc_helpers.py", line 67, in error_remapped_callable
return callable_(*args, **kwargs)
File "/env/local/lib/python3.7/site-packages/grpc/_channel.py", line 946, in __call__
return _end_unary_response_blocking(state, call, False, None)
File "/env/local/lib/python3.7/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.PERMISSION_DENIED
details = "The caller does not have permission"
debug_error_string = "{"created":"@9627456.9324530376","description":"Error received from peer ipv4:54.128.19.5:443","file":"src/core/lib/surface/call.cc","file_line":1069,"grpc_message":"The caller does not have permission","grpc_status":7}"
>
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/env/local/lib/python3.7/site-packages/google/cloud/functions/worker_v2.py", line 449, in run_background_function
_function_handler.invoke_user_function(event_object)
File "/env/local/lib/python3.7/site-packages/google/cloud/functions/worker_v2.py", line 268, in invoke_user_function
return call_user_function(request_or_event)
File "/env/local/lib/python3.7/site-packages/google/cloud/functions/worker_v2.py", line 265, in call_user_function
event_context.Context(**request_or_event.context))
File "/user_code/main.py", line 22, in getting_data
all_budgets = client.list_budgets(request = {'parent': BILLING_ACCOUNT})
File "/env/local/lib/python3.7/site-packages/google/cloud/billing/budgets_v1/services/budget_service/client.py", line 693, in list_budgets
response = rpc(request, retry=retry, timeout=timeout, metadata=metadata,)
File "/env/local/lib/python3.7/site-packages/google/api_core/gapic_v1/method.py", line 145, in __call__
return wrapped_func(*args, **kwargs)
File "/env/local/lib/python3.7/site-packages/google/api_core/retry.py", line 290, in retry_wrapped_func
on_error=on_error,
File "/env/local/lib/python3.7/site-packages/google/api_core/retry.py", line 188, in retry_target
return target()
File "/env/local/lib/python3.7/site-packages/google/api_core/grpc_helpers.py", line 69, in error_remapped_callable
six.raise_from(exceptions.from_grpc_error(exc), exc)
File "<string>", line 3, in raise_from
google.api_core.exceptions.PermissionDenied: 403 The caller does not have permission
What I've done is grant the appropriate permissions (billing.budgets.list, billing.budgets.get, etc.) at the organization level to the function's service account, but it does not work.
My code is this:
# main.py
import os
import get_budgets
from google.cloud.billing import budgets

def getting_data(data, context):
    BILLING_ACCOUNT = 'billingAccounts/XXXXXX-XXXXXX-XXXXXX'
    client = budgets.BudgetServiceClient()
    all_budgets = client.list_budgets(request={'parent': BILLING_ACCOUNT})
    # budget_list lives in get_budgets.py, the module imported above
    get_budgets.budget_list(all_budgets)

# get_budgets.py
from google.cloud.billing import budgets
from googleapiclient import discovery

# BUDGET LIST
def budget_list(all_budgets):
    print('Budget summary')
    for budget in all_budgets:
        print(f'Name: {budget.display_name}')
        b_amount = budget.amount
        if 'specified_amount' in b_amount:
            print(f'Specified Amount: {b_amount.specified_amount.units} {b_amount.specified_amount.currency_code}')
        if 'last_period_amount' in b_amount:
            print('Dynamic spend (based on last period)')
        print('')
Is there something that I've forgotten?
The solution to my problem was easier than I expected.
What I did was add the function's service account as a member of the billing account.
Here is a quick video from Google showing how to do it:
https://www.youtube.com/watch?v=Vti0OGQfLHQ
And in the comments of my question, you can find additional details.
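To make this kind of failure easier to diagnose, the call could be wrapped roughly as sketched below. It assumes the google-cloud-billing-budgets package is installed. The helper names are hypothetical, and the ID pattern is an assumption based only on the masked XXXXXX-XXXXXX-XXXXXX value shown in the question.

```python
import re

# Assumed ID shape, taken from the masked value in the question.
_BILLING_ID = re.compile(r"^[0-9A-Za-z]{6}-[0-9A-Za-z]{6}-[0-9A-Za-z]{6}$")


def billing_parent(account_id):
    """Return the 'billingAccounts/<id>' resource name used as the parent."""
    if not _BILLING_ID.match(account_id):
        raise ValueError(f"Unexpected billing account ID: {account_id!r}")
    return f"billingAccounts/{account_id}"


def list_budget_names(account_id):
    """List budget display names, with a clearer error on a 403."""
    # Imported lazily; requires google-cloud-billing-budgets.
    from google.api_core.exceptions import PermissionDenied
    from google.cloud.billing import budgets

    client = budgets.BudgetServiceClient()
    parent = billing_parent(account_id)
    try:
        return [b.display_name for b in client.list_budgets(request={"parent": parent})]
    except PermissionDenied as exc:
        raise PermissionError(
            f"The function's service account cannot list budgets on {parent}. "
            "Check that it has been added as a member of the billing account."
        ) from exc
```

The error message points at the billing account membership because that is what resolved the problem here; other causes of a 403 are possible.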

Python : upload my own files into my drive using Pydrive library

I want to upload a file to my Drive. However, in the PyDrive documentation I only found the Upload() function, which uploads a file created by drive.CreateFile() and updates it, not an existing file on my hard drive (my own file).
file1 = drive.CreateFile({'title': 'Hello.txt'})  # Create GoogleDriveFile instance with title 'Hello.txt'.
file1.SetContentString('Hello World!')  # Set content of the file from given string.
file1.Upload()
I've tried the answers to my question here on Stack Overflow, but an error occurred. Here is my code:
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
# 1st: authentication
gauth = GoogleAuth()
gauth.LocalWebserverAuth()  # Creates local webserver and auto handles authentication.
drive = GoogleDrive(gauth)
file1 = drive.CreateFile(metadata={"title": "big.txt"})
file1.SetContentFile('big.txt')
file1.Upload()
The file "big.txt" is in the same folder as my code file.
When I run it, I get this traceback:
Traceback (most recent call last):
File "C:\Users\**\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pydrive\files.py", line 369, in _FilesInsert
http=self.http)
File "C:\Users\**\AppData\Local\Programs\Python\Python36-32\lib\site-packages\oauth2client\_helpers.py", line 133, in positional_wrapper
return wrapped(*args, **kwargs)
File "C:\Users\**\AppData\Local\Programs\Python\Python36-32\lib\site-packages\googleapiclient\http.py", line 813, in execute
_, body = self.next_chunk(http=http, num_retries=num_retries)
File "C:\Users\**\AppData\Local\Programs\Python\Python36-32\lib\site-packages\oauth2client\_helpers.py", line 133, in positional_wrapper
return wrapped(*args, **kwargs)
File "C:\Users\**\AppData\Local\Programs\Python\Python36-32\lib\site-packages\googleapiclient\http.py", line 981, in next_chunk
return self._process_response(resp, content)
File "C:\Users\**\AppData\Local\Programs\Python\Python36-32\lib\site-packages\googleapiclient\http.py", line 1012, in _process_response
raise HttpError(resp, content, uri=self.uri)
googleapiclient.errors.HttpError: <HttpError 400 when requesting https://www.googleapis.com/upload/drive/v2/files?alt=json&uploadType=resumable returned "Bad Request">
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:/Users/**/AppData/Local/Programs/Python/Python36-32/quickstart.py", line 13, in <module>
file1.Upload()
File "C:\Users\**\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pydrive\files.py", line 285, in Upload
self._FilesInsert(param=param)
File "C:\Users\**\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pydrive\auth.py", line 75, in _decorated
return decoratee(self, *args, **kwargs)
File "C:\Users\**\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pydrive\files.py", line 371, in _FilesInsert
raise ApiRequestError(error)
pydrive.files.ApiRequestError: <HttpError 400 when requesting https://www.googleapis.com/upload/drive/v2/files?alt=json&uploadType=resumable returned "Bad Request">
You have to set the content with SetContentFile() instead of SetContentString():
file1 = drive.CreateFile({'title': 'Hello.txt'})
file1.SetContentFile(path_to_your_file)
file1.Upload()
As the documentation states, if you haven't set the title and mimeType, they will be set automatically from the name and type of the file you give. Therefore, if you want to upload the file with the same name it already has on your computer, you can do:
file1 = drive.CreateFile()
file1.SetContentFile(path_to_your_file)
file1.Upload()
Regarding your second point, as far as I'm aware, GDrive cannot convert a file to a different format.
Based on the documentation of PyDrive, I would say, you need to do the following:
file_path = "path/to/your/file.txt"
file1 = drive.CreateFile()
file1.SetContentFile(file_path)
file1.Upload()
Title and content type metadata are set automatically based on the provided file path. If you want to provide a different filename, pass it to CreateFile() like this:
file1 = drive.CreateFile(metadata={"title": "CustomFileName.txt"})
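The two answers above can be combined into one small helper, sketched below. It assumes drive is an already authenticated GoogleDrive object, as in the question. The helper names build_metadata and upload_local_file are hypothetical. The title is set explicitly from the file name here rather than relying on the automatic behaviour, and the local path is checked before anything is sent.

```python
import os


def build_metadata(file_path, title=None):
    """Use the given title, or fall back to the file's own name."""
    return {"title": title or os.path.basename(file_path)}


def upload_local_file(drive, file_path, title=None):
    """Upload an existing local file through a PyDrive GoogleDrive object."""
    if not os.path.isfile(file_path):
        # Fail early with a clear error instead of an opaque API response.
        raise FileNotFoundError(file_path)
    drive_file = drive.CreateFile(build_metadata(file_path, title))
    drive_file.SetContentFile(file_path)
    drive_file.Upload()
    return drive_file
```

For example, upload_local_file(drive, 'big.txt') would upload the file under its own name, and passing title='CustomFileName.txt' would rename it on Drive.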

Python Azure SDK - Unable to download zip file

I am using the latest version of the Azure Storage SDK on Python 3.5.2.
I want to download a zip file from a blob on Azure storage cloud.
My Code:
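from azure.storage.blob import BlockBlobService

self.azure_service = BlockBlobService(account_name=ACCOUNT_NAME,
                                      account_key=KEY)
# Stream the blob straight into a local file opened in binary mode.
with open(local_path, "wb+") as f:
    self.azure_service.get_blob_to_stream(blob_container,
                                          file_cloud_path,
                                          f)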
The Error:
AzureException: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check',))
The error is probably coming from the requests package, and I don't seem to have a way to change the headers or anything like that.
What exactly is the problem and how can I fix it?
To summarize, I reproduced the exception with the Microsoft Azure Storage Explorer tool.
When a zip file is uploaded with its EncodingType property set to gzip, the client checks at download time whether the content can actually be decompressed as that encoding type. If it cannot, the mismatch raises the exception below:
Traceback (most recent call last):
File "D:\Python35\lib\site-packages\urllib3\response.py", line 266, in _decode
data = self._decoder.decompress(data)
File "D:\Python35\lib\site-packages\urllib3\response.py", line 66, in decompress
return self._obj.decompress(data)
zlib.error: Error -3 while decompressing data: incorrect header check
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:\Python35\lib\site-packages\requests\models.py", line 745, in generate
for chunk in self.raw.stream(chunk_size, decode_content=True):
File "D:\Python35\lib\site-packages\urllib3\response.py", line 436, in stream
data = self.read(amt=amt, decode_content=decode_content)
File "D:\Python35\lib\site-packages\urllib3\response.py", line 408, in read
data = self._decode(data, decode_content, flush_decoder)
File "D:\Python35\lib\site-packages\urllib3\response.py", line 271, in _decode
"failed to decode it." % content_encoding, e)
urllib3.exceptions.DecodeError: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check',))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:\Python35\lib\site-packages\azure\storage\storageclient.py", line 222, in _perform_request
response = self._httpclient.perform_request(request)
File "D:\Python35\lib\site-packages\azure\storage\_http\httpclient.py", line 114, in perform_request
proxies=self.proxies)
File "D:\Python35\lib\site-packages\requests\sessions.py", line 508, in request
resp = self.send(prep, **send_kwargs)
File "D:\Python35\lib\site-packages\requests\sessions.py", line 658, in send
r.content
File "D:\Python35\lib\site-packages\requests\models.py", line 823, in content
self._content = bytes().join(self.iter_content(CONTENT_CHUNK_SIZE)) or bytes()
File "D:\Python35\lib\site-packages\requests\models.py", line 750, in generate
raise ContentDecodingError(e)
requests.exceptions.ContentDecodingError: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check',))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:/PythonWorkSpace/AzureStorage/BlobStorage/CreateContainer.py", line 20, in <module>
f)
File "D:\Python35\lib\site-packages\azure\storage\blob\baseblobservice.py", line 1932, in get_blob_to_stream
_context=operation_context)
File "D:\Python35\lib\site-packages\azure\storage\blob\baseblobservice.py", line 1659, in _get_blob
operation_context=_context)
File "D:\Python35\lib\site-packages\azure\storage\storageclient.py", line 280, in _perform_request
raise ex
File "D:\Python35\lib\site-packages\azure\storage\storageclient.py", line 252, in _perform_request
raise AzureException(ex.args[0])
azure.common.AzureException: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check',))
Process finished with exit code 1
Solution:
As @Gaurav Mantri said, you could set the EncodingType property to None, or make sure that the EncodingType setting matches the actual type of the file.
Also, you could refer to the SO thread "python making POST request with JSON data".
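The mismatch can be reproduced locally without Azure. The sketch below uses only the standard library: it builds a small zip archive in memory and then tries to decode it as gzip, which is roughly what an HTTP client does when the response is labelled content-encoding: gzip. The function names are hypothetical.

```python
import gzip
import io
import zipfile
import zlib


def make_zip_bytes():
    """Build a small zip archive in memory to stand in for the blob."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("hello.txt", "hello")
    return buffer.getvalue()


def looks_like_gzip(data):
    """gzip streams start with the magic bytes 0x1f 0x8b."""
    return data[:2] == b"\x1f\x8b"


def try_gzip_decode(data):
    """Decode data as gzip; return None on success or the error text on failure."""
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decoder = zlib.decompressobj(16 + zlib.MAX_WBITS)
    try:
        decoder.decompress(data)
        return None
    except zlib.error as exc:
        return str(exc)


zip_bytes = make_zip_bytes()
print(looks_like_gzip(zip_bytes))                # a zip archive is not gzip data
print(looks_like_gzip(gzip.compress(b"hello")))  # real gzip data passes the check
print(try_gzip_decode(zip_bytes))                # the decode attempt fails
```

A zip archive starts with the bytes PK rather than the gzip magic number, so labelling it as gzip-encoded makes the decoder reject the header.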

python and boto : 403 accessDenied

import boto
from boto.s3.connection import S3Connection
from boto.s3.connection import OrdinaryCallingFormat
conn = S3Connection(access_key, secret_key, calling_format=OrdinaryCallingFormat())
bucket = conn.get_bucket(file_name)
print(bucket.name)
And the console displays:
raise err
boto.exception.S3ResponseError: S3ResponseError: 403 Forbidden
I have seen many posts about the same problem, but I can't figure out how to solve it.
Note that I am not the owner of the bucket, but I managed to connect and download the file with a GUI tool. I need to process it with a script for automation.
EDIT:
I managed to connect, but I'm still struggling.
I'm beginning to wonder whether I should process it manually rather than automatically.
conn = S3Connection(access_key, secret_key, calling_format=OrdinaryCallingFormat())
bucket = conn.get_bucket(bucket_name, validate=False)
print('bucket:', bucket)
print('bucket.name:', bucket.name)
key = bucket.get_key(file_name)
print("key: {name}\t{size}\t{modified}".format(name=key.name, size=key.size, modified=key.last_modified))
print('bucket.list():', bucket.list(prefix='GA-Exports/Events_3112/DEV'))
for key in bucket.list(prefix='DEV/', delimiter='/'):
    print('bucket list -> key:', key)
Console:
bucket: <Bucket: GA-Exports/Events_3112/>
bucket.name: GA-Exports/Events_3112/
key: DEV/EVENTS_3113_120002892.csv.gz 3826 Sat, 16 May 2015 10:05:44 GMT
bucket.list(): <boto.s3.bucketlistresultset.BucketListResultSet object at 0x0000000004E9F7F0>
Traceback (most recent call last):
File "D:\Python\lib\xml\sax\expatreader.py", line 207, in feed
self._parser.Parse(data, isFinal)
xml.parsers.expat.ExpatError: no element found: line 1, column 0
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\Francois\OneDrive\IDE\Workspace\eclipse\Python_test\etltest.py", line 31, in <module>
for key in bucket.list(prefix='DEV/',delimiter='/'):
File "D:\Python\lib\site-packages\boto\s3\bucketlistresultset.py", line 34, in bucket_lister
encoding_type=encoding_type)
File "D:\Python\lib\site-packages\boto\s3\bucket.py", line 472, in get_all_keys
'', headers, **params)
File "D:\Python\lib\site-packages\boto\s3\bucket.py", line 406, in _get_all
xml.sax.parseString(body, h)
File "D:\Python\lib\xml\sax\__init__.py", line 46, in parseString
parser.parse(inpsrc)
File "D:\Python\lib\xml\sax\expatreader.py", line 107, in parse
xmlreader.IncrementalParser.parse(self, source)
File "D:\Python\lib\xml\sax\xmlreader.py", line 125, in parse
self.close()
File "D:\Python\lib\xml\sax\expatreader.py", line 217, in close
self.feed("", isFinal = 1)
File "D:\Python\lib\xml\sax\expatreader.py", line 211, in feed
self._err_handler.fatalError(exc)
File "D:\Python\lib\xml\sax\handler.py", line 38, in fatalError
raise exception
xml.sax._exceptions.SAXParseException: <unknown>:1:0: no element found
By default, boto will attempt to validate the bucket when you call get_bucket by performing a HEAD request on the bucket. You may not have permission to do this even though you may have permission to retrieve objects from the bucket. Try this to disable the validation step:
bucket = conn.get_bucket(bucket_name, validate=False)
Also, make sure you are passing in the name of the bucket. Your example code is passing in file_name which doesn't sound right.
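The console output in the question prints bucket.name as GA-Exports/Events_3112/, which suggests the value passed as the bucket name may also contain a key prefix. That is an assumption, not something the thread confirms. If it holds, a sketch like the one below separates the two before calling get_bucket. It assumes the legacy boto (v2) package, and the helper names are hypothetical.

```python
def split_s3_path(path):
    """Split 'bucket/some/key' into ('bucket', 'some/key')."""
    bucket_name, _, key_name = path.strip("/").partition("/")
    if not bucket_name:
        raise ValueError("path must start with a bucket name")
    return bucket_name, key_name


def fetch_key(access_key, secret_key, path):
    """Fetch a key's metadata without validating the bucket first."""
    # Imported lazily; requires the legacy boto (v2) package.
    from boto.s3.connection import OrdinaryCallingFormat, S3Connection

    bucket_name, key_name = split_s3_path(path)
    conn = S3Connection(access_key, secret_key, calling_format=OrdinaryCallingFormat())
    # validate=False skips the bucket-level check that can return 403
    # even when individual objects are readable.
    bucket = conn.get_bucket(bucket_name, validate=False)
    return bucket.get_key(key_name)
```

With the values from the question, the path would be split into the bucket GA-Exports and a key beginning with Events_3112/DEV/.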
