Unable to append data into Azure storage using Python

I am using the code below to append data to an Azure blob using Python.
from azure.storage.blob import AppendBlobService
append_blob_service = AppendBlobService(account_name='myaccount', account_key='mykey')
# The same containers can hold all types of blobs
append_blob_service.create_container('mycontainer')
# Append blobs must be created before they are appended to
append_blob_service.create_blob('mycontainer', 'myappendblob')
append_blob_service.append_blob_from_text('mycontainer', 'myappendblob', u'Hello, world!')
append_blob = append_blob_service.get_blob_to_text('mycontainer', 'myappendblob')
The above code works fine, but when I try to insert new data, the old data gets overwritten.
Is there any way I can append data to 'myappendblob'?

Considering you are calling the same code to append the data, the issue is with the following line of code:
append_blob_service.create_blob('mycontainer', 'myappendblob')
If you read the documentation for the create_blob method, you will notice the following:
Creates a blob or overrides an existing blob. Use if_none_match=* to
prevent overriding an existing blob.
So essentially you are overriding the blob every time you call your code.
You should call this method with if_none_match="*" as the documentation suggests. If the blob exists, your code will throw an exception which you will need to handle.
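For example, a minimal sketch (assuming the same legacy azure-storage SDK as above; AzureConflictHttpError is my assumption for the exception raised when the blob already exists):
from azure.common import AzureConflictHttpError
from azure.storage.blob import AppendBlobService

append_blob_service = AppendBlobService(account_name='myaccount', account_key='mykey')
try:
    # Only create the append blob if it does not exist yet.
    append_blob_service.create_blob('mycontainer', 'myappendblob', if_none_match='*')
except AzureConflictHttpError:
    # The blob already exists, so keep it and just append below.
    pass
# New text is now appended after any existing content instead of replacing it.
append_blob_service.append_blob_from_text('mycontainer', 'myappendblob', u'More data')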

Try this code, which is taken from the document below and was given by @Harsh Jain:
from azure.storage.blob import AppendBlobService

def append_data_to_blob(data):
    service = AppendBlobService(account_name="<Storage acc name>",
                                account_key="<Storage acc key>")
    try:
        # Append to the blob if it already exists.
        service.append_blob_from_text(container_name="<name of container>", blob_name="<name of file>", text=data)
    except Exception:
        # The blob does not exist yet, so create it first and then append.
        service.create_blob(container_name="<name of container>", blob_name="<name of file>")
        service.append_blob_from_text(container_name="<name of container>", blob_name="<name of file>", text=data)
    print('Data got appended')

append_data_to_blob('Hi blob')
Reference taken from:
https://www.educative.io/answers/how-to-append-data-in-blob-storage-in-azure-using-python


Cannot ingest data using snowpipe more than once

I am using the sample program from the Snowflake documentation on using Python to ingest data into the destination table.
So basically, I have to execute a PUT command to load data into the internal stage and then run the Python program to notify Snowpipe to ingest the data into the table.
This is how I create the internal stage and pipe:
create or replace stage exampledb.dbschema.example_stage;
create or replace pipe exampledb.dbschema.example_pipe
as copy into exampledb.dbschema.example_table
from
(
select
t.*
from
@exampledb.dbschema.example_stage t
)
file_format = (TYPE = CSV) ON_ERROR = SKIP_FILE;
PUT command:
put file://E:\\example\\data\\a.csv @exampledb.dbschema.example_stage OVERWRITE = TRUE;
This is the sample program I use:
from logging import getLogger
from snowflake.ingest import SimpleIngestManager
from snowflake.ingest import StagedFile
from snowflake.ingest.utils.uris import DEFAULT_SCHEME
from datetime import timedelta
from requests import HTTPError
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.serialization import load_pem_private_key
from cryptography.hazmat.backends import default_backend
from cryptography.hazmat.primitives.serialization import Encoding
from cryptography.hazmat.primitives.serialization import PrivateFormat
from cryptography.hazmat.primitives.serialization import NoEncryption
import time
import datetime
import os
import logging
logging.basicConfig(
    filename='/tmp/ingest.log',
    level=logging.DEBUG)
logger = getLogger(__name__)
# If you generated an encrypted private key, implement this method to return
# the passphrase for decrypting your private key.
def get_private_key_passphrase():
    return '<private_key_passphrase>'

with open("E:\\ssh\\rsa_key.p8", 'rb') as pem_in:
    pemlines = pem_in.read()
    private_key_obj = load_pem_private_key(pemlines,
                                           get_private_key_passphrase().encode(),
                                           default_backend())

private_key_text = private_key_obj.private_bytes(
    Encoding.PEM, PrivateFormat.PKCS8, NoEncryption()).decode('utf-8')
# Assume the public key has been registered in Snowflake:
# private key in PEM format
# List of files in the stage specified in the pipe definition
file_list=['a.csv.gz']
ingest_manager = SimpleIngestManager(account='<account_identifier>',
                                     host='<account_identifier>.snowflakecomputing.com',
                                     user='<user_login_name>',
                                     pipe='exampledb.dbschema.example_pipe',
                                     private_key=private_key_text)
# List of files, but wrapped into a class
staged_file_list = []
for file_name in file_list:
    staged_file_list.append(StagedFile(file_name, None))

try:
    resp = ingest_manager.ingest_files(staged_file_list)
except HTTPError as e:
    # HTTP error, may need to retry
    logger.error(e)
    exit(1)

# This means Snowflake has received file and will start loading
assert(resp['responseCode'] == 'SUCCESS')

# Needs to wait for a while to get result in history
while True:
    history_resp = ingest_manager.get_history()
    if len(history_resp['files']) > 0:
        print('Ingest Report:\n')
        print(history_resp)
        break
    else:
        # wait for 20 seconds
        time.sleep(20)

hour = timedelta(hours=1)
date = datetime.datetime.utcnow() - hour
history_range_resp = ingest_manager.get_history_range(date.isoformat() + 'Z')
print('\nHistory scan report: \n')
print(history_range_resp)
After running the program, I just need to remove the file in the internal stage:
REMOVE @exampledb.dbschema.example_stage;
The code works as expected the first time, but when I truncate the data in that table and run the code again, the table in Snowflake doesn't have any data in it.
Am I missing something here? How can I make this code run multiple times?
Update:
I found that if I use a file with a different name each time I run the code, the data gets loaded into the Snowflake table.
So how can I run this code without changing the data file name?
Snowflake uses file loading metadata to prevent reloading the same files (and duplicating data) in a table. Snowpipe prevents loading files with the same name even if they were later modified (i.e. have a different eTag).
The file loading metadata is associated with the pipe object rather than the table. As a result:
Staged files with the same name as files that were already loaded are ignored, even if they have been modified, e.g. if new rows were added or errors in the file were corrected.
Truncating the table using the TRUNCATE TABLE command does not delete the Snowpipe file loading metadata.
However, note that pipes only maintain the load history metadata for 14 days. Therefore:
Files modified and staged again within 14 days:
Snowpipe ignores modified files that are staged again. To reload modified data files, it is currently necessary to recreate the pipe object using the CREATE OR REPLACE PIPE syntax.
Files modified and staged again after 14 days:
Snowpipe loads the data again, potentially resulting in duplicate records in the target table.
For more information, have a look here.
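As a rough illustration of that workaround, the sketch below recreates the pipe from the question before a new load. It assumes the snowflake-connector-python package (not used in the original post) and password authentication; the account, user and password values are placeholders:
import snowflake.connector

# Recreating the pipe clears its Snowpipe load history, so a file with the
# same name can be ingested again. All connection values are placeholders.
conn = snowflake.connector.connect(account='<account_identifier>',
                                   user='<user_login_name>',
                                   password='<password>')
try:
    conn.cursor().execute("""
        create or replace pipe exampledb.dbschema.example_pipe
        as copy into exampledb.dbschema.example_table
        from (select t.* from @exampledb.dbschema.example_stage t)
        file_format = (TYPE = CSV) ON_ERROR = SKIP_FILE
    """)
finally:
    conn.close()
After recreating the pipe, the PUT command and the ingest program from the question can be run again with the same file name.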

Get Properties of storage blobs returning empty dict

I've just uploaded a 5GB of data and would like to verify that the MD5 sums match. I've calculated this for my local copy of the files, but am having problems fetching ContentMD5 from Azure. So far, I get an empty dict, but I can see the blob names. I've limited it to the first 10 items at the moment, just for debugging. I'm aware that MD5 is different on Azure from a typical md5sum call and have allowed for that locally. But, currently, I cannot see any blob properties. The properties are there when I browse via the Azure console (as is the ContentMD5 property).
Where am I going wrong?
Here's my code at the moment:
import os
from os import sys
from azure.storage.blob import BlobServiceClient

def remote_check(connection_str):
    blob_service_client = BlobServiceClient.from_connection_string(connection_str)
    container_name = "global"
    container = blob_service_client.get_container_client(container=container_name)
    blob_list = container.list_blobs()
    count = 0
    for blob in blob_list:
        if count < 10:
            blob_client = blob_service_client.get_blob_client(container=container_name, blob=blob)
            a = blob_client.get_blob_properties()
            print(a.metadata)
            print("Blob name: " + str(blob_client.blob_name))
            count = count + 1
        else:
            break

def main():
    try:
        CONNECTION_STRING = os.environ['AZURE_STORAGE_CONNECTION_STRING']
        remote_check(CONNECTION_STRING)
    except KeyError:
        print("AZURE_STORAGE_CONNECTION_STRING must be set.")
        sys.exit(1)

if __name__ == '__main__':
    main()
Please make sure you're using the latest version of the azure-storage-blob package (12.6.0 at the time of writing).
Some properties are in content_settings; for example, to get content_md5, you should use the following code:
a=blob_client.get_blob_properties()
print(a.content_settings.content_md5)
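As a rough sketch of how you might then verify the upload (the helper name and the arguments are mine, not from the question; content_md5 comes back as raw bytes, while the portal shows it base64-encoded):
import base64
import hashlib
from azure.storage.blob import BlobServiceClient

def md5_matches(connection_str, container_name, blob_name, local_path):
    # Fetch the stored Content-MD5 property of the blob (None if it was never set).
    service = BlobServiceClient.from_connection_string(connection_str)
    props = service.get_blob_client(container=container_name, blob=blob_name).get_blob_properties()
    remote_md5 = props.content_settings.content_md5

    # Compute the MD5 of the local copy for comparison.
    with open(local_path, 'rb') as f:
        local_md5 = hashlib.md5(f.read()).digest()

    print("remote:", base64.b64encode(remote_md5).decode() if remote_md5 else None)
    print("local :", base64.b64encode(local_md5).decode())
    return remote_md5 is not None and bytes(remote_md5) == local_md5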
Maybe you can check the blob properties with a REST call (e.g. with a REST client like Postman) as described here:
https://learn.microsoft.com/en-us/rest/api/storageservices/get-blob-properties
The "Content-MD5" is returned as HTTP-Response Header.

Get elasticloadbalancers names with boto3

When I try to print the load balancers from AWS I get a huge dictionary with a lot of keys, but when I try to print only the 'LoadBalancerName' value I get None. I want to print all the load balancer names in our environment. How can I do it? Thanks!
What I tried:
import boto3
client = boto3.client('elbv2')
elb = client.describe_load_balancers()
Name = elb.get('LoadBalancerName')
print(Name)
The way in which you were handling the response object was incorrect, and you'll need to put it in a loop if you want all the names and not just one. What you'll need is this:
import boto3
client = boto3.client('elbv2')
elb = client.describe_load_balancers()
for i in elb['LoadBalancers']:
    print(i['LoadBalancerArn'])
    print(i['LoadBalancerName'])
However, if you're still getting None as a value, it would be worth double-checking which region the load balancers are in, as well as whether you need to use a profile.
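For example, a minimal sketch of pinning both explicitly (the profile and region names are placeholders, not values from the question):
import boto3

# Both profile_name and region_name are placeholders; adjust to your environment.
session = boto3.Session(profile_name='my-profile', region_name='eu-west-1')
client = session.client('elbv2')

for lb in client.describe_load_balancers()['LoadBalancers']:
    print(lb['LoadBalancerName'])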

How to convert suds object to xml string

This is a duplicate of this question:
How to convert suds object to xml
But the question has not been answered: "totxt" is not an attribute on the Client class.
Unfortunately I lack the reputation to add comments. So I ask again:
Is there a way to convert a suds object to its xml?
I ask this because I already have a system that consumes wsdl files and sends data to a webservice. But now the customers want to alternatively store the XML as files (to import them later manually). So all I need are 2 methods for writing data: One writes to a webservice (implemented and tested), the other (not implemented yet) writes to files.
If only I could make something like this:
xml_as_string = My_suds_object.to_xml()
The following code is just an example and does not run. And it's not elegant. Doesn't matter. I hope you get the idea of what I want to achieve:
I have the function "write_customer_obj_webservice" that works. Now I want to write the function "write_customer_obj_xml_file".
import suds

def get_customer_obj():
    wsdl_url = r'file:C:/somepathhere/Customer.wsdl'
    service_url = r'http://someiphere/Customer'
    c = suds.client.Client(wsdl_url, location=service_url)
    customer = c.factory.create("ns0:CustomerType")
    return customer

def write_customer_obj_webservice(customer):
    wsdl_url = r'file:C:/somepathhere/Customer.wsdl'
    service_url = r'http://someiphere/Customer'
    c = suds.client.Client(wsdl_url, location=service_url)
    response = c.service.save(someparameters, None, None, customer)
    return response

def write_customer_obj_xml_file(customer):
    output_filename = r'C\temp\testxml'
    # The following line is the problem. "to_xml" does not exist and I can't find a way to do it.
    xml = customer.to_xml()
    fo = open(output_filename, 'a')
    try:
        fo.write(xml)
    except:
        raise
    else:
        response = 'All ok'
    finally:
        fo.close()
    return response

# Get the customer object always from the wsdl.
customer = get_customer_obj()
# Since customer is an object, setting its attributes is very easy. There are very complex objects in this system.
customer.name = "Doe J."
customer.age = 42
# Write the new customer to a webservice or store it in a file for later processing
if later_processing:
    response = write_customer_obj_xml_file(customer)
else:
    response = write_customer_obj_webservice(customer)
I found a way that works for me. The trick is to create the Client with the option "nosend=True".
In the documentation it says:
nosend - Create the soap envelope but don't send. When specified, method invocation returns a RequestContext instead of sending it.
The RequestContext object has the attribute envelope. This is the XML as string.
Some pseudo code to illustrate:
c = suds.client.Client(url, nosend=True)
customer = c.factory.create("ns0:CustomerType")
customer.name = "Doe J."
customer.age = 42
response = c.service.save(someparameters, None, None, customer)
print response.envelope # This prints the XML string that would have been sent.
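Applied to the question's write_customer_obj_xml_file, a rough sketch could look like this (the WSDL path, service URL and someparameters are the placeholders from the question; the output path is mine):
import suds.client

def write_customer_obj_xml_file(customer, output_filename=r'C:\temp\customer.xml'):
    wsdl_url = r'file:C:/somepathhere/Customer.wsdl'
    service_url = r'http://someiphere/Customer'
    # nosend=True builds the SOAP envelope but never calls the webservice.
    c = suds.client.Client(wsdl_url, location=service_url, nosend=True)
    request_context = c.service.save(someparameters, None, None, customer)
    with open(output_filename, 'w') as fo:
        fo.write(request_context.envelope)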
You have some issues in the write_customer_obj_xml_file function:
Fix bad path:
output_filename = r'C:\temp\test.xml'
The following line is the problem. "to_xml" does not exist and I can't find a way to do it.
What's the type of customer? type(customer)?
xml = customer.to_xml() # to be continued...
Why mode='a'? ('a' => append, 'w' => create + write)
Use a with statement (file context manager).
with open(output_filename, 'w') as fo:
    fo.write(xml)
You don't need to return a response string: use an exception handler. The exception to catch can be EnvironmentError.
Analyse
The following call:
customer = c.factory.create("ns0:CustomerType")
This constructs a CustomerType on the fly and returns a CustomerType instance, customer.
I think you can introspect your customer object; try the following:
vars(customer) # display the object attributes
help(customer) # display an extensive help about your instance
Another way is to try the WSDL URLs by hand and see the XML results.
You may obtain the full description of your CustomerType object.
And then?
Then, with the attributes list, you can create your own XML. Use an XML template and fill it with the object attributes.
You may also find a magic function (to_xml) that does the job for you. But I'm not sure the XML format would match your needs.
client = Client(url)
client.factory.create('somename')
# The last XML request by client
client.last_sent()
# The last XML response from Web Service
client.last_received()
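If you need the XML as a plain string (for example to write it to a file), the objects returned by last_sent() and last_received() can, as far as I know, simply be passed to str():
# Convert the last request to a string and store it for later processing.
xml_as_string = str(client.last_sent())
with open(r'C:\temp\last_request.xml', 'w') as fo:
    fo.write(xml_as_string)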

Softlayer Object Storage Python API Time To Live

How do I set time to live for a file on object storage?
Looking at the code in https://github.com/softlayer/softlayer-object-storage-python/blob/master/object_storage/storage_object.py it takes in (self, data, check_md5) with no TTL option.
import object_storage  # from the softlayer-object-storage package

sl_storage = object_storage.get_client(
    username = environment['slos_username'],
    password = environment['api_key'],
    auth_url = environment['auth_url']
)
# get container
sl_container = sl_storage.get_container(environment['object_container'])
# create "pointer" to cointainer file fabfile.zip
sl_file = sl_container[filename]
myzip = open(foldername + filename, 'rb')
sl_file.create()
sl_file.send(myzip, TIME_TO_LIVE_PARAM=100)
I also tried according to https://github.com/softlayer/softlayer-object-storage-python/blob/master/object_storage/container.py
sl_file['ttl'] = timetolive
But it doesn't work.
Thanks!
You need to make sure that the "ttl" is available in the headers. The "TTL" header is available when your container has CDN enabled.
So, to verify that the ttl header exists, you can use this line of code:
sl_storage['myContainserName']['MyFileName'].headers
Then you can update the ttl using this line of code:
sl_storage['myContainserName']['MyFileName'].update({'x-cdn-ttl':'3600'})
In case the ttl value does not exist and you have the CDN enabled, try to create the header using this line of code:
sl_storage['myContainserName']['MyFileName'].create({'x-cdn-ttl':'3600'})
Regards
You need to set the header "X-Delete-At: 1417341600", where 1417341600 is a Unix timestamp. See more information here: http://docs.openstack.org/developer/swift/overview_expiring_objects.html
Using the Python client, you can use the update method:
https://github.com/softlayer/softlayer-object-storage-python/blob/master/object_storage/storage_object.py#L210-L216
sl_storage['myContainserName']['MyFileName'].update({'X-Delete-At':1417341600})
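If you would rather express it as a time-to-live than a fixed date, a small sketch (reusing the placeholder container and file names from above) is to compute the timestamp from the current time:
import time

ttl_seconds = 3600  # keep the object for one hour
delete_at = int(time.time()) + ttl_seconds
sl_storage['myContainserName']['MyFileName'].update({'X-Delete-At': str(delete_at)})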
Regards
