I am trying to update an atomic counter with Python Boto 2.3.0, but I can find no documentation for the operation.
It seems there is no direct interface, so I tried to go to "raw" updates using the layer1 interface, but I was unable to complete even a simple update.
I tried the following variations, all with no luck:
dynoConn.update_item(INFLUENCER_DATA_TABLE,
{'HashKeyElement': "9f08b4f5-d25a-4950-a948-0381c34aed1c"},
{'new': {'Value': {'N':"1"}, 'Action': "ADD"}})
dynoConn.update_item('influencer_data',
{'HashKeyElement': "9f08b4f5-d25a-4950-a948-0381c34aed1c"},
{'new': {'S' :'hello'}})
dynoConn.update_item("influencer_data",
{"HashKeyElement": "9f08b4f5-d25a-4950-a948-0381c34aed1c"},
{"AttributesToPut" : {"new": {"S" :"hello"}}})
They all produce the same error:
File "/usr/local/lib/python2.6/dist-packages/boto-2.3.0-py2.6.egg/boto/dynamodb/layer1.py", line 164, in _retry_handler
data)
boto.exception.DynamoDBResponseError: DynamoDBResponseError: 400 Bad Request
{u'Message': u'Expected null', u'__type': u'com.amazon.coral.service#SerializationException'}
I also investigated the API docs here but they were pretty spartan.
I have done a lot of searching and fiddling, and the only thing I have left is to use the PHP API and dive into the code to find where it "formats" the JSON body, but that is a bit of a pain. Please save me from that pain!
Sorry, I misunderstood what you were looking for. You can accomplish this via layer2, although there is a small bug that needs to be addressed. Here's some layer2 code:
>>> import boto
>>> c = boto.connect_dynamodb()
>>> t = c.get_table('counter')
>>> item = t.get_item('counter')
>>> item
{u'id': 'counter', u'n': 1}
>>> item.add_attribute('n', 20)
>>> item.save()
{u'ConsumedCapacityUnits': 1.0}
>>> item # Here's the bug, local Item is not updated
{u'id': 'counter', u'n': 1}
>>> item = t.get_item('counter') # Refetch item just to verify change occurred
>>> item
{u'id': 'counter', u'n': 21}
This results in the same over-the-wire request as you are performing in your Layer1 code, as shown by the following debug output.
2012-04-27 04:17:59,170 foo [DEBUG]:StringToSign:
POST
/
host:dynamodb.us-east-1.amazonaws.com
x-amz-date:Fri, 27 Apr 2012 11:17:59 GMT
x-amz-security-token:<removed>
x-amz-target:DynamoDB_20111205.UpdateItem
{"AttributeUpdates": {"n": {"Action": "ADD", "Value": {"N": "20"}}}, "TableName": "counter", "Key": {"HashKeyElement": {"S": "counter"}}}
If you want to avoid the initial GetItem call, you could do this instead:
>>> import boto
>>> c = boto.connect_dynamodb()
>>> t = c.get_table('counter')
>>> item = t.new_item('counter')
>>> item.add_attribute('n', 20)
>>> item.save()
{u'ConsumedCapacityUnits': 1.0}
Which will update the item if it already exists or create it if it doesn't yet exist.
For those looking for the answer: I have found it.
First, an IMPORTANT NOTE: I am currently unaware of why this is necessary, BUT for the moment, to get a layer1 instance I have had to do the following:
import boto

AWS_ACCESS_KEY = 'XXXXX'
AWS_SECRET_KEY = 'YYYYY'

# connect_dynamodb likely imports boto.dynamodb.layer1 as a side effect,
# which would explain why it must come first.
dynoConn = boto.connect_dynamodb(AWS_ACCESS_KEY, AWS_SECRET_KEY)
dynoConnLayer1 = boto.dynamodb.layer1.Layer1(AWS_ACCESS_KEY, AWS_SECRET_KEY)
Essentially instantiating a layer2 FIRST and THEN a layer1.
Maybe I'm doing something stupid, but at this point I'm just happy to have it working...
I'll sort the details later. THEN, to actually do the atomic update call:
dynoConnLayer1.update_item("influencer_data",
    {"HashKeyElement": {"S": "9f08b4f5-d25a-4950-a948-0381c34aed1c"}},
    {"direct_influence":
        {"Action": "ADD", "Value": {"N": "20"}}
    }
)
Note that in the example above, DynamoDB will ADD 20 to whatever the current value is, and this operation will be atomic, meaning other operations happening at the "same time" will be correctly "scheduled" to happen either after the new value has been established or before this operation is executed. Either way, the desired effect will be accomplished.
Be certain to do this on the layer1 connection instance, as layer2 will throw errors, since it expects a different set of parameter types.
That's all there is to it! Just so folks know, I figured this out using the PHP SDK. It takes a very short time to install and set up, and when you make a call, the debug data will actually show you the format of the HTTP request body, so you will be able to model your layer1 parameters after the example. Here is the code I used to do the atomic update in PHP:
<?php
// Instantiate the class
$dynamodb = new AmazonDynamoDB();

$update_response = $dynamodb->update_item(array(
    'TableName' => 'influencer_data',
    'Key' => array(
        'HashKeyElement' => array(
            AmazonDynamoDB::TYPE_STRING => '9f08b4f5-d25a-4950-a948-0381c34aed1c'
        )
    ),
    'AttributeUpdates' => array(
        'direct_influence' => array(
            'Action' => AmazonDynamoDB::ACTION_ADD,
            'Value' => array(
                AmazonDynamoDB::TYPE_NUMBER => '20'
            )
        )
    )
));

// status code 200 indicates success
print_r($update_response);
?>
Hopefully this will help others until the Boto layer2 interface catches up... or someone simply figures out how to do it in layer2 :-)
I'm not sure this is truly an atomic counter, since when you increment the value by 1, another call could increment the number by 1, so that when you "get" the value, it is not the value that you would expect.
For instance, taking the code by garnaat (which is marked as the accepted answer), I see that when you put it in a thread, it does not work:
import logging
import os
import threading

import boto.dynamodb

class ThreadClass(threading.Thread):
    def run(self):
        conn = boto.dynamodb.connect_to_region(aws_access_key_id=os.environ['AWS_ACCESS_KEY'],
                                               aws_secret_access_key=os.environ['AWS_SECRET_KEY'],
                                               region_name='us-east-1')
        t = conn.get_table('zoo_keeper_ids')
        item = t.new_item('counter')
        item.add_attribute('n', 1)
        r = item.save()  #- Item has been atomically updated!
        # Uh-Oh! The value may have changed by the time "get_item" is called!
        item = t.get_item('counter')
        self.counter = item['n']
        logging.critical('Thread has counter: ' + str(self.counter))
tcount = 3
threads = []
for i in range(tcount):
    threads.append(ThreadClass())

# Start running the threads:
for t in threads:
    t.start()

# Wait for all threads to complete:
for t in threads:
    t.join()

#- Now verify all threads have unique numbers:
results = set()
for t in threads:
    results.add(t.counter)

print len(results)
print tcount
if len(results) != tcount:
    print '***Error: All threads do not have unique values!'
else:
    print 'Success! All threads have unique values!'
Note: If you want this to truly work, change the code to this:
def run(self):
    conn = boto.dynamodb.connect_to_region(aws_access_key_id=os.environ['AWS_ACCESS_KEY'],
                                           aws_secret_access_key=os.environ['AWS_SECRET_KEY'],
                                           region_name='us-east-1')
    t = conn.get_table('zoo_keeper_ids')
    item = t.new_item('counter')
    item.add_attribute('n', 1)
    r = item.save(return_values='ALL_NEW')  #- Item has been atomically updated, and you have the correct value without having to do a "get"!
    self.counter = str(r['Attributes']['n'])
    logging.critical('Thread has counter: ' + str(self.counter))
Hope this helps!
There is no high-level function in DynamoDB for atomic counters. However, you can implement an atomic counter using the conditional write feature. For example, let's say you have a table with a string hash key, created like this:
>>> import boto
>>> c = boto.connect_dynamodb()
>>> schema = c.create_schema('id', 's')
>>> counter_table = c.create_table('counter', schema, 5, 5)
You now write an item to that table that includes an attribute called 'n' whose value is zero.
>>> n = 0
>>> item = counter_table.new_item('counter', {'n': n})
>>> item.put()
Now, if I want to update the value of my counter, I would perform a conditional write operation that will bump the value of 'n' to 1 if and only if its current value agrees with my idea of its current value.
>>> n += 1
>>> item['n'] = n
>>> item.put(expected_value={'n': n-1})
This will set the value of 'n' in the item to 1, but only if the current value in DynamoDB is zero. If the value was already incremented by someone else, the write would fail and I would then need to increment my local counter and try again.
This is kind of complicated but all of this could be wrapped up in some code to make it much simpler to use. I did a similar thing for SimpleDB that you can find here:
http://www.elastician.com/2010/02/stupid-boto-tricks-2-reliable-counters.html
I should probably try to update that example to use DynamoDB.
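In the meantime, here is a minimal sketch of what such a wrapper could look like with the layer2 objects used above. This is my own illustration, not a library API; the broad exception handling is deliberate, since the exact conditional-check error class varies across boto versions.

import boto
from boto.exception import DynamoDBResponseError

def atomic_increment(counter_table, key='counter'):
    # Keep retrying the conditional write until no other writer beats us to it.
    while True:
        item = counter_table.get_item(key)
        n = item['n']
        item['n'] = n + 1
        try:
            # Succeeds only if 'n' is still what we read a moment ago.
            item.put(expected_value={'n': n})
            return n + 1
        except DynamoDBResponseError:
            # The conditional check failed; someone else won the race, retry.
            continue

c = boto.connect_dynamodb()
print atomic_increment(c.get_table('counter'))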
If you want to increment a value in DynamoDB, you can achieve this with boto3:
import boto3
import json
import decimal

# Helper for JSON-serializing the Decimal values DynamoDB returns.
class DecimalEncoder(json.JSONEncoder):
    def default(self, o):
        if isinstance(o, decimal.Decimal):
            if o % 1 > 0:
                return float(o)
            else:
                return int(o)
        return super(DecimalEncoder, self).default(o)

ddb = boto3.resource('dynamodb')

def get_counter():
    table = ddb.Table('TableName')  # replace with your counter table's name
    try:
        response = table.update_item(
            Key={
                'haskey': 'counterName'
            },
            UpdateExpression="set currentValue = currentValue + :val",
            ExpressionAttributeValues={
                ':val': decimal.Decimal(1)
            },
            ReturnValues="UPDATED_NEW"
        )
        print("UpdateItem succeeded:")
    except Exception as e:
        raise e
    print(response["Attributes"]["currentValue"])
This implementation needs an extra counter table that just keeps the last used value for you.
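If the counter item might not exist yet, a small variation (again a sketch, reusing the same hypothetical table and key names) can seed it in the same call with DynamoDB's if_not_exists update-expression function:

response = table.update_item(
    Key={'haskey': 'counterName'},
    # if_not_exists falls back to :zero when the attribute is absent,
    # so the very first call creates the counter instead of failing.
    UpdateExpression="SET currentValue = if_not_exists(currentValue, :zero) + :val",
    ExpressionAttributeValues={':val': 1, ':zero': 0},
    ReturnValues="UPDATED_NEW"
)
print(response["Attributes"]["currentValue"])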
I'm trying to write unittests for my own Elasticsearch client. It uses the client from elasticsearch-py.
Most of my tests are fine, but when running a test on my own search() function (which uses the search() function from Elasticsearch client) I get very random behaviour. This is the way my test is implemented:
def setUp(self) -> None:
    self.es = ESClient(host="localhost")
    self.es_acc = ESClient()
    self.connection_res = (False, {})
    self.t = self.es_acc.get_connection_status(self._callback)
    self.t.join()

    # Create test index and index some documents
    self.es.create_index(self.TEST_INDEX)
    names = ["Gregor", "Alice", "Per Svensson", "Mats Hermelin", "Mamma Mia",
             "Eva Dahlgren", "Per Morberg", "Maja Larsson", "Ola Salo",
             "Magrecievic Holagrostokovic"]
    self.num_docs = len(names)
    self.payload = []
    random.seed(123)
    for i, name in enumerate(names):
        n = name.split(" ")
        fname = n[0]
        lname = n[1] if len(n) > 1 else n[0]
        self.payload.append({"name": {"first": fname, "last": lname},
                             "age": random.randint(-100, 100),
                             "timestamp": datetime.utcnow() - timedelta(days=1 * i)})
    self.es.upload(self.TEST_INDEX, self.payload, ids=list(range(len(names))))
def test_search(self):
    # Test getting docs based on ids
    ids = ["1", "4", "9"]
    status, hits = self.es.search(self.TEST_INDEX, ids=ids)  # Breakpoint
    docs = hits["hits"]["hits"]
    self.assertTrue(status, "Status not correct for search!")
    returned_ids = [d["_id"] for d in docs]
    names = [d["_source"]["name"] for d in docs]
    self.assertListEqual(sorted(returned_ids), ids, "Returned ids from search not correct!")
    self.assertListEqual(names, [self.payload[i]["name"] for i in [1, 4, 9]],
                         "Returned source from search not correct!")
In setUp() I'm just uploading a few documents, so there should always be 10 documents to test on. Below is an excerpt from my search() function:
if ids:
    try:
        q = Query().ids(ids).compile_and_get()
        res = self.es.search(index=index, body=q)
        print(res)
        return True, res
    except exceptions.ElasticsearchException as e:
        self._handle_elastic_exceptions("search", e, index=index)
        return False, {}
I've implemented Query myself. Anyway, when I just run the test, I ALMOST always get 0 hits. But if I debug the application, with a breakpoint in test_search() on the line where I make the call to search(), and step through, everything works fine. If I put the breakpoint just one line below, I get 0 hits again. What is going on? Why is it not blocking correctly?
It seems like I found my solution!
I did not understand that setUp was called for every test method. This was actually not the problem, however.
The problem is that for some tests, uploading documents (which was done in setUp) simply took too much time, and so when the test started, the documents did not exist yet! Solution: add sleep(1) to the end of setUp.
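A minimal sketch of that fix (the upload call is the one from the setUp above):

from time import sleep

def setUp(self) -> None:
    # ... create the test index and build self.payload as above ...
    self.es.upload(self.TEST_INDEX, self.payload, ids=list(range(len(names))))
    sleep(1)  # give Elasticsearch time to make the new documents searchable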
def enumerate_all_config_objects(baseDN):
    url = 'https://www.AwebsiteThatIWontProvide.com'
    payload = {"ObjectDN": baseDN, "Pattern": "*aws*", "Pattern": "*jkb*"}
    r = requests.post(url, verify='/PathToKeyForVerification/', headers=headers, data=json.dumps(payload))
    response = r.json()
    status = r.status_code
    print " "
    print "Returned status code:", status
    print " "
    return response['Objects'][0]['GUID']
Output:
Returned status code: 200
{11871545-8c5b-4c3c-9609-7372fae1add5}
Process finished with exit code 0
I am trying to return ONLY the "GUID" information from a JSON request. This works (the 11871545...); as I enter values into the index between ['Objects'] and ['GUID'], each value is successfully produced from the list. My problem is, even though I am printing out the actual response to verify the output is correct, the final script should not require anything to be dumped to a CSV file. I have to perform everything in memory. The next function that I need to create will use the returned GUID values and query the server with those values.
How do I get the items in the list to display from the output of enumerate_all_config_objects? I would like to print them to troubleshoot initially. Then I will comment out this feature and have the second function pull each value from that list and use it.
Two problems:
Print out the list which will always have an unknown number of entries.
Create another function to reference / use the values from that list.
The list is populated correctly, I've verified this. I just don't know how to access it or print it.
If I understood correctly, you are looking to do something like:
def enumerate_all_config_objects(baseDN):
    url = 'https://www.AwebsiteThatIWontProvide.com'
    payload = {"ObjectDN": baseDN, "Pattern": "*aws*", "Pattern": "*jkb*"}
    r = requests.post(url, verify='/PathToKeyForVerification/', headers=headers, data=json.dumps(payload))
    response = r.json()
    status = r.status_code
    return map(lambda x: x["GUID"], response['Objects'])

def use_guids(guid_list):
    # do_stuff, for example, to show the guids:
    for guid in guid_list:
        print(guid)

use_guids(enumerate_all_config_objects(baseDN=<replaceWithYourParameter>))
Edit: To clear up the questions from your comment, I decided to mock the call to the API, which you said already works:
def enumerate_all_config_objects():
    foo = {"GUID": 1}
    bar = {"GUID": 2}
    baz = {"GUID": 3}
    response = {"Objects": [foo, bar, baz]}
    mapped = list(map(lambda x: x["GUID"], response['Objects']))
    return map(lambda x: x["GUID"], response['Objects'])

def use_guids(guid_list):
    # do_stuff, for example, to show the guids:
    for guid in guid_list:
        print(guid)

use_guids(enumerate_all_config_objects())
prints out
1
2
3
When you want to use the value computed by a function, you need to use the return keyword.
For example, return map(lambda x: x["GUID"], response['Objects']) in this new example returns a map object containing [1, 2, 3]. This return value can then be used, as in my first example, by passing it to another function.
In the example just above, the list is passed to the use_guids function, which prints the contents of the list.
Edit 2: If you really insist on calling a function that handles one GUID at a time, you can do it this way:
def enumerate_all_config_objects(baseDN):
    url = 'https://www.AwebsiteThatIWontProvide.com'
    payload = {"ObjectDN": baseDN, "Pattern": "*aws*", "Pattern": "*jkb*"}
    r = requests.post(url, verify='/PathToKeyForVerification/', headers=headers, data=json.dumps(payload))
    response = r.json()
    status = r.status_code
    for obj in response['Objects']:
        use_guid(obj["GUID"])

def use_guid(guid):
    print(guid)
    # Do some other stuff.
    # ...

enumerate_all_config_objects(baseDN=<replaceWithYourParameter>)
I was trying to fetch Auto Scaling groups with the Application tag value 'CCC'.
The list (Application tag value, ASG name) is as below:

gweb    prd-dcc-eap-w2
gweb    prd-dcc-emc
gweb    prd-dcc-ems
CCC     dev-ccc-wer
CCC     dev-ccc-gbg
CCC     dev-ccc-wer
The script I coded below gives output which includes one ASG without the CCC tag.
#!/usr/bin/python
import boto3

client = boto3.client('autoscaling', region_name='us-west-2')
response = client.describe_auto_scaling_groups()
ccc_asg = []
all_asg = response['AutoScalingGroups']
for i in range(len(all_asg)):
    all_tags = all_asg[i]['Tags']
    for j in range(len(all_tags)):
        if all_tags[j]['Key'] == 'Name':
            asg_name = all_tags[j]['Value']
            # print asg_name
        if all_tags[j]['Key'] == 'Application':
            app = all_tags[j]['Value']
            # print app
        if all_tags[j]['Value'] == 'CCC':
            ccc_asg.append(asg_name)
print ccc_asg
The output which I am getting is as below:
['prd-dcc-ein-w2', 'dev-ccc-hap', 'dev-ccc-wfd', 'dev-ccc-sdf']
Whereas 'prd-dcc-ein-w2' is an ASG with a different tag, 'gweb', and the last one (dev-ccc-msp-agt-asg) in the CCC ASG list is missing. I need output as below:
dev-ccc-hap-sdf
dev-ccc-hap-gfh
dev-ccc-hap-tyu
dev-ccc-mso-hjk
Am I missing something?
In boto3 you can use Paginators with JMESPath filtering to do this very effectively and in a more concise way.
From boto3 docs:
JMESPath is a query language for JSON that can be used directly on
paginated results. You can filter results client-side using JMESPath
expressions that are applied to each page of results through the
search method of a PageIterator.
When filtering with JMESPath expressions, each page of results that is
yielded by the paginator is mapped through the JMESPath expression. If
a JMESPath expression returns a single value that is not an array,
that value is yielded directly. If the result of applying the JMESPath
expression to a page of results is a list, then each value of the list
is yielded individually (essentially implementing a flat map).
Here is how it looks in Python code, with the mentioned CCC value for the Application tag of the Auto Scaling group:
import boto3

client = boto3.client('autoscaling')
paginator = client.get_paginator('describe_auto_scaling_groups')
page_iterator = paginator.paginate(
    PaginationConfig={'PageSize': 100}
)
filtered_asgs = page_iterator.search(
    'AutoScalingGroups[] | [?contains(Tags[?Key==`{}`].Value, `{}`)]'.format(
        'Application', 'CCC')
)
for asg in filtered_asgs:
    print asg['AutoScalingGroupName']
Elaborating on Michal Gasek's answer, here's an option that filters ASGs based on a dict of tag:value pairs.
def get_asg_name_from_tags(tags):
    asg_name = None
    client = boto3.client('autoscaling')
    while True:
        paginator = client.get_paginator('describe_auto_scaling_groups')
        page_iterator = paginator.paginate(
            PaginationConfig={'PageSize': 100}
        )
        filter = 'AutoScalingGroups[]'
        for tag in tags:
            filter = ('{} | [?contains(Tags[?Key==`{}`].Value, `{}`)]'
                      .format(filter, tag, tags[tag]))
        filtered_asgs = page_iterator.search(filter)
        asg = filtered_asgs.next()
        asg_name = asg['AutoScalingGroupName']
        try:
            asgX = filtered_asgs.next()
            asgX_name = asgX['AutoScalingGroupName']
            raise AssertionError('multiple ASG\'s found for {} = {},{}'
                                 .format(tags, asg_name, asgX_name))
        except StopIteration:
            break
    return asg_name
e.g.:
asg_name = get_asg_name_from_tags({'Env':env, 'Application':'app'})
It expects there to be only one result and checks this by trying to use next() to get another. The StopIteration is the "good" case, which then breaks out of the paginator loop.
I got it working with the script below.
#!/usr/bin/python
import boto3

client = boto3.client('autoscaling', region_name='us-west-2')
response = client.describe_auto_scaling_groups()
ccc_asg = []
all_asg = response['AutoScalingGroups']
for i in range(len(all_asg)):
    all_tags = all_asg[i]['Tags']
    app = False
    asg_name = ''
    for j in range(len(all_tags)):
        if 'Application' in all_tags[j]['Key'] and all_tags[j]['Value'] == 'CCC':
            app = True
        if app:
            if 'Name' in all_tags[j]['Key']:
                asg_name = all_tags[j]['Value']
                ccc_asg.append(asg_name)
print ccc_asg
Feel free to ask if you have any doubts.
The right way to do this isn't via describe_auto_scaling_groups at all but via describe_tags, which will allow you to make the filtering happen on the server side.
You can construct a filter that asks for the tag key Application with any of a number of values:
Filters=[
    {
        'Name': 'key',
        'Values': [
            'Application',
        ]
    },
    {
        'Name': 'value',
        'Values': [
            'CCC',
        ]
    },
],
And then your results (in Tags in the response) are all the times when a matching tag is applied to an autoscaling group. You will have to make the call multiple times, passing back NextToken every time there is one, to go through all the pages of results.
Each result includes the ID of the ASG that the matching tag is applied to. Once you have all the ASG IDs you are interested in, you can call describe_auto_scaling_groups to get their names, as sketched below.
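Here is a sketch of that flow (untested; ResourceId in each describe_tags result is the ASG name, and describe_auto_scaling_groups accepts at most 50 names per call, so a real implementation may need to batch):

import boto3

client = boto3.client('autoscaling', region_name='us-west-2')

# Server-side filtering: only tags with key 'Application' and value 'CCC'.
asg_names = set()
paginator = client.get_paginator('describe_tags')
for page in paginator.paginate(Filters=[
        {'Name': 'key', 'Values': ['Application']},
        {'Name': 'value', 'Values': ['CCC']}]):
    for tag in page['Tags']:
        asg_names.add(tag['ResourceId'])  # the ASG the tag is attached to

# Look up the matching groups by name.
if asg_names:
    response = client.describe_auto_scaling_groups(
        AutoScalingGroupNames=list(asg_names))
    for group in response['AutoScalingGroups']:
        print(group['AutoScalingGroupName'])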
Yet another solution, in my opinion simple enough to extend:
import boto3

client = boto3.client('autoscaling')
search_tags = {"environment": "stage"}
filtered_asgs = []

response = client.describe_auto_scaling_groups()
for group in response['AutoScalingGroups']:
    flattened_tags = {
        tag_info['Key']: tag_info['Value']
        for tag_info in group['Tags']
    }
    if search_tags.items() <= flattened_tags.items():
        filtered_asgs.append(group)

print(filtered_asgs)
I'm trying to fetch results in a python2.7 appengine app using cursors, but each time I use with_cursor() it fetches the same result set.
query = Model.all().filter("profile =", p_key).order('-created')
if r.get('cursor'):
    query = query.with_cursor(start_cursor=r.get('cursor'))
cursor = query.cursor()
objs = query.fetch(limit=10)
count = len(objs)
for obj in objs:
    ...
Each time through I'm getting the same 10 results. I'm thinking it has to do with using end_cursor, but how do I get that value if query.cursor() is returning the start_cursor? I've looked through the docs but this is poorly documented.
Your formatting is a bit screwy, by the way. Looking at your code (which is incomplete and therefore potentially leaving something out), I have to assume you have forgotten to store the cursor after fetching results (or return it to the user; I am assuming r is a request?).
So after you have fetched some data, you need to call cursor() on the query. E.g., this function counts all entities using a cursor:
def count_entities(kind):
    c = None
    count = 0
    q = kind.all(keys_only=True)
    while True:
        if c:
            q.with_cursor(c)
        i = q.fetch(1000)
        count = count + len(i)
        if not i:
            break
        c = q.cursor()
    return count
See how, after fetch() has been called, c = q.cursor() grabs the new cursor, and it is used as the cursor the next time through the loop.
Here's what finally worked:
query = Model.all().filter("profile =", p_key).order('-created')
if request.get('cursor'):
    query = query.with_cursor(request.get('cursor'))
objs = query.fetch(limit=10)
cursor = query.cursor()
for obj in objs:
    ...
Using the shelve module has given me some surprising behavior. keys(), iter(), and iteritems() don't return all the entries in the shelf! Here's the code:
cache = shelve.open('my.cache')
# ...
cache[url] = (datetime.datetime.today(), value)
later:
cache = shelve.open('my.cache')
urls = ['accounts_with_transactions.xml', 'targets.xml', 'profile.xml']
try:
    print list(cache.keys())  # doesn't return all the keys!
    print [url for url in urls if cache.has_key(url)]
    print list(cache.keys())
finally:
    cache.close()
and here's the output:
['targets.xml']
['accounts_with_transactions.xml', 'targets.xml']
['targets.xml', 'accounts_with_transactions.xml']
Has anyone run into this before, and is there a workaround without knowing all possible cache keys a priori?
According to the python library reference:
...The database is also (unfortunately) subject to the limitations of dbm, if it is used — this means that (the pickled representation of) the objects stored in the database should be fairly small...
This correctly reproduces the 'bug':
import shelve

a = 'trxns.xml'
b = 'foobar.xml'
c = 'profile.xml'
urls = [a, b, c]

cache = shelve.open('my.cache', 'c')
try:
    cache[a] = a * 1000
    cache[b] = b * 10000
finally:
    cache.close()

cache = shelve.open('my.cache', 'c')
try:
    print cache.keys()
    print [url for url in urls if cache.has_key(url)]
    print cache.keys()
finally:
    cache.close()
with the output:
[]
['trxns.xml', 'foobar.xml']
['foobar.xml', 'trxns.xml']
The answer, therefore, is: don't store anything big in a shelf (like raw XML), but rather the results of calculations.
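As an illustration, here is a sketch of the same pattern storing only a small derived value (the digest is just a stand-in for whatever small result you actually compute from the XML):

import hashlib
import shelve

raw_xml = '<accounts>...</accounts>' * 10000  # big payload: don't shelve this

cache = shelve.open('my.cache', 'c')
try:
    # Store only the small derived value, not the raw document.
    cache['accounts_with_transactions.xml'] = hashlib.sha1(raw_xml).hexdigest()
finally:
    cache.close()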
Seeing your examples, my first thought is that cache.has_key() has side effects, i.e. this call will add keys to the cache. What do you get for:
print cache.has_key('xxx')
print list(cache.keys())