I am trying to convert boto3 DynamoDB conditional expressions (using types from boto3.dynamodb.conditions) to their string representation. Of course this could be hand-coded, but naturally I would prefer to find something developed by AWS itself.
Key("name").eq("new_name") & Attr("description").begins_with("new")
would become
"name = 'new_name' and begins_with(description, 'new')"
I have looked through the boto3 and botocore code without success so far, but I assume it must exist somewhere in the codebase...
In the boto3.dynamodb.conditions module there is a class called ConditionExpressionBuilder. You can convert a condition expression to string by doing the following:
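from boto3.dynamodb.conditions import Attr, ConditionExpressionBuilder, Key
condition = Key("name").eq("new_name") & Attr("description").begins_with("new")
builder = ConditionExpressionBuilder()
# is_key_condition=True is meant for conditions built only from Key objects;
# this condition also uses Attr, so build it as a regular condition
expression = builder.build_expression(condition, is_key_condition=False)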
expression_string = expression.condition_expression
expression_attribute_names = expression.attribute_name_placeholders
expression_attribute_values = expression.attribute_value_placeholders
I'm not sure why this isn't documented anywhere. I just randomly found it looking through the source code at the bottom of this page https://boto3.amazonaws.com/v1/documentation/api/latest/_modules/boto3/dynamodb/conditions.html.
Unfortunately, this doesn't work for the paginator format string notation, but it should work for the Table.query() format.
Following @Brian's answer with ConditionExpressionBuilder, I had to add DynamoDB's {'S': 'value'} type notation before executing the query.
I did this with expression_attribute_values[':v0'] = {'S': pk_value}, where :v0 is the placeholder for the first Key/Attr value in the condition. I have not verified it, but the same should work for subsequent values (:v1, :v2, ...).
Here is the full code, using pagination to retrieve only part of the data:
from boto3.dynamodb.conditions import Attr, Key, ConditionExpressionBuilder
from typing import Optional, List
import boto3
client_dynamodb = boto3.client("dynamodb", region_name="us-east-1")
def get_items(self, pk_value: str, pagination_config: dict = None) -> Optional[List]:
    if pagination_config is None:
        pagination_config = {
            # Return only the first page of results when no pagination config is provided
            'PageSize': 300,
            'StartingToken': None,
            'MaxItems': None,
        }
    condition = Key("pk").eq(pk_value)
    builder = ConditionExpressionBuilder()
    expression = builder.build_expression(condition, is_key_condition=True)
    expression_string = expression.condition_expression
    expression_attribute_names = expression.attribute_name_placeholders
    expression_attribute_values = expression.attribute_value_placeholders
    # Changed here to make it compatible with DynamoDB typing
    expression_attribute_values[':v0'] = {'S': pk_value}
    paginator = client_dynamodb.get_paginator('query')
    page_iterator = paginator.paginate(
        TableName="TABLE_NAME",
        IndexName="pk_value_INDEX",
        KeyConditionExpression=expression_string,
        ExpressionAttributeNames=expression_attribute_names,
        ExpressionAttributeValues=expression_attribute_values,
        PaginationConfig=pagination_config
    )
    for page in page_iterator:
        resp = page
        break
    if ("Items" not in resp) or (len(resp["Items"]) == 0):
        return None
    return resp["Items"]
EDIT:
I used this question to get a string representation for the DynamoDB resource's query, which is not (yet) compatible with dynamodb conditions, but then I found a better solution on GitHub (boto3): https://github.com/boto/boto3/issues/2300
Replace paginator with the one from meta
dynamodb_resource = boto3.resource("dynamodb")
paginator = dynamodb_resource.meta.client.get_paginator('query')
And now I can simply use Attr and Key
This is the same as this question, but I also want to limit the depth returned.
Currently, all answers return all the objects after the specified prefix. I want to see just what's in the current hierarchy level.
Current code that returns everything:
self._session = boto3.Session(
    aws_access_key_id=aws_access_key_id,
    aws_secret_access_key=aws_secret_access_key,
)
self._s3 = self._session.resource("s3")
bucket = self._s3.Bucket(bucket_name)
detections_contents = bucket.objects.filter(Prefix=prefix)
for object_summary in detections_contents:
    print(object_summary.key)
How to see only the files and folders directly under prefix? How to go n levels deep?
I can parse everything locally, and this is clearly not what I am looking for here.
There is no definite way to do this using list objects without getting all the objects in the dir.
But there is a way using s3 select which uses sql query like format to get n levels deep to get the file content as well as to get object keys.
If you are fine with writing sql then use this.
reference doc
import boto3
import json
s3 = boto3.client('s3')
bucket_name = 'my-bucket'
prefix = 'my-directory/subdirectory/'
input_serialization = {
    'CompressionType': 'NONE',
    'JSON': {
        'Type': 'LINES'
    }
}
output_serialization = {
    'JSON': {}
}
# Set the SQL expression to select the key field for all objects in the subdirectory
expression = 'SELECT s.key FROM S3Object s WHERE s.key LIKE \'' + prefix + '%\''
response = s3.select_object_content(
    Bucket=bucket_name,
    ExpressionType='SQL',
    Expression=expression,
    InputSerialization=input_serialization,
    OutputSerialization=output_serialization
)
# The response will contain a Payload field with the selected data
payload = response['Payload']
for event in payload:
    if 'Records' in event:
        records = event['Records']['Payload']
        data = json.loads(records.decode('utf-8'))
        # The data will be a list of objects, each with a "key" field representing the file name
        for item in data:
            print(item['key'])
There is no built-in way in Boto3 or the S3 APIs to do this. You'll need some version of processing each level and asking in turn for a list of objects at that level:
import boto3
s3 = boto3.client('s3')
bucket_name = 'my-bucket'  # placeholder, replace with your bucket name
max_depth = 2
paginator = s3.get_paginator('list_objects_v2')
# Track all prefixes to show with a list
common_prefixes = [(0, "")]
while len(common_prefixes) > 0:
    # Pull out the next prefix to show
    current_depth, current_prefix = common_prefixes.pop(0)
    # Loop through all of the items using a paginator to handle common prefixes with more
    # than a thousand items
    for page in paginator.paginate(Bucket=bucket_name, Prefix=current_prefix, Delimiter='/'):
        for cur in page.get("CommonPrefixes", []):
            # Show each common prefix, here just use a format like AWS CLI does
            print(" " * 27 + f"PRE {cur['Prefix']}")
            if current_depth < max_depth:
                # This is below the max depth we want to show, so
                # add it to the list to be shown
                common_prefixes.append((current_depth + 1, cur['Prefix']))
        for cur in page.get("Contents", []):
            # Show each item sharing this common prefix using a format like the AWS CLI
            print(f"{cur['LastModified'].strftime('%Y-%m-%d %H:%M:%S')}{cur['Size']:11d} {cur['Key']}")
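To make the role of Delimiter='/' concrete, here is a small offline emulation of how a listing splits keys into "Contents" (objects directly at this level) and "CommonPrefixes" (everything deeper, rolled up to the next delimiter). This is only a sketch of the documented behaviour, not S3 code; the function name list_one_level and the sample keys are made up.

```python
def list_one_level(keys, prefix="", delimiter="/"):
    """Emulate how a delimiter splits keys into contents and common prefixes."""
    contents, common_prefixes = [], []
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Everything up to and including the first delimiter is rolled up
            rolled = prefix + rest.split(delimiter, 1)[0] + delimiter
            if rolled not in common_prefixes:
                common_prefixes.append(rolled)
        else:
            contents.append(key)
    return contents, common_prefixes


# Made-up keys for illustration
keys = [
    "a.txt",
    "logs/2024/jan.csv",
    "logs/2024/feb.csv",
    "logs/readme.md",
    "img/cat.png",
]
top_level = list_one_level(keys)
under_logs = list_one_level(keys, prefix="logs/")
print(top_level)
print(under_logs)
```

Going n levels deep is then a matter of repeating the listing for each returned common prefix, which is what the queue of (depth, prefix) pairs does.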
I'm having some trouble verifying the HMAC parameter coming from Shopify. The code I'm using per the Shopify documentation is returning an incorrect result.
Here's my annotated code:
import urllib.parse
import hmac
import hashlib
qs = "hmac=96d0a58213b6aa5ca5ef6295023a90694cf21655cf301975978a9aa30e2d3e48&locale=en&protocol=https%3A%2F%2F&shop=myshopname.myshopify.com&timestamp=1520883022"
# Parse the querystring
params = urllib.parse.parse_qs(qs)
# Extract the hmac value
value = params['hmac'][0]
# Remove parameters from the querystring per documentation
del params['hmac']
del params['signature']
# Recombine the parameters
new_qs = urllib.parse.urlencode(params)
# Calculate the digest
h = hmac.new(SECRET.encode("utf8"), msg=new_qs.encode("utf8"), digestmod=hashlib.sha256)
# Returns False!
hmac.compare_digest(h.hexdigest(), value)
That last step should, ostensibly, return true. Every step followed here is outlined as commented in the Shopify docs.
At some point, recently, Shopify started including the protocol parameter in the querystring payload. This itself wouldn't be a problem, except for the fact that Shopify doesn't document that : and / are not to be URL-encoded when checking the signature. This is unexpected, given that they themselves do URL-encode these characters in the query string that is provided.
To fix the issue, provide the safe parameter to urllib.parse.urlencode with the value :/ (fitting, right?). The full working code looks like this:
params = urllib.parse.parse_qsl(qs)
cleaned_params = []
hmac_value = dict(params)['hmac']
# Sort parameters
for (k, v) in sorted(params):
    if k in ['hmac', 'signature']:
        continue
    cleaned_params.append((k, v))
new_qs = urllib.parse.urlencode(cleaned_params, safe=":/")
secret = SECRET.encode("utf8")
h = hmac.new(secret, msg=new_qs.encode("utf8"), digestmod=hashlib.sha256)
# Compare digests
hmac.compare_digest(h.hexdigest(), hmac_value)
Hope this is helpful for others running into this issue!
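For anyone who wants to check the safe=":/" behaviour without real Shopify credentials, here is a self-contained round trip: it signs a query string locally with a made-up secret and then verifies it the same way as above. The function names canonical_query and is_valid and the secret value are assumptions for illustration only.

```python
import hashlib
import hmac
import urllib.parse


def canonical_query(query_string):
    """Drop hmac/signature, sort the rest, and re-encode leaving ':' and '/' as-is."""
    pairs = urllib.parse.parse_qsl(query_string)
    kept = sorted((k, v) for k, v in pairs if k not in ("hmac", "signature"))
    return urllib.parse.urlencode(kept, safe=":/")


def is_valid(query_string, secret):
    """Compare the hmac parameter against a SHA-256 HMAC of the canonical query."""
    received = dict(urllib.parse.parse_qsl(query_string)).get("hmac", "")
    expected = hmac.new(
        secret.encode("utf8"),
        msg=canonical_query(query_string).encode("utf8"),
        digestmod=hashlib.sha256,
    ).hexdigest()
    return hmac.compare_digest(expected, received)


# Round trip with a made-up secret: sign a query locally, then verify it
SECRET = "not-a-real-secret"
unsigned = (
    "locale=en&protocol=https%3A%2F%2F"
    "&shop=myshopname.myshopify.com&timestamp=1520883022"
)
digest = hmac.new(
    SECRET.encode("utf8"),
    canonical_query(unsigned).encode("utf8"),
    hashlib.sha256,
).hexdigest()
signed = f"hmac={digest}&{unsigned}"
print(canonical_query(signed))
print(is_valid(signed, SECRET))
```

Note that the canonical string contains protocol=https:// unescaped, which is the detail that makes the digest match.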
import hmac
import hashlib
...
# Inside your view in Django's views.py
params = request.GET.dict()
#
myhmac = params.pop('hmac')
params['state'] = int(params['state'])
line = '&'.join([
    '%s=%s' % (key, value)
    for key, value in sorted(params.items())
])
print(line)
h = hmac.new(
    key=SHARED_SECRET.encode('utf-8'),
    msg=line.encode('utf-8'),
    digestmod=hashlib.sha256
)
# Cinderella ?
print(hmac.compare_digest(h.hexdigest(), myhmac))
I'm working on a small project that retrieves information about books from the Google Books API using Python 3. For this I make a call to the API, read out the variables and store those in a list. For a search like "linkedin" this works perfectly. However, when I enter "Google", it reads the second title from the JSON input. How can this happen?
Please find my code below (Google_Results is the class I use to initialize the variables):
import requests
def Book_Search(search_term):
    parms = {"q": search_term, "maxResults": 3}
    r = requests.get(url="https://www.googleapis.com/books/v1/volumes", params=parms)
    print(r.url)
    results = r.json()
    i = 0
    for result in results["items"]:
        try:
            isbn13 = str(result["volumeInfo"]["industryIdentifiers"][0]["identifier"])
            isbn10 = str(result["volumeInfo"]["industryIdentifiers"][1]["identifier"])
            title = str(result["volumeInfo"]["title"])
            author = str(result["volumeInfo"]["authors"])[2:-2]
            publisher = str(result["volumeInfo"]["publisher"])
            published_date = str(result["volumeInfo"]["publishedDate"])
            description = str(result["volumeInfo"]["description"])
            pages = str(result["volumeInfo"]["pageCount"])
            genre = str(result["volumeInfo"]["categories"])[2:-2]
            language = str(result["volumeInfo"]["language"])
            image_link = str(result["volumeInfo"]["imageLinks"]["thumbnail"])
            dict = Google_Results(isbn13, isbn10, title, author, publisher, published_date, description, pages, genre,
                                  language, image_link)
            gr.append(dict)
            print(gr[i].title)
            i += 1
        except:
            pass
    return
gr = []
Book_Search("Linkedin")
I am a beginner to Python, so any help would be appreciated!
It does so because there is no publisher entry in volumeInfo of the first entry, thus it raises a KeyError and your except captures it. If you're going to work with fuzzy data you have to account for the fact that it will not always have the expected structure. For simple cases you can rely on dict.get() and its default argument to return a 'valid' default entry if an entry is missing.
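As a quick illustration of the dict.get() approach, here is a minimal sketch on a made-up record shaped like one entry of the API's "items" list, with the publisher deliberately left out. The sample values and the "unknown" default are assumptions, not real API output.

```python
# Made-up record shaped like one Google Books "items" entry, with no publisher
item = {
    "volumeInfo": {
        "title": "Example Title",
        "authors": ["A. Author", "B. Writer"],
    }
}

info = item.get("volumeInfo", {})

title = info.get("title")                      # present
publisher = info.get("publisher", "unknown")   # missing key, so the default is used
authors = ", ".join(info.get("authors", []))   # join the list instead of str(...)[2:-2]
thumbnail = info.get("imageLinks", {}).get("thumbnail")  # nested lookup, no KeyError

print(title, publisher, authors, thumbnail)
```

None of these lookups raises, so a single missing field no longer causes the whole result to be skipped.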
Also, there are a few conceptual problems with your function: it relies on a global gr, which is bad design; it shadows the built-in dict type; and it captures all exceptions, guaranteeing that you cannot exit your code even with a SIGINT... I'd suggest converting it to something a bit more sane:
def book_search(search_term, max_results=3):
    results = []  # a list to store the results
    parms = {"q": search_term, "maxResults": max_results}
    r = requests.get(url="https://www.googleapis.com/books/v1/volumes", params=parms)
    try:  # just in case the server doesn't return valid JSON
        for result in r.json().get("items", []):
            if "volumeInfo" not in result:  # invalid entry - missing volumeInfo
                continue
            result_dict = {}  # a dictionary to store our discovered fields
            result = result["volumeInfo"]  # all the data we're interested in is in volumeInfo
            isbns = result.get("industryIdentifiers", None)  # capture ISBNs
            if isinstance(isbns, list) and isbns:
                for i, t in enumerate(("isbn10", "isbn13")):
                    if len(isbns) > i and isinstance(isbns[i], dict):
                        result_dict[t] = isbns[i].get("identifier", None)
            result_dict["title"] = result.get("title", None)
            authors = result.get("authors", None)  # capture authors
            if isinstance(authors, list) and authors:
                # join the names; your str(...)[2:-2] only stripped the list brackets and quotes
                result_dict["author"] = ", ".join(authors)
            result_dict["publisher"] = result.get("publisher", None)
            result_dict["published_date"] = result.get("publishedDate", None)
            result_dict["description"] = result.get("description", None)
            result_dict["pages"] = result.get("pageCount", None)
            genres = result.get("categories", None)  # capture genres
            if isinstance(genres, list) and genres:
                result_dict["genre"] = ", ".join(genres)
            result_dict["language"] = result.get("language", None)
            result_dict["image_link"] = result.get("imageLinks", {}).get("thumbnail", None)
            # make sure Google_Results accepts keyword arguments like title, author...
            # and make them optional as they might not be in the returned result
            gr = Google_Results(**result_dict)
            results.append(gr)  # add it to the results list
    except ValueError:
        return None  # invalid response returned, you may raise an error instead
    return results  # return the results
Then you can easily retrieve as much info as possible for a term:
gr = book_search("Google")
And it will be far more tolerant of data omissions, provided that your Google_Results type makes most of the entries optional.
Following @Coldspeed's recommendation, it became clear that missing information in the JSON response raised an exception. Since I only had a "pass" statement in the except block, the entire result was skipped. I will therefore have to adapt the try/except statements so that errors are handled properly.
Thanks for the help guys!
I have read the official AWS docs and several forums, but I still can't find what I am doing wrong when adding an item to a string set using Python/Boto3 and DynamoDB. Here is my code:
table.update_item(
    Key={
        ATT_USER_USERID: event[ATT_USER_USERID]
    },
    UpdateExpression="add " + key + " :val0",
    ExpressionAttributeValues={":val0": set(["example_item"])},
)
The error I am getting is:
An error occurred (ValidationException) when calling the UpdateItem operation: An operand in the update expression has an incorrect data type
It looks like you figured out a method for yourself, but for others who come here looking for an answer:
Your 'Key' syntax needs a data type (like 'S' or 'N')
You need to use "SS" as the data type in ExpressionAttributeValues, and
You don't need "set" in your ExpressionAttributeValues.
Here's an example I just ran (I had an existing set, test_set, with 4 existing values, and I'm adding a 5th, the string 'five'):
import boto3
db = boto3.client("dynamodb")
db.update_item(
    TableName=TABLE,
    Key={'id': {'S': 'test_id'}},
    UpdateExpression="ADD test_set :element",
    ExpressionAttributeValues={":element": {"SS": ['five']}},
)
So before, the string set looked like ['one','two','three','four'], and after, it looked like ['one','two','three','four','five']
Building off of @joe_stech's answer, you can now do it without having to define the type.
An example is:
import typing
import boto3
# KEY_NAME is assumed to hold the name of the table's partition key attribute
class StringSetTable:
    def __init__(self) -> None:
        dynamodb = boto3.resource("dynamodb")
        self.dynamodb_table = dynamodb.Table("NAME_OF_TABLE")
    def get_str_set(self, key: str) -> typing.Optional[typing.Set[str]]:
        response = self.dynamodb_table.get_item(
            Key={KEY_NAME: key}, ConsistentRead=True
        )
        r = response.get("Item")
        if r is None:
            print("No set stored")
            return None
        else:
            s = r["string_set"]
            s.remove("EMPTY_IF_ONLY_THIS")
            return s
    def add_to_set(self, key: str, str_set: typing.Set[str]) -> None:
        new_str_set = str_set.copy()
        new_str_set.add("EMPTY_IF_ONLY_THIS")
        self.dynamodb_table.update_item(
            Key={KEY_NAME: key},
            UpdateExpression="ADD string_set :elements",
            ExpressionAttributeValues={":elements": new_str_set},
        )
I was trying to fetch auto scaling groups with Application tag value as 'CCC'.
The list is as below,
gweb
prd-dcc-eap-w2
gweb
prd-dcc-emc
gweb
prd-dcc-ems
CCC
dev-ccc-wer
CCC
dev-ccc-gbg
CCC
dev-ccc-wer
The script I coded below gives output which includes one ASG without CCC tag.
#!/usr/bin/python
import boto3
client = boto3.client('autoscaling',region_name='us-west-2')
response = client.describe_auto_scaling_groups()
ccc_asg = []
all_asg = response['AutoScalingGroups']
for i in range(len(all_asg)):
    all_tags = all_asg[i]['Tags']
    for j in range(len(all_tags)):
        if all_tags[j]['Key'] == 'Name':
            asg_name = all_tags[j]['Value']
            # print asg_name
        if all_tags[j]['Key'] == 'Application':
            app = all_tags[j]['Value']
            # print app
        if all_tags[j]['Value'] == 'CCC':
            ccc_asg.append(asg_name)
print(ccc_asg)
The output which I am getting is as below,
['prd-dcc-ein-w2', 'dev-ccc-hap', 'dev-ccc-wfd', 'dev-ccc-sdf']
However, 'prd-dcc-ein-w2' is an ASG with a different tag, 'gweb', and the last one (dev-ccc-msp-agt-asg) in the CCC ASG list is missing. I need output as below,
dev-ccc-hap-sdf
dev-ccc-hap-gfh
dev-ccc-hap-tyu
dev-ccc-mso-hjk
Am I missing something?
In boto3 you can use Paginators with JMESPath filtering to do this very effectively and in more concise way.
From boto3 docs:
JMESPath is a query language for JSON that can be used directly on
paginated results. You can filter results client-side using JMESPath
expressions that are applied to each page of results through the
search method of a PageIterator.
When filtering with JMESPath expressions, each page of results that is
yielded by the paginator is mapped through the JMESPath expression. If
a JMESPath expression returns a single value that is not an array,
that value is yielded directly. If the result of applying the JMESPath
expression to a page of results is a list, then each value of the list
is yielded individually (essentially implementing a flat map).
Here is how it looks in Python code, with the mentioned CCP value for the Application tag of the Auto Scaling group:
import boto3
client = boto3.client('autoscaling')
paginator = client.get_paginator('describe_auto_scaling_groups')
page_iterator = paginator.paginate(
    PaginationConfig={'PageSize': 100}
)
filtered_asgs = page_iterator.search(
    'AutoScalingGroups[] | [?contains(Tags[?Key==`{}`].Value, `{}`)]'.format(
        'Application', 'CCP')
)
for asg in filtered_asgs:
    print(asg['AutoScalingGroupName'])
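If the JMESPath expression is hard to read, the following plain-Python sketch shows what it selects from one page: groups that have a tag whose key and value both match. The sample page and the helper name groups_with_tag are made up for illustration; it runs without AWS access.

```python
# Made-up page shaped like a describe_auto_scaling_groups response
page = {
    "AutoScalingGroups": [
        {"AutoScalingGroupName": "prd-dcc-emc",
         "Tags": [{"Key": "Application", "Value": "gweb"}]},
        {"AutoScalingGroupName": "dev-ccc-gbg",
         "Tags": [{"Key": "Name", "Value": "dev-ccc-gbg"},
                  {"Key": "Application", "Value": "CCC"}]},
        {"AutoScalingGroupName": "untagged", "Tags": []},
    ]
}


def groups_with_tag(page, key, value):
    """Keep groups that have a tag with this exact key and value."""
    return [
        group for group in page["AutoScalingGroups"]
        if any(t["Key"] == key and t["Value"] == value for t in group["Tags"])
    ]


names = [g["AutoScalingGroupName"] for g in groups_with_tag(page, "Application", "CCC")]
print(names)
```

The important difference from the script in the question is that the key and the value are tested on the same tag, so a leftover name from a previous group can never be appended.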
Elaborating on Michal Gasek's answer, here's an option that filters ASGs based on a dict of tag:value pairs.
import boto3
def get_asg_name_from_tags(tags):
    asg_name = None
    client = boto3.client('autoscaling')
    while True:
        paginator = client.get_paginator('describe_auto_scaling_groups')
        page_iterator = paginator.paginate(
            PaginationConfig={'PageSize': 100}
        )
        filter = 'AutoScalingGroups[]'
        for tag in tags:
            filter = ('{} | [?contains(Tags[?Key==`{}`].Value, `{}`)]'.format(filter, tag, tags[tag]))
        filtered_asgs = page_iterator.search(filter)
        asg = next(filtered_asgs)
        asg_name = asg['AutoScalingGroupName']
        try:
            # A second match means the tags are ambiguous
            asgX = next(filtered_asgs)
            asgX_name = asgX['AutoScalingGroupName']
            raise AssertionError('multiple ASG\'s found for {} = {},{}'
                                 .format(tags, asg_name, asgX_name))
        except StopIteration:
            break
    return asg_name
eg:
asg_name = get_asg_name_from_tags({'Env':env, 'Application':'app'})
It expects there to be only one result and checks this by trying to use next() to get another. The StopIteration is the "good" case, which then breaks out of the paginator loop.
I got it working with below script.
#!/usr/bin/python
import boto3
client = boto3.client('autoscaling',region_name='us-west-2')
response = client.describe_auto_scaling_groups()
ccp_asg = []
all_asg = response['AutoScalingGroups']
for i in range(len(all_asg)):
    all_tags = all_asg[i]['Tags']
    app = False
    asg_name = ''
    for j in range(len(all_tags)):
        # Compare with ==; "in ('CCP')" would be a substring test on a string
        if 'Application' in all_tags[j]['Key'] and all_tags[j]['Value'] == 'CCP':
            app = True
        if app:
            if 'Name' in all_tags[j]['Key']:
                asg_name = all_tags[j]['Value']
                ccp_asg.append(asg_name)
print(ccp_asg)
Feel free to ask if you have any doubts.
The right way to do this isn't via describe_auto_scaling_groups at all but via describe_tags, which will allow you to make the filtering happen on the server side.
You can construct a filter that asks for instances of the Application tag with any of a number of values:
Filters=[
    {
        'Name': 'key',
        'Values': [
            'Application',
        ]
    },
    {
        'Name': 'value',
        'Values': [
            'CCC',
        ]
    },
],
And then your results (in Tags in the response) are all the times when a matching tag is applied to an autoscaling group. You will have to make the call multiple times, passing back NextToken every time there is one, to go through all the pages of results.
Each result includes an ASG ID that the matching tag is applied to. Once you have all the ASG IDs you are interested in, then you can call describe_auto_scaling_groups to get their names.
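The paging loop described above could look roughly like this. It is a sketch, not tested against a live account: the function takes the client as a parameter so it can be exercised offline with a stand-in, and FakeClient with its two pages is entirely made up. As far as I know, the ResourceId field of each returned tag identifies the Auto Scaling group.

```python
def asg_names_with_tag(client, key, value):
    """Collect the groups carrying the given tag, following NextToken."""
    kwargs = {
        "Filters": [
            {"Name": "key", "Values": [key]},
            {"Name": "value", "Values": [value]},
        ]
    }
    names = set()
    while True:
        response = client.describe_tags(**kwargs)
        for tag in response.get("Tags", []):
            # ResourceId holds the group the tag is attached to
            names.add(tag["ResourceId"])
        token = response.get("NextToken")
        if not token:
            break
        # Ask for the next page on the following call
        kwargs["NextToken"] = token
    return sorted(names)


class FakeClient:
    """Stand-in for boto3.client('autoscaling') so the loop can run offline."""

    pages = [
        {"Tags": [{"ResourceId": "dev-ccc-wer", "Key": "Application", "Value": "CCC"}],
         "NextToken": "page-2"},
        {"Tags": [{"ResourceId": "dev-ccc-gbg", "Key": "Application", "Value": "CCC"}]},
    ]

    def describe_tags(self, Filters, NextToken=None):
        return self.pages[1] if NextToken else self.pages[0]


result = asg_names_with_tag(FakeClient(), "Application", "CCC")
print(result)
```

With a real account you would pass boto3.client('autoscaling') instead of FakeClient().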
Yet another solution, in my opinion simple enough to extend:
client = boto3.client('autoscaling')
search_tags = {"environment": "stage"}
filtered_asgs = []
response = client.describe_auto_scaling_groups()
for group in response['AutoScalingGroups']:
    flattened_tags = {
        tag_info['Key']: tag_info['Value']
        for tag_info in group['Tags']
    }
    if search_tags.items() <= flattened_tags.items():
        filtered_asgs.append(group)
print(filtered_asgs)