Dynamodb - GetItem operation: The provided key element does not match the schema - python

I'm using Python to query a DynamoDB table; however, the two keys I'm passing to the get_item call are not part of the table's partition key or sort key. I created a global secondary index that contains these two keys, but I still get the same error.
response = table.get_item(
    Key={
        'player_id': 22892251,
        'type': 1
    }
)
item = response['Item']
print(item)

You cannot issue a GetItem against a secondary index as items are not unique.
You must use a Query request.
from boto3.dynamodb.conditions import Key

response = table.query(
    IndexName='player_id-type-index',
    KeyConditionExpression=Key('player_id').eq(22892251) & Key('type').eq(1)
)
items = response['Items']
print(items)

You're trying to use GetItem to fetch data from a global secondary index. This is not supported. The GetItem API returns exactly 1 item, which is only possible because the Primary Key (Partition + Sort Key) is guaranteed to be unique in the base table.
This is not the case for global secondary indexes, which is why GetItem is not supported here. It requires a guarantee that the underlying data structure does not give.
The way to fetch this data is to use the Query operation that can return multiple items:
import boto3
from boto3.dynamodb.conditions import Key
table = boto3.resource("dynamodb").Table("table_name")
response = table.query(
    KeyConditionExpression=Key("player_id").eq(number) & Key("type").eq(number),
    IndexName="player_id-type-index"
)
items = response["Items"]
if len(items) > 1:
    raise RuntimeError("Something broke our unique expectation")
print(items[0])
It's up to your application to ensure that the entries are unique if you require it. The check above lets you detect if that assumption ever gets broken.

How to get column names in a SQLAlchemy query?

I have a function in a remote database (no models for it in my app), and I am calling it. Is it possible to get the column names using query() rather than execute()?
session = Session(bind=engine)
data = session.query(func.schema.func_name())
I am getting back an array of strings with the values; how do I get the keys? I want to generate a dict.
When I make the request with execute(), the dictionary is generated fine:
data = session.execute("select * from schema.func_name()")
result = [dict(row) for row in data]
You can do something like:
keys = session.execute("select * from schema.func_name()").keys()
Or try accessing it after the query:
data = session.query(func.schema.func_name()).all()
data[0].keys()
You can also inspect the query object itself (before calling .all()) via its column_descriptions attribute.
Documentation:
https://docs.sqlalchemy.org/en/14/orm/query.html
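Putting those pieces together, here is a minimal sketch of building a list of dicts from the function's rows. It assumes the same session and the schema.func_name() callable from the question, and uses SQLAlchemy 1.4 style with text() for the raw SQL:
from sqlalchemy import text

result = session.execute(text("select * from schema.func_name()"))
keys = result.keys()  # column names as reported by the database driver
rows = [dict(zip(keys, row)) for row in result]  # one {column: value} dict per row
print(rows)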

DynamoDB - avoid data overwrite with primary partition key remaining the same for all data points

I'm working on migrating data from a CSV file stored in S3 to a table in DynamoDB. The code seems to work, but only the last data point ends up in DynamoDB. The primary partition key (serial) is the same for all data points. Not sure if I'm doing something wrong here; any help is greatly appreciated.
import boto3

s3_client = boto3.client("s3")
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('scan_records')

def lambda_handler(event, context):
    bucket_name = event['Records'][0]['s3']['bucket']['name']
    s3_file_name = event['Records'][0]['s3']['object']['key']
    resp = s3_client.get_object(Bucket=bucket_name, Key=s3_file_name)
    data = resp['Body'].read().decode("utf-8")
    scan_time = data.split("\n")
    for scan in scan_time:
        print(scan)
        scan_data = scan.split(",")
        # Add it to DynamoDB
        try:
            table.put_item(
                Item={
                    'serial': scan_data[0],
                    'time': scan_data[1],
                }
            )
        except Exception as e:
            print("End of File")
In your DynamoDB table the primary key needs to be unique for each element in the table. So if your primary key is composed only of a partition key that is the same for all your data points, you will always overwrite the same element.
* You could add a sort key on another field to your table so that the (partition key, sort key) pair composing the primary key is unique; new data points are then appended to the table (see the sketch below).
* If you can't build a unique primary key from your data points, you can always add a UUID to the primary key to make it unique.
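For illustration, here is a minimal sketch of both options. It assumes the scan_records table from the question; the attribute name scan_id and the sample values are hypothetical:
import uuid

import boto3

table = boto3.resource('dynamodb').Table('scan_records')

# Hypothetical values standing in for one parsed CSV line
scan_data = ['SER-001', '2021-06-01T10:15:00Z']

# Option 1: table keyed on serial (partition key) + time (sort key),
# so every distinct (serial, time) pair becomes its own item.
table.put_item(Item={'serial': scan_data[0], 'time': scan_data[1]})

# Option 2: table keyed on serial (partition key) + scan_id (sort key),
# where scan_id is a per-item UUID, so items never collide even if
# serial and time repeat.
table.put_item(Item={
    'serial': scan_data[0],
    'scan_id': str(uuid.uuid4()),
    'time': scan_data[1],
})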
Alternatively, you can refuse to overwrite an existing item by passing a condition expression to put_item, for example:
ConditionExpression='attribute_not_exists(serial) AND attribute_not_exists(time)',
After making the two changes below, the issue was resolved and the code works fine.
1. Check uniqueness of each entry with the combination of partition and sort key.
2. Add a loop to go line by line through the CSV file and ingest the data into DynamoDB.
Happy to share the code if anyone finds it useful.

How to dynamically check postgresql constraints in psycopg2?

I'm generating random data to fill a database (without knowing what the database looks like before runtime). I can fill it if it has no constraints, but when it does, I can't differentiate between values that pass the check and values that don't.
Let's see an example. Table definition:
CREATE TABLE test (
id INT,
age INT CONSTRAINT adult CHECK (age > 18),
PRIMARY KEY (id)
);
The data about the table that I have at runtime is:
Table and column names
Column types
Column UNIQUE and NOT NULL flags
Column constraint definitions as strings
Foreign keys
I can get more data from the PostgreSQL internal tables, preferably from the information schema.
I want to check the constraint before making an insert with that data. It's fine for me to do so either by using the database to check it or by checking it in code.
Here is a short snippet; the goal is to detect that the check is False before executing the insert query:
# Data you have access to:
t_name = 'test'
t_col_names = ['id', 'age']
col_constraints = {
'id': '',
'age': 'age > 18'}
# you can access more data,
# but you have to query the database to do so
id_value = 1
#I want to check values HERE
age_value = 17
#I want to check values HERE
values = (id_value, age_value)
#I could want to check HERE
query = "INSERT INTO test (id, age) VALUES (%s, %s);"
db_cursor.execute(query, values)
db_cursor.close()
Because of how data is generated in my application, handling the error thrown while/after executing the insert query is not an option: it would increase the cost of generating random data dramatically.
EDIT to explain why try: is not an option:
If I wait for the exception, the problematic element that provokes the error would already be part of multiple queries.
Let's see in the previous example how this could happen. I generate a random data pool to pick from and build tuples of insert values:
age_pool = (7, 19, 23, 48)
id_pool = (0, 2, 3, ..., 99)  # not that random, for better understanding
Now suppose I generate 100 insert queries and 25% of them contain a 7 (an age < 18). From that single value I get 25 invalid queries that will be executed against the database (a costly operation, by the way) only to fail hopelessly. After that I would have to generate more random data, in this case 25 more insert queries, which could have the same problem if I generate an 8, for example.
On the other hand, if I check just after generating the element, I verify once whether it's a valid value, and that single element can feed multiple valid combinations of values.
You could use eval():
def constraint_check(constraints, keys, values):
    vals = dict(zip(keys, values))
    for k, v in constraints.items():
        if v and not eval(v.replace(k, str(vals[k]))):
            return False
    return True

t_name = 'test'
t_col_names = ['id', 'age']
col_constraints = {
    'id': '',
    'age': 'age > 18'}

id_value = 1
age_value = 17
values = (id_value, age_value)

if constraint_check(col_constraints, ('id', 'age'), values):
    query = "INSERT INTO test (id, age) VALUES (%s, %s);"
    db_cursor.execute(query, values)
However, this will work well only for very simple constraints. A Postgres check expression may include constructs that are specific to Postgres and unknown to Python. For example, the approach fails with this obviously valid constraint:
create table test(
id int primary key,
age int check(age between 18 and 60));
I do not think you can implement a complete Postgres expression parser in Python in any easy way, and it is questionable whether doing so would be worth the effort for the intended effect.
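As an alternative (a sketch, not part of the original answers): you can let Postgres itself evaluate the check expression by substituting the candidate value into a bare SELECT, which also covers Postgres-only constructs such as BETWEEN. It assumes each constraint references only its own column, as in col_constraints above:
import re

def constraint_check_pg(cursor, constraints, keys, values):
    """Ask Postgres to evaluate each check expression for the candidate values."""
    vals = dict(zip(keys, values))
    for col, expr in constraints.items():
        if not expr:
            continue
        # Replace whole-word occurrences of the column name with a placeholder
        # so psycopg2 handles quoting of the candidate value.
        sql_expr = re.sub(r'\b' + re.escape(col) + r'\b', '%s', expr)
        params = (vals[col],) * sql_expr.count('%s')
        cursor.execute("SELECT " + sql_expr, params)
        if not cursor.fetchone()[0]:
            return False
    return True

# Usage with the data from the question:
# constraint_check_pg(db_cursor, col_constraints, ('id', 'age'), (1, 17))  -> False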
It's not clear why a try...except clause is not desired. You test for the precise exception and keep going.
How about:
problem_inserts = []

try:
    db_cursor.execute(query, values)
    db_cursor.close()
except <your exception here>:
    problem_inserts.append(query)
In this snippet, you keep a list of all queries that didn't go through properly. I don't know what else you can do. I don't think you want to change the data to make it fit into the table.
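For reference (an addition, not from the answer above): with psycopg2 2.8 or newer, the precise exception raised by a failed CHECK constraint is psycopg2.errors.CheckViolation, so the placeholder could be filled in roughly like this, reusing query and values from the question:
from psycopg2 import errors

problem_inserts = []
try:
    db_cursor.execute(query, values)
except errors.CheckViolation:
    # The candidate values violated a CHECK constraint; remember the query.
    problem_inserts.append((query, values))
    # The transaction is aborted after the error and must be rolled back.
    db_cursor.connection.rollback()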

sqlalchemy to filter on list by table name and filters condition

I am using the SQLAlchemy ORM layer to communicate with RDS.
This is a common function used by all tables to filter rows.
We pass the table name, the columns to select, the filters, and a date range.
filter = { "company_guid": "xxxx", "status": "Active"}
filter is a dictionary whose keys are column names and whose values are the conditions.
This works fine,
but now I want to filter on the status column where the value can be Active or TempInActive.
So the filter becomes filter = { "company_guid": "xxxx", "status": ["Active", "TempActive"]}
It is not working because the value is a list, not a string.
I know I can use result = session.query(Customers).filter(Customers.id.in_([1,3])), but in my scenario the table name and column names are function arguments.
def get_items_withvalue(self, table_name, column_name=None, attribute_value=None,
                        columns_to_select=None, date_value=False, filters=None):
    """
    #Summary: This method used to get data based on a condition.
    #param table_name (string): This is the table_name
    #param column_name (None/string): for the column_name
    #param attribute_value (None/list/string): for the column_value
    #params columns_to_select (None/list/string): columns to send in response
    #params filters (None/dict): where clause for rows to be fetched
    #return (list of dict): fetched rows or count from DB
    """
    data = []
    session = None
    try:
        # Get session which communicate with RDS
        session = self.get_session()
        table = str_to_class(table_name)
        if columns_to_select:
            data = []
        else:
            if isinstance(attribute_value, list):
                data = (session.query(table)
                        .filter(getattr(table, column_name)
                                .in_(attribute_value))
                        .all())
            elif date_value:
                data = (session.query(table)
                        .filter(cast(getattr(table, column_name), Date)
                                == attribute_value)
                        .all())
            elif filters:
                ## How to update following code to filter on list(in)
                # filters is dictionary
                data = (session.query(table).filter_by(**filters).all())
                ##
            else:
                data = (
                    session.query(table).filter(getattr(table, column_name)
                                                == attribute_value).all()
                )
    except Exception as err:
        self.logger.exception("Error fetching items ")
        raise Exception(err)
    finally:
        if session:
            session.close()
    if columns_to_select:
        return [row._asdict() for row in data]
    return [object_as_dict(row) for row in data]
Can anyone help me solve this?
One way is to construct the query string and eval() it, but that is not a good approach.
As you are using the ORM I'll assume that what you call table in the function is actually a mapped ORM class.
If I understand correctly, you want to be able to handle both cases where the values of filters may be either a scalar value, in which case you'd like to filter on equality, or a list of values, in which case you'd like to test for presence in the list using in_(). However, a complicating factor is that you cannot use the str keys of filters directly in filter(). Hopefully I've understood.
I think you can solve this neatly by using getattr to get the column attributes from the table object, a conditional list comprehension, and then unpacking the resulting list into .filter(), for example:
filters = {'a': 'scalar', 'and': ['collection', 'of', 'values']}

(
    session.query(table).filter(
        *[
            getattr(table, k).in_(v)
            if isinstance(v, list)
            else getattr(table, k) == v
            for k, v in filters.items()
        ]
    )
)
This will produce the equivalent of orm_table_object.column_attrib == val if val is not a list, and orm_table_object.column_attrib.in_(val) if val is a list.
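Applied to the function from the question, a small sketch (under the same assumptions, with table being the mapped class returned by str_to_class; the helper name build_conditions is hypothetical):
def build_conditions(table, filters):
    """Return one SQLAlchemy condition per filter key: in_() for lists, == otherwise."""
    return [
        getattr(table, k).in_(v) if isinstance(v, list) else getattr(table, k) == v
        for k, v in filters.items()
    ]

# Inside get_items_withvalue, the filters branch then becomes:
#     elif filters:
#         data = session.query(table).filter(*build_conditions(table, filters)).all()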
Extending the above answer, just with a plain list of conditions:
filters = [orm_table_object.field_name_1 == expected_value_1]

if expected_value_2 is not None:
    filters.append(orm_table_object.field_name_2 == expected_value_2)

if expected_value_3 is not None:
    filters.append(orm_table_object.field_name_3 == expected_value_3)

session.query(table).filter(
    *[f for f in filters]
)

getting a list after grouping in sqlalchemy

I have a table Food. It has the fields cust_name, phone_number, and order_date.
I am trying to build a dictionary where a key, the pair (cust_name, phone_number), maps to a list of order_date values. For that I need the appropriate query in SQLAlchemy. I'm using Postgres.
So far I have:
db.session.query(Food.cust_name, Food.phone_number).group_by(Food.cust_name, Food.phone_number).all()
What do I need to change so that I also get the corresponding list of order_date?
Use the array_agg() aggregate function to produce a list of order dates:
res = db.session.query(
    Food.cust_name,
    Food.phone_number,
    db.func.array_agg(Food.order_date).label('order_dates')
).group_by(Food.cust_name, Food.phone_number).all()

the_dict = {(r.cust_name, r.phone_number): r.order_dates for r in res}
