How to use start_at() & end_at() in a Firebase query with Python?

My Firebase realtime database schema:
Suppose the Firebase database schema above.
I want to fetch data with order_by_key(), skipping the first 5 items and stopping at the 10th; the range should be 5-10, as in the image.
My keys always start with -.
I tried this, but it failed and returned nothing. How can I do this?
snapshot = ref.child('tracks').order_by_key().start_at('-\5').end_at('-\10').get()

Firebase queries are based on cursor/anchor values, not on offsets. This means the start_at and end_at calls expect values of the property you order on; since you order by key, they expect the keys of those nodes.
To get the slice you indicate you'll need:
ref.child('tracks').order_by_key().start_at('-MQJ7P').end_at('-MQJ8O').get()
If you don't know either of those key values, you can't specify them, and you can only start from the first item or end at the last item.
The only exception is that you can pass limit_to_first instead of end_at to get a fixed number of items from the start of the slice:
ref.child('tracks').order_by_key().start_at('-MQJ7P').limit_to_first(5).get()
Alternatively if you know only the key of the last item, you can get the five items before that with:
ref.child('tracks').order_by_key().end_at('-MQJ8O').limit_to_last(5).get()
But you'll need to know at least one of the keys, typically because you've shown it as the last item on the previous page/first item on the next page.
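The pagination pattern above can be illustrated without Firebase at all. Below is a minimal local sketch of keyset paging over sorted keys; the firebase_admin query in the trailing comment is the hypothetical real-world equivalent:

```python
def page_after(items, last_key, page_size):
    """Return the next page of (key, value) pairs strictly after last_key.

    Mirrors order_by_key().start_at().limit_to_first(): note that
    Firebase's start_at() is inclusive, so real code passes the last key
    of the previous page and drops the duplicate first item (or requests
    page_size + 1 items).
    """
    ordered = sorted(items.items())  # Firebase keys sort lexicographically
    if last_key is None:
        return ordered[:page_size]
    return [pair for pair in ordered if pair[0] > last_key][:page_size]

# The equivalent firebase_admin query (keys are hypothetical):
# ref.child('tracks').order_by_key().start_at(last_key).limit_to_first(page_size + 1).get()
```

Each page's last key becomes the cursor for the next request, which is exactly the "show it as the last item on the previous page" bookkeeping described above.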

Related

dynamodb removing specific item from a list without collisions

I am using dynamodb with the python api, and have a list attribute, the list contains complex data in it.
I would like to be able to remove a specific item.
I found this tutorial explaining how to remove an item from the list by its index.
And found this SO question regarding the situation.
Both the tutorial and the SO question show how to remove an item from a list by its index. My situation is more specific: two users can use the same DynamoDB table at once, and both might try to remove the same item. Removing by index can then go wrong. Say the list is [1,2,3] and both users want to remove the item 1 via remove list[0]: the first user removes 1, but now the list is [2,3], so the second user removes the item 2.
I found that you can remove a specific item by its value when using the DynamoDB set data type, but no set can contain complex data, only binary, string, and number, and I need to store something more like {"att1":[1,2,3], "att2":str, "attr3":{...}}, nested.
How can I remove an item without the risk that someone removed it before me, shifting the indexes and causing me to remove the wrong item?
I don't remember exactly whether DynamoDB can return a hash of the existing record.
If not, you can add one as an additional field and derive a key from that property.
Then you can update your object with a condition (a "where" clause), something like:
aws dynamodb update-item \
    --table-name ProductCatalog \
    --key '{"Id":{"N":"123"}}' \
    --update-expression "SET myHash = :new" \
    --condition-expression "myHash = :old" \
    --expression-attribute-values '{":new":{"S":"..."},":old":{"S":"125948abcdef1234"}}'
The idea is that if the object was already updated by someone else, the stored hash will differ and the conditional update will fail instead of clobbering their change.
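As a sketch of the same idea applied directly to the list problem: build a conditional REMOVE request so the element is deleted only if it still holds the value the caller saw. The helper name and the placeholders #lst and :expected are illustrative, not a fixed API:

```python
def remove_list_item_request(attr_name, index, expected_value):
    """Build UpdateItem arguments that remove list element `index`
    only if it still equals `expected_value` (an optimistic check).
    DynamoDB allows document paths like mylist[0] in condition
    expressions, so the check and the removal target the same slot."""
    return {
        "UpdateExpression": f"REMOVE #lst[{index}]",
        "ConditionExpression": f"#lst[{index}] = :expected",
        "ExpressionAttributeNames": {"#lst": attr_name},
        "ExpressionAttributeValues": {":expected": expected_value},
    }

# Hypothetical usage with a boto3 Table resource:
# table.update_item(Key={"Id": 123}, **remove_list_item_request("items", 0, {"x": 1}))
```

If another user already removed that element (shifting the list), the stored value at that index no longer matches, so the update would fail with ConditionalCheckFailedException rather than silently deleting the wrong item.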

DynamoDB Querying in Python (Count with GroupBy)

This may be trivial, but I loaded a local DynamoDB instance with 30GB worth of Twitter data that I aggregated.
The primary key is id (tweet_id from the Tweet JSON), and I also store the date/text/username/geocode.
I basically am interested in mentions of two topics (let's say "Bees" and "Booze"). I want to get a count of each of those by state by day.
So by the end, I should know for each state, how many times each was mentioned on a given day. And I guess it'd be nice to export that as a CSV or something for later analysis.
Some issues I had with doing this...
First, the geocode info is a tuple of [latitude, longitude], so for each entry I need to map that to a state. That I can do.
Second, is the most efficient way to do this to go through each entry, manually check whether it mentions either keyword, and keep a dictionary per keyword mapping date/location to a count?
EDIT:
Since it took me 20 hours to load all the data into my table, I don't want to delete and re-create it. Perhaps I should create a global secondary index (?) and use that to search other fields in a query? That way I don't have to scan everything. Is that the right track?
EDIT 2:
Well, since the table is local on my computer, I should be OK with using expensive operations like a Scan, right?
So if I did something like this:
from boto3.dynamodb.conditions import Attr

query = table.scan(
    FilterExpression=Attr('text').contains('Booze'),
    ProjectionExpression='id, text, date, geo',
    Limit=100)
And did one scan for each keyword, then I would be able to go through the resulting filtered list and get a count of mentions of each topic for each state on a given day, right?
EDIT3:
from boto3.dynamodb.conditions import Attr

response = table.scan(
    FilterExpression=Attr('text').contains('Booze'),
    Limit=100)
# do something with this first batch
while 'LastEvaluatedKey' in response:
    response = table.scan(
        FilterExpression=Attr('text').contains('Booze'),
        Limit=100,
        ExclusiveStartKey=response['LastEvaluatedKey'])
    # do something with each batch of up to 100 entries
So something like that, for both keywords. That way I'll be able to go through the resulting filtered set and do what I want (in this case, figure out the location and day and create a final dataset with that info). Right?
EDIT 4
If I add:
ProjectionExpression='date, location, user, text'
into the scan request, I get an error saying "botocore.exceptions.ClientError: An error occurred (ValidationException) when calling the Scan operation: Invalid ProjectionExpression: Attribute name is a reserved keyword; reserved keyword: location". How do I fix that?
NVM, I got it. The answer is to use ExpressionAttributeNames (see: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/ExpressionPlaceholders.html)
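For reference, a sketch of that fix: alias each reserved word through ExpressionAttributeNames and use the placeholders in the ProjectionExpression. The #-names are arbitrary, and the FilterExpression line is shown commented since it needs boto3's condition builder:

```python
# Alias reserved words (date, location, user) via ExpressionAttributeNames,
# then refer to them only through the #-placeholders.
scan_kwargs = {
    "ProjectionExpression": "#d, #loc, #u, #t",
    "ExpressionAttributeNames": {
        "#d": "date",
        "#loc": "location",
        "#u": "user",
        "#t": "text",
    },
    # "FilterExpression": Attr("text").contains("Booze"),  # from boto3.dynamodb.conditions
}
# response = table.scan(**scan_kwargs)  # table: a boto3 Table resource
```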
Yes, scanning the table for "Booze" and counting the items in the result will give you the total count. Note that you must repeat the scan, passing ExclusiveStartKey, until LastEvaluatedKey is no longer present in the response. See the Scan documentation.
EDIT:-
Yes, the code looks good. One thing to note: the result set won't always contain 100 items. See the definition of Limit below (it is not the same as a SQL LIMIT).
Limit — (Integer) The maximum number of items to evaluate (not necessarily the number of matching items). If DynamoDB processes the number of items up to the limit while processing the results, it stops the operation and returns the matching values up to that point, and a key in LastEvaluatedKey to apply in a subsequent operation, so that you can pick up where you left off. Also, if the processed data set size exceeds 1 MB before DynamoDB reaches this limit, it stops the operation and returns the matching values up to the limit, and a key in LastEvaluatedKey to apply in a subsequent operation to continue the operation. For more information, see Query and Scan in the Amazon DynamoDB Developer Guide.
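Putting the scan loop together with the grouping the question asks for, here is a minimal sketch of the final aggregation over the scanned items. The function and helper names are hypothetical, and the geocode-to-state mapping is assumed to exist already (the asker says that part is solved):

```python
from collections import defaultdict


def count_by_state_day(items, keyword, latlon_to_state):
    """Tally keyword mentions per (state, day).

    `items` are dicts from the scan with 'text', 'date', and 'geo' keys;
    `latlon_to_state` is the assumed helper mapping a (lat, lon) pair to
    a state name. Returns {(state, date): count}, which is trivial to
    dump to CSV with the csv module afterwards.
    """
    counts = defaultdict(int)
    for item in items:
        if keyword.lower() in item["text"].lower():
            state = latlon_to_state(item["geo"])
            counts[(state, item["date"])] += 1
    return dict(counts)
```

Run it once per keyword ("Bees", "Booze") over the accumulated scan batches; because the filtering is done client-side here, you could even drop the FilterExpression and make a single pass for both keywords.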

How to delete the last item of a collection in mongodb

I made a program with Python and MongoDB to keep some diaries, like this.
Sometimes I want to delete the last sentence, just by typing "delete!".
But I don't know how to delete it in a smart way; I don't want to use "skip".
Is there a good way to do it?
Be it the first or the last item, MongoDB maintains a unique _id key for each record, so you can pass that id field in your delete query using either deleteOne() or deleteMany(). Since there is only one record to delete, use deleteOne():
db.collection_name.deleteOne({"_id": "1234"}) // replace 1234 with actual id
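Since the question is about Python, here is a pymongo sketch of the same idea. Because default ObjectIds embed a creation timestamp and are roughly monotonically increasing, sorting on _id descending finds the most recently inserted document; the collection name in the comment is hypothetical:

```python
def delete_last(collection):
    """Remove and return the most recently inserted document.

    Relies on _id ordering (default ObjectIds increase over insertion
    time). Works with any pymongo Collection, e.g.:
        collection = MongoClient().diary_db.entries  # hypothetical names
    """
    return collection.find_one_and_delete({}, sort=[("_id", -1)])
```

find_one_and_delete makes the lookup and the delete a single atomic operation, so two "delete!" commands racing each other cannot remove the same entry twice.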

How to get the row count of a table instantly in DynamoDB?

I'm using boto.dynamodb2, and it seems I can use Table.query_count(). However, it raises an exception when no query filter is applied.
What can I do to fix this?
BTW, where is the documentation for the filters that boto.dynamodb2.table.Table.query can use? I tried searching for it but found nothing.
There are two ways you can get a row count in DynamoDB.
The first is performing a full table scan and counting the rows as you go. For a table of any reasonable size this is generally a horrible idea as it will consume all of your provisioned read throughput.
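A sketch of that first approach, assuming `table` is a boto3 Table resource: passing Select='COUNT' makes DynamoDB return only per-page counts instead of the items themselves, though it still reads (and consumes throughput for) every row:

```python
def exact_item_count(table):
    """Exact row count via a paginated Scan with Select='COUNT'.

    Each response carries a Count for that page; pages are chained with
    ExclusiveStartKey until LastEvaluatedKey disappears. This touches
    every item, so it uses the table's full read capacity.
    """
    total = 0
    kwargs = {"Select": "COUNT"}
    while True:
        response = table.scan(**kwargs)
        total += response["Count"]
        if "LastEvaluatedKey" not in response:
            return total
        kwargs["ExclusiveStartKey"] = response["LastEvaluatedKey"]
```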
The other way is to use the DescribeTable request to get an estimate of the number of rows in the table. This returns instantly, but the value is only updated periodically; per the AWS documentation:
The number of items in the specified index. DynamoDB updates this value approximately every six hours. Recent changes might not be reflected in this value.
As per the boto3 documentation:
"The number of items in the specified table. DynamoDB updates this value approximately every six hours. Recent changes might not be reflected in this value."
import boto3
dynamoDBResource = boto3.resource('dynamodb')
table = dynamoDBResource.Table('tableName')
print(table.item_count)
or you can use DescribeTable:
import boto3

dynamoDBClient = boto3.client('dynamodb')
table = dynamoDBClient.describe_table(
    TableName='tableName'
)
print(table)
If you want to count the number of items:
import boto3

client = boto3.client('dynamodb', region_name='us-east-1')
response = client.describe_table(TableName='test')
print(response['Table']['ItemCount'])
# ItemCount (integer) -- the number of items in the specified table.
# DynamoDB updates this value approximately every six hours.
# Recent changes might not be reflected in this value.
Ref: Boto3 Documentation (under ItemCount in describe_table())
You can use this to get the item count of the entire table:
from boto.dynamodb2.table import Table

dynamodb_table = Table('Users')
dynamodb_table.count()  # updated roughly every 6 hours
Refer here: http://boto.cloudhackers.com/en/latest/ref/dynamodb2.html#module-boto.dynamodb2.table
The query_count method returns the item count based on the indexes you provide. For example:
from boto.dynamodb2.table import Table

dynamodb_table = Table('Users')
print(dynamodb_table.query_count(
    index='first_name-last_name-index',  # get index names from the Indexes tab in the DynamoDB console
    first_name__eq='John',               # append __eq to the attribute name for an exact match
    last_name__eq='Smith'                # this is your range key
))
You can add the primary index or global secondary indexes along with range keys.
Possible comparison operators:
__eq for equal
__lt for less than
__gt for greater than
__gte for greater than or equal
__lte for less than or equal
__between for between
__beginswith for begins with
Example for __between:
print(dynamodb_table.query_count(
    index='first_name-last_name-index',  # get index names from the Indexes tab in the DynamoDB console
    first_name__eq='John',               # hash key condition
    age__between=[30, 50]                # range key condition
))

Google App Engine: How to get data in query where column does not exist

Assume I have following query
MyModel.all().filter('transfered !=', True).fetch(limit = limit)
It works fine where the transfered value is not true in the datastore. But some records in my collection don't have a transfered property at all. How can I include those rows in my search?
I'm afraid that's not possible: the indexes only store references to entities that have a value for the given property.
I suggest a couple of things.
First, reprocess the data to add some sort of sentinel value (possibly one of the valid values) to all entities that are missing one. The sentinel could be None, which is different from not having a value at all.
Second, set a default such as None on the property, so that you can query for items that have no explicit value, if that makes sense in your application. This guards against future entities having no value set.
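A rough sketch of the reprocessing step in plain Python (the helper name is mine, not a datastore API; with the old db API you would fetch entities in batches and db.put() the returned list):

```python
def backfill_sentinel(entities, prop, sentinel=False):
    """Give every entity that lacks `prop` an explicit sentinel value.

    Once the property exists on every entity, it gets indexed and the
    '!=' query can see those rows. Note: in Python a missing attribute
    and an attribute set to None look the same to getattr, so this
    treats both as "missing"; adjust if None is a meaningful value.
    """
    touched = []
    for entity in entities:
        if getattr(entity, prop, None) is None:
            setattr(entity, prop, sentinel)
            touched.append(entity)
    return touched

# Hypothetical datastore usage:
# db.put(backfill_sentinel(MyModel.all().fetch(500), 'transfered'))
```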
