pymongo query for all items containing a unique identifier

pymongo query for all items containing a unique identifier - python

I have a mongo collection with data structure in the follwoing way
content: {'description': { 'text': [{'_date': '2019-05-21','_sectionId': 'a13a','_objectId: 'f637cee'},
{'_date': '2019-05-21','_objectId': '8b2ed183', '_source: 'f637cee'},
{ etc....}
{'_date': '2019-05-21','_sectionId': 'a13a','_objectId: 'XXXcee'}
},
'client' : {.....},
}
I am looking for the way to query the collection to get a list of tuples in the following way:
given a section Id I would like to get the corresponding 'objectId'
In this case the result would be:
('a13a','f637cee'), ('a13a','XXXcee')
I started to do something like this:
import pymongo
myclient = pymongo.MongoClient(mongoconnection)
print('databases names:')
myclient.list_database_names()
# getting the collection:
mydb = myclient["clients"]
query = {'content.description.text._sectionId': 'a13a'}
cur = mydb.find(query)
But I dont know how to extract the information from the cursor.
Some help?
Note the info might be nested in different places, i.e. there are more nodes preceding "content" that can vary.
Thanks a lot

Use the second parameter of the find() to get required fields.
Ex:
query = {'content.description.text._sectionId': 'a13a'}
cur = mydb.find(query, { "_id": 0, "_sectionId": 1, "_objectId": 1 })
print([tuple(i.values()) for i in cur])

Related

S3 Select Query JSON for nested value when keys are dynamic

I have a JSON object in S3 which follows this structure:
<code> : {
<client>: <value>
}
For example,
{
"code_abc": {
"client_1": 1,
"client_2": 10
},
"code_def": {
"client_2": 40,
"client_3": 50,
"client_5": 100
},
...
}
I am trying to retrieve the numerical value with an S3 Select query, where the "code" and the "client" are populated dynamically with each query.
So far I have tried:
sql_exp = f"SELECT * from s3object[*][*] s where s.{proc}.{client_name} IS NOT NULL"
sql_exp = f"SELECT * from s3object s where s.{proc}[*].{client_name}[*] IS NOT NULL"
as well as without the asterisk inside the square brackets, but nothing works, I get ClientError: An error occurred (ParseUnexpectedToken) when calling the SelectObjectContent operation: Unexpected token found LITERAL:UNKNOWN at line 1, column X (depending on the length of the query string)
Within the function defining the object, I have:
resp = s3.select_object_content(
Bucket=<bucket>,
Key=<filename>,
ExpressionType="SQL",
Expression=sql_exp,
InputSerialization={'JSON': {"Type": "Document"}},
OutputSerialization={"JSON": {}},
)
Is there something off in the way I define the object serialization? How can I fix the query so I can retrieve the desired numerical value on the fly when I provide ”code” and “client”?

I did some tinkering based on the documentation, and it works!
I need to access the single event in the EventStream (resp) as follows:
event_stream = resp['Payload']
# unpack successful query response
for event in event_stream:
if "Records" in event:
output_str = event["Records"]["Payload"].decode("utf-8") # bytes to string
output_dict = json.loads(output_str) # string to dict
Now the correct SQL expression is:
sql_exp= f"SELECT s['{code}']['{client}'] FROM S3Object s"
where I have gotten (dynamically) my values for code and client beforehand.
For example, based on the dummy JSON structure above, if code = "code_abc" and client = "client_2", I want this S3 Select query to return the value 10.
The f-string resolves to sql_exp = "SELECT s['code_abc']['client_2'] FROM S3Object s", and when we call resp, we retrieve output_dict = {'client_2': 10} (Not sure if there is a clear way to get the value by itself without the client key, this is how it looks like in the documentation as well).
So, the final step is to retrieve value = output_dict['client_2'], which in our case is equal to 10.

Access one item from a dict and store it into a variable

I am trying to get all the "uuid"'s from an API, and the issue is that it is stored into a dict (I think). Her is how it looks on the API:
{"guild": {
"_id": "5eba1c5f8ea8c960a61f38ed",
"name": "Creators Club",
"name_lower": "creators club",
"coins": 0,
"coinsEver": 0,
"created": 1589255263630,
"members":
[{ "uuid": "db03ceff87ad4909bababc0e2622aaf8",
"rank": "Guild Master",
"joined": 1589255263630,
"expHistory": {
"2020-06-01": 280,
"2020-05-31": 4701,
"2020-05-30": 0,
"2020-05-29": 518,
"2020-05-28": 1055,
"2020-05-27": 136665,
"2020-05-26": 34806}}]
}
}
Now I am interested in the "uuid" part there, and take note: There is multiple players, it can be 1 to 100 players, and I am going to need every UUID.
Now I have done this in my python to get the UUID's displayed on the website:
try:
f = requests.get(
"https://api.hypixel.net/guild?key=[secret]&id=" + guild).json()
guildName = f["guild"]["name"]
guildMembers = f["guild"]["members"]
members = client.getPlayer(uuid=guildMembers) #this converts UUID to player names
#I need to store all uuid's in variables and put them at "guildMembers"
And that gives me all the "UUID codes", and I will be using client.getPlayer(uuid=---) to convert the UUID into the Player Names. I have to loop through each "UUID" into that code client.getPlayer(uuid=---) . But first of I need to save the UUID'S in variables, I have been doing members.uuid to access the UUID on my HTML file, but I don't know how you do the .uuid part in python
If you need anything else, just comment :)

List comprehension is a powerful concept:
members = [client.getPlayer(member['uuid']) for member in guildMembers]
Edit:
If you want to insert the names back into your data (in guildMembers),
use a dictionary comprehension with {uuid: member_name,} format:
members = {member['uuid']: client.getPlayer(uuid=member['uuid']) for member in guildMembers}
Than you can update guildMembers with your results:
for member in guildMembers:
guildMembers[member]['name'] = members[member['uuid']]

Assuming that guild is the main dictionary in which a key called members exists with a list of "sub dictionaries", you can try
uuid = list()
for x in guild['members']:
uuid.append(x['uuid'])
uuid now has all the uuids

If i understood situation right, You just need to loop through all received uuids and get players' data. Something like this:
f = requests.get("https://api.hypixel.net/guild?key=[secret]&id=" + guild).json()
guildName = f["guild"]["name"]
guildMembers = f["guild"]["members"]
guildMembersData = dict() # Here we will save member's data from getPlayer method
for guildMember in guildMembers:
uuid = guildMember["uuid"]
memberData = client.getPlayer(uuid=uuid)
guildMembersData[uuid] = client.getPlayer(uuid=guildMember["uuid"])
print(guildMembersData) # Here will be players' Data.

How to get a single value of a collection in MongoDB python

I have the following python code in MongoDB:
input_1 = object_collection.find({"_id": ObjectId(key_1)})
for i in input_1:
print(i)
and it returns this:
{'_id': ObjectId('5d949843cc1e1fc0556983bc'), 'x_input': '11', 'y_input': '22'}
I am interested only in x_input and y_input where I would like to store them in order to calculate the same of them

So you can use 'project' in your query...
from pymongo import MongoClient
from bson import ObjectId
if __name__ == '__main__':
client = MongoClient("localhost:27017", username="barry", password="barry", authSource="admin", authMechanism="SCRAM-SHA-256")
with client.start_session(causal_consistency = True) as my_session:
with my_session.start_transaction():
db = client.mydatabase
collection = db.mycollection
for result in collection.find({"_id": ObjectId("5d97713e11261b4afebe517b")}, {"_id": 0, "x_input": 1}):
print (str(result))
See the bit ...
{"_id": 0, "x_input": 1}
... this instructs the query engine to turn off display for "_id", and turn on display for "x_input". If we specific any 'project' at all, then all fields we want to see must be specified. "_id" is the oddball to this strategy and will always show unless turned off.
Results:
{u'x_input': u'11'}

IN Query not working for Amazon DynamoDB

I would like to check retrieve items that have an attribute value that is present in the list of value I provide. Below is the query I have for searching. Unfortunately the response return an empty list of items. I don't understand why this is the case and would like to know the correct query.
def search(self, src_words, translations):
entries = []
query_src_words = [word.decode("utf-8") for word in src_words]
params = {
"TableName": self.table,
"FilterExpression": "src_word IN (:src_words) AND src_language = :src_language AND target_language = :target_language",
"ExpressionAttributeValues": {
":src_words": {"SS": query_src_words},
":src_language": {"S": config["source_language"]},
":target_language": {"S": config["target_language"]}
}
}
page_iterator = self.paginator.paginate(**params)
for page in page_iterator:
for entry in page["Items"]:
entries.append(entry)
return entries
Below is the table that I would like to query from. For example if my list of query_src_word have: [soccer ball, dog] then only row with entry_id=2 should be returned
Any insights would be much appreciated.

I think this is because in the query_src_word you have "soccer_ball" (with an underscore), while in the database you have "soccer ball" (without an underscore).
Change "soccer_ball" to "soccer ball" in your query_src_words and it should work find

Mongoengine, retriving only some of a MapField

For Example.. In Mongodb..
> db.test.findOne({}, {'mapField.FREE':1})
{
"_id" : ObjectId("4fb7b248c450190a2000006a"),
"mapField" : {
"BOXFLUX" : {
"a" : "f",
}
}
}
The 'mapField' field is made of MapField of Mongoengine.
and 'mapField' field has a log of key and data.. but I just retrieved only 'BOXFLUX'..
this query is not working in MongoEngine....
for example..
BoxfluxDocument.objects( ~~ querying ~~ ).only('mapField.BOXFLUX')
AS you can see..
only('mapField.BOXFLUX') or only only('mapField__BOXFLUX') does not work.
it retrieves all 'mapField' data, including 'BOXFLUX' one..
How can I retrieve only a field of MapField???

I see there is a ticket for this: https://github.com/hmarr/mongoengine/issues/508
Works for me heres an example test case:
def test_only_with_mapfields(self):
class BlogPost(Document):
content = StringField()
author = MapField(field=StringField())
BlogPost.drop_collection()
post = BlogPost(content='Had a good coffee today...',
author={'name': "Ross", "age": "20"}).save()
obj = BlogPost.objects.only('author__name',).get()
self.assertEquals(obj.author['name'], "Ross")
self.assertEquals(obj.author.get("age", None), None)

Try this:
query = BlogPost.objects({your: query})
if name:
query = query.only('author__'+name)
else:
query = query.only('author')

I found my fault! I used only twice.
For example:
BlogPost.objects.only('author').only('author__name')
I spent a whole day finding out what is wrong with Mongoengine.
So my wrong conclusion was:
BlogPost.objects()._collection.find_one(~~ filtering query ~~, {'author.'+ name:1})
But as you know it's a just raw data not a mongoengine query.
After this code, I cannot run any mongoengine methods.
In my case, I should have to query depending on some conditions.
so it will be great that 'only' method overwrites 'only' methods written before.. In my humble opinion.
I hope this feature would be integrated with next version. Right now, I have to code duplicate code:
not this code:
query = BlogPost.objects()
query( query~~).only('author')
if name:
query = query.only('author__'+name)
This code:
query = BlogPost.objects()
query( query~~).only('author')
if name:
query = BlogPost.objects().only('author__'+name)
So I think the second one looks dirtier than first one.
of course, the first code shows you all the data
using only('author') not only('author__name')

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

pymongo query for all items containing a unique identifier - python

Use the second parameter of the find() to get required fields. Ex: query = {'content.description.text._sectionId': 'a13a'} cur = mydb.find(query, { "_id": 0, "_sectionId": 1, "_objectId": 1 }) print([tuple(i.values()) for i in cur])

Related

S3 Select Query JSON for nested value when keys are dynamic

Access one item from a dict and store it into a variable

How to get a single value of a collection in MongoDB python

IN Query not working for Amazon DynamoDB

Mongoengine, retriving only some of a MapField

Categories

Resources