Unwind multiple arrays from different structure in document - python

I have these two types of structures in my collection:
{
_id: "date1",
users: [{"user": "123", ...}, {"user": "456", ...}]
}
and
{
_id: "date2",
points: [{"point": "1234", ...}, {"point": "5678", ...}]
}
I need to write an aggregation that returns a list of these documents, each containing only the matching point or user, with skip and limit applied. Something like:
[
{_id: "date1", user: {"user": "123", ...}},
{_id: "date2", point: {"point": "1234", ...}},
]
This is what I have tried so far. I'm new to Mongo; do you have any recommendations?
collection.aggregate([
{"$unwind": "$users"},
{"$unwind": "$points"},
{"$match": {"$or": [
{'users.user': an_user},
{'points.point': a_point}]}},
{"$sort": {"_id": -1}},
{"$skip": 10},
{"$limit": 10}
])

Each result should carry the information for one specific user or one specific point, depending on whether the document has a users or a points key.
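Note that your pipeline returns nothing as written: $unwind drops any document that lacks the field, and each of your documents has only one of the two arrays. Given your example documents, you may be able to just use db.collection.find(), for example: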
db.collection.find({"users.user": a_user}, {"users.$":1});
db.collection.find({"points.point": a_point}, {"points.$":1});
Depending on your use case, this may not be ideal because you're executing find() twice.
This returns a list of documents with their _id and the first matching array element. Together, the queries above are equivalent to saying:
Find all documents in the collection
where the array field users contains a user 'a_user'
OR the array field points contains a point 'a_point'
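Since the question uses Python, here is a minimal sketch of how the two find() calls could be merged and paged with pymongo. The function name find_user_or_point and its parameters are made up for illustration, and collection is assumed to be a pymongo Collection. Paging happens in memory here, which is only reasonable for small result sets.

```python
def find_user_or_point(collection, a_user, a_point, skip=0, limit=10):
    """Run both positional-projection queries and page the merged result.

    `collection` is assumed to be a pymongo Collection (hypothetical helper).
    """
    # Each query returns _id plus the first matching array element.
    user_docs = collection.find({"users.user": a_user}, {"users.$": 1})
    point_docs = collection.find({"points.point": a_point}, {"points.$": 1})

    merged = sorted(
        list(user_docs) + list(point_docs),
        key=lambda doc: doc["_id"],
        reverse=True,  # same order as {"$sort": {"_id": -1}}
    )
    # Equivalent of $skip followed by $limit, done in Python.
    return merged[skip:skip + limit]
```

A call such as find_user_or_point(collection, "123", "1234", skip=10, limit=10) would mirror the skip and limit values from the question.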
See also MongoDB Indexes
Having said the above, you should really reconsider your document schema. Depending on your use case, you may find the data difficult to query later on, and query performance may suffer. Please review MongoDB Data Models for more information on how to design your schema.

Related

How to merge three lists of dictionaries into one nested dictionary using Python

Heads up: I learn by example, hence this question.
So, I am trying to create a nested dictionary from three (3) lists of dictionaries. The lists of dictionaries look like the following:
topic_list = [{"id": 1, "title": "I have no idea.", "slug": "i-have-no-idea"}, ...]
thread_list = [{"id": 1, "title": "I still have no idea.", "author_name": "me", "topic": 1}, ...]
message_list = [{"id": 1, "content": "I really have no clue what I am doing.", "author_name": "me", "thread": 1}, ...]
So, (I think) the end result should look something like the following:
nested_dictionary = [
"topic": {
"id": 1,
"title": "I have no idea.",
"slug": "i-have-no-idea",
"threads": {
"id": 1,
"title": "I still have no idea.",
"author": "me",
"messages": {
"id": 1,
"content": "I really have no clue what I am doing.",
"author_name": "me"
}
}
}
]
Basically, everything has an id. The messages are associated with the threads via the thread id and the threads are associated with the topics via the topic id.
My questions:
I know the format of the nested dictionary is not correct, but does it at least make sense as to what I am trying to accomplish?
Has anyone seen an example of this in Python?
I have literally written no code for this, but am assuming it would require nested loops that first associate all threads to the applicable topic using the topic id as a key, then associate all messages to the applicable threads using the thread id as a key.
I have looked through SO as well as performed numerous Google searches for something similar. Any suggestions and-or pointing me in the right direction would be appreciated.
I think that the ids you are using in your example may be the source of confusion.
Basically, everything has an id. The messages are associated with the threads via the thread id and the threads are associated with the topics via the topic id.
While everything has an id, the thread object needs a topic_id to know which topic they should be nested inside of. Likewise, messages need a thread_id (and probably a topic_id too).
With this in mind, your nested dictionary would look something like:
nested_dictionary = {
"topics": [
{"id": 5247,
"title": "I have no idea.",
"slug": "i-have-no-idea",
"threads": [
{"id": 9153,
"topic_id": 5247,
"title": "I still have no idea.",
"author": "me",
"messages": [
{"id": 1935,
"thread_id": 9153,
"content": "I really have no clue what I am doing.",
"author_name": "me"
},
]
},
]
},
]
}
Note: I've changed the ids to make the example a little clearer.
Assuming the above schema is correct, your intuition of a nested for loop is correct. I would imagine such an algorithm to look something like:
1) Insert each topic into nested_dictionary
2) For each thread, iterate over the topics in the nested_dictionary until you find topic[id] == thread[topic_id] and insert the thread into that topic
3) For each message, iterate over the threads in the nested_dictionary until you find thread[id] == message[thread_id] and insert the message into that thread
This represents the simplest solution. However, a better solution would change the schema to use the ids as the keys for each of the items in the nested dictionary. This would get your algorithm down to linear time complexity.
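As a rough sketch of the id-keyed idea, the lists can first be grouped by their parent id so that each item is visited once. The function name nest is made up for illustration, and the foreign keys "topic" and "thread" are taken from the lists in the question rather than the topic_id and thread_id names used above.

```python
from collections import defaultdict


def nest(topic_list, thread_list, message_list):
    """Nest messages inside threads and threads inside topics in linear time."""
    # Group messages by the id of the thread they belong to.
    messages_by_thread = defaultdict(list)
    for message in message_list:
        messages_by_thread[message["thread"]].append(message)

    # Group threads by topic id, attaching each thread's messages.
    threads_by_topic = defaultdict(list)
    for thread in thread_list:
        nested_thread = {**thread, "messages": messages_by_thread.get(thread["id"], [])}
        threads_by_topic[thread["topic"]].append(nested_thread)

    # Attach the grouped threads to their topics.
    return {
        "topics": [
            {**topic, "threads": threads_by_topic.get(topic["id"], [])}
            for topic in topic_list
        ]
    }


topic_list = [{"id": 1, "title": "I have no idea.", "slug": "i-have-no-idea"}]
thread_list = [{"id": 1, "title": "I still have no idea.", "author_name": "me", "topic": 1}]
message_list = [{"id": 1, "content": "I really have no clue what I am doing.", "author_name": "me", "thread": 1}]

nested_dictionary = nest(topic_list, thread_list, message_list)
```

Each list is walked exactly once, and the dictionary lookups replace the inner search loops from steps 2 and 3.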

Django ORM exclude records using an array of dictionaries

I have code similar to this that returns all entries from a table:
all_entries = Entry.objects.all()
and I have the following array:
exclusion_list = [
{
"username": "Tom",
"start_date": "01/03/2019",
"end_date": "29/02/2020",
},
{
"username": "Mark",
"start_date": "01/02/2020",
"end_date": "29/02/2020",
},
{
"username": "Pam",
"start_date": "01/03/2019",
"end_date": "29/02/2020",
}
]
I want to exclude all of Tom's records from "01/03/2019" to "29/02/2020", all of Mark's records from "01/02/2020" to "29/02/2020", and all of Pam's records from "01/03/2019" to "29/02/2020".
I want to do that in a loop, so I believe I should do something like:
for entry in all_entries:
    filtered_entry = all_entries.exclude(username=entry.username).filter(date__gte=entry.start_date, date__lte=entry.end_date)
Is this approach correct? I am new to Django ORM. Is there a better and more efficient solution?
Thank you for your help
Yes, you can do this with a loop.
This results in a query whose WHERE-clause gets extended every cycle of your loop. But to do this, you have to use the filtered queryset of your previous cycle:
filtered_entry = all_entries
for exclude_entry in exclusion_list:
    # exclusion_list holds dictionaries, so use key access rather than attributes
    filtered_entry = filtered_entry.exclude(username=exclude_entry["username"], date__gte=exclude_entry["start_date"], date__lte=exclude_entry["end_date"])
Notes
Reuse the same queryset reference so that each loop cycle limits the results further.
To combine multiple criteria with AND, pass multiple keyword arguments to exclude() (see the docs [here][1]).
Be aware that this can produce a large WHERE clause, and your database may limit how large a query can be.
So if your exclusion_list is not too big, I think you can use this without concerns.
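One detail the loop glosses over is the date format. As a small sketch, assuming the dates arrive as DD/MM/YYYY strings and the model field is called date as in the question, each exclusion item could be converted into keyword arguments before it is passed to exclude(). The helper name to_exclude_kwargs is hypothetical.

```python
from datetime import datetime


def to_exclude_kwargs(entry, date_format="%d/%m/%Y"):
    """Turn one exclusion item into keyword arguments for exclude()."""
    return {
        "username": entry["username"],
        # Date lookups compare against date objects, so parse the strings first.
        "date__gte": datetime.strptime(entry["start_date"], date_format).date(),
        "date__lte": datetime.strptime(entry["end_date"], date_format).date(),
    }


exclusion_list = [
    {"username": "Tom", "start_date": "01/03/2019", "end_date": "29/02/2020"},
    {"username": "Mark", "start_date": "01/02/2020", "end_date": "29/02/2020"},
]
kwargs_list = [to_exclude_kwargs(entry) for entry in exclusion_list]

# Inside the loop this would become (requires a Django queryset):
# filtered_entry = filtered_entry.exclude(**kwargs)
```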
If your exclusion_list grows, it would be best to store it in the database itself. The ORM can then generate subqueries instead of passing individual values. Just an example:
exclusion_query = ExclusionEntry.objects.all().values('username')
filtered = all_entries.exclude(username__in=exclusion_query)
[1]: https://docs.djangoproject.com/en/3.1/topics/db/queries/#retrieving-specific-objects-with-filters

how to fetch data from json schema? error shown-TypeError: string indices must be integers

I have a json response from an API in this way:-
{
"meta": {
"code": 200
},
"data": {
"username": "luxury_mpan",
"bio": "Recruitment Agents👑👑👑👑\nThe most powerful manufacturers,\nwe have the best quality.\n📱Wechat:13255996580💜💜\n📱Whatsapp:+8618820784535",
"website": "",
"profile_picture": "https://scontent.cdninstagram.com/t51.2885-19/10895140_395629273936966_528329141_a.jpg",
"full_name": "Mpan",
"counts": {
"media": 17774,
"followed_by": 7982,
"follows": 7264
},
"id": "1552277710"
}
}
I want to fetch the values of "media", "followed_by" and "follows" and store them in three different lists, as shown in the code below:
for r in range(1,5):
    var=r,st.cell(row=r,column=3).value
    xy=var[1]
    ij=str(xy)
    myopener=Myopener()
    url=myopener.open('https://api.instagram.com/v1/users/'+ij+'/?access_token=641567093.1fb234f.a0ffbe574e844e1c818145097050cf33')
    beta=json.load(url)
    for item in beta['data']:
        list1.append(item['media'])
        list2.append(item['followed_by'])
        list3.append(item['follows'])
When I run it, it shows the error TypeError: string indices must be integers
How would my loop change in order to fetch the above mentioned values?
Also, out of curiosity: is there any way to extract the WhatsApp number from the "bio" key in the data dictionary?
I have referred questions similar to this and still did not get my answer. Please help!
beta['data'] is a dictionary object. When you iterate over it with for item in beta['data'], the values taken by item will be the keys of the dictionary: "username", "bio", etc.
So then when you ask for, e.g., item['media'] it's like asking for "username"['media'], which of course doesn't make any sense.
It isn't quite clear what it is that you want: is it just the stuff inside counts? If so, then instead of for item in beta['data']: you could just say item = beta['data']['counts'], and then item['media'] etc. will be the values you want.
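A minimal sketch of that change, using a trimmed copy of the response from the question in place of the live API call (the response_text variable is only a stand-in for whatever json.load(url) would read):

```python
import json

# Trimmed copy of the API response shown in the question.
response_text = """
{
  "meta": {"code": 200},
  "data": {
    "username": "luxury_mpan",
    "counts": {"media": 17774, "followed_by": 7982, "follows": 7264},
    "id": "1552277710"
  }
}
"""

list1, list2, list3 = [], [], []

beta = json.loads(response_text)
# Index straight into the nested dictionary instead of iterating over its keys.
counts = beta["data"]["counts"]
list1.append(counts["media"])
list2.append(counts["followed_by"])
list3.append(counts["follows"])
```

Inside the original loop, these four lines would replace the inner for item in beta['data'] block.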
As to your secondary question: I suggest looking into regular expressions.
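As a rough illustration of the regular-expression route, the sketch below looks for the label "Whatsapp" followed by a colon and a run of digits. The pattern is an assumption based on the single bio shown in the question, so other profiles may well need a looser or different pattern.

```python
import re

# Bio text copied from the response in the question.
bio = (
    "Recruitment Agents👑👑👑👑\nThe most powerful manufacturers,\n"
    "we have the best quality.\n📱Wechat:13255996580💜💜\n"
    "📱Whatsapp:+8618820784535"
)

# Label, optional spaces around the colon, optional leading "+", then digits.
whatsapp_pattern = re.compile(r"whatsapp\s*:\s*(\+?\d+)", re.IGNORECASE)

match = whatsapp_pattern.search(bio)
whatsapp_number = match.group(1) if match else None
```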

How can I re-assemble list after using aggregate and $unwind in mongo?

I'm building an aggregate pipeline as follows
pipeline = [
{"$unwind": "$categories"}
]
if len(cat_comp) > 0:
    pipeline.append({"$match": {"categories": {"$in": cat_comp}}})
result = mongo.db.xxx.aggregate(pipeline)['result']
The question is: after performing the aggregation, how can I re-assemble the list of categories in the results? After $unwind, the categories field of each returned record holds just one of the items from the original list. How can I match ($match) against a list of possibilities but still recover the original list of categories?
It has been suggested that I try:
pipeline.append({"$group": {"categories": {"$push": "$categories"}}})
which I have modified to:
pipeline.append({"$group": {"_id": "anything", "categories": {"$push": "$categories"}}})
However, now I only get one record back, whose categories field is one massive list built from all the results. What I would like to do is take a document like this:
{
"_id": 45666,
"categories": ['Fiction', 'Biography'],
"other": "sss"
}
and search it against a user-supplied list cat_list = ['Anything', ...] by turning its entries into regular expressions like this:
cat_comp = [re.compile(cat, re.IGNORECASE) for cat in cat_list]
In the end, what is happening with aggregate(pipeline) is that I am losing "categories" as a list because of the $unwind. How can I run the query over the input data but get back the matching records with categories still as a list?
I'm also trying:
pipeline.append({"$group": {"_id": "$_id", "categories": { "$addToSet": "$categories" } } })
This usefully returns a list of records with categories as a list. However, how can I see the rest of the record? I can only see _id and categories.
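You need a $group step in the pipeline that groups by _id and uses $push to rebuild the lists. Fields that are not named in $group are dropped, so carry the others along with $first: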
pipeline.append({"$group": {"categories": {"$push": "$categories"},"_id":"$_id","other": {"$first":"$other"}}})
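To see what this stage does, here is a pure-Python emulation of the $unwind, $match and $group stages on in-memory dictionaries. It does not talk to MongoDB, and the sample documents and the cat_list value are made up for illustration. Note that $push only collects the categories that survived $match, so the rebuilt list can be shorter than the original one.

```python
import re

# Sample documents (hypothetical), shaped like the one in the question.
docs = [
    {"_id": 45666, "categories": ["Fiction", "Biography"], "other": "sss"},
    {"_id": 45667, "categories": ["Science"], "other": "ttt"},
]
cat_list = ["fiction"]
cat_comp = [re.compile(cat, re.IGNORECASE) for cat in cat_list]

# $unwind: one record per element of the categories array.
unwound = [{**doc, "categories": cat} for doc in docs for cat in doc["categories"]]

# $match with $in over regular expressions: keep records matching any pattern.
matched = [
    rec for rec in unwound
    if any(pattern.search(rec["categories"]) for pattern in cat_comp)
]

# $group by _id: $push the categories, take $first of "other".
grouped = {}
for rec in matched:
    group = grouped.setdefault(
        rec["_id"], {"_id": rec["_id"], "categories": [], "other": rec["other"]}
    )
    group["categories"].append(rec["categories"])

result = list(grouped.values())
```

Here the document 45666 comes back with categories equal to ['Fiction'] only, because 'Biography' was filtered out before the group stage.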

Mongodb Pymongo using $set to create an array/list/collection

I'm trying to use $set to create an array/list/collection (not sure which is proper terminology), and I'm not sure how to do it. For example:
I have a document inserted into my database that looks like this:
"_id": (unique, auto-generated id)
"Grade": Sophomore
I want to insert a collection/list/array using update. So, basically I want this:
"_id": (unique, auto-generated id)
"Grade": Sophomore
"Information"{
"Class_Info": [
{"Class_Name": "Math"}
]
What I've been doing so far is using .update and dot notation. So, what I was trying to do was use $set like this:
collection.update({'_id': unique_id}, {'$set': {'Information.Class_Info.Class_Name': 'Math'}})
However, what that is doing is making Class_Info a document and not a list/collection/array, so it's doing:
"_id": (unique id)
"Grade": Sophomore
"Information"{
"Class_Info": {
"Class_Name": "Math"
}
How do I specify that I want Class_Info to be a list? If for some reason I absolutely cannot use $set to do this, it is very important that I can still use dot notation because of the way the rest of my program works. So if I'm supposed to use something other than $set, can it take dot notation to specify where to insert the list? (I know $push is another option, but it doesn't use dot notation, so I can't really use it in my case.)
Thanks!
If you want to do it with a single instruction, starting from a document where none of the keys exist yet, this is the only way to do it ($set will never create an array unless it is given one explicitly, as in {$set: {"somekey": []}}):
db.test.update(
{ _id: "(unique id)" },
{ $push: {
"Information.Class_Info": { "Class_Name": "Math" }
}}
)
This query does the trick: it pushes the object you need onto the non-existing key Information.Class_Info, which creates that key as an array. This is the only working solution that uses a single instruction and dot notation.
There is a way to do it with one instruction, $set, and dot notation, as follows:
db.test.updateOne(
{ _id: "my-unique-id" },
{ $set: {
"Information.Class_Info": [ { "Class_Name": "Math" } ]
}}
)
There is also a way to do it with two instructions and the array index in the dot notation, allowing you to use similar statements to add more array elements:
db.test.updateOne(
{ _id: "my-unique-id" },
{ $set: { "Information.Class_Info": [] }}
)
db.test.updateOne(
{ _id: "my-unique-id" },
{ $set: {
"Information.Class_Info.0": { "Class_Name": "Math" },
"Information.Class_Info.1": { "Class_AltName": "Mathematics" }
}}
)
Deviating from these options has interesting failure modes:
If you try to combine the two calls of the second option into a single updateOne() call, which is usually possible for $set updates, MongoDB will complain that "Updating the path 'Information.Class_Info.0' would create a conflict at 'Information.Class_Info'".
If you try to use the dot notation with the array index ("Information.Class_Info.0.Class_Name": "Math") but without creating an empty array first, then MongoDB will create an object with numeric keys ("0", "1", …). It really refuses to create an array except when told to explicitly using […] (as also noted in the answer by @Maximiliano).
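Since the question is about pymongo and the snippets above use shell syntax, here is a sketch of how the single-instruction $set and $push variants might look in Python. The function names are made up for illustration, and collection is assumed to be a pymongo Collection; update_one is the pymongo counterpart of updateOne.

```python
def create_class_info(collection, doc_id, class_name):
    """Create Information.Class_Info as a one-element array with $set."""
    return collection.update_one(
        {"_id": doc_id},
        # The explicit [...] is what makes Class_Info an array.
        {"$set": {"Information.Class_Info": [{"Class_Name": class_name}]}},
    )


def append_class_info(collection, doc_id, class_name):
    """Append to Information.Class_Info with $push, creating it if missing."""
    return collection.update_one(
        {"_id": doc_id},
        {"$push": {"Information.Class_Info": {"Class_Name": class_name}}},
    )
```

Both helpers keep the dot notation in the field path, so they should fit code that already builds paths such as "Information.Class_Info" as strings.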
