MongoDB: update many documents with each element in a given list - Python

I have this DEVICE collection
[
{
"_id": ObjectId("60265a12f9bf1e3974dabe56"),
"Name": "Device",
"Configuration_ids": [
ObjectId("60265a11f9bf1e3974dabe54"),
ObjectId("60265a11f9bf1e3974dabe55")
]
},
{
"_id": ObjectId("60265a92f9bf1e3974dabe64"),
"Name": "Device2",
"Configuration_ids": [
ObjectId("60265a92f9bf1e3974dabe5a"),
ObjectId("60265a92f9bf1e3974dabe5b")
]
},
{
"_id": ObjectId("60265a92f9bf1e3974dabe65"),
"Name": "Device3",
"Configuration_ids": [
ObjectId("60265a92f9bf1e3974dabe5e"),
ObjectId"60265a92f9bf1e3974dabe5f")
]
}
]
I need to update all the documents that match a list of device ids, pushing each element of a given configuration_ids list into the corresponding matched device. The two lists have the same length.
My solution is below, but can I do it in one single query?
device_ids=[
ObjectId("60265a12f9bf1e3974dabe56"),
ObjectId("60265a92f9bf1e3974dabe64"),
ObjectId("60265a92f9bf1e3974dabe65")
]
configuration_ids = [
ObjectId("60267d14bc2f40d0dec1de3b"),
ObjectId("60267d14bc2f40d0dec1de3c"),
ObjectId("60267d14bc2f40d0dec1de3d")
]
for i in range(len(device_ids)):
    update_devices = device_collection.update_one(
        {'_id': device_ids[i]},  # device_ids already holds ObjectIds
        {'$push': {'Configuration_ids': configuration_ids[i]}}
    )
The result:
[
{
"_id": ObjectId("60265a12f9bf1e3974dabe56"),
"Name": "Device",
"Configuration_ids": [
ObjectId("60265a11f9bf1e3974dabe54"),
ObjectId("60265a11f9bf1e3974dabe55"),
ObjectId("60267d14bc2f40d0dec1de3b")
]
},
{
"_id": ObjectId("60265a92f9bf1e3974dabe64"),
"Name": "Device2",
"Configuration_ids": [
ObjectId("60265a92f9bf1e3974dabe5a"),
ObjectId("60265a92f9bf1e3974dabe5b"),
ObjectId("60267d14bc2f40d0dec1de3c")
]
},
{
"_id": ObjectId("60265a92f9bf1e3974dabe65"),
"Name": "Device3",
"Configuration_ids": [
ObjectId("60265a92f9bf1e3974dabe5e"),
ObjectId"60265a92f9bf1e3974dabe5f"),
ObjectId("60267d14bc2f40d0dec1de3d")
]
}
]

If you were hoping to use update_many to achieve this in a single update, then the short answer is you can't. update_many takes a single filter to determine which documents to update; in your example, each update is a different document id.
If you have a large number of these updates, and performance is an issue, consider using the bulk write operators.
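For example, a minimal sketch of that approach with pymongo's bulk API (assuming, as in the question, that the two lists line up index by index):

from pymongo import UpdateOne

# One UpdateOne request per (device, configuration) pair, sent in a single
# round trip. ordered=False lets the server keep going if one update fails.
requests = [
    UpdateOne({'_id': device_id},
              {'$push': {'Configuration_ids': configuration_id}})
    for device_id, configuration_id in zip(device_ids, configuration_ids)
]
result = device_collection.bulk_write(requests, ordered=False)
print(result.modified_count)  # should equal the number of matched devices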

Related

jmespath: how do I find the key values in the dictionary?

I have an example json file. I need to extract all the values of the downloadUrl keys:
{
"nodes": {
"children": [
{
"id": "",
"localizedName": "",
"name": "Documents",
"children": [
{
"id": "",
"localizedName": "Brochures",
"name": "Brochures",
"items": [
{
"title": "Brochure",
"downloadUrl": "/documents/brochure-en.pdf",
"fileType": "pdf",
"fileSize": "2.9 MB"
}
]
},
{
"id": "192194",
"localizedName": "",
"name": "Demonstrations",
"items": [
{
"title": "Safety Poster",
"downloadUrl": "safety-poster-en.pdf",
"fileType": "pdf",
"fileSize": "1.1 MB"
}
]
}
]
}
]
}
}
I'm trying to do this with this query:
jmespath.search('nodes[*].downloadUrl', file)
but the list of values is not displayed.
Where is the error?
Statically, your property is under nodes > children > [ ] > children > [ ] > items > [ ] > downloadUrl.
So a query giving you those values would be:
nodes.children[].children[].items[].downloadUrl
If you want something a little more dynamic (let's say that the property names can change but the level at which you will find downloadUrl won't), you could use this query:
*.*[][].*[][].*[?downloadUrl][][].downloadUrl
But sadly, querying an arbitrary structure, like you can in jq, is not something JMESPath supports at the moment.
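For reference, a runnable sketch of the static query above (assuming the JSON has already been parsed into a dict named file, as in the question):

import jmespath

urls = jmespath.search("nodes.children[].children[].items[].downloadUrl", file)
print(urls)  # ['/documents/brochure-en.pdf', 'safety-poster-en.pdf']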
You need to do something like:
jmespath.search("nodes.children[].children[].items[].downloadUrl", file)

Removing a list entry in a list in pyMongo

I have a database collection that has objects like this:
{
"_id": ObjectId("something"),
"name_lower": "total",
"name": "Total",
"mounts": [
[
"mount1",
"instance1"
],
[
"mount2",
"instance1"
],
[
"mount1",
"instance2"
],
[
"mount2",
"instance2"
]
]
}
Say I want to remove every mount that has the instance instance2. How would I go about doing that? I have been searching for quite a while.
You can do something like this:
[
{
$unwind: "$mounts"
},
{
$match: {
"mounts": {
$ne: "instance2"
}
}
},
{
$group: {
_id: "$_id",
name: {
$first: "$name"
},
mounts: {
$push: "$mounts"
}
}
}
]
Working Mongo playground
This answer is based on @varman's answer, but is more Pythonic and efficient.
The first stage should be a $match condition to filter out documents that don't need to be updated.
Since the mounts key consists of a nested array, we have to $unwind it so that we can remove the array elements that need to go.
We have to apply the $match condition again to filter out the elements that have to be removed.
Finally, we have to $group the pipeline by the _id key, so that the documents that were $unwound in the previous stage are grouped back into a single document.
from pymongo import MongoClient

client = MongoClient("<URI-String>")
col = client["<DB-Name>"]["<Collection-Name>"]

count = 0
for cursor in col.aggregate([
    {
        "$match": {
            "mounts": {"$ne": "instance2"}
        }
    },
    {
        "$unwind": "$mounts"
    },
    {
        "$match": {
            "mounts": {"$ne": "instance2"}
        }
    },
    {
        "$group": {
            "_id": "$_id",
            "newMounts": {
                "$push": "$mounts"
            }
        }
    },
]):
    # print(cursor)
    col.update_one({
        "_id": cursor["_id"]
    }, {
        "$set": {
            "mounts": cursor["newMounts"]
        }
    })
    count += 1
    print("\r", count, end="")

print("\n\nDone!!!")

Converting a pandas DataFrame to nested JSON key pairs

Here is sample data from a CSV file, where every generation is a child of the previous generation.
parant,gen1,gen2,get3,gen4,gen5,gen6
query1,AggregateExpression,abc,def,emg,cdf,bcf
query1,And,cse,rds,acd,,
query2,Arithmetic,cbd,rsd,msd,,
query2,Average,as,vs,ve,ew,
query2,BinaryExpression,avsd,sfds,sdf,,
query2,Comparison,sdfs,sdfsx,,,
query3,Count,sfsd,,,,
query3,methods1,add,asd,fdds,sdf,sdf
query3,methods1,average,sdfs,bf,fd,
query4,methods2,distinct,cz,asd,ada,
query4,methods2,eq,sdfs,sdfxcv,sdf,rtyr
query4,methods3,eq,vcx,xcv,cdf,
I need to create a JSON file of the following format, where parents are the index, children are always a list of dictionaries, and there is a size for the last generation, calculated as the number of times its parent appears (in the previous generation).
Example of the first row breakdown:
{
    "name": "query1",
    "children": [
        {
            "name": "AggregateExpression",
            "children": [
                {
                    "name": "abc",
                    "children": [
                        {
                            "name": "def",
                            "children": [
                                {
                                    "name": "emg",
                                    "children": [
                                        {
                                            "name": "cdf",
                                            "children": [
                                                {"name": "bcf", "size": 1}
                                            ]
                                        }
                                    ]
                                }
                            ]
                        }
                    ]
                }
            ]
        }
    ]
}
I have tried to use groupby() and to_json() but was not able to complete it. I am still struggling to build the logic and to decide whether I need a lambda or a loop. Any suggestion or solution is welcome. Thanks.
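One way to build that structure is a recursive groupby: group on the first column, recurse on the remaining generation columns, and emit a size once the deeper generations run out. A hedged sketch (nest and data.csv are hypothetical names; it assumes empty CSV cells are read as NaN):

import pandas as pd

def nest(frame, columns):
    """Recursively turn generation columns into name/children dicts."""
    children = []
    col, rest = columns[0], columns[1:]
    for name, group in frame.groupby(col, sort=False):
        node = {"name": name}
        if rest and group[rest[0]].notna().any():
            # Deeper generations exist for this branch: recurse.
            node["children"] = nest(group, rest)
        else:
            # Leaf: size = number of rows under this parent.
            node["size"] = len(group)
        children.append(node)
    return children

df = pd.read_csv("data.csv")       # hypothetical filename
tree = nest(df, list(df.columns))  # first column is the parent level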

dictionary does not give me unique Ids in Python

I have the output of an elasticsearch query saved in a file. The first few lines looks like this:
{"took": 1,
"timed_out": false,
"_shards": {},
"hits": {
"total": 27,
"max_score": 6.5157733,
"hits": [
{
"_index": "dbgap_062617",
"_type": "dataset",
***"_id": "595189d15152c64c3b0adf16"***,
"_score": 6.5157733,
"_source": {
"dataAcquisition": {
"performedBy": "\n\t\tT\n\t\t"
},
"provenance": {
"ingestTime": "201",
},
"studyGroup": [
{
"Identifier": "1",
"name": "Diseas"
}
],
"license": {
"downloadURL": "http",
},
"study": {
"alternateIdentifiers": "yes",
},
"disease": {
"name": [
"Coronary Artery Disease"
]
},
"NLP_Fields": {
"CellLine": [],
"MeshID": [
"C0066533",
],
"DiseaseID": [
"C0010068"
],
"ChemicalID": [],
"Disease": [
"coronary artery disease"
],
"Chemical": [],
"Meshterm": [
"migen",
]
},
"datasetDistributions": [
{
"dateReleased": "20150312",
}
],
"dataset": {
"citations": [
"20032323"
],
**"description": "The Precoc.",**
**"title": "MIGen_ExS: PROCARDIS"**
},
.... and the list goes on with a bunch of other items ....
From all of these nodes I was interested in Unique _Ids, title, and description. So, I created a dictionary and extracted the parts that I was interested in using json. Here is my code:
import json

s = {}
d = open('local file', 'w')
with open('localfile', 'r') as ready:
    for line in ready:
        test = json.loads(line, encoding='utf-8')
        for i in (test['hits']['hits']):
            for x in i:
                s.setdefault(i['_id'], [i['_source']['dataset']['description'],
                                        i['_source']['dataset']['title']])
        for k, v in s.items():
            d.write(k + '\t' + v[0] + '\t' + v[1] + '\n')
d.close()
Now, when I run it, it gives me a file with duplicated _Ids! Isn't a dictionary supposed to give me unique _Ids? In my original output file, I have lots of duplicated ids that I wanted to get rid of.
Also, I ran set() on just the _ids to count the unique ones, and it came to 138. But with the dictionary, if I remove the generated duplicate ids, it comes down to 17!
Can someone please tell me why this is happening?
If you want a unique ID, if you're using a database it will create it for you. If you're not, you'll need to generate a unique number or string. Depending on how the dictionaries are created, you could use the timestamp of when the dictionary was created, or you could use uuid.uuid4(). For more info on uuid, here are the docs.
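For instance, a small sketch of both options mentioned above:

import time
import uuid

# A random, effectively collision-free identifier
unique_id = str(uuid.uuid4())  # e.g. '3d6f0a8e-9c1b-4e62-8f33-0a1b2c3d4e5f'

# Or a timestamp-based key (the "rec-" prefix is just an illustrative choice)
stamped_id = f"rec-{time.time_ns()}"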

mongodb get element in multi array

I have mongodb document like this:
{
"post":[
{
"name": "post1",
"part": [
{
"name": "part1",
...
},{
"name": "part2",
...
}
]
},{
"name": "post2",
"part": [
{
"name": "part3",
...
},{
"name": "part4",
...
}
]
}
...
]
}
I want to get output like this:
{
"post": [
{
"part":[
{
"name": "part2"
}
]
}
]
}
My query is like this:
db.find_one({"_id": 123}, {
    "post.%s.part.%s.name" % (0, 1): 1
})
I know the index of the post list (0) and of the part list (1).
I can't get the element by index in the output; can you help me get the element of the array?
I have tried $slice, but how do I query with $slice on multiple parts of the array?
Thanks!
Projection can't project out specific elements except by matching with $. You can restrict the output to just the field "post.part.name", but you will still get that field's value for every element of the bottom-level array.
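If an aggregation pipeline is an option, $arrayElemAt can pick both known indexes server-side. A hedged sketch (assumes MongoDB 3.2+ and mirrors the question's collection handle db and _id of 123):

pipeline = [
    {"$match": {"_id": 123}},
    {"$project": {
        "_id": 0,
        "part_name": {
            "$let": {
                "vars": {"post0": {"$arrayElemAt": ["$post", 0]}},  # post[0]
                "in": {
                    "$let": {
                        "vars": {"part1": {"$arrayElemAt": ["$$post0.part", 1]}},  # part[1]
                        "in": "$$part1.name"
                    }
                }
            }
        }
    }}
]
doc = next(db.aggregate(pipeline), None)  # {'part_name': 'part2'}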
