MongoDB: Remove duplicate records from Projection - python

How can I remove duplicate records from a MongoDB projection?
Let's say I have my MongoDB documents in the following form:
{"_id":"55555454", "From":"Bob", "To":"Alice", "subject":"Hi", "date":"04102011"}
{"_id":"55555455", "From":"Bob", "To":"Dave", "subject":"Hello", "date":"04102014"}
{"_id":"55555456", "From":"Bob", "To":"Alice", "subject":"Bye", "date":"04112013"}
When I do a simple projection
db.col.find({}, {"From":1, "To":1, "_id":0})
it obviously gives me all three records, like this:
{"From":"Bob", "To":"Alice"} {"From":"Bob","To":"Dave"} {"From":"Bob", "To":"Alice"}
However, what I want is only two records, like this:
{"From":"Bob", "To":"Alice"} {"From":"Bob","To":"Dave"}
My application is currently in Python (using pymongo), so I am removing the duplicates in the application from the list of records using
result = [dict(tupleized) for tupleized in set(tuple(item.items()) for item in l)]
Is there any DB method that I can apply to the projection so that it gives me only these two records?

You can't eliminate duplicate documents using just find with a projection in MongoDB.
The find command won't work because it returns a cursor to the client and, as such, can't reduce the results to only the unique documents without a second pass.
Using this as test data (with the _id removed):
> db.test.find()
{ "From" : "Bob", "To" : "Alice", "subject" : "Hi", "date" : "04102011" }
{ "From" : "Bob", "To" : "Dave", "subject" : "Hello", "date" : "04102014" }
{ "From" : "Bob", "To" : "Alice", "subject" : "Bye", "date" : "04112013" }
{ "From" : "Bob", "To" : "Alice", "subject" : "Hi", "date" : "04102011" }
{ "From" : "Bob", "To" : "Dave", "subject" : "Hello", "date" : "04102014" }
{ "From" : "Bob", "To" : "Alice", "subject" : "Bye", "date" : "04112013" }
{ "From" : "Bob", "To" : "Dave", "subject" : "Hello", "date" : "04102014" }
{ "From" : "Bob", "To" : "Alice", "subject" : "Bye", "date" : "04112013" }
{ "From" : "George", "To" : "Carl", "subject" : "Bye", "date" : "04112013" }
{ "From" : "David", "To" : "Carl", "subject" : "Bye", "date" : "04112013" }
You could use aggregation:
> db.test.aggregate({ $group: { _id: { "From": "$From", "To": "$To" }}})
Results:
{
"result" : [
{
"_id" : {
"From" : "David",
"To" : "Carl"
}
},
{
"_id" : {
"From" : "George",
"To" : "Carl"
}
},
{
"_id" : {
"From" : "Bob",
"To" : "Dave"
}
},
{
"_id" : {
"From" : "Bob",
"To" : "Alice"
}
}
],
"ok" : 1
}
The Python code should look very similar to the aggregation pipeline suggested above.
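As a minimal sketch of what that could look like with pymongo, assuming `collection` is a pymongo Collection (for example `db.test`): the helper name `distinct_from_to` is made up for illustration, and only the standard `aggregate()` method is used.

```python
def distinct_from_to(collection):
    """Return one {"From": ..., "To": ...} dict per unique pair.

    `collection` is assumed to be a pymongo Collection, or any object
    exposing a compatible aggregate() method.
    """
    pipeline = [
        # Grouping on the pair collapses duplicates into a single _id.
        {"$group": {"_id": {"From": "$From", "To": "$To"}}},
    ]
    # Each result document looks like {"_id": {"From": ..., "To": ...}}.
    return [doc["_id"] for doc in collection.aggregate(pipeline)]
```

Calling `distinct_from_to(db.test)` should then yield one dict per pair. The order of the groups is not guaranteed unless a `$sort` stage is added.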

Projection only defines which fields you want to appear in the result. It is much like the statement starting with:
SELECT From, To
as opposed to the basic form of
SELECT *
So what you actually wanted to do was the equivalent of this:
db.collection.find(
{ "From": "Bob", "To": "Alice" },
{ "From": 1, "To": 1 }
)
Which actually selects the records that you want and is much the same form as:
SELECT From, To
FROM collection
WHERE
From = "Bob"
AND To = "Alice"
Should that actually somehow produce "duplicate" results, then you can remove them with the use of aggregate:
db.collection.aggregate([
    { "$match": {
        "From": "Bob", "To": "Alice"
    }},
    { "$group": {
        "_id": {
            "From": "$From", "To": "$To"
        }
    }}
])

Related

How to get all elements in the list from JSON in python code

I am trying to find the correct syntax to traverse the list, get all the values, and insert them into Oracle.
edit:
Below is the json structure :
[{
"publishTime" : "2021-11-02T20:18:36.223Z",
"data" : {
"DateTime" : "2021-11-01T15:10:17Z",
"Name" : "x1",
"frtb" : {
"code" : "set1"
},
"mainAccounts" : [ {
"System" : {
"identifier" : {
"domain" : "System",
"id" : "xxx"
},
"name" : "TEST1"
},
"Account" : "acc1"
}, {
"System" : {
"identifier" : {
"domain" : "System",
"id" : "xxx"
},
"name" : "TEST2"
},
"Account" : "acc2"
}, {
"System" : {
"identifier" : {
"domain" : "System",
"id" : "xxx"
},
"name" : "TEST3"
},
"Account" : "acc3"
}],
"sub" : {
"ind" : false,
"identifier" : {
"domain" : "ops",
"id" : "1",
"version" : "1"
}
}
}
}]
My python code :
insert_statement = """INSERT INTO mytable VALUES (:1,:2)"""
r = requests.get(url, cert=(auth_certificate, priv_key), verify=root_cert, timeout=3600)
data = json.loads(r.text)
myrows = []
for item in data:
    try:
        name = (item.get("data").get("Name"))
    except AttributeError:
        name = ''
    try:
        account = (item.get("data").get("mainAccounts")[0].get("Account"))
    except TypeError:
        account = ''
    rows = (name, account)
    myrows.append(rows)
cursor.executemany(insert_statement, myrows)
connection_target.commit()
With the above I only get the first value for 'account' in the list, i.e. "acc1". How do I get all the values, i.e. (acc1, acc2, acc3)?
I have tried the following with no success:
try:
    Account = (item.get("data").get("mainAccounts")[0].get("Account") for item in data["mainAccounts")
except TypeError:
    Account = ''
Please advise. I appreciate your help.
>>> import json
>>> my_json = json.load(open("my_json.json", 'r'))
>>> # The top level is a list; the accounts live under the "data" key of each item.
>>> mainAccounts = my_json[0]['data']['mainAccounts']
>>> for account in mainAccounts:
...     account_name = account['Account']
...     system_domain = account['System']['identifier']['domain']
...     system_id = account['System']['identifier']['id']
...     system_name = account['System']['name']
...     print(f"\n======={account_name}==========")
...     print(f"System domain: {system_domain}")
...     print(f"System id: {system_id}")
...     print(f"System name: {system_name}")
...
=======acc1==========
System domain: System
System id: xxx
System name: TEST1
=======acc2==========
System domain: System
System id: xxx
System name: TEST2
=======acc3==========
System domain: System
System id: xxx
System name: TEST3
I have not worked with Oracle before so I'm going to assume your SQL statement is correct and requires a tuple of (name, account_number).
data = [
{
"publishTime" : "2021-11-02T20:18:36.223Z",
"data" : {
"DateTime" : "2021-11-01T15:10:17Z",
"Name" : "x1",
"frtb" : {
"code" : "set1"
},
"mainAccounts" : [
{
"System" : {
"identifier" : {
"domain" : "System",
"id" : "xxx"
},
"name" : "TEST1"
},
"Account" : "acc1"
}, {
"System" : {
"identifier" : {
"domain" : "System",
"id" : "xxx"
},
"name" : "TEST2"
},
"Account" : "acc2"
}, {
"System" : {
"identifier" : {
"domain" : "System",
"id" : "xxx"
},
"name" : "TEST3"
},
"Account" : "acc3"
}
],
"sub" : {
"ind" : False,
"identifier" : {
"domain" : "ops",
"id" : "1",
"version" : "1"
}
}
}
}
]
myrows = []
for item in data:
    accounts = [
        (act["System"]["name"], act["Account"])
        for act in item.get("data", {}).get("mainAccounts", [])
    ]
    myrows.extend(accounts)
print(myrows)  # [('TEST1', 'acc1'), ('TEST2', 'acc2'), ('TEST3', 'acc3')]
And then assuming it's in the correct format.
cursor.executemany(insert_statement, myrows)
How it works:
This line gets your top level dict and assigns it to item.
I used a for loop in case there are multiple dicts in the list.
for item in data:
The list comprehension is a bit more complex, but it can be translated to this:
accounts = []
for act in item.get("data", {}).get("mainAccounts", []):
    name = act["System"]["name"]
    account = act["Account"]
    accounts.append((name, account))
Or simplified even more:
data_dict = item.get("data", {})
main_accounts_list = data_dict.get("mainAccounts", [])
for act in main_accounts_list:
    # the rest of the code
I used dict.get() with an empty dict and an empty list as defaults so that, even if the keys do not exist, the subsequent .get and the iteration still run without raising an error.
Finally, you extend myrows rather than append to it so that you don't get a list of lists.
myrows.extend(accounts)
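The answer above pairs each account with its System name, whereas the code in the question paired it with the top-level "Name" field. If the latter is what the table expects, a sketch along these lines might do. The helper name `name_account_rows` is hypothetical, and the sample is a trimmed-down version of the question's data.

```python
def name_account_rows(data):
    """Build one (Name, Account) tuple per entry in mainAccounts."""
    rows = []
    for item in data:
        # Fall back to empty containers when a key is missing or None.
        payload = item.get("data") or {}
        name = payload.get("Name", "")
        for act in payload.get("mainAccounts") or []:
            rows.append((name, act.get("Account", "")))
    return rows


sample = [
    {
        "data": {
            "Name": "x1",
            "mainAccounts": [
                {"Account": "acc1"},
                {"Account": "acc2"},
                {"Account": "acc3"},
            ],
        }
    }
]
print(name_account_rows(sample))
# [('x1', 'acc1'), ('x1', 'acc2'), ('x1', 'acc3')]
```

The resulting list has the same shape as myrows, so it could be passed to cursor.executemany(insert_statement, ...) in the same way.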

My code is working in mongodb but not working in pymongo

I have documents in a collection, and I want to find a document and update elements of a list.
Here is sample data:
{
{
"_id" : ObjectId("5edd3faaf6c9d938e0bfd966"),
"id" : 1,
"status" : "XXX",
"number" : [
{
"code" : "AAA"
},
{
"code" : "CVB"
},
{
"code" : "AAA"
},
{
"code" : "BBB"
}
]
},
{
"_id" : ObjectId("asseffsfpo2dedefwef"),
"id" : 2,
"status" : "TUY",
"number" : [
{
"code" : "PPP"
},
{
"code" : "SSD"
},
{
"code" : "HDD"
},
{
"code" : "IOO"
}
]
}
}
I planned to find the document where "id" is 1 and the value of number.code is in ["AAA", "BBB"], and change number.code to "VVV". I did it with the following code:
db.test.update(
{
id: 1,
"number.code": {$in: ["AAA", "BBB"]}
},
{
$set: {"number.$[elem].code": "VVV"}
},
{ "arrayFilters": [{ "elem.code": {$in: ["AAA", "BBB"]} }], "multi": true, "upsert": false
}
)
It works in the MongoDB shell, but in Python (with pymongo) it fails with the following error:
raise TypeError("%s must be True or False" % (option,))
TypeError: upsert must be True or False
Please help me. What can I do?
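PyMongo's syntax is just a tad different: options such as array_filters and upsert are passed as keyword arguments rather than in an options document. In the legacy update() method the third positional argument is upsert, which is why passing a dict there raises the TypeError. It would look like this: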
db.test.update_many(
{
"id": 1,
"number.code": {"$in": ["AAA", "BBB"]}
},
{
"$set": {"number.$[elem].code": "VVV"}
},
array_filters=[{"elem.code": {"$in": ["AAA", "BBB"]}}],
upsert=False
)
The multi flag is not needed with update_many.
upsert is False by default, so it is also redundant here.
See the PyMongo documentation for update_many for details.

How to write match condition for array values?

I have values stored in multiple variables. Below are the input variables.
uid = ObjectId("5d518caed55bc00001d235c1")
disuid = ['5d76b2c847c8d3000184a090', '5d7abb7a97a90b0001326010']
These values change dynamically. Below is my code:
user_posts.aggregate([{
"$match": {
"$or": [{ "userid": uid }, {
"userid": {
"$eq":
disuid
}
}]
}
},
{
"$lookup": {
"from": "user_profile",
"localField": "userid",
"foreignField": "_id",
"as": "details"
}
},
{ "$unwind": "$details" },
{
"$sort": { "created_ts": -1 }
},
{
"$project": {
"userid": 1,
"type": 1,
"location": 1,
"caption": 1
}
}
])
With the above code, I only get documents matching uid, but I also need documents matching disuid.
The userid field only stores ObjectId values.
So my question is how to convert the values in disuid to ObjectId, and how to write a match condition on the userid field for both variables.
OK, you can do it in two ways.
Given this:
uid = ObjectId("5d518caed55bc00001d235c1")
disuid = ['5d76b2c847c8d3000184a090', '5d7abb7a97a90b0001326010']
you need to convert your list of strings to a list of ObjectIds in Python:
from bson.objectid import ObjectId
disuid = ['5d76b2c847c8d3000184a090', '5d7abb7a97a90b0001326010']
my_list = []
for i in disuid:
    my_list.append(ObjectId(i))
It will look like this :
[ObjectId('5d76b2c847c8d3000184a090'),ObjectId('5d7abb7a97a90b0001326010')]
then by using new list my_list, you can do query like this :
user_posts.aggregate([{"$match" : { "$or" : [{ "userid" : uid }, { "userid" : { "$in" : my_list }}]}}])
Alternatively, you can do the conversion in the database query. I wouldn't prefer this, since converting a few values in code is cheaper than converting the userid field of every document in the collection, but just in case you want it:
user_posts.aggregate([{"$addFields" : {"userStrings" : {"$toString": "$userid"}}},{"$match" : { "$or" : [{ "userid" : uid }, { "userStrings" : { "$in" : disuid }}]}}])
Note: the bson package ships with pymongo, so installing pymongo (pip install pymongo) is enough. Do not install the separate bson package from PyPI; it is a different project and conflicts with the bson module bundled with pymongo.
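Since both branches of the $or test the same userid field, a single $in over the combined list should match the same documents. Below is a minimal sketch of building such a stage. The helper name `build_userid_match` is made up for illustration, and the plain strings in the usage lines are only stand-ins for the ObjectId values you would pass in practice.

```python
def build_userid_match(uid, other_ids):
    """Build a $match stage that accepts uid or any id in other_ids."""
    # One $in over the combined list replaces the two-branch $or.
    return {"$match": {"userid": {"$in": [uid, *other_ids]}}}


# Stand-in values; with pymongo these would be ObjectId instances.
stage = build_userid_match("id-a", ["id-b", "id-c"])
print(stage)
# {'$match': {'userid': {'$in': ['id-a', 'id-b', 'id-c']}}}
```

The returned stage can be used as the first element of the pipeline passed to user_posts.aggregate(...), followed by the $lookup, $unwind, $sort and $project stages from the question.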

Matching / Mapping lists with elasticsearch

There is a list in mongodb,
eg:
db_name = "Test"
collection_name = "Map"
db.Map.findOne()
{
"_id" : ObjectId(...),
"Id" : "576",
"FirstName" : "xyz",
"LastName" : "abc",
"skills" : [
"C++",
"Java",
"Python",
"MongoDB",
]
}
There is a list in an Elasticsearch index (I am using Kibana to execute queries)
GET /user/_search
{
"took" : 31,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 7,
"max_score" : 1.0,
"hits" : [
{
"_index" : "customer",
"_type" : "_doc",
"_id" : "5",
"_score" : 1.0,
"_source" : {
"name" : "xyz abc",
"Age" : 21,
"skills" : [
"C++",
"Java",
"Python",
"MongoDB",
]
}
},
]
}
}
Can anyone help with an Elasticsearch query that will match both records based on skills?
I am using Python to write the code.
If a match is found, I am trying to get the first name and last name of that user:
First name : "xyz"
Last name : "abc"
Assuming you are indexing all the documents in Elasticsearch and, of these, you want to match documents where skills contains both java and mongodb, the query would be:
{
"query": {
"bool": {
"filter": [
{
"term": {
"skills": "mongodb"
}
},
{
"term": {
"skills": "java"
}
}
]
}
}
}
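Since the question mentions Python, here is a minimal sketch of building the same query body from a list of skills. The helper name `all_skills_query` is hypothetical. Lowercasing mirrors the query above and assumes the skills field is indexed as analyzed text; for a keyword field the original casing would have to be kept instead.

```python
def all_skills_query(skills):
    """Build a bool/filter query that requires every skill in `skills`."""
    return {
        "query": {
            "bool": {
                # One term clause per skill; all filter clauses must match.
                "filter": [
                    {"term": {"skills": skill.lower()}} for skill in skills
                ]
            }
        }
    }


body = all_skills_query(["MongoDB", "Java"])
```

The resulting dict has the same shape as the JSON above and can be sent as the search request body by whichever Elasticsearch client you use.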

Updating an object inside an array with PyMongo

I am wondering how you update a nested array with PyMongo/MongoDB by selecting a document (row), then going into the nested array and selecting a specific object.
{
"_id" : "12345",
"name" : "John Doe",
"mylist" : [
{
"nested_id" : "1",
"data1" : "lorem ipsum",
"data2" : "stackoverflow",
"data3" : "james bond"
},
{
"nested_id" : "2",
"data1" : "lorem ipsum",
"data2" : "stackoverflow",
"data3" : "james bond"
},
{
....
}
]
}
Then let's say you pass a dictionary with the elements you want to update. In this example, only data1 and data3 are updated:
data = {
"data1" : "new lorem",
"data3" : "goldeneye"
}
I have tried with the following syntax, but with no success.
db.testing.find_and_modify(
query={"_id": "12345", 'mylist.nested_id' : "1"},
update={"$set": {'mylist' : data}})
This is what it should look like after the update:
{
"_id" : "12345",
"name" : "John Doe",
"mylist" : [
{
"nested_id" : "1",
"data1" : "new lorem",
"data2" : "stackoverflow",
"data3" : "goldeneye"
},
{
"nested_id" : "2",
"data1" : "lorem ipsum",
"data2" : "stackoverflow",
"data3" : "james bond"
},
{
....
}
]
}
Use "dot notation" and the positional operator in the update portion. Also transform your input to match the "dot notation" form for the key representation:
# Transform to "dot notation" on explicit field.
# Build a new dict rather than adding and deleting keys while iterating over data.
data = {"mylist.$." + key: value for key, value in data.items()}
# Basically makes
# {
# "mylist.$.data1": "new lorem",
# "mylist.$.data3": "goldeneye"
# }
db.testing.find_and_modify(
query = {"_id": "12345", 'mylist.nested_id' : "1"},
update = { "$set": data }
)
So that will transpose $ to the actual matched element position from the query portion of the update. The matched array element will be updated and using "dot notation" only the mentioned fields will be affected.
Have no idea what "service" is supposed to mean in this context and I am just treating it as a "transcribing error" since you are clearly trying to match an array element in position.
That could be cleaner, but this should give you the general idea.
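As one possible cleaner version, here is a sketch that wraps the same idea in a helper. The name `set_nested_fields` is made up for illustration, and it uses update_one instead of find_and_modify on the assumption that the modified document does not need to be returned (find_and_modify is deprecated in later PyMongo releases).

```python
def set_nested_fields(collection, doc_id, nested_id, changes):
    """Update only the given fields of the matched element of mylist.

    `collection` is assumed to be a pymongo Collection.
    """
    # Prefix each key so $set targets the matched array element only.
    update_fields = {"mylist.$." + key: value for key, value in changes.items()}
    return collection.update_one(
        # The array condition in the filter is what $ resolves against.
        {"_id": doc_id, "mylist.nested_id": nested_id},
        {"$set": update_fields},
    )
```

With the example from the question, this would be called as set_nested_fields(db.testing, "12345", "1", data). The input dictionary is left unmodified.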
