Converting MongoDB arrays into Python arrays

I'm writing some code in which I run into this kind of document (I'm using pymongo).
How can I assign the arrays inside the wishlist field to Python lists?
Alternatively, how can I search my database for a value inside an array within the wishlist field? E.g.: I want to find all IDs that have, say, ["feldon", "c15", "sp"] in their wishlists.
{
    "_id": "david",
    "password": "azzzzzaa",
    "url": "url3",
    "old_url": "url3",
    "new_url": ["url1", "url2"],
    "wishlist": [
        ["all is dust", "mm4", "nm"],
        ["feldon", "c15", "sp"],
        ["feldon", "c15", "sp"],
        ["jenara", "shards", "nm"],
        ["rafiq", "shards", "nm"]
    ]
}

You can use distinct if the elements in your sublist are exactly in the same order.
db.collection.distinct("_id", {"wishlist": ["feldon", "c15", "sp"]})
If not, you need to use the aggregate method and the $redact operator.
db.collection.aggregate([
    {"$redact": {
        "$cond": [
            {"$setIsSubset": [
                [["feldon", "c15", "sp"]],
                "$wishlist"
            ]},
            "$$KEEP",
            "$$PRUNE"
        ]
    }},
    {"$group": {
        "_id": None,
        "ids": {"$push": "$_id"}
    }}
])
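In pymongo both forms carry over directly. Here is a minimal sketch (the client, database, and collection names are placeholders); note that pymongo already decodes BSON arrays as Python lists, so no extra conversion is needed:

from pymongo import MongoClient

client = MongoClient()                 # assumes a local mongod
collection = client.mydb.collection    # placeholder database/collection names

# Exact-match form: the sublist must match element-for-element, in order
ids = collection.distinct("_id", {"wishlist": ["feldon", "c15", "sp"]})
print(ids)  # e.g. ['david']

# Reading a document back: "wishlist" is already a Python list of lists
doc = collection.find_one({"_id": "david"})
print(doc["wishlist"][1])  # ['feldon', 'c15', 'sp']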

Related

Convert nested dictionary inside dictionary into relational and add missing keys using Python

I am trying to convert the JSON records below into a relational format, but I am not getting the expected output.
Filename.json:
{
    "SampleRecord": {
        "SampleRules": [
            {
                "Scaler_id": "1",
                "family_min_samples_percentage": 5,
                "original_number_of_clusters": 4,
                "Results": [
                    {
                        "eps_value": 0.1,
                        "min_samples": 5,
                        "number_of_clusters": 9,
                        "number_of_noise_samples": 72,
                        "scores": {
                            "adjusted_rand_index": 0.001,
                            "adjusted_mutual_info_score": 0.009
                        }
                    }
                ],
                "isnegative": "False",
                "comment": ["#Comment"],
                "enable": "enabled",
                "additional_value": {
                    "type": [{"value": "AAA"}],
                    "uid": [{"value": "BBB"}],
                    "options": [{"value": "CCC"}, {"value": "DDD"}],
                    "scope": [{"value": "EEE"}]
                }
            },
            {
                "Scaler_id": "2",
                "family_min_samples_percentage": 5,
                "original_number_of_clusters": 4,
                "Results": [
                    {
                        "eps_value": 0.1,
                        "min_samples": 5,
                        "number_of_clusters": 9,
                        "number_of_noise_samples": 72,
                        "scores": {
                            "adjusted_rand_index": 0.001,
                            "adjusted_mutual_info_score": 0.009
                        }
                    }
                ],
                "isnegative": "False",
                "comment": ["#Comment"],
                "enable": "enabled",
                "additional_value": {
                    "type": [{"value": "AAA"}],
                    "uid": [{"value": "BBB"}],
                    "options": [{"value": "CCC"}]
                }
            }
        ]
    }
}
Expected output:
Scaler_id~original_number_of_clusters~Results_eps_value~Results_Scores_adjusted_rand_index~Results_Scores_davies_bouldin_score~isnegative~comment~additional_value_type~additional_value_uid~additional_value_options
1~4~0.1~0.001~1.70~False~#comment~AAA~BBB~CCC~EEE
1~4~0.1~0.001~1.70~False~#comment~AAA~BBB~DDD~EEE
2~4~0.1~0.001~1.70~False~#comment~AAA~BBB~CCC~Null
import json
import pandas as pd

with open('Filename.json') as inputfile:
    data = json.load(inputfile)  # parse the JSON file straight into a dict
df1 = pd.json_normalize(data['SampleRecord'], 'SampleRules', sep='_')
df1.to_csv('Sample1.txt', encoding='utf-8', index=False, sep='~', na_rep='')
output(Sample1.txt):
Scaler_id~original_number_of_clusters~Results~isnegative~comment~additional_value_type~additional_value_uid~additional_value_options
1~4~[{0.1},{0.001},{1.70}]~False~#comment~[{AAA}]~[{BBB}]~[{CCC},{DDD}]~[{EEE}]
2~4~[{0.1},{0.001},{1.70}]~False~#comment~[{AAA}]~[{BBB}]~[{CCC}]~
df2 = pd.json_normalize(data['SampleRecord'], ['SampleRules', 'Results'],
                        [['SampleRules', 'Scaler_id'], ['SampleRules', 'original_number_of_clusters'],
                         ['SampleRules', 'isnegative'], ['SampleRules', 'comment']],
                        record_prefix='Results', sep='_', max_level=None, errors='ignore')
df3 = pd.json_normalize(data['SampleRecord'], ['SampleRules', 'additional_value', 'type'],
                        [['SampleRules', 'Scaler_id']],
                        record_prefix='additional_value', sep='_', max_level=None, errors='ignore')
df23 = pd.merge(df2, df3, how='inner', on='SampleRules_Scaler_id')
df23.to_csv('Sample2.txt', encoding='utf-8', index=False, sep='~', na_rep='')
Current output(Sample2.txt):
1~4~0.1~0.001~1.70~False~#comment~AAA~BBB~CCC
1~4~0.1~0.001~1.70~False~#comment~AAA~BBB~DDD
2~4~0.1~0.001~1.70~False~#comment~AAA~BBB~CCC
df4 = pd.json_normalize(data['SampleRecord'], ['SampleRules', 'additional_value', 'scope'],
                        [['SampleRules', 'Scaler_id']],
                        record_prefix='additional_value', sep='_', max_level=None, errors='ignore')
# This throws KeyError: 'scope' (since this key is missing in a few records)
I tried to use get(), since it returns a default value of None, but it didn't work:
df4=pd.json_normalize(data['SampleRecord']['SampleRules'],['additional_value'],['scope'].get('value'),['SampleRules','Scaler_id'],record_prefix='additional_value',sep='_',max_level=None,errors='ignore')
# TypeError: list indices must be integers or slices, not str
Problems:
1) How do I get the nested dictionary (additional_value) values in a single json_normalize call, without explicitly defining df2, df3, df4 for each sub-dictionary?
2) How do I get a missing key as Null when the key itself is missing from a JSON record, and avoid the KeyError?
I have already referred to the below, but no luck:
How to fill missing json keys with key and null value?
If key not in JSON then set value to null and insert into dataframe
Python JSON TypeError list indices must be integers or slices, not str
python dictionary keyError
I am a beginner in Python. Any suggestions would be of great help.
Thanks in advance!
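One possible workaround for the KeyError (a sketch, not from the original thread): pre-fill any missing sub-keys with a null-valued default before calling json_normalize, so every record has the same shape. The default entry used here is an assumption:

import json
import pandas as pd

with open('Filename.json') as inputfile:
    data = json.load(inputfile)

# Give every rule the keys json_normalize expects, defaulting missing
# lists to a single entry whose value is None (rendered as empty/Null)
for rule in data['SampleRecord']['SampleRules']:
    additional = rule.setdefault('additional_value', {})
    for key in ('type', 'uid', 'options', 'scope'):
        additional.setdefault(key, [{'value': None}])

df4 = pd.json_normalize(data['SampleRecord'],
                        ['SampleRules', 'additional_value', 'scope'],
                        [['SampleRules', 'Scaler_id']],
                        record_prefix='additional_value_', sep='_')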

Save reference to a python dictionary (from a json file) item in a variable

I'm trying to save a reference to a value in a JSON file where item order cannot be guaranteed. So far, what I have for a dataset like this one:
"Values": [
{
"Object": "DFC_Asset_05",
"Properties": [
{
"Property": "WeightKilograms",
"Value Offset": 5
},
{
"Property": "WeightPounds",
"Value Offset": 10
}
]
},
{
"Object": "DFC_Asset_05",
"Properties": [
{
"Property": "Name",
"Value Offset": 25
},
{
"Property": "ShortName",
"Value Offset": 119
}
]
}
]
and retrieving this object:
{
    "Property": "ShortName",
    "Value Offset": 119
}
is a string like this:
reference = "[Object=DFC_Asset_06][Properties][Property=Name]"
This looks nice and understandable as a string, but it's very unclean for finding the referenced value, as I must first parse the reference with a regex and then loop over the data to retrieve the matching item.
Am I doing this wrong? Is there a better way to do this? I looked at the reduce() function; however, it seems to be made for dictionaries with a static structure. For example, I could not save the direct keys:
reference = "[1][Properties][1]"
reference_using_reduce = [1, "Properties", 1]
as they might not always be in that order.
You can run "queries" on JSONs without referencing specific indexes using the pyjq module:
query = (
'.Values[]' # On all the items in "Values"
'|select(."Object" == "DFC_Asset_06")' # Find key "Object" which holds this value
'|."Properties"[]' # And get all the items of "Properties"
'|select(."Property" == "Name")' # Where the key "Property" holds the value "Name"
)
pyjq.first(query, d)  # d is the parsed JSON data
Result:
{'Property': 'Name', 'Value Offset': 25}
You can read more about jq in the documentation.
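If you would rather avoid the extra dependency, here is a plain-Python equivalent of the same lookup (a sketch; find_property is a hypothetical helper, and data is assumed to be the parsed JSON):

def find_property(data, object_name, property_name):
    # Scan all items in "Values" for the matching Object,
    # then scan its Properties for the matching Property
    for item in data["Values"]:
        if item["Object"] == object_name:
            for prop in item["Properties"]:
                if prop["Property"] == property_name:
                    return prop
    return None

result = find_property(data, "DFC_Asset_06", "Name")
# {'Property': 'Name', 'Value Offset': 25}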

Use $cond within $match in mongoDB aggregation

I've tried to use $cond within $match in one of the stages of an aggregation, as shown below:
{ "$match" : {
"field1" : {
"$cond" : {
"if" : { "$eq" : [ "$id" , 1206]},
"then" : 0,
"else" : 1545001200
}
},
"field2" : value2 }}
But I got this error:
Error:
Assert: command failed: {
"ok" : 0,
"errmsg" : "unknown operator: $cond",
"code" : 2,
"codeName" : "BadValue"
} : aggregate failed
The mongodb version is 3.4.4.
Any idea about this issue?
You just have to reword the logic a little bit.
{ $match: { $expr: {
    $or: [
        { $and: [
            { $eq: [ "$id", 1206 ] },
            { $eq: [ "$field1", 0 ] }
        ]},
        { $and: [
            { $ne: [ "$id", 1206 ] },
            { $eq: [ "$field1", 1545001200 ] }
        ]}
    ]
}}}
Logically, the two statements are equivalent:
Match the document by checking field1 == 0 if id == 1206, otherwise match the document by checking field1 == 1545001200
Match the document if either (id == 1206 and field1 == 0) or (id != 1206 and field1 == 1545001200).
For those coming across this later on down the road:
This won't work for 3.4.4. But in MongoDB 3.6 they introduced the $expr operator that lets you use $cond and other operations within a $match query.
https://docs.mongodb.com/manual/reference/operator/aggregation/match/
For an example see iamfrank's answer.
Also, as mentioned in the comments, you could do this further down the pipeline, but ideally you want to filter out results as early in the pipeline as possible using $match, to improve processing time.
Unless important details were left out of the question, I think you are overcomplicating something simple. Filtering documents is exactly what a $match aggregation stage is meant to do.
For this particular example, there are only two simple scenarios in which a document matches. There is no need for any other operators; just define the two mutually exclusive queries and put them in an $or logical operator:
{'$match': {'$or': [
    {'id': 1206, 'field1': 0},
    {'id': {'$ne': 1206}, 'field1': 1545001200},
]}}
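For completeness, a minimal pymongo sketch of that last form (the db and collection names are placeholders):

pipeline = [
    {'$match': {'$or': [
        {'id': 1206, 'field1': 0},
        {'id': {'$ne': 1206}, 'field1': 1545001200},
    ]}},
    # ... any later stages ...
]
results = list(db.collection.aggregate(pipeline))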

TypeError unhashable type: 'dict' using $elemMatch in count or find in Python

The Users collection in the Mongo database contains records with an array element (SubscriptionSet) of dictionaries holding pairs of values, as shown below.
{
    "_id": {
        "$oid": "567019357a5c390d040cbbc2"
    },
    "EmailAddress": "joejane@myco.com",
    "SubscriptionSet": [
        {
            "SubscriptionId": 586102,
            "SeatState": "ASSIGNED"
        },
        {
            "SubscriptionId": 588972,
            "SeatState": "ASSIGNED"
        }
    ],
    "DisplayName": "Joe Jane",
    "SubscriberState": "ACTIVE"
}
I want to find all users that have a SubscriptionId matching one of two values AND a SeatState that is not equal to ASSIGNED. I am using the following find call.
GoodSubscriptions = [586102, 586104]
db = client.bsssubscriptions
Users = db.Users
BadSubscriptions = Users.find({'$and': [{'SubscriberState': 'ACTIVE'}, {'SubscriptionSet': {'$elemMatch': {{'SubscriptionId': {'$in': GoodSubscriptions}}, {'SeatState': {'$ne': 'ASSIGNED'}}}}}]})
and get the following error:
File "C:\Users\IBM_ADMIN\Desktop\BSS API\db query.py", line 24, in <module>
BadVerse = Users.find({'SubscriptionSet': {'$elemMatch': {{'SubscriptionId': {'$in': GoodSubscriptions}}, {'SeatState': {'$ne': 'ASSIGNED'}}}}})
TypeError: unhashable type: 'dict'
I have looked for this specific error and found many examples, but none were related to using find with records containing an array of dictionaries.
You have too many curly braces in the $elemMatch subquery. Try this query:
{
    '$and': [
        {'SubscriberState': 'ACTIVE'},
        {
            'SubscriptionSet': {
                '$elemMatch': {
                    'SubscriptionId': {'$in': GoodSubscriptions},
                    'SeatState': {'$ne': 'ASSIGNED'}
                }
            }
        }
    ]
}
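Plugged back into the original find call, that becomes:

BadSubscriptions = Users.find({'$and': [
    {'SubscriberState': 'ACTIVE'},
    {'SubscriptionSet': {'$elemMatch': {
        'SubscriptionId': {'$in': GoodSubscriptions},
        'SeatState': {'$ne': 'ASSIGNED'}
    }}}
]})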
I actually think you want to use MongoDB's $filter, because with $elemMatch, if any element within the embedded array satisfies the condition, the whole document is returned with the whole embedded array (regardless of whether the other elements satisfy the condition).
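A sketch of that $filter approach (an aggregation rather than a find; the badSeats field name is made up here, and this is untested):

pipeline = [
    {'$match': {'SubscriberState': 'ACTIVE'}},
    {'$project': {
        'EmailAddress': 1,
        # Keep only the array elements that violate the seat rules
        'badSeats': {'$filter': {
            'input': '$SubscriptionSet',
            'as': 'sub',
            'cond': {'$and': [
                {'$in': ['$$sub.SubscriptionId', GoodSubscriptions]},
                {'$ne': ['$$sub.SeatState', 'ASSIGNED']}
            ]}
        }}
    }},
    {'$match': {'badSeats': {'$ne': []}}}  # drop users with no offending seats
]
BadSubscriptions = Users.aggregate(pipeline)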

How do I delete values from this document in MongoDB using Python

I have a document which is structured like this:
{
    "_id": ObjectId("564c0cb748f9fa2c8cdeb20f"),
    "username": "blah",
    "useremail": "blah@blahblah.com",
    "groupTypeCustomer": true,
    "addedpartners": [
        "562f1a629410d3271ba74f74",
        "562f1a6f9410d3271ba74f83"
    ],
    "groupName": "Mojito",
    "groupTypeSupplier": false,
    "groupDescription": "A group for fashion designers"
}
Now I want to delete one of the values from this 'addedpartners' array and update the document.
I want to just delete 562f1a6f9410d3271ba74f83 from the addedpartners array
This is what I had tried earlier.
db.myCollection.update({'_id':'564c0cb748f9fa2c8cdeb20f'},{'$pull':{'addedpartners':'562f1a6f9410d3271ba74f83'}})
db.myCollection.update(
{ _id: ObjectId(id) },
{ $pull: { 'addedpartners': '562f1a629410d3271ba74f74' } }
);
Try this:
db.myCollection.update({}, {$unset : {"addedpartners.1" : 1 }})
db.myCollection.update({}, {$pull : {"addedpartners" : null}})
There is no way to remove an array element by its index directly; I think this is going to work, but I haven't tried it yet.
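One caveat with the first attempt in the question: the document's _id is an ObjectId, so matching it against a plain string matches nothing. A minimal pymongo sketch combining both pieces (untested against the original collection):

from bson import ObjectId

db.myCollection.update_one(
    {'_id': ObjectId('564c0cb748f9fa2c8cdeb20f')},
    {'$pull': {'addedpartners': '562f1a6f9410d3271ba74f83'}}
)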
