PyMongo: JSON Keys getting updated in mongo - python

I am having difficulty updating a nested JSON structure in Mongo.
I am using PyMongo along with Mongoengine-Rest-framework.
Since this particular JSON has a dynamic structure and is heavily nested, I chose to use PyMongo over the MongoEngine ORM.
The create, retrieve and delete operations are working fine.
I would like some suggestions on the update issue.
Let's consider a sample object which is already present in Mongo:
st1 = {
    "name": "Some_name",
    "details": {
        "address1": {
            "house_no": "731",
            "street": "Some_street",
            "city": "some_city",
            "state": "some_state"
        }
    }
}
If I try to update st1 by adding address2 to details, sending the JSON st2 in the update command with _id as the update condition,
st2 = {
    "details": {
        "address2": {
            "house_no": "5102",
            "street": "Some_street",
            "city": "some_city",
            "state": "some_state"
        }
    }
}
I get the following object st3 as the result in Mongo,
st3 = {
    "name": "Some_name",
    "details": {
        "address2": {
            "house_no": "5102",
            "street": "Some_street",
            "city": "some_city",
            "state": "some_state"
        }
    }
}
instead of the expected st4 object.
st4 = {
    "name": "Some_name",
    "details": {
        "address1": {
            "house_no": "731",
            "street": "Some_street",
            "city": "some_city",
            "state": "some_state"
        },
        "address2": {
            "house_no": "5102",
            "street": "Some_street",
            "city": "some_city",
            "state": "some_state"
        }
    }
}
My update command is:
result = collection.update_one({'_id': id}, doc)
where
id: the _id of the document
doc: (here) st2
collection: the PyMongo collection object
The original JSON is six levels deep and the keys are dynamic, so updates will be needed at different depths.

First, change the object to update to this:
to_update = {
    "house_no": "5102",
    "street": "Some_street",
    "city": "some_city",
    "state": "some_state"
}
And then use it to update the specific part of the document you want:
collection.update_one({'_id': id}, {'$set': {'details.address2': to_update}})
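Since the real document is six levels deep with dynamic keys, you can build the dotted path programmatically before calling $set. A minimal sketch (the helper name set_nested and its arguments are made up for illustration, not part of the original answer):

def set_nested(collection, doc_id, path_keys, payload):
    # ['details', 'address2'] -> 'details.address2'
    dotted_path = '.'.join(path_keys)
    return collection.update_one({'_id': doc_id}, {'$set': {dotted_path: payload}})

# Adds details.address2 without touching details.address1
set_nested(collection, id, ['details', 'address2'], to_update)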

Use this to add address2:
collection.update_one({'_id': ObjectId(doc_id)}, {'$set': {'details.%s' % 'address2': address2}}, upsert=True)
Check out the complete code:
import pymongo
from bson.objectid import ObjectId

data = {"name": "Some_name",
        "details": {"address1": {"house_no": "731", "street": "Some_street", "city": "some_city", "state": "some_state"}}}
address2 = {"house_no": "5102", "street": "Some_street", "city": "some_city", "state": "some_state"}

connect = pymongo.MongoClient('192.168.4.202', 20020)
database = connect['my_test']
collection = database['coll']

# CREATE THE COLLECTION AND INSERT DATA
# _id = collection.insert_one(data).inserted_id
# print(_id)

doc_id = '57568aa11ec52522343ee695'
collection.update_one({'_id': ObjectId(doc_id)}, {'$set': {'details.%s' % 'address2': address2}}, upsert=True)
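To confirm the merge worked, you can read the document back (a quick sanity check, not part of the original answer):

doc = collection.find_one({'_id': ObjectId(doc_id)})
print(doc['details'].keys())  # should now list both 'address1' and 'address2'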

Related

complex json file to csv in python

I need to convert a complex JSON file to CSV using Python. I tried a lot of code without success, so I came here for help. I have updated the question: the JSON file contains about a million records like the ones below, and I need to convert them to CSV format.
The JSON file:
{
    "_id": {
        "$oid": "2e3230"
    },
    "add": {
        "address1": {
            "address": "kvartira 14",
            "zipcode": "10005",
        },
        "name": "Evgiya Kovava",
        "address2": {
            "country": "US",
            "country_name": "NY",
        }
    }
}
{
    "_id": {
        "$oid": "2d118c8bo"
    },
    "add": {
        "address1": {
            "address": "kvartira 14",
            "zipcode": "52805",
        },
        "name": "Eiya tceva",
        "address2": {
            "country": "US",
            "country_name": "TX",
        }
    }
}
import pandas as pd

null = 'null'
data = {
    "_id": {
        "$oid": "2e3230s314i5dc07e118c8bo"
    },
    "add": {
        "address": {
            "address_type": "Door",
            "address": "kvartira 14",
            "city": "new york",
            "region": null,
            "zipcode": "10005",
        },
        "name": "Evgeniya Kovantceva",
        "type": "Private person",
        "code": null,
        "additional_phone_nums": null,
        "email": null,
        "notifications": [],
        "address": {
            "address": "kvartira 14",
            "city": "new york",
            "region": null,
            "zipcode": "10005",
            "country": "US",
            "country_name": "NY",
        }
    }
}

df = pd.json_normalize(data)
df.to_csv('yourpath.csv')
Beware the null value. Also, the "address" nested dictionary appears inside "add" twice, almost identical; is that intended?
EDIT
OK, after your added information it looks like json.JSONDecoder() is what you need.
Originally posted by @pschill on this link:
how to analyze json objects that are NOT separated by comma (preferably in Python)
I tried his code on your data:
import json
import pandas as pd
data = """{
    "_id": {
        "$oid": "2e3230"
    },
    "add": {
        "address1": {
            "address": "kvartira 14",
            "zipcode": "10005"
        },
        "name": "Evgiya Kovava",
        "address2": {
            "country": "US",
            "country_name": "NY"
        }
    }
}
{
    "_id": {
        "$oid": "2d118c8bo"
    },
    "add": {
        "address1": {
            "address": "kvartira 14",
            "zipcode": "52805"
        },
        "name": "Eiya tceva",
        "address2": {
            "country": "US",
            "country_name": "TX"
        }
    }
}"""
Keep in mind that your data also has trailing commas, which make it unparseable as JSON (the last comma right before every closing bracket).
You have to remove them with some regex or another approach I am not familiar with. For the purpose of this answer I removed them manually.
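If you want to automate that step, a small regex works for data like this (a sketch, not a general JSON-aware solution; it can break on commas inside string values):

import re
# drop a comma that sits right before a closing } or ]
data = re.sub(r',(\s*[}\]])', r'\1', data)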
So after that I tried this:
content = data
parsed_values = []
decoder = json.JSONDecoder()

while content:
    value, new_start = decoder.raw_decode(content)
    content = content[new_start:].strip()
    # You can handle the value directly in this loop:
    # print("Parsed:", value)
    # Or you can store it in a container and use it later:
    parsed_values.append(value)
which gave me an error but the list seems to get populated with all the values:
parsed_values
[{'_id': {'$oid': '2e3230'},
'add': {'address1': {'address': 'kvartira 14', 'zipcode': '10005'},
'name': 'Evgiya Kovava',
'address2': {'country': 'US', 'country_name': 'NY'}}},
{'_id': {'$oid': '2d118c8bo'},
'add': {'address1': {'address': 'kvartira 14', 'zipcode': '52805'},
'name': 'Eiya tceva',
'address2': {'country': 'US', 'country_name': 'TX'}}}]
Next I did:
df = pd.json_normalize(parsed_values)
which worked fine.
You can always save that to a csv with:
df.to_csv('yourpath.csv')
Tell me if that helped.
Your JSON is quite problematic after all: duplicate keys, null values, trailing commas, dicts not separated by commas... It didn't catch the eye at first :P

Elasticsearch Parent-Child Mapping and Indexing

I was following the book "Elasticsearch: The Definitive Guide". The book is outdated, and when something did not work I searched the internet and made it work with newer versions. But I can't find anything useful for parent-child mapping and indexing.
For example:
{
    "mappings": {
        "branch": {},
        "employee": {
            "_parent": {
                "type": "branch"
            }
        }
    }
}
How can I represent the following mapping in a new version of Elasticsearch?
And how can I index the following parent:
{ "name": "London Westminster", "city": "London", "country": "UK" }
and the following child:
PUT company/employee/1?parent=London
{
    "name": "Alice Smith",
    "dob": "1970-10-24",
    "hobby": "hiking"
}
Also, I am using the Elasticsearch Python client, so examples using it would be appreciated.
The _parent field has been removed in favor of the join field.
The join data type is a special field that creates parent/child
relation within documents of the same index. The relations section
defines a set of possible relations within the documents, each
relation being a parent name and a child name.
Consider company as the parent and employee as its child.
Index Mapping:
{
    "mappings": {
        "properties": {
            "my_join_field": {
                "type": "join",
                "relations": {
                    "company": "employee"
                }
            }
        }
    }
}
Parent document in the company context
PUT /index-name/_doc/1
{
    "name": "London Westminster",
    "city": "London",
    "country": "UK",
    "my_join_field": {
        "name": "company"
    }
}
Child document
PUT /index-name/_doc/2?routing=1&refresh
{
    "name": "Alice Smith",
    "dob": "1970-10-24",
    "hobby": "hiking",
    "my_join_field": {
        "name": "employee",
        "parent": "1"
    }
}
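With the Elasticsearch Python client, the same steps look roughly like this (a sketch written against the 8.x client, where mappings and document are keyword arguments; on the 7.x client you would pass them via body= instead):

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # adjust to your cluster

# 1. Create the index with the join mapping
es.indices.create(index="company", mappings={
    "properties": {
        "my_join_field": {
            "type": "join",
            "relations": {"company": "employee"}
        }
    }
})

# 2. Index the parent document
es.index(index="company", id=1, document={
    "name": "London Westminster",
    "city": "London",
    "country": "UK",
    "my_join_field": {"name": "company"}
})

# 3. Index the child, routed to the parent's shard
es.index(index="company", id=2, routing=1, document={
    "name": "Alice Smith",
    "dob": "1970-10-24",
    "hobby": "hiking",
    "my_join_field": {"name": "employee", "parent": "1"}
})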

How can I add database references in schema validation when creating a mongodb collection?

Say I have a collection "cities" with the following documents:
Document 1:
{
    "_id": {
        "$oid": "5e00979d7c21388869c2c048"
    },
    "cityName": "New York"
}
Document 2:
{
    "_id": {
        "$oid": "5e00979d7c21388869c2c432"
    },
    "cityName": "Los Angeles"
}
and I want to create another collection "students" with the following document:
{
    "name": "John",
    "citiesVisited": [
        {
            "$ref": "cities",
            "$id": "5e00979d7c21388869c2c048"
        },
        {
            "$ref": "cities",
            "$id": "5e00979d7c21388869c2c432"
        }
    ]
}
How should the schema validation be? I tried the following validation:
validator = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["name", "citiesVisited"],
        "properties": {
            "name": {
                "bsonType": "string",
                "description": "name of student."
            },
            "citiesVisited": {
                "bsonType": ["array"],
                "items": {
                    "bsonType": "object",
                    "required": ["$ref", "$id"],
                    "properties": {
                        "$ref": {
                            "bsonType": "string",
                            "description": "collection name"
                        },
                        "$id": {
                            "bsonType": "string",
                            "description": "document id of visited city"
                        }
                    }
                },
                "description": "cities visited by the student"
            }
        }
    }
}
but it gives the following error when I try to get a list of all collections in the database:
bson.errors.InvalidBSON: collection must be an instance of str
I tried creating the validation without the "$" in "$ref" and "$id" and it worked fine, but then the document validation failed because of the database references.
I want to use dbrefs when storing the cities.

How to create json object for mongodb

I'm building a JSON object which I want to insert into an already existing collection, with data, in MongoDB using PyMongo. The data looks like this:
[
    {
        "title": "PyMongo",
        "publication_date": "2015-09-07 10:00:00",
        "tags": ["python", "mongodb", "nosql"],
        "author": {
            "name": "David",
            "author_info": {
                "$oid": "1870981708ddb1a352189e25w"
            }
        },
        "date": {
            "$date": 1484071215000
        }
    }
]
I noticed that author has an ObjectId and date is a timestamp. How can I create the values for author_info and date and insert them into MongoDB?
I have a function that builds the values like this:
def build_json(title, pubctndate, taglist, author):
    json_ob = \
        [{
            "title": title,
            "publication_date": pubctndate,
            "tags": taglist,  # the original had "tablets", a typo
            "author": {
                "name": author,
                "author_info": {
                    "$oid": " "  # placeholder
                },
            },
            "date": {"$date": None},  # placeholder
        }]
    return json_ob
which I intend to call this way: json.dumps(build_json(title, pubctndate, taglist, author))
Please bear with me, I'm a complete beginner.
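With PyMongo you don't build the extended-JSON forms ("$oid", "$date") yourself; that notation is just how ObjectIds and datetimes are rendered as JSON. You can pass the native Python types directly and skip json.dumps entirely. A minimal sketch (build_doc and author_id are made-up names, and collection is assumed to be a PyMongo collection object):

import datetime
from bson.objectid import ObjectId

def build_doc(title, pubctndate, taglist, author, author_id):
    return {
        "title": title,
        "publication_date": pubctndate,
        "tags": taglist,
        "author": {
            "name": author,
            "author_info": ObjectId(author_id),  # serializes as {"$oid": ...}
        },
        "date": datetime.datetime.utcnow(),  # serializes as {"$date": ...}
    }

collection.insert_one(build_doc(title, pubctndate, taglist, author, author_id))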

Mongodb: updating fields in embedded document

If I have a document that looks like this:
{
    "_id": 1,
    "name": "Homer J. Simpson",
    "income": 45000,
    "address": {
        "street": "742 Evergreen Terrace",
        "city": "Springfield",
        "state": "???",
        "email": "homer@springfield.com",
        "zipcode": "12345",
        "country": "USA"
    }
}
And I want to update some of the fields in the address subdocument (leaving the others unchanged), inserting new fields if they do not already exist, such as this:
{
    "address": {
        "email": "homer@gmail.com",
        "zipcode": "77788",
        "latitude": 23.43545,
        "longitude": 123.45553
    }
}
Is there a way to do an atomic update all at once, or do you need to loop over the key/values in the new data and do a .update() for each one?
Use dot notation with a $set to target multiple embedded fields in a single update:
{ "$set": {
"address.email": "homer#gmail.com",
"address.zipcode": "77788",
"address.latitude" : 23.43545,
"address.longitude" : 123.45553
} }
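In PyMongo, the same atomic update looks like this (a direct translation of the shell snippet above, assuming collection is a PyMongo collection object):

collection.update_one(
    {'_id': 1},
    {'$set': {
        'address.email': 'homer@gmail.com',
        'address.zipcode': '77788',
        'address.latitude': 23.43545,
        'address.longitude': 123.45553,
    }}
)

Because everything sits in one $set, all four fields are written atomically: existing keys like address.street are left untouched, and latitude/longitude are created since they don't exist yet.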
As Sergio mentioned, use a $set with dot notation, e.g. { "$set": { "address.latitude": 23.43545 } }