How to get documents' objects length in MongoDB using PyMongo?


I have hundreds of documents following the format below. Basically, each doc contains objects with the types "BehavioralData", "Notes", etc.
{
"_id": "123",
"Notes": {
"1222": "Something is here"
},
"BehavioralData": {
"Folder1": {
"Sex": "Male",
"Age": "22",
"Date": "",
"ResearchGroup": "",
"Institution": "University of Manitoba"
},
"MoCA": {
"Visual-Executive": "",
"Naming": "NameHere",
"Attention": "",
"Language": "",
"Abstraction": "",
"Delayed Recall": "",
"Orientation": "",
"Education": "",
"Total": ""
}
}
}
Is there a way to query the collection using PyMongo in Python to get the following result:
{
"NotesLength": 1,
"BehavioralLength": 2
}
What I did before was load the documents one by one into my Python script and then measure the length of each dictionary inside the document. This takes a while because my Python program reads the whole collection. Is there a faster way to do this?
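One server-side option, offered here as a sketch rather than an answer from this thread, is an aggregation pipeline that converts each embedded document to an array with $objectToArray and counts the entries with $size, so only the counts travel back to Python. The helper name build_length_pipeline and the variable collection are assumptions for illustration; $objectToArray needs a reasonably recent MongoDB server.

```python
def build_length_pipeline(fields):
    """Build a $project stage that counts the top-level keys of each sub-document.

    `fields` maps an output name to the name of the embedded document to measure.
    """
    projection = {"_id": 0}
    for output_name, source_field in fields.items():
        projection[output_name] = {
            "$size": {
                "$objectToArray": {
                    # Treat a missing or null field as an empty document
                    "$ifNull": ["$" + source_field, {"$literal": {}}]
                }
            }
        }
    return [{"$project": projection}]


pipeline = build_length_pipeline(
    {"NotesLength": "Notes", "BehavioralLength": "BehavioralData"}
)

# Hypothetical usage, assuming `collection` is a pymongo Collection:
# for doc in collection.aggregate(pipeline):
#     print(doc)
```

The pipeline is a plain list of dictionaries, so it can be built and inspected without a running server. Remove the "_id": 0 entry if you need to know which document each pair of counts belongs to.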

Related

Converting from json to dataframe to sql

I'm trying to save all the JSON data to a SQL database. I'm using Python, so I decided to use pandas.
Part of the JSON:
{
"stores": [
{
"ID": "123456",
"name": "Store 1",
"status": "Active",
"date": "2019-03-28T15:20:00Z",
"tagIDs": null,
"location": {
"cityID": 2,
"countryID": 4,
"geoLocation": {
"latitude": 1.13121,
"longitude": 103.4324231
},
"postcode": "123456",
"address": ""
},
"new": false
},
{
"ID": "223456",
"name": "Store 2",
"status": "Active",
"date": "2020-03-28T15:20:00Z",
"tagIDs": [
12,
35
],
"location": {
"cityID": 21,
"countryID": 5,
"geoLocation": {
"latitude": 1.12512,
"longitude": 103.23342
},
"postcode": "223456",
"address": ""
},
"new": true
}
]
}
My Code:
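import json
import pandas as pd
import requests
import sqlalchemy

response = requests.get(.....)
result = response.text
data = json.loads(result)
# The top-level key in the JSON above is "stores", not "store"
df = pd.json_normalize(data["stores"])
.....
db_connection = sqlalchemy.create_engine(.....)
df.to_sql(con=db_connection, name="store", if_exists="append")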
Error: _mysql_connector.MySQLInterfaceError: Python type list cannot be converted
How I want the dataframe to look:
   ID      tagIDs      date
0  123456  []          2020-04-23T09:32:26Z
1  223456  [12,35]     2019-05-24T03:21:39Z
2  323456  [709,1493]  2019-03-28T15:38:39Z
I have tried other dataframes and JSON objects, and they all work, so the issue is with this JSON object.
Without "tagIDs", everything else works fine.
I thought that converting the object to a string might let it be written to SQL, but that didn't work either. How do I change tagIDs so that I can write everything to SQL? Or is there a more efficient way to do this?

I think the tagIDs field holds Python lists, and the MySQL connector cannot convert a list into a column value.
I'm not sure this is the best way, but you can try converting each list to a string before calling to_sql:
df['tagIDs'] = df['tagIDs'].apply(str)
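A variation on the same idea, sketched under the assumption that you may want to read the tags back as a list later, is to store them as a JSON string instead of a Python repr. The helper name serialize_tags is hypothetical, and the records list stands in for the tagIDs column.

```python
import json


def serialize_tags(value):
    """Turn a list into a JSON string; anything else (None, NaN) becomes '[]'."""
    return json.dumps(value if isinstance(value, list) else [])


# Stand-in rows mirroring the two stores in the question
records = [
    {"ID": "123456", "tagIDs": None},
    {"ID": "223456", "tagIDs": [12, 35]},
]
serialized = [serialize_tags(r["tagIDs"]) for r in records]
print(serialized)  # ['[]', '[12, 35]']

# With pandas the same helper would be applied to the column:
# df["tagIDs"] = df["tagIDs"].apply(serialize_tags)
```

json.loads turns each stored string back into a list when the rows are read again.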

JSON Extraction in Python

I am trying to extract a specific part of the JSON but I keep on getting errors.
I am interested in the following sections:
"field": "tag",
"value": "Wian",
I can extract the entire filter section using:
for i in range(0, values_num):
    dedata[i]['filter']
But if I try to filter beyond that point I just get errors.
Could someone please assist me with this?
Here is the JSON output style:
{
"mod_time": 1594631137499,
"description": "",
"id": 82,
"name": "Wian",
"include_custom_devices": true,
"dynamic": true,
"field": null,
"value": null,
"filter": {
"rules": [
{
"field": "tag",
"operand": {
"value": "Wian",
"is_regex": false
},
"operator": "~"
}
],
"operator": "and"
}
}

You are probably trying to access the data in rules, but since it's an array, you have to index into it explicitly, here with [0] for the first rule.
From there you can simply use .get('<name>'), as shown below:
dedata[i]['filter']['rules'][0].get('field')
Likewise for value, which sits one level deeper under operand:
dedata[i]['filter']['rules'][0]['operand'].get('value')
If dedata is a single object rather than a list, comment out the for loop and try without it and without [i].
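Put together, the lookup could look like the sketch below. It assumes dedata is a list of objects shaped like the sample in the question (trimmed here), and the helper first_rule is a hypothetical name added to guard against entries that have no rules.

```python
import json

raw = """
{
  "id": 82,
  "name": "Wian",
  "filter": {
    "rules": [
      {"field": "tag", "operand": {"value": "Wian", "is_regex": false}, "operator": "~"}
    ],
    "operator": "and"
  }
}
"""
dedata = [json.loads(raw)]  # assumed: a list of such objects


def first_rule(entry):
    """Return the first rule of an entry, or an empty dict if there is none."""
    rules = (entry.get("filter") or {}).get("rules") or []
    return rules[0] if rules else {}


extracted = []
for entry in dedata:
    rule = first_rule(entry)
    field = rule.get("field")
    value = (rule.get("operand") or {}).get("value")
    extracted.append((field, value))

print(extracted)  # [('tag', 'Wian')]
```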

Using Response of Confluence Rest API

I am performing a bulk operation to create spaces and pages in our Confluence Cloud instance. I am not really a programmer, and I need some help using the response we get back after calling the APIs from Python. The output is in JSON format. If this is the output, how can I access the title and ID?
{
"content": {
"id": "398852913",
"type": "page",
"status": "current",
"title": "Test Project Name 1 SOW",
"childTypes": {},
"macroRenderedOutput": {},
"restrictions": {},
"_expandable": {
"container": "",
"metadata": "",
"extensions": "",
"operations": "",
"children": "",
"history": "/rest/api/content/398852913/history",
"ancestors": "",
"body": "",
"version": "",
"descendants": "",
"space": "/rest/api/space/TestSpace1",
},
"_links": {
"webui": "/spaces/TestSpace1/pages/398852913/Test+Project+Name+1++SOW",
"self": "https://enerzinx.atlassian.net/wiki/rest/api/content/398852913",
"tinyui": "/x/MQPGFw",
},
},
"title": "Test Project Name 1 SOW",
"excerpt": "file-list",
"url": "/spaces/TestSpace1/pages/398852913/Test+Project+Name+1++SOW",
"resultGlobalContainer": {
"title": "TestSpace1",
"displayUrl": "/spaces/TestSpace1",
},
"breadcrumbs": [],
"entityType": "content",
"iconCssClass": "aui-iconfont-page-default",
"lastModified": "2020-06-24T06:17:32.333Z",
"friendlyLastModified": "Jun 24, 2020",
"score": 0.59390986,
}

If your response is a string in JSON format, you first need to parse the JSON into a dictionary. For this you can use the json module, which is part of the standard library. After parsing the JSON, you can access the keys using a dictionary lookup.
>>> import json
>>>
>>> json_string = '{"foo": "bar"}'
>>> json_dict = json.loads(json_string)
>>> json_dict["foo"]
'bar'
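Your response JSON will yield a nested dictionary, so each value is reached by chaining lookups, something like json_dict["foo"]["bar"]. In the response you posted, the title is at the top level (json_dict["title"]) and the ID is nested under content (json_dict["content"]["id"]).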
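Applied to a trimmed copy of the response from the question, the lookups might look like this. The variable names response_text, result, page_id and title are placeholders, and the sketch assumes the real response parses as JSON.

```python
import json

# Trimmed copy of the response shown in the question
response_text = """
{
  "content": {
    "id": "398852913",
    "type": "page",
    "title": "Test Project Name 1 SOW"
  },
  "title": "Test Project Name 1 SOW",
  "url": "/spaces/TestSpace1/pages/398852913/Test+Project+Name+1++SOW"
}
"""

result = json.loads(response_text)
page_id = result["content"]["id"]  # nested one level down
title = result["title"]            # available at the top level
print(page_id, title)
```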

Read specific JSON object from response in Python [duplicate]

When you have a JSON response that contains multiple JSON objects, how do you pull out a specific object within the JSON using Python?
For example, with the following JSON response, I have three objects in it.
{
"_links": {
"base": "REDACTED",
"context": "",
"self": "REDACTED"
},
"limit": 20,
"results": [
{
"_expandable": {
"ancestors": "",
"body": "",
"children": "",
"container": "",
"descendants": "",
"extensions": "",
"history": "/rest/api/content/198121503/history",
"metadata": "",
"operations": "",
"space": "/rest/api/space/ReleaseNotes",
"version": ""
},
"_links": {
"self": "REDACTED",
"tinyui": "/x/HxjPCw",
"webui": "UNIQUE_URL_HERE"
},
"id": "198121503",
"status": "current",
"title": "Unique Title of Content",
"type": "page"
},
{
"_expandable": {
"ancestors": "",
"body": "",
"children": "",
"container": "",
"descendants": "",
"extensions": "",
"history": "/rest/api/content/197195923/history",
"metadata": "",
"operations": "",
"space": "/rest/api/space/ReleaseNotes",
"version": ""
},
"_links": {
"self": "REDACTED",
"tinyui": "/x/k-jACw",
"webui": "UNIQUE_URL_HERE"
},
"id": "197195923",
"status": "current",
"title": "Unique Title of Content",
"type": "page"
},
{
"_expandable": {
"ancestors": "",
"body": "",
"children": "",
"container": "",
"descendants": "",
"extensions": "",
"history": "/rest/api/content/198121203/history",
"metadata": "",
"operations": "",
"space": "/rest/api/space/ReleaseNotes",
"version": ""
},
"_links": {
"self": "REDACTED",
"tinyui": "/x/8xbPCw",
"webui": "UNIQUE_URL_HERE"
},
"id": "198121203",
"status": "current",
"title": "Unique Title of Content",
"type": "page"
}
],
"size": 3,
"start": 0
}
How can I retrieve the ID and TITLE for a specific object in the response?
I read in other threads that when you use json.loads(your_json), it becomes a dictionary. If that's the case, how do I pull this data if it's stored as a dictionary?
Update
Let me clarify, as maybe I'm not seeing or explaining this clearly.
Is the only option to cycle through everything? Is there no way to say "get me the 2nd JSON object and return its ID and title"? If not, why shouldn't I create a custom object, store the items I want from each JSON object in instances of it inside an array, and then access each object through that array?

After you parse your response into a dictionary, you can access the values by key.
for result in data['results']:
    print("id: {}, title: {}".format(result['id'], result['title']))
As you mentioned, you can use json.loads to turn the string into a dictionary. But if you're using the requests library, just call response.json() to get the data in the required format.
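On the question in the update: "results" parses to an ordinary Python list, so a specific object can be picked by position without looping. A minimal sketch, using a trimmed copy of the response and the placeholder name second:

```python
import json

# Trimmed copy of the response from the question
your_json = """
{
  "results": [
    {"id": "198121503", "title": "Unique Title of Content", "type": "page"},
    {"id": "197195923", "title": "Unique Title of Content", "type": "page"},
    {"id": "198121203", "title": "Unique Title of Content", "type": "page"}
  ],
  "size": 3,
  "start": 0
}
"""

data = json.loads(your_json)
second = data["results"][1]  # list indexes start at 0, so [1] is the 2nd object
print(second["id"], second["title"])
```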

Use bracket notation to access the keys after loading the string into a dictionary. Loop through the results list until you find the object you want, like this:
j = json.loads(your_json)
for r in j["results"]:
    if r["title"] == "Something":
        print(r["id"])
        print(r["title"])
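If the same lookup is needed repeatedly, two common alternatives to the explicit loop are sketched below. The names wanted_id, match and by_id are illustrative, and the results list is a trimmed stand-in for j["results"].

```python
results = [
    {"id": "198121503", "title": "Unique Title of Content"},
    {"id": "197195923", "title": "Unique Title of Content"},
    {"id": "198121203", "title": "Unique Title of Content"},
]

# Option 1: stop at the first match instead of writing an explicit loop
wanted_id = "198121203"
match = next((r for r in results if r["id"] == wanted_id), None)

# Option 2: index the results once, then look up any ID directly
by_id = {r["id"]: r for r in results}
same_match = by_id.get(wanted_id)
missing = by_id.get("no-such-id")  # None when the ID is absent

print(match["id"], match["title"])
```

Matching on id rather than title is a deliberate choice here, since the titles in the sample are not guaranteed to be unique.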

Python .get nested Json values

I have a json file with the following example json entry:
{
"title": "Test prod",
"leafPage": true,
"type": "product",
"product": {
"title": "test product",
"offerPrice": "$19.95",
"offerPriceDetails": {
"amount": 19.95,
"text": "$19.95",
"symbol": "$"
},
"media": [
{
"link": "http://www.test.com/cool.jpg",
"primary": true,
"type": "image",
"xpath": "/html[1]/body[1]/div[1]/div[3]/div[2]/div[1]/div[1]/div[1]/div[1]/a[1]/img[1]"
}
],
"availability": true
},
"human_language": "en",
"url": "http://www.test.com"
}
I can post this to my test server via a Python script without problems when I use:
"text": entry.get("title"),
"url": entry.get("url"),
"type": entry.get("type"),
However, I cannot get the values of the following nested item to upload. How do I structure the call to read a nested JSON entry?
I've tried the below without success. I need to use .get because the entries in the JSON file have different fields, and without .get the script raises an error.
"Amount": entry.get("product"("offerPrice"))
Any help on how to structure the nested json entry would be very much appreciated.

You need to do:
"Amount": entry.get("product", {}).get("offerPrice")
entry.get("product", {}) returns a product dictionary (or an empty dictionary if there is no product key).
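For paths more than two levels deep, such as the amount under offerPriceDetails, chaining .get calls gets long. The sketch below wraps the same idea in a small helper; deep_get is a hypothetical name, not a standard library function, and the entry is a trimmed copy of the one in the question.

```python
def deep_get(mapping, *keys, default=None):
    """Follow `keys` through nested dicts, returning `default` if any step is missing."""
    current = mapping
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


entry = {
    "title": "Test prod",
    "product": {
        "offerPrice": "$19.95",
        "offerPriceDetails": {"amount": 19.95, "text": "$19.95", "symbol": "$"},
    },
}

price = entry.get("product", {}).get("offerPrice")
amount = deep_get(entry, "product", "offerPriceDetails", "amount")
absent = deep_get({"title": "no product"}, "product", "offerPrice")

print(price, amount, absent)  # $19.95 19.95 None
```

Unlike the chained form, the helper also copes with a key that is present but holds null (None in Python), where entry.get("product", {}) would return None and the next .get would raise an AttributeError.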
