How to create json object for mongodb - python

I'm building json object which I want to insert to an already existing collection in mongoldb with data using pymongo. The data looks like this:
[
{
"title": "PyMongo",
"publication_date": "2015-09-07 10:00:00",
"tags": ["python", "mongodb", "nosql"],
"author": {
"name": "David",
"author_info": {
"$oid": "1870981708ddb1a352189e25w"
},
},
"date": {
"$date": 1484071215000 },
}
]
I noticed author has objectid and date is a timestamp, how can I create values for author_info and date and insert into mongodb?
I have a function that builds the values like this:
def build_json(title, pubctndate,taglist,author):
json_ob = \
[ {
"title": title,
"publication_date": pubctndate,
"tags": tablets,
"author": {
"name": author,
"author_info": {
"$oid": " "
},
},
"date": {"$date": },
}]
return json_ob
which I intend to call this way json.dumps(build_json(title, pubctndate,taglist,author))
Please bear with me, I'm a complete beginner.

Related

Read JSON file to dataframe

I have a JSON with the following structure below into a list in a python variable. I'd like to extract this JSON value as a table. My question is, how can I extract it from the list and how can I change it into a table?
Once I have converted it, I will insert the output into a SQL table.
JSON structure:
{
"Details": [
{
"List": [
{
"year": "2018-19",
"Title": "PLANTN-OTRWRKS"
},
{
"year": "2018-19",
"Title": "EXTERNAL"
},
{
"year": "2019-20",
"Title": "INTERNAL"
},
{
"year": "2020-21",
"Title": "BIANNUAL"
},
{
"year": "2022-23",
"Title": "WORKS OF 2017-18 AND 2018-19"
}
],
"Wise": [
{
"Circle": "dgrgr",
"ID": "912",
"total_seedlings_evaluated": "5270",
"Average_height_in_meters": "2.53",
"Average_collar_girth_in_cms": "11.32",
"Survival_perc": "86.79",
"Condition": "Very Good"
},
{
"Circle": "hgrj",
"ID": "4654",
"total_seedlings_evaluated": "117206",
"Average_height_in_meters": "3.04",
"Average_collar_girth_in_cms": "22.61",
"Survival_perc": "71.61",
"Condition": "Good"
}
]
}
]
}
Desired Output: Output
I had used Python and following code:
df_table = pd.json_normalize(jsonObj['Details'])
df_table1 = pd.DataFrame(df_table['Wise'],index=df_table.index)
But it didn't work.

replace nested document array mongodb with python

i have this document in mongodb
{
"_id": {
"$oid": "62644af0368cb0a46d7c2a95"
},
"insertionData": "23/04/2022 19:50:50",
"ipfsMetadata": {
"Name": "data.json",
"Hash": "Qmb3FWgyJHzJA7WCBX1phgkV93GiEQ9UDWUYffDqUCbe7E",
"Size": "431"
},
"metadata": {
"sessionDate": "20220415 17:42:55",
"dataSender": "user345",
"data": {
"height": "180",
"weight": "80"
},
"addtionalInformation": [
{
"name": "poolsize",
"value": "30m"
},
{
"name": "swimStyle",
"value": "mariposa"
},
{
"name": "modality",
"value": "swim"
},
{
"name": "gender-title",
"value": "schoolA"
}
]
},
"fileId": {
"$numberLong": "4"
}
}
I want to update nested array document, for instance the name with gender-tittle. This have value schoolA and i want to change to adult like the body. I give the parameter number of fileId in the post request and in body i pass this
post request : localhost/sessionUpdate/4
and body:
{
"name": "gender-title",
"value": "adultos"
}
flask
#app.route('/sessionUpdate/<string:a>', methods=['PUT'])
def sessionUpdate(a):
datas=request.json
r=str(datas['name'])
r2=str(datas['value'])
print(r,r2)
r3=collection.update_one({'fileId':a, 'metadata.addtionalInformation':r}, {'$set':{'metadata.addtionalInformation.$.value':r2}})
return str(r3),200
i'm getting the 200 but the document don't update with the new value.
As you are using positional operator $ to work with your array, make sure your select query is targeting array element. You can see in below query that it is targeting metadata.addtionalInformation array with the condition that name: "gender-title"
db.collection.update({
"fileId": 4,
"metadata.addtionalInformation.name": "gender-title"
},
{
"$set": {
"metadata.addtionalInformation.$.value": "junior"
}
})
Here is the Mongo playground for your reference.

Use Python to iterate through nested JSON data

TASK:
I am using a API call to get JSON data from our TeamCity CI tool.
We need to identify all those builds which are using old version of msbuild.
We can identify from this API call data
{
"name": "msbuild_version",
"value": "15.0"
}
At the moment i am saving the entire API call data to a file; however i will later integrate the API call to the same script.
Now to the question at hand; How can i filter this above property i.e. msbuild_version, to say msbuild_version < 15.0 (i.e. all msbuild less than version 15.0) and display the corresponding 'id' and 'projectName' under 'buildType'; e.g.
"id": "AIntegration_BTool_BToolBuilds_DraftBuild",
"projectName": "A Integration / B Tool / VAR Builds",
here is a part of the JSON data file:-
{
"project": [{
"id": "_Root",
"buildTypes": {
"buildType": []
}
}, {
"id": "AI_BTool_BToolBuilds",
"buildTypes": {
"buildType": [{
"id": "AI_BTool_BToolBuilds_DraftBuild",
"projectName": "A I / B Tool / VAR Builds",
"steps": {
"step": [ {
"id": "RUNNER_213",
"name": "Build",
"type": "MSBuild",
"properties": {
"property": [ {
"name": "msbuild_version",
"value": "16.0"
}, {
"name": "run-platform",
"value": "x64"
}, {
"name": "targets",
"value": "Build"
}, {
"name": "teamcity.step.mode",
"value": "default"
}, {
"name": "toolsVersion",
"value": "15.0"
}]
}
}, {
"id": "RUNNER_228",
"name": "temp",
"type": "VS.Solution",
"properties": {
"property": [{
"name": "build-file-path",
"value": "x"
}, {
"name": "msbuild_version",
"value": "16.0"
}, {
"name": "vs.version",
"value": "vs2019"
}]
}
}]
}
}, {
"id": "AI_BTool_BToolBuilds_ContinuousBuildWithNexusI",
"projectName": "A I / B Tool / VAR Builds",
"steps": {
"step": [ {
"id": "RUNNER_22791",
"name": "Build",
"type": "MSBuild",
"properties": {
"property": [{
"name": "msbuild_version",
"value": "16.0"
}, {
"name": "run-platform",
"value": "x86"
}, {
"name": "teamcity.step.mode",
"value": "default"
}, {
"name": "toolsVersion",
"value": "15.0"
}]
}
}]
}
}]
}
}, {
"id": "AI_BTool_BToolBuilds_VARApiBuilds",
"buildTypes": {
"buildType": [{
"id": "AI_BTool_BToolBuilds_CiVARNewSolutionContinuousBuild",
"projectName": "A I / B Tool / VAR Builds / VAR API builds",
"steps": {
"step": [ {
"id": "RUNNER_22791",
"name": "Build",
"type": "MSBuild",
"properties": {
"property": [{
"name": "msbuilds_version",
"value": "15.0"
}, {
"name": "toolsVersion",
"value": "15.0"
}]
}
}]
}
}, {
"id": "AI_BTool_BToolBuilds_VARApiBuilds_CiVARIngestionWindowsServiceNonReleaseBranchBuild",
"projectName": "A I / B Tool / VAR Builds / VAR API builds",
"steps": {
"step": [{
"id": "RUNNER_22790",
"name": "Nuget Installer",
"type": "jb.nuget.installer",
"properties": {
"property": [{
"name": "nuget.path",
"value": "%teamcity.tool.NuGet.CommandLine.4.9.2%"
}, {
"name": "msbuilds_version",
"value": "16.0"
}, {
"name": "nuget.use.restore",
"value": "restore"
}, {
"name": "sln.path",
"value": "VAR.sln"
}, {
"name": "teamcity.step.mode",
"value": "default"
}]
}
}]
}
}]
}
}]
}
My Solution till now
And my code snippet till now
import json
with open('UnArchivedBuilds.txt') as api_call:
read_content = json.load(api_call)
#for project in read_content['project']:
# print (project.get('buildTypes'))
for project in read_content['project']:
# print (project['id'])
print (project['buildTypes']['buildType'])
I am not able to decide on the hierarchy of the JSON to print the relevant data (i.e id and projectName) wherever msbuild_version is less than 15.0
I had a look at your JSON data which was broken. In order to work with the snippet you provided I fixed the malformed data and removed unneeded parts to decrease clutter:
{
"project": [{
"buildTypes": {
"buildType": [{
"id": "AIntegration_BTool_BToolBuilds_DraftBuild",
"projectName": "A Integration / B Tool / VAR Builds"
},
{
"id": "AIntegration_BTool_BToolBuilds_ContinuousBuildIntegration",
"projectName": "A Integration / B Tool / VAR Builds"
}]
}
}]
}
As per my comment above I suggested recursion or using a schema validator to boil the JSON data down. However, as a quick-and-dirty approach and due to the weird structure of your dataset, I decided to do a multi-iteration over some parts of your data. Iterating over the same data is quite ugly and considered to be bad practice in most cases.
Assuming the data is stored in a file called input.json, the following snippet should give you the desired output:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import json
with open('input.json') as f:
data = json.load(f)
projects = (element for element in data.get('project'))
build_types = (element.get('buildTypes') for element in projects)
build_types = (element.get('buildType') for element in build_types)
for item in build_types:
for element in item:
identifier = element.get('id')
project_name = element.get('projectName')
print('{} --> {}'.format(identifier, project_name))
Printing:
AIntegration_BTool_BToolBuilds_DraftBuild --> A Integration / B Tool / VAR Builds
AIntegration_BTool_BToolBuilds_ContinuousBuildIntegration --> A Integration / B Tool / VAR Builds

How can I add database references in schema validation when creating a mongodb collection?

Say I have a collection "cities" with the following documents:
Document 1:
{
"_id": {
"$oid": "5e00979d7c21388869c2c048"
},
"cityName": "New York"
}
Document 2:
{
"_id": {
"$oid": "5e00979d7c21388869c2c432"
},
"cityName": "Los Angeles"
}
and I want to create another collection "students" with the following document:
{
"name": "John",
"citiesVisited": [
{
"$ref": "cities",
"$id": "5e00979d7c21388869c2c048"
},
{
"$ref": "cities",
"$id": "5e00979d7c21388869c2c432"
}
]
}
How should the schema validation be? I tried the following validation:
validator = {
"$jsonSchema": {
"bsonType": "object",
"required": ["name", "citiesVisited"],
"properties": {
"name": {
"bsonType": "string",
"description": "name of student."
},
"citiesVisited": {
"bsonType": ["array"],
"items": {
"bsonType": "object",
"required": ["$ref", "$id"],
"properties": {
"$ref": {
"bsonType": "string",
"description": "collection name"
},
"$id": {
"bsonType": "string",
"description": "document id of visited city"
}
}
},
"description": "cities visited by the student"
}
}
}}
but it gives the following error when I try to get a list of all collections in the database:
bson.errors.InvalidBSON: collection must be an instance of str
I tried creating the validation without the "$" in "$ref" and "$id" and it worked fine but the document validation failed because of database references.
I want to use dbrefs when storing the cities.

Accessing nested value in loop in json using python

I want to fetch the value of each api3 in this json object where each array has api3 value.
{
"count": 10,
"result": [
{
"type": "year",
"year": {
"month": {
"api1": {
"href": "https://Ap1.com"
},
"api2": {
"href": "FETCH-CONTENT"
},
"api3": {
"href": "https://Ap3.com"
},
"api4": {
"href": "https://Ap4.com"
}
},
"id": "sdvnkjsnvj",
"summary": "summeryc",
"type": "REST",
"apiId": "mlksmfmksdfs",
"idProvider": {
"id": "sfsmkfmskf",
"name": "Apikey"
},
"tags": []
}
},
{
"type": "year1",
"year": {
"month": {
"api1": {
"href": "https://Ap11.com"
},
"api2": {
"href": "FETCH-CONTENT-1"
},
"api3": {
"href": "https://Ap13.com"
},
"api4": {
"href": "https://Ap14.com"
}
},
"id": "sdvnkjsnvj",
"summary": "summeryc",
"type": "REST",
"apiId": "mlksmfmksdfs",
"idProvider": {
"id": "sfsmkfmskf",
"name": "Apikey"
},
"tags": []
}
},
I am able to get the whole json object and first value inside it.
with open('C:\python\examplee.json','r+') as fr:
data = json.load(fr)
print(data["result"])
Thank you in advance for helping me figuring this.
For each element in list of result key, get the value for the nested dictionary within item
print([item['year']['month']['api3'] for item in data['result']])
The output will be [{'href': 'https://Ap3.com'}, {'href': 'https://Ap13.com'}]
Or if you want to get the href value as well
print([item['year']['month']['api3']['href'] for item in data['result']])
The output will be
['https://Ap3.com', 'https://Ap13.com']
So your whole code will look like
data = {}
with open('C:\python\examplee.json','r+') as fr:
data = json.load(fr)
print([item['year']['month']['api3']['href'] for item in dct['result']])
Looks like your JSON schema is static so you can just use this:
print([x['year']['month']['api3']['href'] for x in data['result']])
will return you:
['https://Ap3.com', 'https://Ap13.com']

Categories