I am trying to read nested JSON using the json_normalize method of pandas, with one of the nested fields as the record_path. I have also passed errors='ignore' to ignore any errors due to missing keys. Can you please help me figure out what I am doing wrong?
Here is the JSON -
{
"_id" : "31aa9894-6a43-40f9-8911-116c14c42636",
"message" : {
"serviceOperationName" : "/logUserEvents/event",
"accountNumber" : "1234",
"userId" : null,
"market" : null,
"extract" : {
"request" : {
"USER_EVENT_LOGGING" : {
"payload" : [
{
"eventType" : "audibleSummaryUsage",
"ntid" : "abc",
"accountNumber" : "Not Found",
"workOrderNumber" : "",
"data" : [
{
"name" : "userAction",
"value" : "DISMISSED"
},
{
"name" : "employeeTenure",
"value" : "3.9"
},
{
"name" : "ffc",
"value" : "1234"
},
{
"name" : "ntid",
"value" : "abcd"
},
{
"name" : "isAccountView",
"value" : "true"
},
{
"name" : "userAction",
"value" : "DISMISSED"
},
{
"name" : "title",
"value" : "abcd"
},
{
"name" : "jobType",
"value" : ""
},
{
"name" : "jobClassCd",
"value" : ""
}
]
}
]
}
},
"response" : {}
},
"#timestamp" : "2021-02-18T05:38:48.00269Z",
"eventKeys" : [
"USER_EVENT_LOGGING"
],
"requestStartTimestampText" : "2021-02-18T05:38:48.268Z"
},
"createdOn" : ISODate("2021-02-18T05:38:48.269Z")
}
/* 2 */
{
"_id" : "4189da82-299d-4a9e-8f10-ddb5da9b97b5",
"message" : {
"serviceOperationName" : "/logUserEvents/event",
"accountNumber" : "7890",
"userId" : null,
"market" : null,
"extract" : {
"request" : {
"USER_EVENT_LOGGING" : {
"payload" : [
{
"eventType" : "audibleSummaryUsage",
"ntid" : "defg",
"accountNumber" : "Not Found",
"workOrderNumber" : "",
"data" : [
{
"name" : "userAction",
"value" : "DISMISSED"
},
{
"name" : "userAction",
"value" : "DISMISSED"
},
{
"name" : "employeeTenure",
"value" : "3.9"
},
{
"name" : "jobType",
"value" : ""
},
{
"name" : "jobClassCd",
"value" : ""
},
{
"name" : "ntid",
"value" : "dfer"
},
{
"name" : "ffc",
"value" : "3456"
},
{
"name" : "title",
"value" : "erty"
},
{
"name" : "isAccountView",
"value" : "true"
}
]
}
]
}
},
"response" : {}
},
"#timestamp" : "2021-02-18T05:39:11.00659Z",
"eventKeys" : [
"USER_EVENT_LOGGING"
],
"requestStartTimestampText" : "2021-02-18T05:39:11.658Z"
},
"createdOn" : ISODate("2021-02-18T05:39:11.659Z")
}
Here is the code -
db = mongo_client.conciselogs
col = db.logs
cursor = col.find({"message.extract.request.USER_EVENT_LOGGING.payload.eventType":"audibleSummaryUsage"})
mongo_docs = list(cursor)
df = pd.json_normalize(mongo_docs, ['message.extract.request.USER_EVENT_LOGGING.payload.data'], errors = 'ignore')
df.to_csv('sample_data0220_3.csv', index=False)
Your record_path argument is incorrect: it should be a list with one key per nesting level, not a single dotted string:
df = pd.json_normalize(
mongo_docs,
['message', 'extract', 'request', 'USER_EVENT_LOGGING', 'payload', 'data'], # list, not 'key.key.key'
errors='ignore',
)
df.to_csv('sample_data0220_3.csv', index=False)
Output:
name,value
userAction,DISMISSED
employeeTenure,3.9
ffc,1234
ntid,abcd
isAccountView,true
userAction,DISMISSED
title,abcd
jobType,
jobClassCd,
userAction,DISMISSED
userAction,DISMISSED
employeeTenure,3.9
jobType,
jobClassCd,
ntid,dfer
ffc,3456
title,erty
isAccountView,true
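To see why the path has to be split into separate keys, here is a minimal plain-Python sketch of the walk that a list-style record_path describes. The helper name walk_record_path and the trimmed sample document are made up for illustration; this is not how pandas implements it internally, only the same idea under simplified assumptions.

```python
def walk_record_path(docs, path):
    """Follow `path` one key at a time and collect the records at the end."""
    current = list(docs)
    for key in path:
        next_level = []
        for obj in current:
            value = obj[key]
            # A list contributes each of its items; a dict is a single item.
            if isinstance(value, list):
                next_level.extend(value)
            else:
                next_level.append(value)
        current = next_level
    return current


# Hypothetical document, trimmed from the structure shown in the question.
sample_docs = [{
    "message": {"extract": {"request": {"USER_EVENT_LOGGING": {"payload": [
        {"eventType": "audibleSummaryUsage",
         "data": [{"name": "userAction", "value": "DISMISSED"},
                  {"name": "ffc", "value": "1234"}]}
    ]}}}}
}]

path = ["message", "extract", "request", "USER_EVENT_LOGGING", "payload", "data"]
rows = walk_record_path(sample_docs, path)
print(rows)
```

A single dotted string such as 'message.extract' is not a key of the top-level document, which is why each level needs its own list entry.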
Being new to Python, I am unable to resolve the following issue.
My Python code below returns different JSON output depending on whether the names come from a list defined in a variable or are read line by line from a file.
Code with lines read from a file, which produces partial/corrupted output:
import requests
import json
import re
repo_name = "repo"
with open('file_list_new.txt') as file:
    for line in file:
        url = "http://fqdn/repository/{0}/{1}?describe=json".format(repo_name, line)
        response = requests.get(url)
        json_data = response.text
        data = json.loads(json_data)
        print(data)
        for size in data['items']:
            if size['name'] == 'Payload':
                value_size= size['value']['Size']
                if value_size != -1:
                    print(value_size)
Content of file_list_new.txt:
mysql.odbc/5.1.14
mysql.odbc/5.1.11
Corrupted output:
{
"parameters" : {
"path" : "/mysql.odbc/5.1.14\n",
"nexusUrl" : "http://fqdn"
},
"items" : [ {
"name" : "Exception during handler processing",
"type" : "topic",
"value" : "Exception during handler processing"
}, {
"name" : "java.lang.IllegalArgumentException",
"type" : "table",
"value" : {
"Message" : "Illegal character in path at index 40: Packages(Id='mysql.odbc',Version='5.1.14\n')"
}
}, {
"name" : "java.net.URISyntaxException",
"type" : "table",
"value" : {
"Message" : "Illegal character in path at index 40: Packages(Id='mysql.odbc',Version='5.1.14\n')"
}
}, {
"name" : "Request",
"type" : "topic",
"value" : "Request"
}, {
"name" : "Details",
"type" : "table",
"value" : {
"Action" : "GET",
"path" : "/mysql.odbc/5.1.14\n"
}
}, {
"name" : "Parameters",
"type" : "table",
"value" : {
"describe" : "json"
}
}, {
"name" : "Headers",
"type" : "table",
"value" : {
"Accept" : "*/*",
"User-Agent" : "python-requests/2.27.1",
"Connection" : "keep-alive",
"Host" : "fqdn",
"Accept-Encoding" : "gzip, deflate"
}
}, {
"name" : "Attributes",
"type" : "table",
"value" : {
"org.apache.shiro.subject.support.DefaultSubjectContext.SESSION_CREATION_ENABLED" : false,
"Key[type=org.sonatype.nexus.security.SecurityFilter, annotation=[none]].FILTERED" : true,
"authcAntiCsrf.FILTERED" : true,
"nx-authc.FILTERED" : true,
"org.apache.shiro.web.servlet.ShiroHttpServletRequest_SESSION_ID_URL_REWRITING_ENABLED" : true,
"javax.servlet.include.servlet_path" : "/repository/repo/mysql.odbc/5.1.14%0A",
"nx-anonymous.FILTERED" : true,
"org.sonatype.nexus.security.anonymous.AnonymousFilter.originalSubject" : "org.apache.shiro.web.subject.support.WebDelegatingSubject#33c429ba",
"nx-apikey-authc.FILTERED" : true
}
}, {
"name" : "Payload",
"type" : "table",
"value" : {
"Content-Type" : "",
"Size" : -1
}
} ]
}
Code with the list defined in a variable within the code:
import requests
import json
import re
repo_name = "repo"
file_list = ["mysql.odbc/5.1.11","mysql.odbc/5.1.14"]
for i in file_list:
    url = "http://fqdn/repository/{0}/{1}?describe=json".format(repo_name, i)
    response = requests.get(url)
    json_data = response.text
    data = json.loads(json_data)
    for size in data['items']:
        if size['name'] == 'Payload':
            value_size= size['value']['Size']
            if value_size != -1:
                print(value_size)
Expected output, produced when the list is defined within the code:
{
"parameters" : {
"path" : "/mysql.odbc/5.1.14",
"nexusUrl" : "http://fqdn"
},
"items" : [ {
"name" : "Request",
"type" : "topic",
"value" : "Request"
}, {
"name" : "Details",
"type" : "table",
"value" : {
"Action" : "GET",
"path" : "/mysql.odbc/5.1.14"
}
}, {
"name" : "Parameters",
"type" : "table",
"value" : {
"describe" : "json"
}
}, {
"name" : "Headers",
"type" : "table",
"value" : {
"Accept" : "*/*",
"User-Agent" : "python-requests/2.27.1",
"Connection" : "keep-alive",
"Host" : "fqdn",
"Accept-Encoding" : "gzip, deflate"
}
}, {
"name" : "Attributes",
"type" : "table",
"value" : {
"org.apache.shiro.subject.support.DefaultSubjectContext.SESSION_CREATION_ENABLED" : false,
"Key[type=org.sonatype.nexus.security.SecurityFilter, annotation=[none]].FILTERED" : true,
"authcAntiCsrf.FILTERED" : true,
"nx-authc.FILTERED" : true,
"org.apache.shiro.web.servlet.ShiroHttpServletRequest_SESSION_ID_URL_REWRITING_ENABLED" : true,
"javax.servlet.include.servlet_path" : "/repository/repo/mysql.odbc/5.1.14",
"nx-anonymous.FILTERED" : true,
"org.sonatype.nexus.security.anonymous.AnonymousFilter.originalSubject" : "org.apache.shiro.web.subject.support.WebDelegatingSubject#1433a6c9",
"nx-apikey-authc.FILTERED" : true
}
}, {
"name" : "Payload",
"type" : "table",
"value" : {
"Content-Type" : "",
"Size" : -1
}
}, {
"name" : "Response",
"type" : "topic",
"value" : "Response"
}, {
"name" : "Status",
"type" : "table",
"value" : {
"Code" : 200,
"Message" : ""
}
}, {
"name" : "Headers",
"type" : "table",
"value" : {
"ETag" : "\"df4f013db18103f1b9541cdcd6ba8632\"",
"Content-Disposition" : "attachment; filename=mysql.odbc.5.1.14.nupkg",
"Last-Modified" : "Tue, 13 Oct 2015 03:54:48 GMT"
}
}, {
"name" : "Attributes",
"type" : "table",
"value" : { }
}, {
"name" : "Payload",
"type" : "table",
"value" : {
"Content-Type" : "application/zip",
"Size" : 3369
}
} ]
}
I am not sure if I am doing something wrong or missing something simple.
Any help is much appreciated.
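It looks like the trailing newline of each line is being passed into the URL string. Iterating over a file yields each line with its "\n" still attached, which is what the error in your output shows:
"Message" : "Illegal character in path at index 40: Packages(Id='mysql.odbc',Version='5.1.14\n')"
You can remove it with strip():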
with open('file_list_new.txt') as file:
    for line in file:
        url = "http://fqdn/repository/{0}/{1}?describe=json".format(repo_name, line.strip())
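As a quick check that does not need the server, here is a small sketch that builds the URLs from an in-memory stand-in for the file. The io.StringIO object and the extra blank line at the end are assumptions added for illustration; skipping empty lines is an extra safeguard, not part of the original fix.

```python
import io

repo_name = "repo"

# Stand-in for open('file_list_new.txt'); the trailing blank line is deliberate.
fake_file = io.StringIO("mysql.odbc/5.1.14\nmysql.odbc/5.1.11\n\n")

urls = []
for line in fake_file:
    name = line.strip()  # drops the trailing "\n" and any surrounding spaces
    if not name:         # skip blank lines instead of requesting a bad URL
        continue
    urls.append("http://fqdn/repository/{0}/{1}?describe=json".format(repo_name, name))

for url in urls:
    print(repr(url))  # repr() makes any leftover "\n" visible
```

Printing with repr() is a handy way to spot invisible characters such as "\n" when a string built from file input behaves differently from a hard-coded one.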
The JSON is below:
result = {
"took" : 21,
"timed_out" : False,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "data",
"_type" : "_doc",
"_id" : "qwcs",
"_score" : 1.0,
"_source" : {
"id" : "10",
"name" : "Country ",
"description" : "This product contains all currency details",
"Owner" : {
"id" : "11",
"Name" : "David",
"Email" : "nons#utc.com",
"role" : "Analyst"
},
"Area" : [
"Data Management"
],
"Type" : [
"API",
"TXT"
],
"Level" : [
"A"
]
}
}
]
}
}
I wrote Python code that extracts the data from Elasticsearch through an API call, and the above is the result.
Sample API: http://utc.com/search/Owner.id?=11
The back-end query generated is {'query': {'match': {'Owner.id': '11'}}}
But I only need a small part of the result. The expected output is below:
"Owner" : {
"id" : "11",
"Name" : "David",
"Email" : "nons#utc.com",
"role" : "Analyst"
}
If you're saying you want to return only the Owners in the hits list with an id matching your query, you can use a list comprehension:
query = {'query': {'match': {'Owner.id': '11'}}}
owners = [hit['_source']['Owner'] for hit in result['hits']['hits']
if hit['_source']['Owner']['id'] == query['query']['match']['Owner.id']]
print(owners)
Output:
[{'id': '11', 'Name': 'David', 'Email': 'nons#utc.com', 'role': 'Analyst'}]
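If some hits might lack an Owner field, the same filter can be written more defensively. This is only a sketch: the function name owners_matching and the extra sample hits (including the "Other" owner) are invented for illustration and are not part of the original response.

```python
def owners_matching(result, owner_id):
    """Return the Owner dict of every hit whose Owner.id equals owner_id."""
    owners = []
    for hit in result.get("hits", {}).get("hits", []):
        owner = hit.get("_source", {}).get("Owner")
        # Skip hits without an Owner instead of raising KeyError.
        if owner and owner.get("id") == owner_id:
            owners.append(owner)
    return owners


# Hypothetical, trimmed response with one matching hit and two non-matching ones.
sample_result = {"hits": {"hits": [
    {"_source": {"id": "10", "Owner": {"id": "11", "Name": "David"}}},
    {"_source": {"id": "12"}},  # no Owner key at all
    {"_source": {"id": "13", "Owner": {"id": "99", "Name": "Other"}}},
]}}

matched = owners_matching(sample_result, "11")
print(matched)
```

Note that the ids are compared as strings here, matching the sample data where "id" is "11" rather than the integer 11.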
I have documents in a collection, and I want to find a document and update elements of a list inside it.
Here is some sample data:
{
{
"_id" : ObjectId("5edd3faaf6c9d938e0bfd966"),
"id" : 1,
"status" : "XXX",
"number" : [
{
"code" : "AAA"
},
{
"code" : "CVB"
},
{
"code" : "AAA"
},
{
"code" : "BBB"
}
]
},
{
"_id" : ObjectId("asseffsfpo2dedefwef"),
"id" : 2,
"status" : "TUY",
"number" : [
{
"code" : "PPP"
},
{
"code" : "SSD"
},
{
"code" : "HDD"
},
{
"code" : "IOO"
}
]
}
}
I planned to find the document where "id" is 1 and the value of number.code is in ["AAA", "BBB"], and change number.code to "VVV". I did it with the following code:
db.test.update(
{
id: 1,
"number.code": {$in: ["AAA", "BBB"]}
},
{
$set: {"number.$[elem].code": "VVV"}
},
{ "arrayFilters": [{ "elem.code": {$in: ["AAA", "BBB"]} }], "multi": true, "upsert": false
}
)
It works in the MongoDB shell, but in Python (with pymongo) it fails with the following error:
raise TypeError("%s must be True or False" % (option,))
TypeError: upsert must be True or False
Please help me. What can I do?
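pymongo's syntax is just a tad different. In older pymongo versions the third positional argument of update() is upsert, so the options dict you passed lands there, which is likely what raises the TypeError. It would look like this: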
db.test.update_many(
{
"id": 1,
"number.code": {"$in": ["AAA", "BBB"]}
},
{
"$set": {"number.$[elem].code": "VVV"}
},
array_filters=[{"elem.code": {"$in": ["AAA", "BBB"]}}],
upsert=False
)
The multi flag is not needed with update_many.
upsert is False by default, so it is also redundant here.
You can find pymongo's docs here.
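To make the effect of the array filter concrete without a running server, here is a plain-Python sketch that applies the same rule to one document held as a dict. It does not call pymongo at all; the helper name set_code_where is hypothetical and only mimics what the $[elem] filter is expected to do to the matching array elements.

```python
def set_code_where(doc, old_codes, new_code):
    """Mimic, on a plain dict, a $set on 'number.$[elem].code'
    with an array filter of elem.code $in old_codes."""
    modified = 0
    for elem in doc.get("number", []):
        if elem.get("code") in old_codes:
            elem["code"] = new_code
            modified += 1
    return modified


# First sample document from the question, without the ObjectId.
doc = {"id": 1, "status": "XXX",
       "number": [{"code": "AAA"}, {"code": "CVB"}, {"code": "AAA"}, {"code": "BBB"}]}

changed = set_code_where(doc, {"AAA", "BBB"}, "VVV")
codes = [elem["code"] for elem in doc["number"]]
print(changed, codes)
```

Every element whose code is in the filter list is rewritten, while "CVB" is left alone, which is the behaviour to expect from the update_many call above.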
I have this code:
def get_attribute_colour(colour_code):
    attribute_colour_meta = db.attributes.aggregate([{ '$match': {"name.en-UK": "Colour"} },
                                                     { '$unwind' : "$values" },
                                                     { '$project': { "code" : "$values.code", "valueId": "$values._id"} },
                                                     { '$match': {"code": colour_code} }])
    return attribute_colour_meta['result']
It looks up a collection called attributes, which has the following structure:
> db.attributes.find({}).pretty();
{
"_id" : ObjectId("53b27bded901f26432996e00"),
"values" : [
{
"code" : "AQ",
"pmsCode" : "638c",
"name" : {
"en-UK" : "Aqua"
},
"tcxCode" : "16-4529 TCX",
"hexCode" : "#00aed8",
"images" : [
"AQ.jpg"
],
"_id" : ObjectId("53b27bded901f26432996d83")
},
{
"code" : "AQ",
"pmsCode" : "3115c",
"name" : {
"en-UK" : "Aqua"
},
"tcxCode" : "",
"hexCode" : "#00c4db",
"images" : [
"AQ.jpg"
],
"_id" : ObjectId("53b27bded901f26432996d84")
},
.....
}
],
"name" : {
"en-UK" : "Colour"
}
}
{
"_id" : ObjectId("53b27bded901f26432996e1b"),
"values" : [
{
"code" : 0,
"_id" : ObjectId("53b27bded901f26432996e01"),
"name" : {
"en-UK" : "0-3 MTHS"
}
},
.....
}
],
"name" : {
"en-UK" : "Size"
}
}
{
"_id" : ObjectId("53b27bded901f26432996e28"),
"values" : [
{
"Currency" : "GBP",
"_id" : ObjectId("53b27bded901f26432996e1c"),
"name" : {
"en-UK" : "Carton price list"
}
},
}
],
"name" : {
"en-UK" : "Price list"
}
}
>
Basically, there are 3 attributes (colour, size and price list), each of which has sub-documents called values.
In my get_attribute_colour function, how do I return the _id of the attribute within the results, so that I get something like:
{ attributeId: ObjectId("53b27bded901f26432996e00"),
valueId: ObjectId("53b27bded901f26432996d83") }
The result does return the _id:
[{u'code': u'AQ', u'_id': ObjectId('53b27bded901f26432996e00'), u'valueId': ObjectId('53b27bded901f26432996d83')}]
but I don't see where this is specified.
Any advice is much appreciated.
Hello, I have the following MongoDB collection:
> db.attributes.find().pretty()
{
"_id" : ObjectId("53a4445fd901f278f8685b91"),
"values" : [
{
"code" : "AQ",
"pmsCode" : "638c",
"name" : {
"en-UK" : "Aqua"
},
"tcxCode" : "16-4529 TCX",
"hexCode" : "#00aed8",
"images" : [
"AQ.jpg"
],
"_id" : ObjectId("53a4445fd901f278f8685b17")
},
{
"code" : "AQ",
"pmsCode" : "3115c",
"name" : {
"en-UK" : "Aqua"
},
"tcxCode" : "",
"hexCode" : "#00c4db",
"images" : [
"AQ.jpg"
],
"_id" : ObjectId("53a4445fd901f278f8685b18")
}],
"name" : {
"en-UK" : "Colour"
}
}
{
"_id" : ObjectId("53a4445fd901f278f8685bac"),
"values" : [
{
"code" : 0,
"_id" : ObjectId("53a4445fd901f278f8685b92"),
"name" : {
"en-UK" : "0-3 MTHS"
}
}, {
"code" : 0,
"_id" : ObjectId("53a4445fd901f278f8685b93"),
"name" : {
"en-UK" : "ONE SIZE"
}
}
,
"name" : {
"en-UK" : "Size"
}
}
Basically, it is a collection that has two objects, Colour and Size, which have sub-objects called values.
What is the correct way to find the ObjectId for a specific Colour value's code using pymongo?
I have this: attribute_id = attributes.find({"values.code": product_color_code}), but how do I extract the actual ObjectId from it?
Any advice is much appreciated.
You can try the equivalent of SQL's SELECT _id FROM table_name GROUP BY _id HAVING some_condition_on_colour.
In MongoDB, using Python, you can do the following:
ideas.aggregate([
{"$match": {'colour':'some_Color_of_ur_choice' }},
{'$group':{'_id': "$_id",'count':{"$sum": 1 } }}
])
This will also let you count the number of occurrences of colours per ObjectId.