Extract values from oddly-nested Python - python

I must be really slow because I spent a whole day googling and trying to write Python code to simply list the "code" values only so my output will be Service1, Service2, Service2. I have extracted json values before from complex json or dict structure. But now I must have hit a mental block.
This is my json structure.
myjson='''
{
"formatVersion" : "ABC",
"publicationDate" : "2017-10-06",
"offers" : {
"Service1" : {
"code" : "Service1",
"version" : "1a1a1a1a",
"index" : "1c1c1c1c1c1c1"
},
"Service2" : {
"code" : "Service2",
"version" : "2a2a2a2a2",
"index" : "2c2c2c2c2c2"
},
"Service3" : {
"code" : "Service4",
"version" : "3a3a3a3a3a",
"index" : "3c3c3c3c3c3"
}
}
}
'''
#convert above string to json
somejson = json.loads(myjson)
print(somejson["offers"]) # I tried so many variations to no avail.

Or, if you want the "code" stuffs :
>>> [s['code'] for s in somejson['offers'].values()]
['Service1', 'Service2', 'Service4']

somejson["offers"] is a dictionary. It seems you want to print its keys.
In Python 2:
print(somejson["offers"].keys())
In Python 3:
print([x for x in somejson["offers"].keys()])
In Python 3 you must use the list comprehension because in Python 3 keys() is a 'view', not a list.

This should probably do the trick , if you are not certain about the number of Services in the json.
import json
myjson='''
{
"formatVersion" : "ABC",
"publicationDate" : "2017-10-06",
"offers" : {
"Service1" : {
"code" : "Service1",
"version" : "1a1a1a1a",
"index" : "1c1c1c1c1c1c1"
},
"Service2" : {
"code" : "Service2",
"version" : "2a2a2a2a2",
"index" : "2c2c2c2c2c2"
},
"Service3" : {
"code" : "Service4",
"version" : "3a3a3a3a3a",
"index" : "3c3c3c3c3c3"
}
}
}
'''
#convert above string to json
somejson = json.loads(myjson)
#Without knowing the Services:
offers = somejson["offers"]
keys = offers.keys()
for service in keys:
print(somejson["offers"][service]["code"])

Related

python searching through dict that contain list of nested dict

I am looking to pull all the "symbol" from a Dict that looks like this:
file_data = json.load(f)
{
"symbolsList" : [ {
"symbol" : "SPY",
"name" : "SPDR S&P 500",
"price" : 261.56,
"exchange" : "NYSE Arca"
}, {
"symbol" : "CMCSA",
"name" : "Comcast Corporation Class A Common Stock",
"price" : 35.49,
"exchange" : "Nasdaq Global Select"
}, {
"symbol" : "KMI",
"name" : "Kinder Morgan Inc.",
"price" : 13.27,
"exchange" : "New York Stock Exchange"
}
}
after looking up I found a way to access certain symbol. but I would like to get all the symbol in a form of list or dict doesn't really matter to me.
this is what I got:
print([next(item for item in file_data["symbolsList"] if item["symbol"] == "SPY")])
I know that the problem is with the next function I just don't know how to get all the symbols
you can use a list comprehension:
[e['symbol'] for e in d['symbolsList']]
output:
['SPY', 'CMCSA', 'KMI']
the same thing using a for loop:
result = []
for e in d['symbolsList']
result.append(e['symbol'])

Restructuring and reformatting json api output data

Hello I have some json api output data that I am trying to restructure/reformat. Here is a sample of the output:
{
"playergamelogs: {
"gamelogs": [
{
"game" : {
"date" : "2016-10-13"
"id" : "32637},
"player": {
"ID": "4419"},
"team" : {
"id" : "16},
"stats" : {
"minutes": "10"}
},
{
"game": {
"date" : "2016-10-17"
"id" : "33737},
"player": {
"ID": "4419"},
"team" : {
"id" : "16
},
"stats" : {
"minutes": "10"
What I would like to do is group the data by player id (or name). For example:
`{
"playerlogs" : [
{
"player" : {
"ID" : "4419"
"team" : {
"id" : "16"
},
"gamelogs" : [
{
"game" : {}
"game" : {}
}
}
"player" : {
....
}
}`
The best way that I can think to accomplish this is nested for loops using dict.items() and if statements to match the appropriate player ID's. I am having trouble with the most efficient way to go about restructuring. I am fairly new to python and any help is greatly appreciated.
Instead of hardcoding your transformation logic into python code I suggest to check MongoDB. It is a JSON based document database and you can create such queries with it.
Here is a very simple example:
https://docs.mongodb.com/manual/aggregation/
Your data is more complex, but player is similar to cust_id and game to amount in the example.
Here is how I was able solve my initial question and reformat the original data:
for i in range(0,len(dataedit)):
playerID = dataedit[i]["player"]["ID"]
if not any(p.get('player',{}).get('ID',{}) == playerID for p in playerlog):
playerlog.append({})
playerlog[x]["player"] = dataedit[i]["player"]
x+=1
gameID = dataedit[i]["game"]["id"]
playerlog[x-1]["player"]["game" + gameID] = dataedit[i]["game"]
playerlog[x-1]["player"]["game" + gameID]["stats"] = dataedit[i]["stats"]
playerlog[x-1]["player"]["game" + gameID]["team"] = dataedit[i]["team"]
As I am still learning, I would love to get feedback/comments on how to improve.

How do I delete values from this document in MongoDB using Python

I am having a document which is structured like this
{
"_id" : ObjectId("564c0cb748f9fa2c8cdeb20f"),
"username" : "blah",
"useremail" : "blah#blahblah.com",
"groupTypeCustomer" : true,
"addedpartners" : [
"562f1a629410d3271ba74f74",
"562f1a6f9410d3271ba74f83"
],
"groupName" : "Mojito",
"groupTypeSupplier" : false,
"groupDescription" : "A group for fashion designers"
}
Now I want to delete one of the values from this 'addedpartners' array and update the document.
I want to just delete 562f1a6f9410d3271ba74f83 from the addedpartners array
This is what I had tried earlier.
db.myCollection.update({'_id':'564c0cb748f9fa2c8cdeb20f'},{'$pull':{'addedpartners':'562f1a6f9410d3271ba74f83'}})
db.myCollection.update(
{ _id: ObjectId(id) },
{ $pull: { 'addedpartners': '562f1a629410d3271ba74f74' } }
);
Try with this
db.myCollection.update({}, {$unset : {"addedpartners.1" : 1 }})
db.myCollection.update({}, {$pull : {"addedpartners" : null}})
No way to delete array directly, i think this is going to work, i haven't tried yet.

MongoDB find in array of objects

I want to query Mongodb: find all users, that have 'artist'=Iowa in any array item of objects.
Here is Robomongo of my collection:
In Python I'm doing:
Vkuser._get_collection().find({
'member_of_group': 20548570,
'my_music': {
'items': {
'$elemMatch': {
'artist': 'Iowa'
}
}
}
})
but this returns nothing. Also tried this:
{'member_of_group': 20548570, 'my_music': {'$elemMatch': {'$.artist': 'Iowa'}}} and that didn't work.
Here is part of document with array:
"can_see_audio" : 1,
"my_music" : {
"items" : [
{
"name" : "Anastasia Plotnikova",
"photo" : "http://cs620227.vk.me/v620227451/9c47/w_okXehPbYc.jpg",
"id" : "864451",
"name_gen" : "Anastasia"
},
{
"title" : "Ain't Talkin' 'Bout Dub",
"url" : "http://cs4964.vk.me/u14671028/audios/c5b8a0735224.mp3?extra=jgV4ZQrFrsfxZCJf4gsRgnKWvdAfIqjE0M6eMtxGFpj2yp4vjs5DYgAGImPMp4mCUSUGJzoyGeh2Es6L-H51TPa3Q_Q",
"lyrics_id" : 24313846,
"artist" : "Apollo 440",
"genre_id" : 18,
"id" : 344280392,
"owner_id" : 864451,
"duration" : 279
},
{
"title" : "Animals",
"url" : "http://cs1316.vk.me/u4198685/audios/4b9e4536e1be.mp3?extra=TScqXzQ_qaEFKHG8trrwbFyNvjvJKEOLnwOWHJZl_cW5EA6K3a9vimaMpx-Yk5_k41vRPywzuThN_IHT8mbKlPcSigw",
"lyrics_id" : 166037,
"artist" : "Nickelback",
"id" : 344280351,
"owner_id" : 864451,
"duration" : 186
},
The following query should work. You can use the dot notation to query into sub documents and arrays.
Vkuser._get_collection().find({
'member_of_group': 20548570,
'my_music.items.artist':'Iowa'
})
The following query worked for me in the mongo shell
db.collection1.find({ "my_music.items.artist" : "Iowa" })

Parsing json in python

I am having trouble parsing the following json file. I am trying to parse this using logstash/python.
{
"took" : 153,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 946,
"max_score" : 1.0,
"hits" : [ {
"_index" : "incoming_bytes",
"_type" : "logs",
"_id" : "lZSq4mBRSVSxO0kyTwh3fQ",
"_score" : 1.0, "_source" : {"user_id":"86c8c25d81a448c49e3d3924ea5ceddf","name":"network.incoming.bytes","resource_id":"instance-00000001-c8be5ca1-116b-45b3-accb-1b40050abc90-tapaf1e421f-c8","timestamp":"2013-11-02T07:32:36Z","resource_metadata":{"name":"tapaf1e421f-c8","parameters":{},"fref":null,"instance_id":"c8be5ca1-116b-45b3-accb-1b40050abc90","instance_type":"5c014e54-ee16-43a8-a763-54e243bd8969","mac":"fa:16:3e:67:39:29"},"volume":557462,"source":"openstack","project_id":"9ac587404bdd4fcdafe41c0b10f9f8ae","type":"cumulative","id":"f1eb19aa-4390-11e3-8bac-000c2973cfb1","unit":"B","#timestamp":"2013-11-02T07:32:38.276Z","#version":"1","host":"127.0.0.1","tags":["_grokparsefailure"],"priority":13,"severity":5,"facility":1,"facility_label":"user-level","severity_label":"Notice","#type":"%{appdeliveryserver}"}
}, {
"_index" : "incoming_bytes",
"_type" : "logs",
"_id" : "073URWt5Sc-krLACxQnI3g",
"_score" : 1.0, "_source" : {"user_id":"86c8c25d81a448c49e3d3924ea5ceddf","name":"network.incoming.bytes","resource_id":"instance-00000001-c8be5ca1-116b-45b3-accb-1b40050abc90-tapaf1e421f-c8","timestamp":"2013-11-02T07:32:38Z","resource_metadata":{"name":"tapaf1e421f-c8","parameters":{},"fref":null,"instance_id":"c8be5ca1-116b-45b3-accb-1b40050abc90","instance_type":"5c014e54-ee16-43a8-a763-54e243bd8969","mac":"fa:16:3e:67:39:29"},"volume":562559,"source":"openstack","project_id":"9ac587404bdd4fcdafe41c0b10f9f8ae","type":"cumulative","id":"f31e38d4-4390-11e3-8bac-000c2973cfb1","unit":"B","#timestamp":"2013-11-02T07:32:39.001Z","#version":"1","host":"127.0.0.1","tags":["_grokparsefailure"],"priority":13,"severity":5,"facility":1,"facility_label":"user-level","severity_label":"Notice","#type":"%{appdeliveryserver}"}
}]
}
}
I have used the following configuration for logstash, however the configuration does not work as expected which is: to parse individual fields in the JSON document and output to STDOUT.
input {
stdin{}
file {
path => ["/home/****/Downloads/throughput"]
codec => "json"
}
}
filter{
json{
source => "message"
target => "throughput"
}
}
output {
stdout {codec => rubydebug }
}
For python I am trying to access the Individual Volume and source (IP address) fields.
I tried the following code, with goal to map individual fields for each record, and I would like to know how to proceed in order to traverse and extract individual elements in the list.
import json
from pprint import pprint
json_data=open('throughput')
data = json.load(json_data)
pprint(data["hits"])
json_data.close()
Thanks
Parsed json is a dictionary, you can use itemgetter to drill down.
For example for volume
>>> for hits in data['hits']['hits']:
... print hits['_source']['volume']
...
557462
562559
or can use map to get a list:
>>> from operator import itemgetter
>>> map(itemgetter('volume'), map(itemgetter('_source'), data['hits']['hits']))
[557462, 562559]

Categories