Select specific keys inside a json using python - python

I have the following json that I extracted using request with python and json.loads. The whole json basically repeats itself with changes in the ID and names. It has a lot of information but I`m just posting a small sample as an example:
"status":"OK",
"statuscode":200,
"message":"success",
"apps":[
{
"id":"675832210",
"title":"AGED",
"desc":"No annoying ads&easy to play",
"urlImg":"https://test.com/pImg.aspx?b=675832&z=1041813&c=495181&tid=API_MP&u=https%3a%2f%2fcdna.test.com%2fbanner%2fwMMUapCtmeXTIxw_square.png&q=",
"urlImgWide":"https://cdna.test.com/banner/sI9MfGhqXKxVHGw_rectangular.jpeg",
"urlApp":"https://admin.test.com/appLink.aspx?b=675832&e=1041813&tid=API_MP&sid=2c5cee038cd9449da35bc7b0f53cf60f&q=",
"androidPackage":"com.agedstudio.freecell",
"revenueType":"cpi",
"revenueRate":"0.10",
"categories":"Card",
"idx":"2",
"country":[
"CH"
],
"cityInclude":[
"ALL"
],
"cityExclude":[
],
"targetedOSver":"ALL",
"targetedDevices":"ALL",
"bannerId":"675832210",
"campaignId":"495181210",
"campaignType":"network",
"supportedVersion":"",
"storeRating":"4.3",
"storeDownloads":"10000+",
"appSize":"34603008",
"urlVideo":"",
"urlVideoHigh":"",
"urlVideo30Sec":"https://cdn.test.com/banner/video/video-675832-30.mp4?rnd=1620699136",
"urlVideo30SecHigh":"https://cdn.test.com/banner/video/video-675832-30_o.mp4?rnd=1620699131",
"offerId":"5825774"
},
I dont need all that data, just a few like 'title', 'country', 'revenuerate' and 'urlApp' but I dont know if there is a way to extract only that.
My solution so far was to make the json a dataframe and then drop the columns, however, I wanted to find an easier solution.
My ideal final result would be to have a dataframe with selected keys and arrays
Does anybody know an easy solution for this problem?
Thanks

I assume you have that data as a dictionary, let's call it json_data. You can just iterate over the apps and write them into a list. Alternatively, you could obviously also define a class and initialize objects of that class.
EDIT:
I just found this answer: https://stackoverflow.com/a/20638258/6180150, which tells how you can convert a list of dicts like from my sample code into a dataframe. See below adaptions to the code for a solution.
json_data = {
"status": "OK",
"statuscode": 200,
"message": "success",
"apps": [
{
"id": "675832210",
"title": "AGED",
"desc": "No annoying ads&easy to play",
"urlImg": "https://test.com/pImg.aspx?b=675832&z=1041813&c=495181&tid=API_MP&u=https%3a%2f%2fcdna.test.com%2fbanner%2fwMMUapCtmeXTIxw_square.png&q=",
"urlImgWide": "https://cdna.test.com/banner/sI9MfGhqXKxVHGw_rectangular.jpeg",
"urlApp": "https://admin.test.com/appLink.aspx?b=675832&e=1041813&tid=API_MP&sid=2c5cee038cd9449da35bc7b0f53cf60f&q=",
"androidPackage": "com.agedstudio.freecell",
"revenueType": "cpi",
"revenueRate": "0.10",
"categories": "Card",
"idx": "2",
"country": [
"CH"
],
"cityInclude": [
"ALL"
],
"cityExclude": [
],
"targetedOSver": "ALL",
"targetedDevices": "ALL",
"bannerId": "675832210",
"campaignId": "495181210",
"campaignType": "network",
"supportedVersion": "",
"storeRating": "4.3",
"storeDownloads": "10000+",
"appSize": "34603008",
"urlVideo": "",
"urlVideoHigh": "",
"urlVideo30Sec": "https://cdn.test.com/banner/video/video-675832-30.mp4?rnd=1620699136",
"urlVideo30SecHigh": "https://cdn.test.com/banner/video/video-675832-30_o.mp4?rnd=1620699131",
"offerId": "5825774"
},
]
}
filtered_data = []
for app in json_data["apps"]:
app_data = {
"id": app["id"],
"title": app["title"],
"country": app["country"],
"revenueRate": app["revenueRate"],
"urlApp": app["urlApp"],
}
filtered_data.append(app_data)
print(filtered_data)
# Output
d = [
{
'id': '675832210',
'title': 'AGED',
'country': ['CH'],
'revenueRate': '0.10',
'urlApp': 'https://admin.test.com/appLink.aspx?b=675832&e=1041813&tid=API_MP&sid=2c5cee038cd9449da35bc7b0f53cf60f&q='
}
]
d = pd.DataFrame(filtered_data)
print(d)
# Output
id title country revenueRate urlApp
0 675832210 AGED [CH] 0.10 https://admin.test.com/appLink.aspx?b=675832&e=1041813&tid=API_MP&sid=2c5cee038cd9449da35bc7b0f53cf60f&q=

if your endgame is dataframe, just load the dataframe and take the columns you want:
setting the json to data
df = pd.json_normalize(data['apps'])
yields
id title desc urlImg ... urlVideoHigh urlVideo30Sec urlVideo30SecHigh offerId
0 675832210 AGED No annoying ads&easy to play https://test.com/pImg.aspx?b=675832&z=1041813&... ... https://cdn.test.com/banner/video/video-675832... https://cdn.test.com/banner/video/video-675832... 5825774
[1 rows x 28 columns]
then if you want certain columns:
df_final = df[['title', 'desc', 'urlImg']]
title desc urlImg
0 AGED No annoying ads&easy to play https://test.com/pImg.aspx?b=675832&z=1041813&...

use a dictionary comprehension to extract a dictionary of key/value pairs you want
import json
json_string="""{
"id":"675832210",
"title":"AGED",
"desc":"No annoying ads&easy to play",
"urlApp":"https://admin.test.com/appLink.aspx?b=675832&e=1041813&tid=API_MP&sid=2c5cee038cd9449da35bc7b0f53cf60f&q=",
"revenueRate":"0.10",
"categories":"Card",
"idx":"2",
"country":[
"CH"
],
"cityInclude":[
"ALL"
],
"cityExclude":[
]
}"""
json_dict = json.loads(json_string)
filter_fields=['title','country','revenueRate','urlApp']
dict_result = { key: json_dict[key] for key in json_dict if key in filter_fields}
json_elements = []
for key in dict_result:
json_elements.append((key,json_dict[key]))
print(json_elements)
output:
[('title', 'AGED'), ('urlApp', 'https://admin.test.com/appLink.aspx?b=675832&e=1041813&tid=API_MP&sid=2c5cee038cd9449da35bc7b0f53cf60f&q='), ('revenueRate', '0.10'), ('country', ['CH'])]

Related

Improperly formatted json? [duplicate]

This question already has answers here:
Python list of dictionaries search
(24 answers)
Closed last month.
First, I am new to Python and working with JSON.
I am trying to extract just one value from an API request response, and I am having a difficult time parsing out the data I need.
I have done a lot of searching on how to do this, but most all the examples use a string or file that is formatted is much more basic than what I am getting.
I understand the key - value pair concept but I am unsure how to reference the key-value I want. I think it has something to do with the response having multiple objects having the same kay names. Or maybe the first line "Bookmark" is making things goofy.
The value I want is for the model name in the response example below.
That's all I need from this. Any help would be greatly appreciated.
{
"Bookmark": "<B><P><p>SerNum</p><p>Item</p></P><D><f>false</f><f>false</f></D><F><v>1101666</v><v>ADDMASTER IJ7102-23E</v></F><L><v>123456</v><v>Model Name</v></L></B>",
"Items": [
[
{
"Name": "SerNum",
"Value": "123456"
},
{
"Name": "Item",
"Value": "Model Name"
},
{
"Name": "_ItemId",
"Value": "PBT=[unit] unt.DT=[2021-07-28 08:20:33.513] unt.ID=[eae2621d-3e9f-4515-9763-55e67f65fae6]"
}
]
],
"Message": "Success",
"MessageCode": 0
}
If you want to find value of dictionary with key 'Name' and value 'Item' you can do:
import json
with open('your_data.json', 'r') as f_in:
data = json.load(f_in)
model_name = next((i['Value'] for lst in data['Items'] for i in lst if i['Name'] == 'Item'), 'Model name not found.')
print(model_name)
Prints:
Model Name
Note: if the dictionary is not found string 'Model name not found.' is returned
First, load the JSON into a python dict:
import json
x = '''{
"Bookmark": "<B><P><p>SerNum</p><p>Item</p></P><D><f>false</f><f>false</f></D><F><v>1101666</v><v>ADDMASTER IJ7102-23E</v></F><L><v>123456</v><v>Model Name</v></L></B>",
"Items": [
[
{
"Name": "SerNum",
"Value": "123456"
},
{
"Name": "Item",
"Value": "Model Name"
},
{
"Name": "_ItemId",
"Value": "PBT=[unit] unt.DT=[2021-07-28 08:20:33.513] unt.ID=[eae2621d-3e9f-4515-9763-55e67f65fae6]"
}
]
],
"Message": "Success",
"MessageCode": 0
}'''
# parse x:
y = json.loads(x)
# The result is a Python dictionary.
Now if you want the value 'Model Name', you would do:
print(y['Items'][0][1]['Value'])

Parse the inside json data of all variables

Im working on big json data . In every data set ther will be Name,Type,Value. Im actually need to parse the all type which as equal to "Shirt". I don't know to parse the inside json data.
x={
"Data": "Ecommerce",
"Host":{
"Items": [
[
{
"Name": "cot",
"Value": 99,
"Type": "Shirt"
},
{
"Name": "ploy",
"Value": 90,
"Type": "Pant"
},
{
"Name": "lyc",
"Value": 22,
"Type":"Shirt"
}
]
],
}}
k=x.get("Host")
print(k)
The above code will Display all data inside the Host.
What I'm trying to get output as parse only the Type: Shirt and Value of the Shirt .
I tried with some Def ,loop concepts but i can't able to achieve my output.
Output I'm looking for :-
If Type = Shirt , need parse that json data
Cot:90
Lyc:22
In dict format
This shows how to access the data. I assume you can change this to create a dictionary.
for item in x['Host']['Items'][0]:
if item['Type'] == 'Shirt':
print(item['Name'],':',item['Value'])
This is an example to do it with short for
items = x["Host"]["Items"]
shirt_values = {item["Name"]: item["Value"] for sublist in items for item in sublist if item["Type"] == "Shirt"}

How to get the values of dictionary python?

I have the below python dictionary stored as dictPython
{
"paging": {"count": 10, "start": 0, "links": []},
"elements": [
{
"organizationalTarget~": {
"vanityName": "vv",
"localizedName": "ViV",
"name": {
"localized": {"en_US": "ViV"},
"preferredLocale": {"country": "US", "language": "en"},
},
"primaryOrganizationType": "NONE",
"locations": [],
"id": 109,
},
"role": "ADMINISTRATOR",
},
],
}
I need to get the values of vanityName, localizedName and also the values from name->localized and name->preferredLocale.
I tried dictPython.keys() and it returned dict_keys(['paging', 'elements']).
Also I tried dictPython.values() and it returned me what is inside of the parenthesis({}).
I need to get [vv, ViV, ViV, US, en]
I am writing this in a form of answer, so I can get to explain it better without the comments characters limit
a dict in python is an efficient key/value structure or data type
for example dict_ = {'key1': 'val1', 'key2': 'val2'} to fetch key1 we can do it in 2 different ways
dict_.get(key1) this returns the value of the key in this case val1, this method has its advantage, that if the key1 is wrong or not found it returns None so no exceptions are raised. You can do dict_.get(key1, 'returning this string if the key is not found')
dict_['key1'] doing the same .get(...) but will raise a KeyError if the key is not found
So to answer your question after this introduction,
a dict can be thought of as nested dictionaries and/or objects inside of one another
to get your values you can do the following
# Fetch base dictionary to make code more readable
base_dict = dict_["elements"][0]["organizationalTarget~"]
# fetch name_dict following the same approach as above code
name_dict = base_dict["name"]
localized_dict = name_dict["localized"]
preferred_locale_dict = name_dict ["preferredLocale"]
so now we fetch all of the wanted data in their corresponding locations from your given dictionary, now to print the results, we can do the following
results_arr = []
for key1, key2 in zip(localized_dict, preferredLocale_dict):
results_arr.append(localized_dict.get(key1))
results_arr.append(preferred_locale_dict.get(key2))
print(results_arr)
What about:
dic = {
"paging": {"count": 10, "start": 0, "links": []},
"elements": [
{
"organizationalTarget~": {
"vanityName": "vv",
"localizedName": "ViV",
"name": {
"localized": {"en_US": "ViV"},
"preferredLocale": {"country": "US", "language": "en"},
},
"primaryOrganizationType": "NONE",
"locations": [],
"id": 109,
},
"role": "ADMINISTRATOR",
},
],
}
base = dic["elements"][0]["organizationalTarget~"]
c = base["name"]["localized"]
d = base["name"]["preferredLocale"]
output = [base["vanityName"], base["localizedName"]]
output.extend([c[key] for key in c])
output.extend([d[key] for key in d])
print(output)
outputs:
['vv', 'ViV', 'ViV', 'US', 'en']
So something like this?
[[x['organizationalTarget~']['vanityName'],
x['organizationalTarget~']['localizedName'],
x['organizationalTarget~']['name']['localized']['en_US'],
x['organizationalTarget~']['name']['preferredLocale']['country'],
x['organizationalTarget~']['name']['preferredLocale']['language'],
] for x in s['elements']]

How to create a tree using BFS in python?

So I have a flattened tree in JSON like this, as array of objects:
[{
aid: "id3",
data: ["id1", "id2"]
},
{
aid: "id1",
data: ["id3", "id2"]
},
{
aid: "id2",
nested_data: {aid: "id4", atype: "nested", data: ["id1", "id3"]},
data: []
}]
I want to gather that tree and resolve ids into data with recursion loops into something like this (say we start from "id3"):
{
"aid":"id3",
"payload":"1",
"data":[
{
"id1":{
"aid":"id1",
"data":[
{
"id3":null
},
{
"id2":null
}
]
}
},
{
"id2":{
"aid":"id2",
"nested_data":{
"aid":"id4",
"atype":"nested",
"data":[
{
"id1":null
},
{
"id3":null
}
]
},
"data":[
]
}
}
]
}
So that we would get breadth-first search and resolve some field into "value": "object with that field" on first entrance and "value": Null
How to do such a thing in python 3?
Apart from all the problems that your structure has in terms of syntax (identifiers must be within quotes, etc.), the code below will provide you with the requested answer.
But you should carefully think about what you are doing, and have the following into account:
Using the relations expressed in the flat structure that you provide will mean that you will have an endless recursion since you have items that include other items that in turn include the first ones (like id3 including id1, which in turn include id3. So, you have to define stop criteria, or be sure that this does not occur in your flat structure.
Your initial flat structure is better to be in the form of a dictionary, instead of a list of pairs {id, data}. That is why the first thing the code below does is to transform this.
Your final, desired structure contains a lot of redundancies in terms of information contained. Consider simplifying it.
Finally, you mentioned nothing about the "nested_data" nodes, and how they should be treated. I simply assumed that in case that exist, further expansion is required.
Please, consider trying to provide a bit of context in your questions, some real data examples (I believe the data provided is not real data, therefore the inconsistencies and redundancies), and try yourself and provide your efforts; that's the only way to learn.
from pprint import pprint
def reformat_flat_info(flat):
reformatted = {}
for o in flat:
key = o["aid"]
del o["aid"]
reformatted[key] = o
return reformatted
def expand_data(aid, flat, lvl=0):
obj = flat[aid]
if obj is None: return {aid: obj}
obj.update({"aid": aid})
if lvl > 1:
return {aid: None}
for nid,id in enumerate(obj["data"]):
obj["data"][nid] = expand_data(id, flat, lvl=lvl+1)
if "nested_data" in obj:
for nid,id in enumerate(obj["nested_data"]["data"]):
obj["nested_data"]["data"][nid] = expand_data(id, flat, lvl=lvl+1)
return {aid: obj}
# Provide the flat information structure
flat_info = [
{
"aid": "id3",
"data": ["id1", "id2"]
}, {
"aid": "id1",
"data": ["id3", "id2"]
}, {
"aid": "id2",
"nested_data": {"aid": "id4", "atype": "nested", "data": ["id1", "id3"]},
"data": []
}
]
pprint(flat_info)
print('-'*80)
# Reformat the flat information structure
new_flat_info = reformat_flat_info(flat=flat_info)
pprint(new_flat_info)
print('-'*80)
# Generate the result
starting_id = "id3"
result = expand_data(aid=starting_id, flat=new_flat_info)
pprint(result)

python trasform data from csv to array of dictionaries and group by field value

I have csv like this:
id,company_name,country,country_id
1,batstop,usa, xx
2,biorice,italy, yy
1,batstop,italy, yy
3,legstart,canada, zz
I want an array of dictionaries to import to firebase. I need to group the different country informations for the same company in a nested list of dictionaries. This is the desired output:
[ {'id':'1', 'agency_name':'batstop', countries [{'country':'usa','country_id':'xx'}, {'country':'italy','country_id':'yy'}]} ,
{'id':'2', 'agency_name':'biorice', countries [{'country':'italy','country_id':'yy'}]},
{'id':'3', 'legstart':'legstart', countries [{'country':'canada','country_id':'zz'}]} ]
Recently I had a similar task, the groupby function from itertools and the itemgetter function from operator - both standard python libraries - helped me a lot. Here's the code considering your csv, note how defining the primary keys of your csv dataset is important.
import csv
import json
from operator import itemgetter
from itertools import groupby
primary_keys = ['id', 'company_name']
# Start extraction
with open('input.csv', 'r') as file:
# Read data from csv
reader = csv.DictReader(file)
# Sort data accordingly to primary keys
reader = sorted(reader, key=itemgetter(*primary_keys))
# Create a list of tuples
# Each tuple containing a dict of the group primary keys and its values, and a list of the group ordered dicts
groups = [(dict(zip(primary_keys, _[0])), list(_[1])) for _ in groupby(reader, key=itemgetter(*primary_keys))]
# Create formatted dict to be converted into firebase objects
group_dicts = []
for group in groups:
group_dict = {
"id": group[0]['id'],
"agency_name": group[0]['company_name'],
"countries": [
dict(country=_['country'], country_id=_['country_id']) for _ in group[1]
],
}
group_dicts.append(group_dict)
print("\n".join([json.dumps(_, indent=2) for _ in group_dicts]))
Here's the output:
{
"id": "1",
"agency_name": "batstop",
"countries": [
{
"country": "usa",
"country_id": " xx"
},
{
"country": "italy",
"country_id": " yy"
}
]
}
{
"id": "2",
"agency_name": "biorice",
"countries": [
{
"country": "italy",
"country_id": " yy"
}
]
}
{
"id": "3",
"agency_name": "legstart",
"countries": [
{
"country": "canada",
"country_id": " zz"
}
]
}
There's no external library,
Hope it suits you well!
You can try this, you may have to change a few parts to get it working with your csv, but hope it's enough to get you started:
csv = [
"1,batstop,usa, xx",
"2,biorice,italy, yy",
"1,batstop,italy, yy",
"3,legstart,canada, zz"
]
output = {} # dictionary useful to avoid searching in list for existing ids
# Parse each row
for line in csv:
cols = line.split(',')
id = int(cols[0])
agency_name = cols[1]
country = cols[2]
country_id = cols[3]
if id in output:
output[id]['countries'].append([{'country': country,
'country_id': country_id}])
else:
output[id] = {'id': id,
'agency_name': agency_name,
'countries': [{'country': country,
'country_id': country_id}]
}
# Put into list
json_output = []
for key in output.keys():
json_output.append( output[key] )
# Check output
for row in json_output:
print(row)

Categories