Python Extracting nested values from a json string using pandas

Python Extracting nested values from a json string using pandas - python

Below is a nested json I am using:
{
"9": {
"uid": "9",
"name": "pedro",
"mail": "pedro#pedro.com",
"roles": [
"authenticated",
"administrator"
],
"user_status": "1"
},
"10": {
"uid": "10",
"name": "Rosa",
"mail": "rosa#rosa.com",
"roles": [
"authenticated",
"administrator"
],
"user_status": "1"
},
"11": {
"uid": "11",
"name": "Tania",
"mail": "tania#tania.com",
"roles": [
"authenticated",
"administrator"
],
"user_status": "1"
}
}
Each first key is different from the rest. I need to extract the information between each of the keys, e.g. uid, name, mail, etc but not interested on the key id (9,10,11). Is there any way to achieve this without passing the key id on the code?
Below is what I’ve attempted thus far:
import json
outputuids = {
"9": {
"uid": "9",
"name": "pedro",
"mail": "pedro#pedro.com",
"roles": [
"authenticated",
"administrator"
],
"user_status": "1"
},
"10": {
"uid": "10",
"name": "Rosa",
"mail": "rosa#rosa.com",
"roles": [
"authenticated",
"administrator"
],
"user_status": "1"
},
"11": {
"uid": "11",
"name": "Tania",
"mail": "tania#tania.com",
"roles": [
"authenticated",
"administrator"
],
"user_status": "1"
}
}
data1 = json.loads(outputuids)
for i in data1:
fuid=data1['9']['uid']
fname=data1['9']['name']
print (fuid + fname)

Pandas is overkill for this task. You can iterate over outputuids.values() to avoid having to explicitly refer to the keys of the dictionary:
result = []
keys_to_retain = {"uid", "name", "mail"}
for val in outputuids.values():
result.append({k: v for k, v in val.items() if k in keys_to_retain})
print(result)
This outputs:
[
{'uid': '9', 'name': 'pedro', 'mail': 'pedro#pedro.com'},
{'uid': '10', 'name': 'Rosa', 'mail': 'rosa#rosa.com'},
{'uid': '11', 'name': 'Tania', 'mail': 'tania#tania.com'}
]

Related

Python equivalent of PHP http_build_query

Here is the PHP code that I want to write in Python.
<?php
$json = '{
"targeting": [
{
"country": {
"allow": [
"US",
"DE"
]
},
"region" : {
"allow" : {
"US" : [
33
],
"DE" : [
10383
]
}
},
"city": {
"allow": {
"US": [
57
],
"DE": [
3324
]
}
},
"os": {
"allow": [
{
"name": "Android",
"comparison": "GTE",
"version": "2.3.1"
},
{
"name": "Apple TV Software",
"comparison": "EQ",
"version": "4.4"
},
{
"name": "Windows",
"comparison": "EQ",
"version": "Vista"
}
]
},
"isp" : {
"allow" : {
"US" : [
"Att"
],
"DE" : [
"Telekom"
]
}
},
"ip": {
"allow": [
"11.12.13.0-17.18.19.22",
"6.0.0.0",
"10.0.0.0-10.0.0.2",
"11.0.0.0/24"
]
},
"device_type": [
"mobile"
],
"browser": {
"allow": [
"Yandex.Browser for iOS",
"SlimBrowser",
"Edge Mobile"
]
},
"brand": {
"allow": [
"Smartbook Entertainment",
"Walton",
"PIPO"
]
},
"sub": {
"allow": {
"1": [
"A",
"B"
]
},
"deny": {
"2": [
"C",
"D"
]
},
"deny_groups": [
{
"1": ""
},
{
"1": "X",
"2": "Y"
}
]
},
"connection": [
"wi-fi",
"cellular"
],
"block_proxy": true,
"affiliate_id": [
1
],
"url": "http://test-url.com"
}
]
}';
$arr = json_decode($json);
$postData = http_build_query($arr);
//POST SomeURLhere
echo urldecode($arr);
What I need is to send this json in this format
targeting[0][country][allow][]=TR
targeting[0][os][allow][][name]=iOS
targeting[1][country][allow][]=DE
targeting[1][os][allow][][name]=iOS
I guess I need to figure out how to use http_build_query in Python.

with referring this answer I found the solution.
from collections.abc import MutableMapping
from urllib.parse import urlencode, unquote
def flatten(dictionary, parent_key=False, separator='.', separator_suffix=''):
"""
Turn a nested dictionary into a flattened dictionary
:param dictionary: The dictionary to flatten
:param parent_key: The string to prepend to dictionary's keys
:param separator: The string used to separate flattened keys
:return: A flattened dictionary
"""
items = []
for key, value in dictionary.items():
new_key = str(parent_key) + separator + key + separator_suffix if parent_key else key
if isinstance(value, MutableMapping):
items.extend(flatten(value, new_key, separator, separator_suffix).items())
elif isinstance(value, list) or isinstance(value, tuple):
for k, v in enumerate(value):
items.extend(flatten({str(k): v}, new_key, separator, separator_suffix).items())
else:
items.append((new_key, value))
return dict(items)
req = {'check': 'command', 'parameters': ({'parameter': '1', 'description':
'2'}, {'parameter': '3', 'description': '4'})}
req = flatten(req, False, '[', ']')
query = urlencode(req)
query_parsed = unquote(query)
print(query)
print(query_parsed)
And the outputs:
check=command&parameters%5B0%5D%5Bparameter%5D=1&parameters%5B0%5D%5Bdescription%5D=2&parameters%5B1%5D%5Bparameter%5D=3&parameters%5B1%5D%5Bdescription%5D=4
check=command&parameters[0][parameter]=1&parameters[0][description]=2&parameters[1][parameter]=3&parameters[1][description]=4

glom assign based on data

In the following code, I am trying to mask personal information based on data. I have two scenarioes. In scenario 1, I want to update when type = 'FirstName', update or assign valueString value to "Masked". In scenario 2, I want to update when type matches the pattern "first****Name", update or assign valueString value to "Masked". I was wondering if anyone have suggestions for writing glom assign statements to solve the above cases.
Example Json String
{
"id": "985babac-9999-8888-8887",
"entity": [
{
"what": {
"reference": "4lincoln-123-11eb-bc1a-732f"
},
"detail": [
{
"type": "uuid",
"valueString": "4obama-f199-77eb-bc1a-555555704d2f"
},
{
"type": "firstName",
"valueString": "John"
},
{
"type": "userName",
"valueString": "Johns"
},
{
"type": "middleInitial",
"valueString": "S"
},
{
"type": "lastName",
"valueString": "Trump"
},
{
"type": "first-4fa999-f1999-Name",
"valueString": "John"
},
{
"type": "birth-4fa999-f1999-Date",
"valueString": "2010-01-01"
}
]
}
]
}
Updated output should look like the following
{
"id": "985babac-9999-8888-8887",
"entity": [
{
"what": {
"reference": "4lincoln-123-11eb-bc1a-732f"
},
"detail": [
{
"type": "uuid",
"valueString": "4obama-f199-77eb-bc1a-555555704d2f"
},
{
"type": "firstName",
"valueString": "Masked"
},
{
"type": "userName",
"valueString": "Johns"
},
{
"type": "middleInitial",
"valueString": "S"
},
{
"type": "lastName",
"valueString": "Trump"
},
{
"type": "first-4fa999-f1999-Name",
"valueString": "Masked"
},
{
"type": "birth-4fa999-f1999-Date",
"valueString": "2010-01-01"
}
]
}
]
}

I came up with the following solution. I was wondering if this can be done in one glom call instead of calling multiple times?
import json
import logging
import sys
import time
import re
from glom import glom, assign, Coalesce, SKIP, Spec, Path, Call, T, Iter, Inspect
LOGGING_FORMAT = '%(asctime)s - [%(filename)s:%(name)s:%(lineno)d] - %(levelname)s - %(message)s'
LOGLEVEL = logging.INFO
logging.basicConfig(level=LOGLEVEL,format=LOGGING_FORMAT)
logger = logging.getLogger(__name__)
start_time = time.time()
target = {
"id": "985babac-9999-8888-8887",
"entity": [
{
"what": {
"reference": "4lincoln-123-11eb-bc1a-732f"
},
"detail": [
{
"type": "uuid",
"valueString": "4obama-f199-77eb-bc1a-555555704d2f"
},
{
"type": "firstName",
"valueString": "John"
},
{
"type": "userName",
"valueString": "Johns"
},
{
"type": "middleInitial",
"valueString": "S"
},
{
"type": "lastName",
"valueString": "Trump"
},
{
"type": "first-4fa999-f1999-Name",
"valueString": "John"
},
{
"type": "birth-4fa999-f1999-Date",
"valueString": "2010-01-01"
}
]
}
]
}
# def myupdate(x):
# for count, item in enumerate(x):
# myspec = 'entity.0.detail.{}.valueString'.format(count)
# if item == 'firstName':
# _ = assign(target,myspec,'Masked')
piiRegex = re.compile(r'^first.*Name$|^last.*Name$|^middle.*Initial$|^birth.*Date$')
def myupdate(x):
for count, item in enumerate(x):
myspec = 'entity.0.detail.{}.valueString'.format(count)
mo = piiRegex.search(item)
if mo:
_ = assign(target,myspec,'Masked')
spec = {'result': ('entity.0.detail', ['type'], myupdate)}
xyz = glom(target, spec)
print(xyz)
print(target)
logger.info("Program completed in --- %s seconds ---" % (time.time() - start_time))
===============
Result:
{'result': None}
{'id': '985babac-9999-8888-8887', 'entity': [{'what': {'reference': '4lincoln-123-11eb-bc1a-732f'}, 'detail': [{'type': 'uuid', 'valueString': '4obama-f199-77eb-bc1a-555555704d2f'}, {'type': 'firstName', 'valueString': 'Masked'}, {'type': 'userName', 'valueString': 'Johns'}, {'type': 'middleInitial', 'valueString': 'Masked'}, {'type': 'lastName', 'valueString': 'Masked'}, {'type': 'first-4fa999-f1999-Name', 'valueString': 'Masked'}, {'type': 'birth-4fa999-f1999-Date', 'valueString': 'Masked'}]}]}

Creating custom JSON from existing JSON using Python

(Python beginner alert) I am trying to create a custom JSON from an existing JSON. The scenario is - I have a source which can send many set of fields but I want to cherry pick some of them and create a subset of that while maintaining the original JSON structure. Original Sample
{
"Response": {
"rCode": "11111",
"rDesc": "SUCCESS",
"pData": {
"code": "123-abc-456-xyz",
"sData": [
{
"receiptTime": "2014-03-02T00:00:00.000",
"sessionDate": "2014-02-28",
"dID": {
"d": {
"serialNo": "3432423423",
"dType": "11111",
"dTypeDesc": "123123sd"
},
"mode": "xyz"
},
"usage": {
"duration": "661",
"mOn": [
"2014-02-28_20:25:00",
"2014-02-28_22:58:00"
],
"mOff": [
"2014-02-28_21:36:00",
"2014-03-01_03:39:00"
]
},
"set": {
"abx": "1",
"ayx": "1",
"pal": "1"
},
"rEvents": {
"john": "doe",
"lorem": "ipsum"
}
},
{
"receiptTime": "2014-04-02T00:00:00.000",
"sessionDate": "2014-04-28",
"dID": {
"d": {
"serialNo": "123123",
"dType": "11111",
"dTypeDesc": "123123sd"
},
"mode": "xyz"
},
"usage": {
"duration": "123",
"mOn": [
"2014-04-28_20:25:00",
"2014-04-28_22:58:00"
],
"mOff": [
"2014-04-28_21:36:00",
"2014-04-01_03:39:00"
]
},
"set": {
"abx": "4",
"ayx": "3",
"pal": "1"
},
"rEvents": {
"john": "doe",
"lorem": "ipsum"
}
}
]
}
}
}
Here the sData array tag has got few tags out of which I want to keep only 24 and get rid of the rest. I know I could use element.pop() but I cannot go and delete a new incoming field every time the source publishes it. Below is the expected output -
Expected Output
{
"Response": {
"rCode": "11111",
"rDesc": "SUCCESS",
"pData": {
"code": "123-abc-456-xyz",
"sData": [
{
"receiptTime": "2014-03-02T00:00:00.000",
"sessionDate": "2014-02-28",
"usage": {
"duration": "661",
"mOn": [
"2014-02-28_20:25:00",
"2014-02-28_22:58:00"
],
"mOff": [
"2014-02-28_21:36:00",
"2014-03-01_03:39:00"
]
},
"set": {
"abx": "1",
"ayx": "1",
"pal": "1"
}
},
{
"receiptTime": "2014-04-02T00:00:00.000",
"sessionDate": "2014-04-28",
"usage": {
"duration": "123",
"mOn": [
"2014-04-28_20:25:00",
"2014-04-28_22:58:00"
],
"mOff": [
"2014-04-28_21:36:00",
"2014-04-01_03:39:00"
]
},
"set": {
"abx": "4",
"ayx": "3",
"pal": "1"
}
}
]
}
}
}
I myself took reference from How can I create a new JSON object form another using Python? but its not working as expected. Looking forward for inputs/solutions from all of you gurus. Thanks in advance.

Kind of like this:
data = json.load(open("fullset.json"))
def subset(d):
newd = {}
for name in ('receiptTime','sessionData','usage','set'):
newd[name] = d[name]
return newd
data['Response']['pData']['sData'] = [subset(d) for d in data['Response']['pData']['sData']]
json.dump(data, open('newdata.json','w'))

How to get this json specific word from this array

I have this json and I would like to get only the Name from every array. How do I write it in python,
Currently, I have this li = [item.get(data_new[0]'id') for item in data_new]
where data_new is my json data.
[
{
"id": "1687fbfa-8936-4b77-a7bc-123f9f276c49",
"attributes": [
{
"name": "status",
"value": "rejected",
"scope": "identity"
},
{
"name": "created_ts",
"value": "2020-06-25T16:22:07.578Z",
"scope": "system"
},
{
"name": "updated_ts",
"value": "2020-07-08T12:43:09.361Z",
"scope": "system"
},
{
"name": "artifact_name",
"value": "release-v10",
"scope": "inventory"
},
{
"name": "device_type",
"value": "proddemo-device",
"scope": "inventory"
},
],
"updated_ts": "2020-07-08T12:43:09.361Z"
},
{
"id": "0bf2a1fe-6004-473f-88b7-aab061972115",
"attributes": [
{
"name": "status",
"value": "rejected",
"scope": "identity"
},
{
"name": "created_ts",
"value": "2020-07-01T16:23:00.631Z",
"scope": "system"
},
{
"name": "updated_ts",
"value": "2020-07-08T17:41:16.45Z",
"scope": "system"
},
{
"name": "artifact_name",
"value": "Module_logs_v7",
"scope": "inventory"
},
{
"name": "cpu_model",
"value": "ARMv8 Processor",
"scope": "inventory"
},
{
"name": "device_type",
"value": "device",
"scope": "inventory"
},
{
"name": "hostname",
"value": "device004",
"scope": "inventory"
},
{
"name": "ipv4_br-d6eae8b3a339",
"value": "172.0.0.1/18",
"scope": "inventory"
}
],
"updated_ts": "2020-07-08T12:43:09.361Z"
}
]
This is the output snippet from my API and from this output I want to retrieve the value of the device whose name is hostname, as you can see that is the second last entry from this code where "name": "hostname"
So, I want to retrieve the value for that particular json only where the name will be "hostname", how can I do that.
Please guide me through.

a = [{'id': '291ae0e5956c69c2267489213df4459d19ed48a806603def19d417d004a4b67e',
'attributes': [{'name': 'ip_addr',
'value': '1.2.3.4',
'descriptionName': 'IP address'},
{'name': 'ports', 'value': ['8080', '8081'], 'description': 'Open ports'}],
'updated_ts': '2016-10-03T16:58:51.639Z'},
{'id': '76f40e5956c699e327489213df4459d1923e1a806603def19d417d004a4a3ef',
'attributes': [{'name': 'mac',
'value': '00:01:02:03:04:05',
'descriptionName': 'MAC address'}],
'updated_ts': '2016-10-04T18:24:21.432Z'}]
descriptionName = []
for i in a:
for j in i["attributes"]:
for k in j:
if k == "descriptionName":
descriptionName.append(j[k])
One liner:
[j["descriptionName"] for j in i["attributes"] for i in a if "descriptionName" in j ]
Output:
['IP address', 'MAC address']
Update 1:
To get all names
One liner code -
[j["name"] for j in i["attributes"] for i in a if "name" in j.keys()]
Output:
['status',
'status',
'created_ts',
'created_ts',
'updated_ts',
'updated_ts',
'artifact_name',
'artifact_name',
'cpu_model',
'cpu_model',
'device_type',
'device_type',
'hostname',
'hostname',
'ipv4_br-d6eae8b3a339',
'ipv4_br-d6eae8b3a339']
To get value for which name is "hostname"
[j["value"] for j in i["attributes"] for i in a if "name" in j.keys() and j["name"] == "hostname"]
Output:
['device004', 'device004']

Is there more effective way to get result (O(n+m) rather than O(n*m))?

Origin data as below show, every item has a type mark, such as interests, family, behaviors, etc and I want to group by this type field.
return_data = [
{
"id": "112",
"name": "name_112",
"type": "interests",
},
{
"id": "113",
"name": "name_113",
"type": "interests",
},
{
"id": "114",
"name": "name_114",
"type": "interests",
},
{
"id": "115",
"name": "name_115",
"type": "behaviors",
},
{
"id": "116",
"name": "name_116",
"type": "family",
},
{
"id": "117",
"name": "name_117",
"type": "interests",
},
...
]
And expected ouput data format like:
output_data = [
{"interests":[
{
"id": "112",
"name": "name_112"
},
{
"id": "113",
"name": "name_113"
},
...
]
},
{
"behaviors": [
{
"id": "115",
"name": "name_115"
},
...
]
},
{
"family": [
{
"id": "116",
"name": "name_116"
},
...
]
},
...
]
And here is my trial:
type_list = []
for item in return_data:
if item['type'] not in type_list:
type_list.append(item['type'])
interests_list = []
for type in type_list:
temp_list = []
for item in return_data:
if item['type'] == type:
temp_list.append({"id": item['id'], "name": item['name']})
interests_list.append({type: temp_list})
Obviously my trial is low efficient as it is O(n*m), but I cannot find the more effective way to solve the problem.
Is there more effective way to get the result? any commentary is great welcome, thanks.

Use a defaultdict to store a list of items for each type:
from collections import defaultdict
# group by type
temp_dict = defaultdict(list)
for item in return_data:
temp_dict[item["type"]].append({"id": item["id"], "name": item["name"]})
# convert back into a list with the desired format
output_data = [{k: v} for k, v in temp_dict.items()]
Output:
[
{
'behaviors': [
{'name': 'name_115', 'id': '115'}
]
},
{
'family': [
{'name': 'name_116', 'id': '116'}
]
},
{
'interests': [
{'name': 'name_112', 'id': '112'},
{'name': 'name_113', 'id': '113'},
{'name': 'name_114', 'id': '114'},
{'name': 'name_117', 'id': '117'}
]
},
...
]
If you don't want to import defaultdict, you could use a vanilla dictionary with setdefault:
# temp_dict = {}
temp_dict.setdefault(item["type"], []).append(...)
Behaves in exactly the same way, if a little less efficient.

please see Python dictionary for map.
for item in return_data:
typeMap[item['type']] = typeMap[item['type']] + delimiter + item['name']

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python Extracting nested values from a json string using pandas - python

Related

Python equivalent of PHP http_build_query

glom assign based on data

Creating custom JSON from existing JSON using Python

How to get this json specific word from this array

Is there more effective way to get result (O(n+m) rather than O(n*m))?

Categories

Resources