Creating new columns from a column holding a list of JSON objects - python

I have the following data, with a column where each value is essentially a list of JSON objects:
[{
"$id": "2",
"Name": "IDMAWH-L853902$",
"Mail": "ENTA.AD.COM",
"Type": "account"
}, {
"$id": "3",
"Address": "::ffff:10.192.20.125",
"Type": "ad"
}]
[{
"$id": "2",
"MailboxPrimaryAddress": "bw165#gmail.com",
"Upn": "bw165#gmail.com",
"AadId": "936b3d90-b86c-4dba-812d-8f6025c1e379",
"RiskLevel": "None",
"Type": "mailbox",
"Urn": "urn:UserEntity:1ccd1b06205baec05fe19e55e0d602c6",
"Source": "OATP",
"FirstSeen": "0001-01-01T00:00:00"
}, {
"$id": "3",
"Recipient": "bw165#gmail.com",
"Urls": ["https://towardsdatascience.com/cleaning-and-extracting-json-from-pandas-dataframes-f0c15f93cb38"]
}]
I want to get the "account", "Urls" and "IP" values as individual columns. Any help is highly appreciated.

Loop through the list of dictionaries. If an item contains one of the columns you want, copy that value to the corresponding column of the result.
from collections import defaultdict

# data is the list of dictionaries from one row
result = defaultdict(dict)
for item in data:
    item_id = item['$id']
    for col in ('account', 'Urls', 'IP'):
        if col in item:
            result[item_id][col] = item[col]
result = list(result.values())
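Note that in the sample data only "Urls" appears as a literal key. Below is a minimal sketch of one way to fill all three columns, assuming "account" means the "Name" of the entity whose "Type" is "account" and "IP" means the "Address" key. Both mappings are guesses about the intent, and the helper name extract_columns and the shortened sample rows (including the example.com URL) are made up for illustration.

```python
def extract_columns(entities):
    """Collapse one row's list of entity dicts into a flat dict."""
    row = {"account": None, "Urls": None, "IP": None}
    for entity in entities:
        # Assumption: the account name is the "Name" of the entity typed "account".
        if entity.get("Type") == "account":
            row["account"] = entity.get("Name")
        # Assumption: "IP" refers to the "Address" key.
        if "Address" in entity:
            row["IP"] = entity["Address"]
        if "Urls" in entity:
            row["Urls"] = entity["Urls"]
    return row


# Shortened, hypothetical versions of the two rows from the question.
rows = [
    [
        {"$id": "2", "Name": "IDMAWH-L853902$", "Mail": "ENTA.AD.COM", "Type": "account"},
        {"$id": "3", "Address": "::ffff:10.192.20.125", "Type": "ad"},
    ],
    [
        {"$id": "2", "Upn": "bw165#gmail.com", "Type": "mailbox"},
        {"$id": "3", "Recipient": "bw165#gmail.com", "Urls": ["https://example.com/page"]},
    ],
]

flat = [extract_columns(entities) for entities in rows]
```

Each entry of flat is a dict with the same three keys, so the list can be passed to pandas.DataFrame to get one column per key. Missing values stay None.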

Related

Substring string column Pandas Python

I have a pandas dataframe with two columns : ticket number and history.
History is a string with the following structure. I need to create a third column that contains the name of the author who changed the status from New to Open. Is it possible?
[
{
"id": "1",
"author": {
"name": "user1",
"emailAddress": "user1#test.com",
"displayName": "user1"
},
"created": "2021-06-09T12:54:22.915+0000",
"items": [
{
"field": "name",
"from": "1",
"fromString": null,
"to": "2",
"toString": "test"
}
]
},
{
"id": "2",
"author": {
"name": "user2",
"emailAddress": "user2#test.com",
"displayName": "user2"
},
"created": "2021-06-11T09:33:18.692+0000",
"items": [
{
"field": "status",
"from": 3,
"fromString": "New",
"to": "7",
"toString": "Open"
}
]
}]
If your dataframe is named df, the history column (column 2) is named history and the items in the history column actually are json strings with a structure like the one you've provided, you could do the following:
import json

def extract_author(json_string):
    # Parse the history string into a list of change records
    records = json.loads(json_string)
    for record in records:
        # Only the first entry of each record's items list is inspected
        items = record['items'][0]
        if (items['field'] == 'status'
                and items['fromString'] == 'New'
                and items['toString'] == 'Open'):
            return record['author']['name']
    return None

df['author'] = df['history'].map(extract_author)
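If some rows hold empty or malformed history strings, or a record bundles several changes in its items list, a more defensive variant may help. The following is only a sketch: the function name find_status_author and its from_status/to_status parameters are hypothetical, and the sample string is a trimmed copy of the structure from the question.

```python
import json


def find_status_author(history_json, from_status="New", to_status="Open"):
    """Return the author name of the first matching status change, else None."""
    try:
        records = json.loads(history_json)
    except (TypeError, ValueError):
        # TypeError: value is not a string (e.g. None or NaN);
        # ValueError: the string is not valid JSON.
        return None
    for record in records:
        # Check every item of the record, not just the first one.
        for item in record.get("items", []):
            if (item.get("field") == "status"
                    and item.get("fromString") == from_status
                    and item.get("toString") == to_status):
                return record.get("author", {}).get("name")
    return None


sample = json.dumps([
    {"id": "1", "author": {"name": "user1"},
     "items": [{"field": "name", "fromString": None, "toString": "test"}]},
    {"id": "2", "author": {"name": "user2"},
     "items": [{"field": "name", "fromString": "a", "toString": "b"},
               {"field": "status", "fromString": "New", "toString": "Open"}]},
])

author = find_status_author(sample)
```

It can be applied to the dataframe in the same way, for example with df['history'].map(find_status_author).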

Very nested JSON with optional fields into pandas dataframe

I have a JSON with the following structure. I want to extract some data to different lists so that I will be able to transform them into a pandas dataframe.
{
"ratings": {
"like": {
"average": null,
"counts": {
"1": {
"total": 0,
"users": []
}
}
}
},
"sharefile_vault_url": null,
"last_event_on": "2021-02-03 00:00:01",
],
"fields": [
{
"type": "text",
"field_id": 130987800,
"label": "Name and Surname",
"values": [
{
"value": "John Smith"
}
],
{
"type": "category",
"field_id": 139057651,
"label": "Gender",
"values": [
{
"value": {
"status": "active",
"text": "Male",
"id": 1,
"color": "DCEBD8"
}
}
],
{
"type": "category",
"field_id": 151333010,
"label": "Field of Studies",
"values": [
{
"value": {
"status": "active",
"text": "Languages",
"id": 3,
"color": "DCEBD8"
}
}
],
}
}
For example, I create a list
names = []
where if "label" in the "fields" list is "Name and Surname" I append ["values"][0]["value"] so names now contains "John Smith". I do exactly the same for the "Gender" label and append the value to the list genders.
The above dictionary is contained in a list of dictionaries, so I just have to loop through the list and extract the relevant fields like this:
names = []
genders = []
for response in users:
    for item in response.json()["items"]:
        for field in item["fields"]:
            if field["label"] == "Name and Surname":
                names.append(field["values"][0]["value"])
            elif field["label"] == "Gender":
                genders.append(field["values"][0]["value"]["text"])
            else:
                pass  # Something else
where users is a list of responses from the API. Each response's JSON has an "items" key holding a list of dictionaries, and each of those has a "fields" key whose value is a list of dictionaries describing the different fields (like Name and Surname and Gender).
The problem is that the dictionary with "label: Field of Studies" is optional and is not always present in the list of fields.
How can I manage to check for its presence, and if so append its value to a list, and None otherwise?
To me it seems that the data you have is not valid JSON. However, if I were you I would try using pandas.json_normalize. It builds one column per key it finds and fills in NaN for any record that lacks that key.
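Without pandas, one way to handle the optional field is to turn each item's "fields" list into a label-to-value dictionary first and then use dict.get, which returns None for a missing label. This is a minimal sketch: the helper name field_values_by_label is made up, the items list is a cut-down stand-in for the API response, and it assumes category fields always carry their text under "value" -> "text" as in the question.

```python
def field_values_by_label(fields):
    """Map each field's label to its first value (hypothetical helper)."""
    by_label = {}
    for field in fields:
        values = field.get("values") or []
        if not values:
            continue
        value = values[0]["value"]
        # Category fields wrap the text in a dict; text fields hold a plain string.
        by_label[field["label"]] = value["text"] if isinstance(value, dict) else value
    return by_label


# Cut-down stand-in for the "items" list; the second item lacks "Field of Studies".
items = [
    {"fields": [
        {"label": "Name and Surname", "values": [{"value": "John Smith"}]},
        {"label": "Gender", "values": [{"value": {"text": "Male", "id": 1}}]},
        {"label": "Field of Studies", "values": [{"value": {"text": "Languages", "id": 3}}]},
    ]},
    {"fields": [
        {"label": "Name and Surname", "values": [{"value": "Jane Doe"}]},
        {"label": "Gender", "values": [{"value": {"text": "Female", "id": 2}}]},
    ]},
]

names, genders, studies = [], [], []
for item in items:
    by_label = field_values_by_label(item["fields"])
    names.append(by_label.get("Name and Surname"))
    genders.append(by_label.get("Gender"))
    # .get returns None when the optional label is absent.
    studies.append(by_label.get("Field of Studies"))
```

Because every list gets exactly one entry per item, the lists stay the same length and line up row by row when turned into a dataframe.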

Group and sort JSON array of dictionaries by repeatable keys in Python

I have a JSON array, a list of dictionaries, that I get from MySQL with pymysql. It looks like this:
[{
"id": "123",
"name": "test",
"group": "test_group"
},
{
"id": "123",
"name": "test",
"group": "test2_group"
},
{
"id": "456",
"name": "test2",
"group": "test_group2"
},
{
"id": "456",
"name": "test2",
"group": "test_group3"
}]
I need to group it so that each "name" has just one dict containing a list of all the groups under that name, something like this:
[{
"id": "123",
"name": "test",
"group": ["test2_group", "test_group"]
},
{
"id": "456",
"name": "test2",
"group": ["test_group2", "test_group3"]
}]
I would appreciate some help.
Thanks!
You can use itertools.groupby to group the data.
I can't guarantee that the solution below is the shortest way, but it should do the job.
from itertools import groupby

# Your input data
data = []

res = []
key_func = lambda k: k['id']
# groupby only merges consecutive items, so sort by the same key first
for k, g in groupby(sorted(data, key=key_func), key=key_func):
    obj = {'id': k, 'name': '', 'group': []}
    for group in g:
        if not obj['name']:
            obj['name'] = group['name']
        obj['group'].append(group['group'])
    res.append(obj)
print(res)
It should print the data in the required format.
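As an alternative that avoids sorting, a plain dictionary keyed by name can collect the groups in a single pass. This is a sketch rather than the answer's method: the function name group_by_name is made up, and it assumes rows with the same "name" always share the same "id".

```python
def group_by_name(rows):
    """Merge rows that share a name into one dict with a list of groups."""
    grouped = {}
    for row in rows:
        # Assumption: rows with the same "name" also share the same "id".
        entry = grouped.setdefault(
            row["name"], {"id": row["id"], "name": row["name"], "group": []}
        )
        entry["group"].append(row["group"])
    # Dicts keep insertion order, so names appear in first-seen order.
    return list(grouped.values())


data = [
    {"id": "123", "name": "test", "group": "test_group"},
    {"id": "123", "name": "test", "group": "test2_group"},
    {"id": "456", "name": "test2", "group": "test_group2"},
    {"id": "456", "name": "test2", "group": "test_group3"},
]

res = group_by_name(data)
```

The groups come out in the order they were encountered. If a sorted list is wanted, as in the expected output of the question, each "group" list can be passed through sorted() afterwards.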

Convert json data to dictionary

I am trying to convert my JSON data into a dictionary keyed by the id of each entry. For example, let's say I have the following JSON:
{
"id": "1",
"name": "John",
"surname": "Smith"
},
{
"id": "2",
"name": "Steve",
"surname": "Ger"
}
I want to construct a new dictionary which includes the id as a key and save it into a file, so I wrote the following code:
import json
import requests

json_dict = []
request = requests.get('http://example.com/...')
with open("data.json", "w") as out:
    loaded_data = json.loads(request.text)
    for list_item in loaded_data:
        json_dict.append({"id": list_item["id"], "data": list_item})
    out.write(json.dumps(json_dict))
In the file I get the following output:
[{"data": {"id":"1",
"name":"John",
"Surname":"Smith"
}
},
{"data": {"id":"2",
"name":"Steve",
"Surname":"Ger"
}
},
]
Why is the id not included in my dict before data?
I'm pretty sure you're looking at a ghost here. You probably tested wrong. It will go away when you try to create a minimal, complete, and verifiable example for us (i.e. with a fixed string as input instead of a request call, with a print instead of an out.write, etc.).
This is my test, in which I could not reproduce the problem:
import json

entries = [
    {
        "id": "1",
        "name": "John",
        "surname": "Smith"
    },
    {
        "id": "2",
        "name": "Steve",
        "surname": "Ger"
    }
]
json_dict = []
for i in entries:
    json_dict.append({"id": i["id"], "data": i})
print(json.dumps(json_dict, indent=2))
This prints as expected:
[
{
"id": "1",
"data": {
"id": "1",
"surname": "Smith",
"name": "John"
}
},
{
"id": "2",
"data": {
"id": "2",
"surname": "Ger",
"name": "Steve"
}
}
]
You could try this instead:
json_dict = []
with open('data.json', 'w') as f:
    loaded_data = json.loads(request.text)
    for list_item in loaded_data:
        json_dict.append({list_item['id']: list_item})
    # write the result, otherwise the file stays empty
    f.write(json.dumps(json_dict))
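If the goal is a single dictionary with each id as a key, rather than a list of one-entry dictionaries, a dict comprehension is one option. A minimal sketch, using a fixed string in place of the request so it can be run on its own. The names payload, by_id and serialized are made up for the example.

```python
import json

# Fixed input standing in for request.text
payload = (
    '[{"id": "1", "name": "John", "surname": "Smith"},'
    ' {"id": "2", "name": "Steve", "surname": "Ger"}]'
)

entries = json.loads(payload)

# One dictionary keyed by id; a later duplicate id would overwrite an earlier one.
by_id = {entry["id"]: entry for entry in entries}

# The same string could be written to a file instead of kept in a variable.
serialized = json.dumps(by_id, indent=2)
```

A lookup such as by_id["2"] then returns the whole record for that id.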

python dictionary/json inception

How do you pull, split, and append an array inside a dictionary inside a dictionary?
This is the data I've got:
data = {
"Event":{
"distribution":"0",
"orgc":"Oxygen",
"Attribute": [{
"type":"ip-dst",
"category":"Network activity",
"to_ids":"true",
"distribution":"3",
"value":["1.1.1.1","2.2.2.2"]
}, {
"type":"url",
"category":"Network activity",
"to_ids":"true",
"distribution":"3",
"value":["msn.com","google.com"]
}]
}
}
This is what I need --
{
"Event": {
"distribution": "0",
"orgc": "Oxygen",
"Attribute": [{
"type": "ip-dst",
"category": "Network activity",
"to_ids": "true",
"distribution": "3",
"value": "1.1.1.1"
}, {
"type": "ip-dst",
"category": "Network activity",
"to_ids": "true",
"distribution": "3",
"value": "2.2.2.2"
}, {
"type": "url",
"category": "Network activity",
"to_ids": "true",
"distribution": "3",
"value": "msn.com"
}, {
"type": "url",
"category": "Network activity",
"to_ids": "true",
"distribution": "3",
"value": "google.com"
}]
}
}
Here is where I was just playing around with it and totally lost!!
for item in data["Event"]["Attribute"]:
if "type":"ip-dst" and len("value")>1:
if 'ip-dst' in item["type"] and len(item["value"])>1:
for item in item["value"]:
...and totally lost
How about this?
# get a reference to the attribute list
attributes = data["Event"]["Attribute"]
# in the event dictionary, replace it with an empty list
data["Event"]["Attribute"] = []
for attribute in attributes:
    for value in attribute["value"]:
        # for every value in every attribute, copy that attribute
        new_attr = attribute.copy()
        # set the value to that value
        new_attr["value"] = value
        # and append it to the attribute list
        data["Event"]["Attribute"].append(new_attr)
This will work with the data structure you've shown, but not necessarily with all kinds of nested data, because attribute.copy() makes a shallow copy. Apart from the "value" list, each attribute should therefore contain only atomic values like numbers, strings, or booleans; any nested list or dict would be shared between the copies. The "value" list itself may contain nested structures, since we're only moving references there.
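If the attributes might hold nested lists or dicts besides "value", copy.deepcopy can be used for the remaining keys so the new entries do not share them. The sketch below assumes exactly that situation: the function name explode_values and the nested "meta" key in the sample are invented for illustration and are not part of the original data.

```python
import copy


def explode_values(attributes):
    """Return one attribute dict per entry of each attribute's "value" list."""
    exploded = []
    for attribute in attributes:
        for value in attribute["value"]:
            # Deep-copy everything except "value" so nested data is not shared.
            new_attr = {
                key: copy.deepcopy(val)
                for key, val in attribute.items()
                if key != "value"
            }
            new_attr["value"] = value
            exploded.append(new_attr)
    return exploded


data = {
    "Event": {
        "distribution": "0",
        "orgc": "Oxygen",
        "Attribute": [
            # "meta" is a hypothetical nested key added to show the difference.
            {"type": "ip-dst", "meta": {"tags": []}, "value": ["1.1.1.1", "2.2.2.2"]},
            {"type": "url", "meta": {"tags": []}, "value": ["msn.com", "google.com"]},
        ],
    }
}

data["Event"]["Attribute"] = explode_values(data["Event"]["Attribute"])
```

With a shallow copy, appending to one entry's "meta" -> "tags" list would show up in its sibling as well. With the deep copy each entry has its own nested objects, at the cost of some extra copying.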
