I am trying to merge a nested list of dictionaries so that each key maps to a single list of values. I load a CSV file into a DataFrame and then try to convert it into nested JSON. My attempt is below. Is this the right route to create the JSON, or does pandas have native functionality for this type of conversion?
Sample Data:
Subject,StudentName,Category
ENGLISH,Jane,
ENGLISH,,A
MATH,Matt,B
MATH,Newman,AA
MATH,,B
MATH,Dylan,A
ENGLISH,Noah,
ENGLISH,,C
Tried this:
import json
from collections import defaultdict

import pandas as pd

df1 = pd.read_csv('../data/file.csv')
json_doc = defaultdict(list)
for _id in df1.T:
    data = df1.T[_id]
    key = data.Subject
    values = {'StudentName': data.StudentName, 'Category': data.Category}
    json_doc[key].append(values)
new_d = json.dumps(json_doc, indent=4)
{k: int(v) for k, v in new_d}  # error: ValueError: not enough values to unpack (expected 2, got 1)
and I get this from the code above:
{
"ENGLISH": [
{
"StudentName": "Jane",
"Category": NaN
},
{
"StudentName": NaN,
"Category": "A"
},
{
"StudentName": "Noah",
"Category": NaN
},
{
"StudentName": NaN,
"Category": "C"
}
],
"MATH": [
{
"StudentName": "Matt",
"Category": "B"
},
{
"StudentName": "Newman",
"Category": "AA"
},
{
"StudentName": NaN,
"Category": "B"
},
{
"StudentName": "Dylan",
"Category": "A"
}
]
}
How do I merge the key/value pairs so that the result looks like this?
{
"ENGLISH": [
{
"StudentName": ["Jane","Noah"],
"Category": ["A","C"]
}
],
"MATH": [
{
"StudentName": ["Matt","Newman","Dylan"],
"Category": ["B","AA","A"]
}
]
}
It is not entirely clear to me if it is safe to ignore missing values, but here is my one-liner:
df.groupby('Subject').agg(lambda g: list(g.dropna())).to_dict(orient='index')
The default methods (to_json, to_dict) have no orient option that produces this shape, so some manual work is needed: group by Subject, then aggregate each column into a list with the missing values dropped. After that, .to_dict(orient='index') gives one entry per subject (use to_json(orient='index') instead if you want a string rather than a dict).
Note: Subject here is expected to be a column, not an index.
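To make the grouping step concrete, here is a minimal sketch of the same idea using only the standard library instead of pandas. The helper name merge_by_subject and the inlined CSV text are assumptions made for illustration; blank cells are treated the way the one-liner treats NaN.

```python
import csv
import io
import json
from collections import defaultdict

# Sample data from the question, inlined so the sketch is self-contained.
CSV_TEXT = """Subject,StudentName,Category
ENGLISH,Jane,
ENGLISH,,A
MATH,Matt,B
MATH,Newman,AA
MATH,,B
MATH,Dylan,A
ENGLISH,Noah,
ENGLISH,,C
"""


def merge_by_subject(csv_text):
    """Collect the non-empty values of every column into one list per subject."""
    merged = defaultdict(lambda: defaultdict(list))
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = row.pop("Subject")
        for column, value in row.items():
            if value:  # skip blank cells, the CSV counterpart of NaN
                merged[subject][column].append(value)
    # Convert the nested defaultdicts to plain dicts before serializing.
    return {subject: dict(columns) for subject, columns in merged.items()}


result = merge_by_subject(CSV_TEXT)
print(json.dumps(result, indent=4))
```

Note that MATH ends up with four categories for three names, because one MATH row has a category but no student. That is the "is it safe to ignore missing values" caveat in practice. The values are also plain dicts per subject; wrap each one in a list if the exact shape from the question is required.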
Related
I have some data which looks like this :
{
"key_value": [
{
"key": "name",
"value": "kapil"
},
{
"key": "age",
"value": "36"
}
]
}
I need to convert it to look like this:
{
"age": "36",
"name": "kapil"
}
Would somebody be able to help with this?
I have already tried using json.dumps()
I'm not sure why you were trying to use json.dumps, but all you need to do is loop through all the pairs and add them to a new dictionary. Like this:
data = {
"key_value": [
{
"key": "name",
"value": "kapil"
},
{
"key": "age",
"value": "36"
}
]
}
res = {}
for pair in data["key_value"]:
    res[pair["key"]] = pair["value"]
print(res)
Note that if your data is in JSON, then you need to use json.loads() to convert your JSON to a dictionary, then use json.dumps() to convert that dictionary back to a string that can be written to a file.
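A minimal sketch of that round trip, assuming the input arrives as a JSON string (the variable names raw, flat and text are only illustrative):

```python
import json

# Assumed input: the same structure as above, but as JSON text.
raw = '{"key_value": [{"key": "name", "value": "kapil"}, {"key": "age", "value": "36"}]}'

data = json.loads(raw)  # JSON text -> Python dict
flat = {pair["key"]: pair["value"] for pair in data["key_value"]}
text = json.dumps(flat, sort_keys=True)  # Python dict -> JSON text

print(text)  # {"age": "36", "name": "kapil"}
```

sort_keys=True is only there to reproduce the key order shown in the question; drop it to keep insertion order.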
Here is my df:
                    text                  date    channel sentiment product segment
0  I like the new layout  2021-08-30T18:15:22Z  Snowflake   predict  Skills    EMEA
I need to convert this to JSON output that matches the following:
[
{
"text": "I like the new layout",
"date": "2021-08-30T18:15:22Z",
"channel": "Snowflake",
"sentiment": "predict",
"fields": [
{
"field": "product",
"value": "Skills"
},
{
"field": "segment",
"value": "EMEA"
}
]
}
]
I'm getting stuck with mapping the keys of the columns to the values in the first dict and mapping the column and row to new keys in the final dict. I've tried various options using df.groupby with .apply() but am coming up short.
Samples of what I've tried:
df.groupby(['text', 'date','channel','sentiment','product','segment']).apply(
lambda r: r[['27cf2f]].to_dict(orient='records')).unstack('text').apply(lambda s: [
{s.index.name: idx, 'fields': value}
for idx, value in s.items()]
).to_json(orient='records')
Any and all help is appreciated!
Solved with this:
# Specify field column names
fieldcols = ['product','segment']
# Build a dict for each group as a Series named `fields`
res = (df.groupby(['text', 'date', 'channel', 'sentiment'])
         .apply(lambda s: [{'field': field, 'value': value}
                           for field in fieldcols
                           for value in s[field].values])
       ).rename('fields')
# Convert Series to DataFrame and then to_json
res = res.reset_index().to_json(orient='records', date_format='iso')
Output:
[
{
"text": "I like the new layout",
"date": "2021-08-30T18:15:22Z",
"channel": "Snowflake",
"sentiment": "predict",
"fields": [
{
"field": "product",
"value": "Skills"
},
{
"field": "segment",
"value": "EMEA"
}
]
}
]
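When every row maps to exactly one output object, the same reshaping can also be sketched row by row on plain dicts, for example on the result of df.to_dict(orient='records'). The helper name nest_fields is hypothetical, and unlike the groupby version this does not merge rows that share the same text, date, channel and sentiment.

```python
import json

# Columns that should move under "fields" (same role as fieldcols above).
FIELD_COLS = ["product", "segment"]


def nest_fields(record, field_cols=FIELD_COLS):
    """Keep the ordinary columns and move field_cols into a 'fields' list."""
    out = {key: value for key, value in record.items() if key not in field_cols}
    out["fields"] = [{"field": col, "value": record[col]} for col in field_cols]
    return out


# One row shaped like the DataFrame in the question.
records = [{
    "text": "I like the new layout",
    "date": "2021-08-30T18:15:22Z",
    "channel": "Snowflake",
    "sentiment": "predict",
    "product": "Skills",
    "segment": "EMEA",
}]

nested = [nest_fields(record) for record in records]
print(json.dumps(nested, indent=2))
```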
I have the following list:
{
"id":1,
"name":"John",
"status":2,
"custom_attributes":[
{
"attribute_code":"address",
"value":"st"
},
{
"attribute_code":"city",
"value":"st"
},
{
"attribute_code":"job",
"value":"test"
}]
}
I need to get the value of the entry whose attribute_code is equal to "city".
I've tried this code:
if list["custom_attributes"]["attribute_code"] == "city" in list:
    var = list["value"]
But this gives me the following error:
TypeError: list indices must be integers or slices, not str
What am I doing wrong here? I've read this solution and this solution but didn't understand how to access each value.
Another solution, using next():
dct = {
"id": 1,
"name": "John",
"status": 2,
"custom_attributes": [
{"attribute_code": "address", "value": "st"},
{"attribute_code": "city", "value": "st"},
{"attribute_code": "job", "value": "test"},
],
}
val = next(d["value"] for d in dct["custom_attributes"] if d["attribute_code"] == "city")
print(val)
Prints:
st
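One thing to keep in mind with next(): if no entry matches, it raises StopIteration unless a default is passed as the second argument. A minimal sketch, where the helper name get_attribute and the missing code "zip" are made up for illustration:

```python
dct = {
    "id": 1,
    "name": "John",
    "status": 2,
    "custom_attributes": [
        {"attribute_code": "address", "value": "st"},
        {"attribute_code": "city", "value": "st"},
        {"attribute_code": "job", "value": "test"},
    ],
}


def get_attribute(record, code, default=None):
    """Return the value of the first attribute with this code, else default."""
    matches = (
        attr["value"]
        for attr in record["custom_attributes"]
        if attr["attribute_code"] == code
    )
    # The second argument to next() is returned when the generator is empty.
    return next(matches, default)


print(get_attribute(dct, "city"))  # st
print(get_attribute(dct, "zip"))   # None
```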
Your data is a dict not a list.
You need to scan the attributes according the criteria you mentioned.
See below:
data = {
"id": 1,
"name": "John",
"status": 2,
"custom_attributes": [
{
"attribute_code": "address",
"value": "st"
},
{
"attribute_code": "city",
"value": "st"
},
{
"attribute_code": "job",
"value": "test"
}]
}
for attr in data['custom_attributes']:
    if attr['attribute_code'] == 'city':
        print(attr['value'])
        break
output
st
I have a JSON with the following structure. I want to extract some data to different lists so that I will be able to transform them into a pandas dataframe.
{
"ratings": {
"like": {
"average": null,
"counts": {
"1": {
"total": 0,
"users": []
}
}
}
},
"sharefile_vault_url": null,
"last_event_on": "2021-02-03 00:00:01",
],
"fields": [
{
"type": "text",
"field_id": 130987800,
"label": "Name and Surname",
"values": [
{
"value": "John Smith"
}
],
{
"type": "category",
"field_id": 139057651,
"label": "Gender",
"values": [
{
"value": {
"status": "active",
"text": "Male",
"id": 1,
"color": "DCEBD8"
}
}
],
{
"type": "category",
"field_id": 151333010,
"label": "Field of Studies",
"values": [
{
"value": {
"status": "active",
"text": "Languages",
"id": 3,
"color": "DCEBD8"
}
}
],
}
}
For example, I create a list
names = []
where if "label" in the "fields" list is "Name and Surname" I append ["values"][0]["value"] so names now contains "John Smith". I do exactly the same for the "Gender" label and append the value to the list genders.
The above dictionary is contained in a list of dictionaries, so I just have to loop through the list and extract the relevant fields like this:
names = []
genders = []
for r in range(len(users)):
    for i in range(len(users[r].json()["items"])):
        for field in users[r].json()["items"][i]["fields"]:
            if field["label"] == "Name and Surname":
                names.append(field["values"][0]["value"])
            elif field["label"] == "Gender":
                genders.append(field["values"][0]["value"]["text"])
            else:
                pass  # Something else
where users is a list of responses from the API. Each response's JSON has an "items" key holding a list of dictionaries, and each of those has a "fields" key whose value is a list of dictionaries describing the individual fields (like Name and Surname and Gender).
The problem is that the dictionary with "label: Field of Studies" is optional and is not always present in the list of fields.
How can I manage to check for its presence, and if so append its value to a list, and None otherwise?
To me it seems that the data you have is not valid JSON. However, if I were you I would try using pandas.json_normalize. As far as I know, when a record lacks a key that other records have, the corresponding cell in the resulting DataFrame is filled with a missing value (NaN) rather than raising an error.
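If you would rather keep the plain loop, one option is to look up each label separately and fall back to None when it is absent. This is only a sketch: the helper name field_value is hypothetical, and the second item ("Jane Doe") is made-up data added to show the missing-field case.

```python
def field_value(fields, label):
    """Return the first value of the field with this label, or None if absent."""
    for field in fields:
        if field["label"] == label:
            return field["values"][0]["value"]
    return None


# Trimmed items shaped like the question's data; the second one is invented
# and has no "Field of Studies" entry.
items = [
    {"fields": [
        {"label": "Name and Surname", "values": [{"value": "John Smith"}]},
        {"label": "Gender", "values": [{"value": {"text": "Male"}}]},
        {"label": "Field of Studies", "values": [{"value": {"text": "Languages"}}]},
    ]},
    {"fields": [
        {"label": "Name and Surname", "values": [{"value": "Jane Doe"}]},
        {"label": "Gender", "values": [{"value": {"text": "Female"}}]},
    ]},
]

names, genders, studies = [], [], []
for item in items:
    fields = item["fields"]
    names.append(field_value(fields, "Name and Surname"))
    gender = field_value(fields, "Gender")
    genders.append(gender["text"] if gender else None)
    study = field_value(fields, "Field of Studies")
    studies.append(study["text"] if study else None)

print(names, genders, studies)
```

Because every list gets exactly one entry per item, the three lists stay aligned and can be passed to a DataFrame constructor as columns.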
I have been struggling to understand the reason for the following JSON parsing issue. I have tried many combinations to access the 'val' item value but I have hit a brick wall.
I have used the code below successfully on 'similar' JSON-style data, but I don't have the knowledge to adapt this approach to the data below.
All advice gratefully accepted.
result = xmltodict.parse(my_read)
result = result['REPORT']['REPORT_BODY']
result =json.dumps(result, indent=1)
print(result)
{
"PAGE": [
{
"D-ROW": [
{
"#num": "1",
"type": "wew",
"val": ".000"
},
{
"#num": "2",
"type": "wew",
"val": ".000"
}
]
},
{
"D-ROW": [
{
"#num": "26",
"type": "wew",
"val": ".000"
},
{
"#num": "27",
"type": "wew",
"val": ".000"
},
{
"#num": "28",
"type": "wew",
"val": ".000"
}
]
}
]
}
for item in json.loads(json_data):
    print(item['PAGE']['D-ROW']['val'])
Error: string indices must be integers
item['PAGE'] contains a list, so you cannot index it with 'D-ROW'. If your json-loaded data is in a variable data you could use:
for page in data['PAGE']:
    for drow in page['D-ROW']:
        print(drow['val'])
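As a self-contained sketch of that nested loop, here is a trimmed copy of the data from the question (fewer rows, same shape) with the values collected into a list. The names json_data, data and vals are assumptions for illustration.

```python
import json

# Trimmed version of the question's data: PAGE is a list of pages,
# and each page's D-ROW is a list of row dicts.
json_data = """
{
  "PAGE": [
    {"D-ROW": [
      {"#num": "1", "type": "wew", "val": ".000"},
      {"#num": "2", "type": "wew", "val": ".000"}
    ]},
    {"D-ROW": [
      {"#num": "26", "type": "wew", "val": ".000"}
    ]}
  ]
}
"""

data = json.loads(json_data)

# Walk both list levels before indexing the row dict with 'val'.
vals = [drow["val"] for page in data["PAGE"] for drow in page["D-ROW"]]
print(vals)  # ['.000', '.000', '.000']
```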
The first thing you should notice, based on your JSON structure, is that it's a dict {"PAGE": [...], ...}, so when you use json.loads() on it, you'll get a dict too.
In this for loop, your item iterator actually refers to a key of the dict:
for item in json.loads(json_data):
    print(item['PAGE']['D-ROW']['val'])
Here's a simpler example that is easier to follow:
>>> for key in json.loads('{"a": "a-value", "b": "b-value"}'):
...     print(key)
...
a
b
So you can guess that in your loop item would refer to the key "PAGE", and you can't index that string with ['D-ROW'] ("PAGE"['D-ROW'] doesn't make sense, hence your error)
Key/values in the for loop
If you iterate over .items() as in the loop below, item becomes a tuple of (key, value):
for item in json.loads(json_data).items():
    print(item)
You can also expand the key, value like this
>>> for key, value in json.loads('{"a": "a-value", "b": "b-value"}').items():
...     print("key is {} value is {}".format(key, value))
...
key is a value is a-value
key is b value is b-value
Your JSON should not put quotes around numeric values. For example, change
"D-ROW": [
{
"#num": "1",
"type": "wew",
"val": ".000"
},
to (keys keep their quotes; numeric values drop them, and note that JSON itself does not allow comments):
"D-ROW": [
{
"#num": 1,
"type": "wew",
"val": 0.000
},
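If the JSON cannot be changed at the source (here the values come out of the XML conversion as strings), the numeric strings can instead be converted after parsing. A minimal sketch, where the helper name to_number is an assumption:

```python
def to_number(text):
    """Return an int or float for a numeric string, else the string unchanged."""
    try:
        return int(text)
    except ValueError:
        pass
    try:
        return float(text)
    except ValueError:
        return text


row = {"#num": "1", "type": "wew", "val": ".000"}
converted = {key: to_number(value) for key, value in row.items()}
print(converted)  # {'#num': 1, 'type': 'wew', 'val': 0.0}
```

Trying int first keeps whole numbers as integers; anything that parses as neither is left as the original string.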
"D-ROW": [
{
"#num": "26",
"type": "wew",
"val": ".000"
},
{
"#num": "27",
"type": "wew",
"val": ".000"
},
{
"#num": "28",
"type": "wew",
"val": ".000"
}
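The D-ROW key contains a list, not a dict, and so does PAGE, so neither can be indexed with a string key.
You should change
print(item['PAGE']['D-ROW']['val'])
to
data = json.loads(json_data)
print([row['val'] for page in data['PAGE'] for row in page['D-ROW']])
to iterate over the lists that contain your dicts.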