My DataFrame is:
df = pd.DataFrame({'city': ['POA', 'POA', 'SAN'], 'info' : [10,12,5]}, index = [4314902, 4314902, 4300803])
df.index.rename('ID_city', inplace=True)
output:
city info
ID_city
4314902 POA 10
4314902 POA 12
4300803 SAN 5
I need to save it as JSON oriented by index. The following command works only when each index is unique:
df.to_json('df.json', orient='index')
Is it possible to save this DataFrame so that, when it finds a duplicate index, it creates an array?
My desired output:
{ 4314902 : [ {'city': 'POA', 'info': 10} , {'city': 'POA', 'info': 12} ]
,4300803 : {'city': 'SAN', 'info': 5} }
I'm not aware of built-in Pandas functionality that handles duplicate indexes when exporting JSON with orient='index'.
You could of course build this manually. Merge the columns into one Series that contains a dict per row:
cols_as_dict = df.apply(dict, axis=1)
ID_city
4314902 {'city': 'POA', 'info': 10}
4314902 {'city': 'POA', 'info': 12}
4300803 {'city': 'SAN', 'info': 5}
Put rows into lists, grouped by the index:
combined = cols_as_dict.groupby(cols_as_dict.index).apply(list)
ID_city
4300803 [{'city': 'SAN', 'info': 5}]
4314902 [{'city': 'POA', 'info': 10}, {'city': 'POA', ...
Then write the json:
combined.to_json()
'{"4300803":[{"city":"SAN","info":5}],"4314902":[{"city":"POA","info":10},{"city":"POA","info":12}]}'
This creates a list even if there's just a single entry per index. That should actually make processing easier than mixing the data types (either a list of elements or a single element).
If you are set on the mixed type (either a dict or a list of several dicts), then call combined.to_dict(), change the single-element lists back into their first element, and then dump the JSON.
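A minimal end-to-end sketch of that mixed-type variant, reusing the frame from the question (the default=int hook is only there because the row dicts can carry NumPy integers that json can't serialize directly):

```python
import json

import pandas as pd

# Rebuild the example frame from the question
df = pd.DataFrame({'city': ['POA', 'POA', 'SAN'], 'info': [10, 12, 5]},
                  index=[4314902, 4314902, 4300803])
df.index.rename('ID_city', inplace=True)

# One dict per row, then collect rows sharing an index into lists
cols_as_dict = df.apply(dict, axis=1)
combined = cols_as_dict.groupby(cols_as_dict.index).apply(list)

# Unwrap single-element lists so unique indexes map to a plain dict
mixed = {k: v[0] if len(v) == 1 else v for k, v in combined.to_dict().items()}

# default=int converts any NumPy integer values json can't handle
as_json = json.dumps(mixed, default=int)
```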
Related
I have a very big list of dictionaries, and the items are unordered. I would like to group certain elements under a new key. For example:
input= [{'name':'emp1','state':'TX','areacode':'001','mobile':123},{'name':'emp1','state':'TX','areacode':'002','mobile':234},{'name':'emp1','state':'TX','areacode':'003','mobile':345},{'name':'emp2','state':'TX','areacode':None,'mobile':None},]
For the above input I would like to group areacode and mobile under a new key, contactoptions:
opdata = [{'name':'emp1','state':'TX','contactoptions':[{'areacode':'001','mobile':123},{'areacode':'002','mobile':234},{'areacode':'003','mobile':345}]},{'name':'emp2','state':'TX','contactoptions':[{'areacode':None,'mobile':None}]}]
I am doing this now with two long iterations. I want to achieve the same more efficiently, as the number of records is large. I'm open to using existing methods if available in packages like pandas.
Try:
import pandas as pd

df = pd.DataFrame(input)  # build the frame from your list of dicts
result = (
    df.groupby(['name', 'state'])
      .apply(lambda x: x[['areacode', 'mobile']].to_dict(orient='records'))
      .reset_index(name='contactoptions')
).to_dict(orient='records')
With regular dictionaries, you can do it in a single pass/loop using the setdefault method and no sorting:
data = [{'name':'emp1','state':'TX','areacode':'001','mobile':123},{'name':'emp1','state':'TX','areacode':'002','mobile':234},{'name':'emp1','state':'TX','areacode':'003','mobile':345},{'name':'emp2','state':'TX','areacode':None,'mobile':None}]
merged = dict()
for d in data:
    od = merged.setdefault(d["name"], {k: d[k] for k in ("name", "state")})
    od.setdefault("contactoptions", []).append({k: d[k] for k in ("areacode", "mobile")})
merged = list(merged.values())
output:
print(merged)
# [{'name': 'emp1', 'state': 'TX', 'contactoptions': [{'areacode': '001', 'mobile': 123}, {'areacode': '002', 'mobile': 234}, {'areacode': '003', 'mobile': 345}]}, {'name': 'emp2', 'state': 'TX', 'contactoptions': [{'areacode': None, 'mobile': None}]}]
As you asked, you want to group the input items by 'name' and 'state' together.
My suggestion is to build a dictionary whose keys are 'name' plus 'state', such as 'emp1-TX', and whose values are lists of 'areacode'/'mobile' dicts, such as [{'areacode':'001','mobile':123}]. This way the output can be achieved in one iteration.
Output:
{'emp1-TX': [{'areacode':'001','mobile':123}, {'areacode':'002','mobile':234}, {'areacode':'003','mobile':345}], 'emp2-TX': [{'areacode':None,'mobile':None}]}
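A minimal sketch of that combined-key pass (the 'name-state' key format is just this answer's convention, not something the data requires):

```python
data = [{'name': 'emp1', 'state': 'TX', 'areacode': '001', 'mobile': 123},
        {'name': 'emp1', 'state': 'TX', 'areacode': '002', 'mobile': 234},
        {'name': 'emp1', 'state': 'TX', 'areacode': '003', 'mobile': 345},
        {'name': 'emp2', 'state': 'TX', 'areacode': None, 'mobile': None}]

grouped = {}
for d in data:
    key = f"{d['name']}-{d['state']}"  # combined key, e.g. 'emp1-TX'
    grouped.setdefault(key, []).append({'areacode': d['areacode'],
                                        'mobile': d['mobile']})
```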
I'd like to parse json dictionaries from a pandas dataframe column, iterate over the dicts and assign them to new column values.
Here's a column of dataframe: df['Column'][0]
[{'Name': 'Vacant', 'Value': 3904000, 'Unit': 'Qty'},
{'Name': 'Vacant', 'Value': 11.7, 'Unit': 'Pct'},
{'Name': 'Absorption', 'Value': 415000, 'Unit': 'Units'},
{'Name': 'AbsorpOcc', 'Value': 1.4, 'Unit': 'Pct'},
{'Name': 'Occupied', 'Value': None, 'Unit': 'Qty'}]
I have the following code to iterate over each row in the pandas dataframe, and then iterate over each dict in the list to create new columns.
# Iterate over dataframe to parse select rows
# Accumulator for the string labels
s = ""
# Iterate over each row in the DataFrame
for index, row in df.iterrows():
    # Iterate over each json object in each row in the DataFrame
    for i in range(0, len(row['Column'])):
        for k, v in row['Column'][i].items():
            # Concat string labels to assign them as column names
            if type(v) == str:
                s += v
print(s)
Expected Output, new columns:
You have a specific requirement to process the 'Column' column of the dataframe.
I think you should use apply (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html). The change would also happen in place on the dataframe, so your function could be:
def func(row):
    # your parsing logic
    index = row.name
    # {'Name': 'Vacant', 'Value': 3904000, 'Unit': 'Qty'}
    # col = 'Vacant', value = 3904000
    df.loc[index, col] = value

df.apply(func, axis=1)
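Filling in the parsing logic as a runnable sketch. The Name+Unit column naming (e.g. 'VacantQty') is my assumption about the expected output, since the original screenshot isn't shown:

```python
import pandas as pd

# A one-row frame shaped like the question's 'Column' column
df = pd.DataFrame({'Column': [[
    {'Name': 'Vacant', 'Value': 3904000, 'Unit': 'Qty'},
    {'Name': 'Vacant', 'Value': 11.7, 'Unit': 'Pct'},
]]})

def func(row):
    # row.name is the frame's index label for this row
    for item in row['Column']:
        # Assumed naming: concatenate 'Name' and 'Unit', e.g. 'VacantQty'
        df.loc[row.name, item['Name'] + item['Unit']] = item['Value']

df.apply(func, axis=1)
```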
Below is my subsetted data frame, I am having a hard time trying to convert it into my desired output as I am fairly new to Python. Essentially I want to create a nested dictionary inside a list, with the column names as a value, and then another nested dictionary inside a list. Is this doable?
import pandas as pd
Sector Community Name
0 centre: 10901.0 park: 3238.0
1 northeast: 6958.0 heights: 1955.0
Desired output:
[{'column': 'Sector',
'value': [{'name': 'centre', 'value': 10901.0},
{'name': 'northeast', 'value': 6958.0}]},
{'column': 'Community Name',
'value': [{'name': 'park', 'value': 3238.0},
{'name': 'heights', 'value': 1955.0},
{'name': 'hill', 'value': 1454.0}]}]
From #sushanth's answer, I may add to this solution. Assume that your dataframe variable is defined as df.
result = []
for header in list(df):
    column_values = df[header].to_list()
    result.append({
        "column": header,
        "value": [dict(zip(['name', 'value'], str(value).split(":"))) for value in column_values]
    })
Using pandas in the above case might be overkill. Here is a solution using Python built-in functions which you can give a try:
input_ = {"Sector": ["centre: 10901.0", "northeast: 6958.0"],
          "Community Name": ["park: 3238.0", "heights: 1955.0"]}

result = []
for k, v in input_.items():
    result.append({
        "column": k,
        "value": [dict(zip(['name', 'value'], vv.split(":"))) for vv in v]
    })
print(result)
[{'column': 'Sector',
'value': [{'name': 'centre', 'value': ' 10901.0'},
{'name': 'northeast', 'value': ' 6958.0'}]},
{'column': 'Community Name',
'value': [{'name': 'park', 'value': ' 3238.0'},
{'name': 'heights', 'value': ' 1955.0'}]}]
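A small follow-up sketch: the leading spaces in the values above come from splitting on ':' alone. Stripping the pieces and casting the numeric part to float gives cleaner records (assuming every cell follows the 'name: number' pattern):

```python
input_ = {"Sector": ["centre: 10901.0", "northeast: 6958.0"],
          "Community Name": ["park: 3238.0", "heights: 1955.0"]}

result = []
for k, v in input_.items():
    pairs = []
    for vv in v:
        name, value = vv.split(":")
        # strip stray whitespace and make the value a real number
        pairs.append({'name': name.strip(), 'value': float(value)})
    result.append({'column': k, 'value': pairs})
```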
I'm working with an API, trying to pull data out of it. The challenge I'm having is that the majority of the columns are straightforward and not nested, with the exception of a CustomFields column, which holds all the various custom fields in a list per record.
Using json_normalize is there a way to target a nested column to flatten it? I'm trying to fetch and use all the data available from the API but one nested column in particular is causing a headache.
The JSON data when retrieved from the API looks like the following. This is just for one customer profile,
[{'EmailAddress': 'an_email#gmail.com', 'Name': 'Al Smith', 'Date': '2020-05-26 14:58:00', 'State': 'Active', 'CustomFields': [{'Key': '[Location]', 'Value': 'HJGO'}, {'Key': '[location_id]', 'Value': '34566'}, {'Key': '[customer_id]', 'Value': '9051'}, {'Key': '[status]', 'Value': 'Active'}, {'Key': '[last_visit.1]', 'Value': '2020-02-19'}]}]
Using json_normalize,
payload = json_normalize(payload_json['Results'])
Here are the results when I run the above code,
Ideally, here is what I would like the final result to look like,
I think I just need to work with the record_path and meta parameters but I'm not totally understanding how they work.
Any ideas? Or would using json_normalize not work in this situation?
Try this. You have square brackets in your JSON key names; that's why you see those [ ] in the resulting column headers:
d = [{'EmailAddress': 'an_email#gmail.com', 'Name': 'Al Smith', 'Date': '2020-05-26 14:58:00', 'State': 'Active', 'CustomFields': [{'Key': '[Location]', 'Value': 'HJGO'}, {'Key': '[location_id]', 'Value': '34566'}, {'Key': '[customer_id]', 'Value': '9051'}, {'Key': '[status]', 'Value': 'Active'}, {'Key': '[last_visit.1]', 'Value': '2020-02-19'}]}]
df = pd.json_normalize(d, record_path=['CustomFields'], meta=[['EmailAddress'], ['Name'], ['Date'], ['State']])
df = df.pivot_table(columns='Key', values='Value', index=['EmailAddress', 'Name'], aggfunc='sum')
print(df)
Output:
Key [Location] [customer_id] [last_visit.1] [location_id] [status]
EmailAddress Name
an_email#gmail.com Al Smith HJGO 9051 2020-02-19 34566 Active
A portion of one column, 'relatedWorkOrder', in my dataframe looks like this:
{'number': 2552, 'labor': {'name': 'IA001', 'code': '70M0901003'}...}
{'number': 2552, 'labor': {'name': 'IA001', 'code': '70M0901003'}...}
{'number': 2552, 'labor': {'name': 'IA001', 'code': '70M0901003'}...}
My desired output is to have columns 'name', 'labor_name', 'labor_code' with their respective values. I can do this using regex extract and replace:
df['name'] = df['relatedWorkOrder'].str.extract(r'{regex}',expand=False).str.replace('something','')
But I have several dictionaries in this column, and doing it that way is tedious. I'm also wondering if it's possible to do this by accessing the keys and values of the dictionary directly.
Any help with that?
You can join the result from pd.json_normalize:
df.join(pd.json_normalize(df['relatedWorkOrder'], sep='_'))
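A runnable sketch of that one-liner, with made-up rows shaped like the question's column:

```python
import pandas as pd

# Example rows mimicking the question's 'relatedWorkOrder' column
df = pd.DataFrame({'relatedWorkOrder': [
    {'number': 2552, 'labor': {'name': 'IA001', 'code': '70M0901003'}},
    {'number': 2553, 'labor': {'name': 'IA002', 'code': '70M0901004'}},
]})

# json_normalize flattens the nested dicts; sep='_' yields labor_name, labor_code
flat = pd.json_normalize(df['relatedWorkOrder'], sep='_')
result = df.join(flat)
```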