I need help with a problem with a dataframe like this:
df = pd.DataFrame({'column_A': [[{'zone':'A', 'number':'7'}, {'zone':'B', 'number': '8'}],
[{'zone':'A', 'number':'6'}, {'zone':'E', 'number':'7'}]],
'column_B': [[{'zone':'C', 'number':'4'}], [{'zone':'D', 'number': '9'}]]})
I want to insert column_B into the column_A list so the output of the first line of column_A has to be:
[{'zone':'A', 'number':'7'}, {'zone':'B', 'number': '8'}, {'zone':'C', 'number':'4'}]
This is probably very simple, but I keep getting errors with functions like insert and the '+' operator, and I have run out of ideas.
The simplest way is to join the lists with +:
df['column_A'] = df['column_A'] + df['column_B']
print (df)
column_A \
0 [{'zone': 'A', 'number': '7'}, {'zone': 'B', '...
1 [{'zone': 'A', 'number': '6'}, {'zone': 'E', '...
column_B
0 [{'zone': 'C', 'number': '4'}]
1 [{'zone': 'D', 'number': '9'}]
If your data are different and the second column holds dicts rather than lists, wrap each value in a list first:
df = pd.DataFrame({'column_A': [[{'zone':'A', 'number':'7'}, {'zone':'B', 'number': '8'}],
[{'zone':'A', 'number':'6'}, {'zone':'E', 'number':'7'}]],
'column_B': [{'zone':'C', 'number':'4'}, {'zone':'D', 'number': '9'}]})
df['column_A'] = df['column_A'] + df['column_B'].apply(lambda x: [x])
print (df)
column_A \
0 [{'zone': 'A', 'number': '7'}, {'zone': 'B', '...
1 [{'zone': 'A', 'number': '6'}, {'zone': 'E', '...
column_B
0 {'zone': 'C', 'number': '4'}
1 {'zone': 'D', 'number': '9'}
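For reference, the per-row logic behind both variants can be sketched in plain Python, with no pandas involved. The helper name as_list is made up for this illustration; it wraps a bare dict in a list so that + always concatenates two lists:

```python
def as_list(cell):
    """Return the cell unchanged if it is a list, otherwise wrap it in a list."""
    return cell if isinstance(cell, list) else [cell]

column_a = [
    [{'zone': 'A', 'number': '7'}, {'zone': 'B', 'number': '8'}],
    [{'zone': 'A', 'number': '6'}, {'zone': 'E', 'number': '7'}],
]
column_b = [{'zone': 'C', 'number': '4'}, {'zone': 'D', 'number': '9'}]

# Concatenate row by row, mirroring what the Series addition does per row.
merged = [a + as_list(b) for a, b in zip(column_a, column_b)]
print(merged[0])
```

Because as_list leaves real lists alone, the same comprehension covers the first variant as well, where column_B already holds lists.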
description
I need to break a dict that has a list as one of its values into several dicts, keeping the other entries unchanged.
The key I want to break may have a different name each time, but cond always has exactly one value whose type is list.
example
input
cond = {"type":"image","questionType":["3","4","5"]}
cond = {"type":"example","fieldToBreak":["1","2","3"],"fieldInt":1,"fieldFloat":0.1}
output
[
{'type': 'image', 'questionType': '3'},
{'type': 'image', 'questionType': '4'},
{'type': 'image', 'questionType': '5'}
]
[
{'type': 'example', 'fieldToBreak': '1', 'fieldInt': 1, 'fieldFloat': 0.1},
{'type': 'example', 'fieldToBreak': '2', 'fieldInt': 1, 'fieldFloat': 0.1},
{'type': 'example', 'fieldToBreak': '3', 'fieldInt': 1, 'fieldFloat': 0.1}
]
what I have tried
cond_queue = []
for k,v in cond.items():
    if isinstance(v,list):
        for ele in v:
            cond_copy = cond.copy()
            cond_copy[k] = ele
            cond_queue.append(cond_copy)
        break
It works, but I don't think it is the most Pythonic solution.
question:
Is there a more Pythonic solution?
A possible approach using Python's built-in functions and standard library. The code should work with any number of keys. If the original dict contains several lists, it creates all combinations of their elements. I am not sure whether that is the behavior you want.
import itertools
def dict_to_inflated_list(d):
    ans, keys, vals = list(), list(), list()
    # copy keys and 'listified' values in the same order
    for k, v in d.items():
        keys.append(k)
        vals.append(v if isinstance(v, list) else [v])
    # iterate over all possible combinations of elements of all 'listified' values
    for combination in itertools.product(*vals):
        ans.append({k: v for k, v in zip(keys, combination)})
    return ans
if __name__ == '__main__':
    cond = {'type': 'image', 'questionType': ['3', '4', '5']}
    print(dict_to_inflated_list(cond))
    cond = {'a': 0, 'b': [1, 2], 'c': [10, 20]}
    print(dict_to_inflated_list(cond))
Output:
[{'type': 'image', 'questionType': '3'}, {'type': 'image', 'questionType': '4'}, {'type': 'image', 'questionType': '5'}]
[{'a': 0, 'b': 1, 'c': 10}, {'a': 0, 'b': 1, 'c': 20}, {'a': 0, 'b': 2, 'c': 10}, {'a': 0, 'b': 2, 'c': 20}]
Something like the below (the solution is based on the input from the post, which I assume represents the general case):
cond = {"type": "image", "questionType": ["3", "4", "5"]}
data = [{"type": "image", "questionType": e} for e in cond['questionType']]
print(data)
output
[{'type': 'image', 'questionType': '3'}, {'type': 'image', 'questionType': '4'}, {'type': 'image', 'questionType': '5'}]
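The comprehension above hardcodes the key name. A minimal sketch of the same idea for an unknown key, assuming cond holds exactly one list value as the question states; the function name split_on_list is made up for this illustration:

```python
def split_on_list(cond):
    """Return one dict per element of the single list-valued entry in cond."""
    # Find the first key whose value is a list (assumed to be the only one).
    # next() raises StopIteration if cond contains no list at all.
    list_key = next(k for k, v in cond.items() if isinstance(v, list))
    # {**cond, list_key: e} copies cond and overrides only the list entry,
    # so the key keeps its original position in each new dict.
    return [{**cond, list_key: e} for e in cond[list_key]]

cond = {"type": "example", "fieldToBreak": ["1", "2", "3"], "fieldInt": 1, "fieldFloat": 0.1}
for row in split_on_list(cond):
    print(row)
```

Unlike the itertools.product approach, this only splits on the first list it finds, which matches the "exactly one list" assumption but not the multi-list case.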
This little function does the job without any extra argument except the input dictionary
def unpack_dict(d):
    n = [len(v) for k,v in d.items() if type(v) is list][0] #number of items in the list
    r = []
    for i in range(n):
        _d = {}
        for k,v in d.items():
            if type(v) is list:
                _d[k] = v[i]
            else:
                _d[k] = v
        r.append(_d)
    return r
cond = {"type":"example","fieldToBreak":["1","2","3"],"fieldInt":1,"fieldFloat":0.1}
unpack_dict(cond)
[{'type': 'example', 'fieldToBreak': '1', 'fieldInt': 1, 'fieldFloat': 0.1},
{'type': 'example', 'fieldToBreak': '2', 'fieldInt': 1, 'fieldFloat': 0.1},
{'type': 'example', 'fieldToBreak': '3', 'fieldInt': 1, 'fieldFloat': 0.1}]
The function determines how many items (n) there are in the list entry and uses that info to extract the right value to be inserted in the dictionary. Looping over n (for i in range(n):) is used to append the correct number of dictionaries in the final output. That's it. Quite simple to read and understand.
try this:
lens = 0
for index, item in enumerate(cond):
    if isinstance(cond[item], list):
        lens = len(cond[item])
        idx = index
        break
print([{k : v if i!=idx else v[j] for i,(k,v) in enumerate(cond.items()) } for j in range(lens)])
output:
# cond = {"type":"image","questionType":["3","4","5"]}
[{'type': 'image', 'questionType': '3'},
{'type': 'image', 'questionType': '4'},
{'type': 'image', 'questionType': '5'}]
# cond = {"type":"example","fieldToBreak":["1","2","3"],"fieldInt":1,"fieldFloat":0.1}
[{'type': 'example', 'fieldToBreak': '1', 'fieldInt': 1, 'fieldFloat': 0.1},
{'type': 'example', 'fieldToBreak': '2', 'fieldInt': 1, 'fieldFloat': 0.1},
{'type': 'example', 'fieldToBreak': '3', 'fieldInt': 1, 'fieldFloat': 0.1}]
If the dict has a different shape:
# cond = {"questionType":["3","4","5"], "type":"image"}
[{'questionType': '3', 'type': 'image'},
{'questionType': '4', 'type': 'image'},
{'questionType': '5', 'type': 'image'}]
I am trying to convert JSON to a DataFrame, but I could not get the result I want.
Here is the dictionary and the result I got:
{'20210df12820df1456-ssddsd': {'2': {'num': '2',
'product_name': 'apple',
'product_price': '20900'},
'order': {'add_info': None,
'basket_count': '2',
'deli_price': '2500',
'id': 'nhdd#abvc',
'is_member': 'MEMBER',
'mem_type': 'PERSON',
'order_date': '2021-01-28 20:14:56',
'ordernum': '20210df12820df1456-ssddsd',
'pay_price': '43100',
'reserve': '840',
'start_price': '43100',
'total_product_price': '41800',
'used_emoney': '0',
'used_reserve': '0'},
'pay_history': [{'add_price': '0',
'deli_price': '2500',
'discount_price': '-1200',
'order_price': '43100',
'pay_date': '2021-01-28 '
'20:15:14',
'pay_price': '43100',
'pay_type': 'creditcard',
'paymethod': 'C',
'total_price': '41800',
'used_emoney': '0',
'used_reserve': '0'}],
'payment': {'card_flag': '0000',
'card_partcancel_code': '00',
'card_state': 'Y',
'in_card_price': '43100',
'pay_date': '2021-01-28 20:15:14',
'pay_status': 'Y',
'paymethod': 'C',
'simple_pay': 'NPY'},
'product': {'1': {'num': '1',
'product_name': 'banana',
'product_price': '20900'}}}}
json_data = response.json()
result =json_data['list']
df = pd.DataFrame(result).transpose()
df.head()
I would like to turn this dictionary into four DataFrames,
but I got this result.
I expected something like the below:
order = df[['order']]
payment = df[['payment']]
pay_history = df[['pay_history']]
product = df[['product']]
Something like this.
Any ideas?
Since the data for each category (order, payment, pay history, product) are organized differently, consider iterating through each category, adding any extra data you need (such as ordernum for indexing), and collecting the rows in separate lists that you later convert into DataFrame objects:
import pandas as pd
json_data = {'20210df12820df1456-ssddsd': {'order': {'ordernum': '20210df12820df1456-ssddsd', 'order_date': '2021-01-28 20:14:56', 'is_member': 'MEMBER', 'start_price': '43100', 'pay_price': '43100', 'deli_price': '2500', 'total_product_price': '41800', 'basket_count': '2', 'id': 'nhdd#abvc', 'mem_type': 'PERSON', 'used_emoney': '0', 'used_reserve': '0', 'add_info': None, 'reserve': '840'}, 'payment': {'paymethod': 'C', 'pay_date': '2021-01-28 20:15:14', 'card_state': 'Y', 'pay_status': 'Y', 'simple_pay': 'NPY', 'card_flag': '0000', 'card_partcancel_code': '00', 'in_card_price': '43100'}, 'pay_history': [{'pay_date': '2021-01-28 20:15:14', 'pay_type': 'creditcard', 'total_price': '41800', 'deli_price': '2500', 'discount_price': '-1200', 'add_price': '0', 'order_price': '43100', 'pay_price': '43100', 'used_reserve': '0', 'used_emoney': '0', 'paymethod': 'C'}], 'product': {'1': {'num': '1', 'product_name': 'banana', 'product_price': '20900'}, '2': {'num': '2', 'product_name': 'apple', 'product_price': '20900'}}}}
order_data = []
payment_data = []
pay_history_data = []
product_data = []
for key in json_data:
    order_data.append(json_data[key]['order'])
    # Copy the payment dict and tag it with the order number
    payment = dict(json_data[key]['payment'])
    payment['ordernum'] = key
    payment_data.append(payment)
    # Check that the key exists before reading it, to avoid a KeyError
    if 'pay_history' in json_data[key]:
        for p in json_data[key]['pay_history']:
            p_clone = dict(p)
            p_clone['ordernum'] = key
            pay_history_data.append(p_clone)
    if 'product' in json_data[key]:
        product = json_data[key]['product']
        for product_key in product:
            p_clone = dict(product[product_key])
            p_clone['ordernum'] = key
            product_data.append(p_clone)
order_df = pd.DataFrame(order_data)
payment_df = pd.DataFrame(payment_data)
pay_history_df = pd.DataFrame(pay_history_data)
product_df = pd.DataFrame(product_data)
Edit: If you get a KeyError while iterating through pay_history, some orders probably have no pay_history key in their JSON data. You can avoid this by checking that the key exists before iterating over it (if 'pay_history' in json_data[key]:). The same check can be done before iterating through product (if 'product' in json_data[key]:).
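As a rough sketch of the same missing-key handling, dict.get with a default can stand in for the explicit membership checks. The function name collect_rows and the tiny sample order below are made up for illustration and are not the real API response:

```python
def collect_rows(json_data):
    """Split each order into per-category row lists, tagging rows with ordernum."""
    rows = {'order': [], 'payment': [], 'pay_history': [], 'product': []}
    for ordernum, entry in json_data.items():
        # 'order' and 'payment' are single dicts: one row each, if present.
        for name in ('order', 'payment'):
            if name in entry:
                rows[name].append({**entry[name], 'ordernum': ordernum})
        # 'pay_history' is a list of dicts; a missing key yields no rows.
        for p in entry.get('pay_history', []):
            rows['pay_history'].append({**p, 'ordernum': ordernum})
        # 'product' is a dict of dicts keyed by product number.
        for p in entry.get('product', {}).values():
            rows['product'].append({**p, 'ordernum': ordernum})
    return rows

# Hypothetical, heavily trimmed order with no 'payment' or 'pay_history' key.
sample = {
    'A1': {'order': {'pay_price': '43100'},
           'product': {'1': {'product_name': 'banana'}}},
}
rows = collect_rows(sample)
print(rows['product'])
print(rows['pay_history'])
```

Each list in rows can then be passed to pd.DataFrame in the same way as order_data and the other lists above.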
What you need is the pandas json_normalize function.
I wrote some example code.
import pandas as pd
from pandas import json_normalize  # pandas >= 1.0; importing it from pandas.io.json is deprecated
data = {'20210df12820df1456-ssddsd': {'order': {'ordernum': '20210df12820df1456-ssddsd', 'order_date': '2021-01-28 20:14:56', 'is_member': 'MEMBER',
'start_price': '43100', 'pay_price': '43100', 'deli_price': '2500', 'total_product_price': '41800',
'basket_count': '2', 'id': 'nhdd#abvc', 'mem_type': 'PERSON', 'used_emoney': '0', 'used_reserve': '0', 'add_info': None, 'reserve': '840'},
'payment': {'paymethod': 'C', 'pay_date': '2021-01-28 20:15:14', 'card_state': 'Y', 'pay_status': 'Y', 'simple_pay': 'NPY', 'card_flag': '0000', 'card_partcancel_code': '00', 'in_card_price': '43100'},
'pay_history': [{'pay_date': '2021-01-28 20:15:14', 'pay_type': 'creditcard', 'total_price': '41800', 'deli_price': '2500', 'discount_price': '-1200', 'add_price': '0', 'order_price': '43100', 'pay_price': '43100', 'used_reserve': '0', 'used_emoney': '0', 'paymethod': 'C'}],
'product': {'1': {'num': '1', 'product_name': 'banana', 'product_price': '20900'}}, '2': {'num': '2', 'product_name': 'apple', 'product_price': '20900'}}}
df = pd.DataFrame(data).transpose()
order = json_normalize(df['order'])
the result will look like:
I have a list of a dictionary of data that is in order in some places and out of order in others:
Eg:
data = [{"text":'a', "value":1},
{"text":'b', "value":1},
{"text":'j', "value":2},
{"text":'k', "value":50},
{"text":'b', "value":50},
{"text":'y', "value":52},
{"text":'x', "value":2},
{"text":'k', "value":3},
{"text":'m', "value":3}]
I want to sort them as:
o = [{"text":'a', "value":1},
{"text":'b', "value":1},
{"text":'j', "value":2},
{"text":'x', "value":2},
{"text":'k', "value":3},
{"text":'m', "value":3},
{"text":'k', "value":50},
{"text":'b', "value":50},
{"text":'y', "value":52}]
where my sort key is some combination of the item's index and the 2nd value. I was thinking of sorting with:
key=[(2nd value)<<len(closest power of 2 to len(index)) + index]
I can sort by the list of dicts by the 2nd value with:
data.sort(key= lambda x:x['value'])
How do I also add the index of the dictionary?
And is there a better sorting key I could use?
It appears that you're looking for the text field as a secondary sort key. The easiest way is to simply use a tuple for your keys, in priority order:
sorted(data, key=lambda x: (x['value'], x['text']) )
Does that yield what you need? Output:
[{'text': 'a', 'value': 1}, {'text': 'b', 'value': 1}, {'text': 'j', 'value': 2}, {'text': 'x', 'value': 2}, {'text': 'k', 'value': 3}, {'text': 'm', 'value': 3}, {'text': 'b', 'value': 50}, {'text': 'k', 'value': 50}, {'text': 'y', 'value': 52}]
The values (k, 50) and (b, 50) are now in the other order; I'm hopeful that I read your mind correctly.
UPDATE per OP clarification
I checked the docs. Python's sort method is stable, so you don't need the second sort key at all: in case of a tie, sort will maintain the original ordering:
>>> data.sort(key= lambda x:x['value'])
>>> data
[{'text': 'a', 'value': 1}, {'text': 'b', 'value': 1}, {'text': 'j', 'value': 2}, {'text': 'x', 'value': 2}, {'text': 'k', 'value': 3}, {'text': 'm', 'value': 3}, {'text': 'k', 'value': 50}, {'text': 'b', 'value': 50}, {'text': 'y', 'value': 52}]
... and this is what you requested.
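A small sketch of that stability guarantee on made-up data, where the ties are deliberately out of alphabetical order so that it is visible the original order, not the text field, decides them:

```python
# Made-up sample: 'k' precedes 'b' at value 50, and 'x' precedes 'j' at value 2.
data = [
    {"text": 'k', "value": 50},
    {"text": 'b', "value": 50},
    {"text": 'a', "value": 1},
    {"text": 'x', "value": 2},
    {"text": 'j', "value": 2},
]

# Sorting on 'value' alone: equal values keep their original relative order.
by_value = sorted(data, key=lambda d: d['value'])
print([d['text'] for d in by_value])  # ['a', 'x', 'j', 'k', 'b']
```

Both sorted() and list.sort() are stable, so the index never needs to be part of the key for this purpose.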
Use enumerate to get the index and use that to sort:
>>> from pprint import pprint
>>> res = [d for i,d in sorted(enumerate(data), key=lambda i_d: (i_d[1]['value'], i_d[0]))]
>>> pprint(res)
[{'text': 'a', 'value': 1},
{'text': 'b', 'value': 1},
{'text': 'j', 'value': 2},
{'text': 'x', 'value': 2},
{'text': 'k', 'value': 3},
{'text': 'm', 'value': 3},
{'text': 'k', 'value': 50},
{'text': 'b', 'value': 50},
{'text': 'y', 'value': 52}]
To sort it in-place, you can try using itertools.count
>>> from itertools import count
>>> cnt=count()
>>> data.sort(key=lambda d: (d['value'], next(cnt)))
>>> pprint(data)
[{'text': 'a', 'value': 1},
{'text': 'b', 'value': 1},
{'text': 'j', 'value': 2},
{'text': 'x', 'value': 2},
{'text': 'k', 'value': 3},
{'text': 'm', 'value': 3},
{'text': 'k', 'value': 50},
{'text': 'b', 'value': 50},
{'text': 'y', 'value': 52}]
>>>
Have you tried this:
sorted(data, key=lambda x: x['value'])
I hate to ask this but I can't figure it out and it's getting to me.
I have to make a function that takes a given dictionary d1 and sort of compares it to another dictionary d2 then adds the compared value to d2.
d1 is already in the format needed, so I don't have to worry about it.
d2 however, is a nested dictionary. It looks like this:
{'345': {'Name': 'xyzzy', 'ID': '345', 'Responses': {'Q3': 'c', 'Q1': 'a', 'Q4': 'b', 'Q2': 'a'}},
'123': {'Name': 'foo', 'ID': '123', 'Responses': {'Q3': 'c', 'Q1': 'a', 'Q4': 'a', 'Q2': 'b'}},
'234': {'Name': 'bar', 'ID': '234', 'Responses': {'Q3': 'c', 'Q1': 'a', 'Q4': 'b', 'Q2': 'b'}}}
So d1 is in the format of the Responses key, and that's what I need from d2 to compare it to d1.
So to do that I isolate responses:
for key, i in d2.items():
    temp = i['Responses']
Now I need to run temp through a function with d1 that will output an integer. Then match that integer with the top-level key it came from and update a new k/v entry associated with it. But I don't know how to do this.
I've managed to update each top-level key with that compared value, but it only uses the first compared value for all the top-level keys. I can't figure out how to match the integer found to its key. This is what I have so far that works the best:
for i in d2:
    score = grade_student(d1,temp) #integer
    placement = {'Score': score}
    d2[i].update(placement)
You could just iterate over sub dictionaries in d2 and update them once you've called grade_student:
for v in d2.values():
v['Score'] = grade_student(d1, v['Responses'])
Here's a complete example:
import pprint
d1 = {}
d2 = {
'345': {'Name': 'xyzzy', 'ID': '345', 'Responses': {'Q3': 'c', 'Q1': 'a', 'Q4': 'b', 'Q2': 'a'}},
'123': {'Name': 'foo', 'ID': '123', 'Responses': {'Q3': 'c', 'Q1': 'a', 'Q4': 'a', 'Q2': 'b'}},
'234': {'Name': 'bar', 'ID': '234', 'Responses': {'Q3': 'c', 'Q1': 'a', 'Q4': 'b', 'Q2': 'b'}}
}
# Dummy
def grade_student(x, y):
    return 1
for v in d2.values():
    v['Score'] = grade_student(d1, v['Responses'])
pprint.pprint(d2)
Output:
{'123': {'ID': '123',
'Name': 'foo',
'Responses': {'Q1': 'a', 'Q2': 'b', 'Q3': 'c', 'Q4': 'a'},
'Score': 1},
'234': {'ID': '234',
'Name': 'bar',
'Responses': {'Q1': 'a', 'Q2': 'b', 'Q3': 'c', 'Q4': 'b'},
'Score': 1},
'345': {'ID': '345',
'Name': 'xyzzy',
'Responses': {'Q1': 'a', 'Q2': 'a', 'Q3': 'c', 'Q4': 'b'},
'Score': 1}}
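To see distinct scores instead of the dummy value, here is a sketch with a hypothetical grade_student that counts matching answers. The real grading function and the contents of d1 are not given in the question, so the answer key below is invented:

```python
def grade_student(answer_key, responses):
    """Hypothetical grader: count the questions answered as in the key."""
    return sum(1 for q, correct in answer_key.items() if responses.get(q) == correct)

d1 = {'Q1': 'a', 'Q2': 'b', 'Q3': 'c', 'Q4': 'b'}  # made-up answer key
d2 = {
    '345': {'Name': 'xyzzy', 'Responses': {'Q3': 'c', 'Q1': 'a', 'Q4': 'b', 'Q2': 'a'}},
    '123': {'Name': 'foo', 'Responses': {'Q3': 'c', 'Q1': 'a', 'Q4': 'a', 'Q2': 'b'}},
    '234': {'Name': 'bar', 'Responses': {'Q3': 'c', 'Q1': 'a', 'Q4': 'b', 'Q2': 'b'}},
}

# Grade each student's own Responses inside the loop, so no score is reused.
for student in d2.values():
    student['Score'] = grade_student(d1, student['Responses'])

print({sid: s['Score'] for sid, s in d2.items()})  # {'345': 3, '123': 3, '234': 4}
```

The point is that the call to grade_student sits inside the loop and reads the current student's Responses, rather than a temp variable left over from an earlier loop.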
You don't have to iterate over them. Use the built-in update() method. Here is an example:
>>> A = {'cat':10, 'dog':5, 'rat':50}
>>> B = {'cat':5, 'dog':10, 'pig':20}
>>> A.update(B) #This will merge the dicts by keeping the values of B if collision
>>> A
{'rat': 50, 'pig': 20, 'dog': 10, 'cat': 5}
>>> B
{'pig': 20, 'dog': 10, 'cat': 5}
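One caveat for nested dictionaries like d2: update() is shallow. A short sketch on a trimmed, made-up copy of the data, showing that a top-level update replaces the whole inner dict while an update on the inner dict only adds the new key:

```python
d2 = {'123': {'Name': 'foo', 'Responses': {'Q1': 'a'}}}

# Top-level update on a shallow copy: the entire entry for '123' is replaced.
replaced = dict(d2)
replaced.update({'123': {'Score': 1}})
print(replaced['123'])  # {'Score': 1}

# Updating the inner dict keeps the existing keys and adds 'Score'.
d2['123'].update({'Score': 1})
print(d2['123'])  # {'Name': 'foo', 'Responses': {'Q1': 'a'}, 'Score': 1}
```

So for the question's layout, update() is only useful when called on each student's inner dict, which still means visiting every student.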