I want to remove key-value pairs from the inner dictionaries whenever the same key appears under another outer key with a larger value, so that each inner key survives only where its value is largest. Suppose that I have a dictionary of dictionaries as below:
ex_dict = {'C1': {'I1': 1, 'I2': 1.5, 'I3': 2}, 'C2': {'I2': 2, 'I3': 3, 'I4': 3.5}, 'C3': {'I4': 2, 'I5': 4, 'I6': 3}, 'C4': {'I1': 2, 'I3': 3.5, 'I6': 5}, 'C5': {'I7': 1, 'I8': 1.5, 'I9': 2}}
I want the expected output to be as follows:
new_ex_dict = {'C1': {}, 'C2': {'I2': 2, 'I4': 3.5}, 'C3': {'I5': 4}, 'C4': {'I1': 2, 'I3': 3.5, 'I6': 5}, 'C5': {'I7': 1, 'I8': 1.5, 'I9': 2}}
How can I do this efficiently? Any help will be much appreciated.
This is my quick solution. Note that new_ex_dict is pre-seeded with empty dicts so that categories that never win, like 'C1', still appear in the result:
ex_dict = {'C1': {'I1': 1, 'I2': 1.5, 'I3': 2}, 'C2': {'I2': 2, 'I3': 3, 'I4': 3.5}, 'C3': {'I4': 2, 'I5': 4, 'I6': 3}, 'C4': {'I1': 2, 'I3': 3.5, 'I6': 5}, 'C5': {'I7': 1, 'I8': 1.5, 'I9': 2}}
temp_dict = {}
# pre-seed so that categories that never win still appear as empty dicts
new_ex_dict = {main_key: {} for main_key in ex_dict}
for main_key in ex_dict:
    for k, v in ex_dict[main_key].items():
        temp_dict.setdefault(k, []).append((main_key, v))
for k, v in temp_dict.items():
    max_value = max(v, key=lambda x: x[1])
    main_key = max_value[0]
    new_ex_dict[main_key][k] = max_value[1]
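For comparison, here is a compact single-pass sketch of the same idea; like max above, it keeps the first category seen when values tie:

best = {}
for main_key, inner in ex_dict.items():
    for k, v in inner.items():
        if k not in best or v > best[k][1]:
            best[k] = (main_key, v)

new_ex_dict = {main_key: {} for main_key in ex_dict}
for k, (main_key, v) in best.items():
    new_ex_dict[main_key][k] = v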
comp_dict = {'ap': {'val': 0.3, 'count': 3}, 'sd': {'val': 0.02, 'count': 1}, 'ao': {'val': 0.01, 'count': 1}}
avg_rate = {}
for value in comp_dict.keys():
    avg_rate[value] = comp_dict[value]['val'] / comp_dict[value]['count']
print(avg_rate[value])
It seems like the output I got only shows the average for the last element, and I am wondering how I can get the mean for all three elements.
The output I get now is just 0.01.
My desired output would be something like {'ap': 0.1, 'sd': 0.02, 'ao': 0.01}.
Thanks a lot!
You just made a little mistake when printing out the avg_rate value: you printed a single entry after the loop finished, so only the last average shows up. You can do this instead:
avg_rate = {}
for value in comp_dict.keys():
    avg_rate[value] = comp_dict[value]['val'] / comp_dict[value]['count']
print(avg_rate)
Given comp_dict = {'ap': {'val': 0.3, 'count': 3}, 'sd': {'val': 0.02, 'count': 1}, 'ao': {'val': 0.01, 'count': 1}}, the output is:
{'ap': 0.09999999999999999, 'sd': 0.02, 'ao': 0.01}
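The 0.09999999999999999 is just binary floating-point noise from 0.3 / 3; if you want a cleaner printout, one option is to round the quotient, e.g.:

avg_rate = {k: round(v['val'] / v['count'], 10) for k, v in comp_dict.items()}
print(avg_rate)
# {'ap': 0.1, 'sd': 0.02, 'ao': 0.01}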
I guess this would also work (the redundant inner for k2 loop from the original attempt only re-inserts the same key, so it can be dropped):
avg_rate = {k: comp_dict[k]['val'] / comp_dict[k]['count'] for k in comp_dict}
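Equivalently, iterating over items() avoids the repeated key lookups:

avg_rate = {k: v['val'] / v['count'] for k, v in comp_dict.items()}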
I have a JSON object that looks like this:
data = {'A': {'code': 'Ok',
'tracepoints': [None,
None,
{'alternatives_count': 0,
'location': [-122.419189, 37.753805],
'distance': 28.078003,
'hint': '5Qg7hUqpFQA2AAAAOgAAAAwAAAAPAAAAiVMWQq2VIEIAuABB7FgoQTYAAAA6AAAADAAAAA8AAAD4RAAACwi0-M0TQALvB7T4yRRAAgEAXwX5Wu6N',
'name': '23rd Street',
'matchings_index': 0,
'waypoint_index': 0},
{'alternatives_count': 0,
'location': [-122.417866, 37.75389],
'distance': 26.825184,
'hint': 'K8w6BRinFYAdAAAACwAAAA0AAAAAAAAAIxmmQTSs6kCiuRFBAAAAAB0AAAALAAAADQAAAAAAAAD4RAAANg20-CIUQAJNDbT4MRNAAgIAnxD5Wu6N',
'name': '23rd Street',
'matchings_index': 0,
'waypoint_index': 1},
{'alternatives_count': 0,
'location': [-122.416896, 37.75395],
'distance': 16.583412,
'hint': 'Jcw6BSzMOoUqAAAAQwAAABAAAAANAAAA0i_uQb3SOEKKPC9BG1EaQSoAAABDAAAAEAAAAA0AAAD4RAAAABG0-F4UQALyELT48xRAAgEAnxD5Wu6N',
'name': '23rd Street',
'matchings_index': 0,
'waypoint_index': 2},
{'alternatives_count': 7,
'location': [-122.415502, 37.754028],
'distance': 10.013916,
'hint': 'Jsw6hbN6kQBmAAAACAAAABAAAAANAAAAQOKOQg89nkCKPC9BEMcOQWYAAAAIAAAAEAAAAA0AAAD4RAAAcha0-KwUQAJ6FrT4UhRAAgEAbwX5Wu6N',
'name': '23rd Street',
'matchings_index': 0,
'waypoint_index': 3}],
'matchings': [{'duration': 50.6,
'distance': 325.2,
'weight': 50.6,
'geometry': 'y{h_gAh~znhF}#k[OmFMoFcAea#IeD[uMAYKsDMsDAe#}#u_#g#aTMwFMwFwAqq#',
'confidence': 0.374625,
'weight_name': 'routability',
'legs': [{'steps': [],
'weight': 18.8,
'distance': 116.7,
'annotation': {'nodes': [1974590926,
4763953263,
65359046,
4763953265,
5443374298,
2007343352]},
'summary': '',
'duration': 18.8},
{'steps': [],
'weight': 12.2,
'distance': 85.6,
'annotation': {'nodes': [5443374298,
2007343352,
4763953266,
65359043,
4763953269,
2007343354,
4763953270]},
'summary': '',
'duration': 12.2},
{'steps': [],
'weight': 19.6,
'distance': 122.9,
'annotation': {'nodes': [2007343354,
4763953270,
65334199,
4763953274,
2007343347]},
'summary': '',
'duration': 19.6}]}]},
'B': {'code': 'Ok',
'tracepoints': [{'alternatives_count': 0,
'location': [-122.387971, 37.727587],
'distance': 11.53267,
'hint': 'xHWRAEJ2kYALAAAArQAAAA4AAAAsAAAAnpH1QDVG8EJWgBdBa2v0QQsAAACtAAAADgAAACwAAAD4RAAA_YG0-GOtPwJKgrT4t60_AgIA3wf5Wu6N',
'name': 'Underwood Avenue',
'matchings_index': 0,
'waypoint_index': 0},
{'alternatives_count': 0,
'location': [-122.388563, 37.727175],
'distance': 13.565054,
'hint': 'w3WRgBuxOgVPAAAACAAAABMAAAASAAAA7ONaQo4CrUDv7U1BJdFAQU8AAAAIAAAAEwAAABIAAAD4RAAArX-0-MerPwIsgLT4gqs_AgIAbw35Wu6N',
'name': 'Jennings Street',
'matchings_index': 0,
'waypoint_index': 1},
{'alternatives_count': 1,
'location': [-122.388478, 37.725984],
'distance': 9.601917,
'hint': 't3WRABexOoWcAAAAbAAAABEAAAALAAAAdujYQqu4lUJXHD1B9-ruQJwAAABsAAAAEQAAAAsAAAD4RAAAAoC0-CCnPwJCgLT4Zqc_AgIAHxP5Wu6N',
'name': 'Wallace Avenue',
'matchings_index': 0,
'waypoint_index': 2}],
'matchings': [{'duration': 50,
'distance': 270.4,
'weight': 50,
'geometry': 'euu}fAd_~lhFoAlCMTuAvCvC|Bh#`#hXbUnAdADBhDzCzClCXVzZnW\\X~CnC~#qBLWnWej#',
'confidence': 1e-06,
'weight_name': 'routability',
'legs': [{'steps': [],
'weight': 17.8,
'distance': 84.8,
'annotation': {'nodes': [5443147626,
6360865540,
6360865536,
65307580,
6360865535,
6360865539,
6360865531]},
'summary': '',
'duration': 17.8},
{'steps': [],
'weight': 32.2,
'distance': 185.6,
'annotation': {'nodes': [6360865539,
6360865531,
6360865525,
65343521,
6360865527,
6360865529,
6360865523,
6360865520,
65321110,
6360865519,
6360865522,
6376329343]},
'summary': '',
'duration': 32.2}]}]},
'C': {'code': 'Ok',
'tracepoints': [None,
None,
{'alternatives_count': 0,
'location': [-122.443682, 37.713254],
'distance': 6.968076,
'hint': 'QXo6hUR6OgUAAAAANQAAAAAAAAAkAAAAAAAAAOCMMUEAAAAA_Z1yQQAAAAAbAAAAAAAAACQAAAD4RAAAXqiz-GZ1PwKiqLP4hnU_AgAAzxL5Wu6N',
'name': '',
'matchings_index': 0,
'waypoint_index': 0},
{'alternatives_count': 0,
'location': [-122.442428, 37.714335],
'distance': 16.488956,
'hint': 'E3o6BVRukYAJAAAAIgAAAGgAAAAUAAAA2RnSQL_5uUEPjI9CBTlaQQkAAAAiAAAAaAAAABQAAAD4RAAARK2z-J95PwKTrLP4b3k_AgEAXxX5Wu6N',
'name': 'Allison Street',
'matchings_index': 0,
'waypoint_index': 1},
{'alternatives_count': 1,
'location': [-122.441751, 37.712761],
'distance': 17.311636,
'hint': 'Fno6hRl6OgWZAAAANwAAAAAAAAAKAAAAH4vUQgKXFkIAAAAAXtbYQJkAAAA3AAAAAAAAAAoAAAD4RAAA6a-z-HlzPwKjsLP4q3M_AgAAHwr5Wu6N',
'name': 'Allison Street',
'matchings_index': 0,
'waypoint_index': 2}],
'matchings': [{'duration': 64.1,
'distance': 420.1,
'weight': 66.7,
'geometry': 'kuy|fAbyjphFcBxEmE`FqJkKiBqBuP}Qgc#ie#eAiAcB}ArA_Eb#mAjKkDnBo#fe#mOrw#kW',
'confidence': 7.3e-05,
'weight_name': 'routability',
'legs': [{'steps': [],
'weight': 40.1,
'distance': 235.2,
'annotation': {'nodes': [5440513673,
5440513674,
5440513675,
65363070,
1229920760,
65307726,
6906452420,
1229920717,
65361047,
1229920749,
554163599,
3978809925]},
'summary': '',
'duration': 37.5},
{'steps': [],
'weight': 26.6,
'distance': 184.9,
'annotation': {'nodes': [554163599, 3978809925, 65345518, 8256268328]},
'summary': '',
'duration': 26.6}]}]}}
I would like to extract the values under the key nodes per user (A, B and C) and store these values in a pandas dataframe, together with the corresponding user, like below:
value user
1974590926 A
4763953263 A
65359046 A
4763953265 A
5443374298 A
2007343352 A
5443374298 A
2007343352 A
4763953266 A
65359043 A
4763953269 A
2007343354 A
4763953270 A
2007343354 A
4763953270 A
65334199 A
4763953274 A
2007343347 A
5443147626 B
6360865540 B
6360865536 B
65307580 B
6360865535 B
6360865539 B
6360865531 B
6360865539 B
6360865531 B
6360865525 B
65343521 B
6360865527 B
6360865529 B
6360865523 B
6360865520 B
65321110 B
6360865519 B
6360865522 B
6376329343 B
5440513673 C
5440513674 C
5440513675 C
65363070 C
1229920760 C
65307726 C
6906452420 C
1229920717 C
65361047 C
1229920749 C
554163599 C
3978809925 C
554163599 C
3978809925 C
65345518 C
8256268328 C
With the code below I am able to extract and store the nodes belonging to user C in a pandas dataframe. However, I struggle to add the user column and the other nodes with their corresponding user. Any ideas?
import pandas as pd
values_df = pd.DataFrame({'value': []})
for leg in output['C']['matchings'][0]['legs']:
    result = leg['annotation']['nodes']
    values_temp = pd.DataFrame(result, columns=['value'])
    # note: DataFrame.append is deprecated in newer pandas; pd.concat is the replacement
    values_df = values_df.append(values_temp, ignore_index=True)
values_df.value = values_df.value.astype(int)
values_df
value
0 5440513673
1 5440513674
2 5440513675
3 65363070
4 1229920760
5 65307726
6 6906452420
7 1229920717
8 65361047
9 1229920749
10 554163599
11 3978809925
12 554163599
13 3978809925
14 65345518
15 8256268328
You can use json_normalize() with record_path and then concat() the users:
dfs = []
for user in output.keys():
    df = pd.json_normalize(output, record_path=[user, 'matchings', 'legs', 'annotation', 'nodes'])
    df['user'] = user
    dfs.append(df)
nodes_df = pd.concat(dfs).rename(columns={0: 'node'})
# node user
# 1974590926 A
# 4763953263 A
# 65359046 A
# ... ...
# 3978809925 C
# 65345518 C
# 8256268328 C
If there are some users with missing matchings, you can check if 'matchings' in output[user]:
dfs = []
for user in output.keys():
    if 'matchings' in output[user]:
        df = pd.json_normalize(output, record_path=[user, 'matchings', 'legs', 'annotation', 'nodes'])
        df['user'] = user
        dfs.append(df)
nodes_df = pd.concat(dfs).rename(columns={0: 'node'})
If the output keys are like ('2018-02-03', 'A') and you're iterating them as trip, you need to access its date and user as trip[0] and trip[1]:
dfs = []
for trip in output.keys():
    if 'matchings' in output[trip]:
        df = pd.json_normalize(output, record_path=[trip, 'matchings', 'legs', 'annotation', 'nodes'])
        df['date'] = trip[0]
        df['user'] = trip[1]
        dfs.append(df)
nodes_df = pd.concat(dfs).rename(columns={0: 'node'})
We want to pull all the node values out of legs. If you want the simplest way, with just for loops:
nodes = []
user = []
for i in output.keys():
    for j in output[i]['matchings'][0]['legs']:
        for k in j['annotation']['nodes']:
            nodes.append(k)
            user.append(i)
d = {'nodes': nodes, 'user': user}
df = pd.DataFrame(data=d)
print(df)
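The same triple loop can also be written as one comprehension feeding the DataFrame directly (a sketch against the same output structure as above):

records = [
    (node, user)
    for user in output
    for leg in output[user]['matchings'][0]['legs']
    for node in leg['annotation']['nodes']
]
df = pd.DataFrame(records, columns=['nodes', 'user'])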
You could use the jmespath module to extract the data before recombining it within the dataframe; you should get some speed-up, since the iteration happens inside the dictionary.
The summary for jmespath is: to access a key, use a dot; if the data is within a list, use [] to project over it.
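For example, on a made-up mini dict (hypothetical, just to illustrate the rule):

import jmespath

jmespath.search("a.b[].c", {"a": {"b": [{"c": 1}, {"c": 2}]}})
# returns [1, 2]

Applied to the data here: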
# pip install jmespath
import jmespath
from itertools import chain

query = {letter: jmespath.compile(f"{letter}.matchings[].legs[].annotation.nodes")
         for letter in ("A", "B", "C")}
result = {letter: pd.DataFrame(chain.from_iterable(expression.search(output)),
                               columns=['node'])
          for letter, expression in query.items()}
result = pd.concat(result).droplevel(-1).rename_axis(index='user').reset_index()
result.head(15)
user node
0 A 1974590926
1 A 4763953263
2 A 65359046
3 A 4763953265
4 A 5443374298
5 A 2007343352
6 A 5443374298
7 A 2007343352
8 A 4763953266
9 A 65359043
10 A 4763953269
11 A 2007343354
12 A 4763953270
13 A 2007343354
14 A 4763953270
I have some data similar to:
# Simulate some data
import pandas as pd

d = {
    "id": [1, 1, 1, 1, 1, 2, 2, 2, 2],
    "action_order": [1, 2, 3, 4, 5, 1, 2, 3, 4],
    "n_actions": [5, 5, 5, 5, 5, 4, 4, 4, 4],
    "seed": ['1', '2', '3', '4', '5', '10', '11', '12', '13'],
    "time_spent": [0.3, 0.4, 0.5, 0.6, 0.7, 10.1, 11.1, 12.1, 13.1],
}
data = pd.DataFrame(d)
I need a function that, for each row, returns the values from two columns (seed and time_spent) in that row and all previous rows within the group, as a dictionary. I have attempted to use the apply function as follows, but the results are not quite what I need.
data \
    .groupby(["id"])[["seed", "time_spent"]] \
    .apply(lambda x: dict(zip(x["seed"], x["time_spent"]))) \
    .tolist()

data \
    .groupby("id")[["seed", "time_spent", "action_order"]] \
    .apply(lambda x: dict(zip(list(x["seed"]), list(x["time_spent"]))))
The new DataFrame should look like this:
id new_col
0 1 {u'1': 0.3}
1 1 {u'1': 0.3, u'2': 0.4}
2 1 {u'1': 0.3, u'3': 0.5, u'2': 0.4}
...
You can keep a running dict and just return a copy of the most recent version on each apply iteration, per group:
def wrapper(g):
    cumdict = {}
    return g.apply(update_cumdict, args=(cumdict,), axis=1)

def update_cumdict(row, cd):
    cd[row.seed] = row.time_spent
    return cd.copy()

data["new_col"] = data.groupby("id").apply(wrapper).reset_index()[0]
data.new_col
0 {'1': 0.3}
1 {'1': 0.3, '2': 0.4}
2 {'1': 0.3, '2': 0.4, '3': 0.5}
3 {'1': 0.3, '2': 0.4, '3': 0.5, '4': 0.6}
4 {'1': 0.3, '2': 0.4, '3': 0.5, '4': 0.6, '5': ...
5 {'10': 10.1}
6 {'10': 10.1, '11': 11.1}
7 {'10': 10.1, '11': 11.1, '12': 12.1}
8 {'10': 10.1, '11': 11.1, '12': 12.1, '13': 13.1}
Name: new_col, dtype: object
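An alternative sketch that avoids sharing a mutable dict across rows, using itertools.accumulate (the initial= keyword needs Python 3.8+) to build the running dicts per group:

from itertools import accumulate

def running_dicts(g):
    pairs = zip(g["seed"], g["time_spent"])
    # each step merges the next (seed, time_spent) pair into the accumulated dict
    dicts = accumulate(pairs, lambda acc, kv: {**acc, kv[0]: kv[1]}, initial={})
    next(dicts)  # drop the empty initial accumulator
    return pd.Series(list(dicts), index=g.index)

data["new_col"] = data.groupby("id", group_keys=False).apply(running_dicts)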
How about this:
In [15]: data.groupby(['id']).apply(lambda d: pd.Series(np.arange(len(d))).apply(lambda x: d[['seed', 'time_spent']].iloc[:x+1].to_dict()))
Out[15]:
id
1 0 {'seed': {0: '1'}, 'time_spent': {0: 0.3}}
1 {'seed': {0: '1', 1: '2'}, 'time_spent': {0: 0...
2 {'seed': {0: '1', 1: '2', 2: '3'}, 'time_spent...
3 {'seed': {0: '1', 1: '2', 2: '3', 3: '4'}, 'ti...
4 {'seed': {0: '1', 1: '2', 2: '3', 3: '4', 4: '...
2 0 {'seed': {5: '10'}, 'time_spent': {5: 10.1}}
1 {'seed': {5: '10', 6: '11'}, 'time_spent': {5:...
2 {'seed': {5: '10', 6: '11', 7: '12'}, 'time_sp...
3 {'seed': {5: '10', 6: '11', 7: '12', 8: '13'},...
dtype: object
Additionally, you can change the orient argument of the .to_dict() method to alter the output dict style; see https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_dict.html
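For instance, orient='list' collapses each column to a plain list instead of an index-keyed dict:

data[['seed', 'time_spent']].iloc[:3].to_dict('list')
# {'seed': ['1', '2', '3'], 'time_spent': [0.3, 0.4, 0.5]}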
Or maybe this is what you want:
In [18]: data.groupby(['id']).apply(lambda d: pd.Series(np.arange(len(d))).apply(lambda x: dict(zip(d['seed'].iloc[:x+1], d['time_spent'].iloc[:x+1]))))
Out[18]:
id
1 0 {'1': 0.3}
1 {'1': 0.3, '2': 0.4}
2 {'1': 0.3, '2': 0.4, '3': 0.5}
3 {'1': 0.3, '2': 0.4, '3': 0.5, '4': 0.6}
4 {'1': 0.3, '2': 0.4, '3': 0.5, '4': 0.6, '5': ...
2 0 {'10': 10.1}
1 {'10': 10.1, '11': 11.1}
2 {'10': 10.1, '11': 11.1, '12': 12.1}
3 {'10': 10.1, '11': 11.1, '12': 12.1, '13': 13.1}
dtype: object
Let's say we have the following data
all_values = (('a', 0, 0.1), ('b', 1, 0.5), ('c', 2, 1.0))
from which we want to produce a list of dictionaries like so:
[{'location': 0, 'name': 'a', 'value': 0.1},
{'location': 1, 'name': 'b', 'value': 0.5},
{'location': 2, 'name': 'c', 'value': 1.0}]
What's the most elegant way to do this in Python?
The best solution I've been able to come up with is
>>> import itertools
>>> zipped = list(zip(itertools.repeat(('name', 'location', 'value')), all_values))
>>> zipped
[(('name', 'location', 'value'), ('a', 0, 0.1)),
(('name', 'location', 'value'), ('b', 1, 0.5)),
(('name', 'location', 'value'), ('c', 2, 1.0))]
>>> dicts = [dict(zip(*e)) for e in zipped]
>>> dicts
[{'location': 0, 'name': 'a', 'value': 0.1},
{'location': 1, 'name': 'b', 'value': 0.5},
{'location': 2, 'name': 'c', 'value': 1.0}]
It seems like a more elegant way to do this exists, probably using more of the tools in itertools.
How about:
In [8]: [{'location':l, 'name':n, 'value':v} for (n, l, v) in all_values]
Out[8]:
[{'location': 0, 'name': 'a', 'value': 0.1},
{'location': 1, 'name': 'b', 'value': 0.5},
{'location': 2, 'name': 'c', 'value': 1.0}]
or, if you prefer a more general solution:
In [12]: keys = ('name', 'location', 'value')
In [13]: [dict(zip(keys, values)) for values in all_values]
Out[13]:
[{'location': 0, 'name': 'a', 'value': 0.1},
{'location': 1, 'name': 'b', 'value': 0.5},
{'location': 2, 'name': 'c', 'value': 1.0}]
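If the records are consumed elsewhere in your code, a closely related option is a collections.namedtuple built from the same keys; _asdict() recovers the dict form whenever you need it:

from collections import namedtuple

Record = namedtuple('Record', ('name', 'location', 'value'))
records = [Record(*values) for values in all_values]
records[0]._asdict()
# {'name': 'a', 'location': 0, 'value': 0.1}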