Create new list of dictionary from two list of dictionaries - python

I have two lists of dictionaries and want to build a new list of dictionaries from them. dict1 holds the details of each person (pid, pname, pscore, sid) and dict2 holds each person's city details (pid, cid, cscore). Wherever a pid in dict1 matches a pid in dict2, I want to combine pid, pname, pscore, cid and cscore from both into an entry of new_dict. Any help will be appreciated. Thanks in advance.
dict1 = [{'pid': [7830351800, 8756822045, 7985031822, 8882181833],
'pname': ['ABC', 'XYZ', 'QWE', 'MNQ'],
'pscore': [0.8, 0.8, 0.8, 0.8],
'sid': 8690694}]
dict2 = [{'pid': 7830351800, 'cid': [1, 2, 3, 4], 'cscore': [0.8, 0.78, 0.7, 0.45]},
{'pid': 8756822045, 'cid': [5, 6, 7, 8], 'cscore': [0.9, 0.88, 0.8, 0.75]},
{'pid': 7985031822, 'cid': [9, 10, 11, 12], 'cscore': [0.5, 0.48, 0.3, 0.25]},
{'pid': 8882181833, 'cid': [2, 13, 14, 15], 'cscore': [0.6, 0.58, 0.5, 0.45]}]
new_dict = [{'pid': 7830351800,
'pname': 'ABC',
'pscore': 0.8,
'cid': [1, 2, 3, 4],
'cscore': [0.8, 0.78, 0.7, 0.45]},
{'pid': 8756822045,
'pname': 'XYZ',
'pscore': 0.8,
'cid': [5, 6, 7, 8],
'cscore': [0.9, 0.88, 0.8, 0.75]},
{'pid': 7985031822,
'pname': 'QWE',
'pscore': 0.8,
'cid': [9, 10, 11, 12],
'cscore': [0.5, 0.48, 0.3, 0.25]},
{'pid': 8882181833,
'pname': 'MNQ',
'pscore': 0.8,
'cid': [2, 13, 14, 15],
'cscore': [0.6, 0.58, 0.5, 0.45]}]
I tried the code below but ran into an error, and I don't understand how to fix it. I have just started learning Python:
new_dict = {}
for k, v in dict1[0].items():
    if v[0] in dict2[0]['pid']:
        new_dict = dict({'pid': v[0], 'pname' :v[0], 'pscore':v[0], 'cid':dict2[0]['cid'], 'cscore':dict2[0]['score']})
print(new_dict)

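# dict1 is a one-element list; work with the single dictionary inside it
dict1 = dict1[0]
# Lookup tables: pid -> pname and pid -> pscore
pname_dict = dict(zip(dict1['pid'], dict1['pname']))
pscore_dict = dict(zip(dict1['pid'], dict1['pscore']))
# Copy each inner dictionary so the entries of dict2 are not modified
ans = [d.copy() for d in dict2]
for d in ans:
    d['pname'] = pname_dict[d['pid']]
    d['pscore'] = pscore_dict[d['pid']]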
Output :
>> ans
[{'pid': 7830351800,
'cid': [1, 2, 3, 4],
'cscore': [0.8, 0.78, 0.7, 0.45],
'pname': 'ABC',
'pscore': 0.8},
{'pid': 8756822045,
'cid': [5, 6, 7, 8],
'cscore': [0.9, 0.88, 0.8, 0.75],
'pname': 'XYZ',
'pscore': 0.8},
{'pid': 7985031822,
'cid': [9, 10, 11, 12],
'cscore': [0.5, 0.48, 0.3, 0.25],
'pname': 'QWE',
'pscore': 0.8},
{'pid': 8882181833,
'cid': [2, 13, 14, 15],
'cscore': [0.6, 0.58, 0.5, 0.45],
'pname': 'MNQ',
'pscore': 0.8}]
Create two dictionaries that map pid -> pname and pid -> pscore, then use them to add the two extra keys to each entry taken from dict2.
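The same idea can be sketched with a single lookup table that maps each pid to its pname and pscore, building new dictionaries instead of modifying dict2. This is a minimal sketch that assumes the lists in dict1[0] are aligned by position; person_by_pid is an illustrative name, and any pid in dict2 that is missing from dict1 is skipped.

```python
dict1 = [{'pid': [7830351800, 8756822045, 7985031822, 8882181833],
          'pname': ['ABC', 'XYZ', 'QWE', 'MNQ'],
          'pscore': [0.8, 0.8, 0.8, 0.8],
          'sid': 8690694}]
dict2 = [{'pid': 7830351800, 'cid': [1, 2, 3, 4], 'cscore': [0.8, 0.78, 0.7, 0.45]},
         {'pid': 8756822045, 'cid': [5, 6, 7, 8], 'cscore': [0.9, 0.88, 0.8, 0.75]},
         {'pid': 7985031822, 'cid': [9, 10, 11, 12], 'cscore': [0.5, 0.48, 0.3, 0.25]},
         {'pid': 8882181833, 'cid': [2, 13, 14, 15], 'cscore': [0.6, 0.58, 0.5, 0.45]}]

people = dict1[0]

# One lookup table: pid -> {'pname': ..., 'pscore': ...}
# (assumes the pid, pname and pscore lists line up by index)
person_by_pid = {
    pid: {'pname': pname, 'pscore': pscore}
    for pid, pname, pscore in zip(people['pid'], people['pname'], people['pscore'])
}

# Build fresh dictionaries; entries of dict2 with an unknown pid are skipped
new_dict = [
    {'pid': d['pid'], **person_by_pid[d['pid']], 'cid': d['cid'], 'cscore': d['cscore']}
    for d in dict2
    if d['pid'] in person_by_pid
]

print(new_dict[0])
```

Because each result is a new dictionary, dict2 keeps its original three keys, which helps if the input is reused later.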

Related

Merge list of dict and add new values

I need to merge two lists of dicts on several conditions, without using pandas.
x1 holds the fact values:
x1 = [{'id': '94ffe1d6-0afa-11ec-b139-4cd98f4d62e3',
'fact_group': '0.05',
'probability': 0.2},
{'id': '86ae0229-0af8-11ec-be8c-4cd98f847094',
'fact_group': '0.05',
'probability': 0.56},
{'id': '867ef7ac-0af8-11ec-be8c-4cd98f847094',
'fact_group': '0.2',
'probability': 0.31},
{'id': '211bc00c-0af6-11ec-b139-4cd98f4d62e3',
'fact_group': '0.2',
'probability': 0.96}]
x2 is a list of dicts that maps probability intervals to labels:
x2 = [{'group': 0.05,
'predict_labels': 0,
'predict_intervals_min': 0.00,
'predict_intervals_max': 0.6},
{'group': 0.05,
'predict_labels': 1,
'predict_intervals_min': 0.6,
'predict_intervals_max': 1.0},
{'group': 0.2,
'predict_labels': 2,
'predict_intervals_min': 0.0,
'predict_intervals_max': 0.45},
{'group': 0.2,
'predict_labels': 3,
'predict_intervals_min': 0.45,
'predict_intervals_max': 1.0}]
I need to merge them where x1['fact_group'] matches x2['group']
and x1['probability'] >= x2['predict_intervals_min']
and x1['probability'] < x2['predict_intervals_max']
Expected: x1 updated with x2['predict_labels'], chosen by group and threshold:
x3 = [{'id': '94ffe1d6-0afa-11ec-b139-4cd98f4d62e3',
'fact_group': '0.05',
'predict_labels': 0,
'probability': 0.2},
{'id': '86ae0229-0af8-11ec-be8c-4cd98f847094',
'fact_group': '0.05',
'predict_labels': 0,
'probability': 0.56},
{'id': '867ef7ac-0af8-11ec-be8c-4cd98f847094',
'fact_group': '0.2',
'predict_labels': 2,
'probability': 0.31},
{'id': '211bc00c-0af6-11ec-b139-4cd98f4d62e3',
'fact_group': '0.2',
'predict_labels': 3,
'probability': 0.96}]
I suggest this simple solution:
result = []
for y1 in x1:
    for y2 in x2:
        # fact_group is a string while group is a float, so compare as strings
        if (y1['fact_group'] == str(y2['group'])) and \
           (y1['probability'] >= y2['predict_intervals_min']) and \
           (y1['probability'] < y2['predict_intervals_max']):
            d = dict(y1)  # copy so the entries of x1 are left unchanged
            d['predict_labels'] = y2['predict_labels']
            result.append(d)
print(result)

Transform JSON file to Data Frame in Python

I have a text file with a JSON-like structure that I want to transform into a DataFrame.
The file contains several lines like this one:
{'cap': {'english': 0.1000, 'universal': 0.225}, 'display_scores': {'english': {'astroturf': 0.5, 'fake_follower': 0.8, 'financial': 0.2, 'other': 1.8, 'overall': 1.8, 'self_declared': 0.0, 'spammer': 0.2}, 'universal': {'astroturf': 0.4, 'fake_follower': 0.2, 'financial': 0.2, 'other': 0.4, 'overall': 0.8, 'self_declared': 0.0, 'spammer': 0.0}}, 'raw_scores': {'english': {'astroturf': 0.1, 'fake_follower': 0.16, 'financial': 0.05, 'other': 0.35, 'overall': 0.35, 'self_declared': 0.0, 'spammer': 0.04}, 'universal': {'astroturf': 0.07, 'fake_follower': 0.03, 'financial': 0.05, 'other': 0.09, 'overall': 0.16, 'self_declared': 0.0, 'spammer': 0.01}}, 'user': {'majority_lang': 'de', 'user_data': {'id_str': '123456', 'screen_name': 'beispiel01'}}}
import json
import pandas as pd
tweets_data_path = "data.txt"
tweets_data = []
tweets_file = open(tweets_data_path, "r")
for line in tweets_file:
    try:
        tweet = json.loads(line)
        tweets_data.append(tweet)
    except:
        continue
tweets_data
df = pd.DataFrame.from_dict(pd.json_normalize(tweets_data), orient='columns')
df
However, something is apparently wrong with either json.loads or the append call, because tweets_data is empty when I inspect it.
Do you have an idea?
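This version appends every line to tweets_data. Note that json.loads(json.dumps(line)) only round-trips the line through a JSON string, so each element is still a str rather than a dict, as the output shows:
import json
tweets_data_path = "data.txt"
tweets_data = []
with open(tweets_data_path, 'r') as f:
    for line in f.readlines():
        try:
            # json.dumps turns the line into a JSON string and json.loads
            # turns it straight back, so tweet is the original str
            tweet = json.loads(json.dumps(line))
            tweets_data.append(tweet)
        except:
            continue
print(tweets_data)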
["{'cap': {'english': 0.1000, 'universal': 0.225}, 'display_scores': {'english': {'astroturf': 0.5, 'fake_follower': 0.8, 'financial': 0.2, 'other': 1.8, 'overall': 1.8, 'self_declared': 0.0, 'spammer': 0.2}, 'universal': {'astroturf': 0.4, 'fake_follower': 0.2, 'financial': 0.2, 'other': 0.4, 'overall': 0.8, 'self_declared': 0.0, 'spammer': 0.0}}, 'raw_scores': {'english': {'astroturf': 0.1, 'fake_follower': 0.16, 'financial': 0.05, 'other': 0.35, 'overall': 0.35, 'self_declared': 0.0, 'spammer': 0.04}, 'universal': {'astroturf': 0.07, 'fake_follower': 0.03, 'financial': 0.05, 'other': 0.09, 'overall': 0.16, 'self_declared': 0.0, 'spammer': 0.01}}, 'user': {'majority_lang': 'de', 'user_data': {'id_str': '123456', 'screen_name': 'beispiel01'}}}\n", "{'cap': {'english': 0.1000, 'universal': 0.225}, 'display_scores': {'english': {'astroturf': 0.5, 'fake_follower': 0.8, 'financial': 0.2, 'other': 1.8, 'overall': 1.8, 'self_declared': 0.0, 'spammer': 0.2}, 'universal': {'astroturf': 0.4, 'fake_follower': 0.2, 'financial': 0.2, 'other': 0.4, 'overall': 0.8, 'self_declared': 0.0, 'spammer': 0.0}}, 'raw_scores': {'english': {'astroturf': 0.1, 'fake_follower': 0.16, 'financial': 0.05, 'other': 0.35, 'overall': 0.35, 'self_declared': 0.0, 'spammer': 0.04}, 'universal': {'astroturf': 0.07, 'fake_follower': 0.03, 'financial': 0.05, 'other': 0.09, 'overall': 0.16, 'self_declared': 0.0, 'spammer': 0.01}}, 'user': {'majority_lang': 'de', 'user_data': {'id_str': '123456', 'screen_name': 'beispiel01'}}}"]
Instead of loading JSON into a dictionary and then converting that dictionary into a pandas DataFrame, you can use pandas' built-in reader. This requires the file to contain valid JSON (double-quoted strings); pass lines=True when there is one JSON object per line:
df = pd.read_json(tweets_data_path, lines=True)
Alternatively, if the file holds a single valid JSON document, load it into a dictionary and then convert the dictionary to a DataFrame:
tweets_data = json.loads(tweets_file.read())
df = pd.DataFrame.from_dict(tweets_data, orient='columns')
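The sample lines in the question use single quotes, so they are Python dict literals rather than valid JSON; json.loads rejects them, and a bare except hides that error. A minimal sketch, assuming every line is a trusted Python literal, parses them with ast.literal_eval and records failures instead of discarding them. parse_lines and the shortened sample are illustrative, not part of the original code.

```python
import ast


def parse_lines(lines):
    """Parse each non-empty line as a Python literal.

    Returns (parsed, failed), where failed holds (line_number, exception)
    pairs so that bad lines are visible instead of silently skipped.
    """
    parsed, failed = [], []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            parsed.append(ast.literal_eval(line))
        except (ValueError, SyntaxError) as exc:
            failed.append((number, exc))
    return parsed, failed


# Shortened stand-in for the file contents (assumption: one literal per line)
sample = [
    "{'cap': {'english': 0.1000, 'universal': 0.225}, 'user': {'majority_lang': 'de'}}\n",
    "not a dict\n",
]

tweets_data, errors = parse_lines(sample)
print(tweets_data)
print(errors)
```

With a real file, the same function could be called as parse_lines(f) inside a with open(tweets_data_path) block, and the resulting list of dicts passed to pd.json_normalize to flatten the nested keys into columns.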

Fill a python dictionary with values from a pandas dataFrame

This is my dictionary, called "reviews":
reviews= {1: {'like', 'the', 'acting'},
2: {'hate', 'plot', 'story'}}
And this is my "lexicon" DataFrame:
import pandas as pd
lexicon = {'word': ['like', 'movie', 'hate'],
'neg': [0.0005, 0.0014, 0.0029],
'pos': [0.0025, 0.0019, 0.0002]
}
lexicon = pd.DataFrame(lexicon, columns = ['word', 'neg','pos'])
print (lexicon)
I need to fill my "reviews" dictionary with the neg and pos values from the "lexicon" DataFrame.
If a word has no entry in the lexicon, I want to use 0.5 for both values.
The final outcome should be:
reviews= {1: {'like': [0.0005, 0.0025], 'the': [0.5, 0.5], 'acting': [0.5, 0.5]},
2: {'plot': [0.5, 0.5], 'hate': [0.0029, 0.0002], 'story': [0.5, 0.5]}}
You can use df.reindex here.
df_ = lexicon.set_index("word").agg(list, axis=1)
out = {k: df_.reindex(v, fill_value=[0.5, 0.5]).to_dict() for k, v in reviews.items()}
# {1: {'the': [0.5, 0.5], 'like': [0.0005, 0.0025], 'acting': [0.5, 0.5]},
# 2: {'story': [0.5, 0.5], 'hate': [0.0029, 0.0002], 'plot': [0.5, 0.5]}}
Alternatively, build a dictionary from the lexicon, then use a nested dictionary comprehension with dict.get to look up each word and fall back to a default value when there is no match:
d = lexicon.set_index('word').agg(list, axis=1).to_dict()
print (d)
{'like': [0.0005, 0.0025], 'movie': [0.0014, 0.0019], 'hate': [0.0029, 0.0002]}
out = {k: {x: d.get(x, [0.5,0.5]) for x in v} for k, v in reviews.items()}
print (out)
{1: {'like': [0.0005, 0.0025], 'the': [0.5, 0.5], 'acting': [0.5, 0.5]},
2: {'story': [0.5, 0.5], 'hate': [0.0029, 0.0002], 'plot': [0.5, 0.5]}}
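The dict.get lookup does not need pandas at all. A minimal sketch, assuming the lexicon is still available as the plain dict of lists used to build the DataFrame (named lexicon_raw here for illustration):

```python
reviews = {1: {'like', 'the', 'acting'},
           2: {'hate', 'plot', 'story'}}

# Assumption: the raw lexicon data before it was turned into a DataFrame
lexicon_raw = {'word': ['like', 'movie', 'hate'],
               'neg': [0.0005, 0.0014, 0.0029],
               'pos': [0.0025, 0.0019, 0.0002]}

# word -> [neg, pos]
scores = {
    word: [neg, pos]
    for word, neg, pos in zip(lexicon_raw['word'], lexicon_raw['neg'], lexicon_raw['pos'])
}

DEFAULT = [0.5, 0.5]

# list(DEFAULT) gives every missing word its own list, so changing one
# entry later does not affect the others
out = {
    key: {word: scores.get(word, list(DEFAULT)) for word in words}
    for key, words in reviews.items()
}

print(out)
```

Since the review words are stored in sets, the order of the keys inside each inner dictionary is not guaranteed.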

How to do math manipulations on python dictionaries?

I have a dictionary of totals:
ex_dict_tot={'recency': 12, 'frequency': 12, 'money': 12}
and another dictionary of counts:
ex_dict_count= {'recency': {'current': 4, 'savings': 2, 'fixed': 6},
'frequency': {'freq': 10, 'infreq': 2},
'money': {'med': 2, 'high': 8, 'low': 1, 'md': 1}}
I would like to calculate the proportion of each value within its key. For the key recency:
current = 4/12
savings = 2/12
fixed = 6/12
Similarly, for the key frequency:
freq = 10/12
infreq = 2/12
The required output would be:
{'recency': {'current': 0.3, 'savings': 0.16, 'fixed': 0.5},
'frequency': {'freq': 0.83, 'infreq': 0.16},
'money': {'med': 0.16, 'high': 0.6, 'low': 0.08, 'md': 0.08}}
Could you please write your suggestions/inputs on it?
You can do this with dict comprehension.
out = {key:{k:v/ex_dict_tot[key] for k,v in val.items()} for key,val in ex_dict_count.items()}
out
{'recency': {'current': 0.3333333333333333, 'savings': 0.16666666666666666, 'fixed': 0.5},
'frequency': {'freq': 0.8333333333333334, 'infreq': 0.16666666666666666},
'money': {'med': 0.16666666666666666, 'high': 0.6666666666666666, 'low': 0.08333333333333333, 'md': 0.08333333333333333}}
Use round to round the values to two decimal places.
out = {key:{k:round(v/ex_dict_tot[key],2) for k,v in val.items()} for key,val in ex_dict_count.items()}
out
{'recency': {'current': 0.33, 'savings': 0.17, 'fixed': 0.5},
'frequency': {'freq': 0.83, 'infreq': 0.17},
'money': {'med': 0.17, 'high': 0.67, 'low': 0.08, 'md': 0.08}}
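If a separate totals dictionary is not available, the totals can be derived from the counts themselves. A minimal sketch, assuming each inner dictionary's values sum to the intended total (true for the sample, where every group sums to 12); proportions is a hypothetical helper name, and a group that sums to zero yields 0.0 instead of raising ZeroDivisionError.

```python
ex_dict_count = {'recency': {'current': 4, 'savings': 2, 'fixed': 6},
                 'frequency': {'freq': 10, 'infreq': 2},
                 'money': {'med': 2, 'high': 8, 'low': 1, 'md': 1}}


def proportions(counts, ndigits=2):
    """Return each inner value divided by the sum of its group, rounded."""
    result = {}
    for key, group in counts.items():
        total = sum(group.values())
        result[key] = {
            name: (round(value / total, ndigits) if total else 0.0)
            for name, value in group.items()
        }
    return result


out = proportions(ex_dict_count)
print(out)
```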

Filtering a list of dictionaries using its values in python

I have a list of dictionaries that I would like to filter in order to build a vector from its values. The list contains several entries, each with the fields time, item and state: {value1, value2, value3}. The variable item can take the following 11 values: [0.0, 0.1, 0.2, 0.3 ... 1.0]. For each of those values I would like to find the corresponding value3 and create a vector of 11 elements, where each element is the value3 of the associated item. For example, if my list is:
my_json = [{'time': datetime.datetime(2018, 7, 4, 13, 42, 55, 613000), 'item': 0.3, 'state': {'value1': 0.0, 'value2': 0.167, 'value3': 0.833}},
{'time': datetime.datetime(2018, 7, 6, 9, 40, 54, 44000), 'item': 0.6, 'state': {'value1': 0.0, 'value2': 0.273, 'value3': 0.727}},
{'time': datetime.datetime(2018, 7, 6, 10, 0, 16, 507000), 'item': 0.5, 'state': {'value1': 0.0, 'value2': 0.0, 'value3': 1.0}},
{'time': datetime.datetime(2018, 7, 6, 10, 37, 16, 769000), 'item': 0.5, 'state': {'value1': 0.0, 'value2': 0.0, 'value3': 1.0}},
{'time': datetime.datetime(2018, 7, 6, 10, 38, 28, 948000), 'item': 0.5, 'state': {'value1': 0.0, 'value2': 0.143, 'value3': 0.857}},
{'time': datetime.datetime(2018, 7, 6, 10, 41, 11, 201000), 'item': 0.4, 'state': {'value1': 0.0, 'value2': 0.091, 'value3': 0.909}},
{'time': datetime.datetime(2018, 7, 6, 11, 45, 25, 145000), 'item': 0.1, 'state': {'value1': 0.0, 'value2': 0.083, 'value3': 0.917}},
{'time': datetime.datetime(2018, 7, 6, 11, 46, 31, 508000), 'item': 0.1, 'state': {'value1': 0.0, 'value2': 0.0, 'value3': 1.0}},
{'time': datetime.datetime(2018, 7, 6, 11, 46, 33, 120000), 'item': 0.1, 'state': {'value1': 0.0, 'value2': 0.214, 'value3': 0.786}},
{'time': datetime.datetime(2018, 7, 6, 12, 36, 25, 695000), 'item': 0.0, 'state': {'value1': 0.0, 'value2': 0.0, 'value3': 1.0}},
{'time': datetime.datetime(2018, 7, 6, 12, 37, 35, 721000), 'item': 0.0, 'state': {'value1': 0.0, 'value2': 0.0, 'value3': 1.0}}]
The desired output for the above example is [1.0, 0.786, 0.0, 0.833, 0.909, 0.857, 0.727, 0.0, 0.0, 0.0, 0.0]: when an item appears more than once, only its most recent value (by time) is kept, and items that never appear get 0.0. I have tried to solve it using if-else statements, but I would like a more elegant solution.
Create a dictionary whose keys are the item values. Loop through my_json, assigning value3 to the corresponding key, so that later entries overwrite earlier ones.
d = {}
for i in my_json:
    d[i['item']] = i['state']['value3']
I'm assuming the list is already sorted by timestamp; if not, sort the list first.
I've extended the above solution to include sorting by time in case it is needed:
sorted_list = sorted(my_json, key=lambda k: k['time'])
d = {}
for i in sorted_list:
    d[i['item']] = i['state']['value3']
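The dictionary still has to be turned into the 11-element vector. A minimal sketch, assuming items are multiples of 0.1 between 0.0 and 1.0 and that items never seen default to 0.0; latest_value3_vector is a hypothetical helper, and the entries are a shortened, deliberately unsorted subset of the question's data. Items are keyed by the integer round(item * 10) to avoid relying on exact float equality.

```python
import datetime

entries = [
    {'time': datetime.datetime(2018, 7, 6, 10, 38, 28, 948000), 'item': 0.5,
     'state': {'value1': 0.0, 'value2': 0.143, 'value3': 0.857}},
    {'time': datetime.datetime(2018, 7, 4, 13, 42, 55, 613000), 'item': 0.3,
     'state': {'value1': 0.0, 'value2': 0.167, 'value3': 0.833}},
    {'time': datetime.datetime(2018, 7, 6, 10, 0, 16, 507000), 'item': 0.5,
     'state': {'value1': 0.0, 'value2': 0.0, 'value3': 1.0}},
    {'time': datetime.datetime(2018, 7, 6, 12, 37, 35, 721000), 'item': 0.0,
     'state': {'value1': 0.0, 'value2': 0.0, 'value3': 1.0}},
]


def latest_value3_vector(records, n_items=11, default=0.0):
    """Return value3 of the most recent record for items 0.0, 0.1, ..., 1.0."""
    latest = {}
    # Sort oldest first so that newer records overwrite older ones
    for record in sorted(records, key=lambda r: r['time']):
        index = round(record['item'] * 10)  # 0.3 -> 3, 0.5 -> 5, ...
        latest[index] = record['state']['value3']
    return [latest.get(i, default) for i in range(n_items)]


vector = latest_value3_vector(entries)
print(vector)
```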
