dictionary from df with columns within a key - python

I have a DataFrame as follows:
data = [['a', 10, 1], ['b', 15,12], ['c', 14,12]]
df = pd.DataFrame(data, columns = ['Name', 'x', 'y'])
Name x y
0 a 10 1
1 b 15 12
2 c 14 12
Now I want to convert it to a dict where x and y are nested under a key called total, so the final dict would look like this:
{
'Name': 'a',
"total": {
"x": 308,
"y": 229
},
}
I know I can use df.to_dict('records') to get this dict:
{
'Name': 'a',
"x": 308,
"y": 229
}
Any tips?

You could try
my_dict = [{'Name': row['Name'], 'total': {'x': row['x'], 'y': row['y']}} for row in df.to_dict('records')]
Result:
[{'Name': 'a', 'total': {'x': 10, 'y': 1}}, {'Name': 'b', 'total': {'x': 15, 'y': 12}}, {'Name': 'c', 'total': {'x': 14, 'y': 12}}]
Or, if you wish to move all columns except 'Name' under 'total', and provided that there are no repetitions in 'Name':
df.set_index('Name', inplace=True)
result = [{'Name': name, 'total': total} for name, total in df.to_dict('index').items()]
With the same result as before.
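If the set of columns is not known in advance, the same nesting can be done generically on the records themselves, which also works when 'Name' contains repeats. This is a minimal sketch, assuming the list below is what df.to_dict('records') returns for the sample frame; the helper name nest_under and its parameters are made up for illustration:

```python
def nest_under(records, keep=('Name',), nested_key='total'):
    """Move every field not listed in `keep` under `nested_key`."""
    result = []
    for row in records:
        outer = {k: v for k, v in row.items() if k in keep}
        outer[nested_key] = {k: v for k, v in row.items() if k not in keep}
        result.append(outer)
    return result

# Hard-coded stand-in for df.to_dict('records')
records = [
    {'Name': 'a', 'x': 10, 'y': 1},
    {'Name': 'b', 'x': 15, 'y': 12},
    {'Name': 'c', 'x': 14, 'y': 12},
]

nested = nest_under(records)
print(nested[0])  # {'Name': 'a', 'total': {'x': 10, 'y': 1}}
```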

Related

How to convert a list of nested dictionaries (includes tuples) as a dataframe

I have a piece of code which generates a list of nested dictionaries like below:
[{'cb': ({'Name': 'A', 'ID': 1, 'num': 50},
{'Name': 'A', 'ID': 2, 'num': 68}),
'final_value': 118},
{'cb': ({'Name': 'A', 'ID': 1, 'num': 50},
{'Name': 'A', 'ID': 4, 'num': 67}),
'final_value': 117},
{'cb': ({'Name': 'A', 'ID': 1, 'num': 50},
{'Name': 'A', 'ID': 6, 'num': 67}),
'final_value': 117}]
I want to convert the dictionary into a dataframe like below
How can I do it using Python?
I have tried the below piece of code
merge_values = [{'cb': ({'Name': 'A', 'ID': 1, 'num': 50},
{'Name': 'A', 'ID': 2, 'num': 68}),
'final_value': 118},
{'cb': ({'Name': 'A', 'ID': 1, 'num': 50},
{'Name': 'A', 'ID': 4, 'num': 67}),
'final_value': 117},
{'cb': ({'Name': 'A', 'ID': 1, 'num': 50},
{'Name': 'A', 'ID': 6, 'num': 67}),
'final_value': 117}]
test = pd.DataFrame()
i = 0
for match in merge_values:
    final_value = match['final_value']
    comb = match['cb']
    i = i+1
    for t in comb:
        dct = {k:[v] for k,v in t.items()}
        x = pd.DataFrame(dct)
        x['merge_id'] = i
        x['Final_Value'] = final_value
        test = pd.concat([test, x])
The problem with this piece of code is it adds the rows one below another. I need the elements of the tuple next to each other.
You will need to clean your data by creating a new dict with the structure that you want, like this:
import pandas as pd
dirty_data = [{'cb': ({'Name': 'A', 'ID': 1, 'num': 50},
{'Name': 'A', 'ID': 2, 'num': 68}),
'final_value': 118},
{'cb': ({'Name': 'A', 'ID': 1, 'num': 50},
{'Name': 'A', 'ID': 4, 'num': 67}),
'final_value': 117},
{'cb': ({'Name': 'A', 'ID': 1, 'num': 50},
{'Name': 'A', 'ID': 6, 'num': 67}),
'final_value': 117}]
def clean_data(dirty_data: list) -> dict:
    # One list per output column
    names = []
    ids = []
    nums = []
    m_ids = []
    m_nums = []
    finals = []
    for cb in dirty_data:
        # First tuple element supplies Name, ID and num
        names.append(cb["cb"][0]["Name"])
        ids.append(cb["cb"][0]["ID"])
        nums.append(cb["cb"][0]["num"])
        # Second tuple element supplies the M_ columns
        m_ids.append(cb["cb"][1]["ID"])
        m_nums.append(cb["cb"][1]["num"])
        finals.append(cb["final_value"])
    return {"Name": names, "ID": ids, "num": nums, "M_ID": m_ids, "M_num": m_nums, "Final": finals}
df = pd.DataFrame(clean_data(dirty_data))
df
You could try to read the data into a dataframe as is and then restructure it until you get the desired result, but in this case, it doesn't seem practical.
Instead, I'd flatten the input into a list of lists to pass to pd.DataFrame. Here is a relatively concise way to do that with your sample data:
from operator import itemgetter
import pandas as pd
data = [{'cb': ({'Name': 'A', 'ID': 1, 'num': 50},
{'Name': 'A', 'ID': 2, 'num': 68}),
'final_value': 118},
{'cb': ({'Name': 'A', 'ID': 1, 'num': 50},
{'Name': 'A', 'ID': 4, 'num': 67}),
'final_value': 117},
{'cb': ({'Name': 'A', 'ID': 1, 'num': 50},
{'Name': 'A', 'ID': 6, 'num': 67}),
'final_value': 117}]
keys = ['Name', 'ID', 'num', 'M_Name', 'M_ID', 'M_num', 'final_value']
# generates ['A', 1, 50, 'A', 2, 68, 118] etc.
flattened = ([value for item in row['cb']
              for value in itemgetter(*keys[:3])(item)]
             + [row['final_value']]
             for row in data)
df = pd.DataFrame(flattened)
df.columns = keys
# get rid of superfluous M_Name column
df.drop('M_Name', axis=1, inplace=True)
itemgetter(*keys[:3])(item) returns the same values as [item[k] for k in keys[:3]], only as a tuple instead of a list. On flattening lists of lists with list (or generator) comprehensions, see How do I make a flat list out of a list of lists?.
Result:
Name ID num M_ID M_num final_value
0 A 1 50 2 68 118
1 A 1 50 4 67 117
2 A 1 50 6 67 117
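The same flattening can also be written so that each row becomes a dict, which pd.DataFrame accepts as a list of records. Here is a rough sketch in plain Python, assuming every 'cb' tuple has exactly two entries as in the sample; the function name flatten_row and the way the 'M_' prefix is applied are illustrative choices, not part of the answers above:

```python
def flatten_row(row, prefix='M_'):
    """Place the two 'cb' dicts side by side in one flat dict."""
    first, second = row['cb']          # assumes exactly two entries
    flat = dict(first)                 # Name, ID, num from the first dict
    for key, value in second.items():
        if key != 'Name':              # Name repeats, so it is skipped
            flat[prefix + key] = value
    flat['final_value'] = row['final_value']
    return flat

data = [
    {'cb': ({'Name': 'A', 'ID': 1, 'num': 50},
            {'Name': 'A', 'ID': 2, 'num': 68}),
     'final_value': 118},
    {'cb': ({'Name': 'A', 'ID': 1, 'num': 50},
            {'Name': 'A', 'ID': 4, 'num': 67}),
     'final_value': 117},
]

rows = [flatten_row(row) for row in data]
print(rows[0])
# {'Name': 'A', 'ID': 1, 'num': 50, 'M_ID': 2, 'M_num': 68, 'final_value': 118}
```

Passing rows to pd.DataFrame(rows) should then give one column per key, in the order the keys were inserted.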

Replace duplicate value - Python list

Input:
data = [
{'name': 'A', 'value': 19, 'no': 1},
{'name': 'B', 'value': 5, 'no': 2},
{'name': 'A', 'value': 19, 'no': 3}
]
request_change_data = [
{'name': 'A', 'value': 35, 'no': 1},
{'name': 'B', 'value': 10, 'no': 2},
{'name': 'A', 'value': 40, 'no': 3}
]
expected_result:
data = [
{'name': 'A', 'value': 35, 'no': 1},
{'name': 'B', 'value': 10, 'no': 2},
{'name': 'A', 'value': 40, 'no': 3}
]
But actual:
[
{'name': 'A', 'value': 40, 'no': 1},
{'name': 'B', 'value': 10, 'no': 2},
{'name': 'A', 'value': 40, 'no': 3}
]
My code is:
data = [{'name': 'A', 'value': 19, 'no': 1}, {'name': 'B', 'value': 5, 'no': 2}, {'name': 'A', 'value': 19, 'no': 3}]
requests = [{'name': 'A', 'value': 35, 'no': 1}, {'name': 'B', 'value': 10, 'no': 2}, {'name': 'A', 'value': 40, 'no': 3}]
def test(data, requests):
    for k, v in enumerate(data):
        for request in requests:
            if v['name'] == request['name']:
                v['value'] = request['value']
    return data
print(test(data, requests))
How can I update the duplicates, no 1 and no 3, separately? I used a for loop to update the value of the key, but both entries always end up with the value from no 3, which is 40.
Please help. Thanks in advance.
Each time you iterate through data, you then iterate over all of the request dictionaries, and your code only checks the name fields for each dictionary and then updates the value field in the dict from data if they match.
However, you have multiple dictionaries in requests with the same name, so if you were working on the first data dict:
{'name': 'A', 'value': 19, 'no': 1}
You'd get this in for request in requests:
Iteration 1: request = {'name': 'A', 'value': 35, 'no': 1},
Iteration 2: request = {'name': 'B', 'value': 10, 'no': 2},
Iteration 3: request = {'name': 'A', 'value': 40, 'no': 3}
So you'd end up updating the data dict twice, first with v['value'] = 35 and then with v['value'] = 40.
So for your data, you want to check both name and no in the dicts and if they both match, then update the fields. Here's a fixed version of your code that does that:
data = [{'name': 'A', 'value': 19, 'no': 1}, {'name': 'B', 'value': 5, 'no': 2}, {'name': 'A', 'value': 19, 'no': 3}]
requests = [{'name': 'A', 'value': 35, 'no': 1}, {'name': 'B', 'value': 10, 'no': 2}, {'name': 'A', 'value': 40, 'no': 3}]
# You didn't seem to need the idx from enumerating so I removed it
# You also don't need to return data because lists/dicts are mutable
# types so you're modifying the actual dicts you pass in
def test(data, requests):
    for d in data:
        for request in requests:
            if d['name'] == request['name'] and d['no'] == request['no']:
                d['value'] = request['value']
test(data, requests)
print(data)
And I get this output, which is what you expected:
[
{'name': 'A', 'value': 35, 'no': 1},
{'name': 'B', 'value': 10, 'no': 2},
{'name': 'A', 'value': 40, 'no': 3}
]
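For longer lists, the nested loop can be replaced by a lookup table keyed on the (name, no) pair. This is only a sketch, under the assumption that each (name, no) pair appears at most once in requests; the function name apply_requests is made up here:

```python
def apply_requests(data, requests):
    """Update 'value' in each data dict from the request with the same name and no."""
    # Build the lookup once: (name, no) -> new value
    new_values = {(r['name'], r['no']): r['value'] for r in requests}
    for d in data:
        key = (d['name'], d['no'])
        if key in new_values:
            d['value'] = new_values[key]

data = [{'name': 'A', 'value': 19, 'no': 1},
        {'name': 'B', 'value': 5, 'no': 2},
        {'name': 'A', 'value': 19, 'no': 3}]
requests = [{'name': 'A', 'value': 35, 'no': 1},
            {'name': 'B', 'value': 10, 'no': 2},
            {'name': 'A', 'value': 40, 'no': 3}]

apply_requests(data, requests)
print(data)
```

Each dict in data is visited once and matched with a single dictionary lookup, so the two entries named 'A' can no longer overwrite each other.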

find cells with specific value and replace its value

Using pandas, I have created a CSV file containing 2 columns and saved my data into these columns, something like this:
first second
{'value': 2} {'name': 'f'}
{'value': 2} {'name': 'h'}
{"value": {"data": {"n": 2, "m":"f"}}} {'name': 'h'}
...
Is there any way to look for all the rows whose first column contains "data" and, if there are any, keep only that value in the cell? I mean, is it possible to change my third row from:
{"value": {"data": {"n": 2, "m":"f"}}} {'name': 'h'}
to something like this:
{"data": {"n": 2, "m":"f"}} {'name': 'h'}
and delete or replace the value of all other cells that do not contain data with something like -?
So my csv file will look like this:
first second
- {'name': 'f'}
- {'name': 'h'}
{"data": {"n": 2, "m":"f"}} {'name': 'h'}
...
Here is my code:
import json
import pandas as pd
result = []
for line in open('file.json', 'r'):
    result.append(json.loads(line))
df = pd.DataFrame(result)
print(df)
df.to_csv('document.csv')
f = pd.read_csv("document.csv")
keep_col = ['first', 'second']
new_f = f[keep_col]
new_f.to_csv("newFile.csv", index=False)
Here is my short example:
df = pd.DataFrame({
    'first': [{'value': 2}, {'value': 2}, {"value": {"data": {"n": 2, "m":"f"}}}],
    'second': [{'name': 'f'}, {'name': 'h'}, {'name': 'h'}]
})
a = pd.DataFrame(df["first"].tolist())
a[~a["value"].str.contains("data",regex=False).fillna(False)] = "-"
df["first"] = a.value
The first step unpacks the 'value' field into its own column.
Next, each entry in that column that contains "data" is marked True and all other entries False; numeric entries yield NaN, which is replaced with False. The resulting mask is then negated, and the selected rows are replaced with "-".
The last step overwrites the column in the original DataFrame.
Something like this might work.
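import numpy as np
import pandas as pd
first = [{'value': 2}, {'value': 2}, {"value": {"data": {"n": 2, "m":"f"}}}, {"data": {"n": 2, "m":"f"}}]
second = [{'name': 'f'}, {'name': 'h'}, {'name': 'h'}, {'name': 'h'}]
df = pd.DataFrame({'first': first,
                   'second': second})
# Unwrap the 'value' key where present; non-dict cells become NaN
f = lambda x: x.get('value', x) if isinstance(x, dict) else np.nan
df['first'] = df['first'].apply(f)
# Use .loc instead of chained indexing so the assignment reaches df itself
mask = df["first"].str.contains("data", regex=False).fillna(False).astype(bool)
df.loc[~mask, 'first'] = "-"
print(df)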
first second
0 - {'name': 'f'}
1 - {'name': 'h'}
2 {'data': {'n': 2, 'm': 'f'}} {'name': 'h'}
3 {'data': {'n': 2, 'm': 'f'}} {'name': 'h'}
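The per-cell rule can also be spelled out as a plain function, which avoids relying on string matching against dict cells. This is a minimal sketch, assuming the cells are already Python dicts (not strings read back from the CSV); extract_data is a hypothetical helper name:

```python
def extract_data(cell, placeholder='-'):
    """Return the part of the cell that holds 'data', or the placeholder."""
    if isinstance(cell, dict):
        inner = cell.get('value', cell)    # unwrap {'value': ...} if present
        if isinstance(inner, dict) and 'data' in inner:
            return inner
    return placeholder

first = [{'value': 2},
         {'value': 2},
         {'value': {'data': {'n': 2, 'm': 'f'}}},
         {'data': {'n': 2, 'm': 'f'}}]

cleaned = [extract_data(cell) for cell in first]
print(cleaned)
# ['-', '-', {'data': {'n': 2, 'm': 'f'}}, {'data': {'n': 2, 'm': 'f'}}]
```

With a DataFrame, the same function could be applied as df['first'] = df['first'].apply(extract_data).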

Combine elements by the key in list

I have the following list
some_list = [{'key': 'YOUNG', 'x': 22, 'y': 0.9},
{'key': 'OLD', 'x': 45, 'y': 0.6},
{'key': 'OLD', 'x': 40, 'y': 0.3},
{'key': 'YOUNG', 'x': 25, 'y': 0.3}]
and I would like to change it to:
[{'key': 'YOUNG', 'values': [ {'x': 25, 'y': 0.3}, {'x': 22, 'y': 0.9} ]}
{'key': 'OLD', 'values': [ {'x': 40, 'y': 0.3}, {'x': 45, 'y': 0.6} ]}]
Added some of my attempts
arr = [{'key': 'YOUNG', 'x': 22, 'y': 0.9},
{'key': 'OLD', 'x': 45, 'y': 0.6},
{'key': 'OLD', 'x': 40, 'y': 0.3},
{'key': 'YOUNG', 'x': 25, 'y': 0.3}]
all_keys = []
for item in arr:
    all_keys.append(item['key'])
all_keys = list(set(all_keys))
res = [[{
    'key': key,
    'values': {'x': each['x'], 'y': each['y']}
    } for each in arr if each['key'] == key]
    for key in all_keys]
print(res)
But the result is not right, it constructs more lists:
[[{'values': {'y': 0.6, 'x': 45}, 'key': 'OLD'}, {'values': {'y': 0.3, 'x': 40}, 'key': 'OLD'}], [{'values': {'y': 0.9, 'x': 22}, 'key': 'YOUNG'}, {'values': {'y': 0.3, 'x': 25}, 'key': 'YOUNG'}]]
Thanks.
The loops should be like this:
res = [{ 'key': key,
'values': [{'x': each['x'], 'y': each['y']}
for each in arr if each['key'] == key] }
for key in all_keys]
Using an intermediate dictionary you can do:
>>> temp_data = {}
>>> for x in some_list:
...     temp_data.setdefault(x['key'], []).append({k: x[k] for k in ['x', 'y']})
...
>>> [{'key': k, 'values': v} for k,v in temp_data.items()]
[{'key': 'OLD', 'values': [{'x': 45, 'y': 0.6}, {'x': 40, 'y': 0.3}]},
{'key': 'YOUNG', 'values': [{'x': 22, 'y': 0.9}, {'x': 25, 'y': 0.3}]}]
Though personally I would just leave it in dictionary form:
>>> temp_data
{'OLD': [{'x': 45, 'y': 0.6}, {'x': 40, 'y': 0.3}],
'YOUNG': [{'x': 22, 'y': 0.9}, {'x': 25, 'y': 0.3}]}
from itertools import groupby
data = [{'key': 'YOUNG', 'x': 22, 'y': 0.9},
{'key': 'OLD', 'x': 45, 'y': 0.6},
{'key': 'OLD', 'x': 40, 'y': 0.3},
{'key': 'YOUNG', 'x': 25, 'y': 0.3}]
data = sorted(data, key=lambda x: x['key'])
groups = []
uniquekeys = []
for k, v in groupby(data, lambda x: x['key']):
    val_list = []
    for each_val in v:
        val_list.append({'x': each_val['x'], 'y': each_val['y']})
    groups.append(val_list)
    uniquekeys.append(k)
print(uniquekeys)
print(groups)
print(list(zip(uniquekeys, groups)))
You will get your output as a list of tuples, where the first element is your key and the second one is the group/list of values:
[('OLD', [{'y': 0.6, 'x': 45}, {'y': 0.3, 'x': 40}]), ('YOUNG', [{'y': 0.9, 'x': 22}, {'y': 0.3, 'x': 25}])]
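To get the exact list-of-dicts shape asked for in the question, the groupby result can be built directly. Here is a short sketch for Python 3; note that groupby only groups consecutive items, hence the sort beforehand:

```python
from itertools import groupby
from operator import itemgetter

some_list = [{'key': 'YOUNG', 'x': 22, 'y': 0.9},
             {'key': 'OLD', 'x': 45, 'y': 0.6},
             {'key': 'OLD', 'x': 40, 'y': 0.3},
             {'key': 'YOUNG', 'x': 25, 'y': 0.3}]

by_key = itemgetter('key')
result = [
    {'key': key, 'values': [{'x': d['x'], 'y': d['y']} for d in group]}
    for key, group in groupby(sorted(some_list, key=by_key), key=by_key)
]
print(result)
```

Because sorted is stable, the values inside each group keep the order they had in some_list.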
some_list = [{'key': 'YOUNG', 'x': 22, 'y': 0.9},
{'key': 'OLD', 'x': 45, 'y': 0.6},
{'key': 'OLD', 'x': 40, 'y': 0.3},
{'key': 'YOUNG', 'x': 25, 'y': 0.3}]
outDict = {}
for dictionary in some_list:
    key = dictionary['key']
    copyDict = dictionary.copy() #This leaves the original dict list unaltered
    del copyDict['key']
    if key in outDict:
        outDict[key].append(copyDict)
    else:
        outDict[key] = [copyDict]
print(outDict)
print(some_list)
Here you go-
some_list = [{'key': 'YOUNG', 'x': 22, 'y': 0.9},
{'key': 'OLD', 'x': 45, 'y': 0.6},
{'key': 'OLD', 'x': 40, 'y': 0.3},
{'key': 'YOUNG', 'x': 25, 'y': 0.3}]
dict_young_vals = []
dict_old_vals = []
for dict_step in some_list:
    temp_dict = {}
    if (dict_step['key'] == 'YOUNG'):
        for keys in dict_step.keys():
            if keys != 'key':
                temp_dict[keys] = dict_step[keys]
        if temp_dict != {}:
            dict_young_vals.append(temp_dict)
    if (dict_step['key'] == 'OLD'):
        for keys in dict_step.keys():
            if keys != 'key':
                temp_dict[keys] = dict_step[keys]
        if temp_dict != {}:
            dict_old_vals.append(temp_dict)
dict_young = {'key':'YOUNG'}
dict_young['values'] = dict_young_vals
dict_old = {'key': 'OLD'}
dict_old['values'] = dict_old_vals
print(dict_young_vals)
result_dict = []
result_dict.append(dict_young)
result_dict.append(dict_old)
print(result_dict)
Another approach is to use defaultdict, which should run faster if the data is larger.
from collections import defaultdict
data = defaultdict(list)
some_list = [{'key': 'YOUNG', 'x': 22, 'y': 0.9},
{'key': 'OLD', 'x': 45, 'y': 0.6},
{'key': 'OLD', 'x': 40, 'y': 0.3},
{'key': 'YOUNG', 'x': 25, 'y': 0.3}]
for item in some_list:
    vals = item.copy()
    del vals['key']
    data[item['key']].append(vals)
print([{'key':k,'values':v} for k,v in data.items()])
Output (dictionaries did not preserve insertion order before Python 3.7):
[{'values': [{'y': 0.6, 'x': 45}, {'y': 0.3, 'x': 40}], 'key': 'OLD'}, {'values': [{'y': 0.9, 'x': 22}, {'y': 0.3, 'x': 25}], 'key': 'YOUNG'}]
some_list = [{'key': 'YOUNG', 'x': 22, 'y': 0.9},
{'key': 'OLD', 'x': 45, 'y': 0.6},
{'key': 'OLD', 'x': 40, 'y': 0.3},
{'key': 'YOUNG', 'x': 25, 'y': 0.3}]
x=[]
for i in some_list:
    d={}
    d["key"]=i["key"]
    d["values"]=[{m:n for m,n in i.items() if m!="key"}]
    if d["key"] not in [j["key"] for j in x]:
        x.append(d)
    else:
        for k in x:
            if k["key"]==d["key"]:
                k["values"].append(d["values"][0])
print(x)
Output: [{'values': [{'y': 0.9, 'x': 22}, {'y': 0.3, 'x': 25}], 'key': 'YOUNG'}, {'values': [{'y': 0.6, 'x': 45}, {'y': 0.3, 'x': 40}], 'key': 'OLD'}]

all possible combinations of dicts based on values inside dicts

I want to generate all possible ways of using dicts, based on the values in them. To explain in code, I have:
a = {'name' : 'a', 'items': 3}
b = {'name' : 'b', 'items': 4}
c = {'name' : 'c', 'items': 5}
I want to be able to pick (say) exactly 7 items from these dicts, and find all the possible ways I could do it.
So:
x = itertools.product(range(a['items']), range(b['items']), range(c['items']))
y = itertools.ifilter(lambda i: sum(i)==7, x)
would give me:
(0, 3, 4)
(1, 2, 4)
(1, 3, 3)
...
What I'd really like is:
({'name' : 'a', 'picked': 0}, {'name': 'b', 'picked': 3}, {'name': 'c', 'picked': 4})
({'name' : 'a', 'picked': 1}, {'name': 'b', 'picked': 2}, {'name': 'c', 'picked': 4})
({'name' : 'a', 'picked': 1}, {'name': 'b', 'picked': 3}, {'name': 'c', 'picked': 3})
....
Any ideas on how to do this, cleanly?
Here it is
import itertools
import operator
a = {'name' : 'a', 'items': 3}
b = {'name' : 'b', 'items': 4}
c = {'name' : 'c', 'items': 5}
dcts = [a,b,c]
x = itertools.product(range(a['items']), range(b['items']), range(c['items']))
y = itertools.ifilter(lambda i: sum(i)==7, x)
z = (tuple([[dct, operator.setitem(dct, 'picked', vval)][0] \
    for dct,vval in zip(dcts, val)]) for val in y)
for zz in z:
    print(zz)
You can modify it to create copies of the dictionaries. If you need a new dict instance on every iteration, you can change the z line to
z = (tuple([[dct, operator.setitem(dct, 'picked', vval)][0] \
for dct,vval in zip(map(dict,dcts), val)]) for val in y)
An easy way is to generate new dicts:
names = [x['name'] for x in [a,b,c]]
ziped = map(lambda x: zip(names, x), y)
maped = map(lambda el: [{'name': name, 'picked': count} for name, count in el],
ziped)
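On Python 3, itertools.ifilter no longer exists and the built-in filter is already lazy. Below is a small sketch of the same idea that builds fresh dicts for each combination. The function name pick_combinations is invented for this example, and range(d['items']) is kept as in the question, so each count stops one short of 'items':

```python
import itertools

def pick_combinations(dicts, total):
    """Yield tuples of {'name', 'picked'} dicts whose picks sum to `total`."""
    names = [d['name'] for d in dicts]
    ranges = [range(d['items']) for d in dicts]
    for counts in itertools.product(*ranges):
        if sum(counts) == total:
            # New dicts on every iteration, so results never share state
            yield tuple({'name': n, 'picked': c} for n, c in zip(names, counts))

a = {'name': 'a', 'items': 3}
b = {'name': 'b', 'items': 4}
c = {'name': 'c', 'items': 5}

combos = list(pick_combinations([a, b, c], 7))
print(combos[0])
# ({'name': 'a', 'picked': 0}, {'name': 'b', 'picked': 3}, {'name': 'c', 'picked': 4})
```

Since the original dicts are only read, a, b and c stay unchanged, unlike in the setitem-based version.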
