Clean dict column in pandas dataframe - python

I have a DataFrame like the below
A B
1 {1:3,2:0,3:5}
2 {3:2}
3 {1:2,2:3,3:9}
Column B has missing keys in some rows; for example, the 2nd row only has key 3, while keys 1 and 2 are missing. For key 1 I want to set the value to 1, for key 2 I want to set the value to 2, and the final DataFrame I would like is:
A B
1 {1:3,2:0,3:5}
2 {1:1,2:1,3:2}
3 {1:2,2:3,3:9}

One idea is to use a merge of dicts, but it is necessary to pass missing first, to avoid overwriting the existing keys:
missing = {1:1, 2:2}
df['B'] = df['B'].apply(lambda x: {**missing, **x})
print (df)
A B
0 1 {1: 3, 2: 0, 3: 5}
1 2 {1: 1, 2: 2, 3: 2}
2 3 {1: 2, 2: 3, 3: 9}
If the order is swapped, the existing values are overwritten:
df['B1'] = df['B'].apply(lambda x: {**x, **missing})
print (df)
A B B1
0 1 {1: 3, 2: 0, 3: 5} {1: 1, 2: 2, 3: 5}
1 2 {1: 1, 2: 2, 3: 2} {1: 1, 2: 2, 3: 2}
2 3 {1: 2, 2: 3, 3: 9} {1: 1, 2: 2, 3: 9}
If you want a more dynamic solution that adds all missing keys with the same value, e.g. 1:
missing = dict.fromkeys(set().union(*df['B'].tolist()), 1)
df['B'] = df['B'].apply(lambda x: {**missing, **x})
print (df)
A B
0 1 {1: 3, 2: 0, 3: 5}
1 2 {1: 1, 2: 1, 3: 2}
2 3 {1: 2, 2: 3, 3: 9}
EDIT:
To replace missing values with the mean of each row's dict values instead:
print (df)
A B
0 1 {1:3,2:5}
1 2 {3:2}
2 3 {1:2,2:3,3:9}
import numpy as np

df['B'] = df['B'].apply(lambda x: {**dict.fromkeys([1,2,3], np.mean(list(x.values()))), **x})
print (df)
A B
0 1 {1: 3, 2: 5, 3: 4.0}
1 2 {1: 2.0, 2: 2.0, 3: 2}
2 3 {1: 2, 2: 3, 3: 9}
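If the key set should not be hard-coded, the mean-based fill can be combined with the dynamic key union shown earlier; a minimal sketch (assuming df is the DataFrame above and numpy is available):
import numpy as np

# derive the full key set from the data instead of writing [1, 2, 3] by hand
all_keys = set().union(*df['B'].tolist())
df['B'] = df['B'].apply(
    lambda x: {**dict.fromkeys(all_keys, np.mean(list(x.values()))), **x}
)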

Related

How to insert / transplant columns from one dataframe to another at a certain position?

I have two dataframes:
print(df1)
A B C
0 1 5 9
1 2 6 8
2 3 7 7
3 4 8 6
print(df2)
D E F
0 1 5 9
1 2 6 8
2 3 7 7
3 4 8 6
I want to insert columns D and E from df2 into df1 after column B.
The end result should be like this:
A B D E C
0 1 5 1 5 9
1 2 6 2 6 8
2 3 7 3 7 7
3 4 8 4 8 6
I know there's already a solution with the insert method with pandas:
df1.insert(1, "D", df2["D"])
df1.insert(2, "E", df2["E"])
However, I would like to insert D and E at the same time, i.e. "transplant" them into df1 rather than doing multiple inserts. (In real life the data to be transplanted is bigger, which is why I want to avoid all the inserts.)
Here are my dataframes in dict format, so you can use DataFrame.from_dict():
# df1
{'A': {0: 1, 1: 2, 2: 3, 3: 4},
'B': {0: 5, 1: 6, 2: 7, 3: 8},
'C': {0: 9, 1: 8, 2: 7, 3: 6}}
# df2
{'D': {0: 1, 1: 2, 2: 3, 3: 4},
'E': {0: 5, 1: 6, 2: 7, 3: 8},
'F': {0: 9, 1: 8, 2: 7, 3: 6}}
You can slice the dataframe df1 into two parts based on the location of column B, then concat these slices with columns D and E along the columns axis:
i = df1.columns.get_loc('B') + 1
pd.concat([df1.iloc[:, :i], df2[['D', 'E']], df1.iloc[:, i:]], axis=1)
A B D E C
0 1 5 1 5 9
1 2 6 2 6 8
2 3 7 3 7 7
3 4 8 4 8 6
I think your solution is optimal. Alternatively you can:
df1[["D", "E"]] = df2[["D", "E"]]
and then change the column order
import pandas as pd
df1 = pd.DataFrame.from_dict({'A': {0: 1, 1: 2, 2: 3, 3: 4},
                              'B': {0: 5, 1: 6, 2: 7, 3: 8},
                              'C': {0: 9, 1: 8, 2: 7, 3: 6}})
df2 = pd.DataFrame.from_dict({'D': {0: 1, 1: 2, 2: 3, 3: 4},
                              'E': {0: 5, 1: 6, 2: 7, 3: 8},
                              'F': {0: 9, 1: 8, 2: 7, 3: 6}})
df1.merge(df2[['D', 'E']], on=df1.index)
You can then reorder the columns based on your requirement, as sketched below.
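For example, once D and E have been added by either of the approaches above, the columns can be put into the desired order explicitly (a hypothetical sketch; the target order is written out by hand here, adjust it to your real columns):
df1[['D', 'E']] = df2[['D', 'E']]      # attach the new columns
df1 = df1[['A', 'B', 'D', 'E', 'C']]   # then select them in the desired order
print(df1.columns.tolist())
# ['A', 'B', 'D', 'E', 'C']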

How to create a master dictionary from a from a dictionary

I have a dictionary:
d = {1: 2,
     2: 3,
     3: 4,
     5: 6,
     6: 7}
I want this output:
d = {1: 4, 2: 4, 3: 4, 5: 7, 6: 7}
Basically, 2 is the parent of 1, 3 is the parent of 2, and 4 is the parent of 3; I want to express that 1, 2, and 3 are all related to 4.
For each key-value pair in the dict, you can use a while loop that keeps following values that are themselves valid keys, until a value is reached that is not a key. At that point you have found the leaf, and you can assign that value to the current key:
for k, v in d.items():
    while v in d:
        v = d[v]
    d[k] = v
so that given:
d = {1: 2,
     2: 3,
     3: 4,
     5: 6,
     6: 7}
d becomes:
{1: 4, 2: 4, 3: 4, 5: 7, 6: 7}
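If you prefer not to mutate d in place, the same traversal can be wrapped in a small helper that builds a new dict; a minimal sketch (resolve_roots is just an illustrative name, and it assumes the parent mapping contains no cycles):
def resolve_roots(d):
    # Follow parent links until a value is reached that is not itself a key.
    out = {}
    for k, v in d.items():
        while v in d:
            v = d[v]
        out[k] = v
    return out

print(resolve_roots({1: 2, 2: 3, 3: 4, 5: 6, 6: 7}))
# {1: 4, 2: 4, 3: 4, 5: 7, 6: 7}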

Select only one unique element from multiple lists in Python

This is not homework that I'm struggling with; I am trying to solve a problem (here is the link if you're interested: https://open.kattis.com/problems/azulejos).
You don't actually have to understand the problem; what I would like to accomplish is to select exactly one element from each of several lists such that the selected elements do not overlap with each other.
For example, at the end of my code, I get an output:
{1: [1, 2, 3], 2: [1, 2, 3, 4], 3: [2, 4], 4: [1, 2, 3, 4]}
I would like to transform this into, for example,
{3: 4, 2: 2, 4: 1, 1: 3}, which is the sample answer given on the website.
But from my understanding, it can also be simply
{1: 3, 2: 2, 3: 4, 4: 1}
I am struggling to select exactly one integer per key so that no value is repeated. The dictionary produced by my code contains lists with multiple integers, and I would like to pick only one from each list so that they are all unique.
import sys
n_tiles_row = int(sys.stdin.readline().rstrip())
# print(n_tiles_row) ==> 4
# BACK ROW - JOAO
back_row_price = sys.stdin.readline().rstrip()
# print(back_row_price) ==> 3 2 1 2
back_row_height = sys.stdin.readline().rstrip()
# print(back_row_height) ==> 2 3 4 3
# FRONT ROW - MARIA
front_row_price = sys.stdin.readline().rstrip()
# print(front_row_price) ==> 2 1 2 1
front_row_height = sys.stdin.readline().rstrip()
# print(front_row_height) ==> 2 2 1 3
br_num1_price, br_num2_price, br_num3_price, br_num4_price = map(int, back_row_price.split())
# br_num1_price = 3; br_num2_price = 2; br_num3_price = 1; br_num4_price = 2;
br_num1_height, br_num2_height, br_num3_height, br_num4_height = map(int, back_row_height.split())
# 2 3 4 3
fr_num1_price, fr_num2_price, fr_num3_price, fr_num4_price = map(int, front_row_price.split())
# 2 1 2 1
fr_num1_height, fr_num2_height, fr_num3_height, fr_num4_height = map(int, front_row_height.split())
# 2 2 1 3
back_row = {1: [br_num1_price, br_num1_height],
            2: [br_num2_price, br_num2_height],
            3: [br_num3_price, br_num3_height],
            4: [br_num4_price, br_num4_height]}
# {1: [3, 2], 2: [2, 3], 3: [1, 4], 4: [2, 3]}
front_row = {1: [fr_num1_price, fr_num1_height],
             2: [fr_num2_price, fr_num2_height],
             3: [fr_num3_price, fr_num3_height],
             4: [fr_num4_price, fr_num4_height]}
# {1: [2, 2], 2: [1, 2], 3: [2, 1], 4: [1, 3]}
_dict = {1: [],
         2: [],
         3: [],
         4: []}
for i in range(n_tiles_row):
    _list = []
    for n in range(n_tiles_row):
        if (list(back_row.values())[i][0] >= list(front_row.values())[n][0]
                and list(back_row.values())[i][1] >= list(front_row.values())[n][1]):
            _list.append(list(front_row.keys())[n])
    _dict[list(back_row.keys())[i]] = _list
print(_dict)
# {1: [1, 2, 3], 2: [1, 2, 3, 4], 3: [2, 4], 4: [1, 2, 3, 4]}
Please let me know if there is another approach to this problem.
Here is a solution using the same syntax as the code you provided.
The trick here was to order the tiles first by price ascending (the question asks for non-descending) and then by height descending, so that the tallest tile of the next lowest price in the back row is matched with the tallest tile of the next lowest price in the front row.
To do this sorting I utilized Python's sorted() function. See a Stack Overflow example here.
I assumed that if there is no such match, we can immediately break and print "impossible", per the problem you linked.
As a side note, you had originally claimed that the Python dictionary
{3: 4, 2: 2, 4: 1, 1: 3} is equivalent to {1: 3, 2: 2, 3: 4, 4: 1}. You are correct (dict equality in Python ignores key order), but remember that dictionaries are not kept sorted by key, so the printed key order is not a reliable way to compare them.
import sys
n_tiles_row = int(sys.stdin.readline().rstrip())
# print(n_tiles_row) ==> 4
# BACK ROW - JOAO
back_row_price = sys.stdin.readline().rstrip()
# print(back_row_price) ==> 3 2 1 2
back_row_height = sys.stdin.readline().rstrip()
# print(back_row_height) ==> 2 3 4 3
# FRONT ROW - MARIA
front_row_price = sys.stdin.readline().rstrip()
# print(front_row_price) ==> 2 1 2 1
front_row_height = sys.stdin.readline().rstrip()
# print(front_row_height) ==> 2 2 1 3
# preprocess data into lists of ints
back_row_price = [int(x) for x in back_row_price.strip().split(' ')]
back_row_height = [int(x) for x in back_row_height.strip().split(' ')]
front_row_price = [int(x) for x in front_row_price.strip().split(' ')]
front_row_height = [int(x) for x in front_row_height.strip().split(' ')]
# store each tile into lists of tuples
front = list()
back = list()
for i in range(n_tiles_row):
    back.append((i, back_row_price[i], back_row_height[i]))  # tuples of (tile_num, price, height)
    front.append((i, front_row_price[i], front_row_height[i]))
# sort tiles by price first (as the price must be non-descending) then by height descending
back = sorted(back, key=lambda x: (x[1], -x[2]))
front = sorted(front, key=lambda x: (x[1], -x[2]))
# print(back) ==> [(2, 1, 4), (1, 2, 3), (3, 2, 3), (0, 3, 2)]
# print(front) ==> [(3, 1, 3), (1, 1, 2), (0, 2, 2), (2, 2, 1)]
possible_back_tile_order = list()
possible_front_tile_order = list()
for i in range(n_tiles_row):
    if back[i][2] > front[i][2]:  # if next lowest priced back tile is taller than next lowest priced front tile
        possible_back_tile_order.append(back[i][0])
        possible_front_tile_order.append(front[i][0])
    else:
        break
if len(possible_back_tile_order) < n_tiles_row:  # check that all tiles had matching pairs in back and front
    print("impossible")
else:
    print(possible_back_tile_order)
    print(possible_front_tile_order)
A possibly inefficient way of solving the issue is to generate all possible "solutions" (where a value may not be present in the list corresponding to a specific key) and settle for a "valid" one (for which every value is present in the corresponding list).
One way of doing this with itertools.permutations (which enumerates all candidate solutions satisfying the uniqueness constraint) would be:
import itertools
def gen_valid(source):
    keys = source.keys()
    possible_values = set(x for k, v in source.items() for x in v)
    for values in itertools.permutations(possible_values):
        result = {k: v for k, v in zip(keys, values)}
        # check that `result` is valid
        if all(v in source[k] for k, v in result.items()):
            yield result
d = {1: [1, 2, 3], 2: [1, 2, 3, 4], 3: [2, 4], 4: [1, 2, 3, 4]}
next(gen_valid(d))
# {1: 1, 2: 2, 3: 4, 4: 3}
list(gen_valid(d))
# [{1: 1, 2: 2, 3: 4, 4: 3},
# {1: 1, 2: 3, 3: 2, 4: 4},
# {1: 1, 2: 3, 3: 4, 4: 2},
# {1: 1, 2: 4, 3: 2, 4: 3},
# {1: 2, 2: 1, 3: 4, 4: 3},
# {1: 2, 2: 3, 3: 4, 4: 1},
# {1: 3, 2: 1, 3: 2, 4: 4},
# {1: 3, 2: 1, 3: 4, 4: 2},
# {1: 3, 2: 2, 3: 4, 4: 1},
# {1: 3, 2: 4, 3: 2, 4: 1}]
This enumerates n! candidate solutions.
The "brute force" approach using a Cartesian product over the lists produces prod(n_k) = n_1 * n_2 * ... * n_K candidates (with n_k the length of each list). In the worst-case scenario (maximum density) this is n ** n candidates, which is asymptotically much worse than the factorial.
In the best case scenario (minimum density) this is 1 solution only.
In general, this can be either slower or faster than the "permutation solution" proposed above, depending on the "sparsity" of the lists.
For an average n_k of approx. n / 2, n! is smaller/faster for n >= 6.
For an average n_k of approx. n * (3 / 4), n! is smaller/faster for n >= 4.
In this example there are 4! == 4 * 3 * 2 * 1 == 24 permutation solutions, and 3 * 4 * 2 * 4 == 96 Cartesian product solutions.
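For comparison, a sketch of that brute-force Cartesian-product variant (gen_valid_product is just an illustrative name; it expects the same input shape as gen_valid above):
import itertools

def gen_valid_product(source):
    keys = list(source.keys())
    # try every combination of one candidate per key, keeping only those
    # where all chosen values are distinct
    for values in itertools.product(*(source[k] for k in keys)):
        if len(set(values)) == len(values):
            yield dict(zip(keys, values))

d = {1: [1, 2, 3], 2: [1, 2, 3, 4], 3: [2, 4], 4: [1, 2, 3, 4]}
print(next(gen_valid_product(d)))
# {1: 1, 2: 2, 3: 4, 4: 3}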

Writing a nested dictionary into csv python

I am trying to write a nested dictionary to a CSV file in Python. I have a dictionary like the one below:
{'09-04-2018': {1: 11, 2: 5, 3: 1, 4: 1, 5: 0}, '10-04-2018': {1: 5, 2: 1, 3: 1, 4: 1, 5: 0}}
and I want to write it out as something like:
count,09-04-2018,10-04-2018
1,11,5
2,5,1
3,1,1
4,1,1
5,0,0
The following produces the requested output:
data = {'09-04-2018': {1: 11, 2: 5, 3: 1, 4: 1, 5: 0},
        '10-04-2018': {1: 5, 2: 1, 3: 1, 4: 1, 5: 0}}
rows = []
keys = sorted(data)
header = ['count'] + keys
counts = sorted(set(k for v in data.values() for k in v))
for count in counts:
    l = [count]
    for key in keys:
        l.append(data[key].get(count))
    rows.append(l)
print(header)
print(rows)
import csv
with open('output.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(header)
    writer.writerows(rows)
This builds up all the rows before writing them; it is also possible to write each row directly as it is built, rather than appending to a list and writing the list at the end (see the sketch after the output below). The code above produces this output:
count,09-04-2018,10-04-2018
1,11,5
2,5,1
3,1,1
4,1,1
5,0,0
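A minimal sketch of that direct-write variant, assuming Python 3 and the same input dictionary:
import csv

data = {'09-04-2018': {1: 11, 2: 5, 3: 1, 4: 1, 5: 0},
        '10-04-2018': {1: 5, 2: 1, 3: 1, 4: 1, 5: 0}}
keys = sorted(data)
counts = sorted(set(k for v in data.values() for k in v))
with open('output.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['count'] + keys)  # header row
    for count in counts:
        # write each row as soon as it is built, no intermediate list
        writer.writerow([count] + [data[key].get(count) for key in keys])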
If you are open to using a 3rd party library, you can use pandas:
import pandas as pd
d = {'09-04-2018' : {1: 11, 2: 5, 3: 1, 4: 1, 5: 0},
'10-04-2018' : {1: 5, 2: 1, 3: 1, 4: 1, 5: 0}}
# create dataframe from dictionary
df = pd.DataFrame.from_dict(d).reset_index().rename(columns={'index': 'count'})
# write dataframe to csv file
df.to_csv('file.csv', index=False)
print(df)
# count 09-04-2018 10-04-2018
# 0 1 11 5
# 1 2 5 1
# 2 3 1 1
# 3 4 1 1
# 4 5 0 0
You can shorten your code by using zip:
import csv
d = {'09-04-2018': {1: 11, 2: 5, 3: 1, 4: 1, 5: 0}, '10-04-2018': {1: 5, 2: 1, 3: 1, 4: 1, 5: 0}}
with open('filename.csv', 'w', newline='') as f:
    write = csv.writer(f)
    # wrap map() in tuple() so the sort keys are comparable on Python 3
    full_rows = [i for h in [zip(*b.items()) for _, b in sorted(d.items(), key=lambda x: tuple(map(int, x[0].split('-'))))] for i in h]
    write.writerows([['counts'] + [a for a, _ in sorted(d.items(), key=lambda x: tuple(map(int, x[0].split('-'))))]] + list(zip(*[full_rows[0]] + full_rows[1::2])))
Output:
counts,09-04-2018,10-04-2018
1,11,5
2,5,1
3,1,1
4,1,1
5,0,0

Adding missing keys in dictionary in Python

I have a list of dictionaries:
L = [{0:1, 1:7, 2:3, 4:8}, {0:3, 2:6}, {1:2, 4:6}, ..., {0:2, 3:2}]
As you can see, the dictionaries have different lengths. What I need is to add the missing key:value pairs to every dictionary so that they all end up the same length:
L1 = [{0:1, 1:7, 2:3, 3:0, 4:8}, {0:3, 1:0, 2:6, 3:0, 4:0}, {0:0, 1:2, 2:0, 3:0, 4:6}, ..., {0:2, 1:0, 2:0, 3:2, 4:0}]
That is, add zeros for the missing keys. The maximum key isn't given in advance, so it can only be found by iterating through the list.
I tried to do something with defaultdict, like L1 = defaultdict(L), but it seems I don't properly understand how it works.
You'll have to make two passes: one to get the union of all keys, and another to add the missing keys:
max_key = max(max(d) for d in L)
empty = dict.fromkeys(range(max_key + 1), 0)
L1 = [dict(empty, **d) for d in L]
This uses an 'empty' dictionary as a base to quickly produce all keys; a new copy of this dictionary plus an original dictionary produces the output you want.
Note that this assumes your keys are always sequential. If they are not, you can produce the union of all existing keys instead:
empty = dict.fromkeys(set().union(*L), 0)
L1 = [dict(empty, **d) for d in L]
Demo:
>>> L = [{0: 1, 1: 7, 2: 3, 4: 8}, {0: 3, 2: 6}, {1: 2, 4: 6}, {0: 2, 3: 2}]
>>> max_key = max(max(d) for d in L)
>>> empty = dict.fromkeys(range(max_key + 1), 0)
>>> [dict(empty, **d) for d in L]
[{0: 1, 1: 7, 2: 3, 3: 0, 4: 8}, {0: 3, 1: 0, 2: 6, 3: 0, 4: 0}, {0: 0, 1: 2, 2: 0, 3: 0, 4: 6}, {0: 2, 1: 0, 2: 0, 3: 2, 4: 0}]
or the set approach:
>>> empty = dict.fromkeys(set().union(*L), 0)
>>> [dict(empty, **d) for d in L]
[{0: 1, 1: 7, 2: 3, 3: 0, 4: 8}, {0: 3, 1: 0, 2: 6, 3: 0, 4: 0}, {0: 0, 1: 2, 2: 0, 3: 0, 4: 6}, {0: 2, 1: 0, 2: 0, 3: 2, 4: 0}]
The above approach of merging two dictionaries into a new one with dict(d1, **d2) always works in Python 2. In Python 3 additional constraints have been placed on what kind of keys you can use this trick with: only string keys are allowed for the second dictionary. For this example, where you have numeric keys, you can use dictionary unpacking instead:
{**empty, **d} # Python 3 dictionary unpacking
That'll work in Python 3.5 and newer.
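Applied to this example, the Python 3 version of the full expression would then be:
empty = dict.fromkeys(set().union(*L), 0)
L1 = [{**empty, **d} for d in L]  # Python 3.5+ dictionary unpacking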
A bit of caution: this changes L in place:
>>> allkeys = frozenset().union(*L)
>>> for i in L:
...     for j in allkeys:
...         if j not in i:
...             i[j] = 0
...
>>> L
[{0: 1, 1: 7, 2: 3, 3: 0, 4: 8}, {0: 3, 1: 0, 2: 6, 3: 0, 4: 0}, {0: 0, 1: 2, 2: 0, 3: 0, 4: 6}, {0: 2, 1: 0, 2: 0, 3: 2, 4: 0}]
Maybe not the most elegant solution, but should be working:
L = [{0: 1, 1: 7, 2: 3, 4: 8}, {0: 3, 2: 6}, {1: 2, 4: 6}, {0: 2, 3: 2}]
alldicts = {}
for d in L:
    alldicts.update(d)
allkeys = alldicts.keys()
for d in L:
    for key in allkeys:
        if key not in d:
            d[key] = 0
print(L)
This is just one solution, but I think it's simple and straightforward. Note that it modifies the dictionaries in place, so if you want them to be copied, let me know and I'll revise accordingly.
keys_seen = []
for D in L:                        # loop through the list
    for key in D.keys():           # loop through each dictionary's keys
        if key not in keys_seen:   # if we haven't seen this key before...
            keys_seen.append(key)  # add it to the list of keys seen
for D1 in L:                       # loop through the list again
    for key in keys_seen:          # loop through the list of keys we've seen
        if key not in D1:          # if the dictionary is missing that key...
            D1[key] = 0            # add it and set it to 0
This is quick and slim:
missing_keys = set(dict2.keys()) - set(dict1.keys())
for k in missing_keys:
    dict1[k] = dict2[k]
Unless None is a valid value in your dictionaries, here is a solution for you:
L = [{0: 1, 1: 7, 2: 3, 4: 8}, {0: 3, 2: 6}, {1: 2, 4: 6}, {0: 2, 3: 2}]
for i0, d0 in enumerate(L[:-1]):
    for d1 in L[i0:]:
        _ = [d0.__setitem__(k, d1[k]) for k in d1 if d0.get(k, None) is None]
        _ = [d1.__setitem__(k, d0[k]) for k in d0 if d1.get(k, None) is None]
print(L)
>>> [{0: 1, 1: 7, 2: 3, 3: 2, 4: 8}, {0: 3, 1: 2, 2: 6, 3: 2, 4: 6}, {0: 2, 1: 2, 2: 3, 3: 2, 4: 6}, {0: 2, 1: 7, 2: 3, 3: 2, 4: 8}]
