Convert Pandas dataframe to a dictionary with first column as key - python

I have a Pandas DataFrame:
A || B || C
x1 x [x,y]
x2 a [b,c,d]
and I am trying to make a dictionary that looks like:
{x1: {B: x, C: [x,y]}, x2: {B: a, C: [b,c,d]}}
I have tried the to_dict function, but that converts the entire dataframe into a dictionary keyed by column. I am kind of lost on how to iterate over the first column and make it the key, with the rest of the df as a dictionary that is the value of that key.

Try:
x = df.set_index("A").to_dict("index")
print(x)
Prints:
{'x1': {'B': 'x', 'C': ['x', 'y']}, 'x2': {'B': 'a', 'C': ['b', 'c', 'd']}}

Related

Convert Dataframe to dictionary with one column as key and the other columns as another dict

Currently I have a dataframe.
ID   A  B
123  a  b
456  c  d
I would like to convert this into a dictionary, where the key of the dictionary is the "ID" column. The value of the dictionary would be another dictionary, where the keys of that dictionary are the name of the other columns, and the value of that dictionary would be the corresponding column value. Using the example above, this would look like:
{ 123 : { A : a, B : b}, 456 : {A : c, B : d} }
I have tried:
mydataframe.set_index("ID").to_dict() , but this results in a different format than the one wanted.
You merely need to pass the proper orient parameter, per the documentation.
import io
import pandas as pd
pd.read_csv(io.StringIO('''ID A B
123 a b
456 c d'''), sep=r'\s+').set_index('ID').to_dict(orient='index')
{123: {'A': 'a', 'B': 'b'}, 456: {'A': 'c', 'B': 'd'}}
Of course, the columns maintain their string types, as indicated by the quote marks.
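To make the effect of the orient parameter concrete, here is a minimal sketch comparing the default orient with orient='index'. The sample frame and the variable names (indexed, by_column, by_row) are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample frame mirroring the question's ID/A/B table.
df = pd.DataFrame({"ID": [123, 456], "A": ["a", "c"], "B": ["b", "d"]})
indexed = df.set_index("ID")

# Default orient ("dict") nests by column first: {column: {index: value}}.
by_column = indexed.to_dict()

# orient="index" nests by row first: {index: {column: value}}.
by_row = indexed.to_dict(orient="index")

print(by_column)
print(by_row)
```

The default result is why the asker's attempt came out in "a different format": the outer keys are the column names rather than the ID values.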
Consider the following:
import pandas as pd
df = pd.DataFrame({'ID':[1,2,3], 'A':['x','y','z'], 'B':[111,222,333]})
What you're going for would be returned with the following two lines:
df.set_index('ID', inplace=True)
some_dict = {i:dict(zip(row.keys(), row.values)) for i, row in df.iterrows()}
With the output being equal to:
{1: {'A': 'x', 'B': 111}, 2: {'A': 'y', 'B': 222}, 3: {'A': 'z', 'B': 333}}

New DataFrame column using the key of a dictionary as row value when one of its values is found in a given row

I have a Pandas DataFrame with a large number of unique values. I would like to group these values with a more general column. By doing so I expect to add hierarchies to my data and thus make analysis easier.
One thing that worked was to copy the column and replace the values as follows:
data.loc[data['new_col'].str.contains('string0|string1'), 'new_col']\
= 'substitution'
However, I am trying to find a way to reproduce this easily without adding a condition for each entry.
I also tried the following, without success:
dict.items()
DataFrame.replace()
I would like to hear your advice on how to approach this.
import pandas as pd
# My DataFrame looks similar to this:
>>> df = pd.DataFrame({'A': ['a', 'w', 'c', 'd', 'z']})
# The dictionary where I store the generalization:
>>> subs = {'g1': ['a', 'b', 'c', 'd'],
... 'g2': ['w', 'x', 'y', 'z']}
# Desired result:
>>> df
A H
0 a g1
1 w g2
2 c g1
3 d g1
4 z g2
Create a new dict by swapping the keys with the entries of each value list, so that every list entry points back to its key. Next, map df.A with the swapped dict.
swap_dict = {x: k for k, v in d.items() for x in v}
Out[1054]:
{'a': 's1',
'b': 's1',
'c': 's1',
'd': 's1',
'w': 's2',
'x': 's2',
'y': 's2',
'z': 's2'}
df['H'] = df.A.map(swap_dict)
Out[1058]:
A H
0 a s1
1 w s2
2 c s1
3 d s1
4 z s2
Note: I directly use the keys of your dict as the values of H instead of g1, g2, ..., because I think they are enough to identify each group of values. If you still want g1, g2, ..., it is easy to accomplish; just let me know.
I also named your dict d in my code.
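For reference, here is the same swap-and-map approach as a self-contained sketch, using the subs dict exactly as posted in the question (keys g1 and g2). The name member_to_group is an illustrative assumption.

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "w", "c", "d", "z"]})
subs = {"g1": ["a", "b", "c", "d"], "g2": ["w", "x", "y", "z"]}

# Invert {group: [members]} into {member: group}.
member_to_group = {
    member: group for group, members in subs.items() for member in members
}

# Series.map looks each value up in the dict; values with no entry become NaN.
df["H"] = df["A"].map(member_to_group)
print(df)
```

Because unmatched values become NaN, any entry missing from subs is easy to spot afterwards with df["H"].isna().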

Nested JSON into Dataframe

I have a Dataframe and one of the columns contains JSON objects of this type:
{'a': 'x', 'b':'y', 'c':'z'}
{'a': 'x1', 'b':'y2', 'c':'z3'}
...
How can I split such objects and expand them into separate a/b/c columns with their respective elements, within the same dataframe?
a b c
x y z
x1 y2 z3
...
Thank you in advance!
If your dataframe looks like this, with a column called json_col:
import pandas as pd
>>> df
json_col
0 {'a': 'x', 'b': 'y', 'c': 'z'}
1 {'a': 'x1', 'b': 'y2', 'c': 'z3'}
You can do this:
df[['a','b','c']] = df.json_col.apply(pd.Series)
resulting in this final df:
>>> df
json_col a b c
0 {'a': 'x', 'b': 'y', 'c': 'z'} x y z
1 {'a': 'x1', 'b': 'y2', 'c': 'z3'} x1 y2 z3
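An alternative sketch, assuming the cells of json_col hold actual Python dicts (not JSON strings): build a second frame straight from the list of dicts and join it back. This avoids constructing one Series per row, which apply(pd.Series) does. The names expanded and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"json_col": [{"a": "x", "b": "y", "c": "z"}, {"a": "x1", "b": "y2", "c": "z3"}]}
)

# Build a frame from the list of dicts, reusing the original index
# so that the join below aligns row by row.
expanded = pd.DataFrame(df["json_col"].tolist(), index=df.index)

# Attach the new a/b/c columns next to the original column.
result = df.join(expanded)
print(result)
```

If the column holds JSON text instead, each cell would first need parsing (for example with json.loads) before this applies.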

Flat map list without losing mapping?

I have an existing dict that maps single values to lists.
I want to reverse this dictionary and map every list entry to its original key.
The list entries are unique.
Given:
dict { 1: ['a', 'b'], 2: ['c'] }
Result:
dict { 'a' : 1, 'b' : 1, 'c' : 2 }
How can this be done?
Here's an option
new_dict = {v: k for k, l in d.items() for v in l}
{'a': 1, 'b': 1, 'c': 2}
You can use a list comprehension to produce a (value, key) tuple for every list entry, then flatten the nested list and pass it to the built-in dict constructor:
d = { 1: ['a', 'b'], 2: ['c'] }
new_d = dict([c for h in [[(i, a) for i in b] for a, b in d.items()] for c in h])
Output:
{'a': 1, 'c': 2, 'b': 1}
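Both answers rely on the stated guarantee that the list entries are unique; with duplicates, the last key silently wins. A minimal sketch that makes that assumption explicit is shown here. The function name invert_mapping is hypothetical.

```python
def invert_mapping(d):
    """Map every list entry back to its original key, rejecting duplicates."""
    inverted = {}
    for key, values in d.items():
        for value in values:
            if value in inverted:
                # The same entry under two keys would make the result ambiguous.
                raise ValueError(
                    f"{value!r} appears under both {inverted[value]!r} and {key!r}"
                )
            inverted[value] = key
    return inverted


d = {1: ["a", "b"], 2: ["c"]}
new_d = invert_mapping(d)
print(new_d)
```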

Creating graph using dictionaries and nodes only

I have this data.
CITY1 CITY2
A B
A C
A D
B C
B D
C D
How can I create a dictionary that looks like this from the above data?
x={A:[B,C,D],
B:[A,C,D],
C:[A,B,D],
D:[A,B,C]
}
Thanks
Is it in a csv? From the data you provide, it looks like you are building an undirected graph. Assuming that the data is in some kind of "row" format that you can loop through (i.e. row[0] is the city1 value, and row[1] is the city2 value):
from collections import defaultdict

def make_graph(data):
    # Each city maps to the set of cities it is connected to.
    graph = defaultdict(set)
    for a, b in data:
        graph[a].add(b)
        graph[b].add(a)  # delete this line if you want a directed graph
    return graph

data = [
    ['A', 'B'],
    ['C', 'D'],
    ['A', 'C']
]
print(make_graph(data))
I was trying to do it without any library import.
I made a simple dictionary first.
x = {'A': ['B', 'C', 'D'], 'B': ['C', 'D'], 'C': ['D']}
# Iterate over a snapshot of the items, because new keys are added to x inside the loop.
for i, j in list(x.items()):
    for p in j:
        if p not in x:
            x[p] = []
        if p in x[i] and i not in x[p]:
            x[p].append(i)
print(x)
{'A': ['B', 'C', 'D'], 'B': ['C', 'D', 'A'], 'C': ['D', 'A', 'B'], 'D': ['A', 'B', 'C']}
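Another import-free sketch starts directly from the CITY1/CITY2 pairs in the question, using dict.setdefault. It assumes the rows have already been read into a list of tuples and that each pair appears only once; the names pairs and graph are illustrative.

```python
# The CITY1/CITY2 rows from the question, as (city1, city2) tuples.
pairs = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D")]

graph = {}
for a, b in pairs:
    # setdefault creates the empty list the first time a city is seen.
    graph.setdefault(a, []).append(b)
    graph.setdefault(b, []).append(a)  # drop this line for a directed graph

# Sort keys and neighbour lists so the result matches the layout asked for.
graph = {city: sorted(neighbours) for city, neighbours in sorted(graph.items())}
print(graph)
```

If a pair can repeat in the data, the lists would collect duplicates; sets, as in the defaultdict answer, avoid that.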
