Pairwise reshaping dataframe - python

I am trying to build a list of graph edges from a two-column data frame in which each row holds one edge of a node.
pd.DataFrame({'node': ['100', '100', '200', '200', '200'],
'edge': ['111111', '222222', '123456', '456789', '987654']})
The result should look like this
pd.DataFrame({'node': ['100', '100','200', '200', '200', '200', '200', '200'],
'edge1': ['111111','222222','123456', '123456', '456789', '456789', '987654', '987654'],
'edge2': ['222222', '111111','456789', '987654', '987654', '123456' , '123456','456789']})
I have been wrestling with pivot_table and stack for a while, with no success.

You can use itertools.permutations to get the permutations of the edges after groupby, then convert the output to a new df to generate the desired output:
import pandas as pd
from itertools import permutations
df = pd.DataFrame({'node': ['100', '100', '200', '200', '200'],'edge': ['111111', '222222', '123456', '456789', '987654']})
df = df.groupby('node')['edge'].apply(list).apply(lambda x:list(permutations(x, 2))).reset_index().explode('edge')
pd.DataFrame(df["edge"].to_list(), index=df['node'], columns=['edge1', 'edge2']).reset_index()
Result:
  node   edge1   edge2
0  100  111111  222222
1  100  222222  111111
2  200  123456  456789
3  200  123456  987654
4  200  456789  123456
5  200  456789  987654
6  200  987654  123456
7  200  987654  456789
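An alternative that stays inside pandas is a self-merge on node followed by a filter. This is a minimal sketch, assuming each edge appears only once per node (with duplicated edges the filter would drop more rows than permutations does). The name pairs is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "node": ["100", "100", "200", "200", "200"],
    "edge": ["111111", "222222", "123456", "456789", "987654"],
})

# Join the frame to itself on node: every edge meets every edge of the same node.
# The suffixes turn the two overlapping "edge" columns into edge1 and edge2.
pairs = df.merge(df, on="node", suffixes=("1", "2"))

# Drop the self-pairs (an edge matched with itself).
pairs = pairs[pairs["edge1"] != pairs["edge2"]].reset_index(drop=True)
print(pairs)
```

The merge builds the full per-node cross product before filtering, so for nodes with very many edges the itertools version may use less memory.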

Related

"Flatten" pandas column that is JSON to create new row but keep ID

I am looking to "flatten" the JSON in this table. I am trying to use pandas but it doesn't need to use pandas. I have tried both the explode method and json_normalize() but neither worked. Maybe I used them wrong?
I'm trying to go from this:
order_id  JSON
1         [{'key': '100', 'product': 'soap'}, {'key': '104', 'product': 'butter'}]
2         [{'key': '97', 'product': 'baby wipes'}, {'key': '104', 'product': 'butter'}, {'key': '107', 'product': 'milk'}]
3         [{'key': '95', 'product': 'diapers'}, {'key': '104', 'product': 'butter'}, {'key': '110', 'product': 'toothpaste'}]
4         [{'key': '100', 'product': 'soap'}, {'key': '101', 'product': 'yogurt'}, {'key': '111', 'product': 'hair brush'}, {'key': '112', 'product': 'hair dye'}]
to this:
order_id  key  product
1         100  soap
1         104  butter
2         97   baby wipes
2         104  butter
2         107  milk
3         95   diapers
3         104  butter
3         110  toothpaste
4         100  soap
4         101  yogurt
4         111  hair brush
4         112  hair dye
Any help or point in the right direction is extremely appreciated!
Use explode to get the objects out of the lists, and then convert the list of objects to a dataframe and concatenate it:
new_df = df.explode('JSON')
new_df = pd.concat([new_df.drop('JSON', axis=1), pd.DataFrame(new_df['JSON'].tolist(), index=new_df.index)], axis=1)
Output:
>>> new_df
   order_id  key     product
0         1  100        soap
0         1  104      butter
1         2   97  baby wipes
1         2  104      butter
1         2  107        milk
2         3   95     diapers
2         3  104      butter
2         3  110  toothpaste
3         4  100        soap
3         4  101      yogurt
3         4  111  hair brush
3         4  112    hair dye
Note that if the JSON column holds strings rather than lists of dicts, you need to convert it first. Run one of the following on df before the explode:
import json
df['JSON'] = df['JSON'].apply(json.loads)
or, if the strings use single quotes as in the question (which json.loads rejects, since that is not valid JSON):
import ast
df['JSON'] = df['JSON'].apply(ast.literal_eval)
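Putting the steps together, here is a small end-to-end sketch. It assumes the JSON column arrives as single-quoted strings (as it might after reading a CSV); the two-row sample frame and the names exploded and flat are made up for illustration.

```python
import ast

import pandas as pd

# Hypothetical sample: the JSON column holds strings, not lists of dicts.
df = pd.DataFrame({
    "order_id": [1, 2],
    "JSON": [
        "[{'key': '100', 'product': 'soap'}, {'key': '104', 'product': 'butter'}]",
        "[{'key': '97', 'product': 'baby wipes'}]",
    ],
})

# Single-quoted text is a Python literal, not valid JSON, so parse it with
# ast.literal_eval instead of json.loads.
df["JSON"] = df["JSON"].apply(ast.literal_eval)

# One row per dict; resetting the index lets concat align the two frames by position.
exploded = df.explode("JSON").reset_index(drop=True)
flat = pd.concat(
    [exploded.drop(columns="JSON"), pd.DataFrame(exploded["JSON"].tolist())],
    axis=1,
)
print(flat)
```

Resetting the index after explode gives unique row labels, which is a design choice here; the answer above keeps the repeated original index instead.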

Converting to dataframe, beginner question

I have a piece of data that looks like this
my_data[:5]
returns:
[{'key': ['Aaliyah', '2', '2016'], 'values': ['10']},
{'key': ['Aaliyah', '2', '2017'], 'values': ['26']},
{'key': ['Aaliyah', '2', '2018'], 'values': ['21']},
{'key': ['Aaliyah', '2', '2019'], 'values': ['26']},
{'key': ['Aaliyah', '2', '2020'], 'values': ['15']}]
The key holds the name, gender, and year. The value is the number.
I can't manage to build a data frame with the columns name, gender, year, and number.
Can you help me?
Here is one way, using a generator:
import pandas as pd
from itertools import chain
pd.DataFrame.from_records((dict(zip(['name', 'gender', 'year', 'number'],
                                    chain(*e.values())))
                           for e in my_data))
Without itertools (the := operator requires Python 3.8 or later):
pd.DataFrame(((E:=list(e.values()))[0]+E[1] for e in my_data),
             columns=['name', 'gender', 'year', 'number'])
output:
      name  gender  year  number
0  Aaliyah       2  2016      10
1  Aaliyah       2  2017      26
2  Aaliyah       2  2018      21
3  Aaliyah       2  2019      26
4  Aaliyah       2  2020      15
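If readability matters more than brevity, a plain list comprehension that addresses the two dictionary keys by name may be easier to follow. This is a sketch on a two-record sample; the names rows and names_df are illustrative, and the integer conversion at the end is an optional extra that assumes year and number always contain digits.

```python
import pandas as pd

my_data = [
    {"key": ["Aaliyah", "2", "2016"], "values": ["10"]},
    {"key": ["Aaliyah", "2", "2017"], "values": ["26"]},
]

# Concatenate the three-item key list and the one-item values list into one row.
rows = [rec["key"] + rec["values"] for rec in my_data]
names_df = pd.DataFrame(rows, columns=["name", "gender", "year", "number"])

# Optional: the source values are strings, so convert the numeric columns.
names_df[["year", "number"]] = names_df[["year", "number"]].astype(int)
print(names_df)
```

Unlike the e.values() versions, this does not depend on the order in which the keys were inserted into each dictionary.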

Pandas: Flatten Nested Dictionary vertically

I have a list of dictionary as below:
[{'name': 'jack', 'tagList': [{'tagId': '10', 'tagName': 'AB'},{'tagId': '20',
'tagName': 'BC'}]},
{'name': 'mike', 'tagList': [{'tagId': '30', 'tagName': 'DE'},{'tagId': '40',
'tagName': 'FG'}]}]
I want to turn this into a dataframe like below:
Name tagList_tagID tagList_tagName
Jack 10 AB
Jack 20 BC
mike 30 DE
mike 40 FG
How can I convert this list of dictionaries to pandas dataframe in an efficient way.
Try pd.json_normalize:
lst = [{'name': 'jack', 'tagList': [{'tagId': '10', 'tagName': 'AB'},
{'tagId': '20', 'tagName': 'BC'}]},
{'name': 'mike', 'tagList': [{'tagId': '30', 'tagName': 'DE'},
{'tagId': '40', 'tagName': 'FG'}]}]
df = pd.json_normalize(lst, record_path="tagList", meta=["name"])
#formatting to match expected output
df = df.set_index("name").add_prefix("tagList_")
>>> df
      tagList_tagId  tagList_tagName
name
jack             10               AB
jack             20               BC
mike             30               DE
mike             40               FG
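If name should stay a regular column rather than become the index, the record_prefix parameter of pd.json_normalize can add the prefix during normalization. A minimal sketch; the name tags_df is illustrative.

```python
import pandas as pd

lst = [
    {"name": "jack", "tagList": [{"tagId": "10", "tagName": "AB"},
                                 {"tagId": "20", "tagName": "BC"}]},
    {"name": "mike", "tagList": [{"tagId": "30", "tagName": "DE"},
                                 {"tagId": "40", "tagName": "FG"}]},
]

# record_prefix is applied only to the columns that come from record_path;
# the meta column "name" keeps its own name.
tags_df = pd.json_normalize(
    lst, record_path="tagList", meta=["name"], record_prefix="tagList_"
)

# Meta columns are appended last, so reorder to put name first.
tags_df = tags_df[["name", "tagList_tagId", "tagList_tagName"]]
print(tags_df)
```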

Python, Take Multiple Lists and Putting into pd.Dataframe

I have seen a variety of answers to this question (like this one), and have had no success in getting my lists into one dataframe. I have one header list (meant to be column headers), and then a variable that has multiple records in it:
list1 = ['Rank', 'Athlete', 'Distance', 'Runs', 'Longest', 'Avg. Pace', 'Elev. Gain']
list2 = (['1', 'Jack', '57.4 km', '4', '21.7 km', '5:57 /km', '994 m']
['2', 'Jill', '34.0 km', '2', '17.9 km', '5:27 /km', '152 m']
['3', 'Kelsey', '32.6 km', '2', '21.3 km', '5:46 /km', '141 m'])
When I try something like:
df = pd.DataFrame(list(zip(['1', 'Jack', '57.4 km', '4', '21.7 km', '5:57 /km', '994 m'],
                           ['2', 'Jill', '34.0 km', '2', '17.9 km', '5:27 /km', '152 m'])))
It lists all the attributes as their own rows, like so:
          0         1
0         1         2
1      Jack      Jill
2   57.4 km   34.0 km
3         4         2
4   21.7 km   17.9 km
5  5:57 /km  5:27 /km
6     994 m     152 m
How do I get this into a frame that has list1 as the headers, and the rest of the data neatly squared away?
Given
list1 = ['Rank', 'Athlete', 'Distance', 'Runs', 'Longest', 'Avg. Pace', 'Elev. Gain']
list2 = (['1', 'Jack', '57.4 km', '4', '21.7 km', '5:57 /km', '994 m'],
['2', 'Jill', '34.0 km', '2', '17.9 km', '5:27 /km', '152 m'],
['3', 'Kelsey', '32.6 km', '2', '21.3 km', '5:46 /km', '141 m'])
do
pd.DataFrame(list2, columns=list1)
which returns
   Rank  Athlete  Distance  Runs  Longest  Avg. Pace  Elev. Gain
0     1     Jack   57.4 km     4  21.7 km   5:57 /km       994 m
1     2     Jill   34.0 km     2  17.9 km   5:27 /km       152 m
2     3   Kelsey   32.6 km     2  21.3 km   5:46 /km       141 m
Change your second list into a list of lists and then
df = pd.DataFrame(columns = list1, data = list2)
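To see why the zip attempt in the question put each attribute on its own row: zip pairs up the first items of every list, then the second items, and so on, which transposes the records. The sketch below shows both behaviours side by side; runs_df and transposed are illustrative names.

```python
import pandas as pd

list1 = ["Rank", "Athlete", "Distance", "Runs", "Longest", "Avg. Pace", "Elev. Gain"]
list2 = [
    ["1", "Jack", "57.4 km", "4", "21.7 km", "5:57 /km", "994 m"],
    ["2", "Jill", "34.0 km", "2", "17.9 km", "5:27 /km", "152 m"],
    ["3", "Kelsey", "32.6 km", "2", "21.3 km", "5:46 /km", "141 m"],
]

# Each inner list is already one record, so pass the list of lists directly.
runs_df = pd.DataFrame(list2, columns=list1)

# zip(*list2) groups values by position instead: one tuple per attribute.
transposed = list(zip(*list2))
print(runs_df)
print(transposed[1])
```

zip is the right tool only when the data starts out as one list per column, not one list per row.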

Converting Data frame into a dict with columns as key inside key

I have a pandas data frame.
         mac_address  no. of co_visit  no. of random_visit
0  00:02:1a:11:b0:b9                1                    2
1  00:02:71:d6:04:84                1                    1
2  00:05:33:34:2f:f2                1                    3
3  00:08:22:04:c4:fb                1                    4
4  00:08:22:06:7b:41                1                    1
5  00:08:22:07:48:15                1                    1
6  00:08:22:08:a8:54                1                    3
7  00:08:22:0e:0a:fc                1                    1
I want to convert it into a dictionary keyed by mac_address, where each value is itself a dictionary with 'no. of co_visit' and 'no. of random_visit' as keys and that row's values for those columns as values. For the first two rows, the output would look like this:
00:02:1a:11:b0:b9:{no. of co_visit:1, no. of random_visit: 2}
00:02:71:d6:04:84:{no. of co_visit:1, no. of random_visit: 1}
I am using Python 2.7. Thank you.
I was able to set mac_address as the key, but the values were added as a list under each key rather than as a nested dictionary.
You can use pandas.DataFrame.T and to_dict().
df.set_index('mac_address').T.to_dict()
Output:
{'00:02:1a:11:b0:b9': {'no. of co_visit': '1', 'no. of random_visit': '2'},
'00:02:71:d6:04:84': {'no. of co_visit': '1', 'no. of random_visit': '1'},
'00:05:33:34:2f:f2': {'no. of co_visit': '1', 'no. of random_visit': '3'},
'00:08:22:04:c4:fb': {'no. of co_visit': '1', 'no. of random_visit': '4'},
'00:08:22:06:7b:41': {'no. of co_visit': '1', 'no. of random_visit': '1'},
'00:08:22:07:48:15': {'no. of co_visit': '1', 'no. of random_visit': '1'},
'00:08:22:08:a8:54': {'no. of co_visit': '1', 'no. of random_visit': '3'},
'00:08:22:0e:0a:fc': {'no. of co_visit': '1', 'no. of random_visit': '1'}}
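The transpose can also be skipped by passing orient="index" to to_dict, which builds the same nested mapping row by row. A sketch on a two-row sample with integer counts; the names visits and result are illustrative, and the approach assumes every mac_address is unique, because to_dict raises a ValueError for a non-unique index with this orient.

```python
import pandas as pd

visits = pd.DataFrame({
    "mac_address": ["00:02:1a:11:b0:b9", "00:02:71:d6:04:84"],
    "no. of co_visit": [1, 1],
    "no. of random_visit": [2, 1],
})

# orient="index" yields {index label: {column name: value}}.
result = visits.set_index("mac_address").to_dict(orient="index")
print(result)
```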
