Stock taking items in Python Dataframe into a dictionary - python

Suppose I have a Python Dataframe:
Column A  Column B
A         Val 1
A         Val 2
B         Val A
B         Val B
B         Val C
B         Val D
I want to stock-take Column B into a dictionary with key = unique values of Column A, as such:
out = { 'A': ['Val 1','Val 2'],
'B': ['Val A','Val B','Val C','Val D'] }
How would I do that?
I tried making a pivot table, but that only lets me aggregate Column B; I want the values kept as separate items in a list.

One way using pandas.DataFrame.groupby:
out = df.groupby("Column A")["Column B"].apply(list).to_dict()
Output:
{'A': ['Val 1', 'Val 2'], 'B': ['Val A', 'Val B', 'Val C', 'Val D']}
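For completeness, a minimal runnable sketch of this approach, rebuilding the sample frame from the question:

```python
import pandas as pd

# Rebuild the sample data from the question.
df = pd.DataFrame({
    "Column A": ["A", "A", "B", "B", "B", "B"],
    "Column B": ["Val 1", "Val 2", "Val A", "Val B", "Val C", "Val D"],
})

# Group rows by Column A, collect each group's Column B values into a list,
# then convert the resulting Series of lists to a plain dict.
out = df.groupby("Column A")["Column B"].apply(list).to_dict()
print(out)
# {'A': ['Val 1', 'Val 2'], 'B': ['Val A', 'Val B', 'Val C', 'Val D']}
```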


How to combine multiple columns of a pandas Dataframe into one column in JSON format

I have a sample dataframe as follows:
Main Key  Second  Column A  Column B  Column C  Column D  Column E
First     A       Value 1   Value 2   Value 3   Value 4   Value 5
Second    B       Value 6   Value 7   Value 8   Value 9   Value 10
Third     C       Value 11  Value 12  Value 13  Value 14  Value 15
Fourth    D       Value 16  Value 17  Value 18  Value 19  Value 20
I want to make a new column called 'Aggregated Data', where each value in Columns A to E becomes a key-value pair, and the pairs are combined into 'Aggregated Data' in JSON format.
The expected output would look like this:
Main Key  Second  Aggregated Data
First     A       {"Column A":"Value 1","Column B":"Value 2","Column C":"Value 3","Column D":"Value 4","Column E":"Value 5"}
Second    B       {"Column A":"Value 6","Column B":"Value 7","Column C":"Value 8","Column D":"Value 9","Column E":"Value 10"}
Third     C       {"Column A":"Value 11","Column B":"Value 12","Column C":"Value 13","Column D":"Value 14","Column E":"Value 15"}
Fourth    D       {"Column A":"Value 16","Column B":"Value 17","Column C":"Value 18","Column D":"Value 19","Column E":"Value 20"}
Any idea how this can be achieved? Thanks
Via an intermediate pandas.DataFrame.to_dict call (with orient='records' to obtain a list like [{column -> value}, … , {column -> value}]):
df[['Main Key', 'Second']].assign(Aggregated_Data=df.set_index(['Main Key', 'Second']).to_dict(orient='records'))
Main Key Second Aggregated_Data
0 First A {'Column A': 'Value 1 ', 'Column B': 'Value 2 ...
1 Second B {'Column A': 'Value 6 ', 'Column B': 'Value 7 ...
2 Third C {'Column A': 'Value 11 ', 'Column B': 'Value 1...
3 Fourth D {'Column A': 'Value 16 ', 'Column B': 'Value 1...
Just skip the first two columns and call to_json:
out = (df[["Main Key", "Second"]]
       .assign(Aggregated_Data=df.iloc[:, 2:]
                                 .apply(lambda x: x.to_json(), axis=1)))
Alternatively, build Python dicts (rather than JSON strings) with a dict comprehension inside a list comprehension:
df["Aggregated_Data"] = [{k: v for k, v in zip(df.columns[2:], v)}
for v in df.iloc[:,2:].to_numpy()]
Output:
print(out)
Main Key Second Aggregated_Data
0 First A {"Column A":"Value 1","Column B":"Value 2","Co...
1 Second B {"Column A":"Value 6","Column B":"Value 7","Co...
2 Third C {"Column A":"Value 11","Column B":"Value 12","...
3 Fourth D {"Column A":"Value 16","Column B":"Value 17","...
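As a self-contained sketch of the to_json route (sample data rebuilt from the question, trimmed to two data columns for brevity):

```python
import pandas as pd

df = pd.DataFrame({
    "Main Key": ["First", "Second"],
    "Second": ["A", "B"],
    "Column A": ["Value 1", "Value 6"],
    "Column B": ["Value 2", "Value 7"],
})

# Serialize everything after the first two columns to a JSON string, row by row.
out = (df[["Main Key", "Second"]]
       .assign(Aggregated_Data=df.iloc[:, 2:]
                                 .apply(lambda x: x.to_json(), axis=1)))
print(out.loc[0, "Aggregated_Data"])
# {"Column A":"Value 1","Column B":"Value 2"}
```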

Nested Dictionary using Pandas DataFrame

I have some data with duplicates that looks like this:
WEBPAGE    ID    VALUE
Webpage 1  ID 1  Value 1
Webpage 1  ID 1  Value 2
Webpage 1  ID 1  Value 3
Webpage 1  ID 2  Value 4
Webpage 1  ID 2  Value 5
Each webpage can have more than 1 ID associated with it and each ID can have more than one value associated with it.
I'd like to ideally have a nested dictionary with lists to handle the multiple IDs and multiple values:
{WEBPAGE: {ID 1: [value 1, value 2, value 3], ID 2: [value 4, value 5]}}
I've tried using to_dict and groupby, but I can't find the right syntax to create a nested dictionary with those.
Try:
out = {}
for _, x in df.iterrows():
    out.setdefault(x["WEBPAGE"], {}).setdefault(x["ID"], []).append(x["VALUE"])
print(out)
Prints:
{
"Webpage 1": {
"ID 1": ["Value 1", "Value 2", "Value 3"],
"ID 2": ["Value 4", "Value 5"],
}
}
For a pandas approach, you just need to use a nested groupby:
d = (df.groupby('WEBPAGE')
       .apply(lambda g: g.groupby('ID')['VALUE'].agg(list).to_dict())
       .to_dict())
output:
{'Webpage 1': {'ID 1': ['Value 1', 'Value 2', 'Value 3'],
'ID 2': ['Value 4', 'Value 5']}}
Another possible solution, using dictionary comprehension:
{x: {y: [z for z in df.VALUE[(df.WEBPAGE == x) & (df.ID == y)]]
for y in df.ID[df.WEBPAGE == x]} for x in df.WEBPAGE}
Output:
{'Webpage 1': {'ID 1': ['Value 1', 'Value 2', 'Value 3'],
'ID 2': ['Value 4', 'Value 5']}}
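The nested-groupby answer can be checked with a small runnable sketch (data rebuilt from the question):

```python
import pandas as pd

df = pd.DataFrame({
    "WEBPAGE": ["Webpage 1"] * 5,
    "ID": ["ID 1", "ID 1", "ID 1", "ID 2", "ID 2"],
    "VALUE": ["Value 1", "Value 2", "Value 3", "Value 4", "Value 5"],
})

# The outer groupby builds one entry per webpage; the inner groupby turns
# each webpage's rows into an {ID: [values]} dict.
d = (df.groupby("WEBPAGE")
       .apply(lambda g: g.groupby("ID")["VALUE"].agg(list).to_dict())
       .to_dict())
print(d)
# {'Webpage 1': {'ID 1': ['Value 1', 'Value 2', 'Value 3'], 'ID 2': ['Value 4', 'Value 5']}}
```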

Convert nested dict to column in Dataframe

I have data from a json file. I have been able to normalize it to an extent.
I'm wondering if there's an elegant way to convert the remaining data to columns.
I tried to use pd.json_normalize but I get an error on the list in column D.
My next attempt was to separate D, create a DataFrame from the list in D, normalize each dict individually, and then concat the DataFrames together.
My current issue is that the value of the first key in each dict should be a column name, with the value coming from the key called 'value'. There are five column names across the dicts in column D.
{'key': 'column name', 'value' : 'data value'}
A further complication is that most rows have three dicts, but some have one or two.
I think I could brute-force it by swapping the keys and values for the first key in each dict, and then using json_normalize to create columns and values from each dict. But I'm wondering if there's a more elegant way of handling this json data?
I'm trying to turn this:
    A        B        C        D
0   Value A  Value B  Value c  [{'key': 'column name 1', 'value': 'data value'}, {'key': 'column name 2', 'value': 'data value'}, {'key': 'column name 3', 'value': 'data value'}]
1   Value A  Value B  Value c  [{'key': 'column name 1', 'value': 'data value'}, {'key': 'column name 2', 'value': 'data value'}, {'key': 'column name 3', 'value': 'data value'}]
2   Value A  Value B  Value c  [{'key': 'column name 1', 'value': 'data value'}, {'key': 'column name 2', 'value': 'data value'}, {'key': 'column name 3', 'value': 'data value'}]
3   Value A  Value B  Value c  [{'key': 'column name 1', 'value': 'data value'}, {'key': 'column name 2', 'value': 'data value'}, {'key': 'column name 3', 'value': 'data value'}]
4   Value A  Value B  Value c  [{'key': 'column name 4', 'value': 'data value'}]
5   Value A  Value B  Value c  [{'key': 'column name 5', 'value': 'data value'}, {'key': 'column name 4', 'value': 'data value'}, {'key': 'column name 2', 'value': 'data value'}]
6   Value A  Value B  Value c  [{'key': 'column name 5', 'value': 'data value'}, {'key': 'column name 4', 'value': 'data value'}, {'key': 'column name 2', 'value': 'data value'}]
7   Value A  Value B  Value c  [{'key': 'column name 5', 'value': 'data value'}, {'key': 'column name 4', 'value': 'data value'}, {'key': 'column name 2', 'value': 'data value'}]
8   Value A  Value B  Value c  [{'key': 'column name 5', 'value': 'data value'}, {'key': 'column name 4', 'value': 'data value'}, {'key': 'column name 2', 'value': 'data value'}]
9   Value A  Value B  Value c  [{'key': 'column name 5', 'value': 'data value'}, {'key': 'column name 4', 'value': 'data value'}, {'key': 'column name 2', 'value': 'data value'}]
10  Value A  Value B  Value c  [{'key': 'column name 5', 'value': 'data value'}, {'key': 'column name 4', 'value': 'data value'}, {'key': 'column name 2', 'value': 'data value'}]
Into:
    A        B        C        column name 1  column name 2  column name 3  column name 4  column name 5
0   Value A  Value B  Value c  data value     data value     data value
1   Value A  Value B  Value c  data value     data value     data value
2   Value A  Value B  Value c  data value     data value     data value
3   Value A  Value B  Value c  data value     data value     data value
4   Value A  Value B  Value c                                               data value
5   Value A  Value B  Value c                 data value                    data value     data value
6   Value A  Value B  Value c                 data value                    data value     data value
7   Value A  Value B  Value c                 data value                    data value     data value
8   Value A  Value B  Value c                 data value                    data value     data value
9   Value A  Value B  Value c                 data value                    data value     data value
10  Value A  Value B  Value c                 data value                    data value     data value
Code:
df = pd.read_json(file)
df2 = pd.DataFrame(df['D'].to_list(), columns=['list_a', 'list_b', 'list_c'])
for column in df2:
    df3 = pd.json_normalize(df2[column])
    df = pd.concat([df, df3], axis=1)
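One possible way to get the desired wide format, sketched here with made-up sample values (the 'v1'/'v4' placeholders and the two-row frame are assumptions, since the real file isn't shown), is to explode column D, expand the dicts, and unstack the keys into columns. This also handles rows with one, two, or three dicts:

```python
import pandas as pd

# Hypothetical stand-in for the partially normalized frame described above.
df = pd.DataFrame({
    "A": ["Value A", "Value A"],
    "B": ["Value B", "Value B"],
    "C": ["Value c", "Value c"],
    "D": [
        [{"key": "column name 1", "value": "v1"},
         {"key": "column name 2", "value": "v2"}],
        [{"key": "column name 4", "value": "v4"}],
    ],
})

# One row per dict, keeping the original row index.
exploded = df["D"].explode()
# Expand each dict into 'key'/'value' columns, preserving the index.
kv = pd.DataFrame(exploded.tolist(), index=exploded.index)
# Pivot the keys into column headers, one row per original index;
# missing keys become NaN automatically.
wide = kv.set_index("key", append=True)["value"].unstack("key")
out = df.drop(columns="D").join(wide)
print(out)
```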

How to save multiple lists into multiple rows in Pandas?

So I have multiple lists that I would like to convert them to some soft of table format.
list1 has: 1, 2, 3
list2 has: 4, 5, 6
etc.
I would like to save this into a table format such as
list_1, list_2
1, 4
2, 5
3, 6
I've tried
col_a_c_df = pd.DataFrame(columns=['Column A and C', 'Column A and B and C', 'Column A and D and F', 'Column A and B and D and F'],
data=[col_a_c, col_a_b_c, col_a_d_f, col_a_b_d_f])
col_a_c_df.to_csv("result.csv")
but it tells me that ValueError: 4 columns passed, passed data had 17181 columns
How do I do this?
You can call the DataFrame constructor after zipping both lists, where A, B are the column names and a, b are the lists:
df = pd.DataFrame(columns=['A', 'B'], data=zip(a, b))
If the lists are of uneven lengths:
from itertools import zip_longest
df = pd.DataFrame(columns=['A', 'B'], data=zip_longest(a, b))
You can do it like this:
list1 = [1,2,3]
list2 = [4,5,6]
df = pd.DataFrame({'list1': list1, 'list2': list2})
list1 list2
0 1 4
1 2 5
2 3 6
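When the lists differ in length, the dict-based constructor above raises a ValueError, and plain zip silently truncates; itertools.zip_longest pads the shorter list with None instead. A minimal sketch:

```python
import pandas as pd
from itertools import zip_longest

a = [1, 2, 3]
b = [4, 5]  # shorter list

# zip_longest pads the shorter list with None, which pandas reads as a missing value.
df = pd.DataFrame(zip_longest(a, b), columns=["A", "B"])
print(df)
```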
This is a reference example:
import numpy as np

feature = ['Column A and C', 'Column A and B and C', 'Column A and D and F', 'Column A and B and D and F']
data = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
    [10, 11, 12]
]).T
df = pd.DataFrame(data)
df.columns = feature
print(df)
df.to_csv("result.csv")

Skip item with more columns when creating Pandas DataFrame

I have a list of lists:
list = [
['Row 1','Value 1'],
['Row 2', 'Value 2'],
['Row 3', 'Value 3', 'Value 4']
]
And I have a list for dataframe header:
header_list = ['RowID', 'Value']
If I create the DataFrame using df = pd.DataFrame(list, columns = header_list), then Python throws an error saying Row 3 has more than 2 columns, which is inconsistent with header_list.
So how can I skip Row 3 when creating the DataFrame? And how can I achieve this "in place", i.e. without first building a new list by looping through the original and appending only the items of length 2?
Thanks for the help!
First, rename the variable list to L, because list shadows a Python built-in.
Then filter with a list comprehension:
L = [['Row 1','Value 1'], ['Row 2', 'Value 2'], ['Row 3', 'Value 3', 'Value 4']]
# omit all rows whose length != 2
df = pd.DataFrame([x for x in L if len(x) == 2], columns = header_list)
print (df)
RowID Value
0 Row 1 Value 1
1 Row 2 Value 2
# keep only the last 2 values when len != 2
df = pd.DataFrame([x if len(x) == 2 else x[-2:] for x in L], columns = header_list)
print (df)
RowID Value
0 Row 1 Value 1
1 Row 2 Value 2
2 Value 3 Value 4
Or:
# keep only the first 2 values when len != 2
df = pd.DataFrame([x if len(x) == 2 else x[:2] for x in L], columns = header_list)
print (df)
RowID Value
0 Row 1 Value 1
1 Row 2 Value 2
2 Row 3 Value 3
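A runnable check of the first variant (header_list taken from the question):

```python
import pandas as pd

header_list = ["RowID", "Value"]
L = [["Row 1", "Value 1"], ["Row 2", "Value 2"], ["Row 3", "Value 3", "Value 4"]]

# Keep only the rows whose length matches the header width.
df = pd.DataFrame([x for x in L if len(x) == 2], columns=header_list)
print(df)
#    RowID    Value
# 0  Row 1  Value 1
# 1  Row 2  Value 2
```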
Try the code below; pd.DataFrame pads rows shorter than the widest one with NaN, and selecting the first len(header_list) columns drops the extra values:
list1 = [['Row 1', 'Value 1'], ['Row 2', 'Value 2'], ['Row 3', 'Value 3', 'Value 4']]
dff = pd.DataFrame(list1)
dff = dff[[x for x in range(len(header_list))]]
dff.columns = header_list
