Convert nested dictionary to appended dataframe - python

I have a dictionary as such:
{1:{'name':'john', 'age':26,'salary':50000},11:{'name':'peter', 'age':34, 'salary':70000},14:{'name': 'david', 'age': 21, 'salary': 15000}}
I would like to convert it to a dataframe like this:
name age salary
john 26 50000
peter 34 70000
david 21 15000

Use from_dict with orient='index':
pd.DataFrame.from_dict(d, orient='index')
name age salary
1 john 26 50000
11 peter 34 70000
14 david 21 15000
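To match the desired output exactly (without the dictionary keys as the index), a `reset_index(drop=True)` can be chained on; a minimal runnable sketch:

```python
import pandas as pd

d = {1: {'name': 'john', 'age': 26, 'salary': 50000},
     11: {'name': 'peter', 'age': 34, 'salary': 70000},
     14: {'name': 'david', 'age': 21, 'salary': 15000}}

# orient='index' treats each outer key as a row label; drop the keys afterwards
df = pd.DataFrame.from_dict(d, orient='index').reset_index(drop=True)
```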

You can load the dictionary directly into a dataframe and then transpose it:
d = {1:{'name':'john', 'age':26,'salary':50000},11:{'name':'peter', 'age':34, 'salary':70000},14:{'name': 'david', 'age': 21, 'salary': 15000}}
df = pd.DataFrame(d).T
age name salary
1 26 john 50000
11 34 peter 70000
14 21 david 15000
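One caveat with the transpose approach: because each original column mixes strings and numbers, the transposed frame ends up with object dtype in every column. A sketch using `infer_objects()` to restore numeric dtypes afterwards:

```python
import pandas as pd

d = {1: {'name': 'john', 'age': 26, 'salary': 50000},
     11: {'name': 'peter', 'age': 34, 'salary': 70000}}

# transpose, then soft-convert object columns back to numeric where possible
df = pd.DataFrame(d).T.infer_objects()
```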

Construct the dataframe out of your dict's values.
>>> d = {1:{'name':'john', 'age':26,'salary':50000},11:{'name':'peter', 'age':34, 'salary':70000},14:{'name': 'david', 'age': 21, 'salary': 15000}}
>>> pd.DataFrame(list(d.values()))
age name salary
0 26 john 50000
1 34 peter 70000
2 21 david 15000
With rearranged columns:
>>> pd.DataFrame(list(d.values()), columns=['name', 'age', 'salary'])
name age salary
0 john 26 50000
1 peter 34 70000
2 david 21 15000

Do this:
pd.DataFrame(list(d.values()))
If you're using Python 2, you can call pd.DataFrame with d.values() directly:
pd.DataFrame(d.values())
The list() wrapper is needed in Python 3 because dict.values() returns a view, not a list.

Related

How to replace a row value in a pandas dataframe after a desired number is achieved?

Here's a simple piece of code, similar to what I am doing. I'm trying to replace the value after a 1 with a -1. But in my case, how would I do it if I don't know where the 1's are, in a dataframe with over 1000 rows?
import pandas as pd
df = pd.DataFrame({'Name':['Craig', 'Davis', 'Anthony', 'Tony'], 'Age':[22, 27, 24, 33], 'Employed':[0, 1, 0, 0]})
df
I have this:
Name     Age  Employed
Craig     22         0
Davis     27         1
Anthony   24         0
Tony      33         0
I want something similar to this, but iterable through 1000's of rows:
Name     Age  Employed
Craig     22         0
Davis     27         1
Anthony   24        -1
Tony      33         0
Use shift to get the next row after a 1:
df.loc[df['Employed'].shift() == 1, 'Employed'] = -1
print(df)
# Output
Name Age Employed
0 Craig 22 0
1 Davis 27 1
2 Anthony 24 -1
3 Tony 33 0
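Putting the corrected assignment together with the sample frame, a complete runnable sketch:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Craig', 'Davis', 'Anthony', 'Tony'],
                   'Age': [22, 27, 24, 33],
                   'Employed': [0, 1, 0, 0]})

# shift() moves the column down one row, so the mask marks every row
# that comes immediately after a 1
df.loc[df['Employed'].shift() == 1, 'Employed'] = -1
```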

Pandas filtering based on 2 different columns conditions

So lets say, I have the following dataframe.
data = pd.DataFrame({'Name': ['RACHEL', 'MONICA', 'PHOEBE', 'ROSS', 'CHANDLER', 'JOEY', 'RACHEL', 'RACHEL'],
'Age': [30, 35, 37, 33, 34, 30, 30, 15],
'Salary': [100000, 93000, 88000, 120000, 94000, 95000, 100000, 10],
'Job': ['DESIGNER', 'CHEF', 'MASUS', 'PALENTOLOGY',
'IT', 'ARTIST', 'DESIGNER', 'CHEF']})
which gives:
Name Age Salary Job
RACHEL 30 100000 DESIGNER
MONICA 35 93000 CHEF
PHOEBE 37 88000 MASUS
ROSS 33 120000 PALENTOLOGY
CHANDLER 34 94000 IT
JOEY 30 95000 ARTIST
RACHEL 30 100000 DESIGNER
RACHEL 15 10 CHEF
What I want to do is pretty simple: I want to filter (get rows) where Name != 'RACHEL' and Job != 'CHEF';
Expected result set:
Name Age Salary Job
RACHEL 30 100000 DESIGNER
MONICA 35 93000 CHEF
PHOEBE 37 88000 MASUS
ROSS 33 120000 PALENTOLOGY
CHANDLER 34 94000 IT
JOEY 30 95000 ARTIST
RACHEL 30 100000 DESIGNER
Note that the last entry is removed.
What I have tried so far is:
data = data.loc[ (data.Name != 'RACHEL') & (data.Job != 'CHEF') ]
This filters out other rows where Name == 'RACHEL' OR Job == 'CHEF'. I only want to filter out the last row, where Name == 'RACHEL' and, in the same row, Job == 'CHEF'.
Any help is appreciated. Thanks.
Use this:
data = data.loc[ ~((data.Name == 'RACHEL') & (data.Job == 'CHEF')) ]
You want to remove all the rows that have both Name == 'RACHEL' and Job == 'CHEF', so just write that condition and invert it to filter them out:
rachefs = data[~(data["Name"] == "RACHEL") | ~(data["Job"] == "CHEF")]
The |, which usually means OR, acts as an AND here because of the negations (De Morgan's laws).
https://en.wikipedia.org/wiki/De_Morgan%27s_laws#
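The two forms are equivalent by De Morgan's laws; a quick check on a cut-down version of the sample data (a sketch, not part of either original answer):

```python
import pandas as pd

data = pd.DataFrame({'Name': ['RACHEL', 'MONICA', 'RACHEL'],
                     'Job': ['DESIGNER', 'CHEF', 'CHEF'],
                     'Salary': [100000, 93000, 10]})

# NOT (Name == RACHEL AND Job == CHEF)
a = data[~((data.Name == 'RACHEL') & (data.Job == 'CHEF'))]
# (Name != RACHEL) OR (Job != CHEF) -- selects the same rows by De Morgan
b = data[(data.Name != 'RACHEL') | (data.Job != 'CHEF')]
```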

Pandas dataframe sorting string values and by descending aggregated values

I'm working on transforming a dataframe to show the top 3 earners.
The dataframe looks like this
data = {'Name': ['Allistair', 'Bob', 'Carrie', 'Diane', 'Allistair', 'Bob', 'Carrie','Evelyn'], 'Sale': [20, 21, 19, 18, 5, 300, 35, 22]}
df = pd.DataFrame(data)
print(df)
Name Sale
0 Allistair 20
1 Bob 21
2 Carrie 19
3 Diane 18
4 Allistair 5
5 Bob 300
6 Carrie 35
7 Evelyn 22
In my actual dataset, I have several more columns and rows, and I want to print out and get to
something like
Name Sale
0 Bob 321
1 Carrie 35
2 Allistair 25
Every approach that I've searched for doesn't quite get there, because I get:
'Name' is both an index level and a column label, which is ambiguous.
Use groupby:
>>> df.groupby('Name').sum().sort_values('Sale', ascending=False)
Sale
Name
Bob 321
Carrie 54
Allistair 25
Evelyn 22
Diane 18
Thanks to @Andrej Kasely above:
df.groupby("Name")["Sale"].sum().nlargest(3)
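Combining the two answers into a runnable sketch that yields the top 3 earners as a plain two-column frame (reset_index turns the group labels back into a column, which also sidesteps the ambiguous-index error):

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Allistair', 'Bob', 'Carrie', 'Diane',
                            'Allistair', 'Bob', 'Carrie', 'Evelyn'],
                   'Sale': [20, 21, 19, 18, 5, 300, 35, 22]})

# sum sales per name, keep the 3 largest totals, restore Name as a column
top3 = df.groupby('Name')['Sale'].sum().nlargest(3).reset_index()
```

Note that the totals come out as Bob 321, Carrie 54, Allistair 25; the 35 shown in the question's expected output appears to be the asker's own arithmetic slip (19 + 35 = 54).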

Filtering duplicate values using groupby

I'm reading the documentation to understand the method filter when used with groupby. In order to understand it, I've got the below scenario:
I'm trying to get the duplicate names grouped by city from my DataFrame df.
Below is my try:
df = pd.DataFrame({
'city':['LA','LA','LA','LA','NY', 'NY'],
'name':['Ana','Pedro','Maria','Maria','Peter','Peter'],
'age':[24, 27, 19, 34, 31, 20],
'sex':['F','M','F','F','M', 'M'] })
df_filtered = df.groupby('city').filter(lambda x: len(x['name']) >= 2)
df_filtered
The output I'm getting is:
city name age sex
LA Ana 24 F
LA Pedro 27 M
LA Maria 19 F
LA Maria 34 F
NY Peter 31 M
NY Peter 20 M
The output I'm expecting is:
city name age sex
LA Maria 19 F
LA Maria 34 F
NY Peter 31 M
NY Peter 20 M
It's not clear to me in which cases I have to use different column names in the groupby method and inside len in the filter method.
Thank you.
How about just duplicated:
df[df.duplicated(['city', 'name'], keep=False)]
You should groupby the two columns 'city' and 'name':
Yourdf = df.groupby(['city', 'name']).filter(lambda x: len(x) >= 2)
Yourdf
Out[234]:
city name age sex
2 LA Maria 19 F
3 LA Maria 34 F
4 NY Peter 31 M
5 NY Peter 20 M
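Both answers select the same rows; a sketch comparing them side by side:

```python
import pandas as pd

df = pd.DataFrame({
    'city': ['LA', 'LA', 'LA', 'LA', 'NY', 'NY'],
    'name': ['Ana', 'Pedro', 'Maria', 'Maria', 'Peter', 'Peter'],
    'age': [24, 27, 19, 34, 31, 20],
    'sex': ['F', 'M', 'F', 'F', 'M', 'M']})

# keep=False marks every member of a duplicated (city, name) pair
dup = df[df.duplicated(['city', 'name'], keep=False)]
# filter keeps whole groups whose size is at least 2
grp = df.groupby(['city', 'name']).filter(lambda x: len(x) >= 2)
```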

Convert list of dictionaries in a column to multiple columns in same dataframe

My dataframe has a column containing a list of dictionaries. How can I convert it into an extended dataframe? The dataframe is as shown:
A B C
123 abc [{"name":"john"},{"age":"28"},{"salary":"50000"}]
345 bcd [{"name":"alex"},{"age":"38"},{"salary":"40000"}]
567 xyx [{"name":"Dave"},{"age":"82"},{"salary":"30000"}]
I tried the following:
df1 = pd.concat([pd.DataFrame(x) for x in df['C']], keys=df['A']).reset_index(level=1, drop=True).reset_index()
The final output looks like:
A B name salary age
123 abc john 50000 28
345 bcd alex 40000 38
567 xyx Dave 30000 82
IIUC, flatten each row's list of dicts into one dict, build a dataframe from those with the constructor, then concat it back to the original df:
from itertools import chain
s = pd.DataFrame([dict(chain(*map(dict.items, x))) for x in df.pop('C').tolist()], index=df.index)
s
age name salary
0 28 john 50000
1 38 alex 40000
2 82 Dave 30000
s = pd.concat([df, s], axis=1)
s
A B age name salary
0 123 abc 28 john 50000
1 345 bcd 38 alex 40000
2 567 xyx 82 Dave 30000
Data input :
df.to_dict()
{'A': {0: 123, 1: 345, 2: 567}, 'B': {0: 'abc', 1: 'bcd', 2: 'xyx'}, 'C': {0: [{'name': 'john'}, {'age': '28'}, {'salary': '50000'}], 1: [{'name': 'alex'}, {'age': '38'}, {'salary': '40000'}], 2: [{'name': 'Dave'}, {'age': '82'}, {'salary': '30000'}]}}
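The merge-the-dicts step can also be written without itertools, with a plain dict comprehension over each row's list of single-key dicts; a minimal sketch on a two-row version of the data:

```python
import pandas as pd

df = pd.DataFrame({'A': [123, 345],
                   'B': ['abc', 'bcd'],
                   'C': [[{'name': 'john'}, {'age': '28'}, {'salary': '50000'}],
                         [{'name': 'alex'}, {'age': '38'}, {'salary': '40000'}]]})

# merge each row's list of one-key dicts into a single dict per row
merged = [{k: v for d in lst for k, v in d.items()} for lst in df.pop('C')]
out = pd.concat([df, pd.DataFrame(merged, index=df.index)], axis=1)
```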
