Mapping a Column from One Dataframe to Another - python

I would like to map the values in df2['col2'] onto df1['col1']. df1 looks like this:
   col1 col2
0     w    a
1     1    2
2     2    3
I would like to use two columns of another DataFrame as a lookup dictionary to get:
   col1 col2
0     w    a
1     A    2
2     B    3
However, the lookup data is stored as two columns of df2, which looks like this:
   col1 col2
1     1    A
2     2    B
I have tried this:
di = {df2['col1']: df2['col2']}
final = df1.replace({'col1': di})
But I get an error: TypeError: 'Series' objects are mutable, thus they cannot be hashed
I have about 200,000 rows. Any help would be appreciated.
Edit:
The dictionary I need would look like di = {1: "A", 2: "B"}, but its keys and values live in df2['col1'] and df2['col2']. With 200k+ rows, can I convert those two columns into a dictionary (or tuples, etc.)?

You can build a lookup dictionary that maps df2.col1 to df2.col2 and then use it to replace the values in df1.col1.
import pandas as pd
df1 = pd.DataFrame({'col1':['w',1,2],'col2':['a',2,3]})
df2 = pd.DataFrame({'col1':[1,2],'col2':['A','B']})
print(df1)
# col1 col2
#0 w a
#1 1 2
#2 2 3
print(df2)
# col1 col2
#0 1 A
#1 2 B
dataLookUpDict = {row[1]:row[2] for row in df2[['col1','col2']].itertuples()}
final = df1.replace({'col1': dataLookUpDict})
print(final)
# col1 col2
#0 w a
#1 A 2
#2 B 3
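A possible variant, sketched under the assumption that df1 and df2 are the frames above: build the dictionary with dict(zip(...)) instead of iterating with itertuples, and use Series.map. map returns NaN for values that have no key (here 'w'), so fillna puts the original value back. The name lookup is illustrative. Whether this is faster than replace on your 200k rows depends on the data, so it is worth timing both.

```python
import pandas as pd

df1 = pd.DataFrame({'col1': ['w', 1, 2], 'col2': ['a', 2, 3]})
df2 = pd.DataFrame({'col1': [1, 2], 'col2': ['A', 'B']})

# Build the lookup dict straight from the two columns, no row loop needed
lookup = dict(zip(df2['col1'], df2['col2']))

# map gives NaN for values without a key ('w'); fillna restores the original
df1['col1'] = df1['col1'].map(lookup).fillna(df1['col1'])
print(df1)
```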

Related

Find the name of the column that is non-NaN

I have a DataFrame defined like this:
df1 = pd.DataFrame({"col1":[1,np.nan,np.nan,np.nan,2,np.nan,np.nan,np.nan,np.nan],
"col2":[np.nan,3,np.nan,4,np.nan,np.nan,np.nan,5,6],
"col3":[np.nan,np.nan,7,np.nan,np.nan,8,9,np.nan, np.nan]})
I want to transform it into a DataFrame like:
df2 = pd.DataFrame({"col_name":['col1','col2','col3','col2','col1',
'col3','col3','col2','col2'],
"value":[1,3,7,4,2,8,9,5,6]})
If possible, can we reverse this process too, i.e. convert df2 back into df1?
I don't want to iterate over the DataFrame, as that becomes too computationally expensive.
You can stack it:
out = (df1.stack().astype(int).droplevel(0)
.rename_axis('col_name').reset_index(name='value'))
Output:
col_name value
0 col1 1
1 col2 3
2 col3 7
3 col2 4
4 col1 2
5 col3 8
6 col3 9
7 col2 5
8 col2 6
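To go from out back to df1, you could pivot. This relies on each row of df1 holding exactly one non-NaN value, so that the row positions of out line up with the rows of df1 (and pivot's arguments are keyword-only in pandas 2.x):
out1 = out.reset_index().pivot(index='index', columns='col_name', values='value')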
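A sketch of a round trip that does not depend on row positions lining up: keep the original row label from stack() instead of discarding it with droplevel(0), then pivot on that label. It assumes the df1 from the question; the label names row and col_name and the variable back are illustrative. The explicit dropna() is there because the reworked stack() in newer pandas versions keeps NaN values rather than dropping them.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"col1": [1, np.nan, np.nan, np.nan, 2, np.nan, np.nan, np.nan, np.nan],
                    "col2": [np.nan, 3, np.nan, 4, np.nan, np.nan, np.nan, 5, 6],
                    "col3": [np.nan, np.nan, 7, np.nan, np.nan, 8, 9, np.nan, np.nan]})

# Wide -> long: keep the original row label as a column instead of dropping it
out = (df1.stack()
          .dropna()                          # drop NaN explicitly
          .rename_axis(['row', 'col_name'])  # name both index levels
          .reset_index(name='value'))

# Long -> wide: the saved row label tells pivot where each value belongs
back = (out.pivot(index='row', columns='col_name', values='value')
           .rename_axis(index=None, columns=None))
print(back)
```

Note that a row or column that is entirely NaN in df1 would not reappear after the pivot, since it contributes nothing to out.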

Add column to pandas dataframe from a reversed dictionary

I have a pandas DataFrame and a dictionary whose values are lists. The values in the lists are unique across all keys. I want to add a new column to my DataFrame that holds, for each row, the dictionary key whose list contains that row's col1 value. For example, suppose I have a DataFrame like this:
import pandas as pd
df = {'a':1, 'b':2, 'c':2, 'd':4, 'e':7}
df = pd.DataFrame.from_dict(df, orient='index', columns = ['col2'])
df = df.reset_index().rename(columns={'index':'col1'})
df
col1 col2
0 a 1
1 b 2
2 c 2
3 d 4
4 e 7
I also have a dictionary like this:
my_dict = {'x':['a', 'c'], 'y':['b'], 'z':['d', 'e']}
I want output like this:
col1 col2 col3
0 a 1 x
1 b 2 y
2 c 2 x
3 d 4 z
4 e 7 z
Currently I do this by reversing the dictionary first, like this:
my_dict_rev = {value:key for key in my_dict for value in my_dict[key]}
df['col3']= df['col1'].map(my_dict_rev)
df
But I suspect there must be a more direct method.
I know this is an old question, but here are two other ways to do the same job. First convert my_dict to a Series, then explode it so that each list element gets its own row. Then reverse the mapping and use map:
tmp = pd.Series(my_dict).explode()
df['col3'] = df['col1'].map(pd.Series(tmp.index, tmp))
Another option starts the same way but uses merge instead of map:
df = df.merge(pd.Series(my_dict, name='col1').explode().rename_axis('col3').reset_index())
Output:
col1 col2 col3
0 a 1 x
1 b 2 y
2 c 2 x
3 d 4 z
4 e 7 z
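One difference between the two options shows up when a col1 value is missing from my_dict. The sketch below adds a hypothetical row 'f' with no matching key to illustrate it: map keeps the row and fills col3 with NaN, while merge uses an inner join by default and silently drops the row, so how='left' is needed to keep it. The names tmp, mapped, lookup and merged are illustrative.

```python
import pandas as pd

# 'f' is a hypothetical extra row with no key in my_dict
df = pd.DataFrame({'col1': ['a', 'b', 'c', 'd', 'e', 'f'],
                   'col2': [1, 2, 2, 4, 7, 9]})
my_dict = {'x': ['a', 'c'], 'y': ['b'], 'z': ['d', 'e']}

# One row per list element: index = dict key, value = element
tmp = pd.Series(my_dict).explode()

# map keeps every row of df; 'f' has no key and becomes NaN
mapped = df.assign(col3=df['col1'].map(pd.Series(tmp.index, index=tmp.values)))

# A left merge also keeps 'f'; the default inner merge would drop it
lookup = tmp.rename_axis('col3').rename('col1').reset_index()
merged = df.merge(lookup, on='col1', how='left')
inner = df.merge(lookup, on='col1')
print(merged)
```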

How to conditionally replace Pandas dataframe column values from another dataframe

I have the following 2 dataframes:
df1 = pd.DataFrame({"col1":[1, 2, 3],
"col2":["a", "b", "c"]})
df1
Output:
col1 col2
0 1 a
1 2 b
2 3 c
And the second one:
df2 = pd.DataFrame({"col1":[1, 2, 3, 4, 5],
"col2":["x", "y", "z", "q", "w"]})
df2
Output:
col1 col2
0 1 x
1 2 y
2 3 z
3 4 q
4 5 w
Additional info:
col1 has unique values in both DataFrames.
col2 does not necessarily have unique values.
What I want to achieve:
How can I replace values of col2 in df1 with the corresponding col2 values from df2 from the matching col1 values?
The desired final content of df1 is:
col1 col2
0 1 x
1 2 y
2 3 z
Create a dict by zipping the df2 columns, then use map to transfer the values over to df1:
df1['col2']=df1['col1'].map(dict(zip(df2['col1'],df2['col2'])))
Try .map with a Series indexed by col1:
df1['col2'] = df1['col1'].map(df2.set_index('col1')['col2'])
# col1 col2
# 0 1 x
# 1 2 y
# 2 3 z
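Both answers above overwrite every value in df1['col2']: a col1 value that has no match in df2 would become NaN. If the replacement should be conditional, i.e. only for matching keys, one option is to fall back to the existing value with fillna. The sketch below assumes a hypothetical extra row (col1 = 6) in df1 to show the fallback, and relies on df2.col1 being unique, as stated in the question.

```python
import pandas as pd

# col1 = 6 is a hypothetical row with no match in df2
df1 = pd.DataFrame({"col1": [1, 2, 3, 6], "col2": ["a", "b", "c", "d"]})
df2 = pd.DataFrame({"col1": [1, 2, 3, 4, 5], "col2": ["x", "y", "z", "q", "w"]})

# Series indexed by df2.col1, so map can look values up by key
lookup = df2.set_index('col1')['col2']

# Unmatched keys map to NaN; fillna keeps the original df1 value for them
df1['col2'] = df1['col1'].map(lookup).fillna(df1['col2'])
print(df1)
```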

Reformat dataframe using pandas by adding new rows based on dictionary value

Below is my DataFrame:
df = pd.DataFrame({'Col1':['1','2'],'Col2':[{'a':['a1','a2']},{'b':['b1']}]})
Col1 Col2
0 1 {u'a': [u'a1', u'a2']}
1 2 {u'b': [u'b1']}
I need to reshape this DataFrame as follows:
Col1 NCol2 NCol3
0 1 a a1
1 1 a a2
2 2 b b1
For each key-value pair in the dictionary, I want one row per list element, with the key in NCol2 and the element in NCol3.
Thanks in advance for your help.
You can use the following solution:
df1 = df['Col2'].apply(pd.Series).apply(lambda x: x.explode())\
.stack().reset_index(level=1)
df1.columns = ['Col2', 'Col3']
df.drop('Col2', axis=1).merge(df1, left_index=True, right_index=True)\
.reset_index(drop=True)
Output:
Col1 Col2 Col3
0 1 a a1
1 1 a a2
2 2 b b1
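A plainer alternative, sketched here as a different technique from the answer above: flatten each dictionary into (Col1, key, element) tuples with a nested comprehension and build the result frame in one step. It assumes every Col2 cell is a dict whose values are lists, and uses the NCol2/NCol3 names from the question. It loops in Python, so on very large frames it is worth timing against the vectorized answer.

```python
import pandas as pd

df = pd.DataFrame({'Col1': ['1', '2'],
                   'Col2': [{'a': ['a1', 'a2']}, {'b': ['b1']}]})

# One tuple per list element: (original Col1, dict key, list element)
rows = [(c1, key, item)
        for c1, d in zip(df['Col1'], df['Col2'])
        for key, items in d.items()
        for item in items]

out = pd.DataFrame(rows, columns=['Col1', 'NCol2', 'NCol3'])
print(out)
```

An empty list simply produces no rows for that key, rather than a NaN row.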

How to create a DataFrame aggregating (grouping?) a DataFrame containing only strings

I would like to create a dataframe "aggregating" a larger data set.
Starting:
df:
col1 col2
1 A B
2 A C
3 A B
and getting:
df_aggregated:
col1 col2
1 A B
2 A C
without using any calculation (such as count()).
I would write:
df_aggreagated = df.groupby('col1')
but I don't get anything useful:
print(df_aggregated)
"error"
Any help is appreciated.
You can accomplish this by dropping the duplicate entries with df.drop_duplicates, keeping the first occurrence of each (col1, col2) pair:
df_aggregated = df.drop_duplicates(subset=['col1', 'col2'], keep='first')
print(df_aggregated)
col1 col2
1 A B
2 A C
You can also use groupby with an aggregation function such as max:
In [849]: df.groupby('col2', as_index=False).max()
Out[849]:
col2 col1
0 B A
1 C A
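The keep argument of drop_duplicates matters here. A small sketch with the question's data: keep='first' (the default) keeps one copy of each distinct pair, which gives the desired output, while keep=False removes every row that has a duplicate, so both (A, B) rows disappear and only (A, C) is left. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['A', 'A', 'A'], 'col2': ['B', 'C', 'B']},
                  index=[1, 2, 3])

# keep='first': one row per distinct (col1, col2) pair, first occurrence wins
deduped = df.drop_duplicates(subset=['col1', 'col2'], keep='first')

# keep=False: every row that has a duplicate is removed entirely
only_unique = df.drop_duplicates(subset=['col1', 'col2'], keep=False)

print(deduped)
print(only_unique)
```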
