Adding A Specific Column from a Pandas Dataframe to Another Pandas Dataframe - python

I am trying to add a column from one pandas dataframe (df2) to another (df1). df1 has a unique identifier column ('id'), and df2 carries the same identifiers in its 'sameid' column. I have tried merge, but I need to add only one specific column ('addthiscolumn'), not all of the columns. What is the best way to do this?
print df1
'id' 'column1'
0 aaa randomdata1
1 aab randomdata2
2 aac randomdata3
3 aad randomdata4
print df2
'sameid' 'irrelevant' 'addthiscolumn'
0 aaa irre1 1234
1 aab irre2 2345
2 aac irre3 3456
3 aad irre4 4567
4 aae irre5 5678
5 aad irre6 6789
Desired Result
print df1
'id' 'column1' 'addthiscolumn'
0 aaa randomdata1 1234
1 aab randomdata2 2345
2 aac randomdata3 3456
3 aad randomdata4 4567

Because you just want to merge a single column, you can select as follows:
df1.merge(df2[['sameid', 'addthiscolumn']], left_on='id', right_on='sameid')
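Note that merge keeps the join key from both frames, so the result contains both 'id' and 'sameid'. If you only want the new column, one option (a small sketch using the same frames) is to drop the redundant key afterwards:
# merge only the two needed columns, then drop the duplicated key column
df1 = df1.merge(df2[['sameid', 'addthiscolumn']], left_on='id', right_on='sameid').drop(columns='sameid')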

Related

How to insert a column value by comparing columns from two data frames in pandas

I am trying to compare a column across two different dataframes and insert a new column into the second dataframe from the first one.
I have two dataframes, df1 and df2. I would like to compare the ID columns of df1 and df2 and insert the matching filename into df2.
df1:
ID Date filename col2
1 20220207 data1.csv AAA
2 20220207 data2.csv BBB
3 20220207 data2.csv CCC
df2:
ID Date col1
1 20220207 123XER
2 20220207 234FGY
3 20220207 000GGG
Result
df2:
ID Date col1 filename
1 20220207 123XER data1.csv
2 20220207 234FGY data2.csv
3 20220207 000GGG data2.csv
I tried the below code:
df2['FileName']=np.where(df1['ID'].equals(df2['ID']), df1['filename'], '')
It throws the below error:
Length of values (1863) does not match length of index (1862)
Can anyone please help me with this logic?
df2['FileName'] = np.where(df1['ID'] == df2['ID'], df1['filename'], None)
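Bear in mind that np.where compares the two Series element by element, so both frames must have the same length and row order; the length-mismatch error above is exactly what happens when they don't. A lookup by ID avoids that requirement entirely (a sketch, assuming the IDs in df1 are unique):
# build an ID -> filename mapping from df1 and look each df2 ID up in it
df2['filename'] = df2['ID'].map(df1.set_index('ID')['filename'])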

Returning the rows based on specific value without column name

I know how to return the rows based on specific text by specifying the column name like below.
import pandas as pd
data = {'id': ['1', '2', '3', '4'],
        'City1': ['abc', 'def', 'abc', 'khj'],
        'City2': ['JH', 'abc', 'abc', 'yuu'],
        'City2': ['JRR', 'ytu', 'rr', 'abc']}  # 'City2' is repeated, so the second list silently overrides the first
df = pd.DataFrame(data)
df.loc[df['City1']== 'abc']
and the output is:
id City1 City2
0 1 abc JRR
2 3 abc rr
but what I need is this: my specific value 'abc' can be in any column, and I need to return the rows that contain the specific text, e.g. 'abc', without giving a column name. Is there any way? I need the output as below:
id City1 City2
0 1 abc JRR
1 3 abc rr
2 4 khj abc
You can use any with axis=1 to apply the check across all columns and get the expected result:
>>> df[(df == 'abc').any(axis=1)]
id City1 City2
0 1 abc JRR
2 3 abc rr
3 4 khj abc
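If you later need to match several values at once, the same row-wise check works with isin() instead of == (a small variant of the above):
# keep rows where any cell equals one of the target values
df[df.isin(['abc', 'khj']).any(axis=1)]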

Combine two dataframes where column values match

I have two dataframes containing similar columns:
ID prop1
1 UUU &&&
2 III ***
3 OOO )))
4 PPP %%%
and
ID prop2
1 UUU 1234
2 WWW 4567
3 III 7890
5 EEE 0123
6 OOO 3456
7 RRR 6789
8 PPP 9012
I need to merge these two dataframes where the IDs match, and add the prop2 column to the original.
ID prop1 prop2
1 UUU &&& 1234
2 III *** 7890
3 OOO ))) 3456
4 PPP %%% 9012
I've tried every combination of merge, join, concat, for, iter, etc. It will either fail to merge, lose the index, or straight-up drop the column values.
You can use pd.merge():
pd.merge(df1, df2, on='ID')
Output:
ID prop1 prop2
0 UUU &&& 1234
1 III *** 7890
2 OOO ))) 3456
3 PPP %%% 9012
You can also use df.merge() as follows:
df1.merge(df2, on='ID')
Same result.
The default for .merge(), whether you call pd.merge() or df.merge(), is how='inner', so you are already doing an inner join even without specifying the how= parameter.
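If you instead want to keep every row of df1 even when its ID has no match in df2, spell the join type out explicitly (a sketch using the same frames):
# left join: every row of df1 survives; unmatched IDs get NaN in prop2
df1.merge(df2, on='ID', how='left')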
More complex scenario:
If you need the more complicated case of keeping df1's original index 1, 2, 3, 4 instead of 0, 1, 2, 3, you can reset the index before merging and then set the index back using the interim 'index' column that reset_index() produces:
df1.reset_index().merge(df2, on='ID').set_index('index')
Output:
ID prop1 prop2
index
1 UUU &&& 1234
2 III *** 7890
3 OOO ))) 3456
4 PPP %%% 9012
Now the original index 1, 2, 3, 4 of df1 is kept.
Optionally, if you don't want the axis label 'index' to appear on top of the row index, you can add a rename_axis() as follows:
df1.reset_index().merge(df2, on='ID').set_index('index').rename_axis(index=None)
Output:
ID prop1 prop2
1 UUU &&& 1234
2 III *** 7890
3 OOO ))) 3456
4 PPP %%% 9012
You can also use .map to add the prop2 values to your original dataframe, where the ID column values match.
df1['prop2'] = df1['ID'].map(dict(df2[['ID', 'prop2']].to_numpy()))
Should there be any IDs in your original dataframe that aren't also in the second one (and so don't have a prop2 value to bring across), you can fill those holes by adding .fillna() with the value of your choice.
df1['prop2'] = df1['ID'].map(dict(df2[['ID', 'prop2']].to_numpy())).fillna(your_fill_value_here)
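An equivalent, arguably clearer spelling of the same mapping (a sketch, assuming the IDs in df2 are unique) builds the lookup Series with set_index() instead of going through a dict:
# map via a Series indexed by ID; .fillna() covers IDs missing from df2
df1['prop2'] = df1['ID'].map(df2.set_index('ID')['prop2']).fillna(your_fill_value_here)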

How to flip dataframe with column names and values within column while keeping all static columns in dataframe?

I have a dataframe with 500 columns, two of which ('FieldTitle', 'Value') hold rows that I want to 'flip' into columns. The df looks like this:
id FieldTitle Value UID number XID
1 fname aaa 12 123 345
1 lname bbb 12 123 345
2 fname ccc 23 432 543
2 lname ddd 23 432 543
How do I make the dataframe look like this?:
id fname lname UID number XID
1 aaa bbb 12 123 345
2 ccc ddd 23 432 543
Currently when I pivot, only the columns from 'FieldTitle' and 'Value' remain, while all the static columns get dropped.
pivoted_df = pd.pivot_table(df, index='id', columns='FieldTitle', values='Value', aggfunc='first').reset_index()
I have also tried the below, with no success:
pivoted_df = pd.pivot_table(df, index='id', columns='FieldTitle', values=['Value'], aggfunc='first').reset_index()
You can pass a list of column names to the index parameter:
pivoted_df = pd.pivot_table(df, index=['id', 'UID', 'number', 'XID'],
                            columns='FieldTitle',
                            values='Value',
                            aggfunc='first').reset_index()
print (pivoted_df)
FieldTitle id UID number XID fname lname
0 1 12 123 345 aaa bbb
1 2 23 432 543 ccc ddd
If you want to build the list for the index parameter dynamically:
cols = df.columns.difference(['FieldTitle','Value']).tolist()
pivoted_df = pd.pivot_table(df, index=cols,
                            columns='FieldTitle',
                            values='Value',
                            aggfunc='first').reset_index()
print (pivoted_df)
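One cosmetic follow-up (a small sketch, not part of the original answer): the pivot leaves the label 'FieldTitle' on the columns axis, which is why it shows up above the header in the printed output. You can clear it with rename_axis():
# remove the leftover 'FieldTitle' label from the columns axis
pivoted_df = pivoted_df.rename_axis(columns=None)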

collapse group into one row pandas dataframe

I have a dataframe as below:
id timestamp name
1 2018-01-23 15:49:53 "aaa"
1 2018-01-23 15:54:56 "bbb"
1 2018-01-23 15:49:57 "bbb"
1 2018-01-23 15:49:54 "ccc"
This is one example of an id group from my data; I have several such groups.
What I am trying to do is collapse each group into one row, in chronological order according to timestamp, e.g. like this:
id name
1 aaa->ccc->bbb->bbb
The values in name are in chronological order as they appear with timestamp. Any pointers regarding this?
I took the liberty of adding some data to your df:
print(df)
Output:
id timestamp name
0 1 2018-01-23T15:49:53 aaa
1 1 2018-01-23T15:54:56 bbb
2 1 2018-01-23T15:49:57 bbb
3 1 2018-01-23T15:49:54 ccc
4 2 2018-01-23T15:49:54 ccc
5 2 2018-01-23T15:49:57 aaa
Then you need:
df['timestamp'] = pd.to_datetime(df['timestamp'])
df = df.sort_values(['id', 'timestamp'])
grp = df.groupby('id')['name'].aggregate(lambda x: '->'.join(tuple(x))).reset_index()
print(grp)
Output:
id name
0 1 aaa->ccc->bbb->bbb
1 2 ccc->aaa
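As a side note, str.join accepts any iterable of strings, so the lambda and the tuple() call can be dropped (a small simplification of the same aggregation):
# '->'.join consumes each grouped Series directly
grp = df.groupby('id')['name'].agg('->'.join).reset_index()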
