Concatenate 2 dataframes having different values [duplicate] - python

This question already has answers here:
Pandas Merging 101
(8 answers)
Closed 2 years ago.
We have the following two dataframes.
Dataframe 1:
import pandas as pd
id = [30, 30]
month = [1, 3]
less_data = ['pravin', 'shashi']
df = pd.DataFrame(list(zip(id, month, less_data)), columns=['id', 'month', 'less_data'])
Dataframe 2:
id = [30, 30]
month = [1, 2]
zero_data = ['amol', 'pinak']
df2 = pd.DataFrame(list(zip(id, month, zero_data)), columns=['id', 'month', 'zero_data'])
and expected output:
id  month  less_data  zero_data
30  1      pravin     amol
30  2                 pinak
30  3      shashi
How can I use pd.concat to achieve this? Suggestions for a better solution are also welcome.

Use pd.concat:
dfn = pd.concat([
df.set_index(['id','month']),
df2.set_index(['id','month'])
], axis = 1).reset_index()

You can do an outer join:
df.join(df2.set_index(['id', 'month']), how='outer', on=['id', 'month'])

You can use pd.merge on id and month:
import pandas as pd
id =[30,30]
month =[1,3]
less_data =['pravin','shashi']
df = pd.DataFrame(list(zip(id,month,less_data)),columns =['id','month','less_data'])
id =[30,30]
month =[1,2]
less_data =['amol','pinak']
df2 = pd.DataFrame(list(zip(id,month,less_data)),columns =['id','month','zero_data'])
# merge can be thought of as a join in SQL
df_merge = pd.merge(df, df2, on=['id', 'month'], how='outer')
df_merge
   id  month less_data zero_data
0  30      1    pravin      amol
1  30      3    shashi       NaN
2  30      2       NaN     pinak
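To get the rows in the order shown in the expected output, the merged result can be sorted by the key columns and the missing values blanked out. This is a minimal sketch, assuming the same two dataframes as in the question; the names `ids` and `result` are only illustrative and do not come from the original posts.

```python
import pandas as pd

ids = [30, 30]  # named `ids` here (an assumption) to avoid shadowing the built-in id()
df = pd.DataFrame({'id': ids, 'month': [1, 3], 'less_data': ['pravin', 'shashi']})
df2 = pd.DataFrame({'id': ids, 'month': [1, 2], 'zero_data': ['amol', 'pinak']})

# The outer merge keeps every (id, month) pair found in either frame
result = (
    pd.merge(df, df2, on=['id', 'month'], how='outer')
    .sort_values(['id', 'month'])   # order rows by the join keys
    .reset_index(drop=True)         # renumber the rows from 0
    .fillna('')                     # show missing entries as empty strings
)
print(result)
```

Sorting explicitly means the row order does not depend on how a given pandas version orders the keys of an outer merge.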

Related

Drop rows and reset_index in a dataframe [duplicate]

This question already has answers here:
Pandas reset index is not taking effect [duplicate]
(4 answers)
Closed 5 days ago.
I was wondering why reset_index() has no effect in the following piece of code.
data = [0,10,20,30,40,50]
df = pd.DataFrame(data, columns=['Numbers'])
df.drop(df.index[2:4], inplace=True)
df.reset_index()
df
Numbers
0 0
1 10
4 40
5 50
UPDATE:
If I use df.reset_index(inplace=True), I see a new column, which is not desired.
   index  Numbers
0      0        0
1      1       10
2      4       40
3      5       50
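That is because reset_index() defaults to inplace=False and returns a new DataFrame, so you need to either assign the result back or call reset_index(inplace=True). To avoid the extra index column, also pass drop=True.
Please try this code: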
import pandas as pd
# create a sample DataFrame
df = pd.DataFrame({'column_name': [1, 2, 0, 4, 0, 6]})
# drop rows where column 'column_name' has value of 0
df = df[df['column_name'] != 0]
# reset the index of the resulting DataFrame
df = df.reset_index(drop=True)
print(df)
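Applied to the data from the question, the same idea looks like the sketch below. It assumes the old index is not needed afterwards, so drop=True is passed to keep it from being added as a new column.

```python
import pandas as pd

df = pd.DataFrame([0, 10, 20, 30, 40, 50], columns=['Numbers'])

# Remove the rows at positions 2 and 3
df = df.drop(df.index[2:4])

# reset_index returns a new DataFrame, so assign it back;
# drop=True discards the old index instead of adding an 'index' column
df = df.reset_index(drop=True)
print(df)
```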

Stick the columns based on the one columns keeping ids

I have a DataFrame with 100 columns (however I provide only three columns here) and I want to build a new DataFrame with two columns. Here is the DataFrame:
import pandas as pd
df = pd.DataFrame()
df ['id'] = [1,2,3]
df ['c1'] = [1,5,1]
df ['c2'] = [-1,6,5]
df
I want to stack the values of all columns for each id into a single column. For example, for id=1 I want to put 1 and -1 in one column. Here is the DataFrame that I want.
Note: df.melt does not solve my problem, since I also want to keep the ids.
Note 2: I have already tried stack and reset_index, and that does not help:
df = df.stack().reset_index()
df.columns = ['id','c']
df
You could first set_index with "id"; then stack + reset_index:
out = (df.set_index('id').stack()
         .droplevel(1).reset_index(name='c'))
Output:
id c
0 1 1
1 1 -1
2 2 5
3 2 6
4 3 1
5 3 5
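As an alternative, melt can also keep the ids if the id column is passed as id_vars. The sketch below is one possible variant, not the method from the answer above; it assumes the three-column frame from the question and sorts with a stable sort so that the values for each id stay in column order.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3], 'c1': [1, 5, 1], 'c2': [-1, 6, 5]})

out = (
    df.melt(id_vars='id', value_name='c')  # one row per (id, original column)
    .drop(columns='variable')              # drop the original column names
    .sort_values('id', kind='stable')      # group rows by id, keeping c1 before c2
    .reset_index(drop=True)
)
print(out)
```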

filter in a dataframe by values of another data frame in python (pandas) [duplicate]

This question already has answers here:
Pandas Merging 101
(8 answers)
Closed 1 year ago.
I have two dataframes:
df1 :
ID COUNT
0 202485 6
1 215893 8
2 181840 8
3 168337 7
and another dataframe
df2:
ID
0 202485
1 215893
2 181840
I want to filter / left join the two dataframes.
The desired result is:
ID COUNT
0 202485 6
1 215893 8
2 181840 8
I tried df1.merge(df2, how='inner', on='ID'), but got an error like: You are trying to merge on object and int64 columns
I also tried isin, but it didn't work:
list=df1['ID'].drop_duplicates().to_list()
df1[df1['ID'].isin(list)]
Any help?
df1 = pd.DataFrame({'ID':[202485,215893,181840,168337],'COUNT':[6,8,8,7]})
df2 = pd.DataFrame({"ID":[202485,215893,181840]})
out_df = pd.merge(df1,df2)
print(out_df)
This gives the desired result
ID COUNT
0 202485 6
1 215893 8
2 181840 8
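The error quoted in the question suggests that the two ID columns have different dtypes in the real data. The sketch below is a hypothetical reproduction: it assumes df1 stores the IDs as strings and df2 as integers, then casts the key to a common dtype before merging. It also shows the isin filter using the IDs from df2 rather than from df1.

```python
import pandas as pd

# Hypothetical reproduction: df1 holds the IDs as strings, df2 as integers
df1 = pd.DataFrame({'ID': ['202485', '215893', '181840', '168337'],
                    'COUNT': [6, 8, 8, 7]})
df2 = pd.DataFrame({'ID': [202485, 215893, 181840]})

# Bring both key columns to the same dtype before merging
df1['ID'] = df1['ID'].astype('int64')
out_df = df1.merge(df2, how='inner', on='ID')

# Equivalent filter with isin, using the IDs from df2 (not from df1)
filtered = df1[df1['ID'].isin(df2['ID'])]
print(out_df)
```

Checking df1.dtypes and df2.dtypes first shows which side needs the cast.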

merging two dataframes while moving column positions [duplicate]

This question already has an answer here:
Merge DataFrames based on index columns [duplicate]
(1 answer)
Closed 4 years ago.
I have a dataframe called df1 that is:
0
103773708 68.50
103773718 57.01
103773730 30.80
103773739 67.62
I have another one called df2 that is:
0
103773739 37.02
103773708 30.25
103773730 15.50
103773718 60.54
105496332 20.00
I'm wondering how I would get them to combine to end up looking like df3:
0 1
103773708 30.25 68.50
103773718 60.54 57.01
103773730 15.50 30.80
103773739 37.02 67.62
105496332 20.00 00.00
As you can see sometimes the index position is not the same, so it has to append the data to the same index. The goal is to append column 0 from df1, into df2 while pushing column 0 in df2 over one.
result = df1.join(df2.rename(columns={0:1})).fillna(0)
Simply merge on index, and then relabel the columns:
df = pd.merge(df1, df2, left_index=True, right_index=True, how='outer')
df.columns = [0,1]
df = df.fillna(0)
df1.columns = ['1']  # Rename the column from 0 to '1'. I assume names as strings.
df = df2.join(df1).fillna(0)  # join defaults to a left join
df
               0      1
103773739  37.02  67.62
103773708  30.25  68.50
103773730  15.50  30.80
103773718  60.54  57.01
105496332  20.00   0.00
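For reference, here is a self-contained sketch that rebuilds the two frames from the question and reproduces the df3 layout shown there, with the rows sorted by index. It assumes integer column labels and relies on every index value of df1 also being present in df2, which holds for the sample data; otherwise how='outer' would be needed.

```python
import pandas as pd

df1 = pd.DataFrame({0: [68.50, 57.01, 30.80, 67.62]},
                   index=[103773708, 103773718, 103773730, 103773739])
df2 = pd.DataFrame({0: [37.02, 30.25, 15.50, 60.54, 20.00]},
                   index=[103773739, 103773708, 103773730, 103773718, 105496332])

# Rename df1's column to 1 so it lands to the right of df2's column 0,
# then left-join on the index so every row of df2 is kept
df3 = df2.join(df1.rename(columns={0: 1})).fillna(0).sort_index()
print(df3)
```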

Combine two Pandas dataframes with the same index [duplicate]

This question already has an answer here:
What are the 'levels', 'keys', and names arguments for in Pandas' concat function?
(1 answer)
Closed 4 years ago.
I have two dataframes with the same index but different columns. How do I combine them into one with the same index but containing all the columns?
I have:
A
1 10
2 11
B
1 20
2 21
and I need the following output:
A B
1 10 20
2 11 21
pandas.concat([df1, df2], axis=1)
You've got a few options depending on how complex the dataframe is:
Option 1:
df1.join(df2, how='outer')
Option 2:
pd.merge(df1, df2, left_index=True, right_index=True, how='outer')
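The three options can be compared side by side on the small frames from the question. This is only a sketch; the variable names via_concat, via_join and via_merge are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [10, 11]}, index=[1, 2])
df2 = pd.DataFrame({'B': [20, 21]}, index=[1, 2])

# All three calls align the rows on the shared index
via_concat = pd.concat([df1, df2], axis=1)
via_join = df1.join(df2, how='outer')
via_merge = pd.merge(df1, df2, left_index=True, right_index=True, how='outer')

print(via_concat)
```

When the indexes match exactly, as they do here, the choice is mostly a matter of taste; join and merge offer more control once the indexes differ.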
