Merging two dataframes when one side has duplicated 'on' values - python

I have two dataframes. The main one contains repeated values in the 'id' column, which I have to use as the merge key.
+----+------------+------------+------------+
| id | res_number | type | payment |
+----+------------+------------+------------+
| a | 1 | toys | 20000 |
| a | 2 | clothing | 30000 |
| a | 3 | food | 40000 |
| b | 4 | food | 40000 |
| c | 5 | laptop | 30000 |
+----+------------+------------+------------+
I want to merge this dataframe with the dataframe below.
+----+------------+------------+
| id | group | unique_num |
+----+------------+------------+
| a | 1 | 1231 |
| b | 2 | 1234 |
| c | 1 | 1241 |
+----+------------+------------+
And I want to end up with a dataframe like this.
+----+------------+------------+------------+------------+------------+
| id | res_number | type | payment | group | unique_num |
+----+------------+------------+------------+------------+------------+
| a | 1 | toys | 20000 | 1 | 1231 |
| a | 2 | clothing | 30000 | 1 | 1231 |
| a | 3 | food | 40000 | 1 | 1231 |
| b | 4 | food | 40000 | 2 | 1234 |
| c | 5 | laptop | 30000 | 1 | 1241 |
+----+------------+------------+------------+------------+------------+
As you can see, I want to merge the dataframes on 'id', but the main dataframe has repeated 'id' values. My goal is simply to copy the matching values from the second dataframe onto every row with that 'id'.
Can you give me a good example of how to do this?

I think you need merge with a left join:
df = pd.merge(df1, df2, how='left')
Or, if the two DataFrames may share more than one column name, pass the key explicitly:
df = pd.merge(df1, df2, how='left', on='id')
print (df)
  id  payment  res_number      type  group  unique_num
0  a    20000           1      toys      1        1231
1  a    30000           2  clothing      1        1231
2  a    40000           3      food      1        1231
3  b    40000           4      food      2        1234
4  c    30000           5    laptop      1        1241
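For completeness, here is a self-contained sketch of the left join above, with the two frames rebuilt from the tables in the question. The validate="many_to_one" argument is an optional addition, not part of the original answer. It asks pandas to raise an error if the lookup frame ever contains a repeated 'id', which is the situation that would silently multiply rows.

```python
import pandas as pd

# Frames rebuilt from the tables in the question.
df1 = pd.DataFrame({
    "id": ["a", "a", "a", "b", "c"],
    "res_number": [1, 2, 3, 4, 5],
    "type": ["toys", "clothing", "food", "food", "laptop"],
    "payment": [20000, 30000, 40000, 40000, 30000],
})
df2 = pd.DataFrame({
    "id": ["a", "b", "c"],
    "group": [1, 2, 1],
    "unique_num": [1231, 1234, 1241],
})

# Left join: every row of df1 is kept, and the matching df2 values
# are repeated for each duplicated 'id'.
# validate="many_to_one" (optional) raises MergeError if df2 has duplicate ids.
merged = pd.merge(df1, df2, how="left", on="id", validate="many_to_one")
print(merged)
```

The row count stays the same as df1 because each 'id' matches at most one row in df2.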

Related

pd.MultiIndex: How do I add 1 more level (0) to a multi-index column?

This sounds trivial, but I just can't add 1 more level of index to the columns of a multi-level column df.
Current State
Category | Cat1 | Cat2 |
|Total Assets| AUMs |
Firm 1 | 100 | 300 |
Firm 2 | 200 | 3400 |
Firm 3 | 300 | 800 |
Firm 4 | NaN | 800 |
Desired State
Importance | H | H |
Category | Cat1 | Cat2 |
|Total Assets| AUMs |
Firm 1 | 100 | 300 |
Firm 2 | 200 | 3400 |
Firm 3 | 300 | 800 |
Firm 4 | NaN | 800 |
When I use the code below, I get the following problems.
Code 1 raises an error: isnull is not defined for MultiIndex
df.columns=pd.MultiIndex.from_arrays([['H','H'],df.columns])
Code 2 gives a wrong result: the existing levels are combined into a single level of tuples
df.columns=pd.MultiIndex.from_arrays([['H','H'],df.columns.values])
Importance | H | H |
Category | (Cat1, Total Assets) | (Cat2, AUMs) |
Firm 1 | 100 | 300 |
Firm 2 | 200 | 3400 |
Firm 3 | 300 | 800 |
Firm 4 | NaN | 800 |
Use concat():
df=pd.concat([df],keys=['H'],names=['Importance'],axis=1)
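A minimal runnable sketch of the concat approach, with the frame rebuilt from the "Current State" table (the unnamed second level is kept unnamed). The second variant, which rebuilds the columns with MultiIndex.from_tuples, is an alternative I am adding for comparison and is not part of the original answer.

```python
import numpy as np
import pandas as pd

# Two-level columns rebuilt from the question; the second level has no name.
cols = pd.MultiIndex.from_arrays(
    [["Cat1", "Cat2"], ["Total Assets", "AUMs"]],
    names=["Category", None],
)
df = pd.DataFrame(
    [[100, 300], [200, 3400], [300, 800], [np.nan, 800]],
    index=["Firm 1", "Firm 2", "Firm 3", "Firm 4"],
    columns=cols,
)

# Approach from the answer: concat a one-element list along the columns;
# the key 'H' becomes a new outermost column level named 'Importance'.
out = pd.concat([df], keys=["H"], names=["Importance"], axis=1)

# Alternative (an assumption, not from the answer): prepend 'H' to each
# existing column tuple and rebuild the MultiIndex.
alt = df.copy()
alt.columns = pd.MultiIndex.from_tuples(
    [("H", *col) for col in df.columns],
    names=["Importance", *df.columns.names],
)

print(out.columns.nlevels)
print(out.columns.tolist())
```

Both variants leave the data untouched and only change the column index.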

Fetch values corresponding to id of each row python

Is it possible to fetch a column containing the values that correspond to an ID column?
Example:
df1
| ID | Value | Salary |
|:--:|:-----:|:------:|
| 1 | amr | 34 |
| 1 | ith | 67 |
| 2 | oaa | 45 |
| 1 | eea | 78 |
| 3 | anik | 56 |
| 4 | mmkk | 99 |
| 5 | sh_s | 98 |
| 5 | ahhi | 77 |
df2
| ID | Dept |
|:--:|:----:|
| 1 | hrs |
| 1 | cse |
| 2 | me |
| 1 | ece |
| 3 | eee |
Expected Output
| ID | Dept | Value |
|:--:|:----:|:-----:|
| 1 | hrs | amr |
| 1 | cse | ith |
| 2 | me | oaa |
| 1 | ece | eea |
| 3 | eee | anik |
I want to fetch each value in the 'Value' column that corresponds to the values in df2's ID column, and add those values to df2 as a new 'Value' column. The two dfs do not have the same number of rows. I have tried this, but it did not work.
IIUC, you can try df.merge after assigning a helper column built with groupby + cumcount on ID:
out = (df1.assign(k=df1.groupby("ID").cumcount())
          .merge(df2.assign(k=df2.groupby("ID").cumcount()), on=['ID', 'k'])
          .drop(columns="k"))
print(out)
   ID Value Dept
0   1   amr  hrs
1   1   ith  cse
2   2   oaa   me
3   1   eea  ece
4   3  anik  eee
Is this what you want to do?
df1.merge(df2, how='inner',on ='ID')
Since both dfs contain duplicated IDs but the rows are in the same order, try merging on the index instead:
df1 = df1.drop(columns="ID")
df3 = df2.merge(df1, left_index=True, right_index=True)
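To make the difference between these answers concrete, here is a small sketch using the frames from the question. It compares a plain inner merge on ID with the groupby + cumcount helper-key approach. Two details are my own variations rather than the answerers' code: df2 is used as the left frame with a left join so that the result has df2's rows and column order, and only the ID and Value columns of df1 are carried over.

```python
import pandas as pd

# Frames rebuilt from the tables in the question.
df1 = pd.DataFrame({
    "ID": [1, 1, 2, 1, 3, 4, 5, 5],
    "Value": ["amr", "ith", "oaa", "eea", "anik", "mmkk", "sh_s", "ahhi"],
    "Salary": [34, 67, 45, 78, 56, 99, 98, 77],
})
df2 = pd.DataFrame({
    "ID": [1, 1, 2, 1, 3],
    "Dept": ["hrs", "cse", "me", "ece", "eee"],
})

# Plain merge on ID: ID 1 appears 3 times on each side, so it alone
# produces 3 x 3 = 9 rows (every combination).
plain = df1.merge(df2, how="inner", on="ID")
print(len(plain))

# Helper key k = position of the row within its ID group (0, 1, 2, ...),
# so the n-th occurrence of an ID in df2 pairs with the n-th in df1.
left = df2.assign(k=df2.groupby("ID").cumcount())
right = df1[["ID", "Value"]].assign(k=df1.groupby("ID").cumcount())
out = left.merge(right, on=["ID", "k"], how="left").drop(columns="k")
print(out)
```

The helper key only gives a meaningful pairing if the order of rows within each ID is the same in both frames, which is the same assumption the index-based answer makes.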

Grouping many columns in one column in Pandas

I have a DataFrame that is similar to this one:
| | id | Group1 | Group2 | Group3 |
|---|----|--------|--------|--------|
| 0 | 22 | A | B | C |
| 1 | 23 | B | C | D |
| 2 | 24 | C | B | A |
| 3 | 25 | D | A | C |
And I want to get something like this:
| | Group | id_count |
|---|-------|----------|
| 0 | A | 3 |
| 1 | B | 3 |
| 2 | C | 4 |
| 3 | D | 2 |
Basically, for each group I want to know how many people (ids) have chosen it.
I know there is df.groupby(), but it only gives an appropriate result for a single column (if I pass it a list, it does not combine Group1, Group2 and Group3 into one column).
Use DataFrame.melt with GroupBy.size:
df1 = (df.melt('id', value_name='Group')
.groupby('Group')
.size()
.reset_index(name='id_count'))
print (df1)
Group id_count
0 A 3
1 B 3
2 C 4
3 D 2
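A self-contained version of the melt approach, using the data from the question. The second result, based on nunique, is a variation I am adding: it counts distinct ids per group, which only differs from size() if the same id could list the same group in more than one column.

```python
import pandas as pd

# Frame rebuilt from the table in the question.
df = pd.DataFrame({
    "id": [22, 23, 24, 25],
    "Group1": ["A", "B", "C", "D"],
    "Group2": ["B", "C", "B", "A"],
    "Group3": ["C", "D", "A", "C"],
})

# Wide -> long: one row per (id, original column, chosen group).
long_df = df.melt("id", value_name="Group")

# Approach from the answer: count rows per group.
by_size = long_df.groupby("Group").size().reset_index(name="id_count")

# Variation (assumption): count distinct ids per group instead of rows.
by_nunique = (long_df.groupby("Group")["id"]
                     .nunique()
                     .reset_index(name="id_count"))

print(by_size)
print(by_nunique)
```

With this data both results agree, because no id picks the same group twice.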

transform pandas dataframe and combine rows

I have a pandas dataframe that looks like this:
|---------------------|------------------|------------------|
| student-id | subject-id | grade |
|---------------------|------------------|------------------|
| 1 | 1234 | 4 |
|---------------------|------------------|------------------|
| 1 | 2234 | 3 |
|---------------------|------------------|------------------|
| 1 | 3234 | 3 |
|---------------------|------------------|------------------|
| 2 | 1234 | 2 |
|---------------------|------------------|------------------|
| 2 | 2234 | 1 |
|---------------------|------------------|------------------|
| 2 | 3234 | 4 |
|---------------------|------------------|------------------|
Now I want to transform it so that I get exactly one row per student-id, with all of that student's grades in the row, like this:
|---------------------|------------------|------------------|------------------|
| student-id | grade 1 | grade 2 | grade 3 |
|---------------------|------------------|------------------|------------------|
| 1 | 4 | 3 | 3 |
|---------------------|------------------|------------------|------------------|
| 2 | 2 | 1 | 4 |
|---------------------|------------------|------------------|------------------|
Thanks for the help!
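You can drop subject-id with del df['subject-id'], and then df.groupby('student-id') will group the grades by student-id. Note that groupby on its own returns a GroupBy object, so an aggregation or reshape step is still needed to get one row per student.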
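One way to get the exact layout asked for is a pivot rather than a bare groupby. This is a sketch of a different technique from the one in the answer above: each grade is numbered within its student using cumcount, and that number becomes the new column. The helper column name n is hypothetical, and the "grade 1", "grade 2", ... labels follow the desired output in the question.

```python
import pandas as pd

# Frame rebuilt from the table in the question.
df = pd.DataFrame({
    "student-id": [1, 1, 1, 2, 2, 2],
    "subject-id": [1234, 2234, 3234, 1234, 2234, 3234],
    "grade": [4, 3, 3, 2, 1, 4],
})

# Number each grade within its student: 1, 2, 3, ...
# ('n' is a hypothetical helper column, not part of the original data).
numbered = df.assign(n=df.groupby("student-id").cumcount() + 1)

# Long -> wide: one row per student, one column per grade position.
wide = numbered.pivot(index="student-id", columns="n", values="grade")
wide.columns = [f"grade {n}" for n in wide.columns]
wide = wide.reset_index()

print(wide)
```

If the columns should instead be keyed by subject, pivoting on subject-id directly (columns="subject-id") would avoid the helper column.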

Value error when merging 2 dataframe with identical number of row

I have a dataframe like this:
+-----+-------+---------+
| id | Time | Name |
+-----+-------+---------+
| 1 | 1 | John |
+-----+-------+---------+
| 2 | 2 | David |
+-----+-------+---------+
| 3 | 4 | Rebecca |
+-----+-------+---------+
| 4 | later | Taylor |
+-----+-------+---------+
| 5 | later | Li |
+-----+-------+---------+
| 6 | 8 | Maria |
+-----+-------+---------+
I want to merge with another table based on 'id' and time:
data1=pd.merge(data1, data2,left_on=['id', 'time'],right_on=['id', 'time'], how='left')
The other table, data2:
+-----+-------+--------------+
| id | Time | Job |
+-----+-------+--------------+
| 2 | 2 | Doctor |
+-----+-------+--------------+
| 1 | 1 | Engineer |
+-----+-------+--------------+
| 4 | later | Receptionist |
+-----+-------+--------------+
| 3 | 4 | Professor |
+-----+-------+--------------+
| 5 | later | Lawyer |
+-----+-------+--------------+
| 6 | 8 | Trainer |
+-----+-------+--------------+
It raised this error:
ValueError: You are trying to merge on int64 and object columns. If you wish to proceed you should use pd.concat
What I tried:
data1['time']=data1['time'].astype(str)
data2['time']=data2['time'].astype(str)
That did not work. What can I do?
PS: in this example the ids are all different, but in my data an id can be repeated, so I need to merge on both time and id.
Have you also tried casting the 'id' column to either str or int in both dataframes?
Sorry, I do not have enough reputation to post this as a comment on your question.
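A sketch of what that suggestion could look like. The data is rebuilt from the tables in the question, with one assumption that the question does not confirm: that 'id' is stored as integers in data1 but as text in data2, which would explain why casting only 'time' did not help. Checking dtypes on both frames first shows which key column is mismatched.

```python
import pandas as pd

data1 = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],                   # int64
    "time": [1, 2, 4, "later", "later", 8],     # object (mixed int/str)
    "Name": ["John", "David", "Rebecca", "Taylor", "Li", "Maria"],
})
data2 = pd.DataFrame({
    "id": ["2", "1", "4", "3", "5", "6"],       # ASSUMPTION: ids stored as text
    "time": [2, 1, "later", 4, "later", 8],
    "Job": ["Doctor", "Engineer", "Receptionist",
            "Professor", "Lawyer", "Trainer"],
})

# Inspect the key dtypes on both sides before merging.
print(data1.dtypes)
print(data2.dtypes)

keys = ["id", "time"]
try:
    pd.merge(data1, data2, on=keys, how="left")
except ValueError as exc:
    # Expected here because 'id' is int64 on one side and text on the other.
    print("merge failed:", exc)

# Cast every key column to str on both sides, then merge.
left = data1.astype({k: str for k in keys})
right = data2.astype({k: str for k in keys})
merged = pd.merge(left, right, on=keys, how="left")
print(merged)
```

Casting to str is the safer direction here because 'time' contains the text value "later" and cannot be converted to int.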
