ValueError when merging two dataframes with an identical number of rows - python

I have a dataframe like this:
+-----+-------+---------+
| id | Time | Name |
+-----+-------+---------+
| 1 | 1 | John |
+-----+-------+---------+
| 2 | 2 | David |
+-----+-------+---------+
| 3 | 4 | Rebecca |
+-----+-------+---------+
| 4 | later | Taylor |
+-----+-------+---------+
| 5 | later | Li |
+-----+-------+---------+
| 6 | 8 | Maria |
+-----+-------+---------+
I want to merge it with another table on 'id' and 'time':
data1=pd.merge(data1, data2,left_on=['id', 'time'],right_on=['id', 'time'], how='left')
The other table:
+-----+-------+--------------+
| id | Time | Job |
+-----+-------+--------------+
| 2 | 2 | Doctor |
+-----+-------+--------------+
| 1 | 1 | Engineer |
+-----+-------+--------------+
| 4 | later | Receptionist |
+-----+-------+--------------+
| 3 | 4 | Professor |
+-----+-------+--------------+
| 5 | later | Lawyer |
+-----+-------+--------------+
| 6 | 8 | Trainer |
+-----+-------+--------------+
It raised an error:
ValueError: You are trying to merge on int64 and object columns. If you wish to proceed you should use pd.concat
What I tried:
data1['time']=data1['time'].astype(str)
data2['time']=data2['time'].astype(str)
This did not work. What can I do?
PS: in this example the ids are all different, but in my data the same id can appear more than once, so I need to merge on both Time and id.

Have you also tried casting the 'id' column to either str or int in both dataframes?
Sorry, I don't have enough reputation to post this as a comment on your question.
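
A minimal sketch of that suggestion, assuming the mismatch comes from 'id' being stored as integers in one table and as strings in the other (that cause is a guess, the question does not show the dtypes). The sample frames are rebuilt from the tables in the question, and both key columns are cast to str on both sides before merging:

```python
import pandas as pd

data1 = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],                      # integers here
    "Time": [1, 2, 4, "later", "later", 8],        # mixed values, object dtype
    "Name": ["John", "David", "Rebecca", "Taylor", "Li", "Maria"],
})
data2 = pd.DataFrame({
    "id": ["2", "1", "4", "3", "5", "6"],          # assumed: strings here
    "Time": [2, 1, "later", 4, "later", 8],
    "Job": ["Doctor", "Engineer", "Receptionist", "Professor", "Lawyer", "Trainer"],
})

# Cast every merge key to the same dtype in both frames.
keys = ["id", "Time"]
for frame in (data1, data2):
    frame[keys] = frame[keys].astype(str)

merged = data1.merge(data2, on=keys, how="left")
print(merged)
```

Checking data1.dtypes and data2.dtypes first shows which key column actually differs.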

Related

pandas group by category and assign a bin with pd.cut

I have a dataframe like the following:
+-------+-------+
| Group | Price |
+-------+-------+
| A | 2 |
| B | 3 |
| A | 1 |
| C | 4 |
| B | 2 |
+-------+-------+
I would like to create a column that tells me, within each group, which range my price value falls into if I divide each group's prices into 4 intervals.
+-------+-------+--------------------------+
| Group | Price | Range |
+-------+-------+--------------------------+
| A | 2 | [1-2] |
| B | 3 | [2-3] |
| A | 1 | [0-1] |
| C | 4 | [0-4] |
| B | 2 | [0-2] |
+-------+-------+--------------------------+
Does anyone have an idea how to do this with pandas pd.cut and groupby operations?
Thanks
You can pass pd.cut to groupby().transform():
df['Range'] = df.groupby('Group')['Price'].transform(pd.cut, bins=4)
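
If you want to see the binning step by step, here is a rough sketch that loops over the groups explicitly instead of using transform. It bins each group's prices with pd.cut(bins=4) and stores the interval labels as strings; the sample data is copied from the question, and the labels are whatever pd.cut computes, not the hand-written ranges in the expected output:

```python
import pandas as pd

df = pd.DataFrame({
    "Group": ["A", "B", "A", "C", "B"],
    "Price": [2, 3, 1, 4, 2],
})

# Bin each group's prices separately into 4 equal-width intervals.
parts = [
    pd.cut(prices, bins=4).astype(str)   # interval labels as plain strings
    for _, prices in df.groupby("Group")["Price"]
]

# The pieces keep the original row index, so assignment aligns them back.
df["Range"] = pd.concat(parts)
print(df)
```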

Fetch values corresponding to id of each row python

Is it possible to fetch a column containing the values that correspond to an id column?
Example:
df1
| ID | Value | Salary |
|:---------:|:--------:|:------:|
| 1 | amr | 34 |
| 1 | ith | 67 |
| 2 | oaa | 45 |
| 1 | eea | 78 |
| 3 | anik | 56 |
| 4 | mmkk | 99 |
| 5 | sh_s | 98 |
| 5 | ahhi | 77 |
df2
| ID | Dept |
|:---------:|:--------:|
| 1 | hrs |
| 1 | cse |
| 2 | me |
| 1 | ece |
| 3 | eee |
Expected Output
| ID | Dept | Value |
|:---------:|:--------:|----------:|
| 1 | hrs | amr |
| 1 | cse | ith |
| 2 | me | oaa |
| 1 | ece | eea |
| 3 | eee | anik |
I want to fetch each value in the 'Value' column that corresponds to the values in df2's ID column, and create a column containing those values in df2. The two dataframes do not have the same number of rows. I have tried this, but it did not work.
IIUC, you can try df.merge after assigning a helper column built with groupby + cumcount on ID:
out = (df1.assign(k=df1.groupby("ID").cumcount())
       .merge(df2.assign(k=df2.groupby("ID").cumcount()), on=['ID', 'k'])
       .drop(columns="k"))  # the positional form drop("k", 1) no longer works in pandas 2.x
print(out)
ID Value Dept
0 1 amr hrs
1 1 ith cse
2 2 oaa me
3 1 eea ece
4 3 anik eee
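
For reference, a self-contained sketch of the same cumcount idea that ends up with exactly the ID, Dept, Value columns from the expected output. It starts from df2 and brings in only 'Value' from df1; the helper column name k and the choice of a left merge are my own assumptions, the frames are copied from the question:

```python
import pandas as pd

df1 = pd.DataFrame({
    "ID": [1, 1, 2, 1, 3, 4, 5, 5],
    "Value": ["amr", "ith", "oaa", "eea", "anik", "mmkk", "sh_s", "ahhi"],
    "Salary": [34, 67, 45, 78, 56, 99, 98, 77],
})
df2 = pd.DataFrame({
    "ID": [1, 1, 2, 1, 3],
    "Dept": ["hrs", "cse", "me", "ece", "eee"],
})

# k numbers the repeated IDs (0, 1, 2, ...) so the n-th occurrence of an ID
# in df2 is paired with the n-th occurrence of the same ID in df1.
left = df2.assign(k=df2.groupby("ID").cumcount())
right = df1[["ID", "Value"]].assign(k=df1.groupby("ID").cumcount())

out = left.merge(right, on=["ID", "k"], how="left").drop(columns="k")
print(out)
```

With how="left" every row of df2 is kept, and an ID occurrence with no counterpart in df1 would simply get NaN in 'Value'.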
Is this what you want to do?
df1.merge(df2, how='inner',on ='ID')
Since you have duplicated IDs in both dfs, but these are ordered, try:
df1 = df1.drop(columns="ID")
df3 = df2.merge(df1, left_index=True, right_index=True)

Transform a pandas dataframe into a dataframe with multi-level columns

I have the following pandas dataframe, where the column id is the dataframe index
+----+-----------+------------+-----------+------------+
| | price_A | amount_A | price_B | amount_b |
|----+-----------+------------+-----------+------------|
| 0 | 0.652826 | 0.941421 | 0.823048 | 0.728427 |
| 1 | 0.400078 | 0.600585 | 0.194912 | 0.269842 |
| 2 | 0.223524 | 0.146675 | 0.375459 | 0.177165 |
| 3 | 0.330626 | 0.214981 | 0.389855 | 0.541666 |
| 4 | 0.578132 | 0.30478 | 0.789573 | 0.268851 |
| 5 | 0.0943601 | 0.514878 | 0.419333 | 0.0170096 |
| 6 | 0.279122 | 0.401132 | 0.722363 | 0.337094 |
| 7 | 0.444977 | 0.333254 | 0.643878 | 0.371528 |
| 8 | 0.724673 | 0.0632807 | 0.345225 | 0.935403 |
| 9 | 0.905482 | 0.8465 | 0.585653 | 0.364495 |
+----+-----------+------------+-----------+------------+
I want to convert this dataframe into a dataframe with multi-level columns that looks like this:
+----+-----------+------------+-----------+------------+
| | A | B |
+----+-----------+------------+-----------+------------+
| id | price | amount | price | amount |
|----+-----------+------------+-----------+------------|
| 0 | 0.652826 | 0.941421 | 0.823048 | 0.728427 |
| 1 | 0.400078 | 0.600585 | 0.194912 | 0.269842 |
| 2 | 0.223524 | 0.146675 | 0.375459 | 0.177165 |
| 3 | 0.330626 | 0.214981 | 0.389855 | 0.541666 |
| 4 | 0.578132 | 0.30478 | 0.789573 | 0.268851 |
| 5 | 0.0943601 | 0.514878 | 0.419333 | 0.0170096 |
| 6 | 0.279122 | 0.401132 | 0.722363 | 0.337094 |
| 7 | 0.444977 | 0.333254 | 0.643878 | 0.371528 |
| 8 | 0.724673 | 0.0632807 | 0.345225 | 0.935403 |
| 9 | 0.905482 | 0.8465 | 0.585653 | 0.364495 |
+----+-----------+------------+-----------+------------+
I've tried transforming my old pandas dataframe into a dict this way:
dict = {"A": df[["price_a","amount_a"]], "B":df[["price_b", "amount_b"]]}
df = pd.DataFrame(dict, index=df.index)
But I had no success. How can I do that?
Try renaming columns manually:
df.columns=pd.MultiIndex.from_tuples([x.split('_')[::-1] for x in df.columns])
df.index.name='id'
Output:
A B b
price amount price amount
id
0 0.652826 0.941421 0.823048 0.728427
1 0.400078 0.600585 0.194912 0.269842
2 0.223524 0.146675 0.375459 0.177165
3 0.330626 0.214981 0.389855 0.541666
4 0.578132 0.304780 0.789573 0.268851
5 0.094360 0.514878 0.419333 0.017010
6 0.279122 0.401132 0.722363 0.337094
7 0.444977 0.333254 0.643878 0.371528
8 0.724673 0.063281 0.345225 0.935403
9 0.905482 0.846500 0.585653 0.364495
You can split the column names on the underscore and convert to a tuple. Once you map each split column name to a tuple, pandas will convert the Index to a MultiIndex for you. From there we just need to call swaplevel to get the letter level to come first and reassign to the dataframe.
note: in my input dataframe I replaced the column name "amount_b" with "amount_B" because it lined up with your expected output so I assumed it was a typo
df.columns = df.columns.str.split("_", expand=True).swaplevel()
print(df)
A B
price amount price amount
0 0.652826 0.941421 0.823048 0.728427
1 0.400078 0.600585 0.194912 0.269842
2 0.223524 0.146675 0.375459 0.177165
3 0.330626 0.214981 0.389855 0.541666
4 0.578132 0.304780 0.789573 0.268851
5 0.094360 0.514878 0.419333 0.017010
6 0.279122 0.401132 0.722363 0.337094
7 0.444977 0.333254 0.643878 0.371528
8 0.724673 0.063281 0.345225 0.935403
9 0.905482 0.846500 0.585653 0.364495
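
If you would rather keep the original column names, including the lowercase "amount_b", one option is to build the tuples yourself and upper-case the letter part. This is only a sketch on a one-row sample taken from the question, and it assumes every column name has the form name_letter:

```python
import pandas as pd

df = pd.DataFrame(
    [[0.652826, 0.941421, 0.823048, 0.728427]],
    columns=["price_A", "amount_A", "price_B", "amount_b"],
)

# Split "price_A" into ("price", "A"), then put the upper-cased letter first.
tuples = [
    (letter.upper(), name)
    for name, letter in (col.split("_") for col in df.columns)
]
df.columns = pd.MultiIndex.from_tuples(tuples)
df.index.name = "id"
print(df)
```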

How to apply multiple custom functions on multiple columns in grouped DataFrame in pandas?

I have a pandas DataFrame which is grouped by p_id.
The goal is to get a DataFrame with data shown under 'Output I'm looking for'.
I've tried a few things, but I am struggling to apply two custom aggregation functions:
apply(list) for x_id
'||'.join for x_name.
How can I solve this problem?
Input
| p_id | x_id | x_name |
|------|------|--------|
| 1 | 4 | Text |
| 2 | 4 | Text |
| 2 | 5 | Text2 |
| 2 | 6 | Text3 |
| 3 | 4 | Text |
| 3 | 7 | Text4 |
Output I'm looking for
| p_id | x_ids | x_names |
|------|---------|--------------------|
| 1 | [4] | Text |
| 2 | [4,5,6] | Text||Text2||Text3 |
| 3 | [4,7] | Text||Text4 |
You can certainly do:
df.groupby('p_id').agg({'x_id': list, 'x_name': '||'.join})
Or a little more advanced with named agg:
df.groupby('p_id').agg(x_ids=('x_id', list),
                       x_names=('x_name', '||'.join))
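
A runnable version of the named aggregation on the sample input from the question, in case it helps to check the result. The reset_index() call is my addition, so that p_id comes back as a regular column:

```python
import pandas as pd

df = pd.DataFrame({
    "p_id": [1, 2, 2, 2, 3, 3],
    "x_id": [4, 4, 5, 6, 4, 7],
    "x_name": ["Text", "Text", "Text2", "Text3", "Text", "Text4"],
})

out = (
    df.groupby("p_id")
      .agg(
          x_ids=("x_id", list),            # collect the ids of each group in a list
          x_names=("x_name", "||".join),   # join the names with "||"
      )
      .reset_index()
)
print(out)
```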

Joining two dataframes based on the columns of one of them and the row of another

Sorry if the title doesn't make sense, but I wasn't sure how else to explain it. Here's an example of what I'm talking about:
df_1
| ID | F\_Name | L\_Name |
|----|---------|---------|
| 0 | | |
| 1 | | |
| 2 | | |
| 3 | | |
df_2
| ID | Name\_Type | Name |
|----|------------|--------|
| 0 | First | Bob |
| 0 | Last | Smith |
| 1 | First | Maria |
| 1 | Last | Garcia |
| 2 | First | Bob |
| 2 | Last | Stoops |
| 3 | First | Joe |
df_3 (result)
| ID | F\_Name | L\_Name |
|----|---------|---------|
| 0 | Bob | Smith |
| 1 | Maria | Garcia |
| 2 | Bob | Stoops |
| 3 | Joe | |
Any and all advice is welcome! Thank you
I guess that what you want to do is to reshape your second DataFrame to have the same structure of the first one, right?
You can use the pivot method to achieve it, with ID as the index:
df_3 = df_2.pivot(index="ID", columns="Name_Type", values="Name")
Then, you can rename the columns and clear the leftover axis name:
df_3 = df_3.rename(columns={"First": "F_Name", "Last": "L_Name"})
df_3.columns.name = None
df_3.index.name = "ID"
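
Put together on the sample data from the question, the reshape could look like the sketch below. The final reset_index() is my own addition, to turn ID back into an ordinary column as in the expected result; the missing last name for ID 3 shows up as NaN:

```python
import pandas as pd

df_2 = pd.DataFrame({
    "ID": [0, 0, 1, 1, 2, 2, 3],
    "Name_Type": ["First", "Last", "First", "Last", "First", "Last", "First"],
    "Name": ["Bob", "Smith", "Maria", "Garcia", "Bob", "Stoops", "Joe"],
})

# One row per ID, one column per Name_Type.
df_3 = df_2.pivot(index="ID", columns="Name_Type", values="Name")
df_3 = df_3.rename(columns={"First": "F_Name", "Last": "L_Name"})
df_3.columns.name = None      # drop the leftover "Name_Type" label
df_3 = df_3.reset_index()     # make ID a regular column again
print(df_3)
```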
