Joining two dataframes based on the columns of one of them and the row of another - python

Sorry if the title doesn't make sense, but wasn't sure how eles to explain it. Here's an example of what i'm talking about
df_1
| ID | F\_Name | L\_Name |
|----|---------|---------|
| 0 | | |
| 1 | | |
| 2 | | |
| 3 | | |
df_2
| ID | Name\_Type | Name |
|----|------------|--------|
| 0 | First | Bob |
| 0 | Last | Smith |
| 1 | First | Maria |
| 1 | Last | Garcia |
| 2 | First | Bob |
| 2 | Last | Stoops |
| 3 | First | Joe |
df_3 (result)
| ID | F\_Name | L\_Name |
|----|---------|---------|
| 0 | Bob | Smith |
| 1 | Maria | Garcia |
| 2 | Bob | Stoops |
| 3 | Joe | |
Any and all advice are welcomed! Thank you

I guess that what you want to do is to reshape your second DataFrame to have the same structure of the first one, right?
You can use pivot method to achieve it:
df_3 = df_2.pivot(columns="Name_Type", values="Name")
Then, you can rename the index and the columns:
df_3 = df_3.rename(columns={"First": "F_Name", "Second": "L_Name"})
df_3.columns.name = None
df_3.index.name = "ID"

Related

How to split a column into several columns by taking the string values as column headers?

This is my dataset:
| Name | Dept | Project area/areas interested |
| -------- | -------- |-----------------------------------|
| Joe | Biotech | Cell culture//Bioinfo//Immunology |
| Ann | Biotech | Cell culture |
| Ben | Math | Trigonometry//Algebra |
| Keren | Biotech | Microbio |
| Alice | Physics | Optics |
This is how I want my result:
| Name | Dept |Cell culture|Bioinfo|Immunology|Trigonometry|Algebra|Microbio|Optics|
| -------- | -------- |------------|-------|----------|------------|-------|--------|------|
| Joe | Biotech | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
| Ann | Biotech | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| Ben | Math | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
| Keren | Biotech | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| Alice | Physics | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
Not only do I have to split the last column into the different columns based on the rows - I have to resplit certain column values that are seperated by "//". And the values in the dataframe have to be replaced with 1 or 0 (int).
I've been stuck on this for a while now (-_-;)
You can use pandas.concat in combination with pandas.get_dummies like this:
pd.concat([df[["Name", "Dept"]], df["Project area/areas interested"].str.get_dummies(sep='//')], axis=1)

pandas group by category and assign a bin with pd.cut

I have a dataframe like the following:
+-------+-------+
| Group | Price |
+-------+-------+
| A | 2 |
| B | 3 |
| A | 1 |
| C | 4 |
| B | 2 |
+-------+-------+
I would like to create a column, that would give me the in which range (if I divided each group into 4 intervals) my price value is within each group.
+-------+-------+--------------------------+
| Group | Price | Range |
+-------+-------+--------------------------+
| A | 2 | [1-2] |
| B | 3 | [2-3] |
| A | 1 | [0-1] |
| C | 4 | [0-4] |
| B | 2 | [0-2] |
+-------+-------+--------------------------+
Anyone has any idea by using pandas pd.cut and groupby operations?
Thanks
You can pass pd.cut to groupby():
df['Range'] = df.groupby('Group')['Price'].transform(pd.cut, bins=4)

Transform a Pandas dataframe in a pandas with multicolumns

I have the following pandas dataframe, where the column id is the dataframe index
+----+-----------+------------+-----------+------------+
| | price_A | amount_A | price_B | amount_b |
|----+-----------+------------+-----------+------------|
| 0 | 0.652826 | 0.941421 | 0.823048 | 0.728427 |
| 1 | 0.400078 | 0.600585 | 0.194912 | 0.269842 |
| 2 | 0.223524 | 0.146675 | 0.375459 | 0.177165 |
| 3 | 0.330626 | 0.214981 | 0.389855 | 0.541666 |
| 4 | 0.578132 | 0.30478 | 0.789573 | 0.268851 |
| 5 | 0.0943601 | 0.514878 | 0.419333 | 0.0170096 |
| 6 | 0.279122 | 0.401132 | 0.722363 | 0.337094 |
| 7 | 0.444977 | 0.333254 | 0.643878 | 0.371528 |
| 8 | 0.724673 | 0.0632807 | 0.345225 | 0.935403 |
| 9 | 0.905482 | 0.8465 | 0.585653 | 0.364495 |
+----+-----------+------------+-----------+------------+
And I want to convert this dataframe in to a multi column data frame, that looks like this
+----+-----------+------------+-----------+------------+
| | A | B |
+----+-----------+------------+-----------+------------+
| id | price | amount | price | amount |
|----+-----------+------------+-----------+------------|
| 0 | 0.652826 | 0.941421 | 0.823048 | 0.728427 |
| 1 | 0.400078 | 0.600585 | 0.194912 | 0.269842 |
| 2 | 0.223524 | 0.146675 | 0.375459 | 0.177165 |
| 3 | 0.330626 | 0.214981 | 0.389855 | 0.541666 |
| 4 | 0.578132 | 0.30478 | 0.789573 | 0.268851 |
| 5 | 0.0943601 | 0.514878 | 0.419333 | 0.0170096 |
| 6 | 0.279122 | 0.401132 | 0.722363 | 0.337094 |
| 7 | 0.444977 | 0.333254 | 0.643878 | 0.371528 |
| 8 | 0.724673 | 0.0632807 | 0.345225 | 0.935403 |
| 9 | 0.905482 | 0.8465 | 0.585653 | 0.364495 |
+----+-----------+------------+-----------+------------+
I've tried transforming my old pandas dataframe in to a dict this way:
dict = {"A": df[["price_a","amount_a"]], "B":df[["price_b", "amount_b"]]}
df = pd.DataFrame(dict, index=df.index)
But I had no success, how can I do that?
Try renaming columns manually:
df.columns=pd.MultiIndex.from_tuples([x.split('_')[::-1] for x in df.columns])
df.index.name='id'
Output:
A B b
price amount price amount
id
0 0.652826 0.941421 0.823048 0.728427
1 0.400078 0.600585 0.194912 0.269842
2 0.223524 0.146675 0.375459 0.177165
3 0.330626 0.214981 0.389855 0.541666
4 0.578132 0.304780 0.789573 0.268851
5 0.094360 0.514878 0.419333 0.017010
6 0.279122 0.401132 0.722363 0.337094
7 0.444977 0.333254 0.643878 0.371528
8 0.724673 0.063281 0.345225 0.935403
9 0.905482 0.846500 0.585653 0.364495
You can split the column names on the underscore and convert to a tuple. Once you map each split column name to a tuple, pandas will convert the Index to a MultiIndex for you. From there we just need to call swaplevel to get the letter level to come first and reassign to the dataframe.
note: in my input dataframe I replaced the column name "amount_b" with "amount_B" because it lined up with your expected output so I assumed it was a typo
df.columns = df.columns.str.split("_", expand=True).swaplevel()
print(df)
A B
price amount price amount
0 0.652826 0.941421 0.823048 0.728427
1 0.400078 0.600585 0.194912 0.269842
2 0.223524 0.146675 0.375459 0.177165
3 0.330626 0.214981 0.389855 0.541666
4 0.578132 0.304780 0.789573 0.268851
5 0.094360 0.514878 0.419333 0.017010
6 0.279122 0.401132 0.722363 0.337094
7 0.444977 0.333254 0.643878 0.371528
8 0.724673 0.063281 0.345225 0.935403
9 0.905482 0.846500 0.585653 0.364495

Value error when merging 2 dataframe with identical number of row

I have a dataframe like this:
+-----+-------+---------+
| id | Time | Name |
+-----+-------+---------+
| 1 | 1 | John |
+-----+-------+---------+
| 2 | 2 | David |
+-----+-------+---------+
| 3 | 4 | Rebecca |
+-----+-------+---------+
| 4 | later | Taylor |
+-----+-------+---------+
| 5 | later | Li |
+-----+-------+---------+
| 6 | 8 | Maria |
+-----+-------+---------+
I want to merge with another table based on 'id' and time:
data1=pd.merge(data1, data2,left_on=['id', 'time'],right_on=['id', 'time'], how='left')
The other table data
+-----+-------+--------------+
| id | Time | Job |
+-----+-------+--------------+
| 2 | 2 | Doctor |
+-----+-------+--------------+
| 1 | 1 | Engineer |
+-----+-------+--------------+
| 4 | later | Receptionist |
+-----+-------+--------------+
| 3 | 4 | Professor |
+-----+-------+--------------+
| 5 | later | Lawyer |
+-----+-------+--------------+
| 6 | 8 | Trainer |
+-----+-------+--------------+
It raised error:
ValueError: You are trying to merge on int64 and object columns. If you wish to proceed you should use pd.concat
What I tried:
data1['time']=data1['time'].astype(str)
data2['time']=data2['time'].astype(str)
Did not work. What can I do?
PS: in this example Id are different, but in my data Id can be the same so I need to merge both on Time and Id
Have you tried also casting 'id' column to either str or int?
Sorry but I have not enough reputation for just comment your question.

Pivot or Transpose a table in Python/Pandas

I have the following data
+------+-------+---------+---------+---------+
| Name | Group | Limit1W | Limit1M | Limit3M |
+------+-------+---------+---------+---------+
| Bob | A | 100 | 50 | 0 |
| Bob | B | 100 | 50 | 0 |
| Alex | A | 20 | 50 | 0 |
| Alex | B | 0 | 0 | 0 |
+------+-------+---------+---------+---------+
I want to get
+------+-------+---------+-------+
| Name | Group | Game | Limit |
+------+-------+---------+-------+
| Bob | A | Limit1W | 100 |
| Bob | A | Limit1M | 50 |
| Bob | A | Limit3M | 0 |
| Bob | B | Limit1W | 100 |
| Bob | B | Limit1M | 50 |
| Bob | B | Limit3M | 0 |
| Alex | A | Limit1W | 20 |
| Alex | A | Limit1M | 50 |
| Alex | A | Limit3M | 0 |
| Alex | B | Limit1W | 0 |
| Alex | B | Limit1M | 0 |
| Alex | B | Limit3M | 0 |
+------+-------+---------+-------+
I have tried using pivot_table but I keep getting:
typeError can only concatentante list (not "tuple") to list, even though the limits only contains numbers

Categories