Transform a pandas DataFrame into one with multi-level columns - python

I have the following pandas dataframe, where the column id is the dataframe index
+----+-----------+------------+-----------+------------+
| | price_A | amount_A | price_B | amount_b |
|----+-----------+------------+-----------+------------|
| 0 | 0.652826 | 0.941421 | 0.823048 | 0.728427 |
| 1 | 0.400078 | 0.600585 | 0.194912 | 0.269842 |
| 2 | 0.223524 | 0.146675 | 0.375459 | 0.177165 |
| 3 | 0.330626 | 0.214981 | 0.389855 | 0.541666 |
| 4 | 0.578132 | 0.30478 | 0.789573 | 0.268851 |
| 5 | 0.0943601 | 0.514878 | 0.419333 | 0.0170096 |
| 6 | 0.279122 | 0.401132 | 0.722363 | 0.337094 |
| 7 | 0.444977 | 0.333254 | 0.643878 | 0.371528 |
| 8 | 0.724673 | 0.0632807 | 0.345225 | 0.935403 |
| 9 | 0.905482 | 0.8465 | 0.585653 | 0.364495 |
+----+-----------+------------+-----------+------------+
And I want to convert this dataframe into a multi-column dataframe that looks like this
+----+-----------+------------+-----------+------------+
|    |           A            |           B            |
+----+-----------+------------+-----------+------------+
| id |   price   |   amount   |   price   |   amount   |
|----+-----------+------------+-----------+------------|
| 0 | 0.652826 | 0.941421 | 0.823048 | 0.728427 |
| 1 | 0.400078 | 0.600585 | 0.194912 | 0.269842 |
| 2 | 0.223524 | 0.146675 | 0.375459 | 0.177165 |
| 3 | 0.330626 | 0.214981 | 0.389855 | 0.541666 |
| 4 | 0.578132 | 0.30478 | 0.789573 | 0.268851 |
| 5 | 0.0943601 | 0.514878 | 0.419333 | 0.0170096 |
| 6 | 0.279122 | 0.401132 | 0.722363 | 0.337094 |
| 7 | 0.444977 | 0.333254 | 0.643878 | 0.371528 |
| 8 | 0.724673 | 0.0632807 | 0.345225 | 0.935403 |
| 9 | 0.905482 | 0.8465 | 0.585653 | 0.364495 |
+----+-----------+------------+-----------+------------+
I've tried transforming my old pandas dataframe into a dict this way:
dict = {"A": df[["price_a","amount_a"]], "B":df[["price_b", "amount_b"]]}
df = pd.DataFrame(dict, index=df.index)
But I had no success. How can I do that?

Try renaming columns manually:
df.columns=pd.MultiIndex.from_tuples([x.split('_')[::-1] for x in df.columns])
df.index.name='id'
Output:
           A                   B         b
       price    amount     price    amount
id
0   0.652826  0.941421  0.823048  0.728427
1   0.400078  0.600585  0.194912  0.269842
2   0.223524  0.146675  0.375459  0.177165
3   0.330626  0.214981  0.389855  0.541666
4   0.578132  0.304780  0.789573  0.268851
5   0.094360  0.514878  0.419333  0.017010
6   0.279122  0.401132  0.722363  0.337094
7   0.444977  0.333254  0.643878  0.371528
8   0.724673  0.063281  0.345225  0.935403
9   0.905482  0.846500  0.585653  0.364495
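Note that the stray lowercase b group comes from the amount_b column name. If that suffix is a typo (the next answer makes the same assumption), a small variation that upper-cases the suffix keeps both columns under B. This is only a sketch against the original dataframe from the question:
import pandas as pd

# df is the original dataframe from the question; upper-case the suffix so
# "amount_b" lands in the same group as "price_B".
df.columns = pd.MultiIndex.from_tuples(
    [(suffix.upper(), name) for name, suffix in (c.split('_') for c in df.columns)]
)
df.index.name = 'id'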

You can split the column names on the underscore and convert to a tuple. Once you map each split column name to a tuple, pandas will convert the Index to a MultiIndex for you. From there we just need to call swaplevel to get the letter level to come first and reassign to the dataframe.
Note: in my input dataframe I replaced the column name "amount_b" with "amount_B", since that lines up with your expected output, so I assumed it was a typo.
df.columns = df.columns.str.split("_", expand=True).swaplevel()
print(df)
          A                   B
      price    amount     price    amount
0  0.652826  0.941421  0.823048  0.728427
1  0.400078  0.600585  0.194912  0.269842
2  0.223524  0.146675  0.375459  0.177165
3  0.330626  0.214981  0.389855  0.541666
4  0.578132  0.304780  0.789573  0.268851
5  0.094360  0.514878  0.419333  0.017010
6  0.279122  0.401132  0.722363  0.337094
7  0.444977  0.333254  0.643878  0.371528
8  0.724673  0.063281  0.345225  0.935403
9  0.905482  0.846500  0.585653  0.364495
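As a quick follow-up, once the columns form a MultiIndex the groups can be selected directly (column names as in the question):
df["A"]                            # both columns of group A: price and amount
df[("A", "price")]                 # a single column as a Series
df.xs("price", axis=1, level=1)    # the price column of every group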

Related

merge / concat two dataframes on column values and drop subsequent rows from the resulting dataframe

I have 2 data frames
df1
| email           | ack |
| --------------- | --- |
| first#abc.com   | 1   |
| second#abc.com  | 1   |
| third#abc.com   | 1   |
| fourth#abc.com  | 1   |
| fifth#abc.com   | 1   |
| sixth#abc.com   | 1   |
| seventh#abc.com | 1   |
| eight#abc.com   | 1   |
df2
| email           | ack | name | date       |
| --------------- | --- | ---- | ---------- |
| first#abc.com   | 0   | abc  | 01/01/2022 |
| second#abc.com  | 0   | xyz  | 01/02/2022 |
| third#abc.com   | 0   | mno  | 01/03/2022 |
| fourth#abc.com  | 0   | pqr  | 01/04/2022 |
| fifth#abc.com   | 0   | adam | 01/05/2022 |
| sixth#abc.com   | 0   | eve  | 01/06/2022 |
| seventh#abc.com | 0   | mary | 01/07/2022 |
| eight#abc.com   | 0   | john | 01/08/2022 |
| nine#abc.com    | 0   | kate | 01/09/2022 |
| ten#abc.com     | 0   | matt | 01/10/2022 |
How do I merge the above two dataframes so as to replace the values in the 'ack' column of df2 wherever applicable, i.e. matching on email address?
result
df2
| email           | ack | name | date       |
| --------------- | --- | ---- | ---------- |
| first#abc.com   | 1   | abc  | 01/01/2022 |
| second#abc.com  | 1   | xyz  | 01/02/2022 |
| third#abc.com   | 1   | mno  | 01/03/2022 |
| fourth#abc.com  | 1   | pqr  | 01/04/2022 |
| fifth#abc.com   | 1   | adam | 01/05/2022 |
| sixth#abc.com   | 1   | eve  | 01/06/2022 |
| seventh#abc.com | 1   | mary | 01/07/2022 |
| eight#abc.com   | 1   | john | 01/08/2022 |
| nine#abc.com    | 0   | kate | 01/09/2022 |
| ten#abc.com     | 0   | matt | 01/10/2022 |
I tried a left join and an outer join, but they appended rows to the existing rows.
Assuming df1['ack'] is always 1, the following code should work:
df2.loc[df2['email'].isin(df1['email']), 'ack'] = 1
In English:
If df2['email'] is found in df1['email'], set df2['ack'] = 1
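If df1['ack'] can contain values other than 1, a map-based variant carries the actual values across instead of hard-coding 1. This is only a sketch and assumes email is unique in df1:
# Take df1's ack value for each matching email; rows without a match
# keep whatever df2 already has.
mapped = df2['email'].map(df1.set_index('email')['ack'])
df2['ack'] = mapped.fillna(df2['ack'])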

pandas group by category and assign a bin with pd.cut

I have a dataframe like the following:
+-------+-------+
| Group | Price |
+-------+-------+
| A | 2 |
| B | 3 |
| A | 1 |
| C | 4 |
| B | 2 |
+-------+-------+
I would like to create a column that would tell me, within each group, which range my price value falls into (if each group is divided into 4 intervals).
+-------+-------+--------------------------+
| Group | Price | Range |
+-------+-------+--------------------------+
| A | 2 | [1-2] |
| B | 3 | [2-3] |
| A | 1 | [0-1] |
| C | 4 | [0-4] |
| B | 2 | [0-2] |
+-------+-------+--------------------------+
Does anyone have an idea using pandas pd.cut and groupby operations?
Thanks
You can pass pd.cut to transform() on the groupby:
df['Range'] = df.groupby('Group')['Price'].transform(pd.cut, bins=4)
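A minimal, self-contained run using the sample data from the question (note that the exact interval edges pd.cut picks depend on each group's own minimum and maximum):
import pandas as pd

df = pd.DataFrame({
    "Group": ["A", "B", "A", "C", "B"],
    "Price": [2, 3, 1, 4, 2],
})

# Within each group, split the price range into 4 equal-width intervals
# and label every row with the interval it falls into.
df["Range"] = df.groupby("Group")["Price"].transform(pd.cut, bins=4)
print(df)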

Python Pivot Table based on multiple criteria

I was asking the question in this link SUMIFS in python jupyter
However, I just realized that the solution didn't work because they can switch in and switch out on different dates. So basically they have to switch out first before they can switch in.
Here is the dataframe (sorted based on the date):
+---------------+--------+---------+-----------+--------+
| Switch In/Out | Client | Quality | Date | Amount |
+---------------+--------+---------+-----------+--------+
| Out | 1 | B | 15-Aug-19 | 360 |
| In | 1 | A | 16-Aug-19 | 180 |
| In | 1 | B | 17-Aug-19 | 180 |
| Out | 1 | A | 18-Aug-19 | 140 |
| In | 1 | B | 18-Aug-19 | 80 |
| In | 1 | A | 19-Aug-19 | 60 |
| Out | 2 | B | 14-Aug-19 | 45 |
| Out | 2 | C | 15-Aug-20 | 85 |
| In | 2 | C | 15-Aug-20 | 130 |
| Out | 2 | A | 20-Aug-19 | 100 |
| In | 2 | A | 22-Aug-19 | 30 |
| In | 2 | B | 23-Aug-19 | 30 |
| In | 2 | C | 23-Aug-19 | 40 |
+---------------+--------+---------+-----------+--------+
I would then create a new column and divide them into different transactions.
+---------------+--------+---------+-----------+--------+------+
| Switch In/Out | Client | Quality | Date | Amount | Rows |
+---------------+--------+---------+-----------+--------+------+
| Out | 1 | B | 15-Aug-19 | 360 | 1 |
| In | 1 | A | 16-Aug-19 | 180 | 1 |
| In | 1 | B | 17-Aug-19 | 180 | 1 |
| Out | 1 | A | 18-Aug-19 | 140 | 2 |
| In | 1 | B | 18-Aug-19 | 80 | 2 |
| In | 1 | A | 19-Aug-19 | 60 | 2 |
| Out | 2 | B | 14-Aug-19 | 45 | 3 |
| Out | 2 | C | 15-Aug-20 | 85 | 3 |
| In | 2 | C | 15-Aug-20 | 130 | 3 |
| Out | 2 | A | 20-Aug-19 | 100 | 4 |
| In | 2 | A | 22-Aug-19 | 30 | 4 |
| In | 2 | B | 23-Aug-19 | 30 | 4 |
| In | 2 | C | 23-Aug-19 | 40 | 4 |
+---------------+--------+---------+-----------+--------+------+
With this, I can apply the pivot formula and take it from there.
However, how do I do this in python? In Excel, I can just use multiple SUMIFS and compare in and out, but that approach doesn't carry over directly to python.
Thank you!
One simple solution is to iterate and apply a check (a function) over each element, with the result becoming a new column: in other words, map.
Using df.index.map we get the index of each item to pass as an argument, so we can access the values and compare them. In your case the aim is to identify the change to "Out" while keeping a counter.
import pandas as pd

switchInOut = ["Out", "In", "In", "Out", "In", "In",
               "Out", "Out", "In", "Out", "In", "In", "In"]
df = pd.DataFrame(switchInOut, columns=['Switch In/Out'])

counter = 1

def changeToOut(i):
    global counter
    # A new transaction starts when this row is "Out" and the previous row was "In".
    if df["Switch In/Out"].get(i) == "Out" and df["Switch In/Out"].get(i - 1) == "In":
        counter += 1
    return counter

rows = df.index.map(changeToOut)
df["Rows"] = rows
df
Result:
+----+-----------------+--------+
| | Switch In/Out | Rows |
|----+-----------------+--------|
| 0 | Out | 1 |
| 1 | In | 1 |
| 2 | In | 1 |
| 3 | Out | 2 |
| 4 | In | 2 |
| 5 | In | 2 |
| 6 | Out | 3 |
| 7 | Out | 3 |
| 8 | In | 3 |
| 9 | Out | 4 |
| 10 | In | 4 |
| 11 | In | 4 |
| 12 | In | 4 |
+----+-----------------+--------+
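As a side note, the same counter can be produced without a Python-level loop: a transaction starts whenever an "Out" row follows an "In" row, so a shift comparison plus a cumulative sum gives the same Rows column. A sketch using the column name from the question:
# True where a new transaction starts ("Out" immediately after an "In").
s = df["Switch In/Out"]
new_transaction = s.eq("Out") & s.shift().eq("In")
df["Rows"] = new_transaction.cumsum() + 1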

Joining two dataframes based on the columns of one of them and the row of another

Sorry if the title doesn't make sense, but I wasn't sure how else to explain it. Here's an example of what I'm talking about:
df_1
| ID | F_Name | L_Name |
|----|---------|---------|
| 0 | | |
| 1 | | |
| 2 | | |
| 3 | | |
df_2
| ID | Name_Type | Name |
|----|------------|--------|
| 0 | First | Bob |
| 0 | Last | Smith |
| 1 | First | Maria |
| 1 | Last | Garcia |
| 2 | First | Bob |
| 2 | Last | Stoops |
| 3 | First | Joe |
df_3 (result)
| ID | F_Name | L_Name |
|----|---------|---------|
| 0 | Bob | Smith |
| 1 | Maria | Garcia |
| 2 | Bob | Stoops |
| 3 | Joe | |
Any and all advice is welcome! Thank you
I guess that what you want to do is reshape your second DataFrame to have the same structure as the first one, right?
You can use the pivot method to achieve it, pivoting on the ID column so each ID becomes one row:
df_3 = df_2.pivot(index="ID", columns="Name_Type", values="Name")
Then, you can rename the columns and clear the columns' name:
df_3 = df_3.rename(columns={"First": "F_Name", "Last": "L_Name"})
df_3.columns.name = None
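A small end-to-end sketch with the sample data from the question, assuming ID is a regular column of df_2:
import pandas as pd

df_2 = pd.DataFrame({
    "ID":        [0, 0, 1, 1, 2, 2, 3],
    "Name_Type": ["First", "Last", "First", "Last", "First", "Last", "First"],
    "Name":      ["Bob", "Smith", "Maria", "Garcia", "Bob", "Stoops", "Joe"],
})

# One row per ID, one column per Name_Type value; Joe has no "Last" row,
# so his L_Name comes out as NaN.
df_3 = df_2.pivot(index="ID", columns="Name_Type", values="Name")
df_3 = df_3.rename(columns={"First": "F_Name", "Last": "L_Name"})
df_3.columns.name = None
print(df_3.reset_index())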

How to create a query that updates column values, using SQL Server or even a pandas function

Now I have a table something like the one below:
esn_missing_in_DF_umts
| cell_name | n_cell_name | source_vendor | target_vendor | source_rnc | target_rnc |
|-----------|-------------|---------------|---------------|------------|------------|
| 1         | 8           | x             | y             |            |            |
| 2         | 5           | x             | x             |            |            |
| 3         | 6           | x             | x             |            |            |
| 4         | 9           | x             | y             |            |            |
| 5         | 10          | x             | y             |            |            |
| 6         | 11          | x             | y             |            |            |
| 7         | 12          | x             | y             |            |            |
Two columns, source_rnc and target_rnc, are currently empty (both in SQL Server and in the dataframe).
Here are the two other tables I want to fill those columns from:
esn_umts_intra_sho
| ucell | urelation | ucell_rnc |
|-------|-----------|-----------|
| 13    | 5         | abc567    |
| 8     | 6         | abc568    |
| 14    | 8         | abc569    |
| 7     | 9         | abc570    |
| 16    | 10        | abc571    |
| 5     | 11        | abc572    |
| 17    | 12        | abc573    |
| 10    | 9         | abc574    |
| 9     | 17        | abc575    |
| 12    | 11        | abc576    |
| 11    | 12        | abc577    |
df_umts_carrier
| cell_name_umts | rnc    |
|----------------|--------|
| 1              | xyz123 |
| 2              | xyz124 |
| 3              | xyz125 |
| 4              | xyz126 |
| 5              | xyz127 |
| 6              | xyz128 |
| 7              | xyz129 |
So now I want to update source_rnc and target_rnc from those two tables, esn_umts_intra_sho and df_umts_carrier.
So I imagine that the query could be something like this:
UPDATE [toolDB].[dbo].[esn_missing_in_DF_umts]
SET [toolDB].[dbo].[esn_missing_in_DF_umts].[target_rnc] = CASE WHEN [toolDB].[dbo].[esn_missing_in_DF_umts].[target_vendor] = 'HUA' THEN [toolDB].[dbo].[df_umts_carrier].[rnc]
FROM [toolDB].[dbo].[esn_missing_in_DF_umts]
INNER JOIN [toolDB].[dbo].[df_umts_carrier]
ON [n_cell_name] = [cell_name_umts]
ELSE
UPDATE [toolDB].[dbo].[esn_missing_in_DF_umts]
SET [toolDB].[dbo].[esn_missing_in_DF_umts].[target_rnc] = [toolDB].[dbo].[esn_umts_intra_sho].[ucell_rnc]
From [toolDB].[dbo].[esn_missing_in_DF_umts] INNER JOIN [toolDB].[dbo].[esn_umts_intra_sho]
ON [n_cell_name] = [ucell]
I want the final output to be something like this:
| cell_name | n_cell_name | source_vendor | target_vendor | source_rnc | target_rnc |
|-----------|-------------|---------------|---------------|------------|------------|
| 1         | 8           | x             | y             | xyz123     | abc568     |
| 2         | 5           | x             | x             | xyz124     | xyz127     |
| 3         | 6           | x             | x             | xyz125     | xyz128     |
| 4         | 9           | x             | y             | xyz126     | abc575     |
| 5         | 10          | x             | y             | xyz127     | abc574     |
| 6         | 11          | x             | y             | xyz128     | abc576     |
| 7         | 12          | x             | y             | xyz129     | abc577     |
I even tried with pandas but it doesn't work...
I hope someone can help me.
The best thing is to write the query first as a SELECT statement with the CASE clause in it. Once it works as expected, you can amend it into your UPDATE.
So in this example: if the main table's column equals a given value, then get the data from the first joined table, else from the other table.
Quick amendment: make sure these are all rows you are happy to update; otherwise remember to put in a WHERE clause. That's why it's best to work out your logic in a SELECT and move on from there.
I think you want something like this:
UPDATE UMT
SET UMT.[target_rnc] = CASE WHEN UMT.target_vendor = 'HUA' THEN carrier.rnc ELSE SHO.ucell_rnc END
FROM [toolDB].[dbo].[esn_missing_in_DF_umts] UMT
LEFT JOIN [toolDB].[dbo].[df_umts_carrier] carrier ON UMT.n_cell_name = carrier.cell_name_umts
LEFT JOIN [toolDB].[dbo].[esn_umts_intra_sho] SHO ON UMT.n_cell_name = SHO.ucell
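Since the question also mentions pandas: the same CASE logic can be sketched with map and numpy.where, assuming the three tables are loaded into DataFrames named df_missing, df_carrier and df_sho (placeholder names) with the column names shown above:
import numpy as np

# Look up the rnc for each n_cell_name in both candidate tables
# (assumes cell_name_umts and ucell are unique in their tables).
carrier_rnc = df_missing["n_cell_name"].map(df_carrier.set_index("cell_name_umts")["rnc"])
sho_rnc = df_missing["n_cell_name"].map(df_sho.set_index("ucell")["ucell_rnc"])

# Mirror the CASE WHEN: carrier rnc when target_vendor is 'HUA', otherwise the SHO rnc.
df_missing["target_rnc"] = np.where(df_missing["target_vendor"].eq("HUA"), carrier_rnc, sho_rnc)

# source_rnc comes straight from the carrier table, keyed on cell_name
# (this matches the expected output shown in the question).
df_missing["source_rnc"] = df_missing["cell_name"].map(df_carrier.set_index("cell_name_umts")["rnc"])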
