Adding a row to an existing Pivot table - python

I have the below pivot table (the column level is named `Fruit`; the row index is a `(Market, # num bracket)` MultiIndex):

| Market | # num bracket | apple | orange | banana |
|:-------|--------------:|------:|-------:|-------:|
| X | 100 | 1.2 | 1.0 | NaN |
| Y | 50 | 2.0 | 3.5 | NaN |
| Y | 100 | NaN | 3.6 | NaN |
| Z | 50 | NaN | NaN | 1.6 |
| Z | 100 | NaN | NaN | 1.3 |
I want to add the row below at the bottom:

| Fruit | apple | orange | banana |
|:------|------:|-------:|-------:|
| Price | 3.5 | 1.2 | 2 |
So the new table looks like this:

| Market | # num bracket | apple | orange | banana |
|:-------|--------------:|------:|-------:|-------:|
| X | 100 | 1.2 | 1.0 | NaN |
| Y | 50 | 2.0 | 3.5 | NaN |
| Y | 100 | NaN | 3.6 | NaN |
| Z | 50 | NaN | NaN | 1.6 |
| Z | 100 | NaN | NaN | 1.3 |
| Price | | 3.5 | 1.2 | 2 |
Does anyone have a quick and easy recommendation on how to do this?

temp_df = pd.DataFrame(data=[{'Fruit Market': 'Price',
                              'apple': 3.5,
                              'orange': 1.2,
                              'banana': 2}],
                       columns=['Fruit Market', 'x', 'apple', 'orange', 'banana'])
df = pd.concat([df, temp_df], axis=0, ignore_index=True)
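An alternative sketch that keeps the pivot's row MultiIndex intact (the index tuples below are reconstructed from the question's table): build the Price row with a matching one-entry MultiIndex, leaving the second level blank, and concatenate so the fruits line up as columns instead of becoming new ones.

```python
import numpy as np
import pandas as pd

# Rebuild the example pivot table: rows indexed by (Market, # num bracket),
# columns named Fruit
df = pd.DataFrame(
    {"apple": [1.2, 2.0, np.nan, np.nan, np.nan],
     "orange": [1.0, 3.5, 3.6, np.nan, np.nan],
     "banana": [np.nan, np.nan, np.nan, 1.6, 1.3]},
    index=pd.MultiIndex.from_tuples(
        [("X", 100), ("Y", 50), ("Y", 100), ("Z", 50), ("Z", 100)],
        names=["Market", "# num bracket"]),
)
df.columns.name = "Fruit"

# Build the Price row with a matching one-entry MultiIndex (second level blank),
# then concat so the column labels align
price = pd.DataFrame([[3.5, 1.2, 2]], columns=df.columns,
                     index=pd.MultiIndex.from_tuples([("Price", "")],
                                                     names=df.index.names))
df = pd.concat([df, price])
```

Because the Price row reuses `df.columns` and `df.index.names`, the concat appends one row rather than introducing NaN-padded duplicate columns.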

Related

sumif and countif in Python over multiple columns, at row level rather than column level

I'm trying to figure a way to do:
COUNTIF(Col2,Col4,Col6,Col8,Col10,Col12,Col14,Col16,Col18,">=0.05")
SUMIF(Col2,Col4,Col6,Col8,Col10,Col12,Col14,Col16,Col18,">=0.05")
My attempt:
import pandas as pd
df = pd.read_excel(r'C:\Users\Downloads\Prepped.xls')  # sample file: https://github.com/BeboGhattas/temp-repo/blob/main/Prepped.xls
df.iloc[:, [2, 4, 6, 8, 10, 12, 14, 16, 18]].astype(float)  # changing dtype to float

# unconditional sum
df['sum'] = df.iloc[:, [2, 4, 6, 8, 10, 12, 14, 16, 18]].astype(float).sum(axis=1)

# whatever goes below won't work
# sum if
df['greater-than-0.05'] = df.iloc[:, [2, 4, 6, 8, 10, 12, 14, 16, 18]].astype(float).sum([c for c in col if c >= 0.05])
| | # | word | B64684807 | B64684807Measure | B649845471 | B649845471Measure | B83344143 | B83344143Measure | B67400624 | B67400624Measure | B85229235 | B85229235Measure | B85630406 | B85630406Measure | B82615898 | B82615898Measure | B87558236 | B87558236Measure | B00000009 | B00000009Measure | 有效竞品数 | 关键词抓取时间 | 搜索量排名 | 月搜索量 | 在售商品数 | 竞争度 |
|---:|----:|:--------|------------:|:-------------------|-------------:|:-------------------------|------------:|:-------------------------|------------:|:-------------------|------------:|:-------------------|------------:|:-------------------|------------:|:-------------------|------------:|-------------------:|------------:|:-------------------|-------------:|:--------------------|-------------:|-----------:|-------------:|---------:|
| 0 | 1 | word 1 | 0.055639 | [主要流量词] | 0.049416 | nan | 0.072298 | [精准流量词, 主要流量词] | 0.00211 | nan | 0.004251 | nan | 0.007254 | nan | 0.074409 | [主要流量词] | 0.033597 | nan | 0.000892 | nan | 9 | 2022-10-06 00:53:56 | 5726 | 326188 | 3810 | 0.01 |
| 1 | 2 | word 2 | 0.045098 | nan | 0.005472 | nan | 0.010791 | nan | 0.072859 | [主要流量词] | 0.003423 | nan | 0.012464 | nan | 0.027396 | nan | 0.002825 | nan | 0.060989 | [主要流量词] | 9 | 2022-10-07 01:16:21 | 9280 | 213477 | 40187 | 0.19 |
| 2 | 3 | word 3 | 0.02186 | nan | 0.05039 | [主要流量词] | 0.007842 | nan | 0.028832 | nan | 0.044385 | [精准流量词] | 0.001135 | nan | 0.003866 | nan | 0.021035 | nan | 0.017202 | nan | 9 | 2022-10-07 00:28:31 | 24024 | 81991 | 2275 | 0.03 |
| 3 | 4 | word 4 | 0.000699 | nan | 0.01038 | nan | 0.001536 | nan | 0.021512 | nan | 0.007658 | nan | 5e-05 | nan | 0.048682 | nan | 0.001524 | nan | 0.000118 | nan | 9 | 2022-10-07 00:52:12 | 34975 | 53291 | 30970 | 0.58 |
| 4 | 5 | word 5 | 0.00984 | nan | 0.030248 | nan | 0.003006 | nan | 0.014027 | nan | 0.00904 | [精准流量词] | 0.000348 | nan | 0.000414 | nan | 0.006721 | nan | 0.00153 | nan | 9 | 2022-10-07 02:36:05 | 43075 | 41336 | 2230 | 0.05 |
| 5 | 6 | word 6 | 0.010029 | [精准流量词] | 0.120739 | [精准流量词, 主要流量词] | 0.014359 | nan | 0.002796 | nan | 0.002883 | nan | 0.028747 | [精准流量词] | 0.007022 | nan | 0.017803 | nan | 0.001998 | nan | 9 | 2022-10-07 00:44:51 | 49361 | 34791 | 517 | 0.01 |
| 6 | 7 | word 7 | 0.002735 | nan | 0.002005 | nan | 0.005355 | nan | 6.3e-05 | nan | 0.000772 | nan | 0.000237 | nan | 0.015149 | nan | 2.1e-05 | nan | 2.3e-05 | nan | 9 | 2022-10-07 09:48:20 | 53703 | 31188 | 511 | 0.02 |
| 7 | 8 | word 8 | 0.003286 | [精准流量词] | 0.058161 | [主要流量词] | 0.013681 | [精准流量词] | 0.000748 | [精准流量词] | 0.002684 | [精准流量词] | 0.013916 | [精准流量词] | 0.029376 | nan | 0.019792 | nan | 0.005602 | nan | 9 | 2022-10-06 01:51:53 | 58664 | 27751 | 625 | 0.02 |
| 8 | 9 | word 9 | 0.004273 | [精准流量词] | 0.025581 | [精准流量词] | 0.014784 | [精准流量词] | 0.00321 | [精准流量词] | 0.000892 | nan | 0.00223 | nan | 0.005315 | nan | 0.02211 | nan | 0.027008 | [精准流量词] | 9 | 2022-10-07 01:34:28 | 73640 | 20326 | 279 | 0.01 |
| 9 | 10 | word 10 | 0.002341 | [精准流量词] | 0.029604 | nan | 0.007817 | [精准流量词] | 0.000515 | [精准流量词] | 0.001865 | [精准流量词] | 0.010128 | [精准流量词] | 0.015378 | nan | 0.019677 | nan | 0.003673 | nan | 9 | 2022-10-07 01:17:44 | 80919 | 17779 | 207 | 0.01 |
So my question is:
How can I do the sumif and countif on this exact table? (It should use col2, col4, etc. by position, because every file has the same format but different headers, so referring to df['B64684807'] by name isn't helpful.)
Sample file can be found at:
https://github.com/BeboGhattas/temp-repo/blob/main/Prepped.xls
IIUC, you can use a boolean mask:
df2 = df.iloc[:, [2,4,6,8,10,12,14,16,18]].astype(float)
m = df2.ge(0.05)
df['countif'] = m.sum(axis=1)
df['sumif'] = df2.where(m).sum(axis=1)
output (last 3 columns only):
sum countif sumif
0 0.299866 3 0.202346
1 0.241317 2 0.133848
2 0.196547 1 0.050390
3 0.092159 0 0.000000
4 0.075174 0 0.000000
5 0.206376 1 0.120739
6 0.026360 0 0.000000
7 0.147246 1 0.058161
8 0.105403 0 0.000000
9 0.090998 0 0.000000
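The same mask-based approach can be shown self-contained, without the Excel file (a sketch on a toy frame; the column names `a` and `b` are stand-ins for the positional `df.iloc[:, [2,4,...]]` selection):

```python
import pandas as pd

# Toy frame standing in for the selected numeric columns (hypothetical names)
df = pd.DataFrame({"a": [0.06, 0.01, 0.10],
                   "b": [0.02, 0.07, 0.05]})

df2 = df[["a", "b"]]                    # in the real file: df.iloc[:, [2, 4, ...]]
m = df2.ge(0.05)                        # boolean mask: value >= 0.05
df["countif"] = m.sum(axis=1)           # COUNTIF analogue: count of True per row
df["sumif"] = df2.where(m).sum(axis=1)  # SUMIF analogue: sum only unmasked values
```

`where(m)` turns the failing cells into NaN, which `sum` then skips, so only values meeting the `>= 0.05` condition contribute.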

Match and store based on two columns in python dataframe

I'm working on a project that requires me to match two data frames based on two separate columns, X and Y.
e.g.
df1 =
| X | Y | AGE |
|:--- |:---:|----:|
| 2.0 | 1.5 | 25 |
| 1.0 | 0.5 | 29 |
| 1.5 | 0.5 | 21 |
| 2.0 | 2.0 | 32 |
| 0.0 | 1.5 | 19 |
df2 =
| X | Y | AGE |
|:--- |:---:|----:|
| 0.0 | 0.0 | [] |
| 0.0 | 0.5 | [] |
| 0.0 | 1.0 | [] |
| 0.0 | 1.5 | [] |
| 0.0 | 2.0 | [] |
| 0.5 | 0.0 | [] |
| 0.5 | 0.5 | [] |
| 0.5 | 1.0 | [] |
| 0.5 | 1.5 | [] |
| 0.5 | 2.0 | [] |
| 1.0 | 0.0 | [] |
| 1.0 | 0.5 | [] |
| 1.0 | 1.0 | [] |
| 1.0 | 1.5 | [] |
| 1.0 | 2.0 | [] |
| 1.5 | 0.0 | [] |
| 1.5 | 0.5 | [] |
| 1.5 | 1.0 | [] |
| 1.5 | 1.5 | [] |
| 1.5 | 2.0 | [] |
| 2.0 | 0.0 | [] |
| 2.0 | 0.5 | [] |
| 2.0 | 1.0 | [] |
| 2.0 | 1.5 | [] |
| 2.0 | 2.0 | [] |
The goal is to sort through df1, find the row with its matching coordinates in df2, and then store the AGE value from df1 in the AGE list in df2. The expected output would be:
df2 =
| X | Y | AGE |
|:--- |:---:|----:|
| 0.0 | 0.0 | [] |
| 0.0 | 0.5 | [] |
| 0.0 | 1.0 | [] |
| 0.0 | 1.5 |[19] |
| 0.0 | 2.0 | [] |
| 0.5 | 0.0 | [] |
| 0.5 | 0.5 | [] |
| 0.5 | 1.0 | [] |
| 0.5 | 1.5 | [] |
| 0.5 | 2.0 | [] |
| 1.0 | 0.0 | [] |
| 1.0 | 0.5 |[29] |
| 1.0 | 1.0 | [] |
| 1.0 | 1.5 | [] |
| 1.0 | 2.0 | [] |
| 1.5 | 0.0 | [] |
| 1.5 | 0.5 |[21] |
| 1.5 | 1.0 | [] |
| 1.5 | 1.5 | [] |
| 1.5 | 2.0 | [] |
| 2.0 | 0.0 | [] |
| 2.0 | 0.5 | [] |
| 2.0 | 1.0 | [] |
| 2.0 | 1.5 |[25] |
| 2.0 | 2.0 |[32] |
The code I have so far is:
for n in df1:
    if df1["X"].values[n] == df2["X"].values[n]:
        for m in df1:
            if df1["Y"].values[m] == df2["Y"].values[m]:
                df2['AGE'].push(df1['AGE'])
This is a merge operation: leave out the AGE column of df2 and merge left on ['X', 'Y']:
df2 = df2[['X','Y']].merge(df1,on=['X','Y'],how='left')
To store age in lists:
df2 = df2[['X','Y']].merge(df1, on=['X','Y'], how='left')
df2['AGE'] = df2.apply(lambda row: [row['AGE']] if not pd.isnull(row['AGE']) else [], axis=1)
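Putting the pieces together as a runnable sketch (the 0.0–2.0 grid in 0.5 steps is rebuilt from the question's df2):

```python
from itertools import product

import pandas as pd

df1 = pd.DataFrame({"X": [2.0, 1.0, 1.5, 2.0, 0.0],
                    "Y": [1.5, 0.5, 0.5, 2.0, 1.5],
                    "AGE": [25, 29, 21, 32, 19]})

# Full grid of (X, Y) coordinates, 0.0 to 2.0 in steps of 0.5
df2 = pd.DataFrame([(x / 2, y / 2) for x, y in product(range(5), range(5))],
                   columns=["X", "Y"])

# Left merge keeps every grid cell; unmatched cells get NaN for AGE
out = df2.merge(df1, on=["X", "Y"], how="left")
# Wrap matches in one-element lists, unmatched cells in empty lists
out["AGE"] = out["AGE"].apply(lambda a: [] if pd.isnull(a) else [int(a)])
```

The nested-loop attempt above is O(len(df1) x len(df2)); the merge does the same matching in one vectorized pass.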

Python Pandas - Merge specific cells

I have this massive dataframe which has 3 different columns of values under each heading.
As an example, at first it looked something like this:
| | 0 | 1 | 2 | 3 | .. |
|---|---|---|---|---|---|
| 0 | a | 7.3 | 9.1 | NaN | .. |
| 1 | b | 2.51 | 4.8 | 6.33 | .. |
| 2 | c | NaN | NaN | NaN | .. |
| 3 | d | NaN | 3.73 | NaN | .. |
1, 2 and 3 all belong together. For simplicity of the program I used integers for the dataframe index and columns.
But now that it finished calculating stuff, I changed the columns to the appropriate string.
| | 0 | Heading 1 | Heading 1 | Heading 1 | .. |
|---|---|---|---|---|---|
| 0 | a | 7.3 | 9.1 | NaN | .. |
| 1 | b | 2.51 | 4.8 | 6.33 | .. |
| 2 | c | NaN | NaN | NaN | .. |
| 3 | d | NaN | 3.73 | NaN | .. |
Everything runs perfectly smooth up until this point, but here's where I'm stuck.
All I want to do is merge the three "Heading 1" cells into one spanning cell, so that it looks something like this:
| | 0 | Heading 1 | ..
| 0 | a | 7.3 | 9.1 | NaN | ..
| 1 | b | 2.51 | 4.8 | 6.33 | ..
| 2 | c | NaN | NaN | NaN | ..
| 3 | d | NaN | 3.73 | NaN | ..
But everything I find online merges the entire column, values included.
I'd really appreciate if someone could help me out here!
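One possible approach (a sketch, with the example data rebuilt from the tables above): pandas has no merged cells inside a plain DataFrame, but a column MultiIndex gives the three value columns one shared top-level label, and `DataFrame.to_excel` renders a shared level as a merged header cell on export.

```python
import numpy as np
import pandas as pd

# The frame after calculation, still with integer columns
df = pd.DataFrame({0: ["a", "b", "c", "d"],
                   1: [7.3, 2.51, np.nan, np.nan],
                   2: [9.1, 4.8, np.nan, 3.73],
                   3: [np.nan, 6.33, np.nan, np.nan]})

# Two-level columns: the three value columns share the top label "Heading 1",
# column 0 keeps an empty top level
df.columns = pd.MultiIndex.from_tuples(
    [("", 0), ("Heading 1", 1), ("Heading 1", 2), ("Heading 1", 3)])
```

After this, `df["Heading 1"]` selects all three value columns at once, and writing the frame to Excel merges the "Heading 1" header across them.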

Split a column into multiple columns with condition

I have a question about splitting a column into multiple columns in pandas, with conditions.
For example, I currently do something like the following, but it takes a very long time using a for loop:
| Index | Value |
| ----- | ----- |
| 0 | 1 |
| 1 | 1,3 |
| 2 | 4,6,8 |
| 3 | 1,3 |
| 4 | 2,7,9 |
into
| Index | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| ----- | - | - | - | - | - | - | - | - | - |
| 0 | 1 | | | | | | | | |
| 1 | 1 | | 3 | | | | | | |
| 2 | | | | 4 | | 6 | | 8 | |
| 3 | 1 | | 3 | | | | | | |
| 4 | | 2 | | | | | 7 | | 9 |
I wonder if there are any functions that can help with this, rather than writing a for loop to map all the indexes.
Assuming the "Value" column contains strings, you can use str.split and pivot like so:
value = df["Value"].str.split(",").explode().astype(int).reset_index()
output = value.pivot(index="index", columns="Value", values="Value")
output = output.reindex(range(value["Value"].min(), value["Value"].max()+1), axis=1)
>>> output
Value 1 2 3 4 5 6 7 8 9
index
0 1.0 NaN NaN NaN NaN NaN NaN NaN NaN
1 1.0 NaN 3.0 NaN NaN NaN NaN NaN NaN
2 NaN NaN NaN 4.0 NaN 6.0 NaN 8.0 NaN
3 1.0 NaN 3.0 NaN NaN NaN NaN NaN NaN
4 NaN 2.0 NaN NaN NaN NaN 7.0 NaN 9.0
Input df:
df = pd.DataFrame({"Value": ["1", "1,3", "4,6,8", "1,3", "2,7,9"]})
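A variant of the same idea (a sketch using `str.get_dummies`, which one-hot encodes the comma-separated values; multiplying by the column labels puts each value back into its cell):

```python
import pandas as pd

df = pd.DataFrame({"Value": ["1", "1,3", "4,6,8", "1,3", "2,7,9"]})

dummies = df["Value"].str.get_dummies(",")     # one 0/1 column per distinct value
dummies.columns = dummies.columns.astype(int)  # labels come back as strings
out = dummies.mul(dummies.columns).where(lambda d: d.ne(0))  # 1 -> value, 0 -> NaN
# fill in values absent from the data (here 5) so the columns run 1..9
out = out.reindex(range(out.columns.min(), out.columns.max() + 1), axis=1)
```

Unlike `pivot`, `get_dummies` never raises on duplicate values within a row, which can make it a more forgiving choice for messy input.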

combine data frames of different sizes and replacing values

I have two dataframes of different sizes. I want to join them and then fill the NaN values that appear after combining with the values from the smaller dataframe.
dataframe1:

| | symbol | value1 | value2 | Occurance |
|:--|:------|-------:|-------:|----------:|
| 2020-07-31 | A | 193.5 | 186.05 | 3 |
| 2020-07-17 | A | 372.5 | 359.55 | 2 |
| 2020-07-21 | A | 387.8 | 382.00 | 1 |

dataframe2:

| | x | y | z | symbol |
|:--|--:|--:|--:|:------|
| 2020-10-01 | 448.5 | 453.0 | 443.8 | A |
I tried concatenating with df1 = pd.concat([dataframe2, dataframe1], axis=1) and replacing the NaN values with the values from dataframe2. The result is given below, but I am looking for the result shown under "Result Desired". How can I achieve that?
Result given:

| | X | Y | Z | symbol | symbol | value1 | value2 | Occurance |
|:--|--:|--:|--:|:------|:------|-------:|-------:|----------:|
| 2020-07-31 | NaN | NaN | NaN | NaN | A | 193.5 | 186.05 | 3 |
| 2021-05-17 | NaN | NaN | NaN | NaN | A | 372.5 | 359.55 | 2 |
| 2021-05-21 | NaN | NaN | NaN | NaN | A | 387.8 | 382.00 | 1 |
| 2020-10-01 | 448.5 | 453.0 | 443.8 | A | NaN | NaN | NaN | NaN |
Result Desired:

| | X | Y | Z | symbol | symbol | value1 | value2 | Occurance |
|:--|--:|--:|--:|:------|:------|-------:|-------:|----------:|
| 2020-10-01 | 448.5 | 453.0 | 443.8 | A | A | 193.5 | 186.05 | 3 |
| 2020-10-01 | 448.5 | 453.0 | 443.8 | A | A | 372.5 | 359.55 | 2 |
| 2020-10-01 | 448.5 | 453.0 | 443.8 | A | A | 387.8 | 382.00 | 1 |
| 2020-10-01 | 448.5 | 453.0 | 443.8 | A | NaN | NaN | NaN | NaN |
Please note the datetime needs to be the same in the desired result. In short: replicate the single row of dataframe2 across the NaN cells of dataframe1's rows. A solution avoiding a for loop would be great.
Could you try sorting your dataframe by the index to check what the output would be?
df1.sort_index()
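The desired broadcast can also be done with a merge on `symbol` rather than a positional concat (a sketch on the sample data; `Occurance` is kept as spelled in the question):

```python
import pandas as pd

df1 = pd.DataFrame({"symbol": ["A", "A", "A"],
                    "value1": [193.5, 372.5, 387.8],
                    "value2": [186.05, 359.55, 382.00],
                    "Occurance": [3, 2, 1]},
                   index=pd.to_datetime(["2020-07-31", "2020-07-17", "2020-07-21"]))
df2 = pd.DataFrame({"x": [448.5], "y": [453.0], "z": [443.8], "symbol": ["A"]},
                   index=pd.to_datetime(["2020-10-01"]))

# Every df1 row picks up df2's single row for its symbol; keeping df2's
# index column as the final index gives each row df2's date
out = df2.reset_index().merge(df1, on="symbol", how="left").set_index("index")
```

Because merge aligns on the `symbol` column, df2's one row is replicated across all matching df1 rows without any explicit loop.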
