Some calculations in pandas with the addition of a column - python

I have a table that has a column "Col1" that looks something like this:
| Col1 |
| 2 |
| 2 |
| 4 |
| 4 |
| 4 |
| 4 |
| 3 |
| 3 |
| 3 |
| 3 |
| 3 |
| 3 |
I need to create a new column "Col2". The table after this should look like this:
| Col1 | Col2 |
| 2 | 1 |
| 2 | 2 |
| 4 | 1 |
| 4 | 2 |
| 4 | 3 |
| 4 | 4 |
| 3 | 1 |
| 3 | 2 |
| 3 | 3 |
| 3 | 1 |
| 3 | 2 |
| 3 | 3 |
Is it possible to make so that if I have the same values in a row, the code starts from 1? As for example with 3.
| 3 | 1 |
| 3 | 2 |
| 3 | 3 |
| 3 | 1 |
| 3 | 2 |
| 3 | 3 |

Let's try this pandas solution without looping:
df2 = df.assign(Col2=df.groupby('Col1')['Col1'].cumcount().mod(df['Col1']).add(1))
print(df2)
Output:
Col1 Col2
0 2 1
1 2 2
2 4 1
3 4 2
4 4 3
5 4 4
6 3 1
7 3 2
8 3 3
9 3 1
10 3 2
11 3 3

import pandas as pd
df = pd.DataFrame({'Col1':[2,2,4,4,4,4,3,3,3,3,3,3]})
i = 0
Col2 = []
Col1 = df.Col1
#Construct Col2
while i < (len(Col1)):
Col2.extend(list(range(1,Col1[i]+1)))
i = len(Col2)
#Add Col2 to Dataframe
df['Col2'] = Col2

Related

Pandas Add New Column using Lookup using Multiple Columns from another DataFrame

I have two dataframes.
df1 = pd.DataFrame({
'id':[1,1,1,1,1,1,2,2,2,2,2,2],
'pp':[3,'',2,'',1,0,4, 3, 2, 1, '', 0],
'pc':[6,5,4,3,2,1,6,5,4,3,2,1]
})
| | id | pp | pc |
|---:|-----:|:-----|-----:|
| 0 | 1 | 3 | 6 |
| 1 | 1 | | 5 |
| 2 | 1 | 2 | 4 |
| 3 | 1 | | 3 |
| 4 | 1 | 1 | 2 |
| 5 | 1 | 0 | 1 |
| 6 | 2 | 4 | 6 |
| 7 | 2 | 3 | 5 |
| 8 | 2 | 2 | 4 |
| 9 | 2 | 1 | 3 |
| 10 | 2 | | 2 |
| 11 | 2 | 0 | 1 |
df2 = pd.DataFrame({
'id':[1,1,1,2,2,2],
'pp':['', 3, 4, 1, 2, ''],
'yu':[1,2,3,4,5,6]
})
| | id | pp | yu |
|---:|-----:|:-----|-----:|
| 0 | 1 | | 1 |
| 1 | 1 | 3 | 2 |
| 2 | 1 | 4 | 3 |
| 3 | 2 | 1 | 4 |
| 4 | 2 | 2 | 5 |
| 5 | 2 | | 6 |
I'd like to merge the two so that final results look like this.
| | id | pp | pc | yu |
|---:|-----:|:-----|:-----|-----:|
| 0 | 1 | | | 1 |
| 1 | 1 | 0 | 1 | 2 |
| 2 | 1 | 3 | 6 | 3 |
| 3 | 2 | 1 | 3 | 4 |
| 4 | 2 | 2 | 4 | 5 |
| 5 | 2 | | | 6 |
Basically, the df1 has the value that I need to lookup from.
df2 is the has id and pp column that are used to lookup.
However when I do
pd.merge(df2, df1, on=['id', 'pp'], how='left') results in
| | id | pp | pc | yu |
|---:|-----:|:-----|-----:|-----:|
| 0 | 1 | | 5 | 1 |
| 1 | 1 | | 3 | 1 |
| 2 | 1 | 3 | 6 | 2 |
| 3 | 1 | 4 | nan | 3 |
| 4 | 2 | 1 | 3 | 4 |
| 5 | 2 | 2 | 4 | 5 |
| 6 | 2 | | 2 | 6 |
This is not correct because it looks at empty rows as well.
If the value in df2 is empty, there should be no mapping.
I do want to keep the empty rows in df2 as it showed so can't use inner join
We can dropna for empty row in df1
out = pd.merge(df2, df1.replace({'':np.nan}).dropna(), on=['id', 'pp'], how='left')
Out[121]:
id pp yu pc
0 1 1 NaN
1 1 3 2 6.0
2 1 4 3 NaN
3 2 1 4 3.0
4 2 2 5 4.0
5 2 6 NaN

Multiplying pandas columns based on multiple conditions

I have a df like this
| count | people | A | B | C |
|---------|--------|-----|-----|-----|
| yes | siya | 4 | 2 | 0 |
| no | aish | 4 | 3 | 0 |
| total | | 4 | | 0 |
| yes | dia | 6 | 4 | 0 |
| no | dia | 6 | 2 | 0 |
| total | | 6 | | 0 |
I want a output like below
| count | people | A | B | C |
|---------|--------|-----|-----|-----|
| yes | siya | 4 | 2 | 8 |
| no | aish | 4 | 3 | 0 |
| total | | 4 | | 0 |
| yes | dia | 6 | 4 | 0 |
| no | dia | 6 | 2 | 2 |
| total | | 6 | | 0 |
The goal is calculate column C by mulytiplying A and B only when the count value is "yes" but if the column People values are same that is yes for dia and no for also dia , then we have to calculate for the count value "no"
I tried this much so far
df.C= df.groupby("Host", as_index=False).apply(lambda dfx : df.A *
df.B if (df['count'] == 'no') else df.A *df.B)
But not able to achieve the goal, any idea how can I achieve the output
import numpy as np
#Set Condtions
c1=df.groupby('people')['count'].transform('nunique').eq(1)&df['count'].eq('yes')
c2=df.groupby('people')['count'].transform('nunique').gt(1)&df['count'].eq('no')
#Put conditions in list
c=[c1,c2]
#Mke choices corresponding to condition list
choice=[df['A']*df['B'],len(df[df['count'].eq('no')])]
#Apply np select
df['C']= np.select(c,choice,0)
print(df)
count people A B C
0 yes siya 4 2.0 8.0
1 no aish 4 3.0 0.0
2 total NaN 4 0.0 0.0
3 yes dia 6 4.0 0.0
4 no dia 6 2.0 2.0
5 total NaN 6 NaN 0.0

How to count the occurrence of a value and set that count as a new value for that value's row

Title is probably confusing, but let me make it clearer.
Let's say I have a df like this:
+----+------+---------------+
| Id | Name | reports_to_id |
+----+------+---------------+
| 0 | A | 10 |
| 1 | B | 10 |
| 2 | C | 11 |
| 3 | D | 12 |
| 4 | E | 11 |
| 10 | F | 20 |
| 11 | G | 21 |
| 12 | H | 22 |
+----+------+---------------+
I would want my resulting df to look like this:
+----+------+---------------+-------+
| Id | Name | reports_to_id | Count |
+----+------+---------------+-------+
| 0 | A | 10 | 0 |
| 1 | B | 10 | 0 |
| 2 | C | 11 | 0 |
| 3 | D | 12 | 0 |
| 4 | E | 11 | 0 |
| 10 | F | 20 | 2 |
| 11 | G | 21 | 2 |
| 12 | H | 22 | 1 |
+----+------+---------------+-------+
But this what I currently get as a result of my code (that is wrong):
+----+------+---------------+-------+
| Id | Name | reports_to_id | Count |
+----+------+---------------+-------+
| 0 | A | 10 | 2 |
| 1 | B | 10 | 2 |
| 2 | C | 11 | 2 |
| 3 | D | 12 | 1 |
| 4 | E | 11 | 2 |
| 10 | F | 20 | 0 |
| 11 | G | 21 | 0 |
| 12 | H | 22 | 0 |
+----+------+---------------+-------+
with this code:
df['COUNT'] = df.groupby(['reports_to_id'])['id'].transform('count')
Any suggestions or directions on how to get the result I want? All help is appreciated! and thank you in advance!
Use value_counts to count the reports_to_id by values, then map that to Id:
df['COUNT'] = df['Id'].map(df['reports_to_id'].value_counts()).fillna(0)
Output:
Id Name reports_to_id COUNT
0 0 A 10 0.0
1 1 B 10 0.0
2 2 C 11 0.0
3 3 D 12 0.0
4 4 E 11 0.0
5 10 F 20 2.0
6 11 G 21 2.0
7 12 H 22 1.0
Similar idea with reindex:
df['COUNT'] = df['reports_to_id'].value_counts().reindex(df['Id'], fill_value=0).values
which gives a better looking COUNT:
Id Name reports_to_id COUNT
0 0 A 10 0
1 1 B 10 0
2 2 C 11 0
3 3 D 12 0
4 4 E 11 0
5 10 F 20 2
6 11 G 21 2
7 12 H 22 1
You can try the following:
l=list[df['reports_to_id']
df['Count']=df['Id'].apply(lambda x: l.count(x))

Select a row based on column value and its previous 2 rows

+---+---+---+---+----+
| A | B | C | D | E |
+---+---+---+---+----+
| 1 | 2 | 3 | 4 | VK |
| 1 | 4 | 6 | 9 | MD |
| 2 | 5 | 7 | 9 | V |
| 2 | 3 | 5 | 8 | VK |
| 2 | 3 | 7 | 9 | V |
| 1 | 1 | 1 | 1 | N |
| 0 | 1 | 6 | 9 | V |
| 1 | 2 | 5 | 7 | VK |
| 1 | 7 | 8 | 0 | MD |
| 1 | 5 | 7 | 9 | VK |
| 0 | 1 | 6 | 8 | V |
+---+---+---+---+----+
i want to select a row based on column value and its two previous rows. For example in the given dataset (on the picture) I want to select row based on 'E' column value 'VK' and two previous rows of that selected row. So we should get a dataset like this:
+---+---+---+---+----+
| A | B | C | D | E |
+---+---+---+---+----+
| 1 | 2 | 3 | 4 | VK |
| 1 | 4 | 6 | 9 | MD |
| 2 | 5 | 7 | 9 | V |
| 2 | 3 | 5 | 8 | VK |
| 2 | 3 | 7 | 9 | V |
| 1 | 1 | 1 | 1 | N |
| 1 | 2 | 5 | 7 | VK |
| 1 | 7 | 8 | 0 | MD |
| 1 | 5 | 7 | 9 | VK |
+---+---+---+---+----+
1st we need filter the dataframe until the last VK, then create the groupkey with cumsum , then do groupby head
df=df.loc[:df.E.eq('VK').loc[lambda x : x].index.max()]
df=df.iloc[::-1].groupby(df.E.eq('VK').iloc[::-1].cumsum()).head(3).sort_index()
df
Out[102]:
A B C D E
0 1 2 3 4 VK
1 1 4 6 9 MD
2 2 5 7 9 V
3 2 3 5 8 VK
5 1 1 1 1 N
6 0 1 6 9 V
7 1 2 5 7 VK
8 1 7 8 0 MD
9 1 5 7 9 VK

Use the other columns value if a condition is met Panda

Assuming I have the following table:
+----+---+---+
| A | B | C |
+----+---+---+
| 1 | 1 | 3 |
| 2 | 2 | 7 |
| 6 | 3 | 2 |
| -1 | 9 | 0 |
| 2 | 1 | 3 |
| -8 | 8 | 2 |
| 2 | 1 | 9 |
+----+---+---+
if column A's value is Negative, update column B's value by the value of column C. if not do nothing
This is the desired output:
+----+---+---+
| A | B | C |
+----+---+---+
| 1 | 1 | 3 |
| 2 | 2 | 7 |
| 6 | 3 | 2 |
| -1 | 0 | 0 |
| 2 | 1 | 3 |
| -8 | 2 | 2 |
| 2 | 1 | 9 |
+----+---+---+
I've been trying the following code but it's not working
#not working
result.loc(result["A"] < 0,result['B'] = result['C'].iloc[0])
result.B[result.A < 0] = result.C
Try this:
df.loc[df['A'] < 0, 'B'] = df['C']

Categories