I have two dataframes.
df1 = pd.DataFrame({
'id':[1,1,1,1,1,1,2,2,2,2,2,2],
'pp':[3,'',2,'',1,0,4, 3, 2, 1, '', 0],
'pc':[6,5,4,3,2,1,6,5,4,3,2,1]
})
| | id | pp | pc |
|---:|-----:|:-----|-----:|
| 0 | 1 | 3 | 6 |
| 1 | 1 | | 5 |
| 2 | 1 | 2 | 4 |
| 3 | 1 | | 3 |
| 4 | 1 | 1 | 2 |
| 5 | 1 | 0 | 1 |
| 6 | 2 | 4 | 6 |
| 7 | 2 | 3 | 5 |
| 8 | 2 | 2 | 4 |
| 9 | 2 | 1 | 3 |
| 10 | 2 | | 2 |
| 11 | 2 | 0 | 1 |
df2 = pd.DataFrame({
'id':[1,1,1,2,2,2],
'pp':['', 3, 4, 1, 2, ''],
'yu':[1,2,3,4,5,6]
})
| | id | pp | yu |
|---:|-----:|:-----|-----:|
| 0 | 1 | | 1 |
| 1 | 1 | 3 | 2 |
| 2 | 1 | 4 | 3 |
| 3 | 2 | 1 | 4 |
| 4 | 2 | 2 | 5 |
| 5 | 2 | | 6 |
I'd like to merge the two so that the final result looks like this.
| | id | pp | pc | yu |
|---:|-----:|:-----|:-----|-----:|
| 0 | 1 | | | 1 |
| 1 | 1 | 0 | 1 | 2 |
| 2 | 1 | 3 | 6 | 3 |
| 3 | 2 | 1 | 3 | 4 |
| 4 | 2 | 2 | 4 | 5 |
| 5 | 2 | | | 6 |
Basically, df1 holds the values that I need to look up.
df2 has the id and pp columns that are used for the lookup.
However, running
pd.merge(df2, df1, on=['id', 'pp'], how='left')
results in
| | id | pp | pc | yu |
|---:|-----:|:-----|-----:|-----:|
| 0 | 1 | | 5 | 1 |
| 1 | 1 | | 3 | 1 |
| 2 | 1 | 3 | 6 | 2 |
| 3 | 1 | 4 | nan | 3 |
| 4 | 2 | 1 | 3 | 4 |
| 5 | 2 | 2 | 4 | 5 |
| 6 | 2 | | 2 | 6 |
This is not correct because it matches the empty rows as well.
If the value in df2 is empty, there should be no mapping.
I do want to keep the empty rows in df2 as shown, so I can't use an inner join.
You can drop the rows of df1 whose pp is empty before merging (this assumes numpy is imported as np):
out = pd.merge(df2, df1.replace({'':np.nan}).dropna(), on=['id', 'pp'], how='left')
Out[121]:
   id pp  yu   pc
0   1      1  NaN
1   1  3   2  6.0
2   1  4   3  NaN
3   2  1   4  3.0
4   2  2   5  4.0
5   2      6  NaN
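For reference, here is a self-contained sketch of the same idea using the frames from the question. It filters the empty `pp` rows out of `df1` with a boolean mask instead of `replace` plus `dropna`, which is my own variation so that `pp` keeps its original dtype on both sides. The name `lookup` is just an illustrative helper.

```python
import pandas as pd

df1 = pd.DataFrame({
    "id": [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
    "pp": [3, "", 2, "", 1, 0, 4, 3, 2, 1, "", 0],
    "pc": [6, 5, 4, 3, 2, 1, 6, 5, 4, 3, 2, 1],
})
df2 = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2],
    "pp": ["", 3, 4, 1, 2, ""],
    "yu": [1, 2, 3, 4, 5, 6],
})

# Keep only the df1 rows that have a real pp value to look up
# ("lookup" is a hypothetical helper name).
lookup = df1[df1["pp"] != ""]

# A left join keeps every df2 row; rows without a match get NaN in pc.
out = df2.merge(lookup, on=["id", "pp"], how="left")
print(out)
```

Because the empty `pp` rows never reach the right-hand side, the empty rows of `df2` have nothing to match and simply come back with a missing `pc`.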
I have a df like this
| count | people | A | B | C |
|---------|--------|-----|-----|-----|
| yes | siya | 4 | 2 | 0 |
| no | aish | 4 | 3 | 0 |
| total | | 4 | | 0 |
| yes | dia | 6 | 4 | 0 |
| no | dia | 6 | 2 | 0 |
| total | | 6 | | 0 |
I want an output like the one below
| count | people | A | B | C |
|---------|--------|-----|-----|-----|
| yes | siya | 4 | 2 | 8 |
| no | aish | 4 | 3 | 0 |
| total | | 4 | | 0 |
| yes | dia | 6 | 4 | 0 |
| no | dia | 6 | 2 | 2 |
| total | | 6 | | 0 |
The goal is to calculate column C by multiplying A and B only when the count value is "yes". But if the people value is the same for both rows (for example, dia has both a "yes" and a "no" row), then C has to be calculated for the "no" row instead.
This is what I have tried so far:
df.C= df.groupby("Host", as_index=False).apply(lambda dfx : df.A *
df.B if (df['count'] == 'no') else df.A *df.B)
But I am not able to achieve the goal. Any idea how I can get this output?
import numpy as np
# Number of distinct count values per person
n_counts = df.groupby('people')['count'].transform('nunique')
# Set conditions
c1 = n_counts.eq(1) & df['count'].eq('yes')
c2 = n_counts.gt(1) & df['count'].eq('no')
# Put the conditions in a list
c = [c1, c2]
# Make the choices corresponding to the condition list
choice = [df['A'] * df['B'], len(df[df['count'].eq('no')])]
# Apply np.select, defaulting to 0
df['C'] = np.select(c, choice, 0)
print(df)
count people A B C
0 yes siya 4 2.0 8.0
1 no aish 4 3.0 0.0
2 total NaN 4 0.0 0.0
3 yes dia 6 4.0 0.0
4 no dia 6 2.0 2.0
5 total NaN 6 NaN 0.0
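A self-contained sketch of the `np.select` pattern, in case it helps to run it end to end. The frame is rebuilt from the question's table, with two assumptions of mine: the blank `people` cells are stored as empty strings and the blank `B` cells as `None`. The value used for the "no" branch simply mirrors the answer above (the number of "no" rows). Whether that is the intended formula is not clear from the question, so treat it as a placeholder.

```python
import numpy as np
import pandas as pd

# Rebuilt from the question; '' and None stand in for the blank cells (assumption).
df = pd.DataFrame({
    "count": ["yes", "no", "total", "yes", "no", "total"],
    "people": ["siya", "aish", "", "dia", "dia", ""],
    "A": [4, 4, 4, 6, 6, 6],
    "B": [2, 3, None, 4, 2, None],
})

# How many distinct count values each person has.
n_counts = df.groupby("people")["count"].transform("nunique")

# Condition 1: person appears with a single count value and it is "yes".
# Condition 2: person appears with several count values; pick the "no" row.
conditions = [
    n_counts.eq(1) & df["count"].eq("yes"),
    n_counts.gt(1) & df["count"].eq("no"),
]

# One choice per condition, in the same order; the second one mirrors the
# answer above and is a placeholder, not a confirmed formula.
choices = [df["A"] * df["B"], len(df[df["count"].eq("no")])]

# Rows matching neither condition fall back to the default 0.
df["C"] = np.select(conditions, choices, default=0)
print(df)
```

`np.select` checks the conditions in order and takes the first one that is true for each row, so the order of `conditions` and `choices` has to match.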
Title is probably confusing, but let me make it clearer.
Let's say I have a df like this:
+----+------+---------------+
| Id | Name | reports_to_id |
+----+------+---------------+
| 0 | A | 10 |
| 1 | B | 10 |
| 2 | C | 11 |
| 3 | D | 12 |
| 4 | E | 11 |
| 10 | F | 20 |
| 11 | G | 21 |
| 12 | H | 22 |
+----+------+---------------+
I would want my resulting df to look like this:
+----+------+---------------+-------+
| Id | Name | reports_to_id | Count |
+----+------+---------------+-------+
| 0 | A | 10 | 0 |
| 1 | B | 10 | 0 |
| 2 | C | 11 | 0 |
| 3 | D | 12 | 0 |
| 4 | E | 11 | 0 |
| 10 | F | 20 | 2 |
| 11 | G | 21 | 2 |
| 12 | H | 22 | 1 |
+----+------+---------------+-------+
But this is what I currently get as a result of my code (which is wrong):
+----+------+---------------+-------+
| Id | Name | reports_to_id | Count |
+----+------+---------------+-------+
| 0 | A | 10 | 2 |
| 1 | B | 10 | 2 |
| 2 | C | 11 | 2 |
| 3 | D | 12 | 1 |
| 4 | E | 11 | 2 |
| 10 | F | 20 | 0 |
| 11 | G | 21 | 0 |
| 12 | H | 22 | 0 |
+----+------+---------------+-------+
with this code:
df['COUNT'] = df.groupby(['reports_to_id'])['Id'].transform('count')
Any suggestions or directions on how to get the result I want? All help is appreciated, and thank you in advance!
Use value_counts to count the reports_to_id by values, then map that to Id:
df['COUNT'] = df['Id'].map(df['reports_to_id'].value_counts()).fillna(0)
Output:
Id Name reports_to_id COUNT
0 0 A 10 0.0
1 1 B 10 0.0
2 2 C 11 0.0
3 3 D 12 0.0
4 4 E 11 0.0
5 10 F 20 2.0
6 11 G 21 2.0
7 12 H 22 1.0
Similar idea with reindex:
df['COUNT'] = df['reports_to_id'].value_counts().reindex(df['Id'], fill_value=0).values
which gives a better looking COUNT:
Id Name reports_to_id COUNT
0 0 A 10 0
1 1 B 10 0
2 2 C 11 0
3 3 D 12 0
4 4 E 11 0
5 10 F 20 2
6 11 G 21 2
7 12 H 22 1
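Here is a runnable sketch of the `value_counts` plus `map` approach on the question's data. The only addition of mine is the final `astype(int)`, which gives integer counts without switching to `reindex`.

```python
import pandas as pd

df = pd.DataFrame({
    "Id": [0, 1, 2, 3, 4, 10, 11, 12],
    "Name": list("ABCDEFGH"),
    "reports_to_id": [10, 10, 11, 12, 11, 20, 21, 22],
})

# How many times each id appears as somebody's manager.
reports_per_manager = df["reports_to_id"].value_counts()

# Look up each row's own Id in those counts; ids that nobody
# reports to are missing from the counts, so fill them with 0.
df["COUNT"] = df["Id"].map(reports_per_manager).fillna(0).astype(int)
print(df)
```

The difference from the `groupby(...).transform('count')` attempt is that the count is looked up by `Id`, not by the row's own `reports_to_id`.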
You can try the following:
l = list(df['reports_to_id'])
df['Count'] = df['Id'].apply(lambda x: l.count(x))
+---+---+---+---+----+
| A | B | C | D | E |
+---+---+---+---+----+
| 1 | 2 | 3 | 4 | VK |
| 1 | 4 | 6 | 9 | MD |
| 2 | 5 | 7 | 9 | V |
| 2 | 3 | 5 | 8 | VK |
| 2 | 3 | 7 | 9 | V |
| 1 | 1 | 1 | 1 | N |
| 0 | 1 | 6 | 9 | V |
| 1 | 2 | 5 | 7 | VK |
| 1 | 7 | 8 | 0 | MD |
| 1 | 5 | 7 | 9 | VK |
| 0 | 1 | 6 | 8 | V |
+---+---+---+---+----+
I want to select rows based on a column value, together with the two rows before each of them. For example, in the dataset above I want to select the rows where column 'E' is 'VK', plus the two rows preceding each selected row. So we should get a dataset like this:
+---+---+---+---+----+
| A | B | C | D | E |
+---+---+---+---+----+
| 1 | 2 | 3 | 4 | VK |
| 1 | 4 | 6 | 9 | MD |
| 2 | 5 | 7 | 9 | V |
| 2 | 3 | 5 | 8 | VK |
| 2 | 3 | 7 | 9 | V |
| 1 | 1 | 1 | 1 | N |
| 1 | 2 | 5 | 7 | VK |
| 1 | 7 | 8 | 0 | MD |
| 1 | 5 | 7 | 9 | VK |
+---+---+---+---+----+
First filter the dataframe up to the last VK row, then build a group key with cumsum on the reversed frame, and finally take the first 3 rows of each group with groupby head:
df=df.loc[:df.E.eq('VK').loc[lambda x : x].index.max()]
df=df.iloc[::-1].groupby(df.E.eq('VK').iloc[::-1].cumsum()).head(3).sort_index()
df
Out[102]:
A B C D E
0 1 2 3 4 VK
1 1 4 6 9 MD
2 2 5 7 9 V
3 2 3 5 8 VK
5 1 1 1 1 N
6 0 1 6 9 V
7 1 2 5 7 VK
8 1 7 8 0 MD
9 1 5 7 9 VK
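As an alternative sketch (not the method used above), the same rows can be picked with a boolean mask built from `shift`: a row is kept if it is a VK row itself or if a VK row sits one or two positions below it. The names `is_vk` and `keep` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 1, 2, 2, 2, 1, 0, 1, 1, 1, 0],
    "B": [2, 4, 5, 3, 3, 1, 1, 2, 7, 5, 1],
    "C": [3, 6, 7, 5, 7, 1, 6, 5, 8, 7, 6],
    "D": [4, 9, 9, 8, 9, 1, 9, 7, 0, 9, 8],
    "E": ["VK", "MD", "V", "VK", "V", "N", "V", "VK", "MD", "VK", "V"],
})

is_vk = df["E"].eq("VK")

# shift(-1) moves each flag one row up, so it marks the row just
# before a VK row; shift(-2) marks the row two positions before it.
keep = (
    is_vk
    | is_vk.shift(-1, fill_value=False)
    | is_vk.shift(-2, fill_value=False)
)

result = df[keep]
print(result)
```

This keeps the original index and needs no trimming step, because rows after the last VK never have a VK below them.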
Assuming I have the following table:
+----+---+---+
| A | B | C |
+----+---+---+
| 1 | 1 | 3 |
| 2 | 2 | 7 |
| 6 | 3 | 2 |
| -1 | 9 | 0 |
| 2 | 1 | 3 |
| -8 | 8 | 2 |
| 2 | 1 | 9 |
+----+---+---+
If column A's value is negative, update column B's value with the value of column C. If not, do nothing.
This is the desired output:
+----+---+---+
| A | B | C |
+----+---+---+
| 1 | 1 | 3 |
| 2 | 2 | 7 |
| 6 | 3 | 2 |
| -1 | 0 | 0 |
| 2 | 1 | 3 |
| -8 | 2 | 2 |
| 2 | 1 | 9 |
+----+---+---+
I've been trying the following code but it's not working
#not working
result.loc(result["A"] < 0,result['B'] = result['C'].iloc[0])
result.B[result.A < 0] = result.C
Try this:
df.loc[df['A'] < 0, 'B'] = df['C']
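A small runnable sketch of that assignment on the question's table. The variable `negative` is an illustrative name, and selecting `C` with the same mask on the right-hand side is optional, since pandas aligns on the index either way.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 6, -1, 2, -8, 2],
    "B": [1, 2, 3, 9, 1, 8, 1],
    "C": [3, 7, 2, 0, 3, 2, 9],
})

# Boolean mask of the rows where A is negative.
negative = df["A"] < 0

# Overwrite B with C on those rows only; other rows are untouched.
df.loc[negative, "B"] = df.loc[negative, "C"]
print(df)
```

Using a single `.loc[rows, column]` call avoids the chained indexing in `result.B[result.A < 0] = ...`, which may assign to a temporary copy.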